% Computer Aided Hyphenation for Italian and Modern Latin
% by Claudio Beccari
% Dipartimento di Elettronica
% Politecnico di Torino
% e-mail beccari@polito.it
%
\documentstyle[ltugboat]{article}
\title{Computer Aided Hyphenation for Italian and Modern Latin}
\author{Claudio Beccari}
\address{Dipartimento di Elettronica\\Politecnico di Torino\\ Turin, Italy}
\netaddress{beccari@polito.it}
%
% New environment "comment"
%
\newenvironment{comment}{\begingroup\setbox0\vbox\bgroup}{\egroup\endgroup}
%
% New environment to typeset on three columns
%
\newenvironment{trecolonne}{%                   Opening commands
\hbadness=10000 \vbadness=10000
\widowpenalty=0 \clubpenalty=0% Necessary to counteract ltugboat.sty settings
\dimen0=\textwidth \advance\dimen0 -2\columnsep
\divide\dimen0 by 3
\setbox0\vbox\bgroup\hsize\dimen0\parindent 1em
bbbb\par% dummy first line, discarded by the first \vsplit below
}{%                                             Closing commands
\par\egroup
\setbox0=\vbox{\unvbox0\null}
\setbox6\vsplit0 to \baselineskip
\count255\ht0 \divide\count255 by \baselineskip \divide\count255 by 3
\dimen2=\count255\baselineskip \advance\dimen2\topskip
\global\setbox2\vsplit0 to\dimen2
\setbox2\vbox{\unvbox2}
\ifdim\ht2<\dimen2 \setbox2\vbox{\unvbox2\vsplit0 to \topskip}\fi
\global\setbox4\vsplit0 to\dimen2
\setbox4\vbox{\unvbox4}
\ifdim\ht4<\dimen2 \setbox4\vbox{\unvbox4\vsplit0 to \topskip}\fi
\setbox2\vtop{\unvbox2}
\setbox4\vtop{\unvbox4}
\setbox6\vtop{\unvbox0}
\noindent\box2 \hfill \box4 \hfill \box6}%      End of definition
%
\let \italiano\relax \let\latino\relax

\hbadness=5000
\begin{document}
\maketitle
\section*{Abstract}
After a brief historical sketch of the evolution of Latin into Italian
and modern Latin, the peculiarities of both languages are described so as to
explain the philosophy of the hyphenation patterns. These patterns are one of
the few cases where a single set serves two different languages.

\section*{Sommario} Dopo aver delineato brevemente l'evoluzione del latino
verso l'italiano e il latino moderno, vengono descritte le caratteristiche
delle due lingue in modo da capire la filosofia dei {\it pattern} di
divisione in sillabe. Questi {\it pattern} costituiscono uno dei pochi
esem\-pi applicabili a due lingue differenti.

\section*{Summarium}
Latini sermonis evolutione ad italianum et la\-ti\-num
modernum breviter exposita, utrius sermonis spe\-cie\-ta\-tes descriptae
sunt ut philosophia de {\it pattern} ad syllabas dividendi intelligatur.
Isti {\it pattern} duobus differentibus sermonibus applicabile exemplum
sunt.

\section{Outline of historical evolution}
Classical Latin as we study it in schools and universities is the language
that was used, especially in written form, by the authors of the Republican
period and of the very beginning of the Empire. Common people spoke a
similar language that was open to new words from other countries, to new
constructions and to a general simplification of the inflection of nouns,
adjectives and verbs.

Cicero himself complained that common people (the {\it vulgus\/}) shortened
word endings by dropping the final consonants, and palatalized `c' and `g'
before the front vowels `e' and `i'. These were the first signs of the
autochthonous evolution of Latin towards the modern language; in the other
parts of the Roman Empire similar evolutions were under way, with a stronger
influence of the native languages over which Latin had superimposed itself;
the invasions of the ``barbarians'' brought in peculiar pronunciations and
many lexical additions.

The decline of Latin was very slow, because for many centuries it remained
the language of scholars, chancelleries and notaries public; it was, and
still is, the official language of the Roman Catholic Church. Latin, in its
modern form, is the official language of the Vatican State, and the
Vatican daily newspaper, {\it L'Osservatore Romano,} is published mainly in
Italian but with frequent contributions in Latin, even commercial ads!
Modern Latin is used even for comic books, for example Snoopy
\cite{snoopy}, Mickey Mouse \cite{MMouse} and Asterix
\cite{asterix}\footnote{The first two books are intended as aids for
teaching Latin, and are fully marked with both prosodic and rhythmic
accents.}.

Nowadays Latin is studied in many countries as a regular subject both in
high schools and in universities; in Italy it is not classified as a
``foreign'' language and is a compulsory subject in both the classical and
the scientific {\it licei} (high schools). In the past, Latin was even more
important in the education of young people; forty years ago I started Latin
in sixth grade and studied it for eight years, through junior high and high
school\footnote{I attended the {\it liceo classico} and also had five
years of classical Greek; now I have an engineering degree and I am a
professor of electric circuit theory. I am very glad I had the opportunity
of completing my education by studying the humanities for so long, and I
wish the new generation could have the same.}.

From the language of the common people of the first century several
regional and local dialects evolved; the first document explicitly written
in what we might already call Italian dates from 960~A.D.\ \cite{migliorini};
several documents, mostly poems, were produced in the following centuries,
and in the early fourteenth century Dante Alighieri wrote his masterpiece,
the {\it Divina Commedia}, which is considered the main landmark of the new
language; the language was already mature enough to be used in a poetic
treatise of history, philosophy and theology.

The modernization of Dante's language took place over the past seven
centuries, but the difference between his language and modern Italian is
much smaller than that between the language used by Chaucer in his {\it
Canterbury Tales} and modern English; today's Italian high school students
can read Dante's poem, and other even older texts, with no more difficulty
than any other conceptually demanding text requires.

\section{Alphabet}
Italian and modern Latin use the 26-letter alphabet that everybody knows
as the {\it Latin alphabet\/}; there are, however, some fine points that
deserve careful attention.

\noindent {\it Italian.} The letters J, K, X, Y and W are used only in
technical terms and symbols, foreign names and some very specialized words,
such as the international word {\it taxi}. J, K and Y survive in toponyms,
family names and English-style nicknames, such as Stefy for Stefania
(Stephanie). The letter J (see also below) was once used as a graphic
device to mark the semivowel role of the letter I, hence the family name
{\it Ajmone}; likewise you may write {\it Iugoslavia} (modern spelling),
{\it Jugoslavia} (old-fashioned spelling) or {\it Yugoslavia}
(international spelling) according to your preference; in Italian all
three are correct and are pronounced exactly the same way.

Besides the above-mentioned letters, there are five vowels, none of which is
mute: {\it a, e, i, o, u}, fifteen consonants: {\it b, c, d, f, g, l, m, n,
p, q, r, s, t, v, z}, and one diacritical letter: {\it h}. The latter
corresponds to no sound; it is used only to mark half a dozen words, so as
to distinguish them from words that sound the same but have a different
meaning, to mark some interjections, and to mark the velar pronunciation of
`c' and `g' where they would otherwise be palatalized.

\begin{comment}
In total there are 26 signs, but, in spite of modern times, elementary
schools keep teaching that the Italian alphabet contains 21 letters; in
fact they are having trouble with children with exotic names such as John,
Katia, Xenobia, Yuri, Walter and the like; these names, due to the
influence of the mass media, are now very common (well\dots, I know just
one person named Xenobia, but there are other names containing the letter
`x').
\end{comment}

Except for about a dozen articles, prepositions and adverbs (which are
nevertheless used very often), all common Italian words end with a vowel;
of course this statement does not apply to trade marks, unassimilated
foreign words, technical terms and the like.

Another peculiarity is that every consonant may occur in doubled form,
which corresponds to a reinforced pronunciation of the consonant. There are
rare instances of double vowels, but in this case, contrary to what happens
in English, they form different syllables instead of a diphthong; for
example, {\it zoologico} is divided as {\it zo-o-lo-gi-co}.

\noindent {\it Latin.} Classical Latin lacked J, U and W, and V was used
wherever U or V are used now. From the very beginning scholars carried this
anomaly over into the spelling and printing of all languages: capital V was
used in all circumstances, while in lowercase `v' was printed at the
beginning of words and `u' in the middle or at the end. This confusing
habit was common to all Western languages, but fortunately it began to be
abandoned in Holland during the sixteenth century; it lasted somewhat
longer in Italy because of the wide use of Latin, but was eventually done
away with by the end of the seventeenth century. When Knuth \cite[reference
106]{knuth} cites Pacioli's {\it Diuine Proportione}, published in Venice in
1509, he reports the title with the spelling of the original printing, but
the pronunciation at that time already implied the consonant V rather than
the vowel~U.

In the Middle Ages and in the early days of printing it was customary to
use `j' instead of `i' where the letter `i' formed a diphthong with the
following vowel; it was just a graphic trick to distinguish the two roles of
the letter `i', and it was so successful that it was adopted in other
languages as well; this is why even today we spell {\it junior} instead of
{\it iunior}, although the latter is the formal Latin spelling.

Modern Latin uses both U and V in their proper positions, while J and W are
used only in foreign names and toponyms.

There are six vowels: {\it a, e, i, o, u, y} and eighteen consonants: {\it
b, c, d, f, g, h, k, l, m, n, p, q, r, s, t, v, x, z}. The ligatures {\it
\ae, \oe} do not belong to Latin; they were introduced in the sixteenth
century in France and in England, and afterwards enjoyed a certain
popularity in Latin too, but in modern usage, as in classical Latin, these
two diphthongs are spelled with separate letters.

\section{Accents}
{\it Italian.} In Italian accents are used very sparingly; it is compulsory
to mark with a suitable accent the last vowel of polysyllabic oxytone words
(those stressed on the last syllable), and to mark some well-known,
specified monosyllabic words that contain a diphthong. This is
standardized by Regulation UNI~6015 \cite{6015}.

Unlike Spanish and Portuguese, Italian does not require proparoxytone words
to be marked with an accent, although the best grammars recommend doing so.
In practice, if you exclude oxytone words (where accents are compulsory) and
paroxytone words (where accents are not required), the remaining ones {\it
may} be marked with an accent only when a different position of the stress
might change the meaning; for example {\it l\`avati} means `wash yourself'
while {\it lav\`ati} is the masculine plural of `washed'; in such a case it
is advisable to mark the first word, unless the rest of the sentence makes
clear which meaning is intended. Although the `Sommario' of this article
contains five proparoxytone words, no accents were used.

The accent can also be used to denote the open or closed quality of a
vowel (only for tonic `e' and `o'), but this use is found only in
dictionaries and grammars; a good grammar will certainly point out that {\it
c\`olto} (picked up) is different from {\it c\'olto} (educated), but in
practice the meaning is determined by the context, while the actual
pronunciation depends very strongly on the regional origin of the speaker.

The grave (\`{}) accent is used on any vowel, while the acute (\'{}) accent
may be used only on the vowel `e' (and on the vowel `o', but only in
optional situations) when it has a closed sound. Most Italians are not even
aware of this choice; when they write by hand, they just put any kind of
small stroke on the vowel to be accented, and by so doing they intend to
mark only the stress; the distinction between the two accents is observed
only in dictionaries and grammars, while in printing it is maintained only
for the letter `e' in oxytone words, more as a tribute to tradition than out
of actual semantic necessity.

\begin{comment}
Some fancy character sets have both accents merged into a single horizontal
bar.
\end{comment}

When an accent is compulsory on an uppercase letter and the character set
does not contain accented vowels, it is accepted practice to use an
apostrophe: UNITA' (unity) in place of UNIT\`A; this practice is considered
bad style in typesetting, but is used quite often in advertising.

The diaeresis (\"{}) and the circumflex (\^{}) are no longer used; in the
past the diaeresis was used in poetry to split a diphthong, and the
circumflex had several uses, for example to mark the contraction of two
`i's into one sign in those plurals that centuries ago were spelled with a
double `i': {\it studii} (studies, two centuries ago), {\it stud\^\i} (one
century ago), {\it studi} (modern).

\noindent{\it Latin.} In Latin no accents are used; the breve (\u{}) and the
long (\={}) marks are used only in dictionaries, grammars and where prosody
is dealt with. The diaeresis is sometimes used in grammars and in prosody to
mark the splitting of a diphthong: {\it a\"er} (air), {\it po\"eta} (poet).

\section{Apocope and aphaeresis}
{\it Italian.} In Italian the dropping of one or more initial letters of a
word (aphaeresis) takes place only in poetry and is marked with an
apostrophe preceded by a space.

The loss of one or more final letters of a word (apocope) is either not
marked at all (see in the `Sommario' {\it aver} in place of {\it avere\/}),
or it is marked with an apostrophe when it corresponds to a vocalic elision
(see above {\it l'evoluzione} in place of {\it la evoluzione\/}) or to a
complete syllabic apocope. The latter case is very unusual, while vocalic
elision is very frequent, so it must be handled properly in hyphenation;
the rules stated in Regulation UNI~6461 \cite{6461} require that, when a
line ends with an apostrophe, the apostrophe {\it must not} be replaced by
the vowel it stands for. In the past, not too long ago (for example when I
was in elementary school), the opposite rule was in use, so there are
occasional discussions between the older generation and the new one.
Nevertheless, even today it is considered bad style to end a line with an
apostrophe, and in typography this practice is tolerated only when the
line width is quite small, as in the narrow columns of daily newspapers.

\noindent{\it Latin.} I do not know of any case where apocope or aphaeresis
is marked in any visible way; indeed I am almost sure that these two
spelling practices are not admissible in Latin.

\section{Diphthongs}
{\it Italian.} In Italian a diphthong is formed by any vowel preceded or
followed by an {\it unstressed} closed vowel (`i' or `u'); so we have:
\begin{center}
\it ia, ie, io, ai, ei, oi \\
    ua, ue, uo, au, eu, ou \\
    iu, ui
\end{center}

Italian diphthongs are always pronounced keeping the sounds of the
individual vowels, and the closed vowel plays the role of a semivowel or a
glide.

There are also groups of three vowels that contain two semivowels, or a
semivowel and a glide:
\begin{center}\it
iuo, uie \\
ieu, uoi, iei
\end{center}

An `i' (possibly also a `u', but I cannot find examples) surrounded by two
open vowels always behaves as a semivowel, so it always starts a new
syllable.

\noindent{\it Latin.} In Latin there are more or less the same diphthongs as
in Italian, with the addition of
\begin{center}\it
ae, oe
\end{center}
that one or two centuries ago were written with the corresponding ligatures
{\it \ae, \oe}; in modern Latin both these diphthongs are pronounced as a
single open `e'\footnote{I have seen a reproduction of an Italian book
printed in Venice in the sixteenth century where both these diphthongs were
replaced by the letter `e' that represents their sound.}. Furthermore, in
some words of Greek origin, Latin may have the diphthong {\it yi}, for
example {\it Harpyia} \cite{manna}\footnote{One might think that it would
make no difference to consider the vowel `y' and the diphthong `ia', since
the pronunciation would be practically the same; but from the point of view
of prosody the situation is completely different: a diphthong is always long
while `y' is always short, so that in prosody Har-pyi-a becomes
\={}\={}\u{}, while Har-py-ia becomes \={}\u{}\={}.}.

The main difference between the diphthongs common to Italian and Latin is
that {\it ia, ie, io, iu} behave as such in Latin only when they are at the
beginning of a word or are preceded by another vowel; in any other case
they belong to two different syllables; in Italian they are always
diphthongs, provided the `i' is unstressed.

\section{Di- and trigraphs}
{\it Italian.} In Italian there are groups of two or three letters that
represent a sound not represented by any single letter of the alphabet;
besides `c' and `g' modified with the diacritical `h', and `c' and `g'
modified with a diacritical `i'\footnote{In this case the letter `i' does
not form a diphthong with the following vowel but is used just to
palatalize the two consonants; from the hyphenation point of view this
subtle difference may be ignored.}, there are
\begin{center}\it gn, gli, sc \end{center}
where {\it gn} is pronounced as in French, or as the Spanish {\it \~n} or
the Portuguese {\it nh\/}; {\it sc} is pronounced as the English {\it sh}
when it is followed by a front vowel `e' or `i', and {\it gli} is
pronounced as the Portuguese {\it lh} when it is not preceded by `n' and is
followed by another vowel or is at the end of a word. These digraphs and
trigraphs must not be split by the hyphenation process.

\noindent{\it Latin.} Latin by itself has no indivisible digraphs or
trigraphs, but since classical times the transliteration of Greek words has
required {\it th} in place of $\theta$, {\it rh} in place of $\rho$ (but
{\it rrh} in place of $\rho\rho$), {\it ph} in place of $\phi$, and {\it ch}
in place of $\chi$; therefore these digraphs cannot be split by the
hyphenation process.
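
In \TeX's pattern notation (Liang's method: a digit between two letters
allows a break if it is odd and forbids it if it is even, and the highest
digit wins), such indivisible groups can be protected by inhibiting
patterns. The fragment below is only a hypothetical illustration, not an
excerpt from the pattern set listed at the end; like every
\verb"\patterns" command, it can be read only by INITEX:
\begin{verbatim}
\patterns{ % hypothetical fragment, INITEX only
g2n g2l         % Italian: never break inside gn or gl
c2h p2h r2h t2h % ch (also Italian), ph, rh, th
}
\end{verbatim}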

\section{Hyphenation}
{\it Italian.} The Italian hyphenation rules can be stated very simply as follows:
\begin{enumerate}

\item every syllable contains at least one vowel\footnote{This rule applies
to all languages, although the notion of vowel differs from language to
language; for example, in several Slavic languages `r' is considered a
vowel. If \TeX\ had a provision for this, the bad line break
(compara-nds) that occurred in \TUB, vol.~12, no.~2, June 1991, page 239,
first column, 6--7 lines from the bottom, would not have taken place.}

\item diphthongs and `triphthongs' behave as one vowel

\item a consonant followed by a vowel belongs to the same syllable as the
vowel

\item one or more consonants not followed by a vowel (at the end of a word,
possibly because of an apocope, or in technical terms, trade marks and the
like) belong to the same syllable as the preceding vowel

\item when a group of consonants is found, the hyphen goes in the {\it
leftmost} position (possibly to the left of the whole group) such that the
consonants remaining on the right of the hyphen can also be found at the
beginning of an Italian word;\label{cons}

\item prefixes and suffixes can be ignored and the compound word may be
divided as if it were a single word; in any case division according to
etymology is also accepted; in practice this happens only with the
technical prefixes {\it dis-, post-, sub-, trans-,} which are not very
common.

\end{enumerate}

Once it is clear what consonants, vowels, diphthongs and `triphthongs' are,
the only rule that is difficult to apply is rule number~\ref{cons}; but with
the help of a school dictionary one can always check whether there exists
an Italian word starting with a given group of consonants.
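
As a rough illustration of how rule~3 and rule~\ref{cons} can be encoded
as \TeX\ patterns, here is a deliberately tiny, hypothetical set; it is not
the pattern set listed at the end, it covers only a few consonants, and
like every \verb"\patterns" command it must be read by INITEX. A pattern
such as \verb"1t" allows a break before every `t', \verb"2nt" forbids a
break before an `n' followed by `t' (so the group is divided as {\it n-t}),
and \verb"s2t" keeps `s' with the following consonant. The divisions in
the comments assume \verb"\lefthyphenmin" and \verb"\righthyphenmin" equal
to~2:
\begin{verbatim}
\patterns{ % hypothetical toy set, INITEX only
1b 1c 1l 1n 1r 1s 1t % a consonant goes with the next vowel
2bb 2tt              % double consonants are split: ab-ba-te
b2r                  % consonant + liquid stay together: feb-bre
2nt 2rt              % liquid/nasal + consonant: can-to, par-te
s2c s2t              % s stays with the next consonant: pe-sce, pa-sta
}
\end{verbatim}
Since it contains no pattern between two vowels, such a set never splits a
vowel group, so at worst it misses a legal break such as {\it zo-o}.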

The point is that if you use a dictionary of too high a quality, you will
find words starting with almost any possible group of consonants: {\it
bdelio\footnote{Due to the extremely specialized nature of these words, I
do not give their English translation, because I could not find a suitable
Italian--English dictionary that listed them; I believe, though, that
their scholarly nature is such that, with minor modifications, they also
exist in English and many other languages.}, cnidio, ctenidio, ftalato,
gmelinite, pneumatico, psicosi, pteridina, tmesi}. But many of these words,
mostly of Greek origin, do not find their way into school dictionaries
(except {\it pneumatico} and {\it psicosi\/}), so a diligent person will not
be misled by too many technicalities and will find the correct division.

To avoid confusion in this matter, the Italian Standards Institute issued
Regulation UNI~6461 \cite{6461}, which lists the groups of consonants that
must be divided (table~\ref{t:6461}). This table does not list the normal
consonant divisions, that is:
\begin{itemize}

\item digraphs and trigraphs can {\it never} be divided, except {\it gn}
when it appears in a foreign word, or in a word derived from a foreign one,
where the two letters are pronounced individually, such as {\it Wagner,
wagneriano,\dots}

\item geminated (double) consonants and {\it cq} must {\it always} be split

\item a liquid (`l', `r') or a nasal (`m', `n') is {\it always} separated
from a following consonant, except for the cases shown in
table~\ref{t:6461}

\item a consonant is {\it never} separated from a following liquid, except
for the cases shown in table~\ref{t:6461}

\item the letter `s' is {\it never} separated from a following consonant
(unless it is another `s')

\end{itemize}
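
For an exception such as {\it Wagner} in the first item above, the simplest
remedy is \TeX's standard \verb"\hyphenation" command, which may also be
given inside a document; the entries below are just examples. Since \TeX\
converts a word to lowercase before looking it up, the entry {\it wag-ner}
covers {\it Wagner} as well:
\begin{verbatim}
\hyphenation{wag-ner wag-ne-ria-no}
\end{verbatim}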

\begin{table}{\centering\tt
\begin{tabular}{|*5{c|}}\hline
b-d  & b-n  & b-s  & c-m  & c-n  \\
c-s  & c-t  & c-z  & d-g  & d-m  \\
d-v  & f-t  & g-m  & p-n  & p-s  \\
p-t  & p-z  & t-m  & t-n  & z-t  \\
g-fr & ld-m & ld-sp& l-st & mb-d \\
mp-s & nc-n & ng-st& n-scr& n-st \\
n-str& r-st & r-str& st-m &      \\
\hline
\end{tabular}\par}
\caption{Groups of consonants that can be split across syllables}
\label{t:6461}
\end{table}

Regulation UNI~6461 also states the rules for the apostrophe: it behaves as
the vowel it replaces; a line break (without hyphen) is allowed after it
when the line is very short, but this is bad style, so no line break is
possible when no interword space is left between the apostrophe and the
following word.
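
In \TeX\ the apostrophe normally has a zero \verb"\lccode", so in {\it
l'evoluzione} the hyphenation routine finds only the one-letter word {\it
l} and never examines {\it evoluzione}, because it tries just the first
word after each interword space. A common remedy, sketched here under that
assumption, is to declare the apostrophe a letter, so that patterns can
treat it like the vowel it replaces; the hypothetical pattern \verb"'2"
then forbids a break right after it. The \verb"\lccode" must be in force
both when the patterns are read by INITEX and when paragraphs are broken:
\begin{verbatim}
\lccode`\'=`\'   % the apostrophe counts as a letter
\patterns{'2}    % hypothetical: no break right after it
\end{verbatim}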

Italian hyphenation for \TeX\ was already discussed by D\'esarm\'enien
\cite{desarmenien}, but, although I wish I knew French as well as he knows
Italian, the 88 patterns he created for Italian handled only consonants and
completely ignored diphthongs and `triphthongs'; a previous version that I
prepared needed 150 patterns to perform Italian hyphenation correctly.

For the rest, the regulation is already formulated in such a way that the
hyphenation patterns \TeX\ requires can be synthesized directly, without
running {\tt patgen}; of course some care is needed to avoid strange
situations and to compensate for \TeX's inability to distinguish vowels
from consonants.

With the advent of version~3 of \TeX\ it is better to set
\verb"\righthyphenmin" to 2, because there is no need to protect the
hyphenation algorithm from the mute `e' that is so frequent in English; of
course it is not good style to carry just two letters over to a new line,
but this is so rare that it is much better to give \TeX\ more chances to
find suitable break points than to protect it from situations that never
occur in Italian.
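
In practice this amounts to the two assignments below (plain \TeX\ starts
from \verb"\lefthyphenmin=2" and \verb"\righthyphenmin=3"); since \TeX\
records these values when a paragraph starts, they should be set
beforehand, for instance in the preamble:
\begin{verbatim}
\lefthyphenmin=2  % at least two letters before a break
\righthyphenmin=2 % at least two letters after it
\end{verbatim}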

Another reason for choosing this reduced value of \verb"\righthyphenmin"
concerns accents: as already pointed out, Italian words carry an accent, if
any, only on their final vowel. \TeX\ is known not to hyphenate a word
beyond an accent control sequence, but this drawback has a negligible
effect on Italian, since what follows the accent control sequence is just
the final accented letter; when accented letters find their way into
256-character fonts this minor drawback will disappear, but even with the
present limitations (unless virtual fonts are used) this \TeX\ peculiarity
has hardly any influence. I admit that {\it virt\`u} (virtue) cannot be
hyphenated, because the part \TeX\ examines, {\it virt}, is too short (the
word could be hyphenated as {\it vir-t\`u\/}), while there are no problems
with longer words; for example {\it qualit\`a} (quality) is hyphenated by
\TeX\ as {\it qua-lit\`a}, the complete division being {\it qua-li-t\`a}.
\TeX\ also correctly gives {\it per-ch\'e} (because), {\it af-fin-ch\'e}
(so that), and so on.
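
Where the missing break really matters, for example in a narrow column, it
can be supplied by hand with ordinary discretionary hyphens; this is plain
\TeX\ usage, not part of the pattern set, and explicit discretionaries are
not subject to \verb"\righthyphenmin":
\begin{verbatim}
vir\-t\`u        % allows vir-t\`u
qua\-li\-t\`a    % allows qua-li-t\`a
\end{verbatim}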

There are no known problems with the synthesized patterns listed at the
end; the only point that leaves me partially unsatisfied, although it is
grammatically perfectly correct, is that technical prefixes such as {\it
dis-, post-, sub-, trans-} must be explicitly separated with \verb"\-" if
one wants to stress their nature as prefixes. The solution to the same
problem in Latin is described below.
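
For instance, to divide {\it disarmo} and {\it suburbano} etymologically,
instead of as {\it di-sar-mo} and {\it su-bur-ba-no}, one might type the
lines below; note that after such an explicit \verb"\-" \TeX\ does not
hyphenate the rest of the word, so the marked break is the only one
available:
\begin{verbatim}
dis\-armo       % only dis-armo
sub\-urbano     % only sub-urbano
\end{verbatim}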

\noindent{\it Latin.} The patterns listed at the end include a subset that
was originally designed just for Italian; with a little thought and a few
additions the pattern set was extended so as to cover modern Latin as well.

As for diphthongs, the Italian and Latin ones were merged, on the
assumption that \TeX\ is not supposed to find every possible break point
but only legal ones; if two vowels are treated as a diphthong even though
they belong to two different syllables, the only drawback is that a legal
break point is missed, while no wrong division is made. Moreover, most
Italian readers feel uncomfortable when a break is taken such that the new
line starts with a vowel (this is certainly not the case with
English-speaking readers), so extending the set of diphthongs of either
language bothers neither Italian nor Latin readers. Declaring {\it \ae} and
{\it \oe} as letters through their \verb"\lccode" allows words containing
these ligatures to be hyphenated, although their use is discouraged.
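
With the Computer Modern text fonts, where plain \TeX\ places \verb"\ae",
\verb"\oe", \verb"\AE" and \verb"\OE" at positions \verb|"1A|,
\verb|"1B|, \verb|"1D| and \verb|"1E|, such a declaration might look like
the sketch below (an illustration, not necessarily the exact code used for
the patterns listed at the end); the uppercase ligatures are mapped to the
lowercase ones, and patterns can then refer to the two letters as
\verb"^^1a" and \verb"^^1b":
\begin{verbatim}
\lccode"1A="1A % \ae
\lccode"1B="1B % \oe
\lccode"1D="1A % \AE, hyphenated like \ae
\lccode"1E="1B % \OE, hyphenated like \oe
\end{verbatim}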

As for consonant groups, there is no regulation comparable to the Italian
one; my grammar \cite{manna} states that Latin hyphenation is done as in
Italian (except for prefixes and suffixes, which must be divided
etymologically), but Latin has consonant groups that never occur in
Italian.

In order to find out how unusual consonant groups are treated in Latin I
examined an important scholarly book \cite{merk}, the bilingual New
Testament in Greek and Latin ``apparatu critico instructum'', reprinted as
a ``reeditio photomechanica ex typographia~\dots, Romae'' and for which
``omnia iura reservantur''; clearly this is modern Latin, although the
Latin part of the book contains the well-known text that was translated
from Greek and Aramaic by several authors over several centuries and
copied by different copyists in many codices that are preserved all over
the world. This critical edition is intended as study material, and its
language and spelling are particularly careful, precisely because of the
purpose of the book.

By examining the hyphenation in this book I could compile a list of
consonant groups, and I noticed that the digraph {\it gn} (which is a
digraph in Italian but is not supposed to be one in Latin) was treated
inconsistently, so that both {\it reg-num} and {\it re-gnum} occur. I
decided to choose the second form of hyphenation for two reasons: a) it
does not conflict with the Italian rule, and b) the pronunciation
recommended to the clergy, which is used in Catholic universities,
seminaries, monasteries, etc., corresponds to the Italian one.

The letter `s' is not treated uniformly either; it is generally treated as
in Italian, but there are cases where it is treated as in English; for
example {\it blasphemia} (blasphemy) is hyphenated as {\it blas-phe-mia}.
Since this does not conflict with the Italian rule (the group `sph' does not
occur in Italian), a suitable pattern was generated to cope with such
situations.
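
Such an exception can be expressed with a higher odd digit. Assuming,
hypothetically, that the rule keeping `s' with a following consonant
includes a pattern like \verb"s2p", the pattern below (again an
illustration for INITEX, not necessarily the one actually generated)
restores the break before {\it ph}, while its final \verb"2" still keeps
the digraph together:
\begin{verbatim}
\patterns{s3p2h} % hypothetical: blas-phemia, never p-h
\end{verbatim}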
550
551
Some attention was given to prefixes and suffixes, in order to separate
them correctly according to their etymology. Prefixes must be separated
regardless of the letter groups that get split, provided that the prefix
has not lost its final vowel by elision with the initial vowel of the
second element of the compound. For example, the prefix {\it paene-}
(almost) loses its final `e' in {\it paeninsula}; the whole word is
therefore treated as a simple word and hyphenated {\it pae-nin-su-la}.

It was possible to find suitable patterns for certain instances of
{\it ab-, ad-, ob-, trans-}, for the prefixes {\it abs-, dis-, circum-,
sub-}, and for the suffixes {\it -dem, -que}; in general, however, the
problem remains, although it does not show up very often.

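As an example of a prefix pattern, the word-initial pattern
{\tt .cir1cu2m3} can be traced by hand on {\it circumdo} (at each position
the largest digit wins, odd digits allow a break, digits at the word edges
are omitted). With the generic patterns alone the values on either side of
the `m' would be~1 and~2, giving {\it cir-cu-mdo}; the prefix pattern
raises them to~2 and~3:

\begin{verbatim}
 c i r c u m d o
  2   1 2         1c2 (twice)
      1   2 3     .cir1cu2m3
          1 2     1m2
            1 2   1d2
 c2i0r1c2u2m3d2o  maximum
\end{verbatim}

The odd maxima give {\it cir-cum-do}, which separates the prefix as its
etymology requires.
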
A solution can be found in a macro (already described by J.~Braams
\cite{braams}) that has been used by German \TeX\ users, who constantly
have to cope with compound words that need a little help to be hyphenated
correctly:
 \begin{verbatim}
% unbreakable zero-width glue
\def\allowhyphens{\penalty\@M\hskip\z@}
\def~#1{\if\string#1-% "~-": special hyphen
           \allowhyphens\-\allowhyphens
        \else % otherwise: the usual tie
           \penalty\@M\ #1%
        \fi
}
\end{verbatim}

Here the macro appears in a modified form. In the German version the
character \verb|"| (instead of \verb|~|) is made active and given a
complex definition, so as to treat the umlaut properly and to cover
several other situations that occur in German. This requires changes here
and there in the definitions of \plain; in particular, the double quote
must be added to the list of special characters, so that it is handled
consistently when typesetting in verbatim mode. I preferred to give a new
definition to the tie character \verb|~|, which is already listed among
the special characters. The new definition performs the usual tie function
except when \verb|~| is followed by a hyphen; in that case the sequence
\verb|~-| inserts a special discretionary break after which normal
hyphenation still takes place in the rest of the word. Recall, in fact,
that the standard sequence \verb|\-| inserts a discretionary break but
inhibits hyphenation in the rest of the word, because \TeX\ only tries to
hyphenate a word that follows glue; the zero-width skip in
\verb|\allowhyphens| provides that glue, and the infinite penalty in front
of it prevents a line break there.

Therefore, if wrong prefix or suffix hyphenations turn up in the drafts,
they can be corrected (or the words can be typed that way from the start)
as \verb|con~-iungo, ob~-iurgo|, so that the possible hyphenation points
are {\it con-iun-go, ob-iur-go}.


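To make the mechanism concrete, the lines below show, written out by hand
rather than traced from \TeX, roughly what \verb|con~-iungo| becomes once
\verb|~| and \verb|\allowhyphens| have been expanded; in plain \TeX\
\verb|\@M| is 10000 and \verb|\z@| is 0pt. The zero-width glue after the
discretionary is what makes \TeX\ treat {\it iungo} as a new word to
hyphenate, while the penalty of 10000 in front of each skip forbids a line
break at that glue:

\begin{verbatim}
con%
\penalty10000\hskip0pt % no break
\-%                      con-|iungo
\penalty10000\hskip0pt % no break
iungo% follows glue: patterns apply
\end{verbatim}
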
\begin{figure*}
\begin{trecolonne}\italiano
La lingua italiana e le lingue cosiddette romanze o neolatine, cio\`e lingue
derivate anch'esse dal latino (francese, spagnolo, portoghese, rumeno ed
altre minori), si fanno risalire all'idioma, che al tempo dell'impero romano
era parlato nella penisola italiana, nelle regioni del Mediterraneo
occidentale e nella Dacia, l'odierna Romania.

Tracce evidentissime si osservano ancor oggi non soltanto nel lessico e
nella morfologia del gruppo linguistico neolatino, ma anche in altre lingue
europee, quelle del gruppo anglo-sassone, come conseguenza dell'influsso
diretto o indiretto esercitato dalla lingua di Roma sugli idiomi particolari
dei popoli nordici.

Per quel che riguarda la lingua italiana, essa si collega direttamente al
{\it sermo vulgaris la\-ti\-nus,} cio\`e al latino parlato comunemente dalle
famiglie e in pubblico nei quotidiani rapporti di commercio e di affari.
 \end{trecolonne}
\caption[]{Example of Italian text typeset in narrow columns (from
\cite{manna})}
 \medskip
\begin{trecolonne}\latino
Et sicut Moyses exaltavit serpentem in deserto, ita exaltari oportet Filium
hominis, ut omnis, qui credit in ipsum, non pereat, sed habeat vitam
aeternam. Sic enim Deus dilexit mundum, ut Filium suum unigenitum daret, ut
omnis qui credit in eum non pereat, sed habeat vitam aeternam. Non enim
misit Deus Filium suum in mundum, ut iudicet mundum, sed ut salvetur mundus
per ipsum. Qui credit in eum, non iudicatur; qui autem non credit, iam
iudicatus est, quia non credit in nomine unigeniti Filii Dei. Hoc est autem
iudicium, quia lux venit in mundum, et dilexerunt homines magis tenebras
quam lucem; erant enim eorum mala opera. Omnis enim, qui male agit, odit
lucem et non venit ad lucem, ut non arguantur opera eius; qui autem facit
veritatem, venit ad lucem, ut manifestentur opera eius, quia in Deo sunt
facta.
 \end{trecolonne}
\caption[]{Example of Latin text typeset in narrow columns (J\,3,14--21)}
\end{figure*}


\section{Generation of the format file}
The file {\tt italat.tex} is listed in the appendix, and its patterns may
be checked against the rules stated in the previous sections. Special
attention was given to the groups {\it ps} and {\it pn}: Table~\ref{t:6461}
states that they must be separated, but compound words with {\it psic-}
(for example {\it parapsicologia\/}) and {\it pneum-} (for example
{\it pseudopneumococco\/}) must not be hyphenated after the `p'; the
patterns {\tt 3p4si3c4} and {\tt 3p4neu1} take care of them.

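Traced by hand on {\it parapsicologia} (at each position the largest digit
wins, odd digits allow a break, digits at the word edges are omitted), the
even~4 of {\tt 3p4si3c4} overrides the~3 that {\tt 2p3s} puts between `p'
and `s', and its odd~3 moves the break in front of the `p':

\begin{verbatim}
 p a r a p s i c o l o g i a
  2     1 2                   1p2 (twice)
    1 2                       1r2a
        2 3                   2p3s
        3 4   3 4             3p4si3c4
          1 2                 1s2
              1 2             1c2
                  1 2         1l2o
                      1 2     1g2
                          2   i2a
 p2a1r2a3p4s2i3c4o1l2o1g2i2a  maximum
\end{verbatim}

The odd maxima give {\it pa-ra-psi-co-lo-gia}; with {\tt 2p3s} alone the
word would be split {\it pa-rap-si-co-lo-gia}.
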
The ligatures `\ae' and `\oe' have been included with the \verb|^^|
notation, because patterns cannot contain control sequences. This poses
no problem to the final user, because hyphenation is applied to the
paragraph after all macros have been expanded and the text has been
reduced to characters of the font, where \verb|\ae| and \verb|\oe| are
characters like any other.

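For reference, in \TeX's \verb|^^| notation a character with code between
64 and 127, preceded by \verb|^^|, stands for the character whose code is
64 lower. The resulting codes are the slots where plain \TeX\ places the
four ligatures (for instance \verb|\chardef\ae="1A|), and they are exactly
the codes handled by \verb|\catcodeAE| in the appendix; only the lowercase
ones appear in the patterns:

\begin{verbatim}
^^Z   90 - 64 = 26 = "1A   \ae
^^[   91 - 64 = 27 = "1B   \oe
^^]   93 - 64 = 29 = "1D   \AE
^^^   94 - 64 = 30 = "1E   \OE
\end{verbatim}
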
The pattern list is preceded by some definitions:
\begin{itemize}

\item the category, lowercase and uppercase codes of the ligatures `\ae'
and `\oe', so that they can be used in Latin text; I stress again that
these ligatures should not be used, except when quoting verbatim a text in
which they appear;

\item the definition of the special sequence \verb|~-|;

\item the definition of the new language \verb|\italian| and of the
command \verb|\italiano|, which invokes all the auxiliary definitions and
sets \verb|\righthyphenmin=2|; the apostrophe must be given
\verb|\lccode`\'=39|, so that it is treated as a normal letter standing for
the vowel it replaces;


\item the command for Latin, \verb'\latino' (an ablative, short for
``latino sermone''), is simply \verb'\let' equal to \verb'\italiano'.

\end{itemize}

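The effect of the apostrophe's \verb|\lccode| can be traced by hand on
{\it dell'anima} (at each position the largest digit wins, odd digits
allow a break, digits at the word edges are omitted). With the apostrophe
counted as a letter, \TeX\ sees {\it dell'anima} as a single word;
otherwise the word would end at the apostrophe, and {\it anima}, which is
not preceded by glue, would not be hyphenated at all. The pattern
{\tt 1l'} then allows a break before the elided article:

\begin{verbatim}
 d e l l ' a n i m a
  2                   1d2
      1               1l'
            1 2       1n2i
                1 2   1m2
 d2e0l1l0'0a1n2i1m2a  maximum
\end{verbatim}

The odd maxima give {\it del-l'a-ni-ma}.
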
The patterns are enclosed within a group, so that the \verb'\lccode' of
the apostrophe and the codes of the ligatures `\ae' and `\oe' remain local
and do not interfere with the default language or with previously defined
languages.

Adding these hyphenation patterns to a format that already has one or more
languages defined is not a heavy overhead: if you add Italian and Latin to
the default language, English, you do not need a large version of \TeX.
The statistics printed by {\tt initex} report a hyphenation trie of size
6336 with 220 ops, 181 of which are used for English and 39 for Italian
and Latin; Italo-Latin hyphenation requires just 202 patterns (some of
which probably never occur in practice), against the 4447 needed for
English.

\section{Conclusion}
The hyphenation patterns valid for both Italian and Latin have been
generated directly from the grammatical hyphenation rules. As for Italian,
the set of patterns (a subset of those in the file {\tt italat.tex} listed
in the appendix) has been in use for two years at the institution where I
work and, after a short period of careful observation and debugging, it
has performed without errors of any kind. Although the Italian rules allow
a compound word to be hyphenated as if it were a simple one, some prefixes
that are mainly used in technical terms may be hyphenated explicitly with
the special discretionary hyphen macro \verb|~-|.

For Latin there is less experience, but the impression is that in this
language, too, there are no hyphenation errors; in any case, the author
will be grateful to anyone who reports suggestions and corrections. The
special discretionary hyphen macro \verb|~-| is very useful for prefixes
and suffixes, and it should be used whenever the attachment of a prefix or
a suffix produces an unusual consonant cluster.


Figures~1 and~2 show the performance of the hyphenation algorithm in
Italian and in Latin when the line width is very small; the line-breaking
tolerance is the default one (200), and each example produces just a
couple of underfull hboxes.

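Individual words can be checked quickly with the plain \TeX\ macro
\verb|\showhyphens|, which reports the hyphenated words in the log file.
The lines below are a sketch that assumes a plain-based format generated
with {\tt italat.tex}; from a hand analysis of the appendix patterns the
expected breaks are {\it re-gnum, blas-phe-mia, pae-nin-su-la,
pa-ra-psi-co-lo-gia} and {\it del-l'a-ni-ma}:

\begin{verbatim}
\latino
\showhyphens{regnum blasphemia paeninsula}
\italiano
\showhyphens{parapsicologia dell'anima}
\end{verbatim}

Since \verb|\latino| is simply \verb|\let| to \verb|\italiano|, switching
between the two calls only documents which language each group of words
belongs to.
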
I am pleased to thank the Nuns of the Benedictine Monastery of Viboldone
(S.~Giuliano, Milano, Italy), who helped me a great deal with their
experience in typesetting Latin and other ancient languages.




\appendix
\onecolumn
\section{The {\tt italat.tex} file}
This file must be input after the last line of {\tt plain.tex} (or
{\tt lplain.tex} for \LaTeX) when the format is generated with
{\tt initex}, since patterns can only be loaded by {\tt initex}; the
definitions given before the list of patterns are best kept in the format
file, so that they are valid for any style and cannot be forgotten.
 \small
\begin{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%                        F I L E    I T A L A T . T E X
%
%                  Hyphenation patterns for Italian and Latin
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%          Prepared by Claudio Beccari, Politecnico di Torino, Italy
%                           e-mail beccari@polito.it
%
% Version date  27 august 1991
%
% Useful definitions
%
\def\catcodeAE{\catcode 26=11 \catcode 29=11 \lccode 29=26    % Ligature ae,AE
               \uccode 29=29  \lccode 26=26  \uccode 26=29
               \catcode 27=11 \catcode 30=11 \lccode 30=27    % Ligature oe,OE
               \uccode 30=30  \lccode 27=27  \uccode 27=30}
\catcode`\@=11 %  Plain has no \makeatletter; when this file is read @ is "other"
\def\allowhyphens{\penalty\@M\hskip\z@}
\gdef~#1{\if\string#1-\allowhyphens\-\allowhyphens
           \else \penalty\@M\ #1\fi}
\catcode`\@=12 %                                          Restore @ to "other"
%
% A number is given to italian/latin hyphenation
%
\newlanguage\italian
%
% The commands \italiano and \latino are defined
%
\def\italiano{\language=\italian \righthyphenmin=2 \lccode`\'=39 \catcodeAE}
\let\latino\italiano
%
% The patterns are defined within a group so that the \lccode of the apostrophe
% remains local and does not interfere with other languages
%
{\language\italian \catcodeAE \lccode`\'=39
%
\patterns{
.a2b2s3  .a2b3l
.o2b3l   .o2b3m .o2b3r      .o2b3s
.an1ti3  .a2p3n .di2s3ci3ne .cir1cu2m3 .wa2g3n
.ca4p5s  .pre3i .pro3i
.ri3a    .ri3e  .re3i       .ri3o      .ri3u
.su4b3lu .su4b3r 2s3que.    2s3dem.
3p4si3c4 3p4neu1
^^Z1     ^^[1                                          %   Ligatures ae and oe
a1a   a2e    a2i    a2j    a1o   a2u  a2y              %   Diphthongs
a2y3o a3i2a  a3i2e  a3i2o  a3i2u ae3u
e1a   e1e    e2i    e2j    e1o   e2u  e2y e3iu
i2a   i2e    i1i    i2o    i2u   io3i
o1a   o2e    o2i    o2j    o1o   o2u  o2y
o3i2a o3i2e  o3i2o  o3i2u
u2a   u2e    u2i    u2o    u1u   uo3u
1b2   2b3b   4b3d   2b3n   2b3t                        %   Consonant groups
      2b3s4a 2b3s4e 2b3s4i 2b3s4o 2b3s4u  2b3s4t   u2b3s4c
1c2   2c3c   2c3m   2c3n   2c3q  2c3s  2c3t  2c3z  2ch3h
1d2   2d3d   2d3g   2d3m   2d3s  2d3v  4d3w
1f2   2f3f   2f3t
1g2   2g3g   2g3d   2g3f   2g3m  2g3s
1h2   1j2    2j3j   1k2    2k3k
1l2a  1l2e   1l2i   1l2j   1l2o  1l2u
      1l2l3l l3f4t  1l'    2l4l3m      1l2^^Z 1l2^^[
1m2   2m3m   2m3b   2m3p   2m3l  2m3n  2m3r   2m4p3s 2m4p3t 4m3w
1n2a  1n2e   1n2i   1n2j   1n2o  1n2u  2n3n   n2c1n  2n1l
      n2g3n  2n1r   n2s3m  n2s3f 2n'   1n2^^Z 1n2^^[
1p2   2p3p   2p3s   2p3n   2p3t  2p3z  2ph3p  2ph3t  2s3p2h
1q2   2q3q
1r2a  1r2e   1r2i   1r2j   1r2o  1r2u  1r2h   1r2^^Z 1r2^^[
1s2   2s3s   2st3m
1t2   2t3t   4t3m   2t3n   1t'   4t3w
1v2   2v3v   1w2    2w3w   wa4r
1x2a  1x2e   1x2i   1x2o   1x2u  2x3x  1x2^^Z  1x2^^[
y2a   y2e    y2i    y2o    y2u
1z2   2z3z   2z3t   1z'    }}
\end{verbatim}


\normalsize
\begin{thebibliography}{99}

\bibitem{snoopy} Schulz C.M., {\it Insuperabilis Snupius}, translated into
Latin by G.~Angelino, European Language Institute, Recanati, Italy, 1984

\bibitem{MMouse} Walt Disney, {\it Michael Musculus et Regina Africae},
translated into Latin by C.~Egger, European Language Institute, Recanati,
Italy, 1986

\bibitem{asterix} Goscinny and Uderzo, {\it Asterix gladiator}, translated
into Latin by K.H.G.~von Rothenburg ({\it Rubricastellanus\/}), Delta,
Stuttgart, 1978

\bibitem{migliorini} Migliorini B.M., {\it Storia della lingua italiana}
(History of the Italian language), Sansoni, Firenze, 1963

\bibitem{knuth} Graham R.L., Knuth D.E., Patashnik O., {\it Concrete
Mathematics}, Addison-Wesley Publ.\ Co., Reading, Mass., 1989 (3rd
printing)

\bibitem{6461} {\it Divisione delle parole in fin di linea} (Word
hyphenation at the end of a line), published by UNI, Ente Nazionale
Italiano di Unificazione, Milano, 1969

\bibitem{6015} {\it Segnaccento obbligatorio nell'ortografia della lingua
italiana} (Obligatory accent marks for the correct spelling of the Italian
language), published by UNI, Ente Nazionale Italiano di Unificazione,
Milano, 1967

\bibitem{desarmenien} D\'esarm\'enien J., ``The use of \TeX\ in French:
hyphenation and typography'', in {\it \TeX\ for scientific documentation},
D.~Lucarella ed., Addison-Wesley Publ.\ Co., Reading, Mass., 1985

\bibitem{manna} Manna F., {\it Il latino ieri e oggi} (Latin yesterday and
today), Signorelli, Milano, 1985

\bibitem{merk} {\it Novum Testamentum Graece et Latine} (The New Testament
in Greek and Latin), A.~Merk S.J.\ ed., Istituto Biblico Pontificio, Roma,
1984

\bibitem{braams} Braams J., {\it Babel, a multilingual style option system
for use with \LaTeX's standard document styles}, \TUB\ vol.~12, no.~2,
June~1991, pp.~291--301

\end{thebibliography}

\makesignature
\end{document}

