% Computer Aided Hyphenation for Italian and Modern Latin
% by Claudio Beccari
% Dipartimento di Elettronica
% Politecnico di Torino
% e-mail beccari@polito.it
%
\documentclass{ltugboat}
\title{Computer Aided Hyphenation for Italian and Modern Latin}
\author{Claudio Beccari}
\address{Dipartimento di Elettronica\\Politecnico di Torino\\ Turin, Italy}
\netaddress{beccari@polito.it}
%
% New environment "comment"
%
\newenvironment{comment}{\begingroup\setbox0\vbox\bgroup}{\egroup\endgroup}
%
% New environment to typeset on three columns
%
\newenvironment{trecolonne}{% Opening commands
\hbadness=10000 \vbadness=10000
\widowpenalty=0 \clubpenalty=0% Necessary to counteract ltugboat.sty settings
\dimen0=\textwidth \advance\dimen0 -2\columnsep
\divide\dimen0 by 3
\setbox0\vbox\bgroup\hsize\dimen0\parindent 1em
bbbb\par % dummy first line, split off and discarded below
}{% Closing commands
\par\egroup
\setbox0=\vbox{\unvbox0\null}
\setbox6\vsplit0 to \baselineskip
\count255\ht0 \divide\count255 by \baselineskip \divide\count255 by 3
\dimen2=\count255\baselineskip \advance\dimen2\topskip
\global\setbox2\vsplit0 to\dimen2
\setbox2\vbox{\unvbox2}
\ifdim\ht2<\dimen2 \setbox2\vbox{\unvbox2\vsplit0 to \topskip}\fi
\global\setbox4\vsplit0 to\dimen2
\setbox4\vbox{\unvbox4}
\ifdim\ht4<\dimen2 \setbox4\vbox{\unvbox4\vsplit0 to \topskip}\fi
\setbox2\vtop{\unvbox2}
\setbox4\vtop{\unvbox4}
\setbox6\vtop{\unvbox0}
\noindent\box2 \hfill \box4 \hfill \box6}% End of definition
%
\let \italiano\relax \let\latino\relax

\hbadness=5000
\begin{document}
\maketitle
\section*{Abstract}
After a brief historical sketch of the evolution of Latin into Italian and
modern Latin, the peculiarities of both languages are described in order to
explain the philosophy behind the hyphenation patterns. These patterns are
one of the few cases in which a single set serves two different languages.

\section*{Sommario} Dopo aver delineato brevemente l'evoluzione del latino
verso l'italiano e il latino moderno, vengono descritte le caratteristiche
delle due lingue in modo da capire la filosofia dei {\it pattern} di
divisione in sillabe. Questi {\it pattern} costituiscono uno dei pochi
esem\-pi applicabile a due lingue differenti.

\section*{Summarium}
Latini sermonis evolutione ad italianum et la\-ti\-num
modernum breviter exposita, utrius sermonis spe\-cie\-ta\-tes descriptae
sunt ut philosophia de {\it pattern} ad syllabas dividendi intelligatur.
Isti {\it pattern} duobus differentibus sermonibus applicabile exemplum
sunt.

\section{Outline of historical evolution}
Classical Latin, as we study it in schools and universities, is the language
used, especially in writing, by the authors of the republican period and of
the very beginning of the empire. Common people spoke a similar language
that was open to new words from other countries and to new constructions,
and that generally simplified the inflection of nouns, adjectives and verbs.

Cicero himself complained that common people (the {\it vulgus\/}) shortened
word endings by dropping the final consonants, and palatalized `c' and `g'
before the front vowels `e' and `i'. These were the first signs of the
autochthonous evolution of Latin towards the modern language; in the other
parts of the Roman empire similar evolutions were under way, with a stronger
influence of the native languages over which Latin had been superimposed,
and the invasions of the ``barbarians'' brought in peculiar pronunciations
and a great many new words.

The decline of Latin was very slow, because for many centuries it was the
language of scholars, chancellors and notaries; it was, and still is, the
official language of the Roman Catholic Church. Latin, in its modern form,
is the official language of the Vatican State, and the Vatican daily
newspaper, {\it L'Osservatore Romano,} is published mainly in Italian but
with frequent contributions in Latin, even commercial ads! Modern Latin is
used even for comic books; I suggest Snoopy \cite{snoopy}, Mickey Mouse
\cite{MMouse} and Asterix \cite{asterix}\footnote{The first two books are
intended as didactic aids for teaching Latin, and are fully accented with
both prosodic and rhythmic marks.}.

Nowadays Latin is studied in many countries as a regular subject both in
high school and in universities; in Italy it is not classified as a
``foreign'' language and is a compulsory subject in both the classical and
the scientific {\it licei} (high schools). In the past, Latin was even more
important in the education of young people; forty years ago I started Latin
in sixth grade and studied it for eight years through junior high and high
school\footnote{I attended the {\it liceo classico} and also had five years
of classical Greek; now I have an engineering degree and I am a professor
of electric circuit theory. I am very glad I had the opportunity to complete
my education by studying the humanities for so long, and I wish the new
generation could have the same.}.

From the common people's language of the first century several regional and
local dialects evolved; the first document explicitly written in what we
might already call Italian dates from 960~A.D.\ \cite{migliorini}. Several
documents, mostly poems, were produced in the following centuries, and at
the beginning of the fourteenth century Dante Alighieri's masterpiece, the
{\it Divina Commedia}, became the main landmark of the new language, which
was already mature enough to be used for a poetic treatise on history,
philosophy and theology.

Dante's language has been modernized over the past seven centuries, but the
difference between it and modern Italian is much smaller than the
difference between the language of Chaucer's {\it Canterbury Tales} and
modern English; today's Italian high school students can read Dante's poem,
and even older texts, with no more difficulty than any other conceptually
demanding text.

\section{Alphabet}
Italian and modern Latin use the 26-letter alphabet that everybody knows as
the {\it Latin alphabet\/}; there are, however, some fine points that
deserve attention.

\noindent {\it Italian.} The letters J, K, X, Y and W are used only in
technical terms and symbols, foreign names and a few special words, such as
the international word {\it taxi}. J, K and Y survive in place names,
family names and English-style nicknames, such as Stefy for Stefania
(Stephanie). In the past the letter J (see also below) was used as a
graphic device to mark the semivowel role of the letter I, so that you have
{\it Ajmone} (a family name) and you may write {\it Iugoslavia} (modern
spelling), {\it Jugoslavia} (old-fashioned spelling) or {\it Yugoslavia}
(international spelling) according to your preference; in Italian all three
are correct and are pronounced exactly the same way.

Besides the letters mentioned above, there are five vowels, none of which is
silent: {\it a, e, i, o, u}; fifteen consonants: {\it b, c, d, f, g, l, m,
n, p, q, r, s, t, v, z}; and one diacritical letter: {\it h}. The latter
does not correspond to any sound; it is used only to mark half a dozen words
in order to distinguish them from others that sound the same but have a
different meaning, to mark some interjections, and to mark the velar
pronunciation of `c' and `g' where they would otherwise be palatalized.

\begin{comment}
In total there are 26 signs, but, in spite of modern times, elementary
schools keep teaching that the Italian alphabet contains 21 letters; in fact
teachers have trouble with children who have exotic names such as John,
Katia, Xenobia, Yuri, Walter and the like; these names, due to the influence
of the mass media, are now very common (well\dots, I know just one person
named Xenobia, but there are other names containing the letter `x').
\end{comment}

Except for a dozen articles, prepositions and adverbs (which are
nevertheless used quite often), all common Italian words end with a vowel;
of course this statement does not apply to trade marks, unassimilated
foreign words, technical terms and the like.

Another peculiarity is that every consonant may occur doubled, which
corresponds to a reinforced pronunciation of that consonant. There are rare
instances of double vowels, but in this case, contrary to what happens in
English, they belong to different syllables instead of forming a diphthong;
for example, {\it zoologico} is divided into {\it zo-o-lo-gi-co}.

\noindent {\it Latin.} Classical Latin lacked J, U and W, while V was used
wherever U or V is used today.
From the very beginning, scholars carried this anomaly over into the
spelling and printing of all languages: capital V was used in all
circumstances, while in print `v' was used at the beginning of words and `u'
in the middle or at the end. This confusing habit was common to all western
languages, but fortunately it began to be abandoned in Holland during the
sixteenth century; it lasted a little longer in Italy because of the wide
use of Latin, but was eventually done away with by the end of the
seventeenth century. When Knuth \cite[reference 106]{knuth} cites Pacioli's
{\it Diuine Proportione}, published in Venice in 1509, he reproduces the
title with the spelling of the original printing, but the pronunciation at
that time already implied the consonant V instead of the vowel~U.

In the Middle Ages and in the early days of printing it was customary to use
`j' instead of `i' where the letter `i' formed a diphthong with the
following vowel; it was just a graphic trick to distinguish the two roles of
the letter `i', and it was so successful that it was adopted in other
languages too; this is why even today we spell {\it junior} instead of {\it
iunior}, although the latter is the formal Latin spelling.

Modern Latin uses both U and V in their proper positions, while J and W are
used only in foreign names and place names.

There are six vowels: {\it a, e, i, o, u, y}, and eighteen consonants: {\it
b, c, d, f, g, h, k, l, m, n, p, q, r, s, t, v, x, z}. The ligatures {\it
\ae, \oe} do not belong to Latin; they were introduced in the sixteenth
century in France and in England, and afterwards they enjoyed a certain
popularity in Latin too, but in modern usage, as in classical Latin, these
two diphthongs are spelled with separate letters.

\section{Accents}
{\it Italian.} In Italian accents are used very sparingly; it is compulsory
to mark with a suitable accent the last vowel of polysyllabic oxytone words
(those stressed on the last syllable), and to mark some well-known,
specified monosyllabic words that contain a diphthong. This is standardized
by the Regulation UNI~6015 \cite{6015}.

Unlike Spanish and Portuguese, Italian does not require proparoxytone words
to be marked with an accent, although the best grammars recommend doing so.
In practice, apart from oxytone words (where accents are compulsory) and
paroxytone words (where accents are not required), the other words {\it
may} be marked with an accent only when a different position of the stress
might change the meaning; for example {\it l\`avati} means `wash yourself'
while {\it lav\`ati} is the masculine plural of `washed'; in such a case it
is advisable to mark the first word, unless the rest of the sentence makes
it clear which one is meant. Although the `Sommario' of this article
contains five proparoxytone words, no accents were used.

The accent can also denote the open or closed quality of a vowel (only for
tonic `e' and `o'), but this use is found only in dictionaries and grammars;
a good grammar will certainly point out that {\it c\`olto} (picked up) is
different from {\it c\'olto} (educated), but in practice the meaning is
determined by the context, while the actual pronunciation depends very
strongly on the regional origin of the speaker.

The grave (\`{}) accent is used on any vowel, while the acute (\'{}) accent
may be used only on the vowel `e' (and on the vowel `o', but only where it
is optional) when it has a closed sound.
Most Italians are not even aware of this distinction; when writing by hand,
they just put a small stroke of any kind over the vowel to be accented,
meaning to mark only the stress. The distinction between open and closed
sounds is used only in dictionaries and grammars, while in printing the
difference is maintained only for the letter `e' in oxytone words, more as a
tribute to tradition than out of any real semantic necessity.

\begin{comment}
Some fancy character sets have both accents merged into a single horizontal
bar.
\end{comment}

When the accent is compulsory and upper-case letters are used, if the
character set does not contain accented vowels it is acceptable to use an
apostrophe: UNITA' (unity) in place of UNIT\`A; this practice is considered
bad style in typesetting, but is used quite often in advertising.

The diaeresis (\"{}) and the circumflex (\^{}) are no longer used; in the
past the diaeresis was used in poetry to split a diphthong, and the
circumflex had several meanings, for example marking the contraction of two
letters `i' into a single sign in plurals that were once spelled with a
double `i': {\it studii} (studies, two centuries ago), {\it stud\^\i} (one
century ago), {\it studi} (modern).


\noindent{\it Latin.} In Latin no accents are used; the breve (\u{}) and the
macron (\={}) are used only in dictionaries, in grammars and where prosody
is dealt with. The diaeresis is sometimes used in grammars and in prosody to
mark the splitting of a diphthong: {\it a\"er} (air), {\it po\"eta} (poet).

\section{Apocope and aphaeresis}
{\it Italian.} In Italian the dropping of one or more initial letters of a
word (aphaeresis) occurs only in poetry and is marked with an apostrophe
preceded by a space.

The loss of one or more final letters of a word (apocope) is either not
marked at all (see {\it aver} in place of {\it avere} in the `Sommario') or
marked with an apostrophe when it corresponds to a vocalic elision (see {\it
l'evoluzione} in place of {\it la evoluzione}, again in the `Sommario') or
to the apocope of a whole syllable. The latter case is very unusual, while
vocalic elision is very frequent, so it must be handled properly in
hyphenation; the rules stated in the Regulation UNI~6461 \cite{6461}
require that, when a line ends with an apostrophe, the apostrophe {\it must
not} be replaced by the vowel it originally stood for. In the past, not too
long ago (for example when I was in elementary school), the opposite rule
was in use, so there are occasional discussions between the older
generation and the new one. Nevertheless, even today it is considered bad
style to end a line with an apostrophe, and in typography this practice is
tolerated only when the line width is quite small, as in the narrow columns
of daily newspapers.

\noindent{\it Latin.} I do not know of any case where apocope or aphaeresis
is marked in any visible way; actually I am almost sure that these two
spelling practices are not admitted in Latin.

\section{Diphthongs}
{\it Italian.} In Italian a diphthong is formed by any vowel preceded or
followed by an {\it unstressed} closed vowel (`i' or `u'); so we have:
 \begin{center}
\it ia, ie, io, ai, ei, oi \\
 ua, ue, uo, au, eu, ou \\
 iu, ui
\end{center}

Italian diphthongs are always pronounced keeping the sounds of the
individual vowels, and the closed vowel plays the role of a semivowel or a
glide.

There are also groups of three vowels that contain two semivowels, or a
semivowel and a glide:
 \begin{center}\it
iuo, uie \\
ieu, uoi, iei
\end{center}

An `i' (possibly also a `u', but I cannot find examples) surrounded by two
open vowels always behaves as a semivowel, so it always starts a new
syllable.

\noindent{\it Latin.} Latin has more or less the same diphthongs as Italian,
with the addition of
\begin{center}\it
ae, oe
\end{center}
which one or two centuries ago were written with the corresponding ligatures
{\it \ae, \oe}; in modern Latin both these diphthongs are pronounced as a
single open `e'\footnote{I have seen a reproduction of an Italian book
printed in Venice in the sixteenth century where both these diphthongs were
replaced by the letter `e' that represents their sound.}.
Furthermore, in some words of Greek origin, Latin may have the diphthong
{\it yi}, for example {\it Harpyia} \cite{manna}\footnote{One might think
that it makes no difference whether one reads the vowel `y' followed by the
diphthong `ia', or the diphthong `yi' followed by `a', since the
pronunciation would be practically the same; but from the point of view of
prosody the two readings are completely different: a diphthong is always
long while `y' is always short, so that in prosody Har-pyi-a becomes
\={}\={}\u{}, while Har-py-ia becomes \={}\u{}\={}.}.

The main difference between the diphthongs common to Italian and Latin is
that {\it ia, ie, io, iu} behave as such in Latin only when they are at the
beginning of a word or are preceded by another vowel; in any other case
they belong to two different syllables; in Italian they are always
diphthongs, provided the `i' is unstressed.
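
In terms of \TeX\ patterns, the rule that an `i' between two open vowels
starts a new syllable amounts to allowing a break just before that `i'. The
fragment below is a hypothetical illustration written for this discussion,
not an excerpt from the patterns listed at the end; like any
\verb|\patterns| command it can be given only in Ini\TeX, while a format is
being generated.
\begin{verbatim}
% Hypothetical fragment (IniTeX only),
% not the patterns listed at the end:
% allow a break before an `i' standing
% between two open vowels (a, e, o).
\patterns{a1ia a1ie a1io
          e1ia e1ie e1io
          o1ia o1ie o1io}
% Later, in a document:
%   \righthyphenmin=2
%   \showhyphens{noioso ghiaia}
\end{verbatim}
With only these patterns loaded, {\it noioso} (boring) and {\it ghiaia}
(gravel) would be broken as {\it no-ioso} and {\it ghia-ia}; the second
break requires \verb|\righthyphenmin| not to exceed~2, because the last
syllable has only two letters.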

\section{Di- and trigraphs}
{\it Italian.} In Italian there are groups of two or three letters that
represent a sound not represented by any single letter of the alphabet;
besides `c' and `g' modified by the diacritical `h', and `c' and `g'
modified by a diacritical `i'\footnote{In this case the letter `i' does not
form a diphthong with the following vowel but is used just to palatalize the
two consonants; from the point of view of hyphenation this subtle
difference may be ignored.}, there are
 \begin{center}\it gn, gli, sc \end{center}
where {\it gn} is pronounced as in French, or as the Spanish {\it \~n} or
the Portuguese {\it nh\/}; {\it sc} is pronounced as the English {\it sh}
when it is followed by a front vowel `e' or `i'; and {\it gli} is
pronounced as the Portuguese {\it lh} when it is not preceded by `n' and is
followed by another vowel or is at the end of a word. These digraphs and
trigraphs must not be split by the hyphenation process.

\noindent{\it Latin.} Latin proper has no indivisible digraphs or
trigraphs, but since classical times the transliteration of Greek words has
required {\it th} for $\theta$, {\it rh} for $\rho$ (but {\it rrh} for
$\rho\rho$), {\it ph} for $\phi$ and {\it ch} for $\chi$; therefore these
digraphs cannot be split by the hyphenation process.

\section{Hyphenation}
{\it Italian.} The Italian hyphenation rules can be stated very simply as
follows:
\begin{enumerate}

\item every syllable contains at least one vowel\footnote{This rule applies
to all languages, although the notion of vowel differs from language to
language; for example, in several Slavic languages `r' is considered a
vowel.
If \TeX\ had a provision for this, the bad line break
(compara-nds) that occurred in \TUB, vol.~12, no.~2, June 1991, page 239,
first column, 6--7 lines from the bottom, would not have taken place.}

\item diphthongs and `triphthongs' behave as one vowel

\item a consonant followed by a vowel belongs to the same syllable as the
vowel

\item one or more consonants not followed by a vowel (at the end of a word,
possibly because of an apocope, or in technical terms, trade marks and the
like) belong to the same syllable as the preceding vowel

\item when a group of consonants is found, the hyphen goes in the {\it
leftmost} position (possibly even to the left of the whole group) such that
the consonants remaining to the right of the hyphen can also be found at
the beginning of an Italian word;\label{cons}

\item prefixes and suffixes can be ignored and the compound word may be
divided as if it were a single word; division according to etymology is
nevertheless accepted; in practice this happens only with the technical
prefixes {\it dis-, post-, sub-, trans-,} which are not very common.

\end{enumerate}

Once it is clear what a consonant, a vowel, a diphthong and a `triphthong'
are, the only difficult rule to apply is rule~\ref{cons}; but with the help
of a school dictionary one can always find out whether there is an Italian
word beginning with a given group of consonants.
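
As an illustration of how such rules can be turned directly into \TeX\
patterns, here is a hypothetical sketch, written for this discussion and
not taken from the patterns listed at the end. In \TeX's scheme an odd digit
between two letters allows a break, an even digit forbids it, and the
highest digit contributed by any matching pattern wins. A \verb|1| before
each consonant encodes rule~3; an even digit before a group that cannot
begin an Italian word, or inside a group that can, then leaves only the
break prescribed by rule~\ref{cons}; \verb|g2n| keeps the digraph together.
\begin{verbatim}
% Hypothetical sketch (IniTeX only),
% not the patterns listed at the end.
\patterns{%
 1b 1c 1g 1l 1m 1n 1p 1r 1s 1t % rule 3
 2tt  % no break before `tt'
 2rt  % no break before `rt'
 b2l  % keep `bl' together
 p2r  % keep `pr' together
 s2t  % keep `st' together
 g2n  % keep the digraph `gn' together
}
% Later, in a document:
%   \righthyphenmin=2
%   \showhyphens{tutto carta problema}
%   \showhyphens{pasta ragno}
\end{verbatim}
With these patterns alone the log should show {\it tut-to}, {\it car-ta},
{\it pro-ble-ma}, {\it pa-sta} and {\it ra-gno}.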

The point is that if you use a dictionary of too high a quality, you will
find words beginning with almost any possible group of consonants: {\it
bdelio\footnote{Owing to the extremely specialized nature of these words, I
do not give their English translation, because I could not find an
Italian--English dictionary that listed them; I believe, though, that they
are so scholarly that, with minor modifications, they exist also in English
and in many other languages.}, cnidio, ctenidio, ftalato, gmelinite,
pneumatico, psicosi, pteridina, tmesi}. But many of these words, mostly of
Greek origin, do not find their way into school dictionaries (except {\it
pneumatico} and {\it psicosi\/}), so that a diligent person will not be
misled by too many technicalities and will find the correct division.

The Italian Standards Institute, in order to avoid confusion in this
matter, established the Regulation UNI~6461 \cite{6461}, which lists the
consonant groups that must be split (table~\ref{t:6461}).
This table does not list the ordinary consonant divisions, namely:
 \begin{itemize}

\item digraphs and trigraphs can {\it never} be divided, except {\it gn}
when it appears in a foreign word, or in a word derived from a foreign one,
where the two letters are pronounced separately, such as {\it Wagner,
wagneriano,\dots}

\item geminated (double) consonants and {\it cq} must {\it always} be split

\item a liquid (`l', `r') or a nasal (`m', `n') is {\it always} separated
from a following consonant, except for the cases shown in
table~\ref{t:6461}

\item a consonant is {\it never} separated from a following liquid, except
for the cases shown in table~\ref{t:6461}

\item the letter `s' is {\it never} separated from a following consonant
(unless it is another `s')

\end{itemize}

\begin{table}{\centering\tt
\begin{tabular}{|*5{c|}}\hline
b-d & b-n & b-s & c-m & c-n \\
c-s & c-t & c-z & d-g & d-m \\
d-v & f-t & g-m & p-n & p-s \\
p-t & p-z & t-m & t-n & z-t \\
g-fr & ld-m & ld-sp& l-st & mb-d \\
mp-s & nc-n & ng-st& n-scr& n-st \\
n-str& r-st & r-str& st-m & \\
\hline
\end{tabular}\par}
\caption{Groups of consonants that can be split across syllables}
\label{t:6461}
 \end{table}

The Regulation UNI~6461 also states the rules for the apostrophe: it
behaves as the vowel it replaces; a line break (without hyphen) after it is
allowed when the line is very short, but it is bad style, so such a break is
ruled out if no interword space is left between the apostrophe and the
following word.
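
The entries of table~\ref{t:6461} translate into patterns in the same
spirit: an even digit before the first consonant of the group removes the
break that rule~3 would otherwise allow there, so that only the split shown
in the table survives. The fragment below is again a hypothetical sketch,
not an excerpt from the patterns listed at the end; the \verb"\lccode"
assignment makes the apostrophe count as a letter, so that a sequence such
as {\it l'evoluzione}, typed without a space, is examined by the
hyphenation algorithm as a single word.
\begin{verbatim}
% Hypothetical fragment (IniTeX only),
% not the patterns listed at the end.
\lccode`\'=`\'   % apostrophe: a letter
\patterns{%
 1b 1d 1m 1t % rule 3
 2bd         % table entry b-d
 2tm         % table entry t-m
}
\end{verbatim}
With this fragment alone, {\it abdicare} and {\it ritmo} would be broken
only as {\it ab-dicare} and {\it rit-mo}.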


Italian hyphenation for \TeX\ was already discussed by D\'esarm\'enien
\cite{desarmenien}, but, although I wish I knew French as well as he knows
Italian, the 88 patterns he created for Italian dealt only with consonants
and completely ignored diphthongs and `triphthongs'; in a previous version
that I prepared, 150 patterns were needed to hyphenate Italian correctly.

For the rest, the regulation is already formulated in such a way that the
hyphenation patterns required by \TeX\ can be synthesized directly, without
running {\tt patgen}; of course some care is needed to avoid odd situations
and to make up for \TeX's inability to distinguish vowels from consonants.

With the advent of \TeX\ version 3.xx it is better to set
\verb"\righthyphenmin" to 2, because Italian has none of the silent `e's,
so frequent in English, against which the hyphenation algorithm must be
protected; of course it is not good style to carry just two letters over to
a new line, but this is so rare that it is much better to give \TeX\ more
chances to find suitable break points than to protect it from situations
that never occur in Italian.

Another reason for choosing this reduced value of \verb"\righthyphenmin"
concerns the accents; as pointed out above, in practice Italian words carry
an accent, if any, only on their final vowel.
It is known that \TeX\ does not hyphenate a word after an accent control
sequence, but this drawback has a negligible effect on Italian, since the
accent normally falls on the last letter of the word; when accented letters
find their way into 256-character fonts this drawback will disappear, but
even with the current limitations (unless virtual fonts are used) this
peculiarity of \TeX\ has no practical effect. I admit that {\it virt\`u}
(virtue) cannot be hyphenated, because it is too short (it could be
hyphenated as {\it vir-t\`u\/}), while there are no problems with longer
words: for example {\it qualit\`a} (quality) is hyphenated by \TeX\ as {\it
qua-lit\`a}, whereas the complete hyphenation would be {\it qua-li-t\`a}.
But \TeX\ correctly gives {\it per-ch\'e} (because), {\it af-fin-ch\'e} (so
that), and so on.

There are no known problems with the synthesized patterns listed at the
end; the only point that leaves me partly unsatisfied, although it is
grammatically perfectly correct, is that technical prefixes such as {\it
dis-, post-, sub-, trans-} must be explicitly separated with \verb"\-" if
one wants to stress their nature as prefixes. See below for the solution to
the same problem in Latin.



\noindent{\it Latin.} The patterns listed at the end include a subset that
was originally designed just for Italian; with a little thought and a few
additions the pattern set was extended so as to cover modern Latin too.

As for diphthongs, the Italian and Latin ones were merged on the assumption
that \TeX\ is not required to find every possible break point, but only
legal ones: if two vowels are treated as a diphthong even though they
belong to two different syllables, the only drawback is that a legal break
point is missed, while no wrong division is made.
Moreover, most Italian readers feel uncomfortable when a break is taken
such that the new line starts with a vowel (this is certainly not the case
with English-speaking readers), so extending the set of diphthongs of
either language bothers neither Italian nor Latin readers. Declaring {\it
\ae} and {\it \oe} as letters with their own \verb"\lccode" allows the
hyphenation of words containing these ligatures, although their use is
discouraged.

As for consonant groups, there is no regulation comparable to the Italian
one; my grammar \cite{manna} states that Latin hyphenation is done as in
Italian (except for prefixes and suffixes, which must be divided
etymologically), but Latin has consonant groups that never occur in
Italian.

In order to find out how unusual consonant groups are treated in Latin I
examined an important scholarly edition \cite{merk}, the bilingual New
Testament in Greek and Latin ``apparatu critico instructum'', reprinted as
a ``reeditio photomechanica ex typographia~\dots, Romae'' and for which
``omnia iura reservantur''; clearly this is modern Latin, although the
Latin part of the book contains the well-known text that was translated
from Greek and Aramaic by several authors over several centuries, and
copied by different copyists in many codices that are preserved all over
the world. This critical edition is intended as study material, and its
language and spelling have been prepared with particular care precisely
because of the purpose of the book.

By examining the hyphenations in this book I could compile a list of
consonant groups, and I noticed that the digraph {\it gn} (which is a
digraph in Italian but is not supposed to be one in Latin) was not treated
uniformly, so that both {\it reg-num} and {\it re-gnum} occur.
I decided to choose the second form of hyphenation for two reasons: a) it
does not conflict with the Italian rule, and b) the pronunciation
recommended to the clergy, and used in Catholic universities, seminaries,
monasteries, etc., corresponds to the Italian one.

The letter `s' is not treated uniformly either; it is generally treated as
in Italian, but in some cases it is treated as in English; for example {\it
blasphemia} (blasphemy) is hyphenated as {\it blas-phe-mia}. Since this
does not conflict with the Italian rule (the group `sph' does not occur in
Italian), a suitable pattern was generated to cope with such situations.


Some attention was given to prefixes and suffixes, in order to separate
them correctly according to their etymology. Prefixes must be separated
regardless of the groups of letters that get split, provided that the
prefix has not lost its final vowel by elision with the initial vowel of
the second element of the compound. For example the prefix {\it paene-}
(almost) loses its final `e' in {\it paeninsula}, so the whole word is
treated as a single word and is hyphenated {\it pae-nin-su-la}.

It was possible to find suitable patterns for certain instances of {\it
ab-, ad-, ob-, trans-}, for the prefixes {\it abs-, dis-, circum-, sub-},
and for the suffixes {\it -dem, -que}, but the problem is not solved in
general, although it does not arise very often.
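
A pattern anchored at the beginning of a word is one way to express such a
prefix rule. In the hypothetical fragment below, written for this
discussion and not taken from the patterns listed at the end, the generic
patterns keep `br' together, so the Latin {\it subrideo} would be broken as
{\it su-bri-deo}; the dot in \verb|.su2b3r| matches only at the beginning
of a word, where the even digit~2 forbids the break before the `b' and the
odd digit~3 overrides the~2 of \verb|b2r|, giving {\it sub-ri-deo}.
\begin{verbatim}
% Hypothetical fragment (IniTeX only),
% not the patterns listed at the end.
\patterns{%
 1b 1d 1r b2r % generic rules
 .su2b3r      % prefix sub- before `r'
}
\end{verbatim}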

The solution can be found in a macro (already described by J.~Braams
\cite{braams}) that has been used by German \TeX\ users, who constantly
have to cope with compound words that need a little help to be hyphenated
correctly:
 \begin{verbatim}
\def\allowhyphens{\penalty\@M\hskip\z@}
% ~- : break point that keeps hyphenation
%      active on both sides of it
% ~x : the usual unbreakable space
\def~#1{\if\string#1-%
   \allowhyphens\-\allowhyphens
 \else
   \penalty\@M\ #1%
 \fi
}
\end{verbatim}

Here this macro appears in a modified form; in the German version the
character \verb|"| (instead of \verb|~|) was made active and given a
complex definition, so as to treat the umlaut properly and to cover several
other situations that occur in German. This requires several changes here
and there in the definitions of \plain; in particular, the double quote
must be added to the list of special characters, so that it is dealt with
consistently when typesetting in verbatim mode. I preferred to give a new
definition to the tie character \verb|~|, which is already listed among the
special characters; this new definition performs the usual tie function
except when it is followed by the hyphen character; in that case the
sequence \verb|~-| inserts a special discretionary break with the property
that normal hyphenation takes place in the rest of the word; remember, in
fact, that the standard sequence \verb|\-| inserts a discretionary break but
inhibits hyphenation in the rest of the word.

Therefore, if wrong prefix or suffix hyphenations are found in the drafts,
one can correct them (or type them that way from the start) as
\verb|con~-iungo, ob~-iurgo|, so that the possible hyphenation points are
{\it con-iun-go, ob-iur-go}.
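
The difference between the two sequences can be checked with
\verb|\showhyphens|, which writes to the log the break points \TeX\ finds.
This is only a sketch: it assumes a format that contains the Italian
patterns and the definition of \verb|~| given above.
\begin{verbatim}
% Explicit break only; "iungo" is
% not examined by the patterns:
\showhyphens{con\-iungo}
% Explicit break, and "iungo" is
% also hyphenated by the patterns:
\showhyphens{con~-iungo}
\end{verbatim}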


\begin{figure*}
\begin{trecolonne}\italiano
La lingua italiana e le lingue cosiddette romanze o neolatine, cio\`e lingue
derivate anch'esse dal latino (francese, spagnolo, portoghese, rumeno ed
altre minori), si fanno risalire all'idioma, che al tempo dell'impero romano
era parlato nella penisola italiana, nelle regioni del Mediterraneo
occidentale e nella Dacia, l'odierna Romania.

Tracce evidentissime si osservano ancor oggi non soltanto nel lessico e
nella morfologia del gruppo linguistico neolatino, ma anche in altre lingue
europee, quelle del gruppo anglo-sassone, come conseguenza dell'influsso
diretto o indiretto esercitato dalla lingua di Roma sugli idiomi particolari
dei popoli nordici.

Per quel che riguarda la lingua italiana, essa si collega direttamente al
{\it sermo vulgaris la\-ti\-nus,} cio\`e al latino parlato comunemente dalle
famiglie e in pubblico nei quotidiani rapporti di commercio e di affari.
\end{trecolonne}
\caption[]{Example of italian text typeset in narrow columns (from
\cite{manna})}
\medskip
\begin{trecolonne}\latino
Et sicut Moyses exaltavit serpentem in deserto, ita exaltari oportet Filium
hominis, ut omnis, qui credit in ipsum, non pereat, sed habeat vitam
aeternam. Sic enim Deus dilexit mundum, ut Filium suum unigenitum daret, ut
omnis qui credit in eum non pereat, sed habeat vitam aeternam. Non enim
misit Deus Filium suum in mundum, ut iudicet mundum, sed ut salvetur mundus
per ipsum. Qui credit in eum, non iudicatur; qui autem non credit, iam
iudicatus est, quia non credit in nomine unigeniti Filii Dei. Hoc est autem
iudicium, quia lux venit in mundum, et dilexerunt homines magis tenebras
quam lucem; erant enim eorum mala opera. Omnis enim, qui male agit, odit
lucem et non venit ad lucem, ut non arguantur opera eius; qui autem facit
veritatem, venit ad lucem, ut manifestentur opera eius, quia in Deo sunt
facta.
\end{trecolonne}
\caption[]{Example of latin text typeset in narrow columns (J\,3,14--21)}
\end{figure*}


\section{Generation of the format file}
The file {\tt italat.tex} is listed in the appendix, where the patterns may
be checked against the rules stated in the previous sections. Special
attention was given to the groups {\it ps} and {\it pn}, because
table~\ref{t:6461} states that they must be separated, but the compound
words with {\it psic-} (for example {\it parapsicologia\/}) and
{\it pneum-} (for example {\it pseudopneumococco\/}) must not be hyphenated
after the `p'.

The ligatures `\ae' and `\oe' have been included with the \verb|^^|
notation, because the patterns cannot contain control sequences; this poses
no problem to the final user, because the hyphenation algorithm is applied
after all macros have been expanded into non-expandable tokens.

The pattern list is preceded by some definitions:
\begin{itemize}

\item the category, lower case and upper case code definitions for the
ligatures `\ae' and `\oe', so that they can be used in latin text; I stress
again that these ligatures should not be used, except when quoting verbatim
some text where they have been used;

\item the definition of the special control sequence \verb|~-|;

\item the definition of the new language ``italian'' together with the
command \verb|\italiano| that invokes all the auxiliary definitions; the
apostrophe character must be given \verb"\lccode`\'=39", so as to treat it
as a normal letter and as the vowel it replaces;


\item the command for latin (\verb'\latino', ablative and short for ``latino
sermone''), which is simply \verb'\let' to be identical with
\verb'\italiano'.

\end{itemize}

The patterns are enclosed within a group, so that the \verb'\lccode' of the
apostrophe and the codes for the ligatures `\ae' and `\oe' remain local and
do not interfere with the default language or with previously defined
languages.

Adding these hyphenation patterns to a format that already has one or more
languages defined is not a heavy overhead; to add italian and latin to the
default language `english' you do not need a large version of \TeX. The
statistics printed by {\tt initex} report a hyphenation trie of size 6336
with 220 ops, 181 of which are used for english and 39 for italian and
latin; italo-latin hyphenation requires just 202 patterns (some of which
probably never occur in practice), against the 4447 needed for english.

\section{Conclusion}
The hyphenation patterns valid for both italian and latin have been
generated directly from the grammatical hyphenation rules. As for italian,
the set of patterns (a subset of the one in the file {\tt italat.tex}
listed in the appendix) has been in use for two years at the institution
where I work, and after a short period of careful observation and debugging
it has performed without errors of any kind. Although the italian rules
allow a compound word to be hyphenated as if it were a simple one, some
prefixes that are mainly used in technical terms may be explicitly
hyphenated with the help of the special discretionary hyphen macro
\verb|~-|.

As for latin, there is less experience, but the impression is that in this
language, too, there are no hyphenation errors; in any case the author is
grateful to anyone who reports suggestions and corrections.
The special discretionary hyphen macro \verb|~-| is very useful for
prefixes and suffixes, and should be used whenever unusual consonant
clusters are produced by the juxtaposition of a prefix or a suffix.


In Figures~1 and~2 two examples show the performance of the hyphenation
algorithm in italian and in latin when the line width is very small; the
line breaking tolerance is the default one (200), and each example contains
just a couple of underfull hboxes.

I am pleased to express my thanks to the Nuns of the Benedictine Monastery
of Viboldone (S.~Giuliano, Milano, Italy), who helped me very much with
their experience in typesetting latin and other ancient languages.




\appendix
\onecolumn
\section{The {\tt italat.tex} file}
This file must be input right after the last line of the file
{\tt plain.tex} (or {\tt lplain.tex} for \LaTeX); the definitions that
precede the list of patterns are best placed in the format file, so that
they are valid for any style and cannot be forgotten.
\small
\begin{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%                  F I L E    I T A L A T . T E X
%
% Hyphenation patterns for Italian and Latin
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Prepared by Claudio Beccari, Politecnico di Torino, Italy
% e-mail beccari@polito.it
%
% Version date 27 august 1991
%
% Useful definitions
%
\def\catcodeAE{\catcode 26=11 \catcode 29=11 \lccode 29=26 % Ligature ae,AE
   \uccode 29=29 \lccode 26=26 \uccode 26=29
   \catcode 27=11 \catcode 30=11 \lccode 30=27 % Ligature oe,OE
   \uccode 30=30 \lccode 27=27 \uccode 27=30}
\catcode`\@=11 % Because when this file gets read @ is "other"
               % (\makeatletter is not defined in plain.tex)
\def\allowhyphens{\penalty\@M\hskip\z@}
\gdef~#1{\if\string#1-\allowhyphens\-\allowhyphens
   \else \penalty\@M\ #1\fi}
\catcode`\@=12 % Restore @ to "other"
%
% A number is given to italian/latin hyphenation
%
\newlanguage\italian
%
% The commands \italiano and \latino are defined
%
\def\italiano{\language=\italian \righthyphenmin=2 \lccode`\'=39 \catcodeAE}
\let\latino\italiano
%
% The patterns are defined within a group so that the \lccode of the apostrophe
% remains local and does not interfere with other languages
%
{\language\italian \catcodeAE \lccode`\'=39
%
\patterns{
.a2b2s3 .a2b3l
.o2b3l .o2b3m .o2b3r .o2b3s
.an1ti3 .a2p3n .di2s3ci3ne .cir1cu2m3 .wa2g3n
.ca4p5s .pre3i .pro3i
.ri3a .ri3e .re3i .ri3o .ri3u
.su4b3lu .su4b3r 2s3que. 2s3dem.
3p4si3c4 3p4neu1
^^Z1 ^^[1 % Ligatures ae and oe
a1a a2e a2i a2j a1o a2u a2y % Diphthongs
a2y3o a3i2a a3i2e a3i2o a3i2u ae3u
e1a e1e e2i e2j e1o e2u e2y e3iu
i2a i2e i1i i2o i2u io3i
o1a o2e o2i o2j o1o o2u o2y
o3i2a o3i2e o3i2o o3i2u
u2a u2e u2i u2o u1u uo3u
1b2 2b3b 4b3d 2b3n 2b3t % Consonant groups
 2b3s4a 2b3s4e 2b3s4i 2b3s4o 2b3s4u 2b3s4t u2b3s4c
1c2 2c3c 2c3m 2c3n 2c3q 2c3s 2c3t 2c3z 2ch3h
1d2 2d3d 2d3g 2d3m 2d3s 2d3v 4d3w
1f2 2f3f 2f3t
1g2 2g3g 2g3d 2g3f 2g3m 2g3s
1h2 1j2 2j3j 1k2 2k3k
1l2a 1l2e 1l2i 1l2j 1l2o 1l2u
 1l2l3l l3f4t 1l' 2l4l3m 1l2^^Z 1l2^^[
1m2 2m3m 2m3b 2m3p 2m3l 2m3n 2m3r 2m4p3s 2m4p3t 4m3w
1n2a 1n2e 1n2i 1n2j 1n2o 1n2u 2n3n n2c1n 2n1l
 n2g3n 2n1r n2s3m n2s3f 2n' 1n2^^Z 1n2^^[
1p2 2p3p 2p3s 2p3n 2p3t 2p3z 2ph3p 2ph3t 2s3p2h
1q2 2q3q
1r2a 1r2e 1r2i 1r2j 1r2o 1r2u 1r2h 1r2^^Z 1r2^^[
1s2 2s3s 2st3m
1t2 2t3t 4t3m 2t3n 1t' 4t3w
1v2 2v3v 1w2 2w3w wa4r
1x2a 1x2e 1x2i 1x2o 1x2u 2x3x 1x2^^Z 1x2^^[
y2a y2e y2i y2o y2u
1z2 2z3z 2z3t 1z' }}
\end{verbatim}


\normalsize
\begin{thebibliography}{99}

\bibitem{snoopy} Schulz C.M., {\it Insuperabilis Snupius}, translated into
latin by G.~Angelino, European Language Institute, Recanati, Italy, 1984

\bibitem{MMouse} Walt Disney, {\it Michael Musculus et Regina Africae},
translated into latin by C.~Egger, European Language Institute, Recanati,
Italy, 1986

\bibitem{asterix} Goscinny and Uderzo, {\it Asterix gladiator}, translated
into latin by K.H.G.~von Rothenburg ({\it Rubricastellanus\/}), Delta,
Stuttgart, 1978

\bibitem{migliorini} Migliorini B.M., {\it Storia della lingua italiana}
(History of the italian language), Sansoni, Firenze, 1963

\bibitem{knuth} Graham R.L., Knuth D.E., Patashnik O., {\it Concrete
mathematics}, Addison-Wesley Publ.
Co., Reading, Mass., 1989 (3rd printing)

\bibitem{6461} {\it Divisione delle parole in fin di linea} (Word
hyphenation at the end of a line), published by UNI, Ente Nazionale Italiano
di Unificazione, Milano, 1969

\bibitem{6015} {\it Segnaccento obbligatorio nell'ortografia della lingua
italiana} (Obligatory accent marks for the correct spelling of the italian
language), published by UNI, Ente Nazionale Italiano di Unificazione,
Milano, 1967

\bibitem{desarmenien} D\'esarm\'enien J., ``The use of \TeX\ in French:
hyphenation and typography'' in {\it \TeX\ for scientific documentation},
D.~Lucarella ed., Addison-Wesley Publ.\ Co., Reading, Mass., 1985

\bibitem{manna} Manna F., {\it Il latino ieri e oggi} (Latin yesterday and
today), Signorelli, Milano, 1985

\bibitem{merk} {\it Novum Testamentum Graece et Latine} (The New Testament
in Greek and Latin), A.~Merk S.J.\ ed., Istituto Biblico Pontificio, Roma,
1984

\bibitem{braams} Braams J., {\it Babel, a multilingual style option system
for use with \LaTeX's standard document styles}, \TUB\ vol.~12, n.~2,
June~1991, pp.~291--301

\end{thebibliography}

\makesignature
\end{document}