1# © 2016 and later: Unicode, Inc. and others. 2# License & terms of use: http://www.unicode.org/copyright.html#License 3# 4# File: si_si_FONIPA.txt 5# Generated from CLDR 6# 7 8# Sinhala pronunciation rules 9# 10# Output 11# k ɡ ŋ ᵑɡ c ɟ ɲ ʈ ɖ ⁿɖ t d n ⁿd p b m ᵐb j r l w ʃ s h f 12# ə əː a aː æ æː i iː u uː e eː o oː 13# 14# References 15# [1] Asanka Wasala, Ruvan Weerasinghe, and Kumudu Gamage: 16# Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis. 17# Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 18# pages 890–897. http://www.aclweb.org/anthology/P06-2114 19# Simplify ya + yansaya to plain ya after a consonant. 20[\u0D9A-\u0DC6] \u0DCA (\u200D)? { ය\u0DCAය → ය; 21# Delete ZWNJ and ZWJ to simplify further processing. 22\u200C → ; 23\u200D → ; 24# Insert a schwa after every consonant that is not followed by a dependent vowel 25# or virama. 26::Null; 27([\u0D9A-\u0DC6]) } [^\u0DCA-\u0DDF \u0DF2\u0DF3] → $1 ə; 28# Pronunciation rules proper. 29::Null; 30# fප is an alternative spelling of ෆ. 31# This occurs e.g. in ඩේව\u0DD2ඩ\u0DCA කොපර\u0DCAfප\u0DD3ල\u0DCAඩ\u0DCA (David Copperfield) 32# [see http://bradshawofthefuture.blogspot.com/2013/02/f.html]. 33[Ff]ප → f; 34# zස is seemingly the only way to unambiguously indicate a voiced /z/ sound. 35# This occurs in e.g. ඇල\u0DCAzසය\u0DD2ම' රෝගය (Alzheimer's disease) 36# [see https://si.wikipedia.org/wiki/ඇල\u0DCAzසය\u0DD2ම%27_රෝගය] 37# or in zස\u0DD3බ\u0DCAරා (zebra) [see https://si.wikipedia.org/wiki/zස\u0DD3බ\u0DCAරා]. 38[Zz]ස → z; 39ං → ŋ; 40o → ŋ; # common substitution for anusvaraya 41ඃ ([\u0D9A-\u0DC6]) → | $1 \u0DCA $1; # TODO: check which consonants geminate 42ඃ → h; 43අ → a; 44ආ → aː; 45ඇ → æ; 46ඈ → æː; 47ඉ → i; 48ඊ → iː; 49උ → u; 50ඌ → uː; 51ඍ → ri; 52ඎ → ruː; 53ඏ → ilu; 54ඐ → iluː; 55එ → e; 56ඒ → eː; 57ඓ → aj; 58ඔ → o; 59ඕ → oː; 60ඖ → aw; # TODO: check if this is correct 61ක → k; 62ඛ → k; 63ග → ɡ; 64ඝ → ɡ; 65ඞ → ŋ; 66ඟ → ᵑɡ; 67ච → c; 68ඡ → c; 69ජ → ɟ; 70ඣ → ɟ; 71ඤ → ɲ; 72ඥ → kɲ; # TODO: double-check 73ඦ → ɟ; 74ට → ʈ; 75ඨ → ʈ; 76ඩ → ɖ; 77ඪ → ɖ; 78ණ → n; 79ඬ → ⁿɖ; 80ත → t; 81ථ → t; 82ද → d; 83ධ → d; 84න → n; 85ඳ → ⁿd; 86ප → p; 87ඵ → p; 88බ → b; 89භ → b; 90ම → m; 91ඹ → ᵐb; 92ය → j; 93ර → r; 94ල → l; 95ව → w; 96ශ → ʃ; 97ෂ → ʃ; 98ස → s; 99හ → h; 100ළ → l; 101ෆ → f; 102\u0DCA → ; # delete virama 103ා → aː; 104ැ → æ; 105ෑ → æː; 106\u0DD2 → i; 107\u0DD3 → iː; 108\u0DD4 → u; 109\u0DD6 → uː; 110ෘ → ru; 111ෙ → e; 112ේ → eː; 113ෛ → aj; 114ො → o; 115ෝ → oː; 116ෞ → aw; # TODO: check if this is correct 117ෟ → lu; 118ෲ → ruː; 119ෳ → luː; 120# Heuristics for turning /ə/ into /a/. Based on [1]. 121$c=[k ɡ ŋ {ᵑɡ} c ɟ ɲ ʈ ɖ {ⁿɖ} t d n {ⁿd} p b m {ᵐb} j r l w ʃ s z h f]; 122$s=[:^L:]; 123# Rule #1 124::Null; 125$s sv { ə → ə; # exception (a) 126$s k { ə } r → ə; # exception (b) 127$s $c { ə } $s → ə; # exception (c) 128$s $c $c { ə → a; 129$s $c { ə → a; 130# Rule #2 131::Null; 132$c r { ə } $c → a; # clause (a) and (b) 133$c r { a } h → a; # clause (d), exception 134$c r { a } $c → ə; # clause (c) 135# Rule #3 136# The paper is unclear about what this rule means. The interpretation here 137# assumes that "preceded" in the paper is a typo and should be read "followed". 138::Null; 139[a e æ o ə] h { ə → a; 140# Rules #4 through #7 141::Null; 142ə } $c $c → a; # Rule #4 143ə } [rbɖʈ] $s → ə; # Rule #5 exception 144ə } $c $s → a; # Rule #5 145ə } ji $s → a; # Rule #6 146k { ə } [rl] u → a; # Rule #7 147# Rule #8 148# Note that the paper doesn't say explicitly that this rule should be 149# anchored at the beginning of a word, but the remarks before the rules 150# seem to imply this. 151::Null; 152$s k { a } l[aeo]ːj → ə; # Typo in paper: /j/ was /y/. 153$s k { a } le[mh][ui] → ə; 154$s k { alə } h[ui] → əle; 155$s k { a } lə → ə; 156# Diphthongs 157::Null; 158www+ → ww; # යෞව\u0DCAවන 159[i {iː} e {eː} æ {æː} o {oː} a {aː}] { wu → w; 160əji → aj; 161iji → iː; # perhaps: ij 162[u {uː} e {eː} æ {æː} o {oː} a {aː}] { ji → j; 163 164