# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: si_si_FONIPA.txt
# Generated from CLDR
#

# Sinhala pronunciation rules
#
# Output
#     k ɡ ŋ ᵑɡ c ɟ ɲ ʈ ɖ ⁿɖ t d n ⁿd p b m ᵐb j r l w ʃ s h f
#     ə əː a aː æ æː i iː u uː e eː o oː
#
# References
# [1] Asanka Wasala, Ruvan Weerasinghe, and Kumudu Gamage:
#     Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis.
#     Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions,
#     pages 890–897. http://www.aclweb.org/anthology/P06-2114
# Simplify ya + yansaya to plain ya after a consonant.
[\u0D9A-\u0DC6] \u0DCA (\u200D)? { ය\u0DCA\u200Dය → ය;
# Delete ZWNJ and ZWJ to simplify further processing.
\u200C → ;
\u200D → ;
# Insert a schwa after every consonant that is not followed by a dependent vowel
# or virama.
::Null;
([\u0D9A-\u0DC6]) } [^\u0DCA-\u0DDF \u0DF2\u0DF3] → $1 ə;
# Pronunciation rules proper.
::Null;
# fප is an alternative spelling of ෆ.
# This occurs e.g. in ඩේව\u0DD2ඩ\u0DCA කොපර\u0DCAfප\u0DD3ල\u0DCAඩ\u0DCA (David Copperfield)
# [see http://bradshawofthefuture.blogspot.com/2013/02/f.html].
[Ff]ප → f;
# zස is seemingly the only way to unambiguously indicate a voiced /z/ sound.
# This occurs in e.g. ඇල\u0DCAzසය\u0DD2ම' රෝගය (Alzheimer's disease)
# [see https://si.wikipedia.org/wiki/ඇල\u0DCAzසය\u0DD2ම%27_රෝගය]
# or in zස\u0DD3බ\u0DCA‍රා (zebra) [see https://si.wikipedia.org/wiki/‍zස\u0DD3බ\u0DCA‍රා].
[Zz]ස → z;
ං → ŋ;
o → ŋ;  # common substitution for anusvaraya
ඃ ([\u0D9A-\u0DC6]) → | $1 \u0DCA $1;  # TODO: check which consonants geminate
ඃ → h;
අ → a;
ආ → aː;
ඇ → æ;
ඈ → æː;
ඉ → i;
ඊ → iː;
උ → u;
ඌ → uː;
ඍ → ri;
ඎ → ruː;
ඏ → ilu;
ඐ → iluː;
එ → e;
ඒ → eː;
ඓ → aj;
ඔ → o;
ඕ → oː;
ඖ → aw;  # TODO: check if this is correct
ක → k;
ඛ → k;
ග → ɡ;
ඝ → ɡ;
ඞ → ŋ;
ඟ → ᵑɡ;
ච → c;
ඡ → c;
ජ → ɟ;
ඣ → ɟ;
ඤ → ɲ;
ඥ → kɲ;  # TODO: double-check
ඦ → ɟ;
ට → ʈ;
ඨ → ʈ;
ඩ → ɖ;
ඪ → ɖ;
ණ → n;
ඬ → ⁿɖ;
ත → t;
ථ → t;
ද → d;
ධ → d;
න → n;
ඳ → ⁿd;
ප → p;
ඵ → p;
බ → b;
භ → b;
ම → m;
ඹ → ᵐb;
ය → j;
ර → r;
ල → l;
ව → w;
ශ → ʃ;
ෂ → ʃ;
ස → s;
හ → h;
ළ → l;
ෆ → f;
\u0DCA → ;  # delete virama
ා → aː;
ැ → æ;
ෑ → æː;
\u0DD2 → i;
\u0DD3 → iː;
\u0DD4 → u;
\u0DD6 → uː;
ෘ → ru;
ෙ → e;
ේ → eː;
ෛ → aj;
ො → o;
ෝ → oː;
ෞ → aw;  # TODO: check if this is correct
ෟ → lu;
ෲ → ruː;
ෳ → luː;
# Heuristics for turning /ə/ into /a/. Based on [1].
$c=[k ɡ ŋ {ᵑɡ} c ɟ ɲ ʈ ɖ {ⁿɖ} t d n {ⁿd} p b m {ᵐb} j r l w ʃ s z h f];
$s=[:^L:];
# Rule #1
::Null;
$s sv    { ə      → ə;  # exception (a)
$s k     { ə } r  → ə;  # exception (b)
$s $c    { ə } $s → ə;  # exception (c)
$s $c $c { ə      → a;
$s $c    { ə      → a;
# Rule #2
::Null;
$c r { ə } $c → a;  # clause (a) and (b)
$c r { a } h  → a;  # clause (d), exception
$c r { a } $c → ə;  # clause (c)
# Rule #3
# The paper is unclear about what this rule means. The interpretation here
# assumes that "preceded" in the paper is a typo and should be read "followed".
::Null;
[a e æ o ə] h { ə → a;
# Rules #4 through #7
::Null;
ə } $c $c     → a;  # Rule #4
ə } [rbɖʈ] $s → ə;  # Rule #5 exception
ə } $c     $s → a;  # Rule #5
ə } ji     $s → a;  # Rule #6
k { ə } [rl] u    → a;  # Rule #7
# Rule #8
# Note that the paper doesn't say explicitly that this rule should be
# anchored at the beginning of a word, but the remarks before the rules
# seem to imply this.
::Null;
$s k { a } l[aeo]ːj   → ə;  # Typo in paper: /j/ was /y/.
$s k { a } le[mh][ui] → ə;
$s k { alə } h[ui]    → əle;
$s k { a } lə         → ə;
# Diphthongs
::Null;
www+ → ww;  # යෞව\u0DCAවන
[i {iː} e {eː} æ {æː} o {oː} a {aː}] { wu → w;
əji → aj;
iji → iː;  # perhaps: ij
[u {uː} e {eː} æ {æː} o {oː} a {aː}] { ji → j;