# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: sat_Olck_sat_FONIPA.txt
# Generated from CLDR
#

# Santali (Ol Chiki) → Santali (International Phonetic Alphabet)
# Output
# ------
# m mː n nː ɳ ɳː ɲ ɲː ŋ ŋː
# p pʰ pʼ b bʰ t tʰ tʼ d dʰ ʈ ʈʰ ɖ ɖʰ c cʰ cʼ k kʰ kʼ ɡ ʔ
# s sː h
# d\u0361ʒ
# ɽ r
# l lː
# w wː w\u0303 w\u0303ː
#
# i iː ĩ ĩː u uː ũ ũː
# e eː ẽ ẽː ə əː ə\u0303 ə\u0303ː o oː õ õː
# ɛ ɛː ɛ\u0303 ɛ\u0303ː ɔ ɔː ɔ\u0303 ɔ\u0303ː
# a aː ã ãː
# References
# ----------
# [1] Michael Everson: Final proposal to encode the Ol Chiki script
#     in the UCS.  ISO/IEC JTC1/SC2/WG2 Working Group Document N2984R,
#     September 21, 2005.  http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2984.pdf
#
# [2] George L. Campbell: Compendium of the World's Languages.
#     Volume 2: Ladakhi to Zuni. ISBN 0-415-20297-3.  Taylor & Francis, 2000.
#     Pages 1454 to 1458.
# Notes
# -----
# According to [1] (page 3), ᱽ can only follow the four ejective
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/; these become
# ᱵᱽ /b/, ᱫᱽ /d/, ᱡᱽ /d\u0361ʒ/, and ᱜᱽ /ɡ/.  In online texts, however,
# we have occasionally encountered ᱽ following non-ejective plosives,
# for example after ᱯ /p/.  These may be typos.  Our rules try to be
# resilient and handle ᱯᱽ as /b/.
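#
# Worked example (illustrative sequences, not attested words): under the
# rules below, ᱵᱽ and the possibly mistyped ᱯᱽ both transliterate to /b/.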
#
# According to [1] (page 2), U+1C7C PHAARKAA follows the four “glottal”
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/ (these are actually
# ejective, not glottal).  In online texts, however, we have frequently
# encountered ᱼ following non-ejective consonants.
$inword = [[:L:][:M:]];
# Some online texts use a decomposed form of U+1C7A MU-GAAHLAA TTUDDAAG.
ᱹᱸ → ᱺ ;
ᱸᱹ → ᱺ ;
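# Worked example (illustrative sequences, not attested words): ᱚᱹᱸ and ᱚᱸᱹ
# are both rewritten to ᱚᱺ here, which the vowel rules below map to ɔ\u0303.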
::null();
# To simplify the rules below, enforce a uniform ordering of marks.
ᱻᱹ → ᱹᱻ ;
ᱻᱸ → ᱸᱻ ;
ᱻᱺ → ᱺᱻ ;
ᱼᱹ → ᱹᱼ ;
ᱼᱸ → ᱸᱼ ;
ᱼᱺ → ᱺᱼ ;
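# Worked example (illustrative sequence, not an attested word): ᱚᱻᱹ is
# reordered to ᱚᱹᱻ, which the vowel rules below map to ɔː.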
::null();
# Some online texts use U+1C7C PHAARKAA instead of U+1C7B RELAA for indicating
# long phonemes, presumably because the graphemes look similar in some fonts.
# Since phaarkaa is used for voicing ejectives and plosives (which cannot
# be lengthened), we rewrite phaarkaa to relaa.
[ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ] [ᱹᱸᱺ]* {ᱼ} → ᱻ ;
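# Worked example (illustrative sequences, not attested words): ᱟᱼ is
# rewritten to ᱟᱻ and later becomes aː, whereas ᱜᱼ is left alone by this
# rule and later becomes kʼ.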
::null();
ᱚᱹᱻ → ɔː ;
ᱚᱹ → ɔ ;
ᱚᱸᱻ → ɔ\u0303ː ;
ᱚᱸ → ɔ\u0303 ;
ᱚᱺᱻ → ɔ\u0303ː ;
ᱚᱺ → ɔ\u0303 ;
ᱚᱻ → ɔː ;
ᱚ → ɔ ;
ᱛᱼ → t ;
ᱛᱷ → tʰ ;
ᱛᱽ → d ;
$inword {ᱛ} → d ;
ᱛ → t ;
ᱜᱼ → kʼ ;
ᱜᱷ → kʰ ;
ᱜᱽ → ɡ ;
$inword {ᱜ} → ɡ ;
ᱜ → kʼ ;
ᱝᱻ → ŋː ;
ᱝ → ŋ ;
ᱞᱻ → lː ;
ᱞ → l ;
ᱟᱹᱻ → əː ;
ᱟᱹ → ə ;
ᱟᱸᱻ → ãː ;
ᱟᱸ → ã ;
ᱟᱺᱻ → ə\u0303ː ;
ᱟᱺ → ə\u0303 ;
ᱟᱻ → aː ;
ᱟ → a ;
ᱠᱼ → k ;
ᱠᱷ → kʰ ;
ᱠᱽ → ɡ ;
ᱠ → k ;
ᱡᱼ → cʼ ;
ᱡᱷ → cʰ ;
ᱡᱽ → d\u0361ʒ ;
$inword {ᱡ} → d\u0361ʒ ;
ᱡ → cʼ ;
ᱢᱻ → mː ;
ᱢ → m ;
# According to [1], ᱣ is sometimes /v/ and sometimes /w/.
# TODO: Find out if there is a rule for this.
ᱣᱸ → w\u0303 ;
ᱣ → w ;
ᱤᱹᱻ → iː ;
ᱤᱹ → i ;
ᱤᱸᱻ → ĩː ;
ᱤᱸ → ĩ ;
ᱤᱺᱻ → ĩː ;
ᱤᱺ → ĩ ;
ᱤᱻ → iː ;
ᱤ → i ;
ᱥᱻ → sː ;
ᱥ → s ;
# According to [1], ᱦ is sometimes /h/ and sometimes /ʔ/.
# TODO: Find out if there is a rule for this.
ᱦ → h ;
ᱧᱻ → ɲː ;
ᱧ → ɲ ;
ᱨᱻ → r ;
ᱨ → r ;
ᱩᱹᱻ → uː ;
ᱩᱹ → u ;
ᱩᱸᱻ → ũː ;
ᱩᱸ → ũ ;
ᱩᱺᱻ → ũː ;
ᱩᱺ → ũ ;
ᱩᱻ → uː ;
ᱩ → u ;
ᱪᱼ → c ;
ᱪᱷ → cʰ ;
ᱪᱽ → d\u0361ʒ ;
ᱪ → c ;
ᱫᱼ → tʼ ;
ᱫᱷ → tʰ ;
ᱫᱽ → d ;
$inword {ᱫ} → d ;
ᱫ → tʼ ;
ᱬᱻ → ɳː ;
ᱬ → ɳ ;
# TODO: ᱵᱷᱭᱨᱚᱵ → bʰhrɔb seems unlikely; would be good to verify.
ᱭ → h ;
ᱮᱹᱻ → ɛː ;
ᱮᱹ → ɛ ;
ᱮᱺᱻ → ɛ\u0303ː ;
ᱮᱺ → ɛ\u0303 ;
ᱮᱸᱻ → ẽː ;
ᱮᱸ → ẽ ;
ᱮᱻ → eː ;
ᱮ → e ;
ᱯᱼ → p ;
ᱯᱷ → pʰ ;
ᱯᱽ → b ;
ᱯ → p ;
ᱰᱷ → ɖʰ ;
ᱰ → ɖ ;
ᱱᱻ → nː ;
ᱱ → n ;
ᱲᱻ → ɽ ;
ᱲ → ɽ ;
ᱳᱸᱻ → õː ;
ᱳᱸ → õ ;
ᱳᱻ → oː ;
ᱳ → o ;
ᱴᱼ → ʈ ;
ᱴᱷ → ʈʰ ;
ᱴᱽ → ɖ ;
ᱴ → ʈ ;
ᱵᱼ → pʼ ;
ᱵᱷ → bʰ ;
ᱵᱽ → b ;
$inword {ᱵ} → b ;
ᱵ → pʼ ;
ᱶᱻ → w\u0303ː ;
ᱶ → w\u0303 ;