# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: sat_Olck_sat_FONIPA.txt
# Generated from CLDR
#

# Santali (Ol Chiki) → Santali (International Phonetic Alphabet)
# Output
# ------
# m mː n nː ɳ ɳː ɲ ɲː ŋ ŋː
# p pʰ pʼ b bʰ t tʰ tʼ d dʰ ʈ ʈʰ ɖ ɖʰ c cʰ cʼ k kʰ kʼ ɡ ʔ
# s sː h
# d\u0361ʒ
# ɽ r
# l lː
# w wː w\u0303 w\u0303ː
#
# i iː ĩ ĩː u uː ũ ũː
# e eː ẽ ẽː ə əː ə\u0303 ə\u0303ː o oː õ õː
# ɛ ɛː ɛ\u0303 ɛ\u0303ː ɔ ɔː ɔ\u0303 ɔ\u0303ː
# a aː ã ãː
# References
# ----------
# [1] Michael Everson: Final proposal to encode the Ol Chiki script
# in the UCS. ISO/IEC JTC1/SC2/WG2 Working Group Document N2984R,
# September 21, 2005. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2984.pdf
#
# [2] George L. Campbell: Compendium of the World's Languages.
# Volume 2: Ladakhi to Zuni. ISBN 0-415-20297-3. Taylor & Francis, 2000.
# Pages 1454 to 1458.
# Notes
# -----
# According to [1] (page 3), ᱽ can only follow the four ejective
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/; these become
# ᱵᱽ /b/, ᱫᱽ /d/, ᱡᱽ /d\u0361ʒ/, and ᱜᱽ /ɡ/. In online texts, however,
# we have occasionally encountered ᱽ following non-ejective plosives,
# for example after ᱯ /p/. These might possibly be typos. Our rules
# try to be resilient and handle ᱯᱽ as /b/.
#
# According to [1] (page 2), U+1C7C PHAARKAA follows the four “glottal”
# consonants ᱵ /pʼ/, ᱡ /cʼ/, ᱫ /tʼ/, and ᱜ /kʼ/ (these are actually
# ejective, not glottal). In online texts, however, we have frequently
# encountered ᱼ following non-ejective consonants.
$inword = [[:L:][:M:]];
# Some online texts use a decomposed form of U+1C7A MU-GAAHLAA TTUDDAG.
ᱹᱸ → ᱺ ;
ᱸᱹ → ᱺ ;
::null();
# To simplify the rules below, enforce a uniform ordering of marks.
ᱻᱹ → ᱹᱻ ;
ᱻᱸ → ᱸᱻ ;
ᱻᱺ → ᱺᱻ ;
ᱼᱹ → ᱹᱼ ;
ᱼᱸ → ᱸᱼ ;
ᱼᱺ → ᱺᱼ ;
::null();
# Some online texts use U+1C7C PHAARKAA instead of U+1C7B RELAA for indicating
# long phonemes, presumably because the graphemes look similar in some fonts.
# Since phaarkaa is used for voicing ejectives and plosives (which cannot
# be lengthened), we rewrite phaarkaa to relaa.
[ᱚᱟᱤᱩᱮᱳᱶᱢᱝᱞᱱ] [ᱹᱸᱺ]* {ᱼ} → ᱻ ;
::null();
ᱚᱹᱻ → ɔː ;
ᱚᱹ → ɔ ;
ᱚᱸᱻ → ɔ\u0303ː ;
ᱚᱸ → ɔ\u0303 ;
ᱚᱺᱻ → ɔ\u0303ː ;
ᱚᱺ → ɔ\u0303 ;
ᱚᱻ → ɔː ;
ᱚ → ɔ ;
ᱛᱼ → t ;
ᱛᱷ → tʰ ;
ᱛᱽ → d ;
$inword {ᱛ} → d ;
ᱛ → t ;
ᱜᱼ → kʼ ;
ᱜᱷ → kʰ ;
ᱜᱽ → ɡ ;
$inword {ᱜ} → ɡ ;
ᱜ → kʼ ;
ᱝᱻ → ŋː ;
ᱝ → ŋ ;
ᱞᱻ → lː ;
ᱞ → l ;
ᱟᱹᱻ → əː ;
ᱟᱹ → ə ;
ᱟᱸᱻ → ãː ;
ᱟᱸ → ã ;
ᱟᱺᱻ → ə\u0303ː ;
ᱟᱺ → ə\u0303 ;
ᱟᱻ → aː ;
ᱟ → a ;
ᱠᱼ → k ;
ᱠᱷ → kʰ ;
ᱠᱽ → ɡ ;
ᱠ → k ;
ᱡᱼ → cʼ ;
ᱡᱷ → cʰ ;
ᱡᱽ → d\u0361ʒ ;
$inword {ᱡ} → d\u0361ʒ ;
ᱡ → cʼ ;
ᱢᱻ → mː ;
ᱢ → m ;
# According to [1], ᱣ is sometimes /v/ and sometimes /w/.
# TODO: Find out if there is a rule for this.
ᱣᱸ → w\u0303 ;
ᱣ → w ;
ᱤᱹᱻ → iː ;
ᱤᱹ → i ;
ᱤᱸᱻ → ĩː ;
ᱤᱸ → ĩ ;
ᱤᱺᱻ → ĩː ;
ᱤᱺ → ĩ ;
ᱤᱻ → iː ;
ᱤ → i ;
ᱥᱻ → sː ;
ᱥ → s ;
# According to [1], ᱦ is sometimes /h/ and sometimes /ʔ/.
# TODO: Find out if there is a rule for this.
ᱦ → h ;
ᱧᱻ → ɲː ;
ᱧ → ɲ ;
ᱨᱻ → r ;
ᱨ → r ;
ᱩᱹᱻ → uː ;
ᱩᱹ → u ;
ᱩᱸᱻ → ũː ;
ᱩᱸ → ũ ;
ᱩᱺᱻ → ũː ;
ᱩᱺ → ũ ;
ᱩᱻ → uː ;
ᱩ → u ;
ᱪᱼ → c ;
ᱪᱷ → cʰ ;
ᱪᱽ → d\u0361ʒ ;
ᱪ → c ;
ᱫᱼ → tʼ ;
ᱫᱷ → tʰ ;
ᱫᱽ → d ;
$inword {ᱫ} → d ;
ᱫ → tʼ ;
ᱬᱻ → ɳː ;
ᱬ → ɳ ;
# TODO: ᱵᱷᱭᱨᱚᱵ → bʰhrɔb seems unlikely; would be good to verify.
ᱭ → h ;
ᱮᱹᱻ → ɛː ;
ᱮᱹ → ɛ ;
ᱮᱺᱻ → ɛ\u0303ː ;
ᱮᱺ → ɛ\u0303 ;
ᱮᱸᱻ → ẽː ;
ᱮᱸ → ẽ ;
ᱮᱻ → eː ;
ᱮ → e ;
ᱯᱼ → p ;
ᱯᱷ → pʰ ;
ᱯᱽ → b ;
ᱯ → p ;
ᱰᱷ → ɖʰ ;
ᱰ → ɖ ;
ᱱᱻ → nː ;
ᱱ → n ;
ᱲᱻ → ɽ ;
ᱲ → ɽ ;
ᱳᱸᱻ → õː ;
ᱳᱸ → õ ;
ᱳᱻ → oː ;
ᱳ → o ;
ᱴᱼ → ʈ ;
ᱴᱷ → ʈʰ ;
ᱴᱽ → ɖ ;
ᱴ → ʈ ;
ᱵᱼ → pʼ ;
ᱵᱷ → bʰ ;
ᱵᱽ → b ;
$inword {ᱵ} → b ;
ᱵ → pʼ ;
ᱶᱻ → w\u0303ː ;
ᱶ → w\u0303 ;

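# Worked examples
# ---------------
# Illustrative traces only, derived by reading the rules in this file.
# They have not been verified against an ICU build, and the inputs were
# chosen to exercise individual rules rather than taken from Santali text.
# "Pass N" refers to the rule groups separated by ::null(); in file order.
#
#   ᱚᱞ   → ɔl        (ᱚ → ɔ, then ᱞ → l)
#   ᱟᱹᱸ  → ə\u0303   (pass 1 composes ᱹᱸ into ᱺ; then ᱟᱺ → ə\u0303)
#   ᱟᱻᱸ  → ãː        (pass 2 reorders ᱻᱸ to ᱸᱻ; then ᱟᱸᱻ → ãː)
#   ᱟᱼ   → aː        (pass 3 rewrites ᱼ to ᱻ after a vowel; then ᱟᱻ → aː)
#   ᱜᱚ   → kʼɔ       (word-initial ᱜ → kʼ, then ᱚ → ɔ)
#   ᱚᱜ   → ɔɡ        ($inword {ᱜ} → ɡ, assuming the context matches
#                     the already-converted letter ɔ)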