# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this is a
#    non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B,3051). This is a one-way
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
#    preceding vowel. This is a one-way
#    information-losing transformation from Katakana
#    to Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 3041) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so. In natural text this should never
# occur anyway. If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof