# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]];
:: NFKC (NFC);
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents. We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent. However, this is a
# non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents. We convert them to normal
# Hiragana ka/ke (304B,3051). This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel. This is a one-way
# information-losing transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 3041) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so. In natural text this should never
# occur anyway. If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: NFC (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]]);
# eof