<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision: 12263 $"/>
	<transforms>
		<transform source="Hira" target="Kana" direction="both" alias="Hiragana-Katakana und-Kana-t-und-hira" backwardAlias="Katakana-Hiragana und-Hira-t-und-kana">
			<tRule>
# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this is a non-
#    roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B,3051). This is a one-way
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
#    preceding vowel. This is a one-way information-
#    losing transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
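# Illustrative examples (added for orientation; these are the results the
# rules in this file are expected to produce, not verified output):
#   Hiragana-Katakana:  ひらがな → ヒラガナ
#   Katakana-Hiragana:  カタカナ → かたかな
#   Katakana-Hiragana:  コーヒー → こおひい  (30FC doubles the preceding vowel)
#   Katakana-Hiragana:  ヶ → け  (small ke becomes normal ke)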
# Combining equivalents va/vi/ve/vo
わ゙ ↔ ヷ;
ゐ゙ ↔ ヸ;
ゑ゙ ↔ ヹ;
を゙ ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 3041) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so. In natural text this should never
# occur anyway. If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof
			</tRule>
		</transform>
	</transforms>
</supplementalData>