<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision: 12263 $"/>
	<transforms>
		<transform source="Hira" target="Kana" direction="both" alias="Hiragana-Katakana und-Kana-t-und-hira" backwardAlias="Katakana-Hiragana und-Hira-t-und-kana">
			<tRule>
# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents.  We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent.  However, this is a
# non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents.  We convert them to normal
# Hiragana ka/ke (304B,3051).  This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC lengthens
# the preceding vowel; in Hiragana we write that vowel
# twice.  This is a one-way information-losing
# transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text.  However,
# they provide naturalistic results that should conform
# to user expectations.
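# Illustrative examples (a sketch worked by hand from the
# rules in this file, not output captured from an
# implementation):
#   Hiragana-Katakana:  ひらがな becomes ヒラガナ
#   Katakana-Hiragana:  カタカナ becomes かたかな
#   Katakana-Hiragana:  ヵ becomes か, but Hiragana-Katakana
#   turns か into カ, so ヵ does not round-trip.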
# Combining equivalents va/vi/ve/vo
わ゙ ↔ ヷ;
ゐ゙ ↔ ヸ;
ゑ゙ ↔ ヹ;
を゙ ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana transform of small Katakana
# ka/ke to normal Hiragana ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled.  This is a Katakana-Hiragana
# one-way information-losing transformation.  We
# include the small Katakana (e.g., small A 30A1, which
# is already Hiragana 3041 by this point) and do not
# distinguish them from their large counterparts.  It
# doesn't make sense to double a small vowel as a small
# Hiragana vowel, so we don't do so; the doubled vowel
# is always full-size.  In natural text this should
# never occur anyway.  If a 30FC is seen without a
# preceding vowel sound (e.g., after n 30F3) we do not
# change it.
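# Illustrative examples (worked by hand from the rules
# below, assuming the preceding kana has already been
# converted to Hiragana; not captured output):
#   コーヒー becomes こおひい
#   ンー becomes んー (no preceding vowel, so 30FC is kept)
#   The reverse direction turns こおひい into コオヒイ, not
#   コーヒー, which is why this step is one-way.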
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following was mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof
			</tRule>
		</transform>
	</transforms>
</supplementalData>