# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]];
:: NFKC (NFC);
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents.  We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent.  However, this is a
# non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents.  We convert them to normal
# Hiragana ka/ke (304B,3051).  This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel.  This is a one-way
# information-losing transformation from Katakana to
# Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text.  However,
# they provide naturalistic results that should conform
# to user expectations.
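# Worked examples of the round-trip losses described
# above.  These are illustrative only: they were traced
# by hand from the rules in this file and are not part
# of the generated CLDR data.
#   Katakana-Hiragana:  スーパー → すうぱあ
#   Hiragana-Katakana:  すうぱあ → スウパア
#     (the prolonged sound marks are not restored)
#   Katakana-Hiragana:  ヵ → か
#   Hiragana-Katakana:  か → カ
#     (small ka 30F5 comes back as normal ka 30AB)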
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled.  This is a Katakana-Hiragana
# one-way information-losing transformation.  We
# include the small kana (e.g., small A: Katakana 30A1,
# Hiragana 3041) and do not distinguish them from their
# large counterparts.  The doubled vowel is always
# written as a full-size Hiragana vowel, even after a
# small kana, because doubling it as a small vowel does
# not make sense.  In natural text this should never
# occur anyway.  If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following is mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
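# Examples for the five rules above, Katakana-Hiragana
# direction.  These are illustrative only: they were
# traced by hand from the vowel classes and are not part
# of the generated CLDR data.
#   カー → かあ        (か is in $xa)
#   コーヒー → こおひい  (こ is in $xo, ひ is in $xi)
#   ンー → んー        (ん is in no vowel class, so the
#                      30FC is left unchanged)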
:: NFC (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]]);
# eof