• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# © 2016 and later: Unicode, Inc. and others.
2# License & terms of use: http://www.unicode.org/copyright.html
3# Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
4#
5# File: Han_Spacedhan.txt
6# Generated from CLDR
7#
8
9# Only intended for internal use
10# Make sure Han are normalized, including characters that contain them.
11# The first set in the filter is computed with http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:tonfkd:/XXX/:]-[:ideographic:]-[:sc=han:]
12# Where XXX is the resolved [:ideographic:][:sc=han:]. It needs updating with each Unicode release!
13:: [[、。々《-』〜・㆒-㆟㈠-㉇㊀-㊰㋀-㋋ ㍘-㍰㍻-㍿㏠-㏾��-����-����-������][:ideographic:][:sc=han:]] nfkc;
14:: fullwidth-halfwidth;
15。 → '.';
16。→ '.';
17、→ ',';
18、→ ',';
19《→ '«';
20》→ '»';
21〈 → '‹';
22〉→ '›';
23「→ '‘';
24」→ '’';
25「→ '‘';
26」→ '’';
27『→ '“';
28』→ '”';
29・→ '‧';
30・ → '‧';
31々→ '⓶';
32〜→ '~';
33$terminalPunct = [\.\,\:\;\?\!.,:?!。、;[:Pe:][:Pf:]];
34$initialPunct = [:Ps:][:Pi:];
35# add space between any Han or terminal punctuation and letters, and
36# between letters and Han or initial punct
37[[:Ideographic:] $terminalPunct] {} [:Letter:] → ' ' ;
38[:Letter:] [:Mark:]* {} [[:Ideographic:] $initialPunct] → ' ' ;
39# remove spacing between ideographs and other letters
40← [:Ideographic:] { ' ' } [:Letter:] ;
41← [:Letter:] [:Mark:]* { ' ' } [:Ideographic:] ;
42
43