---
title: Pinyin Fixes
---

# Pinyin Fixes

As part of the CLDR updates for Unicode 5.2, I've been looking at the pinyin support. This covers two areas:

- We transform from Han characters to Pinyin
- We sort according to Pinyin

According to the directions from Richard Cook, the best algorithm for getting the most frequently used pinyin reading is to use all kHanyuPinlu readings first, then all kXHC1983 readings, then kHanyuPinyin. Running a program that does this and comparing its output against the current pinyin sorting and transforms turns up discrepancies. For example, for sorting there are about 1500 cases (see attachment). The format is:

?? marks items that look out of place (according to a heuristic algorithm). Example:

?? 606 \* kē (607) 錒

The 606 is the "distance" from the surrounding cases; the 607 is the rank order of the pinyin.
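
To work with the attachment programmatically, a flagged line can be split into its fields. The following is only a sketch: the names `FlaggedEntry` and `parse_flagged` are hypothetical, and because the meaning of the asterisk is not explained here, the parser treats it as optional and merely records whether it was present.

```python
import re
from typing import NamedTuple, Optional


class FlaggedEntry(NamedTuple):
    distance: int          # "distance" from the surrounding cases
    starred: bool          # whether an asterisk followed the distance (meaning not documented here)
    pinyin: str            # the pinyin reading shown on the line
    rank: int              # rank order of that pinyin
    characters: list[str]  # the Han characters listed on the line


# ?? <distance> [*] <pinyin> (<rank>) <char> [<char> ...]
# The asterisk may appear bare or Markdown-escaped as "\*".
FLAGGED = re.compile(r"^\?\?\s+(\d+)\s+(\\?\*\s+)?(\S+)\s+\((\d+)\)\s+(.+)$")


def parse_flagged(line: str) -> Optional[FlaggedEntry]:
    """Return the fields of a '??' line, or None if the line is not flagged."""
    match = FLAGGED.match(line.strip())
    if match is None:
        return None
    distance, star, pinyin, rank, chars = match.groups()
    return FlaggedEntry(int(distance), star is not None, pinyin, int(rank), chars.split())


entry = parse_flagged("?? 606 * kē (607) 錒")
print(entry.distance, entry.rank, entry.characters)
```

Lines that do not start with ?? yield `None`, so the same function can be used to filter the file down to the flagged cases.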

Where Unihan has multiple readings, each character gets a following line starting with -- that lists its readings and the sources that give them:

?? 1 ào (20) 坳 垇

 -- 坳 {ào=[xh, pn, ma], āo=[pn, ma], yǒu=[pn]}

 -- 垇 {ào=[xh, ma], āo=[ma]}

The source abbreviations are:

- lu is kHanyuPinlu
- xh is kXHC1983
- pn is kHanyuPinyin
- ma is kMandarin
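
The source priority described earlier (kHanyuPinlu, then kXHC1983, then kHanyuPinyin) can be applied directly to these -- lines. The sketch below is illustrative rather than the program actually used: `parse_detail` and `ordered_readings` are hypothetical names, it assumes the readings on a line appear in each source's own order, and it leaves kMandarin out of the ranking because the stated priority does not mention it.

```python
import re

# Source priority from the text: kHanyuPinlu, then kXHC1983, then kHanyuPinyin.
PRIORITY = ["lu", "xh", "pn"]

DETAIL = re.compile(r"^--\s+(\S+)\s+\{(.*)\}$")      # -- <char> {<readings>}
READING = re.compile(r"([^\s=,]+)=\[([^\]]*)\]")     # <pinyin>=[<src>, <src>, ...]


def parse_detail(line: str) -> tuple[str, dict[str, list[str]]]:
    """Split a '--' line into the character and a reading -> sources mapping."""
    match = DETAIL.match(line.strip())
    if match is None:
        raise ValueError(f"not a detail line: {line!r}")
    char, body = match.groups()
    readings = {
        reading: [s.strip() for s in sources.split(",") if s.strip()]
        for reading, sources in READING.findall(body)
    }
    return char, readings


def ordered_readings(readings: dict[str, list[str]]) -> list[str]:
    """List readings by source priority, without duplicates."""
    ordered: list[str] = []
    for source in PRIORITY:
        for reading, sources in readings.items():
            if source in sources and reading not in ordered:
                ordered.append(reading)
    return ordered


char, readings = parse_detail("-- 坳 {ào=[xh, pn, ma], āo=[pn, ma], yǒu=[pn]}")
print(char, ordered_readings(readings))
```

The first element of the returned list is the reading this priority would treat as the most frequent. A character whose readings come only from kMandarin gets an empty list, which makes such cases easy to spot.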

[pinyinSortComparison.txt](https://drive.google.com/file/d/1XFMmbjipcf6pTH2VOJ_KOnjSdpkvyLcq/view?usp=sharing)