---
title: Pinyin Fixes
---

# Pinyin Fixes

As part of the CLDR updates for Unicode 5.2, I've been looking at the pinyin support. This covers two areas:

- We transform from Han characters to Pinyin
- We sort according to Pinyin

According to the directions from Richard Cook, the best algorithm for getting the most frequently used pinyin reading is to use all kHanyuPinlu readings first, then all kXHC1983 readings, then kHanyuPinyin. Using a program to compute this and compare the results against the pinyin sorting and transforms, we get discrepancies. For example, for sorting there are about 1500 cases (see attachment). The format is:

?? for items that look out of place (using a heuristic algorithm). Example:

?? 606 \* kē (607) 錒

The 606 is the "distance" from surrounding cases; the 607 is the rank order of the pinyin.

Where there are multiple readings in Unihan, they are given in the format with --:

?? 1 ào (20) 坳 垇

 -- 坳 {ào=[xh, pn, ma], āo=[pn, ma], yǒu=[pn]}

 -- 垇 {ào=[xh, ma], āo=[ma]}

The abbreviations in brackets are:

- lu is kHanyuPinlu
- xh is kXHC1983
- pn is kHanyuPinyin
- ma is kMandarin

[pinyinSortComparison.txt](https://drive.google.com/file/d/1XFMmbjipcf6pTH2VOJ_KOnjSdpkvyLcq/view?usp=sharing)
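
The source-priority rule described earlier (kHanyuPinlu first, then kXHC1983, then kHanyuPinyin) can be sketched roughly as follows. This is only an illustration, not the program used to produce the attachment: the function name `preferred_reading`, the `SAMPLE` dictionary layout, and the use of dictionary order as the order of readings are all assumptions, and kMandarin is deliberately left out of the priority list because the rule above does not mention it.

```python
# Source tags in priority order, following the rule described in the text:
# lu = kHanyuPinlu, xh = kXHC1983, pn = kHanyuPinyin.
# kMandarin ("ma") is not part of that rule, so it is not consulted here.
PRIORITY = ("lu", "xh", "pn")


def preferred_reading(readings):
    """Return the first reading attested by the highest-priority source.

    `readings` maps each pinyin reading to the list of source tags that
    attest it (hypothetical layout mirroring the `--` lines in the report).
    Dictionary insertion order is taken as the order of readings, which is
    an assumption of this sketch. Returns None if no listed source applies.
    """
    for source in PRIORITY:
        for reading, sources in readings.items():
            if source in sources:
                return reading
    return None


# Sample data copied from the two `--` lines shown in the example.
SAMPLE = {
    "坳": {"ào": ["xh", "pn", "ma"], "āo": ["pn", "ma"], "yǒu": ["pn"]},
    "垇": {"ào": ["xh", "ma"], "āo": ["ma"]},
}

for char, readings in SAMPLE.items():
    print(char, preferred_reading(readings))
```

For both sample characters the sketch picks ào, since kXHC1983 is the highest-priority source present and it attests only that reading. A character whose only source is kMandarin would yield `None` under these assumptions.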