### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1002.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt of defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
decision which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has avoided to do so far.

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
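
The width rules described in the explanation above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not this package's actual API: the names `sketchWcwidth`, `sketchWcswidth`, `isWide`, and `ZERO_WIDTH_RANGES` are hypothetical, and the zero-width table is a deliberately abbreviated subset of the much longer table in the original C source. The wide ranges follow the Unicode 5.0-era checks in Kuhn's code; neutral and ambiguous characters fall through to a width of 1, as the text describes.

```javascript
// Abbreviated, illustrative subset of zero-width ranges (assumption:
// a real implementation carries a far longer table).
const ZERO_WIDTH_RANGES = [
  [0x0300, 0x036f], // combining diacritical marks
  [0x200b, 0x200f], // zero width space .. right-to-left mark
];

// True if the code point lies in any [lo, hi] range (inclusive).
function inRanges(cp, ranges) {
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// East Asian Wide (W) and FullWidth (F) checks, Unicode 5.0 era.
function isWide(cp) {
  return (
    cp >= 0x1100 &&
    (cp <= 0x115f || // Hangul Jamo initial consonants
      cp === 0x2329 ||
      cp === 0x232a ||
      (cp >= 0x2e80 && cp <= 0xa4cf && cp !== 0x303f) || // CJK .. Yi
      (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
      (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
      (cp >= 0xfe10 && cp <= 0xfe19) || // vertical forms
      (cp >= 0xfe30 && cp <= 0xfe6f) || // CJK compatibility forms
      (cp >= 0xff00 && cp <= 0xff60) || // fullwidth forms
      (cp >= 0xffe0 && cp <= 0xffe6) ||
      (cp >= 0x20000 && cp <= 0x2fffd) ||
      (cp >= 0x30000 && cp <= 0x3fffd))
  );
}

// Width in cells of one code point: -1 for control characters,
// 0 for NUL and zero-width characters, 2 for wide, otherwise 1.
function sketchWcwidth(cp) {
  if (cp === 0) return 0;
  if (cp < 32 || (cp >= 0x7f && cp < 0xa0)) return -1;
  if (inRanges(cp, ZERO_WIDTH_RANGES)) return 0;
  return isWide(cp) ? 2 : 1;
}

// Width of a whole string, or -1 if it contains a control character.
function sketchWcswidth(str) {
  let width = 0;
  for (const ch of str) { // iterates by code point, not UTF-16 unit
    const w = sketchWcwidth(ch.codePointAt(0));
    if (w < 0) return -1;
    width += w;
  }
  return width;
}
```

With this sketch, `sketchWcswidth("abc")` gives 3 and `sketchWcswidth("日本語")` gives 6, while a string containing a tab or other control character gives -1, mirroring the C convention of signalling non-printable input.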