### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.
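
To make the idea of "advancing the cursor by cells" concrete, here is a minimal sketch. The names `cellWidth` and `cursorAdvance` are hypothetical, and the single hard-coded range (CJK Unified Ideographs, U+4E00 to U+9FFF) is an assumption chosen for the demonstration; it is not the table the original routines use.

```javascript
// Illustrative stand-in for a real width table (an assumption, not the
// data used by the original wcwidth() implementation).
function cellWidth(codePoint) {
  // CJK Unified Ideographs (U+4E00..U+9FFF) occupy two cells.
  if (codePoint >= 0x4e00 && codePoint <= 0x9fff) return 2;
  // Everything else is treated as a single cell in this sketch.
  return 1;
}

// Total number of cells the cursor advances after printing `text`.
function cursorAdvance(text) {
  let cells = 0;
  // for...of iterates by code point, so a surrogate pair counts once.
  for (const ch of text) {
    cells += cellWidth(ch.codePointAt(0));
  }
  return cells;
}

console.log(cursorAdvance("abc"));    // 3
console.log(cursorAdvance("日本語")); // 6
```

An application and a terminal that disagree on `cellWidth` for even one character will disagree on the cursor column for the rest of the line, which is why a shared rule matters.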

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not from any typographic considerations.
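
The class-to-width rule in this paragraph could be sketched as follows. `widthForClass` and its `ambiguousIsWide` option are hypothetical names introduced for illustration, not part of the original routines; the sketch only covers the five classes named above and rejects anything else.

```javascript
// Maps an East Asian Width class code to a number of terminal cells.
// `ambiguousIsWide` (an assumed option name) selects the historic CJK
// behaviour for class A; the default follows the single-width choice.
function widthForClass(eaClass, options = {}) {
  const ambiguousIsWide = options.ambiguousIsWide === true;
  switch (eaClass) {
    case "F": // Fullwidth
    case "W": // Wide
      return 2;
    case "H": // Halfwidth
    case "Na": // Narrow
      return 1;
    case "A": // Ambiguous: depends on the compatibility preference
      return ambiguousIsWide ? 2 : 1;
    default:
      throw new RangeError(`class not covered by this sketch: ${eaClass}`);
  }
}

console.log(widthForClass("W"));                            // 2
console.log(widthForClass("A"));                            // 1
console.log(widthForClass("A", { ambiguousIsWide: true })); // 2
```

Keeping the Ambiguous case behind a single switch makes the compatibility preference explicit instead of scattering it through the lookup code.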

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as, for instance, EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
question of which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.
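
As a rough illustration of the trade-off described here, the sketch below contrasts the single-cell choice with a per-character exception list. `makeNeutralWidth` and `wideOverrides` are hypothetical names; the exception list is only an assumption about what a revised policy might look like, since the text notes that no simple rule exists yet.

```javascript
const EN_SPACE = 0x2002;        // U+2002 EN SPACE
const EM_SPACE = 0x2003;        // U+2003 EM SPACE
const VOLUME_INTEGRAL = 0x2230; // U+2230 VOLUME INTEGRAL

// Returns a width function for Neutral-class characters.
// Leaving `wideOverrides` empty gives every neutral character one cell,
// which is the choice the routines make at present.
function makeNeutralWidth(wideOverrides = []) {
  const wide = new Set(wideOverrides);
  return (codePoint) => (wide.has(codePoint) ? 2 : 1);
}

const current = makeNeutralWidth();
const revised = makeNeutralWidth([EM_SPACE, VOLUME_INTEGRAL]);

console.log(current(EM_SPACE), current(VOLUME_INTEGRAL)); // 1 1
console.log(revised(EM_SPACE), revised(EN_SPACE));        // 2 1
```

An explicit list like this has to be maintained character by character, which reflects the per-character analysis the paragraph says a proper standard would need.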

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c