Introduction
===============================================================

This document will, as the C++ port matures, serve as a log of how
different parts of the library work. For now, it contains some general
information but mostly CMap-specific details.

------------------------------------------------------------------------

Font Data Tables
===========================================================================

One of the important goals of `sfntly` is thread safety, which is why
tables can only be created through their nested `Builder` class and are
immutable after creation.
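
A minimal sketch of this pattern in C++, using a hypothetical `ExampleTable` class rather than any real sfntly type: the table's constructor is private, so the nested `Builder` is the only way to create one, and the finished table exposes only `const` accessors, so it can be shared between threads without locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical table illustrating the Builder/immutable pattern
// (not part of the sfntly API).
class ExampleTable {
 public:
  class Builder {
   public:
    // Builders are mutable and meant to be used by a single thread.
    void SetData(std::vector<uint8_t> data) { data_ = std::move(data); }
    void Append(uint8_t byte) { data_.push_back(byte); }

    // Snapshots the builder's bytes into a new, immutable table.
    std::shared_ptr<const ExampleTable> Build() const {
      // A nested class may call the enclosing class's private constructor.
      return std::shared_ptr<const ExampleTable>(new ExampleTable(data_));
    }

   private:
    std::vector<uint8_t> data_;
  };

  // Read-only accessors; there are no setters.
  size_t Size() const { return data_.size(); }
  uint8_t ByteAt(size_t i) const {
    assert(i < data_.size());
    return data_[i];
  }

 private:
  explicit ExampleTable(std::vector<uint8_t> data) : data_(std::move(data)) {}

  const std::vector<uint8_t> data_;
};
```

Because `Build()` copies the builder's bytes, later edits to the builder do not change tables that were already built.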

`CMapTable`
--------------------------------------------------------

*CMap* = character map; it converts *code points* in a *code page* to
*glyph IDs*.

The `CMapTable` is a table of CMaps (CMaps are also tables; there is one
for every encoding supported by the font). Each CMap represents an
encoding-dependent character map in one of several formats (the format
numbers range from 0 to 14), of which formats 0 and 4 are the most used;
sfntly/C++ will initially support only formats 0, 2, 4 and 12.
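
To make the "table of CMaps" structure concrete, here is a sketch that lists the CMaps in a raw `cmap` table. It assumes the layout from the OpenType specification (big-endian `uint16 version`, `uint16 numTables`, then one 8-byte `{platformID, encodingID, subtableOffset}` record per CMap) and reads the format number that starts each subtable. `EncodingRecord`, `ReadU16`, `ReadU32` and `ReadEncodingRecords` are hypothetical helpers, not sfntly API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads a big-endian uint16 at byte offset `off`, checking bounds.
inline uint16_t ReadU16(const std::vector<uint8_t>& d, size_t off) {
  if (off + 2 > d.size()) throw std::out_of_range("cmap data truncated");
  return static_cast<uint16_t>((d[off] << 8) | d[off + 1]);
}

// Reads a big-endian uint32 at byte offset `off`.
inline uint32_t ReadU32(const std::vector<uint8_t>& d, size_t off) {
  return (static_cast<uint32_t>(ReadU16(d, off)) << 16) | ReadU16(d, off + 2);
}

// One cmap header entry: which encoding a CMap covers and where it starts.
struct EncodingRecord {
  uint16_t platform_id;
  uint16_t encoding_id;
  uint32_t offset;  // byte offset of the subtable from the start of cmap
  uint16_t format;  // the subtable's first uint16
};

// Lists every CMap: uint16 version, uint16 numTables, then numTables
// 8-byte records {platformID, encodingID, subtableOffset}.
std::vector<EncodingRecord> ReadEncodingRecords(const std::vector<uint8_t>& cmap) {
  const uint16_t num_tables = ReadU16(cmap, 2);
  std::vector<EncodingRecord> records;
  for (uint16_t i = 0; i < num_tables; ++i) {
    const size_t rec = 4 + 8 * static_cast<size_t>(i);
    EncodingRecord r;
    r.platform_id = ReadU16(cmap, rec);
    r.encoding_id = ReadU16(cmap, rec + 2);
    r.offset = ReadU32(cmap, rec + 4);
    r.format = ReadU16(cmap, r.offset);
    records.push_back(r);
  }
  return records;
}
```

For example, the common Windows Unicode BMP CMap has platform ID 3 and encoding ID 1 and uses format 4.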

### `CMapFormat0` Byte encoding table

Format 0 is a basic table where a character’s glyph ID is looked up
directly in a 256-entry `glyphIdArray`. Because it covers only 256
character codes, it can only encode single-byte character sets such as
ASCII and ISO 8859-x (alphabet-based languages).
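
A sketch of a format 0 lookup over the raw subtable bytes, assuming the OpenType layout (`uint16 format`, `uint16 length`, `uint16 language`, then `uint8 glyphIdArray[256]` at byte 6). `Format0GlyphId` is a hypothetical helper, not the `CMapFormat0` API. Codes above 255 fall outside the table and map to glyph 0, the missing-glyph (`.notdef`) glyph.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Looks up c in a raw format 0 subtable. Layout: uint16 format (0),
// uint16 length, uint16 language, then uint8 glyphIdArray[256] at byte 6.
uint16_t Format0GlyphId(const std::vector<uint8_t>& subtable, uint32_t c) {
  assert(subtable.size() >= 6 + 256);
  if (c > 0xFF) return 0;  // outside the table: glyph 0 (.notdef)
  return subtable[6 + c];
}
```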

### `CMapFormat2` High-byte mapping through table

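Chinese, Japanese and Korean (CJK) encodings such as Shift-JIS mix
single-byte and 2-byte codes. Format 2 handles them by using the high
byte of a code as an index into a `subHeaderKeys` array, which selects
the subheader that maps the low byte.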
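
The high-byte scheme can be sketched as a lookup over the raw subtable, assuming the OpenType layout: `subHeaderKeys[256]` at byte 6 (each value is a subheader index times 8), 8-byte subheaders `{firstCode, entryCount, idDelta, idRangeOffset}` starting at byte 518, then `glyphIdArray`. A subheader key of 0 marks a single-byte code, which is mapped through subheader 0. `Format2GlyphId` is a hypothetical helper, not sfntly's `CMapFormat2`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Big-endian uint16 at byte offset `off` (std::out_of_range past the end).
inline uint16_t ReadU16(const std::vector<uint8_t>& d, size_t off) {
  return static_cast<uint16_t>((d.at(off) << 8) | d.at(off + 1));
}

// Looks up a 1- or 2-byte code c in a raw format 2 subtable.
uint16_t Format2GlyphId(const std::vector<uint8_t>& t, uint32_t c) {
  if (c > 0xFFFF) return 0;
  const uint32_t high = c >> 8;
  const uint32_t low = c & 0xFF;
  size_t sub;
  if (high == 0) {
    // Single-byte code: valid only if this byte is not a lead byte.
    if (ReadU16(t, 6 + 2 * low) != 0) return 0;
    sub = 0;
  } else {
    sub = ReadU16(t, 6 + 2 * high) / 8;
    if (sub == 0) return 0;  // `high` is not a lead byte
  }
  const size_t sh = 6 + 2 * 256 + 8 * sub;  // this subheader's position
  const uint16_t first_code = ReadU16(t, sh);
  const uint16_t entry_count = ReadU16(t, sh + 2);
  const int16_t id_delta = static_cast<int16_t>(ReadU16(t, sh + 4));
  const uint16_t id_range_offset = ReadU16(t, sh + 6);
  if (low < first_code || low >= static_cast<uint32_t>(first_code) + entry_count) {
    return 0;
  }
  // idRangeOffset counts bytes from its own position (sh + 6) to the
  // glyphIdArray entry for firstCode.
  const uint16_t glyph =
      ReadU16(t, sh + 6 + id_range_offset + 2 * (low - first_code));
  if (glyph == 0) return 0;
  return static_cast<uint16_t>(glyph + id_delta);  // modulo 65536
}
```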

### `CMapFormat4` Segment mapping to delta values

This is the preferred format for Unicode Basic Multilingual Plane (BMP)
encodings according to the Microsoft spec. Format 4 defines segments
(contiguous ranges of characters; variable length). Finding a
character’s glyph ID first means finding the segment it is part of,
using a binary search (the segments are sorted by end code). A segment
has a **`startCode`** and an **`endCode`** (the minimum and maximum code
points in the segment), an **`idDelta`** (delta for all code points in
the segment) and an **`idRangeOffset`** (offset into `glyphIdArray`, or 0).
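
The segment search can be sketched with `std::lower_bound` over the `endCode` array: the first segment whose `endCode` is at least `c` is the only candidate, and `c` belongs to it only if that segment's `startCode` is not above `c`. `FindSegment` is a hypothetical helper working on already-parsed arrays, not the sfntly implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the format 4 segment containing c, or -1.
// end_codes must be sorted in increasing order, as the spec requires.
int FindSegment(const std::vector<uint16_t>& start_codes,
                const std::vector<uint16_t>& end_codes, uint16_t c) {
  assert(start_codes.size() == end_codes.size());
  // First segment whose endCode >= c: the only one that can contain c.
  const auto it = std::lower_bound(end_codes.begin(), end_codes.end(), c);
  if (it == end_codes.end()) return -1;
  const size_t i = static_cast<size_t>(it - end_codes.begin());
  // c may fall in a gap between segments.
  return start_codes[i] <= c ? static_cast<int>(i) : -1;
}
```

The spec requires the last segment to start and end at `0xFFFF`, so the search always has a final candidate.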

`idDelta` and `idRangeOffset` look like the same thing (both are
offsets), but they are used differently. A non-zero `idRangeOffset[i]`
is a byte offset into `glyphIdArray`, measured from the location of
`idRangeOffset[i]` itself; this works because the glyph array
immediately follows the `idRangeOffset` array in the font file. Since
both arrays hold 16-bit words rather than bytes, the offset is divided
by 2 before it is added to a word pointer. If the value found in
`glyphIdArray` is not 0, `idDelta[i]` is still added to it (modulo
65536); 0 means the character has no glyph.

``` {.prettyprint}
glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]));
if (glyphIndex != 0) glyphIndex = (glyphIndex + idDelta[i]) & 0xFFFF;
```

When `idRangeOffset[i] == 0`, `glyphIdArray` is not used:
`idDelta[i]` is added directly to the character code, modulo 65536.

``` {.prettyprint}
// idDelta arithmetic is modulo 65536
glyphIndex = (idDelta[i] + c) & 0xFFFF;
```
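
To see the pointer arithmetic work, here is a sketch that keeps `idRangeOffset` and `glyphIdArray` in one contiguous `uint16_t` array, just as they are adjacent in the font file. The `Format4` struct and `Format4GlyphId` are hypothetical, and the segment index `i` is assumed to come from the binary search described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a format 4 subtable. As in the file,
// glyphIdArray directly follows idRangeOffset, so both share one array:
// words[0 .. segCount-1] is idRangeOffset, the rest is glyphIdArray.
struct Format4 {
  std::vector<uint16_t> start_code;
  std::vector<int16_t> id_delta;
  std::vector<uint16_t> words;  // idRangeOffset[] then glyphIdArray[]
};

// Maps c to a glyph ID, given the index i of the segment containing c.
uint16_t Format4GlyphId(const Format4& t, size_t i, uint16_t c) {
  const uint16_t* id_range_offset = t.words.data();
  if (id_range_offset[i] == 0) {
    // No glyph array: add idDelta directly, modulo 65536.
    return static_cast<uint16_t>(c + t.id_delta[i]);
  }
  // idRangeOffset[i] is a byte offset from &idRangeOffset[i]; halve it to
  // get a uint16_t element offset, then step to c's slot in the segment.
  const uint16_t* p =
      &id_range_offset[i] + id_range_offset[i] / 2 + (c - t.start_code[i]);
  assert(p < t.words.data() + t.words.size());
  const uint16_t glyph = *p;
  // 0 means "missing glyph"; otherwise idDelta is still added.
  return glyph == 0 ? 0 : static_cast<uint16_t>(glyph + t.id_delta[i]);
}
```

In the test data, segment 1 has `idRangeOffset[1] = 4`: 2 bytes to skip `idRangeOffset[2]` plus 2 more to reach `glyphIdArray[0]`.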

### Class Hierarchy

`CMapTable` is the main class and the container for all other
CMap-related classes.

#### Utility classes

-   `CMapTable::CMapId` describes a pair of IDs, the platform ID and the
    encoding ID, that together form a CMap's ID. The ID of a CMap is
    usually a good indicator of the format the CMap uses (Unicode CMaps
    are usually either format 4 or format 12).
-   `CMapTable::CMapIdComparator` compares two `CMapId`s.
-   `CMapTable::CMapIterator`: iteration through the CMapTable is
    supported through a Java-style iterator.
-   `CMapTable::CMapFilter`: Java-style filter; `CMapIterator` supports
    filtering CMaps. By default, it accepts every CMap.
-   `CMapTable::CMapIdFilter` extends `CMapFilter` and accepts only the
    CMap with one particular `CMapId`. Used in conjunction with
    `CMapIterator`, this is how the CMap getters are implemented.
-   **`CMapTable::Builder`** is the only way to create a `CMapTable`.
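
The relationship between the ID, the filters and the Java-style iterator can be sketched as follows. All names here are hypothetical stand-ins for `CMapId`, `CMapIdComparator`, `CMapFilter`, `CMapIdFilter` and `CMapIterator`, not their real signatures, and the comparator's ordering (platform ID first, then encoding ID) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct CMapId {
  uint16_t platform_id;
  uint16_t encoding_id;
  bool operator==(const CMapId& o) const {
    return platform_id == o.platform_id && encoding_id == o.encoding_id;
  }
};

// Assumed ordering: by platform ID, then by encoding ID.
struct CMapIdComparator {
  bool operator()(const CMapId& a, const CMapId& b) const {
    if (a.platform_id != b.platform_id) return a.platform_id < b.platform_id;
    return a.encoding_id < b.encoding_id;
  }
};

// A filter decides whether the iterator returns a given CMap.
using CMapFilter = std::function<bool(const CMapId&)>;
inline CMapFilter AcceptAll() { return [](const CMapId&) { return true; }; }
inline CMapFilter IdFilter(CMapId wanted) {
  return [wanted](const CMapId& id) { return id == wanted; };
}

// Java-style iterator: HasNext()/Next(), skipping rejected entries.
class CMapIterator {
 public:
  CMapIterator(const std::vector<CMapId>& ids, CMapFilter filter)
      : ids_(ids), filter_(std::move(filter)) { Advance(); }
  bool HasNext() const { return pos_ < ids_.size(); }
  CMapId Next() {
    assert(HasNext());
    CMapId current = ids_[pos_++];
    Advance();
    return current;
  }

 private:
  void Advance() {
    while (pos_ < ids_.size() && !filter_(ids_[pos_])) ++pos_;
  }
  const std::vector<CMapId>& ids_;
  CMapFilter filter_;
  size_t pos_ = 0;
};

// A getter built from an ID filter plus the iterator.
inline bool HasCMap(const std::vector<CMapId>& ids, CMapId wanted) {
  CMapIterator it(ids, IdFilter(wanted));
  return it.HasNext();
}
```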

#### CMaps

-   **`CMapTable::CMap`** is the abstract base class from which all
    `CMapFormat*` classes derive. It defines basic functions and the
    abstract `CMapTable::CMap::CharacterIterator` class used to iterate
    through the characters in the map. The basic implementation just
    loops through every character between a start and an end; it is
    overridden so that format-specific iteration is performed.
-   `CMapFormat0` (mostly done?)
-   `CMapFormat2` (needs builders)
-   ... coming soon
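
A sketch of the base-class/override relationship described above. `CMap`, `CharacterIterator`, `Accept()` and `Format0CMap` are hypothetical, simplified names: the default iterator walks every code in `[start, end]`, and the format 0 subclass overrides `Accept()` so that only codes mapped to a real glyph are returned.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class CMap {
 public:
  class CharacterIterator {
   public:
    CharacterIterator(uint32_t start, uint32_t end) : next_(start), end_(end) {}
    virtual ~CharacterIterator() = default;

    // Java-style iteration; assumes end < UINT32_MAX.
    bool HasNext() {
      while (next_ <= end_ && !Accept(next_)) ++next_;
      return next_ <= end_;
    }
    uint32_t Next() {
      assert(HasNext());
      return next_++;
    }

   protected:
    // Default behavior: every code between start and end is a character.
    virtual bool Accept(uint32_t /*c*/) const { return true; }

   private:
    uint32_t next_;
    uint32_t end_;
  };

  virtual ~CMap() = default;
  virtual uint16_t GlyphId(uint32_t c) const = 0;
  virtual std::unique_ptr<CharacterIterator> Iterator() const = 0;
};

class Format0CMap : public CMap {
 public:
  explicit Format0CMap(std::vector<uint8_t> glyphs) : glyphs_(std::move(glyphs)) {
    assert(glyphs_.size() == 256);
  }
  uint16_t GlyphId(uint32_t c) const override { return c < 256 ? glyphs_[c] : 0; }
  std::unique_ptr<CharacterIterator> Iterator() const override {
    return std::unique_ptr<CharacterIterator>(new MappedCodes(this));
  }

 private:
  // Format-specific iteration: only codes that map to a real glyph.
  class MappedCodes : public CharacterIterator {
   public:
    explicit MappedCodes(const Format0CMap* cmap)
        : CharacterIterator(0, 255), cmap_(cmap) {}

   protected:
    bool Accept(uint32_t c) const override { return cmap_->GlyphId(c) != 0; }

   private:
    const Format0CMap* cmap_;
  };

  std::vector<uint8_t> glyphs_;
};
```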

`[todo: will add images soon; need to upload to svn]`

------------------------------------------------------------------------

# Table Building Pipeline

Building a data table in sfntly is done by the
`FontDataTable::Builder::build` method, which defines the general
pipeline and leaves the details to each implementing subclass
(`CMapTable::Builder`, for example). Note: **`sub*`** methods are
table-specific.

**`ReadableFontDataPtr data = internalReadData()`**
> There are 2 private fields in the `FontDataTable::Builder` class:
> `rData` and `wData`, for `ReadableFontData` and `WritableFontData`
> respectively. This function returns `rData` if it is set, or `wData`
> (cast to `ReadableFontData`) if `rData` is null. *They hold the same
> data!*

**`if (model_changed_)`**
> A font is essentially a binary blob when loaded inside a `FontData`
> object. A *model* is the Java/C++ collection of objects that represent
> the same data in a manipulable format. If you ask for the model (even
> if you don't write to it), it counts as changed and the underlying
> raw data gets updated.

When the model has changed, the raw data is re-serialized from the model:

**`if (!subReadyToSerialize())`** **`return NULL`**; `else`:

1.  **`size = subDataToSerialize()`**
2.  **`WritableDataPtr new_data = container_->getNewData(size)`**
3.  **`subSerialize(new_data)`**
4.  **`data = new_data`**

**`FontDataTablePtr table = subBuildTable(data)`**
> This is where the table is actually built. `subBuildTable` is
> overridden by every table class, but a table header is always added.
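
The steps above form a template method: `build()` fixes the order and subclasses supply the `sub*` hooks. The following is a simplified, hypothetical model rather than sfntly code: plain byte vectors stand in for `ReadableFontData`/`WritableFontData`, a single field replaces the `rData`/`wData` pair, no table header is added, and `ByteListBuilder` is an invented subclass whose `model()` accessor marks the model as changed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Data = std::vector<uint8_t>;

struct Table {
  Data data;
};

class TableBuilder {
 public:
  virtual ~TableBuilder() = default;

  // Fixed pipeline; subclasses only supply the sub* steps.
  std::unique_ptr<Table> build() {
    Data data = internalReadData();
    if (model_changed_) {
      if (!subReadyToSerialize()) return nullptr;
      Data new_data(subDataToSerialize());  // size of the serialized model
      subSerialize(&new_data);
      data = std::move(new_data);
    }
    return subBuildTable(data);
  }

 protected:
  explicit TableBuilder(Data data) : data_(std::move(data)) {}
  Data internalReadData() const { return data_; }
  void setModelChanged() { model_changed_ = true; }

  virtual bool subReadyToSerialize() const = 0;
  virtual size_t subDataToSerialize() const = 0;
  virtual void subSerialize(Data* out) const = 0;
  virtual std::unique_ptr<Table> subBuildTable(const Data& data) const {
    return std::unique_ptr<Table>(new Table{data});
  }

 private:
  Data data_;
  bool model_changed_ = false;
};

// Invented subclass: its model is a list of bytes, and merely asking
// for the model marks it as changed.
class ByteListBuilder : public TableBuilder {
 public:
  explicit ByteListBuilder(const Data& data) : TableBuilder(data), model_(data) {}
  Data* model() {
    setModelChanged();
    return &model_;
  }

 protected:
  bool subReadyToSerialize() const override { return true; }
  size_t subDataToSerialize() const override { return model_.size(); }
  void subSerialize(Data* out) const override {
    std::copy(model_.begin(), model_.end(), out->begin());
  }

 private:
  Data model_;
};
```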

Subtable Builders
------------------------------------------------------------------------------

Subtables are lazily built.

When creating the object view of the font and dealing with lots of
tables, it would be wasteful to create builders for every subtable,
since most users only do fairly high-level manipulation of the font.
Instead, **only the tables at font level are fully built**.

All other subtables have builders that contain valid `FontData`, but
their object view is not created by default. For the `CMapTable`, this
means that unless you go through the `getCMapBuilders()` method, the
CMap builders are not initialized. So the builder map can appear empty
when calling its `size()` method, even though
`numCMaps(internalReadFont())` reports CMaps in the font.
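
A sketch of this lazy initialization, using a hypothetical `LazyCMapTableBuilder` rather than sfntly's `CMapTable::Builder`. It keeps the raw cmap bytes, can count CMaps directly from the header, and fills its builder map only on the first `getCMapBuilders()` call, which reproduces the "empty map but non-zero `numCMaps`" situation. The per-CMap builder here only records its subtable offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

class LazyCMapTableBuilder {
 public:
  using CMapId = std::pair<uint16_t, uint16_t>;  // (platform ID, encoding ID)
  struct CMapBuilder {
    uint32_t offset;  // where this CMap's subtable starts in the raw data
  };

  explicit LazyCMapTableBuilder(std::vector<uint8_t> data) : data_(std::move(data)) {}

  // Counts CMaps straight from the raw header (numTables at byte 2).
  static uint16_t numCMaps(const std::vector<uint8_t>& data) {
    if (data.size() < 4) return 0;
    return static_cast<uint16_t>((data[2] << 8) | data[3]);
  }

  // Creates the builders on first use only.
  std::map<CMapId, CMapBuilder>& getCMapBuilders() {
    if (!initialized_) {
      for (uint16_t i = 0; i < numCMaps(data_); ++i) {
        const size_t rec = 4 + 8 * static_cast<size_t>(i);
        const uint32_t offset =
            (static_cast<uint32_t>(U16(rec + 4)) << 16) | U16(rec + 6);
        builders_[CMapId(U16(rec), U16(rec + 2))] = CMapBuilder{offset};
      }
      initialized_ = true;
    }
    return builders_;
  }

  // The builder map as it is right now, without triggering initialization.
  const std::map<CMapId, CMapBuilder>& builderMap() const { return builders_; }

 private:
  // Big-endian uint16 at byte offset `off` of the raw data.
  uint16_t U16(size_t off) const {
    return static_cast<uint16_t>((data_.at(off) << 8) | data_.at(off + 1));
  }

  std::vector<uint8_t> data_;
  std::map<CMapId, CMapBuilder> builders_;
  bool initialized_ = false;
};
```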

------------------------------------------------------------------------

Character encoders
---------------------------------------------------------------------------------

Sfntly/Java uses a native ICU-based API for encoding characters.
Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
UTF-16. Public APIs will use ICU classes like `UnicodeString`.
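
Before a character can be looked up in a CMap, UTF-16 text has to be turned into code points, combining surrogate pairs. ICU's `UnicodeString` also stores UTF-16 and offers code-point accessors such as `char32At()`; the sketch below avoids the ICU dependency and uses `std::u16string` instead. `ToCodePoints` is a hypothetical helper, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes UTF-16 into code points, combining surrogate pairs.
// Unpaired surrogates are replaced with U+FFFD.
std::vector<uint32_t> ToCodePoints(const std::u16string& text) {
  std::vector<uint32_t> out;
  for (size_t i = 0; i < text.size(); ++i) {
    const uint32_t u = text[i];
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < text.size() &&
        text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
      // High surrogate followed by low surrogate: one supplementary code point.
      out.push_back(0x10000 + ((u - 0xD800) << 10) + (text[i + 1] - 0xDC00));
      ++i;
    } else if (u >= 0xD800 && u <= 0xDFFF) {
      out.push_back(0xFFFD);
    } else {
      out.push_back(u);
    }
  }
  return out;
}
```

A code point above U+FFFF, such as U+1F600, can only be mapped by a format 12 CMap; format 4 covers 16-bit codes only.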