Introduction
===============================================================

As the C++ port matures, this document will serve as a log of how
different parts of the library work. As of today, there is some general
info but mostly CMap-specific details.

------------------------------------------------------------------------

Font Data Tables
===========================================================================

One of the important goals in `sfntly` is thread safety, which is why
tables can only be created with their nested `Builder` class and are
immutable after creation.

`CMapTable`
--------------------------------------------------------

*CMap* = character map; it converts *code points* in a *code page* to
*glyph IDs*.

The CMapTable is a table of CMaps (CMaps are also tables; one for every
encoding supported by the font). An encoding-dependent character map is
represented in one of 14 formats, of which formats 0 and 4 are the most
used; sfntly/C++ will initially only support formats 0, 2, 4 and 12.

### `CMapFormat0` Byte encoding table

Format 0 is a basic table where a character’s glyph ID is looked up in a
`glyphIdArray[256]`. As it only supports 256 characters, it can only
encode ASCII and ISO 8859-x (alphabet-based languages).

### `CMapFormat2` High-byte mapping through table

Chinese, Japanese and Korean (CJK) need special 2-byte encodings, such
as Shift-JIS, for each code point.

### `CMapFormat4` Segment mapping to delta values

This is the preferred format for Unicode Basic Multilingual Plane (BMP)
encodings according to the Microsoft spec. Format 4 defines segments
(contiguous ranges of characters; variable length). Finding a
character’s glyph ID first means finding the segment that contains it,
using a binary search (the segments are sorted).
A segment has a
**`startCode`**, an **`endCode`** (the minimum and maximum code points
in the segment), an **`idDelta`** (delta for all code points in the
segment) and an **`idRangeOffset`** (offset into `glyphIdArray` or 0).

`idDelta` and `idRangeOffset` both look like offsets, but they work
differently. `idRangeOffset` indexes into the glyph array, relying on
the fact that the array immediately follows the `idRangeOffset` table in
the font file. The segment’s offset is `idRangeOffset[i]`, but since the
`idRangeOffset` table contains words and not bytes, the value is divided
by 2.

``` {.prettyprint}
glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
```

`idDelta[i]` is another kind of segment offset, used when
`idRangeOffset[i] = 0`, in which case it is added directly to the
character code (modulo 65536).

``` {.prettyprint}
glyphIndex = idDelta[i] + c
```

### Class Hierarchy

`CMapTable` is the main class and the container for all other
CMap-related classes.

#### Utility classes

- `CMapTable::CMapId` describes a pair of IDs, platform ID and
  encoding ID, that form the CMap’s ID. The ID a CMap has is usually a
  good indicator of what kind of format the CMap uses (Unicode
  CMaps are usually either format 4 or format 12).
- `CMapTable::CMapIdComparator`
- `CMapTable::CMapIterator` Java-style iterator; this is how iteration
  through the CMapTable is supported.
- `CMapTable::CMapFilter` Java-style filter; CMapIterator supports
  filtering CMaps. By default, it accepts every CMap.
- `CMapTable::CMapIdFilter` extends CMapFilter; only accepts one type
  of CMap. Used in conjunction with CMapIterator, this is how the CMap
  getters are implemented.
- **`CMapTable::Builder`** is the only way to create a CMapTable.

#### CMaps

- **`CMapTable::CMap`** is the abstract base class from which all
  `CMapFormat*` classes derive.
  It defines basic functions and the abstract
  `CMapTable::CMap::CharacterIterator` class to iterate through the
  characters in the map. The basic implementation just loops through
  every character between a start and an end. Each format overrides it
  so that format-specific iteration is performed.
- `CMapFormat0` (mostly done?)
- `CMapFormat2` (needs builders)
- ... coming soon

`[todo: will add images soon; need to upload to svn]`

------------------------------------------------------------------------

# Table Building Pipeline

Building a data table in sfntly is done by the
`FontDataTable::Builder::build` method, which defines the general
pipeline and leaves the details to each implementing subclass
(`CMapTable::Builder` for example). Note: **`sub*`** methods are table
specific.

**`ReadableFontDataPtr data = internalReadData()`**
> There are 2 private fields in the `FontDataTable::Builder` class:
> `rData` and `wData` for `ReadableFontData` and `WritableFontData`.
> This function returns `rData` if it is set; otherwise it returns
> `wData` (cast to readable font data). *They hold the same data!*

**`if (model_changed_)`**
> A font is essentially a binary blob when loaded inside a `FontData`
> object. A *model* is the Java/C++ collection of objects that represent
> the same data in a manipulable format. If you ask for the model (even
> if you don’t write to it), it will count as changed and the underlying
> raw data will get updated.

**`if (!subReadyToSerialize())`**
**`return NULL`**
`else`
1. **`size = subDataToSerialize()`**
2. **`WritableDataPtr new_data = container_->getNewData(size)`**
3. **`subSerialize(new_data)`**
4. **`data = new_data`**

**`FontDataTablePtr table = subBuildTable(data)`**
> This is where the table is actually built. `subBuildTable` is
> overridden by every table class, but a table header is always added.
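
The pipeline above is a template method: `build` fixes the order of the steps and each table supplies the `sub*` hooks. The following is a minimal, self-contained sketch of that shape, not the sfntly implementation. All names in it (`Bytes`, `Table`, `TableBuilder`, `ByteListBuilder`, `Model`) are hypothetical stand-ins, and it assumes the serialization steps run only when the model has changed, which is one reading of the outline above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for sfntly's font data classes.
using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for a built table; treated as immutable once built.
struct Table {
  Bytes data;
};

// Build() fixes the pipeline order; subclasses implement the Sub* hooks.
class TableBuilder {
 public:
  explicit TableBuilder(Bytes data) : data_(std::move(data)) {}
  virtual ~TableBuilder() = default;

  std::unique_ptr<Table> Build() {
    Bytes data = data_;  // plays the role of internalReadData()
    if (model_changed_) {
      if (!SubReadyToSerialize()) return nullptr;
      std::size_t size = SubDataSizeToSerialize();
      Bytes new_data(size);  // plays the role of container_->getNewData(size)
      SubSerialize(&new_data);
      data = new_data;
    }
    return SubBuildTable(data);
  }

 protected:
  const Bytes& raw_data() const { return data_; }
  void set_model_changed() { model_changed_ = true; }

  virtual bool SubReadyToSerialize() = 0;
  virtual std::size_t SubDataSizeToSerialize() = 0;
  virtual void SubSerialize(Bytes* out) = 0;
  virtual std::unique_ptr<Table> SubBuildTable(const Bytes& data) {
    return std::make_unique<Table>(Table{data});
  }

 private:
  Bytes data_;
  bool model_changed_ = false;
};

// A toy table whose "model" is just an editable copy of the raw bytes.
class ByteListBuilder : public TableBuilder {
 public:
  using TableBuilder::TableBuilder;

  // Asking for the model marks it as changed, even if nothing is written.
  Bytes& Model() {
    if (!model_loaded_) {
      model_ = raw_data();
      model_loaded_ = true;
    }
    set_model_changed();
    return model_;
  }

 protected:
  bool SubReadyToSerialize() override { return !model_.empty(); }
  std::size_t SubDataSizeToSerialize() override { return model_.size(); }
  void SubSerialize(Bytes* out) override { *out = model_; }

 private:
  Bytes model_;
  bool model_loaded_ = false;
};
```

In this sketch, a builder whose model was never requested hands its raw bytes straight to `SubBuildTable`, while a builder whose model was edited is reserialized first. An emptied model makes `SubReadyToSerialize` fail, so `Build` returns a null pointer, mirroring the `return NULL` branch.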

Subtable Builders
------------------------------------------------------------------------------

Subtables are lazily built.

When creating the object view of the font and dealing with lots of
tables, it would be wasteful to create builders for every subtable there
is, since most users only do fairly high-level manipulation of the font.
Instead, **only the tables at font level are fully built**.

All other subtables have builders that contain valid FontData, but the
object view is not created by default. For the `CMapTable`, this means
that if you don’t go through the `getCMapBuilders()` method, the CMap
builders are not initialized. So the builder map appears empty when you
call its `size()` method, even though
`numCMaps(internalReadFont())` reports that there are CMaps in the font.

------------------------------------------------------------------------

Character encoders
---------------------------------------------------------------------------------

Sfntly/Java uses a native ICU-based API for encoding characters.
Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
UTF-16. Public APIs will use ICU classes like `UnicodeString`.
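
Since the unit tests assume UTF-16 input while CMaps are keyed by code point, it may help to see how UTF-16 code units turn into code points before a CMap lookup. The sketch below is plain standard C++ and deliberately does not use ICU; the helper name `DecodeUtf16` is hypothetical, and passing unpaired surrogates through unchanged is a simplification chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes UTF-16 code units into Unicode code points.
// A high surrogate (0xD800-0xDBFF) followed by a low surrogate
// (0xDC00-0xDFFF) is combined into one code point above U+FFFF.
// Unpaired surrogates are passed through unchanged (sketch-only choice).
std::vector<std::uint32_t> DecodeUtf16(const std::u16string& text) {
  std::vector<std::uint32_t> code_points;
  for (std::size_t i = 0; i < text.size(); ++i) {
    std::uint32_t unit = text[i];
    bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
    if (is_high && i + 1 < text.size()) {
      std::uint32_t next = text[i + 1];
      if (next >= 0xDC00 && next <= 0xDFFF) {
        unit = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
        ++i;  // the low surrogate has been consumed
      }
    }
    code_points.push_back(unit);
  }
  return code_points;
}
```

Code points that fit in a single code unit lie in the BMP, which is the range format 4 is described as covering above; a surrogate pair yields a code point beyond it.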