1# Introduction 2 3Several tables in the opentype format are formed internally by a graph of subtables. Parent node's 4reference their children through the use of positive offsets, which are typically 16 bits wide. 5Since offsets are always positive this forms a directed acyclic graph. For storage in the font file 6the graph must be given a topological ordering and then the subtables packed in serial according to 7that ordering. Since 16 bit offsets have a maximum value of 65,535 if the distance between a parent 8subtable and a child is more then 65,535 bytes then it's not possible for the offset to encode that 9edge. 10 11For many fonts with complex layout rules (such as Arabic) it's not unusual for the tables containing 12layout rules ([GSUB/GPOS](https://docs.microsoft.com/en-us/typography/opentype/spec/gsub)) to be 13larger than 65kb. As a result these types of fonts are susceptible to offset overflows when 14serializing to the binary font format. 15 16Offset overflows can happen for a variety of reasons and require different strategies to resolve: 17* Simple overflows can often be resolved with a different topological ordering. 18* If a subtable has many parents this can result in the link from furthest parent(s) 19 being at risk for overflows. In these cases it's possible to duplicate the shared subtable which 20 allows it to be placed closer to it's parent. 21* If subtables exist which are themselves larger than 65kb it's not possible for any offsets to point 22 past them. In these cases the subtable can usually be split into two smaller subtables to allow 23 for more flexibility in the ordering. 24* In GSUB/GPOS overflows from Lookup subtables can be resolved by changing the Lookup to an extension 25 lookup which uses a 32 bit offset instead of 16 bit offset. 26 27In general there isn't a simple solution to produce an optimal topological ordering for a given graph. 28Finding an ordering which doesn't overflow is a NP hard problem. Existing solutions use heuristics 29which attempt a combination of the above strategies to attempt to find a non-overflowing configuration. 30 31The harfbuzz subsetting library 32[includes a repacking algorithm](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-repacker.hh) 33which is used to resolve offset overflows that are present in the subsetted tables it produces. This 34document provides a deep dive into how the harfbuzz repacking algorithm works. 35 36Other implementations exist, such as in 37[fontTools](https://github.com/fonttools/fonttools/blob/7af43123d49c188fcef4e540fa94796b3b44e858/Lib/fontTools/ttLib/tables/otBase.py#L72), however these are not covered in this document. 38 39# Foundations 40 41There's four key pieces to the harfbuzz approach: 42 43* Subtable Graph: a table's internal structure is abstraced out into a lightweight graph 44 representation where each subtable is a node and each offset forms an edge. The nodes only need 45 to know how many bytes the corresponding subtable occupies. This lightweight representation can 46 be easily modified to test new ordering's and strategies as the repacking algorithm iterates. 47 48* [Topological sorting algorithm](https://en.wikipedia.org/wiki/Topological_sorting): an algorithm 49 which given a graph gives a linear sorting of the nodes such that all offsets will be positive. 50 51* Overflow check: given a graph and a topological sorting it checks if there will be any overflows 52 in any of the offsets. If there are overflows it returns a list of (parent, child) tuples that 53 will overflow. Since the graph has information on the size of each subtable it's straightforward 54 to calculate the final position of each subtable and then check if any offsets to it will 55 overflow. 56 57* Offset resolution strategies: given a particular occurence of an overflow these strategies 58 modify the graph to attempt to resolve the overflow. 59 60# High Level Algorithm 61 62``` 63def repack(graph): 64 graph.topological_sort() 65 66 if (graph.will_overflow()) 67 assign_spaces(graph) 68 graph.topological_sort() 69 70 while (overflows = graph.will_overflow()): 71 for overflow in overflows: 72 apply_offset_resolution_strategy (overflow, graph) 73 graph.topological_sort() 74``` 75 76The actual code for this processing loop can be found in the function hb_resolve_overflows () of 77[hb-repacker.hh](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-repacker.hh). 78 79# Topological Sorting Algorithms 80 81The harfbuzz repacker uses two different algorithms for topological sorting: 82* [Kahn's Algorithm](https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm) 83* Sorting by shortest distance 84 85Kahn's algorithm is approximately twice as fast as the shortest distance sort so that is attempted 86first (only on the first topological sort). If it fails to eliminate overflows then shortest distance 87sort will be used for all subsequent topological sorting operations. 88 89## Shortest Distance Sort 90 91This algorithm orders the nodes based on total distance to each node. Nodes with a shorter distance 92are ordered first. 93 94The "weight" of an edge is the sum of the size of the sub-table being pointed to plus 2^16 for a 16 bit 95offset and 2^32 for a 32 bit offset. 96 97The distance of a node is the sum of all weights along the shortest path from the root to that node 98plus a priority modifier (used to change where nodes are placed by moving increasing or 99decreasing the effective distance). Ties between nodes with the same distance are broken based 100on the order of the offset in the sub table bytes. 101 102The shortest distance to each node is determined using 103[Djikstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm). Then the topological 104ordering is produce by applying a modified version of Kahn's algorithm that uses a priority queue 105based on the shortest distance to each node. 106 107## Optimizing the Sorting 108 109The topological sorting operation is the core of the repacker and is run on each iteration so it needs 110to be as fast as possible. There's a few things that are done to speed up subsequent sorting 111operations: 112 113* The number of incoming edges to each node is cached. This is required by the Kahn's algorithm 114 portion of boths sorts. Where possible when the graph is modified we manually update the cached 115 edge counts of affected nodes. 116 117* The distance to each node is cached. Where possible when the graph is modified we manually update 118 the cached distances of any affected nodes. 119 120Caching these values allows the repacker to avoid recalculating them for the full graph on each 121iteration. 122 123The other important factor to speed is a fast priority queue which is a core datastructure to 124the topological sorting algorithm. Currently a basic heap based queue is used. Heap based queue's 125don't support fast priority decreases, but that can be worked around by just adding redundant entries 126to the priority queue and filtering the older ones out when poppping off entries. This is based 127on the recommendations in 128[a study of the practical performance of priority queues in Dijkstra's algorithm](https://www3.cs.stonybrook.edu/~rezaul/papers/TR-07-54.pdf) 129 130## Special Handling of 32 bit Offsets 131 132If a graph contains multiple 32 bit offsets then the shortest distance sorting will be likely be 133suboptimal. For example consider the case where a graph contains two 32 bit offsets that each point 134to a subgraph which are not connected to each other. The shortest distance sort will interleave the 135subtables of the two subgraphs, potentially resulting in overflows. Since each of these subgraphs are 136independent of each other, and 32 bit offsets can point extremely long distances a better strategy is 137to pack the first subgraph in it's entirety and then have the second subgraph packed after with the 32 138bit offset pointing over the first subgraph. For example given the graph: 139 140 141``` 142a--- b -- d -- f 143 \ 144 \_ c -- e -- g 145``` 146 147Where the links from a to b and a to c are 32 bit offsets, the shortest distance sort would be: 148 149``` 150a, b, c, d, e, f, g 151 152``` 153 154If nodes d and e have a combined size greater than 65kb then the offset from d to f will overflow. 155A better ordering is: 156 157``` 158a, b, d, f, c, e, g 159``` 160 161The ability for 32 bit offsets to point long distances is utilized to jump over the subgraph of 162b which gives the remaining 16 bit offsets a better chance of not overflowing. 163 164The above is an ideal situation where the subgraphs are disconnected from each other, in practice 165this is often not this case. So this idea can be generalized as follows: 166 167If there is a subgraph that is only reachable from one or more 32 bit offsets, then: 168* That subgraph can be treated as an indepedent unit and all nodes of the subgraph packed in isolation 169 from the rest of the graph. 170* In a table that occupies less than 4gb of space (in practice all fonts), that packed independent 171 subgraph can be placed anywhere after the parent nodes without overflowing the 32 bit offsets from 172 the parent nodes. 173 174The sorting algorithm incorporates this via a "space" modifier that can be applied to nodes in the 175graph. By default all nodes are treated as being in space zero. If a node is given a non-zero space, n, 176then the computed distance to the node will be modified by adding `n * 2^32`. This will cause that 177node and it's descendants to be packed between all nodes in space n-1 and space n+1. Resulting in a 178topological sort like: 179 180``` 181| space 0 subtables | space 1 subtables | .... | space n subtables | 182``` 183 184The assign_spaces() step in the high level algorithm is responsible for identifying independent 185subgraphs and assigning unique spaces to each one. More information on the space assignment can be 186found in the next section. 187 188 189# Offset Resolution Strategies 190 191## Space Assignment 192 193The goal of space assignment is to find connected subgraphs that are only reachable via 32 bit offsets 194and then assign each such subgraph to a unique non-zero space. The algorithm is roughly: 195 1961. Collect the set, `S`, of nodes that are children of 32 bit offsets. 197 1982. Do a directed traversal from each node in `S` and collect all encountered nodes into set `T`. 199 Mark all nodes in the graph that are not in `T` as being in space 0. 200 2013. Set `next_space = 1`. 202 2034. While set `S` is not empty: 204 205 a. Pick a node `n` in set `S` then perform an undirected graph traversal and find the set `Q` of 206 nodes that are reachable from `n`. 207 208 b. During traversal if a node, `m`, has a edge to a node in space 0 then `m` must be duplicated 209 to disconnect it from space 0. 210 211 d. Remove all nodes in `Q` from `S` and assign all nodes in `Q` to `next_space`. 212 213 214 c. Increment `next_space` by one. 215 216 217## Manual Iterative Resolutions 218 219For each overflow in each iteration the algorithm will attempt to apply offset overflow resolution 220strategies to eliminate the overflow. The type of strategy applied is dependant on the characteristics 221of the overflowing link: 222 223* If the overflowing offset is inside a space other than space 0 and the subgraph space has more 224 than one 32 bit offset pointing into the subgraph then subdivide the space by moving subgraph 225 from one of the 32 bit offsets into a new space via the duplication of shared nodes. 226 227* If the overflowing offset is pointing to a subtable with more than one incoming edge: duplicate 228 the node so that the overflowing offset is pointing at it's own copy of that node. 229 230* Otherwise, attempt to move the child subtable closer to it's parent. This is accomplished by 231 raising the priority of all children of the parent. Next time the topological sort is run the 232 children will be ordered closer to the parent. 233 234# Test Cases 235 236The harfbuzz repacker has tests defined using generic graphs: https://github.com/harfbuzz/harfbuzz/blob/main/src/test-repacker.cc 237 238# Future Improvments 239 240The above resolution strategies are not sufficient to resolve all overflows. For example consider 241the case where a single subtable is 65k and the graph structure requires an offset to point over it. 242 243The current harfbuzz implementation is suitable for the vast majority of subsetting related overflows. 244Subsetting related overflows are typically easy to solve since all subsets are derived from a font 245that was originally overflow free. A more general purpose version of the algorithm suitable for font 246creation purposes will likely need some additional offset resolution strategies: 247 248* Currently only children nodes are moved to resolve offsets. However, in many cases moving a parent 249 node closer to it's children will have less impact on the size of other offsets. Thus the algorithm 250 should use a heuristic (based on parent and child subtable sizes) to decide if the children's 251 priority should be increased or the parent's priority decreased. 252 253* Many subtables can be split into two smaller subtables without impacting the overall functionality. 254 This should be done when an overflow is the result of a very large table which can't be moved 255 to avoid offsets pointing over it. 256 257* Lookup subtables in GSUB/GPOS can be upgraded to extension lookups which uses a 32 bit offset. 258 Overflows from a Lookup subtable to it's child should be resolved by converting to an extension 259 lookup. 260 261Once additional resolution strategies are added to the algorithm it's likely that we'll need to 262switch to using a [backtracking algorithm](https://en.wikipedia.org/wiki/Backtracking) to explore 263the various combinations of resolution strategies until a non-overflowing combination is found. This 264will require the ability to restore the graph to an earlier state. It's likely that using a stack 265of undoable resolution commands could be used to accomplish this. 266