1 LC-trie implementation notes. 2 3Node types 4---------- 5leaf 6 An end node with data. This has a copy of the relevant key, along 7 with 'hlist' with routing table entries sorted by prefix length. 8 See struct leaf and struct leaf_info. 9 10trie node or tnode 11 An internal node, holding an array of child (leaf or tnode) pointers, 12 indexed through a subset of the key. See Level Compression. 13 14A few concepts explained 15------------------------ 16Bits (tnode) 17 The number of bits in the key segment used for indexing into the 18 child array - the "child index". See Level Compression. 19 20Pos (tnode) 21 The position (in the key) of the key segment used for indexing into 22 the child array. See Path Compression. 23 24Path Compression / skipped bits 25 Any given tnode is linked to from the child array of its parent, using 26 a segment of the key specified by the parent's "pos" and "bits" 27 In certain cases, this tnode's own "pos" will not be immediately 28 adjacent to the parent (pos+bits), but there will be some bits 29 in the key skipped over because they represent a single path with no 30 deviations. These "skipped bits" constitute Path Compression. 31 Note that the search algorithm will simply skip over these bits when 32 searching, making it necessary to save the keys in the leaves to 33 verify that they actually do match the key we are searching for. 34 35Level Compression / child arrays 36 the trie is kept level balanced moving, under certain conditions, the 37 children of a full child (see "full_children") up one level, so that 38 instead of a pure binary tree, each internal node ("tnode") may 39 contain an arbitrarily large array of links to several children. 40 Conversely, a tnode with a mostly empty child array (see empty_children) 41 may be "halved", having some of its children moved downwards one level, 42 in order to avoid ever-increasing child arrays. 43 44empty_children 45 the number of positions in the child array of a given tnode that are 46 NULL. 47 48full_children 49 the number of children of a given tnode that aren't path compressed. 50 (in other words, they aren't NULL or leaves and their "pos" is equal 51 to this tnode's "pos"+"bits"). 52 53 (The word "full" here is used more in the sense of "complete" than 54 as the opposite of "empty", which might be a tad confusing.) 55 56Comments 57--------- 58 59We have tried to keep the structure of the code as close to fib_hash as 60possible to allow verification and help up reviewing. 61 62fib_find_node() 63 A good start for understanding this code. This function implements a 64 straightforward trie lookup. 65 66fib_insert_node() 67 Inserts a new leaf node in the trie. This is bit more complicated than 68 fib_find_node(). Inserting a new node means we might have to run the 69 level compression algorithm on part of the trie. 70 71trie_leaf_remove() 72 Looks up a key, deletes it and runs the level compression algorithm. 73 74trie_rebalance() 75 The key function for the dynamic trie after any change in the trie 76 it is run to optimize and reorganize. Tt will walk the trie upwards 77 towards the root from a given tnode, doing a resize() at each step 78 to implement level compression. 79 80resize() 81 Analyzes a tnode and optimizes the child array size by either inflating 82 or shrinking it repeatedly until it fulfills the criteria for optimal 83 level compression. This part follows the original paper pretty closely 84 and there may be some room for experimentation here. 85 86inflate() 87 Doubles the size of the child array within a tnode. Used by resize(). 88 89halve() 90 Halves the size of the child array within a tnode - the inverse of 91 inflate(). Used by resize(); 92 93fn_trie_insert(), fn_trie_delete(), fn_trie_select_default() 94 The route manipulation functions. Should conform pretty closely to the 95 corresponding functions in fib_hash. 96 97fn_trie_flush() 98 This walks the full trie (using nextleaf()) and searches for empty 99 leaves which have to be removed. 100 101fn_trie_dump() 102 Dumps the routing table ordered by prefix length. This is somewhat 103 slower than the corresponding fib_hash function, as we have to walk the 104 entire trie for each prefix length. In comparison, fib_hash is organized 105 as one "zone"/hash per prefix length. 106 107Locking 108------- 109 110fib_lock is used for an RW-lock in the same way that this is done in fib_hash. 111However, the functions are somewhat separated for other possible locking 112scenarios. It might conceivably be possible to run trie_rebalance via RCU 113to avoid read_lock in the fn_trie_lookup() function. 114 115Main lookup mechanism 116--------------------- 117fn_trie_lookup() is the main lookup function. 118 119The lookup is in its simplest form just like fib_find_node(). We descend the 120trie, key segment by key segment, until we find a leaf. check_leaf() does 121the fib_semantic_match in the leaf's sorted prefix hlist. 122 123If we find a match, we are done. 124 125If we don't find a match, we enter prefix matching mode. The prefix length, 126starting out at the same as the key length, is reduced one step at a time, 127and we backtrack upwards through the trie trying to find a longest matching 128prefix. The goal is always to reach a leaf and get a positive result from the 129fib_semantic_match mechanism. 130 131Inside each tnode, the search for longest matching prefix consists of searching 132through the child array, chopping off (zeroing) the least significant "1" of 133the child index until we find a match or the child index consists of nothing but 134zeros. 135 136At this point we backtrack (t->stats.backtrack++) up the trie, continuing to 137chop off part of the key in order to find the longest matching prefix. 138 139At this point we will repeatedly descend subtries to look for a match, and there 140are some optimizations available that can provide us with "shortcuts" to avoid 141descending into dead ends. Look for "HL_OPTIMIZE" sections in the code. 142 143To alleviate any doubts about the correctness of the route selection process, 144a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which 145gives userland access to fib_lookup(). 146