1# Diffs 2 3Consider two directed graphs, containing labelled nodes and edges. Given a 4designated starting node in each graph, describe how the reachable subgraphs are 5different in a textual report. 6 7STG separates the problem of reporting graph differences into two pieces: 8 91. comparison - generating difference graphs 101. reporting - serialising difference graphs 11 12The main benefits are: 13 14* separation of responsibility allowing reporting to vary without significant 15 changes to the comparison code 16* a single difference graph can be used to generate multiple reports with 17 guaranteed consistency and modest time savings 18* the difference graph data structure may be presented as a graph, manipulated, 19 subject to further analysis or stored 20 21## Abstract Graph Diffs 22 23There are 3 kinds of node difference and each node comparison pair can have any 24number of these: 25 261. node label difference - a purely local change 271. outgoing edge with matching labels - a recursive difference 281. added or removed outgoing edge - modelled as a recursive difference with an 29 "absent" node 30 31STG models comparisons as pairs of nodes where either node can be absent. While 32absent-absent comparisons can result from the composition of an addition and a 33removal, they do not occur naturally during pairwise comparison. 34 35## Comparison Implementation 36 37Comparison is mostly done pair-wise recursively with a DFS, by the function 38object `Compare` and with the help of the [SCC finder](SCC.md). 39 40The algorithm divides responsibility between `operator()(Id, Id)` and various 41`operator()(Node, Node)` methods. There are also trivial helpers `Removed`, 42`Added` and `Mismatch`. 43 44The `Result` type encapsulates the difference between two nodes being compared. 45It contains both a list (`Diff`) of differences (`DiffDetail`) and a boolean 46equality outcome. The latter is used to propagate inequality information in the 47presence of cycles in the diff comparison graph. 48 49### `operator()(Node, Node)` 50 51For a given `Node` type, this method has the job of computing local differences, 52matching edges and obtaining edge differences from recursive calls to 53`operator()(Id, Id)` (or `Removed` and `Added`, if edge labels are unmatched). 54 55Local differences can easily be rendered as text, but edge differences need 56recursive calls, the results of which are merged into the local differences 57`Result` with helper methods. 58 59In general we want each comparison operator to be as small as possible, 60containing no boilerplate and simply mirroring the node data. The helper 61functions were therefore chosen for power, laziness and concision. 62 63### `Added` and `Removed` 64 65These take care of comparisons where one side is absent. 66 67There are several reasons for not folding this functionality into `operator(Id, 68Id)` itself: 69 70* it would result in unnecessary extra work for unmatched edges as its callers 71 would pack and the function would unpack `std::optional<Id>` arguments 72* added and removed nodes have none of the other interesting features that it 73 handles 74* `Added` and `Removed` don't need to decorate their return values with any 75 difference information 76 77### `operator(Id, Id)` 78 79This controls recursion and handles some special cases before delegating to some 80`operator()(Node, Node)` in the "normal" case. 81 82It takes care of the following: 83 84* revisited, completed comparisons 85* revisited, in-progress comparison 86* qualified types 87* typedefs 88* incomparable and comparable nodes - handled by `Mismatch` and delegated, 89 respectively 90 91Note that the non-trivial special cases relating to typedefs and qualified types 92(and their current concrete representations) require non-parallel traversals of 93the graphs being compared. 94 95#### Revisited Nodes and Recursive Comparison 96 97STG comparison relies on `SCC` to control recursion and behaviour in the face of 98graphs containing arbitrary cycles. Any difference found affecting one node in a 99comparison cycle affects all nodes in that cycle. 100 101Excluding the two special cases documented in the following sections, the 102comparison steps are approximately: 103 1041. if the comparison already has a known result then return this 1051. if the comparison already is in progress then return a potential difference 1061. start node visit, register the node with the SCC finder 107 1. (special cases for qualified types and typedefs) 108 1. incomparable nodes go to `Mismatch` which returns a difference 109 1. otherwise delegate node comparison (with possible recursion) 110 1. result is a tentative node comparion 1111. finish node visit, informing the SCC finder 1121. if an SCC was closed, we've just finished its root comparison 113 1. root compared equal? discard unwanted potential differences 114 1. difference found? record confirmed differences 115 1. record all its comparisons as final 1161. return result (whether final or tentative) 117 118#### Typedefs 119 120Typedefs are just named type aliases which cannot refer to themselves or later 121defined types. The referred-to type is exactly identical to the typedef. So for 122difference *finding*, typedefs should just be resolved and skipped over. 123However, for *reporting*, it may be still useful to say where a difference came 124from. This requires extra handling to collect the typedef names on each side of 125the comparison, when there is something to report. 126 127If `operator()(Id, Id)` sees a comparison involving a typedef, it resolves 128typedef chains on both sides and keeps track of the names. Then it calls itself 129recursively. If the result is no-diff, it returns no-diff, otherwise, it reports 130the differences between the types at the end of the typedef chains. 131 132An alternative would be to genuinely just follow the epsilons. Store the 133typedefs in the diff tree but record the comparison of what they resolve to. The 134presentation layer can decorate the comparison text with resolution chains. 135 136Note that qualified typedefs present extra complications. 137 138#### Qualified Types 139 140STG currently represents type qualifiers as separate, individual nodes. They are 141relevant for finding differences but there may be no guarantee of the order in 142which they will appear. For diff reporting, STG currently reports added and 143removed qualifiers but also compares the underlying types. 144 145This implies that when faced with a comparison involving a qualifier, 146`operator()(Id, Id)` should collect and compare all qualifiers on both sides and 147treat the types as compound objects consisting of their qualifiers and the 148underlying types, either or both of which may have differences to report. 149Comparing the underlying types requires recursive calls. 150 151Note that qualified typedefs present extra complications. 152 153#### Qualified typedefs 154 155Qualifiers and typedefs have subtle interactions. For example: 156 157Before: 158 159```c++ 160const int quux; 161``` 162 163After 1: 164 165```c++ 166typedef int foo; 167const foo quux; 168``` 169 170After 2: 171 172```c++ 173typedef const int foo; 174foo quux; 175``` 176 177After 3: 178 179```c++ 180typedef const int foo; 181const foo quux; 182``` 183 184In all cases above, the type of `quux` is unchanged. These examples strongly 185suggest that a better model of C types would involve tracking qualification as a 186decoration present on every type node, including typedefs. 187 188Note that this behaviour implies C's type system is not purely constructive as 189there is machinery to discard duplicate qualifiers which would be illegal 190elsewhere. 191 192For the moment, we can pretend that outer qualifications are always significant, 193even though they may be absorbed by inner ones, and risk occasional false 194positives. 195 196A worse case is: 197 198Before: 199 200```c++ 201const int quux[]; 202``` 203 204After 1: 205 206```c++ 207typedef int foo[]; 208const foo quux; 209``` 210 211After 2: 212 213```c++ 214typedef const int foo[]; 215foo quux; 216``` 217 218After 3: 219 220```c++ 221typedef const int foo[]; 222const foo quux; 223``` 224 225All the `quux` are identically typed. There is an additional wart that what 226would normally be illegal qualifiers on an array type instead decorate its 227element type. 228 229Finally, worst is: 230 231 232Before: 233 234```c++ 235const int quux(); 236``` 237 238After 1: 239 240```c++ 241typedef int foo(); 242const foo quux; 243``` 244 245After 2: 246 247```c++ 248typedef const int foo(); 249foo quux; 250``` 251 252After 3: 253 254```c++ 255typedef const int foo(); 256const foo quux; 257``` 258 259The two `const foo quux` cases invoke undefined behaviour. The consistently 260crazy behaviour would have been to decorate the return type instead. 261 262### Diff helpers 263 264These are mainly used by the `Compare::operator()(Node, Node)` methods. 265 266* `MarkIncomparable` - nodes are just different 267* `AddNodeDiff` - add node difference, unconditionally 268* `AddEdgeDiff` - add edge difference (addition or removal), unconditionally 269* `MaybeAddNodeDiff` - add node difference (label change), conditionally 270* `MaybeAddEdgeDiff` - add matching edge recursive difference, conditionally 271 272Variants are possible where text is generated lazily on a recursive diff being 273found, as are ones where labels are compared and serialised only if different. 274 275## Diff Presentation 276 277In general, there are two problems to solve: 278 279* generating suitable text, for 280 * nodes and edges 281 * node and edge differences 282* building a report with some meaningful structure 283 284Node and edge description and report structure are the responsibility of the 285*reporting* code. See [Names](NAMES.md) for more detailed notes on node 286description, mainly C type name syntax. 287 288Several report formats are supported and the simplest is (omitting various 289complications) a rendering of a difference graph as a difference *tree* where 290revisiting nodes is avoided by reporting 2 additional artificial kinds of 291difference: 292 2931. already reported - to handle diff sharing 2941. being compared - to handle diff cycles 295 296The various formats are not documented further here. 297 298Finally, node and edge difference description is currently the responsibility of 299the *comparison* code. This may change in the future, but might require a typed 300difference graph. 301