• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Diffs
2
3Consider two directed graphs, containing labelled nodes and edges. Given a
4designated starting node in each graph, describe how the reachable subgraphs are
5different in a textual report.
6
7STG separates the problem of reporting graph differences into two pieces:
8
91. comparison - generating difference graphs
101. reporting - serialising difference graphs
11
12The main benefits are:
13
14* separation of responsibility allowing reporting to vary without significant
15  changes to the comparison code
16* a single difference graph can be used to generate multiple reports with
17  guaranteed consistency and modest time savings
18* the difference graph data structure may be presented as a graph, manipulated,
19  subject to further analysis or stored
20
21## Abstract Graph Diffs
22
23There are 3 kinds of node difference and each node comparison pair can have any
24number of these:
25
261. node label difference - a purely local change
271. outgoing edge with matching labels - a recursive difference
281. added or removed outgoing edge - modelled as a recursive difference with an
29   "absent" node
30
31STG models comparisons as pairs of nodes where either node can be absent. While
32absent-absent comparisons can result from the composition of an addition and a
33removal, they do not occur naturally during pairwise comparison.
34
35## Comparison Implementation
36
37Comparison is mostly done pair-wise recursively with a DFS, by the function
38object `Compare` and with the help of the [SCC finder](SCC.md).
39
40The algorithm divides responsibility between `operator()(Id, Id)` and various
41`operator()(Node, Node)` methods. There are also trivial helpers `Removed`,
42`Added` and `Mismatch`.
43
44The `Result` type encapsulates the difference between two nodes being compared.
45It contains both a list (`Diff`) of differences (`DiffDetail`) and a boolean
46equality outcome. The latter is used to propagate inequality information in the
47presence of cycles in the diff comparison graph.
48
49### `operator()(Node, Node)`
50
51For a given `Node` type, this method has the job of computing local differences,
52matching edges and obtaining edge differences from recursive calls to
53`operator()(Id, Id)` (or `Removed` and `Added`, if edge labels are unmatched).
54
55Local differences can easily be rendered as text, but edge differences need
56recursive calls, the results of which are merged into the local differences
57`Result` with helper methods.
58
59In general we want each comparison operator to be as small as possible,
60containing no boilerplate and simply mirroring the node data. The helper
61functions were therefore chosen for power, laziness and concision.
62
63### `Added` and `Removed`
64
65These take care of comparisons where one side is absent.
66
67There are several reasons for not folding this functionality into `operator(Id,
68Id)` itself:
69
70* it would result in unnecessary extra work for unmatched edges as its callers
71  would pack and the function would unpack `std::optional<Id>` arguments
72* added and removed nodes have none of the other interesting features that it
73  handles
74* `Added` and `Removed` don't need to decorate their return values with any
75  difference information
76
77### `operator(Id, Id)`
78
79This controls recursion and handles some special cases before delegating to some
80`operator()(Node, Node)` in the "normal" case.
81
82It takes care of the following:
83
84* revisited, completed comparisons
85* revisited, in-progress comparison
86* qualified types
87* typedefs
88* incomparable and comparable nodes - handled by `Mismatch` and delegated,
89  respectively
90
91Note that the non-trivial special cases relating to typedefs and qualified types
92(and their current concrete representations) require non-parallel traversals of
93the graphs being compared.
94
95#### Revisited Nodes and Recursive Comparison
96
97STG comparison relies on `SCC` to control recursion and behaviour in the face of
98graphs containing arbitrary cycles. Any difference found affecting one node in a
99comparison cycle affects all nodes in that cycle.
100
101Excluding the two special cases documented in the following sections, the
102comparison steps are approximately:
103
1041. if the comparison already has a known result then return this
1051. if the comparison already is in progress then return a potential difference
1061. start node visit, register the node with the SCC finder
107   1. (special cases for qualified types and typedefs)
108   1. incomparable nodes go to `Mismatch` which returns a difference
109   1. otherwise delegate node comparison (with possible recursion)
110   1. result is a tentative node comparion
1111. finish node visit, informing the SCC finder
1121. if an SCC was closed, we've just finished its root comparison
113   1. root compared equal? discard unwanted potential differences
114   1. difference found? record confirmed differences
115   1. record all its comparisons as final
1161. return result (whether final or tentative)
117
118#### Typedefs
119
120Typedefs are just named type aliases which cannot refer to themselves or later
121defined types. The referred-to type is exactly identical to the typedef. So for
122difference *finding*, typedefs should just be resolved and skipped over.
123However, for *reporting*, it may be still useful to say where a difference came
124from. This requires extra handling to collect the typedef names on each side of
125the comparison, when there is something to report.
126
127If `operator()(Id, Id)` sees a comparison involving a typedef, it resolves
128typedef chains on both sides and keeps track of the names. Then it calls itself
129recursively. If the result is no-diff, it returns no-diff, otherwise, it reports
130the differences between the types at the end of the typedef chains.
131
132An alternative would be to genuinely just follow the epsilons. Store the
133typedefs in the diff tree but record the comparison of what they resolve to. The
134presentation layer can decorate the comparison text with resolution chains.
135
136Note that qualified typedefs present extra complications.
137
138#### Qualified Types
139
140STG currently represents type qualifiers as separate, individual nodes. They are
141relevant for finding differences but there may be no guarantee of the order in
142which they will appear. For diff reporting, STG currently reports added and
143removed qualifiers but also compares the underlying types.
144
145This implies that when faced with a comparison involving a qualifier,
146`operator()(Id, Id)` should collect and compare all qualifiers on both sides and
147treat the types as compound objects consisting of their qualifiers and the
148underlying types, either or both of which may have differences to report.
149Comparing the underlying types requires recursive calls.
150
151Note that qualified typedefs present extra complications.
152
153#### Qualified typedefs
154
155Qualifiers and typedefs have subtle interactions. For example:
156
157Before:
158
159```c++
160const int quux;
161```
162
163After 1:
164
165```c++
166typedef int foo;
167const foo quux;
168```
169
170After 2:
171
172```c++
173typedef const int foo;
174foo quux;
175```
176
177After 3:
178
179```c++
180typedef const int foo;
181const foo quux;
182```
183
184In all cases above, the type of `quux` is unchanged. These examples strongly
185suggest that a better model of C types would involve tracking qualification as a
186decoration present on every type node, including typedefs.
187
188Note that this behaviour implies C's type system is not purely constructive as
189there is machinery to discard duplicate qualifiers which would be illegal
190elsewhere.
191
192For the moment, we can pretend that outer qualifications are always significant,
193even though they may be absorbed by inner ones, and risk occasional false
194positives.
195
196A worse case is:
197
198Before:
199
200```c++
201const int quux[];
202```
203
204After 1:
205
206```c++
207typedef int foo[];
208const foo quux;
209```
210
211After 2:
212
213```c++
214typedef const int foo[];
215foo quux;
216```
217
218After 3:
219
220```c++
221typedef const int foo[];
222const foo quux;
223```
224
225All the `quux` are identically typed. There is an additional wart that what
226would normally be illegal qualifiers on an array type instead decorate its
227element type.
228
229Finally, worst is:
230
231
232Before:
233
234```c++
235const int quux();
236```
237
238After 1:
239
240```c++
241typedef int foo();
242const foo quux;
243```
244
245After 2:
246
247```c++
248typedef const int foo();
249foo quux;
250```
251
252After 3:
253
254```c++
255typedef const int foo();
256const foo quux;
257```
258
259The two `const foo quux` cases invoke undefined behaviour. The consistently
260crazy behaviour would have been to decorate the return type instead.
261
262### Diff helpers
263
264These are mainly used by the `Compare::operator()(Node, Node)` methods.
265
266* `MarkIncomparable` - nodes are just different
267* `AddNodeDiff` - add node difference, unconditionally
268* `AddEdgeDiff` - add edge difference (addition or removal), unconditionally
269* `MaybeAddNodeDiff` - add node difference (label change), conditionally
270* `MaybeAddEdgeDiff` - add matching edge recursive difference, conditionally
271
272Variants are possible where text is generated lazily on a recursive diff being
273found, as are ones where labels are compared and serialised only if different.
274
275## Diff Presentation
276
277In general, there are two problems to solve:
278
279* generating suitable text, for
280   * nodes and edges
281   * node and edge differences
282* building a report with some meaningful structure
283
284Node and edge description and report structure are the responsibility of the
285*reporting* code. See [Names](NAMES.md) for more detailed notes on node
286description, mainly C type name syntax.
287
288Several report formats are supported and the simplest is (omitting various
289complications) a rendering of a difference graph as a difference *tree* where
290revisiting nodes is avoided by reporting 2 additional artificial kinds of
291difference:
292
2931. already reported - to handle diff sharing
2941. being compared - to handle diff cycles
295
296The various formats are not documented further here.
297
298Finally, node and edge difference description is currently the responsibility of
299the *comparison* code. This may change in the future, but might require a typed
300difference graph.
301