# Regres - SwiftShader automated testing

## Introduction

Regres is a collection of tools to perform [dEQP](https://github.com/KhronosGroup/VK-GL-CTS)
presubmit and continuous integration testing and code coverage evaluation for
SwiftShader.

Regres provides:

* [Presubmit testing](#presubmit-testing) - An automatic OpenGL|ES and Vulkan
  dEQP test run for each Gerrit patchset put up for review.
* [Continuous integration testing](#daily-run-continuous-integration-testing) -
  An OpenGL|ES and Vulkan dEQP test run performed against the `master` branch each night. \
  This nightly run also produces code coverage information which can be viewed at
  [swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).
* [Local dEQP test runner](#local-deqp-test-runner) - A local tool for
  efficiently running a number of dEQP tests based on wildcard or regex name
  matching.

The Regres source root directory is at [`<swiftshader>/tests/regres/`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/).

## Presubmit testing

Regres monitors changes that have been [put up for review with Gerrit](https://swiftshader-review.googlesource.com/q/status:open).

Once a new [qualifying](#qualifying) patchset has been found, Regres will
check out, build and test the change against the parent changelist. \
Any differences in results are reported as a review comment on the change
[[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46369/5#message-4f09ea3e6d01ed94ae26183c8b6c547c90492c12).

### Qualifying

As Regres may be running externally authored code on Google hardware,
Regres will only test a change if it is authored by or reviewed by a Googler.

Only the most recent patchset of a change will be tested. If a new patchset is
pushed while the previous one is being tested, then testing will continue
to completion and the previous patchset's results will be posted, and the new
patchset will be queued for testing.

### Prioritization

At the time of writing, a Regres presubmit run takes a little over 20 minutes to
complete, and there is a single Regres machine servicing all changes.
To keep Regres responsive, changes are prioritized based on their 'readiness to
land', which is determined by the change's `Kokoro-Presubmit`, `Code-Review` and
`Presubmit-Ready` Gerrit labels.

### Test Filtering

By default, Regres will run all the test lists declared in the
[`<swiftshader>/tests/regres/ci-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/ci-tests.json) file.\
As new functionality is being implemented, the test lists in `ci-tests.json` may
reference known-passing test lists updated by the [daily run](#daily-run-continuous-integration-testing),
so that failing tests for incomplete functionality are skipped, but tests that
pass for new functionality *are tested* to ensure they do not regress.

Additional test names found in the files referenced by
[`<swiftshader>/tests/regres/full-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/full-tests.json)
can be explicitly included in the change's presubmit run
by including a line in the change description with the signature:

```text
Test: <dEQP-test-pattern>
```

`<dEQP-test-pattern>` can be a single dEQP test name, or you can use wildcards
[as documented here](https://golang.org/pkg/path/filepath/#Match).

You can repeat `Test:` as many times as you like. `Tests:` is also accepted.

[For example](https://swiftshader-review.googlesource.com/c/SwiftShader/+/26574):

```text
Add support for OpLogicalEqual, OpLogicalNotEqual

Test: dEQP-VK.glsl.operator.bool_compare.*
Test: dEQP-VK.glsl.operator.binary_operator.equal.*
Test: dEQP-VK.glsl.operator.binary_operator.not_equal.*
Bug: b/126870789
Change-Id: I9d33444d67792274d8027b7d1632235533cfc079
```
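
As a rough illustration of how `Test:` and `Tests:` lines could be collected from a change description, here is a minimal Go sketch. The function name `testPatterns` and the regular expression are assumptions made for this example; they are not taken from the Regres source, which may parse descriptions differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// testLineRE matches "Test: <pattern>" and "Tests: <pattern>" lines.
// This expression is an illustrative guess, not Regres's own parser.
var testLineRE = regexp.MustCompile(`(?m)^Tests?:[ \t]*(\S.*)$`)

// testPatterns returns every dEQP test pattern declared in a change
// description, in the order the lines appear.
func testPatterns(description string) []string {
	var patterns []string
	for _, m := range testLineRE.FindAllStringSubmatch(description, -1) {
		patterns = append(patterns, strings.TrimSpace(m[1]))
	}
	return patterns
}

func main() {
	description := "Add support for OpLogicalEqual, OpLogicalNotEqual\n\n" +
		"Test: dEQP-VK.glsl.operator.bool_compare.*\n" +
		"Tests: dEQP-VK.glsl.operator.binary_operator.equal.*\n" +
		"Bug: b/126870789\n"
	for _, p := range testPatterns(description) {
		fmt.Println(p)
	}
}
```

Other footer lines such as `Bug:` and `Change-Id:` do not match the expression and are ignored.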

## Daily-run continuous integration testing

Once a day, Regres will also test another set of tests from [`<swiftshader>/tests/regres/full-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/full-tests.json),
and post the test result lists as a Gerrit changelist
[[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46448).

The daily run also performs code coverage instrumentation per dEQP test,
automatically uploading the results of all the dEQP tests to the viewer at
[swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).

## Local dEQP test runner

Regres also provides a multi-threaded, [process sandboxed](#process-sandboxing),
local dEQP test runner with a wild-card / regex based test name matcher.

The local test runner can be run with:

[`<swiftshader>/tests/regres/run_testlist.sh`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/run_testlist.sh) `--deqp-vk=<path to deqp-vk> [--filter=<test name filter>]`

`<test name filter>` can be a single dEQP test name, or you can use wildcards
[as documented here](https://golang.org/pkg/path/filepath/#Match).
Alternatively, start with a `/` to use a regex filter.
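
The two filter styles can be sketched in Go as follows. This is a minimal illustration, not the runner's actual code: `newMatcher` and `matcher` are hypothetical names, and whether the real regex filter is anchored is not stated here, so the sketch uses an unanchored match.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// matcher reports whether a fully qualified test name passes the filter.
type matcher func(name string) bool

// newMatcher builds a matcher from a filter string. A filter starting with
// '/' is treated as a regular expression (an assumption about the syntax);
// anything else is treated as a filepath.Match wildcard pattern.
func newMatcher(filter string) (matcher, error) {
	if strings.HasPrefix(filter, "/") {
		re, err := regexp.Compile(filter[1:])
		if err != nil {
			return nil, err
		}
		return re.MatchString, nil
	}
	// Validate the wildcard pattern once so malformed filters fail early.
	if _, err := filepath.Match(filter, ""); err != nil {
		return nil, err
	}
	return func(name string) bool {
		ok, _ := filepath.Match(filter, name)
		return ok
	}, nil
}

func main() {
	match, err := newMatcher("dEQP-VK.glsl.operator.bool_compare.*")
	if err != nil {
		panic(err)
	}
	fmt.Println(match("dEQP-VK.glsl.operator.bool_compare.scalar.equal"))
}
```

With `filepath.Match`, `*` matches any run of characters other than the path separator. dEQP test names are dot-delimited and contain no `/`, so a trailing `*` spans all remaining groups.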

Other useful flags:

```text
  -limit int
        only run a maximum of this number of tests
  -no-results
        disable generation of results.json file
  -output string
        path to an output JSON results file (default "results.json")
  -shuffle
        shuffle tests
  -test-list string
        path to a test list file (default "vk-master-PASS.txt")
```

Run [`<swiftshader>/tests/regres/run_testlist.sh`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/run_testlist.sh) with `--help` to see all available flags.

## Process sandboxing

Regres will run each dEQP test in a separate process to prevent state
leakage between tests.

Tests are run concurrently, and crashing processes will not take down the test
runner.

Some dEQP tests are known to perform excessive memory allocations (i.e. keep
allocating until no more can be claimed from the OS). \
In order to prevent a single test starving other test processes of memory, each
process is restricted to a fraction of the system's memory using [Linux resource limits](https://man7.org/linux/man-pages/man2/getrlimit.2.html).

Tests may also deadlock, so each test process has a time limit before it is
automatically killed.
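
The per-process time limit can be sketched with Go's standard `os/exec` and `context` packages. This is an illustration under stated assumptions rather than the runner's implementation: `runIsolated` and the `status` values are hypothetical names, and the memory restriction via resource limits is not shown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// status is the outcome of one sandboxed test process (hypothetical values).
type status string

const (
	pass    status = "PASS"
	fail    status = "FAIL"    // non-zero exit, crash, or failure to start
	timeout status = "TIMEOUT" // killed after exceeding the time limit
)

// runIsolated runs a single command in its own child process, so a crash
// cannot take down the caller, and kills it once limit has elapsed.
func runIsolated(exe string, args []string, limit time.Duration) status {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	err := exec.CommandContext(ctx, exe, args...).Run()
	switch {
	case err == nil:
		return pass
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return timeout
	default:
		return fail
	}
}

func main() {
	// `sleep` stands in for a deadlocked test binary.
	fmt.Println(runIsolated("sleep", []string{"5"}, 200*time.Millisecond))
}
```

Running many such calls from a pool of goroutines gives concurrent execution while keeping each test's state confined to its own process.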

## Implementation details

### Presubmit & daily run process

Regres runs until stopped, and will:

* Download a known compatible version of Clang to a cache directory. This will
  be used for all compilation stages below.
* Periodically poll Gerrit for recently opened changes.
* Periodically query Gerrit for details about each tracked change, determining
  [whether it should be tested](#qualifying) and its current
  [priority](#prioritization).
* Pick the qualifying change with the highest priority, and perform the
  following for that change:
  1. The change is `git fetch`ed into a temporary directory.
  2. If not already cached, the dEQP version described in the
     change's [`<swiftshader>/tests/regres/deqp.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/deqp.json) file is downloaded and built into a cached directory.
  3. The source for the change is built into a temporary build directory.
  4. The built dEQP binaries are used to test the change. The full test results
     are stored in a cached directory.
  5. If the parent change's test results aren't already cached, then steps 3 and
     4 are repeated for the parent change.
  6. The results of the two changes are diffed, and the results of the diff are
     posted to the change as a Gerrit review comment.
* Repeat the above until it is time to perform a daily run, upon which:
  1. The `HEAD` change of `master` is fetched into a temporary directory.
  2. If not already cached, the dEQP version described in the
     change's [`<swiftshader>/tests/regres/deqp.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/deqp.json) file is downloaded and built into a cached directory.
  3. The `HEAD` change is built into a temporary directory, optionally with code
     coverage instrumentation.
  4. The built dEQP binaries are used to test the change. The full test results
     are stored in a cached directory, and each test is binned by status and
     written to the [`<swiftshader>/tests/regres/testlists`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/testlists) directory.
  5. A new Gerrit change is created containing the updated test lists and put up
     for review, along with a summary of test result changes [[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46448).
     If there's an existing daily test change up for review then this is reused
     instead of creating another.
  6. If the build included code coverage instrumentation, then the coverage
     results are collated from all test runs, processed and compressed, and
     uploaded to [github.com/swiftshader-regres/swiftshader-coverage](https://github.com/swiftshader-regres/swiftshader-coverage)
     which is immediately reflected at [swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage).
     This process is [described in more detail below](#code-coverage).
  7. Stages 3 - 5 are repeated for both the LLVM and Subzero backends.
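
The result diffing in step 6 of the presubmit process can be illustrated with a small Go sketch. The `results` type, the status strings and the output format are all assumptions made for this example; the real review comment is formatted differently.

```go
package main

import (
	"fmt"
	"sort"
)

// results maps a fully qualified test name to its status, e.g. "PASS".
// The status strings used here are placeholders.
type results map[string]string

// diff returns one line per test whose status differs between the parent
// and the change, sorted by test name.
func diff(parent, change results) []string {
	var out []string
	for name, after := range change {
		before, ok := parent[name]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("%s: (new) -> %s", name, after))
		case before != after:
			out = append(out, fmt.Sprintf("%s: %s -> %s", name, before, after))
		}
	}
	for name, before := range parent {
		if _, ok := change[name]; !ok {
			out = append(out, fmt.Sprintf("%s: %s -> (removed)", name, before))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	parent := results{"dEQP-VK.a": "PASS", "dEQP-VK.b": "FAIL"}
	change := results{"dEQP-VK.a": "FAIL", "dEQP-VK.b": "FAIL", "dEQP-VK.c": "PASS"}
	for _, line := range diff(parent, change) {
		fmt.Println(line)
	}
}
```

Only tests whose status changed appear in the output, which keeps the report short even when the full result set contains hundreds of thousands of tests.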

### Caching

The cache directory is heavily used to avoid duplicated work. For example, it
is common for patchsets to be repeatedly pushed with the same parent change, so
the test results of the parent can be calculated once and stored. A tested
patchset that is merged into master would also be cached when used as a parent
of another change.

The cache needs to consider more than just the change identifier as the
cache-key for storing and retrieving data. Both the test lists and version of
dEQP used are dictated by the change being tested, so both are used as part of
the cache key.
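
One way to picture such a composite key is to hash every input that influences the results. The sketch below is an assumption for illustration only: `cacheKey`, its parameters and the use of SHA-256 are not taken from the Regres source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a key from everything that influences a test run's
// results: the change, the dEQP version and the test lists.
func cacheKey(changeID, deqpVersion string, testLists []string) string {
	h := sha256.New()
	// Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
	write := func(s string) { fmt.Fprintf(h, "%d:%s;", len(s), s) }
	write(changeID)
	write(deqpVersion)
	for _, list := range testLists {
		write(list)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	lists := []string{"vk-master-PASS.txt"}
	fmt.Println(cacheKey("I9d33444d", "deqp-version-a", lists))
	fmt.Println(cacheKey("I9d33444d", "deqp-version-b", lists))
}
```

Two runs of the same change against different dEQP versions produce different keys, so stale results are never reused.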

### Vulkan Loader usage

Applications make use of the Vulkan API by loading the [Vulkan Loader](https://github.com/KhronosGroup/Vulkan-Loader)
library (`libvulkan.so.1` on Linux), which enumerates available Vulkan
implementations (typically GPUs and their drivers) before an actual 'instance'
is created to communicate with a specific Installable Client Driver (ICD).

However, SwiftShader can itself be built as `libvulkan.so.1`, which implements the
same API entry functions as the Vulkan Loader. By default, Regres will make dEQP
load this SwiftShader library instead of the system's Vulkan Loader. This ensures
test results are independent of the system's Vulkan setup.

To override this, set `LD_LIBRARY_PATH` to point to the location of a
Loader's `libvulkan.so.1`.

### Code coverage

The [daily run](#daily-run-continuous-integration-testing) produces code
coverage information that can be examined for each individual dEQP test at
[swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).

The process for generating this information is complex, and is described in
detail below:

#### Per-test generation

Code coverage instrumentation is generated with
[Clang's source-based code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html)
functionality. This is enabled by using SwiftShader's
`SWIFTSHADER_EMIT_COVERAGE` CMake flag.

Each dEQP test process is run with a unique `LLVM_PROFILE_FILE` environment
variable value which dictates where the process writes its raw coverage profile
file. Each process gets a different path so that we can emit coverage from
multiple, concurrent dEQP test processes.

#### Parsing

[Clang provides two tools](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#creating-coverage-reports) for processing coverage data:

* `llvm-profdata` indexes the raw `.profraw` coverage profile file and emits a
  `.profdata` file.
* `llvm-cov` further processes the `.profdata` file into something human
  readable or machine parsable.

`llvm-cov` provides many options, including emitting a pretty HTML file, but is
remarkably slow at producing easily machine-parsable data. Fortunately the core
of `llvm-cov` is [a few hundred lines of code](https://github.com/llvm/llvm-project/tree/master/llvm/tools/llvm-cov),
as it relies on LLVM libraries to do the heavy lifting. Regres
replaces `llvm-cov` with ["`turbo-cov`"](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/turbo-cov/),
which efficiently converts a `.profdata` file into a simple binary stream which can
be consumed by Regres.

#### Processing

At the time of writing there are over 560,000 individual dEQP tests, and around
176,000 lines of C++ code in [`<swiftshader>/src`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:src/).
If you used 1 bit for each source line, per-line source coverage for all dEQP
tests would require over 11 GiB of storage. That's just for one snapshot.

The processing and compression schemes described below reduce this down to
around 10 MiB (~1100x reduction in size), and support sub-line coverage scopes.
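
The storage estimate can be checked directly from the two figures quoted above, taking 1 GiB as $2^{30}$ bytes:

$$
\begin{aligned}
560{,}000 \times 176{,}000 &= 9.856 \times 10^{10}\ \text{bits} \\
9.856 \times 10^{10} / 8 &= 1.232 \times 10^{10}\ \text{bytes} \\
1.232 \times 10^{10} / 2^{30} &\approx 11.5\ \text{GiB}
\end{aligned}
$$

Dividing by the compressed size gives $11.5 \times 1024 / 10 \approx 1175$, which is in line with the rounded ~1100x figure.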

##### Spans

Code coverage information is described in spans.

A span is described as an interval of source locations, where a location is a
line-column pair:

```go
// Location is a position in a source file.
type Location struct {
    Line, Column int
}

// Span is an interval of source locations, from Start to End.
type Span struct {
    Start, End Location
}
```

##### Test tree construction

Each dEQP test is uniquely identified by a fully qualified name.
Each test belongs to a group, and that group may be nested within any number of
parent groups. The groups are described in the test name, using dots (`.`) to
delimit the groups and leaf test name.

For example, the fully qualified test name:

`dEQP-VK.fragment_shader_interlock.basic.discard.ssbo.sample_unordered.4xaa.sample_shading.16x16`

Can be broken down into the following groups and test name:

```text
dEQP-VK                       <-- root group name
╰ fragment_shader_interlock
  ╰ basic
    ╰ discard
      ╰ ssbo
        ╰ sample_unordered
          ╰ 4xaa
            ╰ sample_shading
              ╰ 16x16         <-- leaf test name
```

Breaking down fully qualified test names into groups provides a natural way to
structure coverage data, as tests of the same group are likely to have similar
coverage spans.

So, for each source file in the codebase, we create a tree with test groups as
non-leaf nodes, and tests as leaf nodes.

For example, given the following test list:

```text
a.b.d.h
a.b.d.i.n
a.b.d.i.o
a.b.e.j
a.b.e.k.p
a.b.e.k.q
a.c.f
a.c.g.l.r
a.c.g.m
```

We would construct the following tree:

```text
               a
        ╭──────┴──────╮
        b             c
    ╭───┴───╮     ╭───┴───╮
    d       e     f       g
  ╭─┴─╮   ╭─┴─╮         ╭─┴─╮
  h   i   j   k         l   m
     ╭┴╮     ╭┴╮        │
     n o     p q        r
```

Each leaf node in this tree (`h`, `n`, `o`, `j`, `p`, `q`, `f`, `r`, `m`)
represents a test, and the non-leaf nodes (`a`, `b`, `c`, `d`, `e`, `g`, `i`, `k`,
`l`) are groups.

To begin, we create a test tree structure, and associate the full list of test
coverage spans with every leaf node (test) in this tree.
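
The tree construction described above can be sketched in Go as follows. The `node` type and its methods are hypothetical names for this example, and span storage is left out to keep the focus on how dotted names become a tree.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// node is either a test group (it has children) or a test (a leaf).
type node struct {
	name     string
	children map[string]*node
}

func newNode(name string) *node {
	return &node{name: name, children: map[string]*node{}}
}

// insert adds a fully qualified test name below n, creating any missing
// group nodes along the way.
func (n *node) insert(fullName string) {
	cur := n
	for _, part := range strings.Split(fullName, ".") {
		child, ok := cur.children[part]
		if !ok {
			child = newNode(part)
			cur.children[part] = child
		}
		cur = child
	}
}

// leaves returns the sorted names of all leaf nodes (tests) below n.
func (n *node) leaves() []string {
	if len(n.children) == 0 {
		return []string{n.name}
	}
	var out []string
	for _, c := range n.children {
		out = append(out, c.leaves()...)
	}
	sort.Strings(out)
	return out
}

func main() {
	root := newNode("") // synthetic root holding the top-level groups
	for _, name := range []string{"a.b.d.h", "a.b.d.i.n", "a.c.f"} {
		root.insert(name)
	}
	fmt.Println(strings.Join(root.leaves(), " "))
}
```

In a fuller version each node would also carry a set of spans, which is what the optimizations in the following sections operate on.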

This data structure hasn't given us any compression benefits yet, but we can
now do a few tricks to dramatically reduce the number of spans needed to describe
the graph:

##### Optimization 1: Common span promotion

The first compression scheme is to promote common spans up the tree when they
are common for all children. This will reduce the number of spans needed to be
encoded in the final file.

For example, if the test group `a` has 4 children that all share the same span
`X`:

```text
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
 [X,Y] [X] [X] [X,Z]
```

Then span `X` can be promoted up to `a`:

```text
         [X]
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
   [Y] []   [] [Z]
```
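
A minimal Go sketch of this promotion step, assuming spans are identified by simple string labels such as `X`. The `spanSet` type and `promoteCommon` function are hypothetical names and do not reflect how Regres stores spans.

```go
package main

import "fmt"

// spanSet is a set of span identifiers. Strings stand in for real spans.
type spanSet map[string]bool

// promoteCommon removes every span shared by all children from each child
// and returns the removed spans, which belong on the parent node.
func promoteCommon(children []spanSet) spanSet {
	promoted := spanSet{}
	if len(children) == 0 {
		return promoted
	}
	// A span common to all children must be present in the first child.
	for s := range children[0] {
		common := true
		for _, c := range children[1:] {
			if !c[s] {
				common = false
				break
			}
		}
		if common {
			promoted[s] = true
		}
	}
	for s := range promoted {
		for _, c := range children {
			delete(c, s)
		}
	}
	return promoted
}

func main() {
	b := spanSet{"X": true, "Y": true}
	c := spanSet{"X": true}
	d := spanSet{"X": true}
	e := spanSet{"X": true, "Z": true}
	a := promoteCommon([]spanSet{b, c, d, e})
	fmt.Println(len(a), len(b), len(c), len(d), len(e))
}
```

Applied to the example above, `X` moves to `a`, leaving `Y` on `b`, `Z` on `e`, and nothing on `c` and `d`.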

##### Optimization 2: Span XOR promotion

This idea can be extended further, by not requiring all the children to share
the same span before promotion. If **most** child nodes share the same span, we
can still promote the span, but this time we **remove** the span from the
children **if they had it**, and **add** the span to children **if they didn't
have it**.

For example, if the test group `a` has 4 children, 3 of which share the span
`X`:

```text
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
 [X,Y] [X]  [] [X,Z]
```

Then span `X` can be promoted up to `a` by flipping the presence of `X` on the
child nodes:

```text
         [X]
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
   [Y] []  [X] [Z]
```

This process repeats up the tree.

With this optimization applied, we now need to traverse the tree from root to
leaf in order to know whether a given span is in use for the leaf node (test):

* If the span is encountered an **odd** number of times during traversal, then
  the span is **covered**.
* If the span is encountered an **even** number of times during traversal, then
  the span is **not covered**.

See [`tests/regres/cov/coverage_test.go`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/coverage_test.go) for more examples of this optimization.
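
The flip-and-count behaviour can be sketched in Go as below. As before, `spanSet`, `promoteXOR` and `covered` are hypothetical names, spans are plain string labels, and "most" is taken to mean "more than half of the children", which is an assumption of this sketch.

```go
package main

import "fmt"

// spanSet is a set of span identifiers. Strings stand in for real spans.
type spanSet map[string]bool

// promoteXOR promotes every span held by more than half of the children,
// flipping that span's presence on each child, and returns the promoted set.
func promoteXOR(children []spanSet) spanSet {
	counts := map[string]int{}
	for _, c := range children {
		for s := range c {
			counts[s]++
		}
	}
	promoted := spanSet{}
	for s, n := range counts {
		if n*2 <= len(children) {
			continue // not a majority, leave the span where it is
		}
		promoted[s] = true
		for _, c := range children {
			if c[s] {
				delete(c, s)
			} else {
				c[s] = true
			}
		}
	}
	return promoted
}

// covered reports whether span s is covered for a leaf, given the span sets
// met on the path from the root to that leaf: an odd count means covered.
func covered(path []spanSet, s string) bool {
	n := 0
	for _, set := range path {
		if set[s] {
			n++
		}
	}
	return n%2 == 1
}

func main() {
	b := spanSet{"X": true, "Y": true}
	c := spanSet{"X": true}
	d := spanSet{}
	e := spanSet{"X": true, "Z": true}
	a := promoteXOR([]spanSet{b, c, d, e})
	fmt.Println(covered([]spanSet{a, b}, "X"), covered([]spanSet{a, d}, "X"))
}
```

For node `d` in the example, `X` is met twice on the path (once on `a`, once on `d`), so it is correctly reported as not covered.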

##### Optimization 3: Common span grouping

With real world data, we encounter groups of spans that are commonly found
together. To further reduce coverage data, the whole graph is scanned for common
span patterns, and these are indexed by each tree node.
The XOR'ing of spans as described above is performed as if the spans were not
grouped.

##### Optimization 4: Lookup tables

All spans, span-groups and strings are stored in de-duplicated tables, and are
indexed wherever possible.

The final serialization is performed by [`tests/regres/cov/serialization.go`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/serialization.go).

##### Optimization 5: zlib compression

The coverage data is encoded into JSON for parsing by the web page.

Before writing the JSON file, the text data is zlib compressed.
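
The encode-then-compress step can be sketched with Go's standard `encoding/json` and `compress/zlib` packages. The function names below are hypothetical, and the payload is a toy map rather than the real coverage schema.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/json"
	"fmt"
	"io"
)

// compressJSON encodes v as JSON and zlib-compresses the result.
func compressJSON(v interface{}) ([]byte, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending compressed data and writes the checksum.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressJSON reverses compressJSON, decoding the JSON into v.
func decompressJSON(data []byte, v interface{}) error {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer r.Close()
	raw, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, v)
}

func main() {
	blob, err := compressJSON(map[string][]int{"spans": {1, 2, 3}})
	if err != nil {
		panic(err)
	}
	out := map[string][]int{}
	if err := decompressJSON(blob, &out); err != nil {
		panic(err)
	}
	fmt.Println(len(out["spans"]))
}
```

The decompression half is shown only to demonstrate the round trip; in the viewer that role is played by JavaScript in the browser.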

#### Presentation

The zlib-compressed JSON coverage data is decompressed using
[`pako`](https://github.com/nodeca/pako), and consumed by some
[vanilla JavaScript](https://github.com/swiftshader-regres/swiftshader-coverage/blob/gh-pages/index.html).

[`codemirror`](https://codemirror.net/) is used to perform coverage span and C++
syntax highlighting.