# Regres - SwiftShader automated testing

## Introduction

Regres is a collection of tools to perform [dEQP](https://github.com/KhronosGroup/VK-GL-CTS)
presubmit and continuous integration testing and code coverage evaluation for
SwiftShader.

Regres provides:

* [Presubmit testing](#presubmit-testing) - An automatic OpenGL|ES and Vulkan
  dEQP test run for each Gerrit patchset put up for review.
* [Continuous integration testing](#daily-run-continuous-integration-testing) -
  An OpenGL|ES and Vulkan dEQP test run performed against the `master` branch each night. \
  This nightly run also produces code coverage information which can be viewed at
  [swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).
* [Local dEQP test runner](#local-dEQP-test-runner) - A local tool for
  efficiently running a number of dEQP tests based on wildcard or regex name
  matching.

The Regres source root directory is at [`<swiftshader>/tests/regres/`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/).

## Presubmit testing

Regres monitors changes that have been [put up for review with Gerrit](https://swiftshader-review.googlesource.com/q/status:open).

Once a new [qualifying](#qualifying) patchset has been found, Regres will
check out, build and test the change against the parent changelist. \
Any differences in results are reported as a review comment on the change
[[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46369/5#message-4f09ea3e6d01ed94ae26183c8b6c547c90492c12).

### Qualifying

As Regres may be running externally authored code on Google hardware,
Regres will only test a change if it is authored by or reviewed by a Googler.

Only the most recent patchset of a change will be tested.
If a new patchset is
pushed while the previous one is being tested, then testing will continue
to completion and the previous patchset's results will be posted, and the new
patchset will be queued for testing.

### Prioritization

At the time of writing a Regres presubmit run takes a little over 20 minutes to
complete, and there is a single Regres machine servicing all changes.
To keep Regres responsive, changes are prioritized based on their 'readiness to
land', which is determined by the change's `Kokoro-Presubmit`, `Code-Review` and
`Presubmit-Ready` Gerrit labels.

### Test Filtering

By default, Regres will run all the test lists declared in the
[`<swiftshader>/tests/regres/ci-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/ci-tests.json) file.\
As new functionality is being implemented, the test lists in `ci-tests.json` may
reference known-passing test lists updated by the [daily run](#daily-run-continuous-integration-testing),
so that failing tests for incomplete functionality are skipped, but tests that
pass for new functionality *are tested* to ensure they do not regress.

Additional test names found in the files referenced by
[`<swiftshader>/tests/regres/full-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/full-tests.json)
can be explicitly included in the change's presubmit run
by including a line in the change description with the signature:

```text
Test: <dEQP-test-pattern>
```

`<dEQP-test-pattern>` can be a single dEQP test name, or you can use wildcards
[as documented here](https://golang.org/pkg/path/filepath/#Match).

You can repeat `Test:` as many times as you like. `Tests:` is also accepted.
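
As a rough illustration of the `Test:` convention, the Go sketch below pulls the patterns out of a change description. It is not the Regres implementation: `parseTestPatterns` is a hypothetical helper, and trimming whitespace and skipping empty patterns are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTestPatterns returns the patterns found on "Test:" or "Tests:" lines
// of a change description, in order of appearance.
// Hypothetical helper for illustration; Regres may parse these differently.
func parseTestPatterns(description string) []string {
	var patterns []string
	for _, line := range strings.Split(description, "\n") {
		line = strings.TrimSpace(line)
		for _, prefix := range []string{"Test:", "Tests:"} {
			if strings.HasPrefix(line, prefix) {
				p := strings.TrimSpace(strings.TrimPrefix(line, prefix))
				if p != "" {
					patterns = append(patterns, p)
				}
				break
			}
		}
	}
	return patterns
}

func main() {
	description := `Add support for OpLogicalEqual, OpLogicalNotEqual

Test: dEQP-VK.glsl.operator.bool_compare.*
Tests: dEQP-VK.glsl.operator.binary_operator.equal.*
Bug: b/126870789`
	for _, p := range parseTestPatterns(description) {
		fmt.Println(p)
	}
}
```

Lines such as `Bug:` are ignored, so only the two patterns are printed.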

[For example](https://swiftshader-review.googlesource.com/c/SwiftShader/+/26574):

```text
Add support for OpLogicalEqual, OpLogicalNotEqual

Test: dEQP-VK.glsl.operator.bool_compare.*
Test: dEQP-VK.glsl.operator.binary_operator.equal.*
Test: dEQP-VK.glsl.operator.binary_operator.not_equal.*
Bug: b/126870789
Change-Id: I9d33444d67792274d8027b7d1632235533cfc079
```

## Daily-run continuous integration testing

Once a day, Regres will also test another set of tests from [`<swiftshader>/tests/regres/full-tests.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/full-tests.json),
and post the test result lists as a Gerrit changelist
[[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46448).

The daily run also performs code coverage instrumentation per dEQP test,
automatically uploading the results of all the dEQP tests to the viewer at
[swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).

## Local dEQP test runner

Regres also provides a multi-threaded, [process sandboxed](#process-sandboxing),
local dEQP test runner with a wildcard / regex based test name matcher.

The local test runner can be run with:

[`<swiftshader>/tests/regres/run_testlist.sh`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/run_testlist.sh) `--deqp-vk=<path to deqp-vk> [--filter=<test name filter>]`

`<test name filter>` can be a single dEQP test name, or you can use wildcards
[as documented here](https://golang.org/pkg/path/filepath/#Match).
Alternatively, start with a `/` to use a regex filter.
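
The two filter forms can be sketched in Go as follows. This is only an illustration, not the runner's code: `matchFilter` is a hypothetical helper, the test names are made up for the example, and stripping the leading `/` before compiling the rest as a regular expression is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// matchFilter reports whether a dEQP test name matches a filter.
// Hypothetical helper: a filter starting with '/' is treated as a regular
// expression (assumed to have the leading '/' stripped); anything else is
// treated as a filepath.Match wildcard pattern.
func matchFilter(filter, name string) (bool, error) {
	if strings.HasPrefix(filter, "/") {
		re, err := regexp.Compile(filter[1:])
		if err != nil {
			return false, err
		}
		return re.MatchString(name), nil
	}
	return filepath.Match(filter, name)
}

func main() {
	names := []string{
		"dEQP-VK.glsl.operator.bool_compare.equal",
		"dEQP-VK.api.smoke.triangle",
	}
	for _, n := range names {
		wildcard, _ := matchFilter("dEQP-VK.glsl.*", n)
		regex, _ := matchFilter(`/^dEQP-VK\.api\.`, n)
		fmt.Println(n, wildcard, regex)
	}
}
```

Because dEQP test names contain dots rather than path separators, a single `*` in a `filepath.Match` pattern can span several dot-delimited groups.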

Other useful flags:

```text
  -limit int
        only run a maximum of this number of tests
  -no-results
        disable generation of results.json file
  -output string
        path to an output JSON results file (default "results.json")
  -shuffle
        shuffle tests
  -test-list string
        path to a test list file (default "vk-master-PASS.txt")
```

Run [`<swiftshader>/tests/regres/run_testlist.sh`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/run_testlist.sh) with `--help` to see all available flags.

## Process sandboxing

Regres will run each dEQP test in a separate process to prevent state
leakage between tests.

Tests are run concurrently, and crashing processes will not take down the test
runner.

Some dEQP tests are known to perform excessive memory allocations (i.e. keep
allocating until no more can be claimed from the OS). \
In order to prevent a single test starving other test processes of memory, each
process is restricted to a fraction of the system's memory using [Linux resource limits](https://man7.org/linux/man-pages/man2/getrlimit.2.html).

Tests may also deadlock, so each test process has a time limit after which it is
automatically killed.

## Implementation details

### Presubmit & daily run process

Regres runs until stopped, and will:

* Download a known compatible version of Clang to a cache directory. This will
  be used for all compilation stages below.
* Periodically poll Gerrit for recently opened changes.
* Periodically query Gerrit for details about each tracked change, determining
  [whether it should be tested](#qualifying) and its current
  [priority](#prioritization).
* A qualifying change with the highest priority will be picked, and the
  following is performed for the change:
  1. The change is `git fetch`ed into a temporary directory.
  2. If not already cached, the dEQP version described in the
     change's [`<swiftshader>/tests/regres/deqp.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/deqp.json) file is downloaded and built into a cached directory.
  3. The source for the change is built into a temporary build directory.
  4. The built dEQP binaries are used to test the change. The full test results
     are stored in a cached directory.
  5. If the parent change's test results aren't already cached, then steps 3 and
     4 are repeated for the parent change.
  6. The results of the two changes are diffed, and the results of the diff are
     posted to the change as a Gerrit review comment.
* The above is repeated until it is time to perform a daily run, upon which:
  1. The `HEAD` change of `master` is fetched into a temporary directory.
  2. If not already cached, the dEQP version described in the
     change's [`<swiftshader>/tests/regres/deqp.json`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/deqp.json) file is downloaded and built into a cached directory.
  3. The `HEAD` change is built into a temporary directory, optionally with code
     coverage instrumentation.
  4. The built dEQP binaries are used to test the change. The full test results
     are stored in a cached directory, and each test is binned by status and
     written to the [`<swiftshader>/tests/regres/testlists`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/testlists) directory.
  5. A new Gerrit change is created containing the updated test lists and put up
     for review, along with a summary of test result changes [[example]](https://swiftshader-review.googlesource.com/c/SwiftShader/+/46448).
     If there's an existing daily test change up for review then this is reused
     instead of creating another.
  6. If the build included code coverage instrumentation, then the coverage
     results are collated from all test runs, processed and compressed, and
     uploaded to [github.com/swiftshader-regres/swiftshader-coverage](https://github.com/swiftshader-regres/swiftshader-coverage)
     which is immediately reflected at [swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage).
     This process is [described in more detail below](#code-coverage).
  7. Steps 3 - 5 are repeated for both the LLVM and Subzero backends.

### Caching

The cache directory is heavily used to avoid duplicated work. For example, it
is common for patchsets to be repeatedly pushed with the same parent change, so
the test results of the parent can be calculated once and stored. A tested
patchset that is merged into master would also be cached when used as a parent
of another change.

The cache needs to consider more than just the change identifier as the
cache-key for storing and retrieving data. Both the test lists and version of
dEQP used are dictated by the change being tested, and so both are used as part
of the cache key.

### Vulkan Loader usage

Applications make use of the Vulkan API by loading the [Vulkan Loader](https://github.com/KhronosGroup/Vulkan-Loader)
library (`libvulkan.so.1` on Linux), which enumerates available Vulkan
implementations (typically GPUs and their drivers) before an actual 'instance'
is created to communicate with a specific Installable Client Driver (ICD).

However, SwiftShader can itself be built as `libvulkan.so.1`, which implements
the same API entry functions as the Vulkan Loader. Regres by default will make
dEQP load this SwiftShader library instead of the system's Vulkan Loader. This
ensures test results are independent of the system's Vulkan setup.

To override this, one can set `LD_LIBRARY_PATH` to point to the location of a
Loader's `libvulkan.so.1`.

### Code coverage

The [daily run](#daily-run-continuous-integration-testing) produces code
coverage information that can be examined for each individual dEQP test at
[swiftshader-regres.github.io/swiftshader-coverage](https://swiftshader-regres.github.io/swiftshader-coverage/).

The process for generating this information is complex, and is described in
detail below:

#### Per-test generation

Code coverage instrumentation is generated with
[clang's `--coverage`](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html)
functionality. This compiler option is enabled by using SwiftShader's
`SWIFTSHADER_EMIT_COVERAGE` CMake flag.

Each dEQP test process is run with a unique `LLVM_PROFILE_FILE` environment
variable value which dictates where the process writes its raw coverage profile
file. Each process gets a different path so that we can emit coverage from
multiple, concurrent dEQP test processes.

#### Parsing

[Clang provides two tools](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#creating-coverage-reports) for processing coverage data:

* `llvm-profdata` indexes the raw `.profraw` coverage profile file and emits a
  `.profdata` file.
* `llvm-cov` further processes the `.profdata` file into something human
  readable or machine parsable.

`llvm-cov` provides many options, including emitting a pretty HTML file, but is
remarkably slow at producing easily machine-parsable data. Fortunately the core
of `llvm-cov` is [a few hundred lines of code](https://github.com/llvm/llvm-project/tree/master/llvm/tools/llvm-cov), as it relies on LLVM libraries to do the heavy lifting.
Regres
replaces `llvm-cov` with ["`turbo-cov`"](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/turbo-cov/) which efficiently converts a `.profdata` into a simple binary stream which can
be consumed by Regres.

#### Processing

At the time of writing there are over 560,000 individual dEQP tests, and around
176,000 lines of C++ code in [`<swiftshader>/src`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:src/).
If you used 1 bit for each source line, per-line source coverage for all dEQP
tests would require over 11 GiB of storage. That's just for one snapshot.

The processing and compression schemes described below reduce this down to
around 10 MiB (~1100x reduction in size), and support sub-line coverage scopes.

##### Spans

Code coverage information is described in spans.

A span is described as an interval of source locations, where a location is a
line-column pair:

```go
type Location struct {
	Line, Column int
}

type Span struct {
	Start, End Location
}
```

##### Test tree construction

Each dEQP test is uniquely identified by a fully qualified name.
Each test belongs to a group, and that group may be nested within any number of
parent groups. The groups are described in the test name, using dots (`.`) to
delimit the groups and leaf test name.

For example, the fully qualified test name:

`dEQP-VK.fragment_shader_interlock.basic.discard.ssbo.sample_unordered.4xaa.sample_shading.16x16`

can be broken down into the following groups and test name:

```text
dEQP-VK                           <-- root group name
╰ fragment_shader_interlock
   ╰ basic.discard
      ╰ ssbo
         ╰ sample_unordered
            ╰ 4xaa
               ╰ sample_shading
                  ╰ 16x16         <-- leaf test name
```

Breaking down fully qualified test names into groups provides a natural way to
structure coverage data, as tests of the same group are likely to have similar
coverage spans.

So, for each source file in the codebase, we create a tree with test groups as
non-leaf nodes, and tests as leaf nodes.

For example, given the following test list:

```text
a.b.d.h
a.b.d.i.n
a.b.d.i.o
a.b.e.j
a.b.e.k.p
a.b.e.k.q
a.c.f
a.c.g.l.r
a.c.g.m
```

We would construct the following tree:

```text
               a
        ╭──────┴──────╮
        b             c
    ╭───┴───╮     ╭───┴───╮
    d       e     f       g
  ╭─┴─╮   ╭─┴─╮         ╭─┴─╮
  h   i   j   k         l   m
     ╭┴╮     ╭┴╮        │
     n o     p q        r
```

Each leaf node in this tree (`h`, `n`, `o`, `j`, `p`, `q`, `f`, `r`, `m`)
represents a test, and non-leaf nodes (`a`, `b`, `c`, `d`, `e`, `g`, `i`, `k`,
`l`) are groups.

To begin, we create a test tree structure, and associate the full list of test
coverage spans with every leaf node (test) in this tree.

This data structure hasn't given us any compression benefits yet, but we can
now do a few tricks to dramatically reduce the number of spans needed to
describe the graph:

##### Optimization 1: Common span promotion

The first compression scheme is to promote common spans up the tree when they
are common for all children. This will reduce the number of spans needed to be
encoded in the final file.

For example, if the test group `a` has 4 children that all share the same span
`X`:

```text
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
 [X,Y] [X] [X] [X,Z]
```

Then span `X` can be promoted up to `a`:

```text
         [X]
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
   [Y]  []  []  [Z]
```

##### Optimization 2: Span XOR promotion

This idea can be extended further, by not requiring all the children to share
the same span before promotion. If **most** child nodes share the same span, we
can still promote the span, but this time we **remove** the span from the
children **if they had it**, and **add** the span to children **if they didn't
have it**.

For example, if the test group `a` has 4 children with 3 that share the span
`X`:

```text
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
 [X,Y] [X]  [] [X,Z]
```

Then span `X` can be promoted up to `a` by flipping the presence of `X` on the
child nodes:

```text
         [X]
          a
    ╭───┬─┴─┬───╮
    b   c   d   e
   [Y]  [] [X] [Z]
```

This process repeats up the tree.

With this optimization applied, we now need to traverse the tree from root to
leaf in order to know whether a given span is in use for the leaf node (test):

* If the span is encountered an **odd** number of times during traversal, then
  the span is **covered**.
* If the span is encountered an **even** number of times during traversal, then
  the span is **not covered**.

See [`tests/regres/cov/coverage_test.go`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/coverage_test.go) for more examples of this optimization.

##### Optimization 3: Common span grouping

With real world data, we encounter groups of spans that are commonly found
together. To further reduce coverage data, the whole graph is scanned for common
span patterns, which are then indexed by each tree node.
The XOR'ing of spans as described above is performed as if the spans were not
grouped.

##### Optimization 4: Lookup tables

All spans, span-groups and strings are stored in de-duplicated tables, and are
indexed wherever possible.

The final serialization is performed by [`tests/regres/cov/serialization.go`](https://cs.opensource.google/swiftshader/SwiftShader/+/master:tests/regres/cov/serialization.go).

##### Optimization 5: zlib compression

The coverage data is encoded into JSON for parsing by the web page.

Before writing the JSON file, the text data is zlib compressed.

#### Presentation

The zlib-compressed JSON coverage data is decompressed using
[`pako`](https://github.com/nodeca/pako), and consumed by some
[vanilla JavaScript](https://github.com/swiftshader-regres/swiftshader-coverage/blob/gh-pages/index.html).

[`codemirror`](https://codemirror.net/) is used to perform coverage span and C++
syntax highlighting.
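
Any consumer of the data has to undo the span XOR promotion described under Optimization 2 by walking from the root to a test and counting how often a span is encountered. The Go sketch below illustrates that parity rule on the four-child example from that section. It is a simplified assumption of how such a lookup could work, not the Regres or viewer code: the `node` type and `covered` function are hypothetical, and spans are identified by name rather than by source location.

```go
package main

import "fmt"

// node is a hypothetical, simplified test-tree node.
type node struct {
	spans    map[string]bool  // spans stored on this node
	children map[string]*node // child groups or tests, by name
}

// covered walks from root along path and reports whether span is covered by
// the test at the end of the path. An odd number of encounters means the span
// is covered, an even number means it is not.
func covered(root *node, path []string, span string) bool {
	count := 0
	n := root
	if n.spans[span] {
		count++
	}
	for _, name := range path {
		n = n.children[name]
		if n == nil {
			return false // unknown group or test
		}
		if n.spans[span] {
			count++
		}
	}
	return count%2 == 1
}

func main() {
	// The Optimization 2 example after X has been promoted to group `a`.
	a := &node{
		spans: map[string]bool{"X": true},
		children: map[string]*node{
			"b": {spans: map[string]bool{"Y": true}},
			"c": {},
			"d": {spans: map[string]bool{"X": true}},
			"e": {spans: map[string]bool{"Z": true}},
		},
	}
	for _, test := range []string{"b", "c", "d", "e"} {
		fmt.Println(test, covered(a, []string{test}, "X"))
	}
}
```

For `d` the span `X` is seen twice (once on `a`, once on `d`), so it is reported as not covered, which matches the tree before promotion.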