# Efficient Fuzzing Guide

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is measured in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See [startup initialization] in
libFuzzer’s documentation for an example.

*** note
**Note:** You can skip freeing static resources.
However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be deallocated,
since the function gets called millions of times during a fuzzing session. If
you don’t free them, the fuzzer will often run out of memory, which reduces
overall fuzzing efficiency.
***

#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight on how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

For the `out/coverage` target in the coverage script, make sure to add all of
the GN args you needed to build the `out/libfuzzer` target. Depending on the
[gn config] you chose, this could include args like `target_os=chromeos` and
`is_asan=true`.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. Conversely, a
coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a.
testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the most common
causes are checksums and magic numbers. Alternatively, much of the code may
simply be unreachable from your fuzz target. The easiest way to diagnose the
problem is to generate and analyze a [coverage report](#Code-coverage). Then, to
fix the issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, save valid raw streams from a test suite as separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.
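
For example, if a test suite already holds valid inputs in memory, each one can be written to its own file to form a seed corpus directory. The following is a minimal C++17 sketch under that assumption. `WriteSeedCorpus` and the `seed_<index>` file names are hypothetical and are not part of Chromium or libFuzzer. libFuzzer treats each file in a corpus directory as a single input, so the file names themselves don't matter.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: writes each input to its own file in `dir` so that the
// directory can be used as a seed corpus. Returns the number of files written.
size_t WriteSeedCorpus(const fs::path& dir,
                       const std::vector<std::vector<uint8_t>>& inputs) {
  fs::create_directories(dir);
  size_t written = 0;
  for (size_t i = 0; i < inputs.size(); ++i) {
    // One input per file, written in binary mode so bytes are kept verbatim.
    std::ofstream out(dir / ("seed_" + std::to_string(i)), std::ios::binary);
    if (!out) {
      continue;  // Could not create this file; skip it.
    }
    if (!inputs[i].empty()) {
      out.write(reinterpret_cast<const char*>(inputs[i].data()),
                static_cast<std::streamsize>(inputs[i].size()));
    }
    if (out) {
      ++written;
    }
  }
  return written;
}
```

The resulting directory can be passed to a local fuzz target as its corpus argument or checked in and referenced from `BUILD.gn`.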

#### Using a corpus locally

If you’re running a fuzz target locally, you can designate a corpus by passing a
directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
   exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
a seed corpus to the Chromium repository is the preferred approach.
***

You can also upload the corpus by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). If you installed `gsutil`
through `gcloud`, you can use the `gcloud auth login` command to log into your
account.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with the -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
   format `name="value"`. The value must appear in quotes with hex escaping
   (`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
   characters (`\\` and `\"` shorthands are recognized, too). This syntax is
   similar to the one used by the [AFL] fuzzing engine (`-x` option).

   *** note
   **Note:** `name` can be omitted, but it is a convenient way to document the
   meaning of each token.
   ***

   Here’s an example dictionary:

   ```
   # Lines starting with '#' and empty lines are ignored.

   # Adds "blah" word (w/o quotes) to the dictionary.
   kw1="blah"
   # Use \\ for backslash and \" for quotes.
   kw2="\"ac\\dc\""
   # Use \xAB for hex values.
   kw3="\xF7\xF8"
   # Key name before '=' can be omitted:
   "foo\x0Abar"
   ```

2) Test your dictionary by running your fuzz target locally:

   ```bash
   ./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
   ```

   If the dictionary is effective, you should see `NEW` units discovered in the
   output.

3) Place the dictionary file in the same directory as your fuzz target, then add
   the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

   ```
   fuzzer_test("my_fuzzer") {
     ...
     dict = "my_fuzzer.dict"
   }
   ```

   Once the dictionary is submitted to the Chromium repository and ClusterFuzz
   picks up a new revision build, the dictionary is used automatically.

### Custom build

If you need to change the code being tested by your fuzz target, you can guard
the changes with `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in your
target code.

*** note
**Note:** Patching target code is not the preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://chromium-coverage.appspot.com/reports/latest_fuzzers_only/linux/index.html
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization
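
As an illustration of the custom-build approach, the sketch below skips checksum verification only when `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, which is expected to be the case only in fuzzing builds. The message format (payload bytes followed by a one-byte additive checksum) and the `ComputeChecksum` and `ParseMessage` functions are hypothetical. A real target would guard its own CRC or signature check in the same way.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical format: N payload bytes followed by a 1-byte checksum equal to
// the sum of the payload bytes modulo 256.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i) {
    sum = static_cast<uint8_t>(sum + data[i]);
  }
  return sum;
}

// Returns true if the message is accepted and handed to the (elided) parser.
bool ParseMessage(const uint8_t* data, size_t size) {
  if (size < 1) {
    return false;  // Too short to hold the checksum byte.
  }
  const size_t payload_size = size - 1;
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Fuzzing build: skip verification so that mutated payloads, which almost
  // never carry a matching checksum, still reach the parsing code below.
#else
  // Production build: reject messages whose checksum does not match.
  if (ComputeChecksum(data, payload_size) != data[payload_size]) {
    return false;
  }
#endif
  // ... parse data[0, payload_size) here ...
  (void)payload_size;
  return true;
}

// Fuzz target entry point: feeds every input to the parser.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ParseMessage(data, size);
  return 0;
}
```

Without the macro, almost every mutated input fails the checksum and never reaches the parser. With the macro defined, every payload is parsed, while production builds keep the check.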