# Getting started with fuzzing in Chromium

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer. Pick the [GN config] that corresponds to the DUT you're deploying to:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

If you are testing locally, these are the recommended configurations:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Local Windows ASan" out\libfuzzer
```

*** note
**Note:** The above invocations may set `use_remoteexec` or `use_rbe` to true.
However, these args aren't compatible on local workstations yet. So if you run
into reclient errors when building locally, remove both those args and set
`use_goma` instead.

You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development.
Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but this is not
recommended, because sanitizers help detect errors that might not otherwise
result in a crash. `use_libfuzzer` is supported in the following sanitizer
configurations.

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch undefined behavior like integer overflow<sup>\[[\*](reference.md#UBSan)\]</sup>. | Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Creating your first fuzz target

After you set up your build environment, you can create your first fuzz target:

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

   ```cpp
   #include <stddef.h>
   #include <stdint.h>

   extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
     // Put your fuzzing code here and use |data| and |size| as input.
     return 0;
   }
   ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

   ```python
   import("//testing/libfuzzer/fuzzer_test.gni")
   fuzzer_test("my_fuzzer") {
     sources = [ "my_fuzzer.cc" ]
     deps = [ ... ]
   }
   ```

*** note
**Note:** Most of the targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_stream_factory_fuzzer.cc] is a good example of a complex fuzz
target.
***

### Running the fuzz target

After you create your fuzz target, build it with autoninja and run it locally. In
most cases you don't want to commit your locally generated corpus, so save it
somewhere like `/tmp/corpus`.

```bash
# Build the fuzz target.
autoninja -C out/libfuzzer url_parse_fuzzer
# Create an empty corpus directory.
mkdir /tmp/corpus
# Run the fuzz target.
./out/libfuzzer/url_parse_fuzzer /tmp/corpus
# If you have other corpus directories, pass their paths as well:
./out/libfuzzer/url_parse_fuzzer /tmp/corpus seed_corpus_dir_1 seed_corpus_dir_N
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3 NEW cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4 NEW cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6 NEW cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs.
If your fuzz target is efficient, it will find a lot of them quickly. A
`... pulse ...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, try setting
the following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** Make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  the fuzz target processes them more slowly and they increase the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. Also, ClusterFuzz uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.

#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

```cpp
struct Environment {
  Environment() {
    logging::SetMinLogLevel(logging::LOG_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  static Environment env;

  // Put your fuzzing code here and use data+size as input.
  return 0;
}
```

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and length, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Hash the whole input to derive the extra integer argument.
  std::string str(reinterpret_cast<const char*>(data), size);
  std::size_t data_hash = std::hash<std::string>()(str);
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***

[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/re2/src/re2/fuzzing/compiler-rt/include/fuzzer/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[GN config]: https://cs.chromium.org/chromium/src/tools/mb/mb_config_expectations/chromium.fuzz.json
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_stream_factory_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_stream_factory_fuzzer.cc
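
When the extra integer argument from the hash-based method is a set of flags, it can help to mask the hash down to the bits the API actually understands. The following is a minimal, self-contained sketch under stated assumptions: `kFlagStrict`, `kFlagVerbose`, `kAllFlags`, `DeriveFlags`, and the stub body of `APIToBeFuzzed` are hypothetical names invented for illustration and are not part of Chromium or libFuzzer.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <cassert>
#include <functional>
#include <string>

// Hypothetical flag bits, defined only so that this sketch compiles.
constexpr uint32_t kFlagStrict = 1u << 0;
constexpr uint32_t kFlagVerbose = 1u << 1;
constexpr uint32_t kAllFlags = kFlagStrict | kFlagVerbose;

// Hypothetical stand-in for the API under test. Replace it with the real API.
int APIToBeFuzzed(const uint8_t* data, size_t size, uint32_t flags) {
  // Only known flag bits may be set.
  assert((flags & ~kAllFlags) == 0);
  (void)data;
  (void)size;
  return 0;
}

// Derives a flag combination from the input bytes. The same input always
// yields the same flags, so a crashing input stays reproducible.
uint32_t DeriveFlags(const uint8_t* data, size_t size) {
  std::string str(reinterpret_cast<const char*>(data), size);
  size_t data_hash = std::hash<std::string>()(str);
  // Keep only the bits that correspond to defined flags.
  return static_cast<uint32_t>(data_hash) & kAllFlags;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  APIToBeFuzzed(data, size, DeriveFlags(data, size));
  return 0;
}
```

Masking keeps the derived value inside the valid flag space, so the target doesn't spend executions on flag combinations the API would reject outright. The limitation described in the note above still applies: the fuzzing engine cannot steer the flags directly.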