# Getting started with fuzzing in Chromium

This document walks you through the basic steps to start fuzzing and offers
suggestions for improving your fuzz targets. If you're looking for more advanced
fuzzing topics, see the [main page](README.md).

[TOC]

## Getting started

### Setting up your build environment

Generate build files by using the `use_libfuzzer` [GN] argument together with a
sanitizer. Pick the [GN config] that corresponds to the device under test (DUT)
you're deploying to:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Upload Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Upload Windows ASan" out\libfuzzer
```

If you're testing locally, these are the recommended configurations:

```bash
# AddressSanitizer is the default config we recommend testing with.
# Linux:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Linux ASan' out/libfuzzer
# Chrome OS:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Chrome OS ASan' out/libfuzzer
# Mac:
tools/mb/mb.py gen -m chromium.fuzz -b 'Libfuzzer Local Mac ASan' out/libfuzzer
# Windows:
python tools\mb\mb.py gen -m chromium.fuzz -b "Libfuzzer Local Windows ASan" out\libfuzzer
```

*** note
**Note:** The above invocations may set `use_remoteexec` or `use_rbe` to true.
However, these args aren't compatible with local workstations yet. If you run
into reclient errors when building locally, remove both of those args and set
`use_goma` instead.

You can also invoke [AFL] by using the `use_afl` GN argument, but we
recommend libFuzzer for local development. Running libFuzzer locally doesn't
require any special configuration and gives quick, meaningful output for speed,
coverage, and other parameters.
***

It’s possible to run fuzz targets without sanitizers, but it's not recommended,
because sanitizers detect errors that might not otherwise result in a crash.
`use_libfuzzer` is supported in the following sanitizer configurations.

| GN Argument | Description | Supported OS |
|-------------|-------------|--------------|
| `is_asan=true` | Enables [AddressSanitizer] to catch problems like buffer overruns. | Linux, Windows, Mac, Chrome OS |
| `is_msan=true` | Enables [MemorySanitizer] to catch problems like uninitialized reads<sup>\[[\*](reference.md#MSan)\]</sup>. | Linux |
| `is_ubsan_security=true` | Enables [UndefinedBehaviorSanitizer] to catch undefined behavior like integer overflow<sup>\[[\*](reference.md#UBSan)\]</sup>. | Linux |

For more on builder and sanitizer configurations, see the [Integration
Reference] page.

*** note
**Hint**: Fuzz targets are built with minimal symbols by default. You can adjust
the symbol level by setting the `symbol_level` attribute.
***

### Creating your first fuzz target

After you set up your build environment, you can create your first fuzz target:

1. In the same directory as the code you are going to fuzz (or next to the tests
   for that code), create a new `<my_fuzzer>.cc` file.

   *** note
   **Note:** Do not use the `testing/libfuzzer/fuzzers` directory. This
   directory was used for initial sample fuzz targets but is no longer
   recommended for landing new targets.
   ***

2. In the new file, define a `LLVMFuzzerTestOneInput` function:

  ```cpp
  #include <stddef.h>
  #include <stdint.h>

  extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // Put your fuzzing code here and use |data| and |size| as input.
    return 0;
  }
  ```

3. In the `BUILD.gn` file, define a `fuzzer_test` GN target:

  ```python
  import("//testing/libfuzzer/fuzzer_test.gni")
  fuzzer_test("my_fuzzer") {
    sources = [ "my_fuzzer.cc" ]
    deps = [ ... ]
  }
  ```

*** note
**Note:** Most fuzz targets are small. They may perform one or a few API calls
using the data provided by the fuzzing engine as an argument. However, fuzz
targets may be more complex if a certain initialization procedure needs to be
performed. [quic_stream_factory_fuzzer.cc] is a good example of a complex fuzz
target.
***

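As a rough illustration of a small target, the sketch below fuzzes a
hypothetical `ParseKeyValue` function. The function and its behavior are
assumptions made for this example and are not part of Chromium. In a real
target you would include the header of the API under test instead of defining
it inline.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

// Hypothetical API under test, defined inline so the sketch is self-contained.
// Splits "key=value" at the first '='. Returns false if there is no '=' or the
// key is empty.
bool ParseKeyValue(const std::string& input,
                   std::string* key,
                   std::string* value) {
  const size_t pos = input.find('=');
  if (pos == std::string::npos || pos == 0)
    return false;
  *key = input.substr(0, pos);
  *value = input.substr(pos + 1);
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Treat the raw bytes as a string. Embedded NUL bytes are preserved.
  const std::string input(reinterpret_cast<const char*>(data), size);

  std::string key;
  std::string value;
  // The result is ignored on purpose: the goal is to let the sanitizer observe
  // the parser on arbitrary input, not to check its output.
  ParseKeyValue(input, &key, &value);
  return 0;
}
```

The target makes a single API call per input and keeps no state between calls,
which matches the "small target" shape described in the note above.
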
### Running the fuzz target

After you create your fuzz target, build it with autoninja and run it locally. In
most cases you don't want to commit your locally generated corpus, so save it
somewhere like `/tmp/corpus`.

```bash
# Build the fuzz target.
autoninja -C out/libfuzzer url_parse_fuzzer
# Create an empty corpus directory.
mkdir /tmp/corpus
# Run the fuzz target.
./out/libfuzzer/url_parse_fuzzer /tmp/corpus
# If you have other corpus directories, pass their paths as well:
./out/libfuzzer/url_parse_fuzzer /tmp/corpus seed_corpus_dir_1 seed_corpus_dir_N
```

Your fuzz target should produce output like this:

```
INFO: Seed: 1511722356
INFO: Loaded 2 modules   (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2  INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
#3  NEW    cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
#4  NEW    cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
#6  NEW    cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
```

A `... NEW ...` line appears when libFuzzer finds new and interesting inputs. If
your fuzz target is efficient, it will find a lot of them quickly. A `... pulse
...` line appears periodically to show the current status.

For more information about the output, see [libFuzzer's output documentation].

*** note
**Note:** If you observe an `odr-violation` error in the log, please try setting
the following environment variable: `ASAN_OPTIONS=detect_odr_violation=0` and
running the fuzz target again.
***

#### Symbolizing a stacktrace

If your fuzz target crashes when running locally and you see a non-symbolized
stack trace, make sure you add the `third_party/llvm-build/Release+Asserts/bin/`
directory from Chromium’s Clang package to `$PATH`. This directory contains the
`llvm-symbolizer` binary.

Alternatively, you can set an `external_symbolizer_path` via the `ASAN_OPTIONS`
environment variable:

```bash
ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
  ./fuzzer ./crash-input
```

The same approach works with other sanitizers via `MSAN_OPTIONS`,
`UBSAN_OPTIONS`, etc.

### Submitting your fuzz target

ClusterFuzz and the build infrastructure automatically discover, build and
execute all `fuzzer_test` targets in the Chromium repository. Once you land your
fuzz target, ClusterFuzz will run it at scale. Check the [ClusterFuzz status]
page after a day or two.

If you want to better understand and optimize your fuzz target’s performance,
see the [Efficient Fuzzing Guide].

*** note
**Note:** It’s important to run fuzzers at scale, not just in your own
environment, because local fuzzing will catch fewer issues. If you run fuzz
targets at scale continuously, you’ll catch regressions and improve code
coverage over time.
***

## Optional improvements

### Common tricks

Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
You can make it more effective with several easy steps:

* **Create a seed corpus**. You can guide the fuzzing engine to generate more
  relevant inputs by adding the `seed_corpus = "src/fuzz-testcases/"` attribute
  to your fuzz target and adding example files to the appropriate directory. For
  more, see the [Seed Corpus] section of the [Efficient Fuzzing Guide].

  *** note
  **Note:** Make sure your corpus files are appropriately licensed.
  ***

* **Create a mutation dictionary**. You can make mutations more effective by
  providing the fuzzer with a `dict = "protocol.dict"` GN attribute and a
  dictionary file that contains interesting strings / byte sequences for the
  target API. For more, see the [Fuzzer Dictionary] section of the [Efficient
  Fuzzing Guide].

* **Specify testcase length limits**. Long inputs can be problematic, because
  the fuzz target processes them more slowly and they increase the search
  space. By default, libFuzzer uses `-max_len=4096` or takes the longest
  testcase in the corpus if `-max_len` is not specified.

  ClusterFuzz uses different strategies for different fuzzing sessions,
  including different random values. Also, ClusterFuzz uses different fuzzing
  engines (e.g. AFL, which doesn't have a `-max_len` option). If your target has
  an input length limit that you would like to *strictly enforce*, add a sanity
  check to the beginning of your `LLVMFuzzerTestOneInput` function:

  ```cpp
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;
  ```

* **Generate a [code coverage report]**. See which code the fuzzer covered in
  recent runs, so you can gauge whether it hits the important code parts or not.

  **Note:** Since the code coverage of a fuzz target depends heavily on the
  corpus provided when running the target, we recommend running the fuzz target
  built with ASan locally for a little while (several minutes / hours) first.
  This will produce some corpus, which should be used for generating a code
  coverage report.

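The length check from the list above can be written out as a complete target.
This is a minimal sketch: the values of `kMinInputLength` and `kMaxInputLength`
are hypothetical, and the `g_accepted_inputs` counter exists only to make the
effect of the check visible. It stands in for the call to the API under test.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical limits; choose values that match the API you are fuzzing.
constexpr size_t kMinInputLength = 4;
constexpr size_t kMaxInputLength = 1024;

// Illustration only: counts the inputs that passed the length check.
size_t g_accepted_inputs = 0;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Reject inputs outside the supported range before doing any work, so the
  // limit holds no matter which fuzzing engine produced the input.
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;

  // The API under test would be called here with |data| and |size|.
  (void)data;
  ++g_accepted_inputs;
  return 0;
}
```

Because the check lives in the target itself, it applies equally under
libFuzzer, AFL, or any other engine that calls `LLVMFuzzerTestOneInput`.
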
#### Disabling noisy error message logging

If the code you’re fuzzing generates a lot of error messages when encountering
incorrect or invalid data, the fuzz target will be slow and inefficient.

If the target uses Chromium logging APIs, you can silence errors by overriding
the environment used for logging in your fuzz target:

```cpp
#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"

struct Environment {
  Environment() {
    // Only log fatal errors, so invalid inputs don't flood the log.
    logging::SetMinLogLevel(logging::LOG_FATAL);
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once, on the first call, and reused by later calls.
  static Environment env;

  // Put your fuzzing code here and use data+size as input.
  return 0;
}
```

### Mutating Multiple Inputs

By default, a fuzzing engine such as libFuzzer mutates a single input (`uint8_t*
data, size_t size`). However, APIs often accept multiple arguments of various
types, rather than a single buffer. You can use three different methods to
mutate multiple inputs at once.

#### libprotobuf-mutator (LPM)

If you need to mutate multiple inputs of various types and lengths, see [Getting
Started with libprotobuf-mutator in Chromium].

*** note
**Note:** This method works with APIs and data structures of any complexity, but
requires extra effort. You would need to write a `.proto` definition (unless you
fuzz an existing protobuf) and C++ code to pass the proto message to the API you
are fuzzing (you'll have a fuzzed protobuf message instead of a `data, size`
buffer).
***

#### FuzzedDataProvider (FDP)

[FuzzedDataProvider] is a class useful for splitting a fuzz input into multiple
parts of various types.

*** note
**Note:** FDP is much easier to use than LPM, but its downside is that the
format of the corpus becomes inconsistent. This doesn't matter if you don't have
a [Seed Corpus] (e.g. valid image files if you fuzz an image parser). FDP splits
your corpus files into several pieces to fuzz a broader range of input types, so
it can take longer to reach deeper code paths that surface more quickly if you
fuzz only a single input type.
***

To use FDP, add `#include <fuzzer/FuzzedDataProvider.h>` to your fuzz target
source file.

To learn more about `FuzzedDataProvider`, check out the [upstream documentation]
on it. It gives an overview of the available methods and links to a few example
fuzz targets.

#### Hash-based argument

If your API accepts a buffer with data and some integer value (e.g., a bitwise
combination of flags), you can calculate a hash value from (`data, size`) and
use it to fuzz an additional integer argument. For example:

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  // The hash depends only on the input bytes, so runs are reproducible.
  std::size_t data_hash = std::hash<std::string>()(str);
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
```

*** note
**Note:** The hash method doesn't have the corpus format issue mentioned in the
FDP section above, but it can lead to results that aren't as sophisticated as
LPM or FDP. The hash value derived from the data is a random value, rather than
a meaningful one controlled by the fuzzing engine. A single bit mutation might
lead to new code coverage, but the next mutation would generate a new hash
value and trigger another code path, without providing any real guidance to the
fuzzing engine.
***

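When the extra integer is a set of flags, one possible refinement is to mask the
hash down to the bits the API defines. The sketch below is self-contained and
rests on assumptions: `APIToBeFuzzed`, `kValidFlagsMask`, and `g_last_flags` are
hypothetical names introduced for illustration and do not come from Chromium.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <functional>
#include <string>

// Hypothetical: the three low bits are the only flags the API defines.
constexpr uint32_t kValidFlagsMask = 0x7;

// Hypothetical API under test. It records the flags it received so that the
// sketch is self-contained.
uint32_t g_last_flags = 0;
void APIToBeFuzzed(const uint8_t* data, size_t size, uint32_t flags) {
  (void)data;
  (void)size;
  g_last_flags = flags;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  const std::string str(reinterpret_cast<const char*>(data), size);
  const size_t data_hash = std::hash<std::string>()(str);

  // Keep only the bits that correspond to defined flags.
  const uint32_t flags = static_cast<uint32_t>(data_hash) & kValidFlagsMask;

  APIToBeFuzzed(data, size, flags);
  return 0;
}
```

Masking keeps the fuzzer from spending time on flag combinations the API would
reject outright, but it does not change the limitation described in the note
above: the flags still vary unpredictably from one mutation to the next.
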
[AFL]: AFL_integration.md
[AddressSanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Efficient Fuzzing Guide]: efficient_fuzzing.md
[FuzzedDataProvider]: https://cs.chromium.org/chromium/src/third_party/re2/src/re2/fuzzing/compiler-rt/include/fuzzer/FuzzedDataProvider.h
[Fuzzer Dictionary]: efficient_fuzzing.md#Fuzzer-dictionary
[GN]: https://gn.googlesource.com/gn/+/master/README.md
[GN config]: https://cs.chromium.org/chromium/src/tools/mb/mb_config_expectations/chromium.fuzz.json
[Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
[Integration Reference]: reference.md
[MemorySanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
[Seed Corpus]: efficient_fuzzing.md#Seed-corpus
[UndefinedBehaviorSanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[code coverage report]: efficient_fuzzing.md#Code-coverage
[upstream documentation]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
[libFuzzer's output documentation]: http://llvm.org/docs/LibFuzzer.html#output
[quic_stream_factory_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/quic_stream_factory_fuzzer.cc