<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

Developing Fuzzer Targets for ICU APIs
======================================

This document describes how to develop a [fuzzer](https://opensource.google.com/projects/oss-fuzz)
target for an ICU API and how to integrate it into the ICU build process.

### Directory and naming conventions

Fuzzer targets live exclusively in the directory
[`source/test/fuzzer/`](https://github.com/unicode-org/icu/tree/master/icu4c/source/test/fuzzer),
and their file names end with `_fuzzer.cpp`. Only files with this suffix are recognized and executed
as fuzzer targets by the OSS-Fuzz system.

### General structure of a fuzzer target

At a minimum, a fuzzer target contains the function

```
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ...
}
```

The fuzzer system expects this function and invokes it for every test input. The `data` parameter
points to `size` bytes of fuzzer-controlled data. Part or all of this data is then passed to the
ICU API under test.

The fuzzer target
[`collator_rulebased_fuzzer.cpp`](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/collator_rulebased_fuzzer.cpp)
illustrates the basic elements.

```
// © 2019 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html

#include <cstring>
#include <memory>

#include "fuzzer_utils.h"
#include "unicode/coll.h"
#include "unicode/localpointer.h"
#include "unicode/locid.h"
#include "unicode/tblcoll.h"

// Global environment object from fuzzer_utils.h, created once at startup.
IcuEnvironment* env = new IcuEnvironment();

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  UErrorCode status = U_ZERO_ERROR;

  // Interpret the fuzzer bytes as UTF-16 code units; a trailing odd byte is ignored.
  size_t unistr_size = size/2;
  std::unique_ptr<char16_t[]> fuzzbuff(new char16_t[unistr_size]);
  std::memcpy(fuzzbuff.get(), data, unistr_size * 2);
  // Read-only alias: fuzzstr refers to fuzzbuff without copying it.
  icu::UnicodeString fuzzstr(false, fuzzbuff.get(), unistr_size);

  // API under test. The resulting status is deliberately not checked.
  icu::LocalPointer<icu::RuleBasedCollator> col1(
      new icu::RuleBasedCollator(fuzzstr, status));

  return 0;
}
```

The ICU API under test is the `RuleBasedCollator(const UnicodeString &rules, UErrorCode &status)`
constructor. The code interprets the fuzzer data as a `UnicodeString` and passes it to the
constructor. That is all: specific error handling or return value verification is not required,
because memory issues triggered by the input are reported as memory/address sanitizer findings.

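The byte-to-string conversion in the example can be studied in isolation. The following is a minimal sketch that uses only the C++ standard library; `BytesToCodeUnits` is a hypothetical helper name and not part of ICU or `fuzzer_utils.h`. It shows why the buffer holds `size / 2` code units and why the bytes are copied with `memcpy` rather than accessed through a pointer cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an ICU function): copies the fuzzer bytes into a
// buffer of UTF-16 code units in native byte order.
// A trailing odd byte is dropped because one code unit needs two bytes.
std::vector<char16_t> BytesToCodeUnits(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  const size_t unit_count = size / sizeof(char16_t);
  std::vector<char16_t> units(unit_count);
  if (unit_count > 0) {
    // memcpy avoids reading `data` through a possibly misaligned char16_t pointer.
    std::memcpy(units.data(), data, unit_count * sizeof(char16_t));
  }
  return units;
}
```

With a 5-byte input the sketch yields two code units and ignores the last byte. The resulting buffer could then back a `UnicodeString` in the same way as `fuzzbuff` does in the target above.
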
### Makefile.in changes

ICU fuzzer targets are built and executed by the OSS-Fuzz project. On the ICU side they are compiled
to ensure that the code builds and, as a sanity check, executed in the most basic manner, i.e. with
minimal test data and without ASAN or MSAN analysis.

Add the new fuzzer target to the list of targets in the `FUZZER_TARGETS` variable in
[`Makefile.in`](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/Makefile.in).
The new fuzzer target will then be built and executed as part of a normal ICU4C unit test run. Note
that each fuzzer target becomes an executable of its own. To that end it is linked with the code in
`fuzzer_driver.cpp`, which contains the `main()` function.

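What such a driver does can be pictured as follows. This is only a sketch, under the assumption that the driver reads a test file and passes its bytes to the target once; the actual `fuzzer_driver.cpp` may be structured differently, and `FuzzTarget` and `RunTargetOnFile` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Signature shared by all fuzzer targets (hypothetical alias).
using FuzzTarget = int (*)(const uint8_t* data, size_t size);

// Hypothetical helper: reads `path` as raw bytes and feeds them to `target` once.
// Returns -1 if the file cannot be opened, otherwise the target's return value.
int RunTargetOnFile(FuzzTarget target, const std::string& path) {
  assert(target != nullptr);
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return -1;
  }
  const std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
  return target(reinterpret_cast<const uint8_t*>(bytes.data()), bytes.size());
}
```

A `main()` built on this sketch would call something like `RunTargetOnFile(LLVMFuzzerTestOneInput, argv[1])` and exit with a non-zero status if the file cannot be read.
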
### Fuzzer seed corpus

Any seed data for a fuzzer target goes into a file named `<fuzzer_target>_seed_corpus.txt`.
In many cases the input parameter of the ICU API under test is of type `UnicodeString`, in which
case the seed data should be in UTF-16 format. As an example, see
[`collator_rulebased_fuzzer_seed_corpus.txt`](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/collator_rulebased_fuzzer_seed_corpus.txt).

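A UTF-16 seed file can also be produced programmatically. The sketch below is one possible way, assuming the seed should contain the raw code units in native byte order and without a byte-order mark, which matches how the collator target above copies bytes into a `char16_t` buffer. Whether the existing seed files follow exactly this layout is not stated here. `WriteUtf16Seed`, the file name, and the sample rule string are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes the code units of `text` to `path` in native
// byte order, without a byte-order mark. Returns true on success.
bool WriteUtf16Seed(const std::string& path, const std::u16string& text) {
  assert(!path.empty());
  std::ofstream out(path, std::ios::binary);
  if (!out) {
    return false;
  }
  out.write(reinterpret_cast<const char*>(text.data()),
            static_cast<std::streamsize>(text.size() * sizeof(char16_t)));
  return static_cast<bool>(out);
}
```

For example, `WriteUtf16Seed("example_fuzzer_seed_corpus.txt", u"&a < b < c")` would write a short collation rule as seed data for a hypothetical target named `example_fuzzer`.
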
### Guidelines and tips

*   Leave all randomness to the fuzzer. If a random selection of any kind is needed (e.g., of a
    locale), then use bytes from the fuzzer data to make the selection
    ([example](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/break_iterator_fuzzer.cpp)).
*   In many cases ICU unit tests can provide seed data or at least ideas for seed data. If the API
    under test requires a Unicode string, make sure that the seed data is in UTF-16 encoding.
    This can be achieved with, e.g., the `iconv` command or an editor that saves text in UTF-16.

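The first guideline can be sketched as follows. This is an illustration using only the C++ standard library, not the code of the linked example: the locale list, `FuzzInput`, and `SplitInput` are hypothetical. The first input byte selects a locale and the remaining bytes are left for the API under test, so a finding can be reproduced from the input alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical candidate list; a real target would pick locales relevant to the API.
constexpr const char* kLocales[] = {"en_US", "de_DE", "ja_JP", "ar_EG"};
constexpr size_t kLocaleCount = sizeof(kLocales) / sizeof(kLocales[0]);

struct FuzzInput {
  const char* locale;   // chosen by the first input byte
  const uint8_t* rest;  // remaining bytes for the API under test
  size_t rest_size;
};

// Uses the first byte as a selector, so the fuzzer (not rand()) drives the choice.
FuzzInput SplitInput(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  if (size == 0) {
    // No selector byte available: fall back to the first locale.
    return {kLocales[0], data, 0};
  }
  return {kLocales[data[0] % kLocaleCount], data + 1, size - 1};
}
```

Inside a target, the returned `locale` would be used to construct the object under test and `rest` would be converted to the input string.
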
### How to locally reproduce fuzzer findings

At this time, reproducing fuzzer findings requires Docker to be installed on the local machine and
the OSS-Fuzz project to be cloned into a local git working directory.

1.  Install Docker (Ubuntu):

    ```
    sudo apt install docker.io
    ```
2.  Download OSS-Fuzz and change into the `oss-fuzz/` directory.

    In a local git working directory, clone the fuzzer system.

    ```
    git clone https://github.com/google/oss-fuzz.git
    cd oss-fuzz/
    ```
3.  Build the Docker image for ICU.
    In some setups root permissions may be required to connect to the Docker daemon.

    ```
    [sudo] python infra/helper.py build_image icu
    ```
    A prompt will appear: `Pull latest base images (compiler/runtime)? (y/N)`
    Respond with `N`. Responding with `y` is also fine.
4.  Build the ICU fuzzers:

    ```
    [sudo] python infra/helper.py build_fuzzers --sanitizer [address | memory | undefined] icu
    ```
    Check that the fuzzer targets were built successfully: `ls -l build/out/icu`

5.  Reproduce the fuzzer finding.
    First, get the test data the fuzzer used when it found the issue. In the fuzzer bug report, look
    for 'Reproducer Testcase'; clicking the link downloads the test data. Then execute

    ```
    [sudo] python infra/helper.py reproduce icu <icu_fuzzer> <testdata>
    ```
    Concrete example:

    ```
    sudo python infra/helper.py reproduce icu uregex_open_fuzzer ~/Downloads/clusterfuzz-testcase-minimized-uregex_open_fuzzer-5732067058384896
    ```

**Limitations:** When a fuzzer finding is reproduced as outlined above, the fuzzer environment
uses the current ICU trunk from https://github.com/unicode-org/icu.git. It is therefore not possible
to modify the code locally to try out a possible fix. What can be done is to redirect Docker to
download ICU from a forked ICU repository: open the file `oss-fuzz/projects/icu/Dockerfile` and
adjust the line containing `git clone --depth 1 https://github.com/unicode-org/icu.git icu`
accordingly. Then modify the code in the forked repository and repeat the steps above, beginning
with step 3 (building the Docker image).

This is, of course, still a tedious way of reproducing and working on a fuzzer finding. Ticket
[ICU-20734](https://unicode-org.atlassian.net/browse/ICU-20734) aims to introduce a fuzzer driver
that can reproduce certain fuzzer findings in a local ICU workspace.
