<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

Developing Fuzzer Targets for ICU APIs
======================================

This document describes how to develop a [fuzzer](https://opensource.google.com/projects/oss-fuzz)
target for an ICU API and how to integrate it into the ICU build process.

### Directory and naming conventions

Fuzzer targets live exclusively in the directory
[`source/test/fuzzer/`](https://github.com/unicode-org/icu/tree/master/icu4c/source/test/fuzzer)
and their file names end with `_fuzzer.cpp`. Only files with this ending are recognized and executed
as fuzzer targets by the OSS-Fuzz system.

### General structure of a fuzzer target

At a minimum, a fuzzer target contains the function

```
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Pass all or part of the input bytes to the ICU API under test.
  return 0;
}
```

The fuzzer system requires this function and calls it with generated inputs. The `data` parameter
points to the fuzzer-controlled input, which is `size` bytes long. Part or all of this data is then
passed to the ICU API under test.

Fuzzer target
[`collator_rulebased_fuzzer.cpp`](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/collator_rulebased_fuzzer.cpp)
illustrates the basic elements.

```
// © 2019 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html

#include <cstring>
#include <memory>

#include "fuzzer_utils.h"
#include "unicode/coll.h"
#include "unicode/localpointer.h"
#include "unicode/locid.h"
#include "unicode/tblcoll.h"

IcuEnvironment* env = new IcuEnvironment();

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  UErrorCode status = U_ZERO_ERROR;

  // Interpret the input bytes as UTF-16 code units; an odd trailing byte is ignored.
  size_t unistr_size = size/2;
  std::unique_ptr<char16_t[]> fuzzbuff(new char16_t[unistr_size]);
  std::memcpy(fuzzbuff.get(), data, unistr_size * 2);
  // Read-only alias: fuzzstr does not copy fuzzbuff, so fuzzbuff must outlive it.
  icu::UnicodeString fuzzstr(false, fuzzbuff.get(), unistr_size);

  // The API under test. status is intentionally not checked.
  icu::LocalPointer<icu::RuleBasedCollator> col1(
      new icu::RuleBasedCollator(fuzzstr, status));

  return 0;
}
```

The ICU API under test is the `RuleBasedCollator(const UnicodeString &rules, UErrorCode &status)`
constructor. The code interprets the fuzzer data as a `UnicodeString` and passes it to the
constructor, and that is all. No specific error handling or return-value verification is required,
because memory errors are detected by the sanitizers (e.g., AddressSanitizer or MemorySanitizer)
that the fuzzer runs with.

### Makefile.in changes

ICU fuzzer targets are built and executed by the OSS-Fuzz project. On the ICU side, they are
compiled to ensure that the code is syntactically correct and, as a sanity check, executed in the
most basic manner, i.e., with minimal test data and without ASAN or MSAN analysis.

Add the new fuzzer target to the list of targets in the `FUZZER_TARGETS` variable in
[`Makefile.in`](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/Makefile.in).
The new fuzzer target will then be built and executed as part of a normal ICU4C unit test run. Note
that each fuzzer target is built as a standalone executable. To that end, it is linked with the code
in `fuzzer_driver.cpp`, which contains the `main()` function.
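
Since each target is linked with a separate `main()`, the driver essentially has to read some input
and hand its bytes to the target function once. The following is a minimal sketch of that idea
under stated assumptions; it is not the contents of ICU's `fuzzer_driver.cpp`, and the names
`RunInputFile` and `FuzzerTarget` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical alias for the signature shared by all fuzzer targets.
using FuzzerTarget = int (*)(const uint8_t* data, size_t size);

// Hypothetical helper: reads the whole file at `path` and passes its bytes
// to `target` exactly once. Returns the target's return value, or -1 if the
// file cannot be opened.
int RunInputFile(const char* path, FuzzerTarget target) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return -1;
  }
  std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
  // For an empty file, bytes.data() may be null; targets should therefore
  // handle size == 0 without dereferencing data.
  return target(bytes.data(), bytes.size());
}
```

A `main()` built on this sketch could call `RunInputFile(argv[1], LLVMFuzzerTestOneInput)`. Taking
the target as a function pointer rather than calling `LLVMFuzzerTestOneInput` directly keeps the
helper independent of any particular fuzzer target.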

### Fuzzer seed corpus

Seed data for a fuzzer target goes into a file named `<fuzzer_target>_seed_corpus.txt`.
In many cases the input parameter of the ICU API under test is of type `UnicodeString`, in which
case the seed data should be in UTF-16 format. For an example, see
[collator_rulebased_fuzzer_seed_corpus.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/collator_rulebased_fuzzer_seed_corpus.txt).

### Guidelines and tips

* Leave all randomness to the fuzzer. If a random selection of any kind is needed (e.g., of a
  locale), use bytes from the fuzzer data to make the selection
  ([example](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/fuzzer/break_iterator_fuzzer.cpp)).
* ICU unit tests can often provide seed data, or at least ideas for seed data. If the API
  under test requires a Unicode string, make sure that the seed data is in UTF-16 encoding.
  This can be achieved, e.g., with the `iconv` command or with an editor that saves text in UTF-16.

### How to locally reproduce fuzzer findings

At this time, reproducing fuzzer findings requires Docker installed on the local machine and the
OSS-Fuzz project cloned into a local git client.

1. Install Docker (Ubuntu):

   ```
   sudo apt install docker.io
   ```
2. Download OSS-Fuzz and switch into the directory `oss-fuzz/`.

   In a git client directory, download the fuzzer system:

   ```
   git clone https://github.com/google/oss-fuzz.git
   cd oss-fuzz/
   ```
3. Build the Docker image for ICU.
   In some setups, root permissions may be required to connect to the Docker daemon.

   ```
   [sudo] python infra/helper.py build_image icu
   ```
   A prompt will appear: `Pull latest base images (compiler/runtime)? (y/N)`
   Respond with 'N'. Responding with 'y' does no harm either.
4. Build the ICU fuzzers with one of the sanitizers. To reproduce a finding, use the sanitizer
   named in the fuzzer bug report.

   ```
   [sudo] python infra/helper.py build_fuzzers --sanitizer [address | memory | undefined] icu
   ```
   Check that the fuzzer targets were built successfully: `ls -l build/out/icu`

5. Reproduce the fuzzer finding.
   First, get the test data the fuzzer used when it found the issue: in the fuzzer bug report, look
   for 'Reproducer Testcase'; clicking the link downloads the test data. Then execute

   ```
   [sudo] python infra/helper.py reproduce icu <icu_fuzzer> <testdata>
   ```
   Concrete example:

   ```
   sudo python infra/helper.py reproduce icu uregex_open_fuzzer ~/Downloads/clusterfuzz-testcase-minimized-uregex_open_fuzzer-5732067058384896
   ```

**Limitations:** When a fuzzer finding is reproduced as outlined above, the fuzzer environment
uses the current ICU code from https://github.com/unicode-org/icu.git. Thus it is not possible
to modify the code to try out a possible fix. What can be done is to redirect Docker to download ICU
from a forked ICU repository: open the file `oss-fuzz/projects/icu/Dockerfile` and adjust the line
`git clone --depth 1 https://github.com/unicode-org/icu.git icu` accordingly. Then modify
the code in the forked repository and follow the steps above, beginning with step 3 (building the
Docker image).

This is, of course, still a tedious way of reproducing and working on a fuzzer finding. Ticket
[ICU-20734](https://unicode-org.atlassian.net/browse/ICU-20734) aims to introduce a fuzzer driver
that can reproduce certain fuzzer findings in a local ICU workspace.
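
The first item under "Guidelines and tips", using fuzzer bytes instead of a random number generator
to make a selection, can be sketched as follows. This is an illustration under assumptions, not the
code of `break_iterator_fuzzer.cpp`: the helper `SelectLocale` and the fixed locale list
`kLocales` are hypothetical, and a real target might choose from the locales ICU reports as
available instead.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, fixed list of locale IDs to choose from.
static const char* const kLocales[] = {"en_US", "de_DE", "ja_JP", "th_TH", "ar_EG"};
static const size_t kNumLocales = sizeof(kLocales) / sizeof(kLocales[0]);

// Hypothetical helper: consumes the first byte of the fuzzer input to pick a
// locale ID. On return, data/size describe the remaining bytes, which can be
// passed on to the API under test. Returns nullptr (and leaves data/size
// unchanged) if the input is empty.
const char* SelectLocale(const uint8_t*& data, size_t& size) {
  if (size == 0) {
    return nullptr;
  }
  const char* locale = kLocales[data[0] % kNumLocales];
  ++data;
  --size;
  return locale;
}
```

Inside `LLVMFuzzerTestOneInput`, a target could call `SelectLocale(data, size)`, return 0 if the
result is `nullptr`, construct an `icu::Locale` from the returned ID, and feed the remaining bytes to
the API under test. Because the choice is derived from the input itself, a saved testcase reproduces
the same selection. The modulo mapping is slightly biased when the list size does not divide 256,
which does not matter here since the goal is coverage, not uniform sampling.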