# strcmp() / memcmp() token capture library

NOTE: libtokencap is only recommended for binary-only targets or targets that
do not compile with afl-clang-fast/afl-clang-lto.
The `AFL_LLVM_DICT2FILE` feature of afl-clang-fast is much better, and
afl-clang-lto has that feature integrated automatically.

For the general instruction manual, see [docs/README.md](../../docs/README.md).

This companion library allows you to instrument `strcmp()`, `memcmp()`,
and related functions to automatically extract syntax tokens passed to any of
these libcalls. The resulting list of tokens may then be given as a starting
dictionary to afl-fuzz (the `-x` option) to improve coverage on subsequent
fuzzing runs.

This may help improve coverage in some targets, and do precisely nothing in
others. In some cases, it may even make things worse: if libtokencap picks up
syntax tokens that are not used to process the input data, but that are a part
of, say, parsing a config file... well, you're going to end up wasting a lot
of CPU time trying them out in the input stream. In other words, use this
feature with care. Manually screening the resulting dictionary is almost
always a necessity.

As for the actual operation: the library stores tokens, without any deduping,
by appending them to a file specified via `AFL_TOKEN_FILE`. If the variable is
not set, the tool uses stderr (which is probably not what you want).

Similarly to afl-tmin, the library is not "proprietary" and can be used with
other fuzzers or testing tools without the need for any code tweaks. It does not
require AFL-instrumented binaries to work.

To use the library, you *need* to make sure that your fuzzing target is compiled
with `-fno-builtin` and is linked dynamically. If you wish to automate the first
part without mucking with CFLAGS in Makefiles, you can set `AFL_NO_BUILTIN=1`
when using afl-gcc.
This setting specifically adds the following flags:

```
  -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp
  -fno-builtin-strncasecmp -fno-builtin-memcmp -fno-builtin-strstr
  -fno-builtin-strcasestr
```

The next step is to load this library via `LD_PRELOAD`. The optimal usage
pattern is to let afl-fuzz fuzz normally for a while and build up a corpus, and
then run the target binary, with libtokencap.so loaded, on every file found by
AFL++ in that earlier run. This demonstrates the basic principle:

```
  export AFL_TOKEN_FILE=$PWD/temp_output.txt

  for i in <out_dir>/queue/id*; do
    LD_PRELOAD=/path/to/libtokencap.so \
      /path/to/target/program [...params, including $i...]
  done

  sort -u temp_output.txt >afl_dictionary.txt
```

If you don't get any results, the target library is probably not using
`strcmp()` and `memcmp()` to parse input; or you haven't compiled it with
`-fno-builtin`; or the whole thing isn't dynamically linked, and `LD_PRELOAD`
is having no effect.

Portability hints: There is probably no particularly portable and non-invasive
way to distinguish between read-only and read-write memory mappings. The library
needs this distinction because it only records comparison arguments that live
in read-only memory, such as string constants compiled into the target. The
`__tokencap_load_mappings()` function is the only thing that would need to be
changed for other OSes.

Currently supported OSes are Linux, Darwin, and FreeBSD (thanks to @devnexen).
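For the `sort -u` output of the loop to work directly as an `afl-fuzz -x` dictionary, each captured token has to be a line in afl-fuzz's dictionary syntax: a double-quoted value, with non-printable bytes, `"` and `\` written as `\xNN` escapes. The following is a minimal sketch of that formatting step, not libtokencap's actual code; `format_token()` and `format_text_token()` are hypothetical names. The two entry points mirror the two kinds of hooked calls: `strcmp()`-style arguments are NUL-terminated, while `memcmp()`-style arguments come with an explicit length and may contain NUL bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render one token as an afl-fuzz dictionary line,
 * i.e. "token" followed by '\n', into `out`.
 * Printable ASCII is copied verbatim; control bytes, bytes >= 0x7f, '"' and
 * '\\' become \xNN escapes so the line stays parseable by afl-fuzz -x.
 * `out` must hold at least len * 4 + 4 bytes. Returns the line length. */
static size_t format_token(const unsigned char *ptr, size_t len, char *out) {
  size_t pos = 0;
  out[pos++] = '"';
  for (size_t i = 0; i < len; i++) {
    unsigned char c = ptr[i];
    if (c < 32 || c >= 127 || c == '"' || c == '\\') {
      /* Always writes exactly 4 characters, e.g. \x0a */
      pos += (size_t)sprintf(out + pos, "\\x%02x", (unsigned)c);
    } else {
      out[pos++] = (char)c;
    }
  }
  out[pos++] = '"';
  out[pos++] = '\n';
  out[pos] = '\0';
  return pos;
}

/* strcmp()-style arguments are NUL-terminated, so take the length from strlen(). */
static size_t format_text_token(const char *s, char *out) {
  return format_token((const unsigned char *)s, strlen(s), out);
}
```

For example, `format_text_token("GET ", buf)` produces the line `"GET "`, and a `memcmp()`-style token of the three bytes `A`, NUL, `B` becomes `"A\x00B"`. Emitting one complete line per token is also what makes the final `sort -u` step enough to remove the duplicates that accumulate in `AFL_TOKEN_FILE`.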
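On Linux, one way to answer the read-only question raised in the portability hints is to parse `/proc/self/maps`, where each line lists an address range followed by permission flags such as `r--p` or `rw-p`. The sketch below uses a hypothetical helper, `is_read_only()`, to illustrate the idea only; it is not the actual `__tokencap_load_mappings()` implementation, and Darwin and FreeBSD would need different OS interfaces.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (Linux only): returns 1 if `ptr` lies inside a mapping
 * of the current process that is readable but not writable, 0 otherwise
 * (including when /proc/self/maps cannot be opened).
 * For simplicity it re-reads /proc/self/maps on every call; a real
 * interposer would load the ranges once and cache them. */
static int is_read_only(const void *ptr) {
  uintptr_t addr = (uintptr_t)ptr;
  FILE *maps = fopen("/proc/self/maps", "r");
  if (maps == NULL) return 0;

  char line[4096];
  int result = 0;
  while (fgets(line, sizeof line, maps) != NULL) {
    unsigned long start, end;
    char perms[5];
    /* Lines look like: "55d0c0a00000-55d0c0a21000 r--p 00000000 ..." */
    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3) continue;
    if (addr >= start && addr < end) {
      /* perms[0] is the read flag, perms[1] the write flag. */
      result = (perms[0] == 'r' && perms[1] != 'w');
      break;
    }
  }
  fclose(maps);
  return result;
}
```

A string literal such as `"Content-Length"` normally lands in a read-only segment and passes this check, while a buffer on the stack or heap holding bytes read from the input file does not. In an interposer, a check like this separates the constants worth logging from the fuzzed data they are compared against.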