# Fuzzing binary-only targets

AFL++, libFuzzer, and other fuzzers are great if you have the source code of the
target. This allows for very fast and coverage-guided fuzzing.

However, if there is only the binary program and no source code available, then
standard `afl-fuzz -n` (non-instrumented mode) is not effective.

For fast, on-the-fly instrumentation of black-box binaries, AFL++ offers several
options. The following is a description of how these binaries can be fuzzed with
AFL++.

## TL;DR:

FRIDA mode and QEMU mode in persistent mode are the fastest - if persistent mode
is possible and the stability is high enough.

Otherwise, try ZAFL, RetroWrite, Dyninst, and if these fail, too, then try
standard FRIDA/QEMU mode with `AFL_ENTRYPOINT` set to where you need it.

If your target is non-Linux, then use unicorn_mode.

## Fuzzing binary-only targets with AFL++

### QEMU mode

QEMU mode is the "native" solution to the problem. It is available in the
`./qemu_mode/` directory and, once compiled, it can be accessed with the
`afl-fuzz -Q` command line option. It is the easiest-to-use alternative and even
works for cross-platform binaries.

For Linux programs and their libraries, this is accomplished with a version of
QEMU running in the lesser-known "user space emulation" mode.
QEMU is a project separate from AFL++, but you can conveniently build the
feature by doing:

```shell
cd qemu_mode
./build_qemu_support.sh
```

The following setup to use QEMU mode is recommended:

* run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
* run 1 afl-fuzz -Q instance with QASAN (`AFL_USE_QASAN=1`)
* run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` +
  `AFL_COMPCOV_LEVEL=2`)

Alternatively, you can use FRIDA mode: just switch `-Q` with `-O` and remove the
LAF instance.

Then run as many instances as you have cores left with either -Q mode or - even
better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc.
The binary rewriters all have their own advantages and caveats.
ZAFL is the best but cannot be used in a business/commercial context.

If a binary rewriter works for your target, then you can use afl-fuzz normally
and it will have twice the speed compared to QEMU mode (but slower than QEMU
persistent mode).

The speed decrease of QEMU mode is about 50%. However, various options exist
to increase the speed:
- using `AFL_ENTRYPOINT` to move the forkserver entry to a later basic block in
  the binary (+5-10% speed)
- using persistent mode, see
  [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md); this will
  result in a 150-300% overall speed increase - so 3-8x the original QEMU mode
  speed!
- using `AFL_CODE_START`/`AFL_CODE_END` to only instrument specific parts

For additional instructions and caveats, see
[qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use the
persistent mode, see
[qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). QEMU mode
is approximately 2-5x slower than compile-time instrumentation, and is less
conducive to parallelization.
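
The recommended three-instance QEMU mode setup can be sketched as a small
dry-run script. It only prints the `afl-fuzz` command lines instead of launching
them. The flags and environment variables are the ones named in the list; the
`in` and `out` directories, the instance names, the `./target` path, and the
assumption that the target reads its input from a file (`@@`) are placeholders
chosen for illustration.

```shell
#!/bin/sh
# Dry run: print one afl-fuzz command line per recommended QEMU mode instance.
# Assumptions (not from the text): seed dir "in", output dir "out", the
# instance names, the target path, and file-based input via "@@".

TARGET="${TARGET:-./target}"

qemu_instance_cmds() {
  # Main instance with CMPLOG ("-c 0" plus AFL_COMPCOV_LEVEL=2, as listed).
  echo "AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -c 0 -i in -o out -M cmplog -- $TARGET @@"
  # Secondary instance with QASAN.
  echo "AFL_USE_QASAN=1 afl-fuzz -Q -i in -o out -S qasan -- $TARGET @@"
  # Secondary instance with LAF (libcmpcov.so preloaded).
  echo "AFL_PRELOAD=libcmpcov.so AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -i in -o out -S laf -- $TARGET @@"
}

qemu_instance_cmds
```

Review the printed lines and then start each one in its own terminal or tmux
pane. All three share the same output directory so that the instances can
synchronize.
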

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz) which
now has a QEMU mode, but its performance is just 1.5% ...

If you would like to code a customized fuzzer without much work, we highly
recommend checking out our sister project LibAFL, which supports QEMU, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)

### WINE+QEMU

Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs
Wine, python3, and the pefile Python package installed.

It is included in AFL++.

For more information, see
[qemu_mode/README.wine.md](../qemu_mode/README.wine.md).

### FRIDA mode

In FRIDA mode, you can fuzz binary-only targets as easily as with QEMU mode.
FRIDA mode is most of the time slightly faster than QEMU mode. It is also
newer, lacks COMPCOV, and has the advantage that it works on macOS (both Intel
and M1).

To build FRIDA mode:

```shell
cd frida_mode
gmake
```

For additional instructions and caveats, see
[frida_mode/README.md](../frida_mode/README.md).

If possible, you should use the persistent mode, see
[instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md).
FRIDA mode is approximately 2-5x slower than compile-time instrumentation, and
is less conducive to parallelization. But for binary-only fuzzing, persistent
mode gives a huge speed improvement if it is possible to use.

If you want to fuzz a binary-only library, then you can fuzz it with frida-gum
via `frida_mode/`. You will have to write a harness to call the target function
in the library; use afl-frida.c as a template.

You can also perform remote fuzzing with FRIDA, e.g., if you want to fuzz on
iPhone or Android devices. For this, you can use
[https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/) as
an intermediate that uses AFL++ for fuzzing.

If you would like to code a customized fuzzer without much work, we highly
recommend checking out our sister project LibAFL, which supports Frida, too:
[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL).
Working examples already exist :-)

### Nyx mode

Nyx is a full system emulation fuzzing environment with snapshot support that is
built upon KVM and QEMU. It is only available on Linux and currently restricted
to x86-64.

For binary-only fuzzing, a special 5.10 kernel is required.

See [nyx_mode/README.md](../nyx_mode/README.md).

### Unicorn

Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In
contrast to QEMU, Unicorn does not offer a full system or even userland
emulation. Runtime environment and/or loaders have to be written from scratch,
if needed. On top of that, block chaining has been removed. This means the speed
boost introduced in the patched QEMU mode of AFL++ cannot be ported over to
Unicorn.

For non-Linux binaries, you can use AFL++'s unicorn_mode which can emulate
anything you want - for the price of speed and user-written scripts.

To build unicorn_mode:

```shell
cd unicorn_mode
./build_unicorn_support.sh
```

For further information, check out
[unicorn_mode/README.md](../unicorn_mode/README.md).

### Shared libraries

If the goal is to fuzz a dynamic library, then there are two options available.
For both, you need to write a small harness that loads and calls the library.
Then you fuzz this with either FRIDA mode or QEMU mode and either use
`AFL_INST_LIBS=1` or `AFL_QEMU_INST_RANGES`/`AFL_FRIDA_INST_RANGES`.

Another, less precise option is to fuzz it with `utils/afl_untracer/` and use
afl-untracer.c as a template. It is slower than FRIDA mode.

For more information, see
[utils/afl_untracer/README.md](../utils/afl_untracer/README.md).
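
For the first option (a harness fuzzed in FRIDA or QEMU mode), the invocation
can be sketched as a dry-run helper that prints the command line for either
mode. `./harness` is a hypothetical program that you would write to load the
library and call the target function; the `in` and `out` directories and the
file-based input (`@@`) are likewise assumptions for illustration.

```shell
#!/bin/sh
# Dry run: print the afl-fuzz command line for fuzzing a library via a harness.
# Assumptions: "./harness" is a hypothetical harness binary, "in"/"out" are
# placeholder directories, and the harness reads its input from a file ("@@").

lib_fuzz_cmd() {
  mode="$1"   # "qemu" or "frida"
  case "$mode" in
    qemu)  flag="-Q" ;;
    frida) flag="-O" ;;
    *) echo "usage: lib_fuzz_cmd qemu|frida" >&2; return 1 ;;
  esac
  # AFL_INST_LIBS=1 is the variable named in the text for covering libraries.
  echo "AFL_INST_LIBS=1 afl-fuzz $flag -i in -o out -- ./harness @@"
}

lib_fuzz_cmd qemu
lib_fuzz_cmd frida
```

If instrumenting every loaded library is too broad, the range variables
mentioned above are the alternative; their value format is described in the
respective mode's README.
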

### Coresight

Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a Coresight
tracer implementation available in `coresight_mode/` which is faster than QEMU
but cannot run in parallel. Currently, only one process can be traced; it is
WIP.

For more information, see
[coresight_mode/README.md](../coresight_mode/README.md).

## Binary rewriters

Binary rewriters are an alternative solution. They are faster than the solutions
native to AFL++ but don't always work.

### ZAFL

ZAFL is a static rewriting platform supporting x86-64 C/C++,
stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional
instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel,
context sensitivity, InsTrim, etc.).

Its baseline instrumentation speed typically averages 90-95% of
afl-clang-fast's.

[https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)

### RetroWrite

RetroWrite is a static binary rewriter that can be combined with AFL++. If you
have an x86_64 binary that still has its symbols (i.e., it is not stripped), is
compiled with position independent code (PIC/PIE), and does not contain C++
exceptions, then the RetroWrite solution might be for you. It decompiles to ASM
files which can then be instrumented with afl-gcc.

Binaries that are statically instrumented for fuzzing using RetroWrite are close
in performance to compiler-instrumented binaries and outperform the QEMU-based
instrumentation.

[https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)

### Dyninst

Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO.
However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the
target at load time and then lets it run - or saves the binary with the changes.
This is great for some things, e.g., fuzzing, and not so effective for others,
e.g., malware analysis.

So, what you can do with Dyninst is take every basic block and put AFL++'s
instrumentation code in there - and then save the binary. Afterwards, just fuzz
the newly saved target binary with afl-fuzz. Sounds great? It is. The issue
though: it is a non-trivial problem to insert instructions, which change
addresses in the process space, so that everything still works afterwards.
Hence, more often than not, binaries crash when they are run.

The speed decrease is about 15-35%, depending on the optimization options used
with afl-dyninst.

[https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)

### McSema

Theoretically, you can also decompile to LLVM IR with McSema, and then use
llvm_mode to instrument the binary. Good luck with that.

[https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)

## Binary tracers

### Pintool & DynamoRIO

Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for
getting basic block information at runtime. Pintool is only available for Intel
x32/x64 on Linux, macOS, and Windows, whereas DynamoRIO is additionally
available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.

The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO
has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%.

Hence, DynamoRIO is the option to go for if everything else fails and Pintool
only if DynamoRIO fails, too.
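
To get a feel for what these speed decreases mean in practice, the percentages
can be converted into remaining executions per second. The helper below is a
minimal sketch; the 1000 exec/s baseline is a made-up figure for illustration,
not a measurement, and the text does not state which baseline the percentages
refer to.

```shell
#!/bin/sh
# Convert a "speed decrease" percentage into remaining executions per second.
# The 1000 exec/s baseline used below is hypothetical.

remaining_speed() {
  # $1 = baseline exec/s, $2 = speed decrease in percent
  awk -v base="$1" -v dec="$2" 'BEGIN { printf "%.1f\n", base * (100 - dec) / 100 }'
}

remaining_speed 1000 98     # DynamoRIO, lower end of the 98-99% range
remaining_speed 1000 99     # DynamoRIO, upper end of the range
remaining_speed 1000 99.5   # Pintool
```

With the assumed baseline, this prints `20.0`, `10.0`, and `5.0` exec/s.
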

DynamoRIO solutions:
* [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
* [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
* [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/)
  <= very good but Windows only

Pintool solutions:
* [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
* [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
* [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode)
  <= only old Pintool version supported

### Intel PT

If you have a newer Intel CPU, you can make use of Intel's processor trace. The
big issue with Intel's PT is the small buffer size and the complex encoding of
the debug information collected through PT. This makes the decoding very CPU
intensive and hence slow. As a result, the overall speed decrease is about
70-90% (depending on the implementation and other factors).

There are two AFL Intel PT implementations:

1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
   => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.

2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
   => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be
   used. This one is faster than the other.

Note that there is also honggfuzz:
[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz). But
its IPT performance is just 6%!

## Non-AFL++ solutions

There are many binary-only fuzzing frameworks. Some are great for CTFs but don't
work with large binaries, others are very slow but have good path discovery,
some are very hard to set up...

* Jackalope:
  [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
* Manticore:
  [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
* QSYM:
  [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
* S2E: [https://github.com/S2E](https://github.com/S2E)
* TinyInst:
  [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst)
  (Mac/Windows only)
* ... please send me any good ones that are missing

## Closing words

That's it! News, corrections, updates? Send an email to vh@thc.org.