# Fast LLVM-based instrumentation for afl-fuzz

For the general instruction manual, see [docs/README.md](../docs/README.md).

For the GCC-based instrumentation, see
[README.gcc_plugin.md](README.gcc_plugin.md).

## 1) Introduction

! llvm_mode works with llvm versions 3.8 up to 13 !

The code in this directory allows you to instrument programs for AFL++ using
true compiler-level instrumentation, instead of the more crude assembly-level
rewriting approach taken by afl-gcc and afl-clang. This has several interesting
properties:

- The compiler can make many optimizations that are hard to pull off when
  manually inserting assembly. As a result, some slow, CPU-bound programs will
  run up to around 2x faster.

  The gains are less pronounced for fast binaries, where the speed is limited
  chiefly by the cost of creating new processes. In such cases, the gain will
  probably stay within 10%.

- The instrumentation is CPU-independent. At least in principle, you should be
  able to rely on it to fuzz programs on non-x86 architectures (after building
  afl-fuzz with AFL_NO_X86=1).

- The instrumentation can cope a bit better with multi-threaded targets.

- Because the feature relies on the internals of LLVM, it is clang-specific and
  will *not* work with GCC (see ../gcc_plugin/ for an alternative once it is
  available).

Once this implementation is shown to be sufficiently robust and portable, it
will probably replace afl-clang. For now, it can be built separately and
co-exists with the original code.

The idea and much of the initial implementation came from Laszlo Szekeres.

## 2a) How to use this - short

Set the `LLVM_CONFIG` variable to the clang version you want to use, e.g.:

```
LLVM_CONFIG=llvm-config-9 make
```

In case you have your own compiled llvm version, specify the full path:

```
LLVM_CONFIG=~/llvm-project/build/bin/llvm-config make
```

If you try to use a new llvm version on an old Linux, this can fail because of
old C++ libraries. In this case, switching to gcc/g++ to compile llvm_mode
usually works:

```
LLVM_CONFIG=llvm-config-7 REAL_CC=gcc REAL_CXX=g++ make
```

It is highly recommended to use the newest clang version you can put your hands
on :)

Then look at [README.persistent_mode.md](README.persistent_mode.md).

## 2b) How to use this - long

In order to leverage this mechanism, you need to have clang installed on your
system. You should also make sure that the llvm-config tool is in your path (or
pointed to via LLVM_CONFIG in the environment).

Note that if you have several LLVM versions installed, pointing LLVM_CONFIG to
the version you want to use will switch compiling to this specific version - if
your installation is set up correctly :-)

Unfortunately, some systems that do have clang come without llvm-config or the
LLVM development headers; one example of this is FreeBSD. FreeBSD users will
also run into problems with clang being built statically and not being able to
load modules (you'll see "Service unavailable" when loading afl-llvm-pass.so).

To solve all your problems, you can grab pre-built binaries for your OS from:

[https://llvm.org/releases/download.html](https://llvm.org/releases/download.html)

...and then put the bin/ directory from the tarball at the beginning of your
$PATH when compiling the feature and building packages later on. You don't need
to be root for that.

To build the instrumentation itself, type `make`.
This will generate binaries called afl-clang-fast and afl-clang-fast++ in the
parent directory. Once this is done, you can instrument third-party code in a
way similar to the standard operating mode of AFL, e.g.:

```
  CC=/path/to/afl/afl-clang-fast ./configure [...options...]
  make
```

Be sure to also set CXX to afl-clang-fast++ for C++ code.

Note that afl-clang-fast/afl-clang-fast++ are just pointers to afl-cc. You can
also use afl-cc/afl-c++ and instead direct it to use LLVM instrumentation by
either setting `AFL_CC_COMPILER=LLVM` or passing the parameter `--afl-llvm` via
CFLAGS/CXXFLAGS/CPPFLAGS.

The tool honors roughly the same environment variables as afl-gcc (see
[docs/env_variables.md](../docs/env_variables.md)). This includes
`AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. However, `AFL_INST_RATIO`
is not honored as it does not serve a good purpose with the more effective
PCGUARD analysis.

## 3) Options

Several options are present to make llvm_mode faster or help it rearrange the
code to make afl-fuzz path discovery easier.

If you need to instrument only specific parts of the code, you can use an
instrument file list to specify which C/C++ files to actually instrument. See
[README.instrument_list.md](README.instrument_list.md).

For splitting memcmp, strncmp, etc., see
[README.laf-intel.md](README.laf-intel.md).

Then there are different ways of instrumenting the target:

1. A better instrumentation strategy uses LTO and link time instrumentation.
   Note that not all targets can compile in this mode; however, if it works, it
   is the best option you can use. To go with this option, use
   afl-clang-lto/afl-clang-lto++. See [README.lto.md](README.lto.md).

2. Alternatively, you can choose a completely different coverage method:

2a. N-GRAM coverage - which combines the previously visited edges with the
    current one.
    This explodes the map but on the other hand has proven to be effective
    for fuzzing. See
    [7) AFL++ N-Gram Branch Coverage](#7-afl-n-gram-branch-coverage).

2b. Context sensitive coverage - which combines the visited edges with an
    individual caller ID (the function that called the current one). See
    [6) AFL++ Context Sensitive Branch Coverage](#6-afl-context-sensitive-branch-coverage).

Then, in addition to one of the instrumentation options above, there is a
very effective new instrumentation option called CmpLog as an alternative to
laf-intel that allows AFL++ to apply mutations similar to Redqueen. See
[README.cmplog.md](README.cmplog.md).

Finally, if your llvm version is 8 or lower, you can activate a mode that
prevents a counter overflow from resulting in a 0 value. This is good for path
discovery, but the llvm implementation for x86 for this functionality is not
optimal and was only fixed in llvm 9. You can set this with AFL_LLVM_NOT_ZERO=1.

Support for thread safe counters has been added for all modes. Activate it with
`AFL_LLVM_THREADSAFE_INST=1`. The tradeoff is better precision in multi-threaded
apps for a slightly higher instrumentation overhead. This also disables the
NeverZero counter default for performance reasons.

## 4) Deferred initialization, persistent mode, shared memory fuzzing

This is the most powerful and effective fuzzing you can do. For a full
explanation, see [README.persistent_mode.md](README.persistent_mode.md).

## 5) Bonus feature: 'dict2file' pass

Just specify `AFL_LLVM_DICT2FILE=/absolute/path/file.txt` and during compilation
all constant string compare parameters will be written to this file to be used
with afl-fuzz' `-x` option.

## 6) AFL++ Context Sensitive Branch Coverage

### What is this?

This is an LLVM-based implementation of the context sensitive branch coverage.

Basically every function gets its own ID and, every time an edge is logged,
all the IDs in the callstack are hashed and combined with the edge transition
hash to augment the classic edge coverage with information about the calling
context.

So if both function A and function B call a function C, the coverage collected
in C will be different.

In math the coverage is collected as follows:
`map[current_location_ID ^ previous_location_ID >> 1 ^ hash_callstack_IDs] += 1`

The callstack hash is produced by XOR-ing the function IDs to avoid explosion
with recursive functions.

### Usage

Set the `AFL_LLVM_INSTRUMENT=CTX` or `AFL_LLVM_CTX=1` environment variable.

It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to
at least 18 and maybe up to 20 for this as otherwise too many map collisions
occur.

### Caller Branch Coverage

If the context sensitive coverage introduces too many collisions and becomes
detrimental, the user can choose to augment edge coverage with just the called
function ID, instead of the entire callstack hash.

In math the coverage is collected as follows:
`map[current_location_ID ^ previous_location_ID >> 1 ^ previous_callee_ID] += 1`

Set the `AFL_LLVM_INSTRUMENT=CALLER` or `AFL_LLVM_CALLER=1` environment
variable.

## 7) AFL++ N-Gram Branch Coverage

### Source

This is an LLVM-based implementation of the n-gram branch coverage proposed in
the paper
["Be Sensitive and Collaborative: Analyzing Impact of Coverage Metrics in Greybox Fuzzing"](https://www.usenix.org/system/files/raid2019-wang-jinghan.pdf)
by Jinghan Wang, et al.

Note that the original implementation (available
[here](https://github.com/bitsecurerlab/afl-sensitive)) is built on top of AFL's
QEMU mode.
This is essentially a port that uses LLVM vectorized instructions
(available from llvm versions 4.0.1 and higher) to achieve the same results when
compiling source code.

In math the branch coverage is performed as follows:
`map[current_location ^ prev_location[0] >> 1 ^ prev_location[1] >> 1 ^ ... up to n-1] += 1`

### Usage

The size of `n` (i.e., the number of branches to remember) is an option that is
specified either in the `AFL_LLVM_INSTRUMENT=NGRAM-{value}` or the
`AFL_LLVM_NGRAM_SIZE` environment variable. Good values are 2, 4, or 8; valid
values are 2-16.

It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to
at least 18 and maybe up to 20 for this as otherwise too many map collisions
occur.

## 8) NeverZero counters

In larger, complex, or reiterative programs, the byte-sized counters that
collect the edge coverage can easily fill up and wrap around. This is not that
much of an issue, unless, by chance, it wraps just to a value of zero when the
program execution ends. In this case, afl-fuzz is not able to see that the edge
has been accessed and will ignore it.

NeverZero prevents this behavior. If a counter wraps, it jumps over the value 0
directly to a 1. This improves path discovery (by a very small amount) at a very
low cost (one instruction per edge).

(The alternative of saturated counters has been tested also and proved to be
inferior in terms of path discovery.)

This is implemented in afl-gcc and afl-gcc-fast; however, for llvm_mode this is
optional if thread-safe counters are selected or the llvm version is below 9,
as there are severe performance costs in these cases.

If you want to enable this for llvm versions below 9 or thread-safe counters,
then set:

```
export AFL_LLVM_NOT_ZERO=1
```

In case you are on llvm 9 or greater and you do not want this behavior, then you
can set:

```
export AFL_LLVM_SKIP_NEVERZERO=1
```

If the target does not have extensive loops or functions that are called a lot,
then this can give a small performance boost.

Please note that the default counter implementations are not thread safe!

Support for thread-safe counters in mode LLVM CLASSIC can be activated by
setting `AFL_LLVM_THREADSAFE_INST=1`.