# afl-clang-lto - collision-free instrumentation at link time

## TL;DR:

This version requires a current llvm 11+ compiled from the GitHub master.

1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better
   coverage than anything else that is out there in the AFL world.

2. You can use it together with the llvm_mode features laf-intel and
   instrument file listing, and combine it with cmplog/Redqueen.

3. It only works with llvm 11+.

4. AUTODICTIONARY feature (see below)!

5. If any problems arise, be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. Some
   targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`.

## Introduction and problem description

A big issue with how AFL++ works is that the basic block IDs that are set during
compilation are random. Naturally, the larger the number of instrumented
locations, the higher the number of edge collisions in the map. This can result
in not discovering new paths and therefore degrade the efficiency of the fuzzing
process.

*This issue is underestimated in the fuzzing community!* With a standard map of
2^16 entries (64 kB), there is on average one collision at just 256
instrumented blocks. A typical target has 10,000 to 50,000 instrumented blocks,
so the expected number of collisions is between 750 and 18,000!
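
As a rough check of these numbers, the sketch below estimates the expected
number of collisions for `n` random IDs in a map of `m` entries as
`n - m * (1 - (1 - 1/m)^n)`. This formula is an assumption (a standard
occupancy estimate), not something taken from the AFL++ source, although it
does reproduce the 1046 collisions reported for 12071 locations in the libtiff
build output in this document. The function name `expected_collisions` is made
up for illustration.

```sh
# Expected number of instrumented locations that land on an already-used
# map slot when n IDs are drawn uniformly at random from m map entries.
# Assumed formula: n - m * (1 - (1 - 1/m)^n)
expected_collisions() {
    # $1 = number of instrumented locations, $2 = map size (default 65536)
    awk -v n="$1" -v m="${2:-65536}" 'BEGIN {
        used = m * (1 - exp(n * log(1 - 1 / m)))  # expected distinct slots hit
        printf "%.0f\n", n - used
    }'
}

expected_collisions 12071   # prints 1046
```

Passing a larger map size as the second argument shows how quickly the estimate
drops when the map grows.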

Reaching a solution that prevents any collisions took several approaches and
many dead ends until we got to this:

* We instrument at link time when we have all files pre-compiled.
* To instrument at link time, we compile in LTO (link time optimization) mode.
* Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the correct
  LTO options and runs our own afl-ld linker instead of the system linker.
* The LLVM linker collects all LTO files to link and instruments them so that we
  have non-colliding edge coverage.
* We use a new (for afl) edge coverage, which is the same as in llvm
  -fsanitize=coverage edge coverage mode. :)

The result:

* 10-25% speed gain compared to llvm_mode
* guaranteed non-colliding edge coverage :-)
* The compile time, especially for binaries linked against an instrumented
  library, can be much longer.

Example build output from a libtiff build:

```
libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o  ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm
afl-clang-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> in mode LTO
afl-llvm-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de>
AUTODICTIONARY: 11 strings found
[+] Instrumented 12071 locations with no collisions (on average 1046 collisions would be in afl-gcc/afl-clang-fast) (non-hardened mode).
```

## Getting llvm 11+

### Installing llvm version 11 or 12

llvm 11 or even 12 should be available in all current Linux repositories. If you
use an outdated Linux distribution, read the next section.

### Installing llvm from the llvm repository (version 12+)

Installing the llvm snapshot builds is easy and mostly painless:

In the following line, replace `NAME` with your Debian or Ubuntu release name
(e.g., buster, focal, eoan, etc.):

```
echo deb http://apt.llvm.org/NAME/ llvm-toolchain-NAME main >> /etc/apt/sources.list
```

Then add the PGP key of llvm and install the packages:

```
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
apt-get update && apt-get upgrade -y
apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \
    libc++abi1-12 libc++abi-12-dev libclang1-12 libclang-12-dev \
    libclang-common-12-dev libclang-cpp12 libclang-cpp12-dev liblld-12 \
    liblld-12-dev liblldb-12 liblldb-12-dev libllvm12 libomp-12-dev \
    libomp5-12 lld-12 lldb-12 llvm-12 llvm-12-dev llvm-12-runtime llvm-12-tools
```

### Building llvm yourself (version 12+)

Building llvm from GitHub takes quite some time and is not painless:

```sh
sudo apt install binutils-dev  # this is *essential*!
git clone --depth=1 https://github.com/llvm/llvm-project
cd llvm-project
mkdir build
cd build

# Add -G Ninja if ninja-build installed
# "Building with ninja significantly improves your build time, especially with
# incremental builds, and improves your memory usage."
cmake \
    -DCLANG_INCLUDE_DOCS="OFF" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_BINUTILS_INCDIR=/usr/include/ \
    -DLLVM_BUILD_LLVM_DYLIB="ON" \
    -DLLVM_ENABLE_BINDINGS="OFF" \
    -DLLVM_ENABLE_PROJECTS='clang;compiler-rt;libcxx;libcxxabi;libunwind;lld' \
    -DLLVM_ENABLE_WARNINGS="OFF" \
    -DLLVM_INCLUDE_BENCHMARKS="OFF" \
    -DLLVM_INCLUDE_DOCS="OFF" \
    -DLLVM_INCLUDE_EXAMPLES="OFF" \
    -DLLVM_INCLUDE_TESTS="OFF" \
    -DLLVM_LINK_LLVM_DYLIB="ON" \
    -DLLVM_TARGETS_TO_BUILD="host" \
    ../llvm/
cmake --build . -j4
export PATH="$(pwd)/bin:$PATH"
export LLVM_CONFIG="$(pwd)/bin/llvm-config"
export LD_LIBRARY_PATH="$(llvm-config --libdir)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
cd /path/to/AFLplusplus/
make
sudo make install
```

## How to use afl-clang-lto

Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc.

Also, the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST ->
[README.instrument_list.md](README.instrument_list.md)) and laf-intel/compcov
(AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work.

Example:

```
CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure
make
```

NOTE: Some targets also need the linker to be set. Try both `afl-clang-lto` and
`afl-ld-lto` for `LD=` before running `configure`.
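
Because a missing `llvm-ar` or `llvm-ranlib` often only shows up late in a
build, a quick check before running `./configure` can save time. The helper
below is a minimal sketch: the function name `missing_tools` is hypothetical
and not part of AFL++, and it relies only on the POSIX `command -v` builtin.

```sh
# Print every tool from the argument list that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

One possible use is
`missing_tools afl-clang-lto afl-clang-lto++ llvm-ar llvm-ranlib && ./configure`
with the same `CC=`, `CXX=`, `AR=`, and `RANLIB=` settings as in the example
above.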

## Instrumenting shared libraries

Note: this is highly discouraged! Try to compile to static libraries with
afl-clang-lto instead of shared libraries!

To make instrumented shared libraries work with afl-clang-lto, you have to take
quite a few extra steps.

Every shared library you want to instrument has to be compiled individually. The
environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during
compilation. Additionally, the environment variable `AFL_LLVM_LTO_STARTID` has
to be set to the sum of the edge counts of all previously compiled instrumented
shared libraries for that target. E.g., for the first shared library this would
be `AFL_LLVM_LTO_STARTID=0` and afl-clang-lto will then report how many edges
have been instrumented (let's say it reported 1000 instrumented edges). The
second shared library then has to be set to that value
(`AFL_LLVM_LTO_STARTID=1000` in our example), the third to the sum of all
previous counts, etc.

The final program compilation step must *not* have
`AFL_LLVM_LTO_DONTWRITEID` set, and `AFL_LLVM_LTO_STARTID` must be set to the
sum of the edge counts of all shared libraries it will be linked to.
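
The bookkeeping described above can be sketched as a running sum. The edge
counts here are hypothetical; in practice you would read each one from the
afl-clang-lto output after building the corresponding library, and the actual
compile commands depend on your target, so they appear only as comments.

```sh
# Hypothetical edge counts reported by afl-clang-lto for two shared
# libraries, in the order they are built.
edge_counts="1000 250"

start_id=0
for count in $edge_counts; do
    # Build this library with:
    #   AFL_LLVM_LTO_DONTWRITEID=1 AFL_LLVM_LTO_STARTID=$start_id
    echo "library: AFL_LLVM_LTO_STARTID=$start_id"
    start_id=$((start_id + count))
done

# The final program is built without AFL_LLVM_LTO_DONTWRITEID and with the
# sum of all library edge counts as its start ID.
echo "program: AFL_LLVM_LTO_STARTID=$start_id"
```

With these example counts, the two libraries get start IDs 0 and 1000, and the
final program gets 1250.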

This is quite some hands-on work, so better stay away from instrumenting shared
libraries. :-)

## AUTODICTIONARY feature

While compiling, a dictionary based on string comparisons is automatically
generated and put into the target binary. This dictionary is transferred to
afl-fuzz on start. This improves coverage statistically by 5-10%. :)

Note that if for any reason you do not want to use the autodictionary feature,
then just set the environment variable `AFL_NO_AUTODICT` when starting afl-fuzz.

## Fixed memory map

To speed up fuzzing a little bit more, it is possible to set a fixed shared
memory map. The recommended value is 0x10000.

In most cases, this will work without any problems. However, if a target uses
early constructors, ifuncs, or a deferred forkserver, this can crash the target.

Also, on unusual operating systems/processors/kernels or with weird libraries,
the recommended 0x10000 address might not work. In that case, change the fixed
address.

To enable this feature, set `AFL_LLVM_MAP_ADDR` with the address.

## Document edge IDs

Setting `export AFL_LLVM_DOCUMENT_IDS=file` will document in a file which edge
ID was given to which function. This helps to identify functions with variable
bytes or which functions were touched by an input.

## Solving difficult targets

Some targets are difficult because the configure script does unusual stuff that
is unexpected for afl. See the `Potential issues` section below for how to solve
these.

### Example: ffmpeg

An example of a hard-to-solve target is ffmpeg. Here is how to successfully
instrument it:

1. Get and extract the current ffmpeg and change to its directory.

2. Running configure with `--cc=clang` fails and various other items will fail
   when compiling, so we have to trick configure:

    ```
    ./configure --enable-lto --disable-shared --disable-inline-asm
    ```

3. Now the configuration is done - and we edit the settings in
   `./ffbuild/config.mak` (-: the original line, +: what to change it into):

    ```
    -CC=gcc
    +CC=afl-clang-lto
    -CXX=g++
    +CXX=afl-clang-lto++
    -AS=gcc
    +AS=llvm-as
    -LD=gcc
    +LD=afl-clang-lto++
    -DEPCC=gcc
    +DEPCC=afl-clang-lto
    -DEPAS=gcc
    +DEPAS=afl-clang-lto++
    -AR=ar
    +AR=llvm-ar
    -AR_CMD=ar
    +AR_CMD=llvm-ar
    -NM_CMD=nm -g
    +NM_CMD=llvm-nm -g
    -RANLIB=ranlib -D
    +RANLIB=llvm-ranlib -D
    ```

4. Then type `make`, wait for a long time, and you are done. :)
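
If you rebuild often, the `config.mak` edits from step 3 can be scripted. The
following is a sketch only: the function name `patch_config_mak` is
hypothetical, and it assumes the generated file contains the original lines
exactly as listed in step 3 (any line that differs is left untouched).

```sh
# Rewrite the tool variables in an ffmpeg config.mak to the afl/llvm tools.
# $1 = path to the file (for ffmpeg: ./ffbuild/config.mak)
patch_config_mak() {
    # Each pattern is anchored so that only the exact original line matches.
    sed \
        -e 's/^CC=gcc$/CC=afl-clang-lto/' \
        -e 's/^CXX=g++$/CXX=afl-clang-lto++/' \
        -e 's/^AS=gcc$/AS=llvm-as/' \
        -e 's/^LD=gcc$/LD=afl-clang-lto++/' \
        -e 's/^DEPCC=gcc$/DEPCC=afl-clang-lto/' \
        -e 's/^DEPAS=gcc$/DEPAS=afl-clang-lto++/' \
        -e 's/^AR=ar$/AR=llvm-ar/' \
        -e 's/^AR_CMD=ar$/AR_CMD=llvm-ar/' \
        -e 's/^NM_CMD=nm -g$/NM_CMD=llvm-nm -g/' \
        -e 's/^RANLIB=ranlib -D$/RANLIB=llvm-ranlib -D/' \
        "$1" > "$1.new" && mv "$1.new" "$1"
}
```

Run it as `patch_config_mak ./ffbuild/config.mak` after `./configure` and
before `make`, then check the result by hand, since a different ffmpeg version
or host toolchain may produce different original values.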

### Example: WebKit jsc

Building jsc is difficult as the build script has bugs.

1. Check out WebKit:

    ```
    svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit
    cd WebKit
    ```

2. Fix the build environment:

    ```
    mkdir -p WebKitBuild/Release
    cd WebKitBuild/Release
    ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12
    ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12
    cd ../..
    ```

3. Build. :)

    ```
    Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON"
    ```

## Potential issues

### Compiling libraries fails

If you see this message:

```
/bin/ld: libfoo.a: error adding symbols: archive has no index; run ranlib to add one
```

This usually happens because GNU ranlib is called, which cannot deal with
clang LTO files. The solution is simple: when you run `./configure`, you also
have to set `RANLIB=llvm-ranlib` and `AR=llvm-ar`.

Solution:

```
AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared
```

On some targets, you have to set `AR=` and `RANLIB=` for `make` as well, because
the configure script does not save them. Other targets ignore environment
variables and need the parameters set via
`./configure --cc=... --cxx= --ranlib= ...` etc.
(I am looking at you, ffmpeg!)

If you see this message:

```
assembler command failed ...
```

Then try setting `AS=llvm-as` for configure:

```
AS=llvm-as  ...
```

### Compiling programs still fails

afl-clang-lto is still a work in progress.

Known issues:
* Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either -
  obviously.
* Anything that does not compile with LTO, afl-clang-lto cannot compile either -
  obviously.

Hence, if building a target with afl-clang-lto fails, try to build it with
llvm 12 and LTO enabled (`CC=clang-12`, `CXX=clang++-12`, `CFLAGS=-flto=full`,
and `CXXFLAGS=-flto=full`).

If this succeeds, then there is an issue with afl-clang-lto. Please report at
[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226).

Even some targets where clang-12 fails can be built if the failure is just in
`./configure`, see `Solving difficult targets` above.

## History

This was originally envisioned by hexcoder- in Summer 2019. However, we saw no
way to create a pass that is run at link time - although there is an option for
this in the PassManager: EP_FullLinkTimeOptimizationLast. ("Fun" info - nobody
knows what this is doing. And the developer who implemented this didn't respond
to emails.)

In December came the idea to implement this as a pass that is run via the
llvm "opt" program, which is performed via our own linker that afterwards calls
the real linker. This was first implemented in January and worked ... kinda. The
LTO time instrumentation worked, however, "how" the basic blocks were
instrumented was a problem, as reducing duplicates turned out to be very, very
difficult with a program that has so many paths and therefore so many
dependencies. A lot of strategies were implemented - and failed. And then SAT
solvers were tried, but with over 10,000 variables that turned out to be a
dead-end too.

The final idea to solve this came from domenukk who proposed to insert a block
into an edge and then just use incremental counters ... and this worked! After
some trial and error implementing this, vanhauser-thc found out that there is
actually an llvm function for this: SplitEdge() :-)

Still more problems came up though as this only works without bugs from llvm 9
onwards, and with high optimization the link optimization ruins the instrumented
control flow graph.

This is all now fixed with llvm 11+. llvm's own linker is now able to load
passes and this bypasses all problems we had.

Happy end :)