This folder contains an implementation of [automemcpy: A framework for automatic generation of fundamental memory operations](https://research.google/pubs/pub50338/).

It uses the [Z3 theorem prover](https://github.com/Z3Prover/z3) to enumerate a subset of valid memory function implementations. These implementations are then materialized as C++ code and can be [benchmarked](../) against various [size distributions](../distributions). This process helps design efficient implementations for a particular environment (size distribution, processor, or custom compilation options).

This is not enabled by default, as it is mostly useful when tuning the library implementation. To build it, use `LIBC_BUILD_AUTOMEMCPY=ON` (see below).

## Prerequisites

You may need to install `Z3` from source if it's not available on your system.
Here we show instructions to install it into `<Z3_INSTALL_DIR>`.
You may need `sudo` to run `make install`.

```shell
mkdir -p ~/git
cd ~/git
git clone https://github.com/Z3Prover/z3.git
cd z3
python scripts/mk_make.py --prefix=<Z3_INSTALL_DIR>
cd build
make -j
make install
```

## Configuration

```shell
mkdir -p <BUILD_DIR>
cd <LLVM_PROJECT_DIR>/llvm
cmake -DCMAKE_C_COMPILER=/usr/bin/clang \
      -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
      -DLLVM_ENABLE_PROJECTS="libc" \
      -DLLVM_ENABLE_Z3_SOLVER=ON \
      -DLLVM_Z3_INSTALL_DIR=<Z3_INSTALL_DIR> \
      -DLIBC_BUILD_AUTOMEMCPY=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -B<BUILD_DIR>
```

## Targets and compilation

There are three main CMake targets:
 1. `automemcpy_implementations`
    - runs `Z3` and materializes valid memory functions as C++ code; a message displays their on-disk location.
    - the source code is then compiled using the native host optimizations (i.e. `-march=native` or `-mcpu=native` depending on the architecture).
 2. `automemcpy`
    - the binary that benchmarks the autogenerated implementations.
 3. `automemcpy_result_analyzer`
    - the binary that analyzes the benchmark results.

You only need to build the two binaries, as they both pull in the autogenerated code as a dependency.

```shell
make -C <BUILD_DIR> -j automemcpy automemcpy_result_analyzer
```

## Running the benchmarks

Make sure to save the results of the benchmark as a JSON file.

```shell
<BUILD_DIR>/bin/automemcpy --benchmark_out_format=json --benchmark_out=<RESULTS_DIR>/results.json
```

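Since the analysis works best with several runs, it can be convenient to script them. The following is a minimal sketch that only builds the command lines, one per run; the function name `benchmark_commands` and the `results_<i>.json` naming scheme are hypothetical and not part of the tooling.

```python
from pathlib import Path


def benchmark_commands(build_dir, results_dir, runs=3):
    """Return one automemcpy command line per run, each writing its own JSON file."""
    binary = Path(build_dir) / "bin" / "automemcpy"
    commands = []
    for index in range(runs):
        # Hypothetical naming scheme: results_0.json, results_1.json, ...
        output = Path(results_dir) / f"results_{index}.json"
        commands.append(
            [
                str(binary),
                "--benchmark_out_format=json",
                f"--benchmark_out={output}",
            ]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)`; keep the machine in a comparable state between runs.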
### Additional useful options

 - `--benchmark_min_time=.2`

   By default, each function is benchmarked for at least one second; here we lower it to 200ms.

 - `--benchmark_filter="BM_Memset|BM_Bzero"`

   By default, all functions are benchmarked; here we restrict them to `memset` and `bzero`.

Other options might be useful; use `--help` for more information.

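Before running the analysis, it can be useful to check that a results file contains what you expect. The sketch below groups the recorded throughputs by benchmark name. It assumes the usual Google Benchmark JSON layout (a top-level `benchmarks` list whose entries have a `name`) and that throughput is stored under `bytes_per_second`; the helper name `throughputs_by_benchmark` is hypothetical.

```python
import json
from collections import defaultdict


def throughputs_by_benchmark(path):
    """Map each benchmark name to the list of throughputs recorded in `path`."""
    with open(path) as handle:
        report = json.load(handle)
    grouped = defaultdict(list)
    for entry in report.get("benchmarks", []):
        # Assumption: throughput is reported in the "bytes_per_second" field.
        if "bytes_per_second" in entry:
            grouped[entry["name"]].append(entry["bytes_per_second"])
    return dict(grouped)
```

An empty result usually means the file was not produced with `--benchmark_out_format=json` or that the filter matched nothing.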
## Analyzing the benchmarks

Analysis is performed by running `automemcpy_result_analyzer` on one or more JSON result files.

```shell
<BUILD_DIR>/bin/automemcpy_result_analyzer <RESULTS_DIR>/results.json
```

What it does:
 1. Gathers all throughput values for each function / distribution pair and picks the median one.\
    This allows picking a representative value over many runs of the benchmark. Please make sure all the runs happen under similar circumstances.

 2. For each distribution, looks at the span of throughputs for functions of the same type (e.g. for distribution `A`, `memcpy` throughput spans from 2GiB/s to 5GiB/s).

 3. For each distribution, gives a normalized score to each function (e.g. for distribution `A`, function `M` scores 0.65).\
    This score is then turned into a grade (`EXCELLENT`, `VERY_GOOD`, `GOOD`, `PASSABLE`, `INADEQUATE`, `MEDIOCRE`, `BAD`), so that each distribution categorizes how well each function performs on it.

 4. Uses a [Majority Judgement](https://en.wikipedia.org/wiki/Majority_judgment) process to categorize each function. This enables finer analysis of how distributions agree on which function is better. In the following example, `Function_1` and `Function_2` are both rated `EXCELLENT`, but looking at the distribution of grades might help decide which is best.

|            | EXCELLENT | VERY_GOOD | GOOD | PASSABLE | INADEQUATE | MEDIOCRE | BAD |
|------------|:---------:|:---------:|:----:|:--------:|:----------:|:--------:|:---:|
| Function_1 |     7     |     1     |   2  |          |            |          |     |
| Function_2 |     6     |     4     |      |          |            |          |     |

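To make the comparison in the table concrete, here is a minimal sketch of the textbook Majority Judgement rule applied to those two tallies. It is an illustration only: whether the analyzer breaks ties this way is an assumption, and the helper names are hypothetical. The majority grade is the lower-middle grade of the sorted votes, and ties are broken by repeatedly removing the median vote and comparing the resulting sequences.

```python
GRADES = ["EXCELLENT", "VERY_GOOD", "GOOD", "PASSABLE", "INADEQUATE", "MEDIOCRE", "BAD"]


def expand(tally):
    """Turn a {grade: count} histogram into a list of grade indices, best first."""
    votes = []
    for index, grade in enumerate(GRADES):
        votes.extend([index] * tally.get(grade, 0))
    return votes


def majority_grade(tally):
    """Lower-middle grade of the votes (the worse of the two middles when even)."""
    votes = expand(tally)
    return GRADES[votes[len(votes) // 2]]


def majority_value(tally):
    """Grade indices obtained by repeatedly removing the median vote."""
    votes = expand(tally)
    sequence = []
    while votes:
        sequence.append(votes.pop(len(votes) // 2))
    return sequence


# Tallies taken from the table above.
function_1 = {"EXCELLENT": 7, "VERY_GOOD": 1, "GOOD": 2}
function_2 = {"EXCELLENT": 6, "VERY_GOOD": 4}

# A smaller sequence means better grades, so the smallest one ranks first.
ranking = sorted(
    {"Function_1": function_1, "Function_2": function_2}.items(),
    key=lambda item: majority_value(item[1]),
)
```

Under this rule both functions have the majority grade `EXCELLENT`, and `Function_1` ranks first because it keeps that grade one removal longer than `Function_2`.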
The tool outputs the histogram of grades for each function. In case of a tie, other dimensions might help decide (e.g. code size, performance on other microarchitectures).

```
EXCELLENT  |█▁▂    | Function_0
EXCELLENT  |█▅     | Function_1
VERY_GOOD  |▂█▁ ▁  | Function_2
GOOD       | ▁█▄   | Function_3
PASSABLE   | ▂▆▄█  | Function_4
INADEQUATE |  ▃▃█▁ | Function_5
MEDIOCRE   |    █▆▁| Function_6
BAD        |    ▁▁█| Function_7
```
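
Steps 1 to 3 of the analysis can be sketched as follows for a single distribution and a single function type. This is a minimal sketch under stated assumptions, not the analyzer's code: the score is assumed to be a min-max normalization over the span of median throughputs, and the evenly spaced grade buckets in `grade` are hypothetical.

```python
from statistics import median

GRADES = ["EXCELLENT", "VERY_GOOD", "GOOD", "PASSABLE", "INADEQUATE", "MEDIOCRE", "BAD"]


def normalized_scores(samples):
    """samples maps a function name to its throughputs over several runs."""
    # Step 1: one representative (median) throughput per function.
    medians = {name: median(values) for name, values in samples.items()}
    # Step 2: span of throughputs across the functions.
    low, high = min(medians.values()), max(medians.values())
    span = high - low
    # Step 3: assumed min-max normalization into [0, 1].
    return {
        name: (value - low) / span if span else 1.0
        for name, value in medians.items()
    }


def grade(score):
    """Map a score in [0, 1] to a grade using hypothetical, evenly spaced buckets."""
    index = min(int((1.0 - score) * len(GRADES)), len(GRADES) - 1)
    return GRADES[index]


# Throughputs in GiB/s, chosen to match the 2GiB/s to 5GiB/s example above.
samples = {
    "slowest": [2.0, 2.0, 2.1],
    "M": [3.9, 3.95, 4.0],
    "fastest": [4.9, 5.0, 5.1],
}
scores = normalized_scores(samples)
```

With these numbers, function `M` has a median of 3.95GiB/s and therefore scores about 0.65, matching the example in step 3.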
112