# Performance benchmarks for MLIR based code generation

These benchmarks compare the performance of TensorFlow -> LLVM code generation
with Eigen. They are based on the Google Benchmark library and can be
integrated with performance monitoring tools.

## Running benchmarks

```
bazel run -c opt --cpu=haswell \
  :cwise_op_tanh_benchmark -- --benchmarks="f32/10k"
```

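When the same invocation is needed for more than one target or filter, it can help to generate the command from a small shell function. The following is only a sketch: the function name `bench_cmd` is hypothetical and not part of this repository, and only the target, flags, and filter shown above come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the bazel
# command for one benchmark run instead of executing it.
#   $1 = bazel target
#   $2 = benchmark filter (defaults to "f32/10k", the filter used above)
bench_cmd() {
  printf 'bazel run -c opt --cpu=haswell %s -- --benchmarks="%s"\n' \
    "$1" "${2:-f32/10k}"
}

# Print the command for the tanh benchmark; copy it or pipe it to `sh` to run.
bench_cmd :cwise_op_tanh_benchmark "f32/10k"
```

Printing the command rather than running it keeps the helper easy to inspect before a long build starts.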
## Using perf and pprof with these benchmarks

1. Record perf profile
```
perf record -k 1 -o /tmp/perf.data -- \
  bazel run -c opt --cpu=haswell --copt=-gmlt \
  :cwise_op_tanh_benchmark -- --benchmarks="f32/10k"
```

2. Inject data from the JIT compiled functions
```
perf inject -j -v -i /tmp/perf.data -o /tmp/perf.data.jit
```

3. Report perf data

```
perf report -i /tmp/perf.data.jit
```

or

```
pprof -flame -nodecount=10000 /tmp/perf.data.jit
```

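The three steps above can be chained in one script. This is a minimal sketch under stated assumptions: `profile_benchmark`, `run`, `PERF_DATA`, and `DRY_RUN` are illustrative names that do not exist in this repository, and the bazel options are written in the double-dash form (`--copt=-gmlt`). With `DRY_RUN=1` the script only prints the commands, which is a cheap way to check them before profiling.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the record / inject / report steps.
# PERF_DATA and DRY_RUN are illustrative names, not defined by this repository.
PERF_DATA="${PERF_DATA:-/tmp/perf.data}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = bazel target, $2 = benchmark filter
profile_benchmark() {
  # 1. Record the profile. `-k 1` selects the monotonic clock, which
  #    `perf inject -j` needs in order to merge JIT data.
  run perf record -k 1 -o "$PERF_DATA" -- \
    bazel run -c opt --cpu=haswell --copt=-gmlt \
    "$1" -- --benchmarks="$2" || return 1
  # 2. Inject data from the JIT compiled functions.
  run perf inject -j -v -i "$PERF_DATA" -o "$PERF_DATA.jit" || return 1
  # 3. Report perf data.
  run perf report -i "$PERF_DATA.jit"
}
```

For example, `DRY_RUN=1 profile_benchmark :cwise_op_tanh_benchmark "f32/10k"` prints the three commands without running them. To view a flame graph instead, replace the last step with the `pprof` command shown above.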
<!-- BEGIN GOOGLE-INTERNAL -->
## Running benchmarks using perflab and benchy

1. go/benchy
2. go/perflab

```
benchy \
  --reference=${reference} --cpu=haswell --runs=20 --benchmarks=all \
  --perflab --borg_constraints="platform_family_genus_cpu=indus-skylake-2000" \
  third_party/tensorflow/compiler/mlir/tfrt/benchmarks:cwise_op_tanh_benchmark
```

As of Q1 2021 `indus-skylake-2000` is the machine of the day, and roughly 60% of
the fleet cycles are executed on Skylakes.

`${reference}` can be one of:

1. A CL number, to test against another pending change.
2. `srcfs`, to test against the g3 head.
3. Another client number, to test local changes without exporting them.

<!-- END GOOGLE-INTERNAL -->