| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | - | - | | |
| BUILD | 03-May-2024 | 7.5 KiB | 308 | 282 |
| README.md | 03-May-2024 | 1.6 KiB | 59 | 43 |
| benchmark.cc | 03-May-2024 | 5.7 KiB | 154 | 112 |
| benchmark.h | 03-May-2024 | 7.4 KiB | 180 | 104 |
| benchmark_mlir_function.cc | 03-May-2024 | 7.9 KiB | 214 | 129 |
| benchmark_mlir_function.h | 03-May-2024 | 3.8 KiB | 85 | 47 |
| compute_function_benchmark.cc | 03-May-2024 | 19.7 KiB | 469 | 420 |
| cwise_op_exp_benchmark.cc | 03-May-2024 | 2.2 KiB | 81 | 53 |
| cwise_op_expm1_benchmark.cc | 03-May-2024 | 1.5 KiB | 51 | 26 |
| cwise_op_fusion_benchmark.cc | 03-May-2024 | 2 KiB | 56 | 34 |
| cwise_op_log1p_benchmark.cc | 03-May-2024 | 1.5 KiB | 50 | 26 |
| cwise_op_log2_benchmark.cc | 03-May-2024 | 1.9 KiB | 66 | 40 |
| cwise_op_log_benchmark.cc | 03-May-2024 | 1.5 KiB | 51 | 26 |
| cwise_op_rsqrt_benchmark.cc | 03-May-2024 | 1.7 KiB | 58 | 32 |
| cwise_op_sigmoid_benchmark.cc | 03-May-2024 | 1.5 KiB | 51 | 27 |
| cwise_op_tanh_benchmark.cc | 03-May-2024 | 1.5 KiB | 51 | 26 |
| cwise_op_unary_benchmark.h | 03-May-2024 | 13.4 KiB | 293 | 203 |
| fused_reduction_benchmark.cc | 03-May-2024 | 3.4 KiB | 109 | 78 |
| matmul_op_benchmark.cc | 03-May-2024 | 1.6 KiB | 51 | 28 |
| matmul_op_benchmark.h | 03-May-2024 | 6.9 KiB | 181 | 104 |
| mean_row_op_benchmark.cc | 03-May-2024 | 2.4 KiB | 69 | 47 |
| reduction_benchmark.cc | 03-May-2024 | 3.1 KiB | 82 | 57 |
| reduction_benchmark.h | 03-May-2024 | 5.1 KiB | 121 | 90 |
| softmax_op_benchmark.cc | 03-May-2024 | 5.1 KiB | 145 | 106 |
| sum_col_op_benchmark.cc | 03-May-2024 | 5.9 KiB | 145 | 98 |
| sum_full_op_benchmark.cc | 03-May-2024 | 5.5 KiB | 144 | 98 |
| sum_row_op_benchmark.cc | 03-May-2024 | 5.9 KiB | 145 | 98 |
| sum_transposed_op_benchmark.cc | 03-May-2024 | 1.8 KiB | 45 | 22 |
| transpose_op_benchmark.cc | 03-May-2024 | 10.1 KiB | 248 | 183 |

README.md

# Performance benchmarks for MLIR-based code generation

These benchmarks compare the performance of TensorFlow -> LLVM code generation
with Eigen. They are built on the Google Benchmark library and
can be integrated with performance monitoring tools.

## Running benchmarks

```
bazel run -c opt --cpu=haswell \
  :cwise_op_tanh_benchmark -- --benchmarks="f32/10k"
```
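
To run the same filter across several of the element-wise benchmarks in this directory, a small loop may be convenient. The following is only a sketch: the function name `run_cwise_benchmarks` and the `DRY_RUN` variable are hypothetical, and it assumes every target is named after its `.cc` file, the way `:cwise_op_tanh_benchmark` matches `cwise_op_tanh_benchmark.cc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this directory): run one benchmark filter
# across several cwise_op_*_benchmark targets.
# Assumption: each target is named after its .cc file.

run_cwise_benchmarks() {
  local filter="$1"
  shift
  local op
  for op in "$@"; do
    # If DRY_RUN is set to any non-empty value, print the command
    # instead of invoking bazel.
    ${DRY_RUN:+echo} bazel run -c opt --cpu=haswell \
      ":cwise_op_${op}_benchmark" -- --benchmarks="$filter"
  done
}

# Example: run_cwise_benchmarks "f32/10k" exp log tanh
```

Setting `DRY_RUN=1` first is a cheap way to check the generated command lines before spending time on optimized builds.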

## Using perf and pprof with these benchmarks

1. Record a perf profile
```
perf record -k 1 -o /tmp/perf.data --         \
  bazel run -c opt --cpu=haswell --copt=-gmlt \
  :cwise_op_tanh_benchmark -- --benchmarks="f32/10k"
```

2. Inject data from the JIT-compiled functions
```
perf inject -j -v -i /tmp/perf.data -o /tmp/perf.data.jit
```

3. Report perf data

```
perf report -i /tmp/perf.data.jit
```

or

```
pprof -flame -nodecount=10000 /tmp/perf.data.jit
```
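
The record, inject, and report steps can be chained in a small wrapper. This is a minimal sketch rather than a tool shipped with these benchmarks: the names `run` and `profile_benchmark`, the `DRY_RUN` variable, and the default output path are all assumptions made for illustration, and the `bazel` line uses the double-dash `--copt` spelling.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this directory) around the three
# perf steps: record -> inject JIT symbols -> report.

run() {
  # Print the command when DRY_RUN is non-empty, otherwise execute it.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

profile_benchmark() {
  local target="$1"                  # e.g. :cwise_op_tanh_benchmark
  local filter="$2"                  # e.g. f32/10k
  local data="${3:-/tmp/perf.data}"  # assumed default output path

  # Each step runs only if the previous one succeeded.
  run perf record -k 1 -o "$data" -- \
      bazel run -c opt --cpu=haswell --copt=-gmlt \
      "$target" -- --benchmarks="$filter" &&
    run perf inject -j -v -i "$data" -o "$data.jit" &&
    run perf report -i "$data.jit"
}

# Example: profile_benchmark :cwise_op_tanh_benchmark "f32/10k"
```

Swapping the last `perf report` line for the `pprof` invocation gives the flame-graph variant.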

<!-- BEGIN GOOGLE-INTERNAL -->
## Running benchmarks using perflab and benchy

1. go/benchy
2. go/perflab

```
benchy                                                                        \
  --reference=${reference} --cpu=haswell --runs=20 --benchmarks=all           \
  --perflab --borg_constraints="platform_family_genus_cpu=indus-skylake-2000" \
  third_party/tensorflow/compiler/mlir/tfrt/benchmarks:cwise_op_tanh_benchmark
```

As of Q1 2021 `indus-skylake-2000` is the machine of the day, and roughly 60% of
the fleet cycles are executed on Skylakes.

The reference can be:

1. A CL number, to test against another pending change
2. `srcfs`, to test against the g3 head
3. Another client number, to test local changes without exporting them
<!-- END GOOGLE-INTERNAL -->