Name                            Date          Size      #Lines   LOC
README                          03-May-2024    2.2 KiB      22    15
benchmark.h                     03-May-2024    1.5 KiB      50    33
benchmark_main.cc               03-May-2024    6.7 KiB     238   209
contraction_benchmarks_cpu.cc   03-May-2024    1.4 KiB      40    25
tensor_benchmarks.h             03-May-2024   15.8 KiB     479   377
tensor_benchmarks_cpu.cc        03-May-2024    6.1 KiB     169   129
tensor_benchmarks_fp16_gpu.cu   03-May-2024    3.3 KiB      78    54
tensor_benchmarks_gpu.cu        03-May-2024    3.3 KiB      76    61
tensor_benchmarks_sycl.cc       03-May-2024    1.1 KiB      38    31

README

The tensor benchmark suite is made of several parts.

The first part is a generic suite, in which each benchmark comes in two flavors: one that runs on the CPU and one that runs on the GPU.
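
Judging from the file names, the shared benchmark code probably lives in tensor_benchmarks.h, while tensor_benchmarks_cpu.cc and tensor_benchmarks_gpu.cu instantiate it once per device. The sketch below illustrates that "write once, instantiate per device" pattern in plain C++. It is not Eigen's benchmark code: `SerialDevice`, `ThreadedDevice`, and `run_axpy_benchmark` are hypothetical names, and a multi-threaded CPU device stands in for the GPU so that the example runs on any machine.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical device that runs every work item on the calling thread.
struct SerialDevice {
  template <typename F>
  void parallel_for(std::size_t n, F f) const {
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};

// Hypothetical device that splits the work items across std::threads.
// It stands in for a GPU so that the sketch stays portable.
struct ThreadedDevice {
  unsigned num_threads;

  template <typename F>
  void parallel_for(std::size_t n, F f) const {
    assert(num_threads > 0);
    std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
      std::size_t begin = t * chunk;
      std::size_t end = std::min(n, begin + chunk);
      if (begin >= end) break;  // fewer work items than threads
      workers.emplace_back([=] {
        for (std::size_t i = begin; i < end; ++i) f(i);
      });
    }
    for (std::thread& w : workers) w.join();
  }
};

// One benchmark body, written once and instantiated for each device:
// runs y = a * x + y `iters` times and returns the elapsed wall-clock seconds.
template <typename Device>
double run_axpy_benchmark(const Device& device, std::vector<float>& y,
                          const std::vector<float>& x, float a, int iters) {
  auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it) {
    device.parallel_for(y.size(), [&](std::size_t i) { y[i] += a * x[i]; });
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

With `std::vector<float> x(n, 1.0f), y(n, 0.0f)`, calling `run_axpy_benchmark(SerialDevice{}, y, x, 2.0f, 10)` or `run_axpy_benchmark(ThreadedDevice{4}, y, x, 2.0f, 10)` times the same kernel on either hypothetical device; only the device type changes between the two flavors. Link with `-pthread`, as the CPU build commands in this README do.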

To compile the floating-point CPU benchmarks, run:
g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu

To compile the floating-point GPU benchmarks, run:
nvcc tensor_benchmarks_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_35 -o benchmarks_gpu

We also provide a version of the generic GPU tensor benchmarks that uses half-precision floats (fp16) instead of regular single-precision floats. Running them requires a GPU with compute capability 5.3 or higher, and compiling them requires nvcc 7.5 or higher. To compile these benchmarks, run:
nvcc tensor_benchmarks_fp16_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -use_fast_math -ftz=true -arch compute_53 -o benchmarks_fp16_gpu

Last but not least, we also provide a suite of benchmarks that measures how well the contraction code scales on the CPU. To compile these benchmarks, run:
g++ contraction_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o contraction_benchmarks_cpu
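
A scalability benchmark runs the same contraction with an increasing number of threads and compares the timings: the speedup on p threads is T(1)/T(p), and the parallel efficiency is that speedup divided by p. The sketch below shows this measurement for a naive matrix product, the simplest tensor contraction, C(i,j) = sum over k of A(i,k) * B(k,j), parallelized over rows with std::thread. It is an illustration under stated assumptions, not the code in contraction_benchmarks_cpu.cc (which exercises Eigen's own contraction); `parallel_matmul`, `time_matmul`, `speedup`, and `efficiency` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Row-major naive contraction C(i,j) = sum_k A(i,k) * B(k,j), with the rows
// of C split evenly across `num_threads` worker threads (hypothetical helper).
std::vector<double> parallel_matmul(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t m, std::size_t k, std::size_t n,
                                    unsigned num_threads) {
  assert(num_threads > 0);
  std::vector<double> c(m * n, 0.0);
  // Each worker writes only its own rows of C, so no locking is needed.
  auto work = [&](std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i)
      for (std::size_t p = 0; p < k; ++p)
        for (std::size_t j = 0; j < n; ++j)
          c[i * n + j] += a[i * k + p] * b[p * n + j];
  };
  std::size_t chunk = (m + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    std::size_t begin = t * chunk;
    std::size_t end = std::min(m, begin + chunk);
    if (begin >= end) break;  // fewer rows than threads
    workers.emplace_back(work, begin, end);
  }
  for (std::thread& w : workers) w.join();
  return c;
}

// Wall-clock seconds for one size x size product with the given thread count.
double time_matmul(std::size_t size, unsigned num_threads) {
  std::vector<double> a(size * size, 1.0), b(size * size, 1.0);
  auto start = std::chrono::steady_clock::now();
  std::vector<double> c = parallel_matmul(a, b, size, size, size, num_threads);
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  (void)c;  // only the timing matters here
  return elapsed.count();
}

// Speedup relative to one thread, and efficiency = speedup / threads.
double speedup(double t1, double tp) { return t1 / tp; }
double efficiency(double t1, double tp, unsigned threads) {
  return speedup(t1, tp) / threads;
}
```

A driver could call `time_matmul(512, p)` for p = 1, 2, 4, ... up to the number of cores and report `speedup(t1, tp)` and `efficiency(t1, tp, p)`; an efficiency near 1 means near-linear scaling. Splitting over rows is just one simple partitioning choice; it needs no synchronization because each thread writes a disjoint set of rows of C.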

To compile the SYCL benchmarks with ComputeCpp, you currently need two passes (only for translation units that contain device code):
1. The device compilation pass, which generates the device code (SYCL kernels and the device functions they reference) together with the glue code the host compiler needs to reference that device code from host code:
{ComputeCpp_ROOT}/bin/compute++ -I ../../ -I {ComputeCpp_ROOT}/include/ -std=c++11 -mllvm -inline-threshold=1000 -Wno-ignored-attributes -sycl -intelspirmetadata -emit-llvm -no-serial-memop -sycl-compress-name -DBUILD_PLATFORM_SPIR -DNDEBUG -O3 -c tensor_benchmarks_sycl.cc
2. The host compilation pass, which generates the final host binary:
clang++-3.7 -include tensor_benchmarks_sycl.sycl benchmark_main.cc tensor_benchmarks_sycl.cc -pthread -I ../../ -I {ComputeCpp_ROOT}/include/ -L {ComputeCpp_ROOT}/lib/ -lComputeCpp -lOpenCL -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 -o tensor_benchmark_sycl