Platform: NVIDIA CUDA
  Device: Tesla V100-PCIE-32GB
    Driver version  : 455.23.05 (Linux x64)
    Compute units   : 80
    Clock frequency : 1380 MHz

    Global memory bandwidth (GBPS)
      float   : 716.38
      float2  : 765.67
      float4  : 810.35
      float8  : 723.85
      float16 : 750.17

    Single-precision compute (GFLOPS)
      float   : 14098.15
      float2  : 14135.97
      float4  : 14095.57
      float8  : 14049.00
      float16 : 13934.45

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7075.81
      double2  : 7065.56
      double4  : 7046.01
      double8  : 7013.68
      double16 : 6951.51

    Integer compute (GIOPS)
      int   : 14069.94
      int2  : 14118.04
      int4  : 14121.60
      int8  : 14124.16
      int16 : 14099.04

    Integer compute Fast 24bit (GIOPS)
      int   : 14077.32
      int2  : 14119.12
      int4  : 14122.14
      int8  : 14113.63
      int16 : 14104.60

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 12.06
      enqueueReadBuffer               : 10.64
      enqueueWriteBuffer non-blocking : 10.72
      enqueueReadBuffer non-blocking  : 8.13
      enqueueMapBuffer(for read)      : 10.25
        memcpy from mapped ptr        : 17.55
      enqueueUnmap(after write)       : 12.59
        memcpy to mapped ptr          : 18.20

    Kernel launch latency : 7.88 us