Platform: NVIDIA CUDA
  Device: Tesla K80
    Driver version  : 455.32.00 (Linux x64)
    Compute units   : 13
    Clock frequency : 823 MHz

    Global memory bandwidth (GBPS)
      float   : 147.19
      float2  : 148.99
      float4  : 152.33
      float8  : 141.67
      float16 : 68.77

    Single-precision compute (GFLOPS)
      float   : 2835.78
      float2  : 2834.16
      float4  : 3700.81
      float8  : 3518.41
      float16 : 3288.67

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 1400.02
      double2  : 1399.04
      double4  : 1394.24
      double8  : 1396.52
      double16 : 1386.00

    Integer compute (GIOPS)
      int   : 711.60
      int2  : 711.39
      int4  : 711.65
      int8  : 711.87
      int16 : 711.75

    Integer compute Fast 24bit (GIOPS)
      int   : 711.52
      int2  : 711.37
      int4  : 711.58
      int8  : 711.36
      int16 : 709.81

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 9.35
      enqueueReadBuffer               : 11.89
      enqueueWriteBuffer non-blocking : 8.42
      enqueueReadBuffer non-blocking  : 11.13
      enqueueMapBuffer(for read)      : 9.99
        memcpy from mapped ptr        : 9.80
      enqueueUnmap(after write)       : 12.05
        memcpy to mapped ptr          : 9.57

    Kernel launch latency : 8.20 us