Platform: NVIDIA CUDA
  Device: Tesla K40c
    Driver version  : 355.11 (Linux x64)
    Compute units   : 15
    Clock frequency : 745 MHz

    Global memory bandwidth (GBPS)
      float   : 207.57
      float2  : 213.87
      float4  : 218.45
      float8  : 178.41
      float16 : 127.86

    Single-precision compute (GFLOPS)
      float   : 2798.13
      float2  : 3495.13
      float4  : 3340.98
      float8  : 3481.63
      float16 : 3168.49

    Double-precision compute (GFLOPS)
      double   : 1427.69
      double2  : 1426.97
      double4  : 1423.42
      double8  : 1418.63
      double16 : 1408.26

    Integer compute (GIOPS)
      int   : 715.75
      int2  : 715.52
      int4  : 715.50
      int8  : 715.39
      int16 : 715.43

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 5.93
      enqueueReadBuffer          : 5.86
      enqueueMapBuffer(for read) : 5.85
        memcpy from mapped ptr   : 4.80
      enqueueUnmap(after write)  : 6.16
        memcpy to mapped ptr     : 4.84

    Kernel launch latency : 9.27 us