Platform: NVIDIA CUDA
  Device: Graphics Device
    Driver version  : 378.13 (Linux x64)
    Compute units   : 28
    Clock frequency : 1683 MHz

    Global memory bandwidth (GBPS)
      float   : 389.99
      float2  : 394.86
      float4  : 410.15
      float8  : 388.05
      float16 : 263.58

    Single-precision compute (GFLOPS)
      float   : 11675.87
      float2  : 13240.07
      float4  : 13317.21
      float8  : 13151.05
      float16 : 12939.08

    Double-precision compute (GFLOPS)
      double   : 425.21
      double2  : 432.63
      double4  : 425.45
      double8  : 420.62
      double16 : 409.39

    Integer compute (GIOPS)
      int   : 3507.68
      int2  : 3801.87
      int4  : 3772.84
      int8  : 3774.45
      int16 : 3748.59

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 9.96
      enqueueReadBuffer          : 8.95
      enqueueMapBuffer(for read) : 11.11
        memcpy from mapped ptr   : 12.16
      enqueueUnmap(after write)  : 12.40
        memcpy to mapped ptr     : 12.48

    Kernel launch latency : 4.22 us