Platform: NVIDIA CUDA
  Device: GeForce GT 650M
    Driver version: 331.82 (Win32)

    Global memory bandwidth (GBPS)
      float   : 25.16
      float2  : 26.84
      float4  : 26.84
      float8  : 13.42
      float16 : 12.39

    Single-precision compute (GFLOPS)
      float   : 486.09
      float2  : 644.06
      float4  : 613.40
      float8  : 613.40
      float16 : 613.40

    Double-precision compute (GFLOPS)
      double   : 30.38
      double2  : 30.34
      double4  : 30.27
      double8  : 30.13
      double16 : 29.92

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.36
      enqueueReadBuffer          : 4.29
      enqueueMapBuffer(for read) : 2.95
        memcpy from mapped ptr   : 8.01
      enqueueUnmap(after write)  : 4.33
        memcpy to mapped ptr     : 8.26

    Kernel launch latency : 25.48 us