Platform: NVIDIA CUDA
  Device: Tesla P40
    Driver version  : 550.54.14 (Linux x64)
    Compute units   : 30
    Clock frequency : 1531 MHz

    Global memory bandwidth (GBPS)
      float   : 282.85
      float2  : 294.10
      float4  : 301.39
      float8  : 279.29
      float16 : 193.72

    Single-precision compute (GFLOPS)
      float   : 11153.70
      float2  : 11505.40
      float4  : 11475.82
      float8  : 11410.92
      float16 : 11367.69

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 367.62
      double2  : 367.05
      double4  : 366.32
      double8  : 365.52
      double16 : 362.97

    Integer compute (GIOPS)
      int   : 3897.08
      int2  : 3889.65
      int4  : 3904.29
      int8  : 3610.75
      int16 : 3540.68

    Integer compute Fast 24bit (GIOPS)
      int   : 3895.72
      int2  : 3901.65
      int4  : 3895.32
      int8  : 3882.49
      int16 : 3866.57

    Integer char (8bit) compute (GIOPS)
      char   : 10813.47
      char2  : 11447.82
      char4  : 11485.37
      char8  : 11522.07
      char16 : 11404.32

    Integer short (16bit) compute (GIOPS)
      short   : 10708.50
      short2  : 11449.04
      short4  : 11481.69
      short8  : 11518.50
      short16 : 11333.30

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.17
      enqueueReadBuffer               : 6.45
      enqueueWriteBuffer non-blocking : 5.68
      enqueueReadBuffer non-blocking  : 6.37
      enqueueMapBuffer(for read)      : 5.75
        memcpy from mapped ptr        : 9.36
      enqueueUnmap(after write)       : 6.27
        memcpy to mapped ptr          : 9.36

    Kernel launch latency : 3.78 us