Platform: NVIDIA CUDA
  Device: Tesla T4
    Driver version  : 560.35.03 (Linux x64)
    Compute units   : 40
    Clock frequency : 1590 MHz

    Global memory bandwidth (GBPS)
      float   : 235.00
      float2  : 247.01
      float4  : 253.11
      float8  : 263.44
      float16 : 252.38

    Single-precision compute (GFLOPS)
      float   : 8030.45
      float2  : 8034.32
      float4  : 7985.38
      float8  : 7848.48
      float16 : 7651.69

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 256.45
      double2  : 256.03
      double4  : 253.74
      double8  : 252.76
      double16 : 251.68

    Integer compute (GIOPS)
      int   : 5802.79
      int2  : 5715.24
      int4  : 5742.30
      int8  : 5863.19
      int16 : 5711.99

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 4.73
      enqueueReadBuffer               : 4.78
      enqueueMapBuffer(for read)      : 8.73
        memcpy from mapped ptr        : 5.39
      enqueueUnmap(after write)       : 12.17
        memcpy to mapped ptr          : 5.39

    Kernel launch latency : 5.82 us