Platform: NVIDIA CUDA
  Device: NVIDIA GeForce GTX 1660 Ti
    Driver version  : 565.57.01 (Linux x64)
    Compute units   : 24
    Clock frequency : 1590 MHz

    Global memory bandwidth (GBPS)
      float   : 235.92
      float2  : 247.28
      float4  : 260.64
      float8  : 254.10
      float16 : 217.35

    Single-precision compute (GFLOPS)
      float   : 5692.43
      float2  : 5705.85
      float4  : 5697.71
      float8  : 5497.52
      float16 : 4822.71

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 166.56
      double2  : 169.71
      double4  : 151.43
      double8  : 152.88
      double16 : 163.43

    Integer compute (GIOPS)
      int   : 5009.23
      int2  : 5025.67
      int4  : 4511.78
      int8  : 4535.21
      int16 : 4828.46

    Integer compute Fast 24bit (GIOPS)
      int   : 5030.41
      int2  : 5000.83
      int4  : 5002.84
      int8  : 4461.20
      int16 : 4415.56

    Integer char (8bit) compute (GIOPS)
      char   : 4137.69
      char2  : 4238.38
      char4  : 4174.55
      char8  : 4234.00
      char16 : 3432.68

    Integer short (16bit) compute (GIOPS)
      short   : 4185.20
      short2  : 4014.07
      short4  : 4125.94
      short8  : 3622.42
      short16 : 3496.44

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.85
      enqueueReadBuffer               : 6.92
      enqueueWriteBuffer non-blocking : 6.14
      enqueueReadBuffer non-blocking  : 6.08
      enqueueMapBuffer(for read)      : 9.77
      memcpy from mapped ptr          : 11.68
      enqueueUnmap(after write)       : 12.33
      memcpy to mapped ptr            : 11.99

    Kernel launch latency : 4.14 us