Platform: NVIDIA CUDA
  Device: NVIDIA GeForce GTX 1660 Ti
    Driver version  : 565.57.01 (Linux x64)
    Compute units   : 24
    Clock frequency : 1590 MHz

    Global memory bandwidth (GBPS)
      float   : 235.92
      float2  : 247.28
      float4  : 260.64
      float8  : 254.10
      float16 : 217.35

    Single-precision compute (GFLOPS)
      float   : 5692.43
      float2  : 5705.85
      float4  : 5697.71
      float8  : 5497.52
      float16 : 4822.71

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 166.56
      double2  : 169.71
      double4  : 151.43
      double8  : 152.88
      double16 : 163.43

    Integer compute (GIOPS)
      int   : 5009.23
      int2  : 5025.67
      int4  : 4511.78
      int8  : 4535.21
      int16 : 4828.46

    Integer compute Fast 24bit (GIOPS)
      int   : 5030.41
      int2  : 5000.83
      int4  : 5002.84
      int8  : 4461.20
      int16 : 4415.56

    Integer char (8bit) compute (GIOPS)
      char   : 4137.69
      char2  : 4238.38
      char4  : 4174.55
      char8  : 4234.00
      char16 : 3432.68

    Integer short (16bit) compute (GIOPS)
      short   : 4185.20
      short2  : 4014.07
      short4  : 4125.94
      short8  : 3622.42
      short16 : 3496.44

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.85
      enqueueReadBuffer               : 6.92
      enqueueWriteBuffer non-blocking : 6.14
      enqueueReadBuffer non-blocking  : 6.08
      enqueueMapBuffer(for read)      : 9.77
        memcpy from mapped ptr        : 11.68
      enqueueUnmap(after write)       : 12.33
        memcpy to mapped ptr          : 11.99

    Kernel launch latency : 4.14 us