Platform: NVIDIA CUDA
  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 515.48.07 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1292.96
      float2  : 1377.07
      float4  : 1419.22
      float8  : 1443.80
      float16 : 1464.44

    Single-precision compute (GFLOPS)
      float   : 19352.89
      float2  : 19386.64
      float4  : 19367.97
      float8  : 19285.78
      float16 : 19110.61

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9723.09
      double2  : 9707.90
      double4  : 9686.22
      double8  : 9636.75
      double16 : 9547.99

    Integer compute (GIOPS)
      int   : 19278.83
      int2  : 19335.19
      int4  : 19273.49
      int8  : 19357.90
      int16 : 19346.59

    Integer compute Fast 24bit (GIOPS)
      int   : 19289.10
      int2  : 19293.80
      int4  : 19278.51
      int8  : 19233.96
      int16 : 19053.93

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.54
      enqueueReadBuffer               : 13.04
      enqueueWriteBuffer non-blocking : 7.79
      enqueueReadBuffer non-blocking  : 7.21
      enqueueMapBuffer(for read)      : 20.03
        memcpy from mapped ptr        : 20.65
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.86

    Kernel launch latency : 5.76 us

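The first device's numbers can be sanity-checked against theoretical peaks derived from the reported compute units (108) and clock (1410 MHz). The sketch below does this under several assumptions that are not part of the clpeak output. It assumes the GA100 SM layout of 64 FP32 and 32 FP64 lanes per SM, one fused multiply-add (2 FLOPs) per lane per cycle, and the A100 40GB datasheet memory bandwidth of 1555 GB/s. It also assumes that clpeak's GBPS uses 10^9 bytes per GB.

```python
# Theoretical-peak sanity check for the first A100 device in the log.
# ASSUMPTIONS (not from the clpeak output): 64 FP32 and 32 FP64 lanes per SM,
# 2 FLOPs per lane per cycle (fused multiply-add), 1555 GB/s datasheet peak.

COMPUTE_UNITS = 108            # "Compute units" line in the log
CLOCK_GHZ = 1.410              # "Clock frequency : 1410 MHz"
FP32_LANES_PER_SM = 64         # assumed GA100 SM layout
FP64_LANES_PER_SM = 32         # assumed GA100 SM layout
FLOPS_PER_LANE_PER_CYCLE = 2   # one FMA counts as two floating-point operations
PEAK_BANDWIDTH_GBPS = 1555.0   # assumed A100 40GB datasheet figure


def peak_gflops(lanes_per_sm: int) -> float:
    """Peak GFLOPS = SMs * lanes per SM * FLOPs per lane per cycle * clock in GHz."""
    return COMPUTE_UNITS * lanes_per_sm * FLOPS_PER_LANE_PER_CYCLE * CLOCK_GHZ


# Best measured value per category for the first device, copied from the log,
# paired with the corresponding theoretical peak.
measured = {
    "fp32": (19386.64, peak_gflops(FP32_LANES_PER_SM)),  # float2 result
    "fp64": (9723.09, peak_gflops(FP64_LANES_PER_SM)),   # double result
    "bandwidth": (1464.44, PEAK_BANDWIDTH_GBPS),         # float16 result
}

if __name__ == "__main__":
    for name, (got, peak) in measured.items():
        print(f"{name:9s} measured {got:9.2f}  peak {peak:9.2f}  ({100 * got / peak:.1f}%)")
```

Under these assumptions, the peaks work out to 19491.84 GFLOPS for FP32 and 9745.92 GFLOPS for FP64. The first device therefore reaches about 99.5% of FP32 peak, 99.8% of FP64 peak, and 94% of the datasheet memory bandwidth.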
  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 515.48.07 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1293.90
      float2  : 1377.21
      float4  : 1419.86
      float8  : 1443.43
      float16 : 1464.50

    Single-precision compute (GFLOPS)
      float   : 19349.87
      float2  : 19381.94
      float4  : 19362.20
      float8  : 19280.28
      float16 : 19105.41

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9716.58
      double2  : 9701.24
      double4  : 9680.46
      double8  : 9631.46
      double16 : 9543.48

    Integer compute (GIOPS)
      int   : 19275.95
      int2  : 19324.45
      int4  : 19268.48
      int8  : 19351.87
      int16 : 19343.15

    Integer compute Fast 24bit (GIOPS)
      int   : 19283.22
      int2  : 19287.28
      int4  : 19272.75
      int8  : 19230.13
      int16 : 19047.88

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.50
      enqueueReadBuffer               : 13.05
      enqueueWriteBuffer non-blocking : 7.71
      enqueueReadBuffer non-blocking  : 7.27
      enqueueMapBuffer(for read)      : 19.83
        memcpy from mapped ptr        : 19.54
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.55

    Kernel launch latency : 5.65 us

  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 515.48.07 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1304.11
      float2  : 1376.87
      float4  : 1419.82
      float8  : 1444.07
      float16 : 1465.06

    Single-precision compute (GFLOPS)
      float   : 19350.41
      float2  : 19382.11
      float4  : 19363.12
      float8  : 19281.61
      float16 : 19108.25

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9719.24
      double2  : 9704.38
      double4  : 9682.93
      double8  : 9633.92
      double16 : 9544.74

    Integer compute (GIOPS)
      int   : 19277.98
      int2  : 19332.19
      int4  : 19269.01
      int8  : 19352.73
      int16 : 19343.15

    Integer compute Fast 24bit (GIOPS)
      int   : 19283.32
      int2  : 19288.03
      int4  : 19273.28
      int8  : 19231.30
      int16 : 19048.40

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.37
      enqueueReadBuffer               : 13.13
      enqueueWriteBuffer non-blocking : 7.50
      enqueueReadBuffer non-blocking  : 6.90
      enqueueMapBuffer(for read)      : 19.81
        memcpy from mapped ptr        : 20.73
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.62

    Kernel launch latency : 5.75 us

  Device: NVIDIA A100-SXM4-40GB
    Driver version  : 515.48.07 (Linux x64)
    Compute units   : 108
    Clock frequency : 1410 MHz

    Global memory bandwidth (GBPS)
      float   : 1303.89
      float2  : 1376.82
      float4  : 1419.15
      float8  : 1444.89
      float16 : 1465.04

    Single-precision compute (GFLOPS)
      float   : 19339.44
      float2  : 19388.10
      float4  : 19371.42
      float8  : 19289.58
      float16 : 19115.54

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 9724.83
      double2  : 9710.39
      double4  : 9689.79
      double8  : 9641.13
      double16 : 9552.76

    Integer compute (GIOPS)
      int   : 19285.03
      int2  : 19313.19
      int4  : 19286.42
      int8  : 19361.56
      int16 : 19347.78

    Integer compute Fast 24bit (GIOPS)
      int   : 19292.73
      int2  : 19297.12
      int4  : 19282.58
      int8  : 19238.22
      int16 : 19056.33

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 14.48
      enqueueReadBuffer               : 13.16
      enqueueWriteBuffer non-blocking : 7.18
      enqueueReadBuffer non-blocking  : 6.98
      enqueueMapBuffer(for read)      : 19.99
        memcpy from mapped ptr        : 19.35
      enqueueUnmap(after write)       : 26.77
        memcpy to mapped ptr          : 20.63

    Kernel launch latency : 5.70 us

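The four devices report nearly identical results. The sketch below quantifies the device-to-device spread for a handful of metrics, using values copied from the four device blocks in log order. Which metrics to compare and the relative-spread formula ((max - min) / mean) are illustrative choices, not something clpeak reports.

```python
from statistics import mean

# Per-device values copied from the four "Device: NVIDIA A100-SXM4-40GB" blocks, in log order.
# The selection of metrics is an illustrative choice.
results = {
    "global bandwidth float16 (GBPS)": [1464.44, 1464.50, 1465.06, 1465.04],
    "single-precision float (GFLOPS)": [19352.89, 19349.87, 19350.41, 19339.44],
    "double-precision double (GFLOPS)": [9723.09, 9716.58, 9719.24, 9724.83],
    "enqueueWriteBuffer non-blocking (GBPS)": [7.79, 7.71, 7.50, 7.18],
    "kernel launch latency (us)": [5.76, 5.65, 5.75, 5.70],
}


def relative_spread(values: list[float]) -> float:
    """Spread across devices as (max - min) / mean, expressed as a percentage."""
    return 100.0 * (max(values) - min(values)) / mean(values)


if __name__ == "__main__":
    for metric, values in results.items():
        print(f"{metric:40s} mean {mean(values):10.2f}  spread {relative_spread(values):5.2f}%")
```

For these values, the compute and device-memory figures differ across devices by under 0.1%. The non-blocking host write bandwidth differs by roughly 8%, which suggests the host-transfer tests are the noisiest part of this run.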