Platform: Portable Computing Language
  Device: pthread-POWER9, altivec supported
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 160
    Clock frequency : 3800 MHz

    Global memory bandwidth (GBPS)
      float   : 30.30
      float2  : 59.39
      float4  : 63.92
      float8  : 60.26
      float16 : 57.05

    Single-precision compute (GFLOPS)
      float   : 73.11
      float2  : 179.68
      float4  : 411.74
      float8  : 739.41
      float16 : 910.81

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 85.08
      double2  : 151.08
      double4  : 275.05
      double8  : 401.79
      double16 : 456.30

    Integer compute (GIOPS)
      int   : 112.89
      int2  : 189.39
      int4  : 440.41
      int8  : 708.03
      int16 : 748.61

    Integer compute Fast 24bit (GIOPS)
      int   : 149.56
      int2  : 226.40
      int4  : 407.09
      int8  : 721.65
      int16 : 755.17

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.88
      enqueueReadBuffer               : 5.37
      enqueueWriteBuffer non-blocking : 5.52
      enqueueReadBuffer non-blocking  : 5.24
      enqueueMapBuffer(for read)      : 901.70
        memcpy from mapped ptr        : 7.58
      enqueueUnmap(after write)       : 734.74
        memcpy to mapped ptr          : 11.31

    Kernel launch latency : 76.72 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 765.25
      float2  : 801.86
      float4  : 836.36
      float8  : 774.89
      float16 : 645.66

    Single-precision compute (GFLOPS)
      float   : 15565.38
      float2  : 15600.19
      float4  : 15576.29
      float8  : 15519.63
      float16 : 15397.09

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7807.99
      double2  : 7801.33
      double4  : 7783.04
      double8  : 7760.49
      double16 : 7698.18

    Integer compute (GIOPS)
      int   : 15556.50
      int2  : 15584.06
      int4  : 15583.59
      int8  : 15595.85
      int16 : 15612.48

    Integer compute Fast 24bit (GIOPS)
      int   : 15599.91
      int2  : 15625.54
      int4  : 15624.03
      int8  : 15637.68
      int16 : 15612.58

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 11.79
      enqueueReadBuffer               : 8.04
      enqueueWriteBuffer non-blocking : 11.82
      enqueueReadBuffer non-blocking  : 8.05
      enqueueMapBuffer(for read)      : 51433.80
        memcpy from mapped ptr        : 8.05
      enqueueUnmap(after write)       : 14.40
        memcpy to mapped ptr          : 11.85

    Kernel launch latency : -5189.91 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 765.06
      float2  : 801.82
      float4  : 836.53
      float8  : 774.79
      float16 : 646.26

    Single-precision compute (GFLOPS)
      float   : 15618.11
      float2  : 15653.73
      float4  : 15630.66
      float8  : 15570.97
      float16 : 15449.34

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7833.29
      double2  : 7824.87
      double4  : 7806.29
      double8  : 7782.29
      double16 : 7720.92

    Integer compute (GIOPS)
      int   : 15585.95
      int2  : 15622.80
      int4  : 15621.28
      int8  : 15634.55
      int16 : 15610.02

    Integer compute Fast 24bit (GIOPS)
      int   : 15597.93
      int2  : 15623.93
      int4  : 15621.19
      int8  : 15635.40
      int16 : 15611.91

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.13
      enqueueReadBuffer               : 7.61
      enqueueWriteBuffer non-blocking : 13.12
      enqueueReadBuffer non-blocking  : 7.62
      enqueueMapBuffer(for read)      : 35.21
        memcpy from mapped ptr        : 8.39
      enqueueUnmap(after write)       : 41.61
        memcpy to mapped ptr          : 11.96

    Kernel launch latency : -6635.35 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 770.44
      float2  : 802.01
      float4  : 836.44
      float8  : 775.56
      float16 : 646.04

    Single-precision compute (GFLOPS)
      float   : 15619.53
      float2  : 15587.93
      float4  : 15565.71
      float8  : 15507.16
      float16 : 15451.14

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7835.15
      double2  : 7828.63
      double4  : 7808.13
      double8  : 7784.17
      double16 : 7722.49

    Integer compute (GIOPS)
      int   : 15599.63
      int2  : 15626.78
      int4  : 15624.69
      int8  : 15638.92
      int16 : 15613.33

    Integer compute Fast 24bit (GIOPS)
      int   : 15548.42
      int2  : 15574.17
      int4  : 15573.14
      int8  : 15586.13
      int16 : 15561.38

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.13
      enqueueReadBuffer               : 7.62
      enqueueWriteBuffer non-blocking : 13.12
      enqueueReadBuffer non-blocking  : 7.62
      enqueueMapBuffer(for read)      : 69.31
        memcpy from mapped ptr        : 8.00
      enqueueUnmap(after write)       : 70.45
        memcpy to mapped ptr          : 12.12

    Kernel launch latency : -8030.76 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 768.67
      float2  : 802.01
      float4  : 836.45
      float8  : 775.67
      float16 : 646.45

    Single-precision compute (GFLOPS)
      float   : 15569.47
      float2  : 15610.50
      float4  : 15590.76
      float8  : 15526.32
      float16 : 15409.06

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7814.15
      double2  : 7805.11
      double4  : 7785.35
      double8  : 7765.17
      double16 : 7703.52

    Integer compute (GIOPS)
      int   : 15576.90
      int2  : 15601.71
      int4  : 15555.37
      int8  : 15533.24
      int16 : 15613.33

    Integer compute Fast 24bit (GIOPS)
      int   : 15557.44
      int2  : 15626.78
      int4  : 15626.21
      int8  : 15640.24
      int16 : 15614.84

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.14
      enqueueReadBuffer               : 7.70
      enqueueWriteBuffer non-blocking : 13.13
      enqueueReadBuffer non-blocking  : 7.69
      enqueueMapBuffer(for read)      : 69.99
        memcpy from mapped ptr        : 8.05
      enqueueUnmap(after write)       : 70.45
        memcpy to mapped ptr          : 12.27

    Kernel launch latency : -10064.30 us