Platform: Portable Computing Language
  Device: pthread-POWER9, altivec supported
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 160
    Clock frequency : 3800 MHz

    Global memory bandwidth (GBPS)
      float   : 30.30
      float2  : 59.39
      float4  : 63.92
      float8  : 60.26
      float16 : 57.05

    Single-precision compute (GFLOPS)
      float   : 73.11
      float2  : 179.68
      float4  : 411.74
      float8  : 739.41
      float16 : 910.81

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 85.08
      double2  : 151.08
      double4  : 275.05
      double8  : 401.79
      double16 : 456.30

    Integer compute (GIOPS)
      int   : 112.89
      int2  : 189.39
      int4  : 440.41
      int8  : 708.03
      int16 : 748.61

    Integer compute Fast 24bit (GIOPS)
      int   : 149.56
      int2  : 226.40
      int4  : 407.09
      int8  : 721.65
      int16 : 755.17

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.88
      enqueueReadBuffer               : 5.37
      enqueueWriteBuffer non-blocking : 5.52
      enqueueReadBuffer non-blocking  : 5.24
      enqueueMapBuffer(for read)      : 901.70
        memcpy from mapped ptr        : 7.58
      enqueueUnmap(after write)       : 734.74
        memcpy to mapped ptr          : 11.31

    Kernel launch latency : 76.72 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 765.25
      float2  : 801.86
      float4  : 836.36
      float8  : 774.89
      float16 : 645.66

    Single-precision compute (GFLOPS)
      float   : 15565.38
      float2  : 15600.19
      float4  : 15576.29
      float8  : 15519.63
      float16 : 15397.09

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7807.99
      double2  : 7801.33
      double4  : 7783.04
      double8  : 7760.49
      double16 : 7698.18

    Integer compute (GIOPS)
      int   : 15556.50
      int2  : 15584.06
      int4  : 15583.59
      int8  : 15595.85
      int16 : 15612.48

    Integer compute Fast 24bit (GIOPS)
      int   : 15599.91
      int2  : 15625.54
      int4  : 15624.03
      int8  : 15637.68
      int16 : 15612.58

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 11.79
      enqueueReadBuffer               : 8.04
      enqueueWriteBuffer non-blocking : 11.82
      enqueueReadBuffer non-blocking  : 8.05
      enqueueMapBuffer(for read)      : 51433.80
        memcpy from mapped ptr        : 8.05
      enqueueUnmap(after write)       : 14.40
        memcpy to mapped ptr          : 11.85

    Kernel launch latency : -5189.91 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 765.06
      float2  : 801.82
      float4  : 836.53
      float8  : 774.79
      float16 : 646.26

    Single-precision compute (GFLOPS)
      float   : 15618.11
      float2  : 15653.73
      float4  : 15630.66
      float8  : 15570.97
      float16 : 15449.34

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7833.29
      double2  : 7824.87
      double4  : 7806.29
      double8  : 7782.29
      double16 : 7720.92

    Integer compute (GIOPS)
      int   : 15585.95
      int2  : 15622.80
      int4  : 15621.28
      int8  : 15634.55
      int16 : 15610.02

    Integer compute Fast 24bit (GIOPS)
      int   : 15597.93
      int2  : 15623.93
      int4  : 15621.19
      int8  : 15635.40
      int16 : 15611.91

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.13
      enqueueReadBuffer               : 7.61
      enqueueWriteBuffer non-blocking : 13.12
      enqueueReadBuffer non-blocking  : 7.62
      enqueueMapBuffer(for read)      : 35.21
        memcpy from mapped ptr        : 8.39
      enqueueUnmap(after write)       : 41.61
        memcpy to mapped ptr          : 11.96

    Kernel launch latency : -6635.35 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 770.44
      float2  : 802.01
      float4  : 836.44
      float8  : 775.56
      float16 : 646.04

    Single-precision compute (GFLOPS)
      float   : 15619.53
      float2  : 15587.93
      float4  : 15565.71
      float8  : 15507.16
      float16 : 15451.14

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7835.15
      double2  : 7828.63
      double4  : 7808.13
      double8  : 7784.17
      double16 : 7722.49

    Integer compute (GIOPS)
      int   : 15599.63
      int2  : 15626.78
      int4  : 15624.69
      int8  : 15638.92
      int16 : 15613.33

    Integer compute Fast 24bit (GIOPS)
      int   : 15548.42
      int2  : 15574.17
      int4  : 15573.14
      int8  : 15586.13
      int16 : 15561.38

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.13
      enqueueReadBuffer               : 7.62
      enqueueWriteBuffer non-blocking : 13.12
      enqueueReadBuffer non-blocking  : 7.62
      enqueueMapBuffer(for read)      : 69.31
        memcpy from mapped ptr        : 8.00
      enqueueUnmap(after write)       : 70.45
        memcpy to mapped ptr          : 12.12

    Kernel launch latency : -8030.76 us

  Device: Tesla V100-SXM2-16GB
    Driver version  : 3.0-rc2 (Linux unknown)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 768.67
      float2  : 802.01
      float4  : 836.45
      float8  : 775.67
      float16 : 646.45

    Single-precision compute (GFLOPS)
      float   : 15569.47
      float2  : 15610.50
      float4  : 15590.76
      float8  : 15526.32
      float16 : 15409.06

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7814.15
      double2  : 7805.11
      double4  : 7785.35
      double8  : 7765.17
      double16 : 7703.52

    Integer compute (GIOPS)
      int   : 15576.90
      int2  : 15601.71
      int4  : 15555.37
      int8  : 15533.24
      int16 : 15613.33

    Integer compute Fast 24bit (GIOPS)
      int   : 15557.44
      int2  : 15626.78
      int4  : 15626.21
      int8  : 15640.24
      int16 : 15614.84

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 13.14
      enqueueReadBuffer               : 7.70
      enqueueWriteBuffer non-blocking : 13.13
      enqueueReadBuffer non-blocking  : 7.69
      enqueueMapBuffer(for read)      : 69.99
        memcpy from mapped ptr        : 8.05
      enqueueUnmap(after write)       : 70.45
        memcpy to mapped ptr          : 12.27

    Kernel launch latency : -10064.30 us
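
One way to sanity-check the compute figures above is to compare them against a theoretical peak derived from the reported compute units and clock frequency. The sketch below does this for the first Tesla V100-SXM2-16GB device. It rests on assumptions that are not in the log: 64 FP32 and 32 FP64 lanes per compute unit (NVIDIA's published Volta SM layout) and two floating-point operations per fused multiply-add. The helper names `peak_gflops` and `efficiency` are hypothetical and exist only for this illustration.

```python
# Rough sanity check of the measured GFLOPS against a theoretical peak.
# Assumptions (NOT taken from the log): a V100 compute unit (SM) has
# 64 FP32 lanes and 32 FP64 lanes, and one FMA counts as 2 operations.

def peak_gflops(compute_units, lanes_per_unit, clock_mhz):
    """Theoretical peak in GFLOPS: units * lanes * 2 ops per FMA * clock."""
    return compute_units * lanes_per_unit * 2 * clock_mhz / 1000.0

def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak reached by a measured value."""
    return measured_gflops / peak

# Values reported in the log for the first Tesla V100-SXM2-16GB device.
COMPUTE_UNITS = 80
CLOCK_MHZ = 1530
measured_fp32 = 15565.38  # Single-precision compute, "float" row
measured_fp64 = 7807.99   # Double-precision compute, "double" row

fp32_peak = peak_gflops(COMPUTE_UNITS, 64, CLOCK_MHZ)
fp64_peak = peak_gflops(COMPUTE_UNITS, 32, CLOCK_MHZ)

print(f"FP32 peak {fp32_peak:.1f} GFLOPS, "
      f"measured/peak = {efficiency(measured_fp32, fp32_peak):.3f}")
print(f"FP64 peak {fp64_peak:.1f} GFLOPS, "
      f"measured/peak = {efficiency(measured_fp64, fp64_peak):.3f}")
```

Under these assumptions the measured single- and double-precision rates for that device land within about one percent of the computed peak. The same check is not attempted for the pthread-POWER9 device, because the log does not give a per-unit lane count for it.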