Platform: NVIDIA CUDA
  Device: Tesla V100-PCIE-32GB
    Driver version  : 455.23.05 (Linux x64)
    Compute units   : 80
    Clock frequency : 1380 MHz

    Global memory bandwidth (GBPS)
      float   : 716.38
      float2  : 765.67
      float4  : 810.35
      float8  : 723.85
      float16 : 750.17

    Single-precision compute (GFLOPS)
      float   : 14098.15
      float2  : 14135.97
      float4  : 14095.57
      float8  : 14049.00
      float16 : 13934.45

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7075.81
      double2  : 7065.56
      double4  : 7046.01
      double8  : 7013.68
      double16 : 6951.51

    Integer compute (GIOPS)
      int   : 14069.94
      int2  : 14118.04
      int4  : 14121.60
      int8  : 14124.16
      int16 : 14099.04

    Integer compute Fast 24bit (GIOPS)
      int   : 14077.32
      int2  : 14119.12
      int4  : 14122.14
      int8  : 14113.63
      int16 : 14104.60

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 12.06
      enqueueReadBuffer               : 10.64
      enqueueWriteBuffer non-blocking : 10.72
      enqueueReadBuffer non-blocking  : 8.13
      enqueueMapBuffer(for read)      : 10.25
        memcpy from mapped ptr        : 17.55
      enqueueUnmap(after write)       : 12.59
        memcpy to mapped ptr          : 18.20

    Kernel launch latency : 7.88 us