Platform: NVIDIA CUDA
  Device: Tesla P40
    Driver version  : 550.54.14 (Linux x64)
    Compute units   : 30
    Clock frequency : 1531 MHz

    Global memory bandwidth (GBPS)
      float   : 282.85
      float2  : 294.10
      float4  : 301.39
      float8  : 279.29
      float16 : 193.72

    Single-precision compute (GFLOPS)
      float   : 11153.70
      float2  : 11505.40
      float4  : 11475.82
      float8  : 11410.92
      float16 : 11367.69

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 367.62
      double2  : 367.05
      double4  : 366.32
      double8  : 365.52
      double16 : 362.97

    Integer compute (GIOPS)
      int   : 3897.08
      int2  : 3889.65
      int4  : 3904.29
      int8  : 3610.75
      int16 : 3540.68

    Integer compute Fast 24bit (GIOPS)
      int   : 3895.72
      int2  : 3901.65
      int4  : 3895.32
      int8  : 3882.49
      int16 : 3866.57

    Integer char (8bit) compute (GIOPS)
      char   : 10813.47
      char2  : 11447.82
      char4  : 11485.37
      char8  : 11522.07
      char16 : 11404.32

    Integer short (16bit) compute (GIOPS)
      short   : 10708.50
      short2  : 11449.04
      short4  : 11481.69
      short8  : 11518.50
      short16 : 11333.30

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.17
      enqueueReadBuffer               : 6.45
      enqueueWriteBuffer non-blocking : 5.68
      enqueueReadBuffer non-blocking  : 6.37
      enqueueMapBuffer(for read)      : 5.75
        memcpy from mapped ptr        : 9.36
      enqueueUnmap(after write)       : 6.27
        memcpy to mapped ptr          : 9.36

    Kernel launch latency : 3.78 us