results/NVIDIA_CUDA/Tesla_P40.log

Platform: NVIDIA CUDA
  Device: Tesla P40
    Driver version  : 550.54.14 (Linux x64)
    Compute units   : 30
    Clock frequency : 1531 MHz

    Global memory bandwidth (GBPS)
      float   : 282.85
      float2  : 294.10
      float4  : 301.39
      float8  : 279.29
      float16 : 193.72

    Single-precision compute (GFLOPS)
      float   : 11153.70
      float2  : 11505.40
      float4  : 11475.82
      float8  : 11410.92
      float16 : 11367.69

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 367.62
      double2  : 367.05
      double4  : 366.32
      double8  : 365.52
      double16 : 362.97

    Integer compute (GIOPS)
      int   : 3897.08
      int2  : 3889.65
      int4  : 3904.29
      int8  : 3610.75
      int16 : 3540.68

    Integer compute Fast 24bit (GIOPS)
      int   : 3895.72
      int2  : 3901.65
      int4  : 3895.32
      int8  : 3882.49
      int16 : 3866.57

    Integer char (8bit) compute (GIOPS)
      char   : 10813.47
      char2  : 11447.82
      char4  : 11485.37
      char8  : 11522.07
      char16 : 11404.32

    Integer short (16bit) compute (GIOPS)
      short   : 10708.50
      short2  : 11449.04
      short4  : 11481.69
      short8  : 11518.50
      short16 : 11333.30

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.17
      enqueueReadBuffer               : 6.45
      enqueueWriteBuffer non-blocking : 5.68
      enqueueReadBuffer non-blocking  : 6.37
      enqueueMapBuffer(for read)      : 5.75
        memcpy from mapped ptr        : 9.36
      enqueueUnmap(after write)       : 6.27
        memcpy to mapped ptr          : 9.36

    Kernel launch latency : 3.78 us