Platform: Portable Computing Language
  Device: NVIDIA Tegra X1
    Driver version  : 1.3 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 921 MHz

    Global memory bandwidth (GBPS)
      float   : 17.95
      float2  : 20.21
      float4  : 20.92
      float8  : 19.82
      float16 : 15.14

    Single-precision compute (GFLOPS)
      float   : 214.09
      float2  : 229.80
      float4  : 230.95
      float8  : 229.31
      float16 : 228.80

    Half-precision compute (GFLOPS)
      half   : 212.93
      half2  : 228.95
      half4  : 228.69
      half8  : 245.39
      half16 : 238.39

    Double-precision compute (GFLOPS)
      double   : 7.32
      double2  : 7.31
      double4  : 7.30
      double8  : 7.27
      double16 : 7.21

    Integer compute (GIOPS)
      int   : 70.95
      int2  : 74.95
      int4  : 76.43
      int8  : 76.62
      int16 : 76.78

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 2.94
      enqueueReadBuffer          : 0.69
      enqueueMapBuffer(for read) : 2487.73
        memcpy from mapped ptr   : 0.70
      enqueueUnmap(after write)  : 0.68
        memcpy to mapped ptr     : 3.68

    Kernel launch latency : 32.77 us

Note: via POCL 1.3