Platform: Moore Threads OpenCL
  Device: MUSA GEN1-104
    Driver version  : 20241010 release kuae1.3.0_musa3.1.0 db329f8fb@20241009 (Linux x64)
    Compute units   : 32
    Clock frequency : 1799 MHz

    Global memory bandwidth (GBPS)
      float   : 273.11
      float2  : 374.98
      float4  : 387.00
      float8  : 391.09
      float16 : 399.77

    Single-precision compute (GFLOPS)
      float   : 14176.27
      float2  : 13376.75
      float4  : 13451.20
      float8  : 13425.36
      float16 : 13350.33

    Half-precision compute (GFLOPS)
      half   : 13296.25
      half2  : 13387.43
      half4  : 13450.06
      half8  : 13437.34
      half16 : 13368.42

    Double-precision compute (GFLOPS)
      double   : 35.60
      double2  : 30.19
      double4  : 21.89
      double8  : 13.00
      double16 : 7.04

    Integer compute (GIOPS)
      int   : 2094.24
      int2  : 2092.97
      int4  : 2096.44
      int8  : 2096.89
      int16 : 2099.12

    Integer compute Fast 24bit (GIOPS)
      int   : 2094.74
      int2  : 2092.97
      int4  : 2095.84
      int8  : 2097.56
      int16 : 2099.13

    Integer char (8bit) compute (GIOPS)
      char   : 2095.43
      char2  : 2093.63
      char4  : 2097.98
      char8  : 2097.77
      char16 : 2100.38

    Integer short (16bit) compute (GIOPS)
      short   : 2095.20
      short2  : 2093.82
      short4  : 2096.92
      short8  : 2098.04
      short16 : 2099.83

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.84
      enqueueReadBuffer               : 5.92
      enqueueWriteBuffer non-blocking : 5.85
      enqueueReadBuffer non-blocking  : 5.91
      enqueueMapBuffer(for read)      : 5743.47
        memcpy from mapped ptr        : 0.02
      enqueueUnmap(after write)       : 6042.44
        memcpy to mapped ptr          : 5.03

    Kernel launch latency : 48.02 us