
Lines Matching +full:d3 +full:- +full:time +full:- +full:format

3 ## Kernels provide an inner-loop implementation, and a format
9 other low-level details, while achieving high performance. Thus a line had to be
14 In itself, a GEMM kernel is just an implementation of the inner-most loop in a
15 GEMM (That inner-most loop has to be over the 'depth' dimension so as to be able
22 computation, but also in the format of data that they operate on. Indeed, in
28 gemmlowp allows each GEMM kernel to dictate the format of data that it expects,
29 in addition to providing its inner-loop implementation.
31 The former is given by a 'Format' typedef, and the latter by a 'Run' method.
34 NEONKernel12x4Depth2 kernel, which specifies its format as
38 KernelSideFormat<CellFormat<4, 2>, 1> > Format;
45 - 3 'cells' of size 4x2 each of the lhs, so a total lhs block of size 12x2
47 - 1 'cell' of size 2x4 of the rhs.
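
As a rough illustration of how such a format could encode block sizes at compile time, here is a minimal sketch. The template definitions below are simplified stand-ins written for this example, and the member names (`kWidth`, `kDepth`, `kCells`, `kRows`, `kCols`) are assumptions, not the library's actual declarations. The first line of the `Format` typedef is inferred from the two bullets above (3 cells on the lhs, 1 cell on the rhs).

```cpp
// Hypothetical, simplified format descriptors for illustration only.
// A cell is described by its width (rows for lhs, columns for rhs) and depth.
template <int tWidth, int tDepth>
struct CellFormat {
  static constexpr int kWidth = tWidth;
  static constexpr int kDepth = tDepth;
  static constexpr int kSize = kWidth * kDepth;
};

// One side (lhs or rhs) of a kernel block is a run of identical cells.
template <typename tCellFormat, int tCells>
struct KernelSideFormat {
  typedef tCellFormat Cell;
  static constexpr int kCells = tCells;
  static constexpr int kWidth = kCells * Cell::kWidth;
  static constexpr int kDepth = Cell::kDepth;
};

// A kernel format pairs an lhs side with an rhs side of the same depth.
template <typename tLhs, typename tRhs>
struct KernelFormat {
  typedef tLhs Lhs;
  typedef tRhs Rhs;
  static_assert(Lhs::kDepth == Rhs::kDepth, "lhs and rhs must share a depth");
  static constexpr int kDepth = Lhs::kDepth;
  static constexpr int kRows = Lhs::kWidth;  // rows of the accumulator block
  static constexpr int kCols = Rhs::kWidth;  // columns of the accumulator block
};

// Assumed shape of the 12x4, depth-2 format described above.
typedef KernelFormat<KernelSideFormat<CellFormat<4, 2>, 3>,
                     KernelSideFormat<CellFormat<4, 2>, 1> > Format;

// 3 lhs cells of width 4 give a 12x2 lhs block; 1 rhs cell gives a 2x4 block.
static_assert(Format::kRows == 12, "lhs block spans 12 rows");
static_assert(Format::kCols == 4, "rhs block spans 4 columns");
static_assert(Format::kDepth == 2, "both sides cover 2 levels of depth");
```

Note that the same `CellFormat<4, 2>` describes a 4x2 cell on the lhs and a 2x4 cell on the rhs: in this sketch the first parameter is the width along rows or columns and the second is the depth.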
62 // A 2x4 cell of Rhs is stored in 16bit in d0--d1 (q0).
63 // A 12x2 block of 3 4x2 cells Lhs is stored in 16bit in d2--d7
64 // (q1--q3).
65 // A 12x4 block of accumulators is stored in 32bit in q4--q15.
67 // +-----+-----+-----+-----+
69 // Rhs +-----+-----+-----+-----+
71 // +-----+-----+-----+-----+
77 // +--+--+ - - - - +-----+-----+-----+-----+
78 // |d2|d3| | q4 | q5 | q6 | q7 |
79 // |d2|d3| | q4 | q5 | q6 | q7 |
80 // |d2|d3| | q4 | q5 | q6 | q7 |
81 // |d2|d3| | q4 | q5 | q6 | q7 |
82 // +--+--+ - - - - +-----+-----+-----+-----+
87 // +--+--+ - - - - +-----+-----+-----+-----+
92 // +--+--+ - - - - +-----+-----+-----+-----+
110 // Multiply-accumulate, level of depth 0
124 // Multiply-accumulate, level of depth 1
125 "vmlal.u16 q4, d3, d1[0]\n"
126 "vmlal.u16 q5, d3, d1[1]\n"
127 "vmlal.u16 q6, d3, d1[2]\n"
128 "vmlal.u16 q7, d3, d1[3]\n"
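
For readers less familiar with NEON, the four instructions above can be modelled in scalar code. `vmlal.u16 qAcc, dLhs, dRhs[lane]` widens four 16-bit lanes, multiplies each by one selected 16-bit lane, and adds the products into four 32-bit accumulator lanes. The sketch below is a plain C++ model of that arithmetic, not the kernel itself; the function names and the `std::array` representation of registers are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of "vmlal.u16 qAcc, dLhs, dRhs[lane]".
// acc: four 32-bit lanes (a q register), lhs/rhs: four 16-bit lanes (d registers).
inline void VmlalLaneU16(std::array<std::uint32_t, 4>& acc,
                         const std::array<std::uint16_t, 4>& lhs,
                         const std::array<std::uint16_t, 4>& rhs, int lane) {
  assert(lane >= 0 && lane < 4);
  for (int i = 0; i < 4; ++i) {
    // Widen before multiplying so the product is computed in 32 bits.
    acc[i] += static_cast<std::uint32_t>(lhs[i]) * rhs[lane];
  }
}

// Depth-1 step for the first lhs cell, mirroring the four instructions above:
// acc[c] plays the role of q4..q7, lhs_d3 of d3, and rhs_d1 of d1.
inline void AccumulateDepth1FirstCell(
    std::array<std::array<std::uint32_t, 4>, 4>& acc,
    const std::array<std::uint16_t, 4>& lhs_d3,
    const std::array<std::uint16_t, 4>& rhs_d1) {
  for (int c = 0; c < 4; ++c) {
    VmlalLaneU16(acc[c], lhs_d3, rhs_d1, c);
  }
}
```

Each accumulator register holds one column of the 4x4 result for this lhs cell: lane `i` of `acc[c]` receives `lhs_d3[i] * rhs_d1[c]`, which is the rank-1 update contributed by one level of depth.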
144 ## Packing code adapts to the format chosen by the kernel
148 depends on fine details of the kernel format, in ways that can only be
149 efficiently handled by knowing these kernel format details at compile-time.
152 templated in the corresponding kernel format.
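
To make the compile-time dependency concrete, here is a minimal sketch of a packing routine templated on a side format. Everything in it is hypothetical: the `PackSlice` name, the simplified format structs, the row-contiguous source layout, and the choice to store each cell one depth level at a time are assumptions for illustration, not the library's packing code. The point is only that the cell count, cell width, and depth are template constants, so the loop bounds are known to the compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified format descriptors (not the library's definitions).
template <int tWidth, int tDepth>
struct CellFormat {
  static constexpr int kWidth = tWidth;
  static constexpr int kDepth = tDepth;
};

template <typename tCellFormat, int tCells>
struct KernelSideFormat {
  typedef tCellFormat Cell;
  static constexpr int kCells = tCells;
  static constexpr int kWidth = kCells * Cell::kWidth;
  static constexpr int kDepth = Cell::kDepth;
};

// Packs one kDepth-deep slice of a source block into cell order.
// Assumed source layout: entry (row, d) lives at src[row * src_stride + d].
// Assumed packed layout: cells one after another; inside a cell, all
// kWidth entries of depth 0 come first, then all entries of depth 1, etc.
template <typename SideFormat>
std::vector<std::uint8_t> PackSlice(const std::uint8_t* src, int src_stride) {
  typedef typename SideFormat::Cell Cell;
  assert(src_stride >= Cell::kDepth);
  std::vector<std::uint8_t> dst;
  dst.reserve(SideFormat::kWidth * SideFormat::kDepth);
  for (int cell = 0; cell < SideFormat::kCells; ++cell) {
    for (int d = 0; d < Cell::kDepth; ++d) {
      for (int w = 0; w < Cell::kWidth; ++w) {
        const int row = cell * Cell::kWidth + w;
        dst.push_back(src[row * src_stride + d]);
      }
    }
  }
  return dst;
}
```

With a side format of 3 cells of `CellFormat<4, 2>`, a call such as `PackSlice<KernelSideFormat<CellFormat<4, 2>, 3> >(src, 2)` would emit 24 bytes, 8 per cell. A kernel with a different cell shape would instantiate a different specialization of the same routine, which is one plausible reading of why packing is templated in the kernel format.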