# Gemmlowp's public entry points

gemmlowp's public interface is defined in
[public/gemmlowp.h](../public/gemmlowp.h).

## GemmWithOutputPipeline

The primary public entry point is: `GemmWithOutputPipeline`.

A usage example is given in
[doc/quantization_example.cc](quantization_example.cc).

The high-level overview of how this specifies a low-precision matrix
multiplication is explained in [low-precision.md](low-precision.md). The
rationale for a specific quantization paradigm is given in
[quantization.md](quantization.md). That specific quantization paradigm is
implemented at two different stages of the computation: as pre-processing on
the operands and as post-processing on the result:

*   Pre-processing on the LHS and RHS operands, in the form of adding constants
    `lhs_offset` and `rhs_offset` to them, is explained in
    [low-precision.md](low-precision.md).

*   Post-processing on the result, in the form of a flexible "output pipeline",
    is explained in [output.md](output.md).

More details on this follow below as we discuss specific function parameters.

The prototype is:

```
template <typename InputScalar, typename OutputScalar, typename BitDepthParams,
          MapOrder LhsOrder, MapOrder RhsOrder, MapOrder ResultOrder,
          typename OutputPipelineType, typename GemmContextType>
void GemmWithOutputPipeline(GemmContextType* context,
                            const MatrixMap<const InputScalar, LhsOrder>& lhs,
                            const MatrixMap<const InputScalar, RhsOrder>& rhs,
                            MatrixMap<OutputScalar, ResultOrder>* result,
                            int lhs_offset, int rhs_offset,
                            const OutputPipelineType& output_pipeline);
```

A typical call looks like (from the [usage example](quantization_example.cc)):

```
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t,
                                 gemmlowp::DefaultL8R8BitDepthParams>(
    &gemm_context, uint8_lhs_matrix, uint8_rhs_matrix,
    &uint8_result_matrix, lhs_offset, rhs_offset, output_pipeline);
```

### Template parameters

Typically only the first three template parameters need to be specified, the
rest being automatically deduced from the function parameters:

*   `InputScalar`: The scalar type of the LHS and RHS operands. At the moment,
    this must be `std::uint8_t`.
*   `OutputScalar`: The scalar type of the result. At the moment,
    this must be `std::uint8_t`.
*   `BitDepthParams`: Defines the bit format of the input and output matrices
    and the required accuracy of the computation. At the moment, the only
    non-deprecated valid value is `gemmlowp::DefaultL8R8BitDepthParams`. See
    [less-than-8-bit.md](less-than-8-bit.md) for other values, the general
    idea behind this parameter, and how it may become more useful in the
    future.

The other template parameters, which typically do not need to be specified, are:

*   `LhsOrder`, `RhsOrder`, `ResultOrder`: the storage orders (row-major or
    column-major) of the LHS, RHS and result matrices. See
    [public/map.h](../public/map.h), and see the performance note below: we
    recommend RowMajor, ColMajor and ColMajor respectively, for optimal
    performance.
*   `OutputPipelineType`: the actual `std::tuple` type of the output pipeline.
    See the explanation of the `output_pipeline` parameter below, the sketch
    after this list, and [output.md](output.md).
*   `GemmContextType`: the type of the `context` parameter. At the moment, this
    must be `gemmlowp::GemmContext`.

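As an illustration of how `OutputPipelineType` gets deduced in practice, here
is a sketch of building an output pipeline, modeled on the
[usage example](quantization_example.cc); the numeric parameter values are
placeholders, not recommendations:

```
gemmlowp::OutputStageQuantizeDownInt32ToUint8ScaleByFixedPoint
    quantize_down_stage;
quantize_down_stage.result_fixedpoint_multiplier = 1 << 30;  // placeholder
quantize_down_stage.result_shift = 8;                        // placeholder
quantize_down_stage.result_offset_after_shift = 128;         // placeholder
gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
// OutputPipelineType is deduced as the type of this std::tuple; there is
// no need to spell it out.
const auto output_pipeline =
    std::make_tuple(quantize_down_stage, saturating_cast_stage);
```
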
### Function parameters

The function parameters taken by `GemmWithOutputPipeline` are:

*   `context`: The `gemmlowp::GemmContext` object holding state and resources to
    be used for this gemmlowp call.
*   `lhs`, `rhs`: The LHS and RHS operand matrices. Note that these are
    `MatrixMap` objects, mapping external buffers as matrices, not owning data.
    See [public/map.h](../public/map.h).
*   `result`: Pointer to the destination `MatrixMap` object, which must be
    already constructed, wrapping the external destination buffer with the
    wanted destination matrix shape and storage layout. No memory allocation
    will be performed by gemmlowp for the destination buffer. See
    [public/map.h](../public/map.h).
*   `lhs_offset`, `rhs_offset`: Constants added to each matrix entry in the
    LHS, RHS matrices respectively, as explained in
    [low-precision.md](low-precision.md). This is the only part of the
    quantization paradigm explained in [quantization.md](quantization.md) that
    needs to be implemented as operations on the operands; everything else is
    operations on the result, see `output_pipeline`.
*   `output_pipeline`: A `std::tuple` of output stages (see
    [public/output_stages.h](../public/output_stages.h)), specifying the output
    pipeline (see [output.md](output.md)). This is the part of the quantization
    paradigm explained in [quantization.md](quantization.md) that needs to be
    implemented as operations on the result matrix.

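Putting the pieces together, here is a hedged end-to-end sketch of a call,
assuming a gemmlowp checkout with `public/gemmlowp.h` on the include path; the
matrix shapes, buffer contents and offset values are placeholders:

```
#include <cstdint>
#include <tuple>
#include <vector>

#include "public/gemmlowp.h"

void ExampleGemmCall() {
  const int rows = 4, depth = 8, cols = 2;  // placeholder shapes
  std::vector<std::uint8_t> lhs_data(rows * depth, 1);
  std::vector<std::uint8_t> rhs_data(depth * cols, 1);
  std::vector<std::uint8_t> result_data(rows * cols);

  // MatrixMap objects wrap caller-owned buffers; gemmlowp performs no
  // allocation for them. The storage orders follow the performance note below.
  gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor> lhs(
      lhs_data.data(), rows, depth);
  gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::ColMajor> rhs(
      rhs_data.data(), depth, cols);
  gemmlowp::MatrixMap<std::uint8_t, gemmlowp::MapOrder::ColMajor> result(
      result_data.data(), rows, cols);

  // Output pipeline built as in the sketch above; placeholder parameters.
  gemmlowp::OutputStageQuantizeDownInt32ToUint8ScaleByFixedPoint
      quantize_down_stage;
  quantize_down_stage.result_fixedpoint_multiplier = 1 << 30;
  quantize_down_stage.result_shift = 8;
  quantize_down_stage.result_offset_after_shift = 128;
  gemmlowp::OutputStageSaturatingCastToUint8 saturating_cast_stage;
  const auto output_pipeline =
      std::make_tuple(quantize_down_stage, saturating_cast_stage);

  gemmlowp::GemmContext gemm_context;
  const int lhs_offset = -128, rhs_offset = -128;  // placeholder offsets

  // Only the first three template parameters are spelled out; the rest are
  // deduced from the MatrixMap and tuple arguments.
  gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t,
                                   gemmlowp::DefaultL8R8BitDepthParams>(
      &gemm_context, lhs, rhs, &result, lhs_offset, rhs_offset,
      output_pipeline);
}
```
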
### Performance note on storage orders

gemmlowp supports arbitrary combinations of storage orders for the LHS, RHS and
result matrices. However, not all combinations are equally well optimized.

Because gemmlowp is primarily aimed at neural network inference workloads,
optimization focus is on this particular combination of storage orders:

*   `LhsOrder=RowMajor`
*   `RhsOrder=ColMajor`
*   `ResultOrder=ColMajor`

The rationale is that the LHS is typically the constant weights of a neural
network layer (e.g. the weights of a Convolutional layer implemented as a matrix
multiplication), while the RHS and result are neural network activations,
respectively the input and output activations of the layer.

Because the RHS and result are activations, we want them to share the same
storage order -- so that one layer's output activations can be readily used as
the next layer's input activations. Thus, we focus on `RhsOrder=ResultOrder`.

We also know from general considerations on matrix multiplication that it is
slightly more efficient to have the direction of accumulation (the "depth"
dimension) be the direction of contiguous storage in memory. That means that it
is always going to be slightly easier and more efficient to have
`LhsOrder=RowMajor` and `RhsOrder=ColMajor`.

Putting this together, we arrive at gemmlowp's focus on the above-described
combination of storage orders.

Using other storage orders will typically mean taking less efficient paths in
the packing and unpacking stages; see [packing.md](packing.md). The compute
kernel stage ([kernel.md](kernel.md)) is unaffected.

## GemmWithOutputPipelinePC

This is a variant where `lhs_offset` and `rhs_offset` may be vectors instead of
scalars. They are then broadcast against the LHS and RHS respectively.

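A hedged sketch of such a call, assuming the `VectorMap` and `VectorShape`
types from [public/map.h](../public/map.h), with a column-shaped `lhs_offset`
holding one entry per LHS row and a row-shaped `rhs_offset` holding one entry
per RHS column (`lhs`, `rhs`, `result`, `gemm_context` and `output_pipeline`
as in the sketch above; all values are placeholders):

```
std::vector<std::int32_t> lhs_offsets(rows, -128);  // one per LHS row
std::vector<std::int32_t> rhs_offsets(cols, -128);  // one per RHS column
gemmlowp::VectorMap<const std::int32_t, gemmlowp::VectorShape::Col>
    lhs_offset_vector(lhs_offsets.data(), rows);
gemmlowp::VectorMap<const std::int32_t, gemmlowp::VectorShape::Row>
    rhs_offset_vector(rhs_offsets.data(), cols);

// As with GemmWithOutputPipeline, only the first three template parameters
// need to be spelled out.
gemmlowp::GemmWithOutputPipelinePC<std::uint8_t, std::uint8_t,
                                   gemmlowp::DefaultL8R8BitDepthParams>(
    &gemm_context, lhs, rhs, &result, lhs_offset_vector, rhs_offset_vector,
    output_pipeline);
```
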
This is useful for some flavors of neural network inference with "per-channel
quantization", whence the PC suffix. This has been useful in some settings where
a neural network trained in float arithmetic was subsequently quantized. On the
other hand, retraining neural networks for quantized inference tends to remove
the need for per-channel quantization. For that reason, the long-term usefulness
of this entry point is in question.

## Gemm

This is gemmlowp's original, now legacy and deprecated, entry point. See the
section of [low-precision.md](low-precision.md) on the legacy quantization
paradigm. Avoid in new code.

## The eight_bit_int_gemm directory

As explained in the top-level [README.md](../README.md#public-interfaces), this
is entirely deprecated.