# Tensor Operator Set Architecture (TOSA) Dialect

[TOC]

## Rationale

The MLIR TOSA dialect implements the [TOSA
specification](https://developer.mlplatform.org/w/tosa/). This document
describes the decision process for how TOSA expresses operators in
high level dialects.

TOSA was developed after parallel efforts to rationalize the top-down picture
from multiple high-level frameworks, as well as a bottom-up view of different
hardware target concerns (CPU, GPU and NPU), and reflects a set of choices
that attempt to manage both sets of requirements.

## TOSA and Tensor Level Expressiveness

TOSA endeavors to provide an operator set that fulfils the following
expressiveness goals at the *tensor level of abstraction*:

### Complete

This is driven by the top-down perspective: the need to express as much of
multiple high level frameworks in TOSA as possible. The operator set was
originally derived from an operator frequency analysis of dozens of high level
networks in different frameworks, selecting the most frequently occurring
operators to establish a common set of tensor-level operators that could
express them.

TOSA categorizes its operator set into classes and attempts to address major
functional operations at the tensor level, including compute, reduction,
elementwise transformations, comparison and control flow.

### Minimal

This takes the bottom-up approach: keep the TOSA operator set minimal in
order to bound the design of hardware, operator kernels, code generation
strategies and associated considerations that affect the executability of TOSA
content.

In this regard TOSA seeks to avoid creating compound operators, instead
leaving it to the compiler backend to fuse multiple TOSA ops if required.
This choice also benefits the numerical precision goal, since it is easier to
fuse the numerical functionality of successive operators than to split the
numerical functionality of a compound operator.

### Numerical Precision

TOSA began as a means to address operator-level numerical precision for
code generation and hardware development. It therefore incorporates precision
detail into the operator set.

In this regard, TOSA operators are best understood as a combination of the
visible quantization information embedded within an operation, together with
the functional information about how that information is used, as described in
the specification of the operation.

## TOSA Operator Rationale

The general basis of selection of the operator set that constitutes TOSA is
described in the TOSA specification document under Section 1.3 Operator
Selection. Explanation of the thinking behind some operators is listed here:

### IDENTITYN

tosa.IDENTITYN is used to form a list of operator results during
lowering of operations such as tf.Split from a sequence of tosa.SLICE
ops. If there are alternate ways to express this lowering without the
tosa.IDENTITYN op, the tosa.IDENTITYN op could be removed from TOSA.

```
Value lower_split_op(Value %value, size_t axis, size_t num_split) {
    Value %output[]

    // Assumes %value.shape[axis] is evenly divisible by num_split.
    size_t slice_size = %value.shape[axis] / num_split

    for (int i = 0; i < num_split; i++) {
        vector<size_t> begin_vals, size_vals

        // Build the start and size vectors for the i-th slice.
        for (int j = 0; j < %value.rank; j++) {
            if (j == axis) {
                begin_vals.push_back(slice_size * i)
                size_vals.push_back(slice_size)
            } else {
                begin_vals.push_back(0)
                size_vals.push_back(%value.shape[j])
            }
        }

        // Emit one tosa.SLICE per output, once its bounds are complete.
        %output[i] = tosa.SLICE(%value) {start=begin_vals, size=size_vals} (tensor<%value.type>) -> tensor<size_vals, %value.dtype>
    }

    %output_list = tosa.IDENTITYN(%output) (tensor<%output:*.type>) -> tensor<%output_list:*.type>
    return %output_list
}
```

### COND\_IF and WHILE\_LOOP

Several neural networks express conditional control flow at the tensor level.
A survey of multiple high level frameworks indicated that a conditional if and
a loop construct are common in all major frameworks, with some variation.
Since TOSA endeavors to be complete in expressing tensor level functionality
including control flow, it implements these constructs.

The COND\_IF and WHILE\_LOOP operators implement such structured control
flow forms and should be lowerable to corresponding ops in the scf dialect.
Since the dialect seeks to remain isomorphic with an external, serialized form,
the decision was to keep these ops in the dialect (as opposed to deferring
completely to scf). This may be re-evaluated if it does not yield the expected
value.

## Using TOSA In A Compiler

The TOSA specification describes each operator in functional detail. It is
expected that compilers that use TOSA will use its builders to construct the
operators so that the quantization information for the operator is correctly
generated.
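
As a rough illustration of why the builders matter, the Python sketch below
mimics what a builder does: it reads the zero points from the quantized input
and output tensor types and stores them in an op-level attribute, after which
a kernel can work on raw integer data alone. All names here
(`QuantizedTensorType`, `UnaryOpQuantAttr`, `build_unary_quant_attr`,
`negate_int8`) are hypothetical and are not part of the MLIR or TOSA APIs, and
the negate arithmetic is a simplified assumption rather than a transcription
of the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantizedTensorType:
    """Hypothetical stand-in for a tensor type that carries quantization info."""
    scale: float
    zero_point: int


@dataclass(frozen=True)
class UnaryOpQuantAttr:
    """Hypothetical op-carried attribute derived from the tensor types."""
    input_zp: int
    output_zp: int


def build_unary_quant_attr(input_type: QuantizedTensorType,
                           output_type: QuantizedTensorType) -> UnaryOpQuantAttr:
    """Builder step: copy the zero points from the tensor types into the op."""
    return UnaryOpQuantAttr(input_zp=input_type.zero_point,
                            output_zp=output_type.zero_point)


def negate_int8(raw: List[int], attr: UnaryOpQuantAttr) -> List[int]:
    """Negate raw int8 values using only the op-carried attribute.

    Simplified assumption: input and output share the same scale, so only
    the zero points are needed. Results are clamped to the int8 range.
    """
    result = []
    for x in raw:
        value = -(x - attr.input_zp) + attr.output_zp
        result.append(max(-128, min(127, value)))
    return result


# Usage: the kernel never looks at the tensor types again.
in_type = QuantizedTensorType(scale=0.5, zero_point=3)
out_type = QuantizedTensorType(scale=0.5, zero_point=-2)
attr = build_unary_quant_attr(in_type, out_type)
negated = negate_int8([3, 5, -128], attr)  # [-2, -4, 127]
```

Once `attr` exists, `negate_int8` needs nothing from `in_type` or `out_type`,
which is the property the builders are meant to guarantee.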

The functional steps described in the pseudocode of the specification enable
the construction of code generation for that operation, or decisions on the
design of underlying hardware. The functional pseudocode also describes
how the quantization parameters are utilized within the operation.

### Quantization Parameters in Ops vs Tensors

TOSA uses the quantization parameters embedded in the input and output
tensors to construct the quantization attributes that sit within the operator.
Once these attributes are constructed, the quantization information within
the tensors is no longer necessary for code generation.

This enables the tensors to be subsequently interpreted simply as contiguous
buffers containing raw data, with no 'meta information' in the form of the
quantization_type. Precision related manipulation of the input or output is
instead described by the operator itself, which specifies, for example, when
the zero point is applied or when the scale multiplication is done.

However, TOSA does *not* eliminate the existing MLIR QuantOps quantization
type information within the tensors; this leaves the choice of how to handle
quantization information to later backend code generation steps.

Maintaining the ability to overlap these different representations of
quantization parameters (i.e. tensor-carried vs op-carried) is an important
capability when considering progressive lowering between uses that expect one
scheme vs the other.

## Operation definitions

[include "Dialects/TosaOps.md"]