# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods to enable XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, although it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use the
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via a new property in the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR
you'd use:

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

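For desktop builds the same flag applies; as an illustrative example (the
benchmark tool is just one possible target), a Linux or macOS build could be:

```
bazel build -c opt --define tflite_with_xnnpack=true \
  //tensorflow/lite/tools/benchmark:benchmark_model
```
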
Note that in this case the `Interpreter::SetNumThreads` invocation does not
affect the number of threads used by the XNNPACK engine. In order to specify
the number of threads available for the XNNPACK engine, you should manually
pass the value when constructing the interpreter. The snippet below illustrates
this, assuming you are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load the model
tflite::Model* model;
...

// Construct the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

// Pass the number of threads for XNNPACK directly to the builder.
TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**The XNNPACK engine used by the TensorFlow Lite interpreter uses a single
thread for inference by default.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms which support POSIX-style weak symbols (Android,
iOS, Linux, macOS, but **NOT** Windows).

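As a rough sketch, assuming your application is built as a Bazel `cc_binary`
(the target and file names below are placeholders), the extra dependency could
be wired up like this:

```
cc_binary(
    name = "my_app",  # placeholder application target
    srcs = ["my_app.cc"],
    deps = [
        # Core TensorFlow Lite runtime and built-in operators.
        "//tensorflow/lite:framework",
        "//tensorflow/lite/kernels:builtin_ops",
        # Enables XNNPACK by default via weak-symbol override.
        "//tensorflow/lite:tflite_with_xnnpack",
    ],
)
```
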
### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate the supported parts of
the model to the XNNPACK delegate. The user must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend.
}

...

// Run inference using XNNPACK.
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate.
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

## Limitations and supported operators

The XNNPACK delegate is a work in progress, and currently supports only a
limited set of operators. Unsupported operators fall back to the default
implementations, so models using a combination of supported and unsupported
operators can still benefit from the XNNPACK delegate.

### Floating-Point Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2] or [2, 1] axes specification (i.e. reduction across spatial
  dimensions) is supported.
* Only `keep_dims = True` parameter value is supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions
  equal to 1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

### Quantized Operators

By default, quantized inference in the XNNPACK delegate is disabled, and
XNNPACK is used only for floating-point models. Support for quantized inference
in XNNPACK must be enabled by adding extra Bazel flags when building TensorFlow
Lite.

* The `--define xnn_enable_qs8=true` flag enables XNNPACK inference for
  quantized operators using the signed quantization schema. This schema is used
  by models produced by the
  [Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
  through either post-training integer quantization or quantization-aware
  training. Post-training dynamic range quantization is not supported in
  XNNPACK.

* The `--define xnn_enable_qu8=true` flag enables XNNPACK inference for
  quantized operators using the unsigned quantization schema, produced via the
  legacy TensorFlow 1.X quantization tooling. This option is experimental and
  may perform suboptimally on mobile processors with NEON DOT product
  instructions.

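For example, a quantized-inference build of the benchmark tool might combine
the flags as follows (the target is illustrative; add the defines to whichever
target you build):

```
bazel build -c opt \
  --define tflite_with_xnnpack=true \
  --define xnn_enable_qs8=true \
  //tensorflow/lite/tools/benchmark:benchmark_model
```
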
Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

### Sparse Inference

The XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs satisfying the following constraints:

* Sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* Sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* Sparse subgraph must end with either a `MEAN` operator with reduction across
  spatial axes, or a `DEPTH_TO_SPACE` operator.
* Sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3rd of filter weights
    in the 1x1 `CONV_2D` operators across the sparse subgraph must be zeroes
    to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar, or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported,
  but causes a complete reinitialization of the delegate instance, which has
  considerable overhead (see the sketch below).
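
A minimal sketch of the resize flow under this constraint (the input index and
the new dimensions are illustrative):

```c++
// Resizing an input invalidates the delegate state; the next
// AllocateTensors() call re-initializes the XNNPACK delegate, which is
// expensive, so avoid resizing inputs in a hot loop.
if (interpreter->ResizeInputTensor(0, {2, 224, 224, 3}) != kTfLiteOk) {
  // Report error.
}
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Report error.
}
```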