# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with the TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods of enabling XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, but it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use the
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via a new property in the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR
you'd use:

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

Note that in this case an `Interpreter::SetNumThreads` invocation does not
affect the number of threads used by the XNNPACK engine. In order to specify
the number of threads available to the XNNPACK engine, you should manually pass
the value when constructing the interpreter. The snippet below illustrates
this, assuming you are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load the model
tflite::Model* model;
...

// Construct the interpreter with the number of threads available to XNNPACK
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**The XNNPACK engine used by the TensorFlow Lite interpreter uses a single
thread for inference by default.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms that support POSIX-style weak symbols (Android,
iOS, Linux, Mac, but **NOT** Windows).

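As a sketch, assuming your application is built with Bazel and its target is
named `my_app` (a hypothetical name used here for illustration), the extra
dependency can be added to the application's `cc_binary` rule:

```
cc_binary(
    name = "my_app",
    srcs = ["my_app.cc"],
    deps = [
        "//tensorflow/lite:framework",
        # Linking this target overrides the weak symbols and enables XNNPACK.
        "//tensorflow/lite:tflite_with_xnnpack",
    ],
)
```

No code changes are needed in the application itself with this method.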
### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate supported parts of
the model to the XNNPACK delegate. The users must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend
}

...

// Run inference using XNNPACK
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

### Using the XNNPACK weights cache

XNNPACK internally packs static weights for operations (like convolutions) in
order to make accessing weights more memory friendly. XNNPACK needs to allocate
memory internally to hold these packed weights. If you are starting multiple
TFLite interpreter instances based on the same model, there can be multiple
copies of the same packed weights in each instance. This can cause high memory
usage. The weights cache can be used to share packed weights between multiple
TFLite instances.

```c++
// Create 2 interpreters which share the same model.
std::unique_ptr<tflite::Interpreter> interpreter1;
std::unique_ptr<tflite::Interpreter> interpreter2;

// Create a weights cache that you can pass to the XNNPACK delegate.
TfLiteXNNPackDelegateWeightsCache* weights_cache =
    TfLiteXNNPackDelegateWeightsCacheCreate();

// As with the low-level API above, initialize options, and pass this cache
// to the XNNPACK delegate via the options.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.weights_cache = weights_cache;

// Modify graph with delegate, as above...
TfLiteDelegate* delegate1 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter1->ModifyGraphWithDelegate(delegate1) != kTfLiteOk) {
  // Static weights will be packed and written into weights_cache.
}
TfLiteDelegate* delegate2 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter2->ModifyGraphWithDelegate(delegate2) != kTfLiteOk) {
  // XNNPACK will reuse packed weights if they can be found in the weights
  // cache.
}

// Finalize the weights cache.
// Hard finalization has the lowest memory overhead, but requires that all
// TFLite interpreter instances be created up front, before any finalization
// and inference.
TfLiteXNNPackDelegateWeightsCacheFinalizeHard(weights_cache);

// Alternatively, soft-finalize the weights cache. This is useful if more
// delegates using the same model will be created after finalization.
// TfLiteXNNPackDelegateWeightsCacheFinalizeSoft(weights_cache);

// Later, after all the interpreters and XNNPACK delegates using the cache are
// destroyed, release the weights cache.
TfLiteXNNPackDelegateWeightsCacheDelete(weights_cache);
```

The weights cache is a contents-based cache. Every time XNNPACK has to pack
weights, it first packs them into a temporary buffer, then looks the packed
weights up in the weights cache, based on their contents. If they are found,
the cached packed weights are used for subsequent operations, and the temporary
buffer is freed. Otherwise, the packed weights are added to the cache.

The weights cache has to be finalized before any inference; it is an error to
run inference otherwise. The choice between hard and soft finalization depends
on whether new XNNPACK delegate instances will be created after finalization.
Hard finalization does not allow new instances to be created, and has lower
memory overhead. Soft finalization allows new instances to be created, and has
higher memory overhead (up to the size of the largest packed weights, rounded
up to page alignment).

## Profiling

When TfLite profiling is enabled, XNNPACK will time each operator and report
the results to TfLite, which will print them as part of the overall execution
profile.

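For example, one way to obtain such a per-operator profile is through the
TensorFlow Lite benchmark tool; the sketch below assumes `model.tflite` is the
model being profiled:

```
bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=model.tflite --use_xnnpack=true --enable_op_profiling=true
```

The per-operator timings are printed to the console at the end of the run.
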
## Limitations and supported operators

The XNNPACK delegate is a work-in-progress, and currently supports a limited
set of operators. Unsupported operators will fall back to the default
implementations, so models using a combination of supported and unsupported
operators can still benefit from the XNNPACK delegate.

### Floating-Point (IEEE FP32) Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONCATENATION`

* Inputs and outputs must be in 32-bit floating-point format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only the [1, 2], [2, 1], and [2] axes specifications (i.e. reduction across
  either both spatial dimensions or across the width dimension) are supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The number of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions
  equal to 1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via the `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SPLIT`

* Inputs and outputs must be in 32-bit floating-point format.
* Only split into two, three, or four outputs is supported.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, bias (if present) and output tensors must be in 32-bit
  floating-point format.
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Floating-Point (IEEE FP16) Operators (experimental)

XNNPACK supports half-precision (using IEEE FP16 format) inference for a subset
of floating-point operators. XNNPACK automatically enables half-precision
inference when the following conditions are met:

* XNNPACK runs on hardware that natively supports computations in IEEE FP16
format. Currently, this hardware is limited to ARM64 devices with the ARMv8.2
FP16 arithmetics extension, and includes Android phones starting with the
Pixel 3, the Galaxy S9 (Snapdragon SoC), and the Galaxy S10 (Exynos SoC), iOS
devices with A11 or newer SoCs, and all Apple Silicon Macs.

* IEEE FP16 inference is supported for every floating-point operator in the
model.

* The model's "reduced_precision_support" metadata indicates that the model
is compatible with FP16 inference.

When the above conditions are met, XNNPACK replaces FP32 operators with their
FP16 equivalents, and inserts additional operators to convert model inputs
from FP32 to FP16 and convert model outputs back from FP16 to FP32. If the
above conditions are not met, XNNPACK will perform model inference with FP32
calculations.

Additionally, the XNNPACK delegate provides an option to force FP16 inference
regardless of model metadata. This option is intended for development
workflows, and in particular for testing the end-to-end accuracy of a model
when FP16 inference is used. Forcing FP16 inference has several effects:

* Besides ARM64 devices with the ARMv8.2 FP16 arithmetics extension, forced
FP16 inference is supported on x86/x86-64 devices with the AVX2 extension in
emulation mode: all elementary floating-point operations are computed in FP32,
then converted to FP16 and back to FP32. Note that such simulation is not
exactly equivalent to native FP16 inference, but it simulates the effects of
the restricted mantissa precision and exponent range of native FP16
arithmetics.

* On devices that support neither native FP16 arithmetics (i.e. that are not
ARM64 devices with the ARMv8.2 FP16 arithmetics extension) nor its emulation
(i.e. that are not x86/x86-64 devices with the AVX2 extension), inference will
fail rather than fall back to FP32.

* If any floating-point operator offloaded to XNNPACK is not supported for FP16
inference, inference will fail rather than fall back to FP32.

To force FP16 inference, either build the delegate with the
`--define xnnpack_force_float_precision=fp16` option, or add the
`TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16` flag to the
`TfLiteXNNPackDelegateOptions.flags` bitmask passed into
the `TfLiteXNNPackDelegateCreate` call:

```c
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
...
xnnpack_options.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16;
TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
```

Below is the list of operators supported in IEEE FP16 inference:

#### `ABS`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ADD`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `AVERAGE_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CEIL`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONCATENATION`

* Must satisfy constraints on the floating-point (FP32) operator.
* None of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `DEPTH_TO_SPACE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DEPTHWISE_CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DIV`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `FLOOR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `FULLY_CONNECTED`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `HARD_SWISH`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LEAKY_RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LOGISTIC`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAX_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAXIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MEAN`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MINIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MUL`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `NEG`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PAD`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PRELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU6`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU_N1_TO_1`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESHAPE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESIZE_BILINEAR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ROUND`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SPLIT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SOFTMAX`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQRT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARED_DIFFERENCE`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `SUB`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `TRANSPOSE_CONV`

* Must satisfy constraints on the floating-point (FP32) operator.

### Quantized Operators

By default, quantized inference in the XNNPACK delegate is disabled, and
XNNPACK is used only for floating-point models. Support for quantized inference
in XNNPACK must be enabled by adding extra Bazel flags when building TensorFlow
Lite.

* The `--define tflite_with_xnnpack_qs8=true` flag enables XNNPACK inference
  for quantized operators using the signed quantization schema. This schema is
  used by models produced by the [Model Optimization
  Toolkit](https://www.tensorflow.org/model_optimization) through either
  post-training integer quantization or quantization-aware training.
  Post-training dynamic range quantization is not supported in XNNPACK.

* The `--define tflite_with_xnnpack_qu8=true` flag enables XNNPACK inference
  for quantized operators using the unsigned quantization schema, produced via
  the legacy TensorFlow 1.X quantization tooling. This option is experimental
  and may perform suboptimally on mobile processors with NEON DOT product
  instructions.

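For example, a desktop build with both XNNPACK and signed quantized inference
enabled might look like the following sketch (the output target shown is one
possibility; adjust the target and flags for your platform):

```
bazel build -c opt \
  --define tflite_with_xnnpack=true \
  --define tflite_with_xnnpack_qs8=true \
  //tensorflow/lite:libtensorflowlite.so
```
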
Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONCATENATION`

* Inputs and outputs must be in 8-bit quantized format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 8-bit quantized format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEQUANTIZE`

* Input tensor must be in 8-bit quantized format without per-channel
  quantization.
* Output tensor must be in 32-bit floating-point format.

#### `ELU`

* Inputs and outputs must be in 8-bit signed quantized format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `LEAKY_RELU`

* Inputs and outputs must be in 8-bit quantized format.
* The ratio of input scale to output scale must be within [1/256, 128].
* The product of the negative slope and the ratio of input scale to output
  scale must be within either the [-127.99609375, -1/256] range or the
  [1/256, 128] range.

#### `LOGISTIC`

* Inputs and outputs must be in 8-bit quantized format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 8-bit quantized format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MEAN`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only the [1, 2], [2, 1], and [2] axes specifications (i.e. reduction across
  either both spatial dimensions or across the width dimension) are supported.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The number of padding elements must be non-negative.

#### `QUANTIZE`

* Input tensor must be in 32-bit floating-point format or in 8-bit quantized
  format.
* Output tensor must be in 8-bit quantized format without per-channel
  quantization.
* If inputs are in 8-bit quantized format, they must have the same signedness
  as the outputs, and the ratio of input scale to output scale must be in the
  [2**-8, 2**7] range.

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `SPLIT`

* Inputs and outputs must be in 8-bit quantized format.
* Only split into two, three, or four outputs is supported.

#### `SUB`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, and output tensors must be in 8-bit quantized format (bias, if
  present, must be in 32-bit quantized format).
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Sparse Inference

The XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs that satisfy the following constraints:

* The sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* The sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* The sparse subgraph must end with either a `MEAN` operator with reduction
  across spatial axes, or a `DEPTH_TO_SPACE` operator.
* The sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3 of the filter
    weights in the 1x1 `CONV_2D` operators across the sparse subgraph must be
    zeroes to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported,
  but causes a complete reinitialization of the delegate instance, which has
  considerable overhead.