# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with the TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods to enable XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, but it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use an
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via a new property in the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR you
would use:

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

Note that in this case the `Interpreter::SetNumThreads` invocation does not
affect the number of threads used by the XNNPACK engine. To specify the number
of threads available to the XNNPACK engine, pass the value explicitly when
constructing the interpreter. The snippet below illustrates this, assuming you
are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load the model
tflite::Model* model;
...

// Construct the interpreter, passing the number of threads to the builder
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**The XNNPACK engine used by the TensorFlow Lite interpreter uses a single
thread for inference by default.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms that support POSIX-style weak symbols (Android,
iOS, Linux, Mac, but **NOT** Windows).

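With this approach no delegate-specific code is required: once the
`//tensorflow/lite:tflite_with_xnnpack` target is linked in, XNNPACK is applied
automatically while the interpreter is constructed. A minimal sketch (the model
path is a placeholder):

```c++
// Standard interpreter construction; XNNPACK is picked up automatically
// because the tflite_with_xnnpack target is linked into the binary.
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile("model.tflite");
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
interpreter->AllocateTensors();
```
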
### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate supported parts of
the model to the XNNPACK delegate. Users must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend
}

...

// Run inference using XNNPACK
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

### Using the XNNPACK weights cache

XNNPACK internally packs static weights for operations (like convolutions) in
order to make accessing weights more memory friendly. XNNPACK needs to allocate
memory internally to hold these packed weights. If you are starting multiple
TFLite interpreter instances based on the same model, each instance ends up
with its own copy of the same packed weights, which can cause high memory
usage. The weights cache can be used to share packed weights between multiple
TFLite instances.

```c++
// Create 2 interpreters which share the same model.
std::unique_ptr<tflite::Interpreter> interpreter1;
std::unique_ptr<tflite::Interpreter> interpreter2;

// Create a weights cache that you can pass to the XNNPACK delegate.
TfLiteXNNPackDelegateWeightsCache* weights_cache =
    TfLiteXNNPackDelegateWeightsCacheCreate();

// As with the low-level API above, initialize options, and pass this cache
// to the XNNPACK delegate via the options.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.weights_cache = weights_cache;

// Modify the graphs with the delegates, as above. When the first delegate is
// applied, static weights are packed and written into weights_cache. When the
// second delegate is applied, XNNPACK reuses packed weights if they can be
// found in the weights cache.
TfLiteDelegate* delegate1 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter1->ModifyGraphWithDelegate(delegate1) != kTfLiteOk) {
  // Report error and fall back.
}
TfLiteDelegate* delegate2 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter2->ModifyGraphWithDelegate(delegate2) != kTfLiteOk) {
  // Report error and fall back.
}

// Finalize the weights cache.
// Hard finalization has the lowest memory overhead, but requires that all
// TFLite interpreter instances be created up front, before any finalization
// and inference.
TfLiteXNNPackDelegateWeightsCacheFinalizeHard(weights_cache);

// Alternatively, soft-finalize the weights cache. This is useful if more
// delegates using the same model will be created after finalization.
// TfLiteXNNPackDelegateWeightsCacheFinalizeSoft(weights_cache);

// Later, after all the interpreters and XNNPACK delegates using the cache are
// destroyed, release the weights cache.
TfLiteXNNPackDelegateWeightsCacheDelete(weights_cache);
```

The weights cache is a contents-based cache. Every time XNNPACK has to pack
weights, it first packs them into a temporary buffer, then looks up, based on
the contents of the packed weights, whether matching packed weights already
exist in the weights cache. If they do, the cached packed weights are used for
subsequent operations and the temporary buffer is freed. Otherwise, the packed
weights are added to the cache.

The weights cache has to be finalized before any inference; otherwise it is an
error. The choice between hard and soft finalization depends on whether new
XNNPACK delegate instances will be created after finalization. Hard
finalization does not allow new instances to be created, and has lower memory
overhead. Soft finalization allows new instances to be created, and has higher
memory overhead (up to the size of the largest packed weights, rounded up to
page alignment).

## Profiling

When TfLite profiling is enabled, XNNPACK will time each operator and report
the results to TfLite, which will print them as part of the overall execution
profile.

## Limitations and supported operators

The XNNPACK delegate is a work in progress, and currently supports a limited
set of operators. Unsupported operators fall back to the default
implementations, so models using a combination of supported and unsupported
operators can still benefit from the XNNPACK delegate.

### Floating-Point (IEEE FP32) Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONCATENATION`

* Inputs and outputs must be in 32-bit floating-point format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2], [2, 1], and [2] axes specification (i.e. reduction across either
  both spatial dimensions or across the width dimension) is supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions equal
  1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SPLIT`

* Inputs and outputs must be in 32-bit floating-point format.
* Only split into two, three, or four outputs is supported.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, bias (if present) and output tensors must be in 32-bit
  floating-point format.
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Floating-Point (IEEE FP16) Operators (experimental)

XNNPACK supports half-precision (using IEEE FP16 format) inference for a subset
of floating-point operators. XNNPACK automatically enables half-precision
inference when the following conditions are met:

* XNNPACK runs on hardware that natively supports computations in IEEE FP16
format. Currently, this hardware is limited to ARM64 devices with ARMv8.2 FP16
arithmetics extension, and includes Android phones starting with Pixel 3,
Galaxy S9 (Snapdragon SoC), Galaxy S10 (Exynos SoC), iOS devices with A11 or
newer SoCs, and all Apple Silicon Macs.

* IEEE FP16 inference is supported for every floating-point operator in the
model.

* The model's "reduced_precision_support" metadata indicates that the model
is compatible with FP16 inference.

When the above conditions are met, XNNPACK replaces FP32 operators with their
FP16 equivalents, and inserts additional operators to convert model inputs
from FP32 to FP16 and convert model outputs back from FP16 to FP32. If the
above conditions are not met, XNNPACK will perform model inference with FP32
calculations.

Additionally, the XNNPACK delegate provides an option to force FP16 inference
regardless of model metadata. This option is intended for development workflows,
and in particular for testing the end-to-end accuracy of a model when FP16
inference is used. Forcing FP16 inference has several effects:

* Besides ARM64 devices with ARMv8.2 FP16 arithmetics extension, forced FP16
inference is supported on x86/x86-64 devices with AVX2 extension in emulation
mode: all elementary floating-point operations are computed in FP32, then
converted to FP16 and back to FP32. Note that such simulation is not exactly
equivalent to native FP16 inference, but simulates the effects of restricted
mantissa precision and exponent range in the native FP16 arithmetics.

* On devices that support neither the native FP16 arithmetics (ARM64 devices
with ARMv8.2 FP16 arithmetics extension), nor emulation (x86/x86-64 devices with
AVX2 extension), inference will fail rather than fall back to FP32.

* If any floating-point operator offloaded to XNNPACK is not supported for FP16
inference, inference will fail rather than fall back to FP32.

To force FP16 inference, either build the delegate with the
`--define xnnpack_force_float_precision=fp16` option, or add the
`TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16` flag to the
`TfLiteXNNPackDelegateOptions.flags` bitmask passed into
the `TfLiteXNNPackDelegateCreate` call:

```c
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
...
xnnpack_options.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16;
TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
```

Below is the list of operators supported in IEEE FP16 inference:

#### `ABS`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ADD`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `AVERAGE_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CEIL`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONCATENATION`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `DEPTH_TO_SPACE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DEPTHWISE_CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DIV`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `FLOOR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `FULLY_CONNECTED`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `HARD_SWISH`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LEAKY_RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LOGISTIC`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAX_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAXIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MEAN`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MINIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MUL`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `NEG`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PAD`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PRELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU6`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU_N1_TO_1`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESHAPE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESIZE_BILINEAR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ROUND`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SPLIT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SOFTMAX`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQRT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARED_DIFFERENCE`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `SUB`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `TRANSPOSE_CONV`

* Must satisfy constraints on the floating-point (FP32) operator.

### Quantized Operators

By default, quantized inference in the XNNPACK delegate is disabled, and
XNNPACK is used only for floating-point models. Support for quantized inference
in XNNPACK must be enabled by adding extra Bazel flags when building TensorFlow
Lite; an example command follows the list below.

* The `--define tflite_with_xnnpack_qs8=true` flag enables XNNPACK inference
  for quantized operators using the signed quantization schema. This schema is
  used by models produced by the [Model Optimization
  Toolkit](https://www.tensorflow.org/model_optimization) through either
  post-training integer quantization or quantization-aware training.
  Post-training dynamic range quantization is not supported in XNNPACK.

* The `--define tflite_with_xnnpack_qu8=true` flag enables XNNPACK inference
  for quantized operators using the unsigned quantization schema, which is used
  by models produced via the legacy TensorFlow 1.X quantization tooling. This
  option is experimental and may perform suboptimally on mobile processors with
  NEON DOT product instructions.

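For example, the Android AAR command from the Bazel-flags section above can be
extended with these defines (enable only the schemas your models actually use):

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  --define tflite_with_xnnpack_qs8=true \
  --define tflite_with_xnnpack_qu8=true \
  //tensorflow/lite/java:tensorflow-lite
```
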
Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONCATENATION`

* Inputs and outputs must be in 8-bit quantized format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 8-bit quantized format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEQUANTIZE`

* Input tensor must be in 8-bit quantized format without per-channel
  quantization.
* Output tensor must be in 32-bit floating-point format.

#### `ELU`

* Inputs and outputs must be in 8-bit signed quantized format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `LEAKY_RELU`

* Inputs and outputs must be in 8-bit quantized format.
* The ratio of input scale to output scale must be within [1/256, 128].
* The product of the negative slope and the ratio of input scale to output
  scale must be within either the [-127.99609375, -1/256] range or the
  [1/256, 128] range.

#### `LOGISTIC`

* Inputs and outputs must be in 8-bit quantized format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 8-bit quantized format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MEAN`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2], [2, 1], and [2] axes specification (i.e. reduction across either
  both spatial dimensions or across the width dimension) is supported.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `QUANTIZE`

* Input tensor must be in 32-bit floating-point format or in 8-bit quantized
  format.
* Output tensor must be in 8-bit quantized format without per-channel
  quantization.
* If inputs are in 8-bit quantized format, they must have the same signedness
  as the outputs, and the ratio of input scale to output scale must be in the
  [2**-8, 2**7] range.

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `SPLIT`

* Inputs and outputs must be in 8-bit quantized format.
* Only split into two, three, or four outputs is supported.

#### `SUB`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, and output tensors must be in 8-bit quantized format (bias, if
  present, must be in 32-bit quantized format).
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Sparse Inference

The XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs that satisfy the following constraints:

* Sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* Sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* Sparse subgraph must end with either a `MEAN` operator with reduction across
  spatial axes, or a `DEPTH_TO_SPACE` operator.
* Sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3rd of filter weights
    in the 1x1 `CONV_2D` operators across the sparse subgraph must be zeroes
    to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar, or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported, but
  causes a complete reinitialization of the delegate instance, which has
  considerable overhead (see the sketch below).

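The following minimal sketch illustrates a resize that triggers such a
reinitialization; the input index and shape are placeholders chosen for the
example:

```c++
// Resizing a model input is supported, but it causes a complete
// reinitialization of the XNNPACK delegate instance, which has considerable
// overhead, so avoid doing it in a latency-critical loop.
interpreter->ResizeInputTensor(interpreter->inputs()[0], {1, 224, 224, 3});
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Handle the error.
}
```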