# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with the TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods to enable XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, although it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use the
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via the `useXNNPACK` property of the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR you'd use

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

Note that in this case the `Interpreter::SetNumThreads` invocation does not
affect the number of threads used by the XNNPACK engine. To specify the number
of threads available to the XNNPACK engine, pass the value explicitly when
constructing the interpreter. The snippet below illustrates this, assuming you
are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load the model
tflite::Model* model;
...

// Construct the interpreter with the requested number of threads
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**By default, the XNNPACK engine used by the TensorFlow Lite interpreter runs
inference on a single thread.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms that support POSIX-style weak symbols (Android,
iOS, Linux, Mac, but **NOT** Windows).
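
As a rough sketch, assuming your application is itself a Bazel `cc_binary` (the
target name and source file below are purely illustrative), the dependency could
be declared as follows:

```
cc_binary(
    name = "my_app",      # illustrative application target
    srcs = ["my_app.cc"],
    deps = [
        "//tensorflow/lite:framework",
        # Linking this target enables the XNNPACK engine via weak symbols.
        "//tensorflow/lite:tflite_with_xnnpack",
    ],
)
```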

### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate supported parts of
the model to the XNNPACK delegate. Users must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend
}

...

// Run inference using XNNPACK
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

## Limitations and supported operators

The XNNPACK delegate is a work in progress, and currently supports a limited
set of operators. Unsupported operators fall back to the default
implementations, so models using a combination of supported and unsupported
operators can still benefit from the XNNPACK delegate.

### Floating-Point Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2] or [2, 1] axes specification (i.e. reduction across spatial
  dimensions) is supported.
* Only `keep_dims = True` parameter value is supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions equal
  1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

### Quantized Operators

By default, quantized inference in the XNNPACK delegate is disabled, and
XNNPACK is used only for floating-point models. Support for quantized inference
in XNNPACK must be enabled by adding extra Bazel flags when building TensorFlow
Lite.

* The `--define xnn_enable_qs8=true` flag enables XNNPACK inference for
  quantized operators using the signed quantization schema. This schema is used
  by models produced by the
  [Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
  through either post-training integer quantization or quantization-aware
  training. Post-training dynamic range quantization is not supported in
  XNNPACK.

* The `--define xnn_enable_qu8=true` flag enables XNNPACK inference for
  quantized operators using the unsigned quantization schema, produced via the
  legacy TensorFlow 1.x quantization tooling. This option is experimental and
  may perform suboptimally on mobile processors with NEON DOT product
  instructions.
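
For example, a build of the TensorFlow Lite benchmark tool with both the
XNNPACK engine and signed-quantized inference enabled might look like the
following (the `benchmark_model` target is just one convenient binary to build;
substitute your own target as needed):

```
bazel build -c opt \
  --define tflite_with_xnnpack=true \
  --define xnn_enable_qs8=true \
  //tensorflow/lite/tools/benchmark:benchmark_model
```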

Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

### Sparse Inference

The XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs that satisfy the following constraints:

* Sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* Sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* Sparse subgraph must end with either a `MEAN` operator with reduction across
  spatial axes, or a `DEPTH_TO_SPACE` operator.
* Sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3rd of filter weights
    in the 1x1 `CONV_2D` operators across the sparse subgraph must be zeroes
    to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar, or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported, but
  causes a complete reinitialization of the delegate instance, which has
  considerable overhead.
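
Because of that reinitialization cost, it is usually better to resize inputs
once up front rather than before every inference. A minimal sketch, assuming an
already constructed `interpreter` with the XNNPACK delegate applied and a single
4D input (the shape below is purely illustrative):

```c++
// Choose the new input shape once, before the inference loop (illustrative).
std::vector<int> new_shape = {1, 224, 224, 3};

// Resizing forces a complete re-preparation of the delegated graph, so avoid
// doing this on every inference.
if (interpreter->ResizeInputTensor(interpreter->inputs()[0], new_shape) !=
    kTfLiteOk) {
  // Report error
}
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Report error
}

// Subsequent Invoke() calls reuse the re-initialized XNNPACK delegate.
interpreter->Invoke();
```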