# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods to enable XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, although it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use the
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via a new property in the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR
you'd use:

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

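For desktop builds the same flag applies; as an illustrative example (the
benchmark tool is just one possible target), a Linux or macOS build could be:

```
bazel build -c opt --define tflite_with_xnnpack=true \
  //tensorflow/lite/tools/benchmark:benchmark_model
```
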
Note that in this case the `Interpreter::SetNumThreads` invocation does not
affect the number of threads used by the XNNPACK engine. In order to specify
the number of threads available for the XNNPACK engine, you should manually
pass the value when constructing the interpreter. The snippet below illustrates
this, assuming you are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load the model
tflite::Model* model;
...

// Construct the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

// Pass the number of threads for XNNPACK directly to the builder.
TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**The XNNPACK engine used by the TensorFlow Lite interpreter uses a single
thread for inference by default.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms which support POSIX-style weak symbols (Android,
iOS, Linux, macOS, but **NOT** Windows).

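As a rough sketch, assuming your application is built as a Bazel `cc_binary`
(the target and file names below are placeholders), the extra dependency could
be wired up like this:

```
cc_binary(
    name = "my_app",  # placeholder application target
    srcs = ["my_app.cc"],
    deps = [
        # Core TensorFlow Lite runtime and built-in operators.
        "//tensorflow/lite:framework",
        "//tensorflow/lite/kernels:builtin_ops",
        # Enables XNNPACK by default via weak-symbol override.
        "//tensorflow/lite:tflite_with_xnnpack",
    ],
)
```
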
### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate the supported parts of
the model to the XNNPACK delegate. The user must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend.
}

...

// Run inference using XNNPACK.
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate.
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

## Limitations and supported operators

The XNNPACK delegate is a work in progress, and currently supports only a
limited set of operators. Unsupported operators fall back to the default
implementations, so models using a combination of supported and unsupported
operators can still benefit from the XNNPACK delegate.

### Floating-Point Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2] or [2, 1] axes specification (i.e. reduction across spatial
  dimensions) is supported.
* Only `keep_dims = True` parameter value is supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions
  equal to 1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

### Quantized Operators

By default, quantized inference in the XNNPACK delegate is disabled, and
XNNPACK is used only for floating-point models. Support for quantized inference
in XNNPACK must be enabled by adding extra Bazel flags when building TensorFlow
Lite.

* The `--define xnn_enable_qs8=true` flag enables XNNPACK inference for
  quantized operators using the signed quantization schema. This schema is used
  by models produced by the
  [Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
  through either post-training integer quantization or quantization-aware
  training. Post-training dynamic range quantization is not supported in
  XNNPACK.

* The `--define xnn_enable_qu8=true` flag enables XNNPACK inference for
  quantized operators using the unsigned quantization schema, produced via the
  legacy TensorFlow 1.X quantization tooling. This option is experimental and
  may perform suboptimally on mobile processors with NEON DOT product
  instructions.

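For example, a quantized-inference build of the benchmark tool might combine
the flags as follows (the target is illustrative; add the defines to whichever
target you build):

```
bazel build -c opt \
  --define tflite_with_xnnpack=true \
  --define xnn_enable_qs8=true \
  //tensorflow/lite/tools/benchmark:benchmark_model
```
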
Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

### Sparse Inference

The XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs satisfying the following constraints:

* Sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* Sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* Sparse subgraph must end with either a `MEAN` operator with reduction across
  spatial axes, or a `DEPTH_TO_SPACE` operator.
* Sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3rd of filter weights
    in the 1x1 `CONV_2D` operators across the sparse subgraph must be zeroes
    to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar, or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported,
  but causes a complete reinitialization of the delegate instance, which has
  considerable overhead (see the sketch below).
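
A minimal sketch of the resize flow under this constraint (the input index and
the new dimensions are illustrative):

```c++
// Resizing an input invalidates the delegate state; the next
// AllocateTensors() call re-initializes the XNNPACK delegate, which is
// expensive, so avoid resizing inputs in a hot loop.
if (interpreter->ResizeInputTensor(0, {2, 224, 224, 3}) != kTfLiteOk) {
  // Report error.
}
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Report error.
}
```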