# XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of neural network inference operators for
ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS,
and Emscripten environments. This document describes how to use the XNNPACK
library as an inference engine for TensorFlow Lite.

## Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with the TensorFlow Lite interpreter through the delegation
mechanism. TensorFlow Lite supports several methods to enable XNNPACK
for floating-point inference.

### Enable XNNPACK via Java API on Android (recommended on Android)

Pre-built [nightly TensorFlow Lite binaries for Android](https://www.tensorflow.org/lite/guide/android#use_the_tensorflow_lite_aar_from_mavencentral)
include XNNPACK, although it is disabled by default. Use the `setUseXNNPACK`
method in the `Interpreter.Options` class to enable it:

```java
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.setUseXNNPACK(true);
Interpreter interpreter = new Interpreter(model, interpreterOptions);
```

### Enable XNNPACK via Swift/Objective-C API on iOS (recommended on iOS)

Pre-built [nightly TensorFlow Lite CocoaPods](https://www.tensorflow.org/lite/guide/ios#specifying_versions)
include XNNPACK, but do not enable it by default. Swift developers can use the
`InterpreterOptions` object to enable XNNPACK:

```swift
var options = InterpreterOptions()
options.isXNNPackEnabled = true
var interpreter = try Interpreter(modelPath: "model/path", options: options)
```

Objective-C developers can enable XNNPACK via a new property in the
`TFLInterpreterOptions` class:

```objc
TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.useXNNPACK = YES;
NSError *error;
TFLInterpreter *interpreter =
    [[TFLInterpreter alloc] initWithModelPath:@"model/path"
                                      options:options
                                        error:&error];
```

### Enable XNNPACK via Bazel build flags (recommended on desktop)

When building TensorFlow Lite with Bazel, add
`--define tflite_with_xnnpack=true`, and the TensorFlow Lite interpreter will
use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for the Android AAR you
would use

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite
```

Note that in this case the `Interpreter::SetNumThreads` invocation does not take
effect on the number of threads used by the XNNPACK engine. In order to specify
the number of threads available for the XNNPACK engine you should manually pass
the value when constructing the interpreter. The snippet below illustrates this,
assuming you are using `InterpreterBuilder` to construct the interpreter:

```c++
// Load model
tflite::Model* model;
...

// Construct the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;

TfLiteStatus res =
    tflite::InterpreterBuilder(model, resolver)(&interpreter, num_threads);
```

**XNNPACK engine used by TensorFlow Lite interpreter uses a single thread for
inference by default.**

### Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the
`//tensorflow/lite:tflite_with_xnnpack` target into your application alongside
the TensorFlow Lite framework.

This method works on platforms which support POSIX-style weak symbols (Android,
iOS, Linux, Mac, but **NOT** Windows).
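
With Bazel, this amounts to adding the target to your application's
dependencies. A minimal sketch, assuming a Bazel-built application (the target
and source file names are hypothetical):

```
# Application BUILD file (target and file names are hypothetical).
cc_binary(
    name = "my_app",
    srcs = ["my_app.cc"],
    deps = [
        "//tensorflow/lite:framework",
        # Linking this target enables XNNPACK in the default interpreter
        # through weak-symbol overrides; no code changes are required.
        "//tensorflow/lite:tflite_with_xnnpack",
    ],
)
```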

### Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this
method is **NOT RECOMMENDED** unless you need to use TensorFlow Lite both with
and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the
`TfLiteXNNPackDelegateCreate` function, and then call
`Interpreter::ModifyGraphWithDelegate` to delegate supported parts of
the model to the XNNPACK delegate. The users must destroy the delegate with
`TfLiteXNNPackDelegateDelete` **after** releasing the TensorFlow Lite
interpreter. The snippet below illustrates the typical usage:

```c++
// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend.
}

...

// Run inference using XNNPACK
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
```

### Using the XNNPACK weights cache

XNNPACK internally packs static weights for operations (like convolutions) in
order to make accessing weights more memory friendly. XNNPACK needs to allocate
memory internally to hold these packed weights. If you are starting multiple
TFLite interpreter instances based on the same model, there can be multiple
copies of the same packed weights in each instance. This can cause high memory
usage. The weights cache can be used to share packed weights between multiple
TFLite instances.

```c++
// Create 2 interpreters which share the same model.
std::unique_ptr<tflite::Interpreter> interpreter1;
std::unique_ptr<tflite::Interpreter> interpreter2;

// Create a weights cache that you can pass to the XNNPACK delegate.
TfLiteXNNPackDelegateWeightsCache* weights_cache =
    TfLiteXNNPackDelegateWeightsCacheCreate();

// Like using the low-level API above, initialize options, and pass this cache
// to the XNNPACK delegate via the options.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.weights_cache = weights_cache;

// Modify graph with delegate, as above...
// Static weights will be packed and written into weights_cache.
TfLiteDelegate* delegate1 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter1->ModifyGraphWithDelegate(delegate1) != kTfLiteOk) {
  // Report error.
}

// XNNPACK will reuse packed weights if they can be found in the weights cache.
TfLiteDelegate* delegate2 = TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter2->ModifyGraphWithDelegate(delegate2) != kTfLiteOk) {
  // Report error.
}

// Finalize the weights cache.
// Hard finalization has the lowest memory overhead, but requires that all
// TFLite interpreter instances be created up front, before any finalization
// and inference.
TfLiteXNNPackDelegateWeightsCacheFinalizeHard(weights_cache);

// Alternatively, soft-finalize the weights cache. This is useful if more
// delegates using the same model will be created after finalization.
// TfLiteXNNPackDelegateWeightsCacheFinalizeSoft(weights_cache);

// Later, after all the interpreters and XNNPACK delegates using the cache are
// destroyed, release the weights cache.
TfLiteXNNPackDelegateWeightsCacheDelete(weights_cache);
```

The weights cache is a contents-based cache. Every time XNNPACK has to pack
weights, it first packs into a temporary buffer, then looks up whether the same
packed weights can be found in the weights cache, based on the contents of the
packed weights. If they are found, the packed weights in the cache are used for
subsequent operations, and the temporary buffer is freed. Otherwise, the packed
weights are added to the cache.

The weights cache has to be finalized before any inference; running inference
with an unfinalized cache is an error. The choice between hard and soft
finalization depends on whether new XNNPACK delegate instances will be created
after finalization. Hard finalization does not allow new instances to be
created, and has lower memory overhead. Soft finalization allows new instances
to be created, and has higher memory overhead (up to the size of the largest
packed weights, rounded up to page alignment).

## Profiling

When TfLite profiling is enabled, XNNPACK will time each operator and report the
results to TfLite, which will print them as part of the overall execution
profile.
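
One way to enable TfLite profiling is through the TFLite benchmark tool. The
command below is only a sketch: the tool, its flags, and the model path are
assumptions, not something described elsewhere in this document.

```
bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
  --graph=/path/to/model.tflite \
  --use_xnnpack=true \
  --enable_op_profiling=true
```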

## Limitations and supported operators

XNNPACK delegate is a work-in-progress, and currently supports a limited set of
operators. Unsupported operators will fall back to the default implementations,
so models using a combination of supported and unsupported operators can still
benefit from XNNPACK delegate.

### Floating-Point (IEEE FP32) Operators

Below is the list of currently supported floating-point operators:

#### `ABS`

* Inputs and outputs must be in 32-bit floating-point format.

#### `ADD`

* Inputs and outputs must be in 32-bit floating-point format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `AVERAGE_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CEIL`

* Inputs and outputs must be in 32-bit floating-point format.

#### `CONCATENATION`

* Inputs and outputs must be in 32-bit floating-point format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 32-bit floating-point format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DIV`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `ELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 32-bit floating-point format.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `FLOOR`

* Inputs and outputs must be in 32-bit floating-point format.

#### `HARD_SWISH`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LEAKY_RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `LOGISTIC`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 32-bit floating-point format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MAXIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MEAN`

* The first input and the output must be 4D tensors in 32-bit
  floating-point format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2], [2, 1], and [2] axes specification (i.e. reduction across either
  both spatial dimensions or across the width dimension) is supported.

#### `MINIMUM`

* Inputs and outputs must be in 32-bit floating-point format.

#### `MUL`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `NEG`

* Inputs and outputs must be in 32-bit floating-point format.

#### `PAD`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `PRELU`

* Inputs and outputs must be in 32-bit floating-point format.
* Slope must be static (use `kTfLiteMmapRo` allocation type).
* Slope must be either a 1D tensor, or have all its non-channel dimensions
  equal 1.

#### `RELU`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU6`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RELU_N1_TO_1`

* Inputs and outputs must be in 32-bit floating-point format.

#### `RESHAPE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the new shape specification) must be either
  static (use `kTfLiteMmapRo` allocation type), or absent (with the new shape
  specified via `ReshapeOptions` table).

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 32-bit floating-point
  format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `ROUND`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SPLIT`

* Inputs and outputs must be in 32-bit floating-point format.
* Only split into two, three, or four outputs is supported.

#### `SOFTMAX`

* Inputs and outputs must be in 32-bit floating-point format.
* Only `beta = 1.0` is supported.

#### `SQRT`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SQUARED_DIFFERENCE`

* Inputs and outputs must be in 32-bit floating-point format.

#### `SUB`

* Inputs and outputs must be in 32-bit floating-point format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 32-bit floating-point format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, bias (if present) and output tensors must be in 32-bit
  floating-point format.
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Floating-Point (IEEE FP16) Operators (experimental)

XNNPACK supports half-precision (using IEEE FP16 format) inference for a subset
of floating-point operators. XNNPACK automatically enables half-precision
inference when the following conditions are met:

* XNNPACK runs on hardware that natively supports computations in IEEE FP16
  format. Currently, this hardware is limited to ARM64 devices with the ARMv8.2
  FP16 arithmetic extension, and includes Android phones starting with Pixel 3,
  Galaxy S9 (Snapdragon SoC), Galaxy S10 (Exynos SoC), iOS devices with A11 or
  newer SoCs, and all Apple Silicon Macs.

* IEEE FP16 inference is supported for every floating-point operator in the
  model.

* The model's "reduced_precision_support" metadata indicates that the model
  is compatible with FP16 inference.

When the above conditions are met, XNNPACK replaces FP32 operators with their
FP16 equivalents, and inserts additional operators to convert model inputs
from FP32 to FP16 and convert model outputs back from FP16 to FP32. If the
above conditions are not met, XNNPACK will perform model inference with FP32
calculations.

Additionally, the XNNPACK delegate provides an option to force FP16 inference
regardless of model metadata. This option is intended for development workflows,
and in particular for testing end-to-end accuracy of a model when FP16 inference
is used.
Forcing FP16 inference has several effects:

* Besides ARM64 devices with the ARMv8.2 FP16 arithmetic extension, forced FP16
  inference is supported on x86/x86-64 devices with the AVX2 extension in
  emulation mode: all elementary floating-point operations are computed in FP32,
  then converted to FP16 and back to FP32. Note that such simulation is not
  exactly equivalent to native FP16 inference, but simulates the effects of the
  restricted mantissa precision and exponent range of native FP16 arithmetic.

* On devices that support neither native FP16 arithmetic (ARM64 devices with
  the ARMv8.2 FP16 arithmetic extension) nor emulation (x86/x86-64 devices with
  the AVX2 extension), inference will fail rather than fall back to FP32.

* If any floating-point operator offloaded to XNNPACK is not supported for FP16
  inference, inference will fail rather than fall back to FP32.

To force FP16 inference, either build the delegate with the
`--define xnnpack_force_float_precision=fp16` option, or add the
`TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16` flag to the
`TfLiteXNNPackDelegateOptions.flags` bitmask passed into
the `TfLiteXNNPackDelegateCreate` call:

```c
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
...
xnnpack_options.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16;
TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
```

Below is the list of operators supported in IEEE FP16 inference:

#### `ABS`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ADD`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `AVERAGE_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CEIL`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `CONCATENATION`

* Must satisfy constraints on the floating-point (FP32) operator.
* None of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `DEPTH_TO_SPACE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DEPTHWISE_CONV_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `DIV`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `FLOOR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `FULLY_CONNECTED`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `HARD_SWISH`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LEAKY_RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `LOGISTIC`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAX_POOL_2D`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MAXIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MEAN`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `MINIMUM`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `MUL`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `NEG`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PAD`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `PRELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU6`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RELU_N1_TO_1`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESHAPE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `RESIZE_BILINEAR`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `ROUND`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SPLIT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SOFTMAX`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQRT`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `SQUARED_DIFFERENCE`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `SUB`

* Must satisfy constraints on the floating-point (FP32) operator.
* Neither of the inputs can be static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE`

* Must satisfy constraints on the floating-point (FP32) operator.

#### `TRANSPOSE_CONV`

* Must satisfy constraints on the floating-point (FP32) operator.

### Quantized Operators

By default, quantized inference in XNNPACK delegate is disabled, and XNNPACK is
used only for floating-point models. Support for quantized inference in XNNPACK
must be enabled by adding extra Bazel flags when building TensorFlow Lite.

* `--define tflite_with_xnnpack_qs8=true` flag enables XNNPACK inference for
  quantized operators using signed quantization schema. This schema is used by
  models produced by [Model Optimization
  Toolkit](https://www.tensorflow.org/model_optimization) through either
  post-training integer quantization or quantization-aware training.
  Post-training dynamic range quantization is not supported in XNNPACK.

* `--define tflite_with_xnnpack_qu8=true` flag enables XNNPACK inference for
  quantized operators using unsigned quantization schema, produced via the
  legacy TensorFlow 1.X quantization tooling. This option is experimental and
  may perform suboptimally on mobile processors with NEON DOT product
  instructions.
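
For example, combining this with the Android AAR build command shown earlier,
inference for signed-quantized operators would be enabled as sketched below
(same target and flags as above, with the extra define added):

```
bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  --define tflite_with_xnnpack_qs8=true \
  //tensorflow/lite/java:tensorflow-lite
```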

Below is the list of currently supported quantized operators:

#### `ADD`

* Inputs and outputs must be in 8-bit quantized format.
* Only addition with two inputs is supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `CONCATENATION`

* Inputs and outputs must be in 8-bit quantized format.
* Only concatenation with two, three, or four inputs is supported.

#### `CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in 32-bit
  quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEPTH_TO_SPACE`

* Inputs and outputs must be in 8-bit quantized format.
* Block size must be greater than 1.

#### `DEPTHWISE_CONV_2D`

* Inputs and outputs must be in 8-bit quantized format (bias must be in
  32-bit quantized format).
* Bias is mandatory.
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type),
  and can use either per-tensor or per-channel quantization parameters.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `DEQUANTIZE`

* Input tensor must be in 8-bit quantized format without per-channel
  quantization.
* Output tensor must be in 32-bit floating-point format.

#### `ELU`

* Inputs and outputs must be in 8-bit signed quantized format.

#### `FULLY_CONNECTED`

* Inputs and outputs must be in 8-bit quantized format (bias, if present, must
  be in 32-bit quantized format).
* Both filter and bias must be static (use `kTfLiteMmapRo` allocation type).
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `LEAKY_RELU`

* Inputs and outputs must be in 8-bit quantized format.
* The ratio of input scale to output scale must be within [1/256, 128].
* The product of the negative slope and the ratio of input scale to output
  scale must be within either the [-127.99609375, -1/256] range or the
  [1/256, 128] range.

#### `LOGISTIC`

* Inputs and outputs must be in 8-bit quantized format.

#### `MAX_POOL_2D`

* Inputs and outputs must be in 8-bit quantized format.
* 1x1 pooling with non-unit stride is not supported.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `MEAN`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the axes specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* Only [1, 2], [2, 1], and [2] axes specification (i.e. reduction across either
  both spatial dimensions or across the width dimension) is supported.

#### `MUL`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `PAD`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the padding specification) must be static
  (use `kTfLiteMmapRo` allocation type).
* The numbers of padding elements must be non-negative.

#### `QUANTIZE`

* Input tensor must be in 32-bit floating-point format or in 8-bit quantized
  format.
* Output tensor must be in 8-bit quantized format without per-channel
  quantization.
* If inputs are in 8-bit quantized format, they must have the same signedness
  as the outputs, and the ratio of input scale to output scale must be in the
  [2**-8, 2**7] range.

#### `RESIZE_BILINEAR`

* The first input and the output must be 4D tensors in 8-bit quantized format.
* The second input (the input with the new shape specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `SPLIT`

* Inputs and outputs must be in 8-bit quantized format.
* Only split into two, three, or four outputs is supported.

#### `SUB`

* Inputs and outputs must be in 8-bit quantized format.
* Fused `NONE`, `RELU`, `RELU_N1_TO_1`, and `RELU6` activations are supported,
  but fused `TANH` and `SIGN_BIT` activations are not.

#### `TRANSPOSE`

* The first input and the output must be in 8-bit quantized format.
* The second input (the input with the permutation specification) must be
  static (use `kTfLiteMmapRo` allocation type).

#### `TRANSPOSE_CONV`

* Input, filter, and output tensors must be in 8-bit quantized format (bias, if
  present, must be in 32-bit quantized format).
* Output size, filter and bias (if present) must be static (use
  `kTfLiteMmapRo` allocation type).

### Sparse Inference

XNNPACK backend supports sparse inference for CNN models described in the
[Fast Sparse ConvNets](https://arxiv.org/abs/1911.09723) paper. Sparse
inference is restricted to subgraphs with the following operators:

* Sparse subgraph must store its weights in sparse representation (using
  `DENSIFY` operators in the TensorFlow Lite schema).
* Sparse subgraph must start with a 3x3 stride-2 `CONV_2D` operator with
  padding 1 on each side, no dilation, and 3 input channels.
* Sparse subgraph must end with either a `MEAN` operator with reduction across
  spatial axes, or a `DEPTH_TO_SPACE` operator.
* Sparse subgraph may contain the following operators:
  * `CONV_2D` with 1x1 kernel and no padding. At least 2/3rd of filter weights
    in the 1x1 `CONV_2D` operators across the sparse subgraph must be zeroes
    to enable sparse inference.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 1, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 3x3 kernel, stride 2, no dilation, and padding 1
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 1, no dilation, and padding 2
    on each side.
  * `DEPTHWISE_CONV_2D` with 5x5 kernel, stride 2, no dilation, and padding 2
    on each side.
  * `RESIZE_BILINEAR` operator with output dimensions greater than 1.
  * `MEAN` operator with reduction across spatial axes.
  * `ADD` and `MUL` operators where both inputs are 4D tensors. If one of the
    inputs to `ADD` or `MUL` is a constant tensor, it must be representable as
    either a scalar, or a 1D vector.
  * Unary elementwise operators `ABS`, `CEIL`, `ELU`, `FLOOR`, `HARD_SWISH`,
    `LEAKY_RELU`, `LOGISTIC`, `NEG`, `RELU`, `RELU6`, `RELU_N1_TO_1`, `ROUND`,
    `SIGMOID`, and `SQUARE`.

Pre-trained [Fast Sparse ConvNets models](https://github.com/google-research/google-research/tree/master/fastconvnets)
provide examples that satisfy these constraints.

### Other limitations

* Dynamically allocated (with `kTfLiteDynamic` allocation type) inputs and
  outputs are not supported.
* Resizing model inputs (via `Interpreter::ResizeInputTensor`) is supported, but
  causes a complete reinitialization of the delegate instance, which has
  considerable overhead; see the sketch below.
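
A minimal sketch of an input resize, assuming an `interpreter` that already has
the XNNPACK delegate applied and a model whose first input accepts the new
shape (the shape values here are hypothetical):

```c++
// Resize the first model input, then re-allocate tensors. With the XNNPACK
// delegate applied, this re-initializes the delegate state, so avoid doing it
// on every inference.
std::vector<int> new_shape = {4, 224, 224, 3};  // hypothetical input shape
if (interpreter->ResizeInputTensor(interpreter->inputs()[0], new_shape) !=
    kTfLiteOk) {
  // Report error.
}
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Report error.
}
```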