# XNNPACK

XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms. XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead, it provides low-level performance primitives for accelerating high-level machine learning frameworks, such as [MediaPipe](https://mediapipe.dev), [TensorFlow Lite](https://www.tensorflow.org/lite), and [TensorFlow.js](https://www.tensorflow.org/js).

## Supported Architectures

- ARM64 on Android, Linux, and iOS (including watchOS and tvOS)
- ARMv7 (with NEON) on Android, Linux, and iOS (including watchOS)
- WebAssembly MVP
- WebAssembly SIMD (experimental)
- x86 and x86-64 (up to AVX512) on Android, Linux, macOS, and the iOS simulator

## Operator Coverage

XNNPACK implements the following neural network operators:

- 2D Convolution (including grouped and depthwise)
- 2D Deconvolution (AKA Transposed Convolution)
- 2D Average Pooling
- 2D Max Pooling
- 2D ArgMax Pooling (Max Pooling + indices)
- 2D Unpooling
- 2D Bilinear Resize
- Add (including broadcasting, two inputs only)
- Subtract (including broadcasting)
- Divide (including broadcasting)
- Maximum (including broadcasting)
- Minimum (including broadcasting)
- Multiply (including broadcasting)
- Global Average Pooling
- Channel Shuffle
- Fully Connected
- Clamp (includes ReLU and ReLU6)
- HardSwish
- Sigmoid
- Softmax
- PReLU

All operators in XNNPACK support NHWC layout, but additionally allow custom stride along the **C**hannel dimension. Thus, operators can consume a subset of channels in the input tensor and produce a subset of channels in the output tensor, providing zero-cost Channel Split and Channel Concatenation operations.

## Performance

### Mobile phones

The table below presents **single-threaded** performance of the XNNPACK library on three generations of MobileNet models and three generations of Pixel phones.

| Model              | Pixel, ms | Pixel 2, ms | Pixel 3a, ms |
| ------------------ | :-------: | :---------: | :----------: |
| MobileNet v1 1.0X  |    81     |      89     |      88      |
| MobileNet v2 1.0X  |    48     |      55     |      54      |
| MobileNet v3 Large |    40     |      44     |      44      |
| MobileNet v3 Small |    12     |      14     |      14      |

The following table presents **multi-threaded** (using as many threads as there are big cores) performance of the XNNPACK library on three generations of MobileNet models and three generations of Pixel phones.

| Model              | Pixel, ms | Pixel 2, ms | Pixel 3a, ms |
| ------------------ | :-------: | :---------: | :----------: |
| MobileNet v1 1.0X  |    45     |      27     |      46      |
| MobileNet v2 1.0X  |    28     |      18     |      28      |
| MobileNet v3 Large |    23     |      16     |      24      |
| MobileNet v3 Small |     7     |       6     |       8      |

Benchmarked on January 9, 2020 with `end2end_bench --benchmark_min_time=5` on an Android/ARM64 build (`bazel build -c opt --config android_arm64 :end2end_bench`) and neural network models with randomized weights and inputs.

### Raspberry Pi

The table below presents **multi-threaded** performance of the XNNPACK library on three generations of MobileNet models and three generations of Raspberry Pi boards.

| Model              | RPi 2 (BCM2836), ms | RPi 3+ (BCM2837B0), ms | RPi 4 (BCM2711), ms |
| ------------------ | :-----------------: | :--------------------: | :-----------------: |
| MobileNet v1 1.0X  |         380         |          115           |          76         |
| MobileNet v2 1.0X  |         217         |           80           |          45         |
| MobileNet v3 Large |         180         |           67           |          41         |
| MobileNet v3 Small |          57         |           23           |          15         |

Benchmarked on January 9, 2020 with `end2end-bench --benchmark_min_time=5` on a Raspbian Buster build with CMake (`./scripts/build-local.sh`) and neural network models with randomized weights and inputs.

## Publications

- Marat Dukhan, "The Indirect Convolution Algorithm". Presented at the [Efficient Deep Learning for Computer Vision (ECV) 2019](https://sites.google.com/corp/view/ecv2019/) workshop ([slides](https://drive.google.com/file/d/1ZayB3By5ZxxQIRtN7UDq_JvPg1IYd3Ac/view), [paper on ArXiv](https://arxiv.org/abs/1907.02129)).
- Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan, "Fast Sparse ConvNets". [Paper on ArXiv](https://arxiv.org/abs/1911.09723), [pre-trained sparse models](https://github.com/google-research/google-research/tree/master/fastconvnets).
- Marat Dukhan, Artsiom Ablavatski, "The Two-Pass Softmax Algorithm". [Paper on ArXiv](https://arxiv.org/abs/2001.04438).

## Acknowledgements

XNNPACK is based on the [QNNPACK](https://github.com/pytorch/QNNPACK) library. Unlike QNNPACK, XNNPACK focuses entirely on floating-point operators, and its API is no longer compatible with QNNPACK.
90