# QNNPACK
QNNPACK (Quantized Neural Networks PACKage) is a mobile-optimized library for low-precision, high-performance neural network inference. QNNPACK provides implementations of common neural network operators on quantized 8-bit tensors.

QNNPACK is not intended to be used directly by machine learning researchers; instead, it provides low-level performance primitives for high-level deep learning frameworks. As of today, QNNPACK is integrated into [PyTorch 1.0](https://github.com/pytorch/pytorch) via the Caffe2 graph representation.

## Operator Coverage

The following operators are currently implemented or planned for implementation:

- [x] 2D Convolution
- [x] 2D Deconvolution
- [x] Channel Shuffle
- [x] Fully Connected
- [ ] Locally Connected
- [x] 2D Max Pooling
- [x] 2D Average Pooling
- [x] Global Average Pooling
- [x] Sigmoid
- [x] TanH
- [x] Leaky ReLU
- [x] Hardsigmoid
- [x] Hardswish
- [x] Clamp (can be used for ReLU or ReLU6 when not fused into another operator)
- [x] SoftArgMax (aka SoftMax)
- [ ] Group Normalization

## Building

QNNPACK provides standard CMake-based build scripts.

### Native compilation

We recommend using the `scripts/build-local.sh` script to build QNNPACK for the host machine, as in the sketch below.
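
A minimal native build, assuming CMake and a host C/C++ toolchain are already installed, could look like this (run from the QNNPACK source root):

```bash
# Sketch: build QNNPACK for the host machine using the provided
# CMake-based build script (assumes cmake and a C/C++ compiler are on PATH).
./scripts/build-local.sh
```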

### Cross-compilation for Android

To cross-compile for Android, set the `ANDROID_NDK` environment variable to the path of the Android NDK directory (e.g. `/opt/android-ndk-r15c`) and use one of the scripts from the table below:

| ABI         | Build script                     | Restrictions               |
| ----------- | -------------------------------- | -------------------------- |
| armeabi-v7a | `scripts/build-android-armv7.sh` | Requires CPU with ARM NEON |
| arm64-v8a   | `scripts/build-android-arm64.sh` |                            |
| x86         | `scripts/build-android-x86.sh`   |                            |

Notes:
- On **armeabi-v7a**, `pytorch_qnnp_initialize` will fail with `pytorch_qnnp_status_unsupported_hardware` if the mobile CPU does not support ARM NEON. Do not set `-DANDROID_ARM_NEON=1` when compiling QNNPACK, as it can make `pytorch_qnnp_initialize` crash on CPUs without ARM NEON.
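
For example, an armeabi-v7a cross-build might look like the following; the NDK path is only illustrative, so substitute the location of your own Android NDK installation:

```bash
# Illustrative NDK path; point ANDROID_NDK at your own Android NDK install.
export ANDROID_NDK=/opt/android-ndk-r15c

# Build QNNPACK for 32-bit ARM Android (requires ARM NEON at runtime).
./scripts/build-android-armv7.sh
```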

### Cross-compilation for iOS

To cross-compile for iOS, clone [ios-cmake](https://github.com/leetal/ios-cmake), set the `IOS_CMAKE_TOOLCHAIN_FILE` environment variable to the path of its `ios.toolchain.cmake` file, and use one of the scripts from the table below:

| Architecture | Build script                  | Notes                     |
| ------------ | ----------------------------- | ------------------------- |
| armv7        | `scripts/build-ios-armv7.sh`  | iPhone 3GS/4/4S           |
| armv7s       | `scripts/build-ios-armv7s.sh` | iPhone 5 and newer        |
| arm64        | `scripts/build-ios-arm64.sh`  | iPhone 5S and newer       |
| arm64e       | `scripts/build-ios-arm64e.sh` | iPhone XS/XR              |
| i386         | `scripts/build-ios-i386.sh`   | iPhone Simulator (32-bit) |
| x86_64       | `scripts/build-ios-x86_64.sh` | iPhone Simulator (64-bit) |
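
For example, an arm64 iOS cross-build might look like the following sketch. The clone location is arbitrary; the only requirement is that `IOS_CMAKE_TOOLCHAIN_FILE` points at the `ios.toolchain.cmake` file inside the ios-cmake checkout.

```bash
# Fetch the ios-cmake toolchain (clone path is arbitrary).
git clone https://github.com/leetal/ios-cmake.git ~/ios-cmake

# Point the build scripts at the CMake toolchain file.
export IOS_CMAKE_TOOLCHAIN_FILE=~/ios-cmake/ios.toolchain.cmake

# Build QNNPACK for 64-bit iOS devices.
./scripts/build-ios-arm64.sh
```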

## End-to-End Benchmarking

The Caffe2 backend of PyTorch 1.0 natively integrates QNNPACK and provides a [pre-trained quantized MobileNet v2 model](https://github.com/caffe2/models/tree/master/mobilenet_v2_quantized). Below are instructions for benchmarking this model end-to-end with QNNPACK.

### Raspberry Pi 2 or 3

```bash
# Clone PyTorch 1.0 repo
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# Optional: update QNNPACK submodule to latest revision
git submodule update --remote third_party/QNNPACK

# Build Caffe2 (including binaries) for the host system
# Use only 1 thread for build to avoid out-of-memory failures
MAX_JOBS=1 scripts/build_local.sh -DBUILD_BINARY=ON -DBUILD_PYTHON=OFF \
    -DUSE_OBSERVERS=OFF -DUSE_DISTRIBUTED=OFF

# Download model weights
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb

# Download model graph
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb

# Run speed benchmark with 50 warm-up iterations and 10 measurement iterations
build/bin/speed_benchmark --net predict_net.pb --init_net init_net.pb \
    --input data --input_dims 1,3,224,224 --input_type float \
    --warmup 50 --iter 10
```

### ARMv7 (32-bit) Android

```bash
# Clone PyTorch 1.0 repo
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# Optional: update QNNPACK submodule to latest revision
git submodule update --remote third_party/QNNPACK

# Build Caffe2 (including binaries) for Android, and push to device
scripts/build_android.sh -DANDROID_TOOLCHAIN=clang -DBUILD_BINARY=ON
adb push build_android/bin/speed_benchmark /data/local/tmp/speed_benchmark

# Download model weights and copy them to Android device
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb
adb push init_net.pb /data/local/tmp/init_net.pb

# Download model graph and copy it to Android device
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb
adb push predict_net.pb /data/local/tmp/predict_net.pb

# Run speed benchmark with 50 warm-up iterations and 10 measurement iterations
adb shell /data/local/tmp/speed_benchmark \
    --net /data/local/tmp/predict_net.pb \
    --init_net /data/local/tmp/init_net.pb \
    --input data --input_dims 1,3,224,224 --input_type float \
    --warmup 50 --iter 10
```

### ARM64 (64-bit) Android

```bash
# Clone PyTorch 1.0 repo
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# Optional: update QNNPACK submodule to latest revision
git submodule update --remote third_party/QNNPACK

# Build Caffe2 (including binaries) for Android, and push to device
scripts/build_android.sh -DANDROID_ABI=arm64-v8a -DANDROID_TOOLCHAIN=clang -DBUILD_BINARY=ON
adb push build_android/bin/speed_benchmark /data/local/tmp/speed_benchmark

# Download model weights and copy them to Android device
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb
adb push init_net.pb /data/local/tmp/init_net.pb

# Download model graph and copy it to Android device
wget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb
adb push predict_net.pb /data/local/tmp/predict_net.pb

# Run speed benchmark with 50 warm-up iterations and 10 measurement iterations
adb shell /data/local/tmp/speed_benchmark \
    --net /data/local/tmp/predict_net.pb \
    --init_net /data/local/tmp/init_net.pb \
    --input data --input_dims 1,3,224,224 --input_type float \
    --warmup 50 --iter 10
```

### PEP (Performance Evaluation Platform) Method

[Facebook AI Performance Evaluation Platform](https://github.com/facebook/FAI-PEP) is a framework- and backend-agnostic benchmarking platform for comparing machine learning inference runtime metrics across a set of models and a variety of backends.

We used PEP to produce the results in our [blog post](https://code.fb.com/ml-applications/qnnpack/).

With an ARMv7 device connected:

```bash
# Clone PyTorch 1.0 repo
mkdir ~/Code && cd ~/Code
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# Optional: update QNNPACK submodule to latest revision
git submodule update --remote third_party/QNNPACK

# Clone PEP repo
cd ~/Code
git clone --recursive https://github.com/facebook/FAI-PEP.git aibench
cd aibench

# Run the PEP benchmark with the quantized MobileNet v2 specification
# (try other specifications from the specifications/ directory as well).
# First-time compilation could take 20+ minutes.
./benchmarking/run_bench.py \
  --platform android \
  -b ~/Code/aibench/specifications/models/caffe2/mobilenet_v2/mobilenet_v2_quant.json \
  --repo_dir ~/Code/pytorch \
  --frameworks_dir ~/Code/aibench/specifications/frameworks --framework caffe2
```

## Acknowledgements

QNNPACK is developed by Marat Dukhan, Yiming Wu, Hao Lu, and Bert Maher. We thank Andrew Tulloch and Yangqing Jia for advice during the development of QNNPACK.

## License

QNNPACK is BSD licensed, as found in the [`LICENSE`](LICENSE) file.