TensorFlow Lite delegate
========================

Mesa contains a TensorFlow Lite delegate that can make use of NPUs to accelerate ML inference. It is implemented as an *external delegate*, a shared library that the TensorFlow Lite runtime can load at startup. See https://www.tensorflow.org/api_docs/python/tf/lite/experimental/load_delegate.

.. list-table:: Supported acceleration hardware
   :header-rows: 1

   * - Gallium driver
     - NPU supported
     - Hardware tested
   * - Etnaviv
     - ``VeriSilicon VIPNano-QI.7120``
     - ``Amlogic A311D on Libre Computer AML-A311D-CC Alta and Khadas VIM3``
   * - Etnaviv
     - ``VeriSilicon VIPNano-SI+.8002``
     - ``NXP iMX8M Plus on Toradex Verdin SoM``

.. list-table:: Tested models
   :header-rows: 1

   * - Model name
     - Data type
     - Link (may be outdated)
     - Status
     - Inference speed on AML-A311D-CC Alta
     - Inference speed on Verdin iMX8M Plus
   * - MobileNet V1
     - UINT8
     - http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
     - Fully supported
     - ~6.6 ms
     - ~7.9 ms
   * - MobileNet V2
     - UINT8
     - https://storage.googleapis.com/mobilenet_v2/checkpoints/quantized_v2_224_100.tgz
     - Fully supported
     - ~6.9 ms
     - ~8.0 ms
   * - SSDLite MobileDet
     - UINT8
     - https://raw.githubusercontent.com/google-coral/test_data/master/ssdlite_mobiledet_coco_qat_postprocess.tflite
     - Fully supported
     - ~24.8 ms
     - ~24.4 ms

Build
-----

Build Mesa as usual, with the ``-Dteflon=true`` argument.

Example instructions:

.. code-block:: console

   # Install build dependencies
   ~ # apt-get -y build-dep mesa
   ~ # apt-get -y install git cmake

   # Download sources
   ~ $ git clone https://gitlab.freedesktop.org/mesa/mesa.git

   # Build Mesa
   ~ $ cd mesa
   mesa $ meson setup build -Dgallium-drivers=etnaviv -Dvulkan-drivers= -Dteflon=true
   mesa $ meson compile -C build

Install runtime dependencies
----------------------------

Your board should be running a mainline kernel, version 6.7 or later (6.8 or later for the i.MX8MP).

.. code-block:: console

   # Install Python 3.10 and dependencies (as root)
   ~ # echo deb-src http://deb.debian.org/debian testing main >> /etc/apt/sources.list
   ~ # echo deb http://deb.debian.org/debian unstable main >> /etc/apt/sources.list
   ~ # echo 'APT::Default-Release "testing";' >> /etc/apt/apt.conf
   ~ # apt-get update
   ~ # apt-get -y install python3.10 python3-pytest python3-exceptiongroup

   # Install TensorFlow Lite Python package (as non-root)
   ~ $ python3.10 -m pip install --break-system-packages tflite-runtime==2.13.0

   # For the classification.py script mentioned below, you will need PIL
   ~ $ python3.10 -m pip install --break-system-packages pillow

Do some inference with MobileNetV1
----------------------------------

Run the command below for a quick check that the setup is correct and that the NPU is accelerating inference. It assumes that you have followed the steps above, so that Python 3.10 and its dependencies are installed, and that Mesa was built in the ``./build`` directory.

You can use any image that prominently features one of the objects in the ``src/gallium/frontends/teflon/tests/labels_mobilenet_quant_v1_224.txt`` file.

This example script is based on the code in https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/python.

.. code-block:: console

   ~ $ cd mesa/
   mesa $ TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python3.10 src/gallium/frontends/teflon/tests/classification.py \
          -i ~/tensorflow/assets/grace_hopper.bmp \
          -m src/gallium/targets/teflon/tests/mobilenet_v1_1.0_224_quant.tflite \
          -l src/gallium/frontends/teflon/tests/labels_mobilenet_quant_v1_224.txt \
          -e build/src/gallium/targets/teflon/libteflon.so

   Loading external delegate from build/src/gallium/targets/teflon/libteflon.so with args: {}
   Teflon delegate: loaded etnaviv driver

   teflon: compiling graph: 89 tensors 28 operations
   idx scale     zp has_data size
   =======================================
     0 0.023528   0 no       1x1x1x1024
     1 0.166099  42 no       1x1x1x1001
     2 0.000117   0 yes      1001x0x0x0
     3 0.004987  4a yes      1001x1x1x1024
     4 0.166099  42 no       1x1001x0x0
     5 0.166099  42 yes      2x0x0x0
     6 0.000171   0 yes      32x0x0x0
     7 0.023528   0 no       1x112x112x32
     8 0.021827  97 yes      32x3x3x3
     9 0.023528   0 no       1x14x14x512
   ...

   idx type    in out  operation type-specific
   ================================================================================================
     0 CONV    88   7  w: 8 b: 6 stride: 2 pad: SAME
     1 DWCONV   7  33  w: 35 b: 34 stride: 1 pad: SAME
     2 CONV    33  37  w: 38 b: 36 stride: 1 pad: SAME
     3 DWCONV  37  39  w: 41 b: 40 stride: 2 pad: SAME
     4 CONV    39  43  w: 44 b: 42 stride: 1 pad: SAME
     5 DWCONV  43  45  w: 47 b: 46 stride: 1 pad: SAME
     6 CONV    45  49  w: 50 b: 48 stride: 1 pad: SAME
     7 DWCONV  49  51  w: 53 b: 52 stride: 2 pad: SAME
     8 CONV    51  55  w: 56 b: 54 stride: 1 pad: SAME
     9 DWCONV  55  57  w: 59 b: 58 stride: 1 pad: SAME
    10 CONV    57  61  w: 62 b: 60 stride: 1 pad: SAME
    11 DWCONV  61  63  w: 65 b: 64 stride: 2 pad: SAME
    12 CONV    63  67  w: 68 b: 66 stride: 1 pad: SAME
    13 DWCONV  67  69  w: 71 b: 70 stride: 1 pad: SAME
    14 CONV    69  73  w: 74 b: 72 stride: 1 pad: SAME
    15 DWCONV  73  75  w: 77 b: 76 stride: 1 pad: SAME
    16 CONV    75  79  w: 80 b: 78 stride: 1 pad: SAME
    17 DWCONV  79  81  w: 83 b: 82 stride: 1 pad: SAME
    18 CONV    81  85  w: 86 b: 84 stride: 1 pad: SAME
    19 DWCONV  85   9  w: 11 b: 10 stride: 1 pad: SAME
    20 CONV     9  13  w: 14 b: 12 stride: 1 pad: SAME
    21 DWCONV  13  15  w: 17 b: 16 stride: 1 pad: SAME
    22 CONV    15  19  w: 20 b: 18 stride: 1 pad: SAME
    23 DWCONV  19  21  w: 23 b: 22 stride: 2 pad: SAME
    24 CONV    21  25  w: 26 b: 24 stride: 1 pad: SAME
    25 DWCONV  25  27  w: 29 b: 28 stride: 1 pad: SAME
    26 CONV    27  31  w: 32 b: 30 stride: 1 pad: SAME
    27 POOL    31   0  filter: 0x0 stride: 0 pad: VALID

   teflon: compiled graph, took 10307 ms
   teflon: invoked graph, took 21 ms
   teflon: invoked graph, took 17 ms
   teflon: invoked graph, took 17 ms
   teflon: invoked graph, took 17 ms
   teflon: invoked graph, took 16 ms
   0.866667: military uniform
   0.031373: Windsor tie
   0.015686: mortarboard
   0.007843: bow tie
   0.007843: academic
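
When comparing runs, it can help to reduce the ``teflon: invoked graph, took N ms`` lines printed with ``TEFLON_DEBUG=verbose`` to a few numbers. The following is a minimal sketch, not part of Mesa: the helper names ``invoke_times_ms`` and ``summarize`` are hypothetical, the line format is assumed to match the output shown above, and skipping the first invocation as warm-up is an assumption rather than documented behaviour.

.. code-block:: python

   import re
   import statistics

   # Matches the per-invocation lines printed with TEFLON_DEBUG=verbose,
   # e.g. "teflon: invoked graph, took 17 ms" (format assumed from the log above).
   INVOKE_RE = re.compile(r"teflon: invoked graph, took (\d+) ms")


   def invoke_times_ms(log_text):
       """Return the invocation times (in ms) found in the log, in order."""
       return [int(ms) for ms in INVOKE_RE.findall(log_text)]


   def summarize(times_ms, warmup=1):
       """Summarize timings, skipping the first `warmup` runs.

       Skipping warm-up runs is an assumption of this sketch: the first
       invocation in the log above is slower than the ones that follow.
       """
       steady = times_ms[warmup:]
       if not steady:
           raise ValueError("not enough invocations after warm-up")
       return {
           "runs": len(steady),
           "mean_ms": statistics.mean(steady),
           "min_ms": min(steady),
           "max_ms": max(steady),
       }


   if __name__ == "__main__":
       # Timing lines copied from the example output above.
       sample = (
           "teflon: compiled graph, took 10307 ms\n"
           "teflon: invoked graph, took 21 ms\n"
           "teflon: invoked graph, took 17 ms\n"
           "teflon: invoked graph, took 17 ms\n"
           "teflon: invoked graph, took 17 ms\n"
           "teflon: invoked graph, took 16 ms\n"
       )
       print(summarize(invoke_times_ms(sample)))

To use it on a real run, capture the output of the classification command to a file (the delegate's debug lines may go to stderr, so redirecting both streams is the safer choice) and pass the file contents to ``invoke_times_ms``.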