# Object Detection Sample Application

## Introduction
This sample application shows how to perform object detection using PyArmNN or the Arm NN TensorFlow Lite Delegate API. We assume the user has already built PyArmNN by following the instructions in the README in the main PyArmNN directory.

##### Running with Arm NN TensorFlow Lite Delegate
There is an option to use the Arm NN TensorFlow Lite Delegate instead of the Arm NN TensorFlow Lite Parser for the object detection inference.
The Arm NN TensorFlow Lite Delegate is part of the Arm NN library and its purpose is to accelerate certain TensorFlow Lite
(TfLite) operators on Arm hardware. The main advantage of using the Arm NN TensorFlow Lite Delegate over the Arm NN TensorFlow
Lite Parser is that the number of supported operations is far greater, which means the Arm NN TfLite Delegate can execute
all TfLite models and accelerates any operations that Arm NN supports; the remaining operations fall back to the TfLite runtime.
In addition, some optimizations are applied by default in the delegate options in order to improve the inference
performance at the expense of a slight accuracy reduction. In this example we enable the fast math and reduce float32 to
float16 optimizations.

Using the **fast_math** flag can improve performance in fp32 and fp16 layers, but the results may have
reduced or different precision. The fast_math flag has no effect on int8 performance.

The **reduce-fp32-to-fp16** feature works best if all operators of the model are in Fp32. Arm NN will add conversion layers
between layers that weren't in Fp32 in the first place or if the operator is not supported in Fp16.
The overhead of these conversions can lead to slower overall performance if too many conversions are required.

You can turn off these optimizations in the `create_network` function found in `network_executor_tflite.py`
by changing the `optimization_enable` flag to false.
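To make the effect of the `optimization_enable` flag more concrete, here is a minimal sketch of how the delegate options could be assembled. The helper name `build_delegate_options` is hypothetical, and both the option keys and the `"1"`/`"0"` encoding of booleans are assumptions based on the Arm NN delegate options; check `network_executor_tflite.py` for the values the sample actually uses.

```python
def build_delegate_options(backends, optimization_enable=True):
    """Return Arm NN delegate options as a dict of strings (hypothetical helper)."""
    # Assumption: boolean delegate options are passed as "1" / "0".
    flag = "1" if optimization_enable else "0"
    return {
        "backends": ",".join(backends),
        "logging-severity": "info",
        "enable-fast-math": flag,
        "reduce-fp32-to-fp16": flag,
    }


# Build the options with both optimizations switched off.
options = build_delegate_options(["CpuAcc", "CpuRef"], optimization_enable=False)
print(options)

# With tflite_runtime installed, the options could then be passed to the delegate:
# import tflite_runtime.interpreter as tflite
# delegate = tflite.load_delegate("<path to libarmnnDelegate.so>", options=options)
```

Keeping the options in one place makes it easy to compare runs with and without the optimizations.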

We provide example scripts for performing object detection from a video file and from a video stream: `run_video_file.py` and `run_video_stream.py`.

The application takes a model and a video file or camera feed as input, runs inference on each frame, and draws bounding boxes around detected objects, with the corresponding labels and confidence scores overlaid.

A similar implementation of this object detection application is also provided in C++ in the examples for Arm NN.

##### Performing Object Detection with Style Transfer and TensorFlow Lite Delegate
In addition to running object detection using the TensorFlow Lite Delegate, there is an option to run style transfer to create stylized detections instead of drawing bounding boxes on each frame.
Style transfer is the ability to create a new image, known as a pastiche, based on two input images: one representing an artistic style and one representing the content frame containing class detections.
The style transfer consists of two submodels:

* Style Prediction Model: A MobilenetV2-based neural network that takes an input style image to create a style bottleneck vector.

* Style Transform Model: A neural network that applies a style bottleneck vector to a content image and creates a stylized image.

An image containing an art style is preprocessed to the correct size and dimensions.
The preprocessed style image is passed to the style predict network, which calculates and returns a style bottleneck tensor.
The style transfer network receives the style bottleneck and a content frame that contains detections, transforms the detections of the requested class, and returns a stylized frame.


## Prerequisites

##### PyArmNN

Before proceeding to the next steps, make sure that you have successfully installed the newest version of PyArmNN on your system by following the instructions in the README of the PyArmNN root directory.

You can verify that the PyArmNN library is installed and check the PyArmNN version using:
```bash
$ pip show pyarmnn
```

You can also verify it by running the following and getting output similar to that below:
```bash
$ python -c "import pyarmnn as ann;print(ann.GetVersion())"
'32.0.0'
```

##### Dependencies

Install the following libraries on your system:
```bash
$ sudo apt-get install python3-opencv
```


<b>This section is needed only if running with Arm NN TensorFlow Lite Delegate is desired</b>\
If there is no libarmnnDelegate.so file in your ARMNN_LIB path,
download the Arm NN artifacts that include the Arm NN delegate for your platform and the latest Arm NN version (in this example, aarch64 and v21.11 respectively):
```bash
$ export WORKSPACE=`pwd`
$ mkdir ./armnn_artifacts ; cd armnn_artifacts
$ wget https://github.com/ARM-software/armnn/releases/download/v21.11/ArmNN-linux-aarch64.tar.gz
$ tar -xvzf ArmNN*.tar.gz
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`
```

Create a virtual environment:
```bash
$ python3.7 -m venv devenv --system-site-packages
$ source devenv/bin/activate
```

Install the dependencies from the object_detection example folder:
* If the Python version is 3.8 or lower, tflite_runtime version 2.5.0 (without the post1 suffix) should be installed
  (the requirements.txt file should be amended accordingly).
```bash
$ cd $WORKSPACE/armnn/python/pyarmnn/examples/object_detection
$ pip install -r requirements.txt
```

---

# Performing Object Detection

## Object Detection from Video File
The `run_video_file.py` example takes a video file as input, runs inference on each frame, and produces frames with bounding boxes drawn around detected objects. The processed frames are written to a video file.

The user can specify these arguments at the command line:

* `--video_file_path` - <b>Required:</b> Path to the video file to run object detection on

* `--model_file_path` - <b>Required:</b> Path to <b>.tflite, .pb</b> or <b>.onnx</b> object detection model

* `--model_name` - <b>Required:</b> The name of the model being used. Assembles the workflow for the input model. The examples support the model names:

    * `ssd_mobilenet_v1`

    * `yolo_v3_tiny`

* `--label_path` - <b>Required:</b> Path to labels file for the specified model file

* `--output_video_file_path` - Path to the output video file with detections added in

* `--preferred_backends` - You can specify one or more backends in order of preference. Accepted backends include `CpuAcc, GpuAcc, CpuRef`. Arm NN will decide which layers of the network are supported by the backend, falling back to the next if a layer is unsupported. Defaults to `['CpuAcc', 'CpuRef']`

* `--tflite_delegate_path` - Optional. Path to the Arm NN TensorFlow Lite Delegate library (libarmnnDelegate.so). If provided, Arm NN TensorFlow Lite Delegate will be used instead of PyArmNN.

* `--profiling_enabled` - Optional. Enabling this option will print timing information, in microseconds, for important ML-related milestones. By default, this option is disabled. Accepted options are `true/false`

The `run_video_file.py` example can also perform style transfer on a selected class of detected objects, and stylize the detections based on a given style image.

To run style transfer, the user additionally needs to specify these arguments at the command line:

* `--style_predict_model_file_path` - Path to the style predict model that will be used to create a style bottleneck tensor

* `--style_transfer_model_file_path` - Path to the style transfer model that will perform the style transfer

* `--style_image_path` - Path to a .jpg/jpeg/png style image to create stylized frames

* `--style_transfer_class` - The name of the detected class whose style will be transformed


Run the sample script:
```bash
$ python run_video_file.py --video_file_path <video_file_path> --model_file_path <model_file_path> --model_name <model_name> --label_path <label_path> --tflite_delegate_path <ARMNN delegate file path> --style_predict_model_file_path <style_predict_model_path> \
    --style_transfer_model_file_path <style_transfer_model_path> --style_image_path <style_image_path> --style_transfer_class <style_transfer_class>
```

## Object Detection from Video Stream
The `run_video_stream.py` example captures frames from a video stream of a device, runs inference on each frame, and produces frames with bounding boxes drawn around detected objects. A window is displayed and refreshed with the latest processed frame.

The user can specify these arguments at the command line:

* `--video_source` - Device index to access the video stream. Defaults to the primary device camera at index 0

* `--model_file_path` - <b>Required:</b> Path to <b>.tflite, .pb</b> or <b>.onnx</b> object detection model

* `--model_name` - <b>Required:</b> The name of the model being used. Assembles the workflow for the input model. The examples support the model names:

    * `ssd_mobilenet_v1`

    * `yolo_v3_tiny`

* `--label_path` - <b>Required:</b> Path to labels file for the specified model file

* `--preferred_backends` - You can specify one or more backends in order of preference. Accepted backends include `CpuAcc, GpuAcc, CpuRef`. Arm NN will decide which layers of the network are supported by the backend, falling back to the next if a layer is unsupported. Defaults to `['CpuAcc', 'CpuRef']`

* `--tflite_delegate_path` - Optional. Path to the Arm NN TensorFlow Lite Delegate library (libarmnnDelegate.so). If provided, Arm NN TensorFlow Lite Delegate will be used instead of PyArmNN.

* `--profiling_enabled` - Optional. Enabling this option will print timing information, in microseconds, for important ML-related milestones. By default, this option is disabled. Accepted options are `true/false`

Run the sample script:
```bash
$ python run_video_stream.py --model_file_path <model_file_path> --model_name <model_name> --tflite_delegate_path <ARMNN delegate file path> --label_path <Model label path> --video_source <video_source>
```

To run style transfer, the user additionally needs to specify these arguments at the command line:

* `--style_predict_model_file_path` - Path to the .tflite style predict model that will be used to create a style bottleneck tensor

* `--style_transfer_model_file_path` - Path to the .tflite style transfer model that will perform the style transfer

* `--style_image_path` - Path to a .jpg/jpeg/png style image to create stylized frames

* `--style_transfer_class` - The name of the detected class whose style will be transformed

Run the sample script:
```bash
$ python run_video_stream.py --model_file_path <model_file_path> --model_name <model_name> --label_path <label_path> --tflite_delegate_path <ARMNN delegate file path> --style_predict_model_file_path <style_predict_model_path> \
    --style_transfer_model_file_path <style_transfer_model_path> --style_image_path <style_image_path> --style_transfer_class <style_transfer_class>
```

This application has been verified to work against the MobileNet SSD and YOLO V3 Tiny models, which can be downloaded along with their label sets from:

* https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip


or from the Arm Model Zoo on GitHub:
```bash
sudo apt-get install git git-lfs
git lfs install
git clone https://github.com/arm-software/ml-zoo.git
cd ml-zoo/models/object_detection/yolo_v3_tiny/tflite_fp32/
./get_class_labels.sh
cp labelmappings.txt yolo_v3_tiny_darknet_fp32.tflite $WORKSPACE/armnn/python/pyarmnn/examples/object_detection/
```

The style transfer has been verified to work with the following models:

* style prediction model: https://tfhub.dev/google/lite-model/magenta/arbitrary-image-stylization-v1-256/int8/prediction/1?lite-format=tflite

* style transfer model: https://tfhub.dev/google/lite-model/magenta/arbitrary-image-stylization-v1-256/int8/transfer/1?lite-format=tflite

## Implementing Your Own Network
The examples provide support for the `ssd_mobilenet_v1` and `yolo_v3_tiny` models. However, you can add your own network to the object detection scripts by following these steps:

1. Create a new file for your network, for example `network.py`, to contain functions to process the output of the model
2. In that file, write a function that decodes the output vectors obtained from running inference on your network and returns the bounding box positions of detected objects plus their class index and confidence. Additionally, include a function that returns a resize factor that will scale the obtained bounding boxes to their correct positions in the original frame
3. Import the functions into the main file and, as with the provided networks, add a conditional statement to the `get_model_processing()` function with the new model name and functions
4. The labels associated with the model can then be passed in with the `--label_path` argument
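
The steps above could look roughly like the following sketch. Everything here is illustrative: the function names, the assumed output layout (parallel sequences of boxes, class indices and scores) and the `select_processing` dispatcher are hypothetical stand-ins, and the real `get_model_processing()` in the sample may take different arguments.

```python
def my_network_processing(output, confidence_threshold=0.5):
    """Decode raw output into [class index, [x_min, y_min, x_max, y_max], confidence].

    Assumption: `output` holds three parallel sequences (boxes, classes, scores).
    """
    boxes, classes, scores = output
    detections = []
    for box, class_index, score in zip(boxes, classes, scores):
        if score >= confidence_threshold:
            detections.append([int(class_index), list(box), float(score)])
    return detections


def my_network_resize_factor(frame_shape, input_data_shape):
    """Return the factor that scales model-space boxes back to the original frame.

    Assumption: frames are resized with their aspect ratio preserved, so the
    longer side determines the scale.
    """
    frame_height, frame_width = frame_shape
    _, model_height, model_width, _ = input_data_shape
    return max(frame_height, frame_width) / max(model_height, model_width)


def select_processing(model_name):
    """Hypothetical dispatcher mirroring the conditional added in step 3."""
    if model_name == "my_network":
        return my_network_processing, my_network_resize_factor
    raise ValueError(f"{model_name} is not a supported model name")


decode, resize_factor = select_processing("my_network")
raw_output = ([[0, 0, 10, 10], [5, 5, 20, 20]], [1, 2], [0.9, 0.2])
print(decode(raw_output), resize_factor((480, 640), (1, 320, 320, 3)))
```

Returning detections in the same list format as the provided networks means the drawing code can be reused unchanged.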

---

# Application Overview

This section provides a walk-through of the application, explaining the following steps in detail:

1. Initialisation
2. Creating a Network
3. Preparing the Workload Tensors
4. Executing Inference
5. Postprocessing


### Initialisation

##### Reading from Video Source
After parsing user arguments, the chosen video file or stream is loaded into an OpenCV `cv2.VideoCapture()` object. We use this object to capture frames from the source using the `read()` function.

The `VideoCapture` object also tells us information about the source, such as the frame rate and resolution of the input video. Using this information, we create a `cv2.VideoWriter()` object which will be used at the end of every loop to write the processed frame to an output video file of the same format as the input.

##### Preparing Labels and Model Specific Functions
In order to interpret the result of running inference on the loaded network, it is required to load the labels associated with the model. In the provided example code, the `dict_labels()` function creates a dictionary that is keyed on the classification index at the output node of the model, with values of the dictionary corresponding to a label and a randomly generated RGB color. This gives each class its own color, which helps to tell detected objects apart when plotting their bounding boxes in a frame.

Depending on the model being used, the user-specified model name accesses and returns functions to decode and process the inference output, along with a resize factor used when plotting bounding boxes to ensure they are scaled to their correct position in the original frame.
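
As an illustration of the labels dictionary described above, the following is a minimal sketch, not the sample's actual `dict_labels()` implementation. It assumes a labels file with one label per line, where the line number is the class index; the function name `dict_labels_sketch` and the `seed` parameter are hypothetical.

```python
import os
import random
import tempfile


def dict_labels_sketch(labels_file_path, seed=None):
    """Map class index -> (label, random RGB color), one label per line."""
    rng = random.Random(seed)
    labels = {}
    with open(labels_file_path, "r") as labels_file:
        for index, line in enumerate(labels_file):
            color = tuple(rng.randint(0, 255) for _ in range(3))
            labels[index] = (line.strip(), color)
    return labels


# Demonstrate with a small temporary labels file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("person\nbicycle\ncar\n")
labels = dict_labels_sketch(handle.name, seed=0)
os.remove(handle.name)

label, color = labels[2]
print(label, color)
```

Fixing the seed is optional; it only makes the colors reproducible between runs.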


### Creating a Network

##### Creating Parser and Importing Graph
The first step with PyArmNN is to import a graph from file by using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically focus on `.tflite, .pb, .onnx` models.

Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with the `CreateNetworkFromBinaryFile()` function. The parser will handle the creation of the underlying Arm NN graph.
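
The extension-based parser selection could be sketched as follows. The mapping and helper names are hypothetical, and the parser class names are assumptions about PyArmNN; which parsers are actually available depends on the PyArmNN version and how it was built.

```python
import os

# Assumed mapping from file extension to PyArmNN parser class name.
PARSER_BY_EXTENSION = {
    ".tflite": "ITfLiteParser",
    ".pb": "ITfParser",
    ".onnx": "IOnnxParser",
}


def select_parser_name(model_file_path):
    """Return the name of the parser class matching the model file extension."""
    extension = os.path.splitext(model_file_path)[1].lower()
    if extension not in PARSER_BY_EXTENSION:
        raise ValueError(f"Unsupported model file extension: {extension!r}")
    return PARSER_BY_EXTENSION[extension]


def create_network_from_file(model_file_path):
    """Create the parser and import the graph (requires PyArmNN)."""
    import pyarmnn as ann  # imported lazily so the mapping works without PyArmNN

    parser = getattr(ann, select_parser_name(model_file_path))()
    network = parser.CreateNetworkFromBinaryFile(model_file_path)
    return parser, network


print(select_parser_name("ssd_mobilenet_v1.tflite"))
```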

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the appropriate device context. We do this by creating a runtime context with default options with `IRuntime()`.

We can optimize the imported graph by specifying a list of backends in order of preference and implement backend-specific optimizations. The backends are identified by a string unique to the backend, for example `CpuAcc, GpuAcc, CpuRef`.

Internally and transparently, Arm NN splits the graph into subgraphs based on backends, calls an optimize-subgraphs function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with its optimized version.

Using the `Optimize()` function we optimize the graph for inference and load the optimized network onto the compute device with `LoadNetwork()`. This function creates the backend-specific workloads for the layers and a backend-specific workload factory which is called to create the workloads.

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames()` we extract all the input names and, with `GetNetworkInputBindingInfo()`, bind the input points of the graph.

The input binding information contains all the essential information about the input. It is a tuple consisting of integer identifiers for bindable layers (inputs, outputs) and the tensor info (data type, quantization information, number of dimensions, total number of elements).

Similarly, we can get the output binding information for an output layer by using the parser to retrieve output tensor names and calling `GetNetworkOutputBindingInfo()`.
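
Putting the network creation steps together, a minimal sketch for a `.tflite` model might look like this. It is not the sample's own code: the function name `load_tflite_network` is hypothetical, the model is assumed to have a single input, and PyArmNN is imported lazily inside the function so that the definition itself does not require it.

```python
def load_tflite_network(model_file_path, backend_names=("CpuAcc", "CpuRef")):
    """Parse, optimize and load a .tflite model; return what inference needs."""
    import pyarmnn as ann  # lazy import: requires PyArmNN to be installed

    # Parse the model file into an Arm NN network.
    parser = ann.ITfLiteParser()
    network = parser.CreateNetworkFromBinaryFile(model_file_path)

    # Create a runtime context with default options.
    runtime = ann.IRuntime(ann.CreationOptions())

    # Optimize for the preferred backends, in order of preference.
    backends = [ann.BackendId(name) for name in backend_names]
    opt_network, _messages = ann.Optimize(
        network, backends, runtime.GetDeviceSpec(), ann.OptimizerOptions())

    # Load the optimized network onto the compute device.
    net_id, _ = runtime.LoadNetwork(opt_network)

    # Collect binding information for the input and every output tensor.
    graph_id = parser.GetSubgraphCount() - 1
    input_names = parser.GetSubgraphInputTensorNames(graph_id)
    input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])
    output_names = parser.GetSubgraphOutputTensorNames(graph_id)
    output_binding_info = [
        parser.GetNetworkOutputBindingInfo(graph_id, name) for name in output_names
    ]
    return runtime, net_id, input_binding_info, output_binding_info
```

A call such as `load_tflite_network("<model_file_path>")` would then return the runtime, the network identifier and the binding information used in the following steps.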


### Preparing the Workload Tensors

##### Preprocessing the Captured Frame
Each frame captured from source is read as an `ndarray` in BGR format and therefore has to be preprocessed before being passed into the network.

This preprocessing step consists of swapping channels (BGR to RGB in this example), resizing the frame to the required resolution, expanding dimensions of the array and doing data type conversion to match the model input layer. This information about the input tensor can be readily obtained from reading the `input_binding_info`. For example, SSD MobileNet V1 takes for input a tensor with shape `[1, 300, 300, 3]` and data type `uint8`.

##### Making Input and Output Tensors
To produce the workload tensors, calling the functions `make_input_tensors()` and `make_output_tensors()` will return the input and output tensors respectively.

##### Creating a Style Bottleneck - Style Prediction
If the user decides to use style transfer, a style transfer constructor will be called to create a style bottleneck.
To create a style bottleneck, the style transfer executor will call a `style_predict` function, which requires a style prediction executor and an artistic style image.
The style image must be preprocessed to (1, 256, 256, 3) to fit the style predict executor, which will then perform inference to create a style bottleneck.

### Executing Inference
After making the workload tensors, a compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context. By calling the `workload_tensors_to_ndarray()` function, we obtain the results from inference as a list of `ndarrays`.
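
The tensor creation and inference calls described above can be combined into one small helper. This is a sketch under the assumption that `runtime`, `net_id` and the binding information come from the network creation step and that `input_data` is the preprocessed frame; the helper name `run_inference` is hypothetical.

```python
def run_inference(runtime, net_id, input_binding_info, output_binding_info, input_data):
    """Run one inference and return the results as a list of ndarrays."""
    import pyarmnn as ann  # lazy import: requires PyArmNN to be installed

    # Wrap the preprocessed frame and allocate output buffers.
    input_tensors = ann.make_input_tensors([input_binding_info], [input_data])
    output_tensors = ann.make_output_tensors(output_binding_info)

    # Execute the loaded network on the compute device.
    runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)

    # Convert the output tensors into a list of ndarrays.
    return ann.workload_tensors_to_ndarray(output_tensors)
```

Calling this once per captured frame yields the raw output that the postprocessing step decodes.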


### Postprocessing

##### Decoding and Processing Inference Output
The output from inference must be decoded to obtain information about detected objects in the frame. In the examples there are implementations for two networks but you may also implement your own network decoding solution here. Please refer to the <i>Implementing Your Own Network</i> section of this document to learn how to do this.

For SSD MobileNet V1 models, we decode the results to obtain the bounding box positions, classification index, confidence and number of detections in the input frame.

For YOLO V3 Tiny models, we decode the output and perform non-maximum suppression to filter out any weak detections below a confidence threshold and any redundant bounding boxes above an intersection-over-union threshold.

We encourage you to experiment with the threshold values for confidence and intersection-over-union (IoU) to achieve the best visual results.

The detection results are always returned as a list in the form `[class index, [box positions], confidence score]`, with the box positions list containing bounding box coordinates in the form `[x_min, y_min, x_max, y_max]`.
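
To show how the confidence and IoU thresholds interact, here is a minimal pure-Python sketch of greedy non-maximum suppression over detections in the list format above. It is an illustration, not the sample's implementation: the function names and default thresholds are assumptions, and suppression is applied across all classes for simplicity.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, confidence_threshold=0.5, iou_threshold=0.6):
    """Drop weak detections, then drop boxes overlapping a stronger kept box."""
    candidates = sorted(
        (d for d in detections if d[2] >= confidence_threshold),
        key=lambda d: d[2],
        reverse=True,
    )
    kept = []
    for candidate in candidates:
        if all(iou(candidate[1], other[1]) <= iou_threshold for other in kept):
            kept.append(candidate)
    return kept


detections = [
    [0, [0, 0, 10, 10], 0.9],    # strongest box
    [0, [1, 1, 10, 10], 0.8],    # overlaps the first box heavily
    [1, [20, 20, 30, 30], 0.7],  # separate object
    [1, [0, 0, 5, 5], 0.3],      # below the confidence threshold
]
print(non_max_suppression(detections))
```

Raising the confidence threshold removes more weak detections, while lowering the IoU threshold suppresses more overlapping boxes.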

##### Drawing Bounding Boxes
With the obtained results and using `draw_bounding_boxes()`, we are able to draw bounding boxes around detected objects and add the associated label and confidence score. The labels dictionary created earlier uses the class index of the detected object as a key to return the associated label and color for that class. The resize factor defined at the beginning scales the bounding box coordinates to their correct positions in the original frame. The processed frames are written to file or displayed in a separate window.
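
The scaling and labelling logic can be sketched without any drawing calls. The helpers `scale_box` and `format_label` below are hypothetical and only illustrate the arithmetic; the label text format and the clamping to the frame borders are assumptions rather than a description of `draw_bounding_boxes()`.

```python
def scale_box(box, resize_factor, frame_width, frame_height):
    """Scale a box to frame coordinates and clamp it to the frame borders."""
    x_min, y_min, x_max, y_max = (int(value * resize_factor) for value in box)
    return [
        max(0, x_min),
        max(0, y_min),
        min(frame_width, x_max),
        min(frame_height, y_max),
    ]


def format_label(label, confidence):
    """Build the text drawn next to a box, for example 'person 87.3%'."""
    return f"{label} {confidence * 100:.1f}%"


detection = [0, [10, 20, 400, 300], 0.873]
labels = {0: ("person", (255, 0, 0))}  # class index -> (label, color)

class_index, box, confidence = detection
label, color = labels[class_index]
print(scale_box(box, 2.0, frame_width=640, frame_height=480), format_label(label, confidence))
```

The scaled coordinates and label text would then be passed to the OpenCV drawing functions.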

##### Creating Stylized Detections
The detections are passed as input to the style transfer executor, which creates stylized detections using the style bottleneck tensor that was calculated in the style prediction process.
Each detection is cropped from the frame and then preprocessed to (1, 384, 384, 3) to fit the style transfer executor.
The style transfer executor uses the style bottleneck and the preprocessed content frame to create an artistic stylized frame.
The labels dictionary created earlier uses the class index of the detected object as a key to return the associated label, which is compared with the style transfer class to decide whether the detection should be stylized. The resize factor defined at the beginning scales the bounding box coordinates to their correct positions in the original frame. The processed frames are written to file or displayed in a separate window.
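
The class comparison described above can be illustrated with a short sketch. The helper name `select_detections_to_stylize` is hypothetical, and an exact string match between the label and the requested class is an assumption.

```python
def select_detections_to_stylize(detections, labels, style_transfer_class):
    """Return the boxes of detections whose label equals the requested class."""
    boxes = []
    for class_index, box, _confidence in detections:
        label, _color = labels[class_index]
        if label == style_transfer_class:
            boxes.append(box)
    return boxes


labels = {0: ("person", (255, 0, 0)), 1: ("car", (0, 255, 0))}
detections = [
    [0, [10, 10, 50, 90], 0.92],
    [1, [60, 40, 120, 80], 0.81],
    [0, [130, 15, 170, 95], 0.77],
]
print(select_detections_to_stylize(detections, labels, "person"))
```

Only the selected boxes would be cropped, stylized and pasted back into the frame; the other detections are left untouched.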