# PyArmNN Object Detection Sample Application

## Introduction
This sample application shows how to perform object detection using the PyArmNN API. We assume that you have already built PyArmNN by following the instructions in the README in the main PyArmNN directory.

We provide example scripts, `run_video_file.py` and `run_video_stream.py`, for performing object detection from a video file and from a video stream.

The application takes a model and a video file or camera feed as input, runs inference on each frame, and draws bounding boxes around detected objects, with the corresponding labels and confidence scores overlaid.

A similar implementation of this object detection application is also provided in C++ in the Arm NN examples.

## Prerequisites

##### PyArmNN

Before proceeding to the next steps, make sure that you have successfully installed the newest version of PyArmNN on your system by following the instructions in the README of the PyArmNN root directory.

You can verify that the PyArmNN library is installed and check the PyArmNN version using:
```bash
$ pip show pyarmnn
```

You can also verify it by running the following and getting output similar to the below:
```bash
$ python -c "import pyarmnn as ann;print(ann.GetVersion())"
'22.0.0'
```

##### Dependencies

Install the following libraries on your system:
```bash
$ sudo apt-get install python3-opencv libqtgui4 libqt4-test
```

Create a virtual environment:
```bash
$ python3.7 -m venv devenv --system-site-packages
$ source devenv/bin/activate
```

Install the dependencies:
```bash
$ pip install -r requirements.txt
```

---

# Performing Object Detection

## Object Detection from Video File
The `run_video_file.py` example takes a video file as input, runs inference on each frame, and produces frames with bounding boxes drawn around detected objects. The processed frames are written to a video file.

The user can specify these arguments at the command line:

* `--video_file_path` - <b>Required:</b> Path to the video file to run object detection on

* `--model_file_path` - <b>Required:</b> Path to the <b>.tflite, .pb</b> or <b>.onnx</b> object detection model

* `--model_name` - <b>Required:</b> The name of the model being used. Assembles the workflow for the input model. The examples support the following model names:

  * `ssd_mobilenet_v1`

  * `yolo_v3_tiny`

* `--label_path` - <b>Required:</b> Path to the labels file for the specified model file

* `--output_video_file_path` - Path to the output video file with detections added in

* `--preferred_backends` - You can specify one or more backends in order of preference. Accepted backends include `CpuAcc, GpuAcc, CpuRef`. Arm NN will decide which layers of the network are supported by the backend, falling back to the next one if a layer is unsupported. Defaults to `['CpuAcc', 'CpuRef']`

Run the sample script:
```bash
$ python run_video_file.py --video_file_path <video_file_path> --model_file_path <model_file_path> --model_name <model_name>
```

## Object Detection from Video Stream
The `run_video_stream.py` example captures frames from the video stream of a device, runs inference on each frame, and produces frames with bounding boxes drawn around detected objects. A window is displayed and refreshed with the latest processed frame.

The user can specify these arguments at the command line:

* `--video_source` - Device index used to access the video stream. Defaults to the primary device camera at index 0

* `--model_file_path` - <b>Required:</b> Path to the <b>.tflite, .pb</b> or <b>.onnx</b> object detection model

* `--model_name` - <b>Required:</b> The name of the model being used. Assembles the workflow for the input model. The examples support the following model names:

  * `ssd_mobilenet_v1`

  * `yolo_v3_tiny`

* `--label_path` - <b>Required:</b> Path to the labels file for the specified model file

* `--preferred_backends` - You can specify one or more backends in order of preference. Accepted backends include `CpuAcc, GpuAcc, CpuRef`. Arm NN will decide which layers of the network are supported by the backend, falling back to the next one if a layer is unsupported. Defaults to `['CpuAcc', 'CpuRef']`

Run the sample script:
```bash
$ python run_video_stream.py --model_file_path <model_file_path> --model_name <model_name>
```

This application has been verified to work against the MobileNet SSD model, which can be downloaded along with its label set from:

* https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip

## Implementing Your Own Network
The examples provide support for the `ssd_mobilenet_v1` and `yolo_v3_tiny` models. However, you can add your own network to the object detection scripts by following these steps:

1. Create a new file for your network, for example `network.py`, to contain the functions that process the output of your model
2. In that file, write a function that decodes the output vectors obtained from running inference on your network and returns the bounding box positions of detected objects plus their class index and confidence. Additionally, include a function that returns a resize factor that will scale the obtained bounding boxes to their correct positions in the original frame
3. Import the functions into the main file and, as with the provided networks, add a conditional statement to the `get_model_processing()` function with the new model name and functions (see the sketch after this list)
4. The labels associated with the model can then be passed in with the `--label_path` argument
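
The sketch below illustrates step 3. The exact signature of `get_model_processing()` and the names `network`, `decode_my_network` and `my_network_resize_factor` are illustrative assumptions; match them to the actual sample scripts and to the functions you wrote in step 2.

```python
# Hypothetical sketch of step 3; names and signature should be aligned with the sample code.
from network import decode_my_network, my_network_resize_factor  # your new file from step 1


def get_model_processing(model_name, video, input_binding_info):
    """Selects the output-decoding function and resize factor for the chosen model."""
    if model_name == 'ssd_mobilenet_v1':
        ...  # existing branch provided by the examples
    elif model_name == 'yolo_v3_tiny':
        ...  # existing branch provided by the examples
    elif model_name == 'my_network':  # new branch for your model
        return decode_my_network, my_network_resize_factor(video)
    raise ValueError(f'Unknown model name: {model_name}')
```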

---

# Application Overview

This section provides a walkthrough of the application, explaining the following steps in detail:

1. Initialisation
2. Creating a Network
3. Preparing the Workload Tensors
4. Executing Inference
5. Postprocessing

### Initialisation

##### Reading from Video Source
After parsing user arguments, the chosen video file or stream is loaded into an OpenCV `cv2.VideoCapture()` object. We use this object to capture frames from the source using the `read()` function.

The `VideoCapture` object also gives us information about the source, such as the frame rate and resolution of the input video. Using this information, we create a `cv2.VideoWriter()` object which is used at the end of every loop to write the processed frame to an output video file of the same format as the input.
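
The snippet below is a minimal sketch of this setup using standard OpenCV calls; the file names and the `mp4v` codec choice are illustrative and may differ from the sample code.

```python
import cv2

# Open the chosen video file (or pass a device index for a camera stream).
video = cv2.VideoCapture('input.mp4')

# Query source properties to configure the writer.
fps = int(video.get(cv2.CAP_PROP_FPS))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Writer used at the end of each loop iteration to save the processed frame.
writer = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'),
                         fps, (width, height))

frame_available, frame = video.read()  # frames are returned in BGR order
```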

##### Preparing Labels and Model Specific Functions
To interpret the results of running inference on the loaded network, we need to load the labels associated with the model. In the provided example code, the `dict_labels()` function creates a dictionary that is keyed on the classification index at the output node of the model, with values corresponding to a label and a randomly generated RGB color. This ensures that each class has a unique color, which is helpful when plotting the bounding boxes of the various objects detected in a frame.
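
A minimal sketch of such a labels dictionary is given below; the exact file handling in `dict_labels()` may differ, so treat this as an illustration of the data structure rather than the sample's implementation.

```python
import random

def build_labels_dict(label_file_path: str) -> dict:
    """Maps each class index to (label, random RGB color), assuming one label per line."""
    labels = {}
    with open(label_file_path, 'r') as label_file:
        for idx, line in enumerate(label_file):
            color = tuple(random.randint(0, 255) for _ in range(3))
            labels[idx] = (line.strip(), color)
    return labels
```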

The user-specified model name is then used to select the functions that decode and process the inference output, along with a resize factor used when plotting bounding boxes, ensuring they are scaled to their correct positions in the original frame.

### Creating a Network

##### Creating Parser and Importing Graph
The first step with PyArmNN is to import a graph from file by using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically focus on `.tflite, .pb, .onnx` models.

Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with the `CreateNetworkFromBinaryFile()` function. The parser handles the creation of the underlying Arm NN graph.
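
As a sketch, the parser creation for a `.tflite` model looks roughly as follows (the model file name is a placeholder); `.pb` and `.onnx` models are handled analogously with their respective parsers.

```python
import pyarmnn as ann

# Choose the parser based on the model file extension; here we assume .tflite.
parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile('ssd_mobilenet_v1.tflite')
```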

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the appropriate device context. We do this by creating a runtime context with default options with `IRuntime()`.

We can optimize the imported graph by specifying a list of backends in order of preference and implementing backend-specific optimizations. Each backend is identified by a unique string, for example `CpuAcc, GpuAcc, CpuRef`.

Internally and transparently, Arm NN splits the graph into subgraphs based on these backends, calls an optimize-subgraphs function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with its optimized version.

Using the `Optimize()` function we optimize the graph for inference, then load the optimized network onto the compute device with `LoadNetwork()`. This function creates the backend-specific workloads for the layers and a backend-specific workload factory which is called to create the workloads.
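
Assuming `network` is the parsed graph from the previous step, a sketch of this stage could look like this.

```python
import pyarmnn as ann

# Create a runtime context with default creation options.
options = ann.CreationOptions()
runtime = ann.IRuntime(options)

# Backends in order of preference; unsupported layers fall back to the next one.
preferred_backends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]

# Optimize the parsed graph for the chosen backends.
opt_network, messages = ann.Optimize(network, preferred_backends,
                                     runtime.GetDeviceSpec(),
                                     ann.OptimizerOptions())

# Load the optimized network onto the compute device; net_id identifies it later.
net_id, _ = runtime.LoadNetwork(opt_network)
```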

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames()` we extract all the input names, and with `GetNetworkInputBindingInfo()` we bind the input points of the graph.

The input binding information contains all the essential information about the input. It is a tuple consisting of integer identifiers for the bindable layers (inputs, outputs) and the tensor info (data type, quantization information, number of dimensions, total number of elements).

Similarly, we can get the output binding information for an output layer by using the parser to retrieve output tensor names and calling `GetNetworkOutputBindingInfo()`.
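
Assuming a single-subgraph `.tflite` model and the `parser` created earlier, the binding information can be retrieved roughly as follows.

```python
# Graph id 0 is used here, assuming a single-subgraph TensorFlow Lite model.
graph_id = 0

input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])

output_names = parser.GetSubgraphOutputTensorNames(graph_id)
output_binding_info = [parser.GetNetworkOutputBindingInfo(graph_id, name)
                       for name in output_names]
```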

### Preparing the Workload Tensors

##### Preprocessing the Captured Frame
Each frame captured from the source is read as an `ndarray` in BGR format and therefore has to be preprocessed before being passed into the network.

This preprocessing step consists of swapping channels (BGR to RGB in this example), resizing the frame to the required resolution, expanding the dimensions of the array, and converting the data type to match the model's input layer. This information about the input tensor can be readily obtained by reading `input_binding_info`. For example, SSD MobileNet V1 takes as input a tensor of shape `[1, 300, 300, 3]` with data type `uint8`.
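
A sketch of this preprocessing for SSD MobileNet V1, assuming the 300x300 `uint8` input described above, could look like this.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Converts a BGR frame into the [1, 300, 300, 3] uint8 tensor assumed here."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # swap channels
    resized = cv2.resize(rgb, (300, 300))          # match the model resolution
    batched = np.expand_dims(resized, axis=0)      # add the batch dimension
    return batched.astype(np.uint8)                # match the input data type
```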

##### Making Input and Output Tensors
To produce the workload tensors, we call `make_input_tensors()` and `make_output_tensors()`, which return the input and output tensors respectively.
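
Assuming `input_binding_info`, the `output_binding_info` list and the preprocessed frame from the previous sketches, this amounts to:

```python
import pyarmnn as ann

# input_data is the preprocessed frame, e.g. shape [1, 300, 300, 3], dtype uint8.
input_data = preprocess_frame(frame)

input_tensors = ann.make_input_tensors([input_binding_info], [input_data])
output_tensors = ann.make_output_tensors(output_binding_info)
```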

### Executing Inference
After making the workload tensors, a compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context. By calling the `workload_tensors_to_ndarray()` function, we obtain the inference results as a list of `ndarrays`.
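
Continuing the sketch, with `runtime`, `net_id` and the workload tensors from the previous steps:

```python
import pyarmnn as ann

# Run inference on the compute device and pull the results back as ndarrays.
runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
results = ann.workload_tensors_to_ndarray(output_tensors)
```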

### Postprocessing

##### Decoding and Processing Inference Output
The output from inference must be decoded to obtain information about the objects detected in the frame. The examples include implementations for two networks, but you may also implement your own decoding solution here. Please refer to the <i>Implementing Your Own Network</i> section of this document to learn how to do this.

For SSD MobileNet V1 models, we decode the results to obtain the bounding box positions, classification index, confidence and number of detections in the input frame.

For YOLO V3 Tiny models, we decode the output and perform non-maximum suppression to filter out any weak detections below a confidence threshold and any redundant bounding boxes above an intersection-over-union (IoU) threshold.

We encourage you to experiment with the confidence and IoU threshold values to achieve the best visual results.

The detection results are always returned as a list in the form `[class index, [box positions], confidence score]`, with the box positions list containing the bounding box coordinates in the form `[x_min, y_min, x_max, y_max]`.
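
As an illustration of the non-maximum suppression step described above (not necessarily the exact implementation used in the sample), a minimal version operating on detections in the `[class index, [x_min, y_min, x_max, y_max], confidence score]` format could look like this.

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two [x_min, y_min, x_max, y_max] boxes."""
    x_min = max(box_a[0], box_b[0])
    y_min = max(box_a[1], box_b[1])
    x_max = min(box_a[2], box_b[2])
    y_max = min(box_a[3], box_b[3])
    intersection = max(0, x_max - x_min) * max(0, y_max - y_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, confidence_threshold=0.5, iou_threshold=0.5):
    """Keeps the highest-confidence boxes, dropping overlapping weaker ones."""
    candidates = sorted((d for d in detections if d[2] >= confidence_threshold),
                        key=lambda d: d[2], reverse=True)
    kept = []
    for detection in candidates:
        if all(iou(detection[1], k[1]) <= iou_threshold for k in kept):
            kept.append(detection)
    return kept
```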

##### Drawing Bounding Boxes
With the obtained results and using `draw_bounding_boxes()`, we draw bounding boxes around the detected objects and add the associated label and confidence score. The labels dictionary created earlier uses the class index of the detected object as a key to return the associated label and color for that class. The resize factor defined at the beginning scales the bounding box coordinates to their correct positions in the original frame. The processed frames are written to file or displayed in a separate window.
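
As a minimal illustration of what drawing a single detection involves (the sample's `draw_bounding_boxes()` helper wraps logic along these lines, but its exact behaviour may differ):

```python
import cv2

def draw_detection(frame, detection, labels, resize_factor):
    """Draws one [class index, [x_min, y_min, x_max, y_max], score] detection."""
    class_idx, box, score = detection
    label, color = labels[int(class_idx)]
    # Scale box coordinates back to the original frame size.
    x_min, y_min, x_max, y_max = (int(coord * resize_factor) for coord in box)
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color, 2)
    cv2.putText(frame, f'{label} {score:.2f}', (x_min, max(y_min - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
```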