# Object Detection Example

## Introduction
This sample code shows object detection using the Arm NN public C++ API. The compiled application can take

 * a video file

as input and
 * save a video file
 * or display the video stream in a window

with detections shown as bounding boxes, class labels and confidence scores.

## Dependencies

This example utilises OpenCV functions to capture and output video data. The top-level inference API is provided by the Arm NN
library.

### Arm NN

The object detection example build system does not trigger Arm NN compilation. Thus, before building the application,
please ensure that Arm NN libraries and header files are available on your build platform.
The application executable binary dynamically links with the following Arm NN libraries:
* libarmnn.so
* libarmnnTfLiteParser.so

The build script searches for available Arm NN libraries in the following order:
1. Inside the custom user directory specified by the ARMNN_LIB_DIR cmake option.
2. Inside the current Arm NN repository, assuming that Arm NN was built following [these instructions](../../BuildGuideCrossCompilation.md).
3. Inside the default locations for system libraries, assuming Arm NN was installed from deb packages.

Arm NN header files are searched for in an `include` directory in the parent directory of the found libraries, i.e.
for libraries found in `/usr/lib` or `/usr/lib64`, header files are expected in `/usr/include` (or `${ARMNN_LIB_DIR}/include`).

Please see [find_armnn.cmake](./cmake/find_armnn.cmake) for implementation details.

### OpenCV

This application uses [OpenCV (Open Source Computer Vision Library)](https://opencv.org/) for video stream processing.
Your host platform may have OpenCV available through its Linux package manager; if this is the case, please install it in the
standard way. If not, our build system has a script to download and cross-compile the required OpenCV modules
as well as the [FFMPEG](https://ffmpeg.org/) and [x264 encoder](https://www.videolan.org/developers/x264.html) libraries.
The latter builds only limited OpenCV functionality, and the application then supports only video file input and video file
output. Displaying video frames in a window requires building OpenCV with GTK and OpenGL support.

The application executable binary dynamically links with the following OpenCV libraries:
* libopencv_core.so.4.0.0
* libopencv_imgproc.so.4.0.0
* libopencv_imgcodecs.so.4.0.0
* libopencv_videoio.so.4.0.0
* libopencv_video.so.4.0.0
* libopencv_highgui.so.4.0.0

and transitively depends on:
* libavcodec.so (FFMPEG)
* libavformat.so (FFMPEG)
* libavutil.so (FFMPEG)
* libswscale.so (FFMPEG)
* libx264.so (x264)

The application searches for the above libraries in the following order:
1. Inside the custom user directory specified by the OPENCV_LIB_DIR cmake option.
2. Inside the default locations for system libraries.

If no OpenCV libraries were found, the cross-compilation build is extended with x264, FFMPEG and OpenCV compilation steps.

Note: the native build does not add third-party libraries to the compilation.

Please see [find_opencv.cmake](./cmake/find_opencv.cmake) for implementation details.

## Building
There are two flows for building this application:
* native build on a host platform,
* cross-compilation for an Arm-based host platform.

### Build Options

* CMAKE_TOOLCHAIN_FILE - choose one of the available cross-compilation toolchain files:
 * `cmake/aarch64-toolchain.cmake`
 * `cmake/arm-linux-gnueabihf-toolchain.cmake`
* ARMNN_LIB_DIR - point to the custom location of the Arm NN libs and headers.
* OPENCV_LIB_DIR - point to the custom location of the OpenCV libs and headers.
* BUILD_UNIT_TESTS - set to `1` to build tests. In addition to the main application, an `object_detection_example-tests`
unit test executable will be created.

### Native Build
To build this application on a host platform, first ensure that the required dependencies are installed.
For example, for a Raspberry Pi:
```commandline
sudo apt-get update
sudo apt-get -yq install pkg-config
sudo apt-get -yq install libgtk2.0-dev zlib1g-dev libjpeg-dev libpng-dev libxvidcore-dev libx264-dev
sudo apt-get -yq install libavcodec-dev libavformat-dev libswscale-dev
```

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
If you have already installed Arm NN and OpenCV:

Inside the build directory, run the cmake and make commands:
```commandline
cmake ..
make
```
This will build the following in the bin directory:
* object_detection_example - application executable

If Arm NN and OpenCV are installed in custom locations, use the `OPENCV_LIB_DIR` and `ARMNN_LIB_DIR` options:
```commandline
cmake -DARMNN_LIB_DIR=/path/to/armnn -DOPENCV_LIB_DIR=/path/to/opencv ..
make
```

### Cross-compilation

This section explains how to cross-compile the application and dependencies on a Linux x86 machine
for Arm host platforms.

You will need a working cross-compilation toolchain supported by your host platform. For Raspberry Pi 3 and 4 with glibc
runtime version 2.28, the following toolchains were successfully used:
* https://releases.linaro.org/components/toolchain/binaries/latest-7/aarch64-linux-gnu/
* https://releases.linaro.org/components/toolchain/binaries/latest-7/arm-linux-gnueabihf/

Choose aarch64-linux-gnu if the `lscpu` command shows the architecture as aarch64, or arm-linux-gnueabihf if the detected
architecture is armv7l.

You can check the glibc runtime version on your host platform by running:
```
ldd --version
```
On the **build machine**, install the C and C++ cross-compiler toolchains and add them to the PATH variable.

Install package dependencies:
```commandline
sudo apt-get update
sudo apt-get -yq install pkg-config
```
pkg-config is required by the OpenCV build to discover the FFMPEG libs.

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
Inside the build directory, run the cmake and make commands:

**Arm 32bit**
```commandline
cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/arm-linux-gnueabihf-toolchain.cmake ..
make
```
**Arm 64bit**
```commandline
cmake -DARMNN_LIB_DIR=<path-to-armnn-libs> -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64-toolchain.cmake ..
make
```

Add the `-j` flag to the make command to run compilation in multiple threads.

From the build directory, copy the following to the host platform:
* bin directory - contains the object_detection_example executable,
* lib directory - contains the cross-compiled OpenCV, FFMPEG and x264 libraries,
* your Arm NN libs used during compilation.

The full list of libs to copy to your board after cross-compilation:
```
libarmnn.so
libarmnn.so.22
libarmnn.so.23.0
libarmnnTfLiteParser.so
libarmnnTfLiteParser.so.22.0
libavcodec.so
libavcodec.so.58
libavcodec.so.58.54.100
libavdevice.so
libavdevice.so.58
libavdevice.so.58.8.100
libavfilter.so
libavfilter.so.7
libavfilter.so.7.57.100
libavformat.so
libavformat.so.58
libavformat.so.58.29.100
libavutil.so
libavutil.so.56
libavutil.so.56.31.100
libopencv_core.so
libopencv_core.so.4.0
libopencv_core.so.4.0.0
libopencv_highgui.so
libopencv_highgui.so.4.0
libopencv_highgui.so.4.0.0
libopencv_imgcodecs.so
libopencv_imgcodecs.so.4.0
libopencv_imgcodecs.so.4.0.0
libopencv_imgproc.so
libopencv_imgproc.so.4.0
libopencv_imgproc.so.4.0.0
libopencv_video.so
libopencv_video.so.4.0
libopencv_video.so.4.0.0
libopencv_videoio.so
libopencv_videoio.so.4.0
libopencv_videoio.so.4.0.0
libpostproc.so
libpostproc.so.55
libpostproc.so.55.5.100
libswresample.a
libswresample.so
libswresample.so.3
libswresample.so.3.5.100
libswscale.so
libswscale.so.5
libswscale.so.5.5.100
libx264.so
libx264.so.160
```
## Executing

Once the application executable is built, it can be executed with the following options:
* --video-file-path: Path to the video file to run object detection on **[REQUIRED]**
* --model-file-path: Path to the Object Detection model to use **[REQUIRED]**
* --label-path: Path to the label set for the provided model file **[REQUIRED]**
* --model-name: The name of the model being used. Accepted options: SSD_MOBILE | YOLO_V3_TINY **[REQUIRED]**
* --output-video-file-path: Path to the output video file with detections added in. Defaults to /tmp/output.avi
 **[OPTIONAL]**
* --preferred-backends: Takes the preferred backends in preference order, separated by commas.
 For example: CpuAcc,GpuAcc,CpuRef. Accepted options: [CpuAcc, CpuRef, GpuAcc].
 Defaults to CpuRef **[OPTIONAL]**
* --help: Prints all the available options to screen

### Object Detection on a supplied video file

To run object detection on a supplied video file and output the result to a video file:
```commandline
LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
 --video-file-path /path/to/video/file --model-file-path /path/to/model/file
 --model-name [YOLO_V3_TINY | SSD_MOBILE] --output-video-file-path /path/to/output/file
```

To run object detection on a supplied video file and output the result to a GUI window:
```commandline
LD_LIBRARY_PATH=/path/to/armnn/libs:/path/to/opencv/libs ./object_detection_example --label-path /path/to/labels/file
 --video-file-path /path/to/video/file --model-file-path /path/to/model/file
 --model-name [YOLO_V3_TINY | SSD_MOBILE]
```

This application has been verified to work against the MobileNet SSD model, which can be downloaded along with its label set from:
* https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip

---

# Application Overview
This section provides a walkthrough of the application, explaining the following steps in detail:
1. Initialisation
   1. Reading from Video Source
   2. Preparing Labels and Model Specific Functions
2. Creating a Network
   1. Creating Parser and Importing Graph
   2. Optimizing Graph for Compute Device
   3. Creating Input and Output Binding Information
3. Object detection pipeline
   1. Pre-processing the Captured Frame
   2. Making Input and Output Tensors
   3. Executing Inference
   4. Postprocessing
   5. Decoding and Processing Inference Output
   6. Drawing Bounding Boxes

### Initialisation

##### Reading from Video Source
After parsing the user arguments, the chosen video file or stream is loaded into an OpenCV `cv::VideoCapture` object.
We use the [`IFrameReader`](./include/IFrameReader.hpp) interface and its OpenCV-specific implementation
[`CvVideoFrameReader`](./include/CvVideoFrameReader.hpp) in our main function to capture frames from the source using the
`ReadFrame()` function.

The `CvVideoFrameReader` object also provides information about the input video. Using this information and the application
arguments, we create one of the implementations of the [`IFrameOutput`](./include/IFrameOutput.hpp) interface:
[`CvVideoFileWriter`](./include/CvVideoFileWriter.hpp) or [`CvWindowOutput`](./include/CvWindowOutput.hpp).
This object is used at the end of every loop to write the processed frame to an output video file or a GUI
window.
`CvVideoFileWriter` uses `cv::VideoWriter` with the FFMPEG backend. `CvWindowOutput` makes use of the `cv::imshow()` function.

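Putting these pieces together, the per-frame loop looks roughly like the sketch below. The `ReadFrame()` call is described above; the frame's smart-pointer type and the `WriteFrame()` and `Close()` names on the output side are assumptions based on the interfaces, so treat this as an illustration rather than the exact code from Main.cpp.
```c++
// Illustrative capture/processing loop: read a frame, run the pipeline, write the result.
std::shared_ptr<cv::Mat> frame = reader->ReadFrame();
while (frame != nullptr && !frame->empty())
{
    // ... pre-process, run inference and post-process *frame here ...
    sink->WriteFrame(frame);   // CvVideoFileWriter or CvWindowOutput
    frame = reader->ReadFrame();
}
sink->Close();
```
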
See the `GetFrameSourceAndSink` function in [Main.cpp](./src/Main.cpp) for more details.

##### Preparing Labels and Model Specific Functions
In order to interpret the results of running inference on the loaded network, it is necessary to load the labels
associated with the model. In the provided example code, the `AssignColourToLabel` function creates a vector of
label-colour pairs that is ordered according to the object class index at the output node of the model. Each label is assigned
a randomly generated RGB colour. This ensures that each class has a unique colour, which proves helpful when plotting
the bounding boxes of the various detected objects in a frame.

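A minimal sketch of that idea is shown below; `classNames` is assumed to already hold the class names read from the label file, and the helper name is hypothetical (the real `AssignColourToLabel` may differ in detail).
```c++
#include <opencv2/core.hpp>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Pair each class name with a random colour so every class is drawn consistently.
std::vector<std::pair<std::string, cv::Scalar>> AssignRandomColours(const std::vector<std::string>& classNames)
{
    std::vector<std::pair<std::string, cv::Scalar>> labels;
    labels.reserve(classNames.size());
    for (const std::string& name : classNames)
    {
        labels.emplace_back(name, cv::Scalar(std::rand() % 256, std::rand() % 256, std::rand() % 256));
    }
    return labels;
}
```
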
Depending on the model being used, the `CreatePipeline` function returns a specific implementation of the object detection
pipeline.

### Creating a Network

All operations with Arm NN and networks are encapsulated in the [`ArmnnNetworkExecutor`](./include/ArmnnNetworkExecutor.hpp)
class.

##### Creating Parser and Importing Graph
The first step with the Arm NN SDK is to import a graph from file by using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically
focus on `.tflite, .pb, .onnx` models.

Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with the
`CreateNetworkFromBinaryFile()` method. The parser handles the creation of the underlying Arm NN graph.

The current example accepts tflite format model files, so we use `ITfLiteParser`:
```c++
#include "armnnTfLiteParser/ITfLiteParser.hpp"

armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(modelPath.c_str());
```

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the
appropriate device context. We do this by creating a runtime context with default options with `IRuntime()`.

For example:
```c++
#include "armnn/ArmNN.hpp"

auto runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
```

We can optimize the imported graph by specifying a list of backends in order of preference and implement
backend-specific optimizations. The backends are identified by a string unique to the backend,
for example `CpuAcc, GpuAcc, CpuRef`.

For example:
```c++
std::vector<armnn::BackendId> backends{"CpuAcc", "GpuAcc", "CpuRef"};
```

Internally and transparently, Arm NN splits the graph into subgraphs based on the backends, calls an optimize-subgraphs
function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with
its optimized version.

Using the `Optimize()` function we optimize the graph for inference and load the optimized network onto the compute
device with `LoadNetwork()`. This function creates the backend-specific workloads
for the layers and a backend-specific workload factory which is called to create the workloads.

For example:
```c++
armnn::IOptimizedNetworkPtr optNet = Optimize(*network,
                                              backends,
                                              runtime->GetDeviceSpec(),
                                              armnn::OptimizerOptions());
armnn::NetworkId networkId = 0;
std::string errorMessage;
runtime->LoadNetwork(networkId, std::move(optNet), errorMessage);
std::cerr << errorMessage << std::endl;
```

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames`
we extract all the input names and, with `GetNetworkInputBindingInfo`, bind the input points of the graph.
For example:
```c++
std::vector<std::string> inputNames = parser->GetSubgraphInputTensorNames(0);
auto inputBindingInfo = parser->GetNetworkInputBindingInfo(0, inputNames[0]);
```
The input binding information contains all the essential information about the input. It is a tuple consisting of
integer identifiers for bindable layers (inputs, outputs) and the tensor info (data type, quantization information,
number of dimensions, total number of elements).

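For instance, the tensor info half of that tuple can be inspected directly (a small sketch using the `inputBindingInfo` from above):
```c++
// inputBindingInfo is a pair of { layer binding id, armnn::TensorInfo }.
const armnn::TensorInfo& inputTensorInfo = inputBindingInfo.second;
unsigned int totalElements = inputTensorInfo.GetNumElements();   // total number of elements
unsigned int numDimensions = inputTensorInfo.GetNumDimensions(); // number of dimensions
armnn::DataType dataType   = inputTensorInfo.GetDataType();      // data type, e.g. an 8-bit quantized type
```
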
Similarly, we can get the output binding information for an output layer by using the parser to retrieve output
tensor names and calling `GetNetworkOutputBindingInfo()`.

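For example, mirroring the input case (using the parser's `GetSubgraphOutputTensorNames`):
```c++
std::vector<std::string> outputNames = parser->GetSubgraphOutputTensorNames(0);
auto outputBindingInfo = parser->GetNetworkOutputBindingInfo(0, outputNames[0]);
```
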
### Object detection pipeline

The generic object detection pipeline has 3 steps: data pre-processing, inference, and decoding of the inference results
in the post-processing step.

See [`ObjDetectionPipeline`](./include/NetworkPipeline.hpp) and the implementations [`MobileNetSSDv1`](./include/NetworkPipeline.hpp)
and [`YoloV3Tiny`](./include/NetworkPipeline.hpp) for more details.

#### Pre-processing the Captured Frame
Each frame captured from the source is read as a `cv::Mat` in BGR format, but the channels are swapped to RGB in the frame reader
code.

```c++
cv::Mat processed;
...
objectDetectionPipeline->PreProcessing(frame, processed);
```

The pre-processing step consists of resizing the frame to the required resolution, padding, and performing a data type conversion
to match the model input layer.
For example, the SSD MobileNet V1 model used in our example takes as input a tensor with shape `[1, 300, 300, 3]` and
data type `uint8`.

The pre-processing step returns a `cv::Mat` object containing the data ready for inference.

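In OpenCV terms, the core of such a step could look like the sketch below for SSD MobileNet V1. This is an illustration only; the actual pipeline also handles padding and works on the already RGB-converted frame.
```c++
#include <opencv2/imgproc.hpp>

// Resize to the model's input resolution and convert to the expected data type.
cv::Mat resized;
cv::resize(frame, resized, cv::Size(300, 300), 0, 0, cv::INTER_LINEAR); // 300x300 input resolution
resized.convertTo(processed, CV_8U);                                    // SSD MobileNet V1 expects uint8 data
```
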
#### Executing Inference
```c++
od::InferenceResults results;
...
objectDetectionPipeline->Inference(processed, results);
```
The inference step calls the `ArmnnNetworkExecutor::Run` method, which prepares the input tensors and executes inference.
A compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context.
For example:
```c++
//const void* inputData = ...;
//outputTensors were pre-allocated before

armnn::InputTensors inputTensors = {{ inputBindingInfo.first, armnn::ConstTensor(inputBindingInfo.second, inputData)}};
runtime->EnqueueWorkload(0, inputTensors, outputTensors);
```
We allocate memory for the output data once and map it to the output tensor objects. After successful inference, we read the data
from the pre-allocated output data buffer. See [`ArmnnNetworkExecutor::ArmnnNetworkExecutor`](./src/ArmnnNetworkExecutor.cpp)
and [`ArmnnNetworkExecutor::Run`](./src/ArmnnNetworkExecutor.cpp) for more details.

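That pre-allocation could look roughly like the following sketch, reusing the output binding information from earlier. The buffer's element type and count depend on the model's output tensor info, so take this as an illustration rather than the executor's exact code.
```c++
// One buffer per output tensor, sized from the binding's tensor info and wrapped in an armnn::Tensor.
std::vector<float> outputBuffer(outputBindingInfo.second.GetNumElements());
armnn::OutputTensors outputTensors = {{ outputBindingInfo.first,
                                        armnn::Tensor(outputBindingInfo.second, outputBuffer.data()) }};
```
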
#### Postprocessing

##### Decoding and Processing Inference Output
The output from inference must be decoded to obtain information about detected objects in the frame. In the examples
there are implementations for two networks but you may also implement your own network decoding solution here.

For SSD MobileNet V1 models, we decode the results to obtain the bounding box positions, classification index,
confidence and number of detections in the input frame.
See [`SSDResultDecoder`](./include/SSDResultDecoder.hpp) for more details.

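Conceptually, the standard TFLite SSD post-processing output consists of four tensors: box coordinates, class indices, scores and a detection count, so a decoder loops over them roughly as in the sketch below. The buffer names, the score threshold and the normalised `[y_min, x_min, y_max, x_max]` box layout are assumptions about the model output, not the decoder's exact code.
```c++
// boxes, classes, scores and numDetections point at the four SSD output buffers.
for (int i = 0; i < static_cast<int>(numDetections[0]); ++i)
{
    if (scores[i] < scoreThreshold)
    {
        continue; // drop weak detections
    }
    const float yMin = boxes[i * 4 + 0];
    const float xMin = boxes[i * 4 + 1];
    const float yMax = boxes[i * 4 + 2];
    const float xMax = boxes[i * 4 + 3];
    const int classIndex = static_cast<int>(classes[i]);
    // Coordinates are normalised to [0, 1]; scale by the frame width and height before drawing.
}
```
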
For YOLO V3 Tiny models, we decode the output and perform non-maximum suppression to filter out any weak detections
below a confidence threshold and any redundant bounding boxes above an intersection-over-union (IoU) threshold.
See [`YoloResultDecoder`](./include/YoloResultDecoder.hpp) for more details.

It is encouraged to experiment with the threshold values for confidence and intersection-over-union (IoU)
to achieve the best visual results.

The detection results are always returned as a vector of [`DetectedObject`](./include/DetectedObject.hpp),
with the box positions list containing bounding box coordinates in the form `[x_min, y_min, x_max, y_max]`.

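Given two boxes in that `[x_min, y_min, x_max, y_max]` form, the IoU used during non-maximum suppression can be computed as in this generic sketch (not the decoder's exact implementation):
```c++
#include <algorithm>
#include <array>

// Intersection-over-union of two boxes given as [x_min, y_min, x_max, y_max].
float IntersectionOverUnion(const std::array<float, 4>& a, const std::array<float, 4>& b)
{
    const float interW = std::max(0.0f, std::min(a[2], b[2]) - std::max(a[0], b[0]));
    const float interH = std::max(0.0f, std::min(a[3], b[3]) - std::max(a[1], b[1]));
    const float interArea = interW * interH;
    const float areaA = (a[2] - a[0]) * (a[3] - a[1]);
    const float areaB = (b[2] - b[0]) * (b[3] - b[1]);
    return interArea / (areaA + areaB - interArea);
}
```
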
#### Drawing Bounding Boxes
The post-processing step accepts a callback function that is invoked when decoding is finished. We use it
to draw detections on the initial frame.
Using the obtained detections and the [`AddInferenceOutputToFrame`](./src/ImageUtils.cpp) function, we draw bounding boxes around the
detected objects and add the associated label and confidence score.
```c++
//results - inference output
objectDetectionPipeline->PostProcessing(results, [&frame, &labels](od::DetectedObjects detects) -> void {
    AddInferenceOutputToFrame(detects, *frame, labels);
});
```
The processed frames are written to a file or displayed in a separate window.