# Object detection

Given an image or a video stream, an object detection model can identify which
of a known set of objects might be present and provide information about their
positions within the image.

For example, this screenshot of the <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:

<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">

Note: (1) To integrate an existing model, try
[TensorFlow Lite Task Library](https://www.tensorflow.org/lite/inference_with_metadata/task_library/object_detector).
(2) To customize a model, try
[TensorFlow Lite Model Maker](https://www.tensorflow.org/lite/guide/model_maker).

## Get started

To learn how to use object detection in a mobile app, explore the
<a href="#example_applications_and_guides">Example applications and guides</a>.

If you are using a platform other than Android or iOS, or if you are already
familiar with the
<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
APIs</a>, you can download our starter object detection model and the
accompanying labels.

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model with Metadata</a>
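
If you are working in Python, the following is a minimal sketch of loading the
starter model with `tf.lite.Interpreter` and inspecting its tensors. The local
file name `ssd_mobilenet_v1.tflite` is an assumption; use whatever path you
saved the downloaded model to.

```python
# Minimal sketch: load the starter model with the TensorFlow Lite Python API
# and inspect its input/output tensors. The file name is an assumption.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# For the starter model, the input is a single uint8 tensor of shape
# [1, 300, 300, 3]; the outputs are the four arrays described below.
print(input_details[0]["shape"], input_details[0]["dtype"])
for detail in output_details:
    print(detail["name"], detail["shape"])
```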

For more information about Metadata and associated fields (e.g. `labels.txt`) see
<a href="../../models/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.
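
As an illustration, the metadata and packed label file can also be inspected in
Python with the `tflite-support` package; this is a sketch under the assumption
that the package is installed and the model file name matches your download.

```python
# Sketch: inspect the metadata packed into the starter model using the
# tflite-support package (pip install tflite-support). File name is an assumption.
from tflite_support import metadata

displayer = metadata.MetadataDisplayer.with_model_file("ssd_mobilenet_v1.tflite")
print(displayer.get_metadata_json())                 # model, input and output descriptions
print(displayer.get_packed_associated_file_list())   # lists the packed labels file
```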

If you want to train a custom detection model for your own task, see
<a href="#model-customization">Model customization</a>.

For the following use cases, you should use a different type of model:

<ul>
  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

### Example applications and guides

If you are new to TensorFlow Lite and are working with Android or iOS, we
recommend exploring the following example applications that can help you get
started.

#### Android

You can leverage the out-of-the-box API from
[TensorFlow Lite Task Library](../../inference_with_metadata/task_library/object_detector)
to integrate object detection models in just a few lines of code. You can also
build your own custom inference pipeline using the
[TensorFlow Lite Interpreter Java API](../../guide/inference#load_and_run_a_model_in_java).

The Android example below demonstrates the implementation for both methods as
[lib_task_api](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_task_api)
and
[lib_interpreter](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android/lib_interpreter),
respectively.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">View
Android example</a>

#### iOS

You can integrate the model using the
[TensorFlow Lite Interpreter Swift API](../../guide/inference#load_and_run_a_model_in_swift).
See the iOS example below.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">View
iOS example</a>

## Model description

This section describes the signature for
[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
TensorFlow Lite from the
[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When an image is subsequently provided to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that the detection was correct.

### Input Signature

The model takes an image as input.

Let's assume the expected image is 300x300 pixels, with three channels (red,
blue, and green) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.

You can take a look at our
[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
to understand how to do this pre-processing on Android.
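
For a platform-agnostic illustration, here is a minimal Python sketch of this
pre-processing, assuming the quantized starter model and an input image on disk
(both file names are assumptions):

```python
# Sketch: resize an image to the model's expected 300x300x3 uint8 input and
# feed it to the interpreter. File names are assumptions.
import numpy as np
from PIL import Image
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Resize/crop the raw image to the model's input size (300x300 here).
image = Image.open("fruit.jpg").convert("RGB").resize((300, 300))
input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)  # [1, 300, 300, 3]

interpreter.set_tensor(input_details["index"], input_data)
interpreter.invoke()
```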

### Output Signature

The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.

<table>
  <thead>
    <tr>
      <th>Index</th>
      <th>Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>Locations</td>
      <td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
    </tr>
    <tr>
      <td>1</td>
      <td>Classes</td>
      <td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Scores</td>
      <td>Array of N floating point values between 0 and 1 representing the probability that a class was detected</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Number of detections</td>
      <td>Integer value of N</td>
    </tr>
  </tbody>
</table>

NOTE: The number of results (5 in the example below) is a parameter set while
exporting the detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
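
Continuing the Python pre-processing sketch above, the four output arrays can be
read back from the interpreter as shown below. The index order matches the
documented signature for the starter model, but it is worth confirming against
`get_output_details()` for other models.

```python
# Sketch: read the four output arrays after interpreter.invoke() has run.
# Index order below follows the documented signature; confirm it for your model.
output_details = interpreter.get_output_details()

boxes   = interpreter.get_tensor(output_details[0]["index"])[0]  # [N, 4], [top, left, bottom, right] in 0..1
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # [N] class indices (as floats)
scores  = interpreter.get_tensor(output_details[2]["index"])[0]  # [N] confidence scores in 0..1
count   = int(interpreter.get_tensor(output_details[3]["index"])[0])  # N

# Keep only detections above a confidence cut-off (0.5 here; see below).
for i in range(count):
    if scores[i] >= 0.5:
        print(int(classes[i]), scores[i], boxes[i])
```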

For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a set number of detection
results - in this example, 5.

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.23</td>
      <td>[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td>Apple</td>
      <td>0.11</td>
      <td>[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

#### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates confidence
that the object was genuinely detected. The closer the number is to 1, the more
confident the model is.

Depending on your application, you can decide on a cut-off threshold below which
you will discard detection results. For the current example, a sensible cut-off
is a score of 0.5 (meaning a 50% probability that the detection is valid). In
that case, the last two objects in the array would be ignored because those
confidence scores are below 0.5:

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Banana</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.23</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Apple</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.11</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

The cut-off you use should be based on whether you are more comfortable with
false positives (objects that are wrongly identified, or areas of the image that
are erroneously identified as objects when they are not), or false negatives
(genuine objects that are missed because their confidence was low).

For example, in the following image, a pear (which is not an object that the
model was trained to detect) was misidentified as a "person". This is an example
of a false positive that could be ignored by selecting an appropriate cut-off.
In this case, a cut-off of 0.6 (or 60%) would comfortably exclude the false
positive.

<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">

#### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model provided, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;">
  <tbody>
    <tr style="border-top: none;">
      <td>[</td>
      <td>top,</td>
      <td>left,</td>
      <td>bottom,</td>
      <td>right</td>
      <td>]</td>
    </tr>
  </tbody>
</table>

The top value represents the distance of the rectangle’s top edge from the top
of the image, in pixels. The left value represents the left edge’s distance from
the left of the input image. The other values represent the bottom and right
edges in a similar manner.

Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly.
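
Because the raw model output is normalized to values between 0 and 1 (see the
output signature above), one common post-processing step is to convert a box
back into pixel coordinates of the original image. The sketch below assumes the
raw image was simply resized to the model input rather than cropped; the sample
box and image size are illustrative values.

```python
# Sketch: map a normalized [top, left, bottom, right] box (values in 0..1)
# back to pixel coordinates of the original image. Assumes the raw image was
# resized directly to the model input; adjust if you cropped first.
def box_to_pixels(box, image_width, image_height):
    top, left, bottom, right = box
    return (int(top * image_height), int(left * image_width),
            int(bottom * image_height), int(right * image_width))

# Example with an illustrative normalized box on a 640x480 raw image.
print(box_to_pixels([0.15, 0.10, 0.60, 0.55], image_width=640, image_height=480))
```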

## Performance benchmarks

Performance benchmark numbers for our
<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">starter
model</a> are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>
  <thead>
    <tr>
      <th>Model Name</th>
      <th>Model size</th>
      <th>Device</th>
      <th>GPU</th>
      <th>CPU</th>
    </tr>
  </thead>
  <tr>
    <td rowspan="3">
      <a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">COCO SSD MobileNet v1</a>
    </td>
    <td rowspan="3">
      27 Mb
    </td>
    <td>Pixel 3 (Android 10)</td>
    <td>22ms</td>
    <td>46ms*</td>
  </tr>
  <tr>
    <td>Pixel 4 (Android 10)</td>
    <td>20ms</td>
    <td>29ms*</td>
  </tr>
  <tr>
    <td>iPhone XS (iOS 12.4.1)</td>
    <td>7.6ms</td>
    <td>11ms**</td>
  </tr>
</table>

\* 4 threads used.

\*\* 2 threads used on iPhone for the best performance result.

## Model customization

### Pre-trained models

Mobile-optimized detection models with a variety of latency and precision
characteristics can be found in the
[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
preceding sections.

Most of the download zips contain a `model.tflite` file. If there isn't one, a
TensorFlow Lite flatbuffer can be generated using
[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
SSD models from the
[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
can also be converted to TensorFlow Lite using the instructions
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
It is important to note that detection models cannot be converted directly using
the [TensorFlow Lite Converter](../../models/convert), since
they require an intermediate step of generating a mobile-friendly source model.
The scripts linked above perform this step.
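
As an illustrative sketch of the final step, once the exporting script has
produced a mobile-friendly SavedModel (the directory name below is an
assumption), the TensorFlow Lite flatbuffer can be generated with the standard
converter:

```python
# Sketch: convert the mobile-friendly SavedModel produced by the export script
# into a TensorFlow Lite flatbuffer. Directory and file names are assumptions.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```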

Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
&
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more accurate post-processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.

> Currently, on-device inference is only optimized with SSD models. Better
> support for other architectures like CenterNet and EfficientDet is being
> investigated.

### How to choose a model to customize?

Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose the model that works best for your use case
and intended hardware. For example, the
[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
models are ideal for inference on Google's Edge TPU on Pixel 4.

You can use our
[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
evaluate models and choose the most efficient option available.

### Fine-tuning models on custom data

The pre-trained models we provide are trained to detect 90 classes of objects.
For a full list of classes, see the labels file in the
<a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">model
metadata</a>.

You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train. The recommended
way is to use the
[TensorFlow Lite Model Maker](https://www.tensorflow.org/lite/guide/model_maker)
library, which simplifies the process of training a TensorFlow Lite model on a
custom dataset with a few lines of code. It uses transfer learning to reduce
the amount of required training data and time. You can also refer to the
[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
as an example of fine-tuning a pre-trained model with only a few examples.
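
The following is a sketch of the Model Maker workflow for object detection. The
dataset paths, label names, chosen model spec, and training settings are
assumptions for illustration; see the Model Maker guide linked above for the
full details.

```python
# Sketch of the Model Maker object detection workflow (pip install tflite-model-maker).
# Dataset paths, label names, and training settings below are illustrative assumptions.
from tflite_model_maker import model_spec, object_detector

# Load a custom dataset annotated in PASCAL VOC format.
train_data = object_detector.DataLoader.from_pascal_voc(
    images_dir="train/images",
    annotations_dir="train/annotations",
    label_map={1: "carrot", 2: "cucumber"})

# Transfer-learn from a mobile-friendly detection architecture.
spec = model_spec.get("efficientdet_lite0")
model = object_detector.create(train_data, model_spec=spec, epochs=20,
                               batch_size=8, train_whole_model=True)

# Export a TensorFlow Lite model (with metadata) for on-device inference.
model.export(export_dir=".")
```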

For fine-tuning with larger datasets, take a look at these guides for
training your own models with the TensorFlow Object Detection API:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).
Once trained, they can be converted to a TFLite-friendly format with the
instructions here:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).