# Object detection

<img src="../images/detection.png" class="attempt-right">

Detect multiple objects within an image, with bounding boxes. Recognize 80
different classes of objects.

## Get started

If you are new to TensorFlow Lite and are working with Android or iOS, we
recommend exploring the following example applications that can help you get
started.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">Android
example</a>
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">iOS
example</a>

If you are using a platform other than Android or iOS, or you are already
familiar with the <a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite APIs</a>, you can
download our starter object detection model and the accompanying labels.

<a class="button button-primary" href="http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">Download
starter model and labels</a>

For more information about the starter model, see
<a href="#starter_model">Starter model</a>.

## What is object detection?

Given an image or a video stream, an object detection model can identify which
of a known set of objects might be present and provide information about their
positions within the image.

For example, this screenshot of our <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:

<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When we subsequently provide an image to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that the detection was
correct.

### Model output

Imagine a model has been trained to detect apples, bananas, and strawberries.
When we pass it an image, it will output a fixed number of detection results;
in this example, five.

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.23</td>
      <td>[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td>Apple</td>
      <td>0.11</td>
      <td>[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates
confidence that the object was genuinely detected. The closer the number is to
1, the more confident the model is.
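For example, low-scoring results can be discarded with a few lines of code.
Here is a minimal sketch in Python, using plain lists that hold the
hypothetical values from the table above; the 0.5 cut-off is an arbitrary
choice, and choosing it is discussed next.

```python
# Hypothetical detection results, copied from the table above.
classes = ["Apple", "Banana", "Strawberry", "Banana", "Apple"]
scores = [0.92, 0.88, 0.87, 0.23, 0.11]
boxes = [
    [18, 21, 57, 63],
    [100, 30, 180, 150],
    [7, 82, 89, 163],
    [42, 66, 57, 83],
    [6, 42, 31, 58],
]

SCORE_THRESHOLD = 0.5  # arbitrary cut-off; see the discussion below

# Keep only the detections whose confidence meets the threshold.
for cls, score, box in zip(classes, scores, boxes):
    if score >= SCORE_THRESHOLD:
        print(f"{cls}: {score:.2f} at {box}")
```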
Depending on your application, you can decide on a cut-off threshold below
which you will discard detection results. For our example, we might decide a
sensible cut-off is a score of 0.5 (meaning a 50% probability that the
detection is valid). In that case, we would ignore the last two objects in the
array, because those confidence scores are below 0.5:

<table style="width: 60%;">
  <thead>
    <tr>
      <th>Class</th>
      <th>Score</th>
      <th>Location</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Apple</td>
      <td>0.92</td>
      <td>[18, 21, 57, 63]</td>
    </tr>
    <tr>
      <td>Banana</td>
      <td>0.88</td>
      <td>[100, 30, 180, 150]</td>
    </tr>
    <tr>
      <td>Strawberry</td>
      <td>0.87</td>
      <td>[7, 82, 89, 163]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Banana</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.23</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[42, 66, 57, 83]</td>
    </tr>
    <tr>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">Apple</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">0.11</td>
      <td style="background-color: #e9cecc; text-decoration-line: line-through;">[6, 42, 31, 58]</td>
    </tr>
  </tbody>
</table>

The cut-off you use should be based on whether you are more comfortable with
false positives (objects that are wrongly identified, or areas of the image
that are erroneously identified as objects when they are not) or false
negatives (genuine objects that are missed because their confidence was low).

For example, in the following image, a pear (which is not an object that the
model was trained to detect) was misidentified as a "person". This is an
example of a false positive that could be ignored by selecting an appropriate
cut-off. In this case, a cut-off of 0.6 (or 60%) would comfortably exclude the
false positive.

<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">

### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model we provide, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;">
  <tbody>
    <tr style="border-top: none;">
      <td>[</td>
      <td>top,</td>
      <td>left,</td>
      <td>bottom,</td>
      <td>right</td>
      <td>]</td>
    </tr>
  </tbody>
</table>

The top value represents the distance of the rectangle’s top edge from the top
of the image, in pixels. The left value represents the left edge’s distance
from the left of the input image. The other values represent the bottom and
right edges in a similar manner.

Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly.
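To make the note concrete, here is a minimal sketch of mapping a bounding box
from the model’s input coordinates back to the raw camera frame. It assumes
the raw frame was simply resized (not cropped) to a 300x300 model input, and
that the box is expressed in pixels of that input; the sizes and names here
are hypothetical.

```python
MODEL_INPUT_WIDTH = 300   # input size of the starter model
MODEL_INPUT_HEIGHT = 300
RAW_WIDTH, RAW_HEIGHT = 1280, 720  # hypothetical camera resolution

def scale_box_to_raw_image(box):
    """Map [top, left, bottom, right] from model-input pixels to
    raw-image pixels, assuming a plain resize with no cropping."""
    top, left, bottom, right = box
    x_ratio = RAW_WIDTH / MODEL_INPUT_WIDTH
    y_ratio = RAW_HEIGHT / MODEL_INPUT_HEIGHT
    return [
        round(top * y_ratio),
        round(left * x_ratio),
        round(bottom * y_ratio),
        round(right * x_ratio),
    ]

print(scale_box_to_raw_image([18, 21, 57, 63]))  # -> [43, 90, 137, 269]
```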
## Starter model

We recommend starting with this pre-trained quantized COCO SSD MobileNet v1
model.

<a class="button button-primary" href="http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">Download
starter model and labels</a>

### Uses and limitations

The object detection model we provide can identify and locate up to 10 objects
in an image. It is trained to recognize 80 classes of object. For a full list
of classes, see the labels file in the
<a href="http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">model
zip</a>.

If you want to train a model to recognize new classes, see
<a href="#customize_model">Customize model</a>.

For the following use cases, you should use a different type of model:

<ul>
  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

### Input

The model takes an image as input. The expected image is 300x300 pixels, with
three channels (red, green, and blue) per pixel. This should be fed to the
model as a flattened buffer of 270,000 byte values (300x300x3). Since the
model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.

### Output

The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe 10 detected objects, with one element in each array corresponding to
each object. There will always be 10 objects detected.

<table>
  <thead>
    <tr>
      <th>Index</th>
      <th>Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>Locations</td>
      <td>Multidimensional array of [10][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
    </tr>
    <tr>
      <td>1</td>
      <td>Classes</td>
      <td>Array of 10 integers (output as floating point values), each indicating the index of a class label from the labels file</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Scores</td>
      <td>Array of 10 floating point values between 0 and 1 representing the probability that a class was detected</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Number of detections</td>
      <td>Array of length 1 containing a floating point value expressing the total number of detection results</td>
    </tr>
  </tbody>
</table>

A minimal sketch of reading these four arrays with the Python TensorFlow Lite
API appears at the end of this document.

## Customize model

The pre-trained models we provide are trained to detect 80 classes of object.
For a full list of classes, see the labels file in the
<a href="http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">model
zip</a>.

You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train.
Learn how to perform transfer learning in
<a href="https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193">Training
and serving a real-time mobile object detector in 30 minutes</a>.
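For reference, the sketch below shows one way to feed an image to the starter
model with the Python `tf.lite.Interpreter` and read the four output arrays
described in the Output section above. The file names and the 0.5 cut-off are
assumptions; adjust them to the files in the downloaded zip and to your
application.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# File names are assumptions; use the files from the downloaded model zip.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The quantized starter model expects a 300x300 RGB image of uint8 values.
image = Image.open("fruit.jpg").convert("RGB").resize((300, 300))
input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# The four output arrays, as described in the Output section above.
boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # [10][4], 0-1
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # 10 label indices
scores = interpreter.get_tensor(output_details[2]["index"])[0]   # 10 scores, 0-1
count = int(interpreter.get_tensor(output_details[3]["index"])[0])

for i in range(count):
    if scores[i] >= 0.5:  # arbitrary cut-off, as discussed earlier
        # int(classes[i]) indexes into the labels file from the zip.
        print(f"class {int(classes[i])}: {scores[i]:.2f} at {boxes[i]}")
```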