# TensorFlow Lite inference

The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. To perform an
inference with a TensorFlow Lite model, you must run it through an
*interpreter*. The TensorFlow Lite interpreter is designed to be lean and fast.
The interpreter uses a static graph ordering and a custom (less-dynamic) memory
allocator to ensure minimal load, initialization, and execution latency.

This page describes how to access the TensorFlow Lite interpreter and perform an
inference using C++, Java, and Python, plus links to other resources for each
[supported platform](#supported-platforms).

[TOC]

## Important concepts

TensorFlow Lite inference typically follows these steps:

1.  **Loading a model**

    You must load the `.tflite` model into memory, which contains the model's
    execution graph.

1.  **Transforming data**

    Raw input data for the model generally does not match the input data format
    expected by the model. For example, you might need to resize an image or
    change the image format to be compatible with the model.

1.  **Running inference**

    This step involves using the TensorFlow Lite API to execute the model. It
    involves a few steps such as building the interpreter and allocating
    tensors, as described in the following sections.

1.  **Interpreting output**

    When you receive results from the model inference, you must interpret the
    tensors in a meaningful way that's useful in your application.

    For example, a model might return only a list of probabilities. It's up to
    you to map the probabilities to relevant categories and present them to your
    end user.

## Supported platforms

TensorFlow Lite inference APIs are provided for most common mobile/embedded
platforms such as [Android](#android-platform), [iOS](#ios-platform) and
[Linux](#linux-platform), in multiple programming languages.

In most cases, the API design reflects a preference for performance over ease of
use. TensorFlow Lite is designed for fast inference on small devices, so it
should be no surprise that the APIs try to avoid unnecessary copies at the
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
explicit goal and some variance between languages is to be expected.

Across all libraries, the TensorFlow Lite API enables you to load models, feed
inputs, and retrieve inference outputs.

### Android Platform

On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.

See below for details about using [C++](#load-and-run-a-model-in-c) and
[Java](#load-and-run-a-model-in-java), or follow the
[Android quickstart](android.md) for a tutorial and example code.

#### TensorFlow Lite Android wrapper code generator

Note: The TensorFlow Lite wrapper code generator is in an experimental (beta)
phase and currently supports only Android.

For TensorFlow Lite models enhanced with [metadata](../convert/metadata.md),
developers can use the TensorFlow Lite Android wrapper code generator to create
platform-specific wrapper code. The wrapper code removes the need to interact
directly with `ByteBuffer` on Android.
Instead, developers can interact with the
TensorFlow Lite model with typed objects such as `Bitmap` and `Rect`. For more
information, please refer to the
[TensorFlow Lite Android wrapper code generator](../inference_with_metadata/codegen.md).

### iOS Platform

On iOS, TensorFlow Lite is available with native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/objc).
You can also use the
[C API](https://www.tensorflow.org/code/tensorflow/lite/c/c_api.h)
directly in Objective-C code.

See below for details about using [Swift](#load-and-run-a-model-in-swift),
[Objective-C](#load-and-run-a-model-in-objective-c) and the
[C API](#using-c-api-in-objective-c-code), or follow the
[iOS quickstart](ios.md) for a tutorial and example code.

### Linux Platform

On Linux platforms (including [Raspberry Pi](build_rpi.md)), you can run
inferences using TensorFlow Lite APIs available in
[C++](#load-and-run-a-model-in-c) and [Python](#load-and-run-a-model-in-python),
as shown in the following sections.

## Running a model

Running a TensorFlow Lite model involves a few simple steps:

1.  Load the model into memory.
2.  Build an `Interpreter` based on an existing model.
3.  Set input tensor values. (Optionally resize input tensors if the predefined
    sizes are not desired.)
4.  Invoke inference.
5.  Read output tensor values.

The following sections describe how these steps can be done in each language.

## Load and run a model in Java

*Platform: Android*

The Java API for running an inference with TensorFlow Lite is primarily designed
for use with Android, so it's available as an Android library dependency:
`org.tensorflow:tensorflow-lite`.

In Java, you'll use the `Interpreter` class to load a model and drive model
inference. In many cases, this may be the only API you need.

You can initialize an `Interpreter` using a `.tflite` file:

```java
public Interpreter(@NotNull File modelFile);
```

Or with a `MappedByteBuffer`:

```java
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
```

In both cases, you must provide a valid TensorFlow Lite model or the API throws
an `IllegalArgumentException`. If you use a `MappedByteBuffer` to initialize an
`Interpreter`, it must remain unchanged for the whole lifetime of the
`Interpreter`.

To then run an inference with the model, simply call `Interpreter.run()`. For
example:

```java
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}
```

The `run()` method takes only one input and returns only one output. So if your
model has multiple inputs or multiple outputs, instead use:

```java
interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
```

In this case, each entry in `inputs` corresponds to an input tensor and
`map_of_indices_to_outputs` maps indices of output tensors to the corresponding
output data.

In both cases, the tensor indices should correspond to the values you gave to
the [TensorFlow Lite Converter](../convert/) when you created the model. Be
aware that the order of tensors in `inputs` must match the order given to the
TensorFlow Lite Converter.
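For example, the following is a minimal sketch for a hypothetical model with two
float inputs and two float outputs; the shapes and tensor indices below are
invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import org.tensorflow.lite.Interpreter;

// Hypothetical input shapes: [1, 4] and [1, 8].
float[][] input0 = new float[1][4];
float[][] input1 = new float[1][8];
Object[] inputs = {input0, input1};

// Hypothetical output shapes: [1, 2] and [1, 3]. The map keys are output
// tensor indices.
float[][] output0 = new float[1][2];
float[][] output1 = new float[1][3];
Map<Integer, Object> outputs = new HashMap<>();
outputs.put(0, output0);
outputs.put(1, output1);

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.runForMultipleInputsOutputs(inputs, outputs);
}
// `output0` and `output1` are now populated with the inference results.
```

After `runForMultipleInputsOutputs()` returns, the output arrays supplied in the
map have been filled in place.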
The `Interpreter` class also provides convenient functions for you to get the
index of any model input or output using an operation name:

```java
public int getInputIndex(String opName);
public int getOutputIndex(String opName);
```

If `opName` is not a valid operation in the model, it throws an
`IllegalArgumentException`.

Also beware that `Interpreter` owns resources. To avoid a memory leak, the
resources must be released after use by:

```java
interpreter.close();
```

For an example project with Java, see the
[Android image classification sample](https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/android).

### Supported data types (in Java)

To use TensorFlow Lite, the data types of the input and output tensors must be
one of the following primitive types:

*   `float`
*   `int`
*   `long`
*   `byte`

`String` types are also supported, but they are encoded differently than the
primitive types. In particular, the shape of a string Tensor dictates the number
and arrangement of strings in the Tensor, with each element itself being a
variable-length string. In this sense, the (byte) size of the Tensor cannot be
computed from the shape and type alone, and consequently strings cannot be
provided as a single, flat `ByteBuffer` argument.

If other data types, including boxed types like `Integer` and `Float`, are used,
an `IllegalArgumentException` will be thrown.

#### Inputs

Each input should be an array or multi-dimensional array of the supported
primitive types, or a raw `ByteBuffer` of the appropriate size. If the input is
an array or multi-dimensional array, the associated input tensor will be
implicitly resized to the array's dimensions at inference time. If the input is
a `ByteBuffer`, the caller should first manually resize the associated input
tensor (via `Interpreter.resizeInput()`) before running inference.

When using `ByteBuffer`, prefer using direct byte buffers, as this allows the
`Interpreter` to avoid unnecessary copies. If the `ByteBuffer` is a direct byte
buffer, its order must be `ByteOrder.nativeOrder()`. After it is used for a
model inference, it must remain unchanged until the model inference is finished.

#### Outputs

Each output should be an array or multi-dimensional array of the supported
primitive types, or a `ByteBuffer` of the appropriate size. Note that some
models have dynamic outputs, where the shape of output tensors can vary
depending on the input. There's no straightforward way of handling this with the
existing Java inference API, but planned extensions will make this possible.

## Load and run a model in Swift

*Platform: iOS*

The
[Swift API](https://www.tensorflow.org/code/tensorflow/lite/swift)
is available via the `TensorFlowLiteSwift` pod from CocoaPods.

First, import the `TensorFlowLite` module:

```swift
import TensorFlowLite
```

```swift
// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling...
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData: Data  // Should be initialized

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to `Data` to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
      UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling...
}
```

## Load and run a model in Objective-C

*Platform: iOS*

The
[Objective-C API](https://www.tensorflow.org/code/tensorflow/lite/objc)
is available via the `TensorFlowLiteObjC` pod from CocoaPods.

First, import the `TensorFlowLite` module:

```objc
@import TensorFlowLite;
```

```objc
NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Copy the input data to the input `TFLTensor`.
[interpreter copyData:inputData toInputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }
```

### Using C API in Objective-C code

Currently, the Objective-C API does not support delegates. In order to use
delegates with Objective-C code, you need to directly call the underlying
[C API](https://www.tensorflow.org/code/tensorflow/lite/c/c_api.h).

```c
#include "tensorflow/lite/c/c_api.h"
```

```c
TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);
```

## Load and run a model in C++

*Platforms: Android, iOS, and Linux*

Note: The C++ API on iOS is only available when using Bazel.

In C++, the model is stored in the
[`FlatBufferModel`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/flat-buffer-model.html)
class. It encapsulates a TensorFlow Lite model and you can build it in a couple
of different ways, depending on where the model is stored:

```c++
class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};
```

Note: If TensorFlow Lite detects the presence of the
[Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks), it
will automatically try to use shared memory to store the `FlatBufferModel`.

Now that you have the model as a `FlatBufferModel` object, you can execute it
with an
[`Interpreter`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter.html).
A single `FlatBufferModel` can be used simultaneously by more than one
`Interpreter`.

Caution: The `FlatBufferModel` object must remain valid until all instances of
`Interpreter` using it have been destroyed.

The important parts of the `Interpreter` API are shown in the code snippet
below. It should be noted that:

*   Tensors are represented by integers, in order to avoid string comparisons
    (and any fixed dependency on string libraries).
*   An interpreter must not be accessed from concurrent threads.
*   Memory allocation for input and output tensors must be triggered by calling
    `AllocateTensors()` right after resizing tensors.

The simplest usage of TensorFlow Lite with C++ looks like this:

```c++
// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);
```

For more example code, see
[`minimal.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/minimal/minimal.cc)
and
[`label_image.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc).
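If the model's predefined input shapes do not match your data, you can resize
the input tensors before allocating memory, as noted in the list above. The
following is a minimal sketch that continues from the `interpreter` built in the
previous example; the `[1, 224, 224, 3]` shape is hypothetical:

```c++
// Resize the first input to a hypothetical shape of one 224x224 RGB image.
int input_index = interpreter->inputs()[0];
interpreter->ResizeInputTensor(input_index, {1, 224, 224, 3});

// Memory must be (re)allocated after any resize and before filling inputs.
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Error handling...
}

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input` and call `interpreter->Invoke()` as shown above.
```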
## Load and run a model in Python

*Platform: Linux*

The Python API for running an inference is provided in the `tf.lite` module.
From that module, you mostly need only
[`tf.lite.Interpreter`](https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter)
to load a model and run an inference.

The following example shows how to use the Python interpreter to load a
`.tflite` file and run inference with random input data:

```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```

As an alternative to loading the model as a pre-converted `.tflite` file, you
can combine your code with the
[TensorFlow Lite Converter Python API](https://www.tensorflow.org/lite/convert/python_api)
(`tf.lite.TFLiteConverter`), allowing you to convert your TensorFlow model into
the TensorFlow Lite format and then run inference:

```python
import numpy as np
import tensorflow as tf

img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to TF Lite format
with tf.Session() as sess:
  converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...
```

For more Python sample code, see
[`label_image.py`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py).

Tip: Run `help(tf.lite.Interpreter)` in the Python terminal to get detailed
documentation about the interpreter.

## Supported operations

TensorFlow Lite supports a subset of TensorFlow operations, with some
limitations. For a full list of operations and limitations, see the
[TF Lite Ops page](https://www.tensorflow.org/mlir/tfl_ops).
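Finally, returning to the Python API above: as with the C++ and Java APIs, the
Python interpreter can resize input tensors before allocation when your data
does not match the model's predefined input shape. The following is a minimal
sketch, assuming a hypothetical `[1, 128, 128, 3]` float input:

```python
import numpy as np
import tensorflow as tf

# Load the model without allocating tensors yet.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")

# Resize the first input to a hypothetical shape, then allocate tensors.
input_index = interpreter.get_input_details()[0]['index']
interpreter.resize_tensor_input(input_index, [1, 128, 128, 3])
interpreter.allocate_tensors()

# Run inference with data matching the new shape.
input_data = np.zeros((1, 128, 128, 3), dtype=np.float32)
interpreter.set_tensor(input_index, input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
```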