# Using MindSpore Lite for Model Inference

## When to Use

MindSpore Lite is an AI engine that provides AI model inference for different hardware devices. It has been used in a wide range of fields, such as image classification, target recognition, facial recognition, and character recognition.

This document describes the general development process for MindSpore Lite model inference.

## Basic Concepts

Before getting started, you need to understand the following basic concepts:

**Tensor**: a special data structure that is similar to arrays and matrices. It is the basic data structure used in MindSpore Lite network operations.

**Float16 inference mode**: an inference mode that uses half-precision floating-point numbers. Float16 uses 16 bits to represent a number and is therefore also called half-precision.

## Available APIs

APIs involved in MindSpore Lite model inference are categorized into context APIs, model APIs, and tensor APIs.

### Context APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_ContextHandle OH_AI_ContextCreate()|Creates a context object.|
|void OH_AI_ContextSetThreadNum(OH_AI_ContextHandle context, int32_t thread_num)|Sets the number of runtime threads.|
|void OH_AI_ContextSetThreadAffinityMode(OH_AI_ContextHandle context, int mode)|Sets the affinity mode for binding runtime threads to CPU cores, which are classified into large, medium, and small cores based on the CPU frequency. You only need to bind the large or medium cores, not the small cores.|
|OH_AI_DeviceInfoHandle OH_AI_DeviceInfoCreate(OH_AI_DeviceType device_type)|Creates a runtime device information object.|
|void OH_AI_ContextDestroy(OH_AI_ContextHandle *context)|Destroys a context object.|
|void OH_AI_DeviceInfoSetEnableFP16(OH_AI_DeviceInfoHandle device_info, bool is_fp16)|Sets whether to enable Float16 inference. This function is available only for CPU and GPU devices.|
|void OH_AI_ContextAddDeviceInfo(OH_AI_ContextHandle context, OH_AI_DeviceInfoHandle device_info)|Adds a runtime device information object to a context.|

### Model APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_ModelHandle OH_AI_ModelCreate()|Creates a model object.|
|OH_AI_Status OH_AI_ModelBuildFromFile(OH_AI_ModelHandle model, const char *model_path, OH_AI_ModelType model_type, const OH_AI_ContextHandle model_context)|Loads and builds a MindSpore model from a model file.|
|void OH_AI_ModelDestroy(OH_AI_ModelHandle *model)|Destroys a model object.|

### Tensor APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_TensorHandleArray OH_AI_ModelGetInputs(const OH_AI_ModelHandle model)|Obtains the input tensor array structure of a model.|
|int64_t OH_AI_TensorGetElementNum(const OH_AI_TensorHandle tensor)|Obtains the number of tensor elements.|
|const char *OH_AI_TensorGetName(const OH_AI_TensorHandle tensor)|Obtains the name of a tensor.|
|OH_AI_DataType OH_AI_TensorGetDataType(const OH_AI_TensorHandle tensor)|Obtains the data type of a tensor.|
|void *OH_AI_TensorGetMutableData(const OH_AI_TensorHandle tensor)|Obtains the pointer to mutable tensor data.|
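These tensor APIs are typically used together to inspect a model's inputs before populating them with data. The following minimal sketch (not part of the step-by-step sample below) assumes a model handle named `model` that has already been created and built:

```c
// Sketch: inspect the input tensors of a model that has already been built.
// `model` is assumed to be a valid OH_AI_ModelHandle.
OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
for (size_t i = 0; i < inputs.handle_num; ++i) {
    OH_AI_TensorHandle tensor = inputs.handle_list[i];
    printf("input %zu: name=%s, data type=%d, elements=%lld\n",
           i, OH_AI_TensorGetName(tensor), (int)OH_AI_TensorGetDataType(tensor),
           (long long)OH_AI_TensorGetElementNum(tensor));
}
```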
## How to Develop

The following figure shows the development process for MindSpore Lite model inference.

**Figure 1** Development process for MindSpore Lite model inference

Before moving to the development process, you need to include the required header files and implement a function that generates random input data. The sample code is as follows:

```c
#include <stdlib.h>
#include <stdio.h>
#include "mindspore/model.h"

// Fill all input tensors with random values.
int GenerateInputDataWithRandom(OH_AI_TensorHandleArray inputs) {
  for (size_t i = 0; i < inputs.handle_num; ++i) {
    float *input_data = (float *)OH_AI_TensorGetMutableData(inputs.handle_list[i]);
    if (input_data == NULL) {
      printf("MSTensorGetMutableData failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    int64_t num = OH_AI_TensorGetElementNum(inputs.handle_list[i]);
    const int divisor = 10;
    for (size_t j = 0; j < num; j++) {
      input_data[j] = (float)(rand() % divisor) / divisor;  // values in the range [0, 0.9]
    }
  }
  return OH_AI_STATUS_SUCCESS;
}
```

The development process consists of the following main steps:

1. Prepare the required model.

    The required model can be downloaded directly or obtained using the model conversion tool.

    - If the downloaded model is already in the `.ms` format, you can use it directly for inference. The following uses the **mobilenetv2.ms** model as an example.
    - If the downloaded model uses a third-party framework, such as TensorFlow, TensorFlow Lite, Caffe, or ONNX, you can use the [model conversion tool](https://www.mindspore.cn/lite/docs/en/r1.5/use/downloads.html#id1) to convert it to the `.ms` format.

2. Create a context, and set parameters such as the number of runtime threads and the device type.

    The following describes two typical scenarios:

    Scenario 1: Only the CPU inference context is created.

    ```c
    // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
    OH_AI_ContextHandle context = OH_AI_ContextCreate();
    if (context == NULL) {
      printf("OH_AI_ContextCreate failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    const int thread_num = 2;
    OH_AI_ContextSetThreadNum(context, thread_num);
    OH_AI_ContextSetThreadAffinityMode(context, 1);
    // Set the device type to CPU, and disable Float16 inference.
    OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
    if (cpu_device_info == NULL) {
      printf("OH_AI_DeviceInfoCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
    OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
    ```
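    As a variant of scenario 1, Float16 (half-precision) inference can be enabled on the CPU device when a small precision loss is acceptable. The following is a minimal sketch (not part of the original sample) that reuses the `context` created above and only changes the device information setup; the handle name `cpu_fp16_device_info` is illustrative:

    ```c
    // Sketch: variant of scenario 1 that enables Float16 (half-precision) inference on the CPU device.
    OH_AI_DeviceInfoHandle cpu_fp16_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
    if (cpu_fp16_device_info == NULL) {
      printf("OH_AI_DeviceInfoCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_DeviceInfoSetEnableFP16(cpu_fp16_device_info, true);  // available only for CPU and GPU devices
    OH_AI_ContextAddDeviceInfo(context, cpu_fp16_device_info);
    ```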
    Scenario 2: The Neural Network Runtime (NNRt) and CPU heterogeneous inference contexts are created.

    NNRt is the runtime for cross-chip inference computing in the AI field. Generally, the acceleration hardware connected to NNRt, such as an NPU, has strong inference capabilities but supports only a limited number of operators, whereas the general-purpose CPU has weaker inference capabilities but supports a wide range of operators. MindSpore Lite supports NNRt/CPU heterogeneous inference: model operators are preferentially scheduled to NNRt for inference, and operators that NNRt does not support are scheduled to the CPU instead. The following is the sample code for configuring NNRt/CPU heterogeneous inference:

    > **NOTE**
    >
    > NNRt/CPU heterogeneous inference requires access to NNRt hardware. For details, see [OpenHarmony/ai_neural_network_runtime](https://gitee.com/openharmony/ai_neural_network_runtime).

    ```c
    // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
    OH_AI_ContextHandle context = OH_AI_ContextCreate();
    if (context == NULL) {
      printf("OH_AI_ContextCreate failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    // Prefer NNRt inference.
    // Use the first NNRt hardware of the ACCELERATOR type to create the NNRt device information, and set the high-performance inference mode.
    // You can also use OH_AI_GetAllNNRTDeviceDescs() to obtain the list of NNRt devices in the current environment, search for a specific device by name or type, and use it as the NNRt inference hardware.
    OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByType(OH_AI_NNRTDEVICE_ACCELERATOR);
    if (nnrt_device_info == NULL) {
      printf("OH_AI_CreateNNRTDeviceInfoByType failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_DeviceInfoSetPerformanceMode(nnrt_device_info, OH_AI_PERFORMANCE_HIGH);
    OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);

    // Configure CPU inference as the fallback.
    OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
    if (cpu_device_info == NULL) {
      printf("OH_AI_DeviceInfoCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
    ```

3. Create, load, and build the model.

    Call **OH_AI_ModelBuildFromFile** to load and build the model.

    In this example, **argv[1]**, the first command-line argument, is passed to **OH_AI_ModelBuildFromFile** as the model file path.

    ```c
    // Create a model.
    OH_AI_ModelHandle model = OH_AI_ModelCreate();
    if (model == NULL) {
      printf("OH_AI_ModelCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }

    // Load and build the inference model. The model type is OH_AI_MODELTYPE_MINDIR.
    int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("OH_AI_ModelBuildFromFile failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      return ret;
    }
    ```

4. Input data.

    Before executing model inference, you need to populate the input tensors with data. In this example, the input tensors are populated with random data.

    ```c
    // Obtain the input tensors.
    OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
    if (inputs.handle_list == NULL) {
      printf("OH_AI_ModelGetInputs failed.\n");
      OH_AI_ModelDestroy(&model);
      return OH_AI_STATUS_LITE_ERROR;
    }
    // Populate the input tensors with random data.
    ret = GenerateInputDataWithRandom(inputs);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("GenerateInputDataWithRandom failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      return ret;
    }
    ```

5. Execute model inference.

    Call **OH_AI_ModelPredict** to perform model inference.

    ```c
    // Execute model inference.
    OH_AI_TensorHandleArray outputs;
    ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("OH_AI_ModelPredict failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      return ret;
    }
    ```

6. Obtain the output.

    After model inference is complete, you can obtain the inference result through the output tensors.

    ```c
    // Obtain the output tensors and print their information.
    for (size_t i = 0; i < outputs.handle_num; ++i) {
      OH_AI_TensorHandle tensor = outputs.handle_list[i];
      int64_t element_num = OH_AI_TensorGetElementNum(tensor);
      printf("Tensor name: %s, tensor size is %zu, elements num: %lld.\n", OH_AI_TensorGetName(tensor),
             OH_AI_TensorGetDataSize(tensor), element_num);
      const float *data = (const float *)OH_AI_TensorGetData(tensor);
      printf("output data is:\n");
      const int max_print_num = 50;
      for (int j = 0; j < element_num && j <= max_print_num; ++j) {
        printf("%f ", data[j]);
      }
      printf("\n");
    }
    ```
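    For a classification model such as **mobilenetv2.ms**, the output tensor holds one score per class, so a common next step is to pick the class with the highest score. The following minimal sketch (not part of the original sample) assumes a single float output tensor obtained as above:

    ```c
    // Sketch: find the class index with the highest score in the first output tensor.
    OH_AI_TensorHandle out_tensor = outputs.handle_list[0];
    const float *scores = (const float *)OH_AI_TensorGetData(out_tensor);
    int64_t count = OH_AI_TensorGetElementNum(out_tensor);
    int64_t best_index = 0;
    for (int64_t j = 1; j < count; ++j) {
      if (scores[j] > scores[best_index]) {
        best_index = j;
      }
    }
    printf("top-1 class index: %lld, score: %f\n", (long long)best_index, scores[best_index]);
    ```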
7. Destroy the model.

    If the MindSpore Lite inference framework is no longer needed, destroy the created model.

    ```c
    // Destroy the model.
    OH_AI_ModelDestroy(&model);
    ```

## Verification

1. Write **CMakeLists.txt**.

    ```cmake
    cmake_minimum_required(VERSION 3.14)
    project(Demo)

    add_executable(demo main.c)

    target_link_libraries(
        demo
        mindspore-lite.huawei
        pthread
        dl
    )
    ```

    - To use ohos-sdk for cross compilation, you need to set the native toolchain path for the CMake tool as follows: `-DCMAKE_TOOLCHAIN_FILE="/xxx/native/build/cmake/ohos.toolchain.cmake"`.
    - The toolchain builds a 64-bit application by default. To build a 32-bit application, add the following configuration: `-DOHOS_ARCH="armeabi-v7a"`.

2. Build the demo and run it on the development board.

    - Use hdc_std to connect to the RK3568 development board and push **demo** and **mobilenetv2.ms** to the same directory on the board.
    - Run the hdc_std shell command to access the development board, go to the directory where **demo** is located, and run the following command:

    ```shell
    ./demo mobilenetv2.ms
    ```

    The inference is successful if the output is similar to the following:

    ```shell
    # ./QuickStart ./mobilenetv2.ms
    Tensor name: Softmax-65, tensor size is 4004, elements num: 1001.
    output data is:
    0.000018 0.000012 0.000026 0.000194 0.000156 0.001501 0.000240 0.000825 0.000016 0.000006 0.000007 0.000004 0.000004 0.000004 0.000015 0.000099 0.000011 0.000013 0.000005 0.000023 0.000004 0.000008 0.000003 0.000003 0.000008 0.000014 0.000012 0.000006 0.000019 0.000006 0.000018 0.000024 0.000010 0.000002 0.000028 0.000372 0.000010 0.000017 0.000008 0.000004 0.000007 0.000010 0.000007 0.000012 0.000005 0.000015 0.000007 0.000040 0.000004 0.000085 0.000023
    ```

## Samples

The following sample is provided to help you better understand how to use MindSpore Lite:

- [Simple MindSpore Lite Tutorial](https://gitee.com/openharmony/third_party_mindspore/tree/OpenHarmony-3.2-Release/mindspore/lite/examples/quick_start_c)
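To see how the snippets from the development steps fit together in the **main.c** file that the **CMakeLists.txt** above builds, the following is a condensed sketch (CPU-only context, shortened error handling). It is an illustration only, and the helper name `FillInputsWithRandom` is illustrative; refer to the linked sample for the complete version.

```c
#include <stdlib.h>
#include <stdio.h>
#include "mindspore/model.h"

// Fill every input tensor with random values, as in the helper function shown earlier.
static int FillInputsWithRandom(OH_AI_TensorHandleArray inputs) {
  for (size_t i = 0; i < inputs.handle_num; ++i) {
    float *data = (float *)OH_AI_TensorGetMutableData(inputs.handle_list[i]);
    if (data == NULL) {
      return OH_AI_STATUS_LITE_ERROR;
    }
    int64_t num = OH_AI_TensorGetElementNum(inputs.handle_list[i]);
    for (int64_t j = 0; j < num; ++j) {
      data[j] = (float)(rand() % 10) / 10;
    }
  }
  return OH_AI_STATUS_SUCCESS;
}

int main(int argc, const char **argv) {
  if (argc < 2) {
    printf("Usage: %s model.ms\n", argv[0]);
    return OH_AI_STATUS_LITE_ERROR;
  }

  // Step 2: create a CPU-only context (2 threads, big cores first, Float16 disabled).
  OH_AI_ContextHandle context = OH_AI_ContextCreate();
  if (context == NULL) {
    return OH_AI_STATUS_LITE_ERROR;
  }
  OH_AI_ContextSetThreadNum(context, 2);
  OH_AI_ContextSetThreadAffinityMode(context, 1);
  OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
  if (cpu_device_info == NULL) {
    OH_AI_ContextDestroy(&context);
    return OH_AI_STATUS_LITE_ERROR;
  }
  OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
  OH_AI_ContextAddDeviceInfo(context, cpu_device_info);

  // Step 3: create, load, and build the model from the path given on the command line.
  OH_AI_ModelHandle model = OH_AI_ModelCreate();
  if (model == NULL) {
    OH_AI_ContextDestroy(&context);
    return OH_AI_STATUS_LITE_ERROR;
  }
  int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);

  // Steps 4 and 5: populate the input tensors and execute inference.
  OH_AI_TensorHandleArray outputs;
  if (ret == OH_AI_STATUS_SUCCESS) {
    OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
    ret = FillInputsWithRandom(inputs);
    if (ret == OH_AI_STATUS_SUCCESS) {
      ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
    }
  }

  // Step 6: print the first few values of each output tensor.
  if (ret == OH_AI_STATUS_SUCCESS) {
    for (size_t i = 0; i < outputs.handle_num; ++i) {
      OH_AI_TensorHandle tensor = outputs.handle_list[i];
      const float *data = (const float *)OH_AI_TensorGetData(tensor);
      int64_t num = OH_AI_TensorGetElementNum(tensor);
      printf("Tensor name: %s, elements num: %lld.\n", OH_AI_TensorGetName(tensor), (long long)num);
      for (int64_t j = 0; j < num && j < 10; ++j) {
        printf("%f ", data[j]);
      }
      printf("\n");
    }
  }

  // Step 7: destroy the model.
  OH_AI_ModelDestroy(&model);
  return ret;
}
```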