# Using MindSpore Lite for Model Inference (C/C++)

## When to Use

MindSpore Lite is an AI engine that provides AI model inference for different hardware devices. It has been used in a wide range of fields, such as image classification, target recognition, facial recognition, and character recognition.

This document describes the general development process for MindSpore Lite model inference.

## Basic Concepts

Before getting started, you need to understand the following basic concepts:

**Tensor**: a special data structure that is similar to arrays and matrices. It is the basic data structure used in MindSpore Lite network operations.

**Float16 inference mode**: an inference mode in half-precision format, where each number is represented with 16 bits.

## Available APIs

APIs involved in MindSpore Lite model inference are categorized into context APIs, model APIs, and tensor APIs. For details about the APIs, see [MindSpore](../../reference/apis-mindspore-lite-kit/capi-mindspore.md).

### Context APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_ContextHandle OH_AI_ContextCreate()|Creates a context object. This API must be used together with **OH_AI_ContextDestroy**.|
|void OH_AI_ContextSetThreadNum(OH_AI_ContextHandle context, int32_t thread_num)|Sets the number of runtime threads.|
|void OH_AI_ContextSetThreadAffinityMode(OH_AI_ContextHandle context, int mode)|Sets the affinity mode for binding runtime threads to CPU cores, which are classified into large, medium, and small cores based on the CPU frequency. Only large or medium cores can be bound, not small cores.|
|OH_AI_DeviceInfoHandle OH_AI_DeviceInfoCreate(OH_AI_DeviceType device_type)|Creates a runtime device information object.|
|void OH_AI_ContextDestroy(OH_AI_ContextHandle *context)|Destroys a context object.|
|void OH_AI_DeviceInfoSetEnableFP16(OH_AI_DeviceInfoHandle device_info, bool is_fp16)|Sets whether to enable float16 inference. This function is available only for CPU and GPU devices.|
|void OH_AI_ContextAddDeviceInfo(OH_AI_ContextHandle context, OH_AI_DeviceInfoHandle device_info)|Adds a runtime device information object.|

### Model APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_ModelHandle OH_AI_ModelCreate()|Creates a model object.|
|OH_AI_Status OH_AI_ModelBuildFromFile(OH_AI_ModelHandle model, const char *model_path, OH_AI_ModelType model_type, const OH_AI_ContextHandle model_context)|Loads and builds a MindSpore model from a model file.|
|void OH_AI_ModelDestroy(OH_AI_ModelHandle *model)|Destroys a model object.|

### Tensor APIs

| API | Description |
| ------------------ | ----------------- |
|OH_AI_TensorHandleArray OH_AI_ModelGetInputs(const OH_AI_ModelHandle model)|Obtains the input tensor array structure of a model.|
|int64_t OH_AI_TensorGetElementNum(const OH_AI_TensorHandle tensor)|Obtains the number of tensor elements.|
|const char *OH_AI_TensorGetName(const OH_AI_TensorHandle tensor)|Obtains the name of a tensor.|
|OH_AI_DataType OH_AI_TensorGetDataType(const OH_AI_TensorHandle tensor)|Obtains the tensor data type.|
|void *OH_AI_TensorGetMutableData(const OH_AI_TensorHandle tensor)|Obtains the pointer to mutable tensor data.|
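
As a quick reference, the following is a minimal sketch of how the tensor APIs above are typically combined to inspect a model's inputs. It assumes **model** is a valid **OH_AI_ModelHandle** that has already been created and built, as described in the development steps below.

```c
// Inspect the input tensors of a model that has already been built (sketch; model is assumed valid).
OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
for (size_t i = 0; i < inputs.handle_num; ++i) {
    OH_AI_TensorHandle tensor = inputs.handle_list[i];
    printf("input %zu: name=%s, data type=%d, elements=%lld\n",
           i,
           OH_AI_TensorGetName(tensor),
           (int)OH_AI_TensorGetDataType(tensor),
           (long long)OH_AI_TensorGetElementNum(tensor));
}
```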

## How to Develop

The following figure shows the development process for MindSpore Lite model inference.

**Figure 1** Development process for MindSpore Lite model inference

Before moving to the development process, you need to include the required header files and implement a function that generates random input data. The sample code is as follows:

```c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mindspore/model.h"

// Fill all input tensors with random values.
int GenerateInputDataWithRandom(OH_AI_TensorHandleArray inputs) {
    for (size_t i = 0; i < inputs.handle_num; ++i) {
        float *input_data = (float *)OH_AI_TensorGetMutableData(inputs.handle_list[i]);
        if (input_data == NULL) {
            printf("OH_AI_TensorGetMutableData failed.\n");
            return OH_AI_STATUS_LITE_ERROR;
        }
        int64_t num = OH_AI_TensorGetElementNum(inputs.handle_list[i]);
        const int divisor = 10;
        for (int64_t j = 0; j < num; j++) {
            input_data[j] = (float)(rand() % divisor) / divisor;  // Random values in the range 0--0.9.
        }
    }
    return OH_AI_STATUS_SUCCESS;
}
```

The development process consists of the following main steps:

1. Prepare the required model.

   The required model can be downloaded directly or obtained using the model conversion tool.

   - If the downloaded model is in the `.ms` format, you can use it directly for inference. The following uses the **mobilenetv2.ms** model as an example.
   - If the model is in the format of a third-party framework, such as TensorFlow, TensorFlow Lite, Caffe, or ONNX, you can use the [model conversion tool](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html#2-3-0) to convert it to the `.ms` format.

2. Create a context, and set parameters such as the number of runtime threads and device type.

   The following describes two typical scenarios:

   Scenario 1: Only the CPU inference context is created.

   ```c
   // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
   OH_AI_ContextHandle context = OH_AI_ContextCreate();
   if (context == NULL) {
       printf("OH_AI_ContextCreate failed.\n");
       return OH_AI_STATUS_LITE_ERROR;
   }
   const int thread_num = 2;
   OH_AI_ContextSetThreadNum(context, thread_num);
   OH_AI_ContextSetThreadAffinityMode(context, 1);
   // Set the device type to CPU, and disable Float16 inference.
   OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
   if (cpu_device_info == NULL) {
       printf("OH_AI_DeviceInfoCreate failed.\n");
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
   OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
   ```

   Scenario 2: The neural network runtime (NNRT) and CPU heterogeneous inference contexts are created.

   NNRT is the runtime for cross-chip inference computing in the AI field. Generally, the acceleration hardware connected to NNRT, such as the NPU, has strong inference capabilities but supports only a limited number of operators, whereas the general-purpose CPU has weak inference capabilities but supports a wide range of operators. MindSpore Lite supports NNRT/CPU heterogeneous inference: model operators are preferentially scheduled to NNRT, and operators that NNRT does not support are scheduled to the CPU instead. The following is the sample code for configuring NNRT/CPU heterogeneous inference:
   <!--Del-->
   > **NOTE**
   >
   > NNRT/CPU heterogeneous inference requires access to NNRT hardware. For details, see [OpenHarmony/ai_neural_network_runtime](https://gitee.com/openharmony/ai_neural_network_runtime).
   <!--DelEnd-->
   ```c
   // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
   OH_AI_ContextHandle context = OH_AI_ContextCreate();
   if (context == NULL) {
       printf("OH_AI_ContextCreate failed.\n");
       return OH_AI_STATUS_LITE_ERROR;
   }
   // Prefer NNRT inference.
   // Use the first NNRT device of the ACCELERATOR type to create the NNRT device information, and set the high-performance inference mode.
   // You can also use OH_AI_GetAllNNRTDeviceDescs() to obtain the list of NNRT devices in the current environment, search for a specific device by device name or type, and use it as the NNRT inference hardware (see the sketch after this code block).
   OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByType(OH_AI_NNRTDEVICE_ACCELERATOR);
   if (nnrt_device_info == NULL) {
       printf("OH_AI_CreateNNRTDeviceInfoByType failed.\n");
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   OH_AI_DeviceInfoSetPerformanceMode(nnrt_device_info, OH_AI_PERFORMANCE_HIGH);
   OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);

   // Configure CPU inference as the fallback.
   OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
   if (cpu_device_info == NULL) {
       printf("OH_AI_DeviceInfoCreate failed.\n");
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
   ```
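
   As mentioned in the comments above, instead of creating the NNRT device information by type, you can enumerate the NNRT devices in the current environment and create the device information by name. The following is a minimal sketch of that approach, which would replace the **OH_AI_CreateNNRTDeviceInfoByType** call above. It assumes the NNRT device description APIs (**OH_AI_GetAllNNRTDeviceDescs**, **OH_AI_GetElementOfNNRTDeviceDescs**, **OH_AI_GetNameFromNNRTDeviceDesc**, **OH_AI_CreateNNRTDeviceInfoByName**, and **OH_AI_DestroyAllNNRTDeviceDescs**) are available in your SDK version; check the [MindSpore](../../reference/apis-mindspore-lite-kit/capi-mindspore.md) reference before using them.

   ```c
   // Enumerate the NNRT devices in the current environment and create device information from the first device name (sketch).
   size_t num_desc = 0;
   NNRTDeviceDesc *descs = OH_AI_GetAllNNRTDeviceDescs(&num_desc);
   if (descs == NULL || num_desc == 0) {
       printf("No NNRT device is available; falling back to CPU-only inference.\n");
   } else {
       for (size_t i = 0; i < num_desc; ++i) {
           NNRTDeviceDesc *desc = OH_AI_GetElementOfNNRTDeviceDescs(descs, i);
           printf("NNRT device %zu: %s\n", i, OH_AI_GetNameFromNNRTDeviceDesc(desc));
       }
       // Create the NNRT device information from the first reported device name.
       NNRTDeviceDesc *first = OH_AI_GetElementOfNNRTDeviceDescs(descs, 0);
       OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByName(OH_AI_GetNameFromNNRTDeviceDesc(first));
       OH_AI_DestroyAllNNRTDeviceDescs(&descs);
       if (nnrt_device_info != NULL) {
           OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);
       }
   }
   ```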

3. Create, load, and build the model.

   Call **OH_AI_ModelBuildFromFile** to load and build the model.

   In this example, the **argv[1]** parameter passed to **OH_AI_ModelBuildFromFile** is the path of the model file.

   ```c
   // Create a model.
   OH_AI_ModelHandle model = OH_AI_ModelCreate();
   if (model == NULL) {
       printf("OH_AI_ModelCreate failed.\n");
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }

   // Load and build the inference model. The model type is OH_AI_MODELTYPE_MINDIR.
   if (access(argv[1], F_OK) != 0) {
       printf("model file does not exist.\n");
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);
   if (ret != OH_AI_STATUS_SUCCESS) {
       printf("OH_AI_ModelBuildFromFile failed, ret: %d.\n", ret);
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return ret;
   }
   ```

4. Input data.

   Before executing model inference, you need to populate the input tensors with data. In this example, the model inputs are populated with random data; a sketch of loading preprocessed data from a file follows the code block.

   ```c
   // Obtain the input tensors.
   OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
   if (inputs.handle_list == NULL) {
       printf("OH_AI_ModelGetInputs failed.\n");
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   // Populate the input tensors with random data.
   ret = GenerateInputDataWithRandom(inputs);
   if (ret != OH_AI_STATUS_SUCCESS) {
       printf("GenerateInputDataWithRandom failed, ret: %d.\n", ret);
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return ret;
   }
   ```
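
   In a real application, the input tensors are usually filled with preprocessed data rather than random values. The following is a minimal sketch that fills the first input tensor from a binary file; it assumes the file (**input.bin** is only a placeholder name) already contains float32 data whose size exactly matches the tensor.

   ```c
   // Fill the first input tensor from a binary file (sketch; input.bin is a placeholder path).
   FILE *fp = fopen("input.bin", "rb");
   if (fp == NULL) {
       printf("open input file failed.\n");
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   OH_AI_TensorHandle input = inputs.handle_list[0];
   size_t data_size = OH_AI_TensorGetDataSize(input);
   void *input_data = OH_AI_TensorGetMutableData(input);
   size_t read_size = fread(input_data, 1, data_size, fp);
   fclose(fp);
   if (read_size != data_size) {
       printf("input file size does not match the tensor size (%zu bytes expected).\n", data_size);
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return OH_AI_STATUS_LITE_ERROR;
   }
   ```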

5. Execute model inference.

   Call **OH_AI_ModelPredict** to perform model inference.

   ```c
   // Execute model inference.
   OH_AI_TensorHandleArray outputs;
   ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
   if (ret != OH_AI_STATUS_SUCCESS) {
       printf("OH_AI_ModelPredict failed, ret: %d.\n", ret);
       OH_AI_ModelDestroy(&model);
       OH_AI_ContextDestroy(&context);
       return ret;
   }
   ```

6. Obtain the output.

   After model inference is complete, you can obtain the inference result through the output tensors. A sketch of interpreting the result of a classification model follows the code block.

   ```c
   // Obtain the output tensors and print their information.
   for (size_t i = 0; i < outputs.handle_num; ++i) {
       OH_AI_TensorHandle tensor = outputs.handle_list[i];
       long long element_num = OH_AI_TensorGetElementNum(tensor);
       printf("Tensor name: %s, tensor size is %zu, elements num: %lld.\n", OH_AI_TensorGetName(tensor),
              OH_AI_TensorGetDataSize(tensor), element_num);
       const float *data = (const float *)OH_AI_TensorGetData(tensor);
       if (data == NULL) {
           printf("OH_AI_TensorGetData failed.\n");
           OH_AI_ModelDestroy(&model);
           OH_AI_ContextDestroy(&context);
           return OH_AI_STATUS_LITE_ERROR;
       }
       printf("output data is:\n");
       const int max_print_num = 50;
       for (int j = 0; j < element_num && j <= max_print_num; ++j) {
           printf("%f ", data[j]);
       }
       printf("\n");
   }
   ```
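
   For a classification model such as **mobilenetv2.ms**, the output tensor holds one score per class (1001 values in this example), and the predicted class is the index with the highest score. The following is a minimal sketch based on the output tensors obtained above.

   ```c
   // Find the index of the highest score in the first output tensor (top-1 class).
   OH_AI_TensorHandle output = outputs.handle_list[0];
   const float *scores = (const float *)OH_AI_TensorGetData(output);
   int64_t count = OH_AI_TensorGetElementNum(output);
   if (scores == NULL || count <= 0) {
       printf("invalid output tensor.\n");
       return OH_AI_STATUS_LITE_ERROR;
   }
   int64_t best_index = 0;
   for (int64_t j = 1; j < count; ++j) {
       if (scores[j] > scores[best_index]) {
           best_index = j;
       }
   }
   printf("top-1 class index: %lld, score: %f\n", (long long)best_index, scores[best_index]);
   ```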

7. Destroy the model.

   If the MindSpore Lite inference framework is no longer needed, destroy the created model and context.

   ```c
   // Release the model and context.
   OH_AI_ModelDestroy(&model);
   OH_AI_ContextDestroy(&context);
   ```

## Verification

1. Write **CMakeLists.txt**.

   ```cmake
   cmake_minimum_required(VERSION 3.14)
   project(Demo)

   add_executable(demo main.c)

   target_link_libraries(
       demo
       mindspore_lite_ndk
       pthread
       dl
   )
   ```
   - To use ohos-sdk for cross compilation, you need to set the toolchain path for the CMake tool as follows: `-DCMAKE_TOOLCHAIN_FILE="/{sdkPath}/native/build/cmake/ohos.toolchain.cmake"`.

     Here, **sdkPath** indicates the SDK path in the DevEco Studio installation directory. To obtain it, open the project in DevEco Studio, choose **File** > **Settings...** > **OpenHarmony SDK**, and view the information in **Location**.

   - The toolchain builds a 64-bit application by default. To build a 32-bit application, add the following configuration: `-DOHOS_ARCH="armeabi-v7a"`.

2. Build the demo with the CMake tool and run it on the device.

   - Use hdc_std to connect to the device and push **demo** and **mobilenetv2.ms** to the same directory on the device.
   - Run the hdc_std shell command to access the device, go to the directory where **demo** is located, and run the following command:

     ```shell
     ./demo mobilenetv2.ms
     ```

     The inference is successful if the output is similar to the following:

     ```shell
     # ./demo ./mobilenetv2.ms
     Tensor name: Softmax-65, tensor size is 4004, elements num: 1001.
     output data is:
     0.000018 0.000012 0.000026 0.000194 0.000156 0.001501 0.000240 0.000825 0.000016 0.000006 0.000007 0.000004 0.000004 0.000004 0.000015 0.000099 0.000011 0.000013 0.000005 0.000023 0.000004 0.000008 0.000003 0.000003 0.000008 0.000014 0.000012 0.000006 0.000019 0.000006 0.000018 0.000024 0.000010 0.000002 0.000028 0.000372 0.000010 0.000017 0.000008 0.000004 0.000007 0.000010 0.000007 0.000012 0.000005 0.000015 0.000007 0.000040 0.000004 0.000085 0.000023
     ```

## Samples

The following sample is provided to help you better understand how to use MindSpore Lite:

- [Simple MindSpore Lite Tutorial](https://gitee.com/openharmony/third_party_mindspore/tree/OpenHarmony-3.2-Release/mindspore/lite/examples/quick_start_c)