# Connecting the Neural Network Runtime to an AI Inference Framework

## When to Use

As a bridge between the AI inference engine and the acceleration chip, the Neural Network Runtime provides simplified Native APIs for the AI inference engine to perform end-to-end inference through the acceleration chip.

This document uses the `Add` single-operator model shown in Figure 1 as an example to describe the development process of the Neural Network Runtime. The `Add` operator has two inputs, one parameter, and one output, where the `activation` parameter specifies the type of the activation function applied in the `Add` operator.

**Figure 1** Add single-operator model
!["Add single-operator model"](figures/neural_network_runtime.png)
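Conceptually, the `Add` operator computes an element-wise sum of its two inputs and then applies the activation function selected by the `activation` parameter. The following plain C++ sketch is for illustration only and is not part of the Neural Network Runtime API; it shows the expected computation when no activation function is fused:

```cpp
#include <cstddef>

// Illustrative reference only: element-wise Add with no fused activation,
// matching the semantics of the single-operator model in Figure 1.
void AddReference(const float* input1, const float* input2, float* output, size_t length)
{
    for (size_t i = 0; i < length; ++i) {
        output[i] = input1[i] + input2[i]; // activation is NONE, so the sum is returned as is
    }
}
```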
## Preparing the Environment

### Environment Requirements

The environment requirements for the Neural Network Runtime are as follows:

- System version: OpenHarmony master branch.
- Development environment: Ubuntu 18.04 or later.
- Access device: a standard device running OpenHarmony, whose built-in hardware accelerator driver has been connected to the Neural Network Runtime through an HDI API.

The Neural Network Runtime is exposed to external systems through OpenHarmony Native APIs. Therefore, you need to use the Native development suite of OpenHarmony to compile Neural Network Runtime applications. You can download the **ohos-sdk** package of the corresponding version from [Daily Build](http://ci.openharmony.cn/dailys/dailybuilds) in the OpenHarmony community and then decompress the package to obtain the Native development suite of the corresponding platform. Take Linux as an example. The package of the Native development suite is named `native-linux-{version number}.zip`.

### Environment Setup

1. Start the Ubuntu server.
2. Copy the downloaded package of the Native development suite to the root directory of the current user.
3. Decompress the package of the Native development suite.

    ```shell
    unzip native-linux-{version number}.zip
    ```

The directory structure after decompression is as follows. The content in the directory may vary depending on version iteration. Use the Native APIs of the latest version.

```text
native/
├── build                         // Cross-compilation toolchain
├── build-tools                   // Compilation and build tools
├── docs
├── llvm
├── nativeapi_syscap_config.json
├── ndk_system_capability.json
├── NOTICE.txt
├── oh-uni-package.json
└── sysroot                       // Native API header files and libraries
```
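The cross-compilation toolchain under `native/build` is what the build step in the Verification section points CMake to. Assuming the package was copied to the home directory and decompressed there as described above (so the suite sits in `~/native`, an assumed path), a quick check such as the following confirms that the OHOS CMake toolchain file is in place:

```shell
# Assumption: the Native development suite was decompressed to ~/native.
# The toolchain file referenced later by -DCMAKE_TOOLCHAIN_FILE should exist here:
ls ~/native/build/cmake/ohos.toolchain.cmake
```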
## Available APIs

This section describes the common APIs used in the development process of the Neural Network Runtime.

### Structure

| Name| Description|
| --------- | ---- |
| typedef struct OH_NNModel OH_NNModel | Model handle of the Neural Network Runtime. It is used to construct a model.|
| typedef struct OH_NNCompilation OH_NNCompilation | Compiler handle of the Neural Network Runtime. It is used to compile an AI model.|
| typedef struct OH_NNExecutor OH_NNExecutor | Executor handle of the Neural Network Runtime. It is used to perform inference computing on a specified device.|

### Model Construction APIs

| Name| Description|
| ------- | --- |
| OH_NNModel *OH_NNModel_Construct() | Creates a model instance of the OH_NNModel type.|
| OH_NN_ReturnCode OH_NNModel_AddTensor(OH_NNModel *model, const OH_NN_Tensor *tensor) | Adds a tensor to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SetTensorData(OH_NNModel *model, uint32_t index, const void *dataBuffer, size_t length) | Sets the tensor value.|
| OH_NN_ReturnCode OH_NNModel_AddOperation(OH_NNModel *model, OH_NN_OperationType op, const OH_NN_UInt32Array *paramIndices, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Adds an operator to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SpecifyInputsAndOutputs(OH_NNModel *model, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Specifies the model inputs and outputs.|
| OH_NN_ReturnCode OH_NNModel_Finish(OH_NNModel *model) | Completes model composition.|
| void OH_NNModel_Destroy(OH_NNModel **model) | Destroys a model instance.|

### Model Compilation APIs

| Name| Description|
| ------- | --- |
| OH_NNCompilation *OH_NNCompilation_Construct(const OH_NNModel *model) | Creates a compilation instance of the OH_NNCompilation type.|
| OH_NN_ReturnCode OH_NNCompilation_SetDevice(OH_NNCompilation *compilation, size_t deviceID) | Specifies the device for model compilation and computing.|
| OH_NN_ReturnCode OH_NNCompilation_SetCache(OH_NNCompilation *compilation, const char *cachePath, uint32_t version) | Sets the cache directory and version of the compiled model.|
| OH_NN_ReturnCode OH_NNCompilation_Build(OH_NNCompilation *compilation) | Performs model compilation.|
| void OH_NNCompilation_Destroy(OH_NNCompilation **compilation) | Destroys the OH_NNCompilation instance.|

### Inference Execution APIs

| Name| Description|
| ------- | --- |
| OH_NNExecutor *OH_NNExecutor_Construct(OH_NNCompilation *compilation) | Creates an executor instance of the OH_NNExecutor type.|
| OH_NN_ReturnCode OH_NNExecutor_SetInput(OH_NNExecutor *executor, uint32_t inputIndex, const OH_NN_Tensor *tensor, const void *dataBuffer, size_t length) | Sets the data for a single model input.|
| OH_NN_ReturnCode OH_NNExecutor_SetOutput(OH_NNExecutor *executor, uint32_t outputIndex, void *dataBuffer, size_t length) | Sets the buffer for a single model output.|
| OH_NN_ReturnCode OH_NNExecutor_Run(OH_NNExecutor *executor) | Executes model inference.|
| void OH_NNExecutor_Destroy(OH_NNExecutor **executor) | Destroys the OH_NNExecutor instance to release the memory occupied by the instance.|

### Device Management APIs

| Name| Description|
| ------- | --- |
| OH_NN_ReturnCode OH_NNDevice_GetAllDevicesID(const size_t **allDevicesID, uint32_t *deviceCount) | Obtains the IDs of the devices connected to the Neural Network Runtime.|

## How to Develop

The development process of the Neural Network Runtime consists of three phases: model construction, model compilation, and inference execution. The following uses the `Add` single-operator model as an example to describe how to call Neural Network Runtime APIs during application development.

1. Create an application sample file.

    Create the source file of the Neural Network Runtime application sample. Run the following commands to create the `nnrt_example/` directory and the `nnrt_example.cpp` source file in it:

    ```shell
    mkdir ~/nnrt_example && cd ~/nnrt_example
    touch nnrt_example.cpp
    ```

2. Import the Neural Network Runtime module.

    Add the following code at the beginning of the `nnrt_example.cpp` file to import the Neural Network Runtime module:

    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "neural_network_runtime/neural_network_runtime.h"

    // Constant that specifies the byte length of the input and output data (12 float32 elements x 4 bytes).
    const size_t DATA_LENGTH = 4 * 12;
    ```

3. Construct a model.

    Use Neural Network Runtime APIs to construct the `Add` single-operator sample model.

    ```cpp
    OH_NN_ReturnCode BuildModel(OH_NNModel** pModel)
    {
        // Create a model instance and construct the model.
        OH_NNModel* model = OH_NNModel_Construct();
        if (model == nullptr) {
            std::cout << "Create model failed." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Add the first input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor input1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_ReturnCode ret = OH_NNModel_AddTensor(model, &input1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of first input failed." << std::endl;
            return ret;
        }

        // Add the second input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor input2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &input2);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of second input failed." << std::endl;
            return ret;
        }

        // Add the parameter tensor of the Add operator. This parameter specifies the type of the activation function. Its data type is int8.
        int32_t activationDims = 1;
        int8_t activationValue = OH_NN_FUSED_NONE;
        OH_NN_Tensor activation = {OH_NN_INT8, 1, &activationDims, nullptr, OH_NN_ADD_ACTIVATIONTYPE};
        ret = OH_NNModel_AddTensor(model, &activation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of activation failed." << std::endl;
            return ret;
        }

        // Set the type of the activation function to OH_NN_FUSED_NONE, indicating that no activation function is added to the operator.
        ret = OH_NNModel_SetTensorData(model, 2, &activationValue, sizeof(int8_t));
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, set value of activation failed." << std::endl;
            return ret;
        }

        // Set the output of the Add operator. The data type is float32 and the tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor output = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &output);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of output failed." << std::endl;
            return ret;
        }

        // Specify the input, parameter, and output indexes of the Add operator.
        uint32_t inputIndicesValues[2] = {0, 1};
        uint32_t paramIndicesValues = 2;
        uint32_t outputIndicesValues = 3;
        OH_NN_UInt32Array paramIndices = {&paramIndicesValues, 1};
        OH_NN_UInt32Array inputIndices = {inputIndicesValues, 2};
        OH_NN_UInt32Array outputIndices = {&outputIndicesValues, 1};

        // Add the Add operator to the model instance.
        ret = OH_NNModel_AddOperation(model, OH_NN_OPS_ADD, &paramIndices, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add operation failed." << std::endl;
            return ret;
        }

        // Set the input and output indexes of the model instance.
        ret = OH_NNModel_SpecifyInputsAndOutputs(model, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, specify inputs and outputs failed." << std::endl;
            return ret;
        }

        // Complete the model instance construction.
        ret = OH_NNModel_Finish(model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, error happened when finishing model construction." << std::endl;
            return ret;
        }

        *pModel = model;
        return OH_NN_SUCCESS;
    }
    ```
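    Each `OH_NN_Tensor` above is filled by aggregate initialization, which can be hard to read. The sketch below names the fields being set; the field names and order are assumptions based on `neural_network_runtime_type.h` and should be verified against your SDK version.

    ```cpp
    #include "neural_network_runtime/neural_network_runtime.h"

    // Field-by-field view of the aggregate initializers used in BuildModel
    // (field names assumed from neural_network_runtime_type.h; verify against your SDK version).
    static int32_t annotatedDims[4] = {1, 2, 2, 3};
    static OH_NN_Tensor annotatedInput = {
        OH_NN_FLOAT32,   // dataType: element type of the tensor
        4,               // dimensionCount: number of dimensions
        annotatedDims,   // dimensions: tensor shape [1, 2, 2, 3]
        nullptr,         // quantParam: no quantization parameters
        OH_NN_TENSOR     // type: a common data tensor; OH_NN_ADD_ACTIVATIONTYPE marks the activation parameter instead
    };
    ```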
4. Query the acceleration chips connected to the Neural Network Runtime.

    The Neural Network Runtime can connect to multiple acceleration chips through HDI APIs. Before model compilation, you need to query the acceleration chips connected to the Neural Network Runtime on the current device. Each acceleration chip has a unique ID. In the compilation phase, you specify the chip for model compilation based on its device ID.

    ```cpp
    void GetAvailableDevices(std::vector<size_t>& availableDevice)
    {
        availableDevice.clear();

        // Obtain the IDs of the available devices.
        const size_t* devices = nullptr;
        uint32_t deviceCount = 0;
        OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "GetAllDevicesID failed, get no available device." << std::endl;
            return;
        }

        for (uint32_t i = 0; i < deviceCount; i++) {
            availableDevice.emplace_back(devices[i]);
        }
    }
    ```
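    When more than one device is reported, printing some identifying information can help you choose one. The optional helper below can be added to `nnrt_example.cpp` (the headers included in step 2 suffice); it assumes the `OH_NNDevice_GetName` API declared in `neural_network_runtime.h` is available in your SDK version and is not used in the rest of this example.

    ```cpp
    // Optional: print the name of each device returned by OH_NNDevice_GetAllDevicesID.
    // Assumes OH_NNDevice_GetName is available in the SDK version you are using.
    void PrintAvailableDevices(const std::vector<size_t>& availableDevice)
    {
        for (size_t deviceID : availableDevice) {
            const char* name = nullptr;
            OH_NN_ReturnCode ret = OH_NNDevice_GetName(deviceID, &name);
            if (ret != OH_NN_SUCCESS || name == nullptr) {
                std::cout << "Device " << deviceID << ": name unavailable." << std::endl;
                continue;
            }
            std::cout << "Device " << deviceID << ": " << name << std::endl;
        }
    }
    ```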
5. Compile the model on a specified device.

    The Neural Network Runtime uses an abstract model expression to describe the topology of an AI model. Before inference can be executed on an acceleration chip, the compilation module provided by the Neural Network Runtime needs to deliver this abstract model expression to the chip driver layer and convert it into a format that supports inference and computing.

    ```cpp
    OH_NN_ReturnCode CreateCompilation(OH_NNModel* model, const std::vector<size_t>& availableDevice, OH_NNCompilation** pCompilation)
    {
        // Create a compilation instance to pass the model to the underlying hardware for compilation.
        OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
        if (compilation == nullptr) {
            std::cout << "CreateCompilation failed, error happened when creating compilation." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Set compilation options, such as the compilation device, cache path, performance mode, computing priority, and whether to enable float16 low-precision computing.

        // Perform model compilation on the first available device.
        OH_NN_ReturnCode ret = OH_NNCompilation_SetDevice(compilation, availableDevice[0]);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting device." << std::endl;
            return ret;
        }

        // Cache the model compilation result in the /data/local/tmp directory, with the version number set to 1.
        ret = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting cache path." << std::endl;
            return ret;
        }

        // Start model compilation.
        ret = OH_NNCompilation_Build(compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when building compilation." << std::endl;
            return ret;
        }

        *pCompilation = compilation;
        return OH_NN_SUCCESS;
    }
    ```
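    Besides the device and cache settings shown above, the comment in `CreateCompilation` mentions performance mode, computing priority, and float16 options. The sketch below shows how they could be set before `OH_NNCompilation_Build`; the function and enum names are assumptions taken from `neural_network_runtime.h`, so verify them against your SDK version, and note that not every device supports every option.

    ```cpp
    // Optional compilation settings, to be called before OH_NNCompilation_Build.
    // The API and enum names below are assumptions based on neural_network_runtime.h;
    // verify them against your SDK version. Not every device supports every option.
    OH_NN_ReturnCode SetOptionalCompilationOptions(OH_NNCompilation* compilation)
    {
        // Request the highest performance mode of the device.
        OH_NN_ReturnCode ret = OH_NNCompilation_SetPerformanceMode(compilation, OH_NN_PERFORMANCE_EXTREME);
        if (ret != OH_NN_SUCCESS) {
            return ret;
        }

        // Give this model a high scheduling priority on the device.
        ret = OH_NNCompilation_SetPriority(compilation, OH_NN_PRIORITY_HIGH);
        if (ret != OH_NN_SUCCESS) {
            return ret;
        }

        // Allow float16 low-precision computing if the device supports it.
        return OH_NNCompilation_EnableFloat16(compilation, true);
    }
    ```

    For example, such a helper could be called between `OH_NNCompilation_SetCache` and `OH_NNCompilation_Build` in `CreateCompilation`.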
6. Create an executor.

    After model compilation is complete, you need to call the execution module of the Neural Network Runtime to create an inference executor. In the execution phase, operations such as setting the model input, obtaining the model output, and triggering inference computing are performed through the executor.

    ```cpp
    OH_NNExecutor* CreateExecutor(OH_NNCompilation* compilation)
    {
        // Create an executor instance.
        OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
        return executor;
    }
    ```

7. Perform inference computing and print the computing result.

    The input data required for inference computing is passed to the executor through the APIs provided by the execution module. The executor is then triggered to perform inference computing once and return the result.

    ```cpp
    OH_NN_ReturnCode Run(OH_NNExecutor* executor)
    {
        // Construct sample data.
        float input1[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        float input2[12] = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22};

        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor inputTensor1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_Tensor inputTensor2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};

        // Set the execution inputs.

        // Set the first input for execution. The input data is specified by input1.
        OH_NN_ReturnCode ret = OH_NNExecutor_SetInput(executor, 0, &inputTensor1, input1, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting first input." << std::endl;
            return ret;
        }

        // Set the second input for execution. The input data is specified by input2.
        ret = OH_NNExecutor_SetInput(executor, 1, &inputTensor2, input2, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting second input." << std::endl;
            return ret;
        }

        // Set the output buffer. After OH_NNExecutor_Run completes inference computing, the result is written to output.
        float output[12];
        ret = OH_NNExecutor_SetOutput(executor, 0, output, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting output buffer." << std::endl;
            return ret;
        }

        // Perform inference computing.
        ret = OH_NNExecutor_Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when executing inference." << std::endl;
            return ret;
        }

        // Print the output result.
        for (uint32_t i = 0; i < 12; i++) {
            std::cout << "Output index: " << i << ", value is: " << output[i] << "." << std::endl;
        }

        return OH_NN_SUCCESS;
    }
    ```
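    Because the sample inputs are fixed, the expected output of the `Add` model is simply `input1[i] + input2[i]`. An optional helper such as the following, added to `nnrt_example.cpp` alongside `Run` (the headers included in step 2 suffice), makes the sample self-verifying; the helper name is illustrative and not part of the Neural Network Runtime API.

    ```cpp
    // Illustrative helper: verify the inference result against the element-wise sum of the inputs.
    // The sample values are small integers, so exact floating-point comparison is safe here.
    bool VerifyOutput(const float* input1, const float* input2, const float* output, uint32_t length)
    {
        for (uint32_t i = 0; i < length; ++i) {
            if (output[i] != input1[i] + input2[i]) {
                std::cout << "Mismatch at index " << i << ": expected "
                          << (input1[i] + input2[i]) << ", got " << output[i] << "." << std::endl;
                return false;
            }
        }
        return true;
    }
    ```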
8. Build an end-to-end process from model construction to model compilation and execution.

    Steps 3 to 7 implement the model construction, compilation, and execution processes and encapsulate them into separate functions to facilitate modular development. The following sample code shows how to chain these functions into a complete Neural Network Runtime development process.

    ```cpp
    int main()
    {
        OH_NNModel* model = nullptr;
        OH_NNCompilation* compilation = nullptr;
        OH_NNExecutor* executor = nullptr;
        std::vector<size_t> availableDevices;

        // Perform model construction.
        OH_NN_ReturnCode ret = BuildModel(&model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Obtain the available devices.
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            std::cout << "No available device." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Perform model compilation.
        ret = CreateCompilation(model, availableDevices, &compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Create an inference executor for the model.
        executor = CreateExecutor(compilation);
        if (executor == nullptr) {
            std::cout << "CreateExecutor failed, no executor is created." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Use the created executor to perform single-step inference computing.
        ret = Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            OH_NNExecutor_Destroy(&executor);
            return -1;
        }

        // Destroy the model, compilation, and executor instances to release the occupied resources.
        OH_NNModel_Destroy(&model);
        OH_NNCompilation_Destroy(&compilation);
        OH_NNExecutor_Destroy(&executor);

        return 0;
    }
    ```

## Verification

1. Prepare the compilation configuration file of the application sample.

    Create a `CMakeLists.txt` file and add the compilation configuration for the application sample file `nnrt_example.cpp`. The following is a simple example of the `CMakeLists.txt` file:

    ```text
    cmake_minimum_required(VERSION 3.16)
    project(nnrt_example C CXX)

    add_executable(nnrt_example
        ./nnrt_example.cpp
    )

    target_link_libraries(nnrt_example
        neural_network_runtime.z
    )
    ```

2. Compile the application sample.

    Create the **build/** directory in the current directory, and compile `nnrt_example.cpp` in the **build/** directory to obtain the binary file `nnrt_example`:

    ```shell
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE={Path of the cross-compilation toolchain}/build/cmake/ohos.toolchain.cmake -DOHOS_ARCH=arm64-v8a -DOHOS_PLATFORM=OHOS -DOHOS_STL=c++_static ..
    make
    ```

3. Push the application sample to the device for execution.

    ```shell
    # Push the nnrt_example binary obtained through compilation to the device.
    hdc_std file send ./nnrt_example /data/local/tmp/.

    # Grant the required permissions to the executable file of the test case.
    hdc_std shell "chmod +x /data/local/tmp/nnrt_example"

    # Execute the test case.
    hdc_std shell "/data/local/tmp/nnrt_example"
    ```

    If the execution is normal, information similar to the following is displayed:

    ```text
    Output index: 0, value is: 11.000000.
    Output index: 1, value is: 13.000000.
    Output index: 2, value is: 15.000000.
    Output index: 3, value is: 17.000000.
    Output index: 4, value is: 19.000000.
    Output index: 5, value is: 21.000000.
    Output index: 6, value is: 23.000000.
    Output index: 7, value is: 25.000000.
    Output index: 8, value is: 27.000000.
    Output index: 9, value is: 29.000000.
    Output index: 10, value is: 31.000000.
    Output index: 11, value is: 33.000000.
    ```

4. (Optional) Check the model cache.

    If the HDI service connected to the Neural Network Runtime supports the model cache function, you can find the generated cache files in the `/data/local/tmp` directory after `nnrt_example` is executed successfully.

    > **NOTE**
    >
    > The IR graphs of the model need to be passed to the hardware driver layer, so that the HDI service can compile them into a computing graph dedicated to the hardware. This compilation process is time-consuming. The Neural Network Runtime supports caching of computing graphs: it can save the computing graphs compiled by the HDI service to device storage. If the same model is compiled on the same acceleration chip again, you can specify the cache path so that the Neural Network Runtime directly loads the computing graphs from the cache files, reducing the compilation time.

    Check the cached files in the cache directory.

    ```shell
    ls /data/local/tmp
    ```

    The command output is as follows:

    ```text
    # 0.nncache cache_info.nncache
    ```

    If the cache is no longer used, manually delete the cache files.

    ```shell
    rm /data/local/tmp/*nncache
    ```

## Samples

The following sample is provided to help you understand how to connect a third-party AI inference framework to the Neural Network Runtime:

- [Development Guide for Connecting TensorFlow Lite to NNRt Delegate](https://gitee.com/openharmony/neural_network_runtime/tree/master/example/deep_learning_framework)