# Development Guide for Connecting the Neural Network Runtime to an AI Inference Framework

## When to Use

As a bridge between the AI inference engine and the acceleration chip, the Neural Network Runtime provides simplified Native APIs for the AI inference engine to perform end-to-end inference through the acceleration chip.

This document uses the `Add` single-operator model shown in Figure 1 as an example to describe the development process of the Neural Network Runtime. The `Add` operator takes two inputs, one parameter, and one output, where the `activation` parameter specifies the type of the activation function fused into the operator. With `activation` set to `OH_NN_FUSED_NONE`, the model simply computes the element-wise sum of the two inputs.

**Figure 1** Add single-operator model


## Preparing the Environment

### Environment Requirements

The environment requirements for the Neural Network Runtime are as follows:

- System version: OpenHarmony master branch.
- Development environment: Ubuntu 18.04 or later.
- Access device: a standard device running OpenHarmony, whose built-in hardware accelerator driver has been connected to the Neural Network Runtime through an HDI API.

The Neural Network Runtime is exposed to external systems through OpenHarmony Native APIs. Therefore, you need to use the Native development suite of OpenHarmony to compile Neural Network Runtime applications.

### Environment Setup

1. Start the Ubuntu server.
2. Copy the package of the Native development suite to the root directory of the current user.
3. Decompress the package of the Native development suite.

    ```shell
    unzip native-linux-{version number}.zip
    ```

    The directory structure after decompression is as follows. The content in the directory may vary depending on version iteration. Use the Native APIs of the latest version.

    ```text
    native/
    ├── build                          // Cross-compilation toolchain
    ├── build-tools                    // Compilation and build tools
    ├── docs
    ├── llvm
    ├── nativeapi_syscap_config.json
    ├── ndk_system_capability.json
    ├── NOTICE.txt
    ├── oh-uni-package.json
    └── sysroot                        // Native API header files and libraries
    ```

## Available APIs

This section describes the common APIs used in the development process of the Neural Network Runtime.

### Structure

| Name| Description|
| --------- | ---- |
| typedef struct OH_NNModel OH_NNModel | Model handle of the Neural Network Runtime. It is used to construct a model.|
| typedef struct OH_NNCompilation OH_NNCompilation | Compiler handle of the Neural Network Runtime. It is used to compile an AI model.|
| typedef struct OH_NNExecutor OH_NNExecutor | Executor handle of the Neural Network Runtime. It is used to perform inference computing on a specified device.|

### Model Construction APIs

| Name| Description|
| ------- | --- |
| OH_NNModel *OH_NNModel_Construct() | Creates a model instance of the OH_NNModel type.|
| OH_NN_ReturnCode OH_NNModel_AddTensor(OH_NNModel *model, const OH_NN_Tensor *tensor) | Adds a tensor to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SetTensorData(OH_NNModel *model, uint32_t index, const void *dataBuffer, size_t length) | Sets the tensor value.|
| OH_NN_ReturnCode OH_NNModel_AddOperation(OH_NNModel *model, OH_NN_OperationType op, const OH_NN_UInt32Array *paramIndices, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Adds an operator to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SpecifyInputsAndOutputs(OH_NNModel *model, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Specifies the model inputs and outputs.|
| OH_NN_ReturnCode OH_NNModel_Finish(OH_NNModel *model) | Completes model composition.|
| void OH_NNModel_Destroy(OH_NNModel **model) | Destroys a model instance.|

### Model Compilation APIs

| Name| Description|
| ------- | --- |
| OH_NNCompilation *OH_NNCompilation_Construct(const OH_NNModel *model) | Creates a compilation instance of the OH_NNCompilation type.|
| OH_NN_ReturnCode OH_NNCompilation_SetDevice(OH_NNCompilation *compilation, size_t deviceID) | Specifies the device for model compilation and computing.|
| OH_NN_ReturnCode OH_NNCompilation_SetCache(OH_NNCompilation *compilation, const char *cachePath, uint32_t version) | Sets the cache directory and version of the compiled model.|
| OH_NN_ReturnCode OH_NNCompilation_Build(OH_NNCompilation *compilation) | Performs model compilation.|
| void OH_NNCompilation_Destroy(OH_NNCompilation **compilation) | Destroys the OH_NNCompilation instance.|

### Inference Execution APIs

| Name| Description|
| ------- | --- |
| OH_NNExecutor *OH_NNExecutor_Construct(OH_NNCompilation *compilation) | Creates an executor instance of the OH_NNExecutor type.|
| OH_NN_ReturnCode OH_NNExecutor_SetInput(OH_NNExecutor *executor, uint32_t inputIndex, const OH_NN_Tensor *tensor, const void *dataBuffer, size_t length) | Sets the data for a single model input.|
| OH_NN_ReturnCode OH_NNExecutor_SetOutput(OH_NNExecutor *executor, uint32_t outputIndex, void *dataBuffer, size_t length) | Sets the buffer for a single model output.|
| OH_NN_ReturnCode OH_NNExecutor_Run(OH_NNExecutor *executor) | Executes model inference.|
| void OH_NNExecutor_Destroy(OH_NNExecutor **executor) | Destroys the OH_NNExecutor instance to release the memory occupied by it.|

### Device Management APIs

| Name| Description|
| ------- | --- |
| OH_NN_ReturnCode OH_NNDevice_GetAllDevicesID(const size_t **allDevicesID, uint32_t *deviceCount) | Obtains the IDs of the devices connected to the Neural Network Runtime.|
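
These APIs are typically combined in a fixed order: construct and finish a model, compile it for a queried device, and then run inference through an executor. The following is a minimal sketch of that call order; the wrapper function name `RunAddModelSketch` is illustrative only and error handling is omitted. The complete, error-checked flow is developed step by step in the next section.

```cpp
#include "neural_network_runtime/neural_network_runtime.h"

// Minimal sketch of the typical call order; error handling omitted.
void RunAddModelSketch()
{
    // 1. Model construction: add tensors and operators, then finish composition.
    OH_NNModel* model = OH_NNModel_Construct();
    // ... OH_NNModel_AddTensor / OH_NNModel_SetTensorData / OH_NNModel_AddOperation ...
    // ... OH_NNModel_SpecifyInputsAndOutputs ...
    OH_NNModel_Finish(model);

    // 2. Model compilation: bind the model to a device queried with OH_NNDevice_GetAllDevicesID.
    OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
    // ... OH_NNCompilation_SetDevice / OH_NNCompilation_SetCache ...
    OH_NNCompilation_Build(compilation);

    // 3. Inference execution: set inputs and output buffers, then run.
    OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
    // ... OH_NNExecutor_SetInput / OH_NNExecutor_SetOutput ...
    OH_NNExecutor_Run(executor);

    // 4. Release the instances once inference is complete.
    OH_NNModel_Destroy(&model);
    OH_NNCompilation_Destroy(&compilation);
    OH_NNExecutor_Destroy(&executor);
}
```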

## How to Develop

The development process of the Neural Network Runtime consists of three phases: model construction, model compilation, and inference execution. The following uses the `Add` single-operator model as an example to describe how to call the Neural Network Runtime APIs during application development.

1. Create an application sample file.

    Create the source file of the Neural Network Runtime application sample. Run the following commands in the project directory to create the `nnrt_example/` directory and create the `nnrt_example.cpp` source file in the directory:

    ```shell
    mkdir ~/nnrt_example && cd ~/nnrt_example
    touch nnrt_example.cpp
    ```

2. Import the Neural Network Runtime module.

    Add the following code at the beginning of the `nnrt_example.cpp` file to import the Neural Network Runtime module:

    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "neural_network_runtime/neural_network_runtime.h"

    // Constant used to specify the byte length of the input and output data.
    const size_t DATA_LENGTH = 4 * 12;
    ```

3. Construct a model.

    Use the Neural Network Runtime APIs to construct an `Add` single-operator sample model.

    ```cpp
    OH_NN_ReturnCode BuildModel(OH_NNModel** pModel)
    {
        // Create a model instance and construct a model.
        OH_NNModel* model = OH_NNModel_Construct();
        if (model == nullptr) {
            std::cout << "Create model failed." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Add the first input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor input1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_ReturnCode ret = OH_NNModel_AddTensor(model, &input1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of first input failed." << std::endl;
            return ret;
        }

        // Add the second input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor input2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &input2);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of second input failed." << std::endl;
            return ret;
        }

        // Add the parameter tensor of the Add operator. This tensor specifies the type of the activation function, and its data type is int8.
        int32_t activationDims = 1;
        int8_t activationValue = OH_NN_FUSED_NONE;
        OH_NN_Tensor activation = {OH_NN_INT8, 1, &activationDims, nullptr, OH_NN_ADD_ACTIVATIONTYPE};
        ret = OH_NNModel_AddTensor(model, &activation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of activation failed." << std::endl;
            return ret;
        }

        // Set the type of the activation function to OH_NN_FUSED_NONE, indicating that no activation function is added to the operator.
        ret = OH_NNModel_SetTensorData(model, 2, &activationValue, sizeof(int8_t));
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, set value of activation failed." << std::endl;
            return ret;
        }

        // Set the output tensor of the Add operator. The data type is float32 and the tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor output = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &output);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of output failed." << std::endl;
            return ret;
        }

        // Specify the input, parameter, and output indexes of the Add operator.
        uint32_t inputIndicesValues[2] = {0, 1};
        uint32_t paramIndicesValues = 2;
        uint32_t outputIndicesValues = 3;
        OH_NN_UInt32Array paramIndices = {&paramIndicesValues, 1};
        OH_NN_UInt32Array inputIndices = {inputIndicesValues, 2};
        OH_NN_UInt32Array outputIndices = {&outputIndicesValues, 1};

        // Add the Add operator to the model instance.
        ret = OH_NNModel_AddOperation(model, OH_NN_OPS_ADD, &paramIndices, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add operation failed." << std::endl;
            return ret;
        }

        // Set the input and output indexes of the model instance.
        ret = OH_NNModel_SpecifyInputsAndOutputs(model, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, specify inputs and outputs failed." << std::endl;
            return ret;
        }

        // Complete the model instance construction.
        ret = OH_NNModel_Finish(model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, error happened when finishing model construction." << std::endl;
            return ret;
        }

        *pModel = model;
        return OH_NN_SUCCESS;
    }
    ```

4. Query the acceleration chips connected to the Neural Network Runtime.

    The Neural Network Runtime can connect to multiple acceleration chips through HDI APIs. Before model compilation, you need to query the acceleration chips connected to the Neural Network Runtime on the current device. Each acceleration chip has a unique ID, and in the compilation phase you specify the chip for model compilation based on its device ID.

    ```cpp
    void GetAvailableDevices(std::vector<size_t>& availableDevice)
    {
        availableDevice.clear();

        // Obtain the IDs of the available devices.
        const size_t* devices = nullptr;
        uint32_t deviceCount = 0;
        OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "GetAllDevicesID failed, get no available device." << std::endl;
            return;
        }

        for (uint32_t i = 0; i < deviceCount; i++) {
            availableDevice.emplace_back(devices[i]);
        }
    }
    ```

5. Compile the model on a specified device.

    The Neural Network Runtime uses abstract model expressions to describe the topology of an AI model. Before inference can be executed on an acceleration chip, the compilation module of the Neural Network Runtime needs to deliver the abstract model expression to the chip driver layer and convert it into a format that supports inference and computing.

    ```cpp
    OH_NN_ReturnCode CreateCompilation(OH_NNModel* model, const std::vector<size_t>& availableDevice, OH_NNCompilation** pCompilation)
    {
        // Create a compilation instance to pass the model to the underlying hardware for compilation.
        OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
        if (compilation == nullptr) {
            std::cout << "CreateCompilation failed, error happened when creating compilation." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Set compilation options, such as the compilation hardware, cache path, performance mode, computing priority, and whether to enable float16 low-precision computing.

        // Choose to perform model compilation on the first device.
        OH_NN_ReturnCode ret = OH_NNCompilation_SetDevice(compilation, availableDevice[0]);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting device." << std::endl;
            return ret;
        }

        // Cache the model compilation result in the /data/local/tmp directory, with the version number set to 1.
        ret = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting cache path." << std::endl;
            return ret;
        }

        // Start model compilation.
        ret = OH_NNCompilation_Build(compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when building compilation." << std::endl;
            return ret;
        }

        *pCompilation = compilation;
        return OH_NN_SUCCESS;
    }
    ```

6. Create an executor.

    After the model compilation is complete, call the execution module of the Neural Network Runtime to create an inference executor. In the execution phase, operations such as setting the model input, obtaining the model output, and triggering inference computing are performed through the executor.

    ```cpp
    OH_NNExecutor* CreateExecutor(OH_NNCompilation* compilation)
    {
        // Create an executor instance.
        OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
        return executor;
    }
    ```

7. Perform inference computing and print the result.

    The input data required for inference computing is passed to the executor through the APIs provided by the execution module. The executor is then triggered to perform inference computing once and obtain the result.

    ```cpp
    OH_NN_ReturnCode Run(OH_NNExecutor* executor)
    {
        // Construct sample data.
        float input1[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        float input2[12] = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22};

        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor inputTensor1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_Tensor inputTensor2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};

        // Set the execution inputs.

        // Set the first input for execution. The input data is specified by input1.
        OH_NN_ReturnCode ret = OH_NNExecutor_SetInput(executor, 0, &inputTensor1, input1, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting first input." << std::endl;
            return ret;
        }

        // Set the second input for execution. The input data is specified by input2.
        ret = OH_NNExecutor_SetInput(executor, 1, &inputTensor2, input2, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting second input." << std::endl;
            return ret;
        }

        // Set the output data buffer. After OH_NNExecutor_Run performs inference computing, the output result is stored in output.
        float output[12];
        ret = OH_NNExecutor_SetOutput(executor, 0, output, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting output buffer." << std::endl;
            return ret;
        }

        // Perform inference computing.
        ret = OH_NNExecutor_Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error doing execution." << std::endl;
            return ret;
        }

        // Print the output result.
        for (uint32_t i = 0; i < 12; i++) {
            std::cout << "Output index: " << i << ", value is: " << output[i] << "." << std::endl;
        }

        return OH_NN_SUCCESS;
    }
    ```

8. Build an end-to-end process from model construction to model compilation and execution.

    Steps 3 to 7 implement the model construction, compilation, and execution processes and encapsulate them into functions to facilitate modular development. The following sample code shows how to combine these functions into a complete Neural Network Runtime development process.

    ```cpp
    int main()
    {
        OH_NNModel* model = nullptr;
        OH_NNCompilation* compilation = nullptr;
        OH_NNExecutor* executor = nullptr;
        std::vector<size_t> availableDevices;

        // Perform model construction.
        OH_NN_ReturnCode ret = BuildModel(&model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Obtain the available devices.
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            std::cout << "No available device." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Perform model compilation.
        ret = CreateCompilation(model, availableDevices, &compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Create an inference executor for the model.
        executor = CreateExecutor(compilation);
        if (executor == nullptr) {
            std::cout << "CreateExecutor failed, no executor is created." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Use the created executor to perform single-step inference computing.
        ret = Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            OH_NNExecutor_Destroy(&executor);
            return -1;
        }

        // Destroy the model, compilation, and executor to release the occupied resources.
        OH_NNModel_Destroy(&model);
        OH_NNCompilation_Destroy(&compilation);
        OH_NNExecutor_Destroy(&executor);

        return 0;
    }
    ```

## Verification

1. Prepare the compilation configuration file of the application sample.

    Create a `CMakeLists.txt` file, and add compilation configurations for the application sample file `nnrt_example.cpp`. The following is a simple example of the `CMakeLists.txt` file:

    ```text
    cmake_minimum_required(VERSION 3.16)
    project(nnrt_example C CXX)

    add_executable(nnrt_example
        ./nnrt_example.cpp
    )

    target_link_libraries(nnrt_example
        neural_network_runtime.z
    )
    ```

2. Compile the application sample.

    Create the **build/** directory in the current directory, and compile `nnrt_example.cpp` in the **build/** directory to obtain the binary file `nnrt_example`:

    ```shell
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE={path of the cross-compilation toolchain}/build/cmake/ohos.toolchain.cmake -DOHOS_ARCH=arm64-v8a -DOHOS_PLATFORM=OHOS -DOHOS_STL=c++_static ..
    make
    ```

3. Push the application sample to the device for execution.

    ```shell
    # Push the nnrt_example binary obtained through compilation to the device.
    hdc_std file send ./nnrt_example /data/local/tmp/.

    # Grant the required permissions to the executable file of the test case.
    hdc_std shell "chmod +x /data/local/tmp/nnrt_example"

    # Execute the test case.
    hdc_std shell "/data/local/tmp/nnrt_example"
    ```

    If the execution is normal, information similar to the following is displayed:

    ```text
    Output index: 0, value is: 11.000000.
    Output index: 1, value is: 13.000000.
    Output index: 2, value is: 15.000000.
    Output index: 3, value is: 17.000000.
    Output index: 4, value is: 19.000000.
    Output index: 5, value is: 21.000000.
    Output index: 6, value is: 23.000000.
    Output index: 7, value is: 25.000000.
    Output index: 8, value is: 27.000000.
    Output index: 9, value is: 29.000000.
    Output index: 10, value is: 31.000000.
    Output index: 11, value is: 33.000000.
    ```

4. (Optional) Check the model cache.

    If the HDI service connected to the Neural Network Runtime supports the model cache function, you can find the generated cache files in the `/data/local/tmp` directory after `nnrt_example` is executed successfully.

    > **NOTE**
    >
    > The IR graphs of the model need to be passed to the hardware driver layer, so that the HDI service can compile them into a computing graph dedicated to the hardware. This compilation process is time-consuming. The Neural Network Runtime therefore supports a computing graph cache: it can store the computing graphs compiled by the HDI service on the device. If the same model is compiled on the same acceleration chip again, you can specify the cache path so that the Neural Network Runtime directly loads the computing graphs from the cache files, reducing the compilation time.

    Check the cached files in the cache directory.

    ```shell
    ls /data/local/tmp
    ```

    The command output is as follows:

    ```text
    # 0.nncache cache_info.nncache
    ```

    If the cache is no longer used, manually delete the cache files.

    ```shell
    rm /data/local/tmp/*nncache
    ```
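
    If you keep the cache files, a later compilation of the same model on the same acceleration chip can reuse them. The following is a minimal sketch of such a second run; the function name `RecompileWithCache` is illustrative, the sketch reuses the `BuildModel`, `GetAvailableDevices`, and `CreateCompilation` functions from the How to Develop section, and whether the cached computing graph is actually loaded depends on the cache support of the HDI service.

    ```cpp
    // Sketch of a second compilation run that reuses the cache generated earlier.
    // CreateCompilation() sets the same cache directory (/data/local/tmp) and version (1)
    // before calling OH_NNCompilation_Build, so the runtime can load the cached computing
    // graph instead of recompiling the model, provided the HDI service supports caching.
    int RecompileWithCache()
    {
        OH_NNModel* model = nullptr;
        if (BuildModel(&model) != OH_NN_SUCCESS) {
            return -1;
        }

        std::vector<size_t> availableDevices;
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            OH_NNModel_Destroy(&model);
            return -1;
        }

        OH_NNCompilation* compilation = nullptr;
        OH_NN_ReturnCode ret = CreateCompilation(model, availableDevices, &compilation);

        OH_NNModel_Destroy(&model);
        OH_NNCompilation_Destroy(&compilation);
        return (ret == OH_NN_SUCCESS) ? 0 : -1;
    }
    ```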