# Development Guide for Connecting the Neural Network Runtime to an AI Inference Framework

## When to Use

As a bridge between the AI inference engine and the acceleration chip, the Neural Network Runtime provides simplified Native APIs for the AI inference engine to perform end-to-end inference through the acceleration chip.

This document uses the `Add` single-operator model shown in Figure 1 as an example to describe the development process of the Neural Network Runtime. The `Add` operator involves two inputs, one parameter, and one output. The `activation` parameter specifies the type of the activation function used in the `Add` operator.

**Figure 1** Add single-operator model
!["Add single-operator model"](figures/neural_network_runtime.png)

## Preparing the Environment

### Environment Requirements

The environment requirements for the Neural Network Runtime are as follows:

- System version: OpenHarmony master branch.
- Development environment: Ubuntu 18.04 or later.
- Access device: a standard device running OpenHarmony. The built-in hardware accelerator driver has been connected to the Neural Network Runtime through an HDI API.

The Neural Network Runtime is exposed to external systems through OpenHarmony Native APIs. Therefore, you need to use the OpenHarmony Native development suite to compile Neural Network Runtime applications.

### Environment Setup

1. Start the Ubuntu server.
2. Copy the package of the Native development suite to the root directory of the current user.
3. Decompress the package of the Native development suite.

```shell
unzip native-linux-{version number}.zip
```

The directory structure after decompression is as follows. The content in the directory may vary depending on version iteration. Use the Native APIs of the latest version.

```text
native/
├── build                          // Cross-compilation toolchain
├── build-tools                    // Compilation and build tools
├── docs
├── llvm
├── nativeapi_syscap_config.json
├── ndk_system_capability.json
├── NOTICE.txt
├── oh-uni-package.json
└── sysroot                        // Native API header files and libraries
```

## Available APIs

This section describes the common APIs used in the development process of the Neural Network Runtime.

### Structure

| Name| Description|
| --------- | ---- |
| typedef struct OH_NNModel OH_NNModel | Model handle of the Neural Network Runtime. It is used to construct a model.|
| typedef struct OH_NNCompilation OH_NNCompilation | Compiler handle of the Neural Network Runtime. It is used to compile an AI model.|
| typedef struct OH_NNExecutor OH_NNExecutor | Executor handle of the Neural Network Runtime. It is used to perform inference computing on a specified device.|

### Model Construction APIs

| Name| Description|
| ------- | --- |
| OH_NNModel_Construct() | Creates a model instance of the OH_NNModel type.|
| OH_NN_ReturnCode OH_NNModel_AddTensor(OH_NNModel *model, const OH_NN_Tensor *tensor) | Adds a tensor to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SetTensorData(OH_NNModel *model, uint32_t index, const void *dataBuffer, size_t length) | Sets the tensor value.|
| OH_NN_ReturnCode OH_NNModel_AddOperation(OH_NNModel *model, OH_NN_OperationType op, const OH_NN_UInt32Array *paramIndices, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Adds an operator to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SpecifyInputsAndOutputs(OH_NNModel *model, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Specifies the model input and output.|
| OH_NN_ReturnCode OH_NNModel_Finish(OH_NNModel *model) | Completes model composition.|
| void OH_NNModel_Destroy(OH_NNModel **model) | Destroys a model instance.|

### Model Compilation APIs

| Name| Description|
| ------- | --- |
| OH_NNCompilation *OH_NNCompilation_Construct(const OH_NNModel *model) | Creates a compilation instance of the OH_NNCompilation type.|
| OH_NN_ReturnCode OH_NNCompilation_SetDevice(OH_NNCompilation *compilation, size_t deviceID) | Specifies the device for model compilation and computing.|
| OH_NN_ReturnCode OH_NNCompilation_SetCache(OH_NNCompilation *compilation, const char *cachePath, uint32_t version) | Sets the cache directory and version of the compiled model.|
| OH_NN_ReturnCode OH_NNCompilation_Build(OH_NNCompilation *compilation) | Performs model compilation.|
| void OH_NNCompilation_Destroy(OH_NNCompilation **compilation) | Destroys the OH_NNCompilation instance.|

### Inference Execution APIs

| Name| Description|
| ------- | --- |
| OH_NNExecutor *OH_NNExecutor_Construct(OH_NNCompilation *compilation) | Creates an executor instance of the OH_NNExecutor type.|
| OH_NN_ReturnCode OH_NNExecutor_SetInput(OH_NNExecutor *executor, uint32_t inputIndex, const OH_NN_Tensor *tensor, const void *dataBuffer, size_t length) | Sets the single input data for a model.|
| OH_NN_ReturnCode OH_NNExecutor_SetOutput(OH_NNExecutor *executor, uint32_t outputIndex, void *dataBuffer, size_t length) | Sets the buffer for a single output of a model.|
| OH_NN_ReturnCode OH_NNExecutor_Run(OH_NNExecutor *executor) | Executes model inference.|
| void OH_NNExecutor_Destroy(OH_NNExecutor **executor) | Destroys the OH_NNExecutor instance to release the memory occupied by the instance.|

### Device Management APIs

| Name| Description|
| ------- | --- |
| OH_NN_ReturnCode OH_NNDevice_GetAllDevicesID(const size_t **allDevicesID, uint32_t *deviceCount) | Obtains the IDs of the devices connected to the Neural Network Runtime.|

## How to Develop

The development process of the Neural Network Runtime consists of three phases: model construction, model compilation, and inference execution. The following uses the `Add` single-operator model as an example to describe how to call Neural Network Runtime APIs during application development.

1. Create an application sample file.

    Create the source file of the Neural Network Runtime application sample. Run the following commands in the project directory to create the `nnrt_example/` directory and the `nnrt_example.cpp` source file in it:

    ```shell
    mkdir ~/nnrt_example && cd ~/nnrt_example
    touch nnrt_example.cpp
    ```

2. Import the Neural Network Runtime module.

    Add the following code at the beginning of the `nnrt_example.cpp` file to import the Neural Network Runtime module:

    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "neural_network_runtime/neural_network_runtime.h"

    // Constant used to specify the byte length of the input and output data (12 float32 values of 4 bytes each).
    const size_t DATA_LENGTH = 4 * 12;
    ```

3. Construct a model.

    Use Neural Network Runtime APIs to construct an `Add` single-operator sample model.

    ```cpp
    OH_NN_ReturnCode BuildModel(OH_NNModel** pModel)
    {
        // Create a model instance and construct a model.
        OH_NNModel* model = OH_NNModel_Construct();
        if (model == nullptr) {
            std::cout << "Create model failed." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Add the first input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor input1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_ReturnCode ret = OH_NNModel_AddTensor(model, &input1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of first input failed." << std::endl;
            return ret;
        }

        // Add the second input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor input2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &input2);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of second input failed." << std::endl;
            return ret;
        }

        // Add the Tensor parameter of the Add operator. This parameter is used to specify the type of the activation function. The data type of the Tensor parameter is int8.
        int32_t activationDims = 1;
        int8_t activationValue = OH_NN_FUSED_NONE;
        OH_NN_Tensor activation = {OH_NN_INT8, 1, &activationDims, nullptr, OH_NN_ADD_ACTIVATIONTYPE};
        ret = OH_NNModel_AddTensor(model, &activation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of activation failed." << std::endl;
            return ret;
        }

        // Set the type of the activation function to OH_NN_FUSED_NONE, indicating that no activation function is added to the operator.
        ret = OH_NNModel_SetTensorData(model, 2, &activationValue, sizeof(int8_t));
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, set value of activation failed." << std::endl;
            return ret;
        }

        // Set the output of the Add operator. The data type is float32 and the tensor shape is [1, 2, 2, 3].
        OH_NN_Tensor output = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        ret = OH_NNModel_AddTensor(model, &output);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add Tensor of output failed." << std::endl;
            return ret;
        }

        // Specify the input, parameter, and output indexes of the Add operator.
        uint32_t inputIndicesValues[2] = {0, 1};
        uint32_t paramIndicesValues = 2;
        uint32_t outputIndicesValues = 3;
        OH_NN_UInt32Array paramIndices = {&paramIndicesValues, 1};
        OH_NN_UInt32Array inputIndices = {inputIndicesValues, 2};
        OH_NN_UInt32Array outputIndices = {&outputIndicesValues, 1};

        // Add the Add operator to the model instance.
        ret = OH_NNModel_AddOperation(model, OH_NN_OPS_ADD, &paramIndices, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, add operation failed." << std::endl;
            return ret;
        }

        // Set the input and output indexes of the model instance.
        ret = OH_NNModel_SpecifyInputsAndOutputs(model, &inputIndices, &outputIndices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, specify inputs and outputs failed." << std::endl;
            return ret;
        }

        // Complete the model instance construction.
        ret = OH_NNModel_Finish(model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed, error happened when finishing model construction." << std::endl;
            return ret;
        }

        *pModel = model;
        return OH_NN_SUCCESS;
    }
    ```

4. Query the acceleration chips connected to the Neural Network Runtime.

    The Neural Network Runtime can connect to multiple acceleration chips through HDI APIs. Before model compilation, you need to query the acceleration chips connected to the Neural Network Runtime on the current device. Each acceleration chip has a unique ID. In the compilation phase, you need to specify the chip for model compilation based on the device ID.

    ```cpp
    void GetAvailableDevices(std::vector<size_t>& availableDevice)
    {
        availableDevice.clear();

        // Obtain the available hardware IDs.
        const size_t* devices = nullptr;
        uint32_t deviceCount = 0;
        OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "GetAllDevicesID failed, get no available device." << std::endl;
            return;
        }

        for (uint32_t i = 0; i < deviceCount; i++) {
            availableDevice.emplace_back(devices[i]);
        }
    }
    ```

5. Compile the model on the specified device.

    The Neural Network Runtime uses an abstract model expression to describe the topology of an AI model. Before inference can be executed on an acceleration chip, the compilation module provided by the Neural Network Runtime delivers the abstract model expression to the chip driver layer, where it is converted into a format that supports inference and computing.

    ```cpp
    OH_NN_ReturnCode CreateCompilation(OH_NNModel* model, const std::vector<size_t>& availableDevice, OH_NNCompilation** pCompilation)
    {
        // Create a compilation instance to pass the model to the underlying hardware for compilation.
        OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
        if (compilation == nullptr) {
            std::cout << "CreateCompilation failed, error happened when creating compilation." << std::endl;
            return OH_NN_MEMORY_ERROR;
        }

        // Set compilation options, such as the compilation hardware, cache path, performance mode, computing priority, and whether to enable float16 low-precision computing. (A sketch of the optional settings follows this function.)

        // Choose to perform model compilation on the first device.
        OH_NN_ReturnCode ret = OH_NNCompilation_SetDevice(compilation, availableDevice[0]);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting device." << std::endl;
            return ret;
        }

        // Have the model compilation result cached in the /data/local/tmp directory, with the version number set to 1.
        ret = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when setting cache path." << std::endl;
            return ret;
        }

        // Start model compilation.
        ret = OH_NNCompilation_Build(compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed, error happened when building compilation." << std::endl;
            return ret;
        }

        *pCompilation = compilation;
        return OH_NN_SUCCESS;
    }
    ```
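
    Besides the device and cache shown above, the compilation instance also accepts optional settings for performance mode, computing priority, and float16 low-precision computing. The following is a minimal sketch of these optional calls. It assumes the `OH_NNCompilation_SetPerformanceMode`, `OH_NNCompilation_SetPriority`, and `OH_NNCompilation_EnableFloat16` APIs and the related enum values declared in `neural_network_runtime.h`; verify the exact names against the header shipped in your Native development suite.

    ```cpp
    // Sketch only: optional compilation settings, applied before OH_NNCompilation_Build().
    OH_NN_ReturnCode SetOptionalCompilationOptions(OH_NNCompilation* compilation)
    {
        // Prefer high performance on the selected device (assumed enum value OH_NN_PERFORMANCE_HIGH).
        OH_NN_ReturnCode ret = OH_NNCompilation_SetPerformanceMode(compilation, OH_NN_PERFORMANCE_HIGH);
        if (ret != OH_NN_SUCCESS) {
            return ret;
        }

        // Give the compiled model high scheduling priority (assumed enum value OH_NN_PRIORITY_HIGH).
        ret = OH_NNCompilation_SetPriority(compilation, OH_NN_PRIORITY_HIGH);
        if (ret != OH_NN_SUCCESS) {
            return ret;
        }

        // Keep float32 computing; pass true instead to allow float16 low-precision computing.
        return OH_NNCompilation_EnableFloat16(compilation, false);
    }
    ```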

6. Create an executor.

    After the model compilation is complete, you need to call the execution module of the Neural Network Runtime to create an inference executor. In the execution phase, operations such as setting the model input, obtaining the model output, and triggering inference computing are performed through the executor.

    ```cpp
    OH_NNExecutor* CreateExecutor(OH_NNCompilation* compilation)
    {
        // Create an executor instance.
        OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
        return executor;
    }
    ```


7. Perform inference computing and print the computing result.

    The input data required for inference computing is passed to the executor through the APIs provided by the execution module. The executor is then triggered to perform inference computing once and obtain the result. (An optional verification sketch follows the function below.)

    ```cpp
    OH_NN_ReturnCode Run(OH_NNExecutor* executor)
    {
        // Construct sample data.
        float input1[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        float input2[12] = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22};

        int32_t inputDims[4] = {1, 2, 2, 3};
        OH_NN_Tensor inputTensor1 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};
        OH_NN_Tensor inputTensor2 = {OH_NN_FLOAT32, 4, inputDims, nullptr, OH_NN_TENSOR};

        // Set the execution inputs.

        // Set the first input for execution. The input data is specified by input1.
        OH_NN_ReturnCode ret = OH_NNExecutor_SetInput(executor, 0, &inputTensor1, input1, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting first input." << std::endl;
            return ret;
        }

        // Set the second input for execution. The input data is specified by input2.
        ret = OH_NNExecutor_SetInput(executor, 1, &inputTensor2, input2, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting second input." << std::endl;
            return ret;
        }

        // Set the output buffer. After OH_NNExecutor_Run performs inference computing, the result is written to output.
        float output[12];
        ret = OH_NNExecutor_SetOutput(executor, 0, output, DATA_LENGTH);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened when setting output buffer." << std::endl;
            return ret;
        }

        // Perform inference computing.
        ret = OH_NNExecutor_Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed, error happened during execution." << std::endl;
            return ret;
        }

        // Print the output result.
        for (uint32_t i = 0; i < 12; i++) {
            std::cout << "Output index: " << i << ", value is: " << output[i] << "." << std::endl;
        }

        return OH_NN_SUCCESS;
    }
    ```
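
    Because the `Add` operator simply sums the two input tensors elementwise, you can optionally verify the result in code instead of reading the printed values. The following is a minimal sketch; `VerifyAddResult` is a hypothetical helper, not part of the sample above.

    ```cpp
    #include <cmath>
    #include <cstddef>

    // Hypothetical helper: checks that every output element equals the elementwise
    // sum of the two inputs, within a small floating-point tolerance.
    bool VerifyAddResult(const float* input1, const float* input2, const float* output, size_t count)
    {
        for (size_t i = 0; i < count; ++i) {
            if (std::fabs((input1[i] + input2[i]) - output[i]) > 1e-5f) {
                return false;
            }
        }
        return true;
    }
    ```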

8. Build an end-to-end process from model construction to model compilation and execution.

    Steps 3 to 7 implement the model construction, compilation, and execution processes and encapsulate them into separate functions to facilitate modular development. The following sample code shows how to combine these functions into a complete Neural Network Runtime development process.

    ```cpp
    int main()
    {
        OH_NNModel* model = nullptr;
        OH_NNCompilation* compilation = nullptr;
        OH_NNExecutor* executor = nullptr;
        std::vector<size_t> availableDevices;

        // Perform model construction.
        OH_NN_ReturnCode ret = BuildModel(&model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Obtain the available devices.
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            std::cout << "No available device." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Perform model compilation.
        ret = CreateCompilation(model, availableDevices, &compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Create an inference executor for the model.
        executor = CreateExecutor(compilation);
        if (executor == nullptr) {
            std::cout << "CreateExecutor failed, no executor is created." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Use the created executor to perform single-step inference computing.
        ret = Run(executor);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            OH_NNExecutor_Destroy(&executor);
            return -1;
        }

        // Destroy the model, compilation, and executor to release occupied resources.
        OH_NNModel_Destroy(&model);
        OH_NNCompilation_Destroy(&compilation);
        OH_NNExecutor_Destroy(&executor);

        return 0;
    }
    ```

## Verification

1. Prepare the compilation configuration file of the application sample.

    Create a `CMakeLists.txt` file, and add compilation configurations for the application sample file `nnrt_example.cpp`. The following is a simple example of the `CMakeLists.txt` file:

    ```text
    cmake_minimum_required(VERSION 3.16)
    project(nnrt_example C CXX)

    add_executable(nnrt_example
        ./nnrt_example.cpp
    )

    target_link_libraries(nnrt_example
        neural_network_runtime.z
    )
    ```

2. Compile the application sample.

    Create the **build/** directory in the current directory, and compile `nnrt_example.cpp` in the **build/** directory to obtain the binary file `nnrt_example`:

    ```shell
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE={path of the cross-compilation toolchain}/build/cmake/ohos.toolchain.cmake -DOHOS_ARCH=arm64-v8a -DOHOS_PLATFORM=OHOS -DOHOS_STL=c++_static ..
    make
    ```

3. Push the application sample to the device for execution.

    ```shell
    # Push the nnrt_example binary obtained through compilation to the device.
    hdc_std file send ./nnrt_example /data/local/tmp/.

    # Grant the execute permission to the test binary.
    hdc_std shell "chmod +x /data/local/tmp/nnrt_example"

    # Execute the test case.
    hdc_std shell "/data/local/tmp/nnrt_example"
    ```

    If the execution is normal, information similar to the following is displayed:

    ```text
    Output index: 0, value is: 11.000000.
    Output index: 1, value is: 13.000000.
    Output index: 2, value is: 15.000000.
    Output index: 3, value is: 17.000000.
    Output index: 4, value is: 19.000000.
    Output index: 5, value is: 21.000000.
    Output index: 6, value is: 23.000000.
    Output index: 7, value is: 25.000000.
    Output index: 8, value is: 27.000000.
    Output index: 9, value is: 29.000000.
    Output index: 10, value is: 31.000000.
    Output index: 11, value is: 33.000000.
    ```

4. (Optional) Check the model cache.

    If the HDI service connected to the Neural Network Runtime supports the model cache function, you can find the generated cache files in the `/data/local/tmp` directory after `nnrt_example` is executed successfully.

    > **NOTE**
    >
    > The IR graphs of the model need to be passed to the hardware driver layer, so that the HDI service can compile them into a computing graph dedicated to the hardware. This compilation process is time-consuming. The Neural Network Runtime therefore supports caching: it can store the computing graphs compiled by the HDI service on the device. If the same model is compiled on the same acceleration chip again, you can specify the cache path so that the Neural Network Runtime directly loads the computing graphs from the cache files, reducing the compilation time.

    Check the cached files in the cache directory.

    ```shell
    ls /data/local/tmp
    ```

    The command output is as follows:

    ```text
    0.nncache  cache_info.nncache
    ```

    If the cache is no longer used, manually delete the cache files.

    ```shell
    rm /data/local/tmp/*nncache
    ```