# Using MindSpore Lite for Speech Recognition (C/C++)

<!--Kit: MindSpore Lite Kit-->
<!--Subsystem: AI-->
<!--Owner: @zhuguodong8-->
<!--Designer: @zhuguodong8; @jjfeing-->
<!--Tester: @principal87-->
<!--Adviser: @ge-yafang-->

## When to Use

You can use [MindSpore Lite](../../reference/apis-mindspore-lite-kit/capi-mindspore.md) to quickly deploy AI algorithms into your application and perform on-device model inference for speech recognition.

Speech recognition converts an audio file into text. It is widely used in intelligent voice assistants, voice input, and voice search.

## Basic Concepts

- N-API: a set of native APIs for building ArkTS native modules. With N-API, you can encapsulate a C/C++ library as an ArkTS module.

## How to Develop

1. Select a speech recognition model.
2. Use MindSpore Lite to run on-device inference with the model to implement speech recognition.

## Environment Setup

Install DevEco Studio 5.0.2 or later, and update the SDK to API version 14 or later.

## Development Procedure

This section uses the inference of a speech recognition model as an example to demonstrate how to implement a speech recognition application with MindSpore Lite.

### Selecting an Appropriate Model

The speech recognition model files **tiny-encoder.ms**, **tiny-decoder-main.ms**, and **tiny-decoder-loop.ms** used in this sample application are stored in the **entry/src/main/resources/rawfile** directory.
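
The sample also reads several auxiliary assets from the same directory later in this guide. The following layout is assumed here, based on the file names used in the sample code:

```text
entry/src/main/resources/rawfile
├── tiny-encoder.ms               # Whisper tiny encoder model
├── tiny-decoder-main.ms          # Decoder for the initial prompt pass
├── tiny-decoder-loop.ms          # Decoder for the autoregressive loop
├── tiny-positional_embedding.bin # Positional embedding data
├── tiny-tokens.txt               # Token table for decoding output token IDs
└── zh.wav                        # Sample audio clip
```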

### Writing the Code for Audio Playback

1. Call [@ohos.multimedia.media](../../reference/apis-media-kit/arkts-apis-media.md) and [@ohos.multimedia.audio](../../reference/apis-audio-kit/arkts-apis-audio.md) to play audio.

   ```ts
   // player.ets
   import { media } from '@kit.MediaKit';
   import { common } from '@kit.AbilityKit';
   import { BusinessError } from '@kit.BasicServicesKit';
   import { audio } from '@kit.AudioKit';
   import { UIContext } from '@kit.ArkUI';

   export default class AVPlayerDemo {
     private isSeek: boolean = false; // Whether the seek operation is enabled.
     // Set the AVPlayer callbacks.
     setAVPlayerCallback(avPlayer: media.AVPlayer) {
       // Callback for the seek operation.
       avPlayer.on('seekDone', (seekDoneTime: number) => {
         console.info(`MS_LITE_LOG: AVPlayer seek succeeded, seek time is ${seekDoneTime}`);
       });
       // Callback invoked if an error occurs while the AVPlayer is playing audio. In such a case, reset() is called to reset the AVPlayer.
       avPlayer.on('error', (err: BusinessError) => {
         console.error(`MS_LITE_LOG: Invoke avPlayer failed, code is ${err.code}, message is ${err.message}`);
         avPlayer.reset(); // Call reset() to reset the AVPlayer, which then enters the idle state.
       });
       // Callback for state changes.
       avPlayer.on('stateChange', async (state: string, reason: media.StateChangeReason) => {
         switch (state) {
           case 'idle': // This state is reported upon a successful callback of reset().
             console.info('MS_LITE_LOG: AVPlayer state idle called.');
             avPlayer.release(); // Call release() to release the instance.
             break;
           case 'initialized': // This state is reported when the AVPlayer sets the playback source.
             console.info('MS_LITE_LOG: AVPlayer state initialized called.');
             avPlayer.audioRendererInfo = {
               usage: audio.StreamUsage.STREAM_USAGE_MUSIC, // Audio stream usage type: music. Set this parameter based on the service scenario.
               rendererFlags: 0 // Audio renderer flag.
             };
             avPlayer.prepare();
             break;
           case 'prepared': // This state is reported upon a successful callback of prepare().
             console.info('MS_LITE_LOG: AVPlayer state prepared called.');
             avPlayer.play(); // Call play() to start playback.
             break;
           case 'playing': // This state is reported upon a successful callback of play().
             console.info('MS_LITE_LOG: AVPlayer state playing called.');
             if (this.isSeek) {
               console.info('MS_LITE_LOG: AVPlayer start to seek.');
               avPlayer.seek(0); // Move the playback position to the beginning of the audio.
             } else {
               // When the seek operation is not supported, playback continues until it reaches the end.
               console.info('MS_LITE_LOG: AVPlayer wait to play end.');
             }
             break;
           case 'paused': // This state is reported upon a successful callback of pause().
             console.info('MS_LITE_LOG: AVPlayer state paused called.');
             setTimeout(() => {
               console.info('MS_LITE_LOG: AVPlayer paused wait to play again');
               avPlayer.play(); // After playback has been paused for 3 seconds, call play() again to resume playback.
             }, 3000);
             break;
           case 'completed': // This state is reported upon the completion of playback.
             console.info('MS_LITE_LOG: AVPlayer state completed called.');
             avPlayer.stop(); // Call stop() to stop the playback.
             break;
           case 'stopped': // This state is reported upon a successful callback of stop().
             console.info('MS_LITE_LOG: AVPlayer state stopped called.');
             avPlayer.reset(); // Call reset() to reset the AVPlayer.
             break;
           case 'released':
             console.info('MS_LITE_LOG: AVPlayer state released called.');
             break;
           default:
             console.info('MS_LITE_LOG: AVPlayer state unknown called.');
             break;
         }
       });
     }

     // Use the resource management API to obtain the audio file and play it through the fdSrc attribute.
     async avPlayerFdSrcDemo() {
       // Create an AVPlayer instance.
       let avPlayer: media.AVPlayer = await media.createAVPlayer();
       // Register the state-change callbacks.
       this.setAVPlayerCallback(avPlayer);
       // Call getRawFd of the resourceManager member of UIAbilityContext to obtain the media asset descriptor.
       // The return type is {fd, offset, length}, where fd is the file descriptor of the HAP file, offset is the offset of the media asset within it, and length is the byte length of the media asset.
       let context = new UIContext().getHostContext() as common.UIAbilityContext;
       let fileDescriptor = await context.resourceManager.getRawFd('zh.wav');
       let avFileDescriptor: media.AVFileDescriptor =
         { fd: fileDescriptor.fd, offset: fileDescriptor.offset, length: fileDescriptor.length };
       this.isSeek = true; // Enable the seek operation.
       // Assign a value to fdSrc to trigger the reporting of the initialized state.
       avPlayer.fdSrc = avFileDescriptor;
     }
   }
   ```

### Writing the Code for Speech Recognition

Call [MindSpore Lite](../../reference/apis-mindspore-lite-kit/capi-mindspore.md) to run inference on the three models in sequence. The inference process is as follows:

1. Include the corresponding header files. The third-party dependencies **librosa**, **libsamplerate**, **AudioFile.h**, and **base64.h** come from [LibrosaCpp](https://github.com/ewan-xu/LibrosaCpp), [libsamplerate](https://github.com/libsndfile/libsamplerate), the AudioFile project, and [whisper.axera](https://github.com/ml-inory/whisper.axera/tree/main/cpp/src), respectively.

   ```c++
   #include "AudioFile.h"
   #include "base64.h"
   #include "napi/native_api.h"
   #include "utils.h"
   #include <algorithm>
   #include <cstdlib>
   #include <cstring>
   #include <fstream>
   #include <hilog/log.h>
   #include <iostream>
   #include <librosa/librosa.h>
   #include <mindspore/context.h>
   #include <mindspore/model.h>
   #include <mindspore/status.h>
   #include <mindspore/tensor.h>
   #include <mindspore/types.h>
   #include <numeric>
   #include <rawfile/raw_file_manager.h>
   #include <sstream>
   #include <vector>
   ```

2. Read the required files, such as the audio file and model files, and convert them into buffers.

   ```c++
   #define LOGI(...) ((void)OH_LOG_Print(LOG_APP, LOG_INFO, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGD(...) ((void)OH_LOG_Print(LOG_APP, LOG_DEBUG, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGW(...) ((void)OH_LOG_Print(LOG_APP, LOG_WARN, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGE(...) ((void)OH_LOG_Print(LOG_APP, LOG_ERROR, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))

   // A binary buffer is represented as a (data pointer, size) pair.
   using BinBuffer = std::pair<void *, size_t>;

   BinBuffer ReadBinFile(NativeResourceManager *nativeResourceManager, const std::string &modelName)
   {
       auto rawFile = OH_ResourceManager_OpenRawFile(nativeResourceManager, modelName.c_str());
       if (rawFile == nullptr) {
           LOGE("MS_LITE_ERR: Open model file failed");
           return BinBuffer(nullptr, 0);
       }
       long fileSize = OH_ResourceManager_GetRawFileSize(rawFile);
       if (fileSize <= 0) {
           LOGE("MS_LITE_ERR: FileSize not correct");
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       void *buffer = malloc(fileSize);
       if (buffer == nullptr) {
           LOGE("MS_LITE_ERR: malloc failed");
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       int ret = OH_ResourceManager_ReadRawFile(rawFile, buffer, fileSize);
       if (ret == 0) {
           LOGE("MS_LITE_ERR: OH_ResourceManager_ReadRawFile failed");
           free(buffer);
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       OH_ResourceManager_CloseRawFile(rawFile);
       return BinBuffer(buffer, fileSize);
   }

   // Read the token table file. The logic is identical to ReadBinFile.
   BinBuffer ReadTokens(NativeResourceManager *nativeResourceManager, const std::string &modelName) {
       return ReadBinFile(nativeResourceManager, modelName);
   }
   ```

3. Create a context, set the device type, and load the model.

   ```c++
   void DestroyModelBuffer(void **buffer)
   {
       if (buffer == nullptr) {
           return;
       }
       free(*buffer);
       *buffer = nullptr;
   }

   OH_AI_ModelHandle CreateMSLiteModel(BinBuffer &bin)
   {
       // Create and configure the context for model inference.
       auto context = OH_AI_ContextCreate();
       if (context == nullptr) {
           DestroyModelBuffer(&bin.first);
           LOGE("MS_LITE_ERR: Create MSLite context failed.\n");
           return nullptr;
       }
       // Run on the CPU with FP16 inference disabled.
       auto cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
       OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
       OH_AI_ContextAddDeviceInfo(context, cpu_device_info);

       // Create a model.
       auto model = OH_AI_ModelCreate();
       if (model == nullptr) {
           DestroyModelBuffer(&bin.first);
           LOGE("MS_LITE_ERR: Allocate MSLite Model failed.\n");
           return nullptr;
       }

       // Load and build the inference model. The model type is OH_AI_MODELTYPE_MINDIR.
       auto build_ret = OH_AI_ModelBuild(model, bin.first, bin.second, OH_AI_MODELTYPE_MINDIR, context);
       DestroyModelBuffer(&bin.first);
       if (build_ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&model);
           LOGE("MS_LITE_ERR: Build MSLite model failed.\n");
           return nullptr;
       }
       LOGI("MS_LITE_LOG: Build MSLite model success.\n");
       return model;
   }
   ```

4. Set the model input data, and perform model inference.

   ```c++
   constexpr int K_NUM_PRINT_OF_OUT_DATA = 20;

   // Copy a buffer into a model input tensor after checking that the sizes match.
   int FillInputTensor(OH_AI_TensorHandle input, const BinBuffer &bin)
   {
       if (OH_AI_TensorGetDataSize(input) != bin.second) {
           return OH_AI_STATUS_LITE_INPUT_PARAM_INVALID;
       }
       char *data = (char *)OH_AI_TensorGetMutableData(input);
       memcpy(data, (const char *)bin.first, OH_AI_TensorGetDataSize(input));
       return OH_AI_STATUS_SUCCESS;
   }

   // Perform model inference.
   int RunMSLiteModel(OH_AI_ModelHandle model, std::vector<BinBuffer> inputBins)
   {
       // Set the model input data.
       auto inputs = OH_AI_ModelGetInputs(model);
       for (size_t i = 0; i < inputBins.size(); i++) {
           auto ret = FillInputTensor(inputs.handle_list[i], inputBins[i]);
           if (ret != OH_AI_STATUS_SUCCESS) {
               LOGE("MS_LITE_ERR: set input %{public}d error.\n", static_cast<int>(i));
               return OH_AI_STATUS_LITE_ERROR;
           }
       }

       // Obtain the output tensors of the model.
       auto outputs = OH_AI_ModelGetOutputs(model);

       // Run model inference.
       auto predict_ret = OH_AI_ModelPredict(model, inputs, &outputs, nullptr, nullptr);
       if (predict_ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&model);
           LOGE("MS_LITE_ERR: MSLite Predict error.\n");
           return OH_AI_STATUS_LITE_ERROR;
       }
       LOGD("MS_LITE_LOG: Run MSLite model Predict success.\n");

       // Print the output data.
       LOGD("MS_LITE_LOG: Get model outputs:\n");
       for (size_t i = 0; i < outputs.handle_num; i++) {
           auto tensor = outputs.handle_list[i];
           LOGD("MS_LITE_LOG: - Tensor %{public}d name is: %{public}s.\n", static_cast<int>(i),
                OH_AI_TensorGetName(tensor));
           LOGD("MS_LITE_LOG: - Tensor %{public}d size is: %{public}d.\n", static_cast<int>(i),
                (int)OH_AI_TensorGetDataSize(tensor));
           LOGD("MS_LITE_LOG: - Tensor data is:\n");
           auto out_data = reinterpret_cast<const float *>(OH_AI_TensorGetData(tensor));
           std::stringstream outStr;
           for (int j = 0; (j < OH_AI_TensorGetElementNum(tensor)) && (j <= K_NUM_PRINT_OF_OUT_DATA); j++) {
               outStr << out_data[j] << " ";
           }
           LOGD("MS_LITE_LOG: %{public}s", outStr.str().c_str());
       }
       return OH_AI_STATUS_SUCCESS;
   }
   ```

5. Run the remaining models following the same procedure. The main decoder runs once on the encoder outputs; the loop decoder then generates one token per iteration by greedy argmax until the end-of-text token is produced.

   ```c++
   const float NEG_INF = -std::numeric_limits<float>::infinity();
   // Special token IDs of the multilingual Whisper tokenizer.
   const int WHISPER_SOT = 50258;           // <|startoftranscript|>
   const int WHISPER_TRANSCRIBE = 50359;    // <|transcribe|>
   const int WHISPER_TRANSLATE = 50358;     // <|translate|>
   const int WHISPER_NO_TIMESTAMPS = 50363; // <|notimestamps|>
   const int WHISPER_EOT = 50257;           // <|endoftext|>
   const int WHISPER_BLANK = 220;           // Blank-space token
   const int WHISPER_NO_SPEECH = 50362;     // <|nospeech|>
   const int WHISPER_N_TEXT_CTX = 448;      // Maximum text context length
   const int WHISPER_N_TEXT_STATE = 384;    // Hidden state size of the tiny model
   constexpr int WHISPER_SAMPLE_RATE = 16000;

   BinBuffer GetMSOutput(OH_AI_TensorHandle output) {
       float *outputData = reinterpret_cast<float *>(OH_AI_TensorGetMutableData(output));
       size_t size = OH_AI_TensorGetDataSize(output);
       return {outputData, size};
   }

   // Prevent unwanted tokens from being selected by setting their logits to -inf.
   void SuppressTokens(BinBuffer &logits, bool is_initial) {
       auto logits_data = static_cast<float *>(logits.first);
       if (is_initial) {
           logits_data[WHISPER_EOT] = NEG_INF;
           logits_data[WHISPER_BLANK] = NEG_INF;
       }

       // Suppress other tokens.
       logits_data[WHISPER_NO_TIMESTAMPS] = NEG_INF;
       logits_data[WHISPER_SOT] = NEG_INF;
       logits_data[WHISPER_NO_SPEECH] = NEG_INF;
       logits_data[WHISPER_TRANSLATE] = NEG_INF;
   }

   std::vector<int> LoopPredict(const OH_AI_ModelHandle model, const BinBuffer &n_layer_cross_k,
                                const BinBuffer &n_layer_cross_v, const BinBuffer &logits_init,
                                BinBuffer &out_n_layer_self_k_cache, BinBuffer &out_n_layer_self_v_cache,
                                const BinBuffer &data_embedding, const int loop, const int offset_init) {
       // 51865 is the vocabulary size of the multilingual Whisper model.
       BinBuffer logits{nullptr, 51865 * sizeof(float)};
       logits.first = malloc(logits.second);
       if (!logits.first) {
           LOGE("MS_LITE_ERR: Fail to malloc!\n");
           return {};
       }
       // Copy the logits of the last of the four prompt tokens.
       void *logits_init_src = static_cast<char *>(logits_init.first) + 51865 * 3 * sizeof(float);
       memcpy(logits.first, logits_init_src, logits.second);
       SuppressTokens(logits, true);

       // Greedy decoding: select the token with the highest logit.
       std::vector<int> output_token;
       float *logits_data = static_cast<float *>(logits.first);
       int max_token_id = 0;
       float max_token = logits_data[0];
       int vocab_size = static_cast<int>(logits.second / sizeof(float));
       for (int i = 0; i < vocab_size; i++) {
           if (logits_data[i] > max_token) {
               max_token_id = i;
               max_token = logits_data[i];
           }
       }
       free(logits.first); // The copied initial logits are no longer needed.
       logits.first = nullptr;

       int offset = offset_init;
       BinBuffer slice{nullptr, 0};
       slice.second = WHISPER_N_TEXT_STATE * sizeof(float);
       slice.first = malloc(slice.second);
       if (!slice.first) {
           LOGE("MS_LITE_ERR: Fail to malloc!\n");
           return {};
       }

       auto out_n_layer_self_k_cache_new = out_n_layer_self_k_cache;
       auto out_n_layer_self_v_cache_new = out_n_layer_self_v_cache;

       for (int i = 0; i < loop; i++) {
           if (max_token_id == WHISPER_EOT) {
               break;
           }
           output_token.push_back(max_token_id);
           // Build the attention mask: positions beyond the current offset are set to -inf.
           std::vector<float> mask(WHISPER_N_TEXT_CTX, 0.0f);
           for (int j = 0; j < WHISPER_N_TEXT_CTX - offset - 1; ++j) {
               mask[j] = NEG_INF;
           }
           BinBuffer tokens{&max_token_id, sizeof(int)};

           // Slice the positional embedding at the current offset.
           void *data_embedding_src =
               static_cast<char *>(data_embedding.first) + offset * WHISPER_N_TEXT_STATE * sizeof(float);
           memcpy(slice.first, data_embedding_src, slice.second);
           BinBuffer mask_bin(mask.data(), mask.size() * sizeof(float));
           int ret = RunMSLiteModel(model, {tokens, out_n_layer_self_k_cache_new, out_n_layer_self_v_cache_new,
                                            n_layer_cross_k, n_layer_cross_v, slice, mask_bin});
           if (ret != OH_AI_STATUS_SUCCESS) {
               break;
           }

           auto outputs = OH_AI_ModelGetOutputs(model);
           logits = GetMSOutput(outputs.handle_list[0]);
           out_n_layer_self_k_cache_new = GetMSOutput(outputs.handle_list[1]);
           out_n_layer_self_v_cache_new = GetMSOutput(outputs.handle_list[2]);
           offset++;
           SuppressTokens(logits, false);
           logits_data = static_cast<float *>(logits.first);
           max_token_id = 0;
           max_token = logits_data[0];
           vocab_size = static_cast<int>(logits.second / sizeof(float));
           for (int j = 0; j < vocab_size; j++) {
               if (logits_data[j] > max_token) {
                   max_token_id = j;
                   max_token = logits_data[j];
               }
           }
           LOGI("MS_LITE_LOG: run decoder loop %{public}d ok!\n token = %{public}d", i, max_token_id);
       }
       free(slice.first);
       return output_token;
   }

   // Each line of the token table is a base64-encoded token followed by its ID; keep the part before the space.
   std::vector<std::string> ProcessDataLines(const BinBuffer token_txt) {
       void *data_ptr = token_txt.first;
       size_t data_size = token_txt.second;
       std::vector<std::string> tokens;

       const char *char_data = static_cast<const char *>(data_ptr);
       std::stringstream ss(std::string(char_data, char_data + data_size));
       std::string line;
       while (std::getline(ss, line)) {
           size_t space_pos = line.find(' ');
           tokens.push_back(line.substr(0, space_pos));
       }
       return tokens;
   }

   static napi_value RunDemo(napi_env env, napi_callback_info info)
   {
       // Run the sample inference end to end.
       napi_value error_ret;
       napi_create_int32(env, -1, &error_ret);
       size_t argc = 1;
       napi_value argv[1] = {nullptr};
       napi_get_cb_info(env, info, &argc, argv, nullptr, nullptr);
       auto resourcesManager = OH_ResourceManager_InitNativeResourceManager(env, argv[0]);

       // Data preprocessing
       AudioFile<float> audioFile;
       std::string filePath = "zh.wav";
       auto audioBin = ReadBinFile(resourcesManager, filePath);
       if (audioBin.first == nullptr) {
           LOGE("MS_LITE_ERR: Fail to read %{public}s!", filePath.c_str());
           return error_ret;
       }
       size_t dataSize = audioBin.second;
       uint8_t *dataBuffer = (uint8_t *)audioBin.first;
       bool ok = audioFile.loadFromMemory(std::vector<uint8_t>(dataBuffer, dataBuffer + dataSize));
       if (!ok) {
           LOGE("MS_LITE_ERR: Fail to parse %{public}s!", filePath.c_str());
           return error_ret;
       }
       // Resample to the 16 kHz sampling rate expected by Whisper.
       std::vector<float> data(audioFile.samples[0]);
       ResampleAudio(data, audioFile.getSampleRate(), WHISPER_SAMPLE_RATE, 1, SRC_SINC_BEST_QUALITY);
       std::vector<float> audio(data);

       int padding = 480000; // Pad with 30 s of silence (30 x 16000 samples).
       int sr = 16000;       // Sampling rate.
       int n_fft = 480;      // FFT window size (480 samples = 30 ms at 16 kHz).
       int n_hop = 160;      // Hop size (160 samples = 10 ms at 16 kHz).
       int n_mel = 80;       // Number of mel bands.
       int fmin = 0;         // Minimum frequency. The default value is 0.0 Hz.
       int fmax = sr / 2.0;  // Maximum frequency. The default value is half of the sampling rate (sr/2.0).
       audio.insert(audio.end(), padding, 0.0f);
       std::vector<std::vector<float>> mels_T =
           librosa::Feature::melspectrogram(audio, sr, n_fft, n_hop, "hann", true, "reflect", 2.f, n_mel, fmin, fmax);

       std::vector<std::vector<float>> mels = TransposeMel(mels_T);
       ProcessMelSpectrogram(mels);

       // Flatten the mel spectrogram into the encoder input buffer.
       std::vector<float> inputMels(mels.size() * mels[0].size(), 0);
       for (size_t i = 0; i < mels.size(); i++) {
           std::copy(mels[i].begin(), mels[i].end(), inputMels.begin() + i * mels[0].size());
       }

       BinBuffer inputMelsBin(inputMels.data(), inputMels.size() * sizeof(float));

       // Run inference on tiny-encoder.ms.
       auto encoderBin = ReadBinFile(resourcesManager, "tiny-encoder.ms");
       if (encoderBin.first == nullptr) {
           free(dataBuffer);
           dataBuffer = nullptr;
           return error_ret;
       }

       auto encoder = CreateMSLiteModel(encoderBin);

       int ret = RunMSLiteModel(encoder, {inputMelsBin});
       if (ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&encoder);
           return error_ret;
       }
       LOGI("MS_LITE_LOG: run encoder ok!\n");

       // The encoder outputs the cross-attention keys and values consumed by both decoders.
       auto outputs = OH_AI_ModelGetOutputs(encoder);
       auto n_layer_cross_k = GetMSOutput(outputs.handle_list[0]);
       auto n_layer_cross_v = GetMSOutput(outputs.handle_list[1]);

       // Run inference on tiny-decoder-main.ms.
       // Prompt: <|startoftranscript|>, <|zh|> (WHISPER_SOT + 2), <|transcribe|>, <|notimestamps|>.
       std::vector<int> SOT_SEQUENCE = {WHISPER_SOT, WHISPER_SOT + 1 + 1, WHISPER_TRANSCRIBE, WHISPER_NO_TIMESTAMPS};
       BinBuffer sotSequence(SOT_SEQUENCE.data(), SOT_SEQUENCE.size() * sizeof(int));

       const std::string decoder_main_path = "tiny-decoder-main.ms";
       auto decoderMainBin = ReadBinFile(resourcesManager, decoder_main_path);
       if (decoderMainBin.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           return error_ret;
       }
       auto decoder_main = CreateMSLiteModel(decoderMainBin);
       int ret2 = RunMSLiteModel(decoder_main, {sotSequence, n_layer_cross_k, n_layer_cross_v});
       if (ret2 != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&decoder_main);
           return error_ret;
       }
       LOGI("MS_LITE_LOG: run decoder_main ok!\n");

       auto decoderMainOut = OH_AI_ModelGetOutputs(decoder_main);
       auto logitsBin = GetMSOutput(decoderMainOut.handle_list[0]);
       auto out_n_layer_self_k_cache_Bin = GetMSOutput(decoderMainOut.handle_list[1]);
       auto out_n_layer_self_v_cache_Bin = GetMSOutput(decoderMainOut.handle_list[2]);

       // Run inference on tiny-decoder-loop.ms.
       const std::string modelName3 = "tiny-decoder-loop.ms";
       auto modelBuffer3 = ReadBinFile(resourcesManager, modelName3);
       if (modelBuffer3.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           return error_ret;
       }
       auto decoder_loop = CreateMSLiteModel(modelBuffer3);

       const std::string dataName_embedding = "tiny-positional_embedding.bin"; // Positional embedding input data.
       auto data_embedding = ReadBinFile(resourcesManager, dataName_embedding);
       if (data_embedding.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           OH_AI_ModelDestroy(&decoder_loop);
           return error_ret;
       }

       int loop_times = WHISPER_N_TEXT_CTX - static_cast<int>(SOT_SEQUENCE.size());
       int offset_init = static_cast<int>(SOT_SEQUENCE.size());
       auto output_tokens =
           LoopPredict(decoder_loop, n_layer_cross_k, n_layer_cross_v, logitsBin, out_n_layer_self_k_cache_Bin,
                       out_n_layer_self_v_cache_Bin, data_embedding, loop_times, offset_init);

       // Map the output token IDs back to text through the base64-encoded token table.
       std::vector<std::string> token_tables = ProcessDataLines(ReadTokens(resourcesManager, "tiny-tokens.txt"));
       std::string result;
       for (const auto i : output_tokens) {
           char str[1024];
           base64_decode((const uint8 *)token_tables[i].c_str(), (uint32)token_tables[i].size(), str);
           result += str;
       }
       LOGI("MS_LITE_LOG: result is -> %{public}s", result.c_str());

       OH_AI_ModelDestroy(&encoder);
       OH_AI_ModelDestroy(&decoder_main);
       OH_AI_ModelDestroy(&decoder_loop);

       napi_value out_data;
       napi_create_string_utf8(env, result.c_str(), result.length(), &out_data);
       return out_data;
   }
   ```

6. Write the **CMake** script (typically **entry/src/main/cpp/CMakeLists.txt** in a DevEco Studio native project) to link the MindSpore Lite dynamic library.

   ```cmake
   # Minimum CMake version.
   cmake_minimum_required(VERSION 3.5.0)
   project(test)
   # AudioFile.h requires C++17.
   set(CMAKE_CXX_STANDARD 17)
   set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
   set(NATIVERENDER_PATH ${CMAKE_CURRENT_SOURCE_DIR})

   if(DEFINED PACKAGE_FIND_FILE)
       include(${PACKAGE_FIND_FILE})
   endif()

   include_directories(${NATIVERENDER_PATH}
                       ${NATIVERENDER_PATH}/include)

   # libsamplerate
   set(LIBSAMPLERATE_DIR ${NATIVERENDER_PATH}/third_party/libsamplerate)
   include_directories(${LIBSAMPLERATE_DIR}/include)
   add_subdirectory(${LIBSAMPLERATE_DIR})

   include_directories(${NATIVERENDER_PATH}/third_party/opencc/include/opencc)
   # Source files.
   aux_source_directory(src SRC_DIR)
   include_directories(${NATIVERENDER_PATH}/src)

   include_directories(${CMAKE_SOURCE_DIR}/third_party)

   file(GLOB SRC src/*.cc)

   add_library(entry SHARED mslite_napi.cpp ${SRC})
   target_link_libraries(entry PUBLIC samplerate)
   target_link_libraries(entry PUBLIC mindspore_lite_ndk)
   target_link_libraries(entry PUBLIC hilog_ndk.z)
   target_link_libraries(entry PUBLIC rawfile.z)
   target_link_libraries(entry PUBLIC ace_napi.z)
   ```

### Encapsulating the C++ Dynamic Library into an ArkTS Module Using N-API

1. In **entry/src/main/cpp/types/libentry/Index.d.ts**, define the ArkTS API `runDemo()` by adding the following content:

   ```ts
   export const runDemo: (a: Object) => string;
   ```

2. In the **oh-package.json5** file, associate the API with the .so file to form a complete ArkTS module.

   ```json
   {
     "name": "entry",
     "version": "1.0.0",
     "description": "MindSpore Lite inference module",
     "main": "",
     "author": "",
     "license": "",
     "dependencies": {
       "libentry.so": "file:./src/main/cpp/types/libentry"
     }
   }
   ```

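For the `runDemo()` declaration to be resolvable at run time, the native side must also register the function with N-API. The sample's registration code is not reproduced in this guide; the following is a minimal sketch of the standard OpenHarmony N-API module registration, assuming the `RunDemo` function defined earlier, the **mslite_napi.cpp** source file referenced by the CMake script, and the module name `entry`:

```c++
// mslite_napi.cpp (sketch): expose the native RunDemo function as runDemo.
static napi_value Init(napi_env env, napi_value exports)
{
    napi_property_descriptor desc[] = {
        {"runDemo", nullptr, RunDemo, nullptr, nullptr, nullptr, napi_default, nullptr}
    };
    napi_define_properties(env, exports, sizeof(desc) / sizeof(desc[0]), desc);
    return exports;
}

static napi_module demoModule = {
    .nm_version = 1,
    .nm_flags = 0,
    .nm_filename = nullptr,
    .nm_register_func = Init,
    .nm_modname = "entry", // Must match the library name (libentry.so).
    .nm_priv = nullptr,
    .reserved = {0},
};

// Register the module when the library is loaded.
extern "C" __attribute__((constructor)) void RegisterEntryModule(void)
{
    napi_module_register(&demoModule);
}
```
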
### Invoking the Encapsulated ArkTS Module to Perform Inference

In **entry/src/main/ets/pages/Index.ets**, call the encapsulated ArkTS module and process the inference result.

```ts
// Index.ets

import msliteNapi from 'libentry.so';
import AVPlayerDemo from './player';
import { transverter, TransverterType, TransverterLanguage } from "@nutpi/chinese_transverter";

@Entry
@Component
struct Index {
  @State message: string = 'MSLite Whisper Demo';
  @State wavName: string = 'zh.wav';
  @State content: string = '';

  build() {
    Row() {
      Column() {
        Text(this.message)
          .fontSize(30)
          .fontWeight(FontWeight.Bold);
        Button() {
          Text('Play Audio')
            .fontSize(20)
            .fontWeight(FontWeight.Medium)
        }
        .type(ButtonType.Capsule)
        .margin({
          top: 20
        })
        .backgroundColor('#0D9FFB')
        .width('40%')
        .height('5%')
        .onClick(async () => {
          // Play the sample audio through the AVPlayerDemo class.
          console.info('MS_LITE_LOG: begin to play wav.');
          let myClass = new AVPlayerDemo();
          myClass.avPlayerFdSrcDemo();
        })
        Button() {
          Text('Recognize Audio')
            .fontSize(20)
            .fontWeight(FontWeight.Medium)
        }
        .type(ButtonType.Capsule)
        .margin({
          top: 20
        })
        .backgroundColor('#0D9FFB')
        .width('40%')
        .height('5%')
        .onClick(() => {
          let resMgr = this.getUIContext()?.getHostContext()?.getApplicationContext().resourceManager;
          if (resMgr === undefined || resMgr === null) {
            console.error('MS_LITE_ERR: get resourceManager failed.');
            return;
          }
          // Call the encapsulated runDemo function.
          console.info('MS_LITE_LOG: *** Start MSLite Demo ***');
          let output = msliteNapi.runDemo(resMgr);
          if (output === null || output.length === 0) {
            console.error('MS_LITE_ERR: runDemo failed.');
            return;
          }
          console.info('MS_LITE_LOG: output length = ', output.length, ';value = ', output.slice(0, 20));
          this.content = output;
          console.info('MS_LITE_LOG: *** Finished MSLite Demo ***');
        })

        // Display the recognized content, converted to simplified Chinese.
        if (this.content) {
          Text('Recognized content:\n' + transverter({
            type: TransverterType.SIMPLIFIED,
            str: this.content,
            language: TransverterLanguage.ZH_CN
          }) + '\n').focusable(true).fontSize(20).height('20%')
        }
      }.width('100%')
    }
    .height('100%')
  }
}
```

### Verification

1. In DevEco Studio, connect the device and click **Run entry** to build, install, and launch the HAP:

   ```shell
   Launching com.samples.mindsporelitecdemoasr
   $ hdc shell aa force-stop com.samples.mindsporelitecdemoasr
   $ hdc shell mkdir data/local/tmp/xxx
   $ hdc file send E:\xxx\entry\build\default\outputs\default\entry-default-signed.hap "data/local/tmp/xxx"
   $ hdc shell bm install -p data/local/tmp/xxx
   $ hdc shell rm -rf data/local/tmp/xxx
   $ hdc shell aa start -a EntryAbility -b com.samples.mindsporelitecdemoasr
   com.samples.mindsporelitecdemoasr successfully launched...
   ```

2. Tap the **Play Audio** button on the device screen to play the sample audio file. Then tap the **Recognize Audio** button; the recognized content of the sample audio is displayed on the screen. Filter the log output by the keyword **MS_LITE_LOG**; information similar to the following is displayed:

   ```verilog
   05-16 14:53:44.200   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: begin to play wav.
   05-16 14:53:44.210   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     [a92ab1e0f831191, 0, 0] MS_LITE_LOG: AVPlayer state initialized called.
   05-16 14:53:44.228   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     [a92ab1e0f831191, 0, 0] MS_LITE_LOG: AVPlayer state prepared called.
   05-16 14:53:44.242   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer state playing called.
   05-16 14:53:44.242   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer start to seek.
   05-16 14:53:44.372   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer seek succeeded, seek time is 0
   05-16 14:53:49.621   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer state completed called.
   05-16 14:53:49.646   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer state stopped called.
   05-16 14:53:49.647   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer state idle called.
   05-16 14:53:49.649   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: AVPlayer state released called.
   05-16 14:53:53.282   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: *** Start MSLite Demo ***
   05-16 14:53:53.926   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  I     MS_LITE_LOG: Build MSLite model success.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: Run MSLite model Predict success.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: Get model outputs:
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor 0 name is: n_layer_cross_k.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor 0 size is: 9216000.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor data is:
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: -1.14678 -2.30223 0.868679 0.284441 1.03233 -2.02062 0.688163 -0.732034 -1.10553 1.43459 0.083885 -0.116173 -0.772636 1.5466 -0.631993 -0.897929 -0.0501685 -1.62517 0.375988 -1.77772 -0.432178
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor 1 name is: n_layer_cross_v.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor 1 size is: 9216000.
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: - Tensor data is:
   05-16 14:53:54.260   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: 0.0876085 -0.560317 -0.652518 -0.116969 -0.182608 -9.40531e-05 0.186293 0.123206 0.0127445 0.0708352 -0.489624 -0.226322 -0.0686949 -0.0341293 -0.0719619 0.103588 0.398025 -0.444261 0.396124 -0.347295 0.00541205
   05-16 14:53:54.430   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  I     MS_LITE_LOG: Build MSLite model success.
   05-16 14:53:54.462   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  D     MS_LITE_LOG: Run MSLite model Predict success.
   ......
   05-16 14:53:55.272   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  I     MS_LITE_LOG: run decoder loop 16 ok!
                                                                                                  token = 50257
   05-16 14:53:55.307   1679-1679     A00000/[MSLiteNapi]             com.sampl...cdemoasr  I     MS_LITE_LOG: result is -> I think the most important thing about running is that it brings me physical health.
   05-16 14:53:55.334   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: output length =  20 ;value =  I think the most important thing about running is that it brings me physical health.
   05-16 14:53:55.334   1679-1679     A03d00/JSAPP                    com.sampl...cdemoasr  I     MS_LITE_LOG: *** Finished MSLite Demo ***
   ```

### Effects

After you tap the **Play Audio** button on the device screen, the sample audio file is played. After you tap the **Recognize Audio** button, the content of the sample audio file is displayed on the device screen.

| Initial Page | Page with Recognized Content |
| :----------: | :--------------------------: |
| ![asr1](figures/asr1.png) | ![asr2](figures/asr2.png) |