# Using MindSpore Lite for Speech Recognition (C/C++)

<!--Kit: MindSpore Lite Kit-->
<!--Subsystem: AI-->
<!--Owner: @zhuguodong8-->
<!--Designer: @zhuguodong8; @jjfeing-->
<!--Tester: @principal87-->
<!--Adviser: @ge-yafang-->

## When to Use

You can use [MindSpore](../../reference/apis-mindspore-lite-kit/capi-mindspore.md) to quickly deploy AI algorithms into your application and perform on-device model inference for speech recognition.

Speech recognition converts an audio file into text. It is widely used in intelligent voice assistants, voice input, and voice search.

## Basic Concepts

- N-API: a set of native APIs used to build ArkTS components. N-APIs can be used to encapsulate C/C++ libraries into ArkTS modules.

## How to Develop

1. Select a speech recognition model.
2. Use the MindSpore Lite inference model on the device to implement speech recognition.

## Environment Setup

Install DevEco Studio 5.0.2 or later, and update the SDK to API version 14 or later.

## Development Procedure

This section uses the inference of a speech recognition model as an example to demonstrate how to implement a speech recognition application using MindSpore Lite.

### Selecting an Appropriate Model

The speech recognition model files **tiny-encoder.ms**, **tiny-decoder-main.ms**, and **tiny-decoder-loop.ms** used in this sample application are stored in the **entry/src/main/resources/rawfile** directory.

### Writing the Code for Audio Playback

1. Call [@ohos.multimedia.media](../../reference/apis-media-kit/arkts-apis-media.md) and [@ohos.multimedia.audio](../../reference/apis-audio-kit/arkts-apis-audio.md) to play audio.

   ```ts
   // player.ets
   import { media } from '@kit.MediaKit';
   import { common } from '@kit.AbilityKit';
   import { BusinessError } from '@kit.BasicServicesKit';
   import { audio } from '@kit.AudioKit';
   import { UIContext } from '@kit.ArkUI';

   export default class AVPlayerDemo {
     private isSeek: boolean = false; // Whether the seek operation is enabled.

     // Set the AVPlayer callbacks.
     setAVPlayerCallback(avPlayer: media.AVPlayer) {
       // Callback for the seek operation.
       avPlayer.on('seekDone', (seekDoneTime: number) => {
         console.info(`MS_LITE_LOG: AVPlayer seek succeeded, seek time is ${seekDoneTime}`);
       });
       // Callback invoked if an error occurs while the AVPlayer is playing audio. In such a case, call reset() to reset the AVPlayer.
       avPlayer.on('error', (err: BusinessError) => {
         console.error(`MS_LITE_LOG: Invoke avPlayer failed, code is ${err.code}, message is ${err.message}`);
         avPlayer.reset(); // Call reset() to reset the AVPlayer, which then enters the idle state.
       });
       // Callback for state changes.
       avPlayer.on('stateChange', async (state: string, reason: media.StateChangeReason) => {
         switch (state) {
           case 'idle': // This state is reported upon a successful callback of reset().
             console.info('MS_LITE_LOG: AVPlayer state idle called.');
             avPlayer.release(); // Call release() to release the instance.
             break;
           case 'initialized': // This state is reported when the AVPlayer sets the playback source.
             console.info('MS_LITE_LOG: AVPlayer state initialized called.');
             avPlayer.audioRendererInfo = {
               usage: audio.StreamUsage.STREAM_USAGE_MUSIC, // Audio stream usage type: music. Set this parameter based on the service scenario.
               rendererFlags: 0 // Audio renderer flag.
             };
             avPlayer.prepare();
             break;
           case 'prepared': // This state is reported upon a successful callback of prepare().
             console.info('MS_LITE_LOG: AVPlayer state prepared called.');
             avPlayer.play(); // Call play() to start playback.
             break;
           case 'playing': // This state is reported upon a successful callback of play().
             console.info('MS_LITE_LOG: AVPlayer state playing called.');
             if (this.isSeek) {
               console.info('MS_LITE_LOG: AVPlayer start to seek.');
               avPlayer.seek(0); // Move the playback position to the beginning of the audio.
             } else {
               // When the seek operation is not supported, the playback continues until it reaches the end.
               console.info('MS_LITE_LOG: AVPlayer wait to play end.');
             }
             break;
           case 'paused': // This state is reported upon a successful callback of pause().
             console.info('MS_LITE_LOG: AVPlayer state paused called.');
             setTimeout(() => {
               console.info('MS_LITE_LOG: AVPlayer paused wait to play again');
               avPlayer.play(); // After the playback is paused for 3 seconds, call play() again to resume playback.
             }, 3000);
             break;
           case 'completed': // This state is reported upon the completion of the playback.
             console.info('MS_LITE_LOG: AVPlayer state completed called.');
             avPlayer.stop(); // Call stop() to stop the playback.
             break;
           case 'stopped': // This state is reported upon a successful callback of stop().
             console.info('MS_LITE_LOG: AVPlayer state stopped called.');
             avPlayer.reset(); // Call reset() to reset the AVPlayer.
             break;
           case 'released':
             console.info('MS_LITE_LOG: AVPlayer state released called.');
             break;
           default:
             console.info('MS_LITE_LOG: AVPlayer state unknown called.');
             break;
         }
       });
     }

     // Use the resource management API to obtain the audio file and play it through the fdSrc attribute.
     async avPlayerFdSrcDemo() {
       // Create an AVPlayer instance.
       let avPlayer: media.AVPlayer = await media.createAVPlayer();
       // Register the state-change callbacks.
       this.setAVPlayerCallback(avPlayer);
       // Call getRawFd of the resourceManager member of UIAbilityContext to obtain the media asset URL.
       // The return type is {fd, offset, length}, where fd indicates the file descriptor of the HAP file, offset indicates the media asset offset, and length indicates the duration of the media asset to play.
       let context = new UIContext().getHostContext() as common.UIAbilityContext;
       let fileDescriptor = await context.resourceManager.getRawFd('zh.wav');
       let avFileDescriptor: media.AVFileDescriptor =
         { fd: fileDescriptor.fd, offset: fileDescriptor.offset, length: fileDescriptor.length };
       this.isSeek = true; // Enable the seek operation.
       // Assign a value to fdSrc to trigger the reporting of the initialized state.
       avPlayer.fdSrc = avFileDescriptor;
     }
   }
   ```

### Writing the Code for Speech Recognition

Call [MindSpore](../../reference/apis-mindspore-lite-kit/capi-mindspore.md) to perform inference on the three models in sequence. The inference process is as follows:

1. Include the corresponding header files. The third-party dependencies **librosa**, **libsamplerate**, **AudioFile.h**, and **base64.h** are from [LibrosaCpp](https://github.com/ewan-xu/LibrosaCpp), [libsamplerate](https://github.com/libsndfile/libsamplerate), the AudioFile library, and [whisper.axera](https://github.com/ml-inory/whisper.axera/tree/main/cpp/src), respectively.

   ```c++
   #include "AudioFile.h"
   #include "base64.h"
   #include "napi/native_api.h"
   #include "utils.h"
   #include <algorithm>
   #include <cstdlib>
   #include <cstring> // Required for memcpy() used below.
   #include <fstream>
   #include <hilog/log.h>
   #include <iostream>
   #include <librosa/librosa.h>
   #include <mindspore/context.h>
   #include <mindspore/model.h>
   #include <mindspore/status.h>
   #include <mindspore/tensor.h>
   #include <mindspore/types.h>
   #include <numeric>
   #include <rawfile/raw_file_manager.h>
   #include <sstream>
   #include <vector>
   ```

2. Read the related files, such as the audio file and model files, and convert them to buffer data.

   ```c++
   #define LOGI(...) ((void)OH_LOG_Print(LOG_APP, LOG_INFO, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGD(...) ((void)OH_LOG_Print(LOG_APP, LOG_DEBUG, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGW(...) ((void)OH_LOG_Print(LOG_APP, LOG_WARN, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))
   #define LOGE(...) ((void)OH_LOG_Print(LOG_APP, LOG_ERROR, LOG_DOMAIN, "[MSLiteNapi]", __VA_ARGS__))

   using BinBuffer = std::pair<void *, size_t>;

   BinBuffer ReadBinFile(NativeResourceManager *nativeResourceManager, const std::string &fileName)
   {
       auto rawFile = OH_ResourceManager_OpenRawFile(nativeResourceManager, fileName.c_str());
       if (rawFile == nullptr) {
           LOGE("MS_LITE_ERR: Open raw file failed");
           return BinBuffer(nullptr, 0);
       }
       long fileSize = OH_ResourceManager_GetRawFileSize(rawFile);
       if (fileSize <= 0) {
           LOGE("MS_LITE_ERR: FileSize not correct");
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       void *buffer = malloc(fileSize);
       if (buffer == nullptr) {
           LOGE("MS_LITE_ERR: malloc failed");
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       int ret = OH_ResourceManager_ReadRawFile(rawFile, buffer, fileSize);
       if (ret == 0) {
           LOGE("MS_LITE_ERR: OH_ResourceManager_ReadRawFile failed");
           free(buffer);
           OH_ResourceManager_CloseRawFile(rawFile);
           return BinBuffer(nullptr, 0);
       }
       OH_ResourceManager_CloseRawFile(rawFile);
       return BinBuffer(buffer, fileSize);
   }

   // The token table is loaded the same way, so reuse ReadBinFile() instead of duplicating the logic.
   BinBuffer ReadTokens(NativeResourceManager *nativeResourceManager, const std::string &fileName)
   {
       return ReadBinFile(nativeResourceManager, fileName);
   }
   ```
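
   `ReadBinFile()` returns a raw `malloc`'ed buffer that the caller must eventually `free()` (as `RunDemo()` and `CreateMSLiteModel()` do later). If you prefer not to track that by hand, a small RAII wrapper can release the buffer automatically. The `BinGuard` type below is a hypothetical helper sketched for illustration, not part of the sample:

   ```c++
   #include <cstdlib>

   // Hypothetical helper (not part of the sample): frees the buffer held in a
   // BinBuffer when the guard goes out of scope, so early returns cannot leak it.
   struct BinGuard {
       BinBuffer bin;
       explicit BinGuard(BinBuffer b) : bin(b) {}
       ~BinGuard() { free(bin.first); }
       BinGuard(const BinGuard &) = delete;            // Non-copyable: exactly one owner.
       BinGuard &operator=(const BinGuard &) = delete;
   };
   // Usage sketch: BinGuard tokens(ReadTokens(mgr, "tiny-tokens.txt"));
   // Use tokens.bin freely; the buffer is released when tokens leaves scope.
   ```

   Do not wrap buffers that are handed to `CreateMSLiteModel()`, which frees its input buffer itself.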

3. Create a context, set the device type, and load the model.

   ```c++
   void DestroyModelBuffer(void **buffer)
   {
       if (buffer == nullptr) {
           return;
       }
       free(*buffer);
       *buffer = nullptr;
   }

   OH_AI_ModelHandle CreateMSLiteModel(BinBuffer &bin)
   {
       // Create and configure the context for model inference.
       auto context = OH_AI_ContextCreate();
       if (context == nullptr) {
           DestroyModelBuffer(&bin.first);
           LOGE("MS_LITE_ERR: Create MSLite context failed.\n");
           return nullptr;
       }
       auto cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
       OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
       OH_AI_ContextAddDeviceInfo(context, cpu_device_info);

       // Create a model.
       auto model = OH_AI_ModelCreate();
       if (model == nullptr) {
           DestroyModelBuffer(&bin.first);
           LOGE("MS_LITE_ERR: Allocate MSLite Model failed.\n");
           return nullptr;
       }

       // Load and build the inference model. The model type is OH_AI_MODELTYPE_MINDIR.
       auto build_ret = OH_AI_ModelBuild(model, bin.first, bin.second, OH_AI_MODELTYPE_MINDIR, context);
       DestroyModelBuffer(&bin.first);
       if (build_ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&model);
           LOGE("MS_LITE_ERR: Build MSLite model failed.\n");
           return nullptr;
       }
       LOGI("MS_LITE_LOG: Build MSLite model success.\n");
       return model;
   }
   ```
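
   By default, the sample binds the models to the CPU with FP16 disabled. If you need to tune inference performance, the context also accepts thread settings before the device info is added. The following sketch is an optional variation; the thread count and affinity values are illustrative assumptions, not taken from the sample:

   ```c++
   // Optional tuning sketch: configure inference threads on the context
   // before calling OH_AI_ContextAddDeviceInfo().
   auto context = OH_AI_ContextCreate();
   if (context != nullptr) {
       OH_AI_ContextSetThreadNum(context, 2);          // Run inference with two threads.
       OH_AI_ContextSetThreadAffinityMode(context, 1); // 0: no affinity; 1: prefer big cores; 2: prefer little cores.
   }
   ```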

4. Set the model input data and perform model inference.

   ```c++
   constexpr int K_NUM_PRINT_OF_OUT_DATA = 20;

   int FillInputTensor(OH_AI_TensorHandle input, const BinBuffer &bin)
   {
       if (OH_AI_TensorGetDataSize(input) != bin.second) {
           return OH_AI_STATUS_LITE_INPUT_PARAM_INVALID;
       }
       char *data = (char *)OH_AI_TensorGetMutableData(input);
       memcpy(data, (const char *)bin.first, OH_AI_TensorGetDataSize(input));
       return OH_AI_STATUS_SUCCESS;
   }

   // Perform model inference.
   int RunMSLiteModel(OH_AI_ModelHandle model, std::vector<BinBuffer> inputBins)
   {
       // Set the model input data.
       auto inputs = OH_AI_ModelGetInputs(model);
       for (size_t i = 0; i < inputBins.size(); i++) {
           auto ret = FillInputTensor(inputs.handle_list[i], inputBins[i]);
           if (ret != OH_AI_STATUS_SUCCESS) {
               LOGE("MS_LITE_ERR: set input %{public}d error.\n", static_cast<int>(i));
               return OH_AI_STATUS_LITE_ERROR;
           }
       }

       // Obtain the output tensors of the model.
       auto outputs = OH_AI_ModelGetOutputs(model);

       // Perform inference.
       auto predict_ret = OH_AI_ModelPredict(model, inputs, &outputs, nullptr, nullptr);
       if (predict_ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&model);
           LOGE("MS_LITE_ERR: MSLite Predict error.\n");
           return OH_AI_STATUS_LITE_ERROR;
       }
       LOGD("MS_LITE_LOG: Run MSLite model Predict success.\n");

       // Print the output data.
       LOGD("MS_LITE_LOG: Get model outputs:\n");
       for (size_t i = 0; i < outputs.handle_num; i++) {
           auto tensor = outputs.handle_list[i];
           LOGD("MS_LITE_LOG: - Tensor %{public}d name is: %{public}s.\n", static_cast<int>(i),
                OH_AI_TensorGetName(tensor));
           LOGD("MS_LITE_LOG: - Tensor %{public}d size is: %{public}d.\n", static_cast<int>(i),
                (int)OH_AI_TensorGetDataSize(tensor));
           LOGD("MS_LITE_LOG: - Tensor data is:\n");
           auto out_data = reinterpret_cast<const float *>(OH_AI_TensorGetData(tensor));
           std::stringstream outStr;
           for (int j = 0; (j < OH_AI_TensorGetElementNum(tensor)) && (j <= K_NUM_PRINT_OF_OUT_DATA); j++) {
               outStr << out_data[j] << " ";
           }
           LOGD("MS_LITE_LOG: %{public}s", outStr.str().c_str());
       }
       return OH_AI_STATUS_SUCCESS;
   }
   ```
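
   If `FillInputTensor()` reports a size mismatch, it helps to inspect what the model actually expects. The sketch below is a debugging aid assumed for illustration (not part of the sample); it prints the name and shape of every input tensor:

   ```c++
   // Debugging sketch: print the name and shape of each model input tensor.
   void PrintInputShapes(OH_AI_ModelHandle model)
   {
       auto inputs = OH_AI_ModelGetInputs(model);
       for (size_t i = 0; i < inputs.handle_num; i++) {
           size_t shape_num = 0;
           const int64_t *shape = OH_AI_TensorGetShape(inputs.handle_list[i], &shape_num);
           std::stringstream dims;
           for (size_t d = 0; d < shape_num; d++) {
               dims << shape[d] << (d + 1 < shape_num ? " x " : "");
           }
           LOGD("MS_LITE_LOG: input %{public}d name: %{public}s, shape: %{public}s",
                static_cast<int>(i), OH_AI_TensorGetName(inputs.handle_list[i]), dims.str().c_str());
       }
   }
   ```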

5. Repeat the preceding procedure for the remaining models: run **tiny-decoder-main.ms** once to initialize the decoder state, and then run **tiny-decoder-loop.ms** in a loop, feeding the highest-scoring token back in until the end-of-text token is produced.

   ```c++
   const float NEG_INF = -std::numeric_limits<float>::infinity();
   const int WHISPER_SOT = 50258;
   const int WHISPER_TRANSCRIBE = 50359;
   const int WHISPER_TRANSLATE = 50358;
   const int WHISPER_NO_TIMESTAMPS = 50363;
   const int WHISPER_EOT = 50257;
   const int WHISPER_BLANK = 220;
   const int WHISPER_NO_SPEECH = 50362;
   const int WHISPER_N_TEXT_CTX = 448;
   const int WHISPER_N_TEXT_STATE = 384;
   const int WHISPER_VOCAB_SIZE = 51865; // Vocabulary size of the Whisper tiny model.
   constexpr int WHISPER_SAMPLE_RATE = 16000;

   // Wrap a tensor's data pointer and size into a BinBuffer (no copy).
   BinBuffer GetMSOutput(OH_AI_TensorHandle output) {
       float *outputData = reinterpret_cast<float *>(OH_AI_TensorGetMutableData(output));
       size_t size = OH_AI_TensorGetDataSize(output);
       return {outputData, size};
   }

   void SuppressTokens(BinBuffer &logits, bool is_initial) {
       auto logits_data = static_cast<float *>(logits.first);
       if (is_initial) {
           logits_data[WHISPER_EOT] = NEG_INF;
           logits_data[WHISPER_BLANK] = NEG_INF;
       }

       // Suppress other special tokens.
       logits_data[WHISPER_NO_TIMESTAMPS] = NEG_INF;
       logits_data[WHISPER_SOT] = NEG_INF;
       logits_data[WHISPER_NO_SPEECH] = NEG_INF;
       logits_data[WHISPER_TRANSLATE] = NEG_INF;
   }

   std::vector<int> LoopPredict(const OH_AI_ModelHandle model, const BinBuffer &n_layer_cross_k,
                                const BinBuffer &n_layer_cross_v, const BinBuffer &logits_init,
                                BinBuffer &out_n_layer_self_k_cache, BinBuffer &out_n_layer_self_v_cache,
                                const BinBuffer &data_embedding, const int loop, const int offset_init) {
       BinBuffer logits{nullptr, WHISPER_VOCAB_SIZE * sizeof(float)};
       logits.first = malloc(logits.second);
       if (!logits.first) {
           LOGE("MS_LITE_ERR: Fail to malloc!\n");
           return {};
       }
       void *logits_owned = logits.first; // Keep the malloc'ed buffer so that it can be freed later.
       // Take the logits of the last token of the SOT sequence (row 3 of the 4 rows).
       void *logits_init_src = static_cast<char *>(logits_init.first) + WHISPER_VOCAB_SIZE * 3 * sizeof(float);
       memcpy(logits.first, logits_init_src, logits.second);
       SuppressTokens(logits, true);

       std::vector<int> output_token;
       float *logits_data = static_cast<float *>(logits.first);
       int max_token_id = 0;
       float max_token = logits_data[0];
       for (size_t i = 0; i < logits.second / sizeof(float); i++) {
           if (logits_data[i] > max_token) {
               max_token_id = static_cast<int>(i);
               max_token = logits_data[i];
           }
       }

       int offset = offset_init;
       BinBuffer slice{nullptr, 0};
       slice.second = WHISPER_N_TEXT_STATE * sizeof(float);
       slice.first = malloc(slice.second);
       if (!slice.first) {
           LOGE("MS_LITE_ERR: Fail to malloc!\n");
           free(logits_owned);
           return {};
       }

       auto out_n_layer_self_k_cache_new = out_n_layer_self_k_cache;
       auto out_n_layer_self_v_cache_new = out_n_layer_self_v_cache;

       for (int i = 0; i < loop; i++) {
           if (max_token_id == WHISPER_EOT) {
               break;
           }
           output_token.push_back(max_token_id);
           std::vector<float> mask(WHISPER_N_TEXT_CTX, 0.0f);
           for (int j = 0; j < WHISPER_N_TEXT_CTX - offset - 1; ++j) {
               mask[j] = NEG_INF;
           }
           BinBuffer tokens{&max_token_id, sizeof(int)};

           void *data_embedding_src =
               static_cast<char *>(data_embedding.first) + offset * WHISPER_N_TEXT_STATE * sizeof(float);
           memcpy(slice.first, data_embedding_src, slice.second);
           BinBuffer mask_bin(mask.data(), mask.size() * sizeof(float));
           int ret = RunMSLiteModel(model, {tokens, out_n_layer_self_k_cache_new, out_n_layer_self_v_cache_new,
                                            n_layer_cross_k, n_layer_cross_v, slice, mask_bin});
           if (ret != OH_AI_STATUS_SUCCESS) {
               LOGE("MS_LITE_ERR: decoder loop inference failed.\n");
               break;
           }

           auto outputs = OH_AI_ModelGetOutputs(model);
           logits = GetMSOutput(outputs.handle_list[0]);
           out_n_layer_self_k_cache_new = GetMSOutput(outputs.handle_list[1]);
           out_n_layer_self_v_cache_new = GetMSOutput(outputs.handle_list[2]);
           offset++;
           SuppressTokens(logits, false);
           logits_data = static_cast<float *>(logits.first);
           max_token_id = 0; // Reset before scanning the new logits.
           max_token = logits_data[0];
           for (size_t j = 0; j < logits.second / sizeof(float); j++) {
               if (logits_data[j] > max_token) {
                   max_token_id = static_cast<int>(j);
                   max_token = logits_data[j];
               }
           }
           LOGI("MS_LITE_LOG: run decoder loop %{public}d ok!\n token = %{public}d", i, max_token_id);
       }
       free(slice.first);
       free(logits_owned);
       return output_token;
   }

   std::vector<std::string> ProcessDataLines(const BinBuffer token_txt) {
       void *data_ptr = token_txt.first;
       size_t data_size = token_txt.second;
       std::vector<std::string> tokens;

       const char *char_data = static_cast<const char *>(data_ptr);
       std::stringstream ss(std::string(char_data, char_data + data_size));
       std::string line;
       while (std::getline(ss, line)) {
           size_t space_pos = line.find(' ');
           tokens.push_back(line.substr(0, space_pos));
       }
       return tokens;
   }

   static napi_value RunDemo(napi_env env, napi_callback_info info)
   {
       // Perform the sample inference.
       napi_value error_ret;
       napi_create_int32(env, -1, &error_ret);
       size_t argc = 1;
       napi_value argv[1] = {nullptr};
       napi_get_cb_info(env, info, &argc, argv, nullptr, nullptr);
       auto resourcesManager = OH_ResourceManager_InitNativeResourceManager(env, argv[0]);

       // Data preprocessing
       AudioFile<float> audioFile;
       std::string filePath = "zh.wav";
       auto audioBin = ReadBinFile(resourcesManager, filePath);
       if (audioBin.first == nullptr) {
           LOGE("MS_LITE_ERR: Fail to read %{public}s!", filePath.c_str());
           return error_ret;
       }
       size_t dataSize = audioBin.second;
       uint8_t *dataBuffer = (uint8_t *)audioBin.first;
       bool ok = audioFile.loadFromMemory(std::vector<uint8_t>(dataBuffer, dataBuffer + dataSize));
       free(dataBuffer); // The raw buffer has been copied into audioFile and is no longer needed.
       dataBuffer = nullptr;
       if (!ok) {
           LOGE("MS_LITE_ERR: Fail to parse %{public}s!", filePath.c_str());
           return error_ret;
       }
       std::vector<float> data(audioFile.samples[0]);
       ResampleAudio(data, audioFile.getSampleRate(), WHISPER_SAMPLE_RATE, 1, SRC_SINC_BEST_QUALITY);
       std::vector<float> audio(data);

       int padding = 480000;         // Pad with 30s of silence (480000 samples at 16 kHz).
       int sr = WHISPER_SAMPLE_RATE; // Sampling rate: 16000 Hz.
       int n_fft = 480;              // FFT window length.
       int n_hop = 160;              // Hop length between frames.
       int n_mel = 80;               // Number of mel bands.
       int fmin = 0;                 // Minimum frequency. The default value is 0.0 Hz.
       int fmax = sr / 2;            // Maximum frequency. The default value is half of the sampling rate (sr/2.0).
       audio.insert(audio.end(), padding, 0.0f);
       std::vector<std::vector<float>> mels_T =
           librosa::Feature::melspectrogram(audio, sr, n_fft, n_hop, "hann", true, "reflect", 2.f, n_mel, fmin, fmax);

       std::vector<std::vector<float>> mels = TransposeMel(mels_T);
       ProcessMelSpectrogram(mels);

       std::vector<float> inputMels(mels.size() * mels[0].size(), 0);
       for (size_t i = 0; i < mels.size(); i++) {
           std::copy(mels[i].begin(), mels[i].end(), inputMels.begin() + i * mels[0].size());
       }

       BinBuffer inputMelsBin(inputMels.data(), inputMels.size() * sizeof(float));

       // Run inference on tiny-encoder.ms.
       auto encoderBin = ReadBinFile(resourcesManager, "tiny-encoder.ms");
       if (encoderBin.first == nullptr) {
           return error_ret;
       }

       auto encoder = CreateMSLiteModel(encoderBin);
       if (encoder == nullptr) {
           return error_ret;
       }

       int ret = RunMSLiteModel(encoder, {inputMelsBin});
       if (ret != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&encoder);
           return error_ret;
       }
       LOGI("MS_LITE_LOG: run encoder ok!\n");

       auto outputs = OH_AI_ModelGetOutputs(encoder);
       auto n_layer_cross_k = GetMSOutput(outputs.handle_list[0]);
       auto n_layer_cross_v = GetMSOutput(outputs.handle_list[1]);

       // Run inference on tiny-decoder-main.ms.
       std::vector<int> SOT_SEQUENCE = {WHISPER_SOT,
                                        WHISPER_SOT + 1 + 1, // Language token following SOT.
                                        WHISPER_TRANSCRIBE, WHISPER_NO_TIMESTAMPS};
       BinBuffer sotSequence(SOT_SEQUENCE.data(), SOT_SEQUENCE.size() * sizeof(int));

       const std::string decoder_main_path = "tiny-decoder-main.ms";
       auto decoderMainBin = ReadBinFile(resourcesManager, decoder_main_path);
       if (decoderMainBin.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           return error_ret;
       }
       auto decoder_main = CreateMSLiteModel(decoderMainBin);
       if (decoder_main == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           return error_ret;
       }
       int ret2 = RunMSLiteModel(decoder_main, {sotSequence, n_layer_cross_k, n_layer_cross_v});
       if (ret2 != OH_AI_STATUS_SUCCESS) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           return error_ret;
       }
       LOGI("MS_LITE_LOG: run decoder_main ok!\n");

       auto decoderMainOut = OH_AI_ModelGetOutputs(decoder_main);
       auto logitsBin = GetMSOutput(decoderMainOut.handle_list[0]);
       auto out_n_layer_self_k_cache_Bin = GetMSOutput(decoderMainOut.handle_list[1]);
       auto out_n_layer_self_v_cache_Bin = GetMSOutput(decoderMainOut.handle_list[2]);

       // Run inference on tiny-decoder-loop.ms.
       const std::string modelName3 = "tiny-decoder-loop.ms";
       auto modelBuffer3 = ReadBinFile(resourcesManager, modelName3);
       if (modelBuffer3.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           return error_ret;
       }
       auto decoder_loop = CreateMSLiteModel(modelBuffer3);
       if (decoder_loop == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           return error_ret;
       }

       const std::string dataName_embedding = "tiny-positional_embedding.bin"; // Positional embedding input data.
       auto data_embedding = ReadBinFile(resourcesManager, dataName_embedding);
       if (data_embedding.first == nullptr) {
           OH_AI_ModelDestroy(&encoder);
           OH_AI_ModelDestroy(&decoder_main);
           OH_AI_ModelDestroy(&decoder_loop);
           return error_ret;
       }

       int loop_times = WHISPER_N_TEXT_CTX - static_cast<int>(SOT_SEQUENCE.size());
       int offset_init = static_cast<int>(SOT_SEQUENCE.size());
       auto output_tokens =
           LoopPredict(decoder_loop, n_layer_cross_k, n_layer_cross_v, logitsBin, out_n_layer_self_k_cache_Bin,
                       out_n_layer_self_v_cache_Bin, data_embedding, loop_times, offset_init);

       // Map the token IDs back to text.
       std::vector<std::string> token_tables = ProcessDataLines(ReadTokens(resourcesManager, "tiny-tokens.txt"));
       std::string result;
       for (const auto i : output_tokens) {
           char str[1024];
           base64_decode((const uint8 *)token_tables[i].c_str(), (uint32)token_tables[i].size(), str);
           result += str;
       }
       LOGI("MS_LITE_LOG: result is -> %{public}s", result.c_str());

       OH_AI_ModelDestroy(&encoder);
       OH_AI_ModelDestroy(&decoder_main);
       OH_AI_ModelDestroy(&decoder_loop);

       napi_value out_data;
       napi_create_string_utf8(env, result.c_str(), result.length(), &out_data);
       return out_data;
   }
   ```
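
   The loop above implements greedy decoding: at each step, the token with the highest logit is selected and fed back into the decoder. The repeated scan for the maximum can be expressed more compactly with `std::max_element`; the helper below is an equivalent rewrite sketched for clarity, not the sample's own code:

   ```c++
   #include <algorithm>
   #include <cstddef>

   // Equivalent sketch of the greedy-decoding argmax used in LoopPredict():
   // returns the index of the highest-scoring token in a logits vector.
   static int ArgmaxToken(const float *logits, size_t vocab_size)
   {
       const float *best = std::max_element(logits, logits + vocab_size);
       return static_cast<int>(best - logits);
   }
   // In the loop body this would replace the manual scan:
   // max_token_id = ArgmaxToken(logits_data, logits.second / sizeof(float));
   ```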

6. Write the **CMake** script to link the MindSpore Lite dynamic library.

   ```cmake
   # The minimum version of CMake.
   cmake_minimum_required(VERSION 3.5.0)
   project(test)
   # AudioFile.h requires the C++17 standard.
   set(CMAKE_CXX_STANDARD 17)
   set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
   set(NATIVERENDER_PATH ${CMAKE_CURRENT_SOURCE_DIR})

   if(DEFINED PACKAGE_FIND_FILE)
       include(${PACKAGE_FIND_FILE})
   endif()

   include_directories(${NATIVERENDER_PATH}
                       ${NATIVERENDER_PATH}/include)

   # libsamplerate
   set(LIBSAMPLERATE_DIR ${NATIVERENDER_PATH}/third_party/libsamplerate)
   include_directories(${LIBSAMPLERATE_DIR}/include)
   add_subdirectory(${LIBSAMPLERATE_DIR})

   include_directories(${NATIVERENDER_PATH}/third_party/opencc/include/opencc)
   # src
   aux_source_directory(src SRC_DIR)
   include_directories(${NATIVERENDER_PATH}/src)

   include_directories(${CMAKE_SOURCE_DIR}/third_party)

   file(GLOB SRC src/*.cc)

   add_library(entry SHARED mslite_napi.cpp ${SRC})
   target_link_libraries(entry PUBLIC samplerate)
   target_link_libraries(entry PUBLIC mindspore_lite_ndk)
   target_link_libraries(entry PUBLIC hilog_ndk.z)
   target_link_libraries(entry PUBLIC rawfile.z)
   target_link_libraries(entry PUBLIC ace_napi.z)
   ```

### Encapsulating the C++ Dynamic Library into an ArkTS Module Using N-APIs

1. In **entry/src/main/cpp/types/libentry/Index.d.ts**, define the ArkTS API `runDemo()` by adding the following content. (The native-side registration that backs this declaration is sketched after this list.)

   ```ts
   export const runDemo: (a: Object) => string;
   ```

2. In the **oh-package.json5** file, associate the API with the .so file to form a complete ArkTS module.

   ```json
   {
     "name": "entry",
     "version": "1.0.0",
     "description": "MindSpore Lite inference module",
     "main": "",
     "author": "",
     "license": "",
     "dependencies": {
       "libentry.so": "file:./src/main/cpp/types/libentry"
     }
   }
   ```
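
The `runDemo` declaration resolves only if the native library registers a function under that name when **libentry.so** is loaded. The sample does not show this registration; the sketch below is the standard N-API module boilerplate, assuming it lives in **mslite_napi.cpp** and exposes the `RunDemo()` function defined earlier:

```c++
#include "napi/native_api.h"

// Assumed boilerplate: expose the native RunDemo() as the ArkTS function runDemo.
EXTERN_C_START
static napi_value Init(napi_env env, napi_value exports)
{
    napi_property_descriptor desc[] = {
        {"runDemo", nullptr, RunDemo, nullptr, nullptr, nullptr, napi_default, nullptr}
    };
    napi_define_properties(env, exports, sizeof(desc) / sizeof(desc[0]), desc);
    return exports;
}
EXTERN_C_END

static napi_module demoModule = {
    .nm_version = 1,
    .nm_flags = 0,
    .nm_filename = nullptr,
    .nm_register_func = Init,
    .nm_modname = "entry", // Must match the library name libentry.so.
    .nm_priv = nullptr,
    .reserved = {0},
};

// Register the module when the dynamic library is loaded.
extern "C" __attribute__((constructor)) void RegisterEntryModule(void)
{
    napi_module_register(&demoModule);
}
```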

### Invoking the Encapsulated ArkTS Module to Perform Inference and Output the Result

In **entry/src/main/ets/pages/Index.ets**, call the encapsulated ArkTS module and process the inference result.

```ts
// Index.ets

import msliteNapi from 'libentry.so';
import AVPlayerDemo from './player';
import { transverter, TransverterType, TransverterLanguage } from "@nutpi/chinese_transverter";

@Entry
@Component
struct Index {
  @State message: string = 'MSLite Whisper Demo';
  @State wavName: string = 'zh.wav';
  @State content: string = '';

  build() {
    Row() {
      Column() {
        Text(this.message)
          .fontSize(30)
          .fontWeight(FontWeight.Bold);
        Button() {
          Text('Play Audio')
            .fontSize(20)
            .fontWeight(FontWeight.Medium)
        }
        .type(ButtonType.Capsule)
        .margin({
          top: 20
        })
        .backgroundColor('#0D9FFB')
        .width('40%')
        .height('5%')
        .onClick(async () => {
          // Call AVPlayerDemo to play the sample audio file.
          console.info('MS_LITE_LOG: begin to play wav.');
          let myClass = new AVPlayerDemo();
          myClass.avPlayerFdSrcDemo();
        })
        Button() {
          Text('Recognize Audio')
            .fontSize(20)
            .fontWeight(FontWeight.Medium)
        }
        .type(ButtonType.Capsule)
        .margin({
          top: 20
        })
        .backgroundColor('#0D9FFB')
        .width('40%')
        .height('5%')
        .onClick(() => {
          let resMgr = this.getUIContext()?.getHostContext()?.getApplicationContext().resourceManager;
          if (resMgr === undefined || resMgr === null) {
            console.error('MS_LITE_ERR: get resourceManager failed.');
            return;
          }
          // Call the encapsulated runDemo function.
          console.info('MS_LITE_LOG: *** Start MSLite Demo ***');
          let output = msliteNapi.runDemo(resMgr);
          if (output === null || output.length === 0) {
            console.error('MS_LITE_ERR: runDemo failed.');
            return;
          }
          console.info('MS_LITE_LOG: output length = ', output.length, ';value = ', output.slice(0, 20));
          this.content = output;
          console.info('MS_LITE_LOG: *** Finished MSLite Demo ***');
        })

        // Display the recognized content.
        if (this.content) {
          Text('Recognized content:\n' + transverter({
            type: TransverterType.SIMPLIFIED,
            str: this.content,
            language: TransverterLanguage.ZH_CN
          }) + '\n').focusable(true).fontSize(20).height('20%')
        }
      }.width('100%')
    }
    .height('100%')
  }
}
```

### Verification

1. In DevEco Studio, connect the device, and click **Run entry** to build and install the HAP.

   ```shell
   Launching com.samples.mindsporelitecdemoasr
   $ hdc shell aa force-stop com.samples.mindsporelitecdemoasr
   $ hdc shell mkdir data/local/tmp/xxx
   $ hdc file send E:\xxx\entry\build\default\outputs\default\entry-default-signed.hap "data/local/tmp/xxx"
   $ hdc shell bm install -p data/local/tmp/xxx
   $ hdc shell rm -rf data/local/tmp/xxx
   $ hdc shell aa start -a EntryAbility -b com.samples.mindsporelitecdemoasr
   com.samples.mindsporelitecdemoasr successfully launched...
   ```

2. Tap the **Play Audio** button on the device screen to play the sample audio file. Then tap the **Recognize Audio** button to recognize it. Filter the log output by the keyword **MS_LITE_LOG**; information similar to the following is displayed:

   ```verilog
   05-16 14:53:44.200 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: begin to play wav.
   05-16 14:53:44.210 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I [a92ab1e0f831191, 0, 0] MS_LITE_LOG: AVPlayer state initialized called.
   05-16 14:53:44.228 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I [a92ab1e0f831191, 0, 0] MS_LITE_LOG: AVPlayer state prepared called.
   05-16 14:53:44.242 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer state playing called.
   05-16 14:53:44.242 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer start to seek.
   05-16 14:53:44.372 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer seek succeeded, seek time is 0
   05-16 14:53:49.621 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer state completed called.
   05-16 14:53:49.646 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer state stopped called.
   05-16 14:53:49.647 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer state idle called.
   05-16 14:53:49.649 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: AVPlayer state released called.
   05-16 14:53:53.282 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: *** Start MSLite Demo ***
   05-16 14:53:53.926 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr I MS_LITE_LOG: Build MSLite model success.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: Run MSLite model Predict success.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: Get model outputs:
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor 0 name is: n_layer_cross_k.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor 0 size is: 9216000.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor data is:
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: -1.14678 -2.30223 0.868679 0.284441 1.03233 -2.02062 0.688163 -0.732034 -1.10553 1.43459 0.083885 -0.116173 -0.772636 1.5466 -0.631993 -0.897929 -0.0501685 -1.62517 0.375988 -1.77772 -0.432178
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor 1 name is: n_layer_cross_v.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor 1 size is: 9216000.
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: - Tensor data is:
   05-16 14:53:54.260 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: 0.0876085 -0.560317 -0.652518 -0.116969 -0.182608 -9.40531e-05 0.186293 0.123206 0.0127445 0.0708352 -0.489624 -0.226322 -0.0686949 -0.0341293 -0.0719619 0.103588 0.398025 -0.444261 0.396124 -0.347295 0.00541205
   05-16 14:53:54.430 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr I MS_LITE_LOG: Build MSLite model success.
   05-16 14:53:54.462 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr D MS_LITE_LOG: Run MSLite model Predict success.
   ......
   05-16 14:53:55.272 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr I MS_LITE_LOG: run decoder loop 16 ok!
   token = 50257
   05-16 14:53:55.307 1679-1679 A00000/[MSLiteNapi] com.sampl...cdemoasr I MS_LITE_LOG: result is -> I think the most important thing about running is that it brings me physical health.
   05-16 14:53:55.334 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: output length = 20 ;value = I think the most important thing about running is that it brings me physical health.
   05-16 14:53:55.334 1679-1679 A03d00/JSAPP com.sampl...cdemoasr I MS_LITE_LOG: *** Finished MSLite Demo ***
   ```

### Effects

After you tap the **Play Audio** button on the device screen, the sample audio file is played. After you tap the **Recognize Audio** button, the content of the sample audio file is displayed on the device screen.

| Initial Page | Page with Recognized Content |
| :-----------------------: | :-----------------------: |
|  |  |