# Application Data Vectorization (ArkTS)
<!--Kit: ArkData-->
<!--Subsystem: DistributedDataManager-->
<!--Owner: @my-2024; @cuile44; @pancodax-->
<!--Designer: @fysun17; @AnruiWang; @xd_94-->
<!--Tester: @yippo; @logic42-->
<!--Adviser: @ge-yafang-->

## When to Use

As applications move from digital transformation toward AI-driven features, intelligent services have become an important factor in product competitiveness.

Currently, the ArkData Intelligence Platform (AIP) provides application data vectorization, which uses embedding models to convert multi-modal data, such as unstructured text and images, into semantic vectors.

## Basic Concepts

Before you start, familiarize yourself with the following concepts.

### Vectorization
Vectorization uses embedding models to convert high-dimensional unstructured data (such as text and images) into low-dimensional continuous vector representations. These representations capture the semantic relationships within the data, translating abstract information into a format that computers can analyze and process. Embedding technology is widely used in natural language processing (semantic search), image recognition (feature extraction), and recommendation systems (user/item representation).

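One common way to compare two semantic vectors is cosine similarity. The following minimal sketch is plain TypeScript and does not call any AIP API; the function name `cosineSimilarity` and the toy three-dimensional vectors are assumptions made for illustration, and real embeddings have many more dimensions.

```ts
// Cosine similarity between two embedding vectors of equal length.
// The result lies in [-1, 1]; a higher value means the vectors point in more similar directions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // Convention chosen for this sketch: a zero vector is similar to nothing.
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for embeddings of three pieces of content (hypothetical values).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.info(`cat vs kitten: ${cosineSimilarity(cat, kitten).toFixed(3)}`);
console.info(`cat vs invoice: ${cosineSimilarity(cat, invoice).toFixed(3)}`);
```

Semantically related inputs are expected to produce vectors with a higher similarity score than unrelated inputs, which is what makes semantic search over vectors possible.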
### Multi-Modal Embedding Model
Embedding models implement application data vectorization. The system supports multi-modal embedding models, which map different data modalities, such as text and images, into a unified vector space. These models support both single-modal retrieval (text-to-text and image-to-image) and cross-modal retrieval (text-to-image and image-to-text).

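Because text and image vectors share one vector space, a single nearest-neighbor search can return items of either modality. The sketch below illustrates this with L2-normalized vectors ranked by dot product. It is plain TypeScript with hypothetical names (`IndexedItem`, `normalize`, `topK`) and made-up two-dimensional vectors; it does not reflect how the AIP or a vector store implements retrieval.

```ts
interface IndexedItem {
  id: string;                    // Hypothetical identifier, for example a file name
  modality: 'text' | 'image';
  vector: number[];              // Unit-length embedding in the shared vector space
}

// Scales a vector to unit length so that the dot product equals cosine similarity.
function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return length === 0 ? v.slice() : v.map((x) => x / length);
}

function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Returns the k items most similar to the query, regardless of modality.
function topK(query: number[], items: IndexedItem[], k: number): IndexedItem[] {
  const unitQuery = normalize(query);
  return items
    .map((item) => ({ item, score: dotProduct(unitQuery, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.item);
}

// Made-up vectors standing in for text and image embeddings.
const sharedIndex: IndexedItem[] = [
  { id: 'note-about-beach', modality: 'text', vector: normalize([0.9, 0.1]) },
  { id: 'beach.jpg', modality: 'image', vector: normalize([0.8, 0.3]) },
  { id: 'invoice.jpg', modality: 'image', vector: normalize([0.1, 0.9]) }
];

const nearest = topK([1.0, 0.2], sharedIndex, 2);
console.info(nearest.map((item) => `${item.id} (${item.modality})`).join(', '));
```

In this toy index, a query vector derived from text can rank an image item among its nearest neighbors, which is the essence of text-to-image retrieval.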
### Text Segmentation
An embedding model accepts only a limited amount of text per inference. To stay within this limit, you can use the APIs provided by the AIP to split the input text into smaller blocks before vectorizing it.

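To make the idea of overlapping blocks concrete, here is a minimal plain-TypeScript sketch of fixed-size splitting with overlap. It is not the algorithm used by the AIP: the function name `chunkWithOverlap` is hypothetical, and treating `size` as a count of UTF-16 code units and `overlapRatio` as the shared fraction of each block is an assumption made for illustration. The parameter names mirror the `SplitConfig` fields used later in this guide.

```ts
// Splits text into blocks of at most `size` code units, where adjacent blocks
// share floor(size * overlapRatio) code units.
function chunkWithOverlap(text: string, size: number, overlapRatio: number): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  if (overlapRatio < 0 || overlapRatio >= 1) {
    throw new Error('overlapRatio must be in [0, 1)');
  }
  const overlap = Math.floor(size * overlapRatio);
  const step = size - overlap; // At least 1 because overlap is always smaller than size.
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) {
      break; // The last block already reaches the end of the text.
    }
  }
  return chunks;
}

console.info(JSON.stringify(chunkWithOverlap('abcdefghij', 4, 0.5)));
```

Overlap keeps some context from the end of one block at the start of the next, so a sentence cut at a block boundary is still partly represented in both vectors.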
## Implementation Mechanism

The AIP lets you build intelligent data capabilities into your application. All of these capabilities run within the application process, so data never leaves the application environment. This protects data security and user privacy.

## Working Principles
Application data vectorization converts raw application data into vectors and stores them in a vector database (vector store).

## Constraints
- Because data vectorization is compute-intensive and resource-intensive, the APIs are available only to applications running on 2-in-1 devices.
- You can use an NPU to accelerate embedding model inference. Using an NPU is recommended because CPU-only inference has much higher latency and lower energy efficiency.
- The model can process up to 512 characters of text per inference. Both Chinese and English are supported.
- The model can process an image smaller than 20 MB per inference.

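The text and image limits above can be checked before an inference call. The following plain-TypeScript sketch shows one way to do so; the constant and function names are hypothetical, and two details are assumptions because this guide does not specify them: characters are counted with `string.length` (UTF-16 code units), and 1 MB is taken as 1024 * 1024 bytes.

```ts
// Limits taken from the constraints above; how they are measured is an assumption of this sketch.
const MAX_TEXT_LENGTH = 512;               // Characters per inference
const MAX_IMAGE_BYTES = 20 * 1024 * 1024;  // Assumes 1 MB = 1024 * 1024 bytes

// Returns true if the text is too long for a single inference and should be split first.
function needsSplitting(text: string): boolean {
  return text.length > MAX_TEXT_LENGTH;
}

// Returns true if an image of the given size is below the single-inference limit.
function isImageSizeAccepted(sizeInBytes: number): boolean {
  return sizeInBytes > 0 && sizeInBytes < MAX_IMAGE_BYTES;
}

console.info(`600-character text needs splitting: ${needsSplitting('a'.repeat(600))}`);
console.info(`5 MB image accepted: ${isImageSizeAccepted(5 * 1024 * 1024)}`);
```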
## Available APIs

The following table lists the APIs related to application data vectorization. For more APIs and their usage, see [ArkData Intelligence Platform](../reference/apis-arkdata/js-apis-data-intelligence.md).

| API| Description|
| -------- | -------- |
| getTextEmbeddingModel(config: ModelConfig): Promise&lt;TextEmbedding&gt; | Obtains a text embedding model.|
| loadModel(): Promise&lt;void&gt; | Loads this text embedding model.|
| splitText(text: string, config: SplitConfig): Promise&lt;Array&lt;string&gt;&gt; | Splits text.|
| getEmbedding(text: string): Promise&lt;Array&lt;number&gt;&gt; | Obtains the embedding vector of the given text.|
| getEmbedding(batchTexts: Array&lt;string&gt;): Promise&lt;Array&lt;Array&lt;number&gt;&gt;&gt; | Obtains the embedding vectors of the given batch of texts.|
| releaseModel(): Promise&lt;void&gt; | Releases this text embedding model.|
| getImageEmbeddingModel(config: ModelConfig): Promise&lt;ImageEmbedding&gt; | Obtains an image embedding model.|
| loadModel(): Promise&lt;void&gt; | Loads this image embedding model.|
| getEmbedding(image: Image): Promise&lt;Array&lt;number&gt;&gt; | Obtains the embedding vector of the given image.|
| releaseModel(): Promise&lt;void&gt; | Releases this image embedding model.|

## How to Develop Text Vectorization

1. Import the **intelligence** module.

   ```ts
   import { intelligence } from '@kit.ArkData';
   ```

2. Obtain a text embedding model using the **getTextEmbeddingModel** method. The sample code is as follows:

   ```ts
   import { BusinessError } from '@kit.BasicServicesKit';

   // Configure the model: basic model version, without NPU acceleration.
   let textConfig: intelligence.ModelConfig = {
     version: intelligence.ModelVersion.BASIC_MODEL,
     isNpuAvailable: false,
     cachePath: "/data"
   }
   let textEmbedding: intelligence.TextEmbedding;

   intelligence.getTextEmbeddingModel(textConfig)
     .then((data: intelligence.TextEmbedding) => {
       console.info("Succeeded in getting TextModel");
       // Keep the model instance for the following steps.
       textEmbedding = data;
     })
     .catch((err: BusinessError) => {
       console.error("Failed to get TextModel and code is " + err.code);
     })
   ```

3. Load the text embedding model using the **loadModel** method. The sample code is as follows:

   ```ts
   // Call this only after getTextEmbeddingModel() has resolved and textEmbedding is assigned.
   textEmbedding.loadModel()
     .then(() => {
       console.info("Succeeded in loading Model");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to load Model and code is " + err.code);
     })
   ```

4. Split the text. If the text exceeds the length limit, call **splitText()** to split it into smaller blocks, and then vectorize each block.
   The sample code is as follows:

   ```ts
   // size: block size; overlapRatio: overlap ratio between adjacent blocks.
   let splitConfig: intelligence.SplitConfig = {
     size: 10,
     overlapRatio: 0.1
   }
   let splitText = 'text';

   intelligence.splitText(splitText, splitConfig)
     .then((data: Array<string>) => {
       // data holds the text blocks produced by the split.
       console.info("Succeeded in splitting Text");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to split Text and code is " + err.code);
     })
   ```

5. Obtain the embedding vector of the given text using the **getEmbedding** method. The input can be a single text or a batch of texts.
   The sample code is as follows:

   ```ts
   // Single text: the result is one embedding vector.
   let text = 'text';
   textEmbedding.getEmbedding(text)
     .then((data: Array<number>) => {
       console.info("Succeeded in getting Embedding");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to get Embedding and code is " + err.code);
     })
   ```

   ```ts
   // Batch of texts: the result is one embedding vector per input text.
   let batchTexts = ['text1', 'text2'];
   textEmbedding.getEmbedding(batchTexts)
     .then((data: Array<Array<number>>) => {
       console.info("Succeeded in getting Embedding");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to get Embedding and code is " + err.code);
     })
   ```

6. Release the text embedding model using the **releaseModel** method when it is no longer needed. The sample code is as follows:

   ```ts
   textEmbedding.releaseModel()
     .then(() => {
       console.info("Succeeded in releasing Model");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to release Model and code is " + err.code);
     })
   ```

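The steps above use separate promise chains, so each step must not start until the previous one has finished. The sketch below shows one possible way to sequence loading, embedding, and releasing with `async`/`await`, releasing the model even if embedding fails. It is plain TypeScript that runs against a fake model: `TextEmbedderLike`, `embedTexts`, and `FakeTextEmbedder` are hypothetical names, and the interface only mirrors the method signatures listed in Available APIs. In an ArkTS project, the parameter would presumably be typed as `intelligence.TextEmbedding` directly, because ArkTS restricts structural typing.

```ts
// Stand-in for the TextEmbedding methods used in this guide (an assumption of this sketch).
interface TextEmbedderLike {
  loadModel(): Promise<void>;
  getEmbedding(batchTexts: Array<string>): Promise<Array<Array<number>>>;
  releaseModel(): Promise<void>;
}

// Hypothetical helper: loads the model, embeds the texts in one batch,
// and always releases the model afterwards.
async function embedTexts(model: TextEmbedderLike, texts: string[]): Promise<number[][]> {
  await model.loadModel();
  try {
    return await model.getEmbedding(texts);
  } finally {
    await model.releaseModel();
  }
}

// Fake model so that the sketch can run off-device.
// It returns [text length] as a one-dimensional placeholder "embedding".
class FakeTextEmbedder implements TextEmbedderLike {
  public calls: string[] = [];

  async loadModel(): Promise<void> {
    this.calls.push('load');
  }

  async getEmbedding(batchTexts: Array<string>): Promise<Array<Array<number>>> {
    this.calls.push('embed');
    return batchTexts.map((t) => [t.length]);
  }

  async releaseModel(): Promise<void> {
    this.calls.push('release');
  }
}

const fakeTextModel = new FakeTextEmbedder();
embedTexts(fakeTextModel, ['text1', 'text2'])
  .then((vectors) => {
    console.info(`Got ${vectors.length} vectors; calls: ${fakeTextModel.calls.join(' -> ')}`);
  })
  .catch((err: Error) => {
    console.error('Failed to embed texts: ' + err.message);
  });
```

Placing `releaseModel()` in a `finally` block is a design choice of this sketch: it avoids leaving a loaded model behind when `getEmbedding()` rejects.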
## How to Develop Image Vectorization

1. Import the **intelligence** module.

   ```ts
   import { intelligence } from '@kit.ArkData';
   ```

2. Obtain an image embedding model using the **getImageEmbeddingModel** method. The sample code is as follows:

   ```ts
   import { BusinessError } from '@kit.BasicServicesKit';

   // Configure the model: basic model version, without NPU acceleration.
   let imageConfig: intelligence.ModelConfig = {
     version: intelligence.ModelVersion.BASIC_MODEL,
     isNpuAvailable: false,
     cachePath: "/data"
   }
   let imageEmbedding: intelligence.ImageEmbedding;

   intelligence.getImageEmbeddingModel(imageConfig)
     .then((data: intelligence.ImageEmbedding) => {
       console.info("Succeeded in getting ImageModel");
       // Keep the model instance for the following steps.
       imageEmbedding = data;
     })
     .catch((err: BusinessError) => {
       console.error("Failed to get ImageModel and code is " + err.code);
     })
   ```

3. Load the image embedding model using the **loadModel** method. The sample code is as follows:

   ```ts
   // Call this only after getImageEmbeddingModel() has resolved and imageEmbedding is assigned.
   imageEmbedding.loadModel()
     .then(() => {
       console.info("Succeeded in loading Model");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to load Model and code is " + err.code);
     })
   ```

4. Obtain the embedding vector of the given image using the **getEmbedding** method. The sample code is as follows:

   ```ts
   // URI of the image to vectorize.
   let image = "file://<packageName>/data/storage/el2/base/haps/entry/files/xxx.jpg";
   imageEmbedding.getEmbedding(image)
     .then((data: Array<number>) => {
       console.info("Succeeded in getting Embedding");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to get Embedding and code is " + err.code);
     })
   ```

5. Release the image embedding model using the **releaseModel** method when it is no longer needed. The sample code is as follows:

   ```ts
   imageEmbedding.releaseModel()
     .then(() => {
       console.info("Succeeded in releasing Model");
     })
     .catch((err: BusinessError) => {
       console.error("Failed to release Model and code is " + err.code);
     })
   ```

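The image **getEmbedding** method listed in Available APIs takes one image per call, so vectorizing several images means calling it repeatedly. The following sketch shows one possible way to do that while recording failures instead of aborting the whole run. It is plain TypeScript run against a fake model: `ImageEmbedderLike`, `ImageEmbeddingBatch`, `embedImages`, and `fakeImageModel` are hypothetical names, the image is assumed to be passed as a URI string as in step 4, and the model is assumed to be loaded already.

```ts
// Stand-in for the single-image getEmbedding method (an assumption of this sketch).
interface ImageEmbedderLike {
  getEmbedding(image: string): Promise<Array<number>>;
}

interface ImageEmbeddingBatch {
  vectors: Array<{ uri: string; vector: number[] }>;
  failedUris: string[];
}

// Hypothetical helper: embeds images one at a time and collects the URIs that failed.
async function embedImages(model: ImageEmbedderLike, uris: string[]): Promise<ImageEmbeddingBatch> {
  const batch: ImageEmbeddingBatch = { vectors: [], failedUris: [] };
  for (const uri of uris) {
    try {
      const vector = await model.getEmbedding(uri);
      batch.vectors.push({ uri, vector });
    } catch (err) {
      batch.failedUris.push(uri);
    }
  }
  return batch;
}

// Fake model so that the sketch can run off-device.
// It rejects any URI that does not end in ".jpg" and otherwise returns a placeholder vector.
const fakeImageModel: ImageEmbedderLike = {
  getEmbedding: async (image: string): Promise<Array<number>> => {
    if (image.slice(-4) !== '.jpg') {
      throw new Error('unsupported image: ' + image);
    }
    return [image.length];
  }
};

embedImages(fakeImageModel, ['a.jpg', 'b.txt'])
  .then((batch) => {
    console.info(`Embedded ${batch.vectors.length} image(s), failed: ${batch.failedUris.join(', ')}`);
  });
```

Processing images sequentially keeps only one inference in flight at a time, which is a conservative choice given the resource cost noted in Constraints.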