• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Persisting Vector Store Data (ArkTS)
2
3
4## When to Use
5
6Vector stores are designed to store, manage, and retrieve vector data while also supporting relational data processing for scalar values. The data type **floatvector** is used to store data vectorization results, enabling rapid retrieval and similarity searches for such data.<br>
7Since API version 18, data in vector stores can be persisted.
8
9## Basic Concepts
10
11- **ResultSet**: a set of query results, which allows access to the required data in flexible modes.
12- **floatvector**: vector data, for example, **[1.0, 3.0, 2.4, 5.1, 6.2, 11.7]**.
13
14## Constraints
15
16- The default log mode is Write Ahead Log ([WAL](data-terminology.md#write-ahead-log-wal)), and the default flush mode is [FULL](data-terminology.md#full).
17
18- A vector store supports a maximum of four read connections and one write connection at a time by default. A thread can perform the read operation when acquiring an idle read connection. If there is no idle read connection, a new read connection will be created.
19
20- To ensure data accuracy, the database supports only one write operation at a time. Concurrent write operations are performed in serial mode.
21
22- Once an application is uninstalled, related database files and temporary files are automatically deleted from the device.
23
24- ArkTS supports basic data types such as number, string, binary, and boolean, and the special data type ValueType.
25
26- To ensure successful data access, limit the size of a data record to 2 MB. If a single data record exceeds the size limit, the data record may fail to be read even if it is successfully inserted.
27
28## Specifications
29
30### Data Types
31
32Types of database table fields are as follows.
33
34| Type| Description| Supported|
35| -------- | -------- | -------- |
36| NULL | Null.| Yes|
37| INTEGER | Integer.| Yes|
38| DOUBLE | Floating point.| Yes|
39| TEXT | String.| Yes|
40| BLOB | Binary.| Yes|
41| FLOATVECTOR | Vector data.| Yes|
42
43### Field Constraints
44
45Constraints on database table fields are as follows.
46
47| Function| SQL Syntax| Supported|
48| -------- | -------- | -------- |
49| Not null| NOT NULL | Yes|
50| Default value| DEFAULT  | Yes|
51| Unique index| UNIQUE | Yes|
52| Primary key index| PRIMARY KEY | Yes|
53| Foreign key index| FOREIGN | No|
54| CHECK constraint| CHECK | No|
55
56### Clauses
57
58Clauses in a query statement are as follows.
59
60| Keyword| Description| Supported|
61| -------- | -------- | -------- |
62| WHERE | Obtains data from one or more tables.| Yes|
63| LIMIT | Restricts the data to return. | Yes|
64| ORDER BY | Sorts data based on one or more columns.| Yes|
65| ORDER BY vector distance| **<->** indicates the Euclidean distance, and **<=>** indicates the cosine distance.| Yes|
66| GROUP BY | Groups the same data.| Yes|
67| HAVING | Filters the results of aggregate functions.| Yes|
68| INDEXED BY | Uses a specific index during query.| Yes|
69| DISTINCT | Eliminates duplicate records.| No|
70
71### Sets
72
73Set statements in a query statement are as follows.
74
75| Keyword| Description| Supported|
76| -------- | -------- | -------- |
77| UNION | Merges the results of two or more query statements and deduplicates the results.| Yes|
78| UNION ALL | Merges the results of two or more query statements.| Yes|
79
80### Operators
81
82The following lists the operators used to filter data based on a condition. Generally, they are used in query statements.
83
84| Type| Operator| Supported|
85| -------- | -------- | -------- |
86| Arithmetic operation| +, -, *, /, %| Yes|
87| Comparison operation| ==, =, !=, >, >=, <, <=| Yes|
88| Logical operation| AND, BETWEEN, EXISTS, IN, NOT IN, NOT, OR, IS NULL, IS, IS NOT, LIKE, GLOB| Yes|
89| String concatenation operation| \|\| | Yes|
90| Bitwise operation| &, \|, ~, <<, >>| Yes|
91| Vector distance operation| <->, <=>| Yes (used in the aggregate functions **max** and **min**)|
92
93### Time and Date
94
95The following lists the functions used to return dates in different formats based on different time functions. Generally, these functions are used in query statements.
96
97| Keyword| Description| Supported|
98| -------- | -------- | -------- |
99| DATE | Returns a date in the format of YYYY-MM-DD.| Yes|
100| TIME | Returns a time in the format of HH:MM:SS.| Yes|
101| DATETIME | Returns date and time in the format of YYYY-MM-DD HH:MM:SS.| Yes|
102| JULIANDAY | Returns the number of days since noon on November 24, 4714 BC in Greenwich Mean Time (GMT).| Yes|
103| STRFTIME | Returns a formatted date based on the format string specified by the first parameter.| Yes|
104
105### Functions
106
107Functions in SQL statements are as follows.
108
109| Keyword| Description| Supported|
110| -------- | -------- | -------- |
111| COUNT | Calculates the number of rows returned after a query.| Yes|
112| MAX/MIN | Selects the maximum or minimum value in a column.| Yes|
113| AVG | Calculates the average value of a column.| Yes|
114| SUM | Calculates the sum of a column.| Yes|
115| RANDOM | Returns a pseudo-random integer ranging from -9223372036854775808 to 9223372036854775807.| Yes|
116| ABS | Calculates the absolute value.| Yes|
117| UPPER/LOWER | Converts a string to uppercase or lowercase letters.| Yes|
118| LENGTH | Returns the length of a string.| Yes|
119
120## Available APIs
121
122The following lists only the APIs for persisting vector store data. For details about more APIs and their usage, see [RDB Store](../reference/apis-arkdata/arkts-apis-data-relationalStore.md).
123
124| API| Description|
125| -------- | -------- |
126| getRdbStore(context: Context, config: StoreConfig): Promise&lt;RdbStore&gt; | Obtains an **RdbStore** instance for data operations.|
127| execute(sql: string, txId: number, args?: Array&lt;ValueType&gt;): Promise&lt;ValueType&gt; | Executes SQL statements that contain specified parameters. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.|
128| querySql(sql: string, bindArgs?: Array&lt;ValueType&gt;):Promise&lt;ResultSet&gt; | Queries data in the RDB store using the specified SQL statement. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.|
129| beginTrans(): Promise&lt;number&gt; | Starts the transaction before executing the SQL statements.|
130| commit(txId : number):Promise&lt;void&gt; | Commits the executed SQL statements. This API must be used together with **beginTrans**.|
131| rollback(txId : number):Promise&lt;void&gt; | Rolls back the executed SQL statements. This API must be used together with **beginTrans**.|
132| deleteRdbStore(context: Context, config: StoreConfig): Promise&lt;void&gt; | Deletes a database.|
133| isVectorSupported(): boolean | Checks whether the system supports vector stores.|
134
135## How to Develop
136
1371. Check whether the current system supports vector stores. The sample code is as follows:
138
139   ```ts
140   import { relationalStore } from '@kit.ArkData'; // Import the relationalStore module.
141   import { UIAbility } from '@kit.AbilityKit';
142   import { BusinessError } from '@kit.BasicServicesKit';
143   import { window } from '@kit.ArkUI';
144   // In this example, Ability is used to obtain an RdbStore instance. You can use other implementations as required.
145   class EntryAbility extends UIAbility {
146     async onWindowStageCreate(windowStage: window.WindowStage) {
147        // Check whether the current system supports vector stores.
148       let ret = relationalStore.isVectorSupported();
149       if (!ret) {
150         console.error(`vectorDB is not supported .`);
151         return;
152       }
153       // Open the database, and add, delete, and modify data.
154     }
155   }
156   ```
157
1582. If the system supports vector stores, obtain an **RdbStore** instance. Call **getRdbStore()** to create a database and create a table.
159
160   > **NOTE**
161   >
162   > - The RDB store created by an application varies with the context. Multiple RDB stores are created for the same database name with different application contexts. For example, each UIAbility has its own context.
163   >
164   > - When an application calls **getRdbStore()** to obtain an RDB store instance for the first time, the corresponding database file is generated in the application sandbox. When the RDB store is used, temporary files ended with **-wal** and **-shm** may be generated in the same directory as the database file. If you want to move the database files to other places, you must also move these temporary files. After the application is uninstalled, the database files and temporary files generated on the device are also removed.
165   >
166   > - For details about the error codes, see [Universal Error Codes](../reference/errorcode-universal.md) and [RDB Store Error Codes](../reference/apis-arkdata/errorcode-data-rdb.md).
167
168   The sample code is as follows:
169
170   ```ts
171   let store: relationalStore.RdbStore | undefined = undefined;
172   const STORE_CONFIG :relationalStore.StoreConfig= {
173     name: 'VectorTest.db', // Database file name.
174     securityLevel: relationalStore.SecurityLevel.S1, // Database security level.
175     vector: true // Optional. This parameter must be true for a vector store.
176   };
177
178   relationalStore.getRdbStore(this.context, STORE_CONFIG).then(async (rdbStore: relationalStore.RdbStore) => {
179     store = rdbStore;
180     // Create a table. floatvector (2) indicates that repr is 2-dimensional.
181     const SQL_CREATE_TABLE = 'CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, repr floatvector(2));';
182     // The second parameter indicates that transaction is not enabled. The third parameter undefined indicates that parameter binding is not used.
183     await store!.execute(SQL_CREATE_TABLE, 0, undefined);
184   }).catch((err: BusinessError) => {
185     console.error(`Get RdbStore failed, code is ${err.code}, message is ${err.message}`);
186   });
187   ```
188
1893. Call **execute()** to insert data to the vector store.
190
191   > **NOTE**
192   >
193   > **RelationalStore** does not provide explicit flush operations for data persistence. The data inserted is persisted.
194
195   The sample code is as follows:
196
197   ```ts
198   try {
199     // Use parameter binding.
200     const vectorValue: Float32Array = Float32Array.from([1.2, 2.3]);
201     await store!.execute("insert into test VALUES(?, ?);", 0, [0, vectorValue]);
202     // Do not use parameter binding.
203     await store!.execute("insert into test VALUES(1, '[1.3, 2.4]');", 0, undefined);
204   } catch (err) {
205     console.error(`execute insert failed, code is ${err.code}, message is ${err.message}`);
206   }
207   ```
208
2094. Call **execute()** to modify or delete data. The sample code is as follows:
210
211   ```ts
212   // Modify data.
213   try {
214     // Use parameter binding.
215     const vectorValue1: Float32Array = Float32Array.from([2.1, 3.2]);
216     await store!.execute("update test set repr = ? where id = ?", 0, [vectorValue1, 0]);
217     // Do not use parameter binding.
218     await store!.execute("update test set repr = '[5.1, 6.1]' where id = 0", 0, undefined);
219   } catch (err) {
220     console.error(`execute update failed, code is ${err.code}, message is ${err.message}`);
221   }
222
223   // Delete data.
224   try {
225     // Use parameter binding.
226     await store!.execute("delete from test where id = ?", 0, [0]);
227     // Do not use parameter binding.
228     await store!.execute("delete from test where id = 0", 0, undefined);
229   } catch (err) {
230     console.error(`execute delete failed, code is ${err.code}, message is ${err.message}`);
231   }
232   ```
233
2345. Call **querySql()** to query data. A **ResultSet** instance is returned.
235
236   > **NOTE**
237   >
238   > Use **close()** to close the **ResultSet** that is no longer used in a timely manner so that the memory allocated can be released.
239
240   The sample code is as follows:
241
242   ```ts
243   // Perform single-table queries.
244   try {
245     // Use parameter binding.
246     const QUERY_SQL = "select id, repr <-> ? as distance from test where id > ? order by repr <-> ? limit 5;";
247     const vectorValue2: Float32Array = Float32Array.from([6.2, 7.3]);
248     let resultSet = await store!.querySql(QUERY_SQL, [vectorValue2, 0, vectorValue2]);
249     while (resultSet!.goToNextRow()) {
250        let id = resultSet.getValue(0);
251        let dis = resultSet.getValue(1);
252     }
253     resultSet!.close();
254
255     // Do not use parameter binding.
256     const QUERY_SQL1 = "select id, repr <-> '[6.2, 7.3]' as distance from test where id > 0 order by repr <-> '[6.2, 7.3]' limit 5;";
257     resultSet = await store!.querySql(QUERY_SQL1);
258     resultSet!.close();
259   } catch (err) {
260     console.error(`query failed, code is ${err.code}, message is ${err.message}`);
261   }
262
263   // Perform subqueries.
264   try {
265     // Create the second table.
266     let CREATE_SQL = "CREATE TABLE IF NOT EXISTS test1(id text PRIMARY KEY);";
267     await store!.execute(CREATE_SQL);
268     let resultSet = await store!.querySql("select * from test where id in (select id from test1);");
269     resultSet!.close();
270   } catch (err) {
271     console.error(`query failed, code is ${err.code}, message is ${err.message}`);
272   }
273
274   // Perform aggregate queries.
275   try {
276     let resultSet = await store!.querySql("select * from test where repr <-> '[1.0, 1.0]' > 0 group by id having max(repr <=> '[1.0, 1.0]');");
277     resultSet!.close();
278   } catch (err) {
279     console.error(`query failed, code is ${err.code}, message is ${err.message}`);
280   }
281
282   // Perform multi-table queries.
283   try {
284     // Different union all, union will delete duplicate data.
285     let resultSet = await store!.querySql("select id, repr <-> '[1.5, 5.6]' as distance from test union select id, repr <-> '[1.5, 5.6]' as distance from test order by distance limit 5;");
286     resultSet!.close();
287   } catch (err) {
288     console.error(`query failed, code is ${err.code}, message is ${err.message}`);
289   }
290   ```
291
2926. Create a view and query data. The sample code is as follows:
293
294   ```ts
295   // Perform view queries.
296   try {
297     // Create a view.
298     await store!.execute("CREATE VIEW v1 as select * from test where id > 0;");
299     let resultSet = await store!.querySql("select * from v1;");
300     resultSet!.close();
301   } catch (err) {
302     console.error(`query failed, code is ${err.code}, message is ${err.message}`);
303   }
304   ```
305
3067. Query data using vector indexes.
307
308   The vector store uses vectors as keys to provide efficient and fast search capabilities.
309
310   It supports the basic syntax and extended syntax as follows:
311
312   - Basic syntax:
313
314     ```sql
315     // index_name indicates the index name, index_type indicates the index type, and dist_function indicates the type of distance function for similarity measurement.
316     CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function);
317
318     DROP INDEX table_name.index_name;
319     ```
320   - Extended syntax:
321
322     ```sql
323     CREATE INDEX [Basic syntax] [WITH(parameter = value [, ...])];
324     ```
325
326   **Table 1** index_type
327
328   | Type     | Description                                                    |
329   | --------- | ------------------------------------------------------------ |
330   | gsdiskann | Index for processing high-dimensional dense vector data, such as text embedding and image features.        |
331
332   **Table 2** dist_function
333
334   | Type  | Operator| Description  |
335   | ------ | -------- | ---------- |
336   | L2     | <->      | Euclidean distance.|
337   | COSINE | <=>      | Cosine distance.|
338
339   **Table 3** parameter (extended syntax parameters)
340
341   | Parameter  | Value Range| Description  |
342   | ------ | -------- | ---------- |
343   | QUEUE_SIZE | Value range: [10, 1000]<br>Default value: **20**    | Size of the candidate queue when an index is created for nearest neighbor search. A larger value indicates a lower construction speed and a slightly higher recall rate.|
344   | OUT_DEGREE | Value range: [1, 1200]<br>Default value: **60**     | Number of outgoing edges of a node in the graph. The value of **OUT_DEGREE** cannot exceed **pageSize**. Otherwise, the error GRD_INVALID_ARGS will be thrown.|
345
346   > **NOTE**
347   >
348   > - When deleting an index, you need to specify the table name, that is, **Drop Index table.index_name**.
349   >
350   > - The index created with a table cannot be deleted, for example, the primary key cannot be deleted.
351   >
352   > - When querying data based on vector indexes, you must use **ORDER BY** and **LIMIT**. **ORDER BY** has only one sorting condition, that is, the vector distance condition. If **ORDER BY** is used with **DESC**, vector indexes will not be used. The distance metric used for querying must match the metric used when the index is created. For example, if the vector index is created using **L2**, <-> must be used for the query. Otherwise, the index cannot be hit.
353
354   The sample code is as follows:
355
356   ```ts
357   // Basic syntax
358   try {
359     // Create an index using the basic syntax. The index name is diskann_l2_idx, index column is repr, type is gsdiskann, and the distance metric is L2.
360     await store!.execute("CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2);");
361     // Delete the diskann_l2_idx index from the test table.
362     await store!.execute("DROP INDEX test.diskann_l2_idx;");
363   } catch (err) {
364     console.error(`create index failed, code is ${err.code}, message is ${err.message}`);
365   }
366
367   // Extended syntax
368   try {
369     // Set QUEUE_SIZE to 20 and OUT_DEGREE to 50.
370     await store!.execute("CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);");
371   } catch (err) {
372     console.error(`create ext index failed, code is ${err.code}, message is ${err.message}`);
373   }
374   ```
375
3768. Manually reclaim disk fragmentation generated during index deletion. This function is supported since API version 20.
377
378    After the vector store deletes vectors on the table where the GSDiskANN index has been created, disk defragmentation is automatically performed. However, automatic disk defragmentation may not be triggered in the following scenarios:
379    - After vectors are deleted from the GSDiskANN index, the vector store is closed immediately.
380    - After vectors are deleted in batches from the GSDiskANN index, no operation is performed on the table.
381
382   Therefore, the following statement is provided for you to trigger the disk defragmentation from the GSDiskANN index:
383   ```sql
384   PRAGMA DISKANN_ASYNC_COLLECTING;
385   ```
386
387   > **NOTE**
388   >
389   > - This statement is triggered to reclaim disk fragmentation from GSDiskANN indexes in all tables of the vector store at a time.
390   >
391   > - The disk defragmentation task is a background task and does not block the execution of subsequent statements.
392   >
393   > - The disk defragmentation task is automatically scheduled by the background based on the load. Generally, the task is executed only in low-load scenarios. The time required for successful execution depends on the load.
394
395    The sample code is as follows:
396
397   ```ts
398   try {
399     // Manually trigger asynchronous deletion and sorting to execute disk defragmentation for all GSDiskANN indexes in the vector store.
400     await store!.execute("PRAGMA DISKANN_ASYNC_COLLECTING;");
401   } catch (err) {
402     console.error(`diskann async collecting failed, code is ${err.code}, message is ${err.message}`);
403   }
404   ```
405
4069. Configure the data aging policy, which allows the application data to be automatically deleted by time or space.
407
408   The syntax is as follows:
409
410   ```sql
411   CREATE TABLE table_name(column_name type [, ...]) [WITH(parameter = value [, ...])];
412   ```
413
414   **parameter** specifies the parameter to set, and **value** specifies the value of the parameter. The following table describes the fields contained in **parameter**.
415
416   **Table 4** parameter (data aging policy parameters)
417
418   | Parameter| Mandatory| Value Range|
419   | ------ | -------- | ---------- |
420   | time_col | Yes| Column name. The value must be an integer and cannot be empty.|
421   | interval | No| Interval for executing the aging task thread. If a write operation is performed after the interval, an aging task will be triggered to delete the data that meets the aging conditions. If the write operation is performed within the interval, no aging task will be triggered. <br>Value range: [5 second, 1 year]<br>Default value: **1 day**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).|
422   | ttl | No| Data retention period. <br>Value range: [1 hour, 1 year]<br>Default value: **3 month**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).|
423   | max_num | No| Maximum data volume allowed. <br>Value range: [100, 1024]<br>Default value: **1024**<br> After the aging task deletes expired data, if the remaining data in the table exceeds the value of **max_num**, data tied to the nearest expiration-adjacent time point will be deleted until the total row count falls below **max_num**.|
424
425   Time-related parameters are converted into seconds as follows.
426
427   | Unit| Value in Seconds|
428   | ------ | -------- |
429   | year | 365 * 24 * 60 * 60 |
430   | month | 30 * 24 * 60 * 60 |
431   | day | 24 * 60 * 60 |
432   | hour | 60 * 60 |
433   | minute | 60 |
434
435   For example, if **ttl** is set to **3 months**, the value will be converted into 7,776,000 seconds (3 x (30 * 24 * 60 * 60)).
436
437   The sample code is as follows:
438
439   ```ts
440   try {
441     // The write operation performed every 5 minutes will trigger a data aging task.
442     await store!.execute("CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');");
443   } catch (err) {
444     console.error(`configure data aging failed, code is ${err.code}, message is ${err.message}`);
445   }
446   ```
447
44810. Configure data compression. This feature is configured when a table is created to compress column data of the text type.
449
450    Data compression is supported since API version 20.
451
452    The syntax is as follows:
453
454    ```sql
455    CREATE TABLE table_name(content text [, ...]) [WITH(compress_col = 'content')];
456    ```
457
458    In this syntax, **compress_col** is mandatory, and value is the name of the data column of the text type. This parameter can be configured together with the data aging policy.
459
460    The sample code is as follows:
461
462    ```ts
463    try {
464      // Data compression and data aging are configured for the content column.
465      await store!.execute("CREATE TABLE IF NOT EXISTS test3 (time integer not null, content text) with (time_col = 'time', interval = '5 minute', compress_col = 'content');");
466    } catch (err) {
467       console.error(`configure data compress failed, code is ${err.code}, message is ${err.message}`);
468    }
469    ```
470
47111. Delete the database.
472
473    Call **deleteRdbStore()** to delete the vector store and related database files. The sample code is as follows:
474
475    ```ts
476    try {
477      await relationalStore.deleteRdbStore(this.context, STORE_CONFIG);
478    } catch (err) {
479      console.error(`delete rdbStore failed, code is ${err.code},message is ${err.message}`);
480    }
481    ```
482
483
484