• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Persisting Vector Store Data (C/C++)
2
3
4## When to Use
5
6Vector stores are designed to store, manage, and retrieve vector data while also supporting relational data processing for scalar values. The data type **floatvector** is used to store data vectorization results, enabling rapid retrieval and similarity searches for such data.<br>
7Since API version 18, data in vector stores can be persisted.
8
9## Basic Concepts
10
11- **ResultSet**: a set of query results, which allows access to the required data in flexible modes.
12- **floatvector**: vector data, for example, **[1.0, 3.0, 2.4, 5.1, 6.2, 11.7]**.
13
14## Constraints
15
16- By default, the Write Ahead Log (WAL) and the **FULL** flushing mode are used.
17
18- A vector store supports a maximum of four read connections and one write connection at a time by default. A thread can perform the read operation when acquiring an idle read connection. If there is no idle read connection, a new read connection will be created.
19
20- To ensure data accuracy, the database supports only one write operation at a time. Concurrent write operations are performed in serial mode.
21
22- Once an application is uninstalled, related database files and temporary files are automatically deleted from the device.
23
24- To ensure successful data access, limit the size of a data record to 2 MB. Data records larger than this may be inserted correctly but fail to read.
25
26## Available APIs
27
28For details about the APIs, see [RDB](../reference/apis-arkdata/_r_d_b.md).
29
30| API| Description|
31| -------- | -------- |
32| int OH_Rdb_SetDbType(OH_Rdb_ConfigV2 *config, int dbType) | Sets the database type.|
33| OH_Rdb_Store *OH_Rdb_CreateOrOpen(const OH_Rdb_ConfigV2 *config, int *errCode) | Obtains an **OH_Rdb_Store** instance (**dbType** is set to **RDB_CAYLEY** by using **OH_Rdb_SetDbType**) for vector operations.|
34| int OH_Rdb_ExecuteV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args, OH_Data_Value **result) | Executes SQL statements with a return value to perform write operations. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.|
35| int OH_Rdb_ExecuteByTrxId(OH_Rdb_Store *store, int64_t trxId, const char *sql) | Executes the SQL statement that does not return a value with the specified transaction ID. If the transaction ID is **0**, no transaction is used.|
36| OH_Cursor *OH_Rdb_ExecuteQuery(OH_Rdb_Store *store, const char *sql) | Queries data in a store using the specified SQL statements.|
37| OH_Cursor *OH_Rdb_ExecuteQueryV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args) | Queries data in a store using the specified SQL statements. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.|
38| int OH_Rdb_DeleteStoreV2(const OH_Rdb_ConfigV2 *config) | Deletes a vector store.|
39| int OH_Cursor_GetFloatVectorCount(OH_Cursor *cursor, int32_t columnIndex, size_t *length) | Obtains the length of the floating-point array in the specified column of the current row.|
40| int OH_Cursor_GetFloatVector(OH_Cursor *cursor, int32_t columnIndex, float *val, size_t inLen, size_t *outLen) | Obtains the value in the specified column of the current row as a floating-point array. The value of **inLen** cannot be less than the actual array size.|
41
42## How to Develop
43
44**Adding Dynamic Link Libraries**
45
46Add the following libraries to **CMakeLists.txt**.
47
48```txt
49libnative_rdb_ndk.z.so
50```
51
52**Including Header Files**
53
54```c++
55#include <database/data/oh_data_values.h>
56#include <database/rdb/oh_cursor.h>
57#include <database/rdb/relational_store.h>
58```
59
601. Check whether the system supports vector stores. The sample code is as follows:
61
62   ```c
63   int numType = 0;
64   // If numType is 2, the system supports vector stores. If numType 1, the system does not support vector stores.
65   OH_Rdb_GetSupportedDbType(&numType);
66   ```
67
682. If the system supports vector stores, obtain an **OH_Rdb_Store** instance. The sample code is as follows:
69
70   ```c
71   // Create an OH_Rdb_Config instance.
72   OH_Rdb_ConfigV2 *config = OH_Rdb_CreateConfig();
73   // The path must be an application sandbox path.
74   OH_Rdb_SetDatabaseDir(config, "xxx");
75   // Set the store name, which is the database file name.
76   OH_Rdb_SetStoreName(config, "rdb_vector_test.db");
77   // Set the application bundle name.
78   OH_Rdb_SetBundleName(config, "xxx");
79   // Specify whether the database is encrypted.
80   OH_Rdb_SetEncrypted(config, false);
81   // Set the security level of the database files.
82   OH_Rdb_SetSecurityLevel(config,   OH_Rdb_SecurityLevel::S1);
83   // Set the encryption level for the directory holding the database files.
84   OH_Rdb_SetArea(config, RDB_SECURITY_AREA_EL1);
85   // Set the storage type.
86   OH_Rdb_SetDbType(config, RDB_CAYLEY);
87
88   // Create an OH_Rdb_Store instance.
89   int errCode = 0;
90   OH_Rdb_Store *store_ = OH_Rdb_CreateOrOpen(config, &errCode);
91   ```
92
933. Create a table and insert data into the table.
94
95   > **NOTE**
96   >
97   > **VectorStore** does not provide explicit flush operations for data persistence. The data inserted is persisted.
98
99   The sample code is as follows:
100
101   ```c
102   char createTableSql[] = "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, data1 floatvector(2));";
103   // Create a table.
104   OH_Rdb_ExecuteByTrxId(store_, 0, createTableSql);
105
106   // Insert data without parameter binding.
107   OH_Rdb_ExecuteV2(store_, "INSERT INTO test (id, data1) VALUES (0, '[3.4, 4.5]');", nullptr, nullptr);
108   // Insert data with parameter binding.
109   OH_Data_Values *values = OH_Values_Create();
110   OH_Values_PutInt(values, 1);
111   float test[] = { 1.2, 2.3 };
112   size_t len = sizeof(test) / sizeof(test[0]);
113   OH_Values_PutFloatVector(values, test, len);
114   char insertSql[] = "INSERT INTO test (id, data1)   VALUES (?, ?);";
115   OH_Rdb_ExecuteV2(store_, insertSql, values, nullptr);
116   OH_Values_Destroy(values);
117   ```
118
1194. Modify or delete data. The sample code is as follows:
120
121   ```c
122   // Modify data without parameter binding.
123   OH_Rdb_ExecuteV2(store_, "update test set data1 = '[5.1, 6.1] where id = 0;", nullptr, nullptr);
124
125   // Modify data with parameter binding.
126   float test1[2] = { 5.5, 6.6 };
127   OH_Data_Values *values1 = OH_Values_Create();
128   OH_Values_PutFloatVector(values1, test1, 2);
129   OH_Values_PutInt(values1, 1);
130   OH_Rdb_ExecuteV2(store_, "update test set data1 = ? where id = ?", values1, nullptr);
131   OH_Values_Destroy(values1);
132
133   // Delete data without parameter binding.
134   OH_Rdb_ExecuteV2(store_, "delete from test where id = 0", nullptr, nullptr);
135
136   // Delete data with parameter binding.
137   OH_Data_Values *values2 = OH_Values_Create();
138   OH_Values_PutInt(values2, 01);
139   OH_Rdb_ExecuteV2(store_, "delete from test where id = ?", values2, nullptr);
140   OH_Values_Destroy(values2);
141   ```
142
1435. Query data.
144
145   > **NOTE**
146   >
147   > Use **destroy()** to destroy the **ResultSet** that is no longer used in a timely manner so that the memory allocated can be released.
148
149   The sample code is as follows:
150
151   ```c
152   // Query data without parameter binding.
153   OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id = 1;", nullptr);
154   int rowCount = 0;
155   cursor->getRowCount(cursor, &rowCount);
156   cursor->goToNextRow(cursor);
157   size_t count = 0;
158   // The floatvector array is the second column of data.
159   OH_Cursor_GetFloatVectorCount(cursor, 1, &count);
160   float test2[count];
161   size_t outLen;
162   OH_Cursor_GetFloatVector(cursor, 1, test2, count, &outLen);
163   cursor->destroy(cursor);
164
165   // Query data with parameter binding.
166   char querySql[] = "select * from test where id = ?;";
167   OH_Data_Values *values3 = OH_Values_Create();
168   OH_Values_PutInt(values3, 1);
169   cursor = OH_Rdb_ExecuteQueryV2(store_, querySql, values3);
170   OH_Values_Destroy(values3);
171   cursor->destroy(cursor);
172
173   // Query data in another table, and create the table if it does not exist.
174   OH_Rdb_ExecuteV2(store_, "CREATE TABLE IF NOT EXISTS test1(id text PRIMARY KEY);", nullptr, nullptr);
175   cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id in (select id from test1);", nullptr);
176   cursor->destroy(cursor);
177
178   // Perform aggregate query.
179   cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where data1 <-> '[1.0, 1.0]' > 0 group by id having max(data1 <=> '[1.0, 1.0]');", nullptr);
180   cursor->destroy(cursor);
181
182   // Perform multi-table query.
183   cursor = OH_Rdb_ExecuteQueryV2(store_, "select id, data1 <-> '[1.5, 5.6]' as distance from test union select id, data1 <-> '[1.5, 5.6]' as distance from test order by distance limit 5;", nullptr);
184   cursor->destroy(cursor);
185   ```
186
1876. Create a view and query data. The sample code is as follows:
188
189   ```c
190   OH_Rdb_ExecuteV2(store_, "CREATE VIEW v1 as select * from test where id > 0;", nullptr, nullptr);
191   OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from v1;", nullptr);
192   cursor->destroy(cursor);
193   ```
194
1957. Query data using vector indexes.
196
197   The vector store uses vectors as keys to provide efficient and fast search capabilities.
198
199   It supports the basic syntax and extended syntax as follows:
200
201   - Basic syntax:
202
203     ```sql
204     // index_name indicates the index name, index_type indicates the index type, and dist_function indicates the type of distance function for similarity measurement.
205     CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function);
206
207     DROP INDEX table_name.index_name;
208     ```
209   - Extended syntax:
210
211     ```sql
212     CREATE INDEX [Basic syntax] [WITH(parameter = value [, ...])];
213     ```
214
215   **Table 1** index_type
216
217   | Type     | Description                                                    |
218   | --------- | ------------------------------------------------------------ |
219   | gsdiskann | Index for processing high-dimensional dense vector data, such as text embedding and image features.        |
220
221   **Table 2** dist_function
222
223   | Type  | Operator| Description  |
224   | ------ | -------- | ---------- |
225   | L2     | <->      | Euclidean distance.|
226   | COSINE | <=>      | Cosine distance.|
227
228   **Table 3** parameter (extended syntax parameters)
229
230   | Name  | Value Range| Description  |
231   | ------ | -------- | ---------- |
232   | QUEUE_SIZE | Value range: [10, 1000]<br>Default value: **20**    | Size of the candidate queue when an index is created for nearest neighbor search. A larger value indicates a lower construction speed and a slightly higher recall rate.|
233   | OUT_DEGREE | Value range: [1, 1200]<br>Default value: **60**     | Number of outgoing edges of a node in the graph. The value of **OUT_DEGREE** cannot exceed **pageSize**. Otherwise, the error GRD_INVALID_ARGS will be thrown.|
234
235   > **NOTE**
236   >
237   > - When deleting an index, you need to specify the table name, that is, **Drop Index table.index_name**.
238   >
239   > - The index created with a table cannot be deleted, for example, the primary key cannot be deleted.
240   >
241   > - When querying data based on vector indexes, you must use **ORDER BY** and **LIMIT**. **ORDER BY** has only one sorting condition, that is, the vector distance condition. If **ORDER BY** is used with **DESC**, vector indexes will not be used. The distance metric used for querying must match the metric used when the index is created. For example, if the vector index is created using **L2**, <-> must be used for the query. Otherwise, the index cannot be hit.
242
243   The sample code is as follows:
244
245   ```c
246   // Create an index using the basic syntax. The index name is diskann_l2_idx, index column is repr, type is gsdiskann, and the distance metric is L2.
247   OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(data1 L2);", nullptr, nullptr);
248
249   // Delete the diskann_l2_idx index from the test table.
250   OH_Rdb_ExecuteV2(store_, "DROP INDEX test.diskann_l2_idx;", nullptr, nullptr);
251
252   // Set QUEUE_SIZE to 20 and OUT_DEGREE to 50 using the extended syntax.
253   OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);", nullptr, nullptr);
254   ```
255
2568. Configure the data aging policy, which allows the application data to be automatically deleted by time or space.
257
258   The syntax is as follows:
259
260   ```sql
261   CREATE TABLE table_name(column_name type [, ...]) [WITH(parameter = value [, ...])];
262   ```
263
264   **parameter** specifies the parameter to set, and **value** specifies the value of the parameter. The following table describes the fields contained in **parameter**.
265
266   **Table 4** parameter (data aging policy parameters)
267
268   | Name| Mandatory| Description|
269   | ------ | -------- | ---------- |
270   | time_col | Yes| Column name The value must be an integer and cannot be empty.|
271   | interval | No| Interval for executing the aging task thread. If a write operation is performed after the interval, an aging task will be triggered to delete the data that meets the aging conditions. If the write operation is performed within the interval, no aging task will be triggered. <br>Value range: [5 second, 1 year]<br>Default value: **1 day**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).|
272   | ttl | No| Data retention period. <br>Value range: [1 hour, 1 year]<br>Default value: **3 month**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).|
273   | max_num | No| Maximum data volume allowed. <br>Value range: [100, 1024]<br>Default value: **1024**<br> After the aging task deletes expired data, if the remaining data in the table exceeds the value of **max_num**, data tied to the nearest expiration-adjacent time point will be deleted until the total row count falls below **max_num**.|
274
275   Time-related parameters are converted into seconds as follows.
276
277   | Unit| Value in Seconds|
278   | ------ | -------- |
279   | year | 365 * 24 * 60 * 60 |
280   | month | 30 * 24 * 60 * 60 |
281   | day | 24 * 60 * 60 |
282   | hour | 60 * 60 |
283   | minute | 60 |
284
285   For example, if **ttl** is set to **3 months**, the value will be converted into 7,776,000 seconds (3 x (30 x 24 x 60 x 60)).
286
287   The sample code is as follows:
288
289   ```c
290   // The write operation performed every 5 minutes will trigger a data aging task.
291   OH_Rdb_ExecuteV2(store_, "CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');", nullptr, nullptr);
292   ```
293
2949. Delete the database. The sample code is as follows:
295
296   ```c
297   OH_Rdb_CloseStore(store_);
298   OH_Rdb_DeleteStoreV2(config);
299   ```
300