1# Persisting Vector Store Data (C/C++) 2 3 4## When to Use 5 6Vector stores are designed to store, manage, and retrieve vector data while also supporting relational data processing for scalar values. The data type **floatvector** is used to store data vectorization results, enabling rapid retrieval and similarity searches for such data.<br> 7Since API version 18, data in vector stores can be persisted. 8 9## Basic Concepts 10 11- **ResultSet**: a set of query results, which allows access to the required data in flexible modes. 12- **floatvector**: vector data, for example, **[1.0, 3.0, 2.4, 5.1, 6.2, 11.7]**. 13 14## Constraints 15 16- By default, the Write Ahead Log (WAL) and the **FULL** flushing mode are used. 17 18- A vector store supports a maximum of four read connections and one write connection at a time by default. A thread can perform the read operation when acquiring an idle read connection. If there is no idle read connection, a new read connection will be created. 19 20- To ensure data accuracy, the database supports only one write operation at a time. Concurrent write operations are performed in serial mode. 21 22- Once an application is uninstalled, related database files and temporary files are automatically deleted from the device. 23 24- To ensure successful data access, limit the size of a data record to 2 MB. Data records larger than this may be inserted correctly but fail to read. 25 26## Available APIs 27 28For details about the APIs, see [RDB](../reference/apis-arkdata/_r_d_b.md). 29 30| API| Description| 31| -------- | -------- | 32| int OH_Rdb_SetDbType(OH_Rdb_ConfigV2 *config, int dbType) | Sets the database type.| 33| OH_Rdb_Store *OH_Rdb_CreateOrOpen(const OH_Rdb_ConfigV2 *config, int *errCode) | Obtains an **OH_Rdb_Store** instance (**dbType** is set to **RDB_CAYLEY** by using **OH_Rdb_SetDbType**) for vector operations.| 34| int OH_Rdb_ExecuteV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args, OH_Data_Value **result) | Executes SQL statements with a return value to perform write operations. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.| 35| int OH_Rdb_ExecuteByTrxId(OH_Rdb_Store *store, int64_t trxId, const char *sql) | Executes the SQL statement that does not return a value with the specified transaction ID. If the transaction ID is **0**, no transaction is used.| 36| OH_Cursor *OH_Rdb_ExecuteQuery(OH_Rdb_Store *store, const char *sql) | Queries data in a store using the specified SQL statements.| 37| OH_Cursor *OH_Rdb_ExecuteQueryV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args) | Queries data in a store using the specified SQL statements. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.| 38| int OH_Rdb_DeleteStoreV2(const OH_Rdb_ConfigV2 *config) | Deletes a vector store.| 39| int OH_Cursor_GetFloatVectorCount(OH_Cursor *cursor, int32_t columnIndex, size_t *length) | Obtains the length of the floating-point array in the specified column of the current row.| 40| int OH_Cursor_GetFloatVector(OH_Cursor *cursor, int32_t columnIndex, float *val, size_t inLen, size_t *outLen) | Obtains the value in the specified column of the current row as a floating-point array. The value of **inLen** cannot be less than the actual array size.| 41 42## How to Develop 43 44**Adding Dynamic Link Libraries** 45 46Add the following libraries to **CMakeLists.txt**. 47 48```txt 49libnative_rdb_ndk.z.so 50``` 51 52**Including Header Files** 53 54```c++ 55#include <database/data/oh_data_values.h> 56#include <database/rdb/oh_cursor.h> 57#include <database/rdb/relational_store.h> 58``` 59 601. Check whether the system supports vector stores. The sample code is as follows: 61 62 ```c 63 int numType = 0; 64 // If numType is 2, the system supports vector stores. If numType 1, the system does not support vector stores. 65 OH_Rdb_GetSupportedDbType(&numType); 66 ``` 67 682. If the system supports vector stores, obtain an **OH_Rdb_Store** instance. The sample code is as follows: 69 70 ```c 71 // Create an OH_Rdb_Config instance. 72 OH_Rdb_ConfigV2 *config = OH_Rdb_CreateConfig(); 73 // The path must be an application sandbox path. 74 OH_Rdb_SetDatabaseDir(config, "xxx"); 75 // Set the store name, which is the database file name. 76 OH_Rdb_SetStoreName(config, "rdb_vector_test.db"); 77 // Set the application bundle name. 78 OH_Rdb_SetBundleName(config, "xxx"); 79 // Specify whether the database is encrypted. 80 OH_Rdb_SetEncrypted(config, false); 81 // Set the security level of the database files. 82 OH_Rdb_SetSecurityLevel(config, OH_Rdb_SecurityLevel::S1); 83 // Set the encryption level for the directory holding the database files. 84 OH_Rdb_SetArea(config, RDB_SECURITY_AREA_EL1); 85 // Set the storage type. 86 OH_Rdb_SetDbType(config, RDB_CAYLEY); 87 88 // Create an OH_Rdb_Store instance. 89 int errCode = 0; 90 OH_Rdb_Store *store_ = OH_Rdb_CreateOrOpen(config, &errCode); 91 ``` 92 933. Create a table and insert data into the table. 94 95 > **NOTE** 96 > 97 > **VectorStore** does not provide explicit flush operations for data persistence. The data inserted is persisted. 98 99 The sample code is as follows: 100 101 ```c 102 char createTableSql[] = "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, data1 floatvector(2));"; 103 // Create a table. 104 OH_Rdb_ExecuteByTrxId(store_, 0, createTableSql); 105 106 // Insert data without parameter binding. 107 OH_Rdb_ExecuteV2(store_, "INSERT INTO test (id, data1) VALUES (0, '[3.4, 4.5]');", nullptr, nullptr); 108 // Insert data with parameter binding. 109 OH_Data_Values *values = OH_Values_Create(); 110 OH_Values_PutInt(values, 1); 111 float test[] = { 1.2, 2.3 }; 112 size_t len = sizeof(test) / sizeof(test[0]); 113 OH_Values_PutFloatVector(values, test, len); 114 char insertSql[] = "INSERT INTO test (id, data1) VALUES (?, ?);"; 115 OH_Rdb_ExecuteV2(store_, insertSql, values, nullptr); 116 OH_Values_Destroy(values); 117 ``` 118 1194. Modify or delete data. The sample code is as follows: 120 121 ```c 122 // Modify data without parameter binding. 123 OH_Rdb_ExecuteV2(store_, "update test set data1 = '[5.1, 6.1] where id = 0;", nullptr, nullptr); 124 125 // Modify data with parameter binding. 126 float test1[2] = { 5.5, 6.6 }; 127 OH_Data_Values *values1 = OH_Values_Create(); 128 OH_Values_PutFloatVector(values1, test1, 2); 129 OH_Values_PutInt(values1, 1); 130 OH_Rdb_ExecuteV2(store_, "update test set data1 = ? where id = ?", values1, nullptr); 131 OH_Values_Destroy(values1); 132 133 // Delete data without parameter binding. 134 OH_Rdb_ExecuteV2(store_, "delete from test where id = 0", nullptr, nullptr); 135 136 // Delete data with parameter binding. 137 OH_Data_Values *values2 = OH_Values_Create(); 138 OH_Values_PutInt(values2, 01); 139 OH_Rdb_ExecuteV2(store_, "delete from test where id = ?", values2, nullptr); 140 OH_Values_Destroy(values2); 141 ``` 142 1435. Query data. 144 145 > **NOTE** 146 > 147 > Use **destroy()** to destroy the **ResultSet** that is no longer used in a timely manner so that the memory allocated can be released. 148 149 The sample code is as follows: 150 151 ```c 152 // Query data without parameter binding. 153 OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id = 1;", nullptr); 154 int rowCount = 0; 155 cursor->getRowCount(cursor, &rowCount); 156 cursor->goToNextRow(cursor); 157 size_t count = 0; 158 // The floatvector array is the second column of data. 159 OH_Cursor_GetFloatVectorCount(cursor, 1, &count); 160 float test2[count]; 161 size_t outLen; 162 OH_Cursor_GetFloatVector(cursor, 1, test2, count, &outLen); 163 cursor->destroy(cursor); 164 165 // Query data with parameter binding. 166 char querySql[] = "select * from test where id = ?;"; 167 OH_Data_Values *values3 = OH_Values_Create(); 168 OH_Values_PutInt(values3, 1); 169 cursor = OH_Rdb_ExecuteQueryV2(store_, querySql, values3); 170 OH_Values_Destroy(values3); 171 cursor->destroy(cursor); 172 173 // Query data in another table, and create the table if it does not exist. 174 OH_Rdb_ExecuteV2(store_, "CREATE TABLE IF NOT EXISTS test1(id text PRIMARY KEY);", nullptr, nullptr); 175 cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id in (select id from test1);", nullptr); 176 cursor->destroy(cursor); 177 178 // Perform aggregate query. 179 cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where data1 <-> '[1.0, 1.0]' > 0 group by id having max(data1 <=> '[1.0, 1.0]');", nullptr); 180 cursor->destroy(cursor); 181 182 // Perform multi-table query. 183 cursor = OH_Rdb_ExecuteQueryV2(store_, "select id, data1 <-> '[1.5, 5.6]' as distance from test union select id, data1 <-> '[1.5, 5.6]' as distance from test order by distance limit 5;", nullptr); 184 cursor->destroy(cursor); 185 ``` 186 1876. Create a view and query data. The sample code is as follows: 188 189 ```c 190 OH_Rdb_ExecuteV2(store_, "CREATE VIEW v1 as select * from test where id > 0;", nullptr, nullptr); 191 OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from v1;", nullptr); 192 cursor->destroy(cursor); 193 ``` 194 1957. Query data using vector indexes. 196 197 The vector store uses vectors as keys to provide efficient and fast search capabilities. 198 199 It supports the basic syntax and extended syntax as follows: 200 201 - Basic syntax: 202 203 ```sql 204 // index_name indicates the index name, index_type indicates the index type, and dist_function indicates the type of distance function for similarity measurement. 205 CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function); 206 207 DROP INDEX table_name.index_name; 208 ``` 209 - Extended syntax: 210 211 ```sql 212 CREATE INDEX [Basic syntax] [WITH(parameter = value [, ...])]; 213 ``` 214 215 **Table 1** index_type 216 217 | Type | Description | 218 | --------- | ------------------------------------------------------------ | 219 | gsdiskann | Index for processing high-dimensional dense vector data, such as text embedding and image features. | 220 221 **Table 2** dist_function 222 223 | Type | Operator| Description | 224 | ------ | -------- | ---------- | 225 | L2 | <-> | Euclidean distance.| 226 | COSINE | <=> | Cosine distance.| 227 228 **Table 3** parameter (extended syntax parameters) 229 230 | Name | Value Range| Description | 231 | ------ | -------- | ---------- | 232 | QUEUE_SIZE | Value range: [10, 1000]<br>Default value: **20** | Size of the candidate queue when an index is created for nearest neighbor search. A larger value indicates a lower construction speed and a slightly higher recall rate.| 233 | OUT_DEGREE | Value range: [1, 1200]<br>Default value: **60** | Number of outgoing edges of a node in the graph. The value of **OUT_DEGREE** cannot exceed **pageSize**. Otherwise, the error GRD_INVALID_ARGS will be thrown.| 234 235 > **NOTE** 236 > 237 > - When deleting an index, you need to specify the table name, that is, **Drop Index table.index_name**. 238 > 239 > - The index created with a table cannot be deleted, for example, the primary key cannot be deleted. 240 > 241 > - When querying data based on vector indexes, you must use **ORDER BY** and **LIMIT**. **ORDER BY** has only one sorting condition, that is, the vector distance condition. If **ORDER BY** is used with **DESC**, vector indexes will not be used. The distance metric used for querying must match the metric used when the index is created. For example, if the vector index is created using **L2**, <-> must be used for the query. Otherwise, the index cannot be hit. 242 243 The sample code is as follows: 244 245 ```c 246 // Create an index using the basic syntax. The index name is diskann_l2_idx, index column is repr, type is gsdiskann, and the distance metric is L2. 247 OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(data1 L2);", nullptr, nullptr); 248 249 // Delete the diskann_l2_idx index from the test table. 250 OH_Rdb_ExecuteV2(store_, "DROP INDEX test.diskann_l2_idx;", nullptr, nullptr); 251 252 // Set QUEUE_SIZE to 20 and OUT_DEGREE to 50 using the extended syntax. 253 OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);", nullptr, nullptr); 254 ``` 255 2568. Configure the data aging policy, which allows the application data to be automatically deleted by time or space. 257 258 The syntax is as follows: 259 260 ```sql 261 CREATE TABLE table_name(column_name type [, ...]) [WITH(parameter = value [, ...])]; 262 ``` 263 264 **parameter** specifies the parameter to set, and **value** specifies the value of the parameter. The following table describes the fields contained in **parameter**. 265 266 **Table 4** parameter (data aging policy parameters) 267 268 | Name| Mandatory| Description| 269 | ------ | -------- | ---------- | 270 | time_col | Yes| Column name The value must be an integer and cannot be empty.| 271 | interval | No| Interval for executing the aging task thread. If a write operation is performed after the interval, an aging task will be triggered to delete the data that meets the aging conditions. If the write operation is performed within the interval, no aging task will be triggered. <br>Value range: [5 second, 1 year]<br>Default value: **1 day**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).| 272 | ttl | No| Data retention period. <br>Value range: [1 hour, 1 year]<br>Default value: **3 month**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).| 273 | max_num | No| Maximum data volume allowed. <br>Value range: [100, 1024]<br>Default value: **1024**<br> After the aging task deletes expired data, if the remaining data in the table exceeds the value of **max_num**, data tied to the nearest expiration-adjacent time point will be deleted until the total row count falls below **max_num**.| 274 275 Time-related parameters are converted into seconds as follows. 276 277 | Unit| Value in Seconds| 278 | ------ | -------- | 279 | year | 365 * 24 * 60 * 60 | 280 | month | 30 * 24 * 60 * 60 | 281 | day | 24 * 60 * 60 | 282 | hour | 60 * 60 | 283 | minute | 60 | 284 285 For example, if **ttl** is set to **3 months**, the value will be converted into 7,776,000 seconds (3 x (30 x 24 x 60 x 60)). 286 287 The sample code is as follows: 288 289 ```c 290 // The write operation performed every 5 minutes will trigger a data aging task. 291 OH_Rdb_ExecuteV2(store_, "CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');", nullptr, nullptr); 292 ``` 293 2949. Delete the database. The sample code is as follows: 295 296 ```c 297 OH_Rdb_CloseStore(store_); 298 OH_Rdb_DeleteStoreV2(config); 299 ``` 300