1# Persisting Vector Store Data (C/C++) 2 3 4## When to Use 5 6Vector stores are designed to store, manage, and retrieve vector data while also supporting relational data processing for scalar values. The data type **floatvector** is used to store data vectorization results, enabling rapid retrieval and similarity searches for such data. 7 8## Basic Concepts 9 10- **ResultSet**: a set of query results, which allows access to the required data in flexible modes. 11- **floatvector**: vector data, for example, **[1.0, 3.0, 2.4, 5.1, 6.2, 11.7]**. 12 13## Constraints 14 15- By default, the Write Ahead Log (WAL) and the **FULL** flushing mode are used. 16 17- A vector store supports a maximum of four read connections and one write connection at a time by default. A thread can perform the read operation when acquiring an idle read connection. If there is no idle read connection, a new read connection will be created. 18 19- To ensure data accuracy, the database supports only one write operation at a time. Concurrent write operations are performed in serial mode. 20 21- Once an application is uninstalled, related database files and temporary files are automatically deleted from the device. 22 23- To ensure successful data access, limit the size of a data record to 2 MB. Data records larger than this may be inserted correctly but fail to read. 24 25## Available APIs 26 27For details about the APIs, see [RDB](../reference/apis-arkdata/_r_d_b.md). 28 29| API| Description| 30| -------- | -------- | 31| int OH_Rdb_SetDbType(OH_Rdb_ConfigV2 *config, int dbType) | Sets the database type.| 32| OH_Rdb_Store *OH_Rdb_CreateOrOpen(const OH_Rdb_ConfigV2 *config, int *errCode) | Obtains an **OH_Rdb_Store** instance (**dbType** is set to **RDB_CAYLEY** by using **OH_Rdb_SetDbType**) for vector operations.| 33| int OH_Rdb_ExecuteV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args, OH_Data_Value **result) | Executes SQL statements with a return value to perform write operations. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.| 34| int OH_Rdb_ExecuteByTrxId(OH_Rdb_Store *store, int64_t trxId, const char *sql) | Executes the SQL statement that does not return a value with the specified transaction ID. If the transaction ID is **0**, no transaction is used.| 35| OH_Cursor *OH_Rdb_ExecuteQuery(OH_Rdb_Store *store, const char *sql) | Queries data in a store using the specified SQL statements.| 36| OH_Cursor *OH_Rdb_ExecuteQueryV2(OH_Rdb_Store *store, const char *sql, const OH_Data_Values *args) | Queries data in a store using the specified SQL statements. Parameter binding is supported. The number of operators (such as =, >, and <) in the SQL statements cannot exceed 1000.| 37| int OH_Rdb_DeleteStoreV2(const OH_Rdb_ConfigV2 *config) | Deletes a vector store.| 38| int OH_Cursor_GetFloatVectorCount(OH_Cursor *cursor, int32_t columnIndex, size_t *length) | Obtains the length of the floating-point array in the specified column of the current row.| 39| int OH_Cursor_GetFloatVector(OH_Cursor *cursor, int32_t columnIndex, float *val, size_t inLen, size_t *outLen) | Obtains the value in the specified column of the current row as a floating-point array. The value of **inLen** cannot be less than the actual array size.| 40 41## How to Develop 42 43**Adding Dynamic Link Libraries** 44 45Add the following libraries to **CMakeLists.txt**. 46 47```txt 48libnative_rdb_ndk.z.so 49``` 50 51**Including Header Files** 52 53```c++ 54#include <database/data/oh_data_values.h> 55#include <database/rdb/oh_cursor.h> 56#include <database/rdb/relational_store.h> 57``` 58 591. Check whether the system supports vector stores. The sample code is as follows: 60 61 ```c 62 int numType = 0; 63 // If numType is 2, the system supports vector stores. If numType 1, the system does not support vector stores. 64 OH_Rdb_GetSupportedDbType(&numType); 65 ``` 66 672. If the system supports vector stores, obtain an **OH_Rdb_Store** instance. The sample code is as follows: 68 69 ```c 70 // Create an OH_Rdb_Config instance. 71 OH_Rdb_ConfigV2 *config = OH_Rdb_CreateConfig(); 72 // The path must be an application sandbox path. 73 OH_Rdb_SetDatabaseDir(config, "xxx"); 74 // Set the store name, which is the database file name. 75 OH_Rdb_SetStoreName(config, "rdb_vector_test.db"); 76 // Set the application bundle name. 77 OH_Rdb_SetBundleName(config, "xxx"); 78 // Specify whether the database is encrypted. 79 OH_Rdb_SetEncrypted(config, false); 80 // Set the security level of the database files. 81 OH_Rdb_SetSecurityLevel(config, OH_Rdb_SecurityLevel::S1); 82 // Set the encryption level for the directory holding the database files. 83 OH_Rdb_SetArea(config, RDB_SECURITY_AREA_EL1); 84 // Set the storage type. 85 OH_Rdb_SetDbType(config, RDB_CAYLEY); 86 87 // Create an OH_Rdb_Store instance. 88 int errCode = 0; 89 OH_Rdb_Store *store_ = OH_Rdb_CreateOrOpen(config, &errCode); 90 ``` 91 923. Create a table and insert data into the table. 93 94 > **NOTE** 95 > 96 > **VectorStore** does not provide explicit flush operations for data persistence. The data inserted is persisted. 97 98 The sample code is as follows: 99 100 ```c 101 char createTableSql[] = "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, data1 floatvector(2));"; 102 // Create a table. 103 OH_Rdb_ExecuteByTrxId(store_, 0, createTableSql); 104 105 // Insert data without parameter binding. 106 OH_Rdb_ExecuteV2(store_, "INSERT INTO test (id, data1) VALUES (0, '[3.4, 4.5]');", nullptr, nullptr); 107 // Insert data with parameter binding. 108 OH_Data_Values *values = OH_Values_Create(); 109 OH_Values_PutInt(values, 1); 110 float test[] = { 1.2, 2.3 }; 111 size_t len = sizeof(test) / sizeof(test[0]); 112 OH_Values_PutFloatVector(values, test, len); 113 char insertSql[] = "INSERT INTO test (id, data1) VALUES (?, ?);"; 114 OH_Rdb_ExecuteV2(store_, insertSql, values, nullptr); 115 OH_Values_Destroy(values); 116 ``` 117 1184. Modify or delete data. The sample code is as follows: 119 120 ```c 121 // Modify data without parameter binding. 122 OH_Rdb_ExecuteV2(store_, "update test set data1 = '[5.1, 6.1] where id = 0;", nullptr, nullptr); 123 124 // Modify data with parameter binding. 125 float test1[2] = { 5.5, 6.6 }; 126 OH_Data_Values *values1 = OH_Values_Create(); 127 OH_Values_PutFloatVector(values1, test1, 2); 128 OH_Values_PutInt(values1, 1); 129 OH_Rdb_ExecuteV2(store_, "update test set data1 = ? where id = ?", values1, nullptr); 130 OH_Values_Destroy(values1); 131 132 // Delete data without parameter binding. 133 OH_Rdb_ExecuteV2(store_, "delete from test where id = 0", nullptr, nullptr); 134 135 // Delete data with parameter binding. 136 OH_Data_Values *values2 = OH_Values_Create(); 137 OH_Values_PutInt(values2, 01); 138 OH_Rdb_ExecuteV2(store_, "delete from test where id = ?", values2, nullptr); 139 OH_Values_Destroy(values2); 140 ``` 141 1425. Query data. 143 144 > **NOTE** 145 > 146 > Use **destroy()** to destroy the **ResultSet** that is no longer used in a timely manner so that the memory allocated can be released. 147 148 The sample code is as follows: 149 150 ```c 151 // Query data without parameter binding. 152 OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id = 1;", nullptr); 153 int rowCount = 0; 154 cursor->getRowCount(cursor, &rowCount); 155 cursor->goToNextRow(cursor); 156 size_t count = 0; 157 // The floatvector array is the second column of data. 158 OH_Cursor_GetFloatVectorCount(cursor, 1, &count); 159 float test2[count]; 160 size_t outLen; 161 OH_Cursor_GetFloatVector(cursor, 1, test2, count, &outLen); 162 cursor->destroy(cursor); 163 164 // Query data with parameter binding. 165 char querySql[] = "select * from test where id = ?;"; 166 OH_Data_Values *values3 = OH_Values_Create(); 167 OH_Values_PutInt(values3, 1); 168 cursor = OH_Rdb_ExecuteQueryV2(store_, querySql, values3); 169 OH_Values_Destroy(values3); 170 cursor->destroy(cursor); 171 172 // Query data in another table, and create the table if it does not exist. 173 OH_Rdb_ExecuteV2(store_, "CREATE TABLE IF NOT EXISTS test1(id text PRIMARY KEY);", nullptr, nullptr); 174 cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where id in (select id from test1);", nullptr); 175 cursor->destroy(cursor); 176 177 // Perform aggregate query. 178 cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from test where data1 <-> '[1.0, 1.0]' > 0 group by id having max(data1 <=> '[1.0, 1.0]');", nullptr); 179 cursor->destroy(cursor); 180 181 // Perform multi-table query. 182 cursor = OH_Rdb_ExecuteQueryV2(store_, "select id, data1 <-> '[1.5, 5.6]' as distance from test union select id, data1 <-> '[1.5, 5.6]' as distance from test order by distance limit 5;", nullptr); 183 cursor->destroy(cursor); 184 ``` 185 1866. Create a view and query data. The sample code is as follows: 187 188 ```c 189 OH_Rdb_ExecuteV2(store_, "CREATE VIEW v1 as select * from test where id > 0;", nullptr, nullptr); 190 OH_Cursor *cursor = OH_Rdb_ExecuteQueryV2(store_, "select * from v1;", nullptr); 191 cursor->destroy(cursor); 192 ``` 193 1947. Query data using vector indexes. 195 196 The vector store uses vectors as keys to provide efficient and fast search capabilities. 197 198 It supports the basic syntax and extended syntax as follows: 199 200 - Basic syntax: 201 202 ```sql 203 // index_name indicates the index name, index_type indicates the index type, and dist_function indicates the type of distance function for similarity measurement. 204 CREATE INDEX [IF NOT EXISTS] index_name ON table_name USING index_type (column_name dist_function); 205 206 DROP INDEX table_name.index_name; 207 ``` 208 - Extended syntax: 209 210 ```sql 211 CREATE INDEX [Basic syntax] [WITH(parameter = value [, ...])]; 212 ``` 213 214 **Table 1** index_type 215 216 | Type | Description | 217 | --------- | ------------------------------------------------------------ | 218 | gsdiskann | Index for processing high-dimensional dense vector data, such as text embedding and image features. | 219 220 **Table 2** dist_function 221 222 | Type | Operator| Description | 223 | ------ | -------- | ---------- | 224 | L2 | <-> | Euclidean distance.| 225 | COSINE | <=> | Cosine distance.| 226 227 **Table 3** parameter (extended syntax parameters) 228 229 | Name | Value Range| Description | 230 | ------ | -------- | ---------- | 231 | QUEUE_SIZE | Value range: [10, 1000]<br>Default value: **20** | Size of the candidate queue when an index is created for nearest neighbor search. A larger value indicates a lower construction speed and a slightly higher recall rate.| 232 | OUT_DEGREE | Value range: [1, 1200]<br>Default value: **60** | Number of outgoing edges of a node in the graph. The value of **OUT_DEGREE** cannot exceed **pageSize**. Otherwise, the error GRD_INVALID_ARGS will be thrown.| 233 234 > **NOTE** 235 > 236 > - When deleting an index, you need to specify the table name, that is, **Drop Index table.index_name**. 237 > 238 > - The index created with a table cannot be deleted, for example, the primary key cannot be deleted. 239 > 240 > - When querying data based on vector indexes, you must use **ORDER BY** and **LIMIT**. **ORDER BY** has only one sorting condition, that is, the vector distance condition. If **ORDER BY** is used with **DESC**, vector indexes will not be used. The distance metric used for querying must match the metric used when the index is created. For example, if the vector index is created using **L2**, <-> must be used for the query. Otherwise, the index cannot be hit. 241 242 The sample code is as follows: 243 244 ```c 245 // Create an index using the basic syntax. The index name is diskann_l2_idx, index column is repr, type is gsdiskann, and the distance metric is L2. 246 OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(data1 L2);", nullptr, nullptr); 247 248 // Delete the diskann_l2_idx index from the test table. 249 OH_Rdb_ExecuteV2(store_, "DROP INDEX test.diskann_l2_idx;", nullptr, nullptr); 250 251 // Set QUEUE_SIZE to 20 and OUT_DEGREE to 50 using the extended syntax. 252 OH_Rdb_ExecuteV2(store_, "CREATE INDEX diskann_l2_idx ON test USING GSDISKANN(repr L2) WITH (queue_size=20, out_degree=50);", nullptr, nullptr); 253 ``` 254 2558. Configure the data aging policy, which allows the application data to be automatically deleted by time or space. 256 257 The syntax is as follows: 258 259 ```sql 260 CREATE TABLE table_name(column_name type [, ...]) [WITH(parameter = value [, ...])]; 261 ``` 262 263 **parameter** specifies the parameter to set, and **value** specifies the value of the parameter. The following table describes the fields contained in **parameter**. 264 265 **Table 4** parameter (data aging policy parameters) 266 267 | Name| Mandatory| Description| 268 | ------ | -------- | ---------- | 269 | time_col | Yes| Column name The value must be an integer and cannot be empty.| 270 | interval | No| Interval for executing the aging task thread. If a write operation is performed after the interval, an aging task will be triggered to delete the data that meets the aging conditions. If the write operation is performed within the interval, no aging task will be triggered. <br>Value range: [5 second, 1 year]<br>Default value: **1 day**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).| 271 | ttl | No| Data retention period. <br>Value range: [1 hour, 1 year]<br>Default value: **3 month**<br>Time units supported include **second**, **minute**, **hour**, **day**, **month** and **year**. The value is case-insensitive and supports both singular and plural forms (for example, **2 hour** and **2 hours** are acceptable).| 272 | max_num | No| Maximum data volume allowed. <br>Value range: [100, 1024]<br>Default value: **1024**<br> After the aging task deletes expired data, if the remaining data in the table exceeds the value of **max_num**, data tied to the nearest expiration-adjacent time point will be deleted until the total row count falls below **max_num**.| 273 274 Time-related parameters are converted into seconds as follows. 275 276 | Unit| Value in Seconds| 277 | ------ | -------- | 278 | year | 365 * 24 * 60 * 60 | 279 | month | 30 * 24 * 60 * 60 | 280 | day | 24 * 60 * 60 | 281 | hour | 60 * 60 | 282 | minute | 60 | 283 284 For example, if **ttl** is set to **3 months**, the value will be converted into 7,776,000 seconds (3 x (30 x 24 x 60 x 60)). 285 286 The sample code is as follows: 287 288 ```c 289 // The write operation performed every 5 minutes will trigger a data aging task. 290 OH_Rdb_ExecuteV2(store_, "CREATE TABLE test2(rec_time integer not null) WITH (time_col = 'rec_time', interval = '5 minute');", nullptr, nullptr); 291 ``` 292 2939. Delete the database. The sample code is as follows: 294 295 ```c 296 OH_Rdb_CloseStore(store_); 297 OH_Rdb_DeleteStoreV2(config); 298 ``` 299