An ELF file can be used as a token database, but it only contains the strings for its
exact build. A token database file aggregates tokens from multiple ELF files, so
that a single database can decode tokenized strings from any known ELF.
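
As a rough sketch, an aggregated database could be used for decoding with the
``pw_tokenizer`` Python package's ``Detokenizer`` (the database path and encoded
bytes below are hypothetical, and this usage is an assumption rather than a
verbatim reference):

.. code-block:: python

   # Hedged sketch: decode a tokenized message against an aggregated token
   # database instead of a single ELF. Assumes the Detokenizer accepts token
   # database (or ELF) paths.
   from pw_tokenizer import detokenize

   detokenizer = detokenize.Detokenizer("tokens.csv")  # hypothetical database
   result = detokenizer.detokenize(b"\x12\x34\x56\x78")  # hypothetical message
   print(result)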

Token database formats
======================

Three token database formats are supported: CSV, binary, and directory.

CSV database format
-------------------

The CSV database format has three columns: the token in hexadecimal, the removal
date (if any), and the string. For example, a small database might contain six
strings, three of which have removal dates.
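
As a sketch of how those columns can be consumed, the following snippet builds a
token-to-string lookup table from a CSV database using only the Python standard
library (the file name is hypothetical):

.. code-block:: python

   import csv

   # Build a token -> (removal date, string) lookup from a CSV token database.
   # Columns follow the description above: token in hexadecimal, removal date
   # (blank if the string is still present), and the string itself.
   tokens = {}
   with open("tokens.csv", newline="") as csv_file:  # hypothetical file name
       for row in csv.reader(csv_file):
           if not row:
               continue
           token, removal_date, string = row[0], row[1], row[2]
           tokens[int(token, 16)] = (removal_date.strip() or None, string)

   print(f"Loaded {len(tokens)} tokens")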

Binary database format
----------------------

The binary database format consists of a 16-byte header followed by a series of
entries. The binary form of the example database contains the same information
but is more compact, taking fewer bytes than the CSV database's 211 B.
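
The field layout is not reproduced in this excerpt, but as a minimal sketch
(file name hypothetical), the 16-byte header can be separated from the rest of
the data like so:

.. code-block:: python

   from pathlib import Path

   # Split a binary token database into its 16-byte header and the remaining
   # entry/string data. The interpretation of the header fields is not shown
   # here; see the format description above.
   HEADER_SIZE = 16
   raw = Path("tokens.bin").read_bytes()  # hypothetical file name
   header, payload = raw[:HEADER_SIZE], raw[HEADER_SIZE:]
   print(f"header: {header.hex()}  payload: {len(payload)} bytes")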

.. _module-pw_tokenizer-directory-database-format:

Directory database format
-------------------------

pw_tokenizer can consume directories of CSV databases. A directory database is
simply a directory that contains one or more CSV token databases.

An example directory database might look something like this:

.. code-block:: text

   ├── database.pw_tokenizer.csv
   └── ...

The token database commands randomly generate unique file names for the CSVs in
the database to prevent merge conflicts. Running ``mark_removed`` or ``purge``
commands in the database CLI consolidates the files to a single CSV.

The database command line tool supports a ``--discard-temporary`` option, which
helps keep temporary tokens (e.g. debug logs) out of the database.
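
As an illustrative sketch (the directory name is hypothetical), the CSVs that
make up a directory database can be enumerated and tallied with the standard
library:

.. code-block:: python

   import csv
   from pathlib import Path

   # Count entries across every CSV in a directory database. File names inside
   # the directory are randomly generated by the database tooling, so a glob
   # is used to match all of them.
   total = 0
   for csv_path in sorted(Path("database/").glob("*.csv")):  # hypothetical dir
       with open(csv_path, newline="") as csv_file:
           total += sum(1 for row in csv.reader(csv_file) if row)

   print(f"{total} entries across the directory database")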

While pw_tokenizer doesn't specify a JSON database format, a token database can
be created from a JSON file listing strings. This supports token database
generation for strings that are not embedded as parsable tokens in compiled
binaries. See :ref:`module-pw_tokenizer-database-creation` for instructions on
generating a token database from a JSON file.
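
As a sketch, and assuming the JSON input is a plain array of strings (the file
name and strings below are hypothetical), such a file could be produced like
this and then passed to the ``create`` command:

.. code-block:: python

   import json

   # Write a JSON file of strings to feed into token database creation.
   # Assumes a plain JSON array of strings; file name and contents are
   # hypothetical.
   strings = ["Battery low", "Connection established", "Sensor offline"]
   with open("extra_strings.json", "w") as json_file:
       json.dump(strings, json_file)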

Token databases are managed with the ``database.py`` script. This script can be
used to extract tokens from compilation artifacts and manage database files.
Invoke ``database.py`` with ``-h`` for full usage information. Any ELF build
artifact or existing database file can be used to experiment with the
``database.py`` commands.

.. _module-pw_tokenizer-database-creation:

Create a database
=================

The ``create`` command makes a new token database from ELF files (.elf, .o, .so,
etc.), existing token databases, or JSON files:

.. code-block:: console

   ./database.py create --database DATABASE_NAME ELF_OR_DATABASE_FILE...

Two database output formats are supported: CSV and binary. Provide
``--type binary`` to ``create`` to generate a binary database instead of the
default CSV.
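
For instance, the ``create`` invocation above could be scripted from Python
roughly as follows (paths are hypothetical; only the flags documented here are
used):

.. code-block:: python

   import subprocess

   # Create a binary token database from a firmware ELF by invoking the
   # create command shown above. Paths are hypothetical.
   subprocess.run(
       [
           "./database.py", "create",
           "--type", "binary",
           "--database", "tokens.bin",
           "firmware.elf",
       ],
       check=True,
   )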

.. _module-pw_tokenizer-update-token-database:

Update a database
=================

As new tokenized strings are added, update the database with the ``add``
command:

.. code-block:: console

   ./database.py add --database DATABASE_NAME ELF_OR_DATABASE_FILE...

This command adds new tokens from ELF files or other databases to the database.
Adding tokens already present in the database updates the date removed, if any.

A CSV token database can be checked into a source repository and updated as code
changes are made. The build system can invoke ``database.py`` to update the
database after each build.
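
A post-build step along these lines could keep an in-tree CSV database current
with the ``add`` command (the script path, ELF path, and database path are
hypothetical):

.. code-block:: python

   import subprocess

   # Post-build hook that refreshes an in-tree CSV token database using the
   # add command documented above. Paths are hypothetical.
   def update_token_database(elf_path: str, database: str = "tokens.csv") -> None:
       subprocess.run(
           ["./database.py", "add", "--database", database, elf_path],
           check=True,
       )

   update_token_database("out/firmware.elf")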

``$dir_pw_tokenizer/database.gni`` provides a GN template that automatically
updates an in-source tokenized strings database or creates a new database with
artifacts from one or more GN targets or other database files.

To create a new database, set the ``create`` variable to the desired database
type (``"csv"`` or ``"binary"``). The database will be created in the output
directory. To update an existing database, provide the path to the database with
the ``database`` variable.

.. code-block::

   import("$dir_pw_tokenizer/database.gni")

   # ...
   database = "database_in_the_source_tree.csv"

Provide ``targets`` or ``deps``, or build other GN targets, so that the inputs
are up to date when the database is updated.

``$dir_pw_tokenizer/database.cmake`` provides equivalent support for CMake
builds: it automatically updates an in-source tokenized strings database or
creates a new database with artifacts from a CMake target.

To create a new database, set the ``CREATE`` variable to the desired database
type (``"csv"`` or ``"binary"``). The database will be created in the output
directory.

.. code-block:: cmake

   include("$dir_pw_tokenizer/database.cmake")

To update an existing database, provide the path to the database with
the ``DATABASE`` variable.

.. code-block:: cmake

   # ...
   DATABASE database_in_the_source_tree.csv

When a hash collision occurs, multiple strings share the same token in the
database, and it may not be possible to unambiguously decode a token back to a
single string. Collision resolution takes into account, among other factors, if
and when the string was marked as having been removed from the database.

Run ``python -m pw_tokenizer.database report <database>`` to see information
about a token database, including any collisions.
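
As a sketch (the database path is hypothetical), the ``report`` command can be
invoked from a script and its output captured:

.. code-block:: python

   import subprocess

   # Run the report command shown above and print its output.
   result = subprocess.run(
       ["python", "-m", "pw_tokenizer.database", "report", "tokens.csv"],
       capture_output=True,
       text=True,
       check=True,
   )
   print(result.stdout)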

The ``mark_removed`` command marks database entries that are not present in the
provided ELF files as removed:

.. code-block:: console

   python -m pw_tokenizer.database mark_removed --database <database> <ELF files>

The ``purge`` command may be used to delete these marked-as-removed tokens from
the database.
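
A cleanup script might chain the two commands roughly like this (paths are
hypothetical, and the assumption that ``purge`` takes the same ``--database``
flag as the other commands is not confirmed by this excerpt):

.. code-block:: python

   import subprocess

   # Mark tokens missing from the latest build as removed, then purge them.
   # Paths are hypothetical; purge's --database flag is assumed to match the
   # other commands shown here.
   DATABASE = "tokens.csv"
   ELF = "out/firmware.elf"

   subprocess.run(
       ["python", "-m", "pw_tokenizer.database", "mark_removed",
        "--database", DATABASE, ELF],
       check=True,
   )
   subprocess.run(
       ["python", "-m", "pw_tokenizer.database", "purge",
        "--database", DATABASE],
       check=True,
   )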