token_databases.rst - OpenGrok cross reference for /external/pigweed/pw_tokenizer/token

Lines Matching full:database
10 file can be used as a token database, but it only contains the strings for its
11 exact build. A token database file aggregates tokens from multiple ELF files, so
12 that a single database can decode tokenized strings from any known ELF.
18 Token database formats
20 Three token database formats are supported: CSV, binary, and directory. Tokens
24 CSV database format
26 The CSV database format has four columns: the token in hexadecimal, the removal
31 This example database contains six strings, three of which have removal dates.
47 Binary database format
49 The binary database format consists of a 16-byte header followed by a series
56 The binary form of the CSV database is shown below. It contains the same
58 compared with the CSV database's 211 B.
81 .. _module-pw_tokenizer-directory-database-format:
83 Directory database format
85 pw_tokenizer can consume directories of CSV databases. A directory database
89 An example directory database might look something like this:
94    ├── database.pw_tokenizer.csv
99 The token database commands randomly generate unique file names for the CSVs in
100 the database to prevent merge conflicts. Running ``mark_removed`` or ``purge``
101 commands in the database CLI consolidates the files to a single CSV.
103 The database command line tool supports a ``--discard-temporary
108 debug logs) out of the database.
110 ELF section database format
123 While pw_tokenizer doesn't specify a JSON database format, a token database can
125 token database generation for strings that are not embedded as parsable tokens
126 in compiled binaries. See :ref:`module-pw_tokenizer-database-creation` for
127 instructions on generating a token database from a JSON file.
134 Token databases are managed with the ``database.py`` script. This script can be
135 used to extract tokens from compilation artifacts and manage database files.
136 Invoke ``database.py`` with ``-h`` for full usage information.
140 file to experiment with the ``database.py`` commands.
142 .. _module-pw_tokenizer-database-creation:
144 Create a database
146 The ``create`` command makes a new token database from ELF files (.elf, .o, .so,
152    $ ./database.py create --database DATABASE_NAME ELF_OR_DATABASE_FILE...
154 Two database output formats are supported: CSV and binary. Provide
155 ``--type binary`` to ``create`` to generate a binary database instead of the
160 .. _module-pw_tokenizer-update-token-database:
162 Update a database
164 As new tokenized strings are added, update the database with the ``add``
169    $ ./database.py add --database DATABASE_NAME ELF_OR_DATABASE_FILE...
171 This command adds new tokens from ELF files or other databases to the database.
172 Adding tokens already present in the database updates the date removed, if any,
175 A CSV token database can be checked into a source repository and updated as code
176 changes are made. The build system can invoke ``database.py`` to update the
177 database after each build.
183 ``$dir_pw_tokenizer/database.gni`` automatically updates an in-source tokenized
184 strings database or creates a new database with artifacts from one or more GN
185 targets or other database files.
187 To create a new database, set the ``create`` variable to the desired database
188 type (``"csv"`` or ``"binary"``). The database will be created in the output
189 directory. To update an existing database, provide the path to the database with
190 the ``database`` variable.
196    import("$dir_pw_tokenizer/database.gni")
199      database = "database_in_the_source_tree.csv"
210      database = "database_in_the_source_tree.csv"
219    when the database is updated. Provide ``targets`` or ``deps`` or build other
226 ``$dir_pw_tokenizer/database.cmake`` automatically updates an in-source tokenized
227 strings database or creates a new database with artifacts from a CMake target.
229 To create a new database, set the ``CREATE`` variable to the desired database
230 type (``"csv"`` or ``"binary"``). The database will be created in the output
235    include("$dir_pw_tokenizer/database.cmake")
243 To update an existing database, provide the path to the database with
244 the ``DATABASE`` variable.
249      DATABASE database_in_the_source_tree.csv
261 the same token in the database, and it may not be possible to unambiguously
268 - if / when the string was marked as having been removed from the database.
273 ``python -m pw_tokenizer.database report <database>`` to see information about a
274 token database, including any collisions.
289      $ python -m pw_tokenizer.database mark_removed --database <database> <ELF files>
291   The ``purge`` command may be used to delete these tokens from the database.