.. _module-pw_tokenizer:

============
pw_tokenizer
============

.. pigweed-module::
   :name: pw_tokenizer

Logging is critical, but developers are often forced to choose between adding
more logging and saving crucial flash space. The ``pw_tokenizer`` module enables
**extensive logging with substantially less memory usage** by replacing
printf-style strings with binary tokens during compilation. It is designed to
integrate easily into existing logging systems.

Although the most common application of ``pw_tokenizer`` is binary logging,
**the tokenizer is general purpose and can be used to tokenize any strings**,
with or without printf-style arguments.

Why tokenize strings?

* **Dramatically reduce binary size** by removing string literals from binaries.
* **Reduce I/O traffic, RAM, and flash usage** by sending and storing compact
  tokens instead of strings. We've seen reductions of over 50% in encoded log
  contents.
* **Reduce CPU usage** by replacing snprintf calls with simple tokenization
  code.
* **Remove potentially sensitive log, assert, and other strings** from binaries.

.. grid:: 1

   .. grid-item-card:: :octicon:`rocket` Get started
      :link: module-pw_tokenizer-get-started
      :link-type: ref
      :class-item: sales-pitch-cta-primary

      Integrate pw_tokenizer into your project.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Tokenization
      :link: module-pw_tokenizer-tokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Convert strings and arguments to tokens.

   .. grid-item-card:: :octicon:`code-square` Token databases
      :link: module-pw_tokenizer-token-databases
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Store a mapping of tokens to the strings and arguments they represent.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Detokenization
      :link: module-pw_tokenizer-detokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Expand tokens back to the strings and arguments they represent.

   .. grid-item-card:: :octicon:`info` API reference
      :link: module-pw_tokenizer-api
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Detailed reference information about the pw_tokenizer API.

.. _module-pw_tokenizer-tokenized-logging-example:

---------------------------
Tokenized logging in action
---------------------------
Here's an example of how ``pw_tokenizer`` enables you to store and send the
same logging information using significantly fewer resources:

.. mermaid::

   flowchart TD

     subgraph after["After: Tokenized Logs (37 bytes saved!)"]
       after_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 4 bytes stored on-device as... --> after_encoding["d9 28 47 8e"] -- 6 bytes sent over the wire as... --> after_transmission["d9 28 47 8e aa 3e"] -- Displayed in logs as... --> after_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     subgraph before["Before: No Tokenization"]
       before_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 41 bytes stored on-device as... --> before_encoding["#quot;Battery Voltage: %d mV#quot;"] -- 43 bytes sent over the wire as... --> before_transmission["#quot;Battery Voltage: 3989 mV#quot;"] -- Displayed in logs as... --> before_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     style after stroke:#00c852,stroke-width:3px
     style before stroke:#ff5252,stroke-width:3px

A quick overview of how the tokenized version works:

* You tokenize ``"Battery Voltage: %d mV"`` with a macro like
  :c:macro:`PW_TOKENIZE_STRING`. You can use :ref:`module-pw_log_tokenized`
  to handle the tokenization automatically.
* After tokenization, ``"Battery Voltage: %d mV"`` becomes ``d9 28 47 8e``.
* The first 4 bytes sent over the wire are the tokenized version of
  ``"Battery Voltage: %d mV"``. The last 2 bytes are the value of ``voltage``
  converted to a varint using :ref:`module-pw_varint`.
* The logs are converted back to the original, human-readable message
  via the :ref:`Detokenization API `
  and a :ref:`token database `.

.. toctree::
   :hidden:
   :maxdepth: 1

   Get started
   tokenization
   token_databases
   detokenization
   API reference
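
The 6-byte message in the tokenized logging example can be reproduced with a
short Python sketch. This is an illustration only, not the ``pw_tokenizer``
implementation: the token bytes are copied from the example rather than
computed, the helper names and the one-entry dictionary standing in for a token
database are hypothetical, and the ZigZag step is an assumption, chosen because
it reproduces the ``aa 3e`` bytes shown for a voltage of ``3989``.

.. code-block:: python

   # Token bytes copied from the example above; they are not computed here.
   TOKEN = bytes.fromhex("d928478e")

   # Hypothetical stand-in for a token database: token bytes -> format string.
   DATABASE = {TOKEN: "Battery Voltage: %d mV"}


   def zigzag(value: int) -> int:
       """Map a signed 64-bit integer to an unsigned one (assumed step)."""
       return (value << 1) ^ (value >> 63)


   def encode_varint(value: int) -> bytes:
       """Encode a non-negative integer in 7-bit groups, low group first."""
       out = bytearray()
       while True:
           byte = value & 0x7F
           value >>= 7
           if value:
               out.append(byte | 0x80)  # More groups follow.
           else:
               out.append(byte)  # Final group has the high bit clear.
               return bytes(out)


   def decode_zigzag_varint(data: bytes) -> int:
       """Reverse encode_varint followed by zigzag."""
       raw = 0
       for index, byte in enumerate(data):
           raw |= (byte & 0x7F) << (7 * index)
           if not byte & 0x80:
               break
       return (raw >> 1) ^ -(raw & 1)


   # Device side: token followed by the encoded argument.
   message = TOKEN + encode_varint(zigzag(3989))
   print(message.hex(" "))  # d9 28 47 8e aa 3e

   # Host side: look up the format string, then decode the argument.
   text = DATABASE[message[:4]] % decode_zigzag_varint(message[4:])
   print(text)  # Battery Voltage: 3989 mV

The arithmetic behind the last two bytes: ZigZag maps ``3989`` to ``7978``,
whose low seven bits are ``0x2a`` (sent as ``0xaa`` with the continuation bit
set) and whose remaining bits are ``0x3e``.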