# Design Sketch: Packed Field Notation This document is provided for historical interest. This feature is now implemented in the form of the `$next` keyword. ## Motivation Many structures have many or most fields laid out consecutively, possibly with padding for alignment. For example: struct Simple: 0 [+2] UInt field_1 2 [+4] UInt field_2 6 [+2] UInt field_3 For simple structures of fixed-size fields, the main issue is unchecked redundancy: it is relatively easy to enter the wrong value for the field offset, and no compiler checks will help. For more complex structures with multiple variable-sized fields, this can lead to unwieldy offsets: struct Complex: 0 [+2] UInt header_length (h) 2 [+h] Header header 2+h [+2] UInt body_length (b) 4+h [+b] Body body 4+h+b [+4] UInt crc In both cases, there is some benefit to a shorthand notation that says 'this field should be placed immediately after the end of the lexically-previous field. ## Example (Exact syntax TBD.) struct Complex: 0 [+2] UInt header_length (h) $next [+h] Header header $next [+2] UInt body_length (b) $next [+b] Body body $next [+4] UInt crc It is tempting to use some more specialized, terser syntax, like: struct Complex: 0 [+2] UInt header_length (h) ^^ [+h] Header header ^^ [+2] UInt body_length (b) ^^ [+b] Body body ^^ [+4] UInt crc However, an explicit symbol has the advantage that you can use it in expressions, if needed: struct ComplexWithGap: 0 [+2] UInt header_length (h) $next [+h] Header header $next [+2] UInt body_length (b) # 2-byte reserved gap. $next+2 [+b] Body body $next [+4] UInt crc Or, with a (not-yet-implemented) `$align()` function: struct ComplexWithAlignment: 0 [+2] UInt header_length (h) $align($next, 4) [+h] Header header $align($next, 4) [+2] UInt body_length (b) $align($next, 4) [+b] Body body $align($next, 4) [+4] UInt crc ## Implementation Assuming the "new symbol" approach: 1. Pick a new symbol name -- preferably, come up with a few alternatives and do a quick survey. By convention, Emboss built-in symbols start with `$`. 2. Add the new name to `LITERAL_TOKEN_PATTERNS` in `compiler/front_end/tokenizer.py`. 3. Add a new production for `builtin-word -> "$new_symbol"` to the `_word()` function in `module_ir.py`. 4. Add a new compiler pass before `synthetics.synthesize_fields`, to replace the new symbol with the expanded representation. This should be relatively straightforward -- something that uses `fast_traverse_ir_top_down()` to find all `ir_data.Structure` elements in the IR, then iterates over the field offsets within each structure, and recursively replaces any `ir_data.Expression`s with a `builtin_reference.canonical_name.object_path[0]` equal to `"$new_symbol"`. It would probably be useful to make `traverse_ir._fast_traverse_proto_top_down()` into a public function, so that you do not have to re-write `Expression` traversal. For this change, the back end should not need any modifications.