===========================
TableGen Language Reference
===========================

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvm-dev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read the :doc:`introduction to TableGen <index>` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; . = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.
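
As a small illustration of these forms, each initializer below is a single
:token:`TokInteger`. The record name ``Literals`` and its field names are
hypothetical, and the ``def`` syntax used to hold the values is described
later in this document::

   def Literals {
     int Dec = -42;         // DecimalInteger; the "-" is part of the token
     int Hex = 0x2A;        // HexInteger
     bits<4> Bin = 0b1010;  // BinInteger with four binary digits
   }
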

Also note that :token:`BinInteger` creates a value of type ``bits<n>``
(where ``n`` is the number of bits). This will implicitly convert to
integers when needed.

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl       !and
               :!or     !empty   !subst   !foreach   !strconcat
               :!cast   !listconcat       !size      !foldl
               :!isa    !dag     !le      !lt        !ge
               :!gt     !ne


Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Defset` | `Let` | `MultiClass`
         :| `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.

Every class has an implicit template argument called ``NAME``, which is set
to the name of the instantiating ``def`` or ``defm``. The result is undefined
if the class is instantiated by an anonymous record.

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It assigns the value to the identifier.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in some
contexts.


:token:`SimpleValue` has a number of forms:


.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

  Values defined in superclasses can be accessed the same way.

* a template arg of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* value local to a ``class``, such as the use of ``Bar`` in::

     class Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       def : SomeClass<Bar>;
     }

* the iteration variable of a ``foreach``, such as the use of ``i`` in::

     foreach i = 0-5 in
     def Foo#i;

* a variable defined by ``defset``

* the implicit template argument ``NAME`` in a ``class`` or ``multiclass``

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` [`DagArgList`] ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" is applied.

.. productionlist::
   Body: ";" | "{" BodyList "}"
   BodyList: BodyItem*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [ "{" `RangeList` "}" ] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.

``def``
-------

.. productionlist::
   Def: "def" [`Value`] `ObjectBody`

Defines a record whose name is given by the optional :token:`Value`. The value
is parsed in a special mode where global identifiers (records and variables
defined by ``defset``) are not recognized, and all unrecognized identifiers
are interpreted as strings.

If no name is given, the record is anonymous. The final name of anonymous
records is undefined, but globally unique.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

When a non-anonymous record is defined in a multiclass and the given name
does not contain a reference to the implicit template argument ``NAME``, such
a reference will automatically be prepended. That is, the following are
equivalent inside a multiclass::

    def Foo;
    def NAME#Foo;

``defm``
--------

.. productionlist::
   Defm: "defm" [`Value`] ":" `BaseClassListNE` ";"

The :token:`BaseClassListNE` is a list of at least one ``multiclass`` and any
number of ``class``'s. The ``multiclass``'s must occur before any ``class``'s.

Instantiates all records defined in all given ``multiclass``'s and adds the
given ``class``'s as superclasses.

The name is parsed in the same special mode used by ``def``. If the name is
missing, a globally unique string is used instead (but instantiated records
are not considered to be anonymous, unless they were originally defined by an
anonymous ``def``). That is, the following have different semantics::

    defm : SomeMultiClass<...>;    // some globally unique name
    defm "" : SomeMultiClass<...>; // empty name string

When it occurs inside a multiclass, the second variant is equivalent to
``defm NAME : ...``. More generally, when ``defm`` occurs in a multiclass and
its name does not contain a reference to the implicit template argument
``NAME``, such a reference will automatically be prepended. That is, the
following are equivalent inside a multiclass::

    defm Foo : SomeMultiClass<...>;
    defm NAME#Foo : SomeMultiClass<...>;

``defset``
----------

.. productionlist::
   Defset: "defset" `Type` `TokIdentifier` "=" "{" `Object`* "}"

All records defined inside the braces via ``def`` and ``defm`` are collected
in a globally accessible list of the given name (in addition to being added
to the global collection of records as usual).
Anonymous records created inside
initializer expressions using the ``Class<args...>`` syntax are never collected
in a defset.

The given type must be ``list<A>``, where ``A`` is some class. It is an error
to define a record (via ``def`` or ``defm``) inside the braces which doesn't
derive from ``A``.

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `ForeachDeclaration` "in" "{" `Object`* "}"
          :| "foreach" `ForeachDeclaration` "in" `Object`
   ForeachDeclaration: ID "=" ( "{" `RangeList` "}" | `RangePiece` | `Value` )

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.

Note that the productions involving :token:`RangeList` and
:token:`RangePiece` have precedence over the more generic value parsing
based on the first token.

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`

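
As a minimal sketch of how ``multiclass``, ``defm``, ``defset``, and a
top-level ``let`` can combine, consider the following. The names ``Reg``,
``RegPair``, and ``AllRegs``, along with the fields ``Encoding`` and
``IsReserved``, are hypothetical and chosen only for illustration::

   // Hypothetical class; not part of any real target description.
   class Reg<int enc> {
     int Encoding = enc;
     bit IsReserved = 0;
   }

   // Neither def name mentions NAME, so NAME is prepended on instantiation.
   multiclass RegPair<int base> {
     def _lo : Reg<base>;
     def _hi : Reg<!add(base, 1)>;
   }

   // Every record defined inside the braces is also collected in AllRegs.
   defset list<Reg> AllRegs = {
     defm R0 : RegPair<0>;    // defines R0_lo and R0_hi

     // The binding applies to both records produced by this defm.
     let IsReserved = 1 in
     defm SP : RegPair<2>;    // defines SP_lo and SP_hi
   }

Under these assumptions, ``AllRegs`` would hold four records, and only the two
``SP`` records would have ``IsReserved`` overridden to ``1``.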