===========================
TableGen Language Reference
===========================

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvm-dev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure whether this document is really what you are looking for,
please read the :doc:`introduction to TableGen <index>` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; . = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

Note that the :token:`DecimalInteger` token *includes* the leading ``+`` or
``-``, rather than treating ``+`` and ``-`` as unary operators as most
languages do.

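A minimal Python sketch of the numeric-literal productions follows. It transcribes only the :token:`TokInteger` grammar (TableGen's real lexer is more permissive), and the helper name ``lex_integers`` is hypothetical. It shows the consequence of the sign belonging to :token:`DecimalInteger`: ``1-5`` lexes as the two integers ``1`` and ``-5``. For simplicity every form is returned as a plain Python ``int``, even though TableGen gives a :token:`BinInteger` a ``bits<n>`` type.

.. code-block:: python

   import re

   # One alternative per TokInteger production. Hex and binary are tried
   # first so that "0x1F" is not read as the decimal literal "0" plus junk.
   _TOK_INTEGER = re.compile(
       r"0x(?P<hex>[0-9a-fA-F]+)"
       r"|0b(?P<bin>[01]+)"
       r"|(?P<dec>[+-]?[0-9]+)"
   )


   def lex_integers(text):
       """Return the TokInteger values in text, skipping whitespace.

       Hypothetical helper: anything that is not a numeric literal raises
       ValueError, and every form is returned as a plain Python int.
       """
       values = []
       pos = 0
       while pos < len(text):
           if text[pos].isspace():
               pos += 1
               continue
           match = _TOK_INTEGER.match(text, pos)
           if match is None:
               raise ValueError(f"no numeric literal at offset {pos}")
           if match.group("hex") is not None:
               values.append(int(match.group("hex"), 16))
           elif match.group("bin") is not None:
               values.append(int(match.group("bin"), 2))
           else:
               # The optional sign is part of the token itself.
               values.append(int(match.group("dec")))
           pos = match.end()
       return values


   print(lex_integers("1-5"))         # [1, -5]
   print(lex_integers("0x1F 0b101"))  # [31, 5]

Trying the prefixed forms before the decimal one matters: with the decimal alternative first, ``0x1F`` would stop after the ``0`` and fail on ``x1F``.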

Also note that :token:`BinInteger` creates a value of type ``bits<n>``
(where ``n`` is the number of bits). This value implicitly converts to an
integer when needed.

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that, unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators", which have a wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl       !and
               :!cast   !empty   !subst   !foreach   !listconcat   !strconcat

Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

A TableGen file consists of a sequence of "objects":

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`

A ``class`` declaration creates a record which other records can inherit
from. A class can be parameterized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by omitting the :token:`TemplateArgList`
and giving an empty :token:`ObjectBody` (for example, ``class Foo;``). This
can serve as a restricted form of forward declaration: note that records
deriving from the forward-declared class will inherit no fields from it,
since the record expansion is done when the record is parsed.

.. productionlist::
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is much what a C++ programmer would expect.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It declares a value of the given :token:`Type`, named by the
:token:`TokIdentifier`, and, if the optional ``=`` :token:`Value` part is
present, assigns that value to it.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`; hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger` tokens, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.

:token:`SimpleValue` has a number of forms:

.. productionlist::
   SimpleValue: `TokIdentifier`

The value is that of the entity the identifier refers to, which can be one
of the following:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* the name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* a value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template argument of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* a value local to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template argument of a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       int Baz = Bar;
     }

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated as in C/C++. The value
is the concatenation of the strings.

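A minimal Python model of the :token:`TokString` rules may help here. It decodes each literal using only the escapes listed in the note under Lexical Analysis, then concatenates adjacent literals as described above. The names ``string_value`` and ``_unescape`` are hypothetical, this is not TableGen's implementation, and rejecting any other escape with ``ValueError`` is an assumption of the sketch.

.. code-block:: python

   import re

   # Escapes the current implementation accepts, per the note on TokString.
   _ESCAPES = {"\\": "\\", "'": "'", '"': '"', "t": "\t", "n": "\n"}

   # One TokString: a double quote, then characters other than a quote or
   # backslash, or backslash escapes, then a closing double quote.
   _TOK_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


   def _unescape(body):
       out = []
       i = 0
       while i < len(body):
           if body[i] == "\\":
               code = body[i + 1]  # the regex guarantees a character follows
               if code not in _ESCAPES:
                   raise ValueError(f"unsupported escape \\{code}")
               out.append(_ESCAPES[code])
               i += 2
           else:
               out.append(body[i])
               i += 1
       return "".join(out)


   def string_value(source):
       """Value of a run of adjacent TokString literals (hypothetical helper)."""
       pieces = []
       pos = 0
       while pos < len(source):
           if source[pos].isspace():
               pos += 1
               continue
           match = _TOK_STRING.match(source, pos)
           if match is None:
               raise ValueError(f"expected a string literal at offset {pos}")
           pieces.append(_unescape(match.group(1)))
           pos = match.end()
       return "".join(pieces)


   print(string_value(r'"foo" "bar"'))  # foobar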
.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments), and the value is that record.

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type; otherwise, the element type will be deduced from the
given values.

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` `DagArgList` ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

The "let stack" (the bindings of any enclosing top-level ``let``) is
applied after the base class list has been parsed.

.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.

``def``
-------

.. TODO::
   There can be pastes in the names here, like ``#NAME#``. Look into that
   and document it (it boils down to ParseIDValue with IDParseMode ==
   ParseNameMode). ParseObjectName calls into the general ParseValue, with
   the only difference from "arbitrary expression parsing" being IDParseMode
   == Mode.

.. productionlist::
   Def: "def" `TokIdentifier` `ObjectBody`

Defines a record whose name is given by the :token:`TokIdentifier`. The
fields of the record are inherited from the base classes and defined in the
body.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

``defm``
--------

.. productionlist::
   Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"

Note that in the :token:`BaseClassListNE`, all of the ``multiclass``\es
must precede any ``class``\es that appear.

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
          :| "foreach" `Declaration` "in" `Object`

The value assigned to the variable in the declaration is iterated over, and
the object or object list is re-evaluated with the variable set to each
iterated value.

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record,
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`

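The ``let`` rules in this document (a ``let`` item in a record body, and top-level ``let``, which can also appear among the objects of a ``multiclass``) imply an order of evaluation: fields are inherited from the base classes, the let stack is applied once the base class list is parsed, and the body comes after that. The Python sketch below models that order on plain dicts. ``build_record`` and ``_override`` are hypothetical helpers that ignore types, template arguments, and new field declarations. Two behaviors the text does not settle are marked as assumptions in the comments: a later base class overriding an earlier one, and inner ``let`` bindings overriding outer ones.

.. code-block:: python

   def build_record(base_classes, let_stack, body_lets=()):
       """Return a record's fields as a dict (hypothetical helper).

       base_classes: field dicts of the base classes, in base-list order.
       let_stack:    enclosing top-level let lists, outermost first; each
                     is a list of (field, value) pairs.
       body_lets:    (field, value) pairs from let items in the body.
       """
       fields = {}
       # 1. Inherit fields. Assumption: when two bases define the same
       #    field, the later base wins in this model.
       for base in base_classes:
           fields.update(base)
       # 2. After the base class list, apply the let stack. Assumption:
       #    outer lets first, so an inner let overrides an outer one.
       for let_list in let_stack:
           for name, value in let_list:
               _override(fields, name, value)
       # 3. The body is parsed afterwards, so its lets take precedence.
       for name, value in body_lets:
           _override(fields, name, value)
       return fields


   def _override(fields, name, value):
       # A let overrides an existing (inherited) field; it does not add one.
       if name not in fields:
           raise KeyError(f"let of unknown field {name!r}")
       fields[name] = value


   # "?" stands in for an unset value here.
   reg_class = {"Size": 32, "Name": "?"}
   print(build_record([reg_class], [[("Size", 64)]], [("Name", "r0")]))
   # {'Size': 64, 'Name': 'r0'}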