========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out.  This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen a lot and use emacs or vim, you can find
an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``.  It is not installed in the
system (or where your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).
For example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints out all
of the classes, then all of the definitions.  This is a good way to see what the
various definitions expand to fully.  Running this on the ``X86.td`` file prints
this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.
The body of the record contains all of the data that
TableGen assembled for the record: that the instruction is part of
the "X86" namespace, the pattern that tells the code generator how to select
the instruction, that it is a two-address instruction, that it has a particular
encoding, and so on.  The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called "NAME".  This is the name of the
record ("``ADD32rr``" above).  In the general case, ``def`` names can be formed
from various kinds of string processing expressions, and ``NAME`` resolves to the
final value obtained after resolving all of those expressions.  The user may
refer to ``NAME`` anywhere she desires to use the ultimate name of the ``def``.
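
As a minimal sketch of the idea above (the class ``Named``, the field
``Label``, and the record ``Foo`` are hypothetical names used only for
illustration, and the exact behaviour may depend on the TableGen version), a
class can copy the name of the record that inherits from it into a field:

.. code-block:: llvm

  // Hypothetical class: stores the name of the inheriting record in a field.
  class Named {
    string Label = NAME;
  }

  // Hypothetical record: NAME is expected to resolve to "Foo" here.
  def Foo : Named;

Running ``llvm-tblgen`` on such a file with the default backend is one way to
check what ``Label`` resolves to in your version of TableGen.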
``NAME`` should not be defined anywhere else in user code to avoid conflicts.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification.  In addition, TableGen's syntax introduces some
automation concepts like multiclass, foreach, let, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values.  Classes are defined with the ``class`` keyword, either in
the same file or in an included one.  Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).
TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string`` and
a list of target features.  It specializes the class ``Processor`` by passing
these arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end.
The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves.  The real power
of TableGen, however, is that it interprets the source files into an internal
representation from which a back-end can generate anything you want.

Current usage of TableGen is to create huge include files with tables that you
can either include directly (if the output is in the language you're coding in),
or use in pre-processing via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules.  Others are suggesting we should move to fewer,
more powerful DSLs designed with specific purposes, or even re-using existing
DSLs.


Either way, this is a discussion that will likely span across several years,
if not decades.  You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.