# SPIR-V Assembly language syntax

## Overview

The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
spec as closely as possible, with one exception aiming at improving the text's
readability. The `<result-id>` generated by an instruction is moved to the
beginning of that instruction and followed by an `=` sign. This allows us to
distinguish between variable definitions and uses and locate value definitions
more easily.

Here is an example:

```
     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
```

A module is a sequence of instructions, separated by whitespace.
An instruction is an opcode name followed by operands, separated by
whitespace. Typically each instruction is presented on its own line,
but the assembler does not enforce this rule.

The opcode names and expected operands are described in Section 3 of
the SPIR-V specification. An operand is one of:
* a literal integer: A decimal integer, or a hexadecimal integer.
  A hexadecimal integer is indicated by a leading `0x` or `0X`. A hex
  integer supplied for a signed integer value will be sign-extended.
  For example, `0xffff` supplied as the literal for an `OpConstant`
  on a signed 16-bit integer type will be interpreted as the value `-1`.
* a literal floating point number, in decimal or hexadecimal form.
  See [below](#floats).
* a literal string.
  * A literal string is everything following a double-quote `"` until the
    following un-escaped double-quote. This includes special characters such
    as newlines.
  * A backslash `\` may be used to escape characters in the string. The `\`
    may be used to escape a double-quote or a `\` but is simply ignored when
    preceding any other character.
* a named enumerated value, specific to that operand position. For example,
  the `OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
  `Physical32`), and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
  Named enumerated values are only meaningful in specific positions, and will
  otherwise generate an error.
* a mask expression, consisting of one or more mask enum names separated
  by `|`. For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
  which is the combination of the `NotNaN`, `NotInf`, and `NSZ` flags.
* an injected immediate integer: `!<integer>`. See [below](#immediate).
* an ID, e.g. `%foo`. See [below](#id).
* the name of an extended instruction. For example, `sqrt` in an extended
  instruction such as `%f = OpExtInst %f32 %OpenCLImport sqrt %arg`
* the name of an opcode for OpSpecConstantOp, but where the `Op` prefix
  is removed. For example, the following indicates the use of an integer
  addition in a specialization constant computation:
  `%sum = OpSpecConstantOp %i32 IAdd %a %b`

## ID Definitions & Usage
<a name="id"></a>

An ID _definition_ pertains to the `<result-id>` of an instruction, and ID
_usage_ is a use of an ID as an input to an instruction.

An ID in the assembly language begins with `%` and must be followed by a name
consisting of one or more letters, numbers or underscore characters.

For every ID in the assembly program, the assembler generates a unique number
called the ID's internal number. Then each ID reference translates into its
internal number in the SPIR-V output. Internal numbers are unique within the
compilation unit: no two IDs in the same unit will share internal numbers.

The disassembler generates IDs where the name is always a decimal number
greater than 0.
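
The naming rule and the name-to-number translation can be sketched as follows.
This is a minimal Python illustration, not the assembler's implementation: the
class `IdTable` is hypothetical, "letters" is read as ASCII letters, and handing
out numbers sequentially starting at 1 is an assumption. The text above only
requires that internal numbers be unique within the compilation unit.

```python
import re

# Assumption: "letters, numbers or underscore" means ASCII [A-Za-z0-9_].
ID_PATTERN = re.compile(r"%[A-Za-z0-9_]+")


class IdTable:
    """Hypothetical helper mapping ID names to unique internal numbers."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, token):
        """Return the internal number for an ID token such as '%main'."""
        if not ID_PATTERN.fullmatch(token):
            raise ValueError(f"not a valid ID: {token!r}")
        name = token[1:]
        if name not in self._numbers:
            # Assumption: sequential numbering; only uniqueness is required.
            self._numbers[name] = len(self._numbers) + 1
        return self._numbers[name]


table = IdTable()
first = table.number_for("%main")   # definition: a new number is assigned
again = table.number_for("%main")   # usage: the same number is returned
other = table.number_for("%void")   # a different name gets a different number
```

Every reference to the same name resolves to the same number, and two distinct
names never share one, which is the property the assembler guarantees.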

So the example from the Overview can be rewritten using more user-friendly
names, as follows:
```
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
```

## Floating point literals
<a name="floats"></a>

The assembler and disassembler support floating point literals in both
decimal and hexadecimal form.

The syntax for a floating point literal is the same as floating point
constants in the C programming language, except:
* An optional leading minus (`-`) is part of the literal.
* An optional type specifier suffix is not allowed.

Infinity and NaN values are expressed in hexadecimal float literals
by using the maximum representable exponent for the bit width.

For example, in 32-bit floating point, 8 bits are used for the exponent, and the
exponent bias is 127. So the maximum representable unbiased exponent is 128.
Therefore, we represent the infinities and some NaNs as follows:

```
%float32 = OpTypeFloat 32
%inf     = OpConstant %float32 0x1p+128
%neginf  = OpConstant %float32 -0x1p+128
%aNaN    = OpConstant %float32 0x1.8p+128
%moreNaN = OpConstant %float32 -0x1.0002p+128
```
The assembler preserves all the bits of a NaN value. For example, the encoding
of `%aNaN` in the previous example is the same as the word with bits
`0x7fc00000`, and `%moreNaN` is encoded as `0xff800100`.

The disassembler prints infinite, NaN, and subnormal values in hexadecimal form.
Zero and normal values are printed in decimal form with enough digits
to preserve all significand bits.

## Arbitrary Integers
<a name="immediate"></a>

When writing tests it can be useful to emit an invalid 32-bit word into the
binary stream at arbitrary positions within the assembly.
To insert an arbitrary word into the stream, use the prefix `!`, in the form
`!<integer>`. Here is an example:

```
OpCapability !0x0000FF00
```

Any token in a valid assembly program may be replaced by `!<integer>` -- even
tokens that dictate how the rest of the instruction is parsed. Consider, for
example, the following assembly program:

```
%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
```

The tokens `OpConstant`, `LocalSize`, and `InputLines` may be replaced by random
`!<integer>` values, and the assembler will still assemble an output binary with
three instructions. It will not necessarily be valid SPIR-V, but it will
faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial tokens replaced by
arbitrary integers. If, say, `OpConstant` becomes a `!<integer>` whose value
differs from the binary representation of `OpConstant` (remember that this
feature is intended for fine-grain control in SPIR-V testing), the assembler
generally has no idea what that value stands for. So how does it know there is
exactly one `<id>` and three number literals following in that instruction,
before the next one begins? And if `LocalSize` is replaced by an arbitrary
`!<integer>`, how does it know to take the next three tokens (instead of zero or
one, both of which are possible in the absence of certainty that `LocalSize`
provided)?
The answer is a simple rule governing the parsing of instructions
with `!<integer>` in them:

When a token in the assembly program is a `!<integer>`, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent token not recognized as an OpCode or a `<result-id>` is emitted
into the binary output without any checking; when a recognizable OpCode or a
`<result-id>` is eventually encountered, it begins a new instruction and parsing
returns to normal. (If a subsequent OpCode is never found, then this alternate
parsing mode handles all the remaining tokens in the program.)

The assembler processes the tokens encountered in alternate parsing mode as
follows:

* If the token is a number literal, since context may be lost, the number
  is interpreted as a 32-bit value and output as a single word. In order to
  specify multiple-word literals in alternate-parsing mode, further uses of
  `!<integer>` tokens may be required.
  All formats supported by `strtoul()` are accepted.
* If the token is a string literal, it outputs a sequence of words representing
  the string as defined in the SPIR-V specification for Literal String.
* If the token is an ID, it outputs the ID's internal number.
* If the token is another `!<integer>`, it outputs that integer.
* Any other token causes the assembler to quit with an error.

Note that this has some interesting consequences, including:

* When an OpCode is replaced by `!<integer>`, the integer value should encode
  the instruction's word count, as specified in the physical-layout section of
  the SPIR-V specification.

* Consecutive instructions may have their OpCode replaced by `!<integer>` and
  still produce valid SPIR-V. For example, `!262187 %1 %2 "abc" !327739 %1 %3 6
  %2` will successfully assemble into SPIR-V declaring a constant and a
  PrivateGlobal variable.

* Enums (such as `DontInline` or `SubgroupMemory`, for instance) are not handled
  by the alternate parsing mode. They must be replaced by `!<integer>` for
  successful assembly.

* The `<result-id>` on the left-hand side of an assignment cannot be a
  `!<integer>`. The `<result-id>` can still be manually controlled if desired
  by expressing the entire instruction as `!<integer>` tokens for its opcode and
  operands.

* The `=` sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a `!<integer>`.

* When replacing a named ID with `!<integer>`, it is possible to generate
  unintentionally valid SPIR-V. If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output. This may be valid SPIR-V, contrary to the
  presumed intention of the writer.

## Notes

* Some enumerants cannot be used by name, because the target instruction
  in which they are meaningful takes an ID reference instead of a literal value.
  For example:
  * Named enumerated value `CmdExecTime` from section 3.30 Kernel
    Profiling Info is used in constructing a mask value supplied as
    an ID for `OpCaptureEventProfilingInfo`. But no other instruction
    has enough context to bring the enumerant names from section 3.30
    into scope.
  * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
    construct a value supplied as an ID to the Flags argument of
    OpEnqueueKernel.
  * Similarly for the names in section 3.25 Memory Semantics.
  * Similarly for the names in section 3.27 Scope.
* Some enumerants cannot be used by name, because they only name values
  returned by an instruction:
  * Enumerants from 3.12 Image Channel Order name possible values returned
    by the `OpImageQueryOrder` instruction.
  * Enumerants from 3.13 Image Channel Data Type name possible values
    returned by the `OpImageQueryFormat` instruction.
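
As a further note on replacing an OpCode with `!<integer>`: per the
physical-layout section of the SPIR-V specification, the first word of an
instruction holds the word count in its high 16 bits and the opcode number in
its low 16 bits. The Python sketch below shows how values such as `!262187` and
`!327739` from the Arbitrary Integers section can be derived. The helper names
`first_word` and `split_first_word` are hypothetical and not part of any tool;
the opcode numbers 43 (`OpConstant`) and 59 (`OpVariable`) are taken from the
SPIR-V specification.

```python
OP_CONSTANT = 43  # opcode number of OpConstant in the SPIR-V specification
OP_VARIABLE = 59  # opcode number of OpVariable in the SPIR-V specification


def first_word(word_count, opcode):
    """Pack an instruction's word count and opcode into its first word."""
    if not (0 < word_count <= 0xFFFF and 0 <= opcode <= 0xFFFF):
        raise ValueError("word count and opcode must each fit in 16 bits")
    return (word_count << 16) | opcode


def split_first_word(word):
    """Unpack a first word into (word_count, opcode)."""
    return word >> 16, word & 0xFFFF


# `!262187 %1 %2 "abc"` is 4 words: first word, type ID, result ID,
# and one word for the string "abc" including its terminating NUL.
constant_word = first_word(4, OP_CONSTANT)

# `!327739 %1 %3 6 %2` is 5 words: first word, type ID, result ID,
# storage class, and initializer ID.
variable_word = first_word(5, OP_VARIABLE)

print(constant_word, variable_word)  # prints "262187 327739"
```

Getting the word count wrong in such a hand-written word yields a binary whose
instruction boundaries no longer match the text, which can be useful in tests
but is easy to do by accident.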