==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).


Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

   ============= ============================================
   Address Space Memory Space
   ============= ============================================
   0             Private
   1             Global
   2             Constant
   3             Local
   4             Generic (Flat)
   5             Region
   ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.


Assembler
=========

The assembler is currently considered experimental.

For syntax examples look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA. Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.
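
As an illustration of the scalar formats listed above, the following sketch
shows one instruction from each of SOP1, SOP2, and SOPC. The register choices
are arbitrary and the snippet has not been checked against the assembler;
test/MC/AMDGPU remains the authoritative source for accepted syntax.

.. code-block:: nasm

  ; SOP1: one scalar source operand
  s_mov_b32 s0, s1

  ; SOP2: two scalar source operands
  s_add_u32 s0, s1, s2

  ; SOPC: scalar compare, the result is written to SCC
  s_cmp_eq_i32 s0, s1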

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

  ; Wait for all counters to be 0
  s_waitcnt 0

  ; Equivalent to s_waitcnt 0. Counter names can also be delimited by
  ; '&' or ','.
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

  ; Wait for vmcnt counter to be 1.
  s_waitcnt vmcnt(1)

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

  v_mul_i32_i24 v1, v2, v3
  v_mul_i32_i24_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.
*vendor* should always be equal to "AMD" and *arch* should always be equal to
"AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl  hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
     .end_amd_kernel_code_t

     ; Load the 8-byte kernel argument (a 64-bit pointer) through the
     ; kernarg segment pointer held in s[0:1].
     s_load_dwordx2 s[0:1], s[0:1] 0x0
     v_mov_b32 v0, 3.14159
     ; Wait for the scalar load to complete before reading s0 and s1.
     s_waitcnt lgkmcnt(0)
     v_mov_b32 v1, s0
     v_mov_b32 v2, s1
     ; Store the constant in v0 to the address in v[1:2].
     flat_store_dword v[1:2], v0
     s_endpgm
   .Lfunc_end0:
        .size   hello_world, .Lfunc_end0-hello_world
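
The example above passes no arguments to *.hsa_code_object_isa*, so the ISA
version, *vendor*, and *arch* are derived from -mcpu. As a sketch of the
explicit form, assuming a target whose ISA version is 7.0.0 (an illustrative
choice that is not taken from the example above), the header lines could
instead be written as:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   ; Arguments are major, minor, stepping, vendor, arch.
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"

Spelling out the version this way makes the contents of the .note entry
independent of the -mcpu value, at the cost of having to keep the numbers in
sync with the instructions actually used.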