==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).


Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

   ============= ============================================
   Address Space Memory Space
   ============= ============================================
   0             Private
   1             Global
   2             Constant
   3             Local
   4             Generic (Flat)
   5             Region
   ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, see test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs).  These
all apply to the Southern Islands ISA.  Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets.  All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

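As a rough orientation, the following sketch shows one instruction from each
of the scalar ALU encodings above.  The register numbers are arbitrary and
chosen only for illustration; the files in test/MC/AMDGPU remain the
authoritative source of syntax examples.

.. code-block:: nasm

   ; SOP1: one source operand
   s_mov_b32 s0, s1

   ; SOP2: two source operands
   s_add_u32 s0, s1, s2

   ; SOPC: scalar compare of two source operands
   s_cmp_eq_i32 s0, s1
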
SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only.  No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

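For illustration, SOPP instructions with plain integer operands might look
like the sketch below.  The operand values are arbitrary placeholders; since
the assembler does not range-check them, the ISA documentation should be
consulted for the values each instruction actually accepts.

.. code-block:: nasm

   ; Integer immediate operands; the assembler does not validate the values.
   s_nop 0
   s_sleep 10
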
s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   ; Wait for all counters to be 0
   s_waitcnt 0

   ; Equivalent to s_waitcnt 0.  Counter names can also be delimited by
   ; '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   ; Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)

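The alternative delimiters mentioned in the comment above would look like
this.  Going by that description, both lines are expected to be equivalent to
``s_waitcnt 0``:

.. code-block:: nasm

   ; Counter names delimited by '&'
   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)

   ; Counter names delimited by ','
   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)
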
VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands.  If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction.  Most, but not all,
instructions support an explicit suffix.  These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.  This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

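As a sketch, the two directives above could be written with explicit arguments
as follows.  The ISA version numbers 7, 0, 0 are placeholders chosen purely
for illustration; in practice they should match the target selected with
-mcpu, or the arguments can be omitted so that the assembler derives them.

.. code-block:: nasm

   ; HSA code object version: major = 1, minor = 0
   .hsa_code_object_version 1,0

   ; Explicit ISA version (placeholder values), vendor, and arch
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"
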
.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive.  For
any amd_kernel_code_t values that are unspecified, a default value will be
used.  The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4.  Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

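To make the power-of-two convention concrete, overriding two of the alignment
defaults might look like the fragment below.  The values are illustrative
only, and the fragment omits the function label and instructions that a
complete kernel needs; the point is that each value is the exponent rather
than the alignment itself.

.. code-block:: nasm

   .amd_kernel_code_t
      ; 2^4 = 16 (same as the default)
      kernarg_segment_alignment = 4
      ; 2^6 = 64
      group_segment_alignment = 6
   .end_amd_kernel_code_t
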
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl  hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1] 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size   hello_world, .Lfunc_end0-hello_world
