<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->
5
6<h2 id="introduction">A Quick Guide to Go's Assembler</h2>
7
8<p>
9This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
10The document is not comprehensive.
11</p>
12
13<p>
14The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
15<a href="https://9p.io/sys/doc/asm.html">elsewhere</a>.
16If you plan to write assembly language, you should read that document although much of it is Plan 9-specific.
17The current document provides a summary of the syntax and the differences with
18what is explained in that document, and
19describes the peculiarities that apply when writing assembly code to interact with Go.
20</p>
21
22<p>
23The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
24Some of the details map precisely to the machine, but some do not.
25This is because the compiler suite (see
26<a href="https://9p.io/sys/doc/compiler.html">this description</a>)
27needs no assembler pass in the usual pipeline.
28Instead, the compiler operates on a kind of semi-abstract instruction set,
29and instruction selection occurs partly after code generation.
The assembler works on the semi-abstract form, so
when you see an instruction like <code>MOV</code>,
what the toolchain actually generates for that operation might
not be a move instruction at all; it could be a clear or a load.
Or it might correspond exactly to the machine instruction with that name.
35In general, machine-specific operations tend to appear as themselves, while more general concepts like
36memory move and subroutine call and return are more abstract.
37The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
38</p>
39
40<p>
41The assembler program is a way to parse a description of that
42semi-abstract instruction set and turn it into instructions to be
43input to the linker.
44If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
45are many examples in the sources of the standard library, in packages such as
46<a href="/pkg/runtime/"><code>runtime</code></a> and
47<a href="/pkg/math/big/"><code>math/big</code></a>.
48You can also examine what the compiler emits as assembly code
49(the actual output may differ from what you see here):
50</p>
51
<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
"".main STEXT size=74 args=0x0 locals=0x10
	0x0000 00000 (x.go:3)	TEXT	"".main(SB), $16-0
	0x0000 00000 (x.go:3)	MOVQ	(TLS), CX
	0x0009 00009 (x.go:3)	CMPQ	SP, 16(CX)
	0x000d 00013 (x.go:3)	JLS	67
	0x000f 00015 (x.go:3)	SUBQ	$16, SP
	0x0013 00019 (x.go:3)	MOVQ	BP, 8(SP)
	0x0018 00024 (x.go:3)	LEAQ	8(SP), BP
	0x001d 00029 (x.go:3)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:3)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:4)	PCDATA	$0, $0
	0x001d 00029 (x.go:4)	PCDATA	$1, $0
	0x001d 00029 (x.go:4)	CALL	runtime.printlock(SB)
	0x0022 00034 (x.go:4)	MOVQ	$3, (SP)
	0x002a 00042 (x.go:4)	CALL	runtime.printint(SB)
	0x002f 00047 (x.go:4)	CALL	runtime.printnl(SB)
	0x0034 00052 (x.go:4)	CALL	runtime.printunlock(SB)
	0x0039 00057 (x.go:5)	MOVQ	8(SP), BP
	0x003e 00062 (x.go:5)	ADDQ	$16, SP
	0x0042 00066 (x.go:5)	RET
	0x0043 00067 (x.go:5)	NOP
	0x0043 00067 (x.go:3)	PCDATA	$1, $-1
	0x0043 00067 (x.go:3)	PCDATA	$0, $-1
	0x0043 00067 (x.go:3)	CALL	runtime.morestack_noctxt(SB)
	0x0048 00072 (x.go:3)	JMP	0
...
</pre>
88
89<p>
90The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
91for use by the garbage collector; they are introduced by the compiler.
92</p>
93
94<p>
95To see what gets put in the binary after linking, use <code>go tool objdump</code>:
96</p>
97
<pre>
$ go build -o x.exe x.go
$ go tool objdump -s main.main x.exe
TEXT main.main(SB) /tmp/x.go
  x.go:3		0x10501c0		65488b0c2530000000	MOVQ GS:0x30, CX
  x.go:3		0x10501c9		483b6110		CMPQ 0x10(CX), SP
  x.go:3		0x10501cd		7634			JBE 0x1050203
  x.go:3		0x10501cf		4883ec10		SUBQ $0x10, SP
  x.go:3		0x10501d3		48896c2408		MOVQ BP, 0x8(SP)
  x.go:3		0x10501d8		488d6c2408		LEAQ 0x8(SP), BP
  x.go:4		0x10501dd		e86e45fdff		CALL runtime.printlock(SB)
  x.go:4		0x10501e2		48c7042403000000	MOVQ $0x3, 0(SP)
  x.go:4		0x10501ea		e8e14cfdff		CALL runtime.printint(SB)
  x.go:4		0x10501ef		e8ec47fdff		CALL runtime.printnl(SB)
  x.go:4		0x10501f4		e8d745fdff		CALL runtime.printunlock(SB)
  x.go:5		0x10501f9		488b6c2408		MOVQ 0x8(SP), BP
  x.go:5		0x10501fe		4883c410		ADDQ $0x10, SP
  x.go:5		0x1050202		c3			RET
  x.go:3		0x1050203		e83882ffff		CALL runtime.morestack_noctxt(SB)
  x.go:3		0x1050208		ebb6			JMP main.main(SB)
</pre>
119
120<h3 id="constants">Constants</h3>
121
122<p>
123Although the assembler takes its guidance from the Plan 9 assemblers,
124it is a distinct program, so there are some differences.
125One is in constant evaluation.
126Constant expressions in the assembler are parsed using Go's operator
127precedence, not the C-like precedence of the original.
128Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0—it parses as <code>(3&amp;1)&lt;&lt;2</code>
129not <code>3&amp;(1&lt;&lt;2)</code>.
130Also, constants are always evaluated as 64-bit unsigned integers.
131Thus <code>-2</code> is not the integer value minus two,
132but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but to avoid ambiguity
the assembler rejects any division or right shift whose right operand
has its high bit set.
136</p>
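<p>
Because the assembler borrows Go's operator precedence, both rules above can be checked in ordinary Go.
The following is a minimal sketch, not part of the assembler; the function names
<code>goPrecedence</code>, <code>cPrecedence</code>, and <code>asUnsigned</code> are made up for illustration.
</p>

```go
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 the way Go parses it: & and << share a
// precedence level and associate left to right, giving (3&1)<<2.
func goPrecedence() uint64 {
	return 3 & 1 << 2
}

// cPrecedence spells out the C-style parse, where << binds tighter than &.
func cPrecedence() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned returns the unsigned 64-bit integer that has the same bit
// pattern as the signed value v.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(goPrecedence())         // 4
	fmt.Println(cPrecedence())          // 0
	fmt.Printf("%#x\n", asUnsigned(-2)) // 0xfffffffffffffffe
}
```

<p>
The last line shows why a right operand such as <code>-2</code> is suspicious in a division or right shift:
read as an unsigned value it is enormous.
</p>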
137
138<h3 id="symbols">Symbols</h3>
139
140<p>
141Some symbols, such as <code>R1</code> or <code>LR</code>,
142are predefined and refer to registers.
143The exact set depends on the architecture.
144</p>
145
146<p>
147There are four predeclared symbols that refer to pseudo-registers.
148These are not real registers, but rather virtual registers maintained by
149the toolchain, such as a frame pointer.
150The set of pseudo-registers is the same for all architectures:
151</p>
152
153<ul>
154
155<li>
156<code>FP</code>: Frame pointer: arguments and locals.
157</li>
158
159<li>
160<code>PC</code>: Program counter:
161jumps and branches.
162</li>
163
164<li>
165<code>SB</code>: Static base pointer: global symbols.
166</li>
167
168<li>
169<code>SP</code>: Stack pointer: the highest address within the local stack frame.
170</li>
171
172</ul>
173
174<p>
175All user-defined symbols are written as offsets to the pseudo-registers
176<code>FP</code> (arguments and locals) and <code>SB</code> (globals).
177</p>
178
179<p>
180The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
181is the name <code>foo</code> as an address in memory.
182This form is used to name global functions and data.
183Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
184visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
185Adding an offset to the name refers to that offset from the symbol's address, so
186<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
187</p>
188
189<p>
190The <code>FP</code> pseudo-register is a virtual frame pointer
191used to refer to function arguments.
192The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
193Thus <code>0(FP)</code> is the first argument to the function,
194<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
195However, when referring to a function argument this way, it is necessary to place a name
196at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The offset here is measured from the frame pointer, which is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
199The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
200The actual name is semantically irrelevant but should be used to document
201the argument's name.
202It is worth stressing that <code>FP</code> is always a
203pseudo-register, not a hardware
204register, even on architectures with a hardware frame pointer.
205</p>
206
207<p>
208For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
209and offsets match.
210On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
211a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
212If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
213</p>
214
215<p>
216The <code>SP</code> pseudo-register is a virtual stack pointer
217used to refer to frame-local variables and the arguments being
218prepared for function calls.
219It points to the highest address within the local stack frame, so references should use negative offsets
220in the range [−framesize, 0):
221<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
222</p>
223
224<p>
225On architectures with a hardware register named <code>SP</code>,
226the name prefix distinguishes
227references to the virtual stack pointer from references to the architectural
228<code>SP</code> register.
229That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
230are different memory locations:
231the first refers to the virtual stack pointer pseudo-register,
232while the second refers to the
233hardware's <code>SP</code> register.
234</p>
235
236<p>
237On machines where <code>SP</code> and <code>PC</code> are
238traditionally aliases for a physical, numbered register,
239in the Go assembler the names <code>SP</code> and <code>PC</code>
240are still treated specially;
241for instance, references to <code>SP</code> require a symbol,
242much like <code>FP</code>.
243To access the actual hardware register use the true <code>R</code> name.
244For example, on the ARM architecture the hardware
245<code>SP</code> and <code>PC</code> are accessible as
246<code>R13</code> and <code>R15</code>.
247</p>
248
249<p>
250Branches and direct jumps are always written as offsets to the PC, or as
251jumps to labels:
252</p>
253
<pre>
label:
	MOVW $0, R1
	JMP label
</pre>
259
260<p>
261Each label is visible only within the function in which it is defined.
262It is therefore permitted for multiple functions in a file to define
263and use the same label names.
264Direct jumps and call instructions can target text symbols,
265such as <code>name(SB)</code>, but not offsets from symbols,
266such as <code>name+4(SB)</code>.
267</p>
268
269<p>
270Instructions, registers, and assembler directives are always in UPPER CASE to remind you
271that assembly programming is a fraught endeavor.
272(Exception: the <code>g</code> register renaming on ARM.)
273</p>
274
275<p>
276In Go object files and binaries, the full name of a symbol is the
277package path followed by a period and the symbol name:
278<code>fmt.Printf</code> or <code>math/rand.Int</code>.
279Because the assembler's parser treats period and slash as punctuation,
280those strings cannot be used directly as identifier names.
281Instead, the assembler allows the middle dot character U+00B7
282and the division slash U+2215 in identifiers and rewrites them to
283plain period and slash.
284Within an assembler source file, the symbols above are written as
285<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
286The assembly listings generated by the compilers when using the <code>-S</code> flag
287show the period and slash directly instead of the Unicode replacements
288required by the assemblers.
289</p>
290
291<p>
292Most hand-written assembly files do not include the full package path
293in symbol names, because the linker inserts the package path of the current
294object file at the beginning of any name starting with a period:
295in an assembly source file within the math/rand package implementation,
296the package's Int function can be referred to as <code>·Int</code>.
297This convention avoids the need to hard-code a package's import path in its
298own source code, making it easier to move the code from one location to another.
299</p>
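<p>
The two rewriting rules can be sketched in a few lines of Go.
This is only an illustration of the naming convention described above, not the toolchain's code,
and the function name <code>objectName</code> is hypothetical.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// objectName sketches how a symbol written in assembly source maps to its
// full name: the middle dot (U+00B7) becomes a period, the division slash
// (U+2215) becomes a slash, and a name that then starts with a period gets
// the current package path put in front of it.
func objectName(pkgPath, asmName string) string {
	name := strings.NewReplacer("\u00b7", ".", "\u2215", "/").Replace(asmName)
	if strings.HasPrefix(name, ".") {
		name = pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(objectName("main", "fmt\u00b7Printf"))         // fmt.Printf
	fmt.Println(objectName("main", "math\u2215rand\u00b7Int")) // math/rand.Int
	fmt.Println(objectName("math/rand", "\u00b7Int"))          // math/rand.Int
}
```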
300
301<h3 id="directives">Directives</h3>
302
303<p>
304The assembler uses various directives to bind text and data to symbol names.
305For example, here is a simple complete function definition. The <code>TEXT</code>
306directive declares the symbol <code>runtime·profileloop</code> and the instructions
307that follow form the body of the function.
308The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
309(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
After the symbol come the flags (described below)
and the frame size, which is normally a plain constant (its general form is described below):
312</p>
313
<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ	$runtime·profileloop1(SB), CX
	MOVQ	CX, 0(SP)
	CALL	runtime·externalthreadhandler(SB)
	RET
</pre>
321
322<p>
323In the general case, the frame size is followed by an argument size, separated by a minus sign.
324(It's not a subtraction, just idiosyncratic syntax.)
325The frame size <code>$24-8</code> states that the function has a 24-byte frame
326and is called with 8 bytes of argument, which live on the caller's frame.
327If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
328the argument size must be provided.
329For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
330argument size is correct.
331</p>
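<p>
As an illustration of the syntax only, the following Go sketch splits a frame specification such as
<code>$24-8</code> into its two parts.
It handles plain decimal numbers, whereas the real assembler accepts constant expressions;
the names <code>frameSpec</code> and <code>parseFrameSpec</code> are hypothetical.
</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frameSpec holds the two numbers in a TEXT frame specification.
type frameSpec struct {
	Frame   int64 // local frame size in bytes
	Args    int64 // argument size in bytes, valid only if HasArgs
	HasArgs bool
}

// parseFrameSpec parses "$frame" or "$frame-args" with decimal numbers.
func parseFrameSpec(spec string) (frameSpec, error) {
	if !strings.HasPrefix(spec, "$") {
		return frameSpec{}, fmt.Errorf("frame spec %q must start with $", spec)
	}
	body := spec[1:]
	// Search for the separator after the first character so that a leading
	// minus sign (as in $-4) stays part of the frame size.
	sep := -1
	if len(body) > 1 {
		if i := strings.Index(body[1:], "-"); i >= 0 {
			sep = i + 1
		}
	}
	if sep < 0 {
		frame, err := strconv.ParseInt(body, 10, 64)
		if err != nil {
			return frameSpec{}, err
		}
		return frameSpec{Frame: frame}, nil
	}
	frame, err := strconv.ParseInt(body[:sep], 10, 64)
	if err != nil {
		return frameSpec{}, err
	}
	args, err := strconv.ParseInt(body[sep+1:], 10, 64)
	if err != nil {
		return frameSpec{}, err
	}
	return frameSpec{Frame: frame, Args: args, HasArgs: true}, nil
}

func main() {
	fs, err := parseFrameSpec("$24-8")
	fmt.Println(fs.Frame, fs.Args, fs.HasArgs, err)
}
```

<p>
The minus sign is treated purely as a separator, matching the point that it is not a subtraction.
</p>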
332
333<p>
334Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
335static base pseudo-register <code>SB</code>.
336This function would be called from Go source for package <code>runtime</code> using the
337simple name <code>profileloop</code>.
338</p>
339
340<p>
341Global data symbols are defined by a sequence of initializing
342<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
343Each <code>DATA</code> directive initializes a section of the
344corresponding memory.
345The memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>

<pre>
DATA	symbol+offset(SB)/width, value
</pre>
351
352<p>
353which initializes the symbol memory at the given offset and width with the given value.
354The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
355</p>
356
357<p>
358The <code>GLOBL</code> directive declares a symbol to be global.
359The arguments are optional flags and the size of the data being declared as a global,
360which will have initial value all zeros unless a <code>DATA</code> directive
361has initialized it.
362The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
363</p>
364
365<p>
366For example,
367</p>
368
<pre>
DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
GLOBL divtab&lt;&gt;(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
</pre>
378
379<p>
380declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
381and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
382contains no pointers.
383</p>
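<p>
One way to picture the effect of these directives is as writes into a zeroed byte slice whose length is given by <code>GLOBL</code>.
The Go sketch below models that; it assumes a little-endian target such as amd64,
and the names <code>dataDirective</code> and <code>buildSymbol</code> are invented for the example.
Only the three <code>DATA</code> lines shown above are applied; the elided ones are left out.
</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dataDirective mirrors one "DATA sym+offset(SB)/width, $value" line.
type dataDirective struct {
	offset int
	width  int // 1, 2, 4, or 8 bytes
	value  uint64
}

// buildSymbol returns the initial contents of a symbol of the given size
// after applying the DATA directives. Bytes that no directive touches stay
// zero. Assumption: values are stored little-endian.
func buildSymbol(size int, data []dataDirective) ([]byte, error) {
	mem := make([]byte, size)
	for _, d := range data {
		switch d.width {
		case 1, 2, 4, 8:
		default:
			return nil, fmt.Errorf("unsupported width %d", d.width)
		}
		if d.offset < 0 || d.offset+d.width > size {
			return nil, fmt.Errorf("offset %d width %d outside %d-byte symbol", d.offset, d.width, size)
		}
		dst := mem[d.offset : d.offset+d.width]
		switch d.width {
		case 1:
			dst[0] = byte(d.value)
		case 2:
			binary.LittleEndian.PutUint16(dst, uint16(d.value))
		case 4:
			binary.LittleEndian.PutUint32(dst, uint32(d.value))
		case 8:
			binary.LittleEndian.PutUint64(dst, d.value)
		}
	}
	return mem, nil
}

func main() {
	mem, err := buildSymbol(64, []dataDirective{
		{0x00, 4, 0xf4f8fcff},
		{0x04, 4, 0xe6eaedf0},
		{0x3c, 4, 0x81828384},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% x\n", mem[:8])
}
```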
384
385<p>
386There may be one or two arguments to the directives.
387If there are two, the first is a bit mask of flags,
388which can be written as numeric expressions, added or or-ed together,
389or can be set symbolically for easier absorption by a human.
390Their values, defined in the standard <code>#include</code>  file <code>textflag.h</code>, are:
391</p>
392
393<ul>
394<li>
395<code>NOPROF</code> = 1
396<br>
397(For <code>TEXT</code> items.)
398Don't profile the marked function.  This flag is deprecated.
399</li>
400<li>
401<code>DUPOK</code> = 2
402<br>
403It is legal to have multiple instances of this symbol in a single binary.
404The linker will choose one of the duplicates to use.
405</li>
406<li>
407<code>NOSPLIT</code> = 4
408<br>
409(For <code>TEXT</code> items.)
410Don't insert the preamble to check if the stack must be split.
411The frame for the routine, plus anything it calls, must fit in the
412spare space remaining in the current stack segment.
413Used to protect routines such as the stack splitting code itself.
414</li>
415<li>
416<code>RODATA</code> = 8
417<br>
418(For <code>DATA</code> and <code>GLOBL</code> items.)
419Put this data in a read-only section.
420</li>
421<li>
422<code>NOPTR</code> = 16
423<br>
424(For <code>DATA</code> and <code>GLOBL</code> items.)
425This data contains no pointers and therefore does not need to be
426scanned by the garbage collector.
427</li>
428<li>
429<code>WRAPPER</code> = 32
430<br>
431(For <code>TEXT</code> items.)
432This is a wrapper function and should not count as disabling <code>recover</code>.
433</li>
434<li>
435<code>NEEDCTXT</code> = 64
436<br>
437(For <code>TEXT</code> items.)
438This function is a closure so it uses its incoming context register.
439</li>
440<li>
441<code>LOCAL</code> = 128
442<br>
443This symbol is local to the dynamic shared object.
444</li>
445<li>
446<code>TLSBSS</code> = 256
447<br>
448(For <code>DATA</code> and <code>GLOBL</code> items.)
449Put this data in thread local storage.
450</li>
451<li>
452<code>NOFRAME</code> = 512
453<br>
454(For <code>TEXT</code> items.)
455Do not insert instructions to allocate a stack frame and save/restore the return
456address, even if this is not a leaf function.
457Only valid on functions that declare a frame size of 0.
458</li>
459<li>
460<code>TOPFRAME</code> = 2048
461<br>
462(For <code>TEXT</code> items.)
463Function is the outermost frame of the call stack. Traceback should stop at this function.
464</li>
465</ul>
466
467<h3 id="special-instructions">Special instructions</h3>
468
469<p>
470The <code>PCALIGN</code> pseudo-instruction is used to indicate that the next instruction should be aligned
471to a specified boundary by padding with no-op instructions.
472</p>
473
<p>
It is currently supported on arm64, amd64, ppc64, loong64 and riscv64.
For example, the start of the <code>MOVD</code> instruction below is aligned to 32 bytes:
</p>

<pre>
PCALIGN $32
MOVD $2, R0
</pre>
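<p>
The amount of padding involved is simple arithmetic.
As a sketch, the Go function below computes how many bytes lie between a given program counter and the next aligned address.
It assumes the boundary is a power of two; the function name <code>padding</code> is hypothetical,
and how those bytes are filled with no-ops is left to the toolchain.
</p>

```go
package main

import "fmt"

// padding returns the number of bytes needed to advance pc to the next
// multiple of align. Assumption: align must be a power of two.
func padding(pc, align uint64) (uint64, error) {
	if align == 0 || align&(align-1) != 0 {
		return 0, fmt.Errorf("alignment %d is not a power of two", align)
	}
	return (align - pc%align) % align, nil
}

func main() {
	n, err := padding(0x1d, 32)
	fmt.Println(n, err) // 3 <nil>
}
```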
483
484<h3 id="data-offsets">Interacting with Go types and constants</h3>
485
486<p>
487If a package has any .s files, then <code>go build</code> will direct
488the compiler to emit a special header called <code>go_asm.h</code>,
489which the .s files can then <code>#include</code>.
490The file contains symbolic <code>#define</code> constants for the
491offsets of Go struct fields, the sizes of Go struct types, and most
492Go <code>const</code> declarations defined in the current package.
493Go assembly should avoid making assumptions about the layout of Go
494types and instead use these constants.
495This improves the readability of assembly code, and keeps it robust to
496changes in data layout either in the Go type definitions or in the
497layout rules used by the Go compiler.
498</p>
499
500<p>
501Constants are of the form <code>const_<i>name</i></code>.
502For example, given the Go declaration <code>const bufSize =
5031024</code>, assembly code can refer to the value of this constant
504as <code>const_bufSize</code>.
505</p>
506
507<p>
508Field offsets are of the form <code><i>type</i>_<i>field</i></code>.
509Struct sizes are of the form <code><i>type</i>__size</code>.
510For example, consider the following Go definition:
511</p>
512
<pre>
type reader struct {
	buf [bufSize]byte
	r   int
}
</pre>
519
520<p>
521Assembly can refer to the size of this struct
522as <code>reader__size</code> and the offsets of the two fields
523as <code>reader_buf</code> and <code>reader_r</code>.
524Hence, if register <code>R1</code> contains a pointer to
525a <code>reader</code>, assembly can reference the <code>r</code> field
526as <code>reader_r(R1)</code>.
527</p>
528
529<p>
530If any of these <code>#define</code> names are ambiguous (for example,
531a struct with a <code>_size</code> field), <code>#include
532"go_asm.h"</code> will fail with a "redefinition of macro" error.
533</p>
534
535<h3 id="runtime">Runtime Coordination</h3>
536
537<p>
538For garbage collection to run correctly, the runtime must know the
539location of pointers in all global data and in most stack frames.
540The Go compiler emits this information when compiling Go source files,
541but assembly programs must define it explicitly.
542</p>
543
544<p>
545A data symbol marked with the <code>NOPTR</code> flag (see above)
546is treated as containing no pointers to runtime-allocated data.
547A data symbol with the <code>RODATA</code> flag
548is allocated in read-only memory and is therefore treated
549as implicitly marked <code>NOPTR</code>.
550A data symbol with a total size smaller than a pointer
551is also treated as implicitly marked <code>NOPTR</code>.
552It is not possible to define a symbol containing pointers in an assembly source file;
553such a symbol must be defined in a Go source file instead.
554Assembly source can still refer to the symbol by name
555even without <code>DATA</code> and <code>GLOBL</code> directives.
556A good general rule of thumb is to define all non-<code>RODATA</code>
557symbols in Go instead of in assembly.
558</p>
559
560<p>
561Each function also needs annotations giving the location of
562live pointers in its arguments, results, and local stack frame.
563For an assembly function with no pointer results and
564either no local stack frame or no function calls,
565the only requirement is to define a Go prototype for the function
566in a Go source file in the same package. The name of the assembly
567function must not contain the package name component (for example,
568function <code>Syscall</code> in package <code>syscall</code> should
569use the name <code>·Syscall</code> instead of the equivalent name
570<code>syscall·Syscall</code> in its <code>TEXT</code> directive).
571For more complex situations, explicit annotation is needed.
572These annotations use pseudo-instructions defined in the standard
573<code>#include</code> file <code>funcdata.h</code>.
574</p>
575
576<p>
577If a function has no arguments and no results,
578the pointer information can be omitted.
579This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
580on the <code>TEXT</code> instruction.
581Otherwise, pointer information must be provided by
582a Go prototype for the function in a Go source file,
583even for assembly functions not called directly from Go.
584(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
585At the start of the function, the arguments are assumed
586to be initialized but the results are assumed uninitialized.
587If the results will hold live pointers during a call instruction,
588the function should start by zeroing the results and then
589executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
590This instruction records that the results are now initialized
591and should be scanned during stack movement and garbage collection.
592It is typically easier to arrange that assembly functions do not
593return pointers or do not contain call instructions;
594no assembly functions in the standard library use
595<code>GO_RESULTS_INITIALIZED</code>.
596</p>
597
598<p>
599If a function has no local stack frame,
600the pointer information can be omitted.
601This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
602on the <code>TEXT</code> instruction.
603The pointer information can also be omitted if the
604function contains no call instructions.
605Otherwise, the local stack frame must not contain pointers,
606and the assembly must confirm this fact by executing the
607pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
608Because stack resizing is implemented by moving the stack,
609the stack pointer may change during any function call:
610even pointers to stack data must not be kept in local variables.
611</p>
612
613<p>
614Assembly functions should always be given Go prototypes,
615both to provide pointer information for the arguments and results
616and to let <code>go</code> <code>vet</code> check that
617the offsets being used to access them are correct.
618</p>
619
620<h2 id="architectures">Architecture-specific details</h2>
621
622<p>
623It is impractical to list all the instructions and other details for each machine.
624To see what instructions are defined for a given machine, say ARM,
625look in the source for the <code>obj</code> support library for
626that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
627In that directory is a file <code>a.out.go</code>; it contains
628a long list of constants starting with <code>A</code>, like this:
629</p>
630
<pre>
const (
	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
	AEOR
	ASUB
	ARSB
	AADD
	...
</pre>
640
641<p>
642This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
Each name in this list begins with a capital <code>A</code>, so <code>AAND</code>
represents the bitwise AND instruction,
which is written in assembly source as <code>AND</code>
(without the leading <code>A</code>).
647The enumeration is mostly in alphabetical order.
648(The architecture-independent <code>AXXX</code>, defined in the
649<code>cmd/internal/obj</code> package,
650represents an invalid instruction).
651The sequence of the <code>A</code> names has nothing to do with the actual
652encoding of the machine instructions.
653The <code>cmd/internal/obj</code> package takes care of that detail.
654</p>
655
656<p>
657The instructions for both the 386 and AMD64 architectures are listed in
658<code>cmd/internal/obj/x86/a.out.go</code>.
659</p>
660
661<p>
662The architectures share syntax for common addressing modes such as
663<code>(R1)</code> (register indirect),
664<code>4(R1)</code> (register indirect with offset), and
665<code>$foo(SB)</code> (absolute address).
666The assembler also supports some (not necessarily all) addressing modes
667specific to each architecture.
668The sections below list these.
669</p>
670
671<p>
672One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
673<code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
674This rule applies even on architectures where the conventional notation uses the opposite direction.
675</p>
676
677<p>
678Here follow some descriptions of key Go-specific details for the supported architectures.
679</p>
680
681<h3 id="x86">32-bit Intel 386</h3>
682
683<p>
684The runtime pointer to the <code>g</code> structure is maintained
685through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
686In the runtime package, assembly code can include <code>go_tls.h</code>, which defines
687an OS- and architecture-dependent macro <code>get_tls</code> for accessing this register.
688The <code>get_tls</code> macro takes one argument, which is the register to load the
689<code>g</code> pointer into.
690</p>
691
692<p>
693For example, the sequence to load <code>g</code> and <code>m</code>
694using <code>CX</code> looks like this:
695</p>
696
<pre>
#include "go_tls.h"
#include "go_asm.h"
...
get_tls(CX)
MOVL	g(CX), AX     // Move g into AX.
MOVL	g_m(AX), BX   // Move g.m into BX.
</pre>
705
706<p>
707The <code>get_tls</code> macro is also defined on <a href="#amd64">amd64</a>.
708</p>
709
710<p>
711Addressing modes:
712</p>
713
714<ul>
715
716<li>
717<code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
718</li>
719
720<li>
721<code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
722These modes accept only 1, 2, 4, and 8 as scale factors.
723</li>
724
725</ul>
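<p>
The address named by these modes is base plus index times scale plus displacement.
The following Go sketch computes it for 32-bit registers and rejects scale factors other than 1, 2, 4, and 8;
the function name <code>effectiveAddress</code> is hypothetical.
</p>

```go
package main

import "fmt"

// effectiveAddress computes the address named by disp(base)(index*scale)
// using 32-bit wraparound arithmetic, as on the 386.
func effectiveAddress(base, index, scale uint32, disp int32) (uint32, error) {
	switch scale {
	case 1, 2, 4, 8:
	default:
		return 0, fmt.Errorf("scale factor %d is not 1, 2, 4, or 8", scale)
	}
	return base + index*scale + uint32(disp), nil
}

func main() {
	// 64(DI)(BX*2) with DI = 0x1000 and BX = 3.
	addr, err := effectiveAddress(0x1000, 3, 2, 64)
	fmt.Printf("%#x %v\n", addr, err) // 0x1046 <nil>
}
```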
726
727<p>
728When using the compiler and assembler's
729<code>-dynlink</code> or <code>-shared</code> modes,
730any load or store of a fixed memory location such as a global variable
731must be assumed to overwrite <code>CX</code>.
732Therefore, to be safe for use with these modes,
733assembly sources should typically avoid CX except between memory references.
734</p>
735
736<h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
737
738<p>
739The two architectures behave largely the same at the assembler level.
740Assembly code to access the <code>m</code> and <code>g</code>
741pointers on the 64-bit version is the same as on the 32-bit 386,
742except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
743</p>
744
<pre>
get_tls(CX)
MOVQ	g(CX), AX     // Move g into AX.
MOVQ	g_m(AX), BX   // Move g.m into BX.
</pre>
750
751<p>
Register <code>BP</code> is callee-save.
The assembler automatically inserts <code>BP</code> save/restore when the frame size is larger than zero.
Using <code>BP</code> as a general purpose register is allowed;
however, doing so can interfere with sampling-based profiling.
756</p>
757
758<h3 id="arm">ARM</h3>
759
760<p>
761The registers <code>R10</code> and <code>R11</code>
762are reserved by the compiler and linker.
763</p>
764
765<p>
766<code>R10</code> points to the <code>g</code> (goroutine) structure.
767Within assembler source code, this pointer must be referred to as <code>g</code>;
768the name <code>R10</code> is not recognized.
769</p>
770
771<p>
772To make it easier for people and compilers to write assembly, the ARM linker
773allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
774that may not be expressible using a single hardware instruction.
775It implements these forms as multiple instructions, often using the <code>R11</code> register
776to hold temporary values.
777Hand-written assembly can use <code>R11</code>, but doing so requires
778being sure that the linker is not also using it to implement any of the other
779instructions in the function.
780</p>
781
782<p>
783When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
784tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
785</p>

<p>
The name <code>SP</code> always refers to the virtual stack pointer described earlier.
For the hardware register, use <code>R13</code>.
</p>

<p>
Condition code syntax is to append a period and the one- or two-letter code to the instruction,
as in <code>MOVW.EQ</code>.
Multiple codes may be appended: <code>MOVM.IA.W</code>.
The order of the code modifiers is irrelevant.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>R0-&gt;R1</code>
<br>
<code>R0&gt;&gt;R1</code>
<br>
<code>R0&lt;&lt;R1</code>
<br>
<code>R0@&gt;R1</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
<code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
</li>

<li>
<code>(R5, R6)</code>: Destination register pair.
</li>

</ul>
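
<p>
The fragment below is a hedged sketch of how the condition codes and operand forms above
might appear inside a function body.
The register choices are arbitrary, and it assumes <code>R5</code> holds the address of
at least three readable words.
</p>

<pre>
	CMP	$0, R0			// compare R0 with 0 and set the condition flags
	MOVW.EQ	$1, R1			// executed only if the comparison found equality
	ADD	R0&lt;&lt;2, R1, R2		// R2 = R1 + (R0 shifted left by 2 bits)
	MOVW	R3&gt;&gt;16, R4		// R4 = R3 logically shifted right by 16 bits
	MOVM.IA.W	(R5), [R6-R8]	// load R6, R7 and R8 from (R5), writing the advanced address back to R5
</pre>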

<h3 id="arm64">ARM64</h3>

<p>
<code>R18</code> is the "platform register", reserved on the Apple platform.
To prevent accidental misuse, the register is named <code>R18_PLATFORM</code>.
<code>R27</code> and <code>R28</code> are reserved by the compiler and linker.
<code>R29</code> is the frame pointer.
<code>R30</code> is the link register.
</p>

<p>
Instruction modifiers are appended to the instruction following a period.
The only modifiers are <code>P</code> (postincrement) and <code>W</code>
(preincrement):
<code>MOVW.P</code>, <code>MOVW.W</code>
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
These are the same as on the 32-bit ARM.
</li>

<li>
<code>$(8&lt;&lt;12)</code>:
Left shift the immediate value <code>8</code> by <code>12</code> bits.
</li>

<li>
<code>8(R0)</code>:
The location at <code>R0</code> plus <code>8</code>.
</li>

<li>
<code>(R2)(R0)</code>:
The location at <code>R0</code> plus <code>R2</code>.
</li>

<li>
<code>R0.UXTB</code>
<br>
<code>R0.UXTB&lt;&lt;imm</code>:
<code>UXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and zero-extend it to the size of <code>R0</code>.
<code>R0.UXTB&lt;&lt;imm</code>: left shift the result of <code>R0.UXTB</code> by <code>imm</code> bits.
The <code>imm</code> value can be 0, 1, 2, 3, or 4.
The other extensions include <code>UXTH</code> (16-bit), <code>UXTW</code> (32-bit), and <code>UXTX</code> (64-bit).
</li>

<li>
<code>R0.SXTB</code>
<br>
<code>R0.SXTB&lt;&lt;imm</code>:
<code>SXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and sign-extend it to the size of <code>R0</code>.
<code>R0.SXTB&lt;&lt;imm</code>: left shift the result of <code>R0.SXTB</code> by <code>imm</code> bits.
The <code>imm</code> value can be 0, 1, 2, 3, or 4.
The other extensions include <code>SXTH</code> (16-bit), <code>SXTW</code> (32-bit), and <code>SXTX</code> (64-bit).
</li>

<li>
<code>(R5, R6)</code>: Register pair for <code>LDAXP</code>/<code>LDP</code>/<code>LDXP</code>/<code>STLXP</code>/<code>STP</code>/<code>STXP</code>.
</li>

</ul>
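
<p>
As a rough illustration of the modifiers and operand forms listed above, the fragment
below shows how they might be combined in a function body.
The register choices are arbitrary, and it assumes <code>R0</code> and <code>R8</code>
hold addresses of suitably sized, accessible memory.
</p>

<pre>
	MOVD.P	8(R0), R1		// postincrement: load from (R0), then R0 = R0 + 8
	MOVD.W	8(R0), R2		// preincrement: R0 = R0 + 8, then load from the updated (R0)
	ADD	$(8&lt;&lt;12), R6, R7	// R7 = R6 + 32768
	ADD	R3.UXTB&lt;&lt;2, R4, R5	// R5 = R4 + (zero-extended low byte of R3, shifted left by 2)
	LDP	(R8), (R9, R10)		// load the register pair from (R8) and 8(R8)
	STP	(R9, R10), 16(R8)	// store the register pair to 16(R8) and 24(R8)
</pre>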

<p>
Reference: <a href="/pkg/cmd/internal/obj/arm64">Go ARM64 Assembly Instructions Reference Manual</a>
</p>

<h3 id="ppc64">PPC64</h3>

<p>
This assembler is used by GOARCH values ppc64 and ppc64le.
</p>

<p>
Reference: <a href="/pkg/cmd/internal/obj/ppc64">Go PPC64 Assembly Instructions Reference Manual</a>
</p>

<h3 id="s390x">IBM z/Architecture, a.k.a. s390x</h3>

<p>
The registers <code>R10</code> and <code>R11</code> are reserved.
The assembler uses them to hold temporary values when assembling some instructions.
</p>

<p>
<code>R13</code> points to the <code>g</code> (goroutine) structure.
This register must be referred to as <code>g</code>; the name <code>R13</code> is not recognized.
</p>

<p>
<code>R15</code> points to the stack frame and should typically only be accessed using the
virtual registers <code>SP</code> and <code>FP</code>.
</p>

<p>
Load- and store-multiple instructions operate on a range of registers.
The range of registers is specified by a start register and an end register.
For example, <code>LMG</code> <code>(R9),</code> <code>R5,</code> <code>R7</code> would load
<code>R5</code>, <code>R6</code> and <code>R7</code> with the 64-bit values at
<code>0(R9)</code>, <code>8(R9)</code> and <code>16(R9)</code> respectively.
</p>

<p>
Storage-and-storage instructions such as <code>MVC</code> and <code>XC</code> are written
with the length as the first argument.
For example, <code>XC</code> <code>$8,</code> <code>(R9),</code> <code>(R9)</code> would clear
eight bytes at the address specified in <code>R9</code>.
</p>
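
<p>
Putting the two operand conventions above together, a small sketch might look like this.
The register choices are arbitrary, and it assumes <code>R9</code> holds the address of
24 bytes of writable memory.
</p>

<pre>
	STMG	R5, R7, (R9)	// store R5, R6 and R7 at 0(R9), 8(R9) and 16(R9)
	XC	$24, (R9), (R9)	// length first: clear those 24 bytes
	LMG	(R9), R5, R7	// reload R5, R6 and R7, which are now all zero
</pre>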

<p>
If a vector instruction takes a length or an index as an argument then it will be the
first argument.
For example, <code>VLEIF</code> <code>$1,</code> <code>$16,</code> <code>V2</code> will load
the value sixteen into index one of <code>V2</code>.
Care should be taken when using vector instructions to ensure that they are available at
runtime.
To use vector instructions a machine must have both the vector facility (bit 129 in the
facility list) and kernel support.
Without kernel support a vector instruction will have no effect (it will be equivalent
to a <code>NOP</code> instruction).
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>.
It is a scaled mode as on the x86, but the only scale allowed is <code>1</code>.
</li>

</ul>

<h3 id="mips">MIPS, MIPS64</h3>

<p>
General purpose registers are named <code>R0</code> through <code>R31</code>;
floating point registers are named <code>F0</code> through <code>F31</code>.
</p>

<p>
<code>R30</code> is reserved to point to <code>g</code>.
<code>R23</code> is used as a temporary register.
</p>

<p>
In a <code>TEXT</code> directive, the frame size <code>$-4</code> for MIPS or
<code>$-8</code> for MIPS64 instructs the linker not to save <code>LR</code>.
</p>

<p>
<code>SP</code> refers to the virtual stack pointer.
For the hardware register, use <code>R29</code>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>16(R1)</code>: The location at <code>R1</code> plus 16.
</li>

<li>
<code>(R1)</code>: Alias for <code>0(R1)</code>.
</li>

</ul>

<p>
The value of the <code>GOMIPS</code> environment variable (<code>hardfloat</code> or
<code>softfloat</code>) is made available to assembly code by predefining either
<code>GOMIPS_hardfloat</code> or <code>GOMIPS_softfloat</code>.
</p>

<p>
The value of the <code>GOMIPS64</code> environment variable (<code>hardfloat</code> or
<code>softfloat</code>) is made available to assembly code by predefining either
<code>GOMIPS64_hardfloat</code> or <code>GOMIPS64_softfloat</code>.
</p>
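
<p>
Because one of the two macros is predefined, assembly source can select code with the
preprocessor.
A minimal sketch for 32-bit MIPS follows; the instructions in each branch are placeholders
chosen for illustration, not taken from any real source file.
</p>

<pre>
#ifdef GOMIPS_softfloat
	// Built with GOMIPS=softfloat: avoid floating-point instructions here.
	MOVW	R0, R1		// R0 always reads as zero
#else
	// Built with GOMIPS=hardfloat: floating-point registers may be used.
	MOVD	F0, F2		// copy a double-precision value between registers
#endif
</pre>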

<h3 id="unsupported_opcodes">Unsupported opcodes</h3>

<p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate an instruction, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
The other, for simple one-off cases, is to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here's how the 386 runtime defines the 64-bit atomic load function.
</p>

<pre>
// uint64 atomicload64(uint64 volatile* addr);
// so actually
// void atomicload64(uint64 *res, uint64 volatile *addr);
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
	MOVL	ptr+0(FP), AX
	TESTL	$7, AX
	JZ	2(PC)
	MOVL	0, AX // crash with nil ptr deref
	LEAL	ret_lo+4(FP), BX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0F; BYTE $0x77
	RET
</pre>
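
<p>
On architectures with fixed-width 32-bit instructions, <code>WORD</code> can be used in
the same way.
As an illustrative sketch only, the following ARM64 function lays down the A64 encoding
of a no-op.
The function name <code>nop</code> is hypothetical, and the no-op is merely a stand-in
for an instruction the assembler might lack.
</p>

<pre>
#include "textflag.h"

// func nop() (hypothetical declaration on the Go side)
TEXT ·nop(SB), NOSPLIT, $0-0
	WORD	$0xd503201f	// A64 encoding of NOP
	RET
</pre>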