<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->

<h2 id="introduction">A Quick Guide to Go's Assembler</h2>

<p>
This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
The document is not comprehensive.
</p>

<p>
The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
<a href="https://9p.io/sys/doc/asm.html">elsewhere</a>.
If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
The current document provides a summary of the syntax and the differences from
what is explained in that document, and
describes the peculiarities that apply when writing assembly code to interact with Go.
</p>

<p>
The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
Some of the details map precisely to the machine, but some do not.
This is because the compiler suite (see
<a href="https://9p.io/sys/doc/compiler.html">this description</a>)
needs no assembler pass in the usual pipeline.
Instead, the compiler operates on a kind of semi-abstract instruction set,
and instruction selection occurs partly after code generation.
The assembler works on the semi-abstract form, so
when you see an instruction like <code>MOV</code>,
what the toolchain actually generates for that operation might
not be a move instruction at all, perhaps a clear or load.
Or it might correspond exactly to the machine instruction with that name.
In general, machine-specific operations tend to appear as themselves, while more general concepts like
memory move and subroutine call and return are more abstract.
The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
</p>

<p>
The assembler program is a way to parse a description of that
semi-abstract instruction set and turn it into instructions to be
input to the linker.
If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
are many examples in the sources of the standard library, in packages such as
<a href="/pkg/runtime/"><code>runtime</code></a> and
<a href="/pkg/math/big/"><code>math/big</code></a>.
You can also examine what the compiler emits as assembly code
(the actual output may differ from what you see here):
</p>

<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
"".main STEXT size=74 args=0x0 locals=0x10
	0x0000 00000 (x.go:3)	TEXT	"".main(SB), $16-0
	0x0000 00000 (x.go:3)	MOVQ	(TLS), CX
	0x0009 00009 (x.go:3)	CMPQ	SP, 16(CX)
	0x000d 00013 (x.go:3)	JLS	67
	0x000f 00015 (x.go:3)	SUBQ	$16, SP
	0x0013 00019 (x.go:3)	MOVQ	BP, 8(SP)
	0x0018 00024 (x.go:3)	LEAQ	8(SP), BP
	0x001d 00029 (x.go:3)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:3)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:4)	PCDATA	$0, $0
	0x001d 00029 (x.go:4)	PCDATA	$1, $0
	0x001d 00029 (x.go:4)	CALL	runtime.printlock(SB)
	0x0022 00034 (x.go:4)	MOVQ	$3, (SP)
	0x002a 00042 (x.go:4)	CALL	runtime.printint(SB)
	0x002f 00047 (x.go:4)	CALL	runtime.printnl(SB)
	0x0034 00052 (x.go:4)	CALL	runtime.printunlock(SB)
	0x0039 00057 (x.go:5)	MOVQ	8(SP), BP
	0x003e 00062 (x.go:5)	ADDQ	$16, SP
	0x0042 00066 (x.go:5)	RET
	0x0043 00067 (x.go:5)	NOP
	0x0043 00067 (x.go:3)	PCDATA	$1, $-1
	0x0043 00067 (x.go:3)	PCDATA	$0, $-1
	0x0043 00067 (x.go:3)	CALL	runtime.morestack_noctxt(SB)
	0x0048 00072 (x.go:3)	JMP	0
...
</pre>

<p>
The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
for use by the garbage collector; they are introduced by the compiler.
</p>

<p>
To see what gets put in the binary after linking, use <code>go tool objdump</code>:
</p>

<pre>
$ go build -o x.exe x.go
$ go tool objdump -s main.main x.exe
TEXT main.main(SB) /tmp/x.go
  x.go:3		0x10501c0		65488b0c2530000000	MOVQ GS:0x30, CX
  x.go:3		0x10501c9		483b6110		CMPQ 0x10(CX), SP
  x.go:3		0x10501cd		7634			JBE 0x1050203
  x.go:3		0x10501cf		4883ec10		SUBQ $0x10, SP
  x.go:3		0x10501d3		48896c2408		MOVQ BP, 0x8(SP)
  x.go:3		0x10501d8		488d6c2408		LEAQ 0x8(SP), BP
  x.go:4		0x10501dd		e86e45fdff		CALL runtime.printlock(SB)
  x.go:4		0x10501e2		48c7042403000000	MOVQ $0x3, 0(SP)
  x.go:4		0x10501ea		e8e14cfdff		CALL runtime.printint(SB)
  x.go:4		0x10501ef		e8ec47fdff		CALL runtime.printnl(SB)
  x.go:4		0x10501f4		e8d745fdff		CALL runtime.printunlock(SB)
  x.go:5		0x10501f9		488b6c2408		MOVQ 0x8(SP), BP
  x.go:5		0x10501fe		4883c410		ADDQ $0x10, SP
  x.go:5		0x1050202		c3			RET
  x.go:3		0x1050203		e83882ffff		CALL runtime.morestack_noctxt(SB)
  x.go:3		0x1050208		ebb6			JMP main.main(SB)
</pre>

<h3 id="constants">Constants</h3>

<p>
Although the assembler takes its guidance from the Plan 9 assemblers,
it is a distinct program, so there are some differences.
One is in constant evaluation.
Constant expressions in the assembler are parsed using Go's operator
precedence, not the C-like precedence of the original.
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
Also, constants are always evaluated as 64-bit unsigned integers.
Thus <code>-2</code> is not the integer value minus two,
but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but
to avoid ambiguity, division or right shift where the right operand's
high bit is set is rejected.
</p>

<h3 id="symbols">Symbols</h3>

<p>
Some symbols, such as <code>R1</code> or <code>LR</code>,
are predefined and refer to registers.
The exact set depends on the architecture.
</p>

<p>
There are four predeclared symbols that refer to pseudo-registers.
These are not real registers, but rather virtual registers maintained by
the toolchain, such as a frame pointer.
The set of pseudo-registers is the same for all architectures:
</p>

<ul>

<li>
<code>FP</code>: Frame pointer: arguments and locals.
</li>

<li>
<code>PC</code>: Program counter:
jumps and branches.
</li>

<li>
<code>SB</code>: Static base pointer: global symbols.
</li>

<li>
<code>SP</code>: Stack pointer: the highest address within the local stack frame.
</li>

</ul>

<p>
All user-defined symbols are written as offsets to the pseudo-registers
<code>FP</code> (arguments and locals) and <code>SB</code> (globals).
</p>

<p>
The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
is the name <code>foo</code> as an address in memory.
This form is used to name global functions and data.
Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
</p>

<p>
The <code>FP</code> pseudo-register is a virtual frame pointer
used to refer to function arguments.
The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
Thus <code>0(FP)</code> is the first argument to the function,
<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
However, when referring to a function argument this way, it is necessary to place a name
at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(Here the offset is an offset from the frame pointer, which is
different from its meaning with <code>SB</code>, where it is an offset from the symbol.)
The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
The actual name is semantically irrelevant but should be used to document
the argument's name.
It is worth stressing that <code>FP</code> is always a
pseudo-register, not a hardware
register, even on architectures with a hardware frame pointer.
</p>

<p>
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
and offsets match.
On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
</p>

<p>
The <code>SP</code> pseudo-register is a virtual stack pointer
used to refer to frame-local variables and the arguments being
prepared for function calls.
It points to the highest address within the local stack frame, so references should use negative offsets
in the range [−framesize, 0):
<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
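</p>

<p>
The <code>FP</code> naming and offset rules described above can be made concrete with a small Go program
that, given a prototype's parameters, prints the <code><i>name</i>+<i>offset</i>(FP)</code> operands
and the argument size for the <code>TEXT</code> directive.
This is a minimal sketch under stated assumptions, not the check that <code>go</code> <code>vet</code> performs:
it assumes a 64-bit target with 8-byte pointers, assumes each value is aligned to the smaller of its size
and the pointer size (true for integers, pointers, floats, strings, and slices, but not for arbitrary arrays or structs),
and assumes results start at the next pointer-aligned offset after the arguments.
The <code>param</code> type and the <code>fpOperands</code> function are hypothetical names introduced for this example.
</p>

<pre>
package main

import "fmt"

// param is one parameter or result in the Go prototype of an assembly
// function. Hypothetical type for this sketch: the size is supplied by
// hand instead of being derived from a real Go type.
type param struct {
	name string
	size int64 // size in bytes
}

// ptrSize assumes a 64-bit target such as amd64 or arm64.
const ptrSize = 8

// alignUp rounds n up to the next multiple of a (a must be positive).
func alignUp(n, a int64) int64 {
	return (n + a - 1) / a * a
}

// fpOperands returns the name+offset(FP) operand for each argument and
// result, followed by the total argument size for the TEXT directive.
// Assumptions: each value is aligned to min(size, ptrSize), and results
// begin at the next pointer-aligned offset after the arguments.
func fpOperands(args, results []param) ([]string, int64) {
	var ops []string
	var off int64
	place := func(p param) {
		align := p.size
		if align > ptrSize {
			align = ptrSize
		}
		if align == 0 {
			align = 1 // simplification: treat zero-sized values as byte-aligned
		}
		off = alignUp(off, align)
		ops = append(ops, fmt.Sprintf("%s+%d(FP)", p.name, off))
		off += p.size
	}
	for _, p := range args {
		place(p)
	}
	if len(results) > 0 {
		off = alignUp(off, ptrSize)
	}
	for _, p := range results {
		place(p)
	}
	return ops, off
}

func main() {
	// Hypothetical prototype: func add(a, b int64) int64
	// The result is unnamed, so its assembly name is "ret".
	ops, argSize := fpOperands(
		[]param{{"a", 8}, {"b", 8}},
		[]param{{"ret", 8}},
	)
	fmt.Printf("TEXT ·add(SB), NOSPLIT, $0-%d\n", argSize)
	for _, op := range ops {
		fmt.Println(op)
	}
}
</pre>

<p>
For the hypothetical prototype <code>func add(a, b int64) int64</code>, the program prints
<code>TEXT ·add(SB), NOSPLIT, $0-24</code> followed by <code>a+0(FP)</code>, <code>b+8(FP)</code>, and <code>ret+16(FP)</code>,
consistent with the second 8-byte argument living at <code>8(FP)</code> on a 64-bit machine.
For real code, write the Go prototype and let <code>go</code> <code>vet</code> report any mismatched names or offsets.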
</p>

<p>
On architectures with a hardware register named <code>SP</code>,
the name prefix distinguishes
references to the virtual stack pointer from references to the architectural
<code>SP</code> register.
That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
are different memory locations:
the first refers to the virtual stack pointer pseudo-register,
while the second refers to the
hardware's <code>SP</code> register.
</p>

<p>
On machines where <code>SP</code> and <code>PC</code> are
traditionally aliases for a physical, numbered register,
in the Go assembler the names <code>SP</code> and <code>PC</code>
are still treated specially;
for instance, references to <code>SP</code> require a symbol,
much like <code>FP</code>.
To access the actual hardware register, use the true <code>R</code> name.
For example, on the ARM architecture the hardware
<code>SP</code> and <code>PC</code> are accessible as
<code>R13</code> and <code>R15</code>.
</p>

<p>
Branches and direct jumps are always written as offsets to the PC, or as
jumps to labels:
</p>

<pre>
label:
	MOVW $0, R1
	JMP label
</pre>

<p>
Each label is visible only within the function in which it is defined.
It is therefore permitted for multiple functions in a file to define
and use the same label names.
Direct jumps and call instructions can target text symbols,
such as <code>name(SB)</code>, but not offsets from symbols,
such as <code>name+4(SB)</code>.
</p>

<p>
Instructions, registers, and assembler directives are always in UPPER CASE to remind you
that assembly programming is a fraught endeavor.
(Exception: the <code>g</code> register renaming on ARM.)
</p>

<p>
In Go object files and binaries, the full name of a symbol is the
package path followed by a period and the symbol name:
<code>fmt.Printf</code> or <code>math/rand.Int</code>.
Because the assembler's parser treats period and slash as punctuation,
those strings cannot be used directly as identifier names.
Instead, the assembler allows the middle dot character U+00B7
and the division slash U+2215 in identifiers and rewrites them to
plain period and slash.
Within an assembler source file, the symbols above are written as
<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
The assembly listings generated by the compilers when using the <code>-S</code> flag
show the period and slash directly instead of the Unicode replacements
required by the assemblers.
</p>

<p>
Most hand-written assembly files do not include the full package path
in symbol names, because the linker inserts the package path of the current
object file at the beginning of any name starting with a period:
in an assembly source file within the math/rand package implementation,
the package's Int function can be referred to as <code>·Int</code>.
This convention avoids the need to hard-code a package's import path in its
own source code, making it easier to move the code from one location to another.
</p>

<h3 id="directives">Directives</h3>

<p>
The assembler uses various directives to bind text and data to symbol names.
For example, here is a simple complete function definition. The <code>TEXT</code>
directive declares the symbol <code>runtime·profileloop</code> and the instructions
that follow form the body of the function.
The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXT</code> blocks.)
After the symbol, the arguments are flags (see below)
and the frame size, a constant (but see below):
</p>

<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ	$runtime·profileloop1(SB), CX
	MOVQ	CX, 0(SP)
	CALL	runtime·externalthreadhandler(SB)
	RET
</pre>

<p>
In the general case, the frame size is followed by an argument size, separated by a minus sign.
(It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of arguments, which live on the caller's frame.
If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
the argument size must be provided.
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
argument size is correct.
</p>

<p>
Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
static base pseudo-register <code>SB</code>.
This function would be called from Go source for package <code>runtime</code> using the
simple name <code>profileloop</code>.
</p>

<p>
Global data symbols are defined by a sequence of initializing
<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
Each <code>DATA</code> directive initializes a section of the
corresponding memory.
Memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>

<pre>
DATA	symbol+offset(SB)/width, value
</pre>

<p>
which initializes the symbol memory at the given offset and width with the given value.
The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
</p>

<p>
The <code>GLOBL</code> directive declares a symbol to be global.
The arguments are optional flags and the size of the data being declared as a global,
which is zero-filled unless <code>DATA</code> directives
have initialized it.
The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
</p>

<p>
For example,
</p>

<pre>
DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
GLOBL divtab&lt;&gt;(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
</pre>

<p>
declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
contains no pointers.
</p>

<p>
There may be one or two arguments to these directives.
If there are two, the first is a bit mask of flags,
which can be written as numeric expressions, added or or-ed together,
or can be set symbolically for easier absorption by a human.
Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
</p>

<ul>
<li>
<code>NOPROF</code> = 1
<br>
(For <code>TEXT</code> items.)
Don't profile the marked function. This flag is deprecated.
</li>
<li>
<code>DUPOK</code> = 2
<br>
It is legal to have multiple instances of this symbol in a single binary.
The linker will choose one of the duplicates to use.
</li>
<li>
<code>NOSPLIT</code> = 4
<br>
(For <code>TEXT</code> items.)
Don't insert the preamble to check if the stack must be split.
The frame for the routine, plus anything it calls, must fit in the
spare space remaining in the current stack segment.
Used to protect routines such as the stack splitting code itself.
</li>
<li>
<code>RODATA</code> = 8
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in a read-only section.
</li>
<li>
<code>NOPTR</code> = 16
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
This data contains no pointers and therefore does not need to be
scanned by the garbage collector.
</li>
<li>
<code>WRAPPER</code> = 32
<br>
(For <code>TEXT</code> items.)
This is a wrapper function and should not count as disabling <code>recover</code>.
</li>
<li>
<code>NEEDCTXT</code> = 64
<br>
(For <code>TEXT</code> items.)
This function is a closure, so it uses its incoming context register.
</li>
<li>
<code>LOCAL</code> = 128
<br>
This symbol is local to the dynamic shared object.
</li>
<li>
<code>TLSBSS</code> = 256
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in thread local storage.
</li>
<li>
<code>NOFRAME</code> = 512
<br>
(For <code>TEXT</code> items.)
Do not insert instructions to allocate a stack frame and save/restore the return
address, even if this is not a leaf function.
Only valid on functions that declare a frame size of 0.
</li>
<li>
<code>TOPFRAME</code> = 2048
<br>
(For <code>TEXT</code> items.)
This function is the outermost frame of the call stack. Traceback should stop at this function.
</li>
</ul>

<h3 id="special-instructions">Special instructions</h3>

<p>
The <code>PCALIGN</code> pseudo-instruction is used to indicate that the next instruction should be aligned
to a specified boundary by padding with no-op instructions.
</p>

<p>
It is currently supported on arm64, amd64, ppc64, loong64, and riscv64.
For example, the start of the <code>MOVD</code> instruction below is aligned to 32 bytes:
</p>

<pre>
PCALIGN $32
MOVD $2, R0
</pre>

<h3 id="data-offsets">Interacting with Go types and constants</h3>

<p>
If a package has any .s files, then <code>go build</code> will direct
the compiler to emit a special header called <code>go_asm.h</code>,
which the .s files can then <code>#include</code>.
The file contains symbolic <code>#define</code> constants for the
offsets of Go struct fields, the sizes of Go struct types, and most
Go <code>const</code> declarations defined in the current package.
Go assembly should avoid making assumptions about the layout of Go
types and instead use these constants.
This improves the readability of assembly code, and keeps it robust to
changes in data layout either in the Go type definitions or in the
layout rules used by the Go compiler.
</p>

<p>
Constants are of the form <code>const_<i>name</i></code>.
For example, given the Go declaration <code>const bufSize = 1024</code>,
assembly code can refer to the value of this constant
as <code>const_bufSize</code>.
</p>

<p>
Field offsets are of the form <code><i>type</i>_<i>field</i></code>.
Struct sizes are of the form <code><i>type</i>__size</code>.
For example, consider the following Go definition:
</p>

<pre>
type reader struct {
	buf [bufSize]byte
	r   int
}
</pre>

<p>
Assembly can refer to the size of this struct
as <code>reader__size</code> and the offsets of the two fields
as <code>reader_buf</code> and <code>reader_r</code>.
Hence, if register <code>R1</code> contains a pointer to
a <code>reader</code>, assembly can reference the <code>r</code> field
as <code>reader_r(R1)</code>.
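</p>

<p>
Because <code>go_asm.h</code> is derived from the compiler's own type layout, the values behind
these macros should match what <code>unsafe.Offsetof</code> and <code>unsafe.Sizeof</code> report
for the same <code>GOARCH</code>.
The following sketch, which assumes the <code>gc</code> toolchain and repeats the <code>bufSize</code>
and <code>reader</code> declarations from the text, prints those numbers for inspection.
It does not read <code>go_asm.h</code> itself, and the helper <code>asmLayout</code> is a hypothetical name for this example.
Assembly code should still use the macros rather than the printed numbers.
</p>

<pre>
package main

import (
	"fmt"
	"unsafe"
)

// bufSize and reader repeat the Go declarations from the text.
const bufSize = 1024

type reader struct {
	buf [bufSize]byte
	r   int
}

// asmLayout reports the layout of reader as computed by the compiler
// for the current GOARCH. The comments name the go_asm.h macros that
// are expected to hold the same values.
func asmLayout() (buf, r, size uintptr) {
	var x reader
	buf = unsafe.Offsetof(x.buf) // expected to equal reader_buf
	r = unsafe.Offsetof(x.r)     // expected to equal reader_r
	size = unsafe.Sizeof(x)      // expected to equal reader__size
	return buf, r, size
}

func main() {
	buf, r, size := asmLayout()
	fmt.Println("const_bufSize =", bufSize)
	fmt.Println("reader_buf    =", buf)
	fmt.Println("reader_r      =", r)
	fmt.Println("reader__size  =", size)
}
</pre>

<p>
On a 64-bit target such as amd64, this prints offsets 0 and 1024 and a size of 1032;
on a 32-bit target, where <code>int</code> is 4 bytes, the size becomes 1028.
This difference is exactly why hand-written assembly should not hard-code such numbers.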
</p>

<p>
If any of these <code>#define</code> names are ambiguous (for example,
a struct with a <code>_size</code> field), <code>#include
"go_asm.h"</code> will fail with a "redefinition of macro" error.
</p>

<h3 id="runtime">Runtime Coordination</h3>

<p>
For garbage collection to run correctly, the runtime must know the
location of pointers in all global data and in most stack frames.
The Go compiler emits this information when compiling Go source files,
but assembly programs must define it explicitly.
</p>

<p>
A data symbol marked with the <code>NOPTR</code> flag (see above)
is treated as containing no pointers to runtime-allocated data.
A data symbol with the <code>RODATA</code> flag
is allocated in read-only memory and is therefore treated
as implicitly marked <code>NOPTR</code>.
A data symbol with a total size smaller than a pointer
is also treated as implicitly marked <code>NOPTR</code>.
It is not possible to define a symbol containing pointers in an assembly source file;
such a symbol must be defined in a Go source file instead.
Assembly source can still refer to the symbol by name
even without <code>DATA</code> and <code>GLOBL</code> directives.
A good general rule of thumb is to define all non-<code>RODATA</code>
symbols in Go instead of in assembly.
</p>

<p>
Each function also needs annotations giving the location of
live pointers in its arguments, results, and local stack frame.
For an assembly function with no pointer results and
either no local stack frame or no function calls,
the only requirement is to define a Go prototype for the function
in a Go source file in the same package.
The name of the assembly
function must not contain the package name component (for example,
function <code>Syscall</code> in package <code>syscall</code> should
use the name <code>·Syscall</code> instead of the equivalent name
<code>syscall·Syscall</code> in its <code>TEXT</code> directive).
For more complex situations, explicit annotation is needed.
These annotations use pseudo-instructions defined in the standard
<code>#include</code> file <code>funcdata.h</code>.
</p>

<p>
If a function has no arguments and no results,
the pointer information can be omitted.
This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
on the <code>TEXT</code> instruction.
Otherwise, pointer information must be provided by
a Go prototype for the function in a Go source file,
even for assembly functions not called directly from Go.
(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
At the start of the function, the arguments are assumed
to be initialized but the results are assumed uninitialized.
If the results will hold live pointers during a call instruction,
the function should start by zeroing the results and then
executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
This instruction records that the results are now initialized
and should be scanned during stack movement and garbage collection.
It is typically easier to arrange that assembly functions do not
return pointers or do not contain call instructions;
no assembly functions in the standard library use
<code>GO_RESULTS_INITIALIZED</code>.
</p>

<p>
If a function has no local stack frame,
the pointer information can be omitted.
This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
on the <code>TEXT</code> instruction.
The pointer information can also be omitted if the
function contains no call instructions.
Otherwise, the local stack frame must not contain pointers,
and the assembly must confirm this fact by executing the
pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
Because stack resizing is implemented by moving the stack,
the stack pointer may change during any function call:
even pointers to stack data must not be kept in local variables.
</p>

<p>
Assembly functions should always be given Go prototypes,
both to provide pointer information for the arguments and results
and to let <code>go</code> <code>vet</code> check that
the offsets being used to access them are correct.
</p>

<h2 id="architectures">Architecture-specific details</h2>

<p>
It is impractical to list all the instructions and other details for each machine.
To see what instructions are defined for a given machine, say ARM,
look in the source for the <code>obj</code> support library for
that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
In that directory is a file <code>a.out.go</code>; it contains
a long list of constants starting with <code>A</code>, like this:
</p>

<pre>
const (
	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
	AEOR
	ASUB
	ARSB
	AADD
	...
</pre>

<p>
This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
represents the bitwise AND instruction,
<code>AND</code> (without the leading <code>A</code>),
and is written in assembly source as <code>AND</code>.
The enumeration is mostly in alphabetical order.
(The architecture-independent <code>AXXX</code>, defined in the
<code>cmd/internal/obj</code> package,
represents an invalid instruction.)
The sequence of the <code>A</code> names has nothing to do with the actual
encoding of the machine instructions.
The <code>cmd/internal/obj</code> package takes care of that detail.
</p>

<p>
The instructions for both the 386 and AMD64 architectures are listed in
<code>cmd/internal/obj/x86/a.out.go</code>.
</p>

<p>
The architectures share syntax for common addressing modes such as
<code>(R1)</code> (register indirect),
<code>4(R1)</code> (register indirect with offset), and
<code>$foo(SB)</code> (absolute address).
The assembler also supports some (not necessarily all) addressing modes
specific to each architecture.
The sections below list these.
</p>

<p>
One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
<code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
This rule applies even on architectures where the conventional notation uses the opposite direction.
</p>

<p>
The following sections describe some key Go-specific details for the supported architectures.
</p>

<h3 id="x86">32-bit Intel 386</h3>

<p>
The runtime pointer to the <code>g</code> structure is maintained
through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
In the runtime package, assembly code can include <code>go_tls.h</code>, which defines
an OS- and architecture-dependent macro <code>get_tls</code> for accessing this register.
The <code>get_tls</code> macro takes one argument, which is the register to load the
<code>g</code> pointer into.
</p>

<p>
For example, the sequence to load <code>g</code> and <code>m</code>
using <code>CX</code> looks like this:
</p>

<pre>
#include "go_tls.h"
#include "go_asm.h"
...
get_tls(CX)
MOVL	g(CX), AX     // Move g into AX.
MOVL	g_m(AX), BX   // Move g.m into BX.
</pre>

<p>
The <code>get_tls</code> macro is also defined on <a href="#amd64">amd64</a>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
</li>

<li>
<code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
These modes accept only 1, 2, 4, and 8 as scale factors.
</li>

</ul>

<p>
When using the compiler and assembler's
<code>-dynlink</code> or <code>-shared</code> modes,
any load or store of a fixed memory location such as a global variable
must be assumed to overwrite <code>CX</code>.
Therefore, to be safe for use with these modes,
assembly sources should typically avoid <code>CX</code> except between memory references.
</p>

<h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>

<p>
The two architectures behave largely the same at the assembler level.
Assembly code to access the <code>m</code> and <code>g</code>
pointers on the 64-bit version is the same as on the 32-bit 386,
except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
</p>

<pre>
get_tls(CX)
MOVQ	g(CX), AX     // Move g into AX.
MOVQ	g_m(AX), BX   // Move g.m into BX.
</pre>

<p>
Register <code>BP</code> is callee-save.
The assembler automatically inserts <code>BP</code> save/restore when the frame size is larger than zero.
Using <code>BP</code> as a general-purpose register is allowed;
however, it can interfere with sampling-based profiling.
</p>

<h3 id="arm">ARM</h3>

<p>
The registers <code>R10</code> and <code>R11</code>
are reserved by the compiler and linker.
</p>

<p>
<code>R10</code> points to the <code>g</code> (goroutine) structure.
Within assembler source code, this pointer must be referred to as <code>g</code>;
the name <code>R10</code> is not recognized.
</p>

<p>
To make it easier for people and compilers to write assembly, the ARM linker
allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
that may not be expressible using a single hardware instruction.
It implements these forms as multiple instructions, often using the <code>R11</code> register
to hold temporary values.
Hand-written assembly can use <code>R11</code>, but doing so requires
being sure that the linker is not also using it to implement any of the other
instructions in the function.
</p>

<p>
When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
</p>

<p>
The name <code>SP</code> always refers to the virtual stack pointer described earlier.
For the hardware register, use <code>R13</code>.
</p>

<p>
Condition code syntax is to append a period and the one- or two-letter code to the instruction,
as in <code>MOVW.EQ</code>.
Multiple codes may be appended: <code>MOVM.IA.W</code>.
The order of the code modifiers is irrelevant.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>R0-&gt;R1</code>
<br>
<code>R0&gt;&gt;R1</code>
<br>
<code>R0&lt;&lt;R1</code>
<br>
<code>R0@&gt;R1</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).

</li>

<li>
<code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
<code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
</li>

<li>
<code>(R5, R6)</code>: Destination register pair.
</li>

</ul>

<h3 id="arm64">ARM64</h3>

<p>
<code>R18</code> is the "platform register", reserved on the Apple platform.
To prevent accidental misuse, the register is named <code>R18_PLATFORM</code>.
<code>R27</code> and <code>R28</code> are reserved by the compiler and linker.
<code>R29</code> is the frame pointer.
<code>R30</code> is the link register.
</p>

<p>
Instruction modifiers are appended to the instruction following a period.
The only modifiers are <code>P</code> (postincrement) and <code>W</code>
(preincrement), as in <code>MOVW.P</code> and <code>MOVW.W</code>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
These are the same as on the 32-bit ARM.
</li>

<li>
<code>$(8&lt;&lt;12)</code>:
Left shift the immediate value <code>8</code> by <code>12</code> bits.
</li>

<li>
<code>8(R0)</code>:
The location at <code>R0</code> plus 8.
</li>

<li>
<code>(R2)(R0)</code>:
The location at <code>R0</code> plus <code>R2</code>.
</li>

<li>
<code>R0.UXTB</code>
<br>
<code>R0.UXTB&lt;&lt;imm</code>:
<code>UXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and zero-extend it to the size of <code>R0</code>.
<code>R0.UXTB&lt;&lt;imm</code>: left shift the result of <code>R0.UXTB</code> by <code>imm</code> bits.
The <code>imm</code> value can be 0, 1, 2, 3, or 4.
The other extensions include <code>UXTH</code> (16-bit), <code>UXTW</code> (32-bit), and <code>UXTX</code> (64-bit).
</li>

<li>
<code>R0.SXTB</code>
<br>
<code>R0.SXTB&lt;&lt;imm</code>:
<code>SXTB</code>: extract an 8-bit value from the low-order bits of <code>R0</code> and sign-extend it to the size of <code>R0</code>.
<code>R0.SXTB&lt;&lt;imm</code>: left shift the result of <code>R0.SXTB</code> by <code>imm</code> bits.
The <code>imm</code> value can be 0, 1, 2, 3, or 4.
The other extensions include <code>SXTH</code> (16-bit), <code>SXTW</code> (32-bit), and <code>SXTX</code> (64-bit).
</li>

<li>
<code>(R5, R6)</code>: Register pair for <code>LDAXP</code>/<code>LDP</code>/<code>LDXP</code>/<code>STLXP</code>/<code>STP</code>/<code>STXP</code>.
</li>

</ul>

<p>
Reference: <a href="/pkg/cmd/internal/obj/arm64">Go ARM64 Assembly Instructions Reference Manual</a>
</p>

<h3 id="ppc64">PPC64</h3>

<p>
This assembler is used by the <code>GOARCH</code> values <code>ppc64</code> and <code>ppc64le</code>.
</p>

<p>
Reference: <a href="/pkg/cmd/internal/obj/ppc64">Go PPC64 Assembly Instructions Reference Manual</a>
</p>

<h3 id="s390x">IBM z/Architecture, a.k.a. s390x</h3>

<p>
The registers <code>R10</code> and <code>R11</code> are reserved.
The assembler uses them to hold temporary values when assembling some instructions.
</p>

<p>
<code>R13</code> points to the <code>g</code> (goroutine) structure.
This register must be referred to as <code>g</code>; the name <code>R13</code> is not recognized.
</p>

<p>
<code>R15</code> points to the stack frame and should typically only be accessed using the
virtual registers <code>SP</code> and <code>FP</code>.
</p>

<p>
Load- and store-multiple instructions operate on a range of registers.
The range of registers is specified by a start register and an end register.
For example, <code>LMG</code> <code>(R9),</code> <code>R5,</code> <code>R7</code> would load
<code>R5</code>, <code>R6</code>, and <code>R7</code> with the 64-bit values at
<code>0(R9)</code>, <code>8(R9)</code>, and <code>16(R9)</code> respectively.
</p>

<p>
Storage-and-storage instructions such as <code>MVC</code> and <code>XC</code> are written
with the length as the first argument.
For example, <code>XC</code> <code>$8,</code> <code>(R9),</code> <code>(R9)</code> would clear
eight bytes at the address specified in <code>R9</code> by exclusive-ORing them with themselves.
</p>

<p>
If a vector instruction takes a length or an index as an argument, then it will be the
first argument.
For example, <code>VLEIF</code> <code>$1,</code> <code>$16,</code> <code>V2</code> will load
the value sixteen into index one of <code>V2</code>.
Care should be taken when using vector instructions to ensure that they are available at
runtime.
To use vector instructions, a machine must have both the vector facility (bit 129 in the
facility list) and kernel support.
Without kernel support, a vector instruction will have no effect (it will be equivalent
to a <code>NOP</code> instruction).
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>.
It is a scaled mode as on the x86, but the only scale allowed is <code>1</code>.
</li>

</ul>

<h3 id="mips">MIPS, MIPS64</h3>

<p>
General-purpose registers are named <code>R0</code> through <code>R31</code>;
floating-point registers are <code>F0</code> through <code>F31</code>.
</p>

<p>
<code>R30</code> is reserved to point to <code>g</code>.
<code>R23</code> is used as a temporary register.
</p>

<p>
In a <code>TEXT</code> directive, the frame size <code>$-4</code> for MIPS or
<code>$-8</code> for MIPS64 instructs the linker not to save <code>LR</code>.
</p>

<p>
<code>SP</code> refers to the virtual stack pointer.
For the hardware register, use <code>R29</code>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>16(R1)</code>: The location at <code>R1</code> plus 16.
</li>

<li>
<code>(R1)</code>: Alias for <code>0(R1)</code>.
</li>

</ul>

<p>
The value of the <code>GOMIPS</code> environment variable (<code>hardfloat</code> or
<code>softfloat</code>) is made available to assembly code by predefining either
<code>GOMIPS_hardfloat</code> or <code>GOMIPS_softfloat</code>.
</p>

<p>
The value of the <code>GOMIPS64</code> environment variable (<code>hardfloat</code> or
<code>softfloat</code>) is made available to assembly code by predefining either
<code>GOMIPS64_hardfloat</code> or <code>GOMIPS64_softfloat</code>.
</p>

<h3 id="unsupported_opcodes">Unsupported opcodes</h3>

<p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate it, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
The other, for simple one-off cases, is to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here is how the 386 runtime defines the 64-bit atomic load function:
</p>

<pre>
// func atomicload64(ptr *uint64) uint64
// Atomically loads the 8 bytes at ptr, which must be 8-byte aligned.
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
	MOVL	ptr+0(FP), AX
	TESTL	$7, AX
	JZ	2(PC)
	MOVL	0, AX // ptr not 8-byte aligned: crash with nil ptr deref
	LEAL	ret_lo+4(FP), BX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0F; BYTE $0x77
	RET
</pre>
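<p>
The hand-assembled bytes above follow from the x86 opcode tables plus the ModRM byte, which packs a two-bit addressing mode (<code>mod</code>), a three-bit register field (<code>reg</code>), and a three-bit register-or-memory field (<code>rm</code>).
The Go program below is a minimal sketch of that arithmetic, not part of the Go toolchain; the helper names <code>modRM</code>, <code>movqLoadMMX</code>, <code>movqStoreMMX</code>, and <code>byteDirectives</code> are hypothetical.
It assumes 32-bit addressing with <code>mod</code> = 0 (register-indirect, no displacement), which rules out <code>ESP</code> and <code>EBP</code> as base registers.
</p>

```go
package main

import "fmt"

// Register numbers used in the reg and rm fields of the ModRM byte
// (32-bit addressing). In MMX instructions, reg selects MM0-MM7 as 0-7.
const (
	regEAX = 0
	regEBX = 3
	regMM0 = 0
)

// modRM packs the x86 ModRM byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0.
func modRM(mod, reg, rm byte) byte {
	return (mod&3)<<6 | (reg&7)<<3 | rm&7
}

// movqLoadMMX encodes MOVQ (base), MMn: opcode 0F 6F /r with mod=0.
// Assumption (hypothetical helper): base is not ESP (4) or EBP (5); with
// mod=0 those rm values select a SIB byte or a 32-bit displacement instead.
func movqLoadMMX(mm, base byte) []byte {
	return []byte{0x0F, 0x6F, modRM(0, mm, base)}
}

// movqStoreMMX encodes MOVQ MMn, (base): opcode 0F 7F /r with mod=0.
// Same base-register assumption as movqLoadMMX.
func movqStoreMMX(mm, base byte) []byte {
	return []byte{0x0F, 0x7F, modRM(0, mm, base)}
}

// byteDirectives formats raw bytes as Go assembler BYTE directives.
func byteDirectives(b []byte) string {
	s := ""
	for i, x := range b {
		if i > 0 {
			s += "; "
		}
		s += fmt.Sprintf("BYTE $0x%02x", x)
	}
	return s
}

func main() {
	// MOVQ (%EAX), %MM0
	fmt.Println(byteDirectives(movqLoadMMX(regMM0, regEAX)))
	// MOVQ %MM0, 0(%EBX)
	fmt.Println(byteDirectives(movqStoreMMX(regMM0, regEBX)))
	// EMMS has no operands: opcode 0F 77.
	fmt.Println(byteDirectives([]byte{0x0F, 0x77}))
}
```

<p>
Running it prints the same three sequences used in <code>runtime·atomicload64</code>, in lower-case hex (<code>$0x0f</code> and <code>$0x0F</code> denote the same value).
For any other missing instruction, the opcode and register numbers come from the processor manual, and disassembling the built binary with <code>go tool objdump</code> is a cheap way to confirm the bytes decode as intended.
</p>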