# Introduction

Given a minidump file, the Breakpad processor produces stack traces that include
function names and source locations. However, minidump files contain only the
byte-by-byte contents of threads' registers and stacks, without function names
or machine-code-to-source mapping data. The processor consults Breakpad symbol
files for the information it needs to produce human-readable stack traces from
the binary-only minidump file.

The platform-specific symbol dumping tools parse the debugging information the
compiler provides (whether as DWARF or STABS sections in an ELF file or as
stand-alone PDB files), and write that information back out in the Breakpad
symbol file format. This format is much simpler and less detailed than compiler
debugging information, and values legibility over compactness.

# Overview

Breakpad symbol files are ASCII text files, with lines delimited as appropriate
for the host platform. Each line is a _record_, divided into fields by single
spaces; in some cases, the last field of the record can contain spaces. The
first field is a string indicating what sort of record the line represents
(except for line records: these are so common that omitting the record type
saves space). Some fields hold decimal or hexadecimal numbers; hexadecimal
numbers have no "0x" prefix, and use lower-case letters.

Breakpad symbol files contain the following record types. With some
restrictions, these may appear in any order.

*   A `MODULE` record describes the executable file or shared library from which
    this data was derived, for use by symbol suppliers. A `MODULE` record should
    be the first record in the file.

*   A `FILE` record gives a source file name, and assigns it a number by which
    other records can refer to it.

*   A `FUNC` record describes a function present in the source code.

*   A line record indicates to which source file and line a given range of
    machine code should be attributed. The line is attributed to the function
    defined by the most recent `FUNC` record.

*   A `PUBLIC` record gives the address of a linker symbol.

*   A `STACK` record provides information necessary to produce stack traces.

# `MODULE` records

A `MODULE` record provides meta-information about the module the symbol file
describes. It has the form:

> `MODULE` _operatingsystem_ _architecture_ _id_ _name_

For example: `MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin`

These records provide meta-information about the executable or shared library
from which this symbol file was generated. A symbol supplier might use this
information to find the correct symbol files to use to interpret a given
minidump, or to perform other sorts of validation. If present, a `MODULE` record
should be the first line in the file.

The fields are separated by spaces, and cannot contain spaces themselves, except
for _name_.

*   The _operatingsystem_ field names the operating system on which the
    executable or shared library was intended to run. This field should have one
    of the following values:

    | **Value** | **Meaning**       |
    |:----------|:------------------|
    | Linux     | Linux             |
    | mac       | Macintosh OSX     |
    | windows   | Microsoft Windows |

*   The _architecture_ field indicates what processor architecture the
    executable or shared library contains machine code for. This field should
    have one of the following values:

    | **Value** | **Instruction Set Architecture** |
    |:----------|:---------------------------------|
    | x86       | Intel IA-32                      |
    | x86\_64   | AMD64/Intel 64                   |
    | ppc       | 32-bit PowerPC                   |
    | ppc64     | 64-bit PowerPC                   |
    | unknown   | unknown                          |

*   The _id_ field is a sequence of hexadecimal digits that identifies the exact
    executable or library whose contents the symbol file describes. The way in
    which it is computed varies from platform to platform.

*   The _name_ field contains the base name (the final component of the
    directory path) of the executable or library. It may contain spaces, and
    extends to the end of the line.

# `FILE` records

A `FILE` record holds a source file name for other records to refer to. It has
the form:

> `FILE` _number_ _name_

For example: `FILE 2 /home/jimb/mc/in/browser/app/nsBrowserApp.cpp`

A `FILE` record provides the name of a source file, and assigns it a number
which other records (line records, in particular) can use to refer to that file
name. The _number_ field is a decimal number. The _name_ field is the name of
the file; it may contain spaces.

# `FUNC` records

A `FUNC` record describes a source-language function. It has the form:

> `FUNC` _[m]_ _address_ _size_ _parameter\_size_ _name_

For example: `FUNC m c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&, void**) const`

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions. (In that case, only one symbol name is
mentioned within the Breakpad file.) Multiple symbols referencing the same
instructions may occur due to identical code folding by the linker.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code instructions the function
occupies. (Breakpad symbol files cannot accurately describe functions whose code
is not contiguous.) The start address is relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size, in
bytes, of the arguments pushed on the stack for this function. Some calling
conventions, like the Microsoft Windows `stdcall` convention, require the called
function to pop parameters passed to it on the stack from its caller before
returning. The stack walker uses this value, along with data from `STACK`
records, to step from the called function's frame to the caller's frame.

The _name_ field is the name of the function. In languages that use linker
symbol name mangling like C++, this should be the source language name (the
"unmangled" form). This field may contain spaces.

# Line records

A line record describes the source file and line number to which a given range
of machine code should be attributed. It has the form:

> _address_ _size_ _line_ _filenum_

For example: `c184 7 59 4`

Because they are so common, line records do not begin with a string indicating
the record type. All other record types' names use upper-case letters;
hexadecimal numbers, like a line record's _address_, use lower-case letters.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code. The address is relative to the
module's load address.

The _line_ field is the line number to which the machine code should be
attributed, in decimal; the first line of the source file is line number 1. The
_filenum_ field is a decimal number appearing in a prior `FILE` record; the name
given in that record is the source file name for the machine code.

The line is assumed to belong to the function described by the last preceding
`FUNC` record. Line records may not appear before the first `FUNC` record.

No two line records in a symbol file cover the same range of addresses. However,
there may be many line records with identical line and file numbers, as a given
source line may contribute many non-contiguous blocks of machine code.
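
The `FILE`, `FUNC`, and line record rules above can be sketched as a small
reader that maps a module-relative address to a function, file, and line. This
is an illustrative sketch in Python, not the Breakpad processor's code: the
names `parse_symbols` and `lookup` are hypothetical, and the sample text uses a
made-up file name (`example.cpp`) and a made-up second line record.

```python
from bisect import bisect_right


def parse_symbols(text):
    """Collect FILE, FUNC, and line records; other record types are skipped."""
    files, funcs, lines = {}, [], []
    current_func = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        kind = raw.split(" ", 1)[0]
        if kind == "FILE":
            # FILE number name: decimal number, name may contain spaces.
            _, number, name = raw.split(" ", 2)
            files[int(number)] = name
        elif kind == "FUNC":
            # FUNC [m] address size parameter_size name: hex numbers,
            # and the name extends to the end of the line.
            n = 5 if raw.split(" ")[1] == "m" else 4
            fields = raw.split(" ", n)
            address, size, param_size = (int(f, 16) for f in fields[n - 3:n])
            current_func = fields[n]
            funcs.append((address, size, param_size, current_func))
        elif not kind.isupper():
            # Line record: address size line filenum (hex, hex, decimal, decimal).
            if current_func is None:
                raise ValueError("line record before the first FUNC record")
            address, size, line, filenum = raw.split(" ")
            lines.append((int(address, 16), int(size, 16),
                          int(line), int(filenum), current_func))
    lines.sort()
    return files, funcs, lines


def lookup(symbols, address):
    """Return (function, file, line) for a module-relative address, or None."""
    files, _, lines = symbols
    starts = [entry[0] for entry in lines]
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, size, line, filenum, func = lines[i]
    if address >= start + size:
        return None
    return func, files[filenum], line


# Sample input: the FUNC and first line record come from the examples above;
# the FILE name and the second line record are invented for illustration.
SAMPLE = """\
MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin
FILE 4 example.cpp
FUNC m c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&, void**) const
c184 7 59 4
c18b 5 60 4
"""

if __name__ == "__main__":
    print(lookup(parse_symbols(SAMPLE), 0xc186))
```

Because no two line records cover the same address range, sorting by start
address and taking the last record that starts at or before the address is
enough to find the covering record, if there is one.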

# `PUBLIC` records

A `PUBLIC` record describes a publicly visible linker symbol, such as that used
to identify an assembly language entry point or region of memory. It has the
form:

> `PUBLIC` _[m]_ _address_ _parameter\_size_ _name_

For example: `PUBLIC m 2160 0 Public2_1`

The Breakpad processor essentially treats a `PUBLIC` record as defining a
function with no line number data and an indeterminate size: the code extends to
the next address mentioned. If a given address is covered by both a `PUBLIC`
record and a `FUNC` record, the processor uses the `FUNC` data.

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions. (In that case, only one symbol name is
mentioned within the Breakpad file.) Multiple symbols referencing the same
instructions may occur due to identical code folding by the linker.

The _address_ field is a hexadecimal number indicating the symbol's address,
relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size of the
parameters passed to the code whose entry point the symbol marks, if known. This
field has the same meaning as the _parameter\_size_ field of a `FUNC` record;
see that description for more details.

The _name_ field is the name of the symbol. In languages that use linker symbol
name mangling like C++, this should be the source language name (the "unmangled"
form). This field may contain spaces.

# `STACK WIN` records

Given a stack frame, a `STACK WIN` record indicates how to find the frame that
called it. It has the form:

> `STACK WIN` _type_ _rva_ _code\_size_ _prologue\_size_ _epilogue\_size_
> _parameter\_size_ _saved\_register\_size_ _local\_size_ _max\_stack\_size_
> _has\_program\_string_ _program\_string\_OR\_allocates\_base\_pointer_

For example: `STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + = $ebp $ebp ^ =`

All fields of a `STACK WIN` record, except for the last, are hexadecimal
numbers.

The _type_ field indicates what sort of stack frame data this record holds. Its
value should be one of the values of the
[StackFrameTypeEnum](http://msdn.microsoft.com/en-us/library/bc5207xw%28VS.100%29.aspx)
type in Microsoft's
[Debug Interface Access (DIA)](http://msdn.microsoft.com/en-us/library/x93ctkx8%28VS.100%29.aspx) API.
Breakpad uses only records of type 4 (`FrameTypeFrameData`) and 0
(`FrameTypeFPO`); it ignores others. These types differ only in whether the last
field is an _allocates\_base\_pointer_ flag (`FrameTypeFPO`) or a program string
(`FrameTypeFrameData`). If more than one record covers a given address, Breakpad
prefers `FrameTypeFrameData` records over `FrameTypeFPO` records.

The _rva_ and _code\_size_ fields give the starting address and length in bytes
of the machine code covered by this record. The starting address is relative to
the module's load address.

The _prologue\_size_ and _epilogue\_size_ fields give the length, in bytes, of
the prologue and epilogue machine code within the record's range. Breakpad does
not use these values.

The _parameter\_size_ field gives the number of argument bytes this function
expects to have been passed. This field has the same meaning as the
_parameter\_size_ field of a `FUNC` record; see that description for more
details.

The _saved\_register\_size_ field gives the number of bytes in the stack frame
dedicated to preserving the values of any callee-saves registers used by this
function.

The _local\_size_ field gives the number of bytes in the stack frame dedicated
to holding the function's local variables and temporary values.

The _max\_stack\_size_ field gives the maximum number of bytes pushed on the
stack in the frame. Breakpad does not use this value.

If the _has\_program\_string_ field is zero, then the `STACK WIN` record's final
field is an _allocates\_base\_pointer_ flag, as a hexadecimal number; this is
expected for records whose _type_ is 0. Otherwise, the final field is a program
string.

## Interpreting a `STACK WIN` record

Given the register values for a frame F, we can find the calling frame as
follows:

*   If the _has\_program\_string_ field of a `STACK WIN` record is zero, then
    the final field is _allocates\_base\_pointer_, a flag indicating whether the
    frame uses the frame pointer register, `%ebp`, as a general-purpose
    register.
    *   If _allocates\_base\_pointer_ is true, then `%ebp` does not point to the
        frame's base address. Instead,
        *   Let _next\_parameter\_size_ be the parameter size of the function
            frame F called (**not** this record's _parameter\_size_ field), or
            zero if F is the youngest frame on the stack. You must find this
            value in F's callee's `FUNC`, `STACK WIN`, or `PUBLIC` records.
        *   Let _frame\_size_ be the sum of the _local\_size_ field, the
            _saved\_register\_size_ field, and _next\_parameter\_size_.

        With those definitions in place, we can recover the calling frame as
        follows:

        *   F's return address is at `%esp +`_frame\_size_,
        *   the caller's value of `%ebp` is saved at
            `%esp +`_next\_parameter\_size_`+`_saved\_register\_size_`- 8`, and
        *   the caller's value of `%esp` just before the call instruction was
            `%esp +`_frame\_size_`+ 4`.

        (Why do we include _next\_parameter\_size_ in the sum when computing
        _frame\_size_ and the address of the saved `%ebp`? When a function A
        has called a function B, the arguments that A pushed for B are
        considered part of A's stack frame: A's value for `%esp` points at the
        last argument pushed for B. Thus, we must include the size of those
        arguments (given by the debugging info for B) along with the size of
        A's register save area and local variable area (given by the debugging
        info for A) when computing the overall size of A's frame.)
    *   If _allocates\_base\_pointer_ is false, then F's function doesn't use
        `%ebp` at all. You may recover the calling frame as above, except that
        the caller's value of `%ebp` is the same as F's value for `%ebp`, so no
        steps are necessary to recover it.
*   If the _has\_program\_string_ field of a `STACK WIN` record is not zero,
    then the record's final field is a string containing a program to be
    interpreted to recover the caller's frame. The comments in the
    [postfix\_evaluator.h](../src/processor/postfix_evaluator.h#40)
    header file explain the language in which the program is written. You should
    place the following variables in the dictionary before interpreting the
    program:
    *   `$ebp` and `$esp` should be the values of the `%ebp` and `%esp`
        registers in F.
    *   `.cbParams`, `.cbSavedRegs`, and `.cbLocals` should be the values of
        the `STACK WIN` record's _parameter\_size_, _saved\_register\_size_, and
        _local\_size_ fields.
    *   `.raSearchStart` should be set to the address on the stack to begin
        scanning for a return address, if necessary. The Breakpad processor sets
        this to the value of `%esp` in F, plus the _frame\_size_ value mentioned
        above.

    If the program stores values for `$eip`, `$esp`, `$ebp`, `$ebx`, `$esi`, or
    `$edi`, then those are the values of the given registers in the caller. If
    the value of `$eip` is zero, that indicates that the end of the stack has
    been reached.

The Breakpad processor checks that the value yielded by the above for the
calling frame's instruction address refers to known code; if the address seems
to be bogus, then it uses a heuristic search to find F's return address and
stack base.

# `STACK CFI` records

`STACK CFI` ("Call Frame Information") records describe how to walk the stack
when execution is at a given machine instruction. These records take one of two
forms:

> `STACK CFI INIT` _address_ _size_ _register<sub>1</sub>_:
> _expression<sub>1</sub>_ _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
>
> `STACK CFI` _address_ _register<sub>1</sub>_: _expression<sub>1</sub>_
> _register<sub>2</sub>_: _expression<sub>2</sub>_ ...

For example:

```
STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^
STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^
```

The _address_ and _size_ fields are hexadecimal numbers. Each
_register<sub>i</sub>_ is the name of a register or pseudoregister. Each
_expression_ is a Breakpad postfix expression, which may contain spaces, but
never ends with a colon. (The appropriate register names for a given
architecture are determined when `STACK CFI` records are first enabled for that
architecture, and should be documented in the appropriate
`stackwalker_`_architecture_`.cc` source file.)
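
Because register names always end with a colon and expression tokens never do,
a record can be split into its rules without understanding the expressions
themselves. The following Python sketch illustrates that; the function name
`parse_stack_cfi` is hypothetical, and this is not how the Breakpad processor
itself is written.

```python
def parse_stack_cfi(record):
    """Split a STACK CFI record into (address, size, rules).

    size is None for records without INIT. rules maps each register name
    (without its trailing colon) to its postfix expression as a string.
    """
    tokens = record.split()
    if tokens[:2] != ["STACK", "CFI"]:
        raise ValueError("not a STACK CFI record")
    tokens = tokens[2:]
    size = None
    if tokens[0] == "INIT":
        # STACK CFI INIT address size rules...
        address, size = int(tokens[1], 16), int(tokens[2], 16)
        rest = tokens[3:]
    else:
        # STACK CFI address rules...
        address = int(tokens[0], 16)
        rest = tokens[1:]

    rules = {}
    register = None
    for token in rest:
        if token.endswith(":"):
            # A token ending in a colon starts a new register rule.
            register = token[:-1]
            rules[register] = []
        elif register is None:
            raise ValueError("expression token before any register name")
        else:
            rules[register].append(token)
    return address, size, {reg: " ".join(expr) for reg, expr in rules.items()}


if __name__ == "__main__":
    print(parse_stack_cfi(
        "STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^"))
    print(parse_stack_cfi(
        "STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^"))
```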

`STACK CFI` records describe, at each machine instruction in a given function,
how to recover the values the machine registers had in the function's caller.
Naturally, some registers' values are simply lost, but there are three cases in
which they can be recovered:

*   You can always recover the program counter, because that's the function's
    return address. If the function is ever going to return, the PC must be
    saved somewhere.

*   You can always recover the stack pointer. The function is responsible for
    popping its stack frame before it returns to the caller, so it must be able
    to restore this, as well.

*   You should be able to recover the values of callee-saves registers. These
    are registers whose values the callee must preserve, either by saving them
    in its own stack frame before using them and re-loading them before
    returning, or by not using them at all.

(As an exception, note that functions which never return may not save any of
this data. It may not be possible to walk the stack past such functions' stack
frames.)

Given rules for recovering the values of a function's caller's registers, we can
walk up the stack. Starting with the current set of registers (the PC of the
instruction we're currently executing, the current stack pointer, and so on), we
use CFI to recover the values those registers had in the caller of the current
frame. This gives us a PC in the caller whose CFI we can look up; we apply the
process again to find that function's caller; and so on.

Concretely, CFI records represent a table with a row for each machine
instruction address and a column for each register. The table entry for a given
address and register contains a rule describing how, when the PC is at that
address, to restore the value that register had in the caller.

There are some special columns:

*   A column named `.cfa`, for "Canonical Frame Address", tells how to compute
    the base address of the frame; other entries can refer to the CFA in their
    rules.

*   A column named `.ra` represents the return address.

For example, suppose we have a machine with 32-bit registers, one-byte
instructions, a stack that grows downwards, and an assembly language that
resembles C. Suppose further that we have a function whose machine code looks
like this:

```
func:                                ; entry point; return address at sp
func+0:      sp -= 16                ; allocate space for stack frame
func+1:      sp[12] = r0             ; save 4-byte r0 at sp+12
             ...                     ; stuff that doesn't affect stack
func+10:     sp -= 4; *sp = x        ; push some 4-byte x on the stack
             ...                     ; stuff that doesn't affect stack
func+20:     r0 = sp[16]             ; restore saved r0
func+21:     sp += 20                ; pop whole stack frame
func+22:     pc = *sp; sp += 4       ; pop return address and jump to it
```

The following table would describe the function above:

**code address** | **.cfa** | **r0**    | **r1** | ... | **.ra**
:--------------- | :------- | :-------- | :----- | :-- | :-------
func+0           | sp       |           |        |     | `cfa[0]`
func+1           | sp+16    |           |        |     | `cfa[0]`
func+2           | sp+16    | `cfa[-4]` |        |     | `cfa[0]`
func+11          | sp+20    | `cfa[-4]` |        |     | `cfa[0]`
func+21          | sp+20    |           |        |     | `cfa[0]`
func+22          | sp       |           |        |     | `cfa[0]`

Some things to note here:

*   Each row describes the state of affairs **before** executing the instruction
    at the given address. Thus, the row for func+0 describes the state before we
    execute the first instruction, which allocates the stack frame. In the next
    row, the formula for computing the CFA has changed, reflecting the
    allocation.

*   The other entries are written in terms of the CFA; this allows them to
    remain unchanged as the stack pointer gets bumped around. For example, to
    find the caller's value for r0 at func+2, we would first compute the CFA by
    adding 16 to the sp, and then subtract four from that to find the address at
    which r0 was saved.

*   Although the example doesn't show this, most calling conventions designate
    "callee-saves" and "caller-saves" registers. The callee must restore the
    values of "callee-saves" registers before returning (if it uses them at
    all), whereas the callee is free to use "caller-saves" registers without
    restoring their values. A function that uses caller-saves registers
    typically does not save their original values at all; in this case, the CFI
    marks such registers' values as "unrecoverable".

*   Exactly where the CFA points in the frame (at the return address, below it,
    or at some fixed point within the frame) is a question of definition that
    depends on the architecture and ABI in use. But by definition, the CFA
    remains constant throughout the lifetime of the frame. It's up to
    architecture-specific code to know what significance to assign the CFA, if
    any.

To save space, the most common type of CFI record only mentions the table
entries at which changes take place. So for the above, the CFI data would only
actually mention the non-blank entries here:

**code address** | **.cfa** | **r0**    | **r1** | ... | **.ra**
:--------------- | :------- | :-------- | :----- | :-- | :-------
func+0           | sp       |           |        |     | `cfa[0]`
func+1           | sp+16    |           |        |     |
func+2           |          | `cfa[-4]` |        |     |
func+11          | sp+20    |           |        |     |
func+21          |          | r0        |        |     |
func+22          | sp       |           |        |     |

A `STACK CFI INIT` record indicates that, at the machine instruction at
_address_, belonging to some function, the value that _register<sub>n</sub>_ had
in that function's caller can be recovered by evaluating
_expression<sub>n</sub>_. The values of any callee-saves registers not mentioned
are assumed to be unchanged. (`STACK CFI` records never mention caller-saves
registers.) These rules apply starting at _address_ and continue up to, but not
including, the address given in the next `STACK CFI` record. The _size_ field is
the total number of bytes of machine code covered by this record and any
subsequent `STACK CFI` records (until the next `STACK CFI INIT` record). The
_address_ field is relative to the module's load address.

A `STACK CFI` record (no `INIT`) is the same, except that it mentions only those
registers whose recovery rules have changed from the previous CFI record. There
must be a prior `STACK CFI INIT` or `STACK CFI` record in the symbol file. The
_address_ field of this record must be greater than that of the previous record,
and it must not be at or beyond the end of the range given by the most recent
`STACK CFI INIT` record. The address is relative to the module's load address.

Each expression is a Breakpad-style postfix expression. Expressions may contain
spaces, but their tokens may not end with colons. When an expression mentions a
register, it refers to the value of that register in the callee, even if a prior
name/expression pair gives that register's value in the caller. The exception is
`.cfa`, which refers to the canonical frame address computed by the `.cfa` rule
in force at the current instruction.

The special expression `.undef` indicates that the given register's value cannot
be recovered.

The register names preceding the expressions are always followed by colons. The
expressions themselves never contain tokens ending with colons.

There are two special register names:

*   `.cfa` ("Canonical Frame Address") is the base address of the stack frame.
    Other registers' rules may refer to this. If no rule is provided for the
    stack pointer, the value of `.cfa` is the caller's stack pointer.

*   `.ra` is the return address. This is the value of the restored program
    counter. We use `.ra` instead of the architecture-specific name for the
    program counter.

The Breakpad stack walker requires that there be rules in force for `.cfa` and
`.ra` at every code address from which it unwinds. If those rules are not
present, the stack walker will ignore the `STACK CFI` data, and try to use a
different strategy.

So the CFI for the example function above would be as follows, if `func` were at
address 0x1000 (relative to the module's load address) and occupied 0x17 bytes
(the one-byte instructions at func+0 through func+22):

```
STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^
STACK CFI 1001 .cfa: $sp 16 +
STACK CFI 1002 $r0: .cfa 4 - ^
STACK CFI 100b .cfa: $sp 20 +
STACK CFI 1015 $r0: $r0
STACK CFI 1016 .cfa: $sp
```
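
To make the table-and-delta model concrete, here is a minimal Python sketch that
accumulates the rules in force at a given address and evaluates them against a
toy register set and memory. It is an illustration under stated assumptions, not
the Breakpad stack walker: the names `rules_at`, `evaluate`, and `unwind` are
hypothetical, only the `+`, `-`, and `^` operators are handled, integer literals
are read as decimal (matching `16 +` and `20 +` in the example), memory is a
dictionary from address to 32-bit word, and the `INIT` record is given a size of
0x17 to cover the 23 one-byte instructions of `func`.

```python
RECORDS = [
    "STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^",
    "STACK CFI 1001 .cfa: $sp 16 +",
    "STACK CFI 1002 $r0: .cfa 4 - ^",
    "STACK CFI 100b .cfa: $sp 20 +",
    "STACK CFI 1015 $r0: $r0",
    "STACK CFI 1016 .cfa: $sp",
]


def rules_at(records, pc):
    """Merge one INIT record and its followers into the rules in force at pc."""
    rules = {}
    for record in records:
        tokens = record.split()[2:]          # drop "STACK CFI"
        if tokens[0] == "INIT":
            address, size = int(tokens[1], 16), int(tokens[2], 16)
            if not address <= pc < address + size:
                raise ValueError("pc is outside the range of the INIT record")
            rest = tokens[3:]
        else:
            address, rest = int(tokens[0], 16), tokens[1:]
        if address > pc:
            break                            # later records do not apply yet
        register = None
        for token in rest:
            if token.endswith(":"):
                register = token[:-1]
                rules[register] = []         # a new rule replaces the old one
            else:
                rules[register].append(token)
    return rules


def evaluate(tokens, registers, memory):
    """Evaluate a postfix expression (subset: +, -, ^ and decimal literals)."""
    stack = []
    for token in tokens:
        if token == "^":                     # dereference the address on top
            stack.append(memory[stack.pop()])
        elif token in ("+", "-"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if token == "+" else lhs - rhs)
        elif token in registers:
            stack.append(registers[token])
        else:
            stack.append(int(token))
    (value,) = stack
    return value


def unwind(records, pc, registers, memory):
    """Recover the caller's .cfa, .ra, and any registers that have rules."""
    rules = rules_at(records, pc)
    callee = dict(registers)
    # .cfa is computed first because other rules may refer to it.
    callee[".cfa"] = evaluate(rules[".cfa"], registers, memory)
    caller = {".cfa": callee[".cfa"]}
    for name, expression in rules.items():
        if name == ".cfa":
            continue
        if expression == [".undef"]:
            caller[name] = None              # value cannot be recovered
        else:
            caller[name] = evaluate(expression, callee, memory)
    return caller


if __name__ == "__main__":
    # At func+12, after the push: the CFA is sp+20, r0 was saved at CFA-4.
    memory = {0x2014: 0x4242, 0x2010: 7}
    print(unwind(RECORDS, 0x100c, {"$sp": 0x2000, "$r0": 99}, memory))
```

Registers without a rule, such as `$r0` at func+0, are simply absent from the
result; a caller of `unwind` would treat them as unchanged, as described above
for callee-saves registers.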