# Introduction

Given a minidump file, the Breakpad processor produces stack traces that include
function names and source locations. However, minidump files contain only the
byte-by-byte contents of threads' registers and stacks, without function names
or machine-code-to-source mapping data. The processor consults Breakpad symbol
files for the information it needs to produce human-readable stack traces from
the binary-only minidump file.

The platform-specific symbol dumping tools parse the debugging information the
compiler provides (whether as DWARF or STABS sections in an ELF file or as
stand-alone PDB files), and write that information back out in the Breakpad
symbol file format. This format is much simpler and less detailed than compiler
debugging information, and values legibility over compactness.

# Overview

Breakpad symbol files are ASCII text files, with lines delimited as appropriate
for the host platform. Each line is a _record_, divided into fields by single
spaces; in some cases, the last field of the record can contain spaces. The
first field is a string indicating what sort of record the line represents.
Line records are the exception: they are so common that omitting the type
field saves space. Some fields hold decimal or hexadecimal numbers; hexadecimal
numbers have no "0x" prefix, and use lower-case letters.

Breakpad symbol files contain the following record types. With some
restrictions, these may appear in any order.

*   A `MODULE` record describes the executable file or shared library from which
    this data was derived, for use by symbol suppliers. A `MODULE` record should
    be the first record in the file.

*   A `FILE` record gives a source file name, and assigns it a number by which
    other records can refer to it.

*   A `FUNC` record describes a function present in the source code.

*   A line record indicates to which source file and line a given range of
    machine code should be attributed. The line is attributed to the function
    defined by the most recent `FUNC` record.

*   A `PUBLIC` record gives the address of a linker symbol.

*   A `STACK` record provides information necessary to produce stack traces.
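
The record types above can be told apart by the first field alone. The following
is a minimal sketch in Python, not Breakpad's own parser; the function name
`record_type` and the labels `"LINE"` and `"UNKNOWN"` are assumptions made for
illustration.

```python
HEX_DIGITS = set("0123456789abcdef")
KEYWORDS = {"MODULE", "FILE", "FUNC", "PUBLIC"}


def record_type(line):
    """Classify one symbol-file line by its first field."""
    fields = line.rstrip("\r\n").split(" ")
    first = fields[0]
    if first in KEYWORDS:
        return first
    if first == "STACK" and len(fields) > 1:
        # STACK records carry a second keyword, such as WIN or CFI.
        return "STACK " + fields[1]
    if first and set(first) <= HEX_DIGITS:
        # Line records have no type string; they start with a
        # lower-case hexadecimal address.
        return "LINE"
    return "UNKNOWN"
```

This works because record names use upper-case letters while hexadecimal numbers
use lower-case letters, so the two can never be confused.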

# `MODULE` records

A `MODULE` record provides meta-information about the module (the executable or
shared library) from which the symbol file was generated. It has the form:

> `MODULE` _operatingsystem_ _architecture_ _id_ _name_

For example:
`MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin`

A symbol supplier might use this information to find the correct symbol files
to use to interpret a given minidump, or to perform other sorts of validation.
If present, a `MODULE` record should be the first line in the file.

The fields are separated by spaces, and cannot contain spaces themselves, except
for _name_.

*   The _operatingsystem_ field names the operating system on which the
    executable or shared library was intended to run. This field should have one
    of the following values:

    | **Value** | **Meaning**       |
    |:----------|:------------------|
    | Linux     | Linux             |
    | mac       | Mac OS X          |
    | windows   | Microsoft Windows |

*   The _architecture_ field indicates what processor architecture the
    executable or shared library contains machine code for. This field should
    have one of the following values:

    | **Value** | **Instruction Set Architecture** |
    |:----------|:---------------------------------|
    | x86       | Intel IA-32                      |
    | x86\_64   | AMD64/Intel 64                   |
    | ppc       | 32-bit PowerPC                   |
    | ppc64     | 64-bit PowerPC                   |
    | unknown   | unknown                          |

*   The _id_ field is a sequence of hexadecimal digits that identifies the exact
    executable or library whose contents the symbol file describes. The way in
    which it is computed varies from platform to platform.

*   The _name_ field contains the base name (the final component of the
    directory path) of the executable or library. It may contain spaces, and
    extends to the end of the line.
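
Because only _name_ may contain spaces, a `MODULE` record can be split with a
bounded number of splits. This is a small sketch under that assumption; the
`Module` type and `parse_module` function are hypothetical names, not part of
Breakpad.

```python
from typing import NamedTuple


class Module(NamedTuple):
    operating_system: str
    architecture: str
    module_id: str
    name: str


def parse_module(line):
    """Parse a MODULE record; only the final field may contain spaces."""
    fields = line.rstrip("\r\n").split(" ", 4)
    if len(fields) != 5 or fields[0] != "MODULE":
        raise ValueError("not a MODULE record: %r" % line)
    return Module(*fields[1:])
```

The _id_ is kept as a string, since the text above treats it as an opaque
sequence of hexadecimal digits whose computation is platform-specific.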

# `FILE` records

A `FILE` record holds a source file name for other records to refer to. It has
the form:

> `FILE` _number_ _name_

For example: `FILE 2 /home/jimb/mc/in/browser/app/nsBrowserApp.cpp`

A `FILE` record provides the name of a source file, and assigns it a number
which other records (line records, in particular) can use to refer to that file
name. The _number_ field is a decimal number. The _name_ field is the name of
the file; it may contain spaces.

# `FUNC` records

A `FUNC` record describes a source-language function. It has the form:

> `FUNC` _[m]_ _address_ _size_ _parameter\_size_ _name_

For example:
`FUNC m c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&, void**) const`

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions. (In that case, only one symbol name is
mentioned within the Breakpad file.) Multiple symbols referencing the same
instructions may occur due to identical code folding by the linker.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code instructions the function
occupies. (Breakpad symbol files cannot accurately describe functions whose code
is not contiguous.) The start address is relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size, in
bytes, of the arguments pushed on the stack for this function. Some calling
conventions, like the Microsoft Windows `stdcall` convention, require the called
function to pop parameters passed to it on the stack from its caller before
returning. The stack walker uses this value, along with data from `STACK`
records, to step from the called function's frame to the caller's frame.

The _name_ field is the name of the function. In languages that use linker
symbol name mangling like C++, this should be the source language name (the
"unmangled" form). This field may contain spaces.
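
The optional _m_ field is easy to detect because `m` is not a hexadecimal
digit. Here is a hedged sketch of a `FUNC` parser; `Func` and `parse_func` are
illustrative names only.

```python
from typing import NamedTuple


class Func(NamedTuple):
    multiple: bool        # True when the optional "m" field is present
    address: int          # relative to the module's load address
    size: int
    parameter_size: int
    name: str


def parse_func(line):
    """Parse a FUNC record, including the optional "m" marker."""
    fields = line.rstrip("\r\n").split(" ")
    if fields[0] != "FUNC":
        raise ValueError("not a FUNC record: %r" % line)
    rest = fields[1:]
    multiple = bool(rest) and rest[0] == "m"
    if multiple:
        rest = rest[1:]
    if len(rest) < 4:
        raise ValueError("too few fields: %r" % line)
    address, size, parameter_size = (int(field, 16) for field in rest[:3])
    # The name is the remainder of the line and may contain spaces.
    return Func(multiple, address, size, parameter_size, " ".join(rest[3:]))
```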

# Line records

A line record describes the source file and line number to which a given range
of machine code should be attributed. It has the form:

> _address_ _size_ _line_ _filenum_

For example: `c184 7 59 4`

Because they are so common, line records do not begin with a string indicating
the record type. All other record types' names use upper-case letters;
hexadecimal numbers, like a line record's _address_, use lower-case letters.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code. The address is relative to the
module's load address.

The _line_ field is the line number to which the machine code should be
attributed, in decimal; the first line of the source file is line number 1. The
_filenum_ field is a decimal number appearing in a prior `FILE` record; the name
given in that record is the source file name for the machine code.

The line is assumed to belong to the function described by the last preceding
`FUNC` record. Line records may not appear before the first `FUNC` record.

No two line records in a symbol file cover the same range of addresses. However,
there may be many line records with identical line and file numbers, as a given
source line may contribute many non-contiguous blocks of machine code.
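
Since no two line records overlap, mapping an address to a source location is a
sorted-range lookup. The sketch below assumes the whole symbol file is available
as one string; `load_line_table` and `source_location` are hypothetical helpers,
and the file path in the usage note is made up.

```python
import bisect

HEX_DIGITS = set("0123456789abcdef")


def load_line_table(symbol_text):
    """Collect FILE records and line records from symbol-file text."""
    files = {}
    table = []
    for record in symbol_text.splitlines():
        fields = record.split(" ")
        if fields[0] == "FILE" and len(fields) >= 3:
            # The file number is decimal; the name may contain spaces.
            files[int(fields[1])] = " ".join(fields[2:])
        elif len(fields) == 4 and fields[0] and set(fields[0]) <= HEX_DIGITS:
            address, size = int(fields[0], 16), int(fields[1], 16)
            table.append((address, size, int(fields[2]), int(fields[3])))
    table.sort()
    return files, table


def source_location(files, table, address):
    """Return (file name, line) for a module-relative address, or None."""
    starts = [entry[0] for entry in table]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, size, line, filenum = table[index]
    if address >= start + size:
        return None  # the address falls in a gap between line records
    return files[filenum], line
```

For instance, given `FILE 4 /src/a.cpp` and the line record `c184 7 59 4`,
looking up address `0xc186` yields line 59 of `/src/a.cpp`.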

# `PUBLIC` records

A `PUBLIC` record describes a publicly visible linker symbol, such as that used
to identify an assembly language entry point or region of memory. It has the
form:

> `PUBLIC` _[m]_ _address_ _parameter\_size_ _name_

For example: `PUBLIC m 2160 0 Public2_1`

The Breakpad processor essentially treats a `PUBLIC` record as defining a
function with no line number data and an indeterminate size: the code extends to
the next address mentioned. If a given address is covered by both a `PUBLIC`
record and a `FUNC` record, the processor uses the `FUNC` data.

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions. (In that case, only one symbol name is
mentioned within the Breakpad file.) Multiple symbols referencing the same
instructions may occur due to identical code folding by the linker.

The _address_ field is a hexadecimal number indicating the symbol's address,
relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size of the
parameters passed to the code whose entry point the symbol marks, if known. This
field has the same meaning as the _parameter\_size_ field of a `FUNC` record;
see that description for more details.

The _name_ field is the name of the symbol. In languages that use linker symbol
name mangling like C++, this should be the source language name (the "unmangled"
form). This field may contain spaces.
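
One way to picture the precedence between `FUNC` and `PUBLIC` data is the
following sketch. It is an illustration, not the processor's actual code: it
assumes "the next address mentioned" means the next `FUNC` or `PUBLIC` start
address, and `resolve_name` is a hypothetical name.

```python
def resolve_name(address, funcs, publics):
    """Name the code at a module-relative address.

    funcs is a list of (start, size, name) tuples from FUNC records;
    publics is a list of (start, name) tuples from PUBLIC records.
    """
    nearest_func = max((f for f in funcs if f[0] <= address), default=None)
    if nearest_func is not None:
        start, size, name = nearest_func
        if address < start + size:
            return name  # FUNC data wins whenever it covers the address
    nearest_public = max((p for p in publics if p[0] <= address), default=None)
    if nearest_public is None:
        return None
    # Assumption: a PUBLIC symbol's extent ends at the next start address,
    # so a FUNC that begins after the symbol cuts its range short.
    if nearest_func is not None and nearest_func[0] >= nearest_public[0]:
        return None
    return nearest_public[1]
```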

# `STACK WIN` records

Given a stack frame, a `STACK WIN` record indicates how to find the frame that
called it. It has the form:

> `STACK WIN` _type_ _rva_ _code\_size_ _prologue\_size_ _epilogue\_size_
> _parameter\_size_ _saved\_register\_size_ _local\_size_ _max\_stack\_size_
> _has\_program\_string_ _program\_string\_OR\_allocates\_base\_pointer_

For example:
`STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + = $ebp $ebp ^ =`

All fields of a `STACK WIN` record, except for the last, are hexadecimal
numbers.

The _type_ field indicates what sort of stack frame data this record holds. Its
value should be one of the values of the
[StackFrameTypeEnum](http://msdn.microsoft.com/en-us/library/bc5207xw%28VS.100%29.aspx)
type in Microsoft's
[Debug Interface Access (DIA)](http://msdn.microsoft.com/en-us/library/x93ctkx8%28VS.100%29.aspx) API.
Breakpad uses only records of type 4 (`FrameTypeFrameData`) and 0
(`FrameTypeFPO`); it ignores others. These types differ only in whether the last
field is an _allocates\_base\_pointer_ flag (`FrameTypeFPO`) or a program string
(`FrameTypeFrameData`). If more than one record covers a given address, Breakpad
prefers `FrameTypeFrameData` records over `FrameTypeFPO` records.

The _rva_ and _code\_size_ fields give the starting address and length in bytes
of the machine code covered by this record. The starting address is relative to
the module's load address.

The _prologue\_size_ and _epilogue\_size_ fields give the length, in bytes, of
the prologue and epilogue machine code within the record's range. Breakpad does
not use these values.

The _parameter\_size_ field gives the number of argument bytes this function
expects to have been passed. This field has the same meaning as the
_parameter\_size_ field of a `FUNC` record; see that description for more
details.

The _saved\_register\_size_ field gives the number of bytes in the stack frame
dedicated to preserving the values of any callee-saves registers used by this
function.

The _local\_size_ field gives the number of bytes in the stack frame dedicated
to holding the function's local variables and temporary values.

The _max\_stack\_size_ field gives the maximum number of bytes pushed on the
stack in the frame. Breakpad does not use this value.

If the _has\_program\_string_ field is zero, then the `STACK WIN` record's final
field is an _allocates\_base\_pointer_ flag, as a hexadecimal number; this is
expected for records whose _type_ is 0. Otherwise, the final field is a program
string.
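
The field layout above can be sketched as a small parser. This is an
illustration only: `parse_stack_win` and the dictionary keys are assumed names,
chosen to mirror the field names in the record form.

```python
NUMERIC_FIELDS = (
    "type", "rva", "code_size", "prologue_size", "epilogue_size",
    "parameter_size", "saved_register_size", "local_size",
    "max_stack_size", "has_program_string",
)


def parse_stack_win(line):
    """Parse a STACK WIN record into a dictionary of named fields."""
    # Two keywords, ten hexadecimal fields, then one final field that may
    # contain spaces when it is a program string.
    fields = line.rstrip("\r\n").split(" ", 12)
    if len(fields) != 13 or fields[:2] != ["STACK", "WIN"]:
        raise ValueError("not a STACK WIN record: %r" % line)
    record = {name: int(value, 16)
              for name, value in zip(NUMERIC_FIELDS, fields[2:12])}
    if record["has_program_string"]:
        record["program_string"] = fields[12]
    else:
        record["allocates_base_pointer"] = bool(int(fields[12], 16))
    return record
```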

## Interpreting a `STACK WIN` record

Given the register values for a frame F, we can find the calling frame as
follows:

*   If the _has\_program\_string_ field of a `STACK WIN` record is zero, then
    the final field is _allocates\_base\_pointer_, a flag indicating whether the
    frame uses the frame pointer register, `%ebp`, as a general-purpose
    register.
    *   If _allocates\_base\_pointer_ is true, then `%ebp` does not point to the
        frame's base address. Instead,
        *   Let _next\_parameter\_size_ be the parameter size of the function
            frame F called (**not** this record's _parameter\_size_ field), or
            zero if F is the youngest frame on the stack. You must find this
            value in F's callee's `FUNC`, `STACK WIN`, or `PUBLIC` records.
        *   Let _frame\_size_ be the sum of the _local\_size_ field, the
            _saved\_register\_size_ field, and _next\_parameter\_size_.

        With those definitions in place, we can recover the calling frame as
        follows:
        *   F's return address is at `%esp +`_frame\_size_,
        *   the caller's value of `%ebp` is saved at
            `%esp +`_next\_parameter\_size_`+`_saved\_register\_size_`- 8`, and
        *   the caller's value of `%esp` just before the call instruction was
            `%esp +`_frame\_size_`+ 4`.

        (Why do we include _next\_parameter\_size_ in the sum when computing
        _frame\_size_ and the address of the saved `%ebp`? When a function A
        has called a function B, the arguments that A pushed for B are
        considered part of A's stack frame: A's value for `%esp` points at the
        last argument pushed for B. Thus, we must include the size of those
        arguments (given by the debugging info for B) along with the size of
        A's register save area and local variable area (given by the debugging
        info for A) when computing the overall size of A's frame.)
    *   If _allocates\_base\_pointer_ is false, then F's function doesn't use
        `%ebp` at all. You may recover the calling frame as above, except that
        the caller's value of `%ebp` is the same as F's value for `%ebp`, so no
        steps are necessary to recover it.
*   If the _has\_program\_string_ field of a `STACK WIN` record is not zero,
    then the record's final field is a string containing a program to be
    interpreted to recover the caller's frame. The comments in the
    [postfix\_evaluator.h](../src/processor/postfix_evaluator.h#40)
    header file explain the language in which the program is written. You should
    place the following variables in the dictionary before interpreting the
    program:
    *   `$ebp` and `$esp` should be the values of the `%ebp` and `%esp`
        registers in F.
    *   `.cbParams`, `.cbSavedRegs`, and `.cbLocals` should be the values of
        the `STACK WIN` record's _parameter\_size_, _saved\_register\_size_, and
        _local\_size_ fields.
    *   `.raSearchStart` should be set to the address on the stack to begin
        scanning for a return address, if necessary. The Breakpad processor sets
        this to the value of `%esp` in F, plus the _frame\_size_ value mentioned
        above.

> If the program stores values for `$eip`, `$esp`, `$ebp`, `$ebx`, `$esi`, or
> `$edi`, then those are the values of the given registers in the caller. If the
> value of `$eip` is zero, that indicates that the end of the stack has been
> reached.

The Breakpad processor checks that the value yielded by the above for the
calling frame's instruction address refers to known code; if the address seems
to be bogus, then it uses a heuristic search to find F's return address and
stack base.
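
The arithmetic for the case without a program string can be written out as
follows. This is a sketch of the formulas above only; `unwind_without_program`
is a hypothetical name, `read_u32` is an assumed callback that reads a 32-bit
word from the stack memory captured in the minidump, and the validity check and
heuristic search described above are left out.

```python
def unwind_without_program(esp, ebp, local_size, saved_register_size,
                           next_parameter_size, allocates_base_pointer,
                           read_u32):
    """Recover the caller's registers from F's registers.

    Returns (return_address, caller_esp, caller_ebp).
    """
    frame_size = local_size + saved_register_size + next_parameter_size
    return_address = read_u32(esp + frame_size)
    caller_esp = esp + frame_size + 4
    if allocates_base_pointer:
        # %ebp was used as a general-purpose register, so the caller's
        # value must be fetched from the register save area.
        caller_ebp = read_u32(esp + next_parameter_size
                              + saved_register_size - 8)
    else:
        # F never touched %ebp; the caller's value is still in it.
        caller_ebp = ebp
    return return_address, caller_esp, caller_ebp
```

With `esp = 0x1000`, a _local\_size_ of 0x10, a _saved\_register\_size_ of 0xc,
and a _next\_parameter\_size_ of 8, _frame\_size_ is 0x24, so the return address
is read from 0x1024 and the caller's `%esp` is 0x1028.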

# `STACK CFI` records

`STACK CFI` ("Call Frame Information") records describe how to walk the stack
when execution is at a given machine instruction. These records take one of two
forms:

> `STACK CFI INIT` _address_ _size_ _register<sub>1</sub>_:
> _expression<sub>1</sub>_ _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
>
> `STACK CFI` _address_ _register<sub>1</sub>_: _expression<sub>1</sub>_
> _register<sub>2</sub>_: _expression<sub>2</sub>_ ...

For example:

```
STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^
STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^
```

The _address_ and _size_ fields are hexadecimal numbers. Each
_register_<sub>i</sub> is the name of a register or pseudoregister. Each
_expression_ is a Breakpad postfix expression, which may contain spaces, but
never ends with a colon. (The appropriate register names for a given
architecture are determined when `STACK CFI` records are first enabled for that
architecture, and should be documented in the appropriate
`stackwalker_`_architecture_`.cc` source file.)
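
Because register names end with a colon and expression tokens never do, the
rule list can be split without any lookahead. The following is a minimal sketch
of that idea; `parse_stack_cfi` is a hypothetical helper, not Breakpad's parser.

```python
def parse_stack_cfi(line):
    """Parse a STACK CFI record into (address, size, rules).

    size is None for records without INIT. rules maps each register
    name to its postfix expression, kept as a string.
    """
    fields = line.split()
    if fields[:2] != ["STACK", "CFI"]:
        raise ValueError("not a STACK CFI record: %r" % line)
    if len(fields) > 4 and fields[2] == "INIT":
        address, size = int(fields[3], 16), int(fields[4], 16)
        tokens = fields[5:]
    else:
        address, size = int(fields[2], 16), None
        tokens = fields[3:]
    rules = {}
    register = None
    for token in tokens:
        if token.endswith(":"):
            # A trailing colon starts the rule for a new register.
            register = token[:-1]
            rules[register] = []
        elif register is None:
            raise ValueError("expression before any register: %r" % line)
        else:
            rules[register].append(token)
    return address, size, {name: " ".join(expr) for name, expr in rules.items()}
```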

`STACK CFI` records describe, at each machine instruction in a given function,
how to recover the values the machine registers had in the function's caller.
Naturally, some registers' values are simply lost, but there are three cases in
which they can be recovered:

*   You can always recover the program counter, because that's the function's
    return address. If the function is ever going to return, the PC must be
    saved somewhere.

*   You can always recover the stack pointer. The function is responsible for
    popping its stack frame before it returns to the caller, so it must be able
    to restore this, as well.

*   You should be able to recover the values of callee-saves registers. These
    are registers whose values the callee must preserve, either by saving them
    in its own stack frame before using them and re-loading them before
    returning, or by not using them at all.

(As an exception, note that functions which never return may not save any of
this data. It may not be possible to walk the stack past such functions' stack
frames.)

Given rules for recovering the values of a function's caller's registers, we can
walk up the stack. Starting with the current set of registers (the PC of the
instruction we're currently executing, the current stack pointer, and so on), we
use CFI to recover the values those registers had in the caller of the current
frame. This gives us a PC in the caller whose CFI we can look up; we apply the
process again to find that function's caller; and so on.

Concretely, CFI records represent a table with a row for each machine
instruction address and a column for each register. The table entry for a given
address and register contains a rule describing how, when the PC is at that
address, to restore the value that register had in the caller.

There are some special columns:

*   A column named `.cfa`, for "Canonical Frame Address", tells how to compute
    the base address of the frame; other entries can refer to the CFA in their
    rules.

*   A column named `.ra` represents the return address.

For example, suppose we have a machine with 32-bit registers, one-byte
instructions, a stack that grows downwards, and an assembly language that
resembles C. Suppose further that we have a function whose machine code looks
like this:

```
func:                                ; entry point; return address at sp
func+0:      sp -= 16                ; allocate space for stack frame
func+1:      sp[12] = r0             ; save 4-byte r0 at sp+12
             ...                     ; stuff that doesn't affect stack
func+10:     sp -= 4; *sp = x        ; push some 4-byte x on the stack
             ...                     ; stuff that doesn't affect stack
func+20:     r0 = sp[16]             ; restore saved r0
func+21:     sp += 20                ; pop whole stack frame
func+22:     pc = *sp; sp += 4       ; pop return address and jump to it
```

The following table would describe the function above:

**code address** | **.cfa** | **r0**    | **r1** | ... | **.ra**
:--------------- | :------- | :-------- | :----- | :-- | :-------
func+0           | sp       |           |        |     | `cfa[0]`
func+1           | sp+16    |           |        |     | `cfa[0]`
func+2           | sp+16    | `cfa[-4]` |        |     | `cfa[0]`
func+11          | sp+20    | `cfa[-4]` |        |     | `cfa[0]`
func+21          | sp+20    |           |        |     | `cfa[0]`
func+22          | sp       |           |        |     | `cfa[0]`

Some things to note here:

*   Each row describes the state of affairs **before** executing the instruction
    at the given address. Thus, the row for func+0 describes the state before we
    execute the first instruction, which allocates the stack frame. In the next
    row, the formula for computing the CFA has changed, reflecting the
    allocation.

*   The other entries are written in terms of the CFA; this allows them to
    remain unchanged as the stack pointer gets bumped around. For example, to
    find the caller's value for r0 at func+2, we would first compute the CFA by
    adding 16 to the sp, and then subtract four from that to find the address
    at which r0 was saved.

*   Although the example doesn't show this, most calling conventions designate
    "callee-saves" and "caller-saves" registers. The callee must restore the
    values of "callee-saves" registers before returning (if it uses them at
    all), whereas the callee is free to use "caller-saves" registers without
    restoring their values. A function that uses caller-saves registers
    typically does not save their original values at all; in this case, the CFI
    marks such registers' values as "unrecoverable".

*   Exactly where the CFA points in the frame (at the return address? below
    it? at some fixed point within the frame?) is a question of definition
    that depends on the architecture and ABI in use. But by definition, the CFA
    remains constant throughout the lifetime of the frame. It's up to
    architecture-specific code to know what significance to assign the CFA, if
    any.

To save space, the most common type of CFI record only mentions the table
entries at which changes take place. So for the above, the CFI data would only
actually mention the non-blank entries here:

**insn** | **cfa** | **r0**    | **r1** | ... | **ra**
:------- | :------ | :-------- | :----- | :-- | :-------
func+0   | sp      |           |        |     | `cfa[0]`
func+1   | sp+16   |           |        |     |
func+2   |         | `cfa[-4]` |        |     |
func+11  | sp+20   |           |        |     |
func+21  |         | r0        |        |     |
func+22  | sp      |           |        |     |

A `STACK CFI INIT` record indicates that, at the machine instruction at
_address_, belonging to some function, the value that _register<sub>n</sub>_ had
in that function's caller can be recovered by evaluating
_expression<sub>n</sub>_. The values of any callee-saves registers not mentioned
are assumed to be unchanged. (`STACK CFI` records never mention caller-saves
registers.) These rules apply starting at _address_ and continue up to, but not
including, the address given in the next `STACK CFI` record. The _size_ field is
the total number of bytes of machine code covered by this record and any
subsequent `STACK CFI` records (until the next `STACK CFI INIT` record). The
_address_ field is relative to the module's load address.

A `STACK CFI` record (no `INIT`) is the same, except that it mentions only those
registers whose recovery rules have changed from the previous CFI record. There
must be a prior `STACK CFI INIT` or `STACK CFI` record in the symbol file. The
_address_ field of this record must be greater than that of the previous record,
and it must not be at or beyond the end of the range given by the most recent
`STACK CFI INIT` record. The address is relative to the module's load address.

Each expression is a Breakpad-style postfix expression. Expressions may contain
spaces, but their tokens may not end with colons. When an expression mentions a
register, it refers to the value of that register in the callee, even if a prior
name/expression pair gives that register's value in the caller. The exception is
`.cfa`, which refers to the canonical frame address computed by the `.cfa` rule
in force at the current instruction.

The special expression `.undef` indicates that the given register's value cannot
be recovered.

The register names preceding the expressions are always followed by colons,
which is what distinguishes them from expression tokens.

There are two special register names:

*   `.cfa` ("Canonical Frame Address") is the base address of the stack frame.
    Other registers' rules may refer to this. If no rule is provided for the
    stack pointer, the value of `.cfa` is the caller's stack pointer.

*   `.ra` is the return address. This is the value of the restored program
    counter. We use `.ra` instead of the architecture-specific name for the
    program counter.

The Breakpad stack walker requires that there be rules in force for `.cfa` and
`.ra` at every code address from which it unwinds. If those rules are not
present, the stack walker will ignore the `STACK CFI` data, and try to use a
different strategy.

So the CFI for the example function above would be as follows, if `func` were at
address 0x1000 (relative to the module's load address). The _size_ of 17
(hexadecimal) covers the 23 one-byte instructions from func+0 through func+22:

```
STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^
STACK CFI      1001 .cfa: $sp 16 +
STACK CFI      1002 $r0: .cfa 4 - ^
STACK CFI      100b .cfa: $sp 20 +
STACK CFI      1015 $r0: $r0
STACK CFI      1016 .cfa: $sp
```
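
To see how these rules are applied, the sketch below accumulates the rules in
force at a given PC and evaluates them. It is a toy evaluator under stated
assumptions: it supports only `+`, `-`, and `^` (dereference), treats integer
literals as decimal (as the `16` and `20` in the example require in order to
match the table), and ignores the `INIT` record's _size_. The names `evaluate`,
`rules_in_force`, `caller_registers`, and `EXAMPLE` are hypothetical, and
`read_word` is an assumed callback that reads one word of stack memory.

```python
# The example CFI above, already split into (address, rules) pairs.
EXAMPLE = [
    (0x1000, {".cfa": "$sp", ".ra": ".cfa ^"}),
    (0x1001, {".cfa": "$sp 16 +"}),
    (0x1002, {"$r0": ".cfa 4 - ^"}),
    (0x100b, {".cfa": "$sp 20 +"}),
    (0x1015, {"$r0": "$r0"}),
    (0x1016, {".cfa": "$sp"}),
]


def evaluate(expression, registers, cfa, read_word):
    """Evaluate a postfix expression against the callee's registers."""
    stack = []
    for token in expression.split():
        if token in ("+", "-"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if token == "+" else left - right)
        elif token == "^":
            stack.append(read_word(stack.pop()))
        elif token == ".cfa":
            stack.append(cfa)
        elif token.startswith("$"):
            stack.append(registers[token])
        else:
            stack.append(int(token))  # assumption: decimal literal
    (value,) = stack
    return value


def rules_in_force(records, pc):
    """Merge every record at or below pc, later records overriding earlier."""
    rules = {}
    for address, changes in records:
        if address > pc:
            break
        rules.update(changes)
    return rules


def caller_registers(records, registers, pc, read_word):
    """Recover the caller's register values when execution is at pc."""
    rules = rules_in_force(records, pc)
    cfa = evaluate(rules[".cfa"], registers, None, read_word)
    caller = {name: evaluate(expression, registers, cfa, read_word)
              for name, expression in rules.items() if name != ".cfa"}
    # With no explicit stack pointer rule, .cfa is the caller's sp.
    caller.setdefault("$sp", cfa)
    return caller
```

For example, with the PC at 0x100c (just after the push at func+10) and `$sp`
equal to 0x7fe0, the CFA evaluates to 0x7ff4, the return address is read from
0x7ff4, and the caller's `r0` is read from 0x7ff0.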