=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information for consumption by debuggers and other tools.  Because
Windows provides officially supported APIs for querying debug information from
PDBs without requiring the user to understand the internals of the file format,
a large ecosystem of tools has been built for Windows to consume this format.
For Clang to generate programs that interoperate with these tools, we must
generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here.  We
therefore need to understand the PDB file format at the byte level so that we
can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today.  Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!
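
As a small illustration of the note above, the following Python sketch decodes
little-endian integers from a byte buffer using the standard ``struct`` module.
The helper names ``read_u16`` and ``read_u64`` and the sample bytes are made up
for this example and are not part of any PDB tooling.

```python
import struct


def read_u16(data: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint16_t starting at `offset`."""
    return struct.unpack_from("<H", data, offset)[0]


def read_u64(data: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint64_t starting at `offset`."""
    return struct.unpack_from("<Q", data, offset)[0]


# Hypothetical sample bytes: the least significant byte comes first.
sample = bytes([0x34, 0x12]) + bytes([0x01, 0, 0, 0, 0, 0, 0, 0])
value16 = read_u16(sample)      # bytes 34 12 decode to 0x1234
value64 = read_u64(sample, 2)   # bytes 01 00 ... 00 decode to 1
```

The ``<`` prefix in the format string selects little-endian byte order, so the
same code behaves identically on big-endian hosts.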

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is a special case of an MSF (Multi-Stream Format) file.  An MSF
file is a miniature "file system within a file".  It contains multiple streams
(analogous to files) that can hold arbitrary data.  Each stream is divided
into blocks, which are not necessarily laid out contiguously within the file
(that is, a stream may be fragmented).  The MSF also contains a stream
directory (analogous to a file system's master file table, or MFT) that
describes how the streams are laid out within the MSF.
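
To make the idea of a fragmented stream concrete, here is a minimal Python
sketch that reassembles one stream from a list of block indices.  The function
name ``read_stream``, its parameters, and the tiny 4-byte block size are
assumptions chosen for illustration; the real on-disk encoding of the block
size and stream directory is not shown here.

```python
from typing import Sequence


def read_stream(file_bytes: bytes, block_size: int,
                block_indices: Sequence[int], stream_size: int) -> bytes:
    """Concatenate a stream's blocks in order and trim to its size.

    `block_indices` stands in for what a stream directory would
    provide; how it is decoded from the file is not modeled here.
    """
    chunks = []
    for index in block_indices:
        start = index * block_size
        chunks.append(file_bytes[start:start + block_size])
    # The final block may be only partially used by the stream.
    return b"".join(chunks)[:stream_size]


# Hypothetical 4-block "file" with a 4-byte block size.  The stream
# occupies blocks 3 and 1 (in that order), so it is not contiguous.
toy_file = b"AAAA" + b"WORL" + b"CCCC" + b"HELO"
stream = read_stream(toy_file, block_size=4,
                     block_indices=[3, 1], stream_size=7)
```

Here the stream's seven bytes are gathered from two blocks that are neither
adjacent nor in ascending order within the toy file.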

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

.. _streams:

Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program.  Additional
streams contain hash tables that debuggers and other tools use to look up
records and types quickly by name, as well as other information about how the
program was compiled, such as the specific toolchain used.  The streams
contained in a PDB file are summarized below:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
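
The fixed indices in the table can be captured in a small enumeration.  The
Python sketch below is illustrative only: the names ``FixedStream`` and
``find_named_stream`` are invented for this example, and the named stream map
is modeled as a plain dictionary with made-up index values rather than decoded
from a real PDB Stream.

```python
from enum import IntEnum
from typing import Dict, Optional


class FixedStream(IntEnum):
    """Streams that always live at the same index (per the table)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def find_named_stream(named_map: Dict[str, int], name: str) -> Optional[int]:
    """Look up a named stream such as "/names"; return None if absent."""
    return named_map.get(name)


# Hypothetical named stream map; real indices vary from file to file.
example_map = {"/LinkInfo": 5, "/src/headerblock": 6, "/names": 7}
names_index = find_named_stream(example_map, "/names")
```

Streams without a fixed index, such as the module, public, global, and hash
streams, are located through the stream that contains them (the DBI, TPI, or
IPI stream), as listed in the table.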

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream and how it can be used to quickly look up records
   by name.

CodeView
========
CodeView is a third format that comes into the picture.  MSF defines the
structure of the overall file, and PDB defines the set of streams that appear
within the MSF file and the format of those streams; CodeView defines the
format of the **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.