# Reactor Debug Info Generation

## Introduction

Reactor produces Just In Time compiled dynamic executable code. It can be used to JIT high-performance functions specialized for runtime
configurations, or even to build a compiler.

In order to debug executable code at a higher level than disassembly, source code files are required.

Reactor has two potential sources of source code:

1. The C++ source code of the program that calls into Reactor.
2. External source files read by the program and passed to Reactor.

While case (2) is preferable for implementing a compiler, it is not currently
implemented.

Reactor implements case (1), which GDB can use to single-step by line and
inspect variables.

## Supported Platforms

Currently:

* Debug info generation is only supported on Linux with the LLVM 7
backend.
* GDB is the only supported debugger.
* The program must itself be compiled with debug info.

## Enabling

Debug info generation is enabled with the `REACTOR_EMIT_DEBUG_INFO` CMake flag (disabled
by default).

## Implementation details

### Source Location

All Reactor functions begin with a call to `RR_DEBUG_INFO_UPDATE_LOC()`, which calls into `rr::DebugInfo::EmitLocation()`.

`rr::DebugInfo::EmitLocation()` calls `rr::DebugInfo::getCallerBacktrace()`,
which in turn uses [`libbacktrace`](https://github.com/ianlancetaylor/libbacktrace)
to unwind the stack and find the file, function and line of the caller.

This information is passed to `llvm::IRBuilder<>::SetCurrentDebugLocation`
to emit source line information for the next LLVM instructions to be built.

### Variables

There are 3 aspects to generating variable debug information:

#### 1. Variable names

Constructing a Reactor `LValue`:

```C++
rr::Int a = 1;
```

will emit an LLVM `alloca` instruction to allocate the storage of the variable,
and emit another instruction to initialize it to the constant `1`. While this syntax is fluent, none of the
Reactor calls see the name of the C++ local variable "`a`", and the LLVM `alloca`
value gets a meaningless numerical name.

There are two potential ways that Reactor can obtain the variable name:

1. Use the running executable's own debug information to examine the local
   declaration and extract the local variable's name.
2. Use the backtrace information to parse the name from the source file.

While (1) is arguably a cleaner and more robust solution, (2) is
easier to implement and can work for the majority of use cases.

(2) is the solution currently implemented.

`rr::DebugInfo::getOrParseFileTokens()` scans a source file line by line, and
uses a regular expression to look for patterns of `<type> <name>`. Matching is not
precise, but is adequate to find locals constructed with and without assignment.

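The line-by-line scan described above can be sketched roughly as follows. This is a minimal illustration, not Reactor's code: the helper name `parseLineTokens` and the regular expression are assumptions chosen for the example, and the real pattern may differ.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper (not Reactor's code): maps 1-based line numbers to the
// name of the variable that appears to be declared on that line.
std::unordered_map<int, std::string> parseLineTokens(const std::string &source)
{
    // Assumed pattern: a type-like token, whitespace, an identifier, and then
    // one of '=', ';' or '(' to cover construction with and without assignment.
    static const std::regex decl(
        R"re(^\s*[A-Za-z_][A-Za-z0-9_:<>]*\s+([A-Za-z_][A-Za-z0-9_]*)\s*(=|;|\())re");

    std::unordered_map<int, std::string> tokens;
    std::istringstream stream(source);
    std::string line;
    int lineNumber = 0;
    while (std::getline(stream, line))
    {
        ++lineNumber;
        std::smatch match;
        if (std::regex_search(line, match, decl))
        {
            tokens[lineNumber] = match[1].str(); // capture group 1 is the name
        }
    }
    return tokens;
}
```

Like the matching described above, this pattern is deliberately loose: a line such as `return a;` would also be recorded as declaring `a`. That is harmless as long as the table is only consulted for lines on which a Reactor variable is actually constructed.
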
#### 2. Variable binding

Given that we can find a variable name for a given source line, we need a way of
binding the LLVM values to the name.

Given our trivial example:

```C++
rr::Int a = 1;
```

The `rr::Int` constructor calls `RR_DEBUG_INFO_EMIT_VAR()`, passing the storage
value as its single argument. `RR_DEBUG_INFO_EMIT_VAR()` performs the backtrace
to find the source file and line and uses the token information produced by
`rr::DebugInfo::getOrParseFileTokens()` to identify the variable name.

However, things get a bit more complicated when there are multiple variables
being constructed on the same line.

Take for example:

```C++
rr::Int a = rr::Int(1) + rr::Int(2);
```

Here we have 3 calls to the `rr::Int` constructor, each calling down
to `RR_DEBUG_INFO_EMIT_VAR()`.

To disambiguate which of these should be bound to the variable name "`a`",
`rr::DebugInfo::EmitVariable()` buffers the binding into
`scope.pending`, and the last binding for a given line is used by
`rr::DebugInfo::emitPending()`. For variable construction and assignment, C++
guarantees that the LHS is the last value to be constructed.

This solution is not perfect: multi-line expressions, multiple assignments on a
single line, and macro obfuscation can all break variable bindings. However, the
majority of typical cases work.

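The "last binding on a line wins" rule can be sketched as below. The `Binding` and `PendingBinder` types are hypothetical stand-ins invented for this example, and LLVM values are represented by plain strings. The sketch assumes the buffered binding is committed when a binding for a different line arrives or when `flush()` is called explicitly; when Reactor itself commits is not covered here.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Reactor's types.
struct Binding
{
    int line;          // source line reported by the backtrace
    std::string value; // placeholder for the LLVM storage value
};

class PendingBinder
{
public:
    // lineNames maps a line number to the variable name parsed from that line.
    explicit PendingBinder(std::unordered_map<int, std::string> lineNames)
        : names(std::move(lineNames)) {}

    // Buffer a binding. A later binding on the same line replaces the buffered
    // one, so only the last value constructed on a line survives.
    void emitVariable(int line, const std::string &value)
    {
        if (hasPending && pending.line != line) flush();
        pending = Binding{line, value};
        hasPending = true;
    }

    // Commit the buffered binding if its line declares a known name.
    void flush()
    {
        if (!hasPending) return;
        auto it = names.find(pending.line);
        if (it != names.end()) bound.emplace_back(it->second, pending.value);
        hasPending = false;
    }

    std::vector<std::pair<std::string, std::string>> bound; // (name, value)

private:
    std::unordered_map<int, std::string> names;
    Binding pending{};
    bool hasPending = false;
};
```

For a line such as `rr::Int a = rr::Int(1) + rr::Int(2);`, the two temporaries are buffered and then overwritten, and only the third value is bound to `a`.
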
#### 3. Variable scope

`rr::DebugInfo` maintains a stack of `llvm::DIScope`s and `llvm::DILocation`s
that mirrors the current backtrace for the function being called.

A synthetic call stack is produced by chaining `llvm::DILocation`s with
`InlinedAt`s.

For example, at the declaration of `i`:

```C++
void B()
{
    rr::Int i; // <- here
}

void A()
{
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

The `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A"
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location:   ↳ DILocation(DISubprogram: "A")
rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B")
```

Where '↳' represents an `InlinedAt`.

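The shape of the `InlinedAt` chain can be modelled without LLVM, as in the sketch below. The `Location` struct and both helper functions are hypothetical and only mimic how each location points at the call site it was "inlined" at; the frame names and line numbers used with them are illustrative.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a debug location; it only mimics the shape of
// llvm::DILocation and does not use LLVM.
struct Location
{
    std::string scope;                         // enclosing function name
    int line;                                  // line within that scope
    std::shared_ptr<const Location> inlinedAt; // call site, or null at the root
};

// Chain one location per backtrace frame, outermost frame first.
std::shared_ptr<const Location> chainLocations(
    const std::vector<std::pair<std::string, int>> &frames)
{
    std::shared_ptr<const Location> current;
    for (const auto &frame : frames)
    {
        current = std::make_shared<Location>(Location{frame.first, frame.second, current});
    }
    return current; // innermost location
}

// Walk the inlinedAt links to recover the call stack, innermost frame first.
std::vector<std::string> callStack(std::shared_ptr<const Location> location)
{
    std::vector<std::string> names;
    for (; location; location = location->inlinedAt) names.push_back(location->scope);
    return names;
}
```

Chaining the frames `ReactorFunction`, `main`, `A`, `B` in that order and walking back from the innermost location yields `B`, `A`, `main`, `ReactorFunction`, which is the kind of stack a debugger can reconstruct from inlining information.
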
`rr::DebugInfo::diScope` is updated by `rr::DebugInfo::syncScope()`.

`llvm::DIScope`s typically do not nest: there is usually a separate
`llvm::DISubprogram` for each function in the callstack. All local variables
within a function will typically share the same scope, regardless of whether
they are declared within a sub-block.

Loops and jumps within a function add complexity. Consider:

```C++
void B()
{
    rr::Int i = 0;
}

void A()
{
    for (int i = 0; i < 3; i++)
    {
        rr::Int x = 0;
    }
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

In this particular example, Reactor will not be aware of the `for` loop, and will
attempt to create three variables called "`x`" in the same function scope for `A()`.
Duplicate symbols in the same `llvm::DIScope` result in undefined behavior.

To solve this, `rr::DebugInfo::syncScope()` observes when a function jumps
backwards, and forks the current `llvm::DILexicalBlock` for the function. This
results in a number of `llvm::DILexicalBlock` chains, each declaring variables
that shadow the previous block.

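The backward-jump detection can be sketched as follows. `ScopeFrame` and `syncFrame` are hypothetical names, and the sketch assumes that "jumping backwards" means observing a line number strictly smaller than the previous one in the same function; the exact condition used by `rr::DebugInfo::syncScope()` may differ.

```cpp
#include <string>

// Hypothetical model, not Reactor's implementation: one entry per function
// in the backtrace, tracking the last line seen and how often it has forked.
struct ScopeFrame
{
    std::string function;
    int lastLine = 0;
    int block = 0; // 0 = function scope, N = Nth forked lexical block

    // Name in the style used by the hierarchy listings, e.g. "A.2".
    std::string scopeName() const
    {
        return block == 0 ? function : function + "." + std::to_string(block);
    }
};

// Update a frame for a newly observed line. Moving to an earlier line is
// treated as a backward jump (for example a new loop iteration) and forks a
// fresh block, so a redeclared variable shadows the old one instead of
// colliding with it.
void syncFrame(ScopeFrame &frame, int line)
{
    if (line < frame.lastLine) ++frame.block;
    frame.lastLine = line;
}
```

For a loop body that touches lines 5 and 6 over three iterations, the observed sequence 5, 6, 5, 6, 5, 6 forks twice and ends in block 2. With this strict comparison, a loop body whose Reactor calls all sit on a single line would not be detected; that case is outside what this sketch models.
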
At the declaration of `i` in `B()`, the `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
                              ↳ DISubprogram: "A"
                              | ↳ DILexicalBlock: "A".1
rr::DebugInfo::diScope[1].di: |   ↳ DILexicalBlock: "A".2
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location:   ↳ DILocation(DILexicalBlock: "A".2)
rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B")
```

### Debugger integration

Once the debug information has been generated, it needs to be handed to the
debugger.

Reactor uses [`llvm::JITEventListener::createGDBRegistrationListener()`](http://llvm.org/doxygen/classllvm_1_1JITEventListener.html#a004abbb5a0d48ac376dfbe3e3c97c306)
to inform GDB of the JIT'd program and its debugging information.
More information [can be found here](https://llvm.org/docs/DebuggingJITedCode.html).

LLDB should be able to support this same mechanism, but at the time of writing
this does not appear to work.