# Reactor Debug Info Generation

## Introduction

Reactor produces just-in-time (JIT) compiled executable code. It can be used to
JIT high-performance functions specialized for runtime configurations, or even
to build a compiler.

In order to debug executable code at a higher level than disassembly, source code files are required.

Reactor has two potential sources of source code:

1. The C++ source code of the program that calls into Reactor.
2. External source files read by the program and passed to Reactor.

While case (2) is preferable for implementing a compiler, this is currently not
implemented.

Reactor implements case (1), and this can be used by GDB to step line by line
and inspect variables.

## Supported Platforms

Currently:

* Debug info generation is only supported on Linux with the LLVM 7
backend.
* GDB is the only supported debugger.
* The program must itself be compiled with debug info.

## Enabling

Debug info generation is enabled with the `REACTOR_EMIT_DEBUG_INFO` CMake flag
(disabled by default).

## Implementation details

### Source Location

All Reactor functions begin with a call to `RR_DEBUG_INFO_UPDATE_LOC()`, which calls into `rr::DebugInfo::EmitLocation()`.

`rr::DebugInfo::EmitLocation()` calls `rr::DebugInfo::getCallerBacktrace()`,
which in turn uses [`libbacktrace`](https://github.com/ianlancetaylor/libbacktrace)
to unwind the stack and find the file, function and line of the caller.

This information is passed to `llvm::IRBuilder<>::SetCurrentDebugLocation`
to emit source line information for the next LLVM instructions to be built.

### Variables

There are 3 aspects to generating variable debug information:

#### 1. Variable names

Constructing a Reactor `LValue`:

```C++
rr::Int a = 1;
```

will emit an LLVM `alloca` instruction to allocate the storage of the variable,
and another instruction to initialize it to the constant `1`.
While this syntax is fluent, none of the
Reactor calls see the name of the C++ local variable "`a`", and the LLVM `alloca`
value gets a meaningless numerical name.

There are two potential ways that Reactor can obtain the variable name:

1. Use the running executable's own debug information to examine the local
   declaration and extract the local variable's name.
2. Use the backtrace information to parse the name from the source file.

While (1) is arguably a cleaner and more robust solution, (2) is
easier to implement and can work for the majority of use cases.

(2) is the solution currently implemented.

`rr::DebugInfo::getOrParseFileTokens()` scans a source file line by line, and
uses a regular expression to look for patterns of `<type> <name>`. Matching is not
precise, but is adequate to find locals constructed with and without assignment.

#### 2. Variable binding

Given that we can find a variable name for a given source line, we need a way of
binding the LLVM values to the name.

Given our trivial example:

```C++
rr::Int a = 1;
```

The `rr::Int` constructor calls `RR_DEBUG_INFO_EMIT_VAR()`, passing the storage
value as a single argument. `RR_DEBUG_INFO_EMIT_VAR()` performs the backtrace
to find the source file and line, and uses the token information produced by
`rr::DebugInfo::getOrParseFileTokens()` to identify the variable name.

However, things get a bit more complicated when there are multiple variables
being constructed on the same line.

Take for example:

```C++
rr::Int a = rr::Int(1) + rr::Int(2);
```

Here we have 3 calls to the `rr::Int` constructor, each calling down
to `RR_DEBUG_INFO_EMIT_VAR()`.

To disambiguate which of these should be bound to the variable name "`a`",
`rr::DebugInfo::EmitVariable()` buffers the binding into
`scope.pending`, and the last binding for a given line is used by
`DebugInfo::emitPending()`.
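
The scan-and-bind approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Reactor's actual code: `parseVariableNames`, `PendingBindings` and the regular expression are hypothetical stand-ins for `rr::DebugInfo::getOrParseFileTokens()`, `scope.pending` and `DebugInfo::emitPending()`, and plain `int`s stand in for LLVM values.

```C++
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical sketch: map each 1-based source line to the name found by a
// `<type> <name>` pattern. As in the text above, matching is imprecise
// (for example, `return x;` would also match).
std::map<int, std::string> parseVariableNames(const std::string &source)
{
    // Assumed pattern: a (possibly namespace-qualified) type, whitespace, an
    // identifier, then one of '=', ';', '(' or '{'.
    static const std::regex decl(R"(^\s*[A-Za-z_][\w:]*\s+([A-Za-z_]\w*)\s*[=;({])");

    std::map<int, std::string> names;
    std::istringstream stream(source);
    std::string text;
    for (int line = 1; std::getline(stream, text); line++)
    {
        std::smatch match;
        if (std::regex_search(text, match, decl))
        {
            names[line] = match[1].str();
        }
    }
    return names;
}

// Hypothetical sketch of "the last binding for a given line wins".
struct PendingBindings
{
    // Each call for the same line overwrites the previously buffered value.
    void emitVariable(int line, int value) { pending[line] = value; }

    // Resolve the buffered bindings against the parsed variable names.
    std::map<std::string, int> emitPending(const std::map<int, std::string> &names) const
    {
        std::map<std::string, int> bound;
        for (const auto &entry : pending)
        {
            auto it = names.find(entry.first);
            if (it != names.end())
            {
                bound[it->second] = entry.second;
            }
        }
        return bound;
    }

    std::map<int, int> pending;  // source line -> most recent value
};
```

For the line `rr::Int a = rr::Int(1) + rr::Int(2);`, three calls to `emitVariable` with the same line number leave only the last value bound to `a`.
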
For variable construction and assignment, C++
guarantees that the variable on the left-hand side (LHS) is the last value to be constructed.

This solution is not perfect.

Multi-line expressions, multiple assignments on a single line, and macro
obfuscation can all break variable bindings. However, the majority of typical
cases work.

#### 3. Variable scope

`rr::DebugInfo` maintains a stack of `llvm::DIScope`s and `llvm::DILocation`s
that mirrors the current backtrace for the function being called.

A synthetic call stack is produced by chaining `llvm::DILocation`s with
`InlinedAt`s.

For example, at the declaration of `i`:

```C++
void B()
{
    rr::Int i; // <- here
}

void A()
{
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

The `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A"
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location:   ↳ DILocation(DISubprogram: "A")
rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B")
```

Where '↳' represents an `InlinedAt`.

`rr::DebugInfo::diScope` is updated by `rr::DebugInfo::syncScope()`.

`llvm::DIScope`s typically do not nest: there is usually a separate
`llvm::DISubprogram` for each function in the call stack. All local variables
within a function will typically share the same scope, regardless of whether
they are declared within a sub-block.

Loops and jumps within a function add complexity.
Consider:

```C++
void B()
{
    rr::Int i = 0;
}

void A()
{
    for (int i = 0; i < 3; i++)
    {
        rr::Int x = 0;
    }
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

In this particular example Reactor will not be aware of the `for` loop, and will
attempt to create three variables called "`x`" in the same function scope for `A()`.
Duplicate symbols in the same `llvm::DIScope` result in undefined behavior.

To solve this, `rr::DebugInfo::syncScope()` observes when a function jumps
backwards, and forks the current `llvm::DILexicalBlock` for the function. This
results in a number of `llvm::DILexicalBlock` chains, each declaring variables
that shadow the previous block.

At the declaration of `i` in `B()`, the `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
                              ↳ DISubprogram: "A"
                              | ↳ DILexicalBlock: "A".1
rr::DebugInfo::diScope[1].di: |   ↳ DILexicalBlock: "A".2
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location:   ↳ DILocation(DILexicalBlock: "A".2)
rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B")
```

### Debugger integration

Once the debug information has been generated, it needs to be handed to the
debugger.

Reactor uses [`llvm::JITEventListener::createGDBRegistrationListener()`](http://llvm.org/doxygen/classllvm_1_1JITEventListener.html#a004abbb5a0d48ac376dfbe3e3c97c306)
to inform GDB of the JIT'd program and its debugging information.
More information can be found in the [LLVM documentation on debugging JIT-ed code](https://llvm.org/docs/DebuggingJITedCode.html).
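
For context, the registration listener builds on GDB's JIT compilation interface, which is documented in the GDB manual. The sketch below shows the shape of that interface in a standalone program. It is illustrative only: `registerObject` is a hypothetical helper, not an LLVM or Reactor function, and a program that links LLVM should not define these symbols itself, because LLVM already provides them.

```C++
#include <cstdint>

extern "C" {

// Values for jit_descriptor::action_flag, as listed in the GDB manual.
enum jit_actions_t : uint32_t
{
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
};

// One node per in-memory object file that carries debug information.
struct jit_code_entry
{
    jit_code_entry *next_entry;
    jit_code_entry *prev_entry;
    const char *symfile_addr;
    uint64_t symfile_size;
};

struct jit_descriptor
{
    uint32_t version;
    uint32_t action_flag;
    jit_code_entry *relevant_entry;
    jit_code_entry *first_entry;
};

// GDB places a breakpoint in this function. The empty asm statement keeps
// calls to it from being optimized away.
__attribute__((noinline)) void __jit_debug_register_code()
{
    asm volatile("" ::: "memory");
}

// The version is set statically so the debugger can read it at any time.
jit_descriptor __jit_debug_descriptor = { 1, 0, nullptr, nullptr };

}  // extern "C"

// Hypothetical helper: link `entry` at the head of the list and notify GDB.
void registerObject(jit_code_entry *entry, const char *object, uint64_t size)
{
    entry->symfile_addr = object;
    entry->symfile_size = size;
    entry->prev_entry = nullptr;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry != nullptr)
    {
        entry->next_entry->prev_entry = entry;
    }
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();
}
```

When the breakpoint in `__jit_debug_register_code()` is hit, GDB reads `__jit_debug_descriptor` and loads symbols from the object file referenced by `relevant_entry`.
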

LLDB should be able to support this same mechanism, but at the time of writing
this does not appear to work.