Symbolication
=============

.. contents::
   :local:


LLDB is separated into a shared library that contains the core of the debugger,
and a driver that implements debugging and a command interpreter. LLDB can be
used to symbolicate your crash logs and can often provide more information than
other symbolication programs:

- Inlined functions
- Variables that are in scope for an address, along with their locations

The simplest form of symbolication is to load an executable:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out

We use the ``--no-dependents`` flag with the ``target create`` command so that
we don't load all of the dependent shared libraries from the current system.
When we symbolicate, we are often symbolicating a binary that was running on
another system, and even though the main executable might reference shared
libraries in ``/usr/lib``, we often don't want to load the versions on the
current computer.

Using the ``image list`` command will show us a list of all shared libraries
associated with the current target. As expected, we currently only have a
single binary:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out

Now we can look up an address:

.. code-block:: text

   (lldb) image lookup --address 0x100000aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Since we haven't specified a slide or any load addresses for individual
sections in the binary, the address that we use here is a file address. A file
address refers to a virtual address as defined by each object file.

If we didn't use the ``--no-dependents`` option with ``target create``, we
would have loaded all dependent shared libraries:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
   [  1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib
   [  2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib
   [  3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib
   ...

Now if we do a lookup using a file address, this can result in multiple matches
since most shared libraries have a virtual address space that starts at zero:

.. code-block:: text

   (lldb) image lookup -a 0x1000
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

         Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928)
         Summary: libsystem_c.dylib`mcount + 9

         Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456)
         Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38

         Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116)
         Summary: libsystem_kernel.dylib`clock_get_time + 102
   ...

To avoid getting multiple file address matches, you can specify the name of the
shared library to limit the search:

.. code-block:: text

   (lldb) image lookup -a 0x1000 a.out
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

Defining Load Addresses for Sections
------------------------------------

When symbolicating your crash logs, it can be tedious if you always have to
convert your crash log addresses into file addresses. To avoid having to do any
conversion, you can set the load address for the sections of the modules in
your target. Once you set any section load address, lookups will switch to
using load addresses.
You can slide all sections in the executable by the same
amount, or set the load address for individual sections. The ``target modules
load --slide`` command allows us to set the load address for all sections.

Below is an example of sliding all sections in a.out by adding 0x123000 to each
section's file address:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out --slide 0x123000


It is often much easier to specify the actual load location of each section by
name. Crash logs on macOS have a Binary Images section that specifies the
address of the __TEXT segment for each binary. Specifying a slide
requires that you first find the original (file) address for the __TEXT
segment, and subtract the two values. If you specify the address of the __TEXT
segment with ``target modules load section address``, you don't need to do any
calculations. To specify the load addresses of sections we can specify one or
more section name + address pairs in the ``target modules load`` command:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out __TEXT 0x100123000

We specified that the __TEXT section is loaded at 0x100123000. Now that we have
defined where sections have been loaded in our target, any lookups we do will
use load addresses, so we don't have to do any math on the addresses in the
crash log backtraces; we can just use the raw addresses:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Loading Multiple Executables
----------------------------

You often have more than one executable involved when you need to symbolicate a
crash log.
When this happens, you create a target for the main executable or
one of the shared libraries, then add more modules to the target using the
``target modules add`` command.

Let's say we have a Darwin crash log that contains the following images:

.. code-block:: text

   Binary Images:
      0x100000000 -        0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
   0x7fff83f32000 -     0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
   0x7fff883db000 -     0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
   0x7fff8c0dc000 -     0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib

First we create the target using the main executable and then add any extra
shared libraries we want:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib


If you have debug symbols in standalone files, such as dSYM files on macOS,
you can specify their paths using the ``--symfile`` option for the
``target create`` (recent LLDB releases only) and ``target modules add``
commands:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out --symfile /tmp/a.out.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib --symfile /build/server/a/libsystem_c.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib --symfile /build/server/b/libsystem_dnssd.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib --symfile /build/server/c/libsystem_kernel.dylib.dSYM

Then we set the load address for each __TEXT section, using the first address
from the Binary Images section for each image:

.. code-block:: text

   (lldb) target modules load --file a.out __TEXT 0x100000000
   (lldb) target modules load --file libsystem_c.dylib __TEXT 0x7fff83f32000
   (lldb) target modules load --file libsystem_dnssd.dylib __TEXT 0x7fff883db000
   (lldb) target modules load --file libsystem_kernel.dylib __TEXT 0x7fff8c0dc000


Now any stack backtraces that haven't been symbolicated can be symbolicated
using ``image lookup`` with the raw backtrace addresses.

Given the following raw backtrace:

.. code-block:: text

   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib          0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib               0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib               0x00007fff84598e2a __assert_rtn + 146
   3   a.out                           0x0000000100000f46 main + 70
   4   libdyld.dylib                   0x00007fff8c4197e1 start + 1

We can now symbolicate the load addresses:

.. code-block:: text

   (lldb) image lookup -a 0x00007fff8a1e6d46
   (lldb) image lookup -a 0x00007fff84597df0
   (lldb) image lookup -a 0x00007fff84598e2a
   (lldb) image lookup -a 0x0000000100000f46


Getting Variable Information
----------------------------

If you add the ``--verbose`` flag to the ``image lookup --address`` command,
you can get verbose information which can often include the locations of some
of your local variables:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3 --verbose
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110)
         Summary: a.out`main + 50 at main.c:13
          Module: file = "/tmp/a.out", arch = "x86_64"
     CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999"
        Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9)
        FuncType: id = {0x0000004f}, decl = main.c:9, compiler_type = "int (int, const char **, const char **, const char **)"
          Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9)
                  id = {0x000000ae}, range = [0x100000bf2-0x100000dc4)
       LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23
          Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main"
        Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
        Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
        Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
        Variable: id = {0x00000090}, name = "envp", type= "const char **", location = r15, decl = main.c:8
        Variable: id = {0x0000009f}, name = "aapl", type= "const char **", location = rbx, decl = main.c:8


The interesting part is the variables that are listed. The variables are the
parameters and local variables that are in scope for the address that was
specified.
These variable entries each include a ``location`` field, such as a register
name or a frame-base offset (for example ``DW_OP_fbreg(-1072)`` above).
Crash logs often have register information for the first frame in each stack,
and being able to reconstruct one or more local variables can often help you
decipher more information from a crash log than you normally would be able to.
Note that this is really only useful for the first frame, and only if your
crash logs have register information for your threads.

Using Python API to Symbolicate
-------------------------------

All of the commands above can be done through the Python script bridge. The
code below will recreate the target and add the three shared libraries that we
added in the Darwin crash log example above:

.. code-block:: python

   import lldb

   # lldb.debugger is only valid inside LLDB's embedded script interpreter
   triple = "x86_64-apple-macosx"
   platform_name = None
   add_dependents = False
   target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
   if target:
       # Get the executable module
       module = target.GetModuleAtIndex(0)
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
       module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
       module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
       module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)

       load_addr = 0x00007fff8a1e6d46
       # so_addr is a section offset address, or a lldb.SBAddress object
       so_addr = target.ResolveLoadAddress(load_addr)
       # Get a symbol context for the section offset address which includes
       # a module, compile unit, function, block, line entry, and symbol
       sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
       print(sym_ctx)


Use Builtin Python Module to Symbolicate
----------------------------------------

LLDB includes a module in the lldb package named lldb.utils.symbolication.
This module simplifies the symbolication process by providing classes that
represent the objects involved in symbolication, such as:

- lldb.utils.symbolication.Address
- lldb.utils.symbolication.Section
- lldb.utils.symbolication.Image
- lldb.utils.symbolication.Symbolicator


**lldb.utils.symbolication.Address**

This class represents an address that will be symbolicated. It will cache any
information that has been looked up: module, compile unit, function, block,
line entry, symbol. It does this by having a lldb.SBSymbolContext as a member
variable.

**lldb.utils.symbolication.Section**

This class represents a section that might get loaded in a
lldb.utils.symbolication.Image. It has helper functions that allow you to set
it from text that might have been extracted from a crash log file.

**lldb.utils.symbolication.Image**

This class represents a module that might get loaded into the target we use for
symbolication. This class contains the executable path, optional symbol file
path, the triple, and the list of sections that will need to be loaded if we
choose to ask the target to load this image. Many of these objects will never
be loaded into the target unless they are needed by symbolication. You often
have a crash log that has 100 to 200 different shared libraries loaded, but
your crash log stack backtraces only use a few of these shared libraries. Only
the images that contain stack backtrace addresses need to be loaded in the
target in order to symbolicate.
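
The idea of loading only the images that contain backtrace addresses can be
illustrated without LLDB at all. The following is a minimal sketch in plain
Python, not the implementation used by lldb.utils.symbolication: the names
``ImageRange``, ``parse_binary_images`` and ``images_needed`` are hypothetical,
the regular expression only handles the simplified Binary Images layout shown
earlier in this document, and the second frame address is made up for
illustration.

```python
import re
from typing import List, NamedTuple


class ImageRange(NamedTuple):
    """Hypothetical record for one line of a Binary Images listing."""
    start: int
    end: int
    uuid: str
    path: str


# Assumption: lines look like "<start> - <end> <UUID> path", as in the
# simplified listing in this document. Real crash logs carry more fields.
_IMAGE_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(\S.*)$"
)


def parse_binary_images(text: str) -> List[ImageRange]:
    """Parse every line that matches the simplified image format."""
    images = []
    for line in text.splitlines():
        match = _IMAGE_RE.match(line)
        if match:
            start, end, uuid, path = match.groups()
            images.append(ImageRange(int(start, 16), int(end, 16), uuid, path.strip()))
    return images


def images_needed(images: List[ImageRange], addresses: List[int]) -> List[str]:
    """Return the paths of images whose range contains at least one address."""
    return [
        image.path
        for image in images
        if any(image.start <= addr <= image.end for addr in addresses)
    ]


BINARY_IMAGES = """\
Binary Images:
   0x100000000 -        0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
0x7fff83f32000 -     0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
0x7fff883db000 -     0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
0x7fff8c0dc000 -     0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib
"""

# 0x100000f46 is the a.out frame from the backtrace in this document;
# 0x7fff83f32abc is a hypothetical address chosen to fall inside libsystem_c.
FRAME_ADDRESSES = [0x100000F46, 0x7FFF83F32ABC]

needed = images_needed(parse_binary_images(BINARY_IMAGES), FRAME_ADDRESSES)
print(needed)
```

With these inputs only two of the four images are selected, so only those two
would have to be added to the target.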

Subclasses of ``lldb.utils.symbolication.Image`` will want to override the
``locate_module_and_debug_symbols`` method:

.. code-block:: python

   class CustomImage(lldb.utils.symbolication.Image):
       def locate_module_and_debug_symbols(self):
           # Locate the module and symbol given the info found in the crash log
           pass

Overriding this function allows clients to find the correct executable module
and symbol files as they might reside on a build server.

**lldb.utils.symbolication.Symbolicator**

This class coordinates the symbolication process by loading only the
lldb.utils.symbolication.Image instances that need to be loaded in order to
symbolicate a supplied address.

**lldb.macosx.crashlog**

lldb.macosx.crashlog is a package that is distributed on macOS builds that
subclasses the above classes. This module parses the information in the Darwin
crash logs and creates symbolication objects that represent the images, the
sections and the thread frames for the backtraces. It then uses the functions
in lldb.utils.symbolication to symbolicate the crash logs.

This module installs a new ``crashlog`` command into the lldb command
interpreter so that you can use it to parse and symbolicate macOS crash
logs:

.. code-block:: text

   (lldb) command script import lldb.macosx.crashlog
   "crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
   (lldb) crashlog /tmp/crash.log
   ...

The command that is installed has built-in help that shows the options that can
be used when symbolicating:

.. code-block:: text

   (lldb) crashlog --help
   Usage: crashlog [options] [FILE ...]

Symbolicate one or more Darwin crash log files to provide source file and line
information, inlined stack frames back to the concrete functions, and
disassemble the location of the crash for the first frame of the crashed
thread.
If this script is imported into the LLDB command interpreter, a
``crashlog`` command will be added to the interpreter for use at the LLDB
command line. After a crash log has been parsed and symbolicated, a target will
have been created that has all of the shared libraries loaded at the load
addresses found in the crash log file. This allows you to explore the program
as if it were stopped at the locations described in the crash log: functions
can be disassembled and lookups can be performed using the addresses found in
the crash log.

.. code-block:: text

   Options:
     -h, --help            show this help message and exit
     -v, --verbose         display verbose debug info
     -g, --debug           display verbose debug logging
     -a, --load-all        load all executable images, not just the images found
                           in the crashed stack frames
     --images              show image list
     --debug-delay=NSEC    pause for NSEC seconds for debugger
     -c, --crashed-only    only symbolicate the crashed thread
     -d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH
                           set the depth in stack frames that should be
                           disassembled (default is 1)
     -D, --disasm-all      enabled disassembly of frames on all threads (not just
                           the crashed thread)
     -B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE
                           the number of instructions to disassemble before the
                           frame PC
     -A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER
                           the number of instructions to disassemble after the
                           frame PC
     -C NLINES, --source-context=NLINES
                           show NLINES source lines of source context (default =
                           4)
     --source-frames=NFRAMES
                           show source for NFRAMES (default = 4)
     --source-all          show source for all threads, not just the crashed
                           thread
     -i, --interactive     parse all crash logs and enter interactive mode


The source for the "symbolication" and "crashlog" modules is available in the
LLVM source repository.
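
As a worked check of the slide arithmetic described in "Defining Load
Addresses for Sections", the sketch below reproduces the numbers used in that
section in plain Python. The helper names ``slide_for`` and ``load_to_file``
are hypothetical and exist only for this illustration; LLDB performs the
equivalent conversion internally once section load addresses are set.

```python
def slide_for(load_text_addr: int, file_text_addr: int) -> int:
    """Slide = where __TEXT was loaded minus where the file says it lives."""
    return load_text_addr - file_text_addr


def load_to_file(load_addr: int, slide: int) -> int:
    """Convert a crash log (load) address back to a file address."""
    return load_addr - slide


# Values taken from the examples in this document:
FILE_TEXT = 0x100000000   # file address of a.out from "image list"
LOAD_TEXT = 0x100123000   # load address passed to "target modules load"
CRASH_ADDR = 0x100123AA3  # raw address looked up with "image lookup"

slide = slide_for(LOAD_TEXT, FILE_TEXT)
file_addr = load_to_file(CRASH_ADDR, slide)
print(hex(slide), hex(file_addr))
```

The computed slide matches the ``--slide 0x123000`` example, and the resulting
file address matches the ``a.out[0x0000000100000aa3]`` shown in the lookup
output.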