This file contains various notes/ideas/history/... related
to gdbserver in valgrind.

How to use Valgrind gdbserver ?
-------------------------------
This is described in the Valgrind user manual.
Before reading the notes below, you should read the user manual first.

What is gdbserver ?
-------------------
The gdb debugger is typically used to debug a process running
on the same machine: gdb uses system calls (such as ptrace)
to fetch data from the process being debugged,
or to change data in the process,
or interrupt the process,
or ...

gdb can also debug processes running on a different computer
(e.g. it can debug a process running on a small real time
board).

gdb does this by sending commands (e.g. over tcp/ip) to a piece
of code running on the remote computer. This piece of code (called a
gdb stub on small boards, or gdbserver when the remote computer runs
an OS such as GNU/Linux) provides a set of commands allowing gdb
to remotely debug the process. Examples of commands are: "get the
registers", "get the list of running threads", "read xxx bytes at
address yyyyyyyy", etc. The definition of all these commands and the
associated replies is the gdb remote serial protocol, which is
documented in Appendix D of the gdb user manual.

The standard gdb distribution has a standalone gdbserver (a small
executable) which implements this protocol and the needed system calls
to allow gdb to remotely debug a process running on Linux or MacOS or
...

Activation of gdbserver code inside valgrind
--------------------------------------------
The gdbserver code (from gdb 6.6, GPL2+) has been modified so as to
link it with valgrind and allow the valgrind guest process to be
debugged by a gdb speaking to this gdbserver embedded in valgrind.
The ptrace system calls inside gdbserver have been replaced by reading
the state of the guest.
The gdbserver functionality is activated with valgrind command line
options. If gdbserver is not enabled, then the impact on valgrind's
runtime is minimal: basically, it just checks the command line options
at startup to see that there is nothing to do for what concerns the
gdb server: there is an "if gdbserver is active" check in the translate
function of translate.c and an "if" in the valgrind scheduler.
If the valgrind gdbserver is activated (--vgdb=yes), the impact
is minimal (from time to time, the valgrind scheduler checks a counter
in memory). The option --vgdb-poll=yyyyy controls how often the scheduler
will do a (somewhat) heavier check to see if gdbserver needs to
stop execution of the guest to allow debugging.
If the valgrind gdbserver is activated with --vgdb=full, then
each instruction is instrumented with an additional call to a dirty
helper.

How does the gdbserver code interact with valgrind ?
----------------------------------------------------
When an error is reported, the gdbserver code is called. It reads
commands from gdb using the read system call on a FIFO (e.g. a command
such as "get the registers"). It executes the command (e.g. fetches
the registers from the guest state) and writes the reply (e.g. a
packet containing the register data). When gdb instructs gdbserver to
"continue", control is returned to valgrind, which then continues to
execute guest code. The FIFOs used to communicate between
valgrind and gdb are created at startup if gdbserver is activated,
according to the --vgdb=no/yes/full command line option.

How are signals "handled" ?
---------------------------
When a signal is to be given to the guest, the valgrind core first calls
gdbserver (if a gdb is currently connected to valgrind; otherwise the
signal is delivered immediately). If gdb instructs to give the signal
to the process, the signal is delivered to the guest. Otherwise, the
signal is ignored (not given to the guest).
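For example, which signals stop gdb and which get delivered is controlled with gdb's standard 'handle' command (nothing valgrind-specific; the dispositions below are just a sample):

```
(gdb) handle SIGUSR1 nostop noprint nopass   # swallow SIGUSR1 silently
(gdb) handle SIGSEGV stop print pass         # stop first; deliver when continuing
```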
The user can,
with gdb, further decide to pass (or not pass) the signal.
Note that some (fatal) signals cannot be ignored.

How are "break/step/stepi/next/..." implemented ?
-------------------------------------------------
When a break is put by gdb on an instruction, a command is sent to the
gdbserver in valgrind. This causes the basic block of this instruction
to be discarded and then re-instrumented so as to insert calls to a
dirty helper which calls the gdbserver code. When a block is
instrumented for gdbserver, all the "jump targets" of this block are
invalidated, so as to allow step/stepi/next to work properly: these
blocks will themselves automatically be re-instrumented for gdbserver
if they are jumped to.
The valgrind gdbserver remembers which blocks have been instrumented
due to this "lazy 'jump targets' debugging instrumentation" so as to
discard these "debugging translations" when gdb instructs to continue
execution normally.
The blocks in which an explicit break has been put by the user
are kept instrumented for gdbserver.
(Note, however, that by default gdb removes all breaks when the
process is stopped, and re-inserts all breaks when the process
is continued. This behaviour can be changed using the gdb
command 'set breakpoint always-inserted'.)

How are watchpoints implemented ?
---------------------------------
Watchpoints imply support from the tool to detect that
a location is read and/or written. Currently, only memcheck
supports this: when a watchpoint is placed, memcheck changes
the addressability bits of the watched memory zone to make it
inaccessible. On an access, memcheck then detects an error, but sees
that this error is due to a watchpoint, and gives control back to gdb.
Stopping on the exact instruction for a write watchpoint requires
--vgdb=full. This is because the error is detected by memcheck
before the value is modified.
gdb checks that the value has not changed,
and so "does not believe" the information that the write watchpoint
was triggered, and continues the execution. At the next watchpoint
occurrence, gdb sees the value has changed, but the watchpoints are all
reported "off by one". To avoid this, the valgrind gdbserver must
terminate the current instruction before reporting the write watchpoint.
Terminating the current instruction precisely implies having
instrumented all the instructions of the block for gdbserver, even
if there is no break in this block. This is ensured by --vgdb=full.
See Bool VG_(is_watched) in m_gdbserver.c, where watchpoint handling
is implemented.

How is the Valgrind gdbserver receiving commands/packets from gdb ?
-------------------------------------------------------------------
The embedded gdbserver reads gdb commands on a named pipe having
(by default) the name /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST
where PID, USER, and HOST will be replaced by the actual pid, the user id,
and the host name, respectively.
The embedded gdbserver replies to gdb commands on a named pipe
/tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST

gdb does not speak directly with gdbserver in valgrind: a relay application
called vgdb is needed between gdb and the valgrind-ified process.
gdb writes commands on the stdin of vgdb. vgdb reads these
commands and writes them on the FIFO /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST.
vgdb reads replies on the FIFO /tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST
and writes them on its stdout.

Note: the solution of named pipes was preferred to tcp/ip connections, as
it allows discovery of which valgrind-ified processes are ready to accept
commands by looking at files starting with the /tmp/vgdb-pipe- prefix
(changeable by a command line option).
Also, the usual unix protections protect
the valgrind process against other users sending commands.
The relay process also takes care of waking up the valgrind
process in case all threads are blocked in a system call.
The relay process can also be used in a shell to send commands
without a gdb (this allows having a standard mechanism to control
valgrind tools from the command line, rather than specialized mechanisms
e.g. in callgrind).

How is gdbserver activated if all Valgrind threads are blocked in a syscall ?
-----------------------------------------------------------------------------
vgdb relays characters from gdb to valgrind. The scheduler will from
time to time check if gdbserver has to handle incoming characters.
(The check is efficient, i.e. most of the time it consists of checking
a counter in (shared) memory.)

However, it might be that all the threads in the valgrind process are
blocked in a system call. In such a case, no polling will be done by
the valgrind scheduler (as no activity takes place). By default, vgdb
will check after 100ms whether the characters it has written have been
read by valgrind. If not, vgdb will force the invocation of the gdbserver
code inside the valgrind process.

This forced invocation is implemented using the ptrace system call:
using ptrace, vgdb will cause the valgrind process to call the
gdbserver code.

This wake up is *not* done using signals, as this would imply
implementing syscall restart logic in valgrind for all system
calls. When using ptrace as above, the linux kernel is responsible for
restarting the system call.

This wakeup is also *not* implemented by having a "system thread"
started by valgrind, as this would transform all non-threaded programs
into threaded programs when running under valgrind.
Also, such a 'system
thread' for gdbserver was tried by Greg Parker in the early MacOS
port, and was unreliable.

So, the ptrace based solution was chosen instead.

There used to be some bugs in the kernel when using ptrace on
a process blocked in a system call: the symptom is that the system
call fails with an unknown errno 512. This typically happens
with a 64 bit vgdb ptrace-ing a 32 bit process.
A bypass for old kernels has been integrated in vgdb.c (sign extend
register rax).

At least on a fedora core 12 (kernel 2.6.32), syscall restart of read
and select are working ok, and on red-hat 5.3 (an old kernel), everything
works properly.

Need to investigate whether darwin and/or AIX can similarly do syscall
restart with ptrace.

The vgdb argument --max-invoke-ms=xxx controls the number of
milliseconds after which vgdb will force the invocation of gdbserver
code. If xxx is 0, this disables the forced invocation.
Disabling this ptrace mechanism is also necessary if you are
debugging the valgrind code at the same time as debugging the guest
process using gdbserver.

Do not kill -9 vgdb while it has interrupted the valgrind process,
otherwise the valgrind process will very probably stay stopped or die.


Implementation is based on the gdbserver code from gdb 6.6
----------------------------------------------------------
The gdbserver implementation is derived from the gdbserver included
in the gdb distribution.
The files originating from gdb are: inferiors.c, regcache.[ch],
regdef.h, remote-utils.c, server.[ch], signals.c, target.[ch], utils.c,
version.c.
The valgrind-low-* files are inspired from gdb files.

This code had to be changed to integrate properly within valgrind
(e.g. no libc usage). Some of these changes have been done by
using the preprocessor to replace calls with valgrind equivalents,
e.g. #define memcpy(...)
VG_(memcpy) (...).

Some "control flow" changes are due to the fact that gdbserver inside
valgrind must return control to valgrind when the 'debugged'
process has to run, while in a classical gdbserver usage, the
gdbserver process waits for a debugged process to stop on a break or
similar. This has implied having some variables to remember the
state of gdbserver before returning to valgrind (search for
resume_packet_needed in server.c) and "goto"-ing the place where
gdbserver expects a stopped process to return control to gdbserver.

How does a tool need to be changed to be "debuggable" ?
-------------------------------------------------------
There is no need to modify a tool to make it "debuggable" via
gdbserver: e.g. reports of errors, breaks, etc. will work "out of the
box". If interactive usage of tool client requests or similar is
desired for a tool, then simple code can be written for that via a
specific client request code, VG_USERREQ__GDB_MONITOR_COMMAND. The tool
function "handle_client_request" must then parse the string received
as argument and call the expected valgrind or tool code. See
e.g. massif's ms_handle_client_request as an example.


Automatic regression tests:
---------------------------
Automatic Valgrind gdbserver tests are in the directory
$(top_srcdir)/gdbserver_tests.
Read $(top_srcdir)/gdbserver_tests/README_DEVELOPPERS for more
info about testing.

How to integrate support for a new architecture xxx?
----------------------------------------------------
Let's imagine a new architecture hal9000 has to be supported.

Mandatory:
The main thing to do is to make a file valgrind-low-hal9000.c.
Start from an existing file (e.g. valgrind-low-x86.c).
The data structures 'struct reg regs'
and 'const char *expedite_regs' are built from files
in the gdb sources, e.g.
for a new arch hal9000:
   cd gdb/regformats
   ./regdat.sh reg-hal9000.dat hal9000

From the generated file hal9000, copy/paste into
valgrind-low-hal9000.c the two needed data structures and change their
names to 'regs' and 'expedite_regs'.

Then adapt the set of functions needed to initialize the structure
'static struct valgrind_target_ops low_target'.

Optional but heavily recommended:
To have a proper wake up of a valgrind process with all threads
blocked in a system call, some architecture specific code
has to be written in vgdb.c: search for the PTRACEINVOKER preprocessor
symbol to see what has to be completed.

For Linux based platforms, all the ptrace calls should be ok.
The only thing needed is the code to "push a dummy call" on the stack,
i.e. assign the relevant registers in the struct user_regs_struct, and push
values on the stack according to the ABI.

For other platforms (e.g. MacOS), more work is needed, as the ptrace calls
on MacOS are different and/or incomplete (and so, 'Mach' specific
things are needed, e.g. to attach to threads etc.).
A courageous Mac aficionado is welcome on this aspect.

Optional:
To let gdb see the valgrind shadow registers, xml description
files have to be provided, and valgrind-low-hal9000.c has
to give the top xml file.
Start from the xml files found in the gdb distribution directory
gdb/features. You need to duplicate and modify these files to provide
shadow1 and shadow2 register set descriptions.

Modify coregrind/Makefile.am:
   add valgrind-low-hal9000.c
   If you have target xml descriptions, also add them to pkglib_DATA.


A not yet handled comment given by Julian at FOSDEM
---------------------------------------------------
* the check for vgdb-poll in scheduler.c could/should be moved to another place:
  instead of having it in run_thread_for_a_while,
  the vgdb poll check could be in VG_(scheduler).
  (not clear to me why one is better than the other ???)

TODO and/or additional nice things to have
------------------------------------------
* many options can be changed on-line without problems.
  => it would be nice to have a v.option command that would evaluate
  its arguments like the startup options of m_main.c and tool clo
  processing.

* have a memcheck monitor command
     who_points_at <address> | <loss_record_nr>
  that would describe the addresses where a pointer to address (or to
  the address leaked at loss_record_nr) is found.
  This would allow interactively searching for who is "keeping" a piece
  of memory.

* some GDBTD in the code

(GDBTD = GDB To Do = something still to look at and/or a question)

* All architectures and platforms are done.
  But there are still some "GDBTD" conversions between gdb registers
  and VEX registers to handle:
  e.g. some registers in x86 or amd64 that I could not
  translate to VEX registers. Someone with a good knowledge
  of these architectures might complete this
  (see the GDBTD comments in valgrind-low-*.c).

* "hardware" watchpoints (read/write/access watchpoints) are implemented,
  but I can't persuade gdb to insert a hw watchpoint of what valgrind
  supports (i.e. of whatever length).
  The reason why gdb does not accept a hardware watch of, let's say,
  10 bytes is:
default_region_ok_for_hw_watchpoint (addr=134520360, len=10) at target.c:2738
2738	  return (len <= gdbarch_ptr_bit (target_gdbarch) / TARGET_CHAR_BIT);
#0  default_region_ok_for_hw_watchpoint (addr=134520360, len=10)
    at target.c:2738
2738	  return (len <= gdbarch_ptr_bit (target_gdbarch) / TARGET_CHAR_BIT);
#1  0x08132e65 in can_use_hardware_watchpoint (v=0x85a8ef0)
    at breakpoint.c:8300
8300	      if (!target_region_ok_for_hw_watchpoint (vaddr, len))
#2  0x0813bd17 in watch_command_1 (arg=0x84169f0 "", accessflag=2,
    from_tty=<value optimized out>) at breakpoint.c:8140
  A small patch in gdb remote.c allowed controlling the remote target
  watchpoint length limit. This patch is to be submitted.

* Currently, at least on recent linux kernels, vgdb can properly wake
  up a valgrind process which is blocked in system calls. Maybe we
  need to see up to which kernel version the ptrace + syscall restart
  is broken, and set the default value of --max-invoke-ms to 0 in that
  case.

* more client requests can be programmed in various tools. Currently,
  there are only a few standard valgrind or memcheck client requests
  implemented.
  v.suppression [generate|add|delete] might be an interesting command:
  generate would output a suppression, add/delete would add a suppression
  in memory for the last (or selected?) error.
  v.break on fn calls/entry/exit + commands associated to it
  (such as search leaks)?


* currently, jump(s) and inferior call(s) are somewhat dangerous
  when called from a block not yet instrumented: instead
  of continuing till the next Imark, where there will be a
  debugger call that can properly jump at an instruction boundary,
  the jump/call will quit the "middle" of an instruction.
  We could detect if the current block is instrumented by a trick
  like this:
     /* Each time helperc_CallDebugger is called, we will store
        the address from which it is called and the nr of bbs_done
        when called. This allows detecting that gdbserver is called
        from a block which is instrumented. */
     static HWord CallDebugger_addr;
     static ULong CallDebugger_bbs_done;

     Bool VG_(gdbserver_current_IP_instrumented) (ThreadId tid)
     {
        if (VG_(get_IP) (tid) != CallDebugger_addr
            || CallDebugger_bbs_done != VG_(bbs_done)())
           return False;
        return True;
     }

  Alternatively, we could ensure that we can re-instrument the current
  block for gdbserver while executing it.
  Something like:
  keep the current block till the end of the current instruction, then
  go back to the scheduler.
  Unsure if and how this is do-able.


* ensure that all non-static symbols of the gdbserver files are
  #define-d xxxxx VG_(xxxxx) ???? Is this really needed ? I have tried
  to put in a test program variables and functions with the same names
  as valgrind stuff, and everything seems to be ok.
  I see that all exported symbols in valgrind have a unique prefix
  created with VG_ or MC_ or ...
  This is not done for the "gdb gdbserver code", where I have kept
  the original names. Is this a problem ? I could not create
  a "symbol" collision between a user symbol and the valgrind
  core gdbserver symbols.

* currently, gdbserver can only stop/continue the whole process. It
  might be interesting to have fine-grained thread control (vCont
  packet), maybe for tools such as helgrind or drd. This would allow the
  user to stop/resume specific threads. Also, maybe this would solve
  the following problem: wait for a breakpoint to be encountered,
  switch thread, next. This sometimes causes an internal error in gdb,
  probably because gdb believes the current thread will be continued ?
* would be nice to have some more tests.

* better valgrind target support in gdb (see comments of Tom Tromey).


-------- description of how gdb invokes a function in the inferior
To call a function in the inferior (below is for x86):
gdb writes ESP and EBP to have some more stack space,
pushes a return address equal to 0x8048390 <_start>,
puts a break at 0x8048390,
puts the address of the function to call (e.g. hello_world, 0x8048444) in EIP,
and continues.
break encountered at 0x8048391 (90 after decrement)
 => report stop to gdb
 => gdb restores esp/ebp/eip to what they were (e.g. 0x804848C)
 => gdb "s" => causes the EIP to go to the new EIP (i.e. 0x804848C)
    gdbserver tells "resuming from 0x804848c"
    "stop pc is 0x8048491" => informs gdb of this