1Demonstrations of deadlock_detector. 2 3This program detects potential deadlocks on a running process. The program 4attaches uprobes on `pthread_mutex_lock` and `pthread_mutex_unlock` to build 5a mutex wait directed graph, and then looks for a cycle in this graph. This 6graph has the following properties: 7 8- Nodes in the graph represent mutexes. 9- Edge (A, B) exists if there exists some thread T where lock(A) was called 10 and lock(B) was called before unlock(A) was called. 11 12If there is a cycle in this graph, this indicates that there is a lock order 13inversion (potential deadlock). If the program finds a lock order inversion, the 14program will dump the cycle of mutexes, dump the stack traces where each mutex 15was acquired, and then exit. 16 17This program can only find potential deadlocks that occur while the program 18is tracing the process. It cannot find deadlocks that may have occurred 19before the program was attached to the process. 20 21Since this traces all mutex lock and unlock events and all thread creation 22events on the traced process, the overhead of this bpf program can be very 23high if the process has many threads and mutexes. You should only run this on 24a process where the slowdown is acceptable. 25 26Note: This tool does not work for shared mutexes or recursive mutexes. 27 28For shared (read-write) mutexes, a deadlock requires a cycle in the wait 29graph where at least one of the mutexes in the cycle is acquiring exclusive 30(write) ownership. 31 32For recursive mutexes, lock() is called multiple times on the same mutex. 33However, there is no way to determine if a mutex is a recursive mutex 34after the mutex has been created. As a result, this tool will not find 35potential deadlocks that involve only one mutex. 36 37 38# ./deadlock_detector.py 181 39Tracing... Hit Ctrl-C to end. 40---------------- 41Potential Deadlock Detected! 42 43Cycle in lock order graph: Mutex M0 (main::static_mutex3 0x0000000000473c60) => Mutex M1 (0x00007fff6d738400) => Mutex M2 (global_mutex1 0x0000000000473be0) => Mutex M3 (global_mutex2 0x0000000000473c20) => Mutex M0 (main::static_mutex3 0x0000000000473c60) 44 45Mutex M1 (0x00007fff6d738400) acquired here while holding Mutex M0 (main::static_mutex3 0x0000000000473c60) in Thread 357250 (lockinversion): 46@ 00000000004024d0 pthread_mutex_lock 47@ 0000000000406dd0 std::mutex::lock() 48@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 49@ 0000000000402e38 main::{lambda()#3}::operator()() const 50@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>) 51@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()() 52@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run() 53@ 00007fd4496564e1 execute_native_thread_routine 54@ 00007fd449dd57f1 start_thread 55@ 00007fd44909746d __clone 56 57Mutex M0 (main::static_mutex3 0x0000000000473c60) previously acquired by the same Thread 357250 (lockinversion) here: 58@ 00000000004024d0 pthread_mutex_lock 59@ 0000000000406dd0 std::mutex::lock() 60@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 61@ 0000000000402e22 main::{lambda()#3}::operator()() const 62@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>) 63@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()() 64@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run() 65@ 00007fd4496564e1 execute_native_thread_routine 66@ 00007fd449dd57f1 start_thread 67@ 00007fd44909746d __clone 68 69Mutex M2 (global_mutex1 0x0000000000473be0) acquired here while holding Mutex M1 (0x00007fff6d738400) in Thread 357251 (lockinversion): 70@ 00000000004024d0 pthread_mutex_lock 71@ 0000000000406dd0 std::mutex::lock() 72@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 73@ 0000000000402ea8 main::{lambda()#4}::operator()() const 74@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>) 75@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()() 76@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run() 77@ 00007fd4496564e1 execute_native_thread_routine 78@ 00007fd449dd57f1 start_thread 79@ 00007fd44909746d __clone 80 81Mutex M1 (0x00007fff6d738400) previously acquired by the same Thread 357251 (lockinversion) here: 82@ 00000000004024d0 pthread_mutex_lock 83@ 0000000000406dd0 std::mutex::lock() 84@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 85@ 0000000000402e97 main::{lambda()#4}::operator()() const 86@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>) 87@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()() 88@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run() 89@ 00007fd4496564e1 execute_native_thread_routine 90@ 00007fd449dd57f1 start_thread 91@ 00007fd44909746d __clone 92 93Mutex M3 (global_mutex2 0x0000000000473c20) acquired here while holding Mutex M2 (global_mutex1 0x0000000000473be0) in Thread 357247 (lockinversion): 94@ 00000000004024d0 pthread_mutex_lock 95@ 0000000000406dd0 std::mutex::lock() 96@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 97@ 0000000000402d5f main::{lambda()#1}::operator()() const 98@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) 99@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()() 100@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() 101@ 00007fd4496564e1 execute_native_thread_routine 102@ 00007fd449dd57f1 start_thread 103@ 00007fd44909746d __clone 104 105Mutex M2 (global_mutex1 0x0000000000473be0) previously acquired by the same Thread 357247 (lockinversion) here: 106@ 00000000004024d0 pthread_mutex_lock 107@ 0000000000406dd0 std::mutex::lock() 108@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 109@ 0000000000402d4e main::{lambda()#1}::operator()() const 110@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) 111@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()() 112@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() 113@ 00007fd4496564e1 execute_native_thread_routine 114@ 00007fd449dd57f1 start_thread 115@ 00007fd44909746d __clone 116 117Mutex M0 (main::static_mutex3 0x0000000000473c60) acquired here while holding Mutex M3 (global_mutex2 0x0000000000473c20) in Thread 357248 (lockinversion): 118@ 00000000004024d0 pthread_mutex_lock 119@ 0000000000406dd0 std::mutex::lock() 120@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 121@ 0000000000402dc9 main::{lambda()#2}::operator()() const 122@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>) 123@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()() 124@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run() 125@ 00007fd4496564e1 execute_native_thread_routine 126@ 00007fd449dd57f1 start_thread 127@ 00007fd44909746d __clone 128 129Mutex M3 (global_mutex2 0x0000000000473c20) previously acquired by the same Thread 357248 (lockinversion) here: 130@ 00000000004024d0 pthread_mutex_lock 131@ 0000000000406dd0 std::mutex::lock() 132@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&) 133@ 0000000000402db8 main::{lambda()#2}::operator()() const 134@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>) 135@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()() 136@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run() 137@ 00007fd4496564e1 execute_native_thread_routine 138@ 00007fd449dd57f1 start_thread 139@ 00007fd44909746d __clone 140 141Thread 357248 created by Thread 350692 (lockinversion) here: 142@ 00007fd449097431 __clone 143@ 00007fd449dd5ef5 pthread_create 144@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) 145@ 00000000004033ac std::thread::thread<main::{lambda()#2}>(main::{lambda()#2}&&) 146@ 000000000040308f main 147@ 00007fd448faa0f6 __libc_start_main 148@ 0000000000402ad8 [unknown] 149 150Thread 357250 created by Thread 350692 (lockinversion) here: 151@ 00007fd449097431 __clone 152@ 00007fd449dd5ef5 pthread_create 153@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) 154@ 00000000004034b2 std::thread::thread<main::{lambda()#3}>(main::{lambda()#3}&&) 155@ 00000000004030b9 main 156@ 00007fd448faa0f6 __libc_start_main 157@ 0000000000402ad8 [unknown] 158 159Thread 357251 created by Thread 350692 (lockinversion) here: 160@ 00007fd449097431 __clone 161@ 00007fd449dd5ef5 pthread_create 162@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) 163@ 00000000004035b8 std::thread::thread<main::{lambda()#4}>(main::{lambda()#4}&&) 164@ 00000000004030e6 main 165@ 00007fd448faa0f6 __libc_start_main 166@ 0000000000402ad8 [unknown] 167 168Thread 357247 created by Thread 350692 (lockinversion) here: 169@ 00007fd449097431 __clone 170@ 00007fd449dd5ef5 pthread_create 171@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) 172@ 00000000004032a6 std::thread::thread<main::{lambda()#1}>(main::{lambda()#1}&&) 173@ 0000000000403070 main 174@ 00007fd448faa0f6 __libc_start_main 175@ 0000000000402ad8 [unknown] 176 177This is output from a process that has a potential deadlock involving 4 mutexes 178and 4 threads: 179 180- Thread 357250 acquired M1 while holding M0 (edge M0 -> M1) 181- Thread 357251 acquired M2 while holding M1 (edge M1 -> M2) 182- Thread 357247 acquired M3 while holding M2 (edge M2 -> M3) 183- Thread 357248 acquired M0 while holding M3 (edge M3 -> M0) 184 185This is the C++ program that generated the output above: 186 187```c++ 188#include <chrono> 189#include <iostream> 190#include <mutex> 191#include <thread> 192 193std::mutex global_mutex1; 194std::mutex global_mutex2; 195 196int main(void) { 197 static std::mutex static_mutex3; 198 std::mutex local_mutex4; 199 200 std::cout << "sleeping for a bit to allow trace to attach..." << std::endl; 201 std::this_thread::sleep_for(std::chrono::seconds(10)); 202 std::cout << "starting program..." << std::endl; 203 204 auto t1 = std::thread([] { 205 std::lock_guard<std::mutex> g1(global_mutex1); 206 std::lock_guard<std::mutex> g2(global_mutex2); 207 }); 208 t1.join(); 209 210 auto t2 = std::thread([] { 211 std::lock_guard<std::mutex> g2(global_mutex2); 212 std::lock_guard<std::mutex> g3(static_mutex3); 213 }); 214 t2.join(); 215 216 auto t3 = std::thread([&local_mutex4] { 217 std::lock_guard<std::mutex> g3(static_mutex3); 218 std::lock_guard<std::mutex> g4(local_mutex4); 219 }); 220 t3.join(); 221 222 auto t4 = std::thread([&local_mutex4] { 223 std::lock_guard<std::mutex> g4(local_mutex4); 224 std::lock_guard<std::mutex> g1(global_mutex1); 225 }); 226 t4.join(); 227 228 std::cout << "sleeping to allow trace to collect data..." << std::endl; 229 std::this_thread::sleep_for(std::chrono::seconds(5)); 230 std::cout << "done!" << std::endl; 231} 232``` 233 234Note that an actual deadlock did not occur, although this mutex lock ordering 235creates the possibility of a deadlock, and this is a hint to the programmer to 236reconsider the lock ordering. If the mutexes are global or static and debug 237symbols are enabled, the output will contain the mutex symbol name. The output 238uses a similar format as ThreadSanitizer 239(https://github.com/google/sanitizers/wiki/ThreadSanitizerDeadlockDetector). 240 241 242# ./deadlock_detector.py 181 --binary /usr/local/bin/lockinversion 243 244Tracing... Hit Ctrl-C to end. 245^C 246 247If the traced process is instantiated from a statically-linked executable, 248this argument is optional, and the program will determine the path of the 249executable from the pid. However, on older kernels without this patch 250("uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET", 251https://lkml.org/lkml/2017/1/13/585), binaries that contain `:` in the path 252cannot be attached with uprobes. As a workaround, we can create a symlink 253to the binary, and provide the symlink name instead to the `--binary` option. 254 255 256# ./deadlock_detector.py 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0 257 258Tracing... Hit Ctrl-C to end. 259^C 260 261If the traced process is instantiated from a dynamically-linked executable, 262this argument is required and needs to be the path to the pthread shared 263library used by the executable. 264 265 266# ./deadlock_detector.py 181 --dump-graph graph.json --verbose 267 268Tracing... Hit Ctrl-C to end. 269Mutexes: 0, Edges: 0 270Mutexes: 532, Edges: 411 271Mutexes: 735, Edges: 675 272Mutexes: 1118, Edges: 1278 273Mutexes: 1666, Edges: 2185 274Mutexes: 2056, Edges: 2694 275Mutexes: 2245, Edges: 2906 276Mutexes: 2656, Edges: 3479 277Mutexes: 2813, Edges: 3785 278^C 279 280If the program does not find a deadlock, it will keep running until you hit 281Ctrl-C. If you pass the `--verbose` flag, the program will also dump statistics 282about the number of mutexes and edges in the mutex wait graph. If you want to 283serialize the graph to analyze it later, you can pass the `--dump-graph FILE` 284flag, and the program will serialize the graph in json. 285 286 287# ./deadlock_detector.py 181 --lock-symbols custom_mutex1_lock,custom_mutex2_lock --unlock_symbols custom_mutex1_unlock,custom_mutex2_unlock --verbose 288 289Tracing... Hit Ctrl-C to end. 290Mutexes: 0, Edges: 0 291Mutexes: 532, Edges: 411 292Mutexes: 735, Edges: 675 293Mutexes: 1118, Edges: 1278 294Mutexes: 1666, Edges: 2185 295Mutexes: 2056, Edges: 2694 296Mutexes: 2245, Edges: 2906 297Mutexes: 2656, Edges: 3479 298Mutexes: 2813, Edges: 3785 299^C 300 301If your program is using custom mutexes and not pthread mutexes, you can use 302the `--lock-symbols` and `--unlock-symbols` flags to specify different mutex 303symbols to trace. The flags take a comma-separated string of symbol names. 304Note that if the symbols are inlined in the binary, then this program can result 305in false positives. 306 307 308USAGE message: 309 310# ./deadlock_detector.py -h 311 312usage: deadlock_detector.py [-h] [--binary BINARY] [--dump-graph DUMP_GRAPH] 313 [--verbose] [--lock-symbols LOCK_SYMBOLS] 314 [--unlock-symbols UNLOCK_SYMBOLS] 315 pid 316 317Detect potential deadlocks (lock inversions) in a running binary. 318Must be run as root. 319 320positional arguments: 321 pid Pid to trace 322 323optional arguments: 324 -h, --help show this help message and exit 325 --binary BINARY If set, trace the mutexes from the binary at this 326 path. For statically-linked binaries, this argument is 327 not required. For dynamically-linked binaries, this 328 argument is required and should be the path of the 329 pthread library the binary is using. Example: 330 /lib/x86_64-linux-gnu/libpthread.so.0 331 --dump-graph DUMP_GRAPH 332 If set, this will dump the mutex graph to the 333 specified file. 334 --verbose Print statistics about the mutex wait graph. 335 --lock-symbols LOCK_SYMBOLS 336 Comma-separated list of lock symbols to trace. Default 337 is pthread_mutex_lock. These symbols cannot be inlined 338 in the binary. 339 --unlock-symbols UNLOCK_SYMBOLS 340 Comma-separated list of unlock symbols to trace. 341 Default is pthread_mutex_unlock. These symbols cannot 342 be inlined in the binary. 343 344Examples: 345 deadlock_detector 181 # Analyze PID 181 346 347 deadlock_detector 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0 348 # Analyze PID 181 and locks from this binary. 349 # If tracing a process that is running from 350 # a dynamically-linked binary, this argument 351 # is required and should be the path to the 352 # pthread library. 353 354 deadlock_detector 181 --verbose 355 # Analyze PID 181 and print statistics about 356 # the mutex wait graph. 357 358 deadlock_detector 181 --lock-symbols my_mutex_lock1,my_mutex_lock2 \ 359 --unlock-symbols my_mutex_unlock1,my_mutex_unlock2 360 # Analyze PID 181 and trace custom mutex 361 # symbols instead of pthread mutexes. 362 363 deadlock_detector 181 --dump-graph graph.json 364 # Analyze PID 181 and dump the mutex wait 365 # graph to graph.json. 366