• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6
7<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
8  <title>Helgrind: a thread error detector</title>
9
10<para>To use this tool, you must specify
11<option>--tool=helgrind</option> on the Valgrind
12command line.</para>
13
14
15<sect1 id="hg-manual.overview" xreflabel="Overview">
16<title>Overview</title>
17
18<para>Helgrind is a Valgrind tool for detecting synchronisation errors
19in C, C++ and Fortran programs that use the POSIX pthreads
20threading primitives.</para>
21
22<para>The main abstractions in POSIX pthreads are: a set of threads
23sharing a common address space, thread creation, thread joining,
24thread exit, mutexes (locks), condition variables (inter-thread event
25notifications), reader-writer locks, spinlocks, semaphores and
26barriers.</para>
27
28<para>Helgrind can detect three classes of errors, which are discussed
29in detail in the next three sections:</para>
30
31<orderedlist>
32 <listitem>
33  <para><link linkend="hg-manual.api-checks">
34        Misuses of the POSIX pthreads API.</link></para>
35 </listitem>
36 <listitem>
37  <para><link linkend="hg-manual.lock-orders">
38        Potential deadlocks arising from lock
39        ordering problems.</link></para>
40 </listitem>
41 <listitem>
42  <para><link linkend="hg-manual.data-races">
43        Data races -- accessing memory without adequate locking
44                      or synchronisation</link>.
45  </para>
46 </listitem>
47</orderedlist>
48
49<para>Problems like these often result in unreproducible,
50timing-dependent crashes, deadlocks and other misbehaviour, and
51can be difficult to find by other means.</para>
52
53<para>Helgrind is aware of all the pthread abstractions and tracks
54their effects as accurately as it can.  On x86 and amd64 platforms, it
55understands and partially handles implicit locking arising from the
56use of the LOCK instruction prefix.
57</para>
58
59<para>Helgrind works best when your application uses only the POSIX
60pthreads API.  However, if you want to use custom threading
61primitives, you can describe their behaviour to Helgrind using the
62<varname>ANNOTATE_*</varname> macros defined
63in <varname>helgrind.h</varname>.  This functionality was added in
64release 3.5.0 of Valgrind, and is considered experimental.</para>
65
66
67
68<para>Following those is a section containing
69<link linkend="hg-manual.effective-use">
70hints and tips on how to get the best out of Helgrind.</link>
71</para>
72
73<para>Then there is a
74<link linkend="hg-manual.options">summary of command-line
75options.</link>
76</para>
77
78<para>Finally, there is
79<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
80could be improved.</link>
81</para>
82
83</sect1>
84
85
86
87
88<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
89<title>Detected errors: Misuses of the POSIX pthreads API</title>
90
91<para>Helgrind intercepts calls to many POSIX pthreads functions, and
92is therefore able to report on various common problems.  Although
93these are unglamourous errors, their presence can lead to undefined
94program behaviour and hard-to-find bugs later on.  The detected errors
95are:</para>
96
97<itemizedlist>
98 <listitem><para>unlocking an invalid mutex</para></listitem>
99 <listitem><para>unlocking a not-locked mutex</para></listitem>
100 <listitem><para>unlocking a mutex held by a different
101                 thread</para></listitem>
102 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
103 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
104 <listitem><para>deallocation of memory that contains a
105                 locked mutex</para></listitem>
106 <listitem><para>passing mutex arguments to functions expecting
107                 reader-writer lock arguments, and vice
108                 versa</para></listitem>
109 <listitem><para>when a POSIX pthread function fails with an
110                 error code that must be handled</para></listitem>
111 <listitem><para>when a thread exits whilst still holding locked
112                 locks</para></listitem>
113 <listitem><para>calling <function>pthread_cond_wait</function>
114                 with a not-locked mutex, an invalid mutex,
115                 or one locked by a different
116                 thread</para></listitem>
117 <listitem><para>inconsistent bindings between condition
118                 variables and their associated mutexes</para></listitem>
119 <listitem><para>invalid or duplicate initialisation of a pthread
120                 barrier</para></listitem>
121 <listitem><para>initialisation of a pthread barrier on which threads
122                 are still waiting</para></listitem>
123 <listitem><para>destruction of a pthread barrier object which was
124                 never initialised, or on which threads are still
125                 waiting</para></listitem>
126 <listitem><para>waiting on an uninitialised pthread
127                 barrier</para></listitem>
128 <listitem><para>for all of the pthreads functions that Helgrind
129                 intercepts, an error is reported, along with a stack
130                 trace, if the system threading library routine returns
131                 an error code, even if Helgrind itself detected no
132                 error</para></listitem>
133</itemizedlist>
134
135<para>Checks pertaining to the validity of mutexes are generally also
136performed for reader-writer locks.</para>
137
138<para>Various kinds of this-can't-possibly-happen events are also
139reported.  These usually indicate bugs in the system threading
140library.</para>
141
142<para>Reported errors always contain a primary stack trace indicating
143where the error was detected.  They may also contain auxiliary stack
144traces giving additional information.  In particular, most errors
145relating to mutexes will also tell you where that mutex first came to
146Helgrind's attention (the "<computeroutput>was first observed
147at</computeroutput>" part), so you have a chance of figuring out which
148mutex it is referring to.  For example:</para>
149
150<programlisting><![CDATA[
151Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
152   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
153   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
154   by 0x40079B: main (tc09_bad_unlock.c:50)
155  Lock at 0x7FEFFFA90 was first observed
156   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
157   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
158   by 0x40079B: main (tc09_bad_unlock.c:50)
159]]></programlisting>
160
161<para>Helgrind has a way of summarising thread identities, as
162you see here with the text "<computeroutput>Thread
163#1</computeroutput>".  This is so that it can speak about threads and
164sets of threads without overwhelming you with details.  See
165<link linkend="hg-manual.data-races.errmsgs">below</link>
166for more information on interpreting error messages.</para>
167
168</sect1>
169
170
171
172
173<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
174<title>Detected errors: Inconsistent Lock Orderings</title>
175
176<para>In this section, and in general, to "acquire" a lock simply
177means to lock that lock, and to "release" a lock means to unlock
178it.</para>
179
180<para>Helgrind monitors the order in which threads acquire locks.
181This allows it to detect potential deadlocks which could arise from
182the formation of cycles of locks.  Detecting such inconsistencies is
183useful because, whilst actual deadlocks are fairly obvious, potential
184deadlocks may never be discovered during testing and could later lead
185to hard-to-diagnose in-service failures.</para>
186
187<para>The simplest example of such a problem is as
188follows.</para>
189
190<itemizedlist>
191 <listitem><para>Imagine some shared resource R, which, for whatever
192  reason, is guarded by two locks, L1 and L2, which must both be held
193  when R is accessed.</para>
194 </listitem>
195 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
196  to access R.  The implication of this is that all threads in the
197  program must acquire the two locks in the order first L1 then L2.
198  Not doing so risks deadlock.</para>
199 </listitem>
200 <listitem><para>The deadlock could happen if two threads -- call them
201  T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
202  and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
203  to acquire L1, but those locks are both already held.  So T1 and T2
204  become deadlocked.</para>
205 </listitem>
206</itemizedlist>
207
208<para>Helgrind builds a directed graph indicating the order in which
209locks have been acquired in the past.  When a thread acquires a new
210lock, the graph is updated, and then checked to see if it now contains
211a cycle.  The presence of a cycle indicates a potential deadlock involving
212the locks in the cycle.</para>
213
214<para>In simple situations, where the cycle only contains two locks,
215Helgrind will show where the required order was established:</para>
216
217<programlisting><![CDATA[
218Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated
219   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
220   by 0x40081F: main (tc13_laog1.c:24)
221  Required order was established by acquisition of lock at 0x7FEFFFAB0
222   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
223   by 0x400748: main (tc13_laog1.c:17)
224  followed by a later acquisition of lock at 0x7FEFFFA80
225   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
226   by 0x400773: main (tc13_laog1.c:18)
227]]></programlisting>
228
229<para>When there are more than two locks in the cycle, the error is
230equally serious.  However, at present Helgrind does not show the locks
231involved, so as to avoid flooding you with information.  That could be
232fixed in future.  For example, here is a an example involving a cycle
233of five locks from a naive implementation the famous Dining
234Philosophers problem
235(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
236In this case Helgrind has detected that all 5 philosophers could
237simultaneously pick up their left fork and then deadlock whilst
238waiting to pick up their right forks.</para>
239
240<programlisting><![CDATA[
241Thread #6: lock order "0x6010C0 before 0x601160" violated
242   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
243   by 0x4007C0: dine (tc14_laog_dinphils.c:19)
244   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
245   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
246   by 0x51054CC: clone (in /lib64/libc-2.5.so)
247]]></programlisting>
248
249</sect1>
250
251
252
253
254<sect1 id="hg-manual.data-races" xreflabel="Data Races">
255<title>Detected errors: Data Races</title>
256
257<para>A data race happens, or could happen, when two threads access a
258shared memory location without using suitable locks or other
259synchronisation to ensure single-threaded access.  Such missing
260locking can cause obscure timing dependent bugs.  Ensuring programs
261are race-free is one of the central difficulties of threaded
262programming.</para>
263
264<para>Reliably detecting races is a difficult problem, and most
265of Helgrind's internals are devoted to do dealing with it.
266We begin with a simple example.</para>
267
268
269<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
270<title>A Simple Data Race</title>
271
272<para>About the simplest possible example of a race is as follows.  In
273this program, it is impossible to know what the value
274of <computeroutput>var</computeroutput> is at the end of the program.
275Is it 2 ?  Or 1 ?</para>
276
277<programlisting><![CDATA[
278#include <pthread.h>
279
280int var = 0;
281
282void* child_fn ( void* arg ) {
283   var++; /* Unprotected relative to parent */ /* this is line 6 */
284   return NULL;
285}
286
287int main ( void ) {
288   pthread_t child;
289   pthread_create(&child, NULL, child_fn, NULL);
290   var++; /* Unprotected relative to child */ /* this is line 13 */
291   pthread_join(child, NULL);
292   return 0;
293}
294]]></programlisting>
295
296<para>The problem is there is nothing to
297stop <varname>var</varname> being updated simultaneously
298by both threads.  A correct program would
299protect <varname>var</varname> with a lock of type
300<function>pthread_mutex_t</function>, which is acquired
301before each access and released afterwards.  Helgrind's output for
302this program is:</para>
303
304<programlisting><![CDATA[
305Thread #1 is the program's root thread
306
307Thread #2 was created
308   at 0x511C08E: clone (in /lib64/libc-2.8.so)
309   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
310   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
311   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
312   by 0x400605: main (simple_race.c:12)
313
314Possible data race during read of size 4 at 0x601038 by thread #1
315   at 0x400606: main (simple_race.c:13)
316 This conflicts with a previous write of size 4 by thread #2
317   at 0x4005DC: child_fn (simple_race.c:6)
318   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
319   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
320   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
321 Location 0x601038 is 0 bytes inside global var "var"
322 declared at simple_race.c:3
323]]></programlisting>
324
325<para>This is quite a lot of detail for an apparently simple error.
326The last clause is the main error message.  It says there is a race as
327a result of a read of size 4 (bytes), at 0x601038, which is the
328address of <computeroutput>var</computeroutput>, happening in
329function <computeroutput>main</computeroutput> at line 13 in the
330program.</para>
331
332<para>Two important parts of the message are:</para>
333
334<itemizedlist>
335 <listitem>
336  <para>Helgrind shows two stack traces for the error, not one.  By
337   definition, a race involves two different threads accessing the
338   same location in such a way that the result depends on the relative
339   speeds of the two threads.</para>
340  <para>
341   The first stack trace follows the text "<computeroutput>Possible
342   data race during read of size 4 ...</computeroutput>" and the
343   second trace follows the text "<computeroutput>This conflicts with
344   a previous write of size 4 ...</computeroutput>".  Helgrind is
345   usually able to show both accesses involved in a race.  At least
346   one of these will be a write (since two concurrent, unsynchronised
347   reads are harmless), and they will of course be from different
348   threads.</para>
349  <para>By examining your program at the two locations, you should be
350   able to get at least some idea of what the root cause of the
351   problem is.</para>
352 </listitem>
353 <listitem>
354  <para>For races which occur on global or stack variables, Helgrind
355   tries to identify the name and defining point of the variable.
356   Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
357   global var "var" declared at simple_race.c:3</computeroutput>".</para>
358  <para>Showing names of stack and global variables carries no
359   run-time overhead once Helgrind has your program up and running.
360   However, it does require Helgrind to spend considerable extra time
361   and memory at program startup to read the relevant debug info.
362   Hence this facility is disabled by default.  To enable it, you need
363   to give the <varname>--read-var-info=yes</varname> option to
364   Helgrind.</para>
365 </listitem>
366</itemizedlist>
367
368<para>The following section explains Helgrind's race detection
369algorithm in more detail.</para>
370
371</sect2>
372
373
374
375<sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
376<title>Helgrind's Race Detection Algorithm</title>
377
378<para>Most programmers think about threaded programming in terms of
379the basic functionality provided by the threading library (POSIX
380Pthreads): thread creation, thread joining, locks, condition
381variables, semaphores and barriers.</para>
382
383<para>The effect of using these functions is to impose
384constraints upon the order in which memory accesses can
385happen.  This implied ordering is generally known as the
386"happens-before relation".  Once you understand the happens-before
387relation, it is easy to see how Helgrind finds races in your code.
388Fortunately, the happens-before relation is itself easy to understand,
389and is by itself a useful tool for reasoning about the behaviour of
390parallel programs.  We now introduce it using a simple example.</para>
391
392<para>Consider first the following buggy program:</para>
393
394<programlisting><![CDATA[
395Parent thread:                         Child thread:
396
397int var;
398
399// create child thread
400pthread_create(...)
401var = 20;                              var = 10;
402                                       exit
403
404// wait for child
405pthread_join(...)
406printf("%d\n", var);
407]]></programlisting>
408
409<para>The parent thread creates a child.  Both then write different
410values to some variable <computeroutput>var</computeroutput>, and the
411parent then waits for the child to exit.</para>
412
413<para>What is the value of <computeroutput>var</computeroutput> at the
414end of the program, 10 or 20?  We don't know.  The program is
415considered buggy (it has a race) because the final value
416of <computeroutput>var</computeroutput> depends on the relative rates
417of progress of the parent and child threads.  If the parent is fast
418and the child is slow, then the child's assignment may happen later,
419so the final value will be 10; and vice versa if the child is faster
420than the parent.</para>
421
422<para>The relative rates of progress of parent vs child is not something
423the programmer can control, and will often change from run to run.
424It depends on factors such as the load on the machine, what else is
425running, the kernel's scheduling strategy, and many other factors.</para>
426
427<para>The obvious fix is to use a lock to
428protect <computeroutput>var</computeroutput>.  It is however
429instructive to consider a somewhat more abstract solution, which is to
430send a message from one thread to the other:</para>
431
432<programlisting><![CDATA[
433Parent thread:                         Child thread:
434
435int var;
436
437// create child thread
438pthread_create(...)
439var = 20;
440// send message to child
441                                       // wait for message to arrive
442                                       var = 10;
443                                       exit
444
445// wait for child
446pthread_join(...)
447printf("%d\n", var);
448]]></programlisting>
449
450<para>Now the program reliably prints "10", regardless of the speed of
451the threads.  Why?  Because the child's assignment cannot happen until
452after it receives the message.  And the message is not sent until
453after the parent's assignment is done.</para>
454
455<para>The message transmission creates a "happens-before" dependency
456between the two assignments: <computeroutput>var = 20;</computeroutput>
457must now happen-before <computeroutput>var = 10;</computeroutput>.
458And so there is no longer a race
459on <computeroutput>var</computeroutput>.
460</para>
461
462<para>Note that it's not significant that the parent sends a message
463to the child.  Sending a message from the child (after its assignment)
464to the parent (before its assignment) would also fix the problem, causing
465the program to reliably print "20".</para>
466
467<para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
468accesses to memory locations.  If a location -- in this example,
469<computeroutput>var</computeroutput>,
470is accessed by two different threads, Helgrind checks to see if the
471two accesses are ordered by the happens-before relation.  If so,
472that's fine; if not, it reports a race.</para>
473
474<para>It is important to understand that the happens-before relation
475creates only a partial ordering, not a total ordering.  An example of
476a total ordering is comparison of numbers: for any two numbers
477<computeroutput>x</computeroutput> and
478<computeroutput>y</computeroutput>, either
479<computeroutput>x</computeroutput> is less than, equal to, or greater
480than
481<computeroutput>y</computeroutput>.  A partial ordering is like a
482total ordering, but it can also express the concept that two elements
483are neither equal, less or greater, but merely unordered with respect
484to each other.</para>
485
486<para>In the fixed example above, we say that
487<computeroutput>var = 20;</computeroutput> "happens-before"
488<computeroutput>var = 10;</computeroutput>.  But in the original
489version, they are unordered: we cannot say that either happens-before
490the other.</para>
491
492<para>What does it mean to say that two accesses from different
493threads are ordered by the happens-before relation?  It means that
494there is some chain of inter-thread synchronisation operations which
495cause those accesses to happen in a particular order, irrespective of
496the actual rates of progress of the individual threads.  This is a
497required property for a reliable threaded program, which is why
498Helgrind checks for it.</para>
499
500<para>The happens-before relations created by standard threading
501primitives are as follows:</para>
502
503<itemizedlist>
504 <listitem><para>When a mutex is unlocked by thread T1 and later (or
505  immediately) locked by thread T2, then the memory accesses in T1
506  prior to the unlock must happen-before those in T2 after it acquires
507  the lock.</para>
508 </listitem>
509 <listitem><para>The same idea applies to reader-writer locks,
510  although with some complication so as to allow correct handling of
511  reads vs writes.</para>
512 </listitem>
513 <listitem><para>When a condition variable (CV) is signalled on by
514  thread T1 and some other thread T2 is thereby released from a wait
515  on the same CV, then the memory accesses in T1 prior to the
516  signalling must happen-before those in T2 after it returns from the
517  wait.  If no thread was waiting on the CV then there is no
518  effect.</para>
519 </listitem>
520 <listitem><para>If instead T1 broadcasts on a CV, then all of the
521  waiting threads, rather than just one of them, acquire a
522  happens-before dependency on the broadcasting thread at the point it
523  did the broadcast.</para>
524 </listitem>
525 <listitem><para>A thread T2 that continues after completing sem_wait
526  on a semaphore that thread T1 posts on, acquires a happens-before
527  dependence on the posting thread, a bit like dependencies caused
528  mutex unlock-lock pairs.  However, since a semaphore can be posted
529  on many times, it is unspecified from which of the post calls the
530  wait call gets its happens-before dependency.</para>
531 </listitem>
532 <listitem><para>For a group of threads T1 .. Tn which arrive at a
533  barrier and then move on, each thread after the call has a
534  happens-after dependency from all threads before the
535  barrier.</para>
536 </listitem>
537 <listitem><para>A newly-created child thread acquires an initial
538  happens-after dependency on the point where its parent created it.
539  That is, all memory accesses performed by the parent prior to
540  creating the child are regarded as happening-before all the accesses
541  of the child.</para>
542 </listitem>
543 <listitem><para>Similarly, when an exiting thread is reaped via a
544  call to <function>pthread_join</function>, once the call returns, the
545  reaping thread acquires a happens-after dependency relative to all memory
546  accesses made by the exiting thread.</para>
547 </listitem>
548</itemizedlist>
549
550<para>In summary: Helgrind intercepts the above listed events, and builds a
551directed acyclic graph represented the collective happens-before
552dependencies.  It also monitors all memory accesses.</para>
553
554<para>If a location is accessed by two different threads, but Helgrind
555cannot find any path through the happens-before graph from one access
556to the other, then it reports a race.</para>
557
558<para>There are a couple of caveats:</para>
559
560<itemizedlist>
561 <listitem><para>Helgrind doesn't check for a race in the case where
562  both accesses are reads.  That would be silly, since concurrent
563  reads are harmless.</para>
564 </listitem>
565 <listitem><para>Two accesses are considered to be ordered by the
566  happens-before dependency even through arbitrarily long chains of
567  synchronisation events.  For example, if T1 accesses some location
568  L, and then <function>pthread_cond_signals</function> T2, which later
569  <function>pthread_cond_signals</function> T3, which then accesses L, then
570  a suitable happens-before dependency exists between the first and second
571  accesses, even though it involves two different inter-thread
572  synchronisation events.</para>
573 </listitem>
574</itemizedlist>
575
576</sect2>
577
578
579
580<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
581<title>Interpreting Race Error Messages</title>
582
583<para>Helgrind's race detection algorithm collects a lot of
584information, and tries to present it in a helpful way when a race is
585detected.  Here's an example:</para>
586
587<programlisting><![CDATA[
588Thread #2 was created
589   at 0x511C08E: clone (in /lib64/libc-2.8.so)
590   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
591   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
592   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
593   by 0x4008F2: main (tc21_pthonce.c:86)
594
595Thread #3 was created
596   at 0x511C08E: clone (in /lib64/libc-2.8.so)
597   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
598   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
599   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
600   by 0x4008F2: main (tc21_pthonce.c:86)
601
602Possible data race during read of size 4 at 0x601070 by thread #3
603   at 0x40087A: child (tc21_pthonce.c:74)
604   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
605   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
606   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
607 This conflicts with a previous write of size 4 by thread #2
608   at 0x400883: child (tc21_pthonce.c:74)
609   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
610   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
611   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
612 Location 0x601070 is 0 bytes inside local var "unprotected2"
613 declared at tc21_pthonce.c:51, in frame #0 of thread 3
614]]></programlisting>
615
616<para>Helgrind first announces the creation points of any threads
617referenced in the error message.  This is so it can speak concisely
618about threads without repeatedly printing their creation point call
619stacks.  Each thread is only ever announced once, the first time it
620appears in any Helgrind error message.</para>
621
622<para>The main error message begins at the text
623"<computeroutput>Possible data race during read</computeroutput>".  At
624the start is information you would expect to see -- address and size
625of the racing access, whether a read or a write, and the call stack at
626the point it was detected.</para>
627
628<para>A second call stack is presented starting at the text
629"<computeroutput>This conflicts with a previous
630write</computeroutput>".  This shows a previous access which also
631accessed the stated address, and which is believed to be racing
632against the access in the first call stack.</para>
633
634<para>Finally, Helgrind may attempt to give a description of the
635raced-on address in source level terms.  In this example, it
636identifies it as a local variable, shows its name, declaration point,
637and in which frame (of the first call stack) it lives.  Note that this
638information is only shown when <varname>--read-var-info=yes</varname>
639is specified on the command line.  That's because reading the DWARF3
640debug information in enough detail to capture variable type and
641location information makes Helgrind much slower at startup, and also
642requires considerable amounts of memory, for large programs.
643</para>
644
645<para>Once you have your two call stacks, how do you find the root
646cause of the race?</para>
647
648<para>The first thing to do is examine the source locations referred
649to by each call stack.  They should both show an access to the same
650location, or variable.</para>
651
652<para>Now figure out how how that location should have been made
653thread-safe:</para>
654
655<itemizedlist>
656 <listitem><para>Perhaps the location was intended to be protected by
657  a mutex?  If so, you need to lock and unlock the mutex at both
658  access points, even if one of the accesses is reported to be a read.
659  Did you perhaps forget the locking at one or other of the
660  accesses?</para>
661 </listitem>
662 <listitem><para>Alternatively, perhaps you intended to use a some
663  other scheme to make it safe, such as signalling on a condition
664  variable.  In all such cases, try to find a synchronisation event
665  (or a chain thereof) which separates the earlier-observed access (as
666  shown in the second call stack) from the later-observed access (as
667  shown in the first call stack).  In other words, try to find
668  evidence that the earlier access "happens-before" the later access.
669  See the previous subsection for an explanation of the happens-before
670  relation.</para>
671  <para>
672  The fact that Helgrind is reporting a race means it did not observe
673  any happens-before relation between the two accesses.  If
674  Helgrind is working correctly, it should also be the case that you
675  also cannot find any such relation, even on detailed inspection
676  of the source code.  Hopefully, though, your inspection of the code
677  will show where the missing synchronisation operation(s) should have
678  been.</para>
679 </listitem>
680</itemizedlist>
681
682</sect2>
683
684
685</sect1>
686
687<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
688<title>Hints and Tips for Effective Use of Helgrind</title>
689
690<para>Helgrind can be very helpful in finding and resolving
691threading-related problems.  Like all sophisticated tools, it is most
692effective when you understand how to play to its strengths.</para>
693
694<para>Helgrind will be less effective when you merely throw an
695existing threaded program at it and try to make sense of any reported
696errors.  It will be more effective if you design threaded programs
697from the start in a way that helps Helgrind verify correctness.  The
698same is true for finding memory errors with Memcheck, but applies more
699here, because thread checking is a harder problem.  Consequently it is
700much easier to write a correct program for which Helgrind falsely
701reports (threading) errors than it is to write a correct program for
702which Memcheck falsely reports (memory) errors.</para>
703
704<para>With that in mind, here are some tips, listed most important first,
705for getting reliable results and avoiding false errors.  The first two
706are critical.  Any violations of them will swamp you with huge numbers
707of false data-race errors.</para>
708
709
710<orderedlist>
711
712  <listitem>
713    <para>Make sure your application, and all the libraries it uses,
714    use the POSIX threading primitives.  Helgrind needs to be able to
715    see all events pertaining to thread creation, exit, locking and
716    other synchronisation events.  To do so it intercepts many POSIX
717    pthreads functions.</para>
718
719    <para>Do not roll your own threading primitives (mutexes, etc)
720    from combinations of the Linux futex syscall, atomic counters, etc.
721    These throw Helgrind's internal what's-going-on models
722    way off course and will give bogus results.</para>
723
724    <para>Also, do not reimplement existing POSIX abstractions using
725    other POSIX abstractions.  For example, don't build your own
726    semaphore routines or reader-writer locks from POSIX mutexes and
727    condition variables.  Instead use POSIX reader-writer locks and
728    semaphores directly, since Helgrind supports them directly.</para>
729
730    <para>Helgrind directly supports the following POSIX threading
731    abstractions: mutexes, reader-writer locks, condition variables
732    (but see below), semaphores and barriers.  Currently spinlocks
733    are not supported, although they could be in future.</para>
734
735    <para>At the time of writing, the following popular Linux packages
736    are known to implement their own threading primitives:</para>
737
738    <itemizedlist>
739     <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
740      only uses POSIX pthreads primitives.  Unfortunately Qt 4.X
741      has its own implementation of mutexes (QMutex) and thread reaping.
742      Helgrind 3.4.x contains direct support
743      for Qt 4.X threading, which is experimental but is believed to
744      work fairly well.  A side effect of supporting Qt 4 directly is
745      that Helgrind can be used to debug KDE4 applications.  As this
746      is an experimental feature, we would particularly appreciate
747      feedback from folks who have used Helgrind to successfully debug
748      Qt 4 and/or KDE4 applications.</para>
749     </listitem>
750     <listitem><para>Runtime support library for GNU OpenMP (part of
751      GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
752      library (<filename>libgomp.so</filename>) constructs its own
753      synchronisation primitives using combinations of atomic memory
754      instructions and the futex syscall, which causes total chaos since in
755      Helgrind since it cannot "see" those.</para>
756     <para>Fortunately, this can be solved using a configuration-time
757      option (for GCC).  Rebuild GCC from source, and configure using
758      <varname>--disable-linux-futex</varname>.
759      This makes libgomp.so use the standard
760      POSIX threading primitives instead.  Note that this was tested
761      using GCC 4.2.3 and has not been re-tested using more recent GCC
762      versions.  We would appreciate hearing about any successes or
763      failures with more recent versions.</para>
764     </listitem>
765    </itemizedlist>
766  </listitem>
767
768  <listitem>
769    <para>Avoid memory recycling.  If you can't avoid it, you must use
770    tell Helgrind what is going on via the
771    <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
772    <computeroutput>helgrind.h</computeroutput>).</para>
773
774    <para>Helgrind is aware of standard heap memory allocation and
775    deallocation that occurs via
776    <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
777    and from entry and exit of stack frames.  In particular, when memory is
778    deallocated via <function>free</function>, <function>delete</function>,
779    or function exit, Helgrind considers that memory clean, so when it is
780    eventually reallocated, its history is irrelevant.</para>
781
782    <para>However, it is common practice to implement memory recycling
783    schemes.  In these, memory to be freed is not handed to
784    <function>free</function>/<function>delete</function>, but instead put
785    into a pool of free buffers to be handed out again as required.  The
786    problem is that Helgrind has no
787    way to know that such memory is logically no longer in use, and
788    its history is irrelevant.  Hence you must make that explicit,
789    using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
790    to specify the relevant address ranges.  It's easiest to put these
791    requests into the pool manager code, and use them either when memory is
792    returned to the pool, or is allocated from it.</para>
793  </listitem>
794
795  <listitem>
796    <para>Avoid POSIX condition variables.  If you can, use POSIX
797    semaphores (<function>sem_t</function>, <function>sem_post</function>,
798    <function>sem_wait</function>) to do inter-thread event signalling.
799    Semaphores with an initial value of zero are particularly useful for
800    this.</para>
801
802    <para>Helgrind only partially correctly handles POSIX condition
803    variables.  This is because Helgrind can see inter-thread
804    dependencies between a <function>pthread_cond_wait</function> call and a
805    <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
806    call only if the waiting thread actually gets to the rendezvous first
807    (so that it actually calls
808    <function>pthread_cond_wait</function>).  It can't see dependencies
809    between the threads if the signaller arrives first.  In the latter case,
810    POSIX guidelines imply that the associated boolean condition still
811    provides an inter-thread synchronisation event, but one which is
812    invisible to Helgrind.</para>
813
814    <para>The result of Helgrind missing some inter-thread
815    synchronisation events is to cause it to report false positives.
816    </para>
817
818    <para>The root cause of this synchronisation lossage is
819    particularly hard to understand, so an example is helpful.  It was
820    discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
821    in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
822    canonical POSIX-recommended usage scheme for condition variables
823    is as follows:</para>
824
825<programlisting><![CDATA[
826b   is a Boolean condition, which is False most of the time
827cv  is a condition variable
828mx  is its associated mutex
829
830Signaller:                             Waiter:
831
832lock(mx)                               lock(mx)
833b = True                               while (b == False)
834signal(cv)                                wait(cv,mx)
835unlock(mx)                             unlock(mx)
836]]></programlisting>
837
838    <para>Assume <computeroutput>b</computeroutput> is False most of
839    the time.  If the waiter arrives at the rendezvous first, it
840    enters its while-loop, waits for the signaller to signal, and
841    eventually proceeds.  Helgrind sees the signal, notes the
842    dependency, and all is well.</para>
843
844    <para>If the signaller arrives
845    first, <computeroutput>b</computeroutput> is set to true, and the
846    signal disappears into nowhere.  When the waiter later arrives, it
847    does not enter its while-loop and simply carries on.  But even in
848    this case, the waiter code following the while-loop cannot execute
849    until the signaller sets <computeroutput>b</computeroutput> to
850    True.  Hence there is still the same inter-thread dependency, but
851    this time it is through an arbitrary in-memory condition, and
852    Helgrind cannot see it.</para>
853
854    <para>By comparison, Helgrind's detection of inter-thread
855    dependencies caused by semaphore operations is believed to be
856    exactly correct.</para>
857
858    <para>As far as I know, a solution to this problem that does not
859    require source-level annotation of condition-variable wait loops
860    is beyond the current state of the art.</para>
861  </listitem>
862
863  <listitem>
864    <para>Make sure you are using a supported Linux distribution.  At
865    present, Helgrind only properly supports glibc-2.3 or later.  This
866    in turn means we only support glibc's NPTL threading
867    implementation.  The old LinuxThreads implementation is not
868    supported.</para>
869  </listitem>
870
871  <listitem>
872    <para>Round up all finished threads using
873    <function>pthread_join</function>.  Avoid
874    detaching threads: don't create threads in the detached state, and
875    don't call <function>pthread_detach</function> on existing threads.</para>
876
877    <para>Using <function>pthread_join</function> to round up finished
878    threads provides a clear synchronisation point that both Helgrind and
879    programmers can see.  If you don't call
880    <function>pthread_join</function> on a thread, Helgrind has no way to
881    know when it finishes, relative to any
882    significant synchronisation points for other threads in the program.  So
883    it assumes that the thread lingers indefinitely and can potentially
884    interfere indefinitely with the memory state of the program.  It
885    has every right to assume that -- after all, it might really be
886    the case that, for scheduling reasons, the exiting thread did run
887    very slowly in the last stages of its life.</para>
888  </listitem>
889
890  <listitem>
891    <para>Perform thread debugging (with Helgrind) and memory
892    debugging (with Memcheck) together.</para>
893
894    <para>Helgrind tracks the state of memory in detail, and memory
895    management bugs in the application are liable to cause confusion.
896    In extreme cases, applications which do many invalid reads and
897    writes (particularly to freed memory) have been known to crash
898    Helgrind.  So, ideally, you should make your application
899    Memcheck-clean before using Helgrind.</para>
900
901    <para>It may be impossible to make your application Memcheck-clean
902    unless you first remove threading bugs.  In particular, it may be
903    difficult to remove all reads and writes to freed memory in
904    multithreaded C++ destructor sequences at program termination.
905    So, ideally, you should make your application Helgrind-clean
906    before using Memcheck.</para>
907
908    <para>Since this circularity is obviously unresolvable, at least
909    bear in mind that Memcheck and Helgrind are to some extent
910    complementary, and you may need to use them together.</para>
911  </listitem>
912
913  <listitem>
914    <para>POSIX requires that implementations of standard I/O
915    (<function>printf</function>, <function>fprintf</function>,
916    <function>fwrite</function>, <function>fread</function>, etc) are thread
917    safe.  Unfortunately GNU libc implements this by using internal locking
918    primitives that Helgrind is unable to intercept.  Consequently Helgrind
919    generates many false race reports when you use these functions.</para>
920
921    <para>Helgrind attempts to hide these errors using the standard
922    Valgrind error-suppression mechanism.  So, at least for simple
923    test cases, you don't see any.  Nevertheless, some may slip
924    through.  Just something to be aware of.</para>
925  </listitem>
926
927  <listitem>
928    <para>Helgrind's error checks do not work properly inside the
929    system threading library itself
930    (<computeroutput>libpthread.so</computeroutput>), and it usually
931    observes large numbers of (false) errors in there.  Valgrind's
932    suppression system then filters these out, so you should not see
933    them.</para>
934
935    <para>If you see any race errors reported
936    where <computeroutput>libpthread.so</computeroutput> or
937    <computeroutput>ld.so</computeroutput> is the object associated
938    with the innermost stack frame, please file a bug report at
939    <ulink url="&vg-url;">&vg-url;</ulink>.
940    </para>
941  </listitem>
942
943</orderedlist>
944
945</sect1>
946
947
948
949
950<sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
951<title>Helgrind Command-line Options</title>
952
953<para>The following end-user options are available:</para>
954
955<!-- start of xi:include in the manpage -->
956<variablelist id="hg.opts.list">
957
958  <varlistentry id="opt.track-lockorders"
959                xreflabel="--track-lockorders">
960    <term>
961      <option><![CDATA[--track-lockorders=no|yes
962      [default: yes] ]]></option>
963    </term>
964    <listitem>
965      <para>When enabled (the default), Helgrind performs lock order
966      consistency checking.  For some buggy programs, the large number
967      of lock order errors reported can become annoying, particularly
968      if you're only interested in race errors.  You may therefore find
969      it helpful to disable lock order checking.</para>
970    </listitem>
971  </varlistentry>
972
973  <varlistentry id="opt.history-level"
974                xreflabel="--history-level">
975    <term>
976      <option><![CDATA[--history-level=none|approx|full
977      [default: full] ]]></option>
978    </term>
979    <listitem>
980      <para><option>--history-level=full</option> (the default) causes
981        Helgrind collects enough information about "old" accesses that
982        it can produce two stack traces in a race report -- both the
983        stack trace for the current access, and the trace for the
984        older, conflicting access.</para>
985      <para>Collecting such information is expensive in both speed and
986        memory, particularly for programs that do many inter-thread
987        synchronisation events (locks, unlocks, etc).  Without such
988        information, it is more difficult to track down the root
989        causes of races.  Nonetheless, you may not need it in
990        situations where you just want to check for the presence or
991        absence of races, for example, when doing regression testing
992        of a previously race-free program.</para>
993      <para><option>--history-level=none</option> is the opposite
994        extreme.  It causes Helgrind not to collect any information
995        about previous accesses.  This can be dramatically faster
996        than <option>--history-level=full</option>.</para>
997      <para><option>--history-level=approx</option> provides a
998        compromise between these two extremes.  It causes Helgrind to
999        show a full trace for the later access, and approximate
1000        information regarding the earlier access.  This approximate
1001        information consists of two stacks, and the earlier access is
1002        guaranteed to have occurred somewhere between program points
1003        denoted by the two stacks. This is not as useful as showing
1004        the exact stack for the previous access
1005        (as <option>--history-level=full</option> does), but it is
1006        better than nothing, and it is almost as fast as
1007        <option>--history-level=none</option>.</para>
1008    </listitem>
1009  </varlistentry>
1010
1011  <varlistentry id="opt.conflict-cache-size"
1012                xreflabel="--conflict-cache-size">
1013    <term>
1014      <option><![CDATA[--conflict-cache-size=N
1015      [default: 1000000] ]]></option>
1016    </term>
1017    <listitem>
1018      <para>This flag only has any effect
1019        at <option>--history-level=full</option>.</para>
1020      <para>Information about "old" conflicting accesses is stored in
1021        a cache of limited size, with LRU-style management.  This is
1022        necessary because it isn't practical to store a stack trace
1023        for every single memory access made by the program.
1024        Historical information on not recently accessed locations is
1025        periodically discarded, to free up space in the cache.</para>
1026      <para>This option controls the size of the cache, in terms of the
1027        number of different memory addresses for which
1028        conflicting access information is stored.  If you find that
1029        Helgrind is showing race errors with only one stack instead of
1030        the expected two stacks, try increasing this value.</para>
1031      <para>The minimum value is 10,000 and the maximum is 30,000,000
1032        (thirty times the default value).  Increasing the value by 1
1033        increases Helgrind's memory requirement by very roughly 100
1034        bytes, so the maximum value will easily eat up three extra
1035        gigabytes or so of memory.</para>
1036    </listitem>
1037  </varlistentry>
1038
1039</variablelist>
1040<!-- end of xi:include in the manpage -->
1041
1042<!-- start of xi:include in the manpage -->
1043<!--  commented out, because we don't document debugging options in the
1044      manual.  Nb: all the double-dashes below had a space inserted in them
1045      to avoid problems with premature closing of this comment.
1046<para>In addition, the following debugging options are available for
1047Helgrind:</para>
1048
1049<variablelist id="hg.debugopts.list">
1050
1051  <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
1052    <term>
1053      <option><![CDATA[- -trace-malloc=no|yes [no]
1054      ]]></option>
1055    </term>
1056    <listitem>
1057      <para>Show all client <function>malloc</function> (etc) and
1058      <function>free</function> (etc) requests.</para>
1059    </listitem>
1060  </varlistentry>
1061
1062  <varlistentry id="opt.cmp-race-err-addrs"
1063                xreflabel="- -cmp-race-err-addrs">
1064    <term>
1065      <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
1066      ]]></option>
1067    </term>
1068    <listitem>
1069      <para>Controls whether or not race (data) addresses should be
1070        taken into account when removing duplicates of race errors.
1071        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
1072        identical race errors will be considered to be the same if
1073        their race addresses differ.  With
1074        With <varname>- -cmp-race-err-addrs=yes</varname> they will be
1075        considered different.  This is provided to help make certain
1076        regression tests work reliably.</para>
1077    </listitem>
1078  </varlistentry>
1079
1080  <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
1081    <term>
1082      <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
1083      ]]></option>
1084    </term>
1085    <listitem>
1086      <para>Run extensive sanity checks on Helgrind's internal
1087        data structures at events defined by the bitstring, as
1088        follows:</para>
1089      <para><computeroutput>010000 </computeroutput>after changes to
1090        the lock order acquisition graph</para>
1091      <para><computeroutput>001000 </computeroutput>after every client
1092        memory access (NB: not currently used)</para>
1093      <para><computeroutput>000100 </computeroutput>after every client
1094        memory range permission setting of 256 bytes or greater</para>
1095      <para><computeroutput>000010 </computeroutput>after every client
1096        lock or unlock event</para>
1097      <para><computeroutput>000001 </computeroutput>after every client
1098        thread creation or joinage event</para>
1099      <para>Note these will make Helgrind run very slowly, often to
1100        the point of being completely unusable.</para>
1101    </listitem>
1102  </varlistentry>
1103
1104</variablelist>
1105-->
1106<!-- end of xi:include in the manpage -->
1107
1108
1109</sect1>
1110
1111
1112
1113<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
1114<title>Helgrind Client Requests</title>
1115
1116<para>The following client requests are defined in
1117<filename>helgrind.h</filename>.  See that file for exact details of their
1118arguments.</para>
1119
1120<itemizedlist>
1121
1122  <listitem>
1123    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
1124    <para>This makes Helgrind forget everything it knows about a
1125    specified memory range.  This is particularly useful for memory
1126    allocators that wish to recycle memory.</para>
1127  </listitem>
1128  <listitem>
1129    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
1130  </listitem>
1131  <listitem>
1132    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
1133  </listitem>
1134  <listitem>
1135    <para><function>ANNOTATE_NEW_MEMORY</function></para>
1136  </listitem>
1137  <listitem>
1138    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
1139  </listitem>
1140  <listitem>
1141    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
1142  </listitem>
1143  <listitem>
1144    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
1145  </listitem>
1146  <listitem>
1147    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
1148    <para>These are used to describe to Helgrind, the behaviour of
1149    custom (non-POSIX) synchronisation primitives, which it otherwise
1150    has no way to understand.  See comments
1151    in <filename>helgrind.h</filename> for further
1152    documentation.</para>
1153  </listitem>
1154
1155</itemizedlist>
1156
1157</sect1>
1158
1159
1160
1161<sect1 id="hg-manual.todolist" xreflabel="To Do List">
1162<title>A To-Do List for Helgrind</title>
1163
1164<para>The following is a list of loose ends which should be tidied up
1165some time.</para>
1166
1167<itemizedlist>
1168  <listitem><para>For lock order errors, print the complete lock
1169    cycle, rather than only doing for size-2 cycles as at
1170    present.</para>
1171  </listitem>
1172  <listitem><para>The conflicting access mechanism sometimes
1173    mysteriously fails to show the conflicting access' stack, even
1174    when provided with unbounded storage for conflicting access info.
1175    This should be investigated.</para>
1176  </listitem>
1177  <listitem><para>Document races caused by GCC's thread-unsafe code
1178    generation for speculative stores.  In the interim see
1179    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
1180    </computeroutput>
1181    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
1182    </para>
1183  </listitem>
1184  <listitem><para>Don't update the lock-order graph, and don't check
1185    for errors, when a "try"-style lock operation happens (e.g.
1186    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
1187    restrictions to the locking order, since they can always fail to
1188    acquire the lock, resulting in the caller going off and doing Plan
1189    B (presumably it will have a Plan B).  Doing such checks could
1190    generate false lock-order errors and confuse users.</para>
1191  </listitem>
1192  <listitem><para> Performance can be very poor.  Slowdowns on the
1193    order of 100:1 are not unusual.  There is limited scope for
1194    performance improvements.
1195    </para>
1196  </listitem>
1197
1198</itemizedlist>
1199
1200</sect1>
1201
1202</chapter>
1203