1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6
7<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
8  <title>Helgrind: a thread error detector</title>
9
10<para>To use this tool, you must specify
11<option>--tool=helgrind</option> on the Valgrind
12command line.</para>
13
14
15<sect1 id="hg-manual.overview" xreflabel="Overview">
16<title>Overview</title>
17
18<para>Helgrind is a Valgrind tool for detecting synchronisation errors
19in C, C++ and Fortran programs that use the POSIX pthreads
20threading primitives.</para>
21
22<para>The main abstractions in POSIX pthreads are: a set of threads
23sharing a common address space, thread creation, thread joining,
24thread exit, mutexes (locks), condition variables (inter-thread event
25notifications), reader-writer locks, spinlocks, semaphores and
26barriers.</para>
27
28<para>Helgrind can detect three classes of errors, which are discussed
29in detail in the next three sections:</para>
30
31<orderedlist>
32 <listitem>
33  <para><link linkend="hg-manual.api-checks">
34        Misuses of the POSIX pthreads API.</link></para>
35 </listitem>
36 <listitem>
37  <para><link linkend="hg-manual.lock-orders">
38        Potential deadlocks arising from lock
39        ordering problems.</link></para>
40 </listitem>
41 <listitem>
42  <para><link linkend="hg-manual.data-races">
43        Data races -- accessing memory without adequate locking
44                      or synchronisation</link>.
45  </para>
46 </listitem>
47</orderedlist>
48
49<para>Problems like these often result in unreproducible,
50timing-dependent crashes, deadlocks and other misbehaviour, and
51can be difficult to find by other means.</para>
52
53<para>Helgrind is aware of all the pthread abstractions and tracks
54their effects as accurately as it can.  On x86 and amd64 platforms, it
55understands and partially handles implicit locking arising from the
56use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
57platforms, it partially handles implicit locking arising from
58load-linked and store-conditional instruction pairs.
59</para>
60
61<para>Helgrind works best when your application uses only the POSIX
62pthreads API.  However, if you want to use custom threading
63primitives, you can describe their behaviour to Helgrind using the
64<varname>ANNOTATE_*</varname> macros defined
65in <varname>helgrind.h</varname>.</para>
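
<para>As a rough illustration of such annotations, the sketch below
hands a value from one thread to another through a C11 atomic flag, a
hand-rolled primitive that Helgrind cannot interpret by itself.  It
assumes the header is installed as
<computeroutput>valgrind/helgrind.h</computeroutput> and that it provides
<computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput> and
<computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput>, each taking an
address used as a tag; if the header is not found, no-op fallbacks are
used.  The names <computeroutput>payload</computeroutput>,
<computeroutput>ready</computeroutput>,
<computeroutput>producer</computeroutput> and
<computeroutput>handoff_demo</computeroutput> are invented for this
example.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Use the real annotations when the header is available (assumed
   install path), otherwise fall back to no-ops. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int payload;        /* plain data handed from producer to consumer */
static atomic_int ready;   /* hand-rolled "message sent" flag */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    /* Describe the send side of the hand-off, tagged by &ready. */
    ANNOTATE_HAPPENS_BEFORE(&ready);
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Returns the value received from the producer, or -1 on failure. */
int handoff_demo(void) {
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    /* Wait until the flag is set. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();
    /* Describe the receive side of the hand-off. */
    ANNOTATE_HAPPENS_AFTER(&ready);

    seen = payload;
    pthread_join(t, NULL);
    return seen;
}
]]></programlisting>

<para>The annotations only describe the ordering of the accesses
to <computeroutput>payload</computeroutput>.  Whether Helgrind still
reports the accesses to the flag itself is likely to depend on how the
atomic operations are compiled; this sketch has not been checked against
any particular Valgrind version.</para>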
66
67
68
69<para>Following the three sections on detected errors is a section containing
70<link linkend="hg-manual.effective-use">
71hints and tips on how to get the best out of Helgrind.</link>
72</para>
73
74<para>Then there is a
75<link linkend="hg-manual.options">summary of command-line
76options.</link>
77</para>
78
79<para>Finally, there is
80<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
81could be improved.</link>
82</para>
83
84</sect1>
85
86
87
88
89<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
90<title>Detected errors: Misuses of the POSIX pthreads API</title>
91
92<para>Helgrind intercepts calls to many POSIX pthreads functions, and
93is therefore able to report on various common problems.  Although
94these are unglamorous errors, their presence can lead to undefined
95program behaviour and hard-to-find bugs later on.  The detected errors
96are:</para>
97
98<itemizedlist>
99 <listitem><para>unlocking an invalid mutex</para></listitem>
100 <listitem><para>unlocking a not-locked mutex</para></listitem>
101 <listitem><para>unlocking a mutex held by a different
102                 thread</para></listitem>
103 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
104 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
105 <listitem><para>deallocation of memory that contains a
106                 locked mutex</para></listitem>
107 <listitem><para>passing mutex arguments to functions expecting
108                 reader-writer lock arguments, and vice
109                 versa</para></listitem>
110 <listitem><para>when a POSIX pthread function fails with an
111                 error code that must be handled</para></listitem>
112 <listitem><para>when a thread exits whilst still holding locked
113                 locks</para></listitem>
114 <listitem><para>calling <function>pthread_cond_wait</function>
115                 with a not-locked mutex, an invalid mutex,
116                 or one locked by a different
117                 thread</para></listitem>
118 <listitem><para>inconsistent bindings between condition
119                 variables and their associated mutexes</para></listitem>
120 <listitem><para>invalid or duplicate initialisation of a pthread
121                 barrier</para></listitem>
122 <listitem><para>initialisation of a pthread barrier on which threads
123                 are still waiting</para></listitem>
124 <listitem><para>destruction of a pthread barrier object which was
125                 never initialised, or on which threads are still
126                 waiting</para></listitem>
127 <listitem><para>waiting on an uninitialised pthread
128                 barrier</para></listitem>
129 <listitem><para>for all of the pthreads functions that Helgrind
130                 intercepts, an error is reported, along with a stack
131                 trace, if the system threading library routine returns
132                 an error code, even if Helgrind itself detected no
133                 error</para></listitem>
134</itemizedlist>
135
136<para>Checks pertaining to the validity of mutexes are generally also
137performed for reader-writer locks.</para>
138
139<para>Various kinds of this-can't-possibly-happen events are also
140reported.  These usually indicate bugs in the system threading
141library.</para>
142
143<para>Reported errors always contain a primary stack trace indicating
144where the error was detected.  They may also contain auxiliary stack
145traces giving additional information.  In particular, most errors
146relating to mutexes will also tell you where that mutex first came to
147Helgrind's attention (the "<computeroutput>was first observed
148at</computeroutput>" part), so you have a chance of figuring out which
149mutex it is referring to.  For example:</para>
150
151<programlisting><![CDATA[
152Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
153   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
154   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
155   by 0x40079B: main (tc09_bad_unlock.c:50)
156  Lock at 0x7FEFFFA90 was first observed
157   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
158   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
159   by 0x40079B: main (tc09_bad_unlock.c:50)
160]]></programlisting>
161
162<para>Helgrind has a way of summarising thread identities, as
163you see here with the text "<computeroutput>Thread
164#1</computeroutput>".  This is so that it can speak about threads and
165sets of threads without overwhelming you with details.  See
166<link linkend="hg-manual.data-races.errmsgs">below</link>
167for more information on interpreting error messages.</para>
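
<para>A minimal sketch of the kind of mistake reported above is shown
below.  It is not the source of the test case named in the trace; the
function name <computeroutput>unlock_not_locked</computeroutput> is
invented here.  An error-checking mutex is used so that the faulty
unlock has defined behaviour and the library returns an error code,
which also exercises the last item in the list of detected
errors.</para>

<programlisting><![CDATA[
#define _XOPEN_SOURCE 700   /* must precede all includes; requests
                               PTHREAD_MUTEX_ERRORCHECK on some systems */
#include <errno.h>
#include <pthread.h>

/* Unlocks a mutex that was never locked and returns the error code
   handed back by the threading library. */
int unlock_not_locked(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t mx;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mx, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_unlock(&mx);   /* bug: mx is not locked */

    pthread_mutex_destroy(&mx);
    return rc;                        /* EPERM for this mutex type */
}
]]></programlisting>

<para>Calling this function from a program run with
<option>--tool=helgrind</option> should produce a report similar in
shape to the one above, although the exact wording and addresses will
differ.</para>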
168
169</sect1>
170
171
172
173
174<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
175<title>Detected errors: Inconsistent Lock Orderings</title>
176
177<para>In this section, and in general, to "acquire" a lock simply
178means to lock that lock, and to "release" a lock means to unlock
179it.</para>
180
181<para>Helgrind monitors the order in which threads acquire locks.
182This allows it to detect potential deadlocks which could arise from
183the formation of cycles of locks.  Detecting such inconsistencies is
184useful because, whilst actual deadlocks are fairly obvious, potential
185deadlocks may never be discovered during testing and could later lead
186to hard-to-diagnose in-service failures.</para>
187
188<para>The simplest example of such a problem is as
189follows.</para>
190
191<itemizedlist>
192 <listitem><para>Imagine some shared resource R, which, for whatever
193  reason, is guarded by two locks, L1 and L2, which must both be held
194  when R is accessed.</para>
195 </listitem>
196 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
197  to access R.  The implication of this is that all threads in the
198  program must acquire the two locks in the order first L1 then L2.
199  Not doing so risks deadlock.</para>
200 </listitem>
201 <listitem><para>The deadlock could happen if two threads -- call them
202  T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
203  and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
204  to acquire L1, but those locks are both already held.  So T1 and T2
205  become deadlocked.</para>
206 </listitem>
207</itemizedlist>
208
209<para>Helgrind builds a directed graph indicating the order in which
210locks have been acquired in the past.  When a thread acquires a new
211lock, the graph is updated, and then checked to see if it now contains
212a cycle.  The presence of a cycle indicates a potential deadlock involving
213the locks in the cycle.</para>
214
215<para>In general, Helgrind will choose two locks involved in the cycle
216and show you how their acquisition ordering has become inconsistent.
217It does this by showing the program points that first defined the
218ordering, and the program points which later violated it.  Here is a
219simple example involving just two locks:</para>
220
221<programlisting><![CDATA[
222Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
223
224Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
225   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
226   by 0x400825: main (tc13_laog1.c:23)
227
228 followed by a later acquisition of lock at 0x7FF0006D0
229   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
230   by 0x400853: main (tc13_laog1.c:24)
231
232Required order was established by acquisition of lock at 0x7FF0006D0
233   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
234   by 0x40076D: main (tc13_laog1.c:17)
235
236 followed by a later acquisition of lock at 0x7FF0006A0
237   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
238   by 0x40079B: main (tc13_laog1.c:18)
239]]></programlisting>
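
<para>A report of this shape can be provoked even from a single thread,
since Helgrind only needs to observe the two orderings, not an actual
deadlock.  The following sketch is loosely modelled on what the trace
suggests and is not the source of the test case; the names
<computeroutput>mx_a</computeroutput>,
<computeroutput>mx_b</computeroutput> and
<computeroutput>inconsistent_order</computeroutput> are invented
here.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t mx_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mx_b = PTHREAD_MUTEX_INITIALIZER;

/* Acquires two locks in one order, then in the opposite order.
   Returns 0 if every pthread call succeeded. */
int inconsistent_order(void) {
    int rc = 0;

    /* Establishes the order "mx_a before mx_b". */
    rc |= pthread_mutex_lock(&mx_a);
    rc |= pthread_mutex_lock(&mx_b);
    rc |= pthread_mutex_unlock(&mx_b);
    rc |= pthread_mutex_unlock(&mx_a);

    /* Violates it: mx_b is now taken before mx_a. */
    rc |= pthread_mutex_lock(&mx_b);
    rc |= pthread_mutex_lock(&mx_a);
    rc |= pthread_mutex_unlock(&mx_a);
    rc |= pthread_mutex_unlock(&mx_b);

    return rc;
}
]]></programlisting>

<para>The function never deadlocks by itself, because only one thread is
involved.  The point is that two threads running the two halves
concurrently could.</para>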
240
241<para>When there are more than two locks in the cycle, the error is
242equally serious.  However, at present Helgrind does not show the locks
243involved, sometimes because that information is not available, but
244also so as to avoid flooding you with information.  For instance, here
245is a cycle of five locks from a naive
246implementation of the famous Dining Philosophers problem
247(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
248In this case Helgrind has detected that all 5 philosophers could
249simultaneously pick up their left fork and then deadlock whilst
250waiting to pick up their right forks.</para>
251
252<programlisting><![CDATA[
253Thread #6: lock order "0x6010C0 before 0x601160" violated
254
255Observed (incorrect) order is: acquisition of lock at 0x601160
256   (stack unavailable)
257
258 followed by a later acquisition of lock at 0x6010C0
259   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
260   by 0x4007DE: dine (tc14_laog_dinphils.c:19)
261   by 0x4C2CBE7: mythread_wrapper (hg_intercepts.c:219)
262   by 0x4E369C9: start_thread (pthread_create.c:300)
263]]></programlisting>
264
265</sect1>
266
267
268
269
270<sect1 id="hg-manual.data-races" xreflabel="Data Races">
271<title>Detected errors: Data Races</title>
272
273<para>A data race happens, or could happen, when two threads access a
274shared memory location without using suitable locks or other
275synchronisation to ensure single-threaded access.  Such missing
276locking can cause obscure timing-dependent bugs.  Ensuring programs
277are race-free is one of the central difficulties of threaded
278programming.</para>
279
280<para>Reliably detecting races is a difficult problem, and most
281of Helgrind's internals are devoted to dealing with it.
282We begin with a simple example.</para>
283
284
285<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
286<title>A Simple Data Race</title>
287
288<para>About the simplest possible example of a race is as follows.  In
289this program, it is impossible to know what the value
290of <computeroutput>var</computeroutput> is at the end of the program.
291Is it 2?  Or 1?</para>
292
293<programlisting><![CDATA[
294#include <pthread.h>
295
296int var = 0;
297
298void* child_fn ( void* arg ) {
299   var++; /* Unprotected relative to parent */ /* this is line 6 */
300   return NULL;
301}
302
303int main ( void ) {
304   pthread_t child;
305   pthread_create(&child, NULL, child_fn, NULL);
306   var++; /* Unprotected relative to child */ /* this is line 13 */
307   pthread_join(child, NULL);
308   return 0;
309}
310]]></programlisting>
311
312<para>The problem is there is nothing to
313stop <varname>var</varname> being updated simultaneously
314by both threads.  A correct program would
315protect <varname>var</varname> with a lock of type
316<type>pthread_mutex_t</type>, which is acquired
317before each access and released afterwards.  Helgrind's output for
318this program is:</para>
319
320<programlisting><![CDATA[
321Thread #1 is the program's root thread
322
323Thread #2 was created
324   at 0x511C08E: clone (in /lib64/libc-2.8.so)
325   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
326   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
327   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
328   by 0x400605: main (simple_race.c:12)
329
330Possible data race during read of size 4 at 0x601038 by thread #1
331Locks held: none
332   at 0x400606: main (simple_race.c:13)
333
334This conflicts with a previous write of size 4 by thread #2
335Locks held: none
336   at 0x4005DC: child_fn (simple_race.c:6)
337   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
338   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
339   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
340
341Location 0x601038 is 0 bytes inside global var "var"
342declared at simple_race.c:3
343]]></programlisting>
344
345<para>This is quite a lot of detail for an apparently simple error.
346The main error message begins at the text "<computeroutput>Possible
347data race during read</computeroutput>".  It says there is a race as
348a result of a read of size 4 (bytes), at 0x601038, which is the address
349of <computeroutput>var</computeroutput>, happening in function
350<computeroutput>main</computeroutput> at line 13 in the program.</para>
351
352<para>Two important parts of the message are:</para>
353
354<itemizedlist>
355 <listitem>
356  <para>Helgrind shows two stack traces for the error, not one.  By
357   definition, a race involves two different threads accessing the
358   same location in such a way that the result depends on the relative
359   speeds of the two threads.</para>
360  <para>
361   The first stack trace follows the text "<computeroutput>Possible
362   data race during read of size 4 ...</computeroutput>" and the
363   second trace follows the text "<computeroutput>This conflicts with
364   a previous write of size 4 ...</computeroutput>".  Helgrind is
365   usually able to show both accesses involved in a race.  At least
366   one of these will be a write (since two concurrent, unsynchronised
367   reads are harmless), and they will of course be from different
368   threads.</para>
369  <para>By examining your program at the two locations, you should be
370   able to get at least some idea of what the root cause of the
371   problem is.  For each location, Helgrind shows the set of locks
372   held at the time of the access.  This often makes it clear which
373   thread, if any, failed to take a required lock.  In this example
374   neither thread holds a lock during the access.</para>
375 </listitem>
376 <listitem>
377  <para>For races which occur on global or stack variables, Helgrind
378   tries to identify the name and defining point of the variable.
379   Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
380   global var "var" declared at simple_race.c:3</computeroutput>".</para>
381  <para>Showing names of stack and global variables carries no
382   run-time overhead once Helgrind has your program up and running.
383   However, it does require Helgrind to spend considerable extra time
384   and memory at program startup to read the relevant debug info.
385   Hence this facility is disabled by default.  To enable it, you need
386   to give the <option>--read-var-info=yes</option> option to
387   Helgrind.</para>
388 </listitem>
389</itemizedlist>
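
<para>For comparison, here is one way the simple race could be
repaired, following the suggestion above of guarding
<varname>var</varname> with a <computeroutput>pthread_mutex_t</computeroutput>.
This is a sketch rather than part of the Valgrind test suite; the names
<computeroutput>var_mx</computeroutput> and
<computeroutput>run_fixed</computeroutput> are invented here.</para>

<programlisting><![CDATA[
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER;

static void *child_fn(void *arg) {
    (void)arg;
    pthread_mutex_lock(&var_mx);
    var++;                          /* protected relative to parent */
    pthread_mutex_unlock(&var_mx);
    return NULL;
}

/* Runs parent and child increments under the lock and returns the
   final value of var, or -1 if the child could not be created. */
int run_fixed(void) {
    pthread_t child;

    var = 0;                        /* no other thread exists yet */
    if (pthread_create(&child, NULL, child_fn, NULL) != 0)
        return -1;

    pthread_mutex_lock(&var_mx);
    var++;                          /* protected relative to child */
    pthread_mutex_unlock(&var_mx);

    pthread_join(child, NULL);
    return var;                     /* read is ordered after the join */
}
]]></programlisting>

<para>Both increments now happen with the lock held, so the final value
is always 2, and a run under <option>--tool=helgrind</option> should no
longer report a race on <varname>var</varname>.</para>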
390
391<para>The following section explains Helgrind's race detection
392algorithm in more detail.</para>
393
394</sect2>
395
396
397
398<sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
399<title>Helgrind's Race Detection Algorithm</title>
400
401<para>Most programmers think about threaded programming in terms of
402the basic functionality provided by the threading library (POSIX
403Pthreads): thread creation, thread joining, locks, condition
404variables, semaphores and barriers.</para>
405
406<para>The effect of using these functions is to impose
407constraints upon the order in which memory accesses can
408happen.  This implied ordering is generally known as the
409"happens-before relation".  Once you understand the happens-before
410relation, it is easy to see how Helgrind finds races in your code.
411Fortunately, the happens-before relation is itself easy to understand,
412and is by itself a useful tool for reasoning about the behaviour of
413parallel programs.  We now introduce it using a simple example.</para>
414
415<para>Consider first the following buggy program:</para>
416
417<programlisting><![CDATA[
418Parent thread:                         Child thread:
419
420int var;
421
422// create child thread
423pthread_create(...)
424var = 20;                              var = 10;
425                                       exit
426
427// wait for child
428pthread_join(...)
429printf("%d\n", var);
430]]></programlisting>
431
432<para>The parent thread creates a child.  Both then write different
433values to some variable <computeroutput>var</computeroutput>, and the
434parent then waits for the child to exit.</para>
435
436<para>What is the value of <computeroutput>var</computeroutput> at the
437end of the program, 10 or 20?  We don't know.  The program is
438considered buggy (it has a race) because the final value
439of <computeroutput>var</computeroutput> depends on the relative rates
440of progress of the parent and child threads.  If the parent is fast
441and the child is slow, then the child's assignment may happen later,
442so the final value will be 10; and vice versa if the child is faster
443than the parent.</para>
444
445<para>The relative rates of progress of parent vs child are not something
446the programmer can control, and will often change from run to run.
447They depend on factors such as the load on the machine, what else is
448running, and the kernel's scheduling strategy.</para>
449
450<para>The obvious fix is to use a lock to
451protect <computeroutput>var</computeroutput>.  It is however
452instructive to consider a somewhat more abstract solution, which is to
453send a message from one thread to the other:</para>
454
455<programlisting><![CDATA[
456Parent thread:                         Child thread:
457
458int var;
459
460// create child thread
461pthread_create(...)
462var = 20;
463// send message to child
464                                       // wait for message to arrive
465                                       var = 10;
466                                       exit
467
468// wait for child
469pthread_join(...)
470printf("%d\n", var);
471]]></programlisting>
472
473<para>Now the program reliably prints "10", regardless of the speed of
474the threads.  Why?  Because the child's assignment cannot happen until
475after it receives the message.  And the message is not sent until
476after the parent's assignment is done.</para>
477
478<para>The message transmission creates a "happens-before" dependency
479between the two assignments: <computeroutput>var = 20;</computeroutput>
480must now happen-before <computeroutput>var = 10;</computeroutput>.
481And so there is no longer a race
482on <computeroutput>var</computeroutput>.
483</para>
484
485<para>Note that it's not significant that the parent sends a message
486to the child.  Sending a message from the child (after its assignment)
487to the parent (before its assignment) would also fix the problem, causing
488the program to reliably print "20".</para>
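
<para>The message-passing version can be made concrete in several ways.
The sketch below uses a POSIX semaphore as the "message", which is one
possible choice and not the only one; the names
<computeroutput>msg</computeroutput> and
<computeroutput>run_message_demo</computeroutput> are invented
here.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int var;
static sem_t msg;                   /* stands in for the "message" */

static void *child_fn(void *arg) {
    (void)arg;
    /* Wait for the message to arrive, retrying if interrupted. */
    while (sem_wait(&msg) == -1 && errno == EINTR)
        continue;
    var = 10;
    return NULL;
}

/* Returns the final value of var, or -1 on setup failure. */
int run_message_demo(void) {
    pthread_t child;

    if (sem_init(&msg, 0, 0) != 0)
        return -1;
    if (pthread_create(&child, NULL, child_fn, NULL) != 0) {
        sem_destroy(&msg);
        return -1;
    }

    var = 20;
    sem_post(&msg);                 /* send message to child */

    pthread_join(child, NULL);      /* wait for child */
    sem_destroy(&msg);
    return var;
}
]]></programlisting>

<para>The post happens after the parent's assignment and the wait
completes before the child's assignment, so the function returns 10
regardless of how the threads are scheduled.</para>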
489
490<para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
491accesses to memory locations.  If a location -- in this example,
492<computeroutput>var</computeroutput> --
493is accessed by two different threads, Helgrind checks to see if the
494two accesses are ordered by the happens-before relation.  If so,
495that's fine; if not, it reports a race.</para>
496
497<para>It is important to understand that the happens-before relation
498creates only a partial ordering, not a total ordering.  An example of
499a total ordering is comparison of numbers: for any two numbers
500<computeroutput>x</computeroutput> and
501<computeroutput>y</computeroutput>,
502<computeroutput>x</computeroutput> is either less than, equal to, or
503greater than
504<computeroutput>y</computeroutput>.  A partial ordering is like a
505total ordering, but it can also express the concept that two elements
506are neither equal, less nor greater, but merely unordered with respect
507to each other.</para>
508
509<para>In the fixed example above, we say that
510<computeroutput>var = 20;</computeroutput> "happens-before"
511<computeroutput>var = 10;</computeroutput>.  But in the original
512version, they are unordered: we cannot say that either happens-before
513the other.</para>
514
515<para>What does it mean to say that two accesses from different
516threads are ordered by the happens-before relation?  It means that
517there is some chain of inter-thread synchronisation operations which
518causes those accesses to happen in a particular order, irrespective of
519the actual rates of progress of the individual threads.  This is a
520required property for a reliable threaded program, which is why
521Helgrind checks for it.</para>
522
523<para>The happens-before relations created by standard threading
524primitives are as follows:</para>
525
526<itemizedlist>
527 <listitem><para>When a mutex is unlocked by thread T1 and later (or
528  immediately) locked by thread T2, then the memory accesses in T1
529  prior to the unlock must happen-before those in T2 after it acquires
530  the lock.</para>
531 </listitem>
532 <listitem><para>The same idea applies to reader-writer locks,
533  although with some complication so as to allow correct handling of
534  reads vs writes.</para>
535 </listitem>
536 <listitem><para>When a condition variable (CV) is signalled on by
537  thread T1 and some other thread T2 is thereby released from a wait
538  on the same CV, then the memory accesses in T1 prior to the
539  signalling must happen-before those in T2 after it returns from the
540  wait.  If no thread was waiting on the CV then there is no
541  effect.</para>
542 </listitem>
543 <listitem><para>If instead T1 broadcasts on a CV, then all of the
544  waiting threads, rather than just one of them, acquire a
545  happens-before dependency on the broadcasting thread at the point it
546  did the broadcast.</para>
547 </listitem>
548 <listitem><para>A thread T2 that continues after completing
549  <function>sem_wait</function> on a semaphore that thread T1 posts on
550  acquires a happens-before dependence on the posting thread, a bit like
551  the dependencies caused by mutex unlock-lock pairs.  However, since a
552  semaphore can be posted on many times, it is unspecified from which of
553  the post calls the wait call gets its happens-before dependency.</para>
554 </listitem>
555 <listitem><para>For a group of threads T1 .. Tn which arrive at a
556  barrier and then move on, each thread after the call has a
557  happens-after dependency from all threads before the
558  barrier.</para>
559 </listitem>
560 <listitem><para>A newly-created child thread acquires an initial
561  happens-after dependency on the point where its parent created it.
562  That is, all memory accesses performed by the parent prior to
563  creating the child are regarded as happening-before all the accesses
564  of the child.</para>
565 </listitem>
566 <listitem><para>Similarly, when an exiting thread is reaped via a
567  call to <function>pthread_join</function>, once the call returns, the
568  reaping thread acquires a happens-after dependency relative to all memory
569  accesses made by the exiting thread.</para>
570 </listitem>
571</itemizedlist>
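
<para>To make the condition variable case concrete, the sketch below
passes a value from one thread to another using a mutex, a condition
variable and a predicate.  The names
<computeroutput>cv_ready</computeroutput>,
<computeroutput>cv_data</computeroutput> and
<computeroutput>run_cv_demo</computeroutput> are invented here.  Note
that if the signal is sent before the waiter starts waiting, the
condition variable contributes no dependency, as described above; in
that case the ordering comes from the mutex unlock-lock pair
instead.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int cv_ready = 0;            /* predicate guarded by mx */
static int cv_data  = 0;            /* value being handed over */

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mx);
    cv_data  = 42;
    cv_ready = 1;
    pthread_cond_signal(&cv);       /* wakes the waiter, if there is one */
    pthread_mutex_unlock(&mx);
    return NULL;
}

/* Returns the value received from the producer, or -1 on failure. */
int run_cv_demo(void) {
    pthread_t t;
    int seen;

    pthread_mutex_lock(&mx);
    cv_ready = 0;
    cv_data  = 0;
    pthread_mutex_unlock(&mx);

    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&mx);
    while (!cv_ready)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mx);
    seen = cv_data;
    pthread_mutex_unlock(&mx);

    pthread_join(t, NULL);
    return seen;
}
]]></programlisting>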
572
573<para>In summary: Helgrind intercepts the above listed events, and builds a
574directed acyclic graph representing the collective happens-before
575dependencies.  It also monitors all memory accesses.</para>
576
577<para>If a location is accessed by two different threads, but Helgrind
578cannot find any path through the happens-before graph from one access
579to the other, then it reports a race.</para>
580
581<para>There are a couple of caveats:</para>
582
583<itemizedlist>
584 <listitem><para>Helgrind doesn't check for a race in the case where
585  both accesses are reads.  That would be silly, since concurrent
586  reads are harmless.</para>
587 </listitem>
588 <listitem><para>Two accesses are considered to be ordered by the
589  happens-before dependency even through arbitrarily long chains of
590  synchronisation events.  For example, if T1 accesses some location
591  L, and then signals T2 with <function>pthread_cond_signal</function>,
592  which later signals T3 in the same way, which then accesses L, then
593  a suitable happens-before dependency exists between the first and second
594  accesses, even though it involves two different inter-thread
595  synchronisation events.</para>
596 </listitem>
597</itemizedlist>
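
<para>The conceptual check described above can be sketched as a
reachability test on a small graph.  This is only an illustration of the
idea as stated in this section, not a description of how Helgrind is
actually implemented; every name in it
(<computeroutput>hb_graph</computeroutput>,
<computeroutput>hb_is_race</computeroutput> and so on) is invented here.
The demonstration function builds the T1, T2, T3 signalling chain from
the second caveat, plus one access that has no synchronisation with
anything.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 16

/* edge[a][b] is true when event a directly happens-before event b. */
typedef struct {
    bool edge[MAX_EVENTS][MAX_EVENTS];
} hb_graph;

static void hb_init(hb_graph *g) { memset(g, 0, sizeof *g); }

static void hb_add_edge(hb_graph *g, int from, int to) {
    g->edge[from][to] = true;
}

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool hb_reaches(const hb_graph *g, int from, int to, bool *visited) {
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_EVENTS; next++) {
        if (g->edge[from][next] && !visited[next] &&
            hb_reaches(g, next, to, visited))
            return true;
    }
    return false;
}

/* Two accesses race if at least one of them is a write and neither is
   reachable from the other. */
bool hb_is_race(const hb_graph *g, int a, bool a_is_write,
                int b, bool b_is_write) {
    bool seen[MAX_EVENTS];

    if (!a_is_write && !b_is_write)
        return false;               /* concurrent reads are harmless */
    memset(seen, 0, sizeof seen);
    if (hb_reaches(g, a, b, seen))
        return false;
    memset(seen, 0, sizeof seen);
    if (hb_reaches(g, b, a, seen))
        return false;
    return true;
}

/* Returns a bit mask: bit 0 for the chained accesses, bit 1 for two
   unordered writes, bit 2 for two unordered reads. */
int hb_chain_demo(void) {
    enum { T1_ACCESS, T1_SIGNAL, T2_WAKE, T2_SIGNAL,
           T3_WAKE, T3_ACCESS, UNRELATED };
    hb_graph g;
    int result = 0;

    hb_init(&g);
    hb_add_edge(&g, T1_ACCESS, T1_SIGNAL);  /* program order in T1 */
    hb_add_edge(&g, T1_SIGNAL, T2_WAKE);    /* T1 signals T2 */
    hb_add_edge(&g, T2_WAKE,   T2_SIGNAL);  /* program order in T2 */
    hb_add_edge(&g, T2_SIGNAL, T3_WAKE);    /* T2 signals T3 */
    hb_add_edge(&g, T3_WAKE,   T3_ACCESS);  /* program order in T3 */

    if (hb_is_race(&g, T1_ACCESS, true, T3_ACCESS, true))  result |= 1;
    if (hb_is_race(&g, T1_ACCESS, true, UNRELATED, true))  result |= 2;
    if (hb_is_race(&g, T1_ACCESS, false, UNRELATED, false)) result |= 4;
    return result;
}
]]></programlisting>

<para>Only the pair of unordered writes is flagged: the chained accesses
are ordered through two synchronisation events, and the pair of reads is
skipped outright.</para>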
598
599</sect2>
600
601
602
603<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
604<title>Interpreting Race Error Messages</title>
605
606<para>Helgrind's race detection algorithm collects a lot of
607information, and tries to present it in a helpful way when a race is
608detected.  Here's an example:</para>
609
610<programlisting><![CDATA[
611Thread #2 was created
612   at 0x511C08E: clone (in /lib64/libc-2.8.so)
613   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
614   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
615   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
616   by 0x4008F2: main (tc21_pthonce.c:86)
617
618Thread #3 was created
619   at 0x511C08E: clone (in /lib64/libc-2.8.so)
620   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
621   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
622   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
623   by 0x4008F2: main (tc21_pthonce.c:86)
624
625Possible data race during read of size 4 at 0x601070 by thread #3
626Locks held: none
627   at 0x40087A: child (tc21_pthonce.c:74)
628   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
629   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
630   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
631
632This conflicts with a previous write of size 4 by thread #2
633Locks held: none
634   at 0x400883: child (tc21_pthonce.c:74)
635   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
636   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
637   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
638
639Location 0x601070 is 0 bytes inside local var "unprotected2"
640declared at tc21_pthonce.c:51, in frame #0 of thread 3
641]]></programlisting>
642
643<para>Helgrind first announces the creation points of any threads
644referenced in the error message.  This is so it can speak concisely
645about threads without repeatedly printing their creation point call
646stacks.  Each thread is only ever announced once, the first time it
647appears in any Helgrind error message.</para>
648
649<para>The main error message begins at the text
650"<computeroutput>Possible data race during read</computeroutput>".  At
651the start is information you would expect to see -- address and size
652of the racing access, whether a read or a write, and the call stack at
653the point it was detected.</para>
654
655<para>A second call stack is presented starting at the text
656"<computeroutput>This conflicts with a previous
657write</computeroutput>".  This shows a previous access which also
658accessed the stated address, and which is believed to be racing
659against the access in the first call stack. Note that this second
660call stack is capped at 8 entries to limit
661memory usage.</para>
662
663<para>Finally, Helgrind may attempt to give a description of the
664raced-on address in source level terms.  In this example, it
665identifies it as a local variable, shows its name, declaration point,
666and in which frame (of the first call stack) it lives.  Note that this
667information is only shown when <option>--read-var-info=yes</option>
668is specified on the command line.  That's because reading the DWARF3
669debug information in enough detail to capture variable type and
670location information makes Helgrind much slower at startup, and also
671requires considerable amounts of memory, for large programs.
672</para>
673
674<para>Once you have your two call stacks, how do you find the root
675cause of the race?</para>
676
677<para>The first thing to do is examine the source locations referred
678to by each call stack.  They should both show an access to the same
679location, or variable.</para>
680
681<para>Now figure out how that location should have been made
682thread-safe:</para>
683
684<itemizedlist>
685 <listitem><para>Perhaps the location was intended to be protected by
686  a mutex?  If so, you need to lock and unlock the mutex at both
687  access points, even if one of the accesses is reported to be a read.
688  Did you perhaps forget the locking at one or other of the accesses?
689  To help you do this, Helgrind shows the set of locks held by each
690  thread at the time it accessed the raced-on location.</para>
691 </listitem>
692 <listitem><para>Alternatively, perhaps you intended to use some
693  other scheme to make it safe, such as signalling on a condition
694  variable.  In all such cases, try to find a synchronisation event
695  (or a chain thereof) which separates the earlier-observed access (as
696  shown in the second call stack) from the later-observed access (as
697  shown in the first call stack).  In other words, try to find
698  evidence that the earlier access "happens-before" the later access.
699  See the previous subsection for an explanation of the happens-before
700  relation.</para>
701  <para>
702  The fact that Helgrind is reporting a race means it did not observe
703  any happens-before relation between the two accesses.  If
704  Helgrind is working correctly, it should also be the case that you
705  cannot find any such relation, even on detailed inspection
706  of the source code.  Hopefully, though, your inspection of the code
707  will show where the missing synchronisation operation(s) should have
708  been.</para>
709 </listitem>
710</itemizedlist>
711
712</sect2>
713
714
715</sect1>
716
717<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
718<title>Hints and Tips for Effective Use of Helgrind</title>
719
720<para>Helgrind can be very helpful in finding and resolving
721threading-related problems.  Like all sophisticated tools, it is most
722effective when you understand how to play to its strengths.</para>
723
724<para>Helgrind will be less effective when you merely throw an
725existing threaded program at it and try to make sense of any reported
726errors.  It will be more effective if you design threaded programs
727from the start in a way that helps Helgrind verify correctness.  The
728same is true for finding memory errors with Memcheck, but applies more
729here, because thread checking is a harder problem.  Consequently it is
730much easier to write a correct program for which Helgrind falsely
731reports (threading) errors than it is to write a correct program for
732which Memcheck falsely reports (memory) errors.</para>
733
734<para>With that in mind, here are some tips, listed most important first,
735for getting reliable results and avoiding false errors.  The first two
736are critical.  Any violations of them will swamp you with huge numbers
737of false data-race errors.</para>
738
739
740<orderedlist>
741
742  <listitem>
743    <para>Make sure your application, and all the libraries it uses,
744    use the POSIX threading primitives.  Helgrind needs to be able to
745    see all events pertaining to thread creation, exit, locking and
746    other synchronisation events.  To do so it intercepts many POSIX
747    pthreads functions.</para>
748
749    <para>Do not roll your own threading primitives (mutexes, etc)
750    from combinations of the Linux futex syscall, atomic counters, etc.
751    These throw Helgrind's internal what's-going-on models
752    way off course and will give bogus results.</para>
753
754    <para>Also, do not reimplement existing POSIX abstractions using
755    other POSIX abstractions.  For example, don't build your own
756    semaphore routines or reader-writer locks from POSIX mutexes and
757    condition variables.  Instead use POSIX reader-writer locks and
758    semaphores directly, since Helgrind supports them directly.</para>
759
760    <para>Helgrind directly supports the following POSIX threading
761    abstractions: mutexes, reader-writer locks, condition variables
762    (but see below), semaphores and barriers.  Currently spinlocks
763    are not supported, although they could be in future.</para>
764
765    <para>At the time of writing, the following popular Linux packages
766    are known to implement their own threading primitives:</para>
767
768    <itemizedlist>
769     <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
770      only uses POSIX pthreads primitives.  Unfortunately Qt 4.X
771      has its own implementation of mutexes (QMutex) and thread reaping.
772      Helgrind 3.4.x contains direct support
773      for Qt 4.X threading, which is experimental but is believed to
774      work fairly well.  A side effect of supporting Qt 4 directly is
775      that Helgrind can be used to debug KDE4 applications.  As this
776      is an experimental feature, we would particularly appreciate
777      feedback from folks who have used Helgrind to successfully debug
778      Qt 4 and/or KDE4 applications.</para>
779     </listitem>
780     <listitem><para>Runtime support library for GNU OpenMP (part of
781      GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
782      library (<filename>libgomp.so</filename>) constructs its own
783      synchronisation primitives using combinations of atomic memory
      instructions and the futex syscall, which causes total chaos in
      Helgrind since it cannot "see" those.</para>
786     <para>Fortunately, this can be solved using a configuration-time
787      option (for GCC).  Rebuild GCC from source, and configure using
788      <varname>--disable-linux-futex</varname>.
789      This makes libgomp.so use the standard
790      POSIX threading primitives instead.  Note that this was tested
791      using GCC 4.2.3 and has not been re-tested using more recent GCC
792      versions.  We would appreciate hearing about any successes or
793      failures with more recent versions.</para>
794     </listitem>
795    </itemizedlist>
796
797    <para>If you must implement your own threading primitives, there
798      are a set of client request macros
799      in <computeroutput>helgrind.h</computeroutput> to help you
800      describe your primitives to Helgrind.  You should be able to
801      mark up mutexes, condition variables, etc, without difficulty.
802    </para>
803    <para>
804      It is also possible to mark up the effects of thread-safe
805      reference counting using the
806      <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
807      <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
      <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
809      macros.  Thread-safe reference counting using an atomically
810      incremented/decremented refcount variable causes Helgrind
811      problems because a one-to-zero transition of the reference count
812      means the accessing thread has exclusive ownership of the
813      associated resource (normally, a C++ object) and can therefore
814      access it (normally, to run its destructor) without locking.
815      Helgrind doesn't understand this, and markup is essential to
816      avoid false positives.
817    </para>
818
819    <para>
820      Here are recommended guidelines for marking up thread safe
821      reference counting in C++.  You only need to mark up your
822      release methods -- the ones which decrement the reference count.
823      Given a class like this:
824    </para>
825
826<programlisting><![CDATA[
827class MyClass {
828   unsigned int mRefCount;
829
830   void Release ( void ) {
831      unsigned int newCount = atomic_decrement(&mRefCount);
832      if (newCount == 0) {
833         delete this;
834      }
835   }
};
837]]></programlisting>
838
839   <para>
840     the release method should be marked up as follows:
841   </para>
842
843<programlisting><![CDATA[
844   void Release ( void ) {
845      unsigned int newCount = atomic_decrement(&mRefCount);
846      if (newCount == 0) {
847         ANNOTATE_HAPPENS_AFTER(&mRefCount);
848         ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
849         delete this;
850      } else {
851         ANNOTATE_HAPPENS_BEFORE(&mRefCount);
852      }
853   }
854]]></programlisting>
855
856    <para>
857      There are a number of complex, mostly-theoretical objections to
858      this scheme.  From a theoretical standpoint it appears to be
859      impossible to devise a markup scheme which is completely correct
860      in the sense of guaranteeing to remove all false races.  The
861      proposed scheme however works well in practice.
862    </para>
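
    <para>
      For reference, a self-contained version of the marked-up class
      might look like the sketch below.  It rests on several
      assumptions: C++11 <computeroutput>std::atomic</computeroutput>
      stands in for the unspecified
      <computeroutput>atomic_decrement</computeroutput>, the header is
      assumed to be installed as
      <filename>valgrind/helgrind.h</filename>, and the macros fall
      back to no-ops when it is not found.  The names
      <computeroutput>AddRef</computeroutput>,
      <computeroutput>gDestroyed</computeroutput> and
      <computeroutput>run_refcount_demo</computeroutput> are
      hypothetical.
    </para>

<programlisting><![CDATA[
```cpp
#include <atomic>
#include <cassert>

// Assumption: helgrind.h is installed under valgrind/.
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
// Assumption: when the header is unavailable, compile the markup away.
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) do { } while (0)
#endif
#ifndef ANNOTATE_HAPPENS_AFTER
#  define ANNOTATE_HAPPENS_AFTER(obj) do { } while (0)
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE_FORGET_ALL
#  define ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(obj) do { } while (0)
#endif

// Hypothetical counter so the example can observe destruction.
static int gDestroyed = 0;

class MyClass {
   std::atomic<unsigned int> mRefCount;
   ~MyClass ( ) { gDestroyed++; }   // private: only Release may delete

public:
   MyClass ( ) : mRefCount(1) { }

   void AddRef ( void ) {
      mRefCount.fetch_add(1);
   }

   void Release ( void ) {
      // fetch_sub returns the old value, so subtract one for the new count.
      unsigned int newCount = mRefCount.fetch_sub(1) - 1;
      if (newCount == 0) {
         ANNOTATE_HAPPENS_AFTER(&mRefCount);
         ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
         delete this;
      } else {
         ANNOTATE_HAPPENS_BEFORE(&mRefCount);
      }
   }
};

// Takes a second reference, drops both, and reports how many
// objects were destroyed along the way.
int run_refcount_demo ( void ) {
   gDestroyed = 0;
   MyClass* obj = new MyClass();   // count is 1
   obj->AddRef();                  // count is 2
   obj->Release();                 // count is 1, object survives
   assert(gDestroyed == 0);
   obj->Release();                 // count is 0, object deleted
   return gDestroyed;
}
```
]]></programlisting>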
863
864  </listitem>
865
866  <listitem>
    <para>Avoid memory recycling.  If you can't avoid it, you must
    tell Helgrind what is going on via the
869    <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
870    <computeroutput>helgrind.h</computeroutput>).</para>
871
872    <para>Helgrind is aware of standard heap memory allocation and
873    deallocation that occurs via
874    <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
875    and from entry and exit of stack frames.  In particular, when memory is
876    deallocated via <function>free</function>, <function>delete</function>,
877    or function exit, Helgrind considers that memory clean, so when it is
878    eventually reallocated, its history is irrelevant.</para>
879
880    <para>However, it is common practice to implement memory recycling
881    schemes.  In these, memory to be freed is not handed to
882    <function>free</function>/<function>delete</function>, but instead put
883    into a pool of free buffers to be handed out again as required.  The
884    problem is that Helgrind has no
885    way to know that such memory is logically no longer in use, and
886    its history is irrelevant.  Hence you must make that explicit,
887    using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
888    to specify the relevant address ranges.  It's easiest to put these
889    requests into the pool manager code, and use them either when memory is
890    returned to the pool, or is allocated from it.</para>
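
    <para>A pool manager that issues the request when memory is
    returned to the pool could be sketched as follows.  The
    <computeroutput>BufferPool</computeroutput> class, its members and
    <computeroutput>run_pool_demo</computeroutput> are hypothetical, the
    header location <filename>valgrind/helgrind.h</filename> is an
    assumption, and the request falls back to a no-op when the header
    is not found.</para>

<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumption: helgrind.h is installed under valgrind/.
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
// Assumption: when the header is unavailable, compile the request away.
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) do { } while (0)
#endif

// Hypothetical pool of fixed-size buffers.  A pool shared between
// threads would additionally need a lock around mFree.
class BufferPool {
   std::vector<void*> mFree;

public:
   static constexpr std::size_t kBufSize = 256;

   ~BufferPool ( ) {
      for (void* p : mFree) std::free(p);
   }

   void* Get ( void ) {
      if (mFree.empty()) return std::malloc(kBufSize);
      void* p = mFree.back();
      mFree.pop_back();
      return p;
   }

   void Put ( void* p ) {
      // Tell Helgrind that the access history of this range is irrelevant.
      VALGRIND_HG_CLEAN_MEMORY(p, kBufSize);
      mFree.push_back(p);
   }
};

// Returns true if a buffer handed back to the pool is handed out again.
bool run_pool_demo ( void ) {
   BufferPool pool;
   void* a = pool.Get();
   assert(a != nullptr);
   pool.Put(a);
   void* b = pool.Get();   // same storage, now with a clean history
   bool recycled = (a == b);
   pool.Put(b);
   return recycled;
}
```
]]></programlisting>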
891  </listitem>
892
893  <listitem>
894    <para>Avoid POSIX condition variables.  If you can, use POSIX
895    semaphores (<function>sem_t</function>, <function>sem_post</function>,
896    <function>sem_wait</function>) to do inter-thread event signalling.
897    Semaphores with an initial value of zero are particularly useful for
898    this.</para>
899
900    <para>Helgrind only partially correctly handles POSIX condition
901    variables.  This is because Helgrind can see inter-thread
902    dependencies between a <function>pthread_cond_wait</function> call and a
903    <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
904    call only if the waiting thread actually gets to the rendezvous first
905    (so that it actually calls
906    <function>pthread_cond_wait</function>).  It can't see dependencies
907    between the threads if the signaller arrives first.  In the latter case,
908    POSIX guidelines imply that the associated boolean condition still
909    provides an inter-thread synchronisation event, but one which is
910    invisible to Helgrind.</para>
911
912    <para>The result of Helgrind missing some inter-thread
913    synchronisation events is to cause it to report false positives.
914    </para>
915
916    <para>The root cause of this synchronisation lossage is
917    particularly hard to understand, so an example is helpful.  It was
918    discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
919    in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
920    canonical POSIX-recommended usage scheme for condition variables
921    is as follows:</para>
922
923<programlisting><![CDATA[
924b   is a Boolean condition, which is False most of the time
925cv  is a condition variable
926mx  is its associated mutex
927
928Signaller:                             Waiter:
929
930lock(mx)                               lock(mx)
931b = True                               while (b == False)
932signal(cv)                                wait(cv,mx)
933unlock(mx)                             unlock(mx)
934]]></programlisting>
935
936    <para>Assume <computeroutput>b</computeroutput> is False most of
937    the time.  If the waiter arrives at the rendezvous first, it
938    enters its while-loop, waits for the signaller to signal, and
939    eventually proceeds.  Helgrind sees the signal, notes the
940    dependency, and all is well.</para>
941
942    <para>If the signaller arrives
943    first, <computeroutput>b</computeroutput> is set to true, and the
944    signal disappears into nowhere.  When the waiter later arrives, it
945    does not enter its while-loop and simply carries on.  But even in
946    this case, the waiter code following the while-loop cannot execute
947    until the signaller sets <computeroutput>b</computeroutput> to
948    True.  Hence there is still the same inter-thread dependency, but
949    this time it is through an arbitrary in-memory condition, and
950    Helgrind cannot see it.</para>
951
952    <para>By comparison, Helgrind's detection of inter-thread
953    dependencies caused by semaphore operations is believed to be
954    exactly correct.</para>
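
    <para>A semaphore-based version of the same rendezvous might look
    like the sketch below.  The <computeroutput>Mailbox</computeroutput>
    structure and the functions
    <computeroutput>producer</computeroutput> and
    <computeroutput>run_semaphore_demo</computeroutput> are hypothetical
    names used only for illustration.  Because the post is recorded in
    the semaphore's count, the waiter proceeds whichever thread arrives
    first.</para>

<programlisting><![CDATA[
```cpp
#include <pthread.h>
#include <semaphore.h>
#include <cassert>
#include <cerrno>

// Hypothetical message passed from the producer to the consumer.
struct Mailbox {
   sem_t ready;     // initial value zero: nothing posted yet
   int   payload;
};

static void* producer ( void* arg ) {
   Mailbox* m = static_cast<Mailbox*>(arg);
   m->payload = 42;        // write the data first ...
   sem_post(&m->ready);    // ... then signal the waiter
   return nullptr;
}

// Starts the producer, waits on the semaphore, and returns the payload.
int run_semaphore_demo ( void ) {
   Mailbox m;
   m.payload = 0;
   int rc = sem_init(&m.ready, 0, 0);
   assert(rc == 0);

   pthread_t t;
   rc = pthread_create(&t, nullptr, producer, &m);
   assert(rc == 0);

   // Retry if the wait is interrupted by a signal handler.
   do {
      rc = sem_wait(&m.ready);
   } while (rc != 0 && errno == EINTR);
   assert(rc == 0);

   int got = m.payload;    // ordered after the producer's write
   pthread_join(t, nullptr);
   sem_destroy(&m.ready);
   return got;
}
```
]]></programlisting>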
955
956    <para>As far as I know, a solution to this problem that does not
957    require source-level annotation of condition-variable wait loops
958    is beyond the current state of the art.</para>
959  </listitem>
960
961  <listitem>
962    <para>Make sure you are using a supported Linux distribution.  At
963    present, Helgrind only properly supports glibc-2.3 or later.  This
964    in turn means we only support glibc's NPTL threading
965    implementation.  The old LinuxThreads implementation is not
966    supported.</para>
967  </listitem>
968
969  <listitem>
970    <para>Round up all finished threads using
971    <function>pthread_join</function>.  Avoid
972    detaching threads: don't create threads in the detached state, and
973    don't call <function>pthread_detach</function> on existing threads.</para>
974
975    <para>Using <function>pthread_join</function> to round up finished
976    threads provides a clear synchronisation point that both Helgrind and
977    programmers can see.  If you don't call
978    <function>pthread_join</function> on a thread, Helgrind has no way to
979    know when it finishes, relative to any
980    significant synchronisation points for other threads in the program.  So
981    it assumes that the thread lingers indefinitely and can potentially
982    interfere indefinitely with the memory state of the program.  It
983    has every right to assume that -- after all, it might really be
984    the case that, for scheduling reasons, the exiting thread did run
985    very slowly in the last stages of its life.</para>
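
    <para>A minimal sketch of this pattern follows: every thread is
    created joinable (the default) and is joined before its result is
    read.  The names <computeroutput>square</computeroutput> and
    <computeroutput>run_join_demo</computeroutput> are hypothetical.</para>

<programlisting><![CDATA[
```cpp
#include <pthread.h>
#include <cassert>

// Each worker squares the integer it was given, in place.
static void* square ( void* arg ) {
   int* slot = static_cast<int*>(arg);
   *slot = (*slot) * (*slot);
   return nullptr;
}

// Creates four joinable threads and joins each one before reading
// the slot it wrote, then returns the sum of the results.
int run_join_demo ( void ) {
   const int kThreads = 4;
   pthread_t tids[kThreads];
   int       slots[kThreads];

   for (int i = 0; i < kThreads; i++) {
      slots[i] = i + 1;
      int rc = pthread_create(&tids[i], nullptr, square, &slots[i]);
      assert(rc == 0);
   }

   int sum = 0;
   for (int i = 0; i < kThreads; i++) {
      int rc = pthread_join(tids[i], nullptr);   // synchronisation point
      assert(rc == 0);
      sum += slots[i];                           // read only after the join
   }
   return sum;   // 1 + 4 + 9 + 16
}
```
]]></programlisting>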
986  </listitem>
987
988  <listitem>
989    <para>Perform thread debugging (with Helgrind) and memory
990    debugging (with Memcheck) together.</para>
991
992    <para>Helgrind tracks the state of memory in detail, and memory
993    management bugs in the application are liable to cause confusion.
994    In extreme cases, applications which do many invalid reads and
995    writes (particularly to freed memory) have been known to crash
996    Helgrind.  So, ideally, you should make your application
997    Memcheck-clean before using Helgrind.</para>
998
999    <para>It may be impossible to make your application Memcheck-clean
1000    unless you first remove threading bugs.  In particular, it may be
1001    difficult to remove all reads and writes to freed memory in
1002    multithreaded C++ destructor sequences at program termination.
1003    So, ideally, you should make your application Helgrind-clean
1004    before using Memcheck.</para>
1005
1006    <para>Since this circularity is obviously unresolvable, at least
1007    bear in mind that Memcheck and Helgrind are to some extent
1008    complementary, and you may need to use them together.</para>
1009  </listitem>
1010
1011  <listitem>
1012    <para>POSIX requires that implementations of standard I/O
1013    (<function>printf</function>, <function>fprintf</function>,
1014    <function>fwrite</function>, <function>fread</function>, etc) are thread
1015    safe.  Unfortunately GNU libc implements this by using internal locking
1016    primitives that Helgrind is unable to intercept.  Consequently Helgrind
1017    generates many false race reports when you use these functions.</para>
1018
1019    <para>Helgrind attempts to hide these errors using the standard
1020    Valgrind error-suppression mechanism.  So, at least for simple
1021    test cases, you don't see any.  Nevertheless, some may slip
1022    through.  Just something to be aware of.</para>
1023  </listitem>
1024
1025  <listitem>
1026    <para>Helgrind's error checks do not work properly inside the
1027    system threading library itself
1028    (<computeroutput>libpthread.so</computeroutput>), and it usually
1029    observes large numbers of (false) errors in there.  Valgrind's
1030    suppression system then filters these out, so you should not see
1031    them.</para>
1032
1033    <para>If you see any race errors reported
1034    where <computeroutput>libpthread.so</computeroutput> or
1035    <computeroutput>ld.so</computeroutput> is the object associated
1036    with the innermost stack frame, please file a bug report at
1037    <ulink url="&vg-url;">&vg-url;</ulink>.
1038    </para>
1039  </listitem>
1040
1041</orderedlist>
1042
1043</sect1>
1044
1045
1046
1047
1048<sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
1049<title>Helgrind Command-line Options</title>
1050
1051<para>The following end-user options are available:</para>
1052
1053<!-- start of xi:include in the manpage -->
1054<variablelist id="hg.opts.list">
1055
1056  <varlistentry id="opt.free-is-write"
1057                xreflabel="--free-is-write">
1058    <term>
1059      <option><![CDATA[--free-is-write=no|yes
1060      [default: no] ]]></option>
1061    </term>
1062    <listitem>
1063      <para>When enabled (not the default), Helgrind treats freeing of
1064        heap memory as if the memory was written immediately before
1065        the free.  This exposes races where memory is referenced by
1066        one thread, and freed by another, but there is no observable
1067        synchronisation event to ensure that the reference happens
1068        before the free.
1069      </para>
1070      <para>This functionality is new in Valgrind 3.7.0, and is
1071        regarded as experimental.  It is not enabled by default
1072        because its interaction with custom memory allocators is not
1073        well understood at present.  User feedback is welcomed.
1074      </para>
1075    </listitem>
1076  </varlistentry>
1077
1078  <varlistentry id="opt.track-lockorders"
1079                xreflabel="--track-lockorders">
1080    <term>
1081      <option><![CDATA[--track-lockorders=no|yes
1082      [default: yes] ]]></option>
1083    </term>
1084    <listitem>
1085      <para>When enabled (the default), Helgrind performs lock order
1086      consistency checking.  For some buggy programs, the large number
1087      of lock order errors reported can become annoying, particularly
1088      if you're only interested in race errors.  You may therefore find
1089      it helpful to disable lock order checking.</para>
1090    </listitem>
1091  </varlistentry>
1092
1093  <varlistentry id="opt.history-level"
1094                xreflabel="--history-level">
1095    <term>
1096      <option><![CDATA[--history-level=none|approx|full
1097      [default: full] ]]></option>
1098    </term>
1099    <listitem>
      <para><option>--history-level=full</option> (the default) causes
        Helgrind to collect enough information about "old" accesses that
        it can produce two stack traces in a race report -- both the
        stack trace for the current access, and the trace for the
        older, conflicting access.  To limit memory usage, stack traces
        for "old" accesses are limited to a maximum of 8 entries, even if
        the <option>--num-callers</option> value is bigger.</para>
1107      <para>Collecting such information is expensive in both speed and
1108        memory, particularly for programs that do many inter-thread
1109        synchronisation events (locks, unlocks, etc).  Without such
1110        information, it is more difficult to track down the root
1111        causes of races.  Nonetheless, you may not need it in
1112        situations where you just want to check for the presence or
1113        absence of races, for example, when doing regression testing
1114        of a previously race-free program.</para>
1115      <para><option>--history-level=none</option> is the opposite
1116        extreme.  It causes Helgrind not to collect any information
1117        about previous accesses.  This can be dramatically faster
1118        than <option>--history-level=full</option>.</para>
1119      <para><option>--history-level=approx</option> provides a
1120        compromise between these two extremes.  It causes Helgrind to
1121        show a full trace for the later access, and approximate
1122        information regarding the earlier access.  This approximate
1123        information consists of two stacks, and the earlier access is
1124        guaranteed to have occurred somewhere between program points
1125        denoted by the two stacks. This is not as useful as showing
1126        the exact stack for the previous access
1127        (as <option>--history-level=full</option> does), but it is
1128        better than nothing, and it is almost as fast as
1129        <option>--history-level=none</option>.</para>
1130    </listitem>
1131  </varlistentry>
1132
1133  <varlistentry id="opt.conflict-cache-size"
1134                xreflabel="--conflict-cache-size">
1135    <term>
1136      <option><![CDATA[--conflict-cache-size=N
1137      [default: 1000000] ]]></option>
1138    </term>
1139    <listitem>
1140      <para>This flag only has any effect
1141        at <option>--history-level=full</option>.</para>
1142      <para>Information about "old" conflicting accesses is stored in
1143        a cache of limited size, with LRU-style management.  This is
1144        necessary because it isn't practical to store a stack trace
1145        for every single memory access made by the program.
1146        Historical information on not recently accessed locations is
1147        periodically discarded, to free up space in the cache.</para>
1148      <para>This option controls the size of the cache, in terms of the
1149        number of different memory addresses for which
1150        conflicting access information is stored.  If you find that
1151        Helgrind is showing race errors with only one stack instead of
1152        the expected two stacks, try increasing this value.</para>
1153      <para>The minimum value is 10,000 and the maximum is 30,000,000
1154        (thirty times the default value).  Increasing the value by 1
1155        increases Helgrind's memory requirement by very roughly 100
1156        bytes, so the maximum value will easily eat up three extra
1157        gigabytes or so of memory.</para>
1158    </listitem>
1159  </varlistentry>
1160
1161  <varlistentry id="opt.check-stack-refs"
1162                xreflabel="--check-stack-refs">
1163    <term>
1164      <option><![CDATA[--check-stack-refs=no|yes
1165      [default: yes] ]]></option>
1166    </term>
1167    <listitem>
1168      <para>
1169        By default Helgrind checks all data memory accesses made by your
1170        program.  This flag enables you to skip checking for accesses
1171        to thread stacks (local variables).  This can improve
1172        performance, but comes at the cost of missing races on
1173        stack-allocated data.
1174      </para>
1175    </listitem>
1176  </varlistentry>
1177
1178
1179</variablelist>
1180<!-- end of xi:include in the manpage -->
1181
1182<!-- start of xi:include in the manpage -->
1183<!--  commented out, because we don't document debugging options in the
1184      manual.  Nb: all the double-dashes below had a space inserted in them
1185      to avoid problems with premature closing of this comment.
1186<para>In addition, the following debugging options are available for
1187Helgrind:</para>
1188
1189<variablelist id="hg.debugopts.list">
1190
1191  <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
1192    <term>
1193      <option><![CDATA[- -trace-malloc=no|yes [no]
1194      ]]></option>
1195    </term>
1196    <listitem>
1197      <para>Show all client <function>malloc</function> (etc) and
1198      <function>free</function> (etc) requests.</para>
1199    </listitem>
1200  </varlistentry>
1201
1202  <varlistentry id="opt.cmp-race-err-addrs"
1203                xreflabel="- -cmp-race-err-addrs">
1204    <term>
1205      <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
1206      ]]></option>
1207    </term>
1208    <listitem>
1209      <para>Controls whether or not race (data) addresses should be
1210        taken into account when removing duplicates of race errors.
1211        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
        identical race errors will be considered to be the same even if
        their race addresses differ.  With
        <varname>- -cmp-race-err-addrs=yes</varname> they will be
1215        considered different.  This is provided to help make certain
1216        regression tests work reliably.</para>
1217    </listitem>
1218  </varlistentry>
1219
1220  <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
1221    <term>
1222      <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
1223      ]]></option>
1224    </term>
1225    <listitem>
1226      <para>Run extensive sanity checks on Helgrind's internal
1227        data structures at events defined by the bitstring, as
1228        follows:</para>
1229      <para><computeroutput>010000 </computeroutput>after changes to
1230        the lock order acquisition graph</para>
1231      <para><computeroutput>001000 </computeroutput>after every client
1232        memory access (NB: not currently used)</para>
1233      <para><computeroutput>000100 </computeroutput>after every client
1234        memory range permission setting of 256 bytes or greater</para>
1235      <para><computeroutput>000010 </computeroutput>after every client
1236        lock or unlock event</para>
1237      <para><computeroutput>000001 </computeroutput>after every client
1238        thread creation or joinage event</para>
1239      <para>Note these will make Helgrind run very slowly, often to
1240        the point of being completely unusable.</para>
1241    </listitem>
1242  </varlistentry>
1243
1244</variablelist>
1245-->
1246<!-- end of xi:include in the manpage -->
1247
1248
1249</sect1>
1250
1251
1252
1253<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
1254<title>Helgrind Client Requests</title>
1255
1256<para>The following client requests are defined in
1257<filename>helgrind.h</filename>.  See that file for exact details of their
1258arguments.</para>
1259
1260<itemizedlist>
1261
1262  <listitem>
1263    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
1264    <para>This makes Helgrind forget everything it knows about a
1265    specified memory range.  This is particularly useful for memory
1266    allocators that wish to recycle memory.</para>
1267  </listitem>
1268  <listitem>
1269    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
1270  </listitem>
1271  <listitem>
1272    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
1273  </listitem>
1274  <listitem>
1275    <para><function>ANNOTATE_NEW_MEMORY</function></para>
1276  </listitem>
1277  <listitem>
1278    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
1279  </listitem>
1280  <listitem>
1281    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
1282  </listitem>
1283  <listitem>
1284    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
1285  </listitem>
1286  <listitem>
1287    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>These are used to describe to Helgrind the behaviour of
1289    custom (non-POSIX) synchronisation primitives, which it otherwise
1290    has no way to understand.  See comments
1291    in <filename>helgrind.h</filename> for further
1292    documentation.</para>
1293  </listitem>
1294
1295</itemizedlist>
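
<para>As a rough illustration, a hand-written spinlock could be
described to Helgrind as an exclusively-held reader-writer lock, as
in the sketch below.  The <computeroutput>SpinLock</computeroutput>
class, the <computeroutput>Shared</computeroutput> structure and the
functions <computeroutput>bump</computeroutput> and
<computeroutput>run_spinlock_demo</computeroutput> are hypothetical.
The header location <filename>valgrind/helgrind.h</filename> and the
two-argument form of the acquire and release macros (lock address,
then 1 for a writer lock) are assumptions that should be checked
against the comments in your copy of
<filename>helgrind.h</filename>.</para>

<programlisting><![CDATA[
```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

// Assumption: helgrind.h is installed under valgrind/.
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
// Assumption: when the header is unavailable, compile the markup away.
#ifndef ANNOTATE_RWLOCK_CREATE
#  define ANNOTATE_RWLOCK_CREATE(lock)         do { } while (0)
#  define ANNOTATE_RWLOCK_DESTROY(lock)        do { } while (0)
#  define ANNOTATE_RWLOCK_ACQUIRED(lock, is_w) do { } while (0)
#  define ANNOTATE_RWLOCK_RELEASED(lock, is_w) do { } while (0)
#endif

// Hypothetical custom lock built on an atomic flag.
class SpinLock {
   std::atomic<bool> mLocked;

public:
   SpinLock ( ) : mLocked(false) { ANNOTATE_RWLOCK_CREATE(this); }
   ~SpinLock ( )                 { ANNOTATE_RWLOCK_DESTROY(this); }

   void Lock ( void ) {
      while (mLocked.exchange(true, std::memory_order_acquire)) {
         // spin until the previous holder clears the flag
      }
      ANNOTATE_RWLOCK_ACQUIRED(this, 1);   // report after acquiring
   }

   void Unlock ( void ) {
      ANNOTATE_RWLOCK_RELEASED(this, 1);   // report before releasing
      mLocked.store(false, std::memory_order_release);
   }
};

struct Shared {
   SpinLock lock;
   long     value;
};

static void* bump ( void* arg ) {
   Shared* s = static_cast<Shared*>(arg);
   for (int i = 0; i < 1000; i++) {
      s->lock.Lock();
      s->value++;
      s->lock.Unlock();
   }
   return nullptr;
}

// Two threads increment the counter under the custom lock.
long run_spinlock_demo ( void ) {
   Shared s;
   s.value = 0;
   pthread_t t1, t2;
   int rc = pthread_create(&t1, nullptr, bump, &s);
   assert(rc == 0);
   rc = pthread_create(&t2, nullptr, bump, &s);
   assert(rc == 0);
   pthread_join(t1, nullptr);
   pthread_join(t2, nullptr);
   return s.value;
}
```
]]></programlisting>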
1296
1297</sect1>
1298
1299
1300
1301<sect1 id="hg-manual.todolist" xreflabel="To Do List">
1302<title>A To-Do List for Helgrind</title>
1303
1304<para>The following is a list of loose ends which should be tidied up
1305some time.</para>
1306
1307<itemizedlist>
1308  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for size-2 cycles as at
    present.</para>
1311  </listitem>
1312  <listitem><para>The conflicting access mechanism sometimes
1313    mysteriously fails to show the conflicting access' stack, even
1314    when provided with unbounded storage for conflicting access info.
1315    This should be investigated.</para>
1316  </listitem>
1317  <listitem><para>Document races caused by GCC's thread-unsafe code
1318    generation for speculative stores.  In the interim see
1319    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
1320    </computeroutput>
1321    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
1322    </para>
1323  </listitem>
1324  <listitem><para>Don't update the lock-order graph, and don't check
1325    for errors, when a "try"-style lock operation happens (e.g.
1326    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
1327    restrictions to the locking order, since they can always fail to
1328    acquire the lock, resulting in the caller going off and doing Plan
1329    B (presumably it will have a Plan B).  Doing such checks could
1330    generate false lock-order errors and confuse users.</para>
1331  </listitem>
1332  <listitem><para> Performance can be very poor.  Slowdowns on the
1333    order of 100:1 are not unusual.  There is limited scope for
1334    performance improvements.
1335    </para>
1336  </listitem>
1337
1338</itemizedlist>
1339
1340</sect1>
1341
1342</chapter>
1343