1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6
7<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
8  <title>Helgrind: a thread error detector</title>
9
10<para>To use this tool, you must specify
11<option>--tool=helgrind</option> on the Valgrind
12command line.</para>
13
14
15<sect1 id="hg-manual.overview" xreflabel="Overview">
16<title>Overview</title>
17
18<para>Helgrind is a Valgrind tool for detecting synchronisation errors
19in C, C++ and Fortran programs that use the POSIX pthreads
20threading primitives.</para>
21
22<para>The main abstractions in POSIX pthreads are: a set of threads
23sharing a common address space, thread creation, thread joining,
24thread exit, mutexes (locks), condition variables (inter-thread event
25notifications), reader-writer locks, spinlocks, semaphores and
26barriers.</para>
27
28<para>Helgrind can detect three classes of errors, which are discussed
29in detail in the next three sections:</para>
30
31<orderedlist>
32 <listitem>
33  <para><link linkend="hg-manual.api-checks">
34        Misuses of the POSIX pthreads API.</link></para>
35 </listitem>
36 <listitem>
37  <para><link linkend="hg-manual.lock-orders">
38        Potential deadlocks arising from lock
39        ordering problems.</link></para>
40 </listitem>
41 <listitem>
42  <para><link linkend="hg-manual.data-races">
43        Data races -- accessing memory without adequate locking
44                      or synchronisation</link>.
45  </para>
46 </listitem>
47</orderedlist>
48
49<para>Problems like these often result in unreproducible,
50timing-dependent crashes, deadlocks and other misbehaviour, and
51can be difficult to find by other means.</para>
52
53<para>Helgrind is aware of all the pthread abstractions and tracks
54their effects as accurately as it can.  On x86 and amd64 platforms, it
55understands and partially handles implicit locking arising from the
56use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
57platforms, it partially handles implicit locking arising from
58load-linked and store-conditional instruction pairs.
59</para>
60
61<para>Helgrind works best when your application uses only the POSIX
62pthreads API.  However, if you want to use custom threading
63primitives, you can describe their behaviour to Helgrind using the
64<varname>ANNOTATE_*</varname> macros defined
65in <varname>helgrind.h</varname>.</para>
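
<para>As a rough illustration, the sketch below annotates a hand-rolled
flag handoff built on C11 atomics.
<function>ANNOTATE_HAPPENS_BEFORE</function> and
<function>ANNOTATE_HAPPENS_AFTER</function> are two of the macros
provided by <varname>helgrind.h</varname>.  Everything else is an
assumption made so that the example builds on its own: the
<varname>payload</varname> and <varname>ready</varname> variables, the
function names, the <varname>valgrind/helgrind.h</varname> include
path, and the no-op fallback definitions used when the header is not
installed.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Use the real client requests when the Valgrind headers are installed;
   otherwise fall back to no-ops so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int payload;        /* plain data handed from producer to consumer */
static atomic_int ready;   /* hypothetical home-made "message" flag */

static void *producer(void *arg) {
   (void)arg;
   payload = 42;
   ANNOTATE_HAPPENS_BEFORE(&ready);  /* accesses above are published */
   atomic_store(&ready, 1);
   return NULL;
}

/* The calling thread is the consumer; returns the payload it saw. */
int run_annotated_handoff(void) {
   pthread_t t;
   int rc = pthread_create(&t, NULL, producer, NULL);
   assert(rc == 0);
   while (atomic_load(&ready) == 0)
      sched_yield();                 /* spin until the flag is set */
   ANNOTATE_HAPPENS_AFTER(&ready);   /* pairs with the producer's annotation */
   int seen = payload;
   pthread_join(t, NULL);
   return seen;
}
```
]]></programlisting>

<para>Both annotations name the same object (here the address of
<varname>ready</varname>), which is how the sending and receiving
sides are paired up.</para>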
66
67
68
69<para>Following those is a section containing
70<link linkend="hg-manual.effective-use">
71hints and tips on how to get the best out of Helgrind.</link>
72</para>
73
74<para>Then there is a
75<link linkend="hg-manual.options">summary of command-line
76options.</link>
77</para>
78
79<para>Finally, there is
80<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
81could be improved.</link>
82</para>
83
84</sect1>
85
86
87
88
89<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
90<title>Detected errors: Misuses of the POSIX pthreads API</title>
91
92<para>Helgrind intercepts calls to many POSIX pthreads functions, and
93is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
95program behaviour and hard-to-find bugs later on.  The detected errors
96are:</para>
97
98<itemizedlist>
99 <listitem><para>unlocking an invalid mutex</para></listitem>
100 <listitem><para>unlocking a not-locked mutex</para></listitem>
101 <listitem><para>unlocking a mutex held by a different
102                 thread</para></listitem>
103 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
104 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
105 <listitem><para>deallocation of memory that contains a
106                 locked mutex</para></listitem>
107 <listitem><para>passing mutex arguments to functions expecting
108                 reader-writer lock arguments, and vice
109                 versa</para></listitem>
110 <listitem><para>when a POSIX pthread function fails with an
111                 error code that must be handled</para></listitem>
112 <listitem><para>when a thread exits whilst still holding locked
113                 locks</para></listitem>
114 <listitem><para>calling <function>pthread_cond_wait</function>
115                 with a not-locked mutex, an invalid mutex,
116                 or one locked by a different
117                 thread</para></listitem>
118 <listitem><para>inconsistent bindings between condition
119                 variables and their associated mutexes</para></listitem>
120 <listitem><para>invalid or duplicate initialisation of a pthread
121                 barrier</para></listitem>
122 <listitem><para>initialisation of a pthread barrier on which threads
123                 are still waiting</para></listitem>
124 <listitem><para>destruction of a pthread barrier object which was
125                 never initialised, or on which threads are still
126                 waiting</para></listitem>
127 <listitem><para>waiting on an uninitialised pthread
128                 barrier</para></listitem>
129 <listitem><para>for all of the pthreads functions that Helgrind
130                 intercepts, an error is reported, along with a stack
131                 trace, if the system threading library routine returns
132                 an error code, even if Helgrind itself detected no
133                 error</para></listitem>
134</itemizedlist>
135
136<para>Checks pertaining to the validity of mutexes are generally also
137performed for reader-writer locks.</para>
138
139<para>Various kinds of this-can't-possibly-happen events are also
140reported.  These usually indicate bugs in the system threading
141library.</para>
142
143<para>Reported errors always contain a primary stack trace indicating
144where the error was detected.  They may also contain auxiliary stack
145traces giving additional information.  In particular, most errors
146relating to mutexes will also tell you where that mutex first came to
147Helgrind's attention (the "<computeroutput>was first observed
148at</computeroutput>" part), so you have a chance of figuring out which
149mutex it is referring to.  For example:</para>
150
151<programlisting><![CDATA[
152Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
153   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
154   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
155   by 0x40079B: main (tc09_bad_unlock.c:50)
156  Lock at 0x7FEFFFA90 was first observed
157   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
158   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
159   by 0x40079B: main (tc09_bad_unlock.c:50)
160]]></programlisting>
161
162<para>Helgrind has a way of summarising thread identities, as
163you see here with the text "<computeroutput>Thread
164#1</computeroutput>".  This is so that it can speak about threads and
165sets of threads without overwhelming you with details.  See
166<link linkend="hg-manual.data-races.errmsgs">below</link>
167for more information on interpreting error messages.</para>
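
<para>A program along the following lines would be expected to provoke
a report of the "unlocked a not-locked lock" kind.  This is a minimal
sketch, not the source of <computeroutput>tc09_bad_unlock.c</computeroutput>.
The function name is hypothetical, and the use of an error-checking
mutex is an assumption made so that the faulty call has a defined
result (<computeroutput>EPERM</computeroutput>) rather than undefined
behaviour.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Unlocks a mutex that was never locked and returns what
   pthread_mutex_unlock reported. */
int unlock_without_lock(void) {
   pthread_mutexattr_t attr;
   pthread_mutex_t mx;
   int rc;

   pthread_mutexattr_init(&attr);
   /* An error-checking mutex makes the mistake well defined: the
      unlock fails with EPERM instead of being undefined behaviour. */
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   rc = pthread_mutex_init(&mx, &attr);
   assert(rc == 0);
   pthread_mutexattr_destroy(&attr);

   rc = pthread_mutex_unlock(&mx);   /* bug: mx is not locked */

   pthread_mutex_destroy(&mx);
   return rc;
}
```
]]></programlisting>

<para>Because the library call itself returns an error code here, the
last item in the list above suggests that the failure would be
reported as well.</para>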
168
169</sect1>
170
171
172
173
174<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
175<title>Detected errors: Inconsistent Lock Orderings</title>
176
177<para>In this section, and in general, to "acquire" a lock simply
178means to lock that lock, and to "release" a lock means to unlock
179it.</para>
180
181<para>Helgrind monitors the order in which threads acquire locks.
182This allows it to detect potential deadlocks which could arise from
183the formation of cycles of locks.  Detecting such inconsistencies is
184useful because, whilst actual deadlocks are fairly obvious, potential
185deadlocks may never be discovered during testing and could later lead
186to hard-to-diagnose in-service failures.</para>
187
188<para>The simplest example of such a problem is as
189follows.</para>
190
191<itemizedlist>
192 <listitem><para>Imagine some shared resource R, which, for whatever
193  reason, is guarded by two locks, L1 and L2, which must both be held
194  when R is accessed.</para>
195 </listitem>
196 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
197  to access R.  The implication of this is that all threads in the
198  program must acquire the two locks in the order first L1 then L2.
199  Not doing so risks deadlock.</para>
200 </listitem>
201 <listitem><para>The deadlock could happen if two threads -- call them
202  T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
203  and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
204  to acquire L1, but those locks are both already held.  So T1 and T2
205  become deadlocked.</para>
206 </listitem>
207</itemizedlist>
208
209<para>Helgrind builds a directed graph indicating the order in which
210locks have been acquired in the past.  When a thread acquires a new
211lock, the graph is updated, and then checked to see if it now contains
212a cycle.  The presence of a cycle indicates a potential deadlock involving
213the locks in the cycle.</para>
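
<para>The idea of updating a graph on each acquisition and then
checking it for a cycle can be sketched as follows.  This only
illustrates the technique and is not Helgrind's implementation.  The
fixed-size adjacency matrix, the small integer lock identifiers and
the names <function>note_acquire</function> and
<function>reachable</function> are all assumptions.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* hypothetical fixed capacity for the sketch */

/* edge[a][b] is true when some thread acquired b while holding a. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is `to` reachable from `from`? */
static bool reachable(int from, int to, bool *seen) {
   if (from == to) return true;
   seen[from] = true;
   for (int next = 0; next < MAX_LOCKS; next++)
      if (edge[from][next] && !seen[next] && reachable(next, to, seen))
         return true;
   return false;
}

/* Records the ordering "held before wanted".  Returns true if this
   acquisition closes a cycle, that is, if `held` was already
   reachable from `wanted` through earlier orderings. */
bool note_acquire(int held, int wanted) {
   bool seen[MAX_LOCKS];
   memset(seen, 0, sizeof seen);
   bool cycle = reachable(wanted, held, seen);
   edge[held][wanted] = true;
   return cycle;
}
```
]]></programlisting>

<para>With this sketch, recording "0 before 1" and then "1 before 2"
reports no cycle, whereas a later "2 before 0" does.</para>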
214
215<para>In general, Helgrind will choose two locks involved in the cycle
216and show you how their acquisition ordering has become inconsistent.
217It does this by showing the program points that first defined the
218ordering, and the program points which later violated it.  Here is a
219simple example involving just two locks:</para>
220
221<programlisting><![CDATA[
222Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
223
224Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
225   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
226   by 0x400825: main (tc13_laog1.c:23)
227
228 followed by a later acquisition of lock at 0x7FF0006D0
229   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
230   by 0x400853: main (tc13_laog1.c:24)
231
232Required order was established by acquisition of lock at 0x7FF0006D0
233   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
234   by 0x40076D: main (tc13_laog1.c:17)
235
236 followed by a later acquisition of lock at 0x7FF0006A0
237   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
238   by 0x40079B: main (tc13_laog1.c:18)
239]]></programlisting>
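
<para>A report of this shape could be produced by code like the
following, in which a single thread takes two mutexes in one order
and later in the opposite order.  It is a sketch under stated
assumptions, not the contents of
<computeroutput>tc13_laog1.c</computeroutput>.  The mutex and function
names are hypothetical.  No deadlock actually occurs, since only one
thread is involved, but the inconsistent ordering is still
visible.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mx_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mx_b = PTHREAD_MUTEX_INITIALIZER;

/* Takes both locks in the order (first, second), then releases them.
   Returns 0 if every pthread call succeeded. */
static int lock_pair(pthread_mutex_t *first, pthread_mutex_t *second) {
   int rc = 0;
   assert(first != second);
   rc |= pthread_mutex_lock(first);
   rc |= pthread_mutex_lock(second);
   rc |= pthread_mutex_unlock(second);
   rc |= pthread_mutex_unlock(first);
   return rc;
}

int inconsistent_lock_order(void) {
   int rc = lock_pair(&mx_a, &mx_b);   /* establishes "a before b" */
   rc |= lock_pair(&mx_b, &mx_a);      /* violates it: b before a  */
   return rc;
}
```
]]></programlisting>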
240
<para>When there are more than two locks in the cycle, the error is
equally serious.  However, at present Helgrind does not show the locks
involved, sometimes because that information is not available, but
also so as to avoid flooding you with information.  For example, here
is a report involving a cycle of five locks from a naive
implementation of the famous Dining Philosophers problem
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>
251
252<programlisting><![CDATA[
253Thread #6: lock order "0x6010C0 before 0x601160" violated
254
255Observed (incorrect) order is: acquisition of lock at 0x601160
256   (stack unavailable)
257
258 followed by a later acquisition of lock at 0x6010C0
259   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
260   by 0x4007DE: dine (tc14_laog_dinphils.c:19)
261   by 0x4C2CBE7: mythread_wrapper (hg_intercepts.c:219)
262   by 0x4E369C9: start_thread (pthread_create.c:300)
263]]></programlisting>
264
265</sect1>
266
267
268
269
270<sect1 id="hg-manual.data-races" xreflabel="Data Races">
271<title>Detected errors: Data Races</title>
272
273<para>A data race happens, or could happen, when two threads access a
274shared memory location without using suitable locks or other
275synchronisation to ensure single-threaded access.  Such missing
locking can cause obscure timing-dependent bugs.  Ensuring programs
277are race-free is one of the central difficulties of threaded
278programming.</para>
279
280<para>Reliably detecting races is a difficult problem, and most
281of Helgrind's internals are devoted to dealing with it.
282We begin with a simple example.</para>
283
284
285<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
286<title>A Simple Data Race</title>
287
288<para>About the simplest possible example of a race is as follows.  In
289this program, it is impossible to know what the value
290of <computeroutput>var</computeroutput> is at the end of the program.
291Is it 2 ?  Or 1 ?</para>
292
293<programlisting><![CDATA[
294#include <pthread.h>
295
296int var = 0;
297
298void* child_fn ( void* arg ) {
299   var++; /* Unprotected relative to parent */ /* this is line 6 */
300   return NULL;
301}
302
303int main ( void ) {
304   pthread_t child;
305   pthread_create(&child, NULL, child_fn, NULL);
306   var++; /* Unprotected relative to child */ /* this is line 13 */
307   pthread_join(child, NULL);
308   return 0;
309}
310]]></programlisting>
311
312<para>The problem is there is nothing to
313stop <varname>var</varname> being updated simultaneously
314by both threads.  A correct program would
315protect <varname>var</varname> with a lock of type
316<function>pthread_mutex_t</function>, which is acquired
317before each access and released afterwards.  Helgrind's output for
318this program is:</para>
319
320<programlisting><![CDATA[
321Thread #1 is the program's root thread
322
323Thread #2 was created
324   at 0x511C08E: clone (in /lib64/libc-2.8.so)
325   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
326   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
327   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
328   by 0x400605: main (simple_race.c:12)
329
330Possible data race during read of size 4 at 0x601038 by thread #1
331Locks held: none
332   at 0x400606: main (simple_race.c:13)
333
334This conflicts with a previous write of size 4 by thread #2
335Locks held: none
336   at 0x4005DC: child_fn (simple_race.c:6)
337   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
338   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
339   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
340
341Location 0x601038 is 0 bytes inside global var "var"
342declared at simple_race.c:3
343]]></programlisting>
344
<para>This is quite a lot of detail for an apparently simple error.
The clause beginning "<computeroutput>Possible data race</computeroutput>"
is the main error message.  It says there is a race as
a result of a read of size 4 (bytes), at 0x601038, which is the
address of <computeroutput>var</computeroutput>, happening in
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>
351
352<para>Two important parts of the message are:</para>
353
354<itemizedlist>
355 <listitem>
356  <para>Helgrind shows two stack traces for the error, not one.  By
357   definition, a race involves two different threads accessing the
358   same location in such a way that the result depends on the relative
359   speeds of the two threads.</para>
360  <para>
361   The first stack trace follows the text "<computeroutput>Possible
362   data race during read of size 4 ...</computeroutput>" and the
363   second trace follows the text "<computeroutput>This conflicts with
364   a previous write of size 4 ...</computeroutput>".  Helgrind is
365   usually able to show both accesses involved in a race.  At least
366   one of these will be a write (since two concurrent, unsynchronised
367   reads are harmless), and they will of course be from different
368   threads.</para>
369  <para>By examining your program at the two locations, you should be
370   able to get at least some idea of what the root cause of the
371   problem is.  For each location, Helgrind shows the set of locks
372   held at the time of the access.  This often makes it clear which
373   thread, if any, failed to take a required lock.  In this example
374   neither thread holds a lock during the access.</para>
375 </listitem>
376 <listitem>
377  <para>For races which occur on global or stack variables, Helgrind
378   tries to identify the name and defining point of the variable.
379   Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
380   global var "var" declared at simple_race.c:3</computeroutput>".</para>
381  <para>Showing names of stack and global variables carries no
382   run-time overhead once Helgrind has your program up and running.
383   However, it does require Helgrind to spend considerable extra time
384   and memory at program startup to read the relevant debug info.
385   Hence this facility is disabled by default.  To enable it, you need
386   to give the <varname>--read-var-info=yes</varname> option to
387   Helgrind.</para>
388 </listitem>
389</itemizedlist>
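
<para>The mutex-based fix mentioned earlier might look like the
following sketch.  The names <varname>var_lock</varname>,
<function>locked_increment</function> and
<function>run_locked_increments</function> are hypothetical.  The
point is only that both increments of <varname>var</varname> now
happen with the same mutex held.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;

static void locked_increment(void) {
   pthread_mutex_lock(&var_lock);
   var++;                          /* now protected by var_lock */
   pthread_mutex_unlock(&var_lock);
}

static void *child_fn(void *arg) {
   (void)arg;
   locked_increment();
   return NULL;
}

/* Runs one increment in each thread and returns the final value. */
int run_locked_increments(void) {
   pthread_t child;
   var = 0;                        /* safe: the child does not exist yet */
   int rc = pthread_create(&child, NULL, child_fn, NULL);
   assert(rc == 0);
   locked_increment();
   pthread_join(child, NULL);
   return var;                     /* safe: ordered after the join */
}
```
]]></programlisting>

<para>With the lock in place the result is 2 regardless of how the
two threads are scheduled.</para>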
390
391<para>The following section explains Helgrind's race detection
392algorithm in more detail.</para>
393
394</sect2>
395
396
397
398<sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
399<title>Helgrind's Race Detection Algorithm</title>
400
401<para>Most programmers think about threaded programming in terms of
402the basic functionality provided by the threading library (POSIX
403Pthreads): thread creation, thread joining, locks, condition
404variables, semaphores and barriers.</para>
405
406<para>The effect of using these functions is to impose
407constraints upon the order in which memory accesses can
408happen.  This implied ordering is generally known as the
409"happens-before relation".  Once you understand the happens-before
410relation, it is easy to see how Helgrind finds races in your code.
411Fortunately, the happens-before relation is itself easy to understand,
412and is by itself a useful tool for reasoning about the behaviour of
413parallel programs.  We now introduce it using a simple example.</para>
414
415<para>Consider first the following buggy program:</para>
416
417<programlisting><![CDATA[
418Parent thread:                         Child thread:
419
420int var;
421
422// create child thread
423pthread_create(...)
424var = 20;                              var = 10;
425                                       exit
426
427// wait for child
428pthread_join(...)
429printf("%d\n", var);
430]]></programlisting>
431
432<para>The parent thread creates a child.  Both then write different
433values to some variable <computeroutput>var</computeroutput>, and the
434parent then waits for the child to exit.</para>
435
436<para>What is the value of <computeroutput>var</computeroutput> at the
437end of the program, 10 or 20?  We don't know.  The program is
438considered buggy (it has a race) because the final value
439of <computeroutput>var</computeroutput> depends on the relative rates
440of progress of the parent and child threads.  If the parent is fast
441and the child is slow, then the child's assignment may happen later,
442so the final value will be 10; and vice versa if the child is faster
443than the parent.</para>
444
<para>The relative rates of progress of parent vs child are not something
the programmer can control, and will often change from run to run.
They depend on factors such as the load on the machine, what else is
running, and the kernel's scheduling strategy.</para>
449
450<para>The obvious fix is to use a lock to
451protect <computeroutput>var</computeroutput>.  It is however
452instructive to consider a somewhat more abstract solution, which is to
453send a message from one thread to the other:</para>
454
455<programlisting><![CDATA[
456Parent thread:                         Child thread:
457
458int var;
459
460// create child thread
461pthread_create(...)
462var = 20;
463// send message to child
464                                       // wait for message to arrive
465                                       var = 10;
466                                       exit
467
468// wait for child
469pthread_join(...)
470printf("%d\n", var);
471]]></programlisting>
472
473<para>Now the program reliably prints "10", regardless of the speed of
474the threads.  Why?  Because the child's assignment cannot happen until
475after it receives the message.  And the message is not sent until
476after the parent's assignment is done.</para>
477
478<para>The message transmission creates a "happens-before" dependency
479between the two assignments: <computeroutput>var = 20;</computeroutput>
480must now happen-before <computeroutput>var = 10;</computeroutput>.
481And so there is no longer a race
482on <computeroutput>var</computeroutput>.
483</para>
484
485<para>Note that it's not significant that the parent sends a message
486to the child.  Sending a message from the child (after its assignment)
487to the parent (before its assignment) would also fix the problem, causing
488the program to reliably print "20".</para>
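
<para>One way to realise the "message" in real code is a POSIX
semaphore, as in the sketch below.  The choice of a semaphore is an
assumption, since any of the synchronisation primitives discussed in
this chapter could carry the message.  The names
<varname>msg</varname>, <function>wait_for_message</function> and
<function>run_message_fix</function> are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int var;
static sem_t msg;   /* the "message" from parent to child */

/* sem_wait can be interrupted by a signal; retry until it succeeds. */
static void wait_for_message(void) {
   while (sem_wait(&msg) == -1 && errno == EINTR)
      continue;
}

static void *child_fn(void *arg) {
   (void)arg;
   wait_for_message();          /* wait for message to arrive */
   var = 10;
   return NULL;
}

/* The calling thread is the parent; returns the final value of var. */
int run_message_fix(void) {
   pthread_t child;
   int rc = sem_init(&msg, 0, 0);
   assert(rc == 0);
   rc = pthread_create(&child, NULL, child_fn, NULL);
   assert(rc == 0);
   var = 20;
   sem_post(&msg);              /* send message to child */
   pthread_join(child, NULL);   /* wait for child */
   sem_destroy(&msg);
   return var;
}
```
]]></programlisting>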
489
490<para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
accesses to memory locations.  If a location -- in this example,
<computeroutput>var</computeroutput> --
493is accessed by two different threads, Helgrind checks to see if the
494two accesses are ordered by the happens-before relation.  If so,
495that's fine; if not, it reports a race.</para>
496
497<para>It is important to understand that the happens-before relation
498creates only a partial ordering, not a total ordering.  An example of
499a total ordering is comparison of numbers: for any two numbers
500<computeroutput>x</computeroutput> and
501<computeroutput>y</computeroutput>, either
502<computeroutput>x</computeroutput> is less than, equal to, or greater
503than
504<computeroutput>y</computeroutput>.  A partial ordering is like a
505total ordering, but it can also express the concept that two elements
506are neither equal, less or greater, but merely unordered with respect
507to each other.</para>
508
509<para>In the fixed example above, we say that
510<computeroutput>var = 20;</computeroutput> "happens-before"
511<computeroutput>var = 10;</computeroutput>.  But in the original
512version, they are unordered: we cannot say that either happens-before
513the other.</para>
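
<para>A common textbook way to represent such a partial order is a
vector clock, with one counter per thread.  The sketch below compares
two clocks and can answer "unordered" as well as "before", "after" and
"equal".  It is offered only to make the idea concrete.  This section
does not say how Helgrind represents the relation internally, and the
type and function names here are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <assert.h>

#define NTHREADS 2   /* hypothetical: one clock component per thread */

typedef enum { VC_EQUAL, VC_BEFORE, VC_AFTER, VC_UNORDERED } vc_order;

/* Compares two vector clocks component by component. */
vc_order vc_compare(const unsigned a[NTHREADS], const unsigned b[NTHREADS]) {
   int some_less = 0, some_greater = 0;
   for (int i = 0; i < NTHREADS; i++) {
      if (a[i] < b[i]) some_less = 1;
      if (a[i] > b[i]) some_greater = 1;
   }
   if (some_less && some_greater) return VC_UNORDERED;  /* neither first */
   if (some_less)    return VC_BEFORE;   /* a happens-before b */
   if (some_greater) return VC_AFTER;    /* b happens-before a */
   return VC_EQUAL;
}
```
]]></programlisting>

<para>Two accesses whose clocks compare as
<computeroutput>VC_UNORDERED</computeroutput> correspond to the
original, racy version of the example: neither can be said to
happen-before the other.</para>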
514
515<para>What does it mean to say that two accesses from different
516threads are ordered by the happens-before relation?  It means that
517there is some chain of inter-thread synchronisation operations which
518cause those accesses to happen in a particular order, irrespective of
519the actual rates of progress of the individual threads.  This is a
520required property for a reliable threaded program, which is why
521Helgrind checks for it.</para>
522
523<para>The happens-before relations created by standard threading
524primitives are as follows:</para>
525
526<itemizedlist>
527 <listitem><para>When a mutex is unlocked by thread T1 and later (or
528  immediately) locked by thread T2, then the memory accesses in T1
529  prior to the unlock must happen-before those in T2 after it acquires
530  the lock.</para>
531 </listitem>
532 <listitem><para>The same idea applies to reader-writer locks,
533  although with some complication so as to allow correct handling of
534  reads vs writes.</para>
535 </listitem>
536 <listitem><para>When a condition variable (CV) is signalled on by
537  thread T1 and some other thread T2 is thereby released from a wait
538  on the same CV, then the memory accesses in T1 prior to the
539  signalling must happen-before those in T2 after it returns from the
540  wait.  If no thread was waiting on the CV then there is no
541  effect.</para>
542 </listitem>
543 <listitem><para>If instead T1 broadcasts on a CV, then all of the
544  waiting threads, rather than just one of them, acquire a
545  happens-before dependency on the broadcasting thread at the point it
546  did the broadcast.</para>
547 </listitem>
548 <listitem><para>A thread T2 that continues after completing sem_wait
549  on a semaphore that thread T1 posts on, acquires a happens-before
  dependence on the posting thread, a bit like dependencies caused
  by mutex unlock-lock pairs.  However, since a semaphore can be posted
552  on many times, it is unspecified from which of the post calls the
553  wait call gets its happens-before dependency.</para>
554 </listitem>
555 <listitem><para>For a group of threads T1 .. Tn which arrive at a
556  barrier and then move on, each thread after the call has a
557  happens-after dependency from all threads before the
558  barrier.</para>
559 </listitem>
560 <listitem><para>A newly-created child thread acquires an initial
561  happens-after dependency on the point where its parent created it.
562  That is, all memory accesses performed by the parent prior to
563  creating the child are regarded as happening-before all the accesses
564  of the child.</para>
565 </listitem>
566 <listitem><para>Similarly, when an exiting thread is reaped via a
567  call to <function>pthread_join</function>, once the call returns, the
568  reaping thread acquires a happens-after dependency relative to all memory
569  accesses made by the exiting thread.</para>
570 </listitem>
571</itemizedlist>
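
<para>As an illustration of the condition-variable case, the following
sketch has T1 write <varname>data</varname> and then signal, while the
calling thread, playing T2, waits and then reads it.  All names are
hypothetical.  Note that the mutex guarding the
<varname>ready</varname> predicate contributes an unlock-to-lock
dependency of its own, so the ordering does not rest on the signal
alone.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool ready;   /* predicate, guarded by mx */
static int  data;    /* written by T1, read by T2 */

static void *t1_fn(void *arg) {
   (void)arg;
   data = 7;                      /* precedes the signal in T1 */
   pthread_mutex_lock(&mx);
   ready = true;
   pthread_cond_signal(&cv);
   pthread_mutex_unlock(&mx);
   return NULL;
}

/* The calling thread plays T2; returns the value of data it saw. */
int run_cv_handoff(void) {
   pthread_t t1;
   int rc = pthread_create(&t1, NULL, t1_fn, NULL);
   assert(rc == 0);
   pthread_mutex_lock(&mx);
   while (!ready)                 /* guards against spurious wakeups */
      pthread_cond_wait(&cv, &mx);
   pthread_mutex_unlock(&mx);
   int seen = data;               /* follows the wait in T2 */
   pthread_join(t1, NULL);
   return seen;
}
```
]]></programlisting>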
572
573<para>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
575dependencies.  It also monitors all memory accesses.</para>
576
577<para>If a location is accessed by two different threads, but Helgrind
578cannot find any path through the happens-before graph from one access
579to the other, then it reports a race.</para>
580
581<para>There are a couple of caveats:</para>
582
583<itemizedlist>
584 <listitem><para>Helgrind doesn't check for a race in the case where
585  both accesses are reads.  That would be silly, since concurrent
586  reads are harmless.</para>
587 </listitem>
588 <listitem><para>Two accesses are considered to be ordered by the
589  happens-before dependency even through arbitrarily long chains of
  synchronisation events.  For example, if T1 accesses some location
  L, and then signals T2 with <function>pthread_cond_signal</function>,
  and T2 later signals T3 in the same way, and T3 then accesses L, then
593  a suitable happens-before dependency exists between the first and second
594  accesses, even though it involves two different inter-thread
595  synchronisation events.</para>
596 </listitem>
597</itemizedlist>
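
<para>The chained case can be sketched as below.  For brevity it uses
semaphores rather than condition variables as the two synchronisation
events, which is an assumption of the example, and all names are
hypothetical.  T2 never touches the shared location.  It only relays
the ordering from T1 to T3.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int loc;            /* the shared location "L" from the text */
static sem_t t1_to_t2;
static sem_t t2_to_t3;

/* sem_wait can be interrupted by a signal; retry until it succeeds. */
static void wait_on(sem_t *s) {
   while (sem_wait(s) == -1 && errno == EINTR)
      continue;
}

static void *t1_fn(void *arg) {
   (void)arg;
   loc = 5;                /* first access */
   sem_post(&t1_to_t2);
   return NULL;
}

static void *t2_fn(void *arg) {
   (void)arg;
   wait_on(&t1_to_t2);     /* T2 never accesses loc itself */
   sem_post(&t2_to_t3);
   return NULL;
}

/* The calling thread plays T3; returns the value of loc it saw. */
int run_chain(void) {
   pthread_t t1, t2;
   int rc = sem_init(&t1_to_t2, 0, 0);
   assert(rc == 0);
   rc = sem_init(&t2_to_t3, 0, 0);
   assert(rc == 0);
   rc = pthread_create(&t2, NULL, t2_fn, NULL);
   assert(rc == 0);
   rc = pthread_create(&t1, NULL, t1_fn, NULL);
   assert(rc == 0);
   wait_on(&t2_to_t3);
   int seen = loc;         /* second access, ordered after T1's write */
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   sem_destroy(&t1_to_t2);
   sem_destroy(&t2_to_t3);
   return seen;
}
```
]]></programlisting>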
598
599</sect2>
600
601
602
603<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
604<title>Interpreting Race Error Messages</title>
605
606<para>Helgrind's race detection algorithm collects a lot of
607information, and tries to present it in a helpful way when a race is
608detected.  Here's an example:</para>
609
610<programlisting><![CDATA[
611Thread #2 was created
612   at 0x511C08E: clone (in /lib64/libc-2.8.so)
613   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
614   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
615   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
616   by 0x4008F2: main (tc21_pthonce.c:86)
617
618Thread #3 was created
619   at 0x511C08E: clone (in /lib64/libc-2.8.so)
620   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
621   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
622   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
623   by 0x4008F2: main (tc21_pthonce.c:86)
624
625Possible data race during read of size 4 at 0x601070 by thread #3
626Locks held: none
627   at 0x40087A: child (tc21_pthonce.c:74)
628   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
629   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
630   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
631
632This conflicts with a previous write of size 4 by thread #2
633Locks held: none
634   at 0x400883: child (tc21_pthonce.c:74)
635   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
636   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
637   by 0x511C0CC: clone (in /lib64/libc-2.8.so)
638
639Location 0x601070 is 0 bytes inside local var "unprotected2"
640declared at tc21_pthonce.c:51, in frame #0 of thread 3
641]]></programlisting>
642
643<para>Helgrind first announces the creation points of any threads
644referenced in the error message.  This is so it can speak concisely
645about threads without repeatedly printing their creation point call
646stacks.  Each thread is only ever announced once, the first time it
647appears in any Helgrind error message.</para>
648
649<para>The main error message begins at the text
650"<computeroutput>Possible data race during read</computeroutput>".  At
651the start is information you would expect to see -- address and size
652of the racing access, whether a read or a write, and the call stack at
653the point it was detected.</para>
654
655<para>A second call stack is presented starting at the text
656"<computeroutput>This conflicts with a previous
657write</computeroutput>".  This shows a previous access which also
658accessed the stated address, and which is believed to be racing
659against the access in the first call stack.</para>
660
661<para>Finally, Helgrind may attempt to give a description of the
662raced-on address in source level terms.  In this example, it
663identifies it as a local variable, shows its name, declaration point,
664and in which frame (of the first call stack) it lives.  Note that this
665information is only shown when <varname>--read-var-info=yes</varname>
666is specified on the command line.  That's because reading the DWARF3
667debug information in enough detail to capture variable type and
668location information makes Helgrind much slower at startup, and also
669requires considerable amounts of memory, for large programs.
670</para>
671
672<para>Once you have your two call stacks, how do you find the root
673cause of the race?</para>
674
675<para>The first thing to do is examine the source locations referred
676to by each call stack.  They should both show an access to the same
677location, or variable.</para>
678
<para>Now figure out how that location should have been made
680thread-safe:</para>
681
682<itemizedlist>
683 <listitem><para>Perhaps the location was intended to be protected by
684  a mutex?  If so, you need to lock and unlock the mutex at both
685  access points, even if one of the accesses is reported to be a read.
686  Did you perhaps forget the locking at one or other of the accesses?
  To help you do this, Helgrind shows the set of locks held by each
  thread at the time it accessed the raced-on location.</para>
689 </listitem>
 <listitem><para>Alternatively, perhaps you intended to use some
691  other scheme to make it safe, such as signalling on a condition
692  variable.  In all such cases, try to find a synchronisation event
693  (or a chain thereof) which separates the earlier-observed access (as
694  shown in the second call stack) from the later-observed access (as
695  shown in the first call stack).  In other words, try to find
696  evidence that the earlier access "happens-before" the later access.
697  See the previous subsection for an explanation of the happens-before
698  relation.</para>
699  <para>
700  The fact that Helgrind is reporting a race means it did not observe
701  any happens-before relation between the two accesses.  If
  Helgrind is working correctly, it should also be the case that you
  cannot find any such relation, even on detailed inspection
704  of the source code.  Hopefully, though, your inspection of the code
705  will show where the missing synchronisation operation(s) should have
706  been.</para>
707 </listitem>
708</itemizedlist>
709
710</sect2>
711
712
713</sect1>
714
715<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
716<title>Hints and Tips for Effective Use of Helgrind</title>
717
718<para>Helgrind can be very helpful in finding and resolving
719threading-related problems.  Like all sophisticated tools, it is most
720effective when you understand how to play to its strengths.</para>
721
722<para>Helgrind will be less effective when you merely throw an
723existing threaded program at it and try to make sense of any reported
724errors.  It will be more effective if you design threaded programs
725from the start in a way that helps Helgrind verify correctness.  The
726same is true for finding memory errors with Memcheck, but applies more
727here, because thread checking is a harder problem.  Consequently it is
728much easier to write a correct program for which Helgrind falsely
729reports (threading) errors than it is to write a correct program for
730which Memcheck falsely reports (memory) errors.</para>
731
732<para>With that in mind, here are some tips, listed most important first,
733for getting reliable results and avoiding false errors.  The first two
734are critical.  Any violations of them will swamp you with huge numbers
735of false data-race errors.</para>
736
737
738<orderedlist>
739
740  <listitem>
741    <para>Make sure your application, and all the libraries it uses,
742    use the POSIX threading primitives.  Helgrind needs to be able to
743    see all events pertaining to thread creation, exit, locking and
744    other synchronisation events.  To do so it intercepts many POSIX
745    pthreads functions.</para>
746
747    <para>Do not roll your own threading primitives (mutexes, etc)
748    from combinations of the Linux futex syscall, atomic counters, etc.
749    These throw Helgrind's internal what's-going-on models
750    way off course and will give bogus results.</para>
751
752    <para>Also, do not reimplement existing POSIX abstractions using
753    other POSIX abstractions.  For example, don't build your own
754    semaphore routines or reader-writer locks from POSIX mutexes and
755    condition variables.  Instead use POSIX reader-writer locks and
756    semaphores directly, since Helgrind supports them directly.</para>
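
    <para>A short sketch of using a POSIX reader-writer lock directly,
    rather than assembling one from a mutex and a condition variable,
    might look as follows.  The names
    <computeroutput>cfg_value</computeroutput>,
    <computeroutput>cfg_get</computeroutput>,
    <computeroutput>cfg_set</computeroutput> and
    <computeroutput>run_rwlock_demo</computeroutput> are hypothetical
    and exist only for this example.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared setting guarded by a POSIX reader-writer lock. */
static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value = 0;

/* Readers may hold the lock concurrently with each other. */
int cfg_get(void)
{
   pthread_rwlock_rdlock(&cfg_lock);
   int v = cfg_value;
   pthread_rwlock_unlock(&cfg_lock);
   return v;
}

/* A writer holds the lock exclusively. */
void cfg_set(int v)
{
   pthread_rwlock_wrlock(&cfg_lock);
   cfg_value = v;
   pthread_rwlock_unlock(&cfg_lock);
}

static void *reader_thread(void *out)
{
   *(int *)out = cfg_get();
   return NULL;
}

/* Sets the value, reads it back from two threads, and returns what
   they saw (or -1 if the two readers disagree). */
int run_rwlock_demo(void)
{
   pthread_t tids[2];
   int seen[2] = { 0, 0 };

   cfg_set(42);
   for (int i = 0; i < 2; i++) {
      int rc = pthread_create(&tids[i], NULL, reader_thread, &seen[i]);
      assert(rc == 0);
      (void)rc;
   }
   for (int i = 0; i < 2; i++)
      pthread_join(tids[i], NULL);

   return (seen[0] == seen[1]) ? seen[0] : -1;
}
]]></programlisting>

    <para>Because every acquisition and release goes through
    <function>pthread_rwlock_*</function> calls, the locking events
    are of the kind Helgrind intercepts.</para>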
757
758    <para>Helgrind directly supports the following POSIX threading
759    abstractions: mutexes, reader-writer locks, condition variables
760    (but see below), semaphores and barriers.  Currently spinlocks
761    are not supported, although they could be in future.</para>
762
763    <para>At the time of writing, the following popular Linux packages
764    are known to implement their own threading primitives:</para>
765
766    <itemizedlist>
767     <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
768      only uses POSIX pthreads primitives.  Unfortunately Qt 4.X
769      has its own implementation of mutexes (QMutex) and thread reaping.
770      Helgrind 3.4.x contains direct support
771      for Qt 4.X threading, which is experimental but is believed to
772      work fairly well.  A side effect of supporting Qt 4 directly is
773      that Helgrind can be used to debug KDE4 applications.  As this
774      is an experimental feature, we would particularly appreciate
775      feedback from folks who have used Helgrind to successfully debug
776      Qt 4 and/or KDE4 applications.</para>
777     </listitem>
778     <listitem><para>Runtime support library for GNU OpenMP (part of
779      GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
780      library (<filename>libgomp.so</filename>) constructs its own
781      synchronisation primitives using combinations of atomic memory
      instructions and the futex syscall, which causes total chaos in
      Helgrind, since it cannot "see" those.</para>
784     <para>Fortunately, this can be solved using a configuration-time
785      option (for GCC).  Rebuild GCC from source, and configure using
786      <varname>--disable-linux-futex</varname>.
787      This makes libgomp.so use the standard
788      POSIX threading primitives instead.  Note that this was tested
789      using GCC 4.2.3 and has not been re-tested using more recent GCC
790      versions.  We would appreciate hearing about any successes or
791      failures with more recent versions.</para>
792     </listitem>
793    </itemizedlist>
794
    <para>If you must implement your own threading primitives, there
      is a set of client request macros
797      in <computeroutput>helgrind.h</computeroutput> to help you
798      describe your primitives to Helgrind.  You should be able to
799      mark up mutexes, condition variables, etc, without difficulty.
800    </para>
801    <para>
802      It is also possible to mark up the effects of thread-safe
803      reference counting using the
804      <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
805      <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
      <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
      macros.  Thread-safe reference counting using an atomically
808      incremented/decremented refcount variable causes Helgrind
809      problems because a one-to-zero transition of the reference count
810      means the accessing thread has exclusive ownership of the
811      associated resource (normally, a C++ object) and can therefore
812      access it (normally, to run its destructor) without locking.
813      Helgrind doesn't understand this, and markup is essential to
814      avoid false positives.
815    </para>
816
817    <para>
818      Here are recommended guidelines for marking up thread safe
819      reference counting in C++.  You only need to mark up your
820      release methods -- the ones which decrement the reference count.
821      Given a class like this:
822    </para>
823
824<programlisting><![CDATA[
class MyClass {
   unsigned int mRefCount;

   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         delete this;
      }
   }
};
835]]></programlisting>
836
837   <para>
838     the release method should be marked up as follows:
839   </para>
840
841<programlisting><![CDATA[
   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         ANNOTATE_HAPPENS_AFTER(&mRefCount);
         ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
         delete this;
      } else {
         ANNOTATE_HAPPENS_BEFORE(&mRefCount);
      }
   }
852]]></programlisting>
853
854    <para>
855      There are a number of complex, mostly-theoretical objections to
856      this scheme.  From a theoretical standpoint it appears to be
857      impossible to devise a markup scheme which is completely correct
858      in the sense of guaranteeing to remove all false races.  The
859      proposed scheme however works well in practice.
860    </para>
861
862  </listitem>
863
864  <listitem>
    <para>Avoid memory recycling.  If you can't avoid it, you must
    tell Helgrind what is going on via the
867    <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
868    <computeroutput>helgrind.h</computeroutput>).</para>
869
870    <para>Helgrind is aware of standard heap memory allocation and
871    deallocation that occurs via
872    <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
873    and from entry and exit of stack frames.  In particular, when memory is
874    deallocated via <function>free</function>, <function>delete</function>,
875    or function exit, Helgrind considers that memory clean, so when it is
876    eventually reallocated, its history is irrelevant.</para>
877
878    <para>However, it is common practice to implement memory recycling
879    schemes.  In these, memory to be freed is not handed to
880    <function>free</function>/<function>delete</function>, but instead put
881    into a pool of free buffers to be handed out again as required.  The
882    problem is that Helgrind has no
883    way to know that such memory is logically no longer in use, and
884    its history is irrelevant.  Hence you must make that explicit,
885    using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
886    to specify the relevant address ranges.  It's easiest to put these
887    requests into the pool manager code, and use them either when memory is
888    returned to the pool, or is allocated from it.</para>
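
    <para>The following is a rough sketch of a fixed-size buffer pool
    that issues the request when a buffer is returned.  The pool
    itself (<computeroutput>pool_init</computeroutput>,
    <computeroutput>pool_get</computeroutput>,
    <computeroutput>pool_put</computeroutput> and the sizes used) is
    invented for illustration.  The sketch assumes the header is
    installed as <filename>valgrind/helgrind.h</filename>, and the
    no-op fallback for builds without that header is a choice made
    here, not something <filename>helgrind.h</filename>
    provides.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Use the real macro when Valgrind's headers are installed; otherwise
   fall back to a no-op so the program still builds (an assumption of
   this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) \
      do { (void)(start); (void)(len); } while (0)
#endif

#define POOL_BUFS 4     /* hypothetical pool dimensions */
#define BUF_SIZE  256

static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;
static unsigned char pool_storage[POOL_BUFS][BUF_SIZE];
static void *free_list[POOL_BUFS];
static int n_free = 0;

/* Put every buffer on the free list.  Call once, before any threads
   use the pool. */
void pool_init(void)
{
   for (int i = 0; i < POOL_BUFS; i++)
      free_list[i] = pool_storage[i];
   n_free = POOL_BUFS;
}

/* Hand out a buffer, or NULL if the pool is empty. */
void *pool_get(void)
{
   void *buf = NULL;
   pthread_mutex_lock(&pool_mx);
   if (n_free > 0)
      buf = free_list[--n_free];
   pthread_mutex_unlock(&pool_mx);
   return buf;
}

/* Return a buffer to the pool.  The buffer is logically dead from
   here on, so ask Helgrind to forget its access history. */
void pool_put(void *buf)
{
   VALGRIND_HG_CLEAN_MEMORY(buf, BUF_SIZE);
   pthread_mutex_lock(&pool_mx);
   free_list[n_free++] = buf;
   pthread_mutex_unlock(&pool_mx);
}
]]></programlisting>

    <para>Issuing the request in <computeroutput>pool_put</computeroutput>
    corresponds to the "when memory is returned to the pool" choice
    described above; placing it in
    <computeroutput>pool_get</computeroutput> instead would correspond
    to the other.</para>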
889  </listitem>
890
891  <listitem>
892    <para>Avoid POSIX condition variables.  If you can, use POSIX
893    semaphores (<function>sem_t</function>, <function>sem_post</function>,
894    <function>sem_wait</function>) to do inter-thread event signalling.
895    Semaphores with an initial value of zero are particularly useful for
896    this.</para>
897
    <para>Helgrind handles POSIX condition variables only partially
    correctly.  This is because Helgrind can see inter-thread
900    dependencies between a <function>pthread_cond_wait</function> call and a
901    <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
902    call only if the waiting thread actually gets to the rendezvous first
903    (so that it actually calls
904    <function>pthread_cond_wait</function>).  It can't see dependencies
905    between the threads if the signaller arrives first.  In the latter case,
906    POSIX guidelines imply that the associated boolean condition still
907    provides an inter-thread synchronisation event, but one which is
908    invisible to Helgrind.</para>
909
    <para>Because Helgrind misses some inter-thread synchronisation
    events, it reports false positives.
912    </para>
913
    <para>The root cause of this lost synchronisation is
915    particularly hard to understand, so an example is helpful.  It was
916    discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
917    in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
918    canonical POSIX-recommended usage scheme for condition variables
919    is as follows:</para>
920
921<programlisting><![CDATA[
b   is a Boolean condition, which is False most of the time
cv  is a condition variable
mx  is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
932]]></programlisting>
933
934    <para>Assume <computeroutput>b</computeroutput> is False most of
935    the time.  If the waiter arrives at the rendezvous first, it
936    enters its while-loop, waits for the signaller to signal, and
937    eventually proceeds.  Helgrind sees the signal, notes the
938    dependency, and all is well.</para>
939
940    <para>If the signaller arrives
941    first, <computeroutput>b</computeroutput> is set to true, and the
942    signal disappears into nowhere.  When the waiter later arrives, it
943    does not enter its while-loop and simply carries on.  But even in
944    this case, the waiter code following the while-loop cannot execute
945    until the signaller sets <computeroutput>b</computeroutput> to
946    True.  Hence there is still the same inter-thread dependency, but
947    this time it is through an arbitrary in-memory condition, and
948    Helgrind cannot see it.</para>
949
950    <para>By comparison, Helgrind's detection of inter-thread
951    dependencies caused by semaphore operations is believed to be
952    exactly correct.</para>
953
954    <para>As far as I know, a solution to this problem that does not
955    require source-level annotation of condition-variable wait loops
956    is beyond the current state of the art.</para>
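
    <para>For comparison, the same one-shot notification can be
    sketched with a semaphore whose initial value is zero.  This
    assumes a platform with unnamed POSIX semaphores
    (<function>sem_init</function>), such as Linux.  The names
    <computeroutput>data_ready</computeroutput>,
    <computeroutput>shared_data</computeroutput> and
    <computeroutput>run_semaphore_handoff</computeroutput> are
    hypothetical.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t data_ready;        /* initial value 0: "nothing posted yet" */
static int shared_data = 0;     /* hypothetical payload */

/* Signaller: write the payload, then post the semaphore. */
static void *producer(void *arg)
{
   (void)arg;
   shared_data = 123;
   sem_post(&data_ready);
   return NULL;
}

/* Waiter: block until the post has happened, then read the payload.
   This works whichever thread reaches the rendezvous first, because
   the post is remembered in the semaphore's count. */
int run_semaphore_handoff(void)
{
   pthread_t t;
   int rc = sem_init(&data_ready, 0, 0);
   assert(rc == 0);
   rc = pthread_create(&t, NULL, producer, NULL);
   assert(rc == 0);
   (void)rc;

   /* Retry if the wait is interrupted by a signal handler. */
   while (sem_wait(&data_ready) == -1 && errno == EINTR)
      continue;

   int value = shared_data;
   pthread_join(t, NULL);
   sem_destroy(&data_ready);
   return value;
}
]]></programlisting>

    <para>Unlike the condition-variable scheme, there is no separate
    in-memory Boolean here: the dependency between the two threads
    runs entirely through <function>sem_post</function> and
    <function>sem_wait</function>.</para>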
957  </listitem>
958
959  <listitem>
960    <para>Make sure you are using a supported Linux distribution.  At
961    present, Helgrind only properly supports glibc-2.3 or later.  This
962    in turn means we only support glibc's NPTL threading
963    implementation.  The old LinuxThreads implementation is not
964    supported.</para>
965  </listitem>
966
967  <listitem>
968    <para>Round up all finished threads using
969    <function>pthread_join</function>.  Avoid
970    detaching threads: don't create threads in the detached state, and
971    don't call <function>pthread_detach</function> on existing threads.</para>
972
973    <para>Using <function>pthread_join</function> to round up finished
974    threads provides a clear synchronisation point that both Helgrind and
975    programmers can see.  If you don't call
976    <function>pthread_join</function> on a thread, Helgrind has no way to
977    know when it finishes, relative to any
978    significant synchronisation points for other threads in the program.  So
979    it assumes that the thread lingers indefinitely and can potentially
980    interfere indefinitely with the memory state of the program.  It
981    has every right to assume that -- after all, it might really be
982    the case that, for scheduling reasons, the exiting thread did run
983    very slowly in the last stages of its life.</para>
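
    <para>A minimal sketch of this pattern: create joinable threads
    (the default), and read their results only after
    <function>pthread_join</function> has returned.  The worker
    function and <computeroutput>run_joined_workers</computeroutput>
    are invented for illustration.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each worker doubles the integer it was given. */
static void *worker(void *arg)
{
   int *slot = arg;
   *slot = *slot * 2;
   return NULL;
}

/* Starts four joinable workers, joins them all, then sums the
   results.  The reads of slots[] come after the matching join. */
int run_joined_workers(void)
{
   enum { N = 4 };
   pthread_t tids[N];
   int slots[N];
   int sum = 0;

   for (int i = 0; i < N; i++) {
      slots[i] = i + 1;
      int rc = pthread_create(&tids[i], NULL, worker, &slots[i]);
      assert(rc == 0);
      (void)rc;
   }
   for (int i = 0; i < N; i++) {
      pthread_join(tids[i], NULL);   /* synchronisation point */
      sum += slots[i];
   }
   return sum;
}
]]></programlisting>

    <para>Each <function>pthread_join</function> call orders the
    worker's write to its slot before the main thread's read of that
    slot, so no extra locking is needed for the results.</para>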
984  </listitem>
985
986  <listitem>
987    <para>Perform thread debugging (with Helgrind) and memory
988    debugging (with Memcheck) together.</para>
989
990    <para>Helgrind tracks the state of memory in detail, and memory
991    management bugs in the application are liable to cause confusion.
992    In extreme cases, applications which do many invalid reads and
993    writes (particularly to freed memory) have been known to crash
994    Helgrind.  So, ideally, you should make your application
995    Memcheck-clean before using Helgrind.</para>
996
997    <para>It may be impossible to make your application Memcheck-clean
998    unless you first remove threading bugs.  In particular, it may be
999    difficult to remove all reads and writes to freed memory in
1000    multithreaded C++ destructor sequences at program termination.
1001    So, ideally, you should make your application Helgrind-clean
1002    before using Memcheck.</para>
1003
1004    <para>Since this circularity is obviously unresolvable, at least
1005    bear in mind that Memcheck and Helgrind are to some extent
1006    complementary, and you may need to use them together.</para>
1007  </listitem>
1008
1009  <listitem>
1010    <para>POSIX requires that implementations of standard I/O
1011    (<function>printf</function>, <function>fprintf</function>,
1012    <function>fwrite</function>, <function>fread</function>, etc) are thread
1013    safe.  Unfortunately GNU libc implements this by using internal locking
1014    primitives that Helgrind is unable to intercept.  Consequently Helgrind
1015    generates many false race reports when you use these functions.</para>
1016
1017    <para>Helgrind attempts to hide these errors using the standard
1018    Valgrind error-suppression mechanism.  So, at least for simple
1019    test cases, you don't see any.  Nevertheless, some may slip
1020    through.  Just something to be aware of.</para>
1021  </listitem>
1022
1023  <listitem>
1024    <para>Helgrind's error checks do not work properly inside the
1025    system threading library itself
1026    (<computeroutput>libpthread.so</computeroutput>), and it usually
1027    observes large numbers of (false) errors in there.  Valgrind's
1028    suppression system then filters these out, so you should not see
1029    them.</para>
1030
1031    <para>If you see any race errors reported
1032    where <computeroutput>libpthread.so</computeroutput> or
1033    <computeroutput>ld.so</computeroutput> is the object associated
1034    with the innermost stack frame, please file a bug report at
1035    <ulink url="&vg-url;">&vg-url;</ulink>.
1036    </para>
1037  </listitem>
1038
1039</orderedlist>
1040
1041</sect1>
1042
1043
1044
1045
1046<sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
1047<title>Helgrind Command-line Options</title>
1048
1049<para>The following end-user options are available:</para>
1050
1051<!-- start of xi:include in the manpage -->
1052<variablelist id="hg.opts.list">
1053
1054  <varlistentry id="opt.free-is-write"
1055                xreflabel="--free-is-write">
1056    <term>
1057      <option><![CDATA[--free-is-write=no|yes
1058      [default: no] ]]></option>
1059    </term>
1060    <listitem>
1061      <para>When enabled (not the default), Helgrind treats freeing of
1062        heap memory as if the memory was written immediately before
1063        the free.  This exposes races where memory is referenced by
1064        one thread, and freed by another, but there is no observable
1065        synchronisation event to ensure that the reference happens
1066        before the free.
1067      </para>
1068      <para>This functionality is new in Valgrind 3.7.0, and is
1069        regarded as experimental.  It is not enabled by default
1070        because its interaction with custom memory allocators is not
1071        well understood at present.  User feedback is welcomed.
1072      </para>
1073    </listitem>
1074  </varlistentry>
1075
1076  <varlistentry id="opt.track-lockorders"
1077                xreflabel="--track-lockorders">
1078    <term>
1079      <option><![CDATA[--track-lockorders=no|yes
1080      [default: yes] ]]></option>
1081    </term>
1082    <listitem>
1083      <para>When enabled (the default), Helgrind performs lock order
1084      consistency checking.  For some buggy programs, the large number
1085      of lock order errors reported can become annoying, particularly
1086      if you're only interested in race errors.  You may therefore find
1087      it helpful to disable lock order checking.</para>
1088    </listitem>
1089  </varlistentry>
1090
1091  <varlistentry id="opt.history-level"
1092                xreflabel="--history-level">
1093    <term>
1094      <option><![CDATA[--history-level=none|approx|full
1095      [default: full] ]]></option>
1096    </term>
1097    <listitem>
      <para><option>--history-level=full</option> (the default) causes
        Helgrind to collect enough information about "old" accesses that
1100        it can produce two stack traces in a race report -- both the
1101        stack trace for the current access, and the trace for the
1102        older, conflicting access.</para>
1103      <para>Collecting such information is expensive in both speed and
1104        memory, particularly for programs that do many inter-thread
1105        synchronisation events (locks, unlocks, etc).  Without such
1106        information, it is more difficult to track down the root
1107        causes of races.  Nonetheless, you may not need it in
1108        situations where you just want to check for the presence or
1109        absence of races, for example, when doing regression testing
1110        of a previously race-free program.</para>
1111      <para><option>--history-level=none</option> is the opposite
1112        extreme.  It causes Helgrind not to collect any information
1113        about previous accesses.  This can be dramatically faster
1114        than <option>--history-level=full</option>.</para>
1115      <para><option>--history-level=approx</option> provides a
1116        compromise between these two extremes.  It causes Helgrind to
1117        show a full trace for the later access, and approximate
1118        information regarding the earlier access.  This approximate
1119        information consists of two stacks, and the earlier access is
1120        guaranteed to have occurred somewhere between program points
1121        denoted by the two stacks. This is not as useful as showing
1122        the exact stack for the previous access
1123        (as <option>--history-level=full</option> does), but it is
1124        better than nothing, and it is almost as fast as
1125        <option>--history-level=none</option>.</para>
1126    </listitem>
1127  </varlistentry>
1128
1129  <varlistentry id="opt.conflict-cache-size"
1130                xreflabel="--conflict-cache-size">
1131    <term>
1132      <option><![CDATA[--conflict-cache-size=N
1133      [default: 1000000] ]]></option>
1134    </term>
1135    <listitem>
1136      <para>This flag only has any effect
1137        at <option>--history-level=full</option>.</para>
1138      <para>Information about "old" conflicting accesses is stored in
1139        a cache of limited size, with LRU-style management.  This is
1140        necessary because it isn't practical to store a stack trace
1141        for every single memory access made by the program.
1142        Historical information on not recently accessed locations is
1143        periodically discarded, to free up space in the cache.</para>
1144      <para>This option controls the size of the cache, in terms of the
1145        number of different memory addresses for which
1146        conflicting access information is stored.  If you find that
1147        Helgrind is showing race errors with only one stack instead of
1148        the expected two stacks, try increasing this value.</para>
1149      <para>The minimum value is 10,000 and the maximum is 30,000,000
1150        (thirty times the default value).  Increasing the value by 1
1151        increases Helgrind's memory requirement by very roughly 100
1152        bytes, so the maximum value will easily eat up three extra
1153        gigabytes or so of memory.</para>
1154    </listitem>
1155  </varlistentry>
1156
1157  <varlistentry id="opt.check-stack-refs"
1158                xreflabel="--check-stack-refs">
1159    <term>
1160      <option><![CDATA[--check-stack-refs=no|yes
1161      [default: yes] ]]></option>
1162    </term>
1163    <listitem>
1164      <para>
1165        By default Helgrind checks all data memory accesses made by your
1166        program.  This flag enables you to skip checking for accesses
1167        to thread stacks (local variables).  This can improve
1168        performance, but comes at the cost of missing races on
1169        stack-allocated data.
1170      </para>
1171    </listitem>
1172  </varlistentry>
1173
1174
1175</variablelist>
1176<!-- end of xi:include in the manpage -->
1177
1178<!-- start of xi:include in the manpage -->
1179<!--  commented out, because we don't document debugging options in the
1180      manual.  Nb: all the double-dashes below had a space inserted in them
1181      to avoid problems with premature closing of this comment.
1182<para>In addition, the following debugging options are available for
1183Helgrind:</para>
1184
1185<variablelist id="hg.debugopts.list">
1186
1187  <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
1188    <term>
1189      <option><![CDATA[- -trace-malloc=no|yes [no]
1190      ]]></option>
1191    </term>
1192    <listitem>
1193      <para>Show all client <function>malloc</function> (etc) and
1194      <function>free</function> (etc) requests.</para>
1195    </listitem>
1196  </varlistentry>
1197
1198  <varlistentry id="opt.cmp-race-err-addrs"
1199                xreflabel="- -cmp-race-err-addrs">
1200    <term>
1201      <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
1202      ]]></option>
1203    </term>
1204    <listitem>
1205      <para>Controls whether or not race (data) addresses should be
1206        taken into account when removing duplicates of race errors.
        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
        identical race errors will be considered to be the same even if
        their race addresses differ.  With
        <varname>- -cmp-race-err-addrs=yes</varname> they will be
1211        considered different.  This is provided to help make certain
1212        regression tests work reliably.</para>
1213    </listitem>
1214  </varlistentry>
1215
1216  <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
1217    <term>
1218      <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
1219      ]]></option>
1220    </term>
1221    <listitem>
1222      <para>Run extensive sanity checks on Helgrind's internal
1223        data structures at events defined by the bitstring, as
1224        follows:</para>
1225      <para><computeroutput>010000 </computeroutput>after changes to
1226        the lock order acquisition graph</para>
1227      <para><computeroutput>001000 </computeroutput>after every client
1228        memory access (NB: not currently used)</para>
1229      <para><computeroutput>000100 </computeroutput>after every client
1230        memory range permission setting of 256 bytes or greater</para>
1231      <para><computeroutput>000010 </computeroutput>after every client
1232        lock or unlock event</para>
1233      <para><computeroutput>000001 </computeroutput>after every client
1234        thread creation or joinage event</para>
1235      <para>Note these will make Helgrind run very slowly, often to
1236        the point of being completely unusable.</para>
1237    </listitem>
1238  </varlistentry>
1239
1240</variablelist>
1241-->
1242<!-- end of xi:include in the manpage -->
1243
1244
1245</sect1>
1246
1247
1248
1249<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
1250<title>Helgrind Client Requests</title>
1251
1252<para>The following client requests are defined in
1253<filename>helgrind.h</filename>.  See that file for exact details of their
1254arguments.</para>
1255
1256<itemizedlist>
1257
1258  <listitem>
1259    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
1260    <para>This makes Helgrind forget everything it knows about a
1261    specified memory range.  This is particularly useful for memory
1262    allocators that wish to recycle memory.</para>
1263  </listitem>
1264  <listitem>
1265    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
1266  </listitem>
1267  <listitem>
1268    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
1269  </listitem>
1270  <listitem>
1271    <para><function>ANNOTATE_NEW_MEMORY</function></para>
1272  </listitem>
1273  <listitem>
1274    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
1275  </listitem>
1276  <listitem>
1277    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
1278  </listitem>
1279  <listitem>
1280    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
1281  </listitem>
1282  <listitem>
1283    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>These are used to describe to Helgrind the behaviour of
1285    custom (non-POSIX) synchronisation primitives, which it otherwise
1286    has no way to understand.  See comments
1287    in <filename>helgrind.h</filename> for further
1288    documentation.</para>
1289  </listitem>
1290
1291</itemizedlist>
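
<para>As a hedged illustration of the happens-before pair, the sketch
below marks up a hand-rolled "ready" flag built on a C11 atomic.  The
flag, the payload and
<computeroutput>run_annotated_handoff</computeroutput> are
hypothetical.  The sketch assumes the header is installed as
<filename>valgrind/helgrind.h</filename>, and the no-op fallback
macros are an addition made here so that the code builds without
Valgrind's headers.  Consult the comments in
<filename>helgrind.h</filename> for the authoritative description of
each macro.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Use the real macros when Valgrind's headers are installed; otherwise
   fall back to no-ops (an assumption of this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static atomic_int ready;     /* custom, non-POSIX "event" flag */
static int payload = 0;      /* data handed over via the flag */

static void *publisher(void *arg)
{
   (void)arg;
   payload = 99;
   /* Announce the signalling side just before the real operation. */
   ANNOTATE_HAPPENS_BEFORE(&ready);
   atomic_store_explicit(&ready, 1, memory_order_release);
   return NULL;
}

int run_annotated_handoff(void)
{
   pthread_t t;
   atomic_store(&ready, 0);
   int rc = pthread_create(&t, NULL, publisher, NULL);
   assert(rc == 0);
   (void)rc;

   while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
      sched_yield();
   /* Announce the receiving side just after the real operation. */
   ANNOTATE_HAPPENS_AFTER(&ready);

   int value = payload;
   pthread_join(t, NULL);
   return value;
}
]]></programlisting>

<para>Both macros are given the same address, which is what ties the
"before" event in one thread to the "after" event in the other.  The
annotations describe the ordering of the accesses to
<computeroutput>payload</computeroutput>; whether the accesses to the
flag itself need further treatment is not covered by this
sketch.</para>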
1292
1293</sect1>
1294
1295
1296
1297<sect1 id="hg-manual.todolist" xreflabel="To Do List">
1298<title>A To-Do List for Helgrind</title>
1299
1300<para>The following is a list of loose ends which should be tidied up
1301some time.</para>
1302
1303<itemizedlist>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for size-2 cycles as at
1306    present.</para>
1307  </listitem>
1308  <listitem><para>The conflicting access mechanism sometimes
1309    mysteriously fails to show the conflicting access' stack, even
1310    when provided with unbounded storage for conflicting access info.
1311    This should be investigated.</para>
1312  </listitem>
1313  <listitem><para>Document races caused by GCC's thread-unsafe code
1314    generation for speculative stores.  In the interim see
1315    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
1316    </computeroutput>
1317    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
1318    </para>
1319  </listitem>
1320  <listitem><para>Don't update the lock-order graph, and don't check
1321    for errors, when a "try"-style lock operation happens (e.g.
1322    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
1323    restrictions to the locking order, since they can always fail to
1324    acquire the lock, resulting in the caller going off and doing Plan
1325    B (presumably it will have a Plan B).  Doing such checks could
1326    generate false lock-order errors and confuse users.</para>
1327  </listitem>
  <listitem><para>Performance can be very poor.  Slowdowns on the
1329    order of 100:1 are not unusual.  There is limited scope for
1330    performance improvements.
1331    </para>
1332  </listitem>
1333
1334</itemizedlist>
1335
1336</sect1>
1337
1338</chapter>
1339