1<?xml version="1.0"?> <!-- -*- sgml -*- -->
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
5
6
7<chapter id="drd-manual" xreflabel="DRD: a thread error detector">
8  <title>DRD: a thread error detector</title>
9
10<para>To use this tool, you must specify
11<option>--tool=drd</option>
12on the Valgrind command line.</para>
13
14
15<sect1 id="drd-manual.overview" xreflabel="Overview">
16<title>Overview</title>
17
18<para>
19DRD is a Valgrind tool for detecting errors in multithreaded C and C++
20programs. The tool works for any program that uses the POSIX threading
21primitives or that uses threading concepts built on top of the POSIX threading
22primitives.
23</para>
24
25<sect2 id="drd-manual.mt-progr-models" xreflabel="MT-progr-models">
26<title>Multithreaded Programming Paradigms</title>
27
28<para>
29There are two possible reasons for using multithreading in a program:
30<itemizedlist>
31  <listitem>
32    <para>
33      To model concurrent activities. Assigning one thread to each activity
34      can be a great simplification compared to multiplexing the states of
35      multiple activities in a single thread. This is why most server software
36      and embedded software is multithreaded.
37    </para>
38  </listitem>
39  <listitem>
40    <para>
41      To use multiple CPU cores simultaneously for speeding up
42      computations. This is why many High Performance Computing (HPC)
43      applications are multithreaded.
44    </para>
45  </listitem>
46</itemizedlist>
47</para>
48
49<para>
Multithreaded programs can use one or more of the following programming
paradigms. Which paradigm is appropriate depends on, among other things,
the application type. Some examples of multithreaded programming paradigms
are:
53<itemizedlist>
54  <listitem>
55    <para>
      Locking. Data that is shared between threads is protected from
      concurrent accesses via locking. The POSIX threads library, the Qt
      library and the Boost.Thread library, for example, support this
      paradigm directly.
59    </para>
60  </listitem>
61  <listitem>
62    <para>
63      Message passing. No data is shared between threads, but threads exchange
64      data by passing messages to each other. Examples of implementations of
65      the message passing paradigm are MPI and CORBA.
66    </para>
67  </listitem>
68  <listitem>
69    <para>
      Automatic parallelization. A compiler converts a sequential program
      into a multithreaded program. The original program may or may not
      contain parallelization hints. One example of such parallelization
      hints is the OpenMP standard, which defines a set of directives that
      tell a compiler how to parallelize a C, C++ or Fortran program.
      OpenMP is well suited for computationally intensive applications. As
      an example, an open source image processing software package uses
      OpenMP to maximize performance on systems with multiple CPU cores.
      GCC supports the OpenMP standard from version 4.2.0 on.
80    </para>
81  </listitem>
82  <listitem>
83    <para>
84      Software Transactional Memory (STM). Any data that is shared between
85      threads is updated via transactions. After each transaction it is
86      verified whether there were any conflicting transactions. If there were
87      conflicts, the transaction is aborted, otherwise it is committed. This
88      is a so-called optimistic approach. There is a prototype of the Intel C++
89      Compiler available that supports STM. Research about the addition of
90      STM support to GCC is ongoing.
91    </para>
92  </listitem>
93</itemizedlist>
94</para>
95
96<para>
DRD supports any combination of multithreaded programming paradigms as
long as the implementation of these paradigms is based on the POSIX
threads primitives. DRD does not, however, support programs that use,
for example, Linux futexes directly. Attempts to analyze such programs
with DRD will cause DRD to report many false positives.
102</para>
103
104</sect2>
105
106
107<sect2 id="drd-manual.pthreads-model" xreflabel="Pthreads-model">
108<title>POSIX Threads Programming Model</title>
109
110<para>
111POSIX threads, also known as Pthreads, is the most widely available
112threading library on Unix systems.
113</para>
114
115<para>
116The POSIX threads programming model is based on the following abstractions:
117<itemizedlist>
118  <listitem>
119    <para>
120      A shared address space. All threads running within the same
121      process share the same address space. All data, whether shared or
122      not, is identified by its address.
123    </para>
124  </listitem>
125  <listitem>
126    <para>
      Regular load and store operations, which allow threads to read
      values from or to write values to the memory shared by all threads
      running in the same process.
130    </para>
131  </listitem>
132  <listitem>
133    <para>
134      Atomic store and load-modify-store operations. While these are
135      not mentioned in the POSIX threads standard, most
136      microprocessors support atomic memory operations.
137    </para>
138  </listitem>
139  <listitem>
140    <para>
141      Threads. Each thread represents a concurrent activity.
142    </para>
143  </listitem>
144  <listitem>
145    <para>
146      Synchronization objects and operations on these synchronization
147      objects. The following types of synchronization objects have been
148      defined in the POSIX threads standard: mutexes, condition variables,
149      semaphores, reader-writer synchronization objects, barriers and
150      spinlocks.
151    </para>
152  </listitem>
153</itemizedlist>
154</para>
155
156<para>
157Which source code statements generate which memory accesses depends on
158the <emphasis>memory model</emphasis> of the programming language being
159used. There is not yet a definitive memory model for the C and C++
160languages. For a draft memory model, see also the document
161<ulink url="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html">
162WG21/N2338: Concurrency memory model compiler consequences</ulink>.
163</para>
164
165<para>
166For more information about POSIX threads, see also the Single UNIX
167Specification version 3, also known as
168<ulink url="http://www.opengroup.org/onlinepubs/000095399/idx/threads.html">
169IEEE Std 1003.1</ulink>.
170</para>
171
172</sect2>
173
174
175<sect2 id="drd-manual.mt-problems" xreflabel="MT-Problems">
176<title>Multithreaded Programming Problems</title>
177
178<para>
179Depending on which multithreading paradigm is being used in a program,
180one or more of the following problems can occur:
181<itemizedlist>
182  <listitem>
183    <para>
184      Data races. One or more threads access the same memory location without
185      sufficient locking. Most but not all data races are programming errors
186      and are the cause of subtle and hard-to-find bugs.
187    </para>
188  </listitem>
189  <listitem>
190    <para>
191      Lock contention. One thread blocks the progress of one or more other
192      threads by holding a lock too long.
193    </para>
194  </listitem>
195  <listitem>
196    <para>
      Improper use of the POSIX threads API. Most implementations of the
      POSIX threads API have been optimized for runtime speed. Such
      implementations will not complain about certain errors, e.g. when a
      mutex is unlocked by a thread other than the thread that locked it.
201    </para>
202  </listitem>
203  <listitem>
204    <para>
205      Deadlock. A deadlock occurs when two or more threads wait for
206      each other indefinitely.
207    </para>
208  </listitem>
209  <listitem>
210    <para>
      False sharing. If threads running on different processor cores
      frequently access different variables located in the same cache
      line, this will slow down the involved threads considerably because
      the cache line has to be exchanged repeatedly between those cores.
215    </para>
216  </listitem>
217</itemizedlist>
218</para>
219
220<para>
221Although the likelihood of the occurrence of data races can be reduced
222through a disciplined programming style, a tool for automatic
223detection of data races is a necessity when developing multithreaded
224software. DRD can detect these, as well as lock contention and
225improper use of the POSIX threads API.
226</para>
227
228</sect2>
229
230
231<sect2 id="drd-manual.data-race-detection" xreflabel="data-race-detection">
232<title>Data Race Detection</title>
233
234<para>
235The result of load and store operations performed by a multithreaded program
236depends on the order in which memory operations are performed. This order is
237determined by:
238<orderedlist>
239  <listitem>
240    <para>
241      All memory operations performed by the same thread are performed in
242      <emphasis>program order</emphasis>, that is, the order determined by the
243      program source code and the results of previous load operations.
244    </para>
245  </listitem>
246  <listitem>
247    <para>
248      Synchronization operations determine certain ordering constraints on
249      memory operations performed by different threads. These ordering
250      constraints are called the <emphasis>synchronization order</emphasis>.
251    </para>
252  </listitem>
253</orderedlist>
254The combination of program order and synchronization order is called the
255<emphasis>happens-before relationship</emphasis>. This concept was first
defined by S. Adve et al. in the paper <emphasis>Detecting data races on weak
257memory systems</emphasis>, ACM SIGARCH Computer Architecture News, v.19 n.3,
258p.234-243, May 1991.
259</para>
260
261<para>
262Two memory operations <emphasis>conflict</emphasis> if both operations are
263performed by different threads, refer to the same memory location and at least
264one of them is a store operation.
265</para>
266
267<para>
268A multithreaded program is <emphasis>data-race free</emphasis> if all
269conflicting memory accesses are ordered by synchronization
270operations.
271</para>
272
273<para>
274A well known way to ensure that a multithreaded program is data-race
275free is to ensure that a locking discipline is followed. It is e.g.
276possible to associate a mutex with each shared data item, and to hold
277a lock on the associated mutex while the shared data is accessed.
278</para>
279
280<para>
All programs that follow a locking discipline are data-race free, but not all
data-race free programs follow a locking discipline. There exist multithreaded
programs where access to shared data is arbitrated via condition variables,
semaphores or barriers. As an example, a certain class of HPC applications
consists of a sequence of computation steps separated in time by barriers,
where these barriers are the only means of synchronization. Although such
applications contain many conflicting memory accesses and do not make use of
mutexes, most of them do not contain data races.
290</para>
291
292<para>
There exist two different approaches for verifying the correctness of
multithreaded programs at runtime. The approach of the so-called Eraser
algorithm is to verify whether all shared memory accesses follow a consistent
locking strategy. Happens-before data race detectors, on the other hand,
verify directly whether all interthread memory accesses are ordered by
synchronization operations. While the latter approach is more complex to
implement and more sensitive to OS scheduling, it is a general approach that
works for all classes of multithreaded programs. An important advantage of
happens-before data race detectors is that they do not report any false
positives.
303</para>
304
305<para>
306DRD is based on the happens-before algorithm.
307</para>
308
309</sect2>
310
311
312</sect1>
313
314
315<sect1 id="drd-manual.using-drd" xreflabel="Using DRD">
316<title>Using DRD</title>
317
318<sect2 id="drd-manual.options" xreflabel="DRD Command-line Options">
319<title>DRD Command-line Options</title>
320
321<para>The following command-line options are available for controlling the
322behavior of the DRD tool itself:</para>
323
324<!-- start of xi:include in the manpage -->
325<variablelist id="drd.opts.list">
326  <varlistentry>
327    <term>
328      <option><![CDATA[--check-stack-var=<yes|no> [default: no]]]></option>
329    </term>
330    <listitem>
331      <para>
332        Controls whether DRD detects data races on stack
333        variables. Verifying stack variables is disabled by default because
334        most programs do not share stack variables over threads.
335      </para>
336    </listitem>
337  </varlistentry>
338  <varlistentry>
339    <term>
340      <option><![CDATA[--exclusive-threshold=<n> [default: off]]]></option>
341    </term>
342    <listitem>
343      <para>
        Print an error message if any mutex or writer lock has been held
        longer than the specified time (in milliseconds). This option
        enables the detection of lock contention.
347      </para>
348    </listitem>
349  </varlistentry>
350  <varlistentry>
351    <term>
352      <option><![CDATA[--join-list-vol=<n> [default: 10]]]></option>
353    </term>
354    <listitem>
355      <para>
        Data races that occur between a statement at the end of one thread
        and another thread can be missed if memory access information is
        discarded immediately after a thread has been joined. This option
        specifies for how many joined threads memory access information
        should be retained.
361      </para>
362    </listitem>
363  </varlistentry>
364  <varlistentry>
365    <term>
366      <option>
367        <![CDATA[--first-race-only=<yes|no> [default: no]]]>
368      </option>
369    </term>
370    <listitem>
371      <para>
372        Whether to report only the first data race that has been detected on a
373        memory location or all data races that have been detected on a memory
374        location.
375      </para>
376    </listitem>
377  </varlistentry>
378  <varlistentry>
379    <term>
380      <option>
381        <![CDATA[--free-is-write=<yes|no> [default: no]]]>
382      </option>
383    </term>
384    <listitem>
385      <para>
386        Whether to report races between accessing memory and freeing
387        memory. Enabling this option may cause DRD to run slightly
388        slower. Notes:
389	<itemizedlist>
390	  <listitem>
391	    <para>
	      Don't enable this option when using custom memory allocators
	      that use
	      the <computeroutput>VG_USERREQ__MALLOCLIKE_BLOCK</computeroutput>
	      and <computeroutput>VG_USERREQ__FREELIKE_BLOCK</computeroutput>
	      client requests, because that would result in false positives.
397	    </para>
398	  </listitem>
399	  <listitem>
400	    <para>Don't enable this option when using reference-counted
401	      objects because that will result in false positives, even when
402	      that code has been annotated properly with
403	      <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>
404	      and <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput>. See
405	      e.g.  the output of the following command for an example:
406	      <computeroutput>valgrind --tool=drd --free-is-write=yes
407		drd/tests/annotate_smart_pointer</computeroutput>.
408	    </para>
409	  </listitem>
410	</itemizedlist>
411      </para>
412    </listitem>
413  </varlistentry>
414  <varlistentry>
415    <term>
416      <option>
417        <![CDATA[--report-signal-unlocked=<yes|no> [default: yes]]]>
418      </option>
419    </term>
420    <listitem>
421      <para>
        Whether to report calls to
        <function>pthread_cond_signal</function> and
        <function>pthread_cond_broadcast</function> where the mutex
        associated with the condition variable through
        <function>pthread_cond_wait</function> or
        <function>pthread_cond_timedwait</function> is not locked at
        the time the signal is sent. Sending a signal without holding
        a lock on the associated mutex is a common programming error
        which can cause subtle race conditions and unpredictable
        behavior. There exist some uncommon synchronization patterns,
        however, where it is safe to send a signal without holding a
        lock on the associated mutex.
434      </para>
435    </listitem>
436  </varlistentry>
437  <varlistentry>
438    <term>
439      <option><![CDATA[--segment-merging=<yes|no> [default: yes]]]></option>
440    </term>
441    <listitem>
442      <para>
443        Controls segment merging. Segment merging is an algorithm to
444        limit memory usage of the data race detection
445        algorithm. Disabling segment merging may improve the accuracy
446        of the so-called 'other segments' displayed in race reports
447        but can also trigger an out of memory error.
448      </para>
449    </listitem>
450  </varlistentry>
451  <varlistentry>
452    <term>
453      <option><![CDATA[--segment-merging-interval=<n> [default: 10]]]></option>
454    </term>
455    <listitem>
456      <para>
        Perform segment merging only after the specified number of new
        segments have been created. This is an advanced configuration option
        that lets you choose whether to minimize DRD's memory usage by
        choosing a low value or to let DRD run faster by choosing a slightly
        higher value. The optimal value for this parameter depends on the
        program being analyzed. The default value works well for most programs.
463      </para>
464    </listitem>
465  </varlistentry>
466  <varlistentry>
467    <term>
468      <option><![CDATA[--shared-threshold=<n> [default: off]]]></option>
469    </term>
470    <listitem>
471      <para>
472        Print an error message if a reader lock has been held longer
473        than the specified time (in milliseconds). This option enables
474        the detection of lock contention.
475      </para>
476    </listitem>
477  </varlistentry>
478  <varlistentry>
479    <term>
480      <option><![CDATA[--show-confl-seg=<yes|no> [default: yes]]]></option>
481    </term>
482    <listitem>
483      <para>
484         Show conflicting segments in race reports. Since this
485         information can help to find the cause of a data race, this
486         option is enabled by default. Disabling this option makes the
487         output of DRD more compact.
488      </para>
489    </listitem>
490  </varlistentry>
491  <varlistentry>
492    <term>
493      <option><![CDATA[--show-stack-usage=<yes|no> [default: no]]]></option>
494    </term>
495    <listitem>
496      <para>
        Print stack usage at thread exit time. When a program creates a large
        number of threads it becomes important to limit the amount of virtual
        memory allocated for thread stacks. This option makes it possible to
        observe how much stack memory has been used by each thread of the
        client program. Note: the DRD tool itself allocates some temporary
        data on the client thread stack. The space necessary for this
        temporary data must be allocated by the client program when it
        allocates stack memory, but is not included in the stack usage
        reported by DRD.
506      </para>
507    </listitem>
508  </varlistentry>
509</variablelist>
510<!-- end of xi:include in the manpage -->
511
512<!-- start of xi:include in the manpage -->
513<para>
514The following options are available for monitoring the behavior of the
515client program:
516</para>
517
518<variablelist id="drd.debugopts.list">
519  <varlistentry>
520    <term>
521      <option><![CDATA[--trace-addr=<address> [default: none]]]></option>
522    </term>
523    <listitem>
524      <para>
525        Trace all load and store activity for the specified
526        address. This option may be specified more than once.
527      </para>
528    </listitem>
529  </varlistentry>
530  <varlistentry>
531    <term>
532      <option><![CDATA[--ptrace-addr=<address> [default: none]]]></option>
533    </term>
534    <listitem>
535      <para>
536        Trace all load and store activity for the specified address and keep
537        doing that even after the memory at that address has been freed and
538        reallocated.
539      </para>
540    </listitem>
541  </varlistentry>
542  <varlistentry>
543    <term>
544      <option><![CDATA[--trace-alloc=<yes|no> [default: no]]]></option>
545    </term>
546    <listitem>
547      <para>
548        Trace all memory allocations and deallocations. May produce a huge
549        amount of output.
550      </para>
551    </listitem>
552  </varlistentry>
553  <varlistentry>
554    <term>
555      <option><![CDATA[--trace-barrier=<yes|no> [default: no]]]></option>
556    </term>
557    <listitem>
558      <para>
559        Trace all barrier activity.
560      </para>
561    </listitem>
562  </varlistentry>
563  <varlistentry>
564    <term>
565      <option><![CDATA[--trace-cond=<yes|no> [default: no]]]></option>
566    </term>
567    <listitem>
568      <para>
569        Trace all condition variable activity.
570      </para>
571    </listitem>
572  </varlistentry>
573  <varlistentry>
574    <term>
575      <option><![CDATA[--trace-fork-join=<yes|no> [default: no]]]></option>
576    </term>
577    <listitem>
578      <para>
579        Trace all thread creation and all thread termination events.
580      </para>
581    </listitem>
582  </varlistentry>
583  <varlistentry>
584    <term>
585      <option><![CDATA[--trace-hb=<yes|no> [default: no]]]></option>
586    </term>
587    <listitem>
588      <para>
589        Trace execution of the <literal>ANNOTATE_HAPPENS_BEFORE()</literal>,
590	<literal>ANNOTATE_HAPPENS_AFTER()</literal> and
591	<literal>ANNOTATE_HAPPENS_DONE()</literal> client requests.
592      </para>
593    </listitem>
594  </varlistentry>
595  <varlistentry>
596    <term>
597      <option><![CDATA[--trace-mutex=<yes|no> [default: no]]]></option>
598    </term>
599    <listitem>
600      <para>
601        Trace all mutex activity.
602      </para>
603    </listitem>
604  </varlistentry>
605  <varlistentry>
606    <term>
607      <option><![CDATA[--trace-rwlock=<yes|no> [default: no]]]></option>
608    </term>
609    <listitem>
610      <para>
611         Trace all reader-writer lock activity.
612      </para>
613    </listitem>
614  </varlistentry>
615  <varlistentry>
616    <term>
617      <option><![CDATA[--trace-semaphore=<yes|no> [default: no]]]></option>
618    </term>
619    <listitem>
620      <para>
621        Trace all semaphore activity.
622      </para>
623    </listitem>
624  </varlistentry>
625</variablelist>
626<!-- end of xi:include in the manpage -->
627
628</sect2>
629
630
631<sect2 id="drd-manual.data-races" xreflabel="Data Races">
632<title>Detected Errors: Data Races</title>
633
634<para>
635DRD prints a message every time it detects a data race. Please keep
636the following in mind when interpreting DRD's output:
637<itemizedlist>
638  <listitem>
639    <para>
      Every thread is assigned a <emphasis>thread ID</emphasis> by the DRD
      tool. A thread ID is a number. Thread IDs start at one and are never
      recycled.
643    </para>
644  </listitem>
645  <listitem>
646    <para>
      The term <emphasis>segment</emphasis> refers to a consecutive
      sequence of load, store and synchronization operations, all
      issued by the same thread. A segment always starts and ends at a
      synchronization operation. Data race analysis is performed
      between segments instead of between individual load and store
      operations for performance reasons.
653    </para>
654  </listitem>
655  <listitem>
656    <para>
657      There are always at least two memory accesses involved in a data
658      race. Memory accesses involved in a data race are called
659      <emphasis>conflicting memory accesses</emphasis>. DRD prints a
660      report for each memory access that conflicts with a past memory
661      access.
662    </para>
663  </listitem>
664</itemizedlist>
665</para>
666
667<para>
668Below you can find an example of a message printed by DRD when it
669detects a data race:
670</para>
671<programlisting><![CDATA[
672$ valgrind --tool=drd --read-var-info=yes drd/tests/rwlock_race
673...
674==9466== Thread 3:
675==9466== Conflicting load by thread 3 at 0x006020b8 size 4
676==9466==    at 0x400B6C: thread_func (rwlock_race.c:29)
677==9466==    by 0x4C291DF: vg_thread_wrapper (drd_pthread_intercepts.c:186)
678==9466==    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
679==9466==    by 0x53250CC: clone (in /lib64/libc-2.8.so)
680==9466== Location 0x6020b8 is 0 bytes inside local var "s_racy"
681==9466== declared at rwlock_race.c:18, in frame #0 of thread 3
682==9466== Other segment start (thread 2)
683==9466==    at 0x4C2847D: pthread_rwlock_rdlock* (drd_pthread_intercepts.c:813)
684==9466==    by 0x400B6B: thread_func (rwlock_race.c:28)
685==9466==    by 0x4C291DF: vg_thread_wrapper (drd_pthread_intercepts.c:186)
686==9466==    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
687==9466==    by 0x53250CC: clone (in /lib64/libc-2.8.so)
688==9466== Other segment end (thread 2)
689==9466==    at 0x4C28B54: pthread_rwlock_unlock* (drd_pthread_intercepts.c:912)
690==9466==    by 0x400B84: thread_func (rwlock_race.c:30)
691==9466==    by 0x4C291DF: vg_thread_wrapper (drd_pthread_intercepts.c:186)
692==9466==    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
693==9466==    by 0x53250CC: clone (in /lib64/libc-2.8.so)
694...
695]]></programlisting>
696
697<para>
698The above report has the following meaning:
699<itemizedlist>
700  <listitem>
701    <para>
702      The number in the column on the left is the process ID of the
703      process being analyzed by DRD.
704    </para>
705  </listitem>
706  <listitem>
707    <para>
      The first line ("Thread 3") tells you the ID of the thread
      in whose context the data race has been detected.
710    </para>
711  </listitem>
712  <listitem>
713    <para>
714      The next line tells which kind of operation was performed (load or
715      store) and by which thread. On the same line the start address and the
716      number of bytes involved in the conflicting access are also displayed.
717    </para>
718  </listitem>
719  <listitem>
720    <para>
721      Next, the call stack of the conflicting access is displayed. If
722      your program has been compiled with debug information
723      (<option>-g</option>), this call stack will include file names and
724      line numbers. The two
725      bottommost frames in this call stack (<function>clone</function>
726      and <function>start_thread</function>) show how the NPTL starts
727      a thread. The third frame
728      (<function>vg_thread_wrapper</function>) is added by DRD. The
729      fourth frame (<function>thread_func</function>) is the first
730      interesting line because it shows the thread entry point, that
731      is the function that has been passed as the third argument to
732      <function>pthread_create</function>.
733    </para>
734  </listitem>
735  <listitem>
736    <para>
737      Next, the allocation context for the conflicting address is
738      displayed. For dynamically allocated data the allocation call
739      stack is shown. For static variables and stack variables the
740      allocation context is only shown when the option
741      <option>--read-var-info=yes</option> has been
742      specified. Otherwise DRD will print <computeroutput>Allocation
743      context: unknown</computeroutput>.
744    </para>
745  </listitem>
746  <listitem>
747    <para>
      A data race involves at least two memory accesses. For
      one of these accesses an exact call stack is displayed, and for
      the other accesses an approximate call stack is displayed,
      namely the start and the end of the segments of the other
      accesses. This information can be interpreted as follows:
753      <orderedlist>
754        <listitem>
755          <para>
            Start at the bottom of both call stacks, and count the
            number of stack frames with identical function name, file
            name and line number. In the above example the three
            bottommost frames are identical
            (<function>clone</function>,
            <function>start_thread</function> and
            <function>vg_thread_wrapper</function>).
763          </para>
764        </listitem>
765        <listitem>
766          <para>
            The next higher stack frame in both call stacks now tells
            you in which source code region the other memory
            access happened. The above output tells that the other
            memory access involved in the data race happened between
            source code lines 28 and 30 in file
            <computeroutput>rwlock_race.c</computeroutput>.
773          </para>
774        </listitem>
775      </orderedlist>
776    </para>
777  </listitem>
778</itemizedlist>
779</para>
780
781</sect2>
782
783
784<sect2 id="drd-manual.lock-contention" xreflabel="Lock Contention">
785<title>Detected Errors: Lock Contention</title>
786
787<para>
788Threads must be able to make progress without being blocked for too long by
789other threads. Sometimes a thread has to wait until a mutex or reader-writer
790synchronization object is unlocked by another thread. This is called
791<emphasis>lock contention</emphasis>.
792</para>
793
794<para>
795Lock contention causes delays. Such delays should be as short as
796possible. The two command line options
797<literal>--exclusive-threshold=&lt;n&gt;</literal> and
798<literal>--shared-threshold=&lt;n&gt;</literal> make it possible to
799detect excessive lock contention by making DRD report any lock that
800has been held longer than the specified threshold. An example:
801</para>
802<programlisting><![CDATA[
803$ valgrind --tool=drd --exclusive-threshold=10 drd/tests/hold_lock -i 500
804...
805==10668== Acquired at:
806==10668==    at 0x4C267C8: pthread_mutex_lock (drd_pthread_intercepts.c:395)
807==10668==    by 0x400D92: main (hold_lock.c:51)
808==10668== Lock on mutex 0x7fefffd50 was held during 503 ms (threshold: 10 ms).
809==10668==    at 0x4C26ADA: pthread_mutex_unlock (drd_pthread_intercepts.c:441)
810==10668==    by 0x400DB5: main (hold_lock.c:55)
811...
812]]></programlisting>
813
814<para>
815The <literal>hold_lock</literal> test program holds a lock as long as
816specified by the <literal>-i</literal> (interval) argument. The DRD
817output reports that the lock acquired at line 51 in source file
818<literal>hold_lock.c</literal> and released at line 55 was held during
819503 ms, while a threshold of 10 ms was specified to DRD.
820</para>
821
822</sect2>
823
824
825<sect2 id="drd-manual.api-checks" xreflabel="API Checks">
826<title>Detected Errors: Misuse of the POSIX threads API</title>
827
828<para>
829  DRD is able to detect and report the following misuses of the POSIX
830  threads API:
  <itemizedlist>
    <listitem>
      <para>
        Passing the address of one type of synchronization object
        (e.g. a mutex) to a POSIX API call that expects a pointer to
        another type of synchronization object (e.g. a condition
        variable).
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to unlock a mutex that has not been locked.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to unlock a mutex that was locked by another thread.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to lock a mutex of type
        <literal>PTHREAD_MUTEX_NORMAL</literal> or a spinlock
        recursively.
      </para>
    </listitem>
    <listitem>
      <para>
        Destruction or deallocation of a locked mutex.
      </para>
    </listitem>
    <listitem>
      <para>
        Sending a signal to a condition variable while no lock is held
        on the mutex associated with the condition variable.
      </para>
    </listitem>
    <listitem>
      <para>
        Calling <function>pthread_cond_wait</function> on a mutex
        that is not locked, that is locked by another thread or that
        has been locked recursively.
      </para>
    </listitem>
    <listitem>
      <para>
        Associating two different mutexes with a condition variable
        through <function>pthread_cond_wait</function>.
      </para>
    </listitem>
    <listitem>
      <para>
        Destruction or deallocation of a condition variable that is
        being waited upon.
      </para>
    </listitem>
    <listitem>
      <para>
        Destruction or deallocation of a locked reader-writer synchronization
        object.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to unlock a reader-writer synchronization object that was not
        locked by the calling thread.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to recursively lock a reader-writer synchronization object
        exclusively.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to pass the address of a user-defined reader-writer
        synchronization object to a POSIX threads function.
      </para>
    </listitem>
    <listitem>
      <para>
        Attempts to pass the address of a POSIX reader-writer synchronization
        object to one of the annotations for user-defined reader-writer
        synchronization objects.
      </para>
    </listitem>
    <listitem>
      <para>
        Reinitialization of a mutex, condition variable, reader-writer
        lock, semaphore or barrier.
      </para>
    </listitem>
    <listitem>
      <para>
        Destruction or deallocation of a semaphore or barrier that is
        being waited upon.
      </para>
    </listitem>
    <listitem>
      <para>
        Missing synchronization between barrier wait and barrier destruction.
      </para>
    </listitem>
    <listitem>
      <para>
        Exiting a thread without first unlocking the spinlocks, mutexes or
        reader-writer synchronization objects that were locked by that thread.
      </para>
    </listitem>
    <listitem>
      <para>
        Passing an invalid thread ID to <function>pthread_join</function>
        or <function>pthread_cancel</function>.
      </para>
    </listitem>
  </itemizedlist>
</para>
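<para>
As an illustration of one of the errors listed above, the following
sketch (a hypothetical example, not part of the DRD test suite) lets a
second thread attempt to unlock a mutex that is owned by the main
thread. DRD reports this as a misuse of the POSIX API; because an
error-checking mutex is used, the call also returns
<literal>EPERM</literal> at run time:
</para>
<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t s_mutex;

static void *thread_func(void *arg)
{
    (void)arg;
    /* Error reported by DRD: this thread does not hold s_mutex. */
    int r = pthread_mutex_unlock(&s_mutex);
    printf("unlock by non-owner returned %s\n",
           r == EPERM ? "EPERM" : "no error");
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&s_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&s_mutex);           /* main owns the mutex */
    pthread_create(&tid, NULL, thread_func, NULL);
    pthread_join(tid, NULL);
    pthread_mutex_unlock(&s_mutex);
    pthread_mutex_destroy(&s_mutex);
    return 0;
}
]]></programlisting>
<para>
Note that with a mutex of type <literal>PTHREAD_MUTEX_NORMAL</literal>
the same unlock call would trigger undefined behavior instead of
returning an error code, which is exactly why DRD's checking is useful.
</para>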

</sect2>


<sect2 id="drd-manual.clientreqs" xreflabel="Client requests">
<title>Client Requests</title>

<para>
Just as for other Valgrind tools, it is possible to let a client program
interact with the DRD tool through client requests. In addition to the
client requests, several macros have been defined that make it
convenient to use these client requests.
</para>

<para>
The interface between client programs and the DRD tool is defined in
the header file <literal>&lt;valgrind/drd.h&gt;</literal>. The
available macros and client requests are:
<itemizedlist>
  <listitem>
    <para>
      The macro <literal>DRD_GET_VALGRIND_THREADID</literal> and the
      corresponding client
      request <varname>VG_USERREQ__DRD_GET_VALGRIND_THREAD_ID</varname>.
      Query the thread ID that has been assigned by the Valgrind core to the
      thread executing this client request. Valgrind's thread IDs start at
      one and are recycled when a thread stops.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>DRD_GET_DRD_THREADID</literal> and the corresponding
      client request <varname>VG_USERREQ__DRD_GET_DRD_THREAD_ID</varname>.
      Query the thread ID that has been assigned by DRD to the thread
      executing this client request. These are the thread IDs reported by DRD
      in data race reports and in trace messages. DRD's thread IDs start at
      one and are never recycled.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>DRD_IGNORE_VAR(x)</literal> and the corresponding
      client request <varname>VG_USERREQ__DRD_START_SUPPRESSION</varname>. Some
      applications contain intentional races. There exist e.g. applications
      where the same value is assigned to a shared variable from two different
      threads. It may be more convenient to suppress such races than to solve
      these. This client request makes it possible to suppress such races.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>DRD_STOP_IGNORING_VAR(x)</literal> and the
      corresponding client request
      <varname>VG_USERREQ__DRD_FINISH_SUPPRESSION</varname>. Tell DRD
      to no longer ignore data races for the address range that was suppressed
      either via the macro <literal>DRD_IGNORE_VAR(x)</literal> or via the
      client request <varname>VG_USERREQ__DRD_START_SUPPRESSION</varname>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>DRD_TRACE_VAR(x)</literal>. Trace all load and store
      activity for the address range starting at <literal>&amp;x</literal> and
      occupying <literal>sizeof(x)</literal> bytes. When DRD reports a data
      race on a specified variable, and it is not immediately clear which
      source code statements triggered the conflicting accesses, it can be
      very helpful to trace all activity on the offending memory location.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>DRD_STOP_TRACING_VAR(x)</literal>. Stop tracing load
      and store activity for the address range starting
      at <literal>&amp;x</literal> and occupying <literal>sizeof(x)</literal>
      bytes.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_TRACE_MEMORY(&amp;x)</literal>. Trace all
      load and store activity that touches at least the single byte at the
      address <literal>&amp;x</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The client request <varname>VG_USERREQ__DRD_START_TRACE_ADDR</varname>,
      which traces all load and store activity for the specified
      address range.
    </para>
  </listitem>
  <listitem>
    <para>
      The client
      request <varname>VG_USERREQ__DRD_STOP_TRACE_ADDR</varname>. Stop
      tracing load and store activity for the specified address range.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_HAPPENS_BEFORE(addr)</literal> tells DRD to
      insert a mark. Insert this macro just after an access to the variable at
      the specified address has been performed.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_HAPPENS_AFTER(addr)</literal> tells DRD that
      the next access to the variable at the specified address should be
      considered to have happened after the access just before the latest
      <literal>ANNOTATE_HAPPENS_BEFORE(addr)</literal> annotation that
      references the same variable. The purpose of these two macros is to tell
      DRD about the order of inter-thread memory accesses implemented via
      atomic memory operations. See
      also <literal>drd/tests/annotate_smart_pointer.cpp</literal> for an
      example.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_RWLOCK_CREATE(rwlock)</literal> tells DRD
      that the object at address <literal>rwlock</literal> is a
      reader-writer synchronization object that is not a
      <literal>pthread_rwlock_t</literal> synchronization object.  See
      also <literal>drd/tests/annotate_rwlock.c</literal> for an example.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_RWLOCK_DESTROY(rwlock)</literal> tells DRD
      that the reader-writer synchronization object at
      address <literal>rwlock</literal> has been destroyed.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_WRITERLOCK_ACQUIRED(rwlock)</literal> tells
      DRD that a writer lock has been acquired on the reader-writer
      synchronization object at address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_READERLOCK_ACQUIRED(rwlock)</literal> tells
      DRD that a reader lock has been acquired on the reader-writer
      synchronization object at address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_RWLOCK_ACQUIRED(rwlock, is_w)</literal>
      tells DRD that a writer lock (when <literal>is_w != 0</literal>) or that
      a reader lock (when <literal>is_w == 0</literal>) has been acquired on
      the reader-writer synchronization object at
      address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_WRITERLOCK_RELEASED(rwlock)</literal> tells
      DRD that a writer lock has been released on the reader-writer
      synchronization object at address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_READERLOCK_RELEASED(rwlock)</literal> tells
      DRD that a reader lock has been released on the reader-writer
      synchronization object at address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_RWLOCK_RELEASED(rwlock, is_w)</literal>
      tells DRD that a writer lock (when <literal>is_w != 0</literal>) or that
      a reader lock (when <literal>is_w == 0</literal>) has been released on
      the reader-writer synchronization object at
      address <literal>rwlock</literal>.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BARRIER_INIT(barrier, count,
      reinitialization_allowed)</literal> tells DRD that a new barrier object
      at the address <literal>barrier</literal> has been initialized,
      that <literal>count</literal> threads participate in each barrier and
      also whether or not barrier reinitialization without intervening
      destruction should be reported as an error. See
      also <literal>drd/tests/annotate_barrier.c</literal> for an example.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BARRIER_DESTROY(barrier)</literal>
      tells DRD that a barrier object is about to be destroyed.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BARRIER_WAIT_BEFORE(barrier)</literal>
      tells DRD that waiting for a barrier will start.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BARRIER_WAIT_AFTER(barrier)</literal>
      tells DRD that waiting for a barrier has finished.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BENIGN_RACE_SIZED(addr, size,
      descr)</literal> tells DRD that any races detected on the specified
      address are benign and hence should not be
      reported. The <literal>descr</literal> argument is ignored but can be
      used to document why data races on <literal>addr</literal> are benign.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_BENIGN_RACE_STATIC(var, descr)</literal>
      tells DRD that any races detected on the specified static variable are
      benign and hence should not be reported. The <literal>descr</literal>
      argument is ignored but can be used to document why data races
      on <literal>var</literal> are benign. Note: this macro can only be
      used in C++ programs and not in C programs.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_READS_BEGIN</literal> tells
      DRD to ignore all memory loads performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_READS_END</literal> tells
      DRD to stop ignoring the memory loads performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_WRITES_BEGIN</literal> tells
      DRD to ignore all memory stores performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_WRITES_END</literal> tells
      DRD to stop ignoring the memory stores performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN</literal> tells
      DRD to ignore all memory accesses performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_IGNORE_READS_AND_WRITES_END</literal> tells
      DRD to stop ignoring the memory accesses performed by the current thread.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_NEW_MEMORY(addr, size)</literal> tells
      DRD that the specified memory range has been allocated by a custom
      memory allocator in the client program and that the client program
      will start using this memory range.
    </para>
  </listitem>
  <listitem>
    <para>
      The macro <literal>ANNOTATE_THREAD_NAME(name)</literal> tells DRD to
      associate the specified name with the current thread and to include this
      name in the error messages printed by DRD.
    </para>
  </listitem>
  <listitem>
    <para>
      The macros <literal>VALGRIND_MALLOCLIKE_BLOCK</literal> and
      <literal>VALGRIND_FREELIKE_BLOCK</literal> from the Valgrind core are
      implemented; they are described in
      <xref linkend="manual-core-adv.clientreq"/>.
    </para>
  </listitem>
</itemizedlist>
</para>
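<para>
The happens-before and happens-after annotations are typically placed
around an atomic flag that passes data between threads. The sketch
below (a hypothetical example modeled on, but not taken from,
<literal>drd/tests/annotate_smart_pointer.cpp</literal>) publishes a
value through a C11 atomic flag; without the annotations DRD would
report the accesses to <literal>s_data</literal> as a race. The
fallback macro definitions are an assumption for systems where the
Valgrind headers are not installed; the annotations then become no-ops
and the program still runs natively:
</para>
<programlisting><![CDATA[
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#if defined(HAVE_VALGRIND_DRD_H)
#include <valgrind/drd.h>
#else
#define ANNOTATE_HAPPENS_BEFORE(addr) do { } while (0)
#define ANNOTATE_HAPPENS_AFTER(addr)  do { } while (0)
#endif

static int s_data;          /* payload, deliberately not atomic */
static atomic_int s_ready;  /* release/acquire flag             */

static void *producer(void *arg)
{
    (void)arg;
    s_data = 42;
    ANNOTATE_HAPPENS_BEFORE(&s_ready);
    atomic_store_explicit(&s_ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&s_ready, memory_order_acquire))
        ;  /* spin until the producer sets the flag */
    ANNOTATE_HAPPENS_AFTER(&s_ready);
    printf("received %d\n", s_data);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
]]></programlisting>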

<para>
Note: if you compiled Valgrind yourself, the header file
<literal>&lt;valgrind/drd.h&gt;</literal> will have been installed in
the directory <literal>/usr/include</literal> by the command
<literal>make install</literal>. If, however, you obtained Valgrind by
installing it as a package, you will probably have to install
another package with a name like <literal>valgrind-devel</literal>
before Valgrind's header files are available.
</para>
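<para>
As an example of suppressing an intentional race, the following sketch
(hypothetical, not part of the DRD test suite) uses
<literal>DRD_IGNORE_VAR</literal> on an approximate statistics counter
that is deliberately updated without synchronization. The fallback
macro definition is an assumption for systems without the Valgrind
headers, so that the program compiles and runs everywhere:
</para>
<programlisting><![CDATA[
#include <pthread.h>
#include <stdio.h>

#if defined(HAVE_VALGRIND_DRD_H)
#include <valgrind/drd.h>
#else
#define DRD_IGNORE_VAR(x) do { } while (0)
#endif

static int s_hits;  /* approximate counter; exact counts do not matter */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        s_hits++;   /* racy on purpose */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    DRD_IGNORE_VAR(s_hits);  /* tell DRD not to report races on s_hits */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("approximately %d hits\n", s_hits);
    return 0;
}
]]></programlisting>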

</sect2>


<sect2 id="drd-manual.gnome" xreflabel="GNOME">
<title>Debugging GNOME Programs</title>

<para>
GNOME applications use the threading primitives provided by the
<computeroutput>glib</computeroutput> and
<computeroutput>gthread</computeroutput> libraries. These libraries
are built on top of POSIX threads, and hence are directly supported by
DRD. Please keep in mind that you have to call
<function>g_thread_init</function> before creating any threads, or
DRD will report several data races on glib functions. See also the
<ulink
url="http://library.gnome.org/devel/glib/stable/glib-Threads.html">GLib
Reference Manual</ulink> for more information about
<function>g_thread_init</function>.
</para>

<para>
One of the many facilities provided by the <literal>glib</literal>
library is a block allocator, called <literal>g_slice</literal>. You
have to disable this block allocator when using DRD, by setting the
environment variable <literal>G_SLICE=always-malloc</literal>. See
also the <ulink
url="http://library.gnome.org/devel/glib/stable/glib-Memory-Slices.html">GLib
Reference Manual</ulink> for more information.
</para>

</sect2>


<sect2 id="drd-manual.boost.thread" xreflabel="Boost.Thread">
<title>Debugging Boost.Thread Programs</title>

<para>
The Boost.Thread library is the threading library included with the
cross-platform Boost Libraries. This threading library is an early
implementation of the upcoming C++0x threading library.
</para>

<para>
Applications that use the Boost.Thread library should run fine under DRD.
</para>

<para>
More information about Boost.Thread can be found here:
<itemizedlist>
  <listitem>
    <para>
      Anthony Williams, <ulink
      url="http://www.boost.org/doc/libs/1_37_0/doc/html/thread.html">Boost.Thread</ulink>
      Library Documentation, Boost website, 2007.
    </para>
  </listitem>
  <listitem>
    <para>
      Anthony Williams, <ulink
      url="http://www.ddj.com/cpp/211600441">What's New in Boost
      Threads?</ulink>, Recent changes to the Boost Thread library,
      Dr. Dobb's Journal, October 2008.
    </para>
  </listitem>
</itemizedlist>
</para>

</sect2>


<sect2 id="drd-manual.openmp" xreflabel="OpenMP">
<title>Debugging OpenMP Programs</title>

<para>
OpenMP stands for <emphasis>Open Multi-Processing</emphasis>. The OpenMP
standard consists of a set of compiler directives for C, C++ and Fortran
programs that allows a compiler to transform a sequential program into a
parallel program. OpenMP is well suited for HPC applications and allows
working at a higher level compared to direct use of the POSIX threads
API. While OpenMP ensures that the POSIX API is used correctly, OpenMP
programs can still contain data races. So it definitely makes sense to
verify OpenMP programs with a thread checking tool.
</para>
<para>
DRD supports OpenMP shared-memory programs generated by GCC. GCC has
supported OpenMP since version 4.2.0. GCC's runtime support
for OpenMP programs is provided by a library called
<literal>libgomp</literal>. The synchronization primitives implemented
in this library use the Linux futex system call directly, unless the
library has been configured with the
<literal>--disable-linux-futex</literal> option. DRD only supports
libgomp libraries that have been configured with this option and in
which symbol information is present. For most Linux distributions this
means that you will have to recompile GCC. See also the script
<literal>drd/scripts/download-and-build-gcc</literal> in the
Valgrind source tree for an example of how to compile GCC. You will
also have to make sure that the newly compiled
<literal>libgomp.so</literal> library is loaded when OpenMP programs
are started. This is possible by adding a line similar to the
following to your shell startup script:
</para>
<programlisting><![CDATA[
export LD_LIBRARY_PATH=~/gcc-4.4.0/lib64:~/gcc-4.4.0/lib:
]]></programlisting>

<para>
As an example, the OpenMP test program
<literal>drd/tests/omp_matinv</literal> triggers a data race
when the option <option>-r</option> has been specified on the command
line. The data race is triggered by the following code:
</para>
<programlisting><![CDATA[
#pragma omp parallel for private(j)
for (j = 0; j < rows; j++)
{
  if (i != j)
  {
    const elem_t factor = a[j * cols + i];
    for (k = 0; k < cols; k++)
    {
      a[j * cols + k] -= a[i * cols + k] * factor;
    }
  }
}
]]></programlisting>

<para>
The above code is racy because the variable <literal>k</literal> has
not been declared private. DRD will print the following error message
for the above code:
</para>
<programlisting><![CDATA[
$ valgrind --tool=drd --check-stack-var=yes --read-var-info=yes drd/tests/omp_matinv 3 -t 2 -r
...
Conflicting store by thread 1/1 at 0x7fefffbc4 size 4
   at 0x4014A0: gj.omp_fn.0 (omp_matinv.c:203)
   by 0x401211: gj (omp_matinv.c:159)
   by 0x40166A: invert_matrix (omp_matinv.c:238)
   by 0x4019B4: main (omp_matinv.c:316)
Location 0x7fefffbc4 is 0 bytes inside local var "k"
declared at omp_matinv.c:160, in frame #0 of thread 1
...
]]></programlisting>
<para>
In the above output the function name <function>gj.omp_fn.0</function>
has been generated by GCC from the function name
<function>gj</function>. The allocation context information shows that the
data race has been caused by modifying the variable <literal>k</literal>.
</para>

<para>
Note: for GCC versions before 4.4.0, no allocation context information is
shown. With these GCC versions the most useful information in the above
output is the source file name and the line number where the data race has
been detected (<literal>omp_matinv.c:203</literal>).
</para>
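<para>
The fix for such a race is to declare every loop counter private. The
following self-contained sketch (with hypothetical array sizes, not
taken from <literal>omp_matinv</literal>) applies the same fix: both
counters appear in the <literal>private</literal> clause, so each
thread gets its own copy of <literal>k</literal>. Without
<option>-fopenmp</option> the pragma is ignored and the program still
runs serially:
</para>
<programlisting><![CDATA[
#include <stdio.h>

int main(void)
{
    enum { rows = 4, cols = 4 };
    int a[rows][cols], sum[rows];
    int j, k;

    for (j = 0; j < rows; j++)
        for (k = 0; k < cols; k++)
            a[j][k] = j + k;

    /* Both loop counters are private; without private(k) the inner
       counter would be shared between threads and racy, exactly as
       in the omp_matinv example above. */
#pragma omp parallel for private(j, k)
    for (j = 0; j < rows; j++) {
        sum[j] = 0;
        for (k = 0; k < cols; k++)
            sum[j] += a[j][k];
    }

    for (j = 0; j < rows; j++)
        printf("sum[%d] = %d\n", j, sum[j]);
    return 0;
}
]]></programlisting>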

<para>
For more information about OpenMP, see also
<ulink url="http://openmp.org/">openmp.org</ulink>.
</para>

</sect2>


<sect2 id="drd-manual.cust-mem-alloc" xreflabel="Custom Memory Allocators">
<title>DRD and Custom Memory Allocators</title>

<para>
DRD tracks all memory allocation events that happen via the
standard memory allocation and deallocation functions
(<function>malloc</function>, <function>free</function>,
<function>new</function> and <function>delete</function>), via entry
and exit of stack frames, or that have been annotated with Valgrind's
memory pool client requests. DRD uses memory allocation and deallocation
information for two purposes:
<itemizedlist>
  <listitem>
    <para>
      To know where the scope of POSIX objects that have not been
      destroyed explicitly ends. It is e.g. not required by the POSIX
      threads standard to call
      <function>pthread_mutex_destroy</function> before freeing the
      memory in which a mutex object resides.
    </para>
  </listitem>
  <listitem>
    <para>
      To know where the scope of variables ends. If e.g. heap memory
      has been used by one thread, that thread frees that memory, and
      another thread allocates and starts using that memory, no data
      races should be reported for that memory.
    </para>
  </listitem>
</itemizedlist>
</para>

<para>
It is essential for correct operation of DRD that the tool knows about
memory allocation and deallocation events. When analyzing a client program
with DRD that uses a custom memory allocator, either instrument the custom
memory allocator with the <literal>VALGRIND_MALLOCLIKE_BLOCK</literal>
and <literal>VALGRIND_FREELIKE_BLOCK</literal> macros or disable the
custom memory allocator.
</para>
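<para>
The sketch below shows what such instrumentation might look like for a
trivial bump allocator. The allocator itself is hypothetical (not part
of any Valgrind test); the fallback macro definitions are an assumption
for systems without the Valgrind headers, so the program also builds
and runs outside Valgrind:
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdio.h>

#if defined(HAVE_VALGRIND_VALGRIND_H)
#include <valgrind/valgrind.h>
#else
#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) do { } while (0)
#define VALGRIND_FREELIKE_BLOCK(addr, rz) do { } while (0)
#endif

static char s_pool[4096];
static size_t s_used;

static void *pool_alloc(size_t size)
{
    if (s_used + size > sizeof(s_pool))
        return NULL;
    void *p = s_pool + s_used;
    s_used += size;
    /* Tell DRD (and Memcheck) that a new block starts at p. */
    VALGRIND_MALLOCLIKE_BLOCK(p, size, 0, 0);
    return p;
}

static void pool_free(void *p)
{
    /* Tell DRD that the block is gone, so that accesses by a later
       owner of the recycled memory are not reported as races. */
    VALGRIND_FREELIKE_BLOCK(p, 0);
}

int main(void)
{
    int *v = pool_alloc(sizeof(*v));
    *v = 7;
    printf("allocated value: %d\n", *v);
    pool_free(v);
    return 0;
}
]]></programlisting>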

<para>
As an example, the GNU libstdc++ library can be configured
to use standard memory allocation functions instead of memory pools by
setting the environment variable
<literal>GLIBCXX_FORCE_NEW</literal>. For more information, see also
the <ulink
url="http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt04ch11.html">libstdc++
manual</ulink>.
</para>

</sect2>


<sect2 id="drd-manual.drd-versus-memcheck" xreflabel="DRD Versus Memcheck">
<title>DRD Versus Memcheck</title>

<para>
It is essential for correct operation of DRD that there are no memory
errors such as dangling pointers in the client program. This means that
it is a good idea to make sure that your program is Memcheck-clean
before you analyze it with DRD. It is possible, however, that some of
the Memcheck reports are caused by data races. In this case it makes
sense to run DRD before Memcheck.
</para>

<para>
So which tool should be run first? In case both DRD and Memcheck
complain about a program, a possible approach is to run both tools
alternately and to fix as many errors as possible after each run of
each tool until neither of the two tools prints any more error messages.
</para>

</sect2>


<sect2 id="drd-manual.resource-requirements" xreflabel="Resource Requirements">
<title>Resource Requirements</title>

<para>
The requirements of DRD with regard to heap and stack memory and the
effect on the execution time of client programs are as follows:
<itemizedlist>
  <listitem>
    <para>
      When running a program under DRD with default DRD options,
      between 1.1 and 3.6 times more memory will be needed compared to
      a native run of the client program. More memory will be needed
      if loading debug information has been enabled
      (<literal>--read-var-info=yes</literal>).
    </para>
  </listitem>
  <listitem>
    <para>
      DRD allocates some of its temporary data structures on the stack
      of the client program threads. This amount of data is limited to
      1 to 2 KB. Make sure that thread stacks are sufficiently large.
    </para>
  </listitem>
  <listitem>
    <para>
      Most applications will run between 20 and 50 times slower under
      DRD than a native single-threaded run. The slowdown will be most
      noticeable for applications which perform frequent mutex lock /
      unlock operations.
    </para>
  </listitem>
</itemizedlist>
</para>

</sect2>


<sect2 id="drd-manual.effective-use" xreflabel="Effective Use">
<title>Hints and Tips for Effective Use of DRD</title>

<para>
The following information may be helpful when using DRD:
<itemizedlist>
  <listitem>
    <para>
      Make sure that debug information is present in the executable
      being analyzed, so that DRD can print function name and line
      number information in stack traces. Most compilers can be told
      to include debug information via compiler option
      <option>-g</option>.
    </para>
  </listitem>
  <listitem>
    <para>
      Compile with option <option>-O1</option> instead of
      <option>-O0</option>. This will reduce the amount of generated
      code, may reduce the amount of debug info and will speed up
      DRD's processing of the client program. For more information,
      see also <xref linkend="manual-core.started"/>.
    </para>
  </listitem>
  <listitem>
    <para>
      If DRD reports any errors on libraries that are part of your
      Linux distribution, such as <literal>libc.so</literal> or
      <literal>libstdc++.so</literal>, installing the debug packages
      for these libraries will make the output of DRD a lot more
      detailed.
    </para>
  </listitem>
  <listitem>
    <para>
      When using C++, do not send output from more than one thread to
      <literal>std::cout</literal>. Doing so would not only
      generate multiple data race reports, it could also result in
      output from several threads getting mixed up. Either use
      <function>printf</function> or do the following:
      <orderedlist>
        <listitem>
          <para>Derive a class from <literal>std::streambuf</literal>
          and let that class send output line by line to
          <literal>stdout</literal>. This prevents individual lines
          of text produced by different threads from getting mixed
          up.</para>
        </listitem>
        <listitem>
          <para>Create one instance of <literal>std::ostream</literal>
          for each thread. This makes stream formatting settings
          thread-local. Pass a per-thread instance of the class
          derived from <literal>std::streambuf</literal> to the
          constructor of each instance.</para>
        </listitem>
        <listitem>
          <para>Let each thread send its output to its own instance of
          <literal>std::ostream</literal> instead of
          <literal>std::cout</literal>.</para>
        </listitem>
      </orderedlist>
    </para>
  </listitem>
</itemizedlist>
</para>

</sect2>


</sect1>


<sect1 id="drd-manual.Pthreads" xreflabel="Pthreads">
<title>Using the POSIX Threads API Effectively</title>

<sect2 id="drd-manual.mutex-types" xreflabel="mutex-types">
<title>Mutex types</title>

<para>
The Single UNIX Specification version two defines the following four
mutex types (see also the documentation of <ulink
url="http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_mutexattr_settype.html"><function>pthread_mutexattr_settype</function></ulink>):
<itemizedlist>
  <listitem>
    <para>
      <emphasis>normal</emphasis>, which means that no error checking
      is performed, and that the mutex is non-recursive.
    </para>
  </listitem>
  <listitem>
    <para>
      <emphasis>error checking</emphasis>, which means that the mutex
      is non-recursive and that error checking is performed.
    </para>
  </listitem>
  <listitem>
    <para>
      <emphasis>recursive</emphasis>, which means that a mutex may be
      locked recursively.
    </para>
  </listitem>
  <listitem>
    <para>
      <emphasis>default</emphasis>, which means that error checking
      behavior is undefined, and that the behavior for recursive
      locking is also undefined. In other words, portable code must
      neither trigger error conditions through the Pthreads API nor
      attempt to lock a mutex of default type recursively.
    </para>
  </listitem>
</itemizedlist>
</para>

<para>
In complex applications it is not always clear beforehand which
mutex will be locked recursively and which mutex will not be locked
recursively. Attempts to lock a non-recursive mutex recursively will
result in race conditions that are very hard to find without a thread
checking tool. So either use the error checking mutex type and
consistently check the return value of Pthread API mutex calls, or use
the recursive mutex type.
</para>
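<para>
The error checking approach can be sketched as follows (a hypothetical
minimal example, not from the Valgrind test suite): with a mutex of
type <literal>PTHREAD_MUTEX_ERRORCHECK</literal>, a recursive lock
attempt returns <literal>EDEADLK</literal> instead of deadlocking or
silently misbehaving, provided the return value is actually checked:
</para>
<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void)
{
    pthread_mutex_t m;
    pthread_mutexattr_t attr;
    int r;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    r = pthread_mutex_lock(&m);
    printf("first lock: %s\n", r == 0 ? "ok" : "error");

    r = pthread_mutex_lock(&m);          /* recursive attempt */
    printf("second lock: %s\n", r == EDEADLK ? "EDEADLK" : "unexpected");

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return 0;
}
]]></programlisting>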

</sect2>

<sect2 id="drd-manual.condvar" xreflabel="condition-variables">
<title>Condition variables</title>

<para>
A condition variable allows one thread to wake up one or more other
threads. Condition variables are often used to notify one or more
threads about state changes of shared data. Unfortunately it is very
easy to introduce race conditions by using condition variables as the
only means of state information propagation. A better approach is to
let threads poll for changes of a state variable that is protected by
a mutex, and to use condition variables only as a thread wakeup
mechanism. See also the source file
<computeroutput>drd/tests/monitor_example.cpp</computeroutput> for an
example of how to implement this concept in C++. The monitor concept
used in this example is a well known and very useful concept -- see
also Wikipedia for more information about the <ulink
url="http://en.wikipedia.org/wiki/Monitor_(synchronization)">monitor</ulink>
concept.
</para>

</sect2>

<sect2 id="drd-manual.pctw" xreflabel="pthread_cond_timedwait">
<title>pthread_cond_timedwait and timeouts</title>

<para>
Historically the function
<function>pthread_cond_timedwait</function> only allowed the
specification of an absolute timeout, that is, a timeout independent of
the time when this function was called. However, almost every call to
this function expresses a relative timeout. This typically happens by
passing the sum of
<computeroutput>clock_gettime(CLOCK_REALTIME)</computeroutput> and a
relative timeout as the third argument. This approach is incorrect
since forward or backward clock adjustments by e.g. ntpd will affect
the timeout. A more reliable approach is as follows:
<itemizedlist>
  <listitem>
    <para>
      When initializing a condition variable through
      <function>pthread_cond_init</function>, specify that the timeout of
      <function>pthread_cond_timedwait</function> will use the clock
      <literal>CLOCK_MONOTONIC</literal> instead of
      <literal>CLOCK_REALTIME</literal>. You can do this via
      <computeroutput>pthread_condattr_setclock(...,
      CLOCK_MONOTONIC)</computeroutput>.
    </para>
  </listitem>
  <listitem>
    <para>
      When calling <function>pthread_cond_timedwait</function>, pass
      the sum of
      <computeroutput>clock_gettime(CLOCK_MONOTONIC)</computeroutput>
      and a relative timeout as the third argument.
    </para>
  </listitem>
</itemizedlist>
See also
<computeroutput>drd/tests/monitor_example.cpp</computeroutput> for an
example.
</para>

</sect2>

</sect1>


<sect1 id="drd-manual.limitations" xreflabel="Limitations">
<title>Limitations</title>

<para>DRD currently has the following limitations:</para>

<itemizedlist>
  <listitem>
    <para>
      DRD, just like Memcheck, will refuse to start on Linux
      distributions where all symbol information has been removed from
      <filename>ld.so</filename>. This is e.g. the case for the PPC editions
      of openSUSE and Gentoo. You will have to install the glibc debuginfo
      package on these platforms before you can use DRD. See also openSUSE
      bug <ulink url="http://bugzilla.novell.com/show_bug.cgi?id=396197">
      396197</ulink> and Gentoo bug <ulink
      url="http://bugs.gentoo.org/214065">214065</ulink>.
    </para>
  </listitem>
  <listitem>
    <para>
      With gcc 4.4.3 and before, DRD may report data races on the C++
      class <literal>std::string</literal> in a multithreaded program. This is
      a known <literal>libstdc++</literal> issue -- see also GCC bug
      <ulink url="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40518">40518</ulink>
      for more information.
    </para>
  </listitem>
  <listitem>
    <para>
      If you compile the DRD source code yourself, you need GCC 3.0 or
      later. GCC 2.95 is not supported.
    </para>
  </listitem>
  <listitem>
    <para>
      Of the two POSIX threads implementations for Linux, only the
      NPTL (Native POSIX Thread Library) is supported. The older
      LinuxThreads library is not supported.
    </para>
  </listitem>
</itemizedlist>

</sect1>


<sect1 id="drd-manual.feedback" xreflabel="Feedback">
<title>Feedback</title>

<para>
If you have any comments, suggestions, feedback or bug reports about
DRD, feel free to either post a message on the Valgrind users mailing
list or to file a bug report. See also <ulink
url="&vg-url;">&vg-url;</ulink> for more information.
</para>

</sect1>


</chapter>
