<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="dh-manual"
         xreflabel="DHAT: a dynamic heap analysis tool">
  <title>DHAT: a dynamic heap analysis tool</title>

<para>To use this tool, you must specify
<option>--tool=exp-dhat</option> on the Valgrind
command line.</para>


<sect1 id="dh-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>DHAT is a tool for examining how programs use their heap
allocations.</para>

<para>It tracks the allocated blocks and inspects every memory access
to determine which block, if any, the access falls within.  The
following data is collected and presented per allocation point
(allocation stack):</para>

<itemizedlist>
  <listitem><para>total allocation (number of bytes and
  blocks)</para></listitem>

  <listitem><para>maximum live volume (number of bytes and
  blocks)</para></listitem>

  <listitem><para>average block lifetime (number of instructions
   between allocation and freeing)</para></listitem>

  <listitem><para>average number of reads and writes to each byte in
   the block ("access ratios")</para></listitem>

  <listitem><para>for allocation points which always allocate blocks
   of a single size, where that size is 4096 bytes or less: counts
   showing how often each byte offset inside the block is
   accessed.</para></listitem>
</itemizedlist>
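
<para>The block lookup described above can be pictured as a search over
the currently live blocks.  The following minimal C sketch assumes the
live blocks are kept in an array sorted by start address and never
overlap.  <literal>block_t</literal>, <function>find_block</function> and
<function>record_access</function> are hypothetical names chosen for
illustration, not DHAT's actual data structures.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one live heap block. */
typedef struct {
    uintptr_t start;   /* address of the first byte of the block */
    size_t    size;    /* block size in bytes */
    uint64_t  reads;   /* bytes read from this block so far */
    uint64_t  writes;  /* bytes written to this block so far */
} block_t;

/* Binary search over blocks sorted by start address.  Returns the
   block containing addr, or NULL if addr is not inside any live
   heap block. */
block_t *find_block(block_t *blocks, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        block_t *b = &blocks[mid];
        if (addr < b->start)
            hi = mid;                 /* search the lower half */
        else if (addr >= b->start + b->size)
            lo = mid + 1;             /* search the upper half */
        else
            return b;                 /* start <= addr < start + size */
    }
    return NULL;
}

/* Attribute an access of len bytes at addr to its block, if any.
   Accesses that hit no heap block (stack, globals) are ignored. */
void record_access(block_t *blocks, size_t n, uintptr_t addr,
                   size_t len, int is_write)
{
    block_t *b = find_block(blocks, n, addr);
    if (b == NULL)
        return;
    if (is_write)
        b->writes += len;
    else
        b->reads += len;
}
```
]]></programlisting>

<para>Summing <literal>reads</literal> and <literal>writes</literal> over
all blocks from one allocation point and dividing by that point's total
allocated bytes gives figures that agree with the access ratios in the
examples below; for instance, 12,141,526 bytes read over 1,904,700 bytes
allocated is about 6.37.</para>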

<para>Using these statistics it is possible to identify allocation
points with the following characteristics:</para>

<itemizedlist>

  <listitem><para>potential process-lifetime leaks: blocks allocated
   by the point just accumulate, and are freed only at the end of the
   run.</para></listitem>

  <listitem><para>excessive turnover: points which chew through a lot
   of heap, even if it is not held onto for very long.</para></listitem>

  <listitem><para>excessively transient: points which allocate very
   short-lived blocks.</para></listitem>

  <listitem><para>useless or underused allocations: blocks which are
   allocated but not completely filled in, or are filled in but not
   subsequently read.</para></listitem>

  <listitem><para>blocks with inefficient layout: areas never
   accessed, or hot fields scattered throughout the
   block.</para></listitem>
</itemizedlist>

<para>As with the Massif heap profiler, DHAT measures program progress
by counting instructions, and so presents all age/time-related figures
as instruction counts.  This sounds a little odd at first, but it
makes runs repeatable in a way which is not possible if CPU time is
used.</para>

</sect1>



<sect1 id="dh-manual.understanding" xreflabel="Understanding DHAT's output">
<title>Understanding DHAT's output</title>


<para>DHAT provides a lot of useful information on dynamic heap usage.
Most of the art of using it is in interpretation of the resulting
numbers.  That is best illustrated via a set of examples.</para>


<sect2>
<title>Interpreting the max-live, tot-alloc and deaths fields</title>

<sect3><title>A simple example</title></sect3>

<screen><![CDATA[
   ======== SUMMARY STATISTICS ========

   guest_insns:  1,045,339,534
   [...]
   max-live:    63,490 in 984 blocks
   tot-alloc:   1,904,700 in 29,520 blocks (avg size 64.52)
   deaths:      29,520, at avg age 22,227,424
   acc-ratios:  6.37 rd, 1.14 wr  (12,141,526 b-read, 2,174,460 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x404580: tok_alloc_new (tinycc.c:7151)
      by 0x40870A: next_nomacro1 (tinycc.c:9305)
]]></screen>

<para>Over the entire run of the program, this stack (allocation
point) allocated 29,520 blocks in total, containing 1,904,700 bytes in
total.  By looking at the max-live data, we see that not many blocks
were simultaneously live, though: at the peak, there were 63,490
allocated bytes in 984 blocks.  This tells us that the program is
steadily freeing such blocks as it runs, rather than hanging on to all
of them until the end and freeing them all.</para>

<para>The deaths entry tells us that 29,520 blocks allocated by this stack
died (were freed) during the run of the program.  Since 29,520 is
also the number of blocks allocated in total, that tells us that
all allocated blocks were freed by the end of the program.</para>

<para>It also tells us that the average age at death was 22,227,424
instructions.  From the summary statistics we see that the program ran
for 1,045,339,534 instructions, and so the average age at death is
about 2% of the program's total run time.</para>
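
<para>The reasoning in the last three paragraphs is simple arithmetic on
the printed fields.  As a sketch only, it can be written in C as follows;
<literal>point_summary_t</literal> and the helper functions are
hypothetical and merely mirror the labels in DHAT's output.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of one allocation-point record, plus the
   run's total instruction count (guest_insns). */
typedef struct {
    uint64_t max_live_bytes;
    uint64_t tot_alloc_bytes;
    uint64_t tot_alloc_blocks;
    uint64_t deaths;          /* number of blocks freed */
    uint64_t avg_death_age;   /* in instructions */
    uint64_t guest_insns;     /* total instructions in the run */
} point_summary_t;

/* True if every block allocated here was freed before the end. */
bool all_blocks_freed(const point_summary_t *p)
{
    return p->deaths == p->tot_alloc_blocks;
}

/* Peak live bytes as a fraction of all bytes ever allocated here.
   A value far below 1.0 means blocks are freed steadily. */
double peak_to_total(const point_summary_t *p)
{
    if (p->tot_alloc_bytes == 0)
        return 0.0;
    return (double)p->max_live_bytes / (double)p->tot_alloc_bytes;
}

/* Average block lifetime as a fraction of the whole run. */
double age_fraction(const point_summary_t *p)
{
    if (p->guest_insns == 0)
        return 0.0;
    return (double)p->avg_death_age / (double)p->guest_insns;
}
```
]]></programlisting>

<para>With the figures from this example, <function>all_blocks_freed</function>
returns true, <function>peak_to_total</function> gives about 0.033, and
<function>age_fraction</function> gives about 0.021, the "about 2%"
quoted above.</para>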

<sect3><title>Example of a potential process-lifetime leak</title></sect3>

<para>This next example (from a different program than the above)
shows a potential process-lifetime leak.  A process-lifetime leak
occurs when a program keeps allocating data, but only frees the
data just before it exits.  Hence the program's heap grows constantly
in size, yet Memcheck reports no leak, because the program has
freed up everything at exit.  This is particularly a hazard for
long-running programs.</para>

<screen><![CDATA[
   ======== SUMMARY STATISTICS ========

   guest_insns:  418,901,537
   [...]
   max-live:    32,512 in 254 blocks
   tot-alloc:   32,512 in 254 blocks (avg size 128.00)
   deaths:      254, at avg age 300,467,389
   acc-ratios:  0.26 rd, 0.20 wr  (8,756 b-read, 6,604 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x4C27632: realloc (vg_replace_malloc.c:525)
      by 0x56FF41D: QtFontStyle::pixelSize(unsigned short, bool) (qfontdatabase.cpp:269)
      by 0x5700D69: loadFontConfig() (qfontdatabase_x11.cpp:1146)
]]></screen>

<para>There are two tell-tale signs that this might be a
process-lifetime leak.  Firstly, the max-live and tot-alloc numbers
are identical.  The only way that can happen is if all of these blocks
are allocated before any of them is deallocated.</para>

<para>Secondly, the average age at death (300 million insns) is about
72% of the total program lifetime (419 million insns), hence this is
not a transient allocation-free spike; rather, it is spread out over a
large part of the entire run.  One interpretation is, roughly, that
all 254 blocks were allocated in the first half of the run, held onto
for the second half, and then freed just before exit.</para>
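
<para>The two signs can be combined into a rough filter when scanning many
records.  This C sketch is not something DHAT computes: the function
name and the caller-chosen threshold for "a large part of the run" are
assumptions, and points whose blocks are never freed have no average age
at death, so they would need separate handling.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heuristic: flag a point as a possible process-lifetime
   leak when (1) max-live equals tot-alloc, so every block was live at
   once, and (2) the average age at death is at least min_age_fraction
   of the whole run. */
bool maybe_process_lifetime_leak(uint64_t max_live_bytes,
                                 uint64_t tot_alloc_bytes,
                                 uint64_t avg_death_age,
                                 uint64_t guest_insns,
                                 double min_age_fraction)
{
    if (tot_alloc_bytes == 0 || guest_insns == 0)
        return false;
    if (max_live_bytes != tot_alloc_bytes)
        return false;  /* some blocks died before others were born */
    return (double)avg_death_age / (double)guest_insns >= min_age_fraction;
}
```
]]></programlisting>

<para>With a threshold of 0.5, the QtFontStyle record above is flagged
(300,467,389 / 418,901,537 is about 0.72), whereas the tinycc record from
the first example is not, because its max-live (63,490 bytes) is far
below its tot-alloc (1,904,700 bytes).</para>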

</sect2>


<sect2>
<title>Interpreting the acc-ratios fields</title>


<sect3><title>A fairly harmless allocation point record</title></sect3>

<screen><![CDATA[
   max-live:    49,398 in 808 blocks
   tot-alloc:   1,481,940 in 24,240 blocks (avg size 61.13)
   deaths:      24,240, at avg age 34,611,026
   acc-ratios:  2.13 rd, 0.91 wr  (3,166,650 b-read, 1,358,820 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x404580: tok_alloc_new (tinycc.c:7151)
      by 0x4046C4: tok_alloc (tinycc.c:7190)
]]></screen>

<para>The acc-ratios field tells us that each byte in the blocks
allocated here is read an average of 2.13 times before the block is
deallocated.  Given that the blocks have an average age at death of
34,611,026 instructions, that's roughly one read of each byte every 16
million instructions.  So from that standpoint the blocks aren't
"working" very hard.</para>

<para>More interesting is the write ratio: each byte is written an
average of 0.91 times.  This tells us that some parts of the allocated
blocks are never written: at least 8% on average, since the untruncated
ratio, 1,358,820 / 1,481,940, is about 0.917.  To completely initialise
the block would require writing each byte at least once, and that would
give a write ratio of at least 1.0.  The fact that some block areas are
evidently unused might point to data alignment holes or other layout
inefficiencies.</para>
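
<para>The figures in the last two paragraphs can be recomputed from the
raw byte counts.  The C sketch below uses hypothetical helper names and
assumes, as the numbers in these examples suggest, that each ratio is
total bytes accessed divided by total bytes allocated, with the printed
value truncated to two decimal places.</para>

<programlisting><![CDATA[
```c
#include <stdint.h>

/* Average number of times each allocated byte was accessed:
   total bytes read (or written) divided by total bytes allocated. */
double access_ratio(uint64_t bytes_accessed, uint64_t tot_alloc_bytes)
{
    if (tot_alloc_bytes == 0)
        return 0.0;
    return (double)bytes_accessed / (double)tot_alloc_bytes;
}

/* Average number of instructions between reads of a given byte:
   the average lifetime divided by the read ratio.
   Returns 0.0 when there were no reads at all. */
double insns_per_read(uint64_t avg_death_age, double read_ratio)
{
    return read_ratio > 0.0 ? (double)avg_death_age / read_ratio : 0.0;
}

/* Lower bound on the fraction of bytes never written.  A byte written
   k > 1 times adds k to the total, so if the write ratio r is below
   1.0, at least (1 - r) of the bytes must have been written zero times. */
double min_unwritten_fraction(double write_ratio)
{
    return write_ratio < 1.0 ? 1.0 - write_ratio : 0.0;
}
```
]]></programlisting>

<para>For this record, <literal>access_ratio(3166650, 1481940)</literal>
is about 2.137 and <literal>access_ratio(1358820, 1481940)</literal>
about 0.917.  Passing those on gives roughly 16.2 million instructions
between reads of a byte, and a never-written fraction of at least about
0.083.</para>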

<para>Well, at least all the blocks are freed (24,240 allocations,
24,240 deaths).</para>

<para>If all the blocks had been the same size, DHAT would also show
the access counts by block offset, so we could see where exactly these
unused areas are.  However, that isn't the case: the blocks have
varying sizes, so DHAT can't perform such an analysis.  We can see
that they must have varying sizes since the average block size, 61.13,
isn't a whole number.</para>
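
<para>The whole-number test amounts to checking whether the total byte
count divides evenly by the block count.  A minimal C sketch follows (the
function name is hypothetical).  The test only works in one direction: a
fractional average proves that sizes vary, but a whole-number average
does not prove they are all equal.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* True if an allocation point must have produced blocks of more than
   one size, judging only from its totals.  A false result is
   inconclusive: blocks of 8 and 24 bytes also average to a whole
   number (16). */
bool sizes_must_vary(uint64_t tot_alloc_bytes, uint64_t tot_alloc_blocks)
{
    if (tot_alloc_blocks == 0)
        return false;
    return tot_alloc_bytes % tot_alloc_blocks != 0;
}
```
]]></programlisting>

<para>Here 1,481,940 bytes over 24,240 blocks leaves a remainder of 3,300,
so the sizes must vary.  By contrast, the 56-byte example in the
offset-counts section divides exactly (317,408 / 5,668 = 56).</para>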


<sect3><title>A more suspicious looking example</title></sect3>

<screen><![CDATA[
   max-live:    180,224 in 22 blocks
   tot-alloc:   180,224 in 22 blocks (avg size 8192.00)
   deaths:      none (none of these blocks were freed)
   acc-ratios:  0.00 rd, 0.00 wr  (0 b-read, 0 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x40369C: __sym_malloc (tinycc.c:6787)
      by 0x403711: sym_malloc (tinycc.c:6805)
]]></screen>

<para>Here, both the read and write access ratios are zero.  Hence
this point is allocating blocks which are never used, neither read nor
written.  Indeed, they are also not freed ("deaths: none") and are
simply leaked.  So, here is 180k of completely useless allocation that
could be removed.</para>

<para>Re-running with Memcheck does indeed report the same leak.  What
DHAT can tell us, that Memcheck can't, is that not only are the blocks
leaked, they are also never used.</para>
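
<para>The judgements in this example and the next reduce to a few
comparisons on the b-read and b-written counts.  The C sketch below is a
hypothetical classifier for illustration; DHAT itself prints the raw
numbers and leaves the interpretation to you.  Combining the result with
a deaths count of zero picks out blocks that are also leaked, as
here.</para>

<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical labels for the access patterns discussed here. */
typedef enum {
    USAGE_NEVER_TOUCHED,    /* neither read nor written */
    USAGE_WRITE_ONLY,       /* written but never read */
    USAGE_READ_ONLY,        /* read but never written */
    USAGE_READ_AND_WRITTEN  /* both read and written */
} usage_kind;

/* Classify an allocation point from its b-read and b-written totals. */
usage_kind classify_usage(uint64_t bytes_read, uint64_t bytes_written)
{
    if (bytes_read == 0 && bytes_written == 0)
        return USAGE_NEVER_TOUCHED;
    if (bytes_read == 0)
        return USAGE_WRITE_ONLY;
    if (bytes_written == 0)
        return USAGE_READ_ONLY;
    return USAGE_READ_AND_WRITTEN;
}
```
]]></programlisting>

<para>The sym_malloc record above classifies as
<literal>USAGE_NEVER_TOUCHED</literal>, and the tcc_strdup record in the
next example (0 b-read, 1,800 b-written) as
<literal>USAGE_WRITE_ONLY</literal>.</para>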

<sect3><title>Another suspicious example</title></sect3>

<para>Here's one where blocks are allocated, written to,
but never read from.  We see this immediately from the zero read
access ratio.  They do get freed, though:</para>

<screen><![CDATA[
   max-live:    54 in 3 blocks
   tot-alloc:   1,620 in 90 blocks (avg size 18.00)
   deaths:      90, at avg age 34,558,236
   acc-ratios:  0.00 rd, 1.11 wr  (0 b-read, 1,800 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x4035BD: tcc_strdup (tinycc.c:6750)
      by 0x41FEBB: tcc_add_sysinclude_path (tinycc.c:20931)
]]></screen>

<para>In the previous two examples, it is easy to see blocks that are
never written to, or never read from, or some combination of both.
Unfortunately, in C++ code, the situation is less clear.  That's
because an object's constructor will write to the underlying block,
and its destructor will read from it.  So the block's read and write
ratios will be non-zero even if the object, once constructed, is never
used, but only eventually destroyed.</para>

<para>Really, what we want is to measure only memory accesses in
between the end of an object's construction and the start of its
destruction.  Unfortunately I do not know of a reliable way to
determine when those transitions are made.</para>


</sect2>

<sect2>
<title>Interpreting "Aggregated access counts by offset" data</title>

<para>For allocation points that always allocate blocks of the same
size, and which are 4096 bytes or smaller, DHAT counts accesses
per offset, for example:</para>

<screen><![CDATA[
   max-live:    317,408 in 5,668 blocks
   tot-alloc:   317,408 in 5,668 blocks (avg size 56.00)
   deaths:      5,668, at avg age 622,890,597
   acc-ratios:  1.03 rd, 1.28 wr  (327,642 b-read, 408,172 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x5440C16: QDesignerPropertySheetPrivate::ensureInfo (qhash.h:515)
      by 0x544350B: QDesignerPropertySheet::setVisible (qdesigner_propertysh...)
      by 0x5446232: QDesignerPropertySheet::QDesignerPropertySheet (qdesigne...)

   Aggregated access counts by offset:

   [   0]  28782 28782 28782 28782 28782 28782 28782 28782
   [   8]  20638 20638 20638 20638 0 0 0 0
   [  16]  22738 22738 22738 22738 22738 22738 22738 22738
   [  24]  6013 6013 6013 6013 6013 6013 6013 6013
   [  32]  18883 18883 18883 37422 0 0 0 0
   [  40]  5668 11915 5668 5668 11336 11336 11336 11336
   [  48]  6166 6166 6166 6166 0 0 0 0
]]></screen>

<para>This is fairly typical for C++ code running on a 64-bit
platform.  Here, we have aggregated access statistics for 5668 blocks,
all of size 56 bytes.  Each byte has been accessed at least 5668
times, except for offsets 12--15, 36--39 and 52--55.  These are likely
to be alignment holes.</para>
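
<para>Conceptually, the offset table is one counter per byte offset,
shared by every block from the allocation point and incremented for each
byte an access touches.  A minimal C sketch of that bookkeeping follows.
The names are hypothetical, the fixed 4096-entry table reflects the size
limit stated above, and DHAT's real implementation may well differ.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_SIZE 4096  /* offsets are only tracked up to this size */

/* Hypothetical per-allocation-point table of access counts. */
typedef struct {
    size_t   block_size;                /* every block has this size */
    uint64_t counts[MAX_TRACKED_SIZE];  /* one counter per byte offset */
} offset_counts_t;

/* Record an access of len bytes starting at offset off within one of
   the point's blocks.  Bytes past the end of the block are ignored. */
void count_offset_access(offset_counts_t *t, size_t off, size_t len)
{
    for (size_t i = off;
         i < off + len && i < t->block_size && i < MAX_TRACKED_SIZE;
         i++)
        t->counts[i]++;
}
```
]]></programlisting>

<para>If every block receives an 8-byte access at offset 0 and a 4-byte
access at offset 8, rows <literal>[   0]</literal> and
<literal>[   8]</literal> come out as uniform runs with zeroes at offsets
12 to 15, the same shape as in the table above.</para>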

<para>Careful interpretation of the numbers reveals useful information.
Groups of N consecutive identical numbers that begin at an N-aligned
offset, for N being 2, 4 or 8, are likely to indicate an N-byte object
in the structure at that point.  For example, the first 32 bytes of
this object are likely to have the layout</para>

<screen><![CDATA[
   [0 ]  64-bit type
   [8 ]  32-bit type
   [12]  32-bit alignment hole
   [16]  64-bit type
   [24]  64-bit type
]]></screen>

<para>As a counterexample, it's also clear that, whatever is at offset 32,
it is not a 32-bit value.  That's because the last number of the group
(37422) is not the same as the first three (18883 18883 18883).</para>
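
<para>The rule of thumb above, that N identical counts starting at an
N-aligned offset suggest an N-byte field, is easy to mechanise.  The C
sketch below is a hypothetical helper, not part of DHAT, and its guesses
carry the same caveats as the manual reading.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Guess the width of the field starting at offset off, given the
   aggregated per-offset access counts of a block of the given size.
   Tries 8, 4 and 2 bytes in turn: width N qualifies if off is
   N-aligned, the field fits in the block, and all N counts equal the
   (non-zero) count at off.  Returns 0 for a never-accessed byte (a
   possible hole) or an offset outside the block, and 1 if no wider
   grouping fits. */
size_t guess_field_width(const uint64_t *counts, size_t size, size_t off)
{
    static const size_t widths[] = { 8, 4, 2 };

    if (off >= size || counts[off] == 0)
        return 0;
    for (size_t w = 0; w < sizeof widths / sizeof widths[0]; w++) {
        size_t n = widths[w];
        if (off % n != 0 || off + n > size)
            continue;
        size_t i = 1;
        while (i < n && counts[off + i] == counts[off])
            i++;
        if (i == n)
            return n;  /* all n counts identical */
    }
    return 1;
}
```
]]></programlisting>

<para>Given the 56 counts from the table above, it returns 8 at offsets 0,
16 and 24; 4 at offsets 8, 44 and 48; 0 at the never-accessed offset 12;
and only 2 at offset 32, because the fourth count there (37422)
differs.</para>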

<para>This example leads one to enquire (by reading the source code)
whether the zeroes at 12--15 and 52--55 are alignment holes, and
whether 48--51 is indeed a 32-bit type.  If so, it might be possible
to place what's at 48--51 at 12--15 instead, which would reduce
the object size from 56 to 48 bytes.</para>
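
<para>To make the suggested change concrete, here are two hypothetical C
structs whose layouts are consistent with the inferences above.  Field
names and types are invented, the bytes at offsets 32 and 40 are
modelled only coarsely, and the real declaration would have to be read
from the source.  The sizes assume a 64-bit ABI in which
<literal>uint64_t</literal> is 8-byte aligned, as on x86-64 and AArch64
Linux.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>  /* offsetof, for checking the layouts */
#include <stdint.h>

/* Hypothetical layout matching the offset counts: 56 bytes. */
struct record_before {
    uint64_t      a;     /*  0 */
    uint32_t      b;     /*  8, followed by a 4-byte hole at 12 */
    uint64_t      c;     /* 16 */
    uint64_t      d;     /* 24 */
    unsigned char e[4];  /* 32, accessed bytewise; hole at 36 */
    uint64_t      f;     /* 40 */
    uint32_t      g;     /* 48, followed by 4 bytes of tail padding */
};

/* Same fields with g moved into the hole at offset 12: 48 bytes. */
struct record_after {
    uint64_t      a;     /*  0 */
    uint32_t      b;     /*  8 */
    uint32_t      g;     /* 12, fills the former hole */
    uint64_t      c;     /* 16 */
    uint64_t      d;     /* 24 */
    unsigned char e[4];  /* 32, the gap at 36 remains */
    uint64_t      f;     /* 40 */
};
```
]]></programlisting>

<para>Moving <literal>g</literal> removes both the hole at 12 and the 4
bytes of tail padding at 52, saving 8 bytes per object.  The gap at 36 is
unaffected.</para>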

<para>Bear in mind that the above inferences are all only "maybes".  That's
because they are based on dynamic data, not static analysis of the
object layout.  For example, the zeroes might not be alignment
holes, but rather just parts of the structure which were not used
at all for this particular run.  Experience shows that's unlikely
to be the case, but it could happen.</para>

</sect2>

</sect1>



<sect1 id="dh-manual.options" xreflabel="DHAT Command-line Options">
<title>DHAT Command-line Options</title>

<para>DHAT-specific command-line options are:</para>

<!-- start of xi:include in the manpage -->
<variablelist id="dh.opts.list">

  <varlistentry id="opt.show-top-n" xreflabel="--show-top-n">
    <term>
      <option><![CDATA[--show-top-n=<number>
      [default: 10] ]]></option>
    </term>
    <listitem>
      <para>At the end of the run, DHAT sorts the accumulated
       allocation points according to some metric, and shows the
       highest-scoring entries.  <varname>--show-top-n</varname>
       controls how many entries are shown.  The default of 10 is
       quite small.  For realistic applications you will probably need
       to set it much higher, at least several hundred.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.sort-by" xreflabel="--sort-by=string">
    <term>
      <option><![CDATA[--sort-by=<string> [default: max-bytes-live] ]]></option>
    </term>
    <listitem>
      <para>At the end of the run, DHAT sorts the accumulated
       allocation points according to some metric, and shows the
       highest-scoring entries.  <varname>--sort-by</varname>
       selects the metric used for sorting:</para>
      <para><varname>max-bytes-live   </varname>  maximum live bytes [default]</para>
      <para><varname>tot-bytes-allocd </varname>  bytes allocated in total (turnover)</para>
      <para><varname>max-blocks-live  </varname>  maximum live blocks</para>
      <para><varname>tot-blocks-allocd </varname> blocks allocated in total (turnover)</para>
      <para>This controls the order in which allocation points are
       displayed.  You can choose to look at allocation points with
       the highest number of live bytes, the highest total byte turnover,
       the highest number of live blocks, or the highest total block
       turnover.  These give usefully different pictures of program behaviour.
       For example, sorting by maximum live blocks tends to show up allocation
       points creating large numbers of small objects.</para>
    </listitem>
  </varlistentry>

</variablelist>

<para>One important point to note is that each allocation stack counts
as a separate allocation point.  Because stacks by default have 12
frames, this tends to spread data out over multiple allocation points.
You may want to use the flag <option>--num-callers=4</option> or a
similarly small number to reduce the spreading.</para>

<!-- end of xi:include in the manpage -->
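
<para>The combined effect of <option>--sort-by</option> and
<option>--show-top-n</option> can be pictured as a descending sort
followed by truncation.  The C sketch below illustrates only that idea;
the record type, metric enum and function names are hypothetical and do
not reflect DHAT's source.  A file-scope variable carries the metric
because the standard <function>qsort</function> comparator takes no
context argument.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of one allocation point. */
typedef struct {
    const char *name;
    uint64_t max_bytes_live, tot_bytes_allocd;
    uint64_t max_blocks_live, tot_blocks_allocd;
} alloc_point;

/* The four metrics accepted by --sort-by. */
typedef enum { MAX_BYTES_LIVE, TOT_BYTES_ALLOCD,
               MAX_BLOCKS_LIVE, TOT_BLOCKS_ALLOCD } sort_metric;

static sort_metric current_metric;  /* read by the comparator */

static uint64_t metric_of(const alloc_point *p)
{
    switch (current_metric) {
    case TOT_BYTES_ALLOCD:  return p->tot_bytes_allocd;
    case MAX_BLOCKS_LIVE:   return p->max_blocks_live;
    case TOT_BLOCKS_ALLOCD: return p->tot_blocks_allocd;
    default:                return p->max_bytes_live;
    }
}

/* Descending order: highest-scoring points first. */
static int by_metric_desc(const void *x, const void *y)
{
    uint64_t a = metric_of(x), b = metric_of(y);
    return (a < b) - (a > b);
}

/* Sort points in place by metric m; return how many to display. */
size_t select_top_n(alloc_point *pts, size_t n, sort_metric m,
                    size_t top_n)
{
    current_metric = m;
    qsort(pts, n, sizeof pts[0], by_metric_desc);
    return n < top_n ? n : top_n;
}
```
]]></programlisting>

<para>Sorting three records from this chapter by max-bytes-live puts
sym_malloc (180,224 bytes) first, but sorting by max-blocks-live moves it
to last place (22 blocks), behind tok_alloc_new (984) and pixelSize
(254): a small instance of the metrics giving different pictures.</para>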

</sect1>

</chapter>