<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>

<HEAD>
  <link rel="stylesheet" href="designstyle.css">
  <title>Gperftools Heap Profiler</title>
</HEAD>

<BODY>

<p align=right>
  <i>Last modified
  <script type=text/javascript>
    var lm = new Date(document.lastModified);
    document.write(lm.toDateString());
  </script></i>
</p>

<p>This is the heap profiler we use at Google to explore how C++
programs manage memory.  This facility can be useful for:</p>
<ul>
  <li> Figuring out what is in the program heap at any given time
  <li> Locating memory leaks
  <li> Finding places that do a lot of allocation
</ul>

<p>The profiling system instruments all allocations and frees.  It
keeps track of various pieces of information per allocation site.  An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>

<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>


<h1>Linking in the Library</h1>

<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:</p>
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>

<p>This does <i>not</i> turn on heap profiling; it just inserts the
code.  For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google.  (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a production, running
system.)  Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library.  There is no way
currently to use the heap profiler separate from tcmalloc.</p>


<h1>Running the Code</h1>

<p>There are two ways to turn on heap profiling
for a given run of an executable:</p>

<ol>
  <li> <p>Set the environment variable <code>HEAPPROFILE</code> to the
       filename prefix for the profile dumps.  For instance, to profile
       <code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
       <pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
  <li> <p>In your code, bracket the code you want profiled in calls to
       <code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
       (These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
       <code>HeapProfilerStart()</code> takes the
       profile-filename-prefix as an argument.  Then, as often as
       you'd like before calling <code>HeapProfilerStop()</code>, you
       can use <code>HeapProfilerDump()</code> or
       <code>GetHeapProfile()</code> to examine the profile.  In case
       it's useful, <code>IsHeapProfilerRunning()</code> will tell you
       whether you've already called <code>HeapProfilerStart()</code> or not.</p>
</ol>
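<p>The second alternative might look like the following minimal sketch.
The workload <code>BuildTable()</code>, the wrapper
<code>ProfiledBuild()</code>, and the <code>USE_GPERFTOOLS</code> macro
are hypothetical names chosen for illustration.  The no-op stand-ins are
likewise an assumption, added only so the sketch compiles on a machine
without gperftools installed; they are not part of the library.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build with -DUSE_GPERFTOOLS and link with -ltcmalloc to profile for real.
// Otherwise the hypothetical no-op stand-ins below are used instead.
#ifdef USE_GPERFTOOLS
#include <gperftools/heap-profiler.h>
#else
inline void HeapProfilerStart(const char*) {}
inline void HeapProfilerDump(const char*) {}
inline void HeapProfilerStop() {}
inline int IsHeapProfilerRunning() { return 0; }
#endif

// Hypothetical workload: allocates `chunks` blocks of `chunk_bytes` each
// and returns the total number of bytes now held in `table`.
std::size_t BuildTable(std::vector<std::vector<char>>* table,
                       std::size_t chunks, std::size_t chunk_bytes) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < chunks; ++i) {
    table->emplace_back(chunk_bytes, 'x');
    total += table->back().size();
  }
  return total;
}

// Brackets BuildTable() with the profiler calls and returns the bytes held.
std::size_t ProfiledBuild(const char* prefix) {
  assert(prefix != nullptr);
  std::vector<std::vector<char>> table;

  // Only start (and later stop) the profiler if nobody else started it,
  // for example via the HEAPPROFILE environment variable.
  const bool started_here = !IsHeapProfilerRunning();
  if (started_here) HeapProfilerStart(prefix);

  const std::size_t held = BuildTable(&table, 8, 1 << 20);

  // Request a dump while the table is still live, so it shows up as in-use.
  HeapProfilerDump("after BuildTable");
  if (started_here) HeapProfilerStop();
  return held;
}
```

<p>Calling <code>ProfiledBuild("/tmp/mybin.hprof")</code> from
<code>main()</code> would, with the real library linked in, write a dump
under that prefix while the table is still allocated.</p>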


<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>

<H2>Modifying Runtime Behavior</H2>

<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>

<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
  <td>default: 1073741824 (1 GB)</td>
  <td>
    Dump heap profiling information each time the program has
    allocated the specified number of bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
  <td>default: 104857600 (100 MB)</td>
  <td>
    Dump heap profiling information whenever the high-water memory
    usage mark increases by the specified number of bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP</code></td>
  <td>default: false</td>
  <td>
    Profile <code>mmap</code>, <code>mremap</code> and <code>sbrk</code>
    calls in addition
    to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    and <code>new</code>.  <b>NOTE:</b> this causes the profiler to
    profile calls internal to tcmalloc, since tcmalloc and friends use
    <code>mmap</code> and <code>sbrk</code> internally for allocations.
    One partial solution is
    to filter these allocations out when running <code>pprof</code>,
    with something like
    <code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_ONLY</code></td>
  <td>default: false</td>
  <td>
    Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
    calls; do not profile
    <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    or <code>new</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_LOG</code></td>
  <td>default: false</td>
  <td>
    Log <code>mmap</code>/<code>munmap</code> calls.
  </td>
</tr>

</table>
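<p>To make the two interval settings concrete, here is an illustrative
model of when a dump would become due.  It is a sketch of the behavior
described in the table, not the profiler's actual code;
<code>DumpPolicy</code> and <code>DumpTrigger</code> are hypothetical
names.</p>

```cpp
#include <cassert>
#include <cstdint>

// Thresholds in bytes, initialized to the defaults listed in the table.
struct DumpPolicy {
  std::int64_t allocation_interval = 1073741824;  // HEAP_PROFILE_ALLOCATION_INTERVAL
  std::int64_t inuse_interval = 104857600;        // HEAP_PROFILE_INUSE_INTERVAL
};

// Hypothetical bookkeeping: tracks cumulative and live bytes and reports
// when either interval has been crossed since the last dump.
class DumpTrigger {
 public:
  explicit DumpTrigger(const DumpPolicy& policy) : policy_(policy) {}

  // Records an allocation; returns true if a dump would be due.
  bool OnAlloc(std::int64_t bytes) {
    assert(bytes >= 0);
    total_allocated_ += bytes;
    in_use_ += bytes;
    bool due = false;
    // Trigger 1: enough bytes allocated since the last such dump.
    if (total_allocated_ >= last_dump_allocated_ + policy_.allocation_interval) {
      last_dump_allocated_ = total_allocated_;
      due = true;
    }
    // Trigger 2: live memory exceeds the recorded high-water mark by the interval.
    if (in_use_ > high_water_ + policy_.inuse_interval) {
      high_water_ = in_use_;
      due = true;
    }
    return due;
  }

  // Records a free; frees lower live memory but never trigger a dump here.
  void OnFree(std::int64_t bytes) {
    assert(bytes >= 0);
    in_use_ -= bytes;
  }

 private:
  DumpPolicy policy_;
  std::int64_t total_allocated_ = 0;
  std::int64_t in_use_ = 0;
  std::int64_t last_dump_allocated_ = 0;
  std::int64_t high_water_ = 0;
};
```

<p>In this model a program that repeatedly allocates and frees the same
buffer still reaches the allocation interval eventually, but never moves
the high-water mark.</p>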

<H2>Checking for Leaks</H2>

<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations.  However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>


<h1><a name="pprof">Analyzing the Output</a></h1>

<p>If heap-profiling is turned on in a program, the program will
periodically write profiles to the filesystem.  The sequence of
profiles will be named:</p>
<pre>
           &lt;prefix&gt;.0000.heap
           &lt;prefix&gt;.0001.heap
           &lt;prefix&gt;.0002.heap
           ...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename-prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable).  Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
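<p>If a script or test needs to predict these names, a small helper
like the one below may be enough.  It assumes the four-digit,
zero-padded sequence number shown in the listing above;
<code>HeapProfileName()</code> and <code>IsRelativePrefix()</code> are
hypothetical helpers, not part of gperftools.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the name of the sequence-th dump, following the
// <prefix>.NNNN.heap pattern shown above (an assumption based on that listing).
std::string HeapProfileName(const std::string& prefix, int sequence) {
  assert(sequence >= 0);
  char suffix[32];
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", sequence);
  return prefix + suffix;
}

// True if the prefix does not start with '/', in which case the dumps are
// written relative to the program's working directory.
bool IsRelativePrefix(const std::string& prefix) {
  return prefix.empty() || prefix[0] != '/';
}
```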

<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>

<p>Here are some examples.  These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
  /tmp/profile.0001.heap
  /tmp/profile.0002.heap
  ...
  /tmp/profile.0100.heap
</pre>

<h3>Why is a process so big?</h3>

<pre>
    % pprof --gv gfs_master /tmp/profile.0100.heap
</pre>

<p>This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph.  Here is a portion
of the resulting output:</p>

<p><center>
<img src="heap-example1.png">
</center></p>

<p>A few explanations:</p>
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
     of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
     accountable for 176.2 MB of the live memory (i.e., it directly
     allocated 176.2 MB that has not been freed yet).  Furthermore,
     it and its callees are responsible for 729.9 MB.  The
     labels on the outgoing edges give a good indication of the
     amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks.  One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while.  Specify the name of the first profile
using the <code>--base</code> option.  For example:</p>
<pre>
   % pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>

<p>The memory-usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory-usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>
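<p>Conceptually, the subtraction happens per allocation site.  The
sketch below illustrates the idea with sites keyed by a function name
for brevity; it is an assumption-laden model, not how
<code>pprof</code> is implemented, and <code>Profile</code> and
<code>SubtractBase()</code> are hypothetical names.</p>

```cpp
#include <map>
#include <string>

// Live megabytes per allocation site.  A real profile keys on the full
// stack trace; a single function name is used here to keep the sketch short.
using Profile = std::map<std::string, double>;

// Returns `current` minus `base`, site by site.  Sites that appear only in
// `base` end up negative, meaning their memory was freed in between.
Profile SubtractBase(const Profile& current, const Profile& base) {
  Profile diff = current;
  for (const auto& entry : base) {
    diff[entry.first] -= entry.second;
  }
  return diff;
}
```

<p>Sites whose usage is the same in both profiles, typically
initialization-time allocations, come out as zero, which leaves the
sites that grew between the two dumps.</p>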

<h3>Text display</h3>

<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<p>
<ul>
  <li> The first column contains the direct memory use in MB.
  <li> The fourth column contains memory use by the procedure
       and all of its callees.
  <li> The second and fifth columns are just percentage
       representations of the numbers in the first and fourth columns.
  <li> The third column is a cumulative sum of the second column
       (i.e., the <code>k</code>th entry in the third column is the
       sum of the first <code>k</code> entries in the second column).
</ul>
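<p>The relationship between the five columns can be written down
directly.  In the sketch below, <code>Site</code>, <code>Row</code>,
and <code>BuildTextRows()</code> are hypothetical names, and the total
live memory is assumed to be supplied by the caller; the sites are
assumed to be sorted already, as in the listing above.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Input: one allocation site with its direct use (column 1) and its use
// including callees (column 4), both in MB.
struct Site {
  std::string name;
  double direct_mb;
  double with_callees_mb;
};

// Output: the five numeric columns of one text-display row.
struct Row {
  std::string name;
  double direct_mb;         // column 1
  double direct_pct;        // column 2: column 1 as a percentage of the total
  double running_pct;       // column 3: cumulative sum of column 2
  double with_callees_mb;   // column 4
  double with_callees_pct;  // column 5: column 4 as a percentage of the total
};

std::vector<Row> BuildTextRows(const std::vector<Site>& sites, double total_mb) {
  assert(total_mb > 0.0);
  std::vector<Row> rows;
  double running = 0.0;
  for (const Site& s : sites) {
    const double direct_pct = 100.0 * s.direct_mb / total_mb;
    running += direct_pct;
    rows.push_back({s.name, s.direct_mb, direct_pct, running,
                    s.with_callees_mb, 100.0 * s.with_callees_mb / total_mb});
  }
  return rows;
}
```

<p>Note that the fifth column can add up to more than 100% across rows,
because a caller's figure includes memory that is also counted for its
callees.</p>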

<h3>Ignoring or focusing on specific regions</h3>

<p>The following command will give a graphical display of a subset of
the call-graph.  Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<p>Similarly, the following command will omit a subset of the
call-graph.  All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
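<p>One way to picture the two options is as filters over stack paths.
The sketch below assumes that a path "matches" when any of its frames
matches the regular expression, and it uses <code>std::regex</code>
(ECMAScript syntax), which may differ in detail from the regular
expressions <code>pprof</code> accepts.  <code>StackPath</code> and
<code>FilterPaths()</code> are hypothetical names.</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// One path through the call-graph: function names, one per frame.
using StackPath = std::vector<std::string>;

// True if any frame on the path matches the regular expression.
bool AnyFrameMatches(const StackPath& path, const std::regex& re) {
  for (const std::string& frame : path) {
    if (std::regex_search(frame, re)) return true;
  }
  return false;
}

// Keeps paths that match `focus` and drops paths that match `ignore`.
// An empty pattern disables the corresponding filter.
std::vector<StackPath> FilterPaths(const std::vector<StackPath>& paths,
                                   const std::string& focus,
                                   const std::string& ignore) {
  const std::regex focus_re(focus);
  const std::regex ignore_re(ignore);
  std::vector<StackPath> kept;
  for (const StackPath& path : paths) {
    if (!focus.empty() && !AnyFrameMatches(path, focus_re)) continue;
    if (!ignore.empty() && AnyFrameMatches(path, ignore_re)) continue;
    kept.push_back(path);
  }
  return kept;
}
```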

<h3>Total allocations + object-level information</h3>

<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed.  You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed).  This is the default.
  </td>
</tr>

<tr valign=top>
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes.  This includes
     the space that has since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects.  This includes
     the objects that have since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

</table>
</center>
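<p>The four flags correspond to four quantities that can be derived
from per-site allocation and free counts.  The following is a minimal
sketch of that bookkeeping, with values in bytes rather than the
megabytes <code>pprof</code> displays; <code>SiteStats</code> is a
hypothetical type, not the profiler's own data structure.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-allocation-site counters.
struct SiteStats {
  std::int64_t alloc_objects = 0;  // what --alloc_objects reports
  std::int64_t alloc_bytes = 0;    // what --alloc_space reports
  std::int64_t free_objects = 0;
  std::int64_t free_bytes = 0;

  void RecordAlloc(std::int64_t bytes) {
    assert(bytes >= 0);
    ++alloc_objects;
    alloc_bytes += bytes;
  }

  void RecordFree(std::int64_t bytes) {
    assert(bytes >= 0);
    ++free_objects;
    free_bytes += bytes;
  }

  // Allocated but not yet freed: what --inuse_objects reports.
  std::int64_t inuse_objects() const { return alloc_objects - free_objects; }

  // Allocated but not yet freed: what --inuse_space reports.
  std::int64_t inuse_bytes() const { return alloc_bytes - free_bytes; }
};
```

<p>A site that allocates and frees heavily shows large
<code>alloc_*</code> figures but small <code>inuse_*</code> figures,
which is why the <code>--alloc_*</code> flags are the ones to use when
looking for the busiest allocation sites.</p>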


<h3>Interactive mode</h3>

<p>By default -- if you don't specify any flags to the contrary --
<code>pprof</code> runs in interactive mode.  At the <code>(pprof)</code> prompt,
you can run many of the commands described above.  You can type
<code>help</code> for a list of what commands are available in
interactive mode.</p>


<h1>Caveats</h1>

<ul>
  <li> Heap profiling requires the use of libtcmalloc.  This
       requirement may be removed in a future version of the heap
       profiler, and the heap profiler separated out into its own
       library.

  <li> If the program links in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library.  This will artificially
       inflate the count for that symbol.

  <li> If you run the program on one machine, and profile it on
       another, and the shared libraries are different on the two
       machines, the profiling output may be confusing: samples that
       fall within the shared libraries may be assigned to arbitrary
       procedures.

  <li> Several libraries, such as some STL implementations, do their
       own memory management.  This may cause strange profiling
       results.  We have code in libtcmalloc to cause STL to use
       tcmalloc for memory management (which in our tests is better
       than STL's internal management), though it only works for some
       STL implementations.

  <li> If your program forks, the children will also be profiled
       (since they inherit the same <code>HEAPPROFILE</code> setting).  Each
       process is profiled separately; to distinguish the child
       profiles from the parent profile and from each other, all
       children will have their process-id attached to the <code>HEAPPROFILE</code>
       name.

  <li> Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your <code>HEAPPROFILE</code> variable has an ASCII value greater than 127.
       This should be exceedingly rare, but if you need to use such a
       name, just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./&Auml;gypten</code>.
</ul>

<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>
