• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
4<title>12.�BBV: an experimental basic block vector generation tool</title>
5<link rel="stylesheet" type="text/css" href="vg_basic.css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
7<link rel="home" href="index.html" title="Valgrind Documentation">
8<link rel="up" href="manual.html" title="Valgrind User Manual">
9<link rel="prev" href="sg-manual.html" title="11.�SGCheck: an experimental stack and global array overrun detector">
10<link rel="next" href="lk-manual.html" title="13.�Lackey: an example tool">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
14<td width="22px" align="center" valign="middle"><a accesskey="p" href="sg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
15<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
16<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
17<th align="center" valign="middle">Valgrind User Manual</th>
18<td width="22px" align="center" valign="middle"><a accesskey="n" href="lk-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
19</tr></table></div>
20<div class="chapter">
21<div class="titlepage"><div><div><h1 class="title">
22<a name="bbv-manual"></a>12.�BBV: an experimental basic block vector generation tool</h1></div></div></div>
23<div class="toc">
24<p><b>Table of Contents</b></p>
25<dl class="toc">
26<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.overview">12.1. Overview</a></span></dt>
27<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.quickstart">12.2. Using Basic Block Vectors to create SimPoints</a></span></dt>
28<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.usage">12.3. BBV Command-line Options</a></span></dt>
29<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.fileformat">12.4. Basic Block Vector File Format</a></span></dt>
30<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.implementation">12.5. Implementation</a></span></dt>
31<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.threadsupport">12.6. Threaded Executable Support</a></span></dt>
32<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.validation">12.7. Validation</a></span></dt>
33<dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.performance">12.8. Performance</a></span></dt>
34</dl>
35</div>
36<p>To use this tool, you must specify
37<code class="option">--tool=exp-bbv</code> on the Valgrind
38command line.</p>
39<div class="sect1">
40<div class="titlepage"><div><div><h2 class="title" style="clear: both">
41<a name="bbv-manual.overview"></a>12.1.�Overview</h2></div></div></div>
42<p>
43   A basic block is a linear section of code with one entry point and one exit
44   point.  A <span class="emphasis"><em>basic block vector</em></span> (BBV) is a list of all
45   basic blocks entered during program execution, and a count of how many
46   times each basic block was run.
47</p>
48<p>
49   BBV is a tool that generates basic block vectors for use with the
50   <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint</a>
51   analysis tool.
52   The SimPoint methodology enables speeding up architectural
53   simulations by only running a small portion of a program
54   and then extrapolating total behavior from this
55   small portion.  Most programs exhibit phase-based behavior, which
56   means that at various times during execution a program will encounter
57   intervals of time where the code behaves similarly to a previous
58   interval.  If you can detect these intervals and group them together,
59   an approximation of the total program behavior can be obtained
60   by only simulating a bare minimum number of intervals, and then scaling
61   the results.
62</p>
63<p>
64  In computer architecture research, running a
65  benchmark on a cycle-accurate simulator can cause slowdowns on the order
66  of 1000 times, making it take days, weeks, or even longer to run full
67  benchmarks.  By utilizing SimPoint this can be reduced significantly,
68  usually by 90-95%, while still retaining reasonable accuracy.
69</p>
70<p>
71   A more complete introduction to how SimPoint works can be
72   found in the paper "Automatically Characterizing Large Scale
73   Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and
74   B. Calder.
75</p>
76</div>
77<div class="sect1">
78<div class="titlepage"><div><div><h2 class="title" style="clear: both">
79<a name="bbv-manual.quickstart"></a>12.2.�Using Basic Block Vectors to create SimPoints</h2></div></div></div>
80<p>
81   To quickly create a basic block vector file, you will call Valgrind
82   like this:
83
84   </p>
85<pre class="programlisting">valgrind --tool=exp-bbv /bin/ls</pre>
86<p>
87
88   In this case we are running on <code class="filename">/bin/ls</code>,
89   but this can be any program.  By default a file called
90   <code class="computeroutput">bb.out.PID</code> will be created,
91   where PID is replaced by the process ID of the running process.
92   This file contains the basic block vector.  For long-running programs
93   this file can be quite large, so it might be wise to compress
94   it with gzip or some other compression program.
95</p>
96<p>
97   To create actual SimPoint results, you will need the SimPoint utility,
98   available from the
99   <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint webpage</a>.
100   Assuming you have downloaded SimPoint 3.2 and compiled it,
101   create SimPoint results with a command like the following:
102
103   </p>
104<pre class="programlisting">
105./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
106    -loadFVFile bb.out.1234.gz \
107    -k 5 -saveSimpoints results.simpts \
108    -saveSimpointWeights results.weights</pre>
109<p>
110
111   where bb.out.1234.gz is your compressed basic block vector file
112   generated by BBV.
113</p>
114<p>
115   The SimPoint utility does random linear projection using 15-dimensions,
116   then does k-mean clustering to calculate which intervals are
117   of interest.  In this example we specify 5 intervals with the
118   -k 5 option.
119</p>
120<p>
121   The outputs from the SimPoint run are the
122   <code class="computeroutput">results.simpts</code>
123   and <code class="computeroutput">results.weights</code> files.
124   The first holds the 5 most relevant intervals of the program.
125   The seconds holds the weight to scale each interval by when
126   extrapolating full-program behavior.  The intervals and the weights
127   can be used in conjunction with a simulator that supports
128   fast-forwarding; you fast-forward to the interval of interest,
129   collect stats for the desired interval length, then use
130   statistics gathered in conjunction with the weights to
131   calculate your results.
132</p>
133</div>
134<div class="sect1">
135<div class="titlepage"><div><div><h2 class="title" style="clear: both">
136<a name="bbv-manual.usage"></a>12.3.�BBV Command-line Options</h2></div></div></div>
137<p> BBV-specific command-line options are:</p>
138<div class="variablelist">
139<a name="bbv.opts.list"></a><dl class="variablelist">
140<dt>
141<a name="opt.bb-out-file"></a><span class="term">
142        <code class="option">--bb-out-file=&lt;name&gt; [default: bb.out.%p] </code>
143     </span>
144</dt>
145<dd><p>
146           This option selects the name of the basic block vector file.  The
147           <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
148           used to embed the process ID and/or the contents of an environment
149           variable in the name, as is the case for the core option
150           <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
151        </p></dd>
152<dt>
153<a name="opt.pc-out-file"></a><span class="term">
154        <code class="option">--pc-out-file=&lt;name&gt; [default: pc.out.%p] </code>
155     </span>
156</dt>
157<dd><p>
158           This option selects the name of the PC file.
159           This file holds program counter addresses
160           and function name info for the various basic blocks.
161           This can be used in conjunction
162           with the basic block vector file to fast-forward via function names
163           instead of just instruction counts.  The
164           <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
165           used to embed the process ID and/or the contents of an environment
166           variable in the name, as is the case for the core option
167           <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
168        </p></dd>
169<dt>
170<a name="opt.interval-size"></a><span class="term">
171        <code class="option">--interval-size=&lt;number&gt; [default: 100000000] </code>
172      </span>
173</dt>
174<dd><p>
175         This option selects the size of the interval to use.
176         The default is 100
177         million instructions, which is a commonly used value.
178         Other sizes can be used; smaller intervals can help programs
179         with finer-grained phases.  However smaller interval size
180         can lead to accuracy issues due to warm-up effects
181         (When fast-forwarding the various architectural features
182         will be un-initialized, and it will take some number
183         of instructions before they "warm up" to the state a
184         full simulation would be at without the fast-forwarding.
185         Large interval sizes tend to mitigate this.)
186      </p></dd>
187<dt>
188<a name="opt.instr-count-only"></a><span class="term">
189        <code class="option">--instr-count-only [default: no] </code>
190     </span>
191</dt>
192<dd><p>
193           This option tells the tool to only display instruction count
194           totals, and to not generate the actual basic block vector file.
195           This is useful for debugging, and for gathering instruction count
196           info without generating the large basic block vector files.
197        </p></dd>
198</dl>
199</div>
200</div>
201<div class="sect1">
202<div class="titlepage"><div><div><h2 class="title" style="clear: both">
203<a name="bbv-manual.fileformat"></a>12.4.�Basic Block Vector File Format</h2></div></div></div>
204<p>
205  The Basic Block Vector is dumped at fixed intervals.  This
206  is commonly done every 100 million instructions; the
207  <code class="option">--interval-size</code> option can be
208  used to change this.
209</p>
210<p>
211  The output file looks like this:
212</p>
213<pre class="programlisting">
214T:45:1024 :189:99343
215T:11:78573 :15:1353  :56:1
216T:18:45 :12:135353 :56:78 314:4324263</pre>
217<p>
218  Each new interval starts with a T.   This is followed on the same line
219  by a series of basic block and frequency pairs, one for each
220  basic block that was entered during the interval.  The format for
221  each block/frequency pair is a colon, followed by a number that
222  uniquely identifies the basic block, another colon, and then
223  the frequency (which is the number of times the block was entered,
224  multiplied by the number of instructions in the block).  The
225  pairs are separated from each other by a space.
226</p>
227<p>
228  The frequency count is multiplied by the number of instructions that are
229  in the basic block, in order to weigh the count so that instructions in
230  small basic blocks aren't counted as more important than instructions
231  in large basic blocks.
232</p>
233<p>
234  The SimPoint program only processes lines that start with a "T".  All
235  other lines are ignored.  Traditionally comments are indicated by
236  starting a line with a "#" character.  Some other BBV generation tools,
237  such as PinPoints, generate lines beginning with letters other than "T"
238  to indicate more information about the program being run.  We do
239  not generate these, as the SimPoint utility ignores them.
240</p>
241</div>
242<div class="sect1">
243<div class="titlepage"><div><div><h2 class="title" style="clear: both">
244<a name="bbv-manual.implementation"></a>12.5.�Implementation</h2></div></div></div>
245<p>
246   Valgrind provides all of the information necessary to create
247   BBV files.  In the current implementation, all instructions
248   are instrumented.  This is slower (by approximately a factor
249   of two) than a method that instruments at the basic block level,
250   but there are some complications (especially with rep prefix
251   detection) that make that method more difficult.
252</p>
253<p>
254   Valgrind actually provides instrumentation at a superblock level.
255   A superblock has one entry point but unlike basic blocks can
256   have multiple exit points.  Once a branch occurs into the middle
257   of a block, it is split into a new basic block.  Because
258   Valgrind cannot produce "true" basic blocks, the generated
259   BBV vectors will be different than those generated by other tools.
260   In practice this does not seem to affect the accuracy of the
261   SimPoint results.  We do internally force the
262   <code class="option">--vex-guest-chase-thresh=0</code>
263   option to Valgrind which forces a more basic-block-like
264   behavior.
265</p>
266<p>
267   When a superblock is run for the first time, it is instrumented
268   with our BBV routine.  A block info (bbInfo) structure is allocated
269   which holds the various information and statistics for the block.
270   A unique block ID is assigned to the block, and then the
271   structure is placed into an ordered set.
272   Then each native instruction in the block is instrumented to
273   call an instruction counting routine with a pointer to the block
274   info structure as an argument.
275</p>
276<p>
277   At run-time, our instruction counting routines are called once
278   per native instruction.  The relevant block info structure is accessed
279   and the block count and total instruction count is updated.
280   If the total instruction count overflows the interval size
281   then we walk the ordered set, writing out the statistics for
282   any block that was accessed in the interval, then resetting the
283   block counters to zero.
284</p>
285<p>
286   On the x86 and amd64 architectures the counting code has extra
287   code to handle rep-prefixed string instructions.  This is because
288   actual hardware counts a rep-prefixed instruction
289   as one instruction, while a naive Valgrind implementation
290   would count it as many (possibly hundreds, thousands or even millions)
291   of instructions.  We handle rep-prefixed instructions specially,
292   in order to make the results match those obtained with hardware performance
293   counters.
294</p>
295<p>
296   BBV also counts the fldcw instruction.  This instruction is used on
297   x86 machines in various ways; it is most commonly found when converting
298   floating point values into integers.
299   On Pentium 4 systems the retired instruction performance
300   counter counts this instruction as two instructions (all other
301   known processors only count it as one).
302   This can affect results when using SimPoint on Pentium 4 systems.
303   We provide the fldcw count so that users can evaluate whether it
304   will impact their results enough to avoid using Pentium 4 machines
305   for their experiments.  It would be possible to add an option to
306   this tool that mimics the double-counting so that the generated BBV
307   files would be usable for experiments using hardware performance
308   counters on Pentium 4 systems.
309</p>
310</div>
311<div class="sect1">
312<div class="titlepage"><div><div><h2 class="title" style="clear: both">
313<a name="bbv-manual.threadsupport"></a>12.6.�Threaded Executable Support</h2></div></div></div>
314<p>
315   BBV supports threaded programs.  When a program has multiple threads,
316   an additional basic block vector file is created for each thread (each
317   additional file is the specified filename with the thread number
318   appended at the end).
319</p>
320<p>
321   There is no official method of using SimPoint with
322   threaded workloads.  The most common method is to run
323   SimPoint on each thread's results independently, and use
324   some method of deterministic execution to try to match the
325   original workload.  This should be possible with the current
326   BBV.
327</p>
328</div>
329<div class="sect1">
330<div class="titlepage"><div><div><h2 class="title" style="clear: both">
331<a name="bbv-manual.validation"></a>12.7.�Validation</h2></div></div></div>
332<p>
333   BBV has been tested on x86, amd64, and ppc32 platforms.
334   An earlier version of BBV was tested in detail using
335   hardware performance counters, this work is described in a paper
336   from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation
337   to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
338   V.M. Weaver and S.A. McKee.
339</p>
340</div>
341<div class="sect1">
342<div class="titlepage"><div><div><h2 class="title" style="clear: both">
343<a name="bbv-manual.performance"></a>12.8.�Performance</h2></div></div></div>
344<p>
345  Using this program slows down execution by roughly a factor of 40
346  over native execution.  This varies depending on the machine
347  used and the benchmark being run.
348  On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D
349  processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
350</p>
351</div>
352</div>
353<div>
354<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
355<tr>
356<td rowspan="2" width="40%" align="left">
357<a accesskey="p" href="sg-manual.html">&lt;&lt;�11.�SGCheck: an experimental stack and global array overrun detector</a>�</td>
358<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
359<td rowspan="2" width="40%" align="right">�<a accesskey="n" href="lk-manual.html">13.�Lackey: an example tool�&gt;&gt;</a>
360</td>
361</tr>
362<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
363</table>
364</div>
365</body>
366</html>
367