1<?xml version="1.0"?> <!-- -*- sgml -*- --> 2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" 3 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" 4[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]> 5 6 7<chapter id="sg-manual" 8 xreflabel="SGCheck: an experimental stack and global array overrun detector"> 9 <title>SGCheck: an experimental stack and global array overrun detector</title> 10 11<para>To use this tool, you must specify 12<option>--tool=exp-sgcheck</option> on the Valgrind 13command line.</para> 14 15 16 17 18<sect1 id="sg-manual.overview" xreflabel="Overview"> 19<title>Overview</title> 20 21<para>SGCheck is a tool for finding overruns of stack and global 22arrays. It works by using a heuristic approach derived from an 23observation about the likely forms of stack and global array accesses. 24</para> 25 26</sect1> 27 28 29 30 31<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options"> 32<title>SGCheck Command-line Options</title> 33 34<para id="sg.opts.list">There are no SGCheck-specific command-line options at present.</para> 35<!-- 36<para>SGCheck-specific command-line options are:</para> 37 38 39<variablelist id="sg.opts.list"> 40</variablelist> 41--> 42 43</sect1> 44 45 46 47<sect1 id="sg-manual.how-works.sg-checks" 48 xreflabel="How SGCheck Works"> 49<title>How SGCheck Works</title> 50 51<para>When a source file is compiled 52with <option>-g</option>, the compiler attaches DWARF3 53debugging information which describes the location of all stack and 54global arrays in the file.</para> 55 56<para>Checking of accesses to such arrays would then be relatively 57simple, if the compiler could also tell us which array (if any) each 58memory referencing instruction was supposed to access. Unfortunately 59the DWARF3 debugging format does not provide a way to represent such 60information, so we have to resort to a heuristic technique to 61approximate it. The key observation is that 62 <emphasis> 63 if a memory referencing instruction accesses inside a stack or 64 global array once, then it is highly likely to always access that 65 same array</emphasis>.</para> 66 67<para>To see how this might be useful, consider the following buggy 68fragment:</para> 69<programlisting><![CDATA[ 70 { int i, a[10]; // both are auto vars 71 for (i = 0; i <= 10; i++) 72 a[i] = 42; 73 } 74]]></programlisting> 75 76<para>At run time we will know the precise address 77of <computeroutput>a[]</computeroutput> on the stack, and so we can 78observe that the first store resulting from <computeroutput>a[i] = 7942</computeroutput> writes <computeroutput>a[]</computeroutput>, and 80we will (correctly) assume that that instruction is intended always to 81access <computeroutput>a[]</computeroutput>. Then, on the 11th 82iteration, it accesses somewhere else, possibly a different local, 83possibly an un-accounted for area of the stack (eg, spill slot), so 84SGCheck reports an error.</para> 85 86<para>There is an important caveat.</para> 87 88<para>Imagine a function such as <function>memcpy</function>, which is used 89to read and write many different areas of memory over the lifetime of the 90program. If we insist that the read and write instructions in its memory 91copying loop only ever access one particular stack or global variable, we 92will be flooded with errors resulting from calls to 93<function>memcpy</function>.</para> 94 95<para>To avoid this problem, SGCheck instantiates fresh likely-target 96records for each entry to a function, and discards them on exit. This 97allows detection of cases where (e.g.) <function>memcpy</function> 98overflows its source or destination buffers for any specific call, but 99does not carry any restriction from one call to the next. Indeed, 100multiple threads may make multiple simultaneous calls to 101(e.g.) <function>memcpy</function> without mutual interference.</para> 102 103<para>It is important to note that the association is done between 104 a <emphasis>binary instruction</emphasis> and an array, the 105 <emphasis>first time</emphasis> this binary instruction accesses an 106 array during a function call. When the same instruction is executed 107 again during the same function call, then SGCheck might report a 108 problem, if these further executions are not accessing the same 109 array. This technique causes several limitations in SGCheck, see 110 <xref linkend="sg-manual.limitations"/>. 111</para> 112</sect1> 113 114 115 116<sect1 id="sg-manual.cmp-w-memcheck" 117 xreflabel="Comparison with Memcheck"> 118<title>Comparison with Memcheck</title> 119 120<para>SGCheck and Memcheck are complementary: their capabilities do 121not overlap. Memcheck performs bounds checks and use-after-free 122checks for heap arrays. It also finds uses of uninitialised values 123created by heap or stack allocations. But it does not perform bounds 124checking for stack or global arrays.</para> 125 126<para>SGCheck, on the other hand, does do bounds checking for stack or 127global arrays, but it doesn't do anything else.</para> 128 129</sect1> 130 131 132 133 134 135<sect1 id="sg-manual.limitations" 136 xreflabel="Limitations"> 137<title>Limitations</title> 138 139<para>This is an experimental tool, which relies rather too heavily on some 140not-as-robust-as-I-would-like assumptions on the behaviour of correct 141programs. There are a number of limitations which you should be aware 142of.</para> 143 144<itemizedlist> 145 146 <listitem> 147 <para>False negatives (missed errors): it follows from the 148 description above (<xref linkend="sg-manual.how-works.sg-checks"/>) 149 that the first access by a memory referencing instruction to a 150 stack or global array creates an association between that 151 instruction and the array, which is checked on subsequent accesses 152 by that instruction, until the containing function exits. Hence, 153 the first access by an instruction to an array (in any given 154 function instantiation) is not checked for overrun, since SGCheck 155 uses that as the "example" of how subsequent accesses should 156 behave.</para> 157 <para>It also means that errors will not be found in an instruction 158 executed only once (e.g. because this instruction is not in a loop, 159 or the loop is executed only once).</para> 160 </listitem> 161 162 <listitem> 163 <para>False positives (false errors): similarly, and more serious, 164 it is clearly possible to write legitimate pieces of code which 165 break the basic assumption upon which the checking algorithm 166 depends. For example:</para> 167 168<programlisting><![CDATA[ 169 { int a[10], b[10], *p, i; 170 for (i = 0; i < 10; i++) { 171 p = /* arbitrary condition */ ? &a[i] : &b[i]; 172 *p = 42; 173 } 174 } 175]]></programlisting> 176 177 <para>In this case the store sometimes 178 accesses <computeroutput>a[]</computeroutput> and 179 sometimes <computeroutput>b[]</computeroutput>, but in no cases is 180 the addressed array overrun. Nevertheless the change in target 181 will cause an error to be reported.</para> 182 183 <para>It is hard to see how to get around this problem. The only 184 mitigating factor is that such constructions appear very rare, at 185 least judging from the results using the tool so far. Such a 186 construction appears only once in the Valgrind sources (running 187 Valgrind on Valgrind) and perhaps two or three times for a start 188 and exit of Firefox. The best that can be done is to suppress the 189 errors.</para> 190 </listitem> 191 192 <listitem> 193 <para>Performance: SGCheck has to read all of 194 the DWARF3 type and variable information on the executable and its 195 shared objects. This is computationally expensive and makes 196 startup quite slow. You can expect debuginfo reading time to be in 197 the region of a minute for an OpenOffice sized application, on a 198 2.4 GHz Core 2 machine. Reading this information also requires a 199 lot of memory. To make it viable, SGCheck goes to considerable 200 trouble to compress the in-memory representation of the DWARF3 201 data, which is why the process of reading it appears slow.</para> 202 </listitem> 203 204 <listitem> 205 <para>Performance: SGCheck runs slower than Memcheck. This is 206 partly due to a lack of tuning, but partly due to algorithmic 207 difficulties. The 208 stack and global checks can sometimes require a number of range 209 checks per memory access, and these are difficult to short-circuit, 210 despite considerable efforts having been made. A 211 redesign and reimplementation could potentially make it much faster. 212 </para> 213 </listitem> 214 215 <listitem> 216 <para>Coverage: Stack and global checking is fragile. If a shared 217 object does not have debug information attached, then SGCheck will 218 not be able to determine the bounds of any stack or global arrays 219 defined within that shared object, and so will not be able to check 220 accesses to them. This is true even when those arrays are accessed 221 from some other shared object which was compiled with debug 222 info.</para> 223 224 <para>At the moment SGCheck accepts objects lacking debuginfo 225 without comment. This is dangerous as it causes SGCheck to 226 silently skip stack and global checking for such objects. It would 227 be better to print a warning in such circumstances.</para> 228 </listitem> 229 230 <listitem> 231 <para>Coverage: SGCheck does not check whether the areas read 232 or written by system calls do overrun stack or global arrays. This 233 would be easy to add.</para> 234 </listitem> 235 236 <listitem> 237 <para>Platforms: the stack/global checks won't work properly on 238 PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets. 239 That's because the stack and global checking requires tracking 240 function calls and exits reliably, and there's no obvious way to do 241 it on ABIs that use a link register for function returns. 242 </para> 243 </listitem> 244 245 <listitem> 246 <para>Robustness: related to the previous point. Function 247 call/exit tracking for X86 and AMD64 is believed to work properly 248 even in the presence of longjmps within the same stack (although 249 this has not been tested). However, code which switches stacks is 250 likely to cause breakage/chaos.</para> 251 </listitem> 252</itemizedlist> 253 254</sect1> 255 256 257 258 259 260<sect1 id="sg-manual.todo-user-visible" 261 xreflabel="Still To Do: User-visible Functionality"> 262<title>Still To Do: User-visible Functionality</title> 263 264<itemizedlist> 265 266 <listitem> 267 <para>Extend system call checking to work on stack and global arrays.</para> 268 </listitem> 269 270 <listitem> 271 <para>Print a warning if a shared object does not have debug info 272 attached, or if, for whatever reason, debug info could not be 273 found, or read.</para> 274 </listitem> 275 276 <listitem> 277 <para>Add some heuristic filtering that removes obvious false 278 positives. This would be easy to do. For example, an access 279 transition from a heap to a stack object almost certainly isn't a 280 bug and so should not be reported to the user.</para> 281 </listitem> 282 283</itemizedlist> 284 285</sect1> 286 287 288 289 290<sect1 id="sg-manual.todo-implementation" 291 xreflabel="Still To Do: Implementation Tidying"> 292<title>Still To Do: Implementation Tidying</title> 293 294<para>Items marked CRITICAL are considered important for correctness: 295non-fixage of them is liable to lead to crashes or assertion failures 296in real use.</para> 297 298<itemizedlist> 299 300 <listitem> 301 <para> sg_main.c: Redesign and reimplement the basic checking 302 algorithm. It could be done much faster than it is -- the current 303 implementation isn't very good. 304 </para> 305 </listitem> 306 307 <listitem> 308 <para> sg_main.c: Improve the performance of the stack / global 309 checks by doing some up-front filtering to ignore references in 310 areas which "obviously" can't be stack or globals. This will 311 require using information that m_aspacemgr knows about the address 312 space layout.</para> 313 </listitem> 314 315 <listitem> 316 <para>sg_main.c: fix compute_II_hash to make it a bit more sensible 317 for ppc32/64 targets (except that sg_ doesn't work on ppc32/64 318 targets, so this is a bit academic at the moment).</para> 319 </listitem> 320 321</itemizedlist> 322 323</sect1> 324 325 326 327</chapter> 328