• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1- A slightly edited irc discussion with Josh Triplett.
2- Describes most data structures used in sparse.
3
4As far as the parsing structures go...
5The C parser exists in two main files: parse.c, which parses statements, and expression.c, which parses expressions.
6parse.h contains the definition of struct statement, which represents a C statement.
7That includes only those things which can't appear as an expression, which primarily includes control flow statements such as if, loops, switch/case, and goto.
8expression.h contains the definition of struct expression, which represents a C expression.  That has a lot more content, since most C constructs can appear in expressions.
9A series of statements forms a compound statement (STMT_COMPOUND).
10That appears as another struct statement which has a statement_list member.
11A function body consists of a compound statement.
12When you look at a loop body, if or else body, or case body, you'll notice that they just have a struct statement, not a statement_list; they can have multiple statements by using a compound statement.
13Also note that all loops get turned into a single "iterator" statement.
14for, while, and do-while.
15A symbol, then, represents a name in a C file.  A symbol might represent a variable, a function, a label, or various other things.
16See symbol.h.
17"struct symbol" represents one symbol.
18As with the various other structures, it has some common data and a union of sub-structures for the parts that differ between different types.
19Most of the interesting bits come in the NS_SYMBOL case.
20Among other things, it has a struct statement for the body of a function (if any), a list of symbols for the arguments, an expression for a variable initializer, and so on.
21Together, struct symbol, struct statement, and struct expression represent most of the abstract syntax tree for C.
22So, that represents most of the "front-end" of Sparse: parsing C and generating that abstract syntax tree.
23That much occurs in pretty much any program using the Sparse frontend.
24The backend varies among programs.
25For instance, the c2xml backend goes that far, then outputs XML.
26The sparse static analysis backend has a few steps: it generates linearized bytecode, does some evaluation on that, and outputs some warnings.
27Several other backends run that linearized bytecode stage.
28The linearized bytecode itself has a set of nested structures.
29linearize.h defines all of them.
30At the top level, it has struct entrypoint.
31That represents an entrypoint to the code, which would normally mean a function.
32An entrypoint has a list of basic blocks.
33struct basic_block.
34A basic block represents a series of instructions with no branches.
35Straight-line code.
36A branch only occurs at the end of a basic block, and branches can only target the beginning of a basic block.
37Typically, a conditional will consist of a basic block leading up to the branch, a basic block for the true case, a basic block for the false case, and a basic block where the two paths merge back together.
38Either the true or the false case may not exist.
39A loop will normally have a basic block for the loop body, which can branch to the top at the end or continue to the next basic block.
40So basic blocks represent a node in the control flow graph.
41The edges in that graph lead from one basic block to a basic block which can follow it in the execution of the program.
42Each basic block has a series of instructions, "struct instruction".
43"enum opcode" lists all the instructions.
44Fairly high-level instruction set, corresponding directly to bits of C.
45So you have an entrypoint, which has a graph of basic blocks, each of which has a list of instructions.
46An entrypoint also has a pointer to the first instruction.
47One last bit of trickiness: struct pseudo.
48Have you ever heard of "static single assignment" or SSA form?
49struct pseudo represents one of those single-assignment variables.
50Each one has a pointer to the symbol it represents (which may have many pseudos referencing it).
51Each one also has a pointer to the instruction that defines it.
52That covers most of the major data structures in Sparse.
53Now, given all that, some of the top-level stuff in sparse.c may make more sense.
54For instance, the context checking works in terms of basic blocks.
55Hopefully some of that helped you understand Sparse better.
56