• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1<HTML>
2<!--
3     Copyright (c) Jeremy Siek 2000
4
5     Distributed under the Boost Software License, Version 1.0.
6     (See accompanying file LICENSE_1_0.txt or copy at
7     http://www.boost.org/LICENSE_1_0.txt)
8  -->
9<Head>
10<Title>File Dependency Example</Title>
11<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
12        ALINK="#ff0000">
13<IMG SRC="../../../boost.png"
14     ALT="C++ Boost" width="277" height="86">
15
16<BR Clear>
17
18
19<H1><A NAME="sec:file-depend-eg"></A>
20File Dependency Example
21</H1>
22
23<P>
24One of the most common uses of the graph abstraction in computer
25science is to track dependencies. An example of dependency tracking
26that we deal with on a day to day basis is the compilation
27dependencies for files in programs that we write. These dependencies
28are used inside programs such as <TT>make</TT> or in an IDE such as
29Visual C++ to minimize the number of files that must be recompiled
30after some changes have been made.
31
32<P>
33<A HREF="#fig:file-dep">Figure 1</A> shows a graph that has a vertex
34for each source file, object file, and library that is used in the
35<TT>killerapp</TT> program. The edges in the graph show which files
36are used in creating other files. The choice of which direction to
37point the arrows is somewhat arbitrary. As long as we are consistent
38in remembering that the arrows mean ``used by'' then things will work
39out. The opposite direction would mean ``depends on''.
40
41<P>
42
43<P></P>
44<DIV ALIGN="CENTER"><A NAME="fig:file-dep"></A>
45<TABLE>
46<CAPTION ALIGN="BOTTOM"><STRONG>Figure 1:</STRONG>
47A graph representing file dependencies.</CAPTION>
48<TR><TD><IMG SRC="./figs/file_dep.gif" width="331" height="351"></TD></TR>
49</TABLE>
50</DIV><P></P>
51
52<P>
53A compilation system such as <TT>make</TT> has to be able to answer a
54number of questions:
55
56<P>
57
58<OL>
59<LI>If we need to compile (or recompile) all of the files, what order
60   should that be done it?
61</LI>
62<LI>What files can be compiled in parallel?
63</LI>
64<LI>If a file is changed, which files must be recompiled?
65</LI>
66<LI>Are there any cycles in the dependencies? (which means the user
67  has made a mistake and an error should be emitted)
68</LI>
69</OL>
70
71<P>
72In the following examples we will formulate each of these questions in
73terms of the dependency graph, and then find a graph algorithm to
74provide the solution. The graph in <A HREF="#fig:file-dep">Figure
751</A> will be used in all of the following examples. The source code
76for this example can be found in the file <a
77href="../example/file_dependencies.cpp"><TT>examples/file_dependencies.cpp</TT></a>.
78
79<P>
80
81<H2>Graph Setup</H2>
82
83<P>
84Here we show the construction of the graph.  First, these are the required
85header files:
86
87<P>
88<PRE>
89#include &lt;iostream&gt; // std::cout
90#include &lt;utility&gt; // std::pair
91#include &lt;boost/graph/graph_traits.hpp&gt;
92#include &lt;boost/graph/adjacency_list.hpp&gt;
93#include &lt;boost/graph/topological_sort.hpp&gt;
94</PRE>
95
96<P>
97For simplicity we have
98constructed the graph &quot;by-hand&quot;. A compilation system such
99as <TT>make</TT> would instead parse a <TT>Makefile</TT> to get the
100list of files and to set-up the dependencies. We use the
101<TT>adjacency_list</TT> class to represent the graph. The
102<TT>vecS</TT> selector means that a <TT>std::vector</TT> will be used
103to represent each edge-list, which provides efficient traversal. The
104<TT>bidirectionalS</TT> selector means we want a directed graph with access to both the edges outgoing from each vertex and the edges incoming to each vertex, and the
105<TT>color_property</TT> attaches a color property to each vertex of the
106graph. The color property will be used in several of the algorithms in
107the following sections.
108
109<P>
110<PRE>
111  enum files_e { dax_h, yow_h, boz_h, zow_h, foo_cpp,
112                 foo_o, bar_cpp, bar_o, libfoobar_a,
113                 zig_cpp, zig_o, zag_cpp, zag_o,
114                 libzigzag_a, killerapp, N };
115  const char* name[] = { "dax.h", "yow.h", "boz.h", "zow.h", "foo.cpp",
116                         "foo.o", "bar.cpp", "bar.o", "libfoobar.a",
117                         "zig.cpp", "zig.o", "zag.cpp", "zag.o",
118                         "libzigzag.a", "killerapp" };
119
120  typedef std::pair&lt;int, int&gt; Edge;
121  Edge used_by[] = {
122    Edge(dax_h, foo_cpp), Edge(dax_h, bar_cpp), Edge(dax_h, yow_h),
123    Edge(yow_h, bar_cpp), Edge(yow_h, zag_cpp),
124    Edge(boz_h, bar_cpp), Edge(boz_h, zig_cpp), Edge(boz_h, zag_cpp),
125    Edge(zow_h, foo_cpp),
126    Edge(foo_cpp, foo_o),
127    Edge(foo_o, libfoobar_a),
128    Edge(bar_cpp, bar_o),
129    Edge(bar_o, libfoobar_a),
130    Edge(libfoobar_a, libzigzag_a),
131    Edge(zig_cpp, zig_o),
132    Edge(zig_o, libzigzag_a),
133    Edge(zag_cpp, zag_o),
134    Edge(zag_o, libzigzag_a),
135    Edge(libzigzag_a, killerapp)
136  };
137
138  using namespace boost;
139  typedef adjacency_list&lt;vecS, vecS, bidirectionalS,
140      property&lt;vertex_color_t, default_color_type&gt;
141    &gt; Graph;
142  Graph g(used_by, used_by + sizeof(used_by) / sizeof(Edge), N);
143  typedef graph_traits&lt;Graph&gt;::vertex_descriptor Vertex;
144</PRE>
145
146<P>
147
148<H2>Compilation Order (All Files)</H2>
149
150<P>
151On the first invocation of <TT>make</TT> for a particular project, all
152of the files must be compiled. Given the dependencies between the
153various files, what is the correct order in which to compile and link
154them? First we need to formulate this in terms of a graph. Finding a
155compilation order is the same as ordering the vertices in the graph.
156The constraint on the ordering is the file dependencies which we have
157represented as edges. So if there is an edge <i>(u,v)</i> in the graph
158then <i>v</i> better not come before <i>u</i> in the ordering. It
159turns out that this kind of constrained ordering is called a
160<I>topological sort</I>. Therefore, answering the question of
161compilation order is as easy as calling the BGL algorithm <a
162href="./topological_sort.html"><TT>topological_sort()</TT></a>. The
163traditional form of the output for topological sort is a linked-list
164of the sorted vertices.  The BGL algorithm instead puts the sorted
165vertices into any <a
166href="http://www.boost.org/sgi/stl/OutputIterator.html">OutputIterator</a>,
167which allows for much more flexibility.  Here we use the
168<TT>std::front_insert_iterator</TT> to create an output iterator that
169inserts the vertices on the front of a linked list.  Other possible
170options are writing the output to a file or inserting into a different
171STL or custom-made container.
172
173<P>
174<PRE>
175  typedef std::list&lt;Vertex&gt; MakeOrder;
176  MakeOrder make_order;
177  boost::topological_sort(g, std::front_inserter(make_order));
178
179  std::cout &lt;&lt; "make ordering: ";
180  for (MakeOrder::iterator i = make_order.begin();
181       i != make_order.end(); ++i)
182    std::cout &lt;&lt; name[*i] &lt;&lt; " ";
183  std::cout &lt;&lt; std::endl;
184</PRE>
185The output of this is:
186<PRE>
187  make ordering: zow.h boz.h zig.cpp zig.o dax.h yow.h zag.cpp \
188  zag.o bar.cpp bar.o foo.cpp foo.o libfoobar.a libzigzag.a killerapp
189</PRE>
190
191<P>
192
193<H2><A NAME="sec:parallel-compilation"></A>
194Parallel Compilation
195</H2>
196
197<P>
198Another question the compilation system might need to answer is: what
199files can be compiled simultaneously? This would allow the system to
200spawn threads and utilize multiple processors to speed up the build.
201This question can also be put in a slightly different way: what is the
202earliest time that a file can be built assuming that an unlimited
203number of files can be built at the same time? The main criteria for
204when a file can be built is that all of the files it depends on must
205already be built. To simplify things for this example, we'll assume
206that each file takes 1 time unit to build (even header files). For
207parallel compilation, we can build all of the files corresponding to
208vertices with no dependencies, e.g., those that have
209an <i>in-degree</i> of 0, in the first step. For all other files, the
210main observation for determining the ``time slot'' for a file is that
211the time slot must be one more than the maximum time-slot of the files
212it depends on.
213
214<P>We start by creating a vector <code>time</code> that will store the
215  time step at which each file can be built. We initialize every value
216  with time step zero.</p>
217
218<P>
219<PRE>
220  std::vector&lt;int&gt; time(N, 0);
221</PRE>
222
223<p>Now, we want to visit the vertices against in topological order,
224  from those files that need to be built first until those that need
225  to be built last. However, instead of printing out the order
226  immediately, we will compute the time step in which each file should
227  be built based on the time steps of the files it depends on. We
228  only need to consider those files whose in-degree is greater than
229  zero.</p>
230
231<pre>
232    for (i = make_order.begin(); i != make_order.end(); ++i) {
233      if (in_degree (*i, g) &gt; 0) {
234        Graph::in_edge_iterator j, j_end;
235        int maxdist = 0;
236        for (boost::tie(j, j_end) = in_edges(*i, g); j != j_end; ++j)
237          maxdist = std::max(time[source(*j, g)], maxdist);
238        time[*i]=maxdist+1;
239      }
240    }
241</pre>
242
243<P>
244Last, we output the time-slot that we've calculated for each vertex.
245
246<P>
247<PRE>
248  std::cout &lt;&lt; "parallel make ordering, " &lt;&lt; std::endl
249       &lt;&lt; "  vertices with same group number" &lt;&lt; std::endl
250       &lt;&lt; "  can be made in parallel" &lt;&lt; std::endl &lt;&lt; std::endl;
251  for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
252    std::cout &lt;&lt; "time_slot[" &lt;&lt; name[*i] &lt;&lt; "] = " &lt;&lt; time[*i] &lt;&lt; std::endl;
253</PRE>
254The output is:
255<PRE>
256  parallel make ordering,
257    vertices with same group number
258    can be made in parallel
259  time_slot[dax.h] = 0
260  time_slot[yow.h] = 1
261  time_slot[boz.h] = 0
262  time_slot[zow.h] = 0
263  time_slot[foo.cpp] = 1
264  time_slot[foo.o] = 2
265  time_slot[bar.cpp] = 2
266  time_slot[bar.o] = 3
267  time_slot[libfoobar.a] = 4
268  time_slot[zig.cpp] = 1
269  time_slot[zig.o] = 2
270  time_slot[zag.cpp] = 2
271  time_slot[zag.o] = 3
272  time_slot[libzigzag.a] = 5
273  time_slot[killerapp] = 6
274</PRE>
275
276<P>
277
278<H2><A NAME="SECTION001014000000000000000"></A>
279<A NAME="sec:cycles"></A>
280<BR>
281Cyclic Dependencies
282</H2>
283
284<P>
285Another question the compilation system needs to be able to answer is
286whether there are any cycles in the dependencies. If there are cycles,
287the system will need to report an error to the user so that the cycles
288can be removed. One easy way to detect a cycle is to run a <a
289href="./depth_first_search.html">depth-first search</a>, and if the
290search runs into an already discovered vertex (of the current search
291tree), then there is a cycle. The BGL graph search algorithms (which
292includes
293<TT>depth_first_search()</TT>) are all extensible via the
294<I>visitor</I> mechanism. A visitor is similar to a function object,
295but it has several methods instead of just the one
296<TT>operator()</TT>.  The visitor's methods are called at certain
297points within the algorithm, thereby giving the user a way to extend
298the functionality of the graph search algorithms. See Section <A
299HREF="visitor_concepts.html">Visitor Concepts</A>
300for a detailed description of visitors.
301
302<P>
303We will create a visitor class and fill in the <TT>back_edge()</TT>
304method, which is the <a href="./DFSVisitor.html">DFSVisitor</a> method
305that is called when DFS explores an edge to an already discovered
306vertex. A call to this method indicates the existence of a
307cycle. Inheriting from <TT>dfs_visitor&lt;&gt;</TT>
308provides the visitor with empty versions of the other visitor methods.
309Once our visitor is created, we can construct and object and pass it
310to the BGL algorithm.  Visitor objects are passed by value inside of
311the BGL algorithms, so the <TT>has_cycle</TT> flag is stored by
312reference in this visitor.
313
314<P>
315<PRE>
316  struct cycle_detector : public dfs_visitor&lt;&gt;
317  {
318    cycle_detector( bool&amp; has_cycle)
319      : _has_cycle(has_cycle) { }
320
321    template &lt;class Edge, class Graph&gt;
322    void back_edge(Edge, Graph&amp;) {
323      _has_cycle = true;
324    }
325  protected:
326    bool&amp; _has_cycle;
327  };
328</PRE>
329
330<P>
331We can now invoke the BGL <TT>depth_first_search()</TT>
332algorithm and pass in the cycle detector visitor.
333
334<P>
335<PRE>
336  bool has_cycle = false;
337  cycle_detector vis(has_cycle);
338  boost::depth_first_search(g, visitor(vis));
339  std::cout &lt;&lt; "The graph has a cycle? " &lt;&lt; has_cycle &lt;&lt; std::endl;
340</PRE>
341
342<P>
343The graph in <A HREF="#fig:file-dep">Figure 1</A> (ignoring the dotted
344line) did not have any cycles, but if we add a dependency between
345<TT>bar.cpp</TT> and <TT>dax.h</TT> there there will be. Such a
346dependency would be flagged as a user error.
347
348<P>
349<PRE>
350  add_edge(bar_cpp, dax_h, g);
351</PRE>
352
353
354<br>
355<HR>
356<TABLE>
357<TR valign=top>
358<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
359<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
360</TD></TR></TABLE>
361
362</BODY>
363</HTML>
364