• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1<HTML>
2<!--
3     Copyright (c) Jeremy Siek 2000
4
5     Distributed under the Boost Software License, Version 1.0.
6     (See accompanying file LICENSE_1_0.txt or copy at
7     http://www.boost.org/LICENSE_1_0.txt)
8  -->
9<Head>
10<Title>Kevin Bacon Example</Title>
11<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
12        ALINK="#ff0000">
13<IMG SRC="../../../boost.png"
14     ALT="C++ Boost" width="277" height="86">
15
16<BR Clear>
17
18<H1>Six Degrees of Kevin Bacon</H1>
19
20<P>
21A fun application of graph theory comes up in the popular game ``Six
22Degrees of Kevin Bacon''. The idea of the game is to connect an actor
23to Kevin Bacon through a trail of actors who appeared together in
24movies, and do so in less than six steps. For example, Theodore
25Hesburgh (President Emeritus of the University of Notre Dame) was in
26the movie ``Rudy'' with the actor Gerry Becker, who was in the movie
27``Sleepers'' with Kevin Bacon. Why Kevin Bacon? For some reason the
28three students who invented the game, Mike Ginelli, Craig Fass, and
29Brian Turtle decided that Kevin Bacon is the center of the
30entertainment world. The Kevin Bacon game is quite similar to the <a
31href="http://www.oakland.edu/~grossman/erdoshp.html">Erd&ouml;s
32Number</a> which has ``been a part of the folklore of
33mathematicians throughout the world for many years''.
34
35<P>
36The ``Six Degrees of Kevin Bacon'' game is really a graph problem.  If
37you assign each actor to a vertex, and add an edge between two actors
38if they have appeared together in a movie, then you have a graph that
39represents the data for this game. Then the problem of finding a trail
40of actors to Kevin Bacon becomes a traditional graph problem: that of
41finding a <I>path</I> between two vertices. Since we wish to find a
42path that is shorter than six steps, ideally we would find the
43<I>shortest path</I> between the vertices. One might think to apply
44Dijkstra's shortest path algorithm to this problem, but that would be
45overkill since Dijkstra's algorithm is meant for situations when each
46edge has an associated length (or weight) and the goal is to find the
47path with the shortest cumulative length. Since we are only concerned
48with finding the shortest paths in terms of the number of edges the
49breadth-first search algorithm will solve the problem (and use less
50time than Dijkstra's).
51
52<P>
53
54<H2>Input File and Graph Setup</H2>
55
56<P>
57For this example we will use a much smaller graph than all movies and
58all actors. The full source code for this example is in <a
59href="../example/kevin-bacon.cpp"><TT>examples/kevin-bacon.cpp</TT></a>.
60We have supplied a file <TT>kevin-bacon.dat</TT> which contains a list
61of actor pairs who appeared in the same movie. Each line of the file
62contains an actor's name, a movie, and another actor that appeared in
63the movie. The ``;'' character is used as a separator. Here is an
64excerpt.
65
66<P>
67<PRE>
68Patrick Stewart;Prince of Egypt, The (1998);Steve Martin
69</PRE>
70
71<P>
72Our first task will be to read in the file and create a graph from
73it. We use a <TT>std::ifstream</TT> to input the file.
74
75<P>
76<PRE>
77  std::ifstream datafile("./kevin-bacon.dat");
78  if (!datafile) {
79    std::cerr &lt;&lt; "No ./kevin-bacon.dat file" &lt;&lt; std::endl;
80    return EXIT_FAILURE;
81  }
82</PRE>
83
84<P>
85We will want to attach the actor's names to the vertices in the graph,
86and the movie names to the edges. We use the <TT>property</TT> class to
87specify the addition of these vertex and edge properties to the graph.
88We choose to use an undirected graph, because the relationship of
89actors appearing together in a movie is symmetric.
90
91<P>
92<PRE>
93  using namespace boost;
94  typedef adjacency_list&lt;vecS, vecS, undirectedS,
95    property&lt;vertex_name_t, string&gt;,
96    property&lt;edge_name_t, string &gt; &gt;
97  &gt; Graph;
98</PRE>
99
100<P>
101To access the properties, we will need to obtain property map
102objects from the graph. The following code shows how this is done.
103
104<P>
105<PRE>
106  property_map&lt;Graph, vertex_name_t&gt;::type actor_name = get(vertex_name, g);
107  property_map&lt;Graph, edge_name_t&gt;::type connecting_movie = get(edge_name, g);
108</PRE>
109
110<P>
111Next we can start to loop through the file, parsing each line into a
112sequence of tokens using the <a href="../../tokenizer/index.html">Boost
113Tokenizer Library</a>.
114
115<P>
116<PRE>
117  for (std::string line; std::getline(datafile, line);) {
118    char_delimiters_separator&lt;char&gt; sep(false, "", ";");
119    tokenizer&lt;&gt; line_toks(line, sep);
120    ...
121  }
122</PRE>
123
124<P>
125Each line of the input corresponds to an edge in the graph, with the
126two actors as the vertices incident to the edge, and the movie name
127will be a property attached to the edge. One challenge in creating the
128graph from this format of input is that the edges are specified by a
129pair of actor names. As we add vertices to the graph, we'll need to
130create a map from actor names to their vertices so that later
131appearances of the same actor (on a different edge) can be linked with
132the correct vertex in the graph. The <a
133href="http://www.boost.org/sgi/stl/Map.html"><TT>std::map</TT></a>
134can be used to implement this mapping.
135
136<P>
137<PRE>
138  typedef graph_traits&lt;Graph&gt;::vertex_descriptor Vertex;
139  typedef std::map&lt;string, Vertex&gt; NameVertexMap;
140  NameVertexMap actors;
141</PRE>
142
143<P>
144The first token will be an actor's name. If the actor is not already
145in our actor's map we will add a vertex to the graph, set the name
146property of the vertex, and record the vertex descriptor in the map.
147If the actor is already in the graph, we will retrieve the vertex
148descriptor from the map's <TT>pos</TT> iterator.
149
150<P>
151<PRE>
152  NameVertexMap::iterator pos;
153  bool inserted;
154  Vertex u, v;
155
156  tokenizer&lt;&gt;::iterator i = line_toks.begin();
157  std::string actors_name = *i++;
158
159  boost::tie(pos, inserted) = actors.insert(std::make_pair(actors_name, Vertex()));
160  if (inserted) {
161    u = add_vertex(g);
162    actor_name[u] = actors_name;
163    pos-&gt;second = u;
164  } else
165    u = pos-&gt;second;
166</PRE>
167
168<P>
169The second token is the name of the movie, and the third token is the
170other actor. We use the same technique as above to insert the second
171actor into the graph.
172
173<P>
174<PRE>
175  std::string movie_name = *i++;
176
177  boost::tie(pos, inserted) = actors.insert(std::make_pair(*i, Vertex()));
178  if (inserted) {
179    v = add_vertex(g);
180    actor_name[v] = *i;
181    pos-&gt;second = v;
182  } else
183    v = pos-&gt;second;
184</PRE>
185
186<P>
187The final step is to add an edge connecting the two actors, and record
188the name of the connecting movie.
189
190<P>
191<PRE>
192  graph_traits&lt;Graph&gt;::edge_descriptor e;
193  boost::tie(e, inserted) = add_edge(u, v, g);
194  if (inserted)
195    connecting_movie[e] = movie_name;
196</PRE>
197
198<P>
199
200<H2>A Custom Visitor Class</H2>
201
202<P>
203We create a custom visitor class to record the Kevin Bacon
204numbers. The Kevin Bacon number will be the shortest distances from
205each actor to Kevin Bacon. This distance has also been referred to as
206the ``Bacon Number'' in memory of the ``Erdos Number'' used to rank
207mathematicians according to how many publications separate them from
208Paul Erdos. The shortest distance to from Kevin Bacon to each actor
209will follow the breadth-first tree. The BFS algorithm identifies edges
210that belong to the breadth-first tree and calls our visitor's
211<tt>tree_edge</tt> function. We record the distance to the target
212vertex as one plus the distance to the source vertex.
213
214
215<PRE>
216  template &lt;typename DistanceMap&gt;
217  class bacon_number_recorder : public default_bfs_visitor
218  {
219  public:
220    bacon_number_recorder(DistanceMap dist) : d(dist) { }
221
222    template &lt;typename Edge, typename Graph&gt;
223    void tree_edge(Edge e, const Graph& g) const
224    {
225      typename graph_traits&lt;Graph&gt;::vertex_descriptor
226	u = source(e, g), v = target(e, g);
227	d[v] = d[u] + 1;
228    }
229  private:
230      DistanceMap d;
231  };
232  // Convenience function
233  template &lt;typename DistanceMap&gt;
234  bacon_number_recorder&lt;DistanceMap&gt;
235  record_bacon_number(DistanceMap d)
236  {
237    return bacon_number_recorder&lt;DistanceMap&gt;(d);
238  }
239</PRE>
240
241<P>
242We will use a vector to store the bacon numbers.
243
244<P>
245<PRE>
246  std::vector&lt;int&gt; bacon_number(num_vertices(g));
247</PRE>
248
249<H2>Apply the Breadth-First Search</H2>
250
251<P>
252The BFS algorithm requires a source vertex as input; of course this
253will be Kevin Bacon. We now call the BGL BFS algorithm, passing in the
254graph, source vertex, and the visitor.
255
256<P>
257<PRE>
258  Vertex src = actors["Kevin Bacon"];
259  bacon_number[src] = 0;
260
261  breadth_first_search(g, src, visitor(record_bacon_number(&amp;bacon_number[0])));
262</PRE>
263
264<P>
265We can output the Bacon number for each actor simply by looping
266through all the vertices in the graph and use them to index into the
267<TT>bacon_number</TT> vector.
268
269<P>
270<PRE>
271  graph_traits&lt;Graph&gt;::vertex_iterator i, end;
272  for (boost::tie(i, end) = vertices(g); i != end; ++i)
273    std::cout &lt;&lt; actor_name[*i] &lt;&lt; "'s bacon number is "
274              &lt;&lt; bacon_number[*i] &lt;&lt; std::endl;
275</PRE>
276
277<P>
278Note that vertex descriptor objects can not always be used as indices
279into vectors or arrays such as <TT>bacon_number</TT>. This is valid
280with the <TT>adjacency_list</TT> class with <TT>VertexList=vecS</TT>,
281but not with other variations of <TT>adjacency_list</TT>. A more
282generic way to index based on vertices is to use the ID property
283map (<TT>vertex_index_t</TT>) in coordination with the <A
284HREF="../../property_map/doc/iterator_property_map.html"><TT>iterator_property_map</TT></a>.
285
286<P>
287Here are some excepts from the output of the program.
288<PRE>
289William Shatner's bacon number is 2
290Denise Richards's bacon number is 1
291Kevin Bacon's bacon number is 0
292Patrick Stewart's bacon number is 2
293Steve Martin's bacon number is 1
294...
295</PRE>
296
297
298
299
300<br>
301<HR>
302<TABLE>
303<TR valign=top>
304<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
305<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
306</TD></TR></TABLE>
307
308</BODY>
309</HTML>
310<!--  LocalWords:  gif Hesburgh Ginelli Fass Erd vertices cerr namespace vecS
311 -->
312<!--  LocalWords:  undirectedS Tokenizer sep tokenizer toks NameVertexMap pos
313 -->
314<!--  LocalWords:  bool Erdos BFS typename DistanceMap bfs const num BGL src
315 -->
316<!--  LocalWords:  indices Shatner's Richards's siek htm
317 -->
318