1<HTML> 2<!-- 3 Copyright (c) Jeremy Siek 2000 4 5 Distributed under the Boost Software License, Version 1.0. 6 (See accompanying file LICENSE_1_0.txt or copy at 7 http://www.boost.org/LICENSE_1_0.txt) 8 --> 9<Head> 10<Title>Sparse Matrix Ordering Example</Title> 11<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b" 12 ALINK="#ff0000"> 13<IMG SRC="../../../boost.png" 14 ALT="C++ Boost" width="277" height="86"> 15 16<BR Clear> 17 18<H1><A NAME="sec:sparse-matrix-ordering"></A> 19Sparse Matrix Ordering 20</H1> 21 22<P> 23Graph theory was identified as a powerful tool for sparse matrix 24computation when Seymour Parter used undirected graphs to model 25symmetric Gaussian elimination more than 30 years ago [<A HREF="bibliography.html#parter61:_gauss">28</A>]. 26Graphs can be used to model symmetric matrices, factorizations and 27algorithms on non-symmetric matrices, such as fill paths in Gaussian 28elimination, strongly connected components in matrix irreducibility, 29bipartite matching, and alternating paths in linear dependence and 30structural singularity. Not only do graphs make it easier to 31understand and analyze sparse matrix algorithms, but they broaden the 32area of manipulating sparse matrices using existing graph algorithms 33and techniques [<A 34 HREF="bibliography.html#george93:graphtheory">13</A>]. In this section, we are 35going to illustrate how to use BGL in sparse matrix computation such 36as ordering algorithms. Before we go further into the sparse matrix 37algorithms, let us take a step back and review a few things. 38 39<P> 40 41<H2>Graphs and Sparse Matrices</H2> 42 43<P> 44A graph is fundamentally a way to represent a binary relation between 45objects. The nonzero pattern of a sparse matrix also describes a 46binary relation between unknowns. Therefore the nonzero pattern of a 47sparse matrix of a linear system can be modeled with a graph 48<I>G(V,E)</I>, whose <I>n</I> vertices in <I>V</I> represent the 49<I>n</I> unknowns, and where there is an edge from vertex <I>i</I> to 50vertex <I>j</I> when <i>A<sub>ij</sub></i> is nonzero. Thus, when a 51matrix has a symmetric nonzero pattern, the corresponding graph is 52undirected. 53 54<P> 55 56<H2>Sparse Matrix Ordering Algorithms</H2> 57 58<P> 59The process for solving a sparse symmetric positive definite linear 60system, <i>Ax=b</i>, can be divided into four stages as follows: 61<DL> 62<DT><STRONG>Ordering:</STRONG></DT> 63<DD>Find a permutation <i>P</i> of matrix <i>A</i>, 64</DD> 65<DT><STRONG>Symbolic factorization:</STRONG></DT> 66<DD>Set up a data structure for the Cholesky 67 factor <i>L</i> of <i>PAP<sup>T</sup></i>, 68</DD> 69<DT><STRONG>Numerical factorization:</STRONG></DT> 70<DD>Decompose <i>PAP<sup>T</sup></i> into <i>LL<sup>T</sup></i>, 71</DD> 72<DT><STRONG>Triangular system solution:</STRONG></DT> 73<DD>Solve <i>LL<sup>T</sup>Px=Pb</i> for <i>x</i>. 74</DD> 75</DL> 76Because the choice of permutation <i>P</i> will directly determine the 77number of fill-in elements (elements present in the non-zero structure 78of <i>L</i> that are not present in the non-zero structure of 79<i>A</i>), the ordering has a significant impact on the memory and 80computational requirements for the latter stages. However, finding 81the optimal ordering for <i>A</i> (in the sense of minimizing fill-in) 82has been proven to be NP-complete [<a 83href="bibliography.html#garey79:computers-and-intractability">30</a>] 84requiring that heuristics be used for all but simple (or specially 85structured) cases. 86 87<P> 88A widely used but rather simple ordering algorithm is a variant of the 89Cuthill-McKee orderings, the reverse Cuthill-McKee ordering algorithm. 90This algorithm can be used as a preordering method to improve ordering 91in more sophisticated methods such as minimum degree 92algorithms [<a href="bibliography.html#LIU:MMD">21</a>]. 93 94<P> 95 96<H3><A NAME="SECTION001032100000000000000"> 97Reverse Cuthill-McKee Ordering Algorithm</A> 98</H3> 99The original Cuthill-McKee ordering algorithm is primarily designed to 100reduce the profile of a matrix [<A 101 HREF="bibliography.html#george81:__sparse_pos_def">14</A>]. 102George discovered that the reverse ordering often turned out to be 103superior to the original ordering in 1971. Here we describe the 104Reverse Cuthill-McKee algorithm in terms of a graph: 105 106<OL> 107<LI><I>Finding a starting vertex</I>: Determine a starting vertex 108 <I>r</I> and assign <i>r</i> to <I>x<sub>1</sub></I>. 109</LI> 110<LI><I>Main part</I>: For <I>i=1,...,N</I>, find all 111 the unnumbered neighbors of the vertex <I>x<sub>i</sub></I> and number them 112 in increasing order of degree. 113</LI> 114<LI><I>Reversing ordering</I>: The reverse Cuthill-McKee ordering 115 is given by <I>y<sub>1</sub>,...,y<sub>N</sub></I> where 116 <I>y<sub>i</sub></I> is <I>x<sub>N-i+1</sub></I> for <I>i = 1,...,N</I>. 117</LI> 118</OL> 119In the first step a good starting vertex needs to be determined. The 120study by George and Liu [<A 121HREF="bibliography.html#george81:__sparse_pos_def">14</A>] showed that 122a pair of vertices which are at maximum or near maximum ``distance'' 123apart are good ones. They also proposed an algorithm to find such a 124starting vertex in [<A 125HREF="bibliography.html#george81:__sparse_pos_def">14</A>], which we 126show in <A 127HREF="#fig:ggcl_find_starting_vertex">Figure 1281</A> implemented using the BGL interface. 129 130<P> 131<P></P> 132<DIV ALIGN="CENTER"><A NAME="fig:ggcl_find_starting_vertex"></A> 133<table border><tr><td> 134<pre> 135namespace boost { 136 template <class Graph, class Vertex, class Color, class Degree> 137 Vertex 138 pseudo_peripheral_pair(Graph& G, const Vertex& u, int& ecc, 139 Color color, Degree degree) 140 { 141 typename property_traits<Color>::value_type c = get(color, u); 142 143 rcm_queue<Vertex, Degree> Q(degree); 144 145 typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end; 146 for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui) 147 put(color, *ui, white(c)); 148 breadth_first_search(G, u, Q, bfs_visitor<>(), color); 149 150 ecc = Q.eccentricity(); 151 return Q.spouse(); 152 } 153 154 template <class Graph, class Vertex, class Color, class Degree> 155 Vertex find_starting_node(Graph& G, Vertex r, Color c, Degree d) { 156 int eccen_r, eccen_x; 157 Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d); 158 Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d); 159 160 while (eccen_x > eccen_r) { 161 r = x; 162 eccen_r = eccen_x; 163 x = y; 164 y = pseudo_peripheral_pair(G, x, eccen_x, c, d); 165 } 166 return x; 167 } 168} // namespace boost 169</pre> 170</td></tr></table> 171<CAPTION ALIGN="BOTTOM"> 172<table><tr><td> 173<STRONG>Figure 1:</STRONG> 174The BGL implementation of <TT>find_starting_node</TT>. The key 175part <TT>pseudo_peripheral_pair</TT> is BFS with a custom queue 176 type virtually. 177</td></tr></table></CAPTION> 178</DIV> 179<P></P> 180 181The key part of step one is a custom queue type with BFS as shown in 182<A 183HREF="#fig:ggcl_find_starting_vertex">Figure 1841</A>. The main Cuthill-McKee algorithm shown in Figure <A 185HREF="#fig:cuthill-mckee">Figure 2</A> 186is a fenced priority queue with a generalized BFS. If 187<TT>inverse_permutation</TT> is a normal iterator, the result stored 188is the inverse permutation of original Cuthill-McKee ordering. On the 189other hand, if <TT>inverse_permutation</TT> is a reverse iterator, the 190result stored is the inverse permutation of reverse Cuthill-McKee 191ordering. Our implementation is quite concise because many components 192from BGL can be reused. 193 194 195<P></P> 196<DIV ALIGN="CENTER"><A NAME="fig:cuthill-mckee"></A><A NAME="6261"></A> 197<table border><tr><td> 198<pre> 199 template < class Graph, class Vertex, class OutputIterator, 200 class Color, class Degree > 201 inline void 202 cuthill_mckee_ordering(Graph& G, 203 Vertex s, 204 OutputIterator inverse_permutation, 205 Color color, Degree degree) 206 { 207 typedef typename property_traits<Degree>::value_type DS; 208 typename property_traits<Color>::value_type c = get(color, s); 209 typedef indirect_cmp<Degree, std::greater<DS> > Compare; 210 Compare comp(degree); 211 fenced_priority_queue<Vertex, Compare > Q(comp); 212 213 typedef cuthill_mckee_visitor<OutputIterator> CMVisitor; 214 CMVisitor cm_visitor(inverse_permutation); 215 216 typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end; 217 for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui) 218 put(color, *ui, white(c)); 219 breadth_first_search(G, s, Q, cm_visitor, color); 220 } 221</pre> 222</td></tr></table> 223<CAPTION ALIGN="BOTTOM"> 224<TABLE> 225<TR><TD> <STRONG>Figure 2:</STRONG> The BGL implementation of 226Cuthill-McKee algoithm. </TD></TR> </TABLE></CAPTION> </DIV><P></P> 227<P> 228 229 230<P> 231 232<H3>Minimum Degree Ordering Algorithm</H3> 233 234<P> 235The pattern of another category of high-quality ordering algorithms in 236wide use is based on a greedy approach such that the ordering is 237chosen to minimize some quantity at each step of a simulated -step 238symmetric Gaussian elimination process. The algorithms using such an 239approach are typically distinguished by their greedy minimization 240criteria [<a href="bibliography.html#ng-raghavan">34</a>]. 241 242<P> 243In graph terms, the basic ordering process used by most greedy 244algorithms is as follows: 245 246<OL> 247<LI><EM>Start:</EM> Construct undirected graph <i>G<sup>0</sup></i> 248 corresponding to matrix <i>A</i> 249 250</LI> 251<LI><EM>Iterate:</EM> For <i>k = 1, 2, ... </i> until 252 <i>G<sup>k</sup> = { }</i> do: 253 254<UL> 255<LI>Choose a vertex <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> 256 according to some criterion 257</LI> 258<LI>Eliminate <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> to form 259 <i>G<sup>k+1</sup></i> 260 261</LI> 262</UL> 263</LI> 264</OL> 265The resulting ordering is the sequence of vertices <i>{v<sup>0</sup>, 266v<sup>1</sup>,...}</i> selected by the algorithm. 267 268<P> 269One of the most important examples of such an algorithm is the 270<I>Minimum Degree</I> algorithm. At each step the minimum degree 271algorithm chooses the vertex with minimum degree in the corresponding 272graph as <i>v<sup>k</sup></i>. A number of enhancements to the basic 273minimum degree algorithm have been developed, such as the use of a 274quotient graph representation, mass elimination, incomplete degree 275update, multiple elimination, and external degree. See [<A 276href="bibliography.html#George:evolution">35</a>] for a historical 277survey of the minimum degree algorithm. 278 279<P> 280The BGL implementation of the Minimum Degree algorithm closely follows 281the algorithmic descriptions of the one in [<A 282HREF="bibliography.html#LIU:MMD">21</A>]. The implementation presently 283includes the enhancements for mass elimination, incomplete degree 284update, multiple elimination, and external degree. 285 286<P> 287In particular, we create a graph representation to improve the 288performance of the algorithm. It is based on a templated ``vector of 289vectors.'' The vector container used is an adaptor class built on top 290the STL <TT>std::vector</TT> class. Particular characteristics of this 291adaptor class include the following: 292 293<UL> 294<LI>Erasing elements does not shrink the associated memory. 295 Adding new elements after erasing will not need to allocate 296 additional memory. 297</LI> 298<LI>Additional memory is allocated efficiently on demand when new 299 elements are added (doubling the capacity every time it is 300 increased). This property comes from STL vector. 301</LI> 302</UL> 303 304<P> 305Note that this representation is similar to that used in Liu's 306implementation, with some important differences due to dynamic memory 307allocation. With the dynamic memory allocation we do not need to 308over-write portions of the graph that have been eliminated, 309allowing for a more efficient graph traversal. More importantly, 310information about the elimination graph is preserved allowing for 311trivial symbolic factorization. Since symbolic factorization can be 312an expensive part of the entire solution process, improving its 313performance can result in significant computational savings. 314 315<P> 316The overhead of dynamic memory allocation could conceivably compromise 317performance in some cases. However, in practice, memory allocation 318overhead does not contribute significantly to run-time for our 319implementation as shown in [] because it is not 320done very often and the cost gets amortized. 321 322<P> 323 324<H2>Proper Self-Avoiding Walk</H2> 325 326<P> 327The finite element method (FEM) is a flexible and attractive numerical 328approach for solving partial differential equations since it is 329straightforward to handle geometrically complicated domains. However, 330unstructured meshes generated by FEM does not provide an obvious 331labeling (numbering) of the unknowns while it is vital to have it for 332matrix-vector notation of the underlying algebraic equations. Special 333numbering techniques have been developed to optimize memory usage and 334locality of such algorithms. One novel technique is the self-avoiding 335walk []. 336 337<P> 338If we think a mesh is a graph, a self-avoiding walk(SAW) over an 339arbitrary unstructured two-dimensional mesh is an enumeration of all 340the triangles of the mesh such that two successive triangles shares an 341edge or a vertex. A proper SAW is a SAW where jumping twice over the 342same vertex is forbidden for three consecutive triangles in the walk. 343it can be used to improve parallel efficiency of several irregular 344algorithms, in particular issues related to reducing runtime memory 345access (improving locality) and interprocess communications. The 346reference [] has proved the existence 347of A PSAW for any arbitrary triangular mesh by extending an existing 348PSAW over a small piece of mesh to a larger part. The proof 349effectively provides a set of rules to construct a PSAW. 350 351<P> 352The algorithm family of constructing a PSAW on a mesh is to start from 353any random triangle of the mesh, choose new triangles sharing an edge 354with the current sub-mesh and extend the existing partial PSAW over 355the new triangles. 356 357<P> 358Let us define a dual graph of a mesh. Let a triangle in the mesh be a 359vertex and two triangles sharing an edge in the mesh means there is an 360edge between two vertices in the dual graph. By using a dual graph of 361a mesh, one way to implement the algorithm family of constructing a 362PSAW is to reuse BGL algorithms such as BFS and DFS with a customized 363visitor to provide operations during 364traversal. The customized visitor has the function <TT>tree_edge()</TT> 365which is to extend partial PSAW in case by case and function 366<TT>start_vertex()</TT> which is to set up the PSAW for the starting vertex. 367 368<br> 369<HR> 370<TABLE> 371<TR valign=top> 372<TD nowrap>Copyright © 2000-2001</TD><TD> 373<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>) 374</TD></TR></TABLE> 375 376</BODY> 377</HTML> 378