1 2namespace Eigen { 3 4/** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions 5 6In general achieving good performance with Eigen does no require any special effort: 7simply write your expressions in the most high level way. This is especially true 8for small fixed size matrices. For large matrices, however, it might be useful to 9take some care when writing your expressions in order to minimize useless evaluations 10and optimize the performance. 11In this page we will give a brief overview of the Eigen's internal mechanism to simplify 12and evaluate complex product expressions, and discuss the current limitations. 13In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e, 14all kind of matrix products and triangular solvers. 15 16Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar 17to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and 18natural API. Each of these routines can compute in a single evaluation a wide variety of expressions. 19Given an expression, the challenge is then to map it to a minimal set of routines. 20As explained latter, this mechanism has some limitations, and knowing them will allow 21you to write faster code by making your expressions more Eigen friendly. 22 23\section GEMM General Matrix-Matrix product (GEMM) 24 25Let's start with the most common primitive: the matrix product of general dense matrices. 26In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can 27perform the following operation: 28\f$ C.noalias() += \alpha op1(A) op2(B) \f$ 29where A, B, and C are column and/or row major matrices (or sub-matrices), 30alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity. 31When Eigen detects a matrix product, it analyzes both sides of the product to extract a 32unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states. 33More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple, 34negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order 35and shape. All other expressions are immediately evaluated. 36For instance, the following expression: 37\code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2)) \endcode 38is automatically simplified to: 39\code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode 40which exactly matches our GEMM routine. 41 42\subsection GEMM_Limitations Limitations 43Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be 44handled by a single GEMM-like call are correctly detected. 45<table class="manual" style="width:100%"> 46<tr> 47<th>Not optimal expression</th> 48<th>Evaluated as</th> 49<th>Optimal version (single evaluation)</th> 50<th>Comments</th> 51</tr> 52<tr> 53<td>\code 54m1 += m2 * m3; \endcode</td> 55<td>\code 56temp = m2 * m3; 57m1 += temp; \endcode</td> 58<td>\code 59m1.noalias() += m2 * m3; \endcode</td> 60<td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias. 61 Otherwise the product m2 * m3 is evaluated into a temporary.</td> 62</tr> 63<tr class="alt"> 64<td></td> 65<td></td> 66<td>\code 67m1.noalias() += s1 * (m2 * m3); \endcode</td> 68<td>This is a special feature of Eigen. Here the product between a scalar 69 and a matrix product does not evaluate the matrix product but instead it 70 returns a matrix product expression tracking the scalar scaling factor. <br> 71 Without this optimization, the matrix product would be evaluated into a 72 temporary as in the next example.</td> 73</tr> 74<tr> 75<td>\code 76m1.noalias() += (m2 * m3).adjoint(); \endcode</td> 77<td>\code 78temp = m2 * m3; 79m1 += temp.adjoint(); \endcode</td> 80<td>\code 81m1.noalias() += m3.adjoint() 82* * m2.adjoint(); \endcode</td> 83<td>This is because the product expression has the EvalBeforeNesting bit which 84 enforces the evaluation of the product by the Tranpose expression.</td> 85</tr> 86<tr class="alt"> 87<td>\code 88m1 = m1 + m2 * m3; \endcode</td> 89<td>\code 90temp = m2 * m3; 91m1 = m1 + temp; \endcode</td> 92<td>\code m1.noalias() += m2 * m3; \endcode</td> 93<td>Here there is no way to detect at compile time that the two m1 are the same, 94 and so the matrix product will be immediately evaluated.</td> 95</tr> 96<tr> 97<td>\code 98m1.noalias() = m4 + m2 * m3; \endcode</td> 99<td>\code 100temp = m2 * m3; 101m1 = m4 + temp; \endcode</td> 102<td>\code 103m1 = m4; 104m1.noalias() += m2 * m3; \endcode</td> 105<td>First of all, here the .noalias() in the first expression is useless because 106 m2*m3 will be evaluated anyway. However, note how this expression can be rewritten 107 so that no temporary is required. (tip: for very small fixed size matrix 108 it is slighlty better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td> 109</tr> 110<tr class="alt"> 111<td>\code 112m1.noalias() += (s1*m2).block(..) * m3; \endcode</td> 113<td>\code 114temp = (s1*m2).block(..); 115m1 += temp * m3; \endcode</td> 116<td>\code 117m1.noalias() += s1 * m2.block(..) * m3; \endcode</td> 118<td>This is because our expression analyzer is currently not able to extract trivial 119 expressions nested in a Block expression. Therefore the nested scalar 120 multiple cannot be properly extracted.</td> 121</tr> 122</table> 123 124Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices. 125 126*/ 127 128} 129