# Chapter 3: High-level Language-Specific Analysis and Transformation

[TOC]

Creating a dialect that closely represents the semantics of an input language
enables analyses, transformations and optimizations in MLIR that require
high-level language information and are generally performed on the language
AST. For example, `clang` has a fairly
[heavy mechanism](https://clang.llvm.org/doxygen/classclang_1_1TreeTransform.html)
for performing template instantiation in C++.

We divide compiler transformations into two categories: local and global. In
this chapter, we focus on how to leverage the Toy Dialect and its high-level
semantics to perform local pattern-match transformations that would be
difficult in LLVM. For this, we use MLIR's
[Generic DAG Rewriter](../../PatternRewriter.md).

There are two methods that can be used to implement pattern-match
transformations:

1.  Imperative, C++ pattern-match and rewrite
2.  Declarative, rule-based pattern-match and rewrite using table-driven
    [Declarative Rewrite Rules](../../DeclarativeRewrites.md) (DRR). Note that
    the use of DRR requires that the operations be defined using ODS, as
    described in [Chapter 2](Ch-2.md).

## Optimize Transpose using C++ style pattern-match and rewrite

Let's start with a simple pattern and try to eliminate a sequence of two
transposes that cancel out: `transpose(transpose(X)) -> X`. Here is the
corresponding Toy example:

```toy
def transpose_transpose(x) {
  return transpose(transpose(x));
}
```

Which corresponds to the following IR:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  %1 = toy.transpose(%0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %1 : tensor<*xf64>
}
```

This is a good example of a transformation that is trivial to match on the Toy
IR but that would be quite hard for LLVM to figure out. For example, today
Clang can't optimize away the temporary array, and the computation with the
naive transpose is expressed with these loops:

```c++
#define N 100
#define M 100

void sink(void *);
void double_transpose(int A[N][M]) {
  int B[M][N];
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
      B[j][i] = A[i][j];
    }
  }
  for(int i = 0; i < N; ++i) {
    for(int j = 0; j < M; ++j) {
      A[i][j] = B[j][i];
    }
  }
  sink(A);
}
```

For a simple C++ approach to rewrite, involving matching a tree-like pattern in
the IR and replacing it with a different set of operations, we can plug into
the MLIR `Canonicalizer` pass by implementing a `RewritePattern`:

```c++
/// Fold transpose(transpose(x)) -> x
struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> {
  /// We register this pattern to match every toy.transpose in the IR.
  /// The "benefit" is used by the framework to order the patterns and process
  /// them in order of profitability.
  SimplifyRedundantTranspose(mlir::MLIRContext *context)
      : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {}

  /// This method attempts to match a pattern and rewrite it. The rewriter
  /// argument is the orchestrator of the sequence of rewrites. It is expected
  /// to interact with it to perform any changes to the IR from here.
  mlir::LogicalResult
  matchAndRewrite(TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Look through the input of the current transpose.
    mlir::Value transposeInput = op.getOperand();
    TransposeOp transposeInputOp = transposeInput.getDefiningOp<TransposeOp>();

    // Input defined by another transpose? If not, no match.
    if (!transposeInputOp)
      return mlir::failure();

    // Otherwise, we have a redundant transpose. Use the rewriter.
    rewriter.replaceOp(op, {transposeInputOp.getOperand()});
    return mlir::success();
  }
};
```

The implementation of this rewriter is in `ToyCombine.cpp`. The
[canonicalization pass](../../Canonicalization.md) applies transformations
defined by operations in a greedy, iterative manner. To ensure that the
canonicalization pass applies our new transform, we set
[hasCanonicalizer = 1](../../OpDefinitions.md#hascanonicalizer) in the
operation definition and register the pattern with the canonicalization
framework.
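For reference, here is a minimal sketch of what setting that flag in the ODS
definition of `TransposeOp` might look like. The surrounding fields are
abbreviated and may not match the actual `Ops.td` exactly:

```tablegen
def TransposeOp : Toy_Op<"transpose"> {
  let summary = "transpose operation";
  let arguments = (ins F64Tensor:$input);
  let results = (outs F64Tensor);

  // Declaring a canonicalizer makes ODS emit a declaration of the
  // `getCanonicalizationPatterns` hook on the generated op class; we provide
  // its definition in ToyCombine.cpp.
  let hasCanonicalizer = 1;
}
```

With the hook declared, we implement it to register our pattern: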
```c++
// Register our patterns for rewrite by the Canonicalization framework.
void TransposeOp::getCanonicalizationPatterns(
    OwningRewritePatternList &results, MLIRContext *context) {
  results.insert<SimplifyRedundantTranspose>(context);
}
```

We also need to update our main file, `toyc.cpp`, to add an optimization
pipeline. In MLIR, the optimizations are run through a `PassManager` in a
similar way to LLVM:

```c++
  mlir::PassManager pm(module.getContext());
  pm.addNestedPass<mlir::FuncOp>(mlir::createCanonicalizerPass());
```

Finally, we can run `toyc-ch3 test/Examples/Toy/Ch3/transpose_transpose.toy
-emit=mlir -opt` and observe our pattern in action:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  toy.return %arg0 : tensor<*xf64>
}
```

As expected, we now directly return the function argument, bypassing any
transpose operation. However, one of the transposes still hasn't been
eliminated. That is not ideal! What happened is that our pattern replaced the
last transpose with the function input and left behind the now dead transpose
input. The Canonicalizer knows to clean up dead operations; however, MLIR
conservatively assumes that operations may have side-effects. We can fix this
by adding a new trait, `NoSideEffect`, to our `TransposeOp`:

```tablegen
def TransposeOp : Toy_Op<"transpose", [NoSideEffect]> {...}
```

Let's now retry `toyc-ch3 test/transpose_transpose.toy -emit=mlir -opt`:

```mlir
func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> {
  toy.return %arg0 : tensor<*xf64>
}
```

Perfect! No `transpose` operation is left - the code is optimal.

In the next section, we use DRR for pattern-match optimizations associated with
the Reshape op.

## Optimize Reshapes using DRR

Declarative, rule-based pattern-match and rewrite (DRR) is an operation
DAG-based declarative rewriter that provides a table-based syntax for
pattern-match and rewrite rules:

```tablegen
class Pattern<
    dag sourcePattern, list<dag> resultPatterns,
    list<dag> additionalConstraints = [],
    dag benefitsAdded = (addBenefit 0)>;
```

A redundant reshape optimization similar to SimplifyRedundantTranspose can be
expressed more simply using DRR as follows:

```tablegen
// Reshape(Reshape(x)) = Reshape(x)
def ReshapeReshapeOptPattern : Pat<(ReshapeOp(ReshapeOp $arg)),
                                   (ReshapeOp $arg)>;
```

The automatically generated C++ code corresponding to each of the DRR patterns
can be found under `path/to/BUILD/tools/mlir/examples/toy/Ch3/ToyCombine.inc`.

DRR also provides a method for adding argument constraints when the
transformation is conditional on some properties of the arguments and results.
An example is a transformation that eliminates reshapes when they are
redundant, i.e. when the input and output shapes are identical.

```tablegen
def TypesAreIdentical : Constraint<CPred<"$0.getType() == $1.getType()">>;
def RedundantReshapeOptPattern : Pat<
  (ReshapeOp:$res $arg), (replaceWithValue $arg),
  [(TypesAreIdentical $res, $arg)]>;
```

Some optimizations may require additional transformations on instruction
arguments. This is achieved using NativeCodeCall, which allows for more complex
transformations either by calling into a C++ helper function or by using inline
C++. An example of such an optimization is FoldConstantReshape, where we
optimize Reshape of a constant value by reshaping the constant in place and
eliminating the reshape operation.

```tablegen
def ReshapeConstant : NativeCodeCall<"$0.reshape(($1.getType()).cast<ShapedType>())">;
def FoldConstantReshapeOptPattern : Pat<
  (ReshapeOp:$res (ConstantOp $arg)),
  (ConstantOp (ReshapeConstant $arg, $res))>;
```
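These DRR definitions are expanded by TableGen into ordinary `RewritePattern`
classes and, just like the C++ pattern above, they still need to be registered
with the canonicalization framework. Below is a minimal sketch of how
`ToyCombine.cpp` can hook them up, assuming `ReshapeOp` also sets
`hasCanonicalizer = 1` in its ODS definition and that the generated class names
match the `def` names above:

```c++
/// Register the DRR-generated patterns as canonicalization patterns on
/// ReshapeOp so that the Canonicalizer pass picks them up. The pattern classes
/// themselves are generated into ToyCombine.inc, which is included by
/// ToyCombine.cpp.
void ReshapeOp::getCanonicalizationPatterns(
    OwningRewritePatternList &results, MLIRContext *context) {
  results.insert<ReshapeReshapeOptPattern, RedundantReshapeOptPattern,
                 FoldConstantReshapeOptPattern>(context);
}
```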
We demonstrate these reshape optimizations using the following
`trivial_reshape.toy` program:

```toy
def main() {
  var a<2,1> = [1, 2];
  var b<2,1> = a;
  var c<2,1> = b;
  print(c);
}
```

```mlir
module {
  func @main() {
    %0 = toy.constant dense<[1.000000e+00, 2.000000e+00]> : tensor<2xf64>
    %1 = toy.reshape(%0 : tensor<2xf64>) to tensor<2x1xf64>
    %2 = toy.reshape(%1 : tensor<2x1xf64>) to tensor<2x1xf64>
    %3 = toy.reshape(%2 : tensor<2x1xf64>) to tensor<2x1xf64>
    toy.print %3 : tensor<2x1xf64>
    toy.return
  }
}
```

We can try to run `toyc-ch3 test/Examples/Toy/Ch3/trivial_reshape.toy
-emit=mlir -opt` and observe our pattern in action:

```mlir
module {
  func @main() {
    %0 = toy.constant dense<[[1.000000e+00], [2.000000e+00]]> : tensor<2x1xf64>
    toy.print %0 : tensor<2x1xf64>
    toy.return
  }
}
```

As expected, no reshape operations remain after canonicalization.

Further details on the declarative rewrite method can be found at
[Table-driven Declarative Rewrite Rule (DRR)](../../DeclarativeRewrites.md).

In this chapter, we saw how to use certain core transformations through always
available hooks. In the [next chapter](Ch-4.md), we will see how to use generic
solutions that scale better through Interfaces.