=========
MemorySSA
=========

.. contents::
   :local:

Introduction
============

``MemorySSA`` is an analysis that allows us to cheaply reason about the
interactions between various memory operations. Its goal is to replace
``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
better results, too.

At a high level, one of the goals of ``MemorySSA`` is to provide an SSA-based
form for memory, complete with def-use and use-def chains, which
enables users to quickly find may-defs and may-uses of memory operations.
It can also be thought of as a way to cheaply give versions to the complete
state of heap memory, and to associate memory operations with those versions.

This document goes over how ``MemorySSA`` is structured, and some basic
intuition on how ``MemorySSA`` works.

A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. It is
relatively out-of-date, though: the paper references multiple heap partitions,
but GCC eventually swapped to just using one, like we now have in LLVM. Like
GCC's, LLVM's MemorySSA is intraprocedural.


MemorySSA Structure
===================

MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are
``MemorySSA``'s parallel to LLVM ``Instruction``\ s.

Each ``MemoryAccess`` can be one of three types:

- ``MemoryPhi``
- ``MemoryUse``
- ``MemoryDef``

``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations.
If at any
point we have two (or more) ``MemoryDef``\ s that could flow into a
``BasicBlock``, the block's top ``MemoryAccess`` will be a
``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any
concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s
inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s
and ``MemoryDef``\ s.

Note also that in SSA, phi nodes merge must-reach definitions (that is,
definitions that *must* be new versions of variables). In MemorySSA, phi nodes
merge may-reach definitions (that is, until disambiguated, the versions that
reach a phi node may or may not clobber a given variable).

``MemoryUse``\ s are operations which use but don't modify memory. An example of
a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.

``MemoryDef``\ s are operations which may either modify memory, or which
introduce some kind of ordering constraint. Examples of ``MemoryDef``\ s
include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher)
ordering, volatile operations, memory fences, etc.

Every function has a special ``MemoryDef`` called ``liveOnEntry``. It dominates
every ``MemoryAccess`` in the function that ``MemorySSA`` is being run on, and
implies that we've hit the top of the function. It's the only ``MemoryDef``
that maps to no ``Instruction`` in LLVM IR. Use of ``liveOnEntry`` implies that
the memory being used is either undefined or defined before the function
begins.

An example of all of this overlaid on LLVM IR (obtained by running ``opt
-passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
viewing this example, it may be helpful to view it in terms of clobbers. The
operands of a given ``MemoryAccess`` are all (potential) clobbers of said
``MemoryAccess``, and the value produced by a ``MemoryAccess`` can act as a
clobber for other ``MemoryAccess``\ es.
Another useful way of looking at it is in
terms of heap versions. In that view, the operands of a given
``MemoryAccess`` are the version of the heap before the operation, and,
if the access produces a value, the value is the new version of the heap
after the operation.

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 6 = MemoryPhi({entry,1},{if.end,4})
    br i1 undef, label %if.then, label %if.else

  if.then:
    ; 2 = MemoryDef(6)
    store i8 0, i8* %p1
    br label %if.end

  if.else:
    ; 3 = MemoryDef(6)
    store i8 1, i8* %p2
    br label %if.end

  if.end:
    ; 5 = MemoryPhi({if.then,2},{if.else,3})
    ; MemoryUse(5)
    %1 = load i8, i8* %p1
    ; 4 = MemoryDef(5)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

The ``MemorySSA`` IR is shown in comments that precede the instructions they map
to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
%p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM
Instruction, so the line directly below a ``MemoryPhi`` isn't special.

Going from the top down:

- ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering
  ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This
  ``MemoryPhi`` is referred to in the textual IR by the number ``6``.
- ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
  and its reaching definition before it is ``6``, or the ``MemoryPhi`` after
  ``while.cond``.
  (See the `Build-time use optimization`_ and `Precision`_
  sections below for why this ``MemoryDef`` isn't linked to a separate,
  disambiguated ``MemoryPhi``.)
- ``3 = MemoryDef(6)`` notes that ``store i8 1, i8* %p2`` is a definition; its
  reaching definition is also ``6``.
- ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
  this block could either be ``2`` or ``3``.
- ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
  it's clobbered by ``5``.
- ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its
  reaching definition is ``5``.
- ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
  and the last thing that could clobber this use is above ``while.cond`` (i.e.
  the store to ``%p3``). In heap versioning parlance, it really only depends on
  heap version ``1``, and is unaffected by the new heap versions generated
  since then.

As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
meant to interact with LLVM IR.

Design of MemorySSA
===================

``MemorySSA`` is an analysis that can be built for any arbitrary function. When
it's built, it does a pass over the function's IR in order to build up its
mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things
like the dominance relation between ``MemoryAccess``\ es, and get the
``MemoryAccess`` for any given ``Instruction``.

When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
that you can use (see below).


The walker
----------

A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
the walker, for short. The goal of the walker is to provide answers to clobber
queries beyond what's represented directly by ``MemoryAccess``\ es. For example,
given:

.. code-block:: llvm

  define void @foo() {
    %a = alloca i8
    %b = alloca i8

    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %a
    ; 2 = MemoryDef(1)
    store i8 0, i8* %b
    ret void
  }

The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
for the clobber of ``MemoryAccess`` ``2``.

By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s
and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to
be using. Walkers were built to be flexible, though, so it's entirely reasonable
(and expected) to create more specialized walkers (e.g. one that specifically
queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc.).


Locating clobbers yourself
^^^^^^^^^^^^^^^^^^^^^^^^^^

If you choose to make your own walker, you can find the clobber for a
``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively
simple; they ultimately form a linked list of every clobber that dominates the
``MemoryAccess`` that you're trying to optimize. In other words, the
``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.


Build-time use optimization
---------------------------

``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time.
Specifically, we optimize the operand of every ``MemoryUse`` to point to the
actual clobber of said ``MemoryUse``. This can be seen in the above example; the
second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
``MemoryDef`` from the entry block. This is done to make walking,
value numbering, etc., faster and easier.
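The walk that both a custom walker and the build-time use optimization perform
can be sketched with a small toy model. The following is illustrative Python,
not LLVM's C++ API: the class, function names, and the alias check are
simplified stand-ins, and ``MemoryPhi``\ s are ignored entirely.

```python
class MemoryDef:
    """One link in the chain of dominating defs (a toy, not LLVM's class)."""
    def __init__(self, defining_access, written_loc):
        self.defining_access = defining_access  # nearest dominating def
        self.written_loc = written_loc          # the location this def writes

# Sentinel playing the role of liveOnEntry.
LIVE_ON_ENTRY = MemoryDef(None, None)

def may_alias(loc_a, loc_b):
    # Stand-in for a real alias-analysis query: two named locations
    # alias only if they have the same name.
    return loc_a == loc_b

def get_clobbering_access(access, loc):
    # Walk the linked list of dominating defs, skipping any def that
    # cannot alias `loc`; stop at the first potential clobber.
    current = access.defining_access
    while current is not LIVE_ON_ENTRY and not may_alias(current.written_loc, loc):
        current = current.defining_access
    return current

# The walker example: a store to %a, then a store to %b.
store_a = MemoryDef(LIVE_ON_ENTRY, "%a")  # 1 = MemoryDef(liveOnEntry)
store_b = MemoryDef(store_a, "%b")        # 2 = MemoryDef(1)

# The store to %a does not clobber %b, so the walk reaches liveOnEntry.
assert get_clobbering_access(store_b, "%b") is LIVE_ON_ENTRY
```

Use optimization simply records the result of such a walk in the
``MemoryUse``'s operand once, at build time, instead of re-walking on every
query.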
It is not possible to optimize ``MemoryDef``\ s in the same way, as we
restrict ``MemorySSA`` to one heap variable and, thus, one phi node
per block.


Invalidation and updating
-------------------------

Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
the IR is updated. "Update", in this case, includes the addition, deletion, and
motion of ``Instruction``\ s. The update API is being made on an as-needed
basis. If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``'s update
API.


Phi placement
^^^^^^^^^^^^^

``MemorySSA`` only places ``MemoryPhi``\ s where they're actually
needed. That is, it is a pruned SSA form, like LLVM's SSA form. For
example, consider:

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 3 = MemoryPhi({entry,1},{if.end,2})
    br i1 undef, label %if.then, label %if.else

  if.then:
    br label %if.end

  if.else:
    br label %if.end

  if.end:
    ; MemoryUse(1)
    %1 = load i8, i8* %p1
    ; 2 = MemoryDef(3)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
for ``if.end`` would be pointless, so we don't place one. So, if you need to
place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll need to also create
a ``MemoryPhi`` for ``if.end``.

If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
everywhere. Because we have walkers that are capable of optimizing above said
phis, doing so shouldn't prohibit optimizations.
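Pruned placement follows the classic recipe of placing phis at the iterated
dominance frontier of the blocks containing definitions. The sketch below is
illustrative Python, not LLVM's actual implementation, and the dominance
frontiers for the example CFG are written out by hand rather than computed.

```python
def place_phis(def_blocks, dominance_frontier):
    """Place phis at the iterated dominance frontier of the def blocks."""
    worklist = sorted(def_blocks)
    phi_blocks = set()
    while worklist:
        block = worklist.pop()
        for frontier_block in dominance_frontier[block]:
            if frontier_block not in phi_blocks:
                phi_blocks.add(frontier_block)
                # A phi is itself a new definition, so keep iterating.
                worklist.append(frontier_block)
    return phi_blocks

# Hand-computed dominance frontiers for the example CFG:
# entry -> while.cond -> {if.then, if.else} -> if.end -> while.cond
dominance_frontier = {
    "entry": set(),
    "while.cond": {"while.cond"},
    "if.then": {"if.end"},
    "if.else": {"if.end"},
    "if.end": {"while.cond"},
}

# With MemoryDefs only in entry and if.end (as in the example above), a
# MemoryPhi is needed only at while.cond -- not at if.end.
assert place_phis({"entry", "if.end"}, dominance_frontier) == {"while.cond"}

# Re-adding the stores in if.then and if.else brings back the if.end phi.
assert place_phis({"entry", "if.then", "if.else", "if.end"},
                  dominance_frontier) == {"while.cond", "if.end"}
```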
Non-Goals
---------

``MemorySSA`` is meant to reason about the relation between memory
operations, and enable quicker querying.
It isn't meant to be the single source of truth for all potential memory-related
optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
to reason about atomic or volatile operations, as in:

.. code-block:: llvm

  define i8 @foo(i8* %a) {
  entry:
    br i1 undef, label %if.then, label %if.end

  if.then:
    ; 1 = MemoryDef(liveOnEntry)
    %0 = load volatile i8, i8* %a
    br label %if.end

  if.end:
    %av = phi i8 [0, %entry], [%0, %if.then]
    ret i8 %av
  }

Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
seem legal. Because it's a volatile load, though, it's not.


Design tradeoffs
----------------

Precision
^^^^^^^^^

``MemorySSA`` in LLVM deliberately trades off precision for speed.
Let us think about memory variables as if they were disjoint partitions of the
heap (that is, if you have one variable, as above, it represents the entire
heap, and if you have multiple variables, each one represents some
disjoint portion of the heap).

First, because alias analysis results conflict with each other, and
each result may be what an analysis wants (i.e.,
TBAA may say no-alias, and something else may say must-alias), it is
not possible to partition the heap the way every optimization wants.
Second, some alias analysis results are not transitive (i.e., A noalias B,
and B noalias C, does not mean A noalias C), so it is not possible to
come up with a precise partitioning in all cases without variables to
represent every pair of possible aliases. Thus, partitioning
precisely may require introducing at least N^2 new virtual variables,
phi nodes, etc.

Each of these variables may be clobbered at multiple def sites.
To give an example: if you were to split up struct fields into
individual variables, all aliasing operations that may-def multiple struct
fields will may-def more than one of them. This is pretty common (calls,
copies, field stores, etc.).

Experience with SSA forms for memory in other compilers has shown that
it is simply not possible to do this precisely, and in fact, doing it
precisely is not worth it, because now all the optimizations have to
walk tons and tons of virtual variables and phi nodes.

So we partition. At the point at which you partition, again,
experience has shown us there is no point in partitioning to more than
one variable. It simply generates more IR, and optimizations still
have to query something to disambiguate further anyway.

As a result, LLVM partitions to one variable.

Use Optimization
^^^^^^^^^^^^^^^^

Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
useful guarantee: all loads are optimized to point at the thing that
actually clobbers them. This gives some nice properties. For example,
for a given store, you can find all loads actually clobbered by that
store by walking the immediate uses of the store.
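That guarantee can be illustrated with a toy model of the def-use edges in the
first example above. This is illustrative Python, not LLVM's API, and the phi
operands are simplified (the back-edge operand of phi ``6`` is omitted): since
``MemoryUse(1)`` was optimized to point directly at def ``1``, the load of
``%p3`` shows up among def ``1``'s immediate users, with no walking past the
intervening defs.

```python
class MemoryAccess:
    """Toy access that tracks def-use edges, like MemorySSA does."""
    def __init__(self, name, *operands):
        self.name = name
        self.users = []
        for operand in operands:
            operand.users.append(self)

def clobbered_loads(mem_def):
    # Because every MemoryUse points at its *actual* clobber, the loads
    # clobbered by this def are exactly its immediate users that are uses.
    return [u.name for u in mem_def.users if u.name.startswith("MemoryUse")]

# A fragment of the first example; phi operands are simplified.
live_on_entry = MemoryAccess("liveOnEntry")
def1 = MemoryAccess("1", live_on_entry)      # store i8 0, i8* %p3
phi6 = MemoryAccess("6", def1)               # MemoryPhi in while.cond
phi5 = MemoryAccess("5", phi6)               # MemoryPhi in if.end
use_p1 = MemoryAccess("MemoryUse(5)", phi5)  # load i8, i8* %p1
def4 = MemoryAccess("4", phi5)               # store i8 2, i8* %p2
use_p3 = MemoryAccess("MemoryUse(1)", def1)  # load i8, i8* %p3

# The load of %p3 is found directly among def 1's users.
assert clobbered_loads(def1) == ["MemoryUse(1)"]
```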