1The analyzer "Store" represents the contents of memory regions. It is an opaque 2functional data structure stored in each ProgramState; the only class that can 3modify the store is its associated StoreManager. 4 5Currently (Feb. 2013), the only StoreManager implementation being used is 6RegionStoreManager. This store records bindings to memory regions using a "base 7region + offset" key. (This allows `*p` and `p[0]` to map to the same location, 8among other benefits.) 9 10Regions are grouped into "clusters", which roughly correspond to "regions with 11the same base region". This allows certain operations to be more efficient, 12such as invalidation. 13 14Regions that do not have a known offset use a special "symbolic" offset. These 15keys store both the original region, and the "concrete offset region" -- the 16last region whose offset is entirely concrete. (For example, in the expression 17`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`, 18since that has a known offset from the start of the top-level `foo` struct.) 19 20 21Binding Invalidation 22==================== 23 24Supporting both concrete and symbolic offsets makes things a bit tricky. Here's 25an example: 26 27 foo[0] = 0; 28 foo[1] = 1; 29 foo[i] = i; 30 31After the third assignment, nothing can be said about the value of `foo[0]`, 32because `foo[i]` may have overwritten it! Thus, *binding to a region with a 33symbolic offset invalidates the entire concrete offset region.* We know 34`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything 35else, but we do have to be conservative about all other bindings within `foo`. 36 37Continuing the example: 38 39 foo[i] = i; 40 foo[0] = 0; 41 42After this latest assignment, nothing can be said about the value of `foo[i]`, 43because `foo[0]` may have overwritten it! *Binding to a region R with a 44concrete offset invalidates any symbolic offset bindings whose concrete offset 45region is a super-region **or** sub-region of R.* All we know about `foo[i]` is 46that it is somewhere within `foo`, so changing *anything* within `foo` might 47change `foo[i]`, and changing *all* of `foo` (or its base region) will 48*definitely* change `foo[i]`. 49 50This logic could be improved by using the current constraints on `i`, at the 51cost of speed. The latter case could also be improved by matching region kinds, 52i.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i` 53is. 54 55For more detail, read through RegionStoreManager::removeSubRegionBindings in 56RegionStore.cpp. 57 58 59ObjCIvarRegions 60=============== 61 62Objective-C instance variables require a bit of special handling. Like struct 63fields, they are not base regions, and when their parent object region is 64invalidated, all the instance variables must be invalidated as well. However, 65they have no concrete compile-time offsets (in the modern, "non-fragile" 66runtime), and so cannot easily be represented as an offset from the start of 67the object in the analyzer. Moreover, this means that invalidating a single 68instance variable should *not* invalidate the rest of the object, since unlike 69struct fields or array elements there is no way to perform pointer arithmetic 70to access another instance variable. 71 72Consequently, although the base region of an ObjCIvarRegion is the entire 73object, RegionStore offsets are computed from the start of the instance 74variable. Thus it is not valid to assume that all bindings with non-symbolic 75offsets start from the base region! 76 77 78Region Invalidation 79=================== 80 81Unlike binding invalidation, region invalidation occurs when the entire 82contents of a region may have changed---say, because it has been passed to a 83function the analyzer can model, like memcpy, or because its address has 84escaped, usually as an argument to an opaque function call. In these cases we 85need to throw away not just all bindings within the region itself, but within 86its entire cluster, since neighboring regions may be accessed via pointer 87arithmetic. 88 89Region invalidation typically does even more than this, however. Because it 90usually represents the complete escape of a region from the analyzer's model, 91its *contents* must also be transitively invalidated. (For example, if a region 92'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have 93changed as well.) The algorithm that traverses this transitive closure of 94accessible regions is known as ClusterAnalysis, and is also used for finding 95all live bindings in the store (in order to throw away the dead ones). The name 96"ClusterAnalysis" predates the cluster-based organization of bindings, but 97refers to the same concept: during invalidation and liveness analysis, all 98bindings within a cluster must be treated in the same way for a conservative 99model of program behavior. 100 101 102Default Bindings 103================ 104 105Most bindings in RegionStore are simple scalar values -- integers and pointers. 106These are known as "Direct" bindings. However, RegionStore supports a second 107type of binding called a "Default" binding. These are used to provide values to 108all the elements of an aggregate type (struct or array) without having to 109explicitly specify a binding for each individual element. 110 111When there is no Direct binding for a particular region, the store manager 112looks at each super-region in turn to see if there is a Default binding. If so, 113this value is used as the value of the original region. The search ends when 114the base region is reached, at which point the RegionStore will pick an 115appropriate default value for the region (usually a symbolic value, but 116sometimes zero, for static data, or "uninitialized", for stack variables). 117 118 int manyInts[10]; 119 manyInts[1] = 42; // Creates a Direct binding for manyInts[1]. 120 print(manyInts[1]); // Retrieves the Direct binding for manyInts[1]; 121 print(manyInts[0]); // There is no Direct binding for manyInts[1]. 122 // Is there a Default binding for the entire array? 123 // There is not, but it is a stack variable, so we use 124 // "uninitialized" as the default value (and emit a 125 // diagnostic!). 126 127NOTE: The fact that bindings are stored as a base region plus an offset limits 128the Default Binding strategy, because in C aggregates can contain other 129aggregates. In the current implementation of RegionStore, there is no way to 130distinguish a Default binding for an entire aggregate from a Default binding 131for the sub-aggregate at offset 0. 132 133 134Lazy Bindings (LazyCompoundVal) 135=============================== 136 137RegionStore implements an optimization for copying aggregates (structs and 138arrays) called "lazy bindings", implemented using a special SVal called 139LazyCompoundVal. When the store is asked for the "binding" for an entire 140aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a 141LazyCompoundVal instead. When this value is then stored into a variable, it is 142bound as a Default value. This makes copying arrays and structs much cheaper 143than if they had required memberwise access. 144 145Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region, 146store), representing "the value of the region during this 'snapshot' of the 147store". This has important implications for any sort of liveness or 148reachability analysis, which must take the bindings in the old store into 149account. 150 151Retrieving a value from a lazy binding happens in the same way as any other 152Default binding: since there is no direct binding, the store manager falls back 153to super-regions to look for an appropriate default binding. LazyCompoundVal 154differs from a normal default binding, however, in that it contains several 155different values, instead of one value that will appear several times. Because 156of this, the store manager has to reconstruct the subregion chain on top of the 157LazyCompoundVal region, and look up *that* region in the previous store. 158 159Here's a concrete example: 160 161 CGPoint p; 162 p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'. 163 CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a 164 // snapshot of the current store state. This value is then 165 // used as a Default binding for the VarRegion 'p2'. 166 return p2.x; // The binding for FieldRegion 'p2.x' is requested. 167 // There is no Direct binding, so we look for a Default 168 // binding to 'p2' and find the LCV. 169 // Because it's an LCV, we look at our requested region 170 // and see that it's the '.x' field. We ask for the value 171 // of 'p.x' within the snapshot, and get back 42. 172