Intro
=====

The basic rule for dealing with weakref callbacks (and __del__ methods too,
for that matter) during cyclic gc:

    Once gc has computed the set of unreachable objects, no Python-level
    code can be allowed to access an unreachable object.

If that can happen, then the Python code can resurrect unreachable objects
too, and gc can't detect that without starting over. Since gc eventually
runs tp_clear on all unreachable objects, if an unreachable object is
resurrected then tp_clear will eventually be called on it (or may already
have been called before resurrection). At best (and this has been an
historically common bug), tp_clear empties an instance's __dict__, and
"impossible" AttributeErrors result. At worst, tp_clear leaves behind an
insane object at the C level, and segfaults result (historically, most
often by setting a class's mro pointer to NULL, after which attribute
lookups performed by the class can segfault).

OTOH, it's OK to run Python-level code that can't access unreachable
objects, and sometimes that's necessary. The chief example is the callback
attached to a reachable weakref W to an unreachable object O. Since O is
going away, and W is still alive, the callback must be invoked. Because W
is still alive, everything reachable from its callback is also reachable,
so it's also safe to invoke the callback (although that's trickier than it
sounds, since other reachable weakrefs to other unreachable objects may
still exist, and be accessible to the callback -- there are lots of painful
details like this covered in the rest of this file).

Python 2.4/2.3.5
================

The "Before 2.3.3" section below turned out to be wrong in some ways, but
I'm leaving it as-is because it's more right than wrong, and serves as a
wonderful example of how painful analysis can miss not only the forest for
the trees, but also miss the trees for the aphids sucking the trees
dry <wink>.

The primary thing it missed is that when a weakref to a piece of cyclic
trash (CT) exists, then any call to any Python code whatsoever can end up
materializing a strong reference to that weakref's CT referent, and so
possibly resurrect an insane object (one for which cyclic gc has called -- or
will call before it's done -- tp_clear()). It's not even necessarily that a
weakref callback or __del__ method does something nasty on purpose: as
soon as we execute Python code, threads other than the gc thread can run
too, and they can do ordinary things with weakrefs that end up resurrecting
CT while gc is running.

    https://www.python.org/sf/1055820

shows how innocent it can be, and also how nasty. Variants of the three
focused test cases attached to that bug report are now part of Python's
standard Lib/test/test_gc.py.

Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)
approach:

    Clearing cyclic trash can call Python code. If there are weakrefs to
    any of the cyclic trash, then those weakrefs can be used to resurrect
    the objects. Therefore, *before* clearing cyclic trash, we need to
    remove any weakrefs. If any of the weakrefs being removed have
    callbacks, then we need to save the callbacks and call them *after* all
    of the weakrefs have been cleared.

Alas, doing just that much doesn't work, because it overlooks what turned
out to be the much subtler problems that were fixed earlier, and described
below. We do clear all weakrefs to CT now before breaking cycles, but not
all callbacks encountered can be run later. That's explained in horrid
detail below.

Older text follows, with some later comments in [] brackets:

Before 2.3.3
============

Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
Segfaults in Zope3 resulted.

weakrefs in Python are designed to, at worst, let *other* objects learn
that a given object has died, via a callback function. The weakly
referenced object itself is not passed to the callback, and the presumption
is that the weakly referenced object is unreachable trash at the time the
callback is invoked.

That's usually true, but not always. Suppose a weakly referenced object
becomes part of a clump of cyclic trash. When enough cycles are broken by
cyclic gc that the object is reclaimed, the callback is invoked. If it's
possible for the callback to get at objects in the cycle(s), then it may be
possible for those objects to access (via strong references in the cycle)
the weakly referenced object being torn down, or other objects in the cycle
that have already suffered a tp_clear() call. There's no guarantee that an
object is in a sane state after tp_clear(). Bad things (including
segfaults) can happen right then, during the callback's execution, or can
happen at any later time if the callback manages to resurrect an insane
object.

[That missed that, in addition, a weakref to CT can exist outside CT, and
 any callback into Python can use such a non-CT weakref to resurrect its CT
 referent. The same bad kinds of things can happen then.]

Note that if it's possible for the callback to get at objects in the trash
cycles, it must also be the case that the callback itself is part of the
trash cycles. Else the callback would have acted as an external root to
the current collection, and nothing reachable from it would be in cyclic
trash either.

[Except that a non-CT callback can also use a non-CT weakref to get at
 CT objects.]

More, if the callback itself is in cyclic trash, then the weakref to which
the callback is attached must also be trash, and for the same kind of
reason: if the weakref acted as an external root, then the callback could
not have been cyclic trash.
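
The all-in-trash configuration just described (referent, weakref, and
callback dying together) is easy to build on purpose. What follows is only
a sketch, assuming a current CPython; Node, build_trash, and fired are
hypothetical names invented for the illustration. The callback holds a
strong reference into the cycle, so invoking it would hand trash back to
Python code. I'd expect a collector that follows the rules in this file to
drop the callback unexecuted.

```python
import gc
import weakref


class Node:
    """Hypothetical container; any class that can sit in a cycle would do."""


fired = []  # anything appended here means the callback ran


def build_trash():
    a, b = Node(), Node()
    a.other, b.other = b, a          # a <-> b is the cycle

    def callback(wr, peer=b):
        # `peer` is a strong reference into the cycle, so running this
        # callback during collection would resurrect trash.
        fired.append(peer)

    # The weakref lives on b, and the callback is referenced only by the
    # weakref: referent, weakref, and callback are all cyclic trash once
    # this function returns.
    b.watcher = weakref.ref(a, callback)


build_trash()
gc.collect()
print(fired)   # expected: []
```

If the weakref were instead kept somewhere reachable, its callback would be
invoked, which is the case the bracketed comments keep pointing at.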

So a problem here requires that a weakref, that weakref's callback, and the
weakly referenced object, all be in cyclic trash at the same time. This
isn't easy to stumble into by accident while Python is running, and, indeed,
it took quite a while to dream up failing test cases. Zope3 saw segfaults
during shutdown, during the second call of gc in Py_Finalize, after most
modules had been torn down. That creates many trash cycles (esp. those
involving classes), making the problem much more likely. Once you
know what's required to provoke the problem, though, it's easy to create
tests that segfault before shutdown.

In 2.3.3, before breaking cycles, we first clear all the weakrefs with
callbacks in cyclic trash. Since the weakrefs *are* trash, and there's no
defined -- or even predictable -- order in which tp_clear() gets called on
cyclic trash, it's defensible to first clear weakrefs with callbacks. It's
a feature of Python's weakrefs too that when a weakref goes away, the
callback (if any) associated with it is thrown away too, unexecuted.

[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
 those weakrefs are themselves CT, and whether or not they have callbacks.
 The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
 after all weakrefs-to-CT have been cleared. The callbacks (if any) on CT
 weakrefs (if any) are never invoked, for the excruciating reasons
 explained here.]

Just that much is almost enough to prevent problems, by throwing away
*almost* all the weakref callbacks that could get triggered by gc. The
problem remaining is that clearing a weakref with a callback decrefs the
callback object, and the callback object may *itself* be weakly referenced,
via another weakref with another callback. So the process of clearing
weakrefs can trigger callbacks attached to other weakrefs, and those
latter weakrefs may or may not be part of cyclic trash.

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash,

[That was always wrong: we can't stop Python code from running when gc
 is breaking cycles. If an object with a __del__ method is not itself in
 a cycle, but is reachable only from CT, then breaking cycles will, as a
 matter of course, drop the refcount on that object to 0, and its __del__
 will run right then. What we can and must stop is running any Python
 code that could access CT.]
                                     it's not quite enough just to invoke
tp_clear() on weakrefs with callbacks first. Instead the weakref module
grew a new private function (_PyWeakref_ClearRef) that does only part of
tp_clear(): it removes the weakref from the weakly-referenced object's list
of weakrefs, but does not decref the callback object. So calling
_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
trigger, and (unlike weakref's tp_clear()) also prevents any callback
associated *with* wr's callback object from triggering.

[Although we may trigger such callbacks later, as explained below.]

Then we can call tp_clear on all the cyclic objects and never trigger
Python code.

[As above, not so: it means never trigger Python code that can access CT.]

After we do that, the callback objects still need to be decref'ed. Callbacks
(if any) *on* the callback objects that were also part of cyclic trash won't
get invoked, because we cleared all trash weakrefs with callbacks at the
start. Callbacks on the callback objects that were not part of cyclic trash
acted as external roots to everything reachable from them, so nothing
reachable from them was part of cyclic trash, so gc didn't do any damage to
objects reachable from them, and it's safe to call them at the end of gc.

[That's so. In addition, now we also invoke (if any) the callbacks on
 non-CT weakrefs to CT objects, during the same pass that decrefs the
 callback objects.]

An alternative would have been to treat objects with callbacks like objects
with __del__ methods, refusing to collect them, appending them to gc.garbage
instead. That would have been much easier. Jim Fulton gave a strong
argument against that (on Python-Dev):

    There's a big difference between __del__ and weakref callbacks.
    The __del__ method is "internal" to a design. When you design a
    class with a del method, you know you have to avoid including the
    class in cycles.

    Now, suppose you have a design that makes has no __del__ methods but
    that does use cyclic data structures. You reason about the design,
    run tests, and convince yourself you don't have a leak.

    Now, suppose some external code creates a weakref to one of your
    objects. All of a sudden, you start leaking. You can look at your
    code all you want and you won't find a reason for the leak.

IOW, a class designer can out-think __del__ problems, but has no control
over who creates weakrefs to his classes or class instances. The class
user has little chance either of predicting when the weakrefs he creates
may end up in cycles.

Callbacks on weakref callbacks are executed in an arbitrary order, and
that's not good (a primary reason not to collect cycles with objects with
__del__ methods is to avoid running finalizers in an arbitrary order).
However, a weakref callback on a weakref callback has got to be rare.
It's possible to do such a thing, so gc has to be robust against it, but
I doubt anyone has done it outside the test case I wrote for it.

[The callbacks (if any) on non-CT weakrefs to CT objects are also executed
 in an arbitrary order now. But they were before too, depending on the
 vagaries of when tp_clear() happened to break enough cycles to trigger
 them. People simply shouldn't try to use __del__ or weakref callbacks to
 do fancy stuff.]
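
The 2.4/2.3.5 rule for non-CT weakrefs (clear every weakref to CT first,
and only then run the surviving callbacks) can be watched from Python. This
is a sketch under assumptions, not a statement about any particular
release: Node, collect_and_observe, refs, and seen are hypothetical names,
and the behavior shown is what I'd expect from a current CPython. Each
callback records whether *both* weakrefs were already dead when it ran, so
the order in which the two callbacks fire doesn't matter.

```python
import gc
import weakref


class Node:
    """Hypothetical container used to build a two-object cycle."""


def collect_and_observe():
    seen = []    # what each callback observed when it ran
    refs = {}    # a reachable (non-CT) home for the weakrefs

    def on_death(wr):
        # If all weakrefs to the trash were cleared before any callback
        # ran, neither weakref can hand back its referent here.
        seen.append((refs["a"]() is None, refs["b"]() is None))

    a, b = Node(), Node()
    a.other, b.other = b, a              # a <-> b is the cycle
    refs["a"] = weakref.ref(a, on_death)
    refs["b"] = weakref.ref(b, on_death)
    del a, b                             # the cycle is now unreachable
    gc.collect()
    return seen


seen = collect_and_observe()
print(seen)   # expected: [(True, True), (True, True)]
```

Both callbacks run, because their weakrefs are reachable, but neither can
get at the dying objects through a weakref: that's the whole point of
clearing first and calling later.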