• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1[/
2 / Copyright (c) 2008 Eric Niebler
3 /
4 / Distributed under the Boost Software License, Version 1.0. (See accompanying
5 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
6 /]
7
8[section Cycle collection with [^tracking_ptr<>]]
9
10In xpressive, regex objects can refer to each other and themselves by value or by reference.
11In addition, they ref-count their referenced regexes to keep them alive. This creates the possibility
12for cyclic reference counts, and raises the possibility of memory leaks. xpressive avoids leaks
13by using a type called `tracking_ptr<>`. This doc describes at a high level how `tracking_ptr<>`
14works.
15
16[h2 Constraints]
17
18Our solution must meet the following design constraints:
19
20* No dangling references: All objects referred to directly or indirectly must be kept alive
21  as long as the references are needed.
22* No leaks: all objects must be freed eventually.
23* No user intervention: The solution must not require users to explicitly invoke some cycle
24  collection routine.
25* Clean-up is no-throw: The collection phase will likely be called from a destructor, so it
26  must never throw an exception under any circumstance.
27
28[h2 Handle-Body Idiom]
29
30To use `tracking_ptr<>`, you must separate your type into a handle and a body. In the case of
31xpressive, the handle type is called `basic_regex<>` and the body is called `regex_impl<>`. The
32handle will store a `tracking_ptr<>` to the body.
33
34The body type must inherit from `enable_reference_tracking<>`. This gives the body the bookkeeping
35data structures that `tracking_ptr<>` will use. In particular, it gives the body:
36
37# `std::set<shared_ptr<body> > refs_` : collection of bodies to which this body refers, and
38# `std::set<weak_ptr<body> > deps_` : collection of bodies which refer to this body.
39
40[h2 References and Dependencies]
41
42We refer to (1) above as the "references" and (2) as the "dependencies". It is crucial to the
43understanding of `tracking_ptr<>` to recognize that the set of references includes both those
44objects that are referred to directly as well as those that are referred to indirectly (that
45is, through another reference). The same is true for the set of dependencies. In other words,
46each body holds a ref-count directly to every other body that it needs.
47
48Why is this important?  Because it means that when a body no longer has a handle referring
49to it, all its references can be released immediately without fear of creating dangling references.
50
51References and dependencies cross-pollinate. Here's how it works:
52
53# When one object acquires another as a reference, the second object acquires the first as
54  a dependency.
55# In addition, the first object acquires all of the second object's references, and the second
56  object acquires all of the first object's dependencies.
57# When an object picks up a new reference, the reference is also added to all dependent objects.
58# When an object picks up a new dependency, the dependency is also added to all referenced
59  objects.
60# An object is never allowed to have itself as a dependency. Objects may have themselves as
61  references, and often do.
62
63Consider the following code:
64
65    sregex expr;
66    {
67        sregex group  = '(' >> by_ref(expr) >> ')';                 // (1)
68        sregex fact   = +_d | group;                                // (2)
69        sregex term   = fact >> *(('*' >> fact) | ('/' >> fact));   // (3)
70        expr          = term >> *(('+' >> term) | ('-' >> term));   // (4)
71    }                                                               // (5)
72
73Here is how the references and dependencies propagate, line by line:
74
75[table
76[[Expression][Effects]]
77[[1) `sregex group  = '(' >> by_ref(expr) >> ')';`]
78[[^group: cnt=1 refs={expr} deps={}\n
79expr:  cnt=2 refs={} deps={group}]]]
80
81[[2) `sregex fact   = +_d | group;`]
82[[^group: cnt=2 refs={expr} deps={fact}\n
83expr:  cnt=3 refs={} deps={group,fact}\n
84fact:  cnt=1 refs={expr,group} deps={}]]]
85
86[[3) `sregex term   = fact >> *(('*' >> fact) | ('/' >> fact));`]
87[[^group: cnt=3 refs={expr} deps={fact,term}\n
88expr:  cnt=4 refs={} deps={group,fact,term}\n
89fact:  cnt=2 refs={expr,group} deps={term}\n
90term:  cnt=1 refs={expr,group,fact} deps={}]]]
91
92[[4) `expr          = term >> *(('+' >> term) | ('-' >> term));`]
93[[^group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}\n
94expr:  cnt=5 refs={expr,group,fact,term} deps={group,fact,term}\n
95fact:  cnt=5 refs={expr,group,fact,term} deps={expr,group,term}\n
96term:  cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}]]]
97
98[[5) `}`]
99[[^expr:  cnt=2 refs={expr,group,fact,term} deps={group,fact,term}]]]
100]
101
102This shows how references and dependencies propagate when creating cycles of objects. After
103line (4), which closes the cycle, every object has a ref-count on every other object, even
104to itself. So how does this not leak? Read on.
105
106[h2 Cycle Breaking]
107
108Now that the bodies have their sets of references and dependencies, the hard part is done.
109All that remains is to decide when and where to break the cycle. That is the job of `tracking_ptr<>`,
110which is part of the handle. The `tracking_ptr<>` holds 2 `shared_ptr`s. The first, obviously,
111is the `shared_ptr<body>` -- the reference to the body to which this handle refers. The other
112`shared_ptr` is used to break the cycle. It ensures that when all the handles to a body go out
113of scope, the body's set of references is cleared.
114
115This suggests that more than one handle can refer to a body. In fact, `tracking_ptr<>` gives
116you copy-on-write semantics -- when you copy a handle, the body is shared. That makes copies
117very efficient. Eventually, all the handles to a particular body go out of scope. When that
118happens, the ref count to the body might still be greater than 0 because some other body (or
119this body itself!) might be holding a reference to it. However, we are certain that the cycle-breaker's
120ref-count goes to 0 because the cycle-breaker only lives in handles. No more handles, no more
121cycle-breakers.
122
123What does the cycle-breaker do? Recall that the body has a set of references of type
124`std::set<shared_ptr<body> >`. Let's call this type "references_type". The cycle-breaker is a
125`shared_ptr<references_type>`. It uses a custom deleter, which is defined as follows:
126
127  template<typename DerivedT>
128  struct reference_deleter
129  {
130      void operator ()(std::set<shared_ptr<DerivedT> > *refs) const
131      {
132          refs->clear();
133      }
134  };
135
136The job of to the cycle breaker is to ensure that when the last handle to a body goes away,
137the body's set of references is cleared. That's it.
138
139We can clearly see how this guarantees that all bodies are cleaned up eventually. Once every
140handle has gone out of scope, all the bodies' sets of references will be cleared, leaving none
141with a non-zero ref-count. No leaks, guaranteed.
142
143It's a bit harder to see how this guarantees no dangling references. Imagine that there are 3
144bodies: A, B and C. A refers to B which refers to C. Now all the handles to B go out of scope,
145so B's set of references is cleared. Doesn't this mean that C gets deleted, even though it
146is being used (indirectly) by A? It doesn't. This situation can never occur because we propagated
147the references and dependencies above such that A will be holding a reference directly to C
148in addition to B. When B's set of references is cleared, no bodies get deleted, because they
149are all still in use by A.
150
151[h2 Future Work]
152
153All these `std::set`s and `shared_ptr`s and `weak_ptr`s! Very inefficient. I used them because
154they were handy. I could probably do better.
155
156Also, some objects stick around longer than they need to. Consider:
157
158    sregex b;
159    {
160        sregex a = _;
161        b = by_ref(a);
162        b = _;
163    }
164    // a is still alive here!
165
166Due to the way references and dependencies are propagated, the `std::set` of references can only
167grow. It never shrinks, even when some references are no longer needed. For xpressive this
168isn't an issue. The graphs of referential objects generally stay small and isolated. If someone
169were to try to use `tracking_ptr<>` as a general ref-count-cycle-collection mechanism, this problem
170would have to be addressed.
171
172[endsect]
173