• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Copyright (c) 2014 The Chromium OS Authors. All rights reserved.
2# Use of this source code is governed by a BSD-style license that can be
3# found in the LICENSE file.
4
5
6"""Cache module for rdb requests/host objects.
7
8This module supplies the following api:
9    1. A cache backend.
10    2. A cache manager for the backend.
11    3. A memoize decorator to encapsulate caching logic.
12
13This cache manager functions as a lookaside buffer for host requests.
14Its correctness is contingent on the following conditions:
151. The concurrency of the rdb remains 0.
162. Clients of the cache don't trust the leased bit on the cached object.
173. The cache is created at the start of a single batched request,
18    populated during the request, and completely discarded at the end.
19
20Rather than caching individual hosts, the cache manager maintains
21'cache lines'. A cache line is defined as a key: value pair, where
22the key is as returned by get_key, and the value is a list of RDBHosts
23that match the key. The following limitations are placed on cache lines:
241. A new line can only contain unleased hosts.
252. A key can only be set once, with a single line, before removal.
263. Every 'get' deletes the entire line.
27
28Consider the following examples:
29Normal case: 3 grouped requests, all with the same deps/acls, but different
30priorities/parent_job_ids. The requests (X+Y+Z) > matching hosts (K):
31 (request1, count=X)- hits the database, takes X hosts, caches (K-X)
32 (request2, count=Y) - hits the cache and is fully satisfied, caches (K-(X+Y))
33 (request3, count=Z) - hits the cache, needs to acquire (X+Y+Z)-K next tick]:
34
35 Host Count |  RDB                         | Cache
36------------------------------------------------------------------
37X:          | request1                     | {}
38K:          | find_hosts(deps, acls)       |
39X:          | leased_hosts                 |
40K-X:        | ---------------------------> | {key: [K-X hosts]}
41Y<K-X:      | request2 <---[K-X hosts]---- | {}
42Y:          | leased_hosts                 |
43K-(X+Y):    | ---------------------------> | {key: [K-(X+Y) hosts]}
44Z>K-(X+Y):  | request3 <-[K-(X+Y) hosts]-- | {}
45Z-(K-(X+Y)):| leased_hosts                 |
46
47Since hosts are only released by the scheduler there is no way the
48third request could have been satisfied completely even if we had checked
49the database real-time.
50
51Stale cache entries: 3 grouped requests that don't have the same deps/acls.
52P(1,2,3) are priorities, with P3 being the highest:
53 (request1(deps=[a,b], P3), Count=X) - Caches hosts
54 (request2(deps=[a], P2), Count=Y) - hits the database
55 (request3(deps=[a,b], P1)], Count=Z) - Tries to use cached hosts but fails
56
57 Host Count |  RDB                         | Cache
58------------------------------------------------------------------
59X:          | request1(deps=[a,b])         | {}
60K:          | find_hosts(deps=[a,b])       |
61X:          | leased_hosts                 |
62K-X:        | ---------------------------> | {deps=[a,b]: [(K-X) hosts]}
63Y<K-X:      | request2(deps=[a])           | {}
64K-X:        | find_hosts(deps=[a])         |
65Y:          | leased_hosts                 |
66K-(X+Y):    | ---------------------------> | {deps=[a]: [(K-(X+Y)) hosts],
67            |                              |        | overlap |
68            |                              |  deps=[a, b], [(K-X) hosts]}
69Z:          | request3(deps=[a,b])<-[K-X]--| {deps=[a]: [K-(X+Y) hosts]}
70Z-(K-(X+Y)):| leased_hosts                 | {deps=[a]: [N-Y hosts]}
71
72Note that in the last case, even though the cache returns hosts that
73have already been assigned to request2, request3 cannot use them. This is
74acceptable because the number of hosts we lease per tick is << the number
75of requests, so it's faster to check leased bits real time than query for hosts.
76"""
77
78
79import abc
80import collections
81import logging
82
83import common
84from autotest_lib.client.common_lib.cros.graphite import autotest_stats
85from autotest_lib.client.common_lib.global_config import global_config
86from autotest_lib.scheduler import rdb_utils
87
88MEMOIZE_KEY = 'memoized_hosts'
89
90def memoize_hosts(func):
91    """Decorator used to memoize through the cache manager.
92
93    @param func: The function/method to decorate.
94        Before calling this function we check the cache for values matching
95        its request argument, and anything returned by the function is cached
96        cached under the same request.
97    """
98    def cache(self, request, count, **kwargs):
99        """Caching function for the memoize decorator.
100
101        @param request: The rdb request, as defined in rdb_requests.
102        @param count: The count of elements needed to satisfy the request.
103        @param kwargs:
104            Named args for the memoized function. This map should not contain
105            the key MEMOIZED_KEY, as this is reserved for the passing of
106            the cached/memoized hosts to the function itself.
107        """
108        cache_key = self.cache.get_key(request.deps, request.acls)
109        try:
110            kwargs[MEMOIZE_KEY] = self.cache.get_line(cache_key)
111        except rdb_utils.CacheMiss:
112            pass
113        hosts = func(self, request, count, **kwargs)
114        self.cache.set_line(cache_key, hosts)
115        return hosts
116    return cache
117
118
119class CacheBackend(object):
120    """Base class for a cache backend."""
121    __metaclass__ = abc.ABCMeta
122
123    def set(self, key, value):
124        """Set a key.
125
126        @param key: The key to set.
127        @param value: The value to cache.
128        """
129        pass
130
131
132    def get(self, key):
133        """Get the value stored under a key.
134
135        @param key: The key to retrieve the value for.
136        @return: The value stored under the key.
137        @raises KeyError: If the key isn't present in the cache.
138        """
139        pass
140
141
142    def delete(self, key):
143        """Delete the key, value pair from the cache.
144
145        @param key: The key used to find the key, value pair to delete.
146        @raises KeyError: If the key isn't already in the cache.
147        """
148        pass
149
150
151    def has_key(self, key):
152        """Check if the key exists in the cache.
153
154        @param key: The key to check.
155        @return: True if the key is in the cache.
156        """
157        return False
158
159
160class DummyCacheBackend(CacheBackend):
161    """A dummy cache backend.
162
163    This cache will claim to have no keys. Every get is a cache miss.
164    """
165
166    def get(self, key):
167        raise KeyError
168
169
170class InMemoryCacheBackend(CacheBackend):
171    """In memory cache backend.
172
173    Uses a simple dictionary to store key, value pairs.
174    """
175    def __init__(self):
176        self._cache = {}
177
178    def get(self, key):
179        return self._cache[key]
180
181    def set(self, key, value):
182        self._cache[key] = value
183
184    def delete(self, key):
185        self._cache.pop(key)
186
187    def has_key(self, key):
188        return key in self._cache
189
190# TODO: Implement a MemecacheBackend, invalidate when unleasing a host, refactor
191# the AcquireHostRequest to contain a core of (deps, acls) that we can use as
192# the key for population and invalidation. The caching manager is still valid,
193# regardless of the backend.
194
195class RDBHostCacheManager(object):
196    """RDB Cache manager."""
197
198    key = collections.namedtuple('key', ['deps', 'acls'])
199    use_cache = global_config.get_config_value(
200            'RDB', 'use_cache', type=bool, default=True)
201
202    def __init__(self):
203        self._cache_backend = (InMemoryCacheBackend()
204                               if self.use_cache else DummyCacheBackend())
205        self.hits = 0
206        self.misses = 0
207        self.stale_entries = []
208
209
210    def mean_staleness(self):
211        """Compute the average stale entries per line.
212
213        @return: A floating point representing the mean staleness.
214        """
215        return (reduce(lambda x, y: float(x+y), self.stale_entries)/
216                len(self.stale_entries)) if self.stale_entries else 0
217
218
219    def hit_ratio(self):
220        """Compute the hit ratio of this cache.
221
222        @return: A floating point percentage of the hit ratio.
223        """
224        if not self.hits and not self.misses:
225            return 0
226        requests = float(self.hits + self.misses)
227        return (self.hits/requests) * 100
228
229
230    def record_stats(self):
231        """Record stats about the cache managed by this instance."""
232        hit_ratio = self.hit_ratio()
233        staleness = self.mean_staleness()
234        logging.debug('Cache stats: hit ratio: %.2f%%, '
235                      'avg staleness per line: %.2f%%.', hit_ratio, staleness)
236        autotest_stats.Gauge(rdb_utils.RDB_STATS_KEY).send(
237                'cache.hit_ratio', hit_ratio)
238        autotest_stats.Gauge(rdb_utils.RDB_STATS_KEY).send(
239                'cache.stale_entries', staleness)
240
241
242    @classmethod
243    def get_key(cls, deps, acls):
244        """Return a key for the given deps, acls.
245
246        @param deps: A list of deps, as taken by the AcquireHostRequest.
247        @param acls: A list of acls, as taken by the AcquireHostRequest.
248        @return: A cache key for the given deps/acls.
249        """
250        # All requests with the same deps, acls should hit the same cache line.
251        # TODO: Do something smarter with acls, only one needs to match.
252        return cls.key(deps=frozenset(deps), acls=frozenset(acls))
253
254
255    def get_line(self, key):
256        """Clear and return the cache line matching the key.
257
258        @param key: The key the desired cache_line is stored under.
259        @return: A list of rdb hosts matching the key, or None.
260
261        @raises rdb_utils.CacheMiss: If the key isn't in the cache.
262        """
263        try:
264            cache_line = self._cache_backend.get(key)
265        except KeyError:
266            self.misses += 1
267            raise rdb_utils.CacheMiss('Key %s not in cache' % (key,))
268        self.hits += 1
269        self._cache_backend.delete(key)
270        return list(cache_line)
271
272
273    def _check_line(self, line, key):
274        """Sanity check a cache line.
275
276        This method assumes that a cache line is made up of RDBHost objects,
277        and checks to see if they all match each other/the key passed in.
278        Checking is done in terms of host labels and acls, note that the hosts
279        in the line can have different deps/acls, as long as they all have the
280        deps required by the key, and at least one matching acl of the key.
281
282        @param line: The cache line value.
283        @param key: The key the line will be stored under.
284        @raises rdb_utils.RDBException:
285            If one of the hosts in the cache line is already leased.
286            The cache already has a different line under the given key.
287            The given key doesn't match the hosts in the line.
288        """
289        # Note that this doesn't mean that all hosts in the cache are unleased.
290        if any(host.leased for host in line):
291            raise rdb_utils.RDBException('Cannot cache leased hosts %s' % line)
292
293        # Confirm that the given line can be used to service the key by checking
294        # that all hosts have the deps mentioned in the key, and at least one
295        # matching acl.
296        h_keys = set([self.get_key(host.labels, host.acls) for host in line])
297        for h_key in h_keys:
298            if (not h_key.deps.issuperset(key.deps) or
299                    not key.acls.intersection(h_key.acls)):
300                raise rdb_utils.RDBException('Given key: %s does not match key '
301                        'computed from hosts in line: %s' % (key, h_keys))
302        if self._cache_backend.has_key(key):
303            raise rdb_utils.RDBException('Cannot override a cache line. It '
304                    'must be cleared before setting. Key: %s, hosts %s' %
305                    (key, line))
306
307
308    def set_line(self, key, hosts):
309        """Cache a list of similar hosts.
310
311        set_line will no-op if:
312            The hosts aren't all unleased.
313            The hosts don't have deps/acls matching the key.
314            A cache line under the same key already exists.
315        The first 2 cases will lead to a cache miss in the corresponding get.
316
317        @param hosts: A list of unleased hosts with the same deps/acls.
318        @raises RDBException: If hosts is None, since None is reserved for
319            key expiration.
320        """
321        if hosts is None:
322            raise rdb_utils.RDBException('Cannot set None in the cache.')
323
324        # An empty list means no hosts matching the request are available.
325        # This can happen if a previous request leased all matching hosts.
326        if not hosts or not self.use_cache:
327            self._cache_backend.set(key, [])
328            return
329        try:
330            self._check_line(hosts, key)
331        except rdb_utils.RDBException as e:
332            logging.error(e)
333        else:
334            self._cache_backend.set(key, set(hosts))
335