1[/ Copyright 2006-2008 Daniel James. 2 / Distributed under the Boost Software License, Version 1.0. (See accompanying 3 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ] 4 5[section:buckets The Data Structure] 6 7The containers are made up of a number of 'buckets', each of which can contain 8any number of elements. For example, the following diagram shows an [classref 9boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`, 10`B`, `C`, `D` and `E` (this is just for illustration, containers will typically 11have more buckets). 12 13[diagram buckets] 14 15In order to decide which bucket to place an element in, the container applies 16the hash function, `Hash`, to the element's key (for `unordered_set` and 17`unordered_multiset` the key is the whole element, but is referred to as the key 18so that the same terminology can be used for sets and maps). This returns a 19value of type `std::size_t`. `std::size_t` has a much greater range of values 20then the number of buckets, so the container applies another transformation to 21that value to choose a bucket to place the element in. 22 23Retrieving the elements for a given key is simple. The same process is applied 24to the key to find the correct bucket. Then the key is compared with the 25elements in the bucket to find any elements that match (using the equality 26predicate `Pred`). If the hash function has worked well the elements will be 27evenly distributed amongst the buckets so only a small number of elements will 28need to be examined. 29 30There is [link unordered.hash_equality more information on hash functions and 31equality predicates in the next section]. 32 33You can see in the diagram that `A` & `D` have been placed in the same bucket. 34When looking for elements in this bucket up to 2 comparisons are made, making 35the search slower. This is known as a collision. To keep things fast we try to 36keep collisions to a minimum. 37 38''' 39<table frame="all"><title>Methods for Accessing Buckets</title> 40 <tgroup cols="2"> 41 <thead><row> 42 <entry><para>Method</para></entry> 43 <entry><para>Description</para></entry> 44 </row></thead> 45 <tbody> 46 <row> 47 <entry>'''`size_type bucket_count() const`'''</entry> 48 <entry>'''The number of buckets.'''</entry> 49 </row> 50 <row> 51 <entry>'''`size_type max_bucket_count() const`'''</entry> 52 <entry>'''An upper bound on the number of buckets.'''</entry> 53 </row> 54 <row> 55 <entry>'''`size_type bucket_size(size_type n) const`'''</entry> 56 <entry>'''The number of elements in bucket `n`.'''</entry> 57 </row> 58 <row> 59 <entry>'''`size_type bucket(key_type const& k) const`'''</entry> 60 <entry>'''Returns the index of the bucket which would contain `k`.'''</entry> 61 </row> 62 <row> 63 <entry>'''`local_iterator begin(size_type n);`'''</entry> 64 <entry morerows='5'>'''Return begin and end iterators for bucket `n`.'''</entry> 65 </row> 66 <row> 67 <entry>'''`local_iterator end(size_type n);`'''</entry> 68 </row> 69 <row> 70 <entry>'''`const_local_iterator begin(size_type n) const;`'''</entry> 71 </row> 72 <row> 73 <entry>'''`const_local_iterator end(size_type n) const;`'''</entry> 74 </row> 75 <row> 76 <entry>'''`const_local_iterator cbegin(size_type n) const;`'''</entry> 77 </row> 78 <row> 79 <entry>'''`const_local_iterator cend(size_type n) const;`'''</entry> 80 </row> 81 </tbody> 82 </tgroup> 83</table> 84''' 85 86[h2 Controlling the number of buckets] 87 88As more elements are added to an unordered associative container, the number 89of elements in the buckets will increase causing performance to degrade. 90To combat this the containers increase the bucket count as elements are inserted. 91You can also tell the container to change the bucket count (if required) by 92calling `rehash`. 93 94The standard leaves a lot of freedom to the implementer to decide how the 95number of buckets is chosen, but it does make some requirements based on the 96container's 'load factor', the average number of elements per bucket. 97Containers also have a 'maximum load factor' which they should try to keep the 98load factor below. 99 100You can't control the bucket count directly but there are two ways to 101influence it: 102 103* Specify the minimum number of buckets when constructing a container or 104 when calling `rehash`. 105* Suggest a maximum load factor by calling `max_load_factor`. 106 107`max_load_factor` doesn't let you set the maximum load factor yourself, it just 108lets you give a /hint/. And even then, the draft standard doesn't actually 109require the container to pay much attention to this value. The only time the 110load factor is /required/ to be less than the maximum is following a call to 111`rehash`. But most implementations will try to keep the number of elements 112below the max load factor, and set the maximum load factor to be the same as 113or close to the hint - unless your hint is unreasonably small or large. 114 115[table:bucket_size Methods for Controlling Bucket Size 116 [[Method] [Description]] 117 118 [ 119 [`X(size_type n)`] 120 [Construct an empty container with at least `n` buckets (`X` is the container type).] 121 ] 122 [ 123 [`X(InputIterator i, InputIterator j, size_type n)`] 124 [Construct an empty container with at least `n` buckets and insert elements 125 from the range \[`i`, `j`) (`X` is the container type).] 126 ] 127 [ 128 [`float load_factor() const`] 129 [The average number of elements per bucket.] 130 ] 131 [ 132 [`float max_load_factor() const`] 133 [Returns the current maximum load factor.] 134 ] 135 [ 136 [`float max_load_factor(float z)`] 137 [Changes the container's maximum load factor, using `z` as a hint.] 138 ] 139 [ 140 [`void rehash(size_type n)`] 141 [Changes the number of buckets so that there at least `n` buckets, and 142 so that the load factor is less than the maximum load factor.] 143 ] 144 145] 146 147[h2 Iterator Invalidation] 148 149It is not specified how member functions other than `rehash` affect 150the bucket count, although `insert` is only allowed to invalidate iterators 151when the insertion causes the load factor to be greater than or equal to the 152maximum load factor. For most implementations this means that `insert` will only 153change the number of buckets when this happens. While iterators can be 154invalidated by calls to `insert` and `rehash`, pointers and references to the 155container's elements are never invalidated. 156 157In a similar manner to using `reserve` for `vector`s, it can be a good idea 158to call `rehash` before inserting a large number of elements. This will get 159the expensive rehashing out of the way and let you store iterators, safe in 160the knowledge that they won't be invalidated. If you are inserting `n` 161elements into container `x`, you could first call: 162 163 x.rehash((x.size() + n) / x.max_load_factor()); 164 165[blurb Note: `rehash`'s argument is the minimum number of buckets, not the 166number of elements, which is why the new size is divided by the maximum load factor.] 167 168[endsect] 169