1[/ 2 Copyright (c) 2009-2009 Joachim Faulhaber 3 4 Distributed under the Boost Software License, Version 1.0. 5 (See accompanying file LICENSE_1_0.txt or copy at 6 http://www.boost.org/LICENSE_1_0.txt) 7] 8 9[section Projects] 10 11['*Projects*] are examples on the usage of interval containers 12that go beyond small toy snippets of code. The code presented 13here addresses more serious applications that approach the 14quality of real world programming. At the same time it aims to 15guide the reader more deeply into various aspects of 16the library. In order not to overburden the reader with implementation 17details, the code in ['*projects*] tries to be ['*minimal*]. It has a focus on 18the main aspects of the projects and is not intended to be complete 19and mature like the library code itself. Cause it's minimal, 20project code lives in `namespace mini`. 21 22[/ 23[section Overview] 24 25[table Overview over Icl Examples 26 [[level] [example] [classes] [features]] 27 [[basic] [[link boost_icl.examples.large_bitset Large Bitset]] 28 [__itv_map__][The most simple application of an interval map: 29 Counting the overlaps of added intervals.]] 30] 31 32[endsect][/ Overview IN Projects] 33] 34 35[section Large Bitset][/ IN Projects] 36 37Bitsets are just sets. Sets of unsigned integrals, 38to be more precise. 39The prefix ['*bit*] usually only indicates, 40that the representation of those sets is organized in a compressed 41form that exploits the fact, that we can switch on an off single 42bits in machine words. Bitsets are therefore known to be very small 43and thus efficient. 44The efficiency of bitsets is usually coupled to the 45precondition that the range of values of elements 46is relatively small, like [0..32) or [0..64), values that can 47be typically represented in single or a small number of 48machine words. If we wanted to represent a set containing 49two values {1, 1000000}, we would be much better off using 50other sets like e.g. an `std::set`. 51 52Bitsets compress well, if elements spread over narrow ranges 53only. Interval sets compress well, if many elements are clustered 54over intervals. They can span large sets very efficiently then. 55In project ['*Large Bitset*] we want to ['*combine the bit compression 56and the interval compression*] to achieve a set implementation, 57that is capable of spanning large chunks of contiguous elements 58using intervals and also to represent more narrow ['nests] of varying 59bit sequences using bitset compression. 60As we will see, this can be achieved using only a small 61amount of code because most of the properties we need 62are provided by an __itv_map__ of `bitsets`: 63 64`` 65typedef interval_map<IntegralT, SomeBitSet<N>, partial_absorber, 66 std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap; 67`` 68 69Such an `IntervalBitmap` represents `k*N` bits for every segment. 70`` 71[a, a+k)->'1111....1111' // N bits associated: Represents a total of k*N bits. 72`` 73 74For the interval `[a, a+k)` above all bits are set. 75But we can also have individual ['nests] or ['clusters] 76of bitsequences. 77 78`` 79[b, b+1)->'01001011...1' 80[b+1,b+2)->'11010001...0' 81. . . 82`` 83 84and we can span intervals of equal bit sequences that represent 85periodic patterns. 86 87`` 88[c,d)->'010101....01' // Every second bit is set in range [c,d) 89[d,e)->'001100..0011' // Every two bits alterate in range [d,e) 90[e,f)->'bit-sequence' // 'bit-sequence' reoccurs every N bits in range [e,f) 91`` 92 93An `IntervalBitmap` can represent 94`N*(2^M)` elements, if `M` is the number of bits of the 95integral type `IntegralT`. Unlike bitsets, that usually 96represent ['*unsigned*] integral numbers, large_bitset may 97range over negative numbers as well. 98There are fields where such large 99bitsets implementations are needed. E.g. for the compact 100representation of large file allocation tables. 101What remains to be done for project *Large Bitset* is 102to code a wrapper `class large_bitset` around `IntervalBitmap` 103so that `large_bitset` looks and feels like a usual 104set class. 105 106[section Using large_bitset] 107 108To quicken your appetite for a look at the implementation 109here are a few use cases first. Within the examples that follow, 110we will use `nat`[^['k]] for unsigned integrals 111and `bits`[^['k]] for bitsets containing [^['k]] bits. 112 113Let's start large. 114In the first example . . . 115 116[import ../example/large_bitset_/large_bitset.cpp] 117[large_bitset_test_large_set_all] 118 119. . . we are testing the limits. First we set all bits and 120then we switch off the very last bit. 121 122[large_bitset_test_large_erase_last] 123 124Program output (/a little beautified/): 125`` 126----- Test function test_large() ----------------------------------------------- 127We have just turned on the awesome amount of 18,446,744,073,709,551,616 bits ;-) 128[ 0, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111111 129---- Let's swich off the very last bit ----------------------------------------- 130[ 0, 288230376151711743) -> 1111111111111111111111111111111111111111111111111111111111111111 131[288230376151711743, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111110 132---- Venti is plenty ... let's do something small: A tall ---------------------- 133`` 134 135More readable is a smaller version of `large_bitset`. 136In function `test_small()` we apply a few more operations . . . 137 138[large_bitset_test_small] 139 140. . . producing this output: 141`` 142----- Test function test_small() ----------- 143-- Switch on all bits in range [0,64] ------ 144[0,8)->11111111 145[8,9)->10000000 146-------------------------------------------- 147-- Turn off bits: 25,27,28 ----------------- 148[0,3)->11111111 149[3,4)->10100111 150[4,8)->11111111 151[8,9)->10000000 152-------------------------------------------- 153-- Flip bits in range [24,30) -------------- 154[0,3)->11111111 155[3,4)->01011011 156[4,8)->11111111 157[8,9)->10000000 158-------------------------------------------- 159-- Remove the first 10 bits ---------------- 160[1,2)->00111111 161[2,3)->11111111 162[3,4)->01011011 163[4,8)->11111111 164[8,9)->10000000 165-- Remove even bits in range [0,72) -------- 166[1,2)->00010101 167[2,3)->01010101 168[3,4)->01010001 169[4,8)->01010101 170-- Set odd bits in range [0,72) -------- 171[0,9)->01010101 172-------------------------------------------- 173`` 174 175Finally, we present a little /picturesque/ example, 176that demonstrates that `large_bitset` can also serve 177as a self compressing bitmap, that we can 'paint' with. 178 179[large_bitset_test_picturesque] 180 181Note that we have two `large_bitsets` for the /outline/ 182and the /interior/. Both parts are compressed but we can 183compose both by `operator +`, because the right /positions/ 184are provided. This is the program output: 185 186`` 187----- Test function test_picturesque() ----- 188-------- empty face: 3 intervals ----- 189******** 190* * 191* * 192* * 193* * 194 ****** 195-------- compressed smile: 2 intervals ----- 196 * * 197 **** 198-------- staring bitset: 6 intervals ----- 199******** 200* * 201* * * * 202* * 203* **** * 204 ****** 205-------------------------------------------- 206`` 207 208So, may be you are curious how this class template 209is coded on top of __itv_map__ using only about 250 lines 210of code. This is shown in the sections that follow. 211 212[endsect][/ Usage of a large_bitset] 213 214[section The interval_bitmap] 215 216To begin, let's look at the basic data type again, that 217will be providing the major functionality: 218 219`` 220typedef interval_map<DomainT, BitSetT, partial_absorber, 221 std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap; 222`` 223 224`DomainT` is supposed to be an integral 225type, the bitset type `BitSetT` will be a wrapper class around an 226unsigned integral type. `BitSetT` has to implement bitwise operators 227that will be called by the functors `inplace_bit_add<BitSetT>` 228and `inplace_bit_and<BitSetT>`. 229The type trait of interval_map is `partial_absorber`, which means 230that it is /partial/ and that empty `BitSetTs` are not stored in 231the map. This is desired and keeps the `interval_map` minimal, storing 232only bitsets, that contain at least one bit switched on. 233Functor template `inplace_bit_add` for parameter `Combine` indicates that we 234do not expect `operator +=` as addition but the bitwise operator 235`|=`. For template parameter `Section` which is instaniated by 236`inplace_bit_and` we expect the bitwise `&=` operator. 237 238[endsect][/ section The interval_bitmap] 239 240[section A class implementation for the bitset type] 241 242The code of the project is enclosed in a `namespace mini`. 243The name indicates, that the implementation is a /minimal/ 244example implementation. The name of the bitset class will 245be `bits` or `mini::bits` if qualified. 246 247To be used as a codomain parameter of class template __itv_map__, 248`mini::bits` has to implement all the functions that are required 249for a codomain_type in general, which are the default constructor `bits()` 250and an equality `operator==`. Moreover `mini::bits` has to implement operators 251required by the instantiations for parameter `Combine` and `Section` 252which are `inplace_bit_add` and `inplace_bit_and`. From functors 253`inplace_bit_add` and `inplace_bit_and` there are inverse functors 254`inplace_bit_subtract` and `inplace_bit_xor`. Those functors 255use operators `|= &= ^=` and `~`. Finally if we want to apply 256lexicographical and subset comparison on large_bitset, we also 257need an `operator <`. All the operators that we need can be implemented 258for `mini::bits` on a few lines: 259 260[import ../example/large_bitset_/bits.hpp] 261[mini_bits_class_bits] 262 263Finally there is one important piece of meta information, we have 264to provide: `mini::bits` has to be recognized as a `Set` by the 265icl code. Otherwise we can not exploit the fact that a map of 266sets is model of `Set` and the resulting large_bitset would not 267behave like a set. So we have to say that `mini::bits` shall be sets: 268 269[mini_bits_is_set] 270 271This is done by adding a partial template specialization to 272the type trait template `icl::is_set`. 273For the extension of this type trait template and the result 274values of inclusion_compare we need these `#includes` for the 275implementation of `mini::bits`: 276 277[mini_bits_includes] 278 279[endsect][/ A class implementation for the bitset type] 280 281[section Implementation of a large bitset] 282 283Having finished our `mini::bits` implementation, we can start to 284code the wrapper class that hides the efficient interval map of mini::bits 285and exposes a simple and convenient set behavior to the world of users. 286 287Let's start with the required `#include`s this time: 288 289[import ../example/large_bitset_/large_bitset.hpp] 290[large_bitset_includes] 291 292Besides `boost/icl/interval_map.hpp` and `bits.hpp` the most important 293include here is `boost/operators.hpp`. We use this library in order 294to further minimize the code and to provide pretty extensive operator 295functionality using very little code. 296 297For a short and concise naming of the most important unsigned integer 298types and the corresponding `mini::bits` we define this: 299 300[large_bitset_natural_typedefs] 301 302[section large_bitset: Public interface][/ ------------------------------------] 303 304And now let's code `large_bitset`. 305 306[large_bitset_class_template_header] 307 308The first template parameter `DomainT` will be instantiated with 309an integral type that defines the kind of numbers that can 310be elements of the set. Since we want to go for a large set we 311use `nat64` as default which is a 64 bit unsigned integer ranging 312from `0` to `2^64-1`. As bitset parameter we also choose a 64-bit 313default. Parameters `Combine` and `Interval` are necessary to 314be passed to dependent type expressions. An allocator can be 315chosen, if desired. 316 317The nested list of private inheritance contains groups of template 318instantiations from 319[@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator], 320that provides derivable operators from more fundamental once. 321Implementing the fundamental operators, we get the derivable ones 322/for free/. Below is a short overview of what we get using Boost.Operator, 323where __S stands for `large_bitset`, __i for it's `interval_type` 324and __e for it's `domain_type` or `element_type`. 325 326[table 327[[Group ][fundamental] [derivable ]] 328[[Equality, ordering ][`==`] [`!=` ]] 329[[ ][`<` ] [`> <= >=` ]] 330[[Set operators (__S x __S)][`+= |= -= &= ^=` ][`+ | - & ^` ]] 331[[Set operators (__S x __e)][`+= |= -= &= ^=` ][`+ | - & ^` ]] 332[[Set operators (__S x __i)][`+= |= -= &= ^=` ][`+ | - & ^` ]] 333] 334 335There is a number of associated types 336 337[large_bitset_associated_types] 338 339most importantly the implementing `interval_bitmap_type` that is used 340for the implementing container. 341 342[large_bitmap_impl_map] 343 344In order to use 345[@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator] 346we have to implement the fundamental operators as class members. This can be 347done quite schematically. 348 349[large_bitset_operators] 350 351As we can see, the seven most important operators that work on the 352class type `large_bitset` can be directly implemented by propagating 353the operation to the implementing `_map` 354of type `interval_bitmap_type`. For the operators that work on segment and 355element types, we use member functions `add`, `subtract`, `intersect` and 356`flip`. As we will see only a small amount of adaper code is needed 357to couple those functions with the functionality of the implementing 358container. 359 360Member functions `add`, `subtract`, `intersect` and `flip`, 361that allow to combine ['*intervals*] to `large_bitsets` can 362be uniformly implemented using a private function 363`segment_apply` that applies /addition/, /subtraction/, 364/intersection/ or /symmetric difference/, after having 365translated the interval's borders into the right bitset 366positions. 367 368[large_bitset_fundamental_functions] 369 370In the sample programs, that we will present to demonstrate 371the capabilities of `large_bitset` we will need a few additional 372functions specifically output functions in two different 373flavors. 374 375[large_bitset_demo_functions] 376 377* The first one, `show_segments()` shows the container 378content as it is implemented, in the compressed form. 379 380* The second function `show_matrix` shows the complete 381matrix of bits that is represented by the container. 382 383[endsect][/ large_bitset: Public interface] 384[section large_bitset: Private implementation] [/ -------------------------------------] 385 386In order to implement operations like the addition of 387an element say `42` to 388the large bitset, we need to translate the /value/ to the 389/position/ of the associated *bit* representing `42` in the 390interval container of bitsets. As an example, suppose we 391use a 392`` 393large_bitset<nat, mini::bits8> lbs; 394`` 395that carries small bitsets of 8 bits only. 396The first four interval of `lbs` are assumed to 397be associated with some bitsets. Now we want to add the interval 398`[a,b]==[5,27]`. This will result in the following situation: 399`` 400 [0,1)-> [1,2)-> [2,3)-> [3,4)-> 401 [00101100][11001011][11101001][11100000] 402+ [111 11111111 11111111 1111] [5,27] as bitset 403 a b 404 405=> [0,1)-> [1,3)-> [3,4)-> 406 [00101111][11111111][11110000] 407`` 408So we have to convert values 5 and 27 into a part that 409points to the interval and a part that refers to the 410position within the interval, which is done by a 411/division/ and a /modulo/ computation. (In order to have a 412consistent representation of the bitsequences across the containers, 413within this project, bitsets are denoted with the 414['*least significant bit on the left!*]) 415 416`` 417A = a/8 = 5/8 = 0 // refers to interval 418B = b/8 = 27/8 = 3 419R = a%8 = 5%8 = 5 // refers to the position in the associated bitset. 420S = b%8 = 27%8 = 3 421`` 422 423All /division/ and /modulo operations/ needed here are always done 424using a divisor `d` that is a power of `2`: `d = 2^x`. Therefore 425division and modulo can be expressed by bitset operations. 426The constants needed for those bitset computations are defined here: 427 428[large_bitset_impl_constants] 429 430Looking at the example again, we can see that we have to identify the 431positions of the beginning and ending of the interval [5,27] that is 432to insert, and then ['*subdivide*] that range of bitsets into three partitions. 433 434# The bitset where the interval starts. 435# the bitset where the interval ends 436# The bitsets that are completely overlapped by the interval 437 438`` 439combine interval [5,27] to large_bitset lbs w.r.t. some operation o 440 441 [0,1)-> [1,2)-> [2,3)-> [3,4)-> 442 [00101100][11001011][11101001][11100000] 443o [111 11111111 11111111 1111] 444 a b 445subdivide: 446 [first! ][mid_1] . . .[mid_n][ !last] 447 [00000111][1...1] . . .[1...1][11110000] 448`` 449 450After subdividing, we perform the operation `o` as follows: 451 452# For the first bitset: Set all bits from ther starting bit (`!`) 453 to the end of the bitset to `1`. All other bits are `0`. Then 454 perform operation `o`: `_map o= ([0,1)->00000111)` 455 456# For the last bitset: Set all bits from the beginning of the bitset 457 to the ending bit (`!`) to `1`. All other bits are `0`. Then 458 perform operation `o`: `_map o= ([3,4)->11110000)` 459 460# For the range of bitsets in between the staring and ending one, 461 perform operation `o`: `_map o= ([1,3)->11111111)` 462 463The algorithm, that has been outlined and illustrated by the 464example, is implemented by the private member function 465`segment_apply`. To make the combiner operation a variable 466in this algorithm, we use a /pointer to member function type/ 467 468[large_bitset_segment_combiner] 469 470as first function argument. We will pass member functions `combine_` here, 471`` 472combine_(first_of_interval, end_of_interval, some_bitset); 473`` 474that take the beginning and ending of an interval and a bitset and combine 475them to the implementing `interval_bitmap_type _map`. Here are these functions: 476 477[large_bitmap_combiners] 478 479Finally we can code function `segment_apply`, that does the partitioning 480and subsequent combining: 481 482[large_bitset_segment_apply] 483 484The functions that help filling bitsets to and from a given bit are 485implemented here: 486[large_bitset_bitset_filler] 487 488This completes the implementation of class template `large_bitset`. 489Using only a small amount of mostly schematic code, 490we have been able to provide a pretty powerful, self compressing 491and generally usable set type for all integral domain types. 492 493[endsect][/ large_bitset: Private implementation] 494 495[endsect][/ Implementation of a large bitset] 496 497[endsect][/ Large Bitsets IN Projects] 498 499 500[endsect][/ Projects] 501 502 503 504