//! Performance characteristics.
//!
//! There are several performance advantages of [`ArcSwap`] over [`RwLock`].
//!
//! ## Lock-free readers
//!
//! All the read operations are always [lock-free]. Most of the time, they are actually
//! [wait-free]. Only occasionally do they fall back to being merely [lock-free], with at least
//! `usize::MAX / 4` [wait-free] accesses in between.
//!
//! Writers are [lock-free].
//!
//! Whenever the documentation talks about *contention* in the context of [`ArcSwap`], it means
//! contention on the CPU level: multiple cores having to deal with accessing the same cache
//! line. This slows things down (compared to each one accessing its own cache line), but
//! eventual progress is still guaranteed and the cost is significantly lower than parking threads
//! as with mutex-style contention.
//!
//! ## Speeds
//!
//! The baseline speed of read operations is similar to using an *uncontended* [`Mutex`].
//! However, [`load`] suffers no contention from any other read operations and only slight
//! contention during updates. The [`load_full`] operation is additionally contended only on
//! the reference count of the [`Arc`] inside. So, in general, while [`Mutex`] rapidly
//! loses its performance when being in active use by multiple threads at once and
//! [`RwLock`] is slow to start with, [`ArcSwap`] mostly keeps its performance even when read by
//! many threads in parallel.
//!
//! Write operations are considered expensive. A write operation is more expensive than access to
//! an *uncontended* [`Mutex`] and on some architectures even slower than an uncontended
//! [`RwLock`]. However, it is faster than either under contention.
//!
//! There are some (very unscientific) [benchmarks] within the source code of the library, and the
//! [`DefaultStrategy`][crate::DefaultStrategy] has some numbers measured on my computer.
//!
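//! For a quick feel of the read path on your own machine, a crude timing loop along the
//! following lines may help. This is only a sketch, not a proper benchmark: `time_readers` is a
//! hypothetical helper (not part of this crate), and the thread and iteration counts are
//! arbitrary assumptions.
//!
//! ```rust
//! use std::sync::{Arc, Mutex};
//! use std::thread;
//! use std::time::{Duration, Instant};
//!
//! use arc_swap::ArcSwap;
//!
//! /// Hypothetical helper: runs `read` `iters` times on each of `threads` threads.
//! /// Returns the elapsed wall-clock time and the sum of everything that was read.
//! fn time_readers<F>(threads: usize, iters: usize, read: F) -> (Duration, usize)
//! where
//!     F: Fn() -> usize + Send + Sync + 'static,
//! {
//!     let read = Arc::new(read);
//!     let start = Instant::now();
//!     let handles: Vec<_> = (0..threads)
//!         .map(|_| {
//!             let read = Arc::clone(&read);
//!             thread::spawn(move || (0..iters).map(|_| (*read)()).sum::<usize>())
//!         })
//!         .collect();
//!     // Summing the results keeps the reads from being trivially unused.
//!     let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
//!     (start.elapsed(), total)
//! }
//!
//! fn main() {
//!     let swap = ArcSwap::from_pointee(42usize);
//!     let mutex = Mutex::new(Arc::new(42usize));
//!
//!     let (swap_time, _) = time_readers(4, 100_000, move || **swap.load());
//!     let (mutex_time, _) = time_readers(4, 100_000, move || **mutex.lock().unwrap());
//!     println!("load: {:?}, Mutex::lock: {:?}", swap_time, mutex_time);
//! }
//! ```
//!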
//! The exact numbers are highly dependent on the machine used (both the absolute numbers and the
//! relative ones between different data structures). Not only does the architecture have a huge
//! impact (e.g. x86 vs. ARM), but so does AMD vs. Intel, or even two different Intel processors.
//! Therefore, if speed matters more to you than the wait-free guarantees, you're advised to do
//! your own measurements.
//!
//! Further speed improvements may be gained by the use of the [`Cache`].
//!
//! ## Consistency
//!
//! The combination of [wait-free] guarantees of readers and no contention between concurrent
//! [`load`]s provides *consistent* performance characteristics of the synchronization mechanism.
//! This might be important for soft-realtime applications (the CPU-level contention caused by a
//! recent update/write operation might be problematic for some hard-realtime cases, though).
//!
//! ## Choosing the right reading operation
//!
//! There are several load operations available. While the general go-to one should be
//! [`load`], there may be situations in which the others are a better match.
//!
//! The [`load`] usually only borrows the instance from the shared [`ArcSwap`]. This makes
//! it faster, because different threads don't contend on the reference count. There are two
//! situations when this borrow isn't possible. The first is when the content gets changed: all
//! existing [`Guard`]s are promoted to contain an owned instance. The promotion is done by the
//! writer, but the readers still need to decrement the reference counts of the old instance when
//! they no longer use it, contending on the count.
//!
//! The other situation derives from the internal implementation. The number of borrows each
//! thread can have at any one time (across all [`Guard`]s) is limited. If this limit is exceeded,
//! an owned instance is created instead.
//!
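//! From the caller's side, a [`Guard`] that lives across an update can be sketched as follows.
//! The helper `observe_across_store` is hypothetical and exists only for illustration. The
//! sketch shows just the observable behaviour (the old guard keeps referring to the old value);
//! whether a particular guard is borrowed or owned internally is not visible here.
//!
//! ```rust
//! use std::sync::Arc;
//!
//! use arc_swap::ArcSwap;
//!
//! /// Hypothetical helper: takes a guard, replaces the shared value, and reports what the
//! /// old guard and a fresh `load` each observe.
//! fn observe_across_store(shared: &ArcSwap<String>, new: &str) -> (String, String) {
//!     // Obtained before the update.
//!     let guard = shared.load();
//!     // The writer swaps in a new value while the guard is still alive.
//!     shared.store(Arc::new(new.to_owned()));
//!     let seen_by_guard = (**guard).clone();
//!     let seen_by_fresh_load = (**shared.load()).clone();
//!     (seen_by_guard, seen_by_fresh_load)
//! }
//!
//! fn main() {
//!     let shared = ArcSwap::from_pointee(String::from("old"));
//!     let (before, after) = observe_across_store(&shared, "new");
//!     assert_eq!(before, "old");
//!     assert_eq!(after, "new");
//! }
//! ```
//!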
//! Therefore, if you intend to hold onto the loaded value for an extended time span, you may
//! prefer [`load_full`]. It loads the pointer instance ([`Arc`]) without borrowing, which is
//! slower (because of the possible contention on the reference count), but doesn't consume one of
//! the borrow slots, which makes it more likely for following [`load`]s to have a slot
//! available. Similarly, if some API needs an owned `Arc`, [`load_full`] is more convenient and
//! potentially faster than first [`load`]ing and then cloning that [`Arc`].
//!
//! Additionally, it is possible to use a [`Cache`] to get further speed improvement at the
//! cost of a less comfortable API and possibly keeping the older values alive for longer than
//! necessary.
//!
//! [`ArcSwap`]: crate::ArcSwap
//! [`Cache`]: crate::cache::Cache
//! [`Guard`]: crate::Guard
//! [`load`]: crate::ArcSwapAny::load
//! [`load_full`]: crate::ArcSwapAny::load_full
//! [`Arc`]: std::sync::Arc
//! [`Mutex`]: std::sync::Mutex
//! [`RwLock`]: std::sync::RwLock
//! [benchmarks]: https://github.com/vorner/arc-swap/tree/master/benchmarks
//! [lock-free]: https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom
//! [wait-free]: https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom
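//!
//! As a rough illustration of the [`load_full`] and [`Cache`] options described above, the
//! following sketch keeps an owned snapshot in a long-lived structure and reads through a
//! cache. `Job`, `start_job` and `read_with_cache` are hypothetical names made up for this
//! example; only `ArcSwap`, `load_full`, `store` and `Cache` come from this crate.
//!
//! ```rust
//! use std::sync::Arc;
//!
//! use arc_swap::cache::Cache;
//! use arc_swap::ArcSwap;
//!
//! /// Hypothetical long-lived consumer that wants to own its snapshot of the value.
//! struct Job {
//!     config: Arc<String>,
//! }
//!
//! /// Takes an owned `Arc` with `load_full`, so no guard is held for the job's lifetime.
//! fn start_job(shared: &ArcSwap<String>) -> Job {
//!     Job { config: shared.load_full() }
//! }
//!
//! /// Reads through a `Cache` before and after an update.
//! fn read_with_cache(shared: &ArcSwap<String>, new: &str) -> (String, String) {
//!     let mut cache = Cache::new(shared);
//!     let first = cache.load().as_str().to_owned();
//!     shared.store(Arc::new(new.to_owned()));
//!     // The cache notices that the shared value changed and refreshes itself.
//!     let second = cache.load().as_str().to_owned();
//!     (first, second)
//! }
//!
//! fn main() {
//!     let shared = ArcSwap::from_pointee(String::from("v1"));
//!     let job = start_job(&shared);
//!     let (first, second) = read_with_cache(&shared, "v2");
//!     assert_eq!(first, "v1");
//!     assert_eq!(second, "v2");
//!     // The job still owns the snapshot it started with.
//!     assert_eq!(job.config.as_str(), "v1");
//! }
//! ```
//!
//! Note that a cache keeps its last loaded value alive until its next `load`, which is the
//! "older values alive for longer" cost mentioned above.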