1# Introduction: the sleep module 2 3The code in this module governs when worker threads should go to 4sleep. The system used in this code was introduced in [Rayon RFC #5]. 5There is also a [video walkthrough] available. Both of those may be 6valuable resources to understanding the code, though naturally they 7will also grow stale over time. The comments in this file are 8extracted from the RFC and meant to be kept up to date. 9 10[Rayon RFC #5]: https://github.com/rayon-rs/rfcs/pull/5 11[video walkthrough]: https://youtu.be/HvmQsE5M4cY 12 13# The `Sleep` struct 14 15The `Sleep` struct is embedded into each registry. It performs several functions: 16 17* It tracks when workers are awake or asleep. 18* It decides how long a worker should look for work before it goes to sleep, 19 via a callback that is invoked periodically from the worker's search loop. 20* It is notified when latches are set, jobs are published, or other 21 events occur, and it will go and wake the appropriate threads if 22 they are sleeping. 23 24# Thread states 25 26There are three main thread states: 27 28* An **active** thread is one that is actively executing a job. 29* An **idle** thread is one that is searching for work to do. It will be 30 trying to steal work or pop work from the global injector queue. 31* A **sleeping** thread is one that is blocked on a condition variable, 32 waiting to be awoken. 33 34We sometimes refer to the final two states collectively as **inactive**. 35Threads begin as idle but transition to idle and finally sleeping when 36they're unable to find work to do. 37 38## Sleepy threads 39 40There is one other special state worth mentioning. During the idle state, 41threads can get **sleepy**. A sleepy thread is still idle, in that it is still 42searching for work, but it is *about* to go to sleep after it does one more 43search (or some other number, potentially). When a thread enters the sleepy 44state, it signals (via the **jobs event counter**, described below) that it is 45about to go to sleep. If new work is published, this will lead to the counter 46being adjusted. When the thread actually goes to sleep, it will (hopefully, but 47not guaranteed) see that the counter has changed and elect not to sleep, but 48instead to search again. See the section on the **jobs event counter** for more 49details. 50 51# The counters 52 53One of the key structs in the sleep module is `AtomicCounters`, found in 54`counters.rs`. It packs three counters into one atomically managed value: 55 56* Two **thread counters**, which track the number of threads in a particular state. 57* The **jobs event counter**, which is used to signal when new work is available. 58 It (sort of) tracks the number of jobs posted, but not quite, and it can rollover. 59 60## Thread counters 61 62There are two thread counters, one that tracks **inactive** threads and one that 63tracks **sleeping** threads. From this, one can deduce the number of threads 64that are idle by subtracting sleeping threads from inactive threads. We track 65the counters in this way because it permits simpler atomic operations. One can 66increment the number of sleeping threads (and thus decrease the number of idle 67threads) simply by doing one atomic increment, for example. Similarly, one can 68decrease the number of sleeping threads (and increase the number of idle 69threads) through one atomic decrement. 70 71These counters are adjusted as follows: 72 73* When a thread enters the idle state: increment the inactive thread counter. 74* When a thread enters the sleeping state: increment the sleeping thread counter. 75* When a thread awakens a sleeping thread: decrement the sleeping thread counter. 76 * Subtle point: the thread that *awakens* the sleeping thread decrements the 77 counter, not the thread that is *sleeping*. This is because there is a delay 78 between siganling a thread to wake and the thread actually waking: 79 decrementing the counter when awakening the thread means that other threads 80 that may be posting work will see the up-to-date value that much faster. 81* When a thread finds work, exiting the idle state: decrement the inactive 82 thread counter. 83 84## Jobs event counter 85 86The final counter is the **jobs event counter**. The role of this counter is to 87help sleepy threads detect when new work is posted in a lightweight fashion. In 88its simplest form, we would simply have a counter that gets incremented each 89time a new job is posted. This way, when a thread gets sleepy, it could read the 90counter, and then compare to see if the value has changed before it actually 91goes to sleep. But this [turns out to be too expensive] in practice, so we use a 92somewhat more complex scheme. 93 94[turns out to be too expensive]: https://github.com/rayon-rs/rayon/pull/746#issuecomment-624802747 95 96The idea is that the counter toggles between two states, depending on whether 97its value is even or odd (or, equivalently, on the value of its low bit): 98 99* Even -- If the low bit is zero, then it means that there has been no new work 100 since the last thread got sleepy. 101* Odd -- If the low bit is one, then it means that new work was posted since 102 the last thread got sleepy. 103 104### New work is posted 105 106When new work is posted, we check the value of the counter: if it is even, 107then we increment it by one, so that it becomes odd. 108 109### Worker thread gets sleepy 110 111When a worker thread gets sleepy, it will read the value of the counter. If the 112counter is odd, it will increment the counter so that it is even. Either way, it 113remembers the final value of the counter. The final value will be used later, 114when the thread is going to sleep. If at that time the counter has not changed, 115then we can assume no new jobs have been posted (though note the remote 116possibility of rollover, discussed in detail below). 117 118# Protocol for a worker thread to post work 119 120The full protocol for a thread to post work is as follows 121 122* If the work is posted into the injection queue, then execute a seq-cst fence (see below). 123* Load the counters, incrementing the JEC if it is even so that it is odd. 124* Check if there are idle threads available to handle this new job. If not, 125 and there are sleeping threads, then wake one or more threads. 126 127# Protocol for a worker thread to fall asleep 128 129The full protocol for a thread to fall asleep is as follows: 130 131* After completing all its jobs, the worker goes idle and begins to 132 search for work. As it searches, it counts "rounds". In each round, 133 it searches all other work threads' queues, plus the 'injector queue' for 134 work injected from the outside. If work is found in this search, the thread 135 becomes active again and hence restarts this protocol from the top. 136* After a certain number of rounds, the thread "gets sleepy" and executes `get_sleepy` 137 above, remembering the `final_value` of the JEC. It does one more search for work. 138* If no work is found, the thread atomically: 139 * Checks the JEC to see that it has not changed from `final_value`. 140 * If it has, then the thread goes back to searchin for work. We reset to 141 just before we got sleepy, so that we will do one more search 142 before attending to sleep again (rather than searching for many rounds). 143 * Increments the number of sleeping threads by 1. 144* The thread then executes a seq-cst fence operation (see below). 145* The thread then does one final check for injected jobs (see below). If any 146 are available, it returns to the 'pre-sleepy' state as if the JEC had changed. 147* The thread waits to be signaled. Once signaled, it returns to the idle state. 148 149# The jobs event counter and deadlock 150 151As described in the section on the JEC, the main concern around going to sleep 152is avoiding a race condition wherein: 153 154* Thread A looks for work, finds none. 155* Thread B posts work but sees no sleeping threads. 156* Thread A goes to sleep. 157 158The JEC protocol largely prevents this, but due to rollover, this prevention is 159not complete. It is possible -- if unlikely -- that enough activity occurs for 160Thread A to observe the same JEC value that it saw when getting sleepy. If the 161new work being published came from *inside* the thread-pool, then this race 162condition isn't too harmful. It means that we have fewer workers processing the 163work then we should, but we won't deadlock. This seems like an acceptable risk 164given that this is unlikely in practice. 165 166However, if the work was posted as an *external* job, that is a problem. In that 167case, it's possible that all of our workers could go to sleep, and the external 168job would never get processed. To prevent that, the sleeping protocol includes 169one final check to see if the injector queue is empty before fully falling 170asleep. Note that this final check occurs **after** the number of sleeping 171threads has been incremented. We are not concerned therefore with races against 172injections that occur after that increment, only before. 173 174Unfortunately, there is one rather subtle point concerning this final check: 175we wish to avoid the possibility that: 176 177* work is pushed into the injection queue by an outside thread X, 178* the sleepy thread S sees the JEC but it has rolled over and is equal 179* the sleepy thread S reads the injection queue but does not see the work posted by X. 180 181This is possible because the C++ memory model typically offers guarantees of the 182form "if you see the access A, then you must see those other accesses" -- but it 183doesn't guarantee that you will see the access A (i.e., if you think of 184processors with independent caches, you may be operating on very out of date 185cache state). 186 187## Using seq-cst fences to prevent deadlock 188 189To overcome this problem, we have inserted two sequentially consistent fence 190operations into the protocols above: 191 192* One fence occurs after work is posted into the injection queue, but before the 193 counters are read (including the number of sleeping threads). 194 * Note that no fence is needed for work posted to internal queues, since it is ok 195 to overlook work in that case. 196* One fence occurs after the number of sleeping threads is incremented, but 197 before the injection queue is read. 198 199### Proof sketch 200 201What follows is a "proof sketch" that the protocol is deadlock free. We model 202two relevant bits of memory, the job injector queue J and the atomic counters C. 203 204Consider the actions of the injecting thread: 205 206* PushJob: Job is injected, which can be modeled as an atomic write to J with release semantics. 207* PushFence: A sequentially consistent fence is executed. 208* ReadSleepers: The counters C are read (they may also be incremented, but we just consider the read that comes first). 209 210Meanwhile, the sleepy thread does the following: 211 212* IncSleepers: The number of sleeping threads is incremented, which is atomic exchange to C. 213* SleepFence: A sequentially consistent fence is executed. 214* ReadJob: We look to see if the queue is empty, which is a read of J with acquire semantics. 215 216Either PushFence or SleepFence must come first: 217 218* If PushFence comes first, then PushJob must be visible to ReadJob. 219* If SleepFence comes first, then IncSleepers is visible to ReadSleepers.