# Rayon FAQ

This file is for general questions that don't fit into the README or
crate docs.

## How many threads will Rayon spawn?

By default, Rayon uses the same number of threads as the number of
CPUs available. Note that on systems with hyperthreading enabled this
equals the number of logical cores and not the physical ones.

If you want to alter the number of threads spawned, you can set the
environment variable `RAYON_NUM_THREADS` to the desired number of
threads, or use the
[`ThreadPoolBuilder::build_global`](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global)
method.
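
For example, here is a minimal sketch of configuring the global pool
(this must happen early, before the global pool is first used;
`current_num_threads` reports the resulting size):

```rust
use rayon::ThreadPoolBuilder;

fn main() {
    // Configure the global thread pool to use four threads. This can
    // only be done once, before the global pool is first used.
    ThreadPoolBuilder::new()
        .num_threads(4)
        .build_global()
        .expect("global thread pool already initialized");

    println!("pool size: {}", rayon::current_num_threads());
}
```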

## How does Rayon balance work between threads?

Behind the scenes, Rayon uses a technique called **work stealing** to
try to dynamically ascertain how much parallelism is available and
exploit it. The idea is very simple: we always have a pool of worker
threads available, waiting for some work to do. When you call `join`
the first time, we shift over into that pool of threads. But if you
call `join(a, b)` from a worker thread W, then W will place `b` into
its work queue, advertising that this is work that other worker
threads might help out with. W will then start executing `a`.

While W is busy with `a`, other threads might come along and take `b`
from its queue. That is called *stealing* `b`. Once `a` is done, W
checks whether `b` was stolen by another thread and, if not, executes
`b` itself. If W runs out of jobs in its own queue, it will look
through the other threads' queues and try to steal work from them.
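
As a concrete illustration, here is a hedged sketch of a recursive
sum written with `join` (the function name and the size-one cutoff
are just for illustration; real code would use a much larger
sequential cutoff, or a parallel iterator):

```rust
fn sum(slice: &[i32]) -> i32 {
    if slice.len() <= 1 {
        return slice.iter().sum();
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    // The current worker starts on the first closure; the second sits
    // in its queue, where an idle worker may steal it.
    let (a, b) = rayon::join(|| sum(left), || sum(right));
    a + b
}
```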

This technique is not new. It was first introduced by the
[Cilk project][cilk], done at MIT in the late nineties. The name Rayon
is an homage to that work.

[cilk]: http://supertech.csail.mit.edu/cilk/

## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types?

There are a number of non-threadsafe types in the Rust standard library,
and if your code is using them, you will not be able to combine it
with Rayon. Similarly, even if you don't have such types, but you try
to have multiple closures mutating the same state, you will get
compilation errors; for example, this function won't work, because
both closures access `slice`:

```rust
/// Increment all values in slice.
fn increment_all(slice: &mut [i32]) {
    // ERROR: cannot borrow `slice` mutably in both closures at once.
    rayon::join(|| process(slice), || process(slice));
}
```

The correct way to resolve such errors will depend on the case.  Some
cases are easy: for example, uses of [`Rc`] can typically be replaced
with [`Arc`], which is basically equivalent, but thread-safe.

[`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
[`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
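
For instance, a hedged sketch (the function and data here are
hypothetical) of sharing read-only data across a `join`, where each
closure gets its own cloned `Arc` handle, just as it would get a
cloned `Rc` handle in sequential code:

```rust
use std::sync::Arc;

fn process_shared(data: Arc<Vec<i32>>) {
    // Cloning the `Arc` only bumps a reference count; both handles
    // point at the same shared, immutable vector.
    let data2 = Arc::clone(&data);
    rayon::join(
        || println!("sum: {}", data.iter().sum::<i32>()),
        || println!("max: {:?}", data2.iter().max()),
    );
}
```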

Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated.
If you can refactor your code to avoid those types, that is often the best way
forward, but otherwise, you can try to replace those types with their threadsafe
equivalents:

- `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc.
- `RefCell` -- replacement: `RwLock`, or perhaps `Mutex`

However, you have to be wary! The parallel versions of these types
have different atomicity guarantees. For example, with a `Cell`, you
can increment a counter like so:

```rust
let value = counter.get();
counter.set(value + 1);
```

But when you use the equivalent `AtomicUsize` methods, you are
actually introducing a potential race condition (not a data race,
technically, but it can be an awfully fine distinction):

```rust
let value = tscounter.load(Ordering::SeqCst);
tscounter.store(value + 1, Ordering::SeqCst);
```

You can already see that the `AtomicUsize` API is a bit more complex,
as it requires you to specify an
[ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I
won't go into the details on ordering here, but suffice it to say that if
you don't know what an ordering is, and probably even if you do, you
should use `Ordering::SeqCst`.) The danger in this parallel version of
the counter is that other threads might be running at the same time
and they could cause our counter to get out of sync. For example, if
we have two threads, then they might both execute the "load" before
either has a chance to execute the "store":
```
Thread 1                                          Thread 2
let value = tscounter.load(Ordering::SeqCst);
// value = X                                      let value = tscounter.load(Ordering::SeqCst);
                                                  // value = X
tscounter.store(value+1, Ordering::SeqCst);       tscounter.store(value+1, Ordering::SeqCst);
// tscounter = X+1                                // tscounter = X+1
```

Now even though we've had two increments, we'll only increase the
counter by one!  Even though we've got no data race, this is still
probably not the result we wanted. The problem here is that the `Cell`
API doesn't make clear the scope of a "transaction" -- that is, the
set of reads/writes that should occur atomically. In this case, we
probably wanted the get/set to occur together.

In fact, when using the `Atomic` types, you very rarely want a plain
`load` or plain `store`. You probably want the more complex
operations. A counter, for example, would use `fetch_add` to
atomically load and increment the value in one step. Compare-and-swap
is another popular building block.
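
As a hedged sketch of both patterns (standalone, not tied to any of
the examples above):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn demo() {
    let counter = AtomicUsize::new(0);

    // `fetch_add` loads and increments in one indivisible step, so a
    // concurrent increment can no longer be lost.
    counter.fetch_add(1, Ordering::SeqCst);

    // Compare-and-swap style update: retry until we successfully swap
    // in a value computed from the value we actually observed.
    let mut current = counter.load(Ordering::SeqCst);
    while let Err(actual) = counter.compare_exchange(
        current,
        current * 2,
        Ordering::SeqCst,
        Ordering::SeqCst,
    ) {
        current = actual;
    }
}
```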

A similar problem can arise when converting `RefCell` to `RwLock`, but
it is somewhat less likely, because the `RefCell` API does in fact
have a notion of a transaction: the scope of the handle returned by
`borrow` or `borrow_mut`. So if you convert each call to `borrow` to
`read` (and `borrow_mut` to `write`), things will mostly work fine in
a parallel setting, but there can still be changes in behavior.
Consider a `handle: RefCell<Vec<i32>>` used like this:

```rust
let len = handle.borrow().len();
for i in 0 .. len {
    let data = handle.borrow()[i];
    println!("{}", data);
}
```

In sequential code, we know that this loop is safe. But if we convert
it to parallel code with an `RwLock`, we no longer do: another thread
could come along and do `handle.write().unwrap().pop()`, and thus
change the length of the vector. In fact, even in *sequential* code,
using very small borrow sections like this is an anti-pattern: you
ought to enclose the entire transaction in a single borrow, like so:

```rust
let vec = handle.borrow();
let len = vec.len();
for i in 0 .. len {
    let data = vec[i];
    println!("{}", data);
}
```

Or, even better, using an iterator instead of indexing:

```rust
let vec = handle.borrow();
for data in vec.iter() {
    println!("{}", data);
}
```

There are several reasons to prefer one borrow over many. The most
obvious is that it is more efficient, since each borrow has to perform
some safety checks. But it's also more reliable: suppose we modified
the loop above to not just print things out, but also call into a
helper function:

```rust
let vec = handle.borrow();
for data in vec.iter() {
    helper(...);
}
```

And now suppose that, independently, this helper function evolved and
had to pop something off of the vector:

```rust
fn helper(...) {
    handle.borrow_mut().pop();
}
```

Under the old model, where we did lots of small borrows, this would
yield precisely the same error that we saw in parallel land using an
`RwLock`: the length would be out of sync and our indexing would fail
(note that in neither case would there be an actual *data race* and
hence there would never be undefined behavior). But now that we use a
single borrow, we'll see a borrow error instead, which is much easier
to diagnose, since it occurs at the point of the `borrow_mut`, rather
than downstream. Similarly, if we move to an `RwLock`, we'll find that
the code either deadlocks (if the write is on the same thread as the
read) or, if the write is on another thread, works just fine. Both of
these are preferable to random failures in my experience.
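
For completeness, a hedged sketch of the single-transaction version
after converting to a lock (assuming `handle: RwLock<Vec<i32>>`):

```rust
use std::sync::RwLock;

fn print_all(handle: &RwLock<Vec<i32>>) {
    // One read guard for the whole traversal: no writer can pop
    // elements out from under us until `vec` is dropped.
    let vec = handle.read().unwrap();
    for data in vec.iter() {
        println!("{}", data);
    }
}
```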

## But wait, isn't Rust supposed to free me from this kind of thinking?

You might think that using Rust means you don't have to think about
atomicity at all. In fact, if you avoid interior mutability (`Cell`
and `RefCell` in a sequential setting, or `AtomicUsize`, `RwLock`,
`Mutex`, et al. in parallel code), then this is true: the type system
will basically guarantee it for you. But there are times when you WANT
threads to interleave in the ways I showed above.

Consider for example when you are conducting a search in parallel, say
to find the shortest route. To avoid fruitless search, you might want
to keep a cell with the shortest route you've found thus far.  This
way, when you are searching down some path that's already longer than
this shortest route, you can just stop and avoid wasted effort. In
sequential land, you might model this "best result" as a shared value
like `Rc<Cell<usize>>` (here the `usize` represents the length of the
best path found so far); in parallel land, you'd use an
`Arc<AtomicUsize>`. Now we can make our search function look like:

```rust
fn search(path: &Path, cost_so_far: usize, best_cost: &Arc<AtomicUsize>) {
    if cost_so_far >= best_cost.load(Ordering::SeqCst) {
        return;
    }
    ...
    best_cost.store(..., Ordering::SeqCst);
}
```

Now in this case, we really WANT to see results from other threads
interjected into our execution!
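
One way to publish an improved cost without losing updates from other
threads is an atomic read-modify-write. A hedged sketch (the helper is
hypothetical, not part of the example above) using `fetch_min`, which
atomically stores the smaller of the current value and ours:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn update_best(best_cost: &AtomicUsize, new_cost: usize) {
    // Atomically replaces the stored value with min(stored, new_cost),
    // so two threads publishing different costs can never overwrite a
    // better (smaller) result.
    best_cost.fetch_min(new_cost, Ordering::SeqCst);
}
```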
228