• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1Mechanisms for Coordination Between Garbage Collector and Mutator
2-----------------------------------------------------------------
3
4Most garbage collection work can proceed concurrently with the client or
5mutator Java threads. But in certain places, for example while tracing from
6thread stacks, the garbage collector needs to ensure that Java data processed
7by the collector is consistent and complete. At these points, the mutators
8should not hold references to the heap that are invisible to the garbage
9collector. And they should not be modifying the data that is visible to the
10collector.
11
12Logically, the collector and mutator share a reader-writer lock on the Java
13heap and associated data structures. Mutators hold the lock in reader or shared mode
14while running Java code or touching heap-related data structures. The collector
15holds the lock in writer or exclusive mode while it needs the heap data
16structures to be stable. However, this reader-writer lock has a very customized
17implementation that also provides additional facilities, such as the ability
18to exclude only a single thread, so that we can specifically examine its heap
19references.
20
21In order to ensure consistency of the Java data, the compiler inserts "suspend
22points", sometimes also called "safe points" into the code. These allow a thread
23to respond to external requests.
24
25Whenever a thread is runnable, i.e. whenever a thread logically holds the
26mutator lock in shared mode, it is expected to regularly execute such a suspend
27point, and check for pending requests. They are currently implemented by
28setting a flag in the thread structure[^1], which is then explicitly tested by the
29compiler-generated code.
30
31A thread responds to suspend requests only when it is "runnable", i.e. logically
32running Java code. When it runs native code, or is blocked in a kernel call, it
33logically releases the mutator lock. When the garbage collector needs mutator
34cooperation, and the thread is not runnable, it is assured that the mutator is
35not touching Java data, and hence the collector can safely perform the required
36action itself, on the mutator thread's behalf.
37
38Normally, when a thread makes a JNI call, it is not considered runnable while
39executing native code. This makes the transitions to and from running native JNI
40code somewhat expensive (see below). But these transitions are necessary to
41ensure that such code, which does not execute "suspend points", and can thus not
42cooperate with the GC, doesn't delay GC completion. `@FastNative` and
43`@CriticalNative` calls avoid these transitions, instead allowing the thread to
44remain "runnable", at the expense of potentially delaying GC operations for the
45duration of the call.
46
47Although we say that a thread is "suspended" when it is not running Java code,
48it may in fact still be running native code and touching data structures that
49are not considered "Java data". This distinction can be a fine line. For
50example, a Java thread blocked on a Java monitor will normally be "suspended"
51and blocked on a mutex contained in the monitor data structure. But it may wake
52up for reasons beyond ARTs control, which will normally result in touching the
53mutex. The monitor code must be quite careful to ensure that this does not cause
54problems, especially if the ART runtime was shut down in the interim and the
55monitor data structure has been reclaimed.
56
57Calls to change thread state
58----------------------------
59
60When a thread changes between running Java and native code, it has to
61correspondingly change its state between "runnable" and one of several
62other states, all of which are considered to be "suspended" for our purposes.
63When a Java thread starts to execute native code, and may thus not respond
64promptly to suspend requests, it will normally create an object of type
65`ScopedThreadSuspension`. `ScopedThreadSuspension`'s constructor changes state to
66the "suspended" state given as an argument, logically releasing the mutator lock
67and promising to no longer touch Java data structures. It also handles any
68pending suspension requests that slid in just before it changed state.
69
70Conversely, `ScopedThreadSuspension`'s destructor waits until the GC has finished
71any actions it is currently performing on the thread's behalf and effectively
72released the mutator exclusive lock, and then returns to runnable state,
73re-acquiring the mutator lock.
74
75Occasionally a thread running native code needs to temporarily again access Java
76data structures, performing the above transitions in the opposite order.
77`ScopedObjectAccess` is a similar RAII object whose constructor and destructor
78perform those transitions in the reverse order from `ScopedThreadSuspension`.
79
80Mutator lock implementation
81---------------------------
82
83The mutator lock is not implemented as a conventional mutex. But it plays by the
84rules of our normal static thread-safety analysis. Thus a function that is
85expected to be called in runnable state, with the ability to access Java data,
86should be annotated with `REQUIRES_SHARED(Locks::mutator_lock_)`.
87
88There is an explicit `mutator_lock_` object, of type `MutatorMutex`. `MutatorMutex` is
89seemingly a minor refinement of `ReaderWriterMutex`, but it is used entirely
90differently. It is acquired explicitly by clients that need to hold it
91exclusively, and in a small number of cases, it is acquired in shared mode, e.g.
92via `SharedTryLock()`, or by the GC itself. However, more commonly
93`MutatorMutex::TransitionFromSuspendedToRunnable()`, is used to logically acquire
94the mutator mutex, e.g. as part of `ScopedObjectAccess` construction.
95
96`TransitionFromSuspendedToRunnable()` does not physically acquire the
97`ReaderWriterMutex` in shared mode. Thus any thread acquiring the lock in exclusive mode
98must, in addition, explicitly arrange for mutator threads to be suspended via the
99thread suspension mechanism, and then make them runnable again on release.
100
101Logically the mutator lock is held in shared/reader mode if ***either*** the
102underlying reader-writer lock is held in shared mode, ***or*** if a mutator is in
103runnable state. These two ways of holding the mutator mutex are ***not***
104equivalent: In particular, we rely on the garbage collector never actually
105entering a "runnable" state while active (see below). However, it often runs with
106the explicit mutator mutex in shared mode, thus blocking others from acquiring it
107in exclusive mode.
108
109Suspension and checkpoint API
110-----------------------------
111
112Suspend point checks enable three kinds of communication with mutator threads:
113
114**Checkpoints**
115: Checkpoint requests are used to get a thread to perform an action
116on our behalf. `RequestCheckpoint()` asks a specific thread to execute the closure
117supplied as an argument at its leisure. `RequestSynchronousCheckpoint()` in
118addition waits for the thread to complete running the closure, and handles
119suspended threads by running the closure on their behalf. In addition to these
120functions provided by `Thread`, `ThreadList` provides the `RunCheckpoint()` function
121that runs a checkpoint function on behalf of each thread, either by using
122`RequestCheckpoint()` to run it inside a running thread, or by ensuring that a
123suspended thread stays suspended, and then running the function on its behalf.
124`RunCheckpoint()` does not wait for completion of the function calls triggered by
125the resulting `RequestCheckpoint()` invocations.
126
127**Empty checkpoints**
128: ThreadList provides `RunEmptyCheckpoint()`, which waits until
129all threads have either passed a suspend point, or have been suspended. This
130ensures that no thread is still executing Java code inside the same
131suspend-point-delimited code interval it was executing before the call. For
132example, a read-barrier started before a `RunEmptyCheckpoint()` call will have
133finished before the call returns.
134
135**Thread suspension**
136: ThreadList provides a number of `SuspendThread...()` calls and
137a `SuspendAll()` call to suspend one or all threads until they are resumed by
138`Resume()` or `ResumeAll()`. The `Suspend...` calls guarantee that the target
139thread(s) are suspended (again, only in the sense of not running Java code)
140when the call returns.
141
142Deadlock freedom
143----------------
144
145It is easy to deadlock while attempting to run checkpoints, or suspending
146threads. In particular, we need to avoid situations in which we cannot suspend
147a thread because it is blocked, directly, or indirectly, on the GC completing
148its task. Deadlocks are avoided as follows:
149
150**Mutator lock ordering**
151The mutator lock participates in the normal ART lock ordering hierarchy, as though it
152were a regular lock. See `base/locks.h` for the hierarchy. In particular, only
153locks at or below level `kPostMutatorTopLockLevel` may be acquired after
154acquiring the mutator lock, e.g. inside the scope of a `ScopedObjectAccess`.
155Similarly only locks at level strictly above `kMutatatorLock` may be held while
156acquiring the mutator lock, e.g. either by starting a `ScopedObjectAccess`, or
157ending a `ScopedThreadSuspension`.
158
159This ensures that code that uses purely mutexes and threads state changes cannot
160deadlock: Since we always wait on a lower-level lock, the holder of the
161lowest-level lock can always progress. An attempt to initiate a checkpoint or to
162suspend another thread must also be treated as an acquisition of the mutator
163lock: A thread that is waiting for a lock before it can respond to the request
164is itself holding the mutator lock, and can only be blocked on lower-level
165locks. And acquisition of those can never depend on acquiring the mutator
166lock.
167
168**Checkpoints**
169Running a checkpoint in a thread requires suspending that thread for the
170duration of the checkpoint, or running the checkpoint on the threads behalf
171while that thread is blocked from executing Java code. In the former case, the
172checkpoint code is run from `CheckSuspend`, which requires the mutator lock,
173so checkpoint code may only acquire mutexes at or below level
174`kPostMutatorTopLockLevel`. But that is not sufficient.
175
176No matter whether the checkpoint is run in the target thread, or on its behalf,
177the target thread is effectively suspended and prevented from running Java code.
178However the target may hold arbitrary Java monitors, which it can no longer
179release. This may also prevent higher level mutexes from getting released.  Thus
180checkpoint code should only acquire mutexes at level `kPostMonitorLock` or
181below.
182
183
184**Waiting**
185This becomes much more problematic when we wait for something other than a lock.
186Waiting for something that may depend on the GC, while holding the mutator lock,
187can potentially lead to deadlock, since it will prevent the waiting thread from
188participating in GC checkpoints. Waiting while holding a lower-level lock like
189`thread_list_lock_` is similarly unsafe in general, since a runnable thread may
190not respond to checkpoints until it acquires `thread_list_lock_`. In general,
191waiting for a condition variable while holding an unrelated lock is problematic,
192and these are specific instances of that general problem.
193
194We do currently provide `WaitHoldingLocks`, and it is sometimes used with
195low-level locks held. But such code must somehow ensure that such waits
196eventually terminate without deadlock.
197
198One common use of WaitHoldingLocks is to wait for weak reference processing.
199Special rules apply to avoid deadlocks in this case: Such waits must start after
200weak reference processing is disabled; the GC may not issue further nonempty
201checkpoints or suspend requests until weak reference processing has been
202reenabled, and threads have been notified. Thus the waiting thread's inability
203to respond to nonempty checkpoints and suspend requests cannot directly block
204the GC. Non-GC checkpoint or suspend requests that target a thread waiting on
205reference processing will block until reference processing completes.
206
207Consider a case in which thread W1 waits on reference processing, while holding
208a low-level mutex M. Thread W2 holds the mutator lock and waits on M. We avoid a
209situation in which the GC needs to suspend or checkpoint W2 by briefly stopping
210the world to disable weak reference access. During the stop-the-world phase, W1
211cannot yet be waiting for weak-reference access.  Thus there is no danger of
212deadlock while entering this phase. After this phase, there is no need for W2 to
213suspend or execute a nonempty checkpoint. If we replaced the stop-the-world
214phase by a checkpoint, W2 could receive the checkpoint request too late, and be
215unable to respond.
216
217Empty checkpoints can continue to occur during reference processing.  Reference
218processing wait loops explicitly handle empty checkpoints, and an empty
219checkpoint request notifies the condition variable used to wait for reference
220processing, after acquiring `reference_processor_lock_`.  This means that empty
221checkpoints do not preclude client threads from being in the middle of an
222operation that involves a weak reference access, while nonempty checkpoints do.
223
224**Suspending the GC**
225Under unusual conditions, the GC can run on any thread. This means that when
226thread *A* suspends thread *B* for some other reason, Thread *B* might be
227running the garbage collector and conceivably thus cause it to block.  This
228would be very deadlock prone. If Thread *A* allocates while Thread *B* is
229suspended in the GC, and the allocation requires the GC's help to complete, we
230deadlock.
231
232Thus we ensure that the GC, together with anything else that can block GCs,
233cannot be blocked for thread suspension requests. This is accomplished by
234ensuring that it always appears to be in a suspended thread state. Since we
235only check for suspend requests when entering the runnable state, suspend
236requests go unnoticed until the GC completes. It may physically acquire and
237release the actual `mutator_lock_` in either shared or exclusive mode.
238
239Thread Suspension Mechanics
240---------------------------
241
242Thread suspension is initiated by a registered thread, except that, for testing
243purposes, `SuspendAll` may be invoked with `self == nullptr`.  We never suspend
244the initiating thread, explicitly exclusing it from `SuspendAll()`, and failing
245`SuspendThreadBy...()` requests to that effect.
246
247The suspend calls invoke `IncrementSuspendCount()` to increment the thread
248suspend count for each thread. That adds a "suspend barrier" (atomic counter) to
249the per-thread list of such counters to decrement. It normally sets the
250`kSuspendRequest` ("should enter safepoint handler") and `kActiveSuspendBarrier`
251("need to notify us when suspended") flags.
252
253After setting these two flags, we check whether the thread is suspended and
254`kSuspendRequest` is still set. Since the thread is already suspended, it cannot
255be expected to respond to "pass the suspend barrier" (decrement the atomic
256counter) in a timely fashion.  Hence we do so on its behalf. This decrements
257the "barrier" and removes it from the thread's list of barriers to decrement,
258and clears `kActiveSuspendBarrier`. `kSuspendRequest` remains to ensure the
259thread doesn't prematurely return to runnable state.
260
261If `SuspendAllInternal()` does not immediately see a suspended state, then it is up
262to the target thread to decrement the suspend barrier.
263`TransitionFromRunnableToSuspended()` calls
264`TransitionToSuspendedAndRunCheckpoints()`, which changes the thread state
265and returns. `TransitionFromRunnableToSuspended()` then calls
266`CheckActiveSuspendBarriers()` to check for the `kActiveSuspendBarrier` flag
267and decrement the suspend barrier if set.
268
269The `suspend_count_lock_` is not consistently held in the target thread
270during this process.  Thus correctness in resolving the race between a
271suspension-requesting thread and a target thread voluntarily suspending relies
272on first requesting suspension, and then checking whether the target is
273already suspended, The detailed correctness argument is given in a comment
274inside `SuspendAllInternal()`. This also ensures that the barrier cannot be
275decremented after the stack frame holding the barrier goes away.
276
277This relies on the fact that the two stores in the two threads to the state and
278kActiveSuspendBarrier flag are ordered with respect to the later loads. That's
279guaranteed, since they are all stored in a single `atomic<>`. Thus even relaxed
280accesses are OK.
281
282The actual suspend barrier representation still varies between `SuspendAll()`
283and `SuspendThreadBy...()`.  The former relies on the fact that only one such
284barrier can be in use at a time, while the latter maintains a linked list of
285active suspend barriers for each target thread, relying on the fact that each
286one can appear on the list of only one thread, and we can thus use list nodes
287allocated in the stack frames of requesting threads.
288
289**Avoiding suspension cycles**
290
291Any thread can issue a `SuspendThreadByPeer()`, `SuspendThreadByThreadId()` or
292`SuspendAll()` request. But if Thread A increments Thread B's suspend count
293while Thread B increments Thread A's suspend count, and they then both suspend
294during a subsequent thread transition, we're deadlocked.
295
296For single-thread suspension requests, we refuse to initiate
297a suspend request from a registered thread that is also being asked to suspend
298(i.e. the suspend count is nonzero).  Instead the requestor waits for that
299condition to change.  This means that we cannot create a cycle in which each
300thread has asked to suspend the next one, and thus no thread can progress.  The
301required atomicity of the requestor suspend count check with setting the suspend
302count of the target(s) target is ensured by holding `suspend_count_lock_`.
303
304For `SuspendAll()`, we enforce a requirement that at most one `SuspendAll()`
305request is running at one time. We also set the `kSuspensionImmune` thread flag
306to prevent a single thread suspension of a thread currently between
307`SuspendAll()` and `ResumeAll()` calls. Thus once a `SuspendAll()` call starts,
308it will complete before it can be affected by suspension requests from other
309threads.
310
311
312Thread state transitions, flags, and memory ordering
313----------------------------------------------------
314
315Logically when a state transitions to state `kRunnable`, it acquires the mutator
316lock (in shared mode). When it changes its state back to suspended, it releases
317that lock. These must enforce proper happens-before ordering with respect to a
318thread acquiring the mutator lock in exclusive mode. Thus changing the thread
319state to suspended normally requires a release operation, to ensure visibility
320to a thread wishing to "suspend". Conversely, when we change the state back to
321`kRunnable` in `TransitionFromSuspendedToRunnable` we do so with an acquire
322compare-exchange operation to check that no suspension request remains.
323
324Any suspending thread should clear `kSuspendRequest` with a release
325operation. This, together with the acquire compare-exchange when returning to
326runnable state, ensure that suspend-count decrements happen-before a thread
327becomes runnable again.
328
329Flags are often set and tested with relaxed ordering, even when they may
330communicate a request to another thread. The reason this is safe varies.
331Generally the request is conformed by some other better synchronized
332interaction, which establishes the necessary ordering. In particular:
333
334`kCheckPointRequest`
335: See below. The call happens-before checkpoint execution since
336`checkpoint_function` accesses are guarded by a lock, as are updates to the
337flag. Checkpoint completion ordering is ensured by waiting for both
338`ThreadList::RunCheckpoint()` to complete, and for a barrier indicating
339completion in "runnable" threads.
340
341`kEmptyCheckpointRequest`
342: Currently (12/2024) in flux. See below and b/382722942 . We currently use
343acquire/release ordering to access this flag in some places to partially
344mitigate memory ordering issues here.
345
346`kActiveSuspendBarrier`
347: Changes are protected by `thread_suspend_count_lock_`, and
348`PassActiveSuspendBarriers` rechecks it while holding that lock.
349
350`kPendingFlipFunction`
351: Set while holding `thread_list_lock_`, but this lock is not consistently held
352while checking the flag, notably in `EnsureFlipFunctionStarted()`. We need
353acquire/release ordering to make sure that the requestor happens-before flip
354function execution, and the flip function is properly visible, among other
355things. We still commonly read the flag using a relaxed operation, but we
356confirm it with an acquire compare-exchange operation in
357`EnsureFlipFunctionStarted()` before actiong on it.
358
359`kRunningFlipFunction`
360: Cleared with release ordering, and read with acquire ordering by
361`WaitForFlipFunction` and its callers, thus ensuring that flip function
362execution happens-before completion of WaitForFlipFunction.
363
364`kSuspensionImmune`
365: Guarded by `thread_suspend_count_lock_`.
366
367### Checkpoint-specific considerations
368
369Checkpoints expose additional tricky issues, since they are also used to ensure
370that certain global state changes have propagated to all running Java threads.
371
372`ThreadList::RunCheckpoint()` guarantees that "if at point _X_ `RunCheckpoint()`
373has returned, and all checkpoints have been properly observed to have completed,
374then every thread has executed a code sequence _S_ during which it remained in
375a suspended state, such that the call to `RunCheckpoint` happens-before the end
376of _S_, and the beginning of _S_ happened before _X_."
377
378For each thread _T_, we attempt to atomically install the `kCheckpointRequest`
379flag while ensuring that _T_ is runnable.  If this succeeds, the fact that
380`tlsPtr_.checkpoint_function` is protected by `thread_suspend_count_lock_`, and
381`checkpoint_function` is written by the requestor, and read by _T_ at a suspend
382point, ensures that the call happens-before _S_. Normally a barrier ensures that
383_S_ happens-before _X_ .
384
385If we are unable to do so for a thread _T_ because we find it suspended, we run
386`checkpoint_function` ourselves. Before running it we set `kSuspendRequest`
387(with a release store) while _T_ is in _S_, preventing _T_ from terminating _S_
388by becoming runnable, after an acquire load of the flags. This ensures that the
389`RunCheckpoint()` call happens-before the end of _S_. Since we learned of _T_'s
390suspended state via an acquire load of its state, which it stored with a release
391store, we know that the beginning of _S_ happens-before we return from
392`RunCheckpoint()`, and hence before _X_ .
393
394The case of empty checkpoints is worth highlighting. We have traditionally
395optimized that by avoiding ever suspending the affected threads. This appears
396correct if all operations are sequentially consistent. But once memory model
397issues are considered, it appears more expensive to do fully correctly than it
398is to forego the optimization, as we explain below:
399
400We need to ensure that if Thread _A_ performs some update _U_ and then calls
401`RunEmptyCheckpoint()` then, when `RunEmptyCheckpoint()` returns, every Thread
402_B_ will have passed a suspend point at which _U_ became guaranteed visible to
403that thread. This could be ensured in different ways, depending on the observed
404state of Thread _B_:
405
406Runnable
407: Use acquire/release ordering when setting and detecting the
408`kEmptyCheckpointRequest` flag, and then use a barrier to observe when thread _B_
409has done so. In this case, Thread _B_ actually executes code at a suspend point.
410The acquire release ordering on `kEmptyCheckpointRequest` ensures that _U_
411happens-before the suspend point, and the barrier ensures that the suspend point
412happens-before the `RunEmptyCheckpoint()` return.
413
414Suspended
415: In this case, we have no mechanism for the thread to execute synchronization
416operations, and things become trickier. Thread _B_ is suspended, but it may
417restart at any time. Thread _A_ wants to conclude that Thread _B_ is already
418suspended at a suspend point, and thus it is safe to ignore Thread _B_.
419Effectively, it wants to perform the update _U_, read _B_'s state, and continue,
420while _B_ may change its state back to runnable, and then read the result of
421_U_. Both threads effectively perform a store (either _U_ or to the state of
422_B_) and then read the other value. With only acquire/release ordering, this can
423result in _A_ seeing the old suspended state, and _B_ failing to see the update
424_U_, which is an incorrect outcome.
425: The canonical fix for this kind of `store; load` reordering problem is to make
426all operations sequentially consistent. There does not appear to be an easy way
427to limit this overhead to just empty checkpoint execution. Thus it appears to be
428better to forego this "optimization".
429
430[^1]: In the most recent versions of ART, compiler-generated code loads through
431    the address at `tlsPtr_.suspend_trigger`. A thread suspension is requested
432    by setting this to null, triggering a `SIGSEGV`, causing that thread to
433    check for GC cooperation requests. The older mechanism instead sets an
434    appropriate `ThreadFlag` entry to request suspension or a checkpoint. Note
435    that the actual checkpoint function value is set, along with the flag, while
436    holding `suspend_count_lock_`. If the target thread notices that a
437    checkpoint is requested, it then acquires the `suspend_count_lock_` to read
438    the checkpoint function.
439