1// Copyright 2009 The Go Authors. All rights reserved.
2// Use of this source code is governed by a BSD-style
3// license that can be found in the LICENSE file.
4
5// Garbage collector (GC).
6//
7// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows multiple
8// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
9// non-generational and non-compacting. Allocation is done using size segregated per P allocation
10// areas to minimize fragmentation while eliminating locks in the common case.
11//
12// The algorithm decomposes into several steps.
13// This is a high level description of the algorithm being used. For an overview of GC a good
14// place to start is Richard Jones' gchandbook.org.
15//
16// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
17// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
18// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
19// 966-975.
20// For journal quality proofs that these steps are complete, correct, and terminate see
21// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
22// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
23//
24// 1. GC performs sweep termination.
25//
26//    a. Stop the world. This causes all Ps to reach a GC safe-point.
27//
28//    b. Sweep any unswept spans. There will only be unswept spans if
29//    this GC cycle was forced before the expected time.
30//
31// 2. GC performs the mark phase.
32//
33//    a. Prepare for the mark phase by setting gcphase to _GCmark
34//    (from _GCoff), enabling the write barrier, enabling mutator
35//    assists, and enqueueing root mark jobs. No objects may be
36//    scanned until all Ps have enabled the write barrier, which is
37//    accomplished using STW.
38//
39//    b. Start the world. From this point, GC work is done by mark
40//    workers started by the scheduler and by assists performed as
41//    part of allocation. The write barrier shades both the
42//    overwritten pointer and the new pointer value for any pointer
43//    writes (see mbarrier.go for details). Newly allocated objects
44//    are immediately marked black.
45//
46//    c. GC performs root marking jobs. This includes scanning all
47//    stacks, shading all globals, and shading any heap pointers in
48//    off-heap runtime data structures. Scanning a stack stops a
49//    goroutine, shades any pointers found on its stack, and then
50//    resumes the goroutine.
51//
52//    d. GC drains the work queue of grey objects, scanning each grey
53//    object to black and shading all pointers found in the object
54//    (which in turn may add those pointers to the work queue).
55//
56//    e. Because GC work is spread across local caches, GC uses a
57//    distributed termination algorithm to detect when there are no
58//    more root marking jobs or grey objects (see gcMarkDone). At this
59//    point, GC transitions to mark termination.
60//
61// 3. GC performs mark termination.
62//
63//    a. Stop the world.
64//
65//    b. Set gcphase to _GCmarktermination, and disable workers and
66//    assists.
67//
68//    c. Perform housekeeping like flushing mcaches.
69//
70// 4. GC performs the sweep phase.
71//
72//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
73//    setting up sweep state and disabling the write barrier.
74//
75//    b. Start the world. From this point on, newly allocated objects
76//    are white, and allocating sweeps spans before use if necessary.
77//
78//    c. GC does concurrent sweeping in the background and in response
79//    to allocation. See description below.
80//
81// 5. When sufficient allocation has taken place, replay the sequence
82// starting with 1 above. See discussion of GC rate below.
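//
// In terms of gcphase, one full cycle therefore looks roughly like:
//
//	_GCoff (concurrent sweep) --STW--> _GCmark (concurrent mark)
//	    --STW--> _GCmarktermination --> _GCoff (concurrent sweep) ...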
83
84// Concurrent sweep.
85//
86// The sweep phase proceeds concurrently with normal program execution.
87// The heap is swept span-by-span both lazily (when a goroutine needs another span)
88// and concurrently in a background goroutine (this helps programs that are not CPU bound).
89// At the end of STW mark termination all spans are marked as "needs sweeping".
90//
91// The background sweeper goroutine simply sweeps spans one-by-one.
92//
93// To avoid requesting more OS memory while there are unswept spans, when a
94// goroutine needs another span, it first attempts to reclaim that much memory
95// by sweeping. When a goroutine needs to allocate a new small-object span, it
96// sweeps small-object spans for the same object size until it frees at least
97// one object. When a goroutine needs to allocate a large-object span from the heap,
98// it sweeps spans until it frees at least that many pages into the heap. There is
99// one case where this may not suffice: if a goroutine sweeps and frees two
100// nonadjacent one-page spans to the heap, it will allocate a new two-page
101// span, but there can still be other one-page unswept spans which could be
102// combined into a two-page span.
103//
104// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
105// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
106// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
107// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
108// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
109// The finalizer goroutine is kicked off only when all spans are swept.
110// When the next GC starts, it sweeps all not-yet-swept spans (if any).
111
112// GC rate.
113// Next GC is after we've allocated an extra amount of memory proportional to
114// the amount already in use. The proportion is controlled by the GOGC environment variable
115// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
116// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
117// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
118// (and also the amount of extra memory used).
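//
// As a rough sketch (ignoring the pacer's additional terms for stacks,
// globals, and the memory limit), the goal scales linearly with GOGC:
//
//	goal ≈ live heap * (1 + GOGC/100)
//	// GOGC=100, 4M live -> ~8M goal
//	// GOGC=50,  4M live -> ~6M goal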
119
120// Oblets
121//
122// In order to prevent long pauses while scanning large objects and to
123// improve parallelism, the garbage collector breaks up scan jobs for
124// objects larger than maxObletBytes into "oblets" of at most
125// maxObletBytes. When scanning encounters the beginning of a large
126// object, it scans only the first oblet and enqueues the remaining
127// oblets as new scan jobs.
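//
// For example, assuming maxObletBytes is 128 KiB, a 512 KiB object would be
// scanned as four oblets: the first oblet immediately, with the remaining
// three enqueued as separate scan jobs that other workers may pick up.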
128
129package runtime
130
131import (
132	"internal/cpu"
133	"internal/runtime/atomic"
134	"unsafe"
135)
136
137const (
138	_DebugGC      = 0
139	_FinBlockSize = 4 * 1024
140
141	// concurrentSweep is a debug flag. Disabling this flag
142	// ensures all spans are swept while the world is stopped.
143	concurrentSweep = true
144
145	// debugScanConservative enables debug logging for stack
146	// frames that are scanned conservatively.
147	debugScanConservative = false
148
149	// sweepMinHeapDistance is a lower bound on the heap distance
150	// (in bytes) reserved for concurrent sweeping between GC
151	// cycles.
152	sweepMinHeapDistance = 1024 * 1024
153)
154
155// heapObjectsCanMove always returns false in the current garbage collector.
156// It exists for go4.org/unsafe/assume-no-moving-gc, which is an
157// unfortunate idea that had an even more unfortunate implementation.
158// Every time a new Go release happened, the package stopped building,
159// and the authors had to add a new file with a new //go:build line, and
160// then the entire ecosystem of packages with that as a dependency had to
161// explicitly update to the new version. Many packages depend on
162// assume-no-moving-gc transitively, through paths like
163// inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc.
164// This was causing a significant amount of friction around each new
165// release, so we added this bool for the package to //go:linkname
166// instead. The bool is still unfortunate, but it's not as bad as
167// breaking the ecosystem on every new release.
168//
169// If the Go garbage collector ever does move heap objects, we can set
170// this to true to break all the programs using assume-no-moving-gc.
171//
172//go:linkname heapObjectsCanMove
173func heapObjectsCanMove() bool {
174	return false
175}
176
177func gcinit() {
178	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
179		throw("size of Workbuf is suboptimal")
180	}
181	// No sweep on the first cycle.
182	sweep.active.state.Store(sweepDrainedMask)
183
184	// Initialize GC pacer state.
185	// Use the environment variable GOGC for the initial gcPercent value.
186	// Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
187	gcController.init(readGOGC(), readGOMEMLIMIT())
188
189	work.startSema = 1
190	work.markDoneSema = 1
191	lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
192	lockInit(&work.assistQueue.lock, lockRankAssistQueue)
193	lockInit(&work.strongFromWeak.lock, lockRankStrongFromWeakQueue)
194	lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
195}
196
197// gcenable is called after the bulk of the runtime initialization,
198// just before we're about to start letting user code run.
199// It kicks off the background sweeper goroutine, the background
200// scavenger goroutine, and enables GC.
201func gcenable() {
202	// Kick off sweeping and scavenging.
203	c := make(chan int, 2)
204	go bgsweep(c)
205	go bgscavenge(c)
206	<-c
207	<-c
208	memstats.enablegc = true // now that runtime is initialized, GC is okay
209}
210
211// Garbage collector phase.
212// Indicates to the write barrier and synchronization code which task to perform.
213var gcphase uint32
214
215// The compiler knows about this variable.
216// If you change it, you must change builtin/runtime.go, too.
217// If you change the first four bytes, you must also change the write
218// barrier insertion code.
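//
// As a hedged sketch (the actual barrier call and calling convention are
// architecture- and version-specific), the check the compiler emits around
// a heap pointer write conceptually looks like:
//
//	if writeBarrier.enabled {
//		// record the overwritten and the new pointer value with the GC
//	}
//	*slot = ptr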
219//
220// writeBarrier should be an internal detail,
221// but widely used packages access it using linkname.
222// Notable members of the hall of shame include:
223//   - github.com/bytedance/sonic
224//   - github.com/cloudwego/frugal
225//
226// Do not remove or change the type signature.
227// See go.dev/issue/67401.
228//
229//go:linkname writeBarrier
230var writeBarrier struct {
231	enabled bool    // compiler emits a check of this before calling write barrier
232	pad     [3]byte // compiler uses 32-bit load for "enabled" field
233	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
234}
235
236// gcBlackenEnabled is 1 if mutator assists and background mark
237// workers are allowed to blacken objects. This must only be set when
238// gcphase == _GCmark.
239var gcBlackenEnabled uint32
240
241const (
242	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
243	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
244	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
245)
246
247//go:nosplit
248func setGCPhase(x uint32) {
249	atomic.Store(&gcphase, x)
250	writeBarrier.enabled = gcphase == _GCmark || gcphase == _GCmarktermination
251}
252
253// gcMarkWorkerMode represents the mode that a concurrent mark worker
254// should operate in.
255//
256// Concurrent marking happens through four different mechanisms. One
257// is mutator assists, which happen in response to allocations and are
258// not scheduled. The other three are variations in the per-P mark
259// workers and are distinguished by gcMarkWorkerMode.
260type gcMarkWorkerMode int
261
262const (
263	// gcMarkWorkerNotWorker indicates that the next scheduled G is not
264	// starting work and the mode should be ignored.
265	gcMarkWorkerNotWorker gcMarkWorkerMode = iota
266
267	// gcMarkWorkerDedicatedMode indicates that the P of a mark
268	// worker is dedicated to running that mark worker. The mark
269	// worker should run without preemption.
270	gcMarkWorkerDedicatedMode
271
272	// gcMarkWorkerFractionalMode indicates that a P is currently
273	// running the "fractional" mark worker. The fractional worker
274	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
275	// an integer and using only dedicated workers would result in
276	// utilization too far from the target of gcBackgroundUtilization.
277	// The fractional worker should run until it is preempted and
278	// will be scheduled to pick up the fractional part of
279	// GOMAXPROCS*gcBackgroundUtilization.
280	gcMarkWorkerFractionalMode
281
282	// gcMarkWorkerIdleMode indicates that a P is running the mark
283	// worker because it has nothing else to do. The idle worker
284	// should run until it is preempted and account its time
285	// against gcController.idleMarkTime.
286	gcMarkWorkerIdleMode
287)
288
289// gcMarkWorkerModeStrings are the string labels of gcMarkWorkerModes
290// to use in execution traces.
291var gcMarkWorkerModeStrings = [...]string{
292	"Not worker",
293	"GC (dedicated)",
294	"GC (fractional)",
295	"GC (idle)",
296}
297
298// pollFractionalWorkerExit reports whether a fractional mark worker
299// should self-preempt. It assumes it is called from the fractional
300// worker.
301func pollFractionalWorkerExit() bool {
302	// This should be kept in sync with the fractional worker
303	// scheduler logic in findRunnableGCWorker.
304	now := nanotime()
305	delta := now - gcController.markStartTime
306	if delta <= 0 {
307		return true
308	}
309	p := getg().m.p.ptr()
310	selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
311	// Add some slack to the utilization goal so that the
312	// fractional worker isn't behind again the instant it exits.
313	return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
314}
315
316var work workType
317
318type workType struct {
319	full  lfstack          // lock-free list of full workbufs
320	_     cpu.CacheLinePad // prevents false-sharing between full and empty
321	empty lfstack          // lock-free list of empty workbufs
322	_     cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait
323
324	wbufSpans struct {
325		lock mutex
326		// free is a list of spans dedicated to workbufs, but
327		// that don't currently contain any workbufs.
328		free mSpanList
329		// busy is a list of all spans containing workbufs on
330		// one of the workbuf lists.
331		busy mSpanList
332	}
333
334	// Restore 64-bit alignment on 32-bit.
335	_ uint32
336
337	// bytesMarked is the number of bytes marked this cycle. This
338	// includes bytes blackened in scanned objects, noscan objects
339	// that go straight to black, and permagrey objects scanned by
340	// markroot during the concurrent scan phase. This is updated
341	// atomically during the cycle. Updates may be batched
342	// arbitrarily, since the value is only read at the end of the
343	// cycle.
344	//
345	// Because of benign races during marking, this number may not
346	// be the exact number of marked bytes, but it should be very
347	// close.
348	//
349	// Put this field here because it needs 64-bit atomic access
350	// (and thus 8-byte alignment even on 32-bit architectures).
351	bytesMarked uint64
352
353	markrootNext uint32 // next markroot job
354	markrootJobs uint32 // number of markroot jobs
355
356	nproc  uint32
357	tstart int64
358	nwait  uint32
359
360	// Number of roots of various root types. Set by gcMarkRootPrepare.
361	//
362	// nStackRoots == len(stackRoots), but we have nStackRoots for
363	// consistency.
364	nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int
365
366	// Base indexes of each root type. Set by gcMarkRootPrepare.
367	baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32
368
369	// stackRoots is a snapshot of all of the Gs that existed
370	// before the beginning of concurrent marking. The backing
371	// store of this must not be modified because it might be
372	// shared with allgs.
373	stackRoots []*g
374
375	// Each type of GC state transition is protected by a lock.
376	// Since multiple threads can simultaneously detect the state
377	// transition condition, any thread that detects a transition
378	// condition must acquire the appropriate transition lock,
379	// re-check the transition condition and return if it no
380	// longer holds or perform the transition if it does.
381	// Likewise, any transition must invalidate the transition
382	// condition before releasing the lock. This ensures that each
383	// transition is performed by exactly one thread and threads
384	// that need the transition to happen block until it has
385	// happened.
386	//
387	// startSema protects the transition from "off" to mark or
388	// mark termination.
389	startSema uint32
390	// markDoneSema protects transitions from mark to mark termination.
391	markDoneSema uint32
392
393	bgMarkDone uint32 // cas to 1 when at a background mark completion point
394	// Background mark completion signaling
395
396	// mode is the concurrency mode of the current GC cycle.
397	mode gcMode
398
399	// userForced indicates the current GC cycle was forced by an
400	// explicit user call.
401	userForced bool
402
403	// initialHeapLive is the value of gcController.heapLive at the
404	// beginning of this GC cycle.
405	initialHeapLive uint64
406
407	// assistQueue is a queue of assists that are blocked because
408	// there was neither enough credit to steal nor enough work to
409	// do.
410	assistQueue struct {
411		lock mutex
412		q    gQueue
413	}
414
415	// sweepWaiters is a list of blocked goroutines to wake when
416	// we transition from mark termination to sweep.
417	sweepWaiters struct {
418		lock mutex
419		list gList
420	}
421
422	// strongFromWeak controls how the GC interacts with weak->strong
423	// pointer conversions.
424	strongFromWeak struct {
425		// block is a flag set during mark termination that prevents
426		// new weak->strong conversions from executing by blocking the
427		// goroutine and enqueuing it onto q.
428		//
429		// Mutated only by one goroutine at a time in gcMarkDone,
430		// with globally-synchronizing events like forEachP and
431		// stopTheWorld.
432		block bool
433
434		// q is a queue of goroutines that attempted to perform a
435		// weak->strong conversion during mark termination.
436		//
437		// Protected by lock.
438		lock mutex
439		q    gQueue
440	}
441
442	// cycles is the number of completed GC cycles, where a GC
443	// cycle is sweep termination, mark, mark termination, and
444	// sweep. This differs from memstats.numgc, which is
445	// incremented at mark termination.
446	cycles atomic.Uint32
447
448	// Timing/utilization stats for this cycle.
449	stwprocs, maxprocs                 int32
450	tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start
451
452	// pauseNS is the total STW time this cycle, measured as the time between
453	// when stopping began (just before trying to stop Ps) and just after the
454	// world started again.
455	pauseNS int64
456
457	// debug.gctrace heap sizes for this cycle.
458	heap0, heap1, heap2 uint64
459
460	// Cumulative estimated CPU usage.
461	cpuStats
462}
463
464// GC runs a garbage collection and blocks the caller until the
465// garbage collection is complete. It may also block the entire
466// program.
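//
// As an informal usage sketch (not part of any API guarantee), tests and
// benchmarks sometimes call GC to reach a relatively settled heap before
// taking measurements:
//
//	runtime.GC()
//	var m runtime.MemStats
//	runtime.ReadMemStats(&m)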
467func GC() {
468	// We consider a cycle to be: sweep termination, mark, mark
469	// termination, and sweep. This function shouldn't return
470	// until a full cycle has been completed, from beginning to
471	// end. Hence, we always want to finish up the current cycle
472	// and start a new one. That means:
473	//
474	// 1. In sweep termination, mark, or mark termination of cycle
475	// N, wait until mark termination N completes and transitions
476	// to sweep N.
477	//
478	// 2. In sweep N, help with sweep N.
479	//
480	// At this point we can begin a full cycle N+1.
481	//
482	// 3. Trigger cycle N+1 by starting sweep termination N+1.
483	//
484	// 4. Wait for mark termination N+1 to complete.
485	//
486	// 5. Help with sweep N+1 until it's done.
487	//
488	// This all has to be written to deal with the fact that the
489	// GC may move ahead on its own. For example, when we block
490	// until mark termination N, we may wake up in cycle N+2.
491
492	// Wait until the current sweep termination, mark, and mark
493	// termination complete.
494	n := work.cycles.Load()
495	gcWaitOnMark(n)
496
497	// We're now in sweep N or later. Trigger GC cycle N+1, which
498	// will first finish sweep N if necessary and then enter sweep
499	// termination N+1.
500	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
501
502	// Wait for mark termination N+1 to complete.
503	gcWaitOnMark(n + 1)
504
505	// Finish sweep N+1 before returning. We do this both to
506	// complete the cycle and because runtime.GC() is often used
507	// as part of tests and benchmarks to get the system into a
508	// relatively stable and isolated state.
509	for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) {
510		Gosched()
511	}
512
513	// Callers may assume that the heap profile reflects the
514	// just-completed cycle when this returns (historically this
515	// happened because this was a STW GC), but right now the
516	// profile still reflects mark termination N, not N+1.
517	//
518	// As soon as all of the sweep frees from cycle N+1 are done,
519	// we can go ahead and publish the heap profile.
520	//
521	// First, wait for sweeping to finish. (We know there are no
522	// more spans on the sweep queue, but we may be concurrently
523	// sweeping spans, so we have to wait.)
524	for work.cycles.Load() == n+1 && !isSweepDone() {
525		Gosched()
526	}
527
528	// Now we're really done with sweeping, so we can publish the
529	// stable heap profile. Only do this if we haven't already hit
530	// another mark termination.
531	mp := acquirem()
532	cycle := work.cycles.Load()
533	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
534		mProf_PostSweep()
535	}
536	releasem(mp)
537}
538
539// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
540// already completed this mark phase, it returns immediately.
541func gcWaitOnMark(n uint32) {
542	for {
543		// Disable phase transitions.
544		lock(&work.sweepWaiters.lock)
545		nMarks := work.cycles.Load()
546		if gcphase != _GCmark {
547			// We've already completed this cycle's mark.
548			nMarks++
549		}
550		if nMarks > n {
551			// We're done.
552			unlock(&work.sweepWaiters.lock)
553			return
554		}
555
556		// Wait until sweep termination, mark, and mark
557		// termination of cycle N complete.
558		work.sweepWaiters.list.push(getg())
559		goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1)
560	}
561}
562
563// gcMode indicates how concurrent a GC cycle should be.
564type gcMode int
565
566const (
567	gcBackgroundMode gcMode = iota // concurrent GC and sweep
568	gcForceMode                    // stop-the-world GC now, concurrent sweep
569	gcForceBlockMode               // stop-the-world GC now and STW sweep (forced by user)
570)
571
572// A gcTrigger is a predicate for starting a GC cycle. Specifically,
573// it is an exit condition for the _GCoff phase.
574type gcTrigger struct {
575	kind gcTriggerKind
576	now  int64  // gcTriggerTime: current time
577	n    uint32 // gcTriggerCycle: cycle number to start
578}
579
580type gcTriggerKind int
581
582const (
583	// gcTriggerHeap indicates that a cycle should be started when
584	// the heap size reaches the trigger heap size computed by the
585	// controller.
586	gcTriggerHeap gcTriggerKind = iota
587
588	// gcTriggerTime indicates that a cycle should be started when
589	// it's been more than forcegcperiod nanoseconds since the
590	// previous GC cycle.
591	gcTriggerTime
592
593	// gcTriggerCycle indicates that a cycle should be started if
594	// we have not yet started cycle number gcTrigger.n (relative
595	// to work.cycles).
596	gcTriggerCycle
597)
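
// As illustrative sketches (mirroring how gcStart is invoked elsewhere in the
// runtime), a time-based forced GC and an explicit runtime.GC call construct
// their triggers roughly as:
//
//	gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
//	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})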
598
599// test reports whether the trigger condition is satisfied, meaning
600// that the exit condition for the _GCoff phase has been met. The exit
601// condition should be tested when allocating.
602func (t gcTrigger) test() bool {
603	if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff {
604		return false
605	}
606	switch t.kind {
607	case gcTriggerHeap:
608		trigger, _ := gcController.trigger()
609		return gcController.heapLive.Load() >= trigger
610	case gcTriggerTime:
611		if gcController.gcPercent.Load() < 0 {
612			return false
613		}
614		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
615		return lastgc != 0 && t.now-lastgc > forcegcperiod
616	case gcTriggerCycle:
617		// t.n > work.cycles, but accounting for wraparound.
618		return int32(t.n-work.cycles.Load()) > 0
619	}
620	return true
621}
622
623// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
624// debug.gcstoptheworld == 0) or performs all of GC (if
625// debug.gcstoptheworld != 0).
626//
627// This may return without performing this transition in some cases,
628// such as when called on a system stack or with locks held.
629func gcStart(trigger gcTrigger) {
630	// Since this is called from malloc and malloc is called in
631	// the guts of a number of libraries that might be holding
632	// locks, don't attempt to start GC in non-preemptible or
633	// potentially unstable situations.
634	mp := acquirem()
635	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
636		releasem(mp)
637		return
638	}
639	releasem(mp)
640	mp = nil
641
642	// Pick up the remaining unswept/not being swept spans concurrently
643	//
644	// This shouldn't happen if we're being invoked in background
645	// mode since proportional sweep should have just finished
646	// sweeping everything, but rounding errors, etc, may leave a
647	// few spans unswept. In forced mode, this is necessary since
648	// GC can be forced at any point in the sweeping cycle.
649	//
650	// We check the transition condition continuously here in case
651	// this G gets delayed into the next GC cycle.
652	for trigger.test() && sweepone() != ^uintptr(0) {
653	}
654
655	// Perform GC initialization and the sweep termination
656	// transition.
657	semacquire(&work.startSema)
658	// Re-check transition condition under transition lock.
659	if !trigger.test() {
660		semrelease(&work.startSema)
661		return
662	}
663
664	// In gcstoptheworld debug mode, upgrade the mode accordingly.
665	// We do this after re-checking the transition condition so
666	// that multiple goroutines that detect the heap trigger don't
667	// start multiple STW GCs.
668	mode := gcBackgroundMode
669	if debug.gcstoptheworld == 1 {
670		mode = gcForceMode
671	} else if debug.gcstoptheworld == 2 {
672		mode = gcForceBlockMode
673	}
674
675	// Ok, we're doing it! Stop everybody else
676	semacquire(&gcsema)
677	semacquire(&worldsema)
678
679	// For stats, check if this GC was forced by the user.
680	// Update it under gcsema to avoid gctrace getting wrong values.
681	work.userForced = trigger.kind == gcTriggerCycle
682
683	trace := traceAcquire()
684	if trace.ok() {
685		trace.GCStart()
686		traceRelease(trace)
687	}
688
689	// Check that all Ps have finished deferred mcache flushes.
690	for _, p := range allp {
691		if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen {
692			println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
693			throw("p mcache not flushed")
694		}
695	}
696
697	gcBgMarkStartWorkers()
698
699	systemstack(gcResetMarkState)
700
701	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
702	if work.stwprocs > ncpu {
703		// This is used to compute CPU time of the STW phases,
704		// so it can't be more than ncpu, even if GOMAXPROCS is.
705		work.stwprocs = ncpu
706	}
707	work.heap0 = gcController.heapLive.Load()
708	work.pauseNS = 0
709	work.mode = mode
710
711	now := nanotime()
712	work.tSweepTerm = now
713	var stw worldStop
714	systemstack(func() {
715		stw = stopTheWorldWithSema(stwGCSweepTerm)
716	})
717
718	// Accumulate fine-grained stopping time.
719	work.cpuStats.accumulateGCPauseTime(stw.stoppingCPUTime, 1)
720
721	// Finish sweep before we start concurrent scan.
722	systemstack(func() {
723		finishsweep_m()
724	})
725
726	// clearpools before we start the GC. If we wait the memory will not be
727	// reclaimed until the next GC cycle.
728	clearpools()
729
730	work.cycles.Add(1)
731
732	// Assists and workers can start the moment we start
733	// the world.
734	gcController.startCycle(now, int(gomaxprocs), trigger)
735
736	// Notify the CPU limiter that assists may begin.
737	gcCPULimiter.startGCTransition(true, now)
738
739	// In STW mode, disable scheduling of user Gs. This may also
740	// disable scheduling of this goroutine, so it may block as
741	// soon as we start the world again.
742	if mode != gcBackgroundMode {
743		schedEnableUser(false)
744	}
745
746	// Enter concurrent mark phase and enable
747	// write barriers.
748	//
749	// Because the world is stopped, all Ps will
750	// observe that write barriers are enabled by
751	// the time we start the world and begin
752	// scanning.
753	//
754	// Write barriers must be enabled before assists are
755	// enabled because they must be enabled before
756	// any non-leaf heap objects are marked. Since
757	// allocations are blocked until assists can
758	// happen, we want to enable assists as early as
759	// possible.
760	setGCPhase(_GCmark)
761
762	gcBgMarkPrepare() // Must happen before assists are enabled.
763	gcMarkRootPrepare()
764
765	// Mark all active tinyalloc blocks. Since we're
766	// allocating from these, they need to be black like
767	// other allocations. The alternative is to blacken
768	// the tiny block on every allocation from it, which
769	// would slow down the tiny allocator.
770	gcMarkTinyAllocs()
771
772	// At this point all Ps have enabled the write
773	// barrier, thus maintaining the no white to
774	// black invariant. Enable mutator assists to
775	// put back-pressure on fast allocating
776	// mutators.
777	atomic.Store(&gcBlackenEnabled, 1)
778
779	// In STW mode, we could block the instant systemstack
780	// returns, so make sure we're not preemptible.
781	mp = acquirem()
782
783	// Update the CPU stats pause time.
784	//
785	// Use maxprocs instead of stwprocs here because the total time
786	// computed in the CPU stats is based on maxprocs, and we want them
787	// to be comparable.
788	work.cpuStats.accumulateGCPauseTime(nanotime()-stw.finishedStopping, work.maxprocs)
789
790	// Concurrent mark.
791	systemstack(func() {
792		now = startTheWorldWithSema(0, stw)
793		work.pauseNS += now - stw.startedStopping
794		work.tMark = now
795
796		// Release the CPU limiter.
797		gcCPULimiter.finishGCTransition(now)
798	})
799
800	// Release the world sema before Gosched() in STW mode
801	// because we will need to reacquire it later but before
802	// this goroutine becomes runnable again, and we could
803	// self-deadlock otherwise.
804	semrelease(&worldsema)
805	releasem(mp)
806
807	// Make sure we block instead of returning to user code
808	// in STW mode.
809	if mode != gcBackgroundMode {
810		Gosched()
811	}
812
813	semrelease(&work.startSema)
814}
815
816// gcMarkDoneFlushed counts the number of P's with flushed work.
817//
818// Ideally this would be a captured local in gcMarkDone, but forEachP
819// escapes its callback closure, so it can't capture anything.
820//
821// This is protected by markDoneSema.
822var gcMarkDoneFlushed uint32
823
824// gcDebugMarkDone contains fields used to debug/test mark termination.
825var gcDebugMarkDone struct {
826	// spinAfterRaggedBarrier forces gcMarkDone to spin after it executes
827	// the ragged barrier.
828	spinAfterRaggedBarrier atomic.Bool
829
830	// restartedDueTo27993 indicates that we restarted mark termination
831	// due to the bug described in issue #27993.
832	//
833	// Protected by worldsema.
834	restartedDueTo27993 bool
835}
836
837// gcMarkDone transitions the GC from mark to mark termination if all
838// reachable objects have been marked (that is, there are no grey
839// objects and can be no more in the future). Otherwise, it flushes
840// all local work to the global queues where it can be discovered by
841// other workers.
842//
843// This should be called when all local mark work has been drained and
844// there are no remaining workers. Specifically, when
845//
846//	work.nwait == work.nproc && !gcMarkWorkAvailable(p)
847//
848// The calling context must be preemptible.
849//
850// Flushing local work is important because idle Ps may have local
851// work queued. This is the only way to make that work visible and
852// drive GC to completion.
853//
854// It is explicitly okay to have write barriers in this function. If
855// it does transition to mark termination, then all reachable objects
856// have been marked, so the write barrier cannot shade any more
857// objects.
858func gcMarkDone() {
859	// Ensure only one thread is running the ragged barrier at a
860	// time.
861	semacquire(&work.markDoneSema)
862
863top:
864	// Re-check transition condition under transition lock.
865	//
866	// It's critical that this checks the global work queues are
867	// empty before performing the ragged barrier. Otherwise,
868	// there could be global work that a P could take after the P
869	// has passed the ragged barrier.
870	if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
871		semrelease(&work.markDoneSema)
872		return
873	}
874
875	// forEachP needs worldsema to execute, and we'll need it to
876	// stop the world later, so acquire worldsema now.
877	semacquire(&worldsema)
878
879	// Prevent weak->strong conversions from generating additional
880	// GC work. forEachP will guarantee that it is observed globally.
881	work.strongFromWeak.block = true
882
883	// Flush all local buffers and collect flushedWork flags.
884	gcMarkDoneFlushed = 0
885	forEachP(waitReasonGCMarkTermination, func(pp *p) {
886		// Flush the write barrier buffer, since this may add
887		// work to the gcWork.
888		wbBufFlush1(pp)
889
890		// Flush the gcWork, since this may create global work
891		// and set the flushedWork flag.
892		//
893		// TODO(austin): Break up these workbufs to
894		// better distribute work.
895		pp.gcw.dispose()
896		// Collect the flushedWork flag.
897		if pp.gcw.flushedWork {
898			atomic.Xadd(&gcMarkDoneFlushed, 1)
899			pp.gcw.flushedWork = false
900		}
901	})
902
903	if gcMarkDoneFlushed != 0 {
904		// More grey objects were discovered since the
905		// previous termination check, so there may be more
906		// work to do. Keep going. It's possible the
907		// transition condition became true again during the
908		// ragged barrier, so re-check it.
909		semrelease(&worldsema)
910		goto top
911	}
912
913	// For debugging/testing.
914	for gcDebugMarkDone.spinAfterRaggedBarrier.Load() {
915	}
916
917	// There was no global work, no local work, and no Ps
918	// communicated work since we took markDoneSema. Therefore
919	// there are no grey objects and no more objects can be
920	// shaded. Transition to mark termination.
921	now := nanotime()
922	work.tMarkTerm = now
923	getg().m.preemptoff = "gcing"
924	var stw worldStop
925	systemstack(func() {
926		stw = stopTheWorldWithSema(stwGCMarkTerm)
927	})
928	// The gcphase is _GCmark; it will transition to _GCmarktermination
929	// below. The important thing is that the wb remains active until
930	// all marking is complete. This includes writes made by the GC.
931
932	// Accumulate fine-grained stopping time.
933	work.cpuStats.accumulateGCPauseTime(stw.stoppingCPUTime, 1)
934
935	// There is sometimes work left over when we enter mark termination due
936	// to write barriers performed after the completion barrier above.
937	// Detect this and resume concurrent mark. This is obviously
938	// unfortunate.
939	//
940	// See issue #27993 for details.
941	//
942	// Switch to the system stack to call wbBufFlush1, though in this case
943	// it doesn't matter because we're non-preemptible anyway.
944	restart := false
945	systemstack(func() {
946		for _, p := range allp {
947			wbBufFlush1(p)
948			if !p.gcw.empty() {
949				restart = true
950				break
951			}
952		}
953	})
954	if restart {
955		gcDebugMarkDone.restartedDueTo27993 = true
956
957		getg().m.preemptoff = ""
958		systemstack(func() {
959			// Accumulate the time we were stopped before we had to start again.
960			work.cpuStats.accumulateGCPauseTime(nanotime()-stw.finishedStopping, work.maxprocs)
961
962			// Start the world again.
963			now := startTheWorldWithSema(0, stw)
964			work.pauseNS += now - stw.startedStopping
965		})
966		semrelease(&worldsema)
967		goto top
968	}
969
970	gcComputeStartingStackSize()
971
972	// Disable assists and background workers. We must do
973	// this before waking blocked assists.
974	atomic.Store(&gcBlackenEnabled, 0)
975
976	// Notify the CPU limiter that GC assists will now cease.
977	gcCPULimiter.startGCTransition(false, now)
978
979	// Wake all blocked assists. These will run when we
980	// start the world again.
981	gcWakeAllAssists()
982
983	// Wake all blocked weak->strong conversions. These will run
984	// when we start the world again.
985	work.strongFromWeak.block = false
986	gcWakeAllStrongFromWeak()
987
988	// Likewise, release the transition lock. Blocked
989	// workers and assists will run when we start the
990	// world again.
991	semrelease(&work.markDoneSema)
992
993	// In STW mode, re-enable user goroutines. These will be
994	// queued to run after we start the world.
995	schedEnableUser(true)
996
997	// endCycle depends on all gcWork cache stats being flushed.
998	// The termination algorithm above ensured this, up to
999	// allocations made since the ragged barrier.
1000	gcController.endCycle(now, int(gomaxprocs), work.userForced)
1001
1002	// Perform mark termination. This will restart the world.
1003	gcMarkTermination(stw)
1004}
1005
1006// World must be stopped and mark assists and background workers must be
1007// disabled.
1008func gcMarkTermination(stw worldStop) {
1009	// Start marktermination (write barrier remains enabled for now).
1010	setGCPhase(_GCmarktermination)
1011
1012	work.heap1 = gcController.heapLive.Load()
1013	startTime := nanotime()
1014
1015	mp := acquirem()
1016	mp.preemptoff = "gcing"
1017	mp.traceback = 2
1018	curgp := mp.curg
1019	// N.B. The execution tracer is not aware of this status
1020	// transition and handles it specially based on the
1021	// wait reason.
1022	casGToWaitingForGC(curgp, _Grunning, waitReasonGarbageCollection)
1023
1024	// Run gc on the g0 stack. We do this so that the g stack
1025	// we're currently running on will no longer change. Cuts
1026	// the root set down a bit (g0 stacks are not scanned, and
1027	// we don't need to scan gc's internal state).  We also
1028	// need to switch to g0 so we can shrink the stack.
1029	systemstack(func() {
1030		gcMark(startTime)
1031		// Must return immediately.
1032		// The outer function's stack may have moved
1033		// during gcMark (it shrinks stacks, including the
1034		// outer function's stack), so we must not refer
1035		// to any of its variables. Return back to the
1036		// non-system stack to pick up the new addresses
1037		// before continuing.
1038	})
1039
1040	var stwSwept bool
1041	systemstack(func() {
1042		work.heap2 = work.bytesMarked
1043		if debug.gccheckmark > 0 {
1044			// Run a full non-parallel, stop-the-world
1045			// mark using checkmark bits, to check that we
1046			// didn't forget to mark anything during the
1047			// concurrent mark process.
1048			startCheckmarks()
1049			gcResetMarkState()
1050			gcw := &getg().m.p.ptr().gcw
1051			gcDrain(gcw, 0)
1052			wbBufFlush1(getg().m.p.ptr())
1053			gcw.dispose()
1054			endCheckmarks()
1055		}
1056
1057		// marking is complete so we can turn the write barrier off
1058		setGCPhase(_GCoff)
1059		stwSwept = gcSweep(work.mode)
1060	})
1061
1062	mp.traceback = 0
1063	casgstatus(curgp, _Gwaiting, _Grunning)
1064
1065	trace := traceAcquire()
1066	if trace.ok() {
1067		trace.GCDone()
1068		traceRelease(trace)
1069	}
1070
1071	// all done
1072	mp.preemptoff = ""
1073
1074	if gcphase != _GCoff {
1075		throw("gc done but gcphase != _GCoff")
1076	}
1077
1078	// Record heapInUse for scavenger.
1079	memstats.lastHeapInUse = gcController.heapInUse.load()
1080
1081	// Update GC trigger and pacing, as well as downstream consumers
1082	// of this pacing information, for the next cycle.
1083	systemstack(gcControllerCommit)
1084
1085	// Update timing memstats
1086	now := nanotime()
1087	sec, nsec, _ := time_now()
1088	unixNow := sec*1e9 + int64(nsec)
1089	work.pauseNS += now - stw.startedStopping
1090	work.tEnd = now
1091	atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
1092	atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
1093	memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
1094	memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
1095	memstats.pause_total_ns += uint64(work.pauseNS)
1096
1097	// Accumulate CPU stats.
1098	//
1099	// Use maxprocs instead of stwprocs for GC pause time because the total time
1100	// computed in the CPU stats is based on maxprocs, and we want them to be
1101	// comparable.
1102	//
1103	// Pass gcMarkPhase=true to accumulate so we can get all the latest GC CPU stats
1104	// in there too.
1105	work.cpuStats.accumulateGCPauseTime(now-stw.finishedStopping, work.maxprocs)
1106	work.cpuStats.accumulate(now, true)
1107
1108	// Compute overall GC CPU utilization.
1109	// Omit idle marking time from the overall utilization here since it's "free".
1110	memstats.gc_cpu_fraction = float64(work.cpuStats.GCTotalTime-work.cpuStats.GCIdleTime) / float64(work.cpuStats.TotalTime)
1111
1112	// Reset assist time and background time stats.
1113	//
1114	// Do this now, instead of at the start of the next GC cycle, because
1115	// these two may keep accumulating even if the GC is not active.
1116	scavenge.assistTime.Store(0)
1117	scavenge.backgroundTime.Store(0)
1118
1119	// Reset idle time stat.
1120	sched.idleTime.Store(0)
1121
1122	if work.userForced {
1123		memstats.numforcedgc++
1124	}
1125
1126	// Bump GC cycle count and wake goroutines waiting on sweep.
1127	lock(&work.sweepWaiters.lock)
1128	memstats.numgc++
1129	injectglist(&work.sweepWaiters.list)
1130	unlock(&work.sweepWaiters.lock)
1131
1132	// Increment the scavenge generation now.
1133	//
1134	// This moment represents peak heap in use because we're
1135	// about to start sweeping.
1136	mheap_.pages.scav.index.nextGen()
1137
1138	// Release the CPU limiter.
1139	gcCPULimiter.finishGCTransition(now)
1140
1141	// Finish the current heap profiling cycle and start a new
1142	// heap profiling cycle. We do this before starting the world
1143	// so events don't leak into the wrong cycle.
1144	mProf_NextCycle()
1145
1146	// There may be stale spans in mcaches that need to be swept.
1147	// Those aren't tracked in any sweep lists, so we need to
1148	// count them against sweep completion until we ensure all
1149	// those spans have been forced out.
1150	//
1151	// If gcSweep fully swept the heap (for example if the sweep
1152	// is not concurrent due to a GODEBUG setting), then we expect
1153	// the sweepLocker to be invalid, since sweeping is done.
1154	//
1155	// N.B. Below we might duplicate some work from gcSweep; this is
1156	// fine as all that work is idempotent within a GC cycle, and
1157	// we're still holding worldsema so a new cycle can't start.
1158	sl := sweep.active.begin()
1159	if !stwSwept && !sl.valid {
1160		throw("failed to set sweep barrier")
1161	} else if stwSwept && sl.valid {
1162		throw("non-concurrent sweep failed to drain all sweep queues")
1163	}
1164
1165	systemstack(func() {
1166		// The memstats updated above must be updated with the world
1167		// stopped to ensure consistency of some values, such as
1168		// sched.idleTime and sched.totaltime. memstats also include
1169	// the pause time (work.pauseNS), forcing computation of the
1170	// total pause time before the pause actually ends.
1171	//
1172	// Here we reuse the same now for starting the world so that the
1173		// time added to /sched/pauses/total/gc:seconds will be
1174		// consistent with the value in memstats.
1175		startTheWorldWithSema(now, stw)
1176	})
1177
1178	// Flush the heap profile so we can start a new cycle next GC.
1179	// This is relatively expensive, so we don't do it with the
1180	// world stopped.
1181	mProf_Flush()
1182
1183	// Prepare workbufs for freeing by the sweeper. We do this
1184	// asynchronously because it can take non-trivial time.
1185	prepareFreeWorkbufs()
1186
1187	// Free stack spans. This must be done between GC cycles.
1188	systemstack(freeStackSpans)
1189
1190	// Ensure all mcaches are flushed. Each P will flush its own
1191	// mcache before allocating, but idle Ps may not. Since this
1192	// is necessary to sweep all spans, we need to ensure all
1193	// mcaches are flushed before we start the next GC cycle.
1194	//
1195	// While we're here, flush the page cache for idle Ps to avoid
1196	// having pages get stuck on them. These pages are hidden from
1197	// the scavenger, so in small idle heaps a significant amount
1198	// of additional memory might be held onto.
1199	//
1200	// Also, flush the pinner cache, to avoid leaking that memory
1201	// indefinitely.
1202	forEachP(waitReasonFlushProcCaches, func(pp *p) {
1203		pp.mcache.prepareForSweep()
1204		if pp.status == _Pidle {
1205			systemstack(func() {
1206				lock(&mheap_.lock)
1207				pp.pcache.flush(&mheap_.pages)
1208				unlock(&mheap_.lock)
1209			})
1210		}
1211		pp.pinnerCache = nil
1212	})
1213	if sl.valid {
1214		// Now that we've swept stale spans in mcaches, they don't
1215		// count against unswept spans.
1216		//
1217		// Note: this sweepLocker may not be valid if sweeping had
1218		// already completed during the STW. See the corresponding
1219		// begin() call that produced sl.
1220		sweep.active.end(sl)
1221	}
1222
1223	// Print gctrace before dropping worldsema. As soon as we drop
1224	// worldsema another cycle could start and smash the stats
1225	// we're trying to print.
1226	if debug.gctrace > 0 {
1227		util := int(memstats.gc_cpu_fraction * 100)
1228
1229		var sbuf [24]byte
1230		printlock()
1231		print("gc ", memstats.numgc,
1232			" @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
1233			util, "%: ")
1234		prev := work.tSweepTerm
1235		for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
1236			if i != 0 {
1237				print("+")
1238			}
1239			print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
1240			prev = ns
1241		}
1242		print(" ms clock, ")
1243		for i, ns := range []int64{
1244			int64(work.stwprocs) * (work.tMark - work.tSweepTerm),
1245			gcController.assistTime.Load(),
1246			gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(),
1247			gcController.idleMarkTime.Load(),
1248			int64(work.stwprocs) * (work.tEnd - work.tMarkTerm),
1249		} {
1250			if i == 2 || i == 3 {
1251				// Separate mark time components with /.
1252				print("/")
1253			} else if i != 0 {
1254				print("+")
1255			}
1256			print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
1257		}
1258		print(" ms cpu, ",
1259			work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
1260			gcController.lastHeapGoal>>20, " MB goal, ",
1261			gcController.lastStackScan.Load()>>20, " MB stacks, ",
1262			gcController.globalsScan.Load()>>20, " MB globals, ",
1263			work.maxprocs, " P")
1264		if work.userForced {
1265			print(" (forced)")
1266		}
1267		print("\n")
1268		printunlock()
1269	}
1270
1271	// Set any arena chunks that were deferred to fault.
1272	lock(&userArenaState.lock)
1273	faultList := userArenaState.fault
1274	userArenaState.fault = nil
1275	unlock(&userArenaState.lock)
1276	for _, lc := range faultList {
1277		lc.mspan.setUserArenaChunkToFault()
1278	}
1279
1280	// Enable huge pages on some metadata if we cross a heap threshold.
1281	if gcController.heapGoal() > minHeapForMetadataHugePages {
1282		systemstack(func() {
1283			mheap_.enableMetadataHugePages()
1284		})
1285	}
1286
1287	semrelease(&worldsema)
1288	semrelease(&gcsema)
1289	// Careful: another GC cycle may start now.
1290
1291	releasem(mp)
1292	mp = nil
1293
1294	// now that gc is done, kick off finalizer thread if needed
1295	if !concurrentSweep {
1296		// give the queued finalizers, if any, a chance to run
1297		Gosched()
1298	}
1299}
1300
1301// gcBgMarkStartWorkers prepares background mark worker goroutines. These
1302// goroutines will not run until the mark phase, but they must be started while
1303	// the world is not stopped and from a regular G stack. The caller must hold
1304// worldsema.
1305func gcBgMarkStartWorkers() {
1306	// Background marking is performed by per-P G's. Ensure that each P has
1307	// a background GC G.
1308	//
1309	// Worker Gs don't exit if gomaxprocs is reduced. If it is raised
1310	// again, we can reuse the old workers; no need to create new workers.
1311	if gcBgMarkWorkerCount >= gomaxprocs {
1312		return
1313	}
1314
1315	// Increment mp.locks when allocating. We are called within gcStart,
1316	// and thus must not trigger another gcStart via an allocation. gcStart
1317	// bails when allocating with locks held, so simulate that for these
1318	// allocations.
1319	//
1320	// TODO(prattmic): cleanup gcStart to use a more explicit "in gcStart"
1321	// check for bailing.
1322	mp := acquirem()
1323	ready := make(chan struct{}, 1)
1324	releasem(mp)
1325
1326	for gcBgMarkWorkerCount < gomaxprocs {
1327		mp := acquirem() // See above, we allocate a closure here.
1328		go gcBgMarkWorker(ready)
1329		releasem(mp)
1330
1331		// N.B. we intentionally wait on each goroutine individually
1332		// rather than starting all in a batch and then waiting once
1333		// afterwards. By running one goroutine at a time, we can take
1334		// advantage of runnext to bounce back and forth between
1335		// workers and this goroutine. In an overloaded application,
1336		// this can reduce GC start latency by prioritizing these
1337		// goroutines rather than waiting on the end of the run queue.
1338		<-ready
1339		// The worker is now guaranteed to be added to the pool before
1340		// its P's next findRunnableGCWorker.
1341
1342		gcBgMarkWorkerCount++
1343	}
1344}
1345
1346// gcBgMarkPrepare sets up state for background marking.
1347// Mutator assists must not yet be enabled.
1348func gcBgMarkPrepare() {
1349	// Background marking will stop when the work queues are empty
1350	// and there are no more workers (note that, since this is
1351	// concurrent, this may be a transient state, but mark
1352	// termination will clean it up). Between background workers
1353	// and assists, we don't really know how many workers there
1354	// will be, so we pretend to have an arbitrarily large number
1355	// of workers, almost all of which are "waiting". While a
1356	// worker is working it decrements nwait. If nproc == nwait,
1357	// there are no workers.
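	//
	// Put differently, a worker "checks in" by decrementing nwait and
	// "checks out" by incrementing it again; gcBgMarkWorker detects
	// completion when its increment returns nwait to nproc and no mark
	// work remains.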
1358	work.nproc = ^uint32(0)
1359	work.nwait = ^uint32(0)
1360}
1361
1362// gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single
1363// gcBgMarkWorker goroutine.
1364type gcBgMarkWorkerNode struct {
1365	// Unused workers are managed in a lock-free stack. This field must be first.
1366	node lfnode
1367
1368	// The g of this worker.
1369	gp guintptr
1370
1371	// Release this m on park. This is used to communicate with the unlock
1372	// function, which cannot access the G's stack. It is unused outside of
1373	// gcBgMarkWorker().
1374	m muintptr
1375}
1376
1377func gcBgMarkWorker(ready chan struct{}) {
1378	gp := getg()
1379
1380	// We pass node to a gopark unlock function, so it can't be on
1381	// the stack (see gopark). Prevent deadlock from recursively
1382	// starting GC by disabling preemption.
1383	gp.m.preemptoff = "GC worker init"
1384	node := new(gcBgMarkWorkerNode)
1385	gp.m.preemptoff = ""
1386
1387	node.gp.set(gp)
1388
1389	node.m.set(acquirem())
1390
1391	ready <- struct{}{}
1392	// After this point, the background mark worker is generally scheduled
1393	// cooperatively by gcController.findRunnableGCWorker. While performing
1394	// work on the P, preemption is disabled because we are working on
1395	// P-local work buffers. When the preempt flag is set, this puts itself
1396	// into _Gwaiting to be woken up by gcController.findRunnableGCWorker
1397	// at the appropriate time.
1398	//
1399	// When preemption is enabled (e.g., while in gcMarkDone), this worker
1400	// may be preempted and schedule as a _Grunnable G from a runq. That is
1401	// fine; it will eventually gopark again for further scheduling via
1402	// findRunnableGCWorker.
1403	//
1404	// Since we disable preemption before notifying ready, we guarantee that
1405	// this G will be in the worker pool for the next findRunnableGCWorker.
1406	// This isn't strictly necessary, but it reduces latency between
1407	// _GCmark starting and the workers starting.
1408
1409	for {
1410		// Go to sleep until woken by
1411		// gcController.findRunnableGCWorker.
1412		gopark(func(g *g, nodep unsafe.Pointer) bool {
1413			node := (*gcBgMarkWorkerNode)(nodep)
1414
1415			if mp := node.m.ptr(); mp != nil {
1416				// The worker G is no longer running; release
1417				// the M.
1418				//
1419				// N.B. it is _safe_ to release the M as soon
1420				// as we are no longer performing P-local mark
1421				// work.
1422				//
1423				// However, since we cooperatively stop work
1424				// when gp.preempt is set, if we releasem in
1425				// the loop then the following call to gopark
1426				// would immediately preempt the G. This is
1427				// also safe, but inefficient: the G must
1428				// schedule again only to enter gopark and park
1429				// again. Thus, we defer the release until
1430				// after parking the G.
1431				releasem(mp)
1432			}
1433
1434			// Release this G to the pool.
1435			gcBgMarkWorkerPool.push(&node.node)
1436			// Note that at this point, the G may immediately be
1437			// rescheduled and may be running.
1438			return true
1439		}, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0)
1440
1441		// Preemption must not occur here, or another G might see
1442		// p.gcMarkWorkerMode.

		// Disable preemption so we can use the gcw. If the
		// scheduler wants to preempt us, we'll stop draining,
		// dispose the gcw, and then preempt.
		node.m.set(acquirem())
		pp := gp.m.p.ptr() // P can't change with preemption disabled.

		if gcBlackenEnabled == 0 {
			println("worker mode", pp.gcMarkWorkerMode)
			throw("gcBgMarkWorker: blackening not enabled")
		}

		if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker {
			throw("gcBgMarkWorker: mode not set")
		}

		startTime := nanotime()
		pp.gcMarkWorkerStartTime = startTime
		var trackLimiterEvent bool
		if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode {
			trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime)
		}

		decnwait := atomic.Xadd(&work.nwait, -1)
		if decnwait == work.nproc {
			println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
			throw("work.nwait was > work.nproc")
		}

		systemstack(func() {
			// Mark our goroutine preemptible so its stack
			// can be scanned. This lets two mark workers
			// scan each other (otherwise, they would
			// deadlock). We must not modify anything on
			// the G stack. However, stack shrinking is
			// disabled for mark workers, so it is safe to
			// read from the G stack.
			//
			// N.B. The execution tracer is not aware of this status
			// transition and handles it specially based on the
			// wait reason.
			casGToWaitingForGC(gp, _Grunning, waitReasonGCWorkerActive)
			switch pp.gcMarkWorkerMode {
			default:
				throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
			case gcMarkWorkerDedicatedMode:
				gcDrainMarkWorkerDedicated(&pp.gcw, true)
				if gp.preempt {
					// We were preempted. This is
					// a useful signal to kick
					// everything out of the run
					// queue so it can run
					// somewhere else.
					if drainQ, n := runqdrain(pp); n > 0 {
						lock(&sched.lock)
						globrunqputbatch(&drainQ, int32(n))
						unlock(&sched.lock)
					}
				}
				// Go back to draining, this time
				// without preemption.
				gcDrainMarkWorkerDedicated(&pp.gcw, false)
			case gcMarkWorkerFractionalMode:
				gcDrainMarkWorkerFractional(&pp.gcw)
			case gcMarkWorkerIdleMode:
				gcDrainMarkWorkerIdle(&pp.gcw)
			}
			casgstatus(gp, _Gwaiting, _Grunning)
		})

		// Account for time and mark us as stopped.
		now := nanotime()
		duration := now - startTime
		gcController.markWorkerStop(pp.gcMarkWorkerMode, duration)
		if trackLimiterEvent {
			pp.limiterEvent.stop(limiterEventIdleMarkWork, now)
		}
		if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode {
			atomic.Xaddint64(&pp.gcFractionalMarkTime, duration)
		}

		// Was this the last worker and did we run out
		// of work?
		incnwait := atomic.Xadd(&work.nwait, +1)
		if incnwait > work.nproc {
			println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode,
				"work.nwait=", incnwait, "work.nproc=", work.nproc)
			throw("work.nwait > work.nproc")
		}

		// We'll releasem after this point and thus this P may run
		// something else. We must clear the worker mode to avoid
		// attributing the mode to a different (non-worker) G in
		// traceGoStart.
		pp.gcMarkWorkerMode = gcMarkWorkerNotWorker

		// If this worker reached a background mark completion
		// point, signal the main GC goroutine.
		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			// We don't need the P-local buffers here, allow
			// preemption because we may schedule like a regular
			// goroutine in gcMarkDone (block on locks, etc).
			releasem(node.m.ptr())
			node.m.set(nil)

			gcMarkDone()
		}
	}
}

// gcMarkWorkAvailable reports whether executing a mark worker
// on p is potentially useful. p may be nil, in which case it only
// checks the global sources of work.
func gcMarkWorkAvailable(p *p) bool {
	if p != nil && !p.gcw.empty() {
		return true
	}
	if !work.full.empty() {
		return true // global work available
	}
	if work.markrootNext < work.markrootJobs {
		return true // root scan work available
	}
	return false
}
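
// Illustrative sketch (not actual scheduler code): a caller deciding whether
// an idle P should pick up mark work combines the blackening flag with the
// check above, roughly:
//
//	if gcBlackenEnabled != 0 && gcMarkWorkAvailable(pp) {
//		// run an idle-priority mark worker on pp
//	}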

// gcMark runs the mark (or, for concurrent GC, mark termination) phase.
// All gcWork caches must be empty.
// STW is in effect at this point.
func gcMark(startTime int64) {
	if gcphase != _GCmarktermination {
		throw("in gcMark expecting to see gcphase as _GCmarktermination")
	}
	work.tstart = startTime

	// Check that there's no marking work remaining.
	if work.full != 0 || work.markrootNext < work.markrootJobs {
		print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
		panic("non-empty mark queue after concurrent mark")
	}

	if debug.gccheckmark > 0 {
		// This is expensive when there's a large number of
		// Gs, so only do it if checkmark is also enabled.
		gcMarkRootCheck()
	}

	// Drop allg snapshot. allgs may have grown, in which case
	// this is the only reference to the old backing store and
	// there's no need to keep it around.
	work.stackRoots = nil

	// Clear out buffers and double-check that all gcWork caches
	// are empty. This should be ensured by gcMarkDone before we
	// enter mark termination.
	//
	// TODO: We could clear out buffers just before mark if this
	// has a non-negligible impact on STW time.
	for _, p := range allp {
		// The write barrier may have buffered pointers since
		// the gcMarkDone barrier. However, since the barrier
		// ensured all reachable objects were marked, all of
		// these must be pointers to black objects. Hence we
		// can just discard the write barrier buffer.
		if debug.gccheckmark > 0 {
			// For debugging, flush the buffer and make
			// sure it really was all marked.
			wbBufFlush1(p)
		} else {
			p.wbBuf.reset()
		}

		gcw := &p.gcw
		if !gcw.empty() {
			printlock()
			print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
			if gcw.wbuf1 == nil {
				print(" wbuf1=<nil>")
			} else {
				print(" wbuf1.n=", gcw.wbuf1.nobj)
			}
			if gcw.wbuf2 == nil {
				print(" wbuf2=<nil>")
			} else {
				print(" wbuf2.n=", gcw.wbuf2.nobj)
			}
			print("\n")
			throw("P has cached GC work at end of mark termination")
		}
		// There may still be cached empty buffers, which we
		// need to flush since we're going to free them. Also,
		// there may be non-zero stats because we allocated
		// black after the gcMarkDone barrier.
		gcw.dispose()
	}

	// Flush scanAlloc from each mcache since we're about to modify
	// heapScan directly. If we were to flush this later, then scanAlloc
	// might have incorrect information.
	//
	// Note that it's not important to retain this information; we know
	// exactly what heapScan is at this point via scanWork.
	for _, p := range allp {
		c := p.mcache
		if c == nil {
			continue
		}
		c.scanAlloc = 0
	}

	// Reset controller state.
	gcController.resetLive(work.bytesMarked)
}

// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//
// Returns true if the heap was fully swept by this function.
//
// The world must be stopped.
//
//go:systemstack
func gcSweep(mode gcMode) bool {
	assertWorldStopped()

	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	mheap_.sweepgen += 2
	sweep.active.reset()
	mheap_.pagesSwept.Store(0)
	mheap_.sweepArenas = mheap_.allArenas
	mheap_.reclaimIndex.Store(0)
	mheap_.reclaimCredit.Store(0)
	unlock(&mheap_.lock)

	sweep.centralIndex.clear()

	if !concurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Flush all mcaches.
		for _, pp := range allp {
			pp.mcache.prepareForSweep()
		}
		// Sweep all spans eagerly.
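		// sweepone sweeps one span at a time and returns ^uintptr(0)
		// once there is nothing left to sweep, which terminates this
		// loop.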
		for sweepone() != ^uintptr(0) {
		}
		// Free workbufs eagerly.
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}
		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return true
	}

	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
	return false
}

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
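	// (One mark bit per 8KB page: 64GB / 8KB = 8M pages = 1MB of bits.)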
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		clear(ha.pageMarks[:])
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer  // for crypto/internal/boring
var uniqueMapCleanup chan struct{} // for unique

// sync_runtime_registerPoolCleanup should be an internal detail,
// but widely used packages access it using linkname.
// Notable members of the hall of shame include:
//   - github.com/bytedance/gopkg
//   - github.com/songzhibin97/gkit
//
// Do not remove or change the type signature.
// See go.dev/issue/67401.
//
//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}

//go:linkname unique_runtime_registerUniqueMapCleanup unique.runtime_registerUniqueMapCleanup
func unique_runtime_registerUniqueMapCleanup(f func()) {
	// Start the goroutine in the runtime so it's counted as a system goroutine.
	uniqueMapCleanup = make(chan struct{}, 1)
	go func(cleanup func()) {
		for {
			<-uniqueMapCleanup
			cleanup()
		}
	}(f)
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// clear unique maps
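	// This is a non-blocking send: if a cleanup signal is already
	// pending, the cleanup goroutine will run anyway, so there is no
	// need to queue another one for this GC cycle.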
	if uniqueMapCleanup != nil {
		select {
		case uniqueMapCleanup <- struct{}{}:
		default:
		}
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
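// For example, itoaDiv(buf, 12345, 2) yields "123.45": the digits of 12345
// with a decimal point inserted two places from the right.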
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
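// Durations of at least 10ms are printed as whole milliseconds; shorter ones
// are printed with two significant digits. For example, 12000000ns formats
// as "12" and 3456000ns as "3.4".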
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this. It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
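//
// A typical use in a test looks roughly like this, where victim is a
// hypothetical //go:noinline function whose prologue performs the stack
// check that triggers the move:
//
//	gcTestMoveStackOnNextCall()
//	victim()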
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
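//
// For example (illustrative), with two pointers a and b:
//
//	mask := gcTestIsReachable(unsafe.Pointer(a), unsafe.Pointer(b))
//	aReachable := mask&1 != 0
//	bReachable := mask&2 != 0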
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}

// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
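//
// For example, a pointer into the current goroutine's stack is classified as
// "stack", a pointer to a live heap allocation as "heap", and a pointer to a
// package-level variable as "data" or "bss", depending on where the linker
// placed it.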
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 < datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
