cpufreq.rst - OpenGrok cross reference for /Documentation/admin-guide/pm/cpufreq.rst

.. SPDX-License-Identifier: GPL-2.0
CPU Performance Scaling
The Concept of CPU Performance Scaling
Operating Performance Points or P-states (in ACPI terminology).  As a rule,
can be retired by the CPU over a unit of time, but also the higher the clock
time (or the more power is drawn) by the CPU in the given P-state.  Therefore
there is a natural tradeoff between the CPU capacity (the number of instructions
that can be executed over a unit of time) and the power drawn by the CPU.
as possible and then there is no reason to use any P-states different from the
highest one (i.e. the highest-performance frequency/voltage configuration
instructions so quickly and maintaining the highest available CPU capacity for a
It also may not be physically possible to maintain maximum CPU capacity for too
put into different P-states.
Typically, they are used along with algorithms to estimate the required CPU
capacity, so as to decide which P-states to put the CPUs into.  Of course, since
to as CPU performance scaling or CPU frequency scaling (because it involves
adjusting the CPU clock frequency).
CPU Performance Scaling in Linux
The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
(CPU Frequency scaling) subsystem that consists of three layers of code: the
core, scaling governors and scaling drivers.
The ``CPUFreq`` core provides the common code infrastructure and user space
interfaces for all platforms that support CPU performance scaling.  It defines
Scaling governors implement algorithms to estimate the required CPU capacity.
information on the available P-states (or P-state ranges in some cases) and
access platform-specific hardware interfaces to change CPU P-states as requested
performance scaling algorithms for P-state selection can be represented in a
platform-independent form in the majority of cases, so it should be possible
platform-independent way.  For this reason, ``CPUFreq`` allows scaling drivers
In some cases the hardware interface for P-state control is shared by multiple
control the P-state of multiple CPUs at the same time and writing to it affects
Sets of CPUs sharing hardware P-state control interfaces are represented by
struct cpufreq_policy is also used when there is only one CPU in the given
The ``CPUFreq`` core maintains a pointer to a struct cpufreq_policy object for
every CPU in the system, including CPUs that are currently offline.  If multiple
CPUs share the same hardware P-state control interface, all of the pointers
CPU Initialization
The scaling driver may be registered before or after CPU registration.  If
CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
has not seen so far as soon as it is ready to handle that CPU.  [Note that the
logical CPU may be a physical single-core processor, or a single core in a
core.  In what follows "CPU" always means "logical CPU" unless explicitly stated
Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
for the given CPU and if so, it skips the policy object creation.  Otherwise,
the given CPU is set to the new policy object's address in memory.
Next, the scaling driver's ``->init()`` callback is invoked with the policy
pointer of the new CPU passed to it as the argument.  That callback is expected
to initialize the performance scaling hardware interface for the given CPU (or,
the set of supported P-states is not a continuous range), and the mask of CPUs
mask is then used by the core to populate the policy pointers for all of the
the governor's ``->init()`` callback which is expected to initialize all of the
invoking its ``->start()`` callback.
That callback is expected to register per-CPU utilization update callbacks for
all of the online CPUs belonging to the given policy with the CPU scheduler.
The utilization update callbacks will be invoked by the CPU scheduler on
scheduler tick or generally whenever the CPU utilization may change (from the
to determine the P-state to use for the given policy going forward and to
the P-state selection.  The scaling driver may be invoked directly from
only practical difference in that case is that the ``CPUFreq`` core will attempt
"inactive" (and is re-initialized now) instead of the default governor.
In turn, if a previously offline CPU is being brought back online, but some
need to re-initialize the policy object at all.  In that case, it is only
necessary to restart the scaling governor so that it can take the new online CPU
into account.  That is achieved by invoking the governor's ``->stop()`` and
``->start()`` callbacks, in this order, for the entire policy.
governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
new policy objects.  Instead, the driver's ``->setpolicy()`` callback is invoked
to register per-CPU utilization update callbacks for each policy.  These
callbacks are invoked by the CPU scheduler in the same way as for scaling
governors, but in the |intel_pstate| case they both determine the P-state to
The policy objects created during CPU initialization and other data structures
when the last CPU belonging to the given policy is unregistered.
During the initialization of the kernel, the ``CPUFreq`` core creates a
:file:`/sys/devices/system/cpu/`.
integer number) for every policy object maintained by the ``CPUFreq`` core.
under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
Some of those attributes are generic.  They are created by the ``CPUFreq`` core
also add driver-specific attributes to the policy directories in ``sysfs`` to
control policy-specific aspects of driver behavior.
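The policy directories can be inspected from user space with a short shell function. This is a sketch and is not part of any kernel tooling. The function name ``cpufreq_policy_summary`` is hypothetical, and the sketch assumes that the generic ``scaling_driver``, ``scaling_governor`` and ``related_cpus`` attributes are present. Any attribute that is missing or unreadable is printed as ``-``. Because the base directory is an argument, the function can also be pointed at a copy of the tree.

```shell
# Sketch: print one summary line per CPUFreq policy directory.
# base: directory holding the policyX/ subdirectories (defaults to the
# standard sysfs location). Missing/unreadable attributes print as "-".
cpufreq_policy_summary() {
    local base dir attr val
    base=${1:-/sys/devices/system/cpu/cpufreq}
    for dir in "$base"/policy*; do
        [ -d "$dir" ] || continue   # glob matched nothing
        printf '%s:' "$(basename "$dir")"
        for attr in scaling_driver scaling_governor related_cpus; do
            if [ -r "$dir/$attr" ]; then
                val=$(cat "$dir/$attr")
            else
                val=-
            fi
            printf ' %s=%s' "$attr" "$val"
        done
        printf '\n'
    done
}
```

Running ``cpufreq_policy_summary`` with no argument reads the live ``sysfs`` tree. Each output line names one policy directory, followed by its driver, its governor and the CPUs that share it.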
The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
	CPU frequencies, that limit will be reported through this attribute (if
	BIOS/HW-based mechanisms.
	P-state to another, in nanoseconds.
	work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
	In the majority of cases, this is the frequency of the last P-state
	the CPU is actually running at (due to hardware design and other
	more precisely reflecting the current CPU frequency through this
	attribute, but that still may not be the exact current CPU frequency as
	This attribute is read-write and writing to it will cause a new scaling
	This attribute is read-write and writing a string representing an
	This attribute is read-write and writing a string representing a
	non-negative integer to it will cause a new limit to be set (it must not
Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
tunables, can be either global (system-wide) or per-policy, depending on the
per-policy, they are located in a subdirectory of each policy directory.
:file:`/sys/devices/system/cpu/cpufreq/`.  In either case the name of the
---------------
-------------
-------------
to set the CPU frequency for the policy it is attached to by writing to the
-------------
This governor uses CPU utilization data available from the CPU scheduler.  It
is generally regarded as a part of the CPU scheduler, so it can access the
invoke the scaling driver asynchronously when it decides that the CPU frequency
is capable of changing the CPU frequency from scheduler context).
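The frequency that schedutil requests can be sketched with shell integer arithmetic. The sketch assumes the governor's formula ``f = 1.25 * f_0 * util / max``. Here ``util`` is the PELT utilization and ``max`` is the maximum possible value of ``util``. ``f_0`` is the policy's maximum frequency if the PELT number is frequency-invariant, and the current frequency otherwise. The function name ``sugov_next_freq`` is hypothetical, and frequencies are in kHz.

```shell
# Sketch of schedutil's request: f = 1.25 * f_0 * util / max.
#   f0   - policy maximum (frequency-invariant PELT) or current frequency, kHz
#   util - PELT utilization of the CPU
#   max  - maximum possible value of util
# 1.25 * f0 is computed as f0 + f0 / 4 to stay in integer arithmetic.
sugov_next_freq() {
    local f0=$1 util=$2 max=$3
    if [ "$max" -le 0 ]; then
        echo "sugov_next_freq: max must be positive" >&2
        return 1
    fi
    echo $(( (f0 + f0 / 4) * util / max ))
}

# Example: half utilization of a 2 GHz policy requests 1.25 GHz.
sugov_next_freq 2000000 512 1024    # prints 1250000
```

The 1.25 factor leaves headroom, so the request can exceed ``f_0`` when ``util`` approaches ``max``. In the kernel, the request is then limited to the policy's frequency range before the scaling driver acts on it.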
The actions of this governor for a particular CPU depend on the scheduling class
invoking its utilization update callback for that CPU.  If it is invoked by the
Per-Entity Load Tracking (PELT) metric for the root control group of the
given CPU as the CPU utilization estimate (see the *Per-entity load tracking*
CPU frequency to apply is computed in accordance with the formula
``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
policy (if the PELT number is frequency-invariant), or the current CPU frequency
CPU frequency for tasks that have been waiting on I/O most recently, called
"IO-wait boosting".  That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
tightly integrated with the CPU scheduler, its overhead in terms of CPU context
switches and similar is less significant, and it uses the scheduler's own CPU
------------
This governor uses CPU load as a CPU frequency selection metric.
In order to estimate the current CPU load, it measures the time elapsed between
time in which the given CPU was not idle.  The ratio of the non-idle (active)
time to the total CPU time is taken as an estimate of the load.
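The load estimate described above amounts to a simple ratio. The sketch takes two inputs measured over the same sampling interval, in any common unit: the elapsed time and the idle time accumulated over it. It returns the load as an integer percentage. The function name ``ondemand_load`` is hypothetical. The sketch also assumes that an empty interval, or idle time exceeding the elapsed time, should be reported as zero load.

```shell
# Sketch of the ondemand load estimate: percentage of the sampling
# interval during which the CPU was not idle.
#   wall_delta - time elapsed since the previous sample
#   idle_delta - idle time accumulated over the same interval (same unit)
ondemand_load() {
    local wall_delta=$1 idle_delta=$2
    # Assumption: an empty interval, or more idle time than elapsed time
    # (accounting noise), is reported as zero load.
    if [ "$wall_delta" -le 0 ] || [ "$idle_delta" -ge "$wall_delta" ]; then
        echo 0
        return 0
    fi
    echo $(( 100 * (wall_delta - idle_delta) / wall_delta ))
}

ondemand_load 10000 2500    # prints 75
```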
invoked asynchronously (via a workqueue) and CPU P-states are updated from
governor is minimal, but it causes additional CPU context switches to happen
relatively often and the CPU P-state updates triggered by it can be relatively
irregular.  Also, it affects its own CPU load metric by running code that
reduces the CPU idle time (even though the CPU idle time is only reduced very
It generally selects CPU frequencies proportional to the estimated load, so that
	If this tunable is per-policy, the following shell command sets the time
	If the estimated CPU load is above this value (in percent), the governor
	CPU load.
	If set to 1 (default 0), it will cause the CPU load estimation code to
	treat the CPU time spent on executing tasks with "nice" levels greater
	than 0 as CPU idle time.
	the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
	at the cost of additional energy spent on maintaining the maximum CPU
	value is exceeded by the estimated CPU load) or sensitivity threshold
		f * (1 - ``powersave_bias`` / 1000)
	workload running on a CPU will change in response to frequency changes.
	The performance of a workload with the sensitivity of 0 (memory-bound or
	IO-bound) is not expected to increase at all as a result of increasing
	the CPU frequency, whereas workloads with the sensitivity of 100%
	(CPU-bound) are expected to perform much better if the CPU frequency is
	target, so as to avoid over-provisioning workloads that will not benefit
	from running at higher CPU frequencies.
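The ondemand tunables above combine into a single frequency choice, which can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's code:

- a load above ``up_threshold`` selects the policy maximum;
- a lower load maps linearly between the policy minimum (load 0) and the policy maximum (load 100);
- a non-zero ``powersave_bias`` then scales the result by ``1 - powersave_bias / 1000``.

The kernel must also choose among the frequencies that the hardware actually supports. The sketch skips that step and returns the computed value, clamped to the policy minimum. The function name ``ondemand_target`` is hypothetical.

```shell
# Simplified sketch of ondemand frequency selection (kHz, percent).
#   load          - estimated load, 0..100
#   fmin, fmax    - policy frequency limits
#   up_threshold  - load above which the policy maximum is used
#   bias          - powersave_bias, 0..1000 (0 disables it)
ondemand_target() {
    local load=$1 fmin=$2 fmax=$3 up_threshold=$4 bias=${5:-0}
    local f
    if [ "$load" -gt "$up_threshold" ]; then
        f=$fmax
    else
        # Assumed linear map: load 0 -> fmin, load 100 -> fmax.
        f=$(( fmin + load * (fmax - fmin) / 100 ))
    fi
    # Apply f * (1 - powersave_bias / 1000) in integer arithmetic.
    f=$(( f * (1000 - bias) / 1000 ))
    # Keep the result within the policy minimum.
    if [ "$f" -lt "$fmin" ]; then
        f=$fmin
    fi
    echo "$f"
}

ondemand_target 50 800000 2000000 80        # prints 1400000
ondemand_target 90 800000 2000000 80 100    # prints 1800000
```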
----------------
This governor uses CPU load as a CPU frequency selection metric.
It estimates the CPU load in the same way as the `ondemand`_ governor described
above, but the CPU frequency selection algorithm implemented by it is different.
battery-powered).  To achieve that, it changes the frequency in relatively
small steps, one step at a time, up or down, depending on whether or not a
(configurable) threshold has been exceeded by the estimated CPU load.
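The stepwise behavior described above can be sketched as one function call per sampling period. The function name ``conservative_next`` is hypothetical. The sketch makes three assumptions:

- the step size (``freq_step``) is a percentage of the policy maximum;
- the load is compared with an upper threshold (``up_threshold``) and a lower threshold (``down_threshold``), both in percent;
- the defaults are 80 for ``up_threshold``, 20 for ``down_threshold`` and 5 for ``freq_step``.

The result is clamped to the policy limits.

```shell
# Sketch of one conservative-governor decision (kHz, percent).
# Assumed defaults: up_threshold=80, down_threshold=20, freq_step=5.
#   cur        - current frequency
#   load       - estimated load, 0..100
#   fmin, fmax - policy frequency limits
conservative_next() {
    local cur=$1 load=$2 fmin=$3 fmax=$4
    local up=${5:-80} down=${6:-20} step_pct=${7:-5}
    local step=$(( fmax * step_pct / 100 ))   # step as a share of fmax
    if [ "$load" -gt "$up" ]; then
        cur=$(( cur + step ))
    elif [ "$load" -lt "$down" ]; then
        cur=$(( cur - step ))
    fi
    # Clamp to the policy limits.
    if [ "$cur" -gt "$fmax" ]; then cur=$fmax; fi
    if [ "$cur" -lt "$fmin" ]; then cur=$fmin; fi
    echo "$cur"
}

conservative_next 1000000 90 500000 2000000   # prints 1100000
```

Under a sustained high load, each call raises the frequency by one step until it reaches the policy maximum. This produces a gradual ramp rather than an immediate jump.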
	If the estimated CPU load is greater than this value, the frequency will
----------
"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
The frequency boost mechanism may be either hardware-based or software-based.
If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
into a special state in which it can control the CPU frequency within certain
limits).  If it is software-based (e.g. on ARM), the scaling driver decides
-------------------------------
This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
but provides a driver-specific interface for controlling it, like
trigger boosting (in the hardware-based case), or the software is allowed to
trigger boosting (in the software-based case).  It does not mean that boosting
--------------------------------
CPU performance on time scales below software resolution (e.g. below the
     single-thread performance may vary because of it, which may lead to
-----------------------
The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
the global ``boost`` one.  It is used for disabling/enabling the "Core
``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
implementation, however, works on a system-wide basis and setting that knob
.. [1] Jonathan Corbet, *Per-entity load tracking*,