1 .. SPDX-License-Identifier: GPL-2.0
7 CPU Performance Scaling
15 The Concept of CPU Performance Scaling
20 Operating Performance Points or P-states (in ACPI terminology). As a rule,
22 can be retired by the CPU over a unit of time, but also the higher the clock
24 time (or the more power is drawn) by the CPU in the given P-state. Therefore
25 there is a natural tradeoff between the CPU capacity (the number of instructions
26 that can be executed over a unit of time) and the power drawn by the CPU.
29 as possible and then there is no reason to use any P-states different from the
30 highest one (i.e. the highest-performance frequency/voltage configuration
32 instructions so quickly and maintaining the highest available CPU capacity for a
34 It also may not be physically possible to maintain maximum CPU capacity for too
35 long for thermal or power supply capacity reasons or similar. To cover those
38 put into different P-states.
40 Typically, they are used along with algorithms to estimate the required CPU
41 capacity, so as to decide which P-states to put the CPUs into. Of course, since
44 to as CPU performance scaling or CPU frequency scaling (because it involves
45 adjusting the CPU clock frequency).
48 CPU Performance Scaling in Linux
51 The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
52 (CPU Frequency scaling) subsystem that consists of three layers of code: the
56 interfaces for all platforms that support CPU performance scaling. It defines
59 Scaling governors implement algorithms to estimate the required CPU capacity.
64 information on the available P-states (or P-state ranges in some cases) and
65 access platform-specific hardware interfaces to change CPU P-states as requested
70 performance scaling algorithms for P-state selection can be represented in a
71 platform-independent form in the majority of cases, so it should be possible
80 platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
88 In some cases the hardware interface for P-state control is shared by multiple
90 control the P-state of multiple CPUs at the same time and writing to it affects
93 Sets of CPUs sharing hardware P-state control interfaces are represented by
95 struct cpufreq_policy is also used when there is only one CPU in the given
99 every CPU in the system, including CPUs that are currently offline. If multiple
100 CPUs share the same hardware P-state control interface, all of the pointers
107 CPU Initialization
114 The scaling driver may be registered before or after CPU registration. If
121 In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
122 has not seen so far as soon as it is ready to handle that CPU. [Note that the
123 logical CPU may be a physical single-core processor, or a single core in a
125 core. In what follows "CPU" always means "logical CPU" unless explicitly stated
130 for the given CPU and if so, it skips the policy object creation. Otherwise,
133 the given CPU is set to the new policy object's address in memory.
135 Next, the scaling driver's ``->init()`` callback is invoked with the policy
136 pointer of the new CPU passed to it as the argument. That callback is expected
137 to initialize the performance scaling hardware interface for the given CPU (or,
142 the set of supported P-states is not a continuous range), and the mask of CPUs
151 the governor's ``->init()`` callback which is expected to initialize all of the
154 invoking its ``->start()`` callback.
156 That callback is expected to register per-CPU utilization update callbacks for
157 all of the online CPUs belonging to the given policy with the CPU scheduler.
158 The utilization update callbacks will be invoked by the CPU scheduler on
160 scheduler tick or generally whenever the CPU utilization may change (from the
162 to determine the P-state to use for the given policy going forward and to
164 the P-state selection. The scaling driver may be invoked directly from
172 "inactive" (and is re-initialized now) instead of the default governor.
174 In turn, if a previously offline CPU is being brought back online, but some
176 need to re-initialize the policy object at all. In that case, it is only
177 necessary to restart the scaling governor so that it can take the new online CPU
178 into account. That is achieved by invoking the governor's ``->stop()`` and
179 ``->start()`` callbacks, in this order, for the entire policy.
182 governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
184 new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
185 to register per-CPU utilization update callbacks for each policy. These
186 callbacks are invoked by the CPU scheduler in the same way as for scaling
187 governors, but in the |intel_pstate| case they both determine the P-state to
191 The policy objects created during CPU initialization and other data structures
194 when the last CPU belonging to the given policy is unregistered.
202 :file:`/sys/devices/system/cpu/`.
207 under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
210 in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
217 also add driver-specific attributes to the policy directories in ``sysfs`` to
218 control policy-specific aspects of driver behavior.
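As an illustration of the directory layout described here, the following is a minimal sketch that lists the per-policy directories. The helper name ``list_policy_dirs`` is hypothetical, and the ``policy*`` glob assumes the ``policyX`` naming used in this document; on systems without ``CPUFreq`` support the function simply returns an empty list.

```python
from pathlib import Path


def list_policy_dirs(root="/sys/devices/system/cpu/cpufreq"):
    """Return the names of policy directories found under `root`.

    Hypothetical helper for illustration only. It assumes that policy
    directories are named "policyX", as in the text above.
    """
    base = Path(root)
    if not base.is_dir():
        # No CPUFreq sysfs directory (for example, no scaling driver loaded).
        return []
    # Keep directories only; plain files in the same place are skipped.
    return sorted(p.name for p in base.glob("policy*") if p.is_dir())


if __name__ == "__main__":
    for name in list_policy_dirs():
        print(name)
```

The names are sorted as strings, so ``policy10`` would be listed before ``policy2``.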
220 The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
230 CPU frequencies, that limit will be reported through this attribute (if
235 BIOS/HW-based mechanisms.
261 P-state to another, in nanoseconds.
264 work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
283 In the majority of cases, this is the frequency of the last P-state
286 the CPU is actually running at (due to hardware design and other
290 more precisely reflecting the current CPU frequency through this
291 attribute, but that still may not be the exact current CPU frequency as
302 This attribute is read-write and writing to it will cause a new scaling
313 This attribute is read-write and writing a string representing an
321 This attribute is read-write and writing a string representing a
322 non-negative integer to it will cause a new limit to be set (it must not
347 Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
349 tunables, can be either global (system-wide) or per-policy, depending on the
351 per-policy, they are located in a subdirectory of each policy directory.
353 :file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the
358 ---------------
368 -------------
378 -------------
381 to set the CPU frequency for the policy it is attached to by writing to the
385 -------------
387 This governor uses CPU utilization data available from the CPU scheduler. It
388 generally is regarded as a part of the CPU scheduler, so it can access the
392 invoke the scaling driver asynchronously when it decides that the CPU frequency
394 is capable of changing the CPU frequency from scheduler context).
396 The actions of this governor for a particular CPU depend on the scheduling class
397 invoking its utilization update callback for that CPU. If it is invoked by the
401 Per-Entity Load Tracking (PELT) metric for the root control group of the
402 given CPU as the CPU utilization estimate (see the *Per-entity load tracking*
404 CPU frequency to apply is computed in accordance with the formula
409 ``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
410 policy (if the PELT number is frequency-invariant), or the current CPU frequency
414 CPU frequency for tasks that have been waiting on I/O most recently, called
415 "IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
432 tightly integrated with the CPU scheduler, its overhead in terms of CPU context
433 switches and similar is less significant, and it uses the scheduler's own CPU
438 ------------
440 This governor uses CPU load as a CPU frequency selection metric.
442 In order to estimate the current CPU load, it measures the time elapsed between
444 time in which the given CPU was not idle. The ratio of the non-idle (active)
445 time to the total CPU time is taken as an estimate of the load.
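The load estimate described above can be sketched as a simple ratio. This is an illustration only, not the governor's actual code; the function name ``estimate_load`` and the microsecond units are assumptions.

```python
def estimate_load(active_time_us, total_time_us):
    """Estimate CPU load as the fraction of time the CPU was not idle.

    Hypothetical helper: both arguments cover the same sampling interval
    and are given in microseconds (an assumed unit).
    """
    if total_time_us <= 0:
        raise ValueError("total_time_us must be positive")
    if not 0 <= active_time_us <= total_time_us:
        raise ValueError("active_time_us must be within [0, total_time_us]")
    return active_time_us / total_time_us


if __name__ == "__main__":
    # 7.5 ms of non-idle time in a 10 ms interval.
    print(estimate_load(7_500, 10_000))
```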
452 invoked asynchronously (via a workqueue) and CPU P-states are updated from
454 governor is minimum, but it causes additional CPU context switches to happen
455 relatively often and the CPU P-state updates triggered by it can be relatively
456 irregular. Also, it affects its own CPU load metric by running code that
457 reduces the CPU idle time (even though the CPU idle time is only reduced very
460 It generally selects CPU frequencies proportional to the estimated load, so that
480 If this tunable is per-policy, the following shell command sets the time
486 If the estimated CPU load is above this value (in percent), the governor
489 CPU load.
492 If set to 1 (default 0), it will cause the CPU load estimation code to
493 treat the CPU time spent on executing tasks with "nice" levels greater
494 than 0 as CPU idle time.
503 the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
510 at the cost of additional energy spent on maintaining the maximum CPU
511 capacity.
516 value is exceeded by the estimated CPU load) or sensitivity threshold
524 f * (1 - ``powersave_bias`` / 1000)
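The formula above can be worked through in a short sketch. The function name ``apply_powersave_bias`` is hypothetical, and the accepted range of 0 to 1000 is an assumption inferred from the division by 1000 in the formula.

```python
def apply_powersave_bias(f_khz, powersave_bias):
    """Return f * (1 - powersave_bias / 1000) using integer arithmetic.

    Hypothetical helper. `f_khz` is the original frequency target in kHz;
    `powersave_bias` is assumed to lie in the range 0..1000.
    """
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias is assumed to be in 0..1000")
    # Multiply before dividing to avoid floating-point rounding.
    return f_khz * (1000 - powersave_bias) // 1000


if __name__ == "__main__":
    # A bias of 100 reduces a 2 GHz target by 10%.
    print(apply_powersave_bias(2_000_000, 100))
```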
536 workload running on a CPU will change in response to frequency changes.
538 The performance of a workload with the sensitivity of 0 (memory-bound or
539 IO-bound) is not expected to increase at all as a result of increasing
540 the CPU frequency, whereas workloads with the sensitivity of 100%
541 (CPU-bound) are expected to perform much better if the CPU frequency is
547 target, so as to avoid over-provisioning workloads that will not benefit
548 from running at higher CPU frequencies.
551 ----------------
553 This governor uses CPU load as a CPU frequency selection metric.
555 It estimates the CPU load in the same way as the `ondemand`_ governor described
556 above, but the CPU frequency selection algorithm implemented by it is different.
559 which may not be suitable for systems with limited power supply capacity (e.g.
560 battery-powered). To achieve that, it changes the frequency in relatively
561 small steps, one step at a time, up or down, depending on whether or not a
562 (configurable) threshold has been exceeded by the estimated CPU load.
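The stepwise behavior described above might be sketched as follows. This is not the governor's implementation: the function name, the separate lower threshold, the fixed step size and the clamping to policy limits are all assumptions made for illustration.

```python
def next_frequency(current_khz, load_pct, up_threshold_pct,
                   down_threshold_pct, step_khz, min_khz, max_khz):
    """Move the frequency by at most one step, based on the estimated load.

    Hypothetical sketch. All parameter names, the separate
    `down_threshold_pct` and the clamping to [min_khz, max_khz] are
    assumptions, not taken from the governor's code.
    """
    if load_pct > up_threshold_pct:
        # Load is above the upper threshold: go one step up.
        return min(current_khz + step_khz, max_khz)
    if load_pct < down_threshold_pct:
        # Load is below the lower threshold: go one step down.
        return max(current_khz - step_khz, min_khz)
    # Load is between the thresholds: keep the current frequency.
    return current_khz


if __name__ == "__main__":
    print(next_frequency(1_000_000, 90, 80, 20, 100_000, 800_000, 2_000_000))
```

Changing the frequency by one bounded step per evaluation is what keeps the response gradual, in contrast to jumping straight to a load-proportional target.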
581 If the estimated CPU load is greater than this value, the frequency will
598 ----------
607 "Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
612 The frequency boost mechanism may be either hardware-based or software-based.
613 If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
615 into a special state in which it can control the CPU frequency within certain
616 limits). If it is software-based (e.g. on ARM), the scaling driver decides
620 -------------------------------
622 This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
625 but provides a driver-specific interface for controlling it, like
630 trigger boosting (in the hardware-based case), or the software is allowed to
631 trigger boosting (in the software-based case). It does not mean that boosting
642 --------------------------------
645 CPU performance on time scales below software resolution (e.g. below the
658 limited capacity, such as batteries, so the ability to disable the boost
672 single-thread performance may vary because of it which may lead to
678 -----------------------
680 The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
685 ``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
687 implementation, however, works on the system-wide basis and setting that knob
707 .. [1] Jonathan Corbet, *Per-entity load tracking*,