1 .. SPDX-License-Identifier: GPL-2.0
14 2.2.1 Default mode
15 2.2.2 Per-thread mode
16 2.2.3 Per-CPU mode
19 2.3.1 Producer-consumer model
55 -------------------
63 +---------------------------+
65 +---------------------------+
66 `-> Tail `-> Head
86 read-only mapping, which is to be addressed in the section
92 +---------+---------+ +---------------------------------------+
94 +---------+---------+ +---------------------------------------+
95 ` `----------------^ ^
96 `----------------------------------------------|
103 with option ``-m`` or ``--mmap-pages=``, the given size will be rounded up
114 -------------------------------------------
116 The perf tool profiles programs in different modes: default mode, per-thread
121 2.2.1 Default mode
130 perf tool applies the default mode on the perf event. It maps all the
135 evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
151 +----+ +-----------+ +----+
153 +----+--------------+-----------+----------+----+-------->
156 +-----------------------------------------------------+
158 +-----------------------------------------------------+
161 +-----+
163 -----+-----+--------------------------------------------->
166 +-----------------------------------------------------+
168 +-----------------------------------------------------+
171 +----+ +-------+
173 --------------------------+----+--------+-------+-------->
176 +-----------------------------------------------------+
178 +-----------------------------------------------------+
181 +--------------+
183 -----------+--------------+------------------------------>
186 +-----------------------------------------------------+
188 +-----------------------------------------------------+
193 Figure 3. Ring buffer for default mode
195 2.2.2 Per-thread mode
198 By specifying option ``--per-thread`` in perf command, e.g.
202 perf record --per-thread test_program
207 evsel::cpus::map[0] = { -1 }
221 +----+ +-----------+ +----+
223 +----+--------------+-----------+----------+----+-------->
226 | +-----+ |
228 --|--+-----+----------------------------------|---------->
231 | | +----+ +---+ |
233 --|-----|-----------------+----+--------+---+-|---------->
236 | | +--------------+ | |
238 --|-----|--+--------------+-|-----------------|---------->
241 +-----------------------------------------------------+
243 +-----------------------------------------------------+
248 Figure 4. Ring buffer for per-thread mode
250 When perf runs in per-thread mode, a ring buffer is allocated for the
256 2.2.3 Per-CPU mode
259 The option ``-C`` is used to collect samples on the list of CPUs, for
260 example the below perf command receives option ``-C 0,2``::
262 perf record -C 0,2 test_program
268 evsel::threads::map[] = { -1 }
271 As a result, the ``perf record`` session samples all threads on CPU0
275 options for per-thread mode and per-CPU mode, e.g. the options ``-C 0,2`` and
282 +----+ +-----------+ +----+
284 +----+--------------+-----------+----------+----+-------->
287 +-----------------------------------------------------+
289 +-----------------------------------------------------+
292 +-----+
294 -----+-----+--------------------------------------------->
297 +----+ +-------+
299 --------------------------+----+--------+-------+-------->
302 +-----------------------------------------------------+
304 +-----------------------------------------------------+
307 +--------------+
309 -----------+--------------+------------------------------>
314 Figure 5. Ring buffer for per-CPU mode
322 perf record -a test_program
324 Similar to the per-CPU mode, the perf event doesn't bind to any PID, and
327 evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
328 evsel::threads::map[] = { -1 }
338 +----+ +-----------+ +----+
340 +----+--------------+-----------+----------+----+-------->
343 +-----------------------------------------------------+
345 +-----------------------------------------------------+
348 +-----+
350 -----+-----+--------------------------------------------->
353 +-----------------------------------------------------+
355 +-----------------------------------------------------+
358 +----+ +-------+
360 --------------------------+----+--------+-------+-------->
363 +-----------------------------------------------------+
365 +-----------------------------------------------------+
368 +--------------+
370 -----------+--------------+------------------------------>
373 +-----------------------------------------------------+
375 +-----------------------------------------------------+
383 --------------------
388 2.3.1 Producer-consumer model
394 data into the file for post analysis. It’s a typical producer-consumer
400 stored in the ``perf_buffer::watermark``. When a sample is recorded into
408 Polling / `--------------| Ring buffer
409 v v ;---------------------v
410 +----------------+ +---------+---------+ +-------------------+
412 +----------------+ +---------+---------+ +-------------------+
413 ^ ^ `------------------------^
415 +-----------------------------+
417 +-----------------------------+
445 Additionally, the tool can map buffers in either read-write mode or read-only
448 The ring buffer in the read-write mode is mapped with the property
455 Alternatively, in the read-only mode, only the kernel keeps updating
461 combinations to support buffer types: the non-overwrite buffer and the
464 .. list-table::
466 :header-rows: 1
468 * - Mapping mode
469 - Forward
470 - Backward
471 * - read-write
472 - Non-overwrite ring buffer
473 - Not used
474 * - read-only
475 - Not used
476 - Overwritable ring buffer
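
The two combinations in the table can be sketched as a small C helper. This is only an illustration, assuming that the read-write mapping corresponds to ``PROT_READ | PROT_WRITE``, the read-only mapping to ``PROT_READ``, and that the backward direction is requested through ``perf_event_attr::write_backward``. The names ``rb_kind``, ``rb_setup`` and ``rb_setup_for()`` are hypothetical and do not come from the perf sources.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical names, for illustration only. */
enum rb_kind { RB_NON_OVERWRITE, RB_OVERWRITABLE };

struct rb_setup {
	int  mmap_prot;       /* protection flags to pass to mmap() */
	bool write_backward;  /* value for perf_event_attr::write_backward */
};

/* Pick the mapping mode and write direction for a buffer type. */
static struct rb_setup rb_setup_for(enum rb_kind kind)
{
	struct rb_setup s;

	if (kind == RB_OVERWRITABLE) {
		/* read-only mapping + backward writing */
		s.mmap_prot = PROT_READ;
		s.write_backward = true;
	} else {
		/* read-write mapping + forward writing */
		s.mmap_prot = PROT_READ | PROT_WRITE;
		s.write_backward = false;
	}
	return s;
}
```

The two "Not used" cells of the table have no counterpart in the helper, which is why it takes the buffer type rather than two independent flags.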
478 The non-overwrite ring buffer uses the read-write mapping with forward
480 and wraps around on overflow, which is used with the read-write mode in
487 read-only mode. It saves the data from the end of the ring buffer and
498 When a sample is taken and saved into the ring buffer, the kernel
499 prepares sample fields based on the sample type; then it prepares the
501 ``perf_output_handle``. In the end, the kernel outputs the sample into
542 if (LOAD ->data_tail) { LOAD ->data_head
546 STORE ->data_head STORE ->data_tail
554 pointer ``perf_event_mmap_page::data_tail`` and filling sample into ring
564 makes sure that recording a sample happens before updating the head
575 Some architectures support one-way permeable barriers with load-acquire
576 and store-release operations; these barriers are more relaxed with less
580 If an architecture doesn’t support load-acquire and store-release in its
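
The user-space side of the pairing above can be sketched with C11 atomics. This is a minimal sketch under stated assumptions, not the perf tool's implementation: ``struct demo_ring`` and ``demo_consume()`` are hypothetical names, the buffer size is assumed to be a power of two, and the real ``perf_event_mmap_page`` layout is not reproduced. The load-acquire on ``data_head`` orders the head read before the data reads; the store-release on ``data_tail`` orders the data reads before the tail update.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail pointers and data area. */
struct demo_ring {
	_Atomic uint64_t data_head;  /* advanced by the producer */
	_Atomic uint64_t data_tail;  /* advanced by the consumer */
	uint64_t size;               /* data area size, power of two */
	unsigned char *data;
};

/* Copy up to len available bytes into out; return the number consumed. */
static uint64_t demo_consume(struct demo_ring *rb, unsigned char *out,
			     uint64_t len)
{
	/* Load-acquire: data reads below cannot move before this load. */
	uint64_t head = atomic_load_explicit(&rb->data_head,
					     memory_order_acquire);
	uint64_t tail = atomic_load_explicit(&rb->data_tail,
					     memory_order_relaxed);
	uint64_t avail = head - tail;
	uint64_t n = avail < len ? avail : len;

	for (uint64_t i = 0; i < n; i++)
		out[i] = rb->data[(tail + i) & (rb->size - 1)];

	/* Store-release: data reads above complete before the tail moves. */
	atomic_store_explicit(&rb->data_tail, tail + n, memory_order_release);
	return n;
}
```

On architectures without native load-acquire and store-release, the compiler is expected to lower these operations to stronger barriers, so the sketch stays correct but may cost more.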
593 examine how the AUX ring buffer works together with the regular ring buffer,
598 ---------------------------------------------------------
619 During the initialisation phase, besides the mmap()-ed regular ring
622 non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates pages
630 perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
638 are allocated in pairs. In default mode, perf allocates the regular
639 ring buffer and the AUX ring buffer per CPU, which is the same as
640 the system wide mode; however, the default mode records samples only for
642 in the system. For per-thread mode, the perf tool allocates only one
644 the per-CPU mode, perf allocates two kinds of ring buffers for
645 selected CPUs specified by the option ``-C``.
655 +----+ +-----------+ +----+
657 +----+--------------+-----------+----------+----+-------->
660 +-----------------------------------------------------+
662 +-----------------------------------------------------+
665 +-----------------------------------------------------+
667 +-----------------------------------------------------+
670 +-----+
672 -----+-----+--------------------------------------------->
675 +-----------------------------------------------------+
677 +-----------------------------------------------------+
680 +-----------------------------------------------------+
682 +-----------------------------------------------------+
685 +----+ +-------+
687 --------------------------+----+--------+-------+-------->
690 +-----------------------------------------------------+
692 +-----------------------------------------------------+
695 +-----------------------------------------------------+
697 +-----------------------------------------------------+
700 +--------------+
702 -----------+--------------+------------------------------>
705 +-----------------------------------------------------+
707 +-----------------------------------------------------+
710 +-----------------------------------------------------+
712 +-----------------------------------------------------+
720 --------------
738 - It fills an event ``PERF_RECORD_AUX`` into the regular ring buffer, this
742 - Since the hardware trace driver has stored new trace data into the AUX
749 will introduce a discontinuity during the decoding phase.
762 -----------------
769 perf record -e cs_etm/@tmc_etr0/u -S -a program &
772 kill -USR2 $PERFPID
778 - Before a snapshot is taken, the AUX ring buffer acts in free run mode.
782 - Once the perf tool receives the *USR2* signal, it triggers the callback
788 - Then perf tool takes a snapshot, ``record__read_auxtrace_snapshot()``
792 - After the snapshot is finished, ``auxtrace_record::snapshot_finish()``
804 As we know, the buffers' deployment can be per-thread mode, per-CPU
814 +------------------------+
815 | AUX Ring buffer 0 | <- aux_head
816 +------------------------+
818 +--------------------------------+
819 | AUX Ring buffer 1 | <- aux_head
820 +--------------------------------+
822 +--------------------------------------------+
823 | AUX Ring buffer 2 | <- aux_head
824 +--------------------------------------------+
826 +---------------------------------------+
827 | AUX Ring buffer 3 | <- aux_head
828 +---------------------------------------+