perf-record(1)
==============

NAME
----
perf-record - Run a command and record its profile into perf.data

SYNOPSIS
--------
[verse]
'perf record' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf record' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

DESCRIPTION
-----------
This command runs a command and gathers a performance counter profile
from it, into perf.data - without displaying anything.

This file can then be inspected later on, using 'perf report'.


OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

-e::
--event=::
	Select the PMU event. Selection can be:

	- a symbolic event name (use 'perf list' to list all events)

	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
	  hexadecimal event descriptor.

	- a symbolic or raw PMU event followed by an optional colon
	  and a list of event modifiers, e.g., cpu-cycles:p. See the
	  linkperf:perf-list[1] man page for details on event modifiers.

	- a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where
	  'param1', 'param2', etc are defined as formats for the PMU in
	  /sys/bus/event_source/devices/<pmu>/format/*.

	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'

	  where M, N, K are numbers (in decimal, hex, octal format). Acceptable
	  values for each of 'config', 'config1' and 'config2' are defined by
	  corresponding entries in /sys/bus/event_source/devices/<pmu>/format/*.

	  There are also some parameters which are not defined in .../<pmu>/format/*.
	  These params can be used to overload default config values per event.
	  Here are some common parameters:
	  - 'period': Set event sampling period
	  - 'freq': Set event sampling frequency
	  - 'time': Disable/enable time stamping. Acceptable values are 1 for
		    enabling time stamping.
		    0 for disabling time stamping.
		    The default is 1.
	  - 'call-graph': Disable/enable callgraph. Acceptable values are "fp" for
			  FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
			  "no" to disable callgraph.
	  - 'stack-size': user stack size for dwarf mode
	  - 'name' : User defined event name. Single quotes (') may be used to
		    protect symbols in the name from being parsed by the shell and
		    the tool, like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
	  - 'aux-output': Generate AUX records instead of events. This requires
			  that an AUX area event is also provided.

	  See the linkperf:perf-list[1] man page for more parameters.

	  Note: If the user explicitly sets options which conflict with these
	  params, the values set by the parameters will be overridden.

	  Also not defined in .../<pmu>/format/* are PMU driver specific
	  configuration parameters. Any configuration parameter preceded by
	  the letter '@' is not interpreted in user space and sent down directly
	  to the PMU driver. For example:

	  perf record -e some_event/@cfg1,@cfg2=config/ ...

	  will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated
	  with the event for further processing. There is no restriction on
	  what the configuration parameters are, as long as their semantics are
	  understood and supported by the PMU driver.

	- a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
	  where addr is the address in memory you want to break in.
	  Access is the memory access type (read, write, execute); it can
	  be passed as follows: '\mem:addr[:[r][w][x]]'. len is the range,
	  the number of bytes from the specified addr, which the breakpoint
	  will cover.
	  If you want to profile read-write accesses in 0x1000, just set
	  'mem:0x1000:rw'.
	  If you want to profile write accesses in [0x1000, 0x1008), just set
	  'mem:0x1000/8:w'.

	- a BPF source file (ending in .c) or a precompiled object file (ending
	  in .o) selects one or more BPF events.
	  The BPF program can attach to various perf events based on the ELF section
	  names.

	  When processing a '.c' file, perf searches for an installed LLVM to compile
	  it into an object file first. Optional clang options can be passed via the
	  '--clang-opt' command line option, e.g.:

	    perf record --clang-opt "-DLINUX_VERSION_CODE=0x50000" \
	                -e tests/bpf-script-example.c

	  Note: '--clang-opt' must be placed before '--event/-e'.

	- a group of events surrounded by a pair of braces ("{event1,event2,...}").
	  Each event is separated by commas and the group should be quoted to
	  prevent shell interpretation. You also need to use --group on
	  "perf report" to view group events together.

--filter=<filter>::
	Event filter. This option should follow an event selector (-e) which
	selects either tracepoint event(s) or a hardware trace PMU
	(e.g. Intel PT or CoreSight).

	- tracepoint filters

	In the case of tracepoints, multiple '--filter' options are combined
	using '&&'.

	- address filters

	A hardware trace PMU advertises its ability to accept a number of
	address filters by specifying a non-zero value in
	/sys/bus/event_source/devices/<pmu>/nr_addr_filters.

	Address filters have the format:

	filter|start|stop|tracestop <start> [/ <size>] [@<file name>]

	Where:
	- 'filter': defines a region that will be traced.
	- 'start': defines an address at which tracing will begin.
	- 'stop': defines an address at which tracing will stop.
	- 'tracestop': defines a region in which tracing will stop.

	<file name> is the name of the object file, <start> is the offset to the
	code to trace in that file, and <size> is the size of the region to
	trace. 'start' and 'stop' filters need not specify a <size>.

	If no object file is specified then the kernel is assumed, in which case
	the start address must be a current kernel memory address.
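
The address filter format above can be illustrated with a small POSIX sh
sketch. 'addr_filter' is a hypothetical helper, not part of perf: it only
assembles the filter string, keeps white space around '/' so that symbol
names used as <start> or <size> stay separated, and leaves out the
'@<file name>' part when no file is given (in which case the kernel is
assumed).

```shell
# addr_filter ACTION START [SIZE] [FILE]
# Hypothetical helper (not part of perf): builds one address filter in the
# documented form  filter|start|stop|tracestop <start> [/ <size>] [@<file name>]
# Pass '' as SIZE to omit it while still giving a FILE.
addr_filter() {
    action=$1 start=$2 size=${3-} file=${4-}
    case $action in
        filter|start|stop|tracestop) ;;
        *) echo "addr_filter: unknown action '$action'" >&2; return 1 ;;
    esac
    out="$action $start"
    # White space around '/' keeps symbol names separated.
    [ -n "$size" ] && out="$out / $size"
    # Without a file name the kernel is assumed.
    [ -n "$file" ] && out="$out @$file"
    printf '%s\n' "$out"
}
```

For example, on a system with Intel PT and a hypothetical program
/path/to/myprog:

	perf record -e intel_pt//u \
	    --filter "$(addr_filter filter 0x1000 0x200 /path/to/myprog)" \
	    -- /path/to/myprog

Running perf with -v shows the filter actually passed to the kernel, which
may differ from the string built here.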

	<start> can also be specified by providing the name of a symbol. If the
	symbol name is not unique, it can be disambiguated by inserting #n where
	'n' selects the n'th symbol in address order. Alternatively #0, #g or #G
	select only a global symbol. <size> can also be specified by providing
	the name of a symbol, in which case the size is calculated to the end
	of that symbol. For 'filter' and 'tracestop' filters, if <size> is
	omitted and <start> is a symbol, then the size is calculated to the end
	of that symbol.

	If <size> is omitted and <start> is '*', then the start and size will
	be calculated from the first and last symbols, i.e. to trace the whole
	file.

	If symbol names (or '*') are provided, they must be surrounded by white
	space.

	The filter passed to the kernel is not necessarily the same as entered.
	To see the filter that is passed, use the -v option.

	The kernel may not be able to configure a trace region if it is not
	within a single mapping. MMAP events (or /proc/<pid>/maps) can be
	examined to determine if that is a possibility.

	Multiple filters can be separated with space or comma.

--exclude-perf::
	Don't record events issued by perf itself. This option should follow
	an event selector (-e) which selects tracepoint event(s). It adds a
	filter expression 'common_pid != $PERFPID' to filters. If other
	'--filter' options exist, the new filter expression is combined with
	them using '&&'.

-a::
--all-cpus::
	System-wide collection from all CPUs (default if no target is specified).

-p::
--pid=::
	Record events on existing process ID (comma separated list).

-t::
--tid=::
	Record events on existing thread ID (comma separated list).
	This option also disables inheritance by default. Enable it by adding
	--inherit.

-u::
--uid=::
	Record events in threads owned by uid. Name or number.
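
How several tracepoint filter expressions end up joined with '&&' (by
repeated '--filter' options, or by --exclude-perf adding its
'common_pid != $PERFPID' term) can be pictured with the following POSIX sh
sketch. 'combine_filters' is a hypothetical helper; it assumes each
expression is wrapped in parentheses before being joined, and the exact
string perf builds internally may differ.

```shell
# combine_filters EXPR...
# Hypothetical helper (not part of perf): joins tracepoint filter
# expressions with '&&', parenthesizing each one (an assumption made for
# clarity). Prints an empty line when no expression is given.
combine_filters() {
    combined=
    for expr in "$@"; do
        if [ -z "$combined" ]; then
            combined="($expr)"
        else
            combined="$combined && ($expr)"
        fi
    done
    printf '%s\n' "$combined"
}
```

For instance, with

	perf record -e sched:sched_switch --filter 'prev_pid == 1' --exclude-perf -a -- sleep 1

the effective filter is, under the assumption above, equivalent to
'(prev_pid == 1) && (common_pid != <pid of perf>)'.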

-r::
--realtime=::
	Collect data with this RT SCHED_FIFO priority.

--no-buffering::
	Collect data without buffering.

-c::
--count=::
	Event period to sample.

-o::
--output=::
	Output file name.

-i::
--no-inherit::
	Child tasks do not inherit counters.

-F::
--freq=::
	Profile at this frequency. Use 'max' to use the current maximum
	allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate
	sysctl. A higher requested frequency will be throttled down to the
	current maximum allowed frequency. See --strict-freq.

--strict-freq::
	Fail if the specified frequency can't be used.

-m::
--mmap-pages=::
	Number of mmap data pages (must be a power of two) or size
	specification with appended unit character - B/K/M/G. The
	size is rounded up to the nearest power-of-two number of pages.
	Also, by adding a comma, the number of mmap pages for AUX
	area tracing can be specified.

--group::
	Put all events in a single event group. This precedes the --event
	option and remains only for backward compatibility. See --event.

-g::
	Enables call-graph (stack chain/backtrace) recording.

--call-graph::
	Set up and enable call-graph (stack chain/backtrace) recording,
	implies -g. Default is "fp".

	Allows specifying "fp" (frame pointer) or "dwarf"
	(DWARF's CFI - Call Frame Information) or "lbr"
	(Hardware Last Branch Record facility) as the method to collect
	the information used to show the call graphs.

	In some systems, where binaries are built with gcc
	-fomit-frame-pointer, using the "fp" method will produce bogus
	call graphs; "dwarf", if available (perf tools linked to
	the libunwind or libdw library), should be used instead.
	Using the "lbr" method doesn't require any compiler options. It
	will produce call graphs from the hardware LBR registers.
	The main limitation is that it is only available on new Intel
	platforms, such as Haswell. It can only capture user-space call
	chains. It doesn't work with branch stack sampling at the same time.

	When "dwarf" recording is used, perf also records a (user) stack dump
	with each sample. The default size of the stack dump is 8192 bytes.
	The size can be changed by passing it after a comma, e.g.
	"--call-graph dwarf,4096".

-q::
--quiet::
	Don't print any messages; useful for scripting.

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-s::
--stat::
	Record per-thread event counts. Use it with 'perf report -T' to see
	the values.

-d::
--data::
	Record the sample virtual addresses.

--phys-data::
	Record the sample physical addresses.

-T::
--timestamp::
	Record the sample timestamps. Use it with 'perf report -D' to see the
	timestamps, for instance.

-P::
--period::
	Record the sample period.

--sample-cpu::
	Record the sample cpu.

-n::
--no-samples::
	Don't sample.

-R::
--raw-samples::
Collect raw sample records from all opened counters (default for tracepoint counters).

-C::
--cpu::
Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode with inheritance mode on (default), samples are captured only when
the thread executes on the designated CPUs. Default is to monitor all CPUs.

-B::
--no-buildid::
Do not save the build ids of binaries in the perf.data files. This skips
post processing after recording, which sometimes makes the final step in
the recording process take a long time, as it needs to process all
events looking for mmap records.
The downside is that it can misresolve
symbols if the workload binaries used when recording get locally rebuilt
or upgraded, because the only key available in this case is the
pathname. You can also set the "record.build-id" config variable to
'skip' to have this behaviour permanently.

-N::
--no-buildid-cache::
Do not update the buildid cache. This saves some overhead in situations
where the information in the perf.data file (which includes buildids)
is sufficient. You can also set the "record.build-id" config variable to
'no-cache' to have the same effect.

-G name,...::
--cgroup name,...::
Monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line. If the user wants to track multiple events for a specific cgroup, the user can
use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.

To monitor, say, 'cycles' both for a cgroup and system wide, this
command line can be used: 'perf record -e cycles -G cgroup_name -a -e cycles'.

-b::
--branch-any::
Enable taken branch stack sampling. Any type of taken branch may be sampled.
This is a shortcut for --branch-filter any. See --branch-filter for more information.

-j::
--branch-filter::
Enable taken branch stack sampling. Each sample captures a series of consecutive
taken branches.
The number of branches captured with each sample depends on the
underlying hardware, the type of branches of interest, and the executed code.
It is possible to select the types of branches captured by enabling filters. The
following filters are defined:

	- any: any type of branches
	- any_call: any function call or system call
	- any_ret: any function return or system call return
	- ind_call: any indirect branch
	- call: direct calls, including far (to/from kernel) calls
	- u: only when the branch target is at the user level
	- k: only when the branch target is in the kernel
	- hv: only when the target is at the hypervisor level
	- in_tx: only when the target is in a hardware transaction
	- no_tx: only when the target is not in a hardware transaction
	- abort_tx: only when the target is a hardware transaction abort
	- cond: conditional branches
	- save_type: save branch type during sampling in case binary is not available later

+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
The privilege levels may be omitted, in which case, the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
levels are subject to permissions. When sampling on multiple events, branch stack sampling
is enabled for all the sampling events. The sampled branch type is the same for all events.
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k.
Note that this feature may not be available on all processors.

--weight::
Enable weighted sampling. An additional weight is recorded per sample and can be
displayed with the weight and local_weight sort keys. This currently works for TSX
abort events and some memory events in precise mode on modern Intel CPUs.

--namespaces::
Record events of type PERF_RECORD_NAMESPACES.
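
The --branch-filter rule that at least one branch type (any, any_call,
any_ret, ind_call or cond) must be present, not just privilege levels or
modifiers such as save_type, can be checked before invoking perf. The POSIX sh
function below, 'has_branch_type', is a hypothetical pre-flight check that
follows the documented rule only; it is not part of perf and may be stricter
or looser than perf's own option parsing.

```shell
# has_branch_type LIST
# Hypothetical helper (not part of perf): succeeds if the comma separated
# --branch-filter LIST names at least one of the branch types the option
# requires. Surrounding commas let every entry be matched as a whole word.
has_branch_type() {
    case ",$1," in
        *,any,*|*,any_call,*|*,any_ret,*|*,ind_call,*|*,cond,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, with a hypothetical program /path/to/myprog:

	has_branch_type any_call,u && perf record -j any_call,u -- /path/to/myprog

whereas a list such as 'u,k' is rejected because it only names privilege
levels.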

--transaction::
Record transaction flags for transaction related events.

--per-thread::
Use per-thread mmaps. By default per-cpu mmaps are created. This option
overrides that and uses per-thread mmaps. A side-effect of that is that
inheritance is automatically disabled. --per-thread is ignored with a warning
if combined with -a or -C options.

-D::
--delay=::
After starting the program, wait the given number of milliseconds before
measuring. This is useful to filter out the startup phase of the program,
which is often very different.

-I::
--intr-regs::
Capture machine state (registers) at interrupt, i.e., on counter overflows for
each sample. The list of captured registers depends on the architecture. This
option is off by default. It is possible to select the registers to sample using
their symbolic names, e.g. on x86, ax, si. To list the available registers use
--intr-regs=\?. To name registers, pass a comma separated list such as
--intr-regs=ax,bx. The list of registers is architecture dependent.

--user-regs::
Similar to -I, but capture user registers at sample time. To list the available
user registers use --user-regs=\?.

--running-time::
Record running and enabled time for read events (:S).

-k::
--clockid::
Sets the clock id to use for the various time fields in the perf_event_type
records. See clock_gettime(). In particular CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW are supported; some events might also allow
CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.

-S::
--snapshot::
Select AUX area tracing Snapshot Mode. This option is valid only with an
AUX area tracing event. Optionally, certain snapshot capturing parameters
can be specified in a string that follows this option:
  'e': take one last snapshot on exit; guarantees that there is at least one
       snapshot in the output file;
  <size>: if the PMU supports this, specify the desired snapshot size.

In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
and on exit if the above 'e' option is given.

--proc-map-timeout::
When processing the /proc/XXX/maps files of pre-existing threads, it may take
a long time, because the files may be huge. A time out is needed in such cases.
This option sets the time out limit. The default value is 500 ms.

--switch-events::
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.

--clang-path=PATH::
Path to clang binary to use for compiling BPF scriptlets.
(enabled when BPF support is on)

--clang-opt=OPTIONS::
Options passed to clang when compiling BPF scriptlets.
(enabled when BPF support is on)

--vmlinux=PATH::
Specify vmlinux path which has debuginfo.
(enabled when BPF prologue is on)

--buildid-all::
Record build-id of all DSOs regardless of whether they are actually hit or not.

--aio[=n]::
Use <n> control blocks in asynchronous (POSIX AIO) trace writing mode (default: 1, max: 4).
Asynchronous mode is supported only when the perf tool is linked with a libc
library that provides an implementation of the POSIX AIO API.

--affinity=mode::
Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
  node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
  cpu  - thread affinity mask is set to cpu of the processed mmap buffer

--mmap-flush=number::

Specify the minimal number of bytes that is extracted from mmap data pages and
processed for output. One can specify the number using B/K/M/G suffixes.

The maximal allowed value is a quarter of the size of the mmapped data pages.

The default option value is 1 byte, which means that every time the output
writing thread finds some new data in the mmapped buffer, the data is extracted,
possibly compressed (-z) and written to the output, perf.data or pipe.
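
The size rules for -m/--mmap-pages and --mmap-flush can be worked through
with a POSIX sh sketch. All function names below are hypothetical and not
part of perf; the page size is an assumption passed in by the caller (for
example from 'getconf PAGESIZE'), and only the B/K/M/G suffixes listed for
-m are accepted. A size specification is rounded up to whole pages and then
to the next power of two; the largest --mmap-flush value is a quarter of the
resulting buffer.

```shell
# to_bytes SIZE: convert a number with a B/K/M/G suffix to bytes.
# Fails (status 1) when there is no suffix.
to_bytes() {
    case $1 in
        *B) echo "${1%B}" ;;
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1048576 )) ;;
        *G) echo $(( ${1%G} * 1073741824 )) ;;
        *) return 1 ;;
    esac
}

# mmap_pages_for_size SIZE PAGE_SIZE: number of data pages for a -m size
# specification such as 1M (hypothetical helper).
mmap_pages_for_size() {
    bytes=$(to_bytes "$1") || return 1
    page=$2
    # Round up to whole pages ...
    pages=$(( (bytes + page - 1) / page ))
    # ... then up to the next power of two.
    p=1
    while [ "$p" -lt "$pages" ]; do
        p=$(( p * 2 ))
    done
    echo "$p"
}

# max_mmap_flush PAGES PAGE_SIZE: largest --mmap-flush value in bytes,
# a quarter of the mmap data buffer (hypothetical helper).
max_mmap_flush() {
    echo $(( $1 * $2 / 4 ))
}
```

With 4096-byte pages, '-m 1000K' gives 250 pages, rounded up to 256 (a 1 MiB
buffer), so --mmap-flush may be at most 262144 bytes (256 KiB).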

Larger data chunks are compressed more effectively than smaller chunks, so
extracting larger chunks from the mmap data pages is preferable from the
perspective of output size reduction.

Also, in some cases, executing fewer output write syscalls with bigger data
sizes can take less time than executing more output write syscalls with
smaller data sizes, thus lowering runtime profiling overhead.

-z::
--compression-level[=n]::
Produce compressed trace using specified level n (default: 1 - fastest compression,
22 - smallest trace).

--all-kernel::
Configure all used events to run in kernel space.

--all-user::
Configure all used events to run in user space.

--kernel-callchains::
Collect callchains only from kernel space. I.e. this option sets
perf_event_attr.exclude_callchain_user to 1.

--user-callchains::
Collect callchains only from user space. I.e. this option sets
perf_event_attr.exclude_callchain_kernel to 1.

Don't use both --kernel-callchains and --user-callchains at the same time or no
callchains will be collected.

--timestamp-filename::
Append timestamp to output file name.

--timestamp-boundary::
Record timestamp boundary (time of first/last samples).

--switch-output[=mode]::
Generate multiple perf.data files, timestamp prefixed, switching to a new one
based on 'mode' value:
  "signal" - when receiving a SIGUSR2 (default value) or
  <size>   - when reaching the size threshold, size is expected to
             be a number with appended unit character - B/K/M/G
  <time>   - when reaching the time threshold, time is expected to
             be a number with appended unit character - s/m/h/d

             Note: the precision of the size threshold hugely depends
             on your configuration - the number and size of your ring
             buffers (-m). It is generally more precise for higher sizes
             (like >5M); for lower values expect different sizes.
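
The three forms of the --switch-output mode can be told apart as in this
POSIX sh sketch. 'switch_output_mode' is a hypothetical helper, not part of
perf, that prints the mode and the threshold in bytes or seconds. It assumes
the unit letters are case-sensitive exactly as listed above, so '10M' is a
size of 10 MiB while '10m' is a time of 10 minutes.

```shell
# switch_output_mode [MODE]
# Hypothetical helper (not part of perf): prints "signal", "size <bytes>"
# or "time <seconds>" for a --switch-output mode value.
switch_output_mode() {
    arg=${1:-signal}          # a bare --switch-output means "signal"
    if [ "$arg" = signal ]; then
        echo signal
        return 0
    fi
    num=${arg%?}              # everything but the unit character
    unit=${arg#"$num"}        # the unit character itself
    case $num in
        ''|*[!0-9]*) echo "invalid mode: $arg" >&2; return 1 ;;
    esac
    case $unit in
        B) echo "size $num" ;;
        K) echo "size $(( num * 1024 ))" ;;
        M) echo "size $(( num * 1024 * 1024 ))" ;;
        G) echo "size $(( num * 1024 * 1024 * 1024 ))" ;;
        s) echo "time $num" ;;
        m) echo "time $(( num * 60 ))" ;;
        h) echo "time $(( num * 3600 ))" ;;
        d) echo "time $(( num * 86400 ))" ;;
        *) echo "invalid mode: $arg" >&2; return 1 ;;
    esac
}
```

For example, 'perf record --switch-output=100M -a' switches files roughly
every 100 MiB of data (subject to the precision note above), while
'perf record --switch-output -a' switches whenever perf receives SIGUSR2,
e.g. from 'kill -USR2 <pid of perf>'.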

A possible use case is to slice the perf.data file when an external event
occurs; each slice is then processed, possibly via a perf script, to decide
if that particular perf.data snapshot should be kept or not.

Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
The reason for the latter two is to reduce the data file switching
overhead. You can still switch them on with:

  --switch-output --no-no-buildid  --no-no-buildid-cache

--switch-max-files=N::

When rotating perf.data with --switch-output, only keep N files.

--dry-run::
Parse options then exit. --dry-run can be used to detect errors in command-line
options.

'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
in the config file is set to true.

--tail-synthesize::
Instead of collecting non-sample events (for example, fork, comm, mmap) at
the beginning of record, collect them while finalizing an output file.
The collected non-sample events reflect the status of the system when
record is finished.

--overwrite::
Makes all events use an overwritable ring buffer. An overwritable ring
buffer works like a flight recorder: when it gets full, the kernel will
overwrite the oldest records, which thus will never make it to the
perf.data file.

When '--overwrite' and '--switch-output' are used, perf records and drops
events until it receives a signal, meaning that something unusual was
detected that warrants taking a snapshot of the most current events,
those fitting in the ring buffer at that moment.

The 'overwrite' attribute can also be set or canceled for an event using
config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.

Implies --tail-synthesize.

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]