# hiperf

<!--Kit: Performance Analysis Kit-->
<!--Subsystem: HiviewDFX-->
<!--Owner: @leiguangyu-->
<!--Designer: @Maplestroy-->
<!--Tester: @gcw_KuLfPSbe-->
<!--Adviser: @foryourself-->

hiperf is a command line tool that integrates multiple performance analysis capabilities, enabling you to identify system bottlenecks, locate software hotspots, optimize code efficiency, and collect and analyze runtime performance data.


Where possible, use a graphical frontend tool such as [DevEco Studio](https://developer.huawei.com/consumer/en/doc/harmonyos-guides/ide-insight-session-time) or [SmartPerf](https://gitee.com/openharmony/developtools_smartperf_host/blob/master/smartperf_host/ide/src/doc/md/quickstart_hiperf.md) to collect the call stack of a function, obtain the execution time of the function at each layer in the call stack, and view the call chain information in a swimlane diagram for performance analysis. To specify the event, sampling period, collection duration, and number of CPU cores yourself, use hiperf. The resulting **perf.data** file can be opened in SmartPerf and displayed as a flame graph.


This topic describes how to use hiperf to perform performance analysis.


## Environment Setup

- The environment for OpenHarmony Device Connector (hdc) has been set up. For details, see [Environment Setup](hdc.md#environment-setup).

- The device is properly connected and **hdc shell** has been run.


## Command Syntax

Run the **hiperf --help** command to list all hiperf commands, including **dump**, **list**, **record**, **report**, and **stat**.

```shell
$ hiperf --help
```


| Option/Command| Description|
| -------- | -------- |
| --hilog | Records logs generated during program running to HiLog.|
| --logpath | Sets the save path of the log file. You can save the file under **/data/local/tmp/** with a custom file name.|
| --logtag | Enables logs of a specified functionality.|
| --debug | Records **debug** logs.|
| --verbose | Records **verbose** logs.|
| --much | Records **much** logs.|
| --nodebug | Disables all logs.|
| --mixlog | Outputs logs to the CLI.|
| -h/--help | Displays the help information.|
| [dump](#dump)| Converts the performance data file (for example, **perf.data**) into a readable format.|
| [list](#list)| Displays the performance event types supported by the system.|
| [record](#record)| Collects performance data.|
| [report](#report)| Converts performance data into visualized data.|
| [stat](#stat)| Collects statistics on performance data.|


**Example**


```shell
$ hiperf --help
Usage: hiperf [options] command [args for command]
options:
    --debug     show debug log, usage format: --debug [command] [args]
    --help      show help
    --hilog     use hilog not file to record log
    --logpath   log file name full path, usage format: --logpath [filepath] [command] [args]
    --logtag    enable log level for HILOG_TAG, usage format: --logtag <tag>[:level][,<tag>[:level]] [command] [args]
                tag: Dump, Report, Record, Stat... level: D, V, M...
                example: hiperf --verbose --logtag Record:D [command] [args]
    --mixlog    mix the log in output, usage format: --mixlog [command] [args]
    --much      show extremely much debug log, usage format: --much [command] [args]
    --nodebug   disable debug log, usage format: --nodebug [command] [args]
    --verbose   show debug log, usage format: --verbose [command] [args]
    -h          show help
command:
    dump:       Dump content of a perf data file, like perf.data
    help:       Show more help information for hiperf
    list:       List the supported event types.
    record:     Collect performance sample information
    report:     report sampling information from perf.data format file
    stat:       Collect performance counter information

See 'hiperf help [command]' for more information on a specific command.
```


## Common Commands


### Recording Performance Data Sampling


1. Sample the process **1234** for 10 seconds. Set the stack unwinding mode to **fp**, sampling frequency to **1000** times per second, event types to **hw-cpu-cycles** and **hw-instructions**, and save the sampling file to **/data/local/tmp/perf.data**.


```shell
$ hiperf record -p 1234 -s fp -f 1000 -d 10 -e hw-cpu-cycles,hw-instructions -o /data/local/tmp/perf.data
Profiling duration is 10.000 seconds.
Start Profiling...
Timeout exit (total 10335 ms)
Process and Saving data...
Hiperf is not running as root mode. Do not need load kernel syms
[ hiperf record: Captured 3.014 MB perf data. ]
[ Sample records: 1293, Non sample records: 855 ]
[ Sample lost: 0, Non sample lost: 0 ]
```


The collected data is saved as a **perf.data** file in binary format, which contains the sampling data, process information, symbol table, and function calls required for performance analysis. You can use the flame graph script to convert the sampling data into a flame graph to identify system performance bottlenecks, locate software hotspots, and optimize code efficiency.


2. Sample the application **com.example.insight_test_stage**. Set the sampling duration to **10s**, stack unwinding mode to **dwarf** (debug information table), sampling period to **1000**, event types to **hw-cpu-cycles** and **hw-instructions**, and use the default save path.


```shell
$ hiperf record --app com.example.insight_test_stage -d 10 -s dwarf --period 1000 -e hw-cpu-cycles,hw-instructions
Profiling duration is 10.000 seconds.
Start Profiling...
Timeout exit (total 10000 ms)
Process and Saving data...
Hiperf is not running as root mode. Do not need load kernel syms
[ hiperf record: Captured 0.296 MB perf data. ]
[ Sample records: 0, Non sample records: 2640 ]
[ Sample lost: 0, Non sample lost: 0 ]
```


The collected data is saved to the default path **/data/local/tmp/perf.data**.


### Collecting Performance Statistics


1. Collect statistics on processes **1745** and **1910** for 10 seconds.


```
$ hiperf stat -d 10 -p 1745,1910
Profiling duration is 10.000 seconds.
Start Profiling...
Timeout exit (total 10000 ms)
      count  name                    | comment                         | coverage
    148,450  hw-branch-instructions  | 26.404 M/sec                    | (100%)
     49,833  hw-branch-misses        | 33.568878 miss rate             | (100%)
  8,986,523  hw-cpu-cycles           | 1.598409 GHz                    | (100%)
  1,283,596  hw-instructions         | 7.001053 cycles per instruction | (100%)
         63  sw-context-switches     | 11.206 K/sec                    | (100%)
          0  sw-page-faults          | 0.000 /sec                      | (100%)
  5,622,169  sw-task-clock           | 0.000562 cpus used              | (100%)
```


2. Collect statistics on processes **1745** and **1910** for **10** seconds, with event types set to **hw-cpu-cycles**, **hw-instructions**, and **sw-task-clock**, and a print interval of **3000** ms.


```
$ hiperf stat -d 10 -p 1745,1910 -e hw-cpu-cycles,hw-instructions,sw-task-clock -i 3000
Profiling duration is 10.000 seconds.
Start Profiling...
Report at 3000 ms (6999 ms left):
      count  name             | comment                         | coverage
  2,534,675  hw-cpu-cycles    | 1.717114 GHz                    | (100%)
    324,279  hw-instructions  | 7.816340 cycles per instruction | (100%)
  1,476,125  sw-task-clock    | 0.000492 cpus used              | (100%)
Report at 6000 ms (3999 ms left):
      count  name             | comment                         | coverage
  5,112,570  hw-cpu-cycles    | 1.724259 GHz                    | (100%)
    648,303  hw-instructions  | 7.886081 cycles per instruction | (100%)
  2,965,083  sw-task-clock    | 0.000494 cpus used              | (100%)
Report at 9000 ms (999 ms left):
      count  name             | comment                         | coverage
  7,870,422  hw-cpu-cycles    | 1.724897 GHz                    | (100%)
    994,407  hw-instructions  | 7.914689 cycles per instruction | (100%)
  4,562,835  sw-task-clock    | 0.000507 cpus used              | (100%)
Timeout exit (total 10000 ms)
```


3. Collect statistics on the process **1910** for **3** seconds, with event types set to **hw-cpu-cycles** and **hw-instructions**, and print detailed information.


```
$ hiperf stat -d 3 -p 1910 -e hw-cpu-cycles,hw-instructions --verbose
Profiling duration is 3.000 seconds.
Start Profiling...
Timeout exit (total 3000 ms)
hw-cpu-cycles id:1342(c-1:p1910) timeEnabled:133583 timeRunning:133583 value:255740
hw-cpu-cycles id:1343(c-1:p1988) timeEnabled:0 timeRunning:0 value:0
hw-cpu-cycles id:1344(c-1:p1989) timeEnabled:0 timeRunning:0 value:0
hw-cpu-cycles id:1345(c-1:p1990) timeEnabled:187833 timeRunning:187833 value:331425
...
hw-instructions id:1375(c-1:p1910) timeEnabled:133583 timeRunning:133583 value:36485
hw-instructions id:1376(c-1:p1988) timeEnabled:0 timeRunning:0 value:0
hw-instructions id:1377(c-1:p1989) timeEnabled:0 timeRunning:0 value:0
hw-instructions id:1378(c-1:p1990) timeEnabled:187833 timeRunning:187833 value:47816
...
    count  name             | comment                         | coverage
  669,850  hw-cpu-cycles    |                                 | (100%)
   94,903  hw-instructions  | 7.058259 cycles per instruction | (100%)
```


## Debug-Type Applications


> **NOTE**
>
> The **hiperf record/stat -p [pid]** command applies to applications signed with a debug certificate.
>
> Run the **hdc shell "bm dump -n bundlename | grep appProvisionType"** command to check whether the application specified in the command is a debug-type application. The expected output is **"appProvisionType": "debug"**.
>
> For example, run the following command to check the bundle name **com.example.myapplication**:
>
> ```shell
> hdc shell "bm dump -n com.example.myapplication | grep appProvisionType"
> ```
>
> If the application is a debug-type application, the following information is displayed:
>
> ```shell
> "appProvisionType": "debug",
> ```
>
> To build a debug-type application, you need to sign it with a debug certificate. For details about how to request and use the debug certificate, see [Requesting a Debug Certificate](https://developer.huawei.com/consumer/en/doc/app/agc-help-add-debugcert-0000001914263178).


## list

Displays the performance event types supported by the system, which can be used as parameters of the **-e** option in the **record** and **stat** commands.
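
As a rough illustration of how event names from **list** feed the **-e** option, the following sketch checks a comma-separated event string against the hardware events named in this topic and then prints (rather than runs) a **stat** command. The `supported` variable and the `events_supported` helper are assumptions made for illustration and are not part of hiperf; on a device, the output of **hiperf list hw** is the authoritative source.

```shell
#!/bin/sh
# Hardware events copied from the sample "hiperf list hw" output in this topic.
# Assumption: a real device may support a different set.
supported="hw-cpu-cycles hw-instructions hw-cache-references hw-cache-misses hw-branch-instructions hw-branch-misses hw-bus-cycles hw-stalled-cycles-frontend hw-stalled-cycles-backend"

# Hypothetical helper: succeeds only if every comma-separated event is in the list above.
events_supported() {
    for ev in $(printf '%s' "$1" | tr ',' ' '); do
        case " $supported " in
            *" $ev "*) ;;
            *) echo "unsupported event: $ev" >&2; return 1 ;;
        esac
    done
    return 0
}

events="hw-cpu-cycles,hw-instructions"
if events_supported "$events"; then
    # Print the command instead of running it, so the sketch also works off-device.
    echo "hiperf stat -p 1745 -d 3 -e $events"
fi
```

With this list, an event such as **hw-ref-cpu-cycles** would be rejected before any command is built.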

**Parameters**

| Name| Description|
| -------- | -------- |
| -h/--help | Displays the help information.|
| hw | Lists the hardware events.<br>The following events are supported:<br>- hw-cpu-cycles<br>- hw-instructions<br>- hw-cache-references<br>- hw-cache-misses<br>- hw-branch-instructions<br>- hw-branch-misses<br>- hw-bus-cycles<br>- hw-stalled-cycles-frontend<br>- hw-stalled-cycles-backend |
| sw | Lists the software events.|
| tp | Lists the tracepoint events.|
| cache | Lists the hardware cache events.|
| raw | Lists the raw performance monitoring unit (PMU) events.|

**Example**

```
Usage: hiperf list [event type name]
```

Query the supported hardware event types.


```
$ hiperf list hw
event not support hw-ref-cpu-cycles

Supported events for hardware:
        hw-cpu-cycles
        hw-instructions
        hw-cache-references
        hw-cache-misses
        hw-branch-instructions
        hw-branch-misses
        hw-bus-cycles
        hw-stalled-cycles-frontend
        hw-stalled-cycles-backend
```


## record

Collects the performance data of a specified process or application, including the CPU cycles, number of instructions, and function calls, and saves the sampling data to a specified file (**/data/local/tmp/perf.data** by default).

**Parameters of the record command**

<!--RP1-->
| Parameter| Description|
| -------- | -------- |
| -h/--help | Displays the help information.|
| -c | Specifies the ID of the CPU whose data is to be collected.|
| --cpu-limit | Sets the maximum CPU usage during collection. The value ranges from 1 to 100. The default value is **25**.|
| -d | Sets the collection duration, in seconds. This parameter cannot be used together with **--control**.|
| -f | Sets the sampling frequency, in samples per second. The default value is **4000**. This parameter cannot be used together with **--period**.|
| --period | Sets the event collection period, that is, the number of events between two samples. This parameter cannot be used together with **-f**.|
| -e | Specifies the events to collect. Multiple event types are supported; separate them with commas (,). You can run the **list** command to obtain the supported event types.|
| -g | Specifies the event groups to collect, which are separated by commas (,).|
| --no-inherit | Collects no subprocess data.|
| -p | Specifies the process ID to collect. Multiple process IDs are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| -t | Specifies the thread ID to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| --exclude-tid | Specifies the thread ID not to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| --exclude-thread | Specifies the thread name not to collect. Multiple thread names are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| --offcpu | Traces the time during which a thread is off the CPU (not scheduled).|
| -j | Samples branch stacks. The following filters are supported: **any**, **any_call**, **any_ret**, **ind_call**, **ind_jmp**, **cond**, and **call**.|
| -s/--callstack | Sets the stack unwinding mode, which can be **fp** (frame pointer) or **dwarf** (debug information table). The default mode is **fp**.|
| --kernel-callchain | Collects kernel-mode stacks. This parameter must be used together with the **-s** parameter.|
| --callchain-useronly | Collects only user stacks.|
| --delay-unwind | Delays call stack unwinding until after recording when the stack mode is set to **dwarf**.|
| --disable-unwind | Disables call stack unwinding after recording when the stack mode is set to **dwarf**.|
| --disable-callstack-expand | Merges the call stacks using the cached thread stack when the stack mode is set to **dwarf**.|
| --enable-debuginfo-symbolic | Parses the symbols in the **.gnu_debugdata** section of the ELF file when **-s fp/dwarf** is set. By default, the symbols are not parsed.|
| --clockid | Sets the collection clock type, which can be **monotonic** or **monotonic_raw**. Some events support the **boottime**, **realtime**, and **clock_tai** clock types.|
| --symbol-dir | Sets the symbol table file path, which is used for symbolization during collection.|
| -m | Sets the number of mmap pages. Value range: 2 to 1024. The default value is **1024**.|
| --app | Sets the application names to collect. Use commas (,) to separate them. The application must already be running. If it has not started, the command waits up to 20s and then exits automatically. This parameter cannot be used together with **-a**.|
| --chkms | Sets the query interval, in milliseconds. The value ranges from 1 to 200. The default value is **10**.|
| --data-limit | Sets the limit of the output data size. When this limit is reached, the collection stops. By default, there is no limit.|
| -o | Sets the output file path. You can save the file under **/data/local/tmp/** with a custom file name.|
| -z | Compresses the output into a .gz file.|
| --restart | Collects performance metrics about application startup. If the process is not started within 30 seconds, the collection stops.|
| --verbose | Outputs a more detailed report.|
| --control [command] | Controls the collection operation. The following commands are supported: **prepare**/**start**/**pause**/**resume**/**output**/**stop**. This parameter cannot be used together with **-d**.|
| --dedup_stack | Deletes duplicate stacks from the record.|
| --cmdline-size | Sets the value of the **/sys/kernel/tracing/saved_cmdlines_size** node, in bytes. The value ranges from 512 to 4096.|
| --report | Collects the backtrace report.|
| --backtrack | Collects data in a previous period. This parameter must be used together with **--control prepare**.|
| --backtrack-sec | Sets the duration of previous data to collect, in seconds. The value ranges from 5 to 30. The default value is **10**. This parameter must be used together with **--backtrack**.|
| --dumpoptions | Displays the collection parameter details.|
| -a | Collects performance data of the entire device.|
| --exclude-hiperf | Excludes the performance data of the hiperf process. This parameter must be used together with **-a**.|
| --exclude-process | Specifies the process name not to collect. This parameter must be used together with **-a**.|
<!--RP1End-->

**Example**

```
Usage: hiperf record [options] [command [command-args]]
```

Sample the process **267** for 10 seconds and use **dwarf** to unwind the stack.

```
$ hiperf record -p 267 -d 10 -s dwarf
```


## stat

Monitors the specified application and periodically prints the values of performance counters.

**Parameters of the stat command**

<!--RP2-->
| Parameter| Description|
| -------- | -------- |
| -h/--help | Displays the help information.|
| -c | Specifies the ID of the CPU whose data is to be collected.|
| -d | Sets the collection duration, in seconds. This parameter cannot be used together with **--control**.|
| -i | Sets the interval for printing **stat** information, in milliseconds.|
| -e | Specifies the events to collect. Multiple events are supported; use commas (,) to separate them.|
| -g | Specifies the event groups to collect, which are separated by commas (,). You can run the **list** command to obtain the supported event types.|
| --no-inherit | Collects no subprocess data.|
| -p | Specifies the process ID to collect. Multiple process IDs are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| -t | Specifies the thread ID to collect. Multiple thread IDs are supported; separate them with commas (,). This parameter cannot be used together with **-a**.|
| --app | Sets the application names to collect. Use commas (,) to separate them. The application must already be running. If it has not started, the command waits up to 20s and then exits automatically. This parameter cannot be used together with **-a**.|
| --chkms | Sets the query interval, in milliseconds. The value ranges from 1 to 200. The default value is **10**.|
| --per-core | Prints the count for each CPU core.|
| --per-thread | Prints the count for each thread.|
| --restart | Collects performance metrics about application startup. If the process is not started within 30 seconds, the collection exits. This parameter must be used together with **--app**.|
| --verbose | Outputs detailed information.|
| --dumpoptions | Displays details about all options in the list.|
| --control [command] | Controls the collection operation. The commands include **prepare**, **start**, and **stop**. This parameter cannot be used together with **-d**.<br>**NOTE**: This parameter is supported since API version 20.|
| -o | Sets the output file path. You can save the file under **/data/local/tmp/** with a custom file name. This parameter must be used together with **--control prepare** and cannot be used with any other **--control** command.<br>**NOTE**: This parameter is supported since API version 20.|
| -a | Collects performance data of the entire device.|

**Example**

```
Usage: hiperf stat [options] [command [command-args]]
```

Run the **stat** command to monitor the performance data of the process **1745** on CPU 0 for three seconds.

```
$ hiperf stat -p 1745 -d 3 -c 0
```


## dump

Converts performance data files (for example, **perf.data**) into plain text so that you can check the original sampling data for correctness.

**Parameters of the dump command**

| Parameter| Description|
| -------- | -------- |
| -h/--help | Displays the help information.|
| --head | Outputs only the data header and attributes.|
| -d | Outputs only the data segment.|
| -f | Outputs only additional features.|
| --sympath | Specifies the path of the symbol table file.|
| -i | Specifies the path of the sampling file.|
| -o | Sets the output file path. You can save the file under **/data/local/tmp/** with a custom file name. If this parameter is not set, the data is output to the CLI.|
| --elf | Converts the ELF file to readable plaintext.|
| --proto | Converts the .proto file to readable plaintext.|
| --export | Splits the user stack data into multiple files.|

**Example**

```
Usage: hiperf dump [option] <filename>
```

Run the **dump** command to read the **/data/local/tmp/perf.data** file and export it to the **/data/local/tmp/perf.dump** file.

```
$ hiperf dump -i /data/local/tmp/perf.data -o /data/local/tmp/perf.dump
```


## report

Converts the sampling data (**perf.data**) to the specified format (such as JSON or ProtoBuf), groups samples belonging to the same process, thread, or function into individual sample entries, sorts these entries by event count, and displays them in a report.

**Parameters of the report command**

| Parameter| Description|
| -------- | -------- |
| -h/--help | Displays the help information.|
| --symbol-dir | Specifies the path of the symbol table file.|
| --limit-percent | Filters performance data whose share is at least the specified percentage (1 to 100). Only entries meeting this threshold are included in the report.|
| -s | Displays the stack mode.|
| --call-stack-limit-percent | Displays the stack content of a specified proportion. The value ranges from 1 to 100.|
| -i | Specifies the path of the sampling file. The default value is **perf.data**.|
| -o | Sets the output file path. You can save the file under **/data/local/tmp/** with a custom file name. If this parameter is not set, the data is output to the CLI.|
| --proto | Outputs data in ProtoBuf format.|
| --json | Outputs data in JSON format.|
| --diff | Displays the differences between the source file and the converted file. This parameter cannot be used together with **--proto**, **--json**, or **-s**.|
| --branch | Displays the branches based on the function address.|
| --\<keys\> \<keyname1\>[,keyname2][,...] | Specifies the keywords, which can be **comms**, **pids**, **tids**, **dsos**, **funcs**, **from_dsos**, or **from_funcs**, for example, **--comms hiperf**.|
| --sort [key1],[key2],[...] | Sorts the data by keyword.|
| --hide_count | Hides values in the report.|
| --dumpoptions | Displays details about all options in the list.|

**Example**

```
Usage: hiperf report [option] <filename>
```

Extract the entries that account for at least 1% of the performance data in the **perf.data** file and display them in a report.

```
$ hiperf report -i /data/local/tmp/perf.data --limit-percent 1
```
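
The **--json** and **--proto** options in the parameter table can be combined with **-i** and **-o** to export a report file. The sketch below only assembles and prints such command lines. The `report_cmd` helper and the output file names are hypothetical, and whether a given option combination is accepted should be confirmed with **hiperf help report** on the device.

```shell
#!/bin/sh
# Hypothetical helper: print a "hiperf report" command line for the given output format.
# Assumption: the input is the default sampling file /data/local/tmp/perf.data.
report_cmd() {
    input=/data/local/tmp/perf.data
    case "$1" in
        json)  echo "hiperf report -i $input --json -o /data/local/tmp/perf.json" ;;
        proto) echo "hiperf report -i $input --proto -o /data/local/tmp/perf.proto" ;;
        *)     echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Print both variants; run the printed line inside hdc shell to execute it.
report_cmd json
report_cmd proto
```

Printing the command first makes it easy to review the options before running them on the device.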