# hiprofiler

<!--Kit: Performance Analysis Kit-->
<!--Subsystem: HiviewDFX-->
<!--Owner: @zyxzyx-->
<!--Designer: @Maplestroy-->
<!--Tester: @gcw_KuLfPSbe-->
<!--Adviser: @foryourself-->


## Overview


hiprofiler consists of the system and application profiler frameworks. It provides a performance profiling platform for you to analyze memory and performance issues.


Its overall architecture comprises the profiling data display page on the PC and the performance profiling service on the device. The PC and device sides use the C/S model, and the profiling data is displayed on the [DevEco Studio](https://cbg.huawei.com/#/group/ipd/DevEcoToolsList)/[SmartPerf](https://gitee.com/openharmony/developtools_smartperf_host) web page on the PC. The device side consists of multiple parts that run in the system environment. The **hiprofilerd** process, which communicates with DevEco Studio, is the profiling service. The device also contains the CLI tool (**hiprofiler_cmd**) and the data collection process (**hiprofiler_plugins**). Based on the producer-consumer model, the profiling service controls the data collection process to obtain profiling data and sends the data to DevEco Studio. Currently, plugins such as nativehook, CPU, ftrace, GPU, hiperf, xpower, and memory have been implemented, providing comprehensive profiling capabilities for CPU, GPU, memory, and power consumption.


Compared with other profilers in the industry, hiprofiler provides additional capabilities, such as [cross-language stack unwinding, power consumption data collection, and long-duration heap memory stack capturing](#plugin-parameters).


## Environment Requirements

- The environment for OpenHarmony Device Connector (hdc) has been set up. For details, see [Environment Setup](hdc.md#environment-setup).

- The devices are properly connected and **hdc shell** is executed.


## Architecture

1. The PC calls **hiprofiler_cmd** through DevEco Studio/SmartPerf.

2. The **hiprofiler_cmd** process starts the **hiprofilerd** service and the **hiprofiler_plugins** process.

3. **hiprofiler_plugins** enables the corresponding plugins and forwards the collected profiling data to the **hiprofilerd** process.

4. The **hiprofilerd** process stores the profiling data in proto format to a file, or returns it to the PC in real time.

5. The PC parses the data, generates lanes, and displays the obtained profiling data.


## Command Syntax

You can use hiprofiler_cmd to call different profiler plugins and pass different parameters for different profiling requirements. The following is an example command:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "ftrace-plugin"
  sample_interval: 1000
  config_data {
   hitrace_categories: "binder"
   buffer_size_kb: 204800
   flush_interval_ms: 1000
   flush_threshold_kb: 4096
   trace_period_ms: 200
  }
 }
CONFIG
```


| Command| Description|
| -------- | -------- |
| -c | Specifies the profiling configuration. To use a configuration file, place it in the **/data/local/tmp** directory and pass its path.|
| -o | Sets the custom file save path, which must start with **/data/local/tmp**. If no path is set, the profiling data is saved to **/data/local/tmp/hiprofiler_data.htrace** by default. If the profiling is performed repeatedly, the file in the original path will be overwritten.|
| -k | Kills the existing profiling service process.|
| -s | Starts the profiling service process.|
| -t | Sets the profiling duration, in seconds.|
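
The examples in this topic pass the configuration inline after **-c -**. A configuration file stored under **/data/local/tmp** can be used instead. The following is a minimal sketch; the file name **myconfig.txt** is illustrative, and its content is the same text that would otherwise appear between **<<CONFIG** and **CONFIG**:

```shell
# Push a prepared configuration file to the device (the file name is illustrative).
hdc file send ./myconfig.txt /data/local/tmp/myconfig.txt
# Start profiling with the configuration file instead of inline configuration.
hdc shell "hiprofiler_cmd -c /data/local/tmp/myconfig.txt -o /data/local/tmp/hiprofiler_data.htrace -t 30 -s -k"
```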

After inputting the hiprofiler_cmd parameters, input the plugin configuration. The configuration starts with **<<CONFIG** and ends with **CONFIG**. The content in between is the session and plugin configuration in protobuf text format; the available fields are defined by the proto files of the corresponding plugins.


The following table describes the **session_config** fields.


| Field| Description|
| -------- | -------- |
| buffers | Number of shared memory pages.|
| split_file | Whether to split the output file. The value **true** means to split the file, and **false** means the opposite.|
| split_file_max_size_mb | Maximum size of each split file, in MB, when **split_file** is set to **true**.|


The following table describes the **plugin_configs** fields.


| Field| Description|
| -------- | -------- |
| plugin_name | Name of the plugin to enable.|
| sample_interval | Interval at which the plugin obtains profiling data, in milliseconds.|
| config_data | Parameters of the plugin. The parameters required by each plugin are different. For details, see the **proto** definition of the plugins.<br>(The code path is **developtools/profiler/protos**.)|


Download the generated trace file to the local PC by running the **hdc file recv** command, and then upload the file to SmartPerf or DevEco Studio for parsing.
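
For example, the following command pulls the default trace file to the current directory on the PC:

```shell
hdc file recv /data/local/tmp/hiprofiler_data.htrace ./hiprofiler_data.htrace
```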

## Plugins Supported

<!--RP1-->
| Name| Description| Specifications|
| -------- | -------- | -------- |
| native_hook | Obtains the call stack information about heap memory allocation.| |
| ftrace-plugin | Obtains the trace events of kernel logging and the HiTrace logging data.| |
| cpu-plugin | Obtains the CPU usage information of a process, including the process-level and thread-level usage.| |
| gpu-plugin | Obtains the GPU usage information of a process.| |
| xpower-plugin | Obtains the power consumption data of a process.| |
| memory-plugin | Obtains the memory usage of a process, primarily the data from its **smaps** node.| In user mode, the file paths in **smaps** are not displayed.|
| diskio plugin | Obtains the disk space usage of a process.| |
| network profiler | Obtains the HTTP request information of a process through process logging.| |
| network plugin | Obtains the network traffic information of a process.| |
| hisysevent plugin | Obtains the HiSysEvent logging data by running the HiSysEvent commands.| |
| hiperf plugin | Obtains the number of instructions and the corresponding stack of a process by running the HiPerf commands.| |
| hidump plugin | Obtains the HiDump data by running the SP_daemon commands.| |
<!--RP1End-->


## Applications Signed by the Debug Certificate


> **NOTE**
>
> Run the **hdc shell "bm dump -n bundlename | grep appProvisionType"** command to check whether the application specified in the command can be debugged. The expected output is **"appProvisionType": "debug"**.

For example, run the following command to check the bundle name **com.example.myapplication**:

```shell
hdc shell "bm dump -n com.example.myapplication | grep appProvisionType"
```

If the application is a debug application, the following information is displayed:

```shell
"appProvisionType": "debug",
```

To build a debug application, you need to use a debug certificate for signature. For details about how to request and use the debug certificate, see [Requesting a Debug Certificate](https://developer.huawei.com/consumer/en/doc/app/agc-help-add-debugcert-0000001914263178).


## Plugin Parameters

**native_hook**

Obtains the call stacks for heap memory allocations, including cross-language allocations (for example, using Node-API in ArkTS to allocate native heap memory), covering the **malloc**, **mmap**, **calloc**, and **realloc** functions. It can also display the call stacks of unreleased heap memory caused by memory leaks.

Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| fp_unwind | bool | Whether to enable stack unwinding in fp mode. The value **true** means to enable stack unwinding in fp mode.<br>The value **false** means to enable stack unwinding in dwarf mode.| Stack unwinding in fp mode is implemented by using the x29 register, in which a function's fp always points to the caller's fp. After stack unwinding, the profiling service calculates the relative PC based on the instruction pointer (IP) and searches for the corresponding mapping in maps for symbolization.<br>Due to increasingly aggressive compiler optimizations, register reuse and a disabled fp can cause stack unwinding in fp mode to fail. In mixed stacks, the fp alone cannot capture all frames, so dwarf is required for more accurate stack unwinding.<br>Stack unwinding in dwarf mode searches for the corresponding mapping information in the map table based on the PC register. The performance of dwarf is worse than that of fp because the call stack is parsed level by level in dwarf mode.<br>Note: Stack unwinding in fp mode is not supported on non-AArch64 devices.|
| statistics_interval | int | Statistics interval, in seconds. Stacks within a statistics interval are summarized.| The statistics stack capture mode is provided for long-term lightweight collection. If profiling performance is a priority and you only need call counts and total stack size, use statistics mode.|
| startup_mode | bool | Whether to capture the memory allocated during process startup. By default, the memory allocated during process startup is not captured.| This parameter records the heap memory allocation information from when the process is started by AppSpawn to when the profiling ends. To capture a system-ability (SA) service, locate the name (for example, **sa_main**) of the process that launches it in the corresponding .cfg file and specify that name as the process to capture.|
| js_stack_report | int | Whether to enable cross-language stack unwinding.<br>The value **0** means not to capture the JS stack.<br>The value **1** means to capture the JS stack.| This parameter provides the cross-language stack unwinding feature for the Ark environment.|
| malloc_free_matching_interval | int | Matching interval, in seconds. **malloc** and **free** are matched within the interval. If matched, the stack is not flushed to the disk.| Heap memory that is allocated and then freed within the matching interval is not recorded, reducing the overhead of the stack capture service process. If this parameter is set to a value greater than 0, **statistics_interval** cannot be used.|
| offline_symbolization | bool | Whether to enable offline symbolization.<br>The value **true** means to enable offline symbolization;<br>the value **false** means the opposite.| With offline symbolization, the operation of matching symbols based on the IP is transferred to SmartPerf, optimizing the performance of the native daemon and reducing process freezes. However, because the offline symbol table must be written into the trace file, the trace file generated with offline symbolization is larger than that generated with online symbolization.|
| sample_interval | int | Sampling size.| When this parameter is set, the sampling mode is enabled. In sampling mode, malloc allocations smaller than the sampling size are recorded probabilistically: the larger the allocation of a call stack, the greater its chance of being sampled.|

Result examples:

The fp stack unwinding and cross-language stack unwinding are enabled (green frames denote JavaScript stack frames).


The dwarf stack unwinding and cross-language stack unwinding are enabled (native -> JS -> native stack frames are displayed).


Statistics mode is enabled (the stack data is displayed periodically).


Non-statistics mode is enabled.


**ftrace_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| ftrace_events | string | Kernel trace events to capture.| Trace events that record kernel logging.|
| hitrace_categories | string | HiTrace categories to capture.| The HiTrace capability is called to obtain data and write the data to a file in proto format.|
| buffer_size_kb | int | Buffer size, in KB.| Cache size required for the **hiprofiler_plugins** process to read kernel events. The default value **204800** is recommended.|
| flush_interval_ms | int | Data collection interval, in ms.| The default value **1000** is recommended.|
| flush_threshold_kb | int | Flush threshold, in KB.| Data is flushed to the file when the threshold is exceeded. It is recommended that you use the default value of SmartPerf.|
| parse_ksyms | bool | Whether to parse kernel symbols.| The value **true** means to parse kernel symbols, and **false** means the opposite.|
| trace_period_ms | int | Period for reading kernel data, in ms.| It is recommended that you use the default value of SmartPerf.|

2. Result analysis

Example command:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 10 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "ftrace-plugin"
  sample_interval: 1000
  config_data {
   ftrace_events: "binder/binder_transaction"
   ftrace_events: "binder/binder_transaction_received"
   buffer_size_kb: 204800
   flush_interval_ms: 1000
   flush_threshold_kb: 4096
   parse_ksyms: true
   clock: "boot"
   trace_period_ms: 200
   debug_on: false
  }
 }
CONFIG
```

This command reads the kernel **binder_transaction** and **binder_transaction_received** events. The two events must be used together to completely display the data at both ends of the binder transaction. After the command is executed, run the **hdc file recv** command to export the file and drag the file to SmartPerf for parsing. The following figure shows an example result.

You can click the arrow on the right of **binder transaction** to go to the process/thread on the other end of the binder.


**memory_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| report_sysmem_vmem_info | bool | Whether to read virtual memory data.| Data is read from the **/proc/vmstat** node.|
| report_process_mem_info | bool | Whether to obtain the detailed memory data of a process, such as **rss_shmem**, **rss_file**, and **vm_swap**.| Data is read from the **/proc/${pid}/stat** node.|
| report_smaps_mem_info | bool | Whether to obtain the smaps memory information of a process.| Data is read from the **/proc/${pid}/smaps** node.|
| report_gpu_mem_info | bool | Whether to obtain the GPU memory usage of a process.| Data is read from the **/proc/gpu_memory** node.|
| parse_smaps_rollup | bool | Whether to read the aggregated smaps data.| Data is read from the **/proc/${pid}/smaps_rollup** node. The profiling overhead (such as CPU and memory) is lower than that of the **report_smaps_mem_info** parameter.|

2. Result analysis


You can go to **DevEco Studio** -> **Profiler** -> **Allocation** and select **Memory** to use the memory plugin feature of the profiler. The preceding figure shows the process smaps memory information in the selected time range.

**xpower_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| bundle_name | string | Name of the process for which power consumption profiling is required.| The value must be the same as the process name in the **/proc** directory.|
| message_type | XpowerMessageType | Type of the power consumption data to obtain.| The data types include **REAL_BATTERY**, **APP_STATISTIC**, **APP_DETAIL**, **COMPONENT_TOP**, **ABNORMAL_EVENTS**, and **THERMAL_REPORT**.|

2. Result analysis


You can go to **DevEco Studio** -> **Profiler** -> **Realtime Monitor** to obtain the power consumption data of related processes.

**gpu_plugin**:

Obtaining GPU usage information

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| pid | int | ID of the process to profile.| The value must be the same as the process ID in the **/proc** directory.|
| report_gpu_info | bool | Whether to display the GPU usage of a specified process.| The value **true** means to display the GPU data of the specified process. In this case, you need to set **pid**.<br>Data is read from the **/sys/class/devfreq/gpufreq/gpu_scene_aware/utilisation** node.<br>The value **false** means not to display the GPU data of a specified process.|
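
For reference, the following is a minimal command sketch for capturing GPU usage with this plugin. It assumes the plugin name **gpu-plugin** listed in [Plugins Supported](#plugins-supported) and the fields described above; replace **1234** with the ID of the target process and verify the exact fields against the plugin's proto definition under **developtools/profiler/protos**:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 10 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "gpu-plugin"
  sample_interval: 1000
  config_data {
   pid: 1234
   report_gpu_info: true
  }
 }
CONFIG
```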

**cpu_plugin**:

Obtaining CPU usage information

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| pid | int | ID of the process to profile.| The value must be the same as the process ID in the **/proc** directory.|
| report_process_info | bool | Whether to display the CPU usage of a specified process.| The value **true** means to display the data of a specified process. In this case, you need to set **pid**.<br>The value **false** means to display the system CPU usage data.|
| skip_thread_cpu_info | bool | Whether to skip the thread CPU usage data.| The value **true** means not to display the CPU usage of each thread. Setting this parameter to **true** reduces the profiling service overhead.<br>The value **false** means to display the CPU usage of each thread.|


## Common Commands


### Sampling Records of Heap Memory Allocation Call Stack Data


Capture the stack for heap memory allocation of the **com.example.insight_test_stage** process. Enable fp stack unwinding, offline symbolization, and statistics mode.


```shell
$ hiprofiler_cmd \
  -c - \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "nativehook"
  sample_interval: 5000
  config_data {
   save_file: false
   smb_pages: 16384
   max_stack_depth: 20
   process_name: "com.example.insight_test_stage"
   string_compressed: true
   fp_unwind: true
   blocked: true
   callframe_compress: true
   record_accurately: true
   offline_symbolization: true
   startup_mode: false
   statistics_interval: 10
   sample_interval: 256
   js_stack_report: 1
   max_js_stack_depth: 10
  }
 }
CONFIG
```


The collected data is saved to the **/data/local/tmp/hiprofiler_data.htrace** file, which contains the function call information, the memory allocation of threads and dynamic libraries, and the call stack counts and allocation sizes required for memory leak analysis. Enabling offline symbolization, fp stack unwinding, and statistics mode improves the data processing efficiency of the profiling service.


### Capturing the CPU Usage of a Specified Process


Collect the CPU data of the process whose ID is **1234**. The collection duration is 30s, the sampling interval is 1000 ms, the shared memory for transferring profiling data is 16384 memory pages, and the collected data is saved to the **/data/local/tmp/hiprofiler_data.htrace** file.


```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "cpu-plugin"
  sample_interval: 1000
  config_data {
   pid: 1234
   report_process_info: true
  }
 }
CONFIG
```
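
### Capturing the Power Consumption Data of a Specified Application

The following is a minimal command sketch for capturing power consumption data, based on the **xpower_plugin** parameters described in [Plugin Parameters](#plugin-parameters). The plugin name **xpower-plugin** is taken from the [Plugins Supported](#plugins-supported) table, the bundle name is illustrative, and the exact field schema should be verified against the plugin's proto definition under **developtools/profiler/protos**:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "xpower-plugin"
  config_data {
   bundle_name: "com.example.myapplication"
   message_type: APP_STATISTIC
  }
 }
CONFIG
```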

## FAQs

What should I do if an exception occurs during profiling?

**Symptom**

"Service not started" is displayed when the **hiprofiler_cmd** command is executed.


**Possible Causes and Solution**

If the profiling service is not started, DevEco Studio may be performing profiling, or the previous profiling session did not exit properly. In this case, run the **hiprofiler_cmd -k** command and then run the profiling command again.

What should I do if the captured trace file is empty?

**Symptom**

The captured trace file is empty.

**Possible Causes and Solution**

Check whether the generated file is in the **/data/local/tmp/** directory. If the target path is a folder in **/data/local/tmp**, run the **chmod 777** command on the folder. On a user build, if **nativehook** or **network profiler** is used to capture a non-debug application, no data can be captured. (For details, see changelog https://gitcode.com/openharmony/docs/pulls/57419.)

What should I do if I suspect that the profiling data is inaccurate?

**Symptom**

The native heap captured by hiprofiler is different from that viewed by hidumper.

**Possible Causes and Solution**

hidumper captures the process-level memory usage, while hiprofiler captures the heap memory allocated by the user-mode process through basic library functions such as **malloc**, **mmap**, and **realloc** (the **operator new** function also calls **malloc**). Therefore, the native heap information captured by the two tools differs due to the thread memory cache, the delayed release of heap memory, and the memory used by the loader.
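
To compare the two views, you can check the process-level memory reported by hidumper. A minimal sketch, with **1234** standing in for the target process ID:

```shell
hdc shell "hidumper --mem 1234"
```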