# hiprofiler

<!--Kit: Performance Analysis Kit-->
<!--Subsystem: HiviewDFX-->
<!--Owner: @zyxzyx-->
<!--Designer: @Maplestroy-->
<!--Tester: @gcw_KuLfPSbe-->
<!--Adviser: @foryourself-->

## Overview

hiprofiler consists of the system and application profiler frameworks. It provides a profiling platform that you can use to analyze memory and performance issues.

The overall architecture comprises the profiling data display page on the PC and the profiling service on the device, which communicate in the client/server (C/S) model. On the PC, profiling data is displayed on the [DevEco Studio](https://cbg.huawei.com/#/group/ipd/DevEcoToolsList)/[SmartPerf](https://gitee.com/openharmony/developtools_smartperf_host) web page. The device program consists of multiple parts that run in the system environment: the profiling service (**hiprofilerd**), which communicates with DevEco Studio, the CLI tool (**hiprofiler_cmd**), and the data collection process (**hiprofiler_plugins**). Based on the producer-consumer model, the profiling service controls the data collection process to obtain profiling data and sends the data to DevEco Studio. Currently, plugins such as nativehook, CPU, ftrace, GPU, hiperf, xpower, and memory have been implemented, providing comprehensive profiling capabilities for CPU, GPU, memory, and energy consumption.

Compared with other profilers in the industry, hiprofiler provides additional capabilities, such as [cross-language stack unwinding, power consumption data collection, and long-duration heap memory stack capturing](#plugin-parameters).

## Environment Requirements

- The environment for OpenHarmony Device Connector (hdc) has been set up. For details, see [Environment Setup](hdc.md#environment-setup).

- The device is properly connected and **hdc shell** has been run.

## Architecture

1. DevEco Studio or SmartPerf on the PC calls **hiprofiler_cmd**.

2. The **hiprofiler_cmd** process starts the **hiprofilerd** service and the **hiprofiler_plugins** process.

3. **hiprofiler_plugins** enables the corresponding plugin and reports the collected profiling data to the **hiprofilerd** process.

4. The **hiprofilerd** process stores the profiling data in proto format to a file, or returns it to the PC in real time.

5. The PC parses the data, generates lanes, and displays the profiling data.

![en-us_image_0000002381835609](figures/en-us_image_0000002381835609.png)

## Command Syntax

You can use **hiprofiler_cmd** to invoke different profiler plugins and pass different parameters to suit your profiling requirements. The following is an example command:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "ftrace-plugin"
  sample_interval: 1000
  config_data {
   hitrace_categories: "binder"
   buffer_size_kb: 204800
   flush_interval_ms: 1000
   flush_threshold_kb: 4096
   trace_period_ms: 200
  }
 }
CONFIG
```

| Option| Description|
| -------- | -------- |
| -c | Specifies the configuration. To use a configuration file, place it in the **/data/local/tmp** directory and pass its path. In the example above, **-c -** is used and the configuration is supplied inline between **<<CONFIG** and **CONFIG**.|
| -o | Sets the path for saving the output file. The path must start with **/data/local/tmp**. If no path is set, the profiling data is saved to **/data/local/tmp/hiprofiler_data.htrace** by default. If profiling is repeated, the existing file at that path is overwritten.|
| -k | Kills the existing profiling service process.|
| -s | Starts the profiling service process.|
| -t | Sets the profiling duration, in seconds.|

After the hiprofiler_cmd options, enter the plugin configuration. The configuration starts with **<<CONFIG** and ends with **CONFIG**. The content in between is written in protobuf text format, as shown in the example above.

The following table describes the **session_config** fields.

| Field| Description|
| -------- | -------- |
| buffers | Shared memory buffer settings. The **pages** field sets the number of shared memory pages.|
| split_file | Whether to split the output file. The value **true** means to split the file, and **false** means the opposite.|
| split_file_max_size_mb | Maximum size of each split file, in MB, when **split_file** is set to **true**.|

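As a minimal sketch of how the **session_config** fields in the table above fit together, the following shell function prints a **session_config** block with file splitting enabled. The function name `session_config_block` and the 100 MB limit are illustrative assumptions, not part of hiprofiler. Check the **proto** definitions for the exact field semantics.

```shell
# Hypothetical helper: prints a session_config block in the same text format
# as the examples in this document.
#   $1: number of shared memory pages (default taken from the examples: 16384)
#   $2: maximum size of each split file in MB (100 is an illustrative value)
session_config_block() {
  pages="${1:-16384}"
  max_mb="${2:-100}"
  cat <<EOF
 session_config {
  buffers {
   pages: ${pages}
  }
  split_file: true
  split_file_max_size_mb: ${max_mb}
 }
EOF
}

session_config_block 16384 100
```

The printed block could replace the **session_config** section of a configuration passed to **hiprofiler_cmd**.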
The following table describes the **plugin_configs** fields.

| Field| Description|
| -------- | -------- |
| plugin_name | Name of the plugin to enable.|
| sample_interval | Interval at which the plugin obtains profiling data, in milliseconds.|
| config_data | Parameters of the plugin. Each plugin requires different parameters. For details, see the **proto** definitions of the plugins.<br>(The code path is **developtools/profiler/protos**.)|

After profiling, run the **hdc file recv** command to download the generated trace file to the local PC, and then upload the file to SmartPerf or DevEco Studio for parsing.

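The download step can be sketched as follows, assuming the default trace path described above and a hypothetical local directory `./traces`. The helper name `trace_recv_cmd` is illustrative. It only prints the **hdc file recv** command so that you can review it before running it on the PC.

```shell
# Hypothetical helper: builds the hdc command that copies a trace file from the
# device to the PC. It prints the command instead of running it.
#   $1: trace file path on the device (default: the hiprofiler_cmd default path)
#   $2: local destination directory (./traces is an illustrative choice)
trace_recv_cmd() {
  remote="${1:-/data/local/tmp/hiprofiler_data.htrace}"
  local_dir="${2:-./traces}"
  # ${remote##*/} keeps only the file name of the device path.
  printf 'hdc file recv %s %s/%s\n' "$remote" "$local_dir" "${remote##*/}"
}

trace_recv_cmd
```

Run the printed command on a PC where hdc is set up and the local directory already exists.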
## Plugins Supported

<!--RP1-->
| Name| Description| Specifications|
| -------- | -------- | -------- |
| native_hook | Obtains the call stack information about heap memory allocation.|  |
| ftrace-plugin | Obtains the trace events of kernel logging and the HiTrace logging data.|  |
| cpu-plugin | Obtains the CPU usage information of a process, including the process-level and thread-level usage.|  |
| gpu-plugin | Obtains the GPU usage information of a process.|  |
| xpower-plugin | Obtains the power consumption data of a process.|  |
| memory-plugin | Obtains the memory usage of a process, primarily the data from its **smaps** node.| In user mode, the file paths in **smaps** are not displayed.|
| diskio plugin | Obtains the disk space usage of a process.|  |
| network profiler | Obtains the HTTP request information of a process through process logging.|  |
| network plugin | Obtains the network traffic information of a process.|  |
| hisysevent plugin | Obtains the HiSysEvent logging data by running the HiSysEvent commands.|  |
| hiperf plugin | Obtains the number of instructions and the corresponding stack of a process by running the HiPerf commands.|  |
| hidump plugin | Obtains the HiDump data by running the SP_daemon commands.|  |
<!--RP1End-->

## Applications Signed by the Debug Certificate

> **NOTE**
>
> Run the **hdc shell "bm dump -n bundlename | grep appProvisionType"** command to check whether the specified application can be debugged. The expected output is **"appProvisionType": "debug"**.

For example, run the following command to check the application whose bundle name is **com.example.myapplication**:

```shell
hdc shell "bm dump -n com.example.myapplication | grep appProvisionType"
```

If the application is a debug application, the following information is displayed:

```shell
"appProvisionType": "debug",
```

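The check above can be scripted. The following is a minimal sketch, assuming the output line has exactly the form shown above. The helper name `is_debug_signed` is hypothetical.

```shell
# Hypothetical helper: reads `bm dump` output on standard input and succeeds
# only if it contains the debug provision type shown above.
is_debug_signed() {
  grep -q '"appProvisionType": "debug"'
}

# Example with a captured output line (no device needed):
if printf '%s\n' '"appProvisionType": "debug",' | is_debug_signed; then
  echo "debug application"
fi
# On a PC with a connected device:
#   hdc shell "bm dump -n com.example.myapplication" | is_debug_signed
```

The function returns a non-zero status when the line is absent, so it can guard a profiling script before the script calls **hiprofiler_cmd**.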
To build a debug application, sign it with a debug certificate. For details about how to request and use the debug certificate, see [Requesting a Debug Certificate](https://developer.huawei.com/consumer/en/doc/app/agc-help-add-debugcert-0000001914263178).

## Plugin Parameters

**native_hook**

Obtains the call stacks for heap memory allocations, including cross-language allocations (for example, using Node-API in ArkTS to allocate native heap memory), covering the **malloc**, **mmap**, **calloc**, and **realloc** functions. It can also display the call stacks of heap memory that is not released because of memory leaks.

Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| fp_unwind | bool | Whether to enable stack unwinding in fp mode. The value **true** means to use fp mode, and **false** means to use dwarf mode.| Stack unwinding in fp mode uses the x29 register, in which each function's frame pointer (fp) always points to the fp of its caller. After stack unwinding, the profiling service calculates the relative PC based on the instruction pointer (IP) and searches for the corresponding mapping in maps for symbolization.<br>Because of increasingly aggressive compiler optimizations, register reuse and disabled frame pointers can cause stack unwinding in fp mode to fail. In mixed stacks, fp alone cannot capture all frames, so dwarf is required for more accurate stack unwinding.<br>Stack unwinding in dwarf mode searches the map table for the corresponding mapping information based on the PC register. dwarf is slower than fp because the call stack is parsed frame by frame.<br>Note: fp stack unwinding is supported only on AArch64 devices.|
| statistics_interval | int | Statistics interval, in seconds. Stacks in a statistics interval are summarized.| The statistics stack capture mode is provided to implement long-term lightweight collection. If profiling performance is a priority and you only need call counts and total stack size, use statistics mode.|
| startup_mode | bool | Whether to capture the memory during process startup. By default, the memory during process startup is not captured.| This parameter records the heap memory allocation information during the period from when the process is started by AppSpawn to when the profiling ends. If a system ability (SA) service is captured, locate the name (for example, **sa_main**) of the process that launches it in the corresponding .cfg file and add that name to this parameter.|
| js_stack_report | int | Whether to enable cross-language stack unwinding.<br>The value **0** means not to capture the JS stack.<br>The value **1** means to capture the JS stack.| This parameter provides the cross-language stack unwinding feature for the Ark environment.|
| malloc_free_matching_interval | int | Matching interval, in seconds. **malloc** and **free** are matched within the interval. If matched, the stack is not flushed to the disk.| Call stacks that are allocated and released within the matching interval are not recorded, reducing the overhead of the stack capture service process. If this parameter is set to a value greater than 0, **statistics_interval** cannot be set.|
| offline_symbolization | bool | Whether to enable offline symbolization.<br>The value **true** means to enable offline symbolization;<br>the value **false** means the opposite.| When offline symbolization is used, the operation of matching symbols based on IP is transferred to SmartPerf, optimizing the performance of the native daemon and reducing process freezes. However, since the offline symbol table must be written into the trace file, the trace file generated under offline symbolization is larger than that under online symbolization.|
| sample_interval | int | Sampling size.| When this parameter is set, the sampling mode is enabled. In sampling mode, malloc allocations smaller than the sampling size are accounted for probabilistically. The larger the allocation size of a call stack and the more frequently it occurs, the greater its chance of being sampled.|

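The table states that statistics mode and malloc/free matching cannot be used together. The following sketch shows one way a wrapper script might enforce this before it generates a configuration. The helper name `nativehook_mode_lines` and the convention that **0** means "not set" are assumptions made for illustration.

```shell
# Hypothetical helper: prints nativehook config_data lines for either
# statistics mode or malloc/free matching, and rejects using both at once.
#   $1: statistics_interval in seconds (0 = not set)
#   $2: malloc_free_matching_interval in seconds (0 = not set)
nativehook_mode_lines() {
  statistics="${1:-0}"
  matching="${2:-0}"
  if [ "$statistics" -gt 0 ] && [ "$matching" -gt 0 ]; then
    echo "error: statistics_interval and malloc_free_matching_interval cannot both be set" >&2
    return 1
  fi
  if [ "$statistics" -gt 0 ]; then
    echo "   statistics_interval: ${statistics}"
  fi
  if [ "$matching" -gt 0 ]; then
    echo "   malloc_free_matching_interval: ${matching}"
  fi
}

nativehook_mode_lines 10 0
```

Passing two non-zero values makes the function print an error and return a non-zero status instead of emitting a conflicting configuration.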
Result examples:

The fp stack unwinding and cross-language stack unwinding are enabled (green frames denote JavaScript).

![en-us_image_0000002379700441](figures/en-us_image_0000002379700441.png)

The dwarf stack unwinding and cross-language stack unwinding are enabled (native -> JS -> native stack frames are displayed).

![en-us_image_0000002346179694](figures/en-us_image_0000002346179694.png)

Statistics mode is enabled (the stack data is displayed periodically).

![en-us_image_0000002379820229](figures/en-us_image_0000002379820229.png)

Non-statistics mode is enabled.

![en-us_image_0000002346019934](figures/en-us_image_0000002346019934.png)

**ftrace_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| ftrace_events | string | Captured trace events.| Trace events that record kernel logging.|
| hitrace_categories | string | Captured HiTrace logging information.| The HiTrace capability is called to obtain data and write the data to a file in proto format.|
| buffer_size_kb | int | Buffer size, in KB.| Buffer size required for the **hiprofiler_plugins** process to read kernel events. The default value **204800** is recommended.|
| flush_interval_ms | int | Data collection interval, in ms.| The default value **1000** is recommended.|
| flush_threshold_kb | int | Flush threshold, in KB.| Data is flushed to the file each time the threshold is exceeded. It is recommended that you use the default value of SmartPerf.|
| parse_ksyms | bool | Whether to obtain kernel data.| The value **true** means to obtain kernel data, and **false** means the opposite.|
| trace_period_ms | int | Period for reading kernel data, in ms.| It is recommended that you use the default value of SmartPerf.|

2. Result analysis

Example command:

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 10 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "ftrace-plugin"
  sample_interval: 1000
  config_data {
   ftrace_events: "binder/binder_transaction"
   ftrace_events: "binder/binder_transaction_received"
   buffer_size_kb: 204800
   flush_interval_ms: 1000
   flush_threshold_kb: 4096
   parse_ksyms: true
   clock: "boot"
   trace_period_ms: 200
   debug_on: false
  }
 }
CONFIG
```

This command reads the kernel **binder_transaction** and **binder_transaction_received** events. The two events must be captured together so that both ends of a binder transaction are displayed completely. After the command is executed, run the **hdc file recv** command to export the file and drag the file to SmartPerf for parsing. The following figure shows an example result.

You can click the arrow on the right of **binder transaction** to go to the process/thread on the other end of the binder.

![en-us_image_0000002316248152](figures/en-us_image_0000002316248152.png)

**memory_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| report_sysmem_vmem_info | bool | Whether to read virtual memory data.| Data is read from the **/proc/vmstat** node.|
| report_process_mem_info | bool | Whether to obtain the detailed memory data of a process, such as **rss_shmem**, **rss_file**, and **vm_swap**.| Data is read from the **/proc/${pid}/stat** node.|
| report_smaps_mem_info | bool | Whether to obtain the smaps memory information of a process.| Data is read from the **/proc/${pid}/smaps** node.|
| report_gpu_mem_info | bool | Whether to obtain the GPU memory usage of a process.| Data is read from the **/proc/gpu_memory** node.|
| parse_smaps_rollup | bool | Whether to obtain the smaps summary of a process.| Data is read from the **/proc/${pid}/smaps_rollup** node. The profiling overhead (such as CPU and memory) is lower than that of using the **report_smaps_mem_info** parameter.|

2. Result analysis

![en-us_image_0000002357083514](figures/en-us_image_0000002357083514.png)

You can go to **DevEco Studio** -> **Profiler** -> **Allocation** and select **Memory** to use the memory plugin feature of the profiler. The preceding figure shows the process smaps memory information in the selected time range.

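This document gives no command-line example for the memory plugin. The following sketch mirrors the layout of the ftrace and CPU examples and enables only the system-level **report_sysmem_vmem_info** reading from the table above. The helper name `memory_plugin_config` and the 1000 ms interval are illustrative assumptions. Piping the output to **hiprofiler_cmd** assumes that **-c -** reads the configuration from standard input, as the heredoc examples suggest.

```shell
# Hypothetical helper: prints a memory-plugin configuration that enables only
# the /proc/vmstat reading described in the parameter table.
#   $1: sampling interval in milliseconds (1000 is an illustrative default)
memory_plugin_config() {
  interval_ms="${1:-1000}"
  cat <<EOF
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "memory-plugin"
  sample_interval: ${interval_ms}
  config_data {
   report_sysmem_vmem_info: true
  }
 }
EOF
}

memory_plugin_config 1000
# In the device shell, the output could be piped to hiprofiler_cmd, for example:
#   memory_plugin_config 1000 | hiprofiler_cmd -c - -o /data/local/tmp/hiprofiler_data.htrace -t 30 -s -k
```

Process-level fields such as **report_smaps_mem_info** also need a target process. The table does not show how to specify it, so check the **proto** definition before you add those fields.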
**xpower_plugin**:

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| bundle_name | string | Name of the process for which power consumption profiling is required.| The value must be the same as the process name in the **/proc/** directory.|
| message_type | XpowerMessageType | Type of the power consumption data to be obtained.| The data types include **REAL_BATTERY**, **APP_STATISTIC**, **APP_DETAIL**, **COMPONENT_TOP**, **ABNORMAL_EVENTS**, and **THERMAL_REPORT**.|

2. Result analysis

![en-us_image_0000002346028442](figures/en-us_image_0000002346028442.png)

You can go to **DevEco Studio** -> **Profiler** -> **Realtime Monitor** to obtain the power consumption data of related processes.

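As a sketch of how the two xpower parameters above might appear in a configuration, the following helper prints a **plugin_configs** block and accepts only the message types listed in the table. The helper name `xpower_plugin_config`, the 1000 ms interval, and writing the enum value without quotes (the usual protobuf text form for enums) are assumptions. Check the **proto** definition for the authoritative format.

```shell
# Hypothetical helper: prints an xpower-plugin plugin_configs block.
#   $1: bundle (process) name
#   $2: message type, one of the values listed in the parameter table
xpower_plugin_config() {
  bundle="$1"
  message_type="${2:-REAL_BATTERY}"
  case "$message_type" in
    REAL_BATTERY|APP_STATISTIC|APP_DETAIL|COMPONENT_TOP|ABNORMAL_EVENTS|THERMAL_REPORT) ;;
    *)
      echo "error: unknown message_type: $message_type" >&2
      return 1
      ;;
  esac
  cat <<EOF
 plugin_configs {
  plugin_name: "xpower-plugin"
  sample_interval: 1000
  config_data {
   bundle_name: "${bundle}"
   message_type: ${message_type}
  }
 }
EOF
}

xpower_plugin_config com.example.myapplication REAL_BATTERY
```

The printed block would take the place of the **plugin_configs** section in a full configuration such as the ftrace example.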
**gpu_plugin**:

Obtains GPU usage information.

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| pid | int | ID of the process to profile.| The value must match the ID of a process in the **/proc/** directory.|
| report_gpu_info | bool | Whether to display the GPU usage of a specified process.| The value **true** means to display the GPU data of a specified process. In this case, you need to set **pid**.<br>Data is read from the **/sys/class/devfreq/gpufreq/gpu_scene_aware/utilisation** node.<br>The value **false** means not to display the GPU data of a specified process.|

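Because **pid** is an integer, a script that generates the GPU plugin configuration can reject non-numeric input early. The sketch below assumes the same block layout as the CPU example in this document. The helper name `gpu_plugin_config` and the 1000 ms interval are illustrative.

```shell
# Hypothetical helper: prints a gpu-plugin plugin_configs block for one process.
#   $1: process ID (a number, not a process name)
gpu_plugin_config() {
  pid="$1"
  case "$pid" in
    ''|*[!0-9]*)
      echo "error: pid must be a numeric process ID" >&2
      return 1
      ;;
  esac
  cat <<EOF
 plugin_configs {
  plugin_name: "gpu-plugin"
  sample_interval: 1000
  config_data {
   pid: ${pid}
   report_gpu_info: true
  }
 }
EOF
}

gpu_plugin_config 1234
```

Calling the helper with a process name instead of an ID prints an error and returns a non-zero status.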
**cpu_plugin**:

Obtains CPU usage information.

1. Parameters

| Name| Type| Description| Details|
| -------- | -------- | -------- | -------- |
| pid | int | ID of the process to profile.| The value must match the ID of a process in the **/proc/** directory.|
| report_process_info | bool | Whether to display the CPU usage of a specified process.| The value **true** means to display the data of a specified process. In this case, you need to set **pid**.<br>The value **false** means to display the system CPU usage data.|
| skip_thread_cpu_info | bool | Whether to skip the thread CPU usage data.| The value **true** means not to display the CPU usage of each thread, which reduces the profiling service overhead.<br>The value **false** means to display the CPU usage of each thread.|

## Common Commands

### Sampling Records of Heap Memory Allocation Call Stack Data

Capture the stack for heap memory allocation of the **com.example.insight_test_stage** process. Enable fp stack unwinding, offline symbolization, and statistics mode.

```shell
$ hiprofiler_cmd \
  -c - \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "nativehook"
  sample_interval: 5000
  config_data {
   save_file: false
   smb_pages: 16384
   max_stack_depth: 20
   process_name: "com.example.insight_test_stage"
   string_compressed: true
   fp_unwind: true
   blocked: true
   callframe_compress: true
   record_accurately: true
   offline_symbolization: true
   startup_mode: false
   statistics_interval: 10
   sample_interval: 256
   js_stack_report: 1
   max_js_stack_depth: 10
  }
 }
CONFIG
```

Because **-o** is not specified, the collected data is saved to the default **/data/local/tmp/hiprofiler_data.htrace** file. The file contains the function call information, the memory allocation information of threads and dynamic libraries, and the call stack counts and allocation sizes required for memory leak analysis. Enabling offline symbolization, fp stack unwinding, and statistics mode can improve the data processing efficiency of the profiling service.

### Capturing the CPU Usage of a Specified Process

Collect CPU data of the process whose process ID is **1234**. The collection duration is 30s, the sampling period is 1000 ms, the shared memory for transmitting profiling data is 16384 memory pages in size, and the collected data is saved to the **/data/local/tmp/hiprofiler_data.htrace** file.

```shell
$ hiprofiler_cmd \
  -c - \
  -o /data/local/tmp/hiprofiler_data.htrace \
  -t 30 \
  -s \
  -k \
<<CONFIG
 request_id: 1
 session_config {
  buffers {
   pages: 16384
  }
 }
 plugin_configs {
  plugin_name: "cpu-plugin"
  sample_interval: 1000
  config_data {
   pid: 1234
   report_process_info: true
  }
 }
CONFIG
```

## FAQs

### What should I do if an exception occurs during profiling?

**Symptom**

"Service not started" is displayed when the **hiprofiler_cmd** command is executed.

![en-us_image_0000002357083914](figures/en-us_image_0000002357083914.png)

**Possible Causes and Solution**

The profiling service is not started. This can happen when DevEco Studio is being used for profiling or when the previous profiling session exited unexpectedly. In this case, run the **hiprofiler_cmd -k** command and then run the profiling command again.

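The recovery step above can be sketched as a small wrapper for the device shell. It assumes that the "Service not started" message can be captured from standard output or standard error, and that the configuration is passed as a file path with **-c** so that the command can be run twice. The wrapper name `run_profiling_with_retry` and the file name in the usage comment are hypothetical.

```shell
# Hypothetical wrapper: runs a profiling command and, if its output contains
# "Service not started", runs `hiprofiler_cmd -k` and retries once.
# Assumption: the message is written to stdout or stderr.
run_profiling_with_retry() {
  output="$("$@" 2>&1)" && status=0 || status=$?
  case "$output" in
    *"Service not started"*)
      # Kill the existing profiling service, as the FAQ recommends.
      hiprofiler_cmd -k >/dev/null 2>&1
      output="$("$@" 2>&1)" && status=0 || status=$?
      ;;
  esac
  printf '%s\n' "$output"
  return "$status"
}

# Usage in the device shell (the configuration file name is illustrative):
#   run_profiling_with_retry hiprofiler_cmd -c /data/local/tmp/config.txt -t 30 -s -k
```

The wrapper retries only once, so a persistent failure is still reported to the caller through the return status.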
### What should I do if the captured trace file is empty?

**Symptom**

The captured trace file is empty.

**Possible Causes and Solution**

Check whether the generated file is in the **/data/local/tmp/** directory. If the target path is a folder in **/data/local/tmp**, run the **chmod 777** command on the folder. In addition, on the user version, **nativehook** and **network profiler** cannot capture data from an application that is not a debug application. (For details, see changelog https://gitcode.com/openharmony/docs/pulls/57419.)

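A script can check the output path before profiling starts. The following is a minimal sketch based only on the rule that the path must be under **/data/local/tmp**. The helper name `is_valid_trace_path` and the `traces` subfolder are illustrative, and the check does not resolve `..` components.

```shell
# Hypothetical helper: succeeds only if the path names a file under
# /data/local/tmp (simple prefix check; ".." components are not resolved).
is_valid_trace_path() {
  case "$1" in
    /data/local/tmp/?*) return 0 ;;
    *) return 1 ;;
  esac
}

out=/data/local/tmp/traces/app.htrace
if is_valid_trace_path "$out"; then
  echo "ok: $out"
  # For a subfolder, the FAQ suggests opening its permissions in the device shell:
  #   chmod 777 "${out%/*}"
fi
```

If the check fails, choose a path under **/data/local/tmp** before passing it to the **-o** option.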
### What should I do if I suspect that the profiling data is inaccurate?

**Symptom**

The native heap captured by hiprofiler is different from that viewed by hidumper.

**Possible Causes and Solution**

hidumper captures the process-level memory usage, while hiprofiler captures the heap memory allocated by the user-mode process through basic library functions such as **malloc**, **mmap**, and **realloc**. The **operator new** function also calls **malloc**. Therefore, the native heap sizes reported by the two tools differ because of the thread memory cache, delayed release of heap memory, and memory used by the loader.