# perf


## Basic Concepts

perf is a performance analysis tool. It uses the performance monitoring unit (PMU) to count or sample events and collect context information, and it reports the hot spot distribution and hot paths.


## Working Principles

When a performance event has occurred the configured number of times (the sampling interval), the corresponding event counter overflows and triggers an interrupt. The interrupt handler records the event information, including the current PC, task ID, and call stack.

perf provides two working modes: counting mode and sampling mode.

In counting mode, perf collects only the number of event occurrences and the duration. In sampling mode, perf also collects context data and stores it in a circular buffer. The IDE then analyzes the data and provides information about hotspot functions and hot paths.
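
The difference between the two modes can be illustrated with a small host-side simulation. This is only a sketch and not LiteOS-A code: `PerfSim`, `PerfSimEvent`, `PerfSimRun`, `SAMPLE_PERIOD`, and `RING_SIZE` are hypothetical names invented for this illustration, and the real PMU driver raises a hardware interrupt instead of calling a function. Every event increments the counter; only when the counter "overflows" (every `SAMPLE_PERIOD` events) and sampling is enabled does the handler record the context.

```c
#include <stdint.h>

#define SAMPLE_PERIOD 4u  /* hypothetical: the counter overflows every 4 events */
#define RING_SIZE     8u  /* hypothetical buffer capacity, in samples */

typedef struct {
    uint32_t pc;          /* program counter captured at overflow */
    uint32_t taskId;      /* task that was running at overflow */
} Sample;

typedef struct {
    uint64_t count;           /* total events seen (the counting-mode result) */
    uint32_t untilOverflow;   /* events left before the next overflow */
    int needSample;           /* 0: counting mode, 1: sampling mode */
    Sample samples[RING_SIZE];
    uint32_t head;            /* number of samples recorded so far */
} PerfSim;

/* Plays the role of the event counter plus its overflow interrupt handler. */
void PerfSimEvent(PerfSim *p, uint32_t pc, uint32_t taskId)
{
    p->count++;
    p->untilOverflow--;
    if (p->untilOverflow != 0) {
        return;                        /* no overflow yet: nothing else happens */
    }
    p->untilOverflow = SAMPLE_PERIOD;  /* reload the counter */
    /* Sampling mode only: record the context; drop the sample if the buffer is full. */
    if (p->needSample && p->head < RING_SIZE) {
        p->samples[p->head].pc = pc;
        p->samples[p->head].taskId = taskId;
        p->head++;
    }
}

/* Feeds nEvents synthetic events (fake PCs 0x1000, 0x1004, ...) into a fresh simulator. */
PerfSim PerfSimRun(int needSample, uint32_t nEvents)
{
    PerfSim p = {0};
    p.untilOverflow = SAMPLE_PERIOD;
    p.needSample = needSample;
    for (uint32_t i = 0; i < nEvents; i++) {
        PerfSimEvent(&p, 0x1000u + 4u * i, 1u);
    }
    return p;
}
```

With these assumed values, `PerfSimRun(0, 10)` ends with `count == 10` and no samples, while `PerfSimRun(1, 10)` ends with the same count plus two samples, taken at the fourth and eighth events.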


## Available APIs


### Kernel Mode

The perf module of the OpenHarmony LiteOS-A kernel provides the following functions. For details about the interfaces, see the [API reference](https://gitee.com/openharmony/kernel_liteos_a/blob/master/kernel/include/los_perf.h).

  **Table 1** APIs of the perf module

| API| Description|
| -------- | -------- |
| LOS_PerfStart| Starts sampling.|
| LOS_PerfStop| Stops sampling.|
| LOS_PerfConfig| Sets the event type and sampling interval.|
| LOS_PerfDataRead| Reads the sampling data.|
| LOS_PerfNotifyHookReg| Registers the hook to be called when the buffer waterline is reached.|
| LOS_PerfFlushHookReg| Registers the hook for flushing the cache in the buffer.|

- The structure of the perf sampling event is **PerfConfigAttr**. For details, see **kernel/include/los_perf.h**.

- The sampling data buffer is a circular buffer, and only the region that has been read in the buffer can be overwritten.

- The buffer has limited space. You can register a hook that is called when the buffer waterline is reached, for example, to report an impending buffer overflow or to read the buffer. The default buffer waterline is 1/2 of the buffer size.

   Example:

   ```
   VOID Example_PerfNotifyHook(VOID)
   {
       CHAR buf[LOSCFG_PERF_BUFFER_SIZE] = {0};
       UINT32 len;
       PRINT_DEBUG("perf buffer reach the waterline!\n");
       len = LOS_PerfDataRead(buf, LOSCFG_PERF_BUFFER_SIZE); /* drain the buffer */
       OsPrintBuff(buf, len); /* print data */
   }
   LOS_PerfNotifyHookReg(Example_PerfNotifyHook);
   ```

- If the sampling buffer is cached and accessed by multiple CPUs, you can register a hook for flushing the cache to ensure cache consistency.

   Example:

   ```
   VOID Example_PerfFlushHook(VOID *addr, UINT32 size)
   {
       OsCacheFlush(addr, size); /* platform interface */
   }
   LOS_PerfFlushHookReg(Example_PerfFlushHook);
   ```

   The API for flushing the cache is configured based on the platform.
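
The buffer rules listed above (circular storage, unread data is never overwritten, a hook fires at the waterline) can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel's implementation: `RingBuf`, `RingInit`, `RingWrite`, `RingRead`, and `CountingHook` are hypothetical names, and the sketch calls the hook only at the moment the fill level crosses the waterline, which may differ from how LiteOS-A schedules the notification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*NotifyHook)(void);

typedef struct {
    uint8_t *data;
    size_t size;        /* capacity in bytes */
    size_t head;        /* next write offset */
    size_t tail;        /* next read offset */
    size_t used;        /* bytes written but not yet read */
    size_t waterline;   /* notify when used reaches this level */
    NotifyHook notify;  /* may be NULL */
} RingBuf;

unsigned g_notifyCount; /* incremented by the example hook below */

void CountingHook(void)
{
    g_notifyCount++;
}

void RingInit(RingBuf *rb, uint8_t *storage, size_t size, NotifyHook hook)
{
    rb->data = storage;
    rb->size = size;
    rb->head = rb->tail = rb->used = 0;
    rb->waterline = size / 2;   /* default stated in the text: half the buffer */
    rb->notify = hook;
}

/* Returns 0 on success, or -1 if the data would overwrite unread bytes. */
int RingWrite(RingBuf *rb, const void *src, size_t len)
{
    const uint8_t *p = src;
    size_t before = rb->used;
    size_t first;

    if (len > rb->size - rb->used) {
        return -1;              /* only space that has been read can be reused */
    }
    first = rb->size - rb->head;
    if (first > len) {
        first = len;
    }
    memcpy(rb->data + rb->head, p, first);      /* part up to the end of storage */
    memcpy(rb->data, p + first, len - first);   /* wrapped remainder, if any */
    rb->head = (rb->head + len) % rb->size;
    rb->used += len;
    if (before < rb->waterline && rb->used >= rb->waterline && rb->notify != NULL) {
        rb->notify();           /* waterline crossed: let the user drain the buffer */
    }
    return 0;
}

/* Copies up to len unread bytes to dst and releases them; returns the byte count. */
size_t RingRead(RingBuf *rb, void *dst, size_t len)
{
    uint8_t *p = dst;
    size_t first;

    if (len > rb->used) {
        len = rb->used;
    }
    first = rb->size - rb->tail;
    if (first > len) {
        first = len;
    }
    memcpy(p, rb->data + rb->tail, first);
    memcpy(p + first, rb->data, len - first);
    rb->tail = (rb->tail + len) % rb->size;
    rb->used -= len;
    return len;
}
```

With a 16-byte buffer, the waterline is 8 bytes: a 6-byte write does not trigger the hook, a following 2-byte write does, and a further 10-byte write is rejected until some data has been read.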


### User Mode


The perf character device is **/dev/perf**. You can read, write, and control the user-mode perf by performing the following operations on the device node:


- **read**: reads perf data in user mode.

- **write**: writes user-mode sampling events.

- **ioctl**: controls the user-mode perf, which includes the following:

  ```
  #define PERF_IOC_MAGIC     'T'
  #define PERF_START         _IO(PERF_IOC_MAGIC, 1)
  #define PERF_STOP          _IO(PERF_IOC_MAGIC, 2)
  ```

  The operations correspond to **LOS_PerfStart** and **LOS_PerfStop**.


For details, see [User-Mode Development Example](#user-mode-development-example).


## How to Develop


### Kernel-Mode Development Process

The typical process of enabling perf is as follows:

1. Configure the macros related to the perf module.

   Configure the perf control macro **LOSCFG_KERNEL_PERF**, which is disabled by default. In the **kernel/liteos_a** directory, run the **make update_config** command, choose **Kernel**, and select **Enable Perf Feature**.

   | Item| menuconfig Option| Description| Value|
   | -------- | -------- | -------- | -------- |
   | LOSCFG_KERNEL_PERF | Enable Perf Feature | Whether to enable perf.| YES/NO |
   | LOSCFG_PERF_CALC_TIME_BY_TICK | Time-consuming Calc Methods->By Tick | Whether to use tick as the perf timing unit.| YES/NO |
   | LOSCFG_PERF_CALC_TIME_BY_CYCLE | Time-consuming Calc Methods->By Cpu Cycle | Whether to use cycle as the perf timing unit.| YES/NO |
   | LOSCFG_PERF_BUFFER_SIZE | Perf Sampling Buffer Size | Size of the buffer used for perf sampling.| INT |
   | LOSCFG_PERF_HW_PMU | Enable Hardware Pmu Events for Sampling | Whether to enable hardware PMU events. The target platform must support the hardware PMU.| YES/NO |
   | LOSCFG_PERF_TIMED_PMU | Enable Hrtimer Period Events for Sampling | Whether to enable high-precision periodical events. The target platform must support the high precision event timer (HPET).| YES/NO |
   | LOSCFG_PERF_SW_PMU | Enable Software Events for Sampling | Whether to enable software events. **LOSCFG_KERNEL_HOOK** must also be enabled.| YES/NO |

2. Call **LOS_PerfConfig** to configure the events to be sampled.

   perf provides two working modes and three types of events.

   Working modes: counting mode (counts only the number of event occurrences) and sampling mode (also collects context information such as task IDs, PC, and backtrace).

   Events: CPU hardware events (such as cycle, branch, icache, and dcache), high-precision periodical events (such as CPU clock), and OS software events (such as task switch, mux pend, and IRQ).

3. Call **LOS_PerfStart(UINT32 sectionId)** at the start of the code to be sampled. The input parameter **sectionId** identifies the sampling session, so that different sessions can be told apart.

4. Call **LOS_PerfStop** at the end of the code to be sampled.

5. Call **LOS_PerfDataRead** to read the sampling data and use the IDE to analyze the collected data.


#### Kernel-Mode Development Example

This example implements the following:

1. Create a perf task.

2. Configure sampling events.

3. Start perf.

4. Execute algorithms for statistics.

5. Stop perf.

6. Export the result.


#### Kernel-Mode Sample Code

Prerequisites: The perf module configuration is complete in **menuconfig**.

The sample code is as follows:

```
#include "los_perf.h"
#include "los_task.h"    /* LOS_TaskCreate */
#include "los_memory.h"  /* LOS_MemAlloc, LOS_MemFree */
#include "los_init.h"    /* LOS_MODULE_INIT */

STATIC UINT32 g_perfTestTaskId;

/* Print the buffer byte by byte: a row of indexes, then a row of hex values. */
STATIC VOID OsPrintBuff(const CHAR *buf, UINT32 num)
{
    UINT32 i = 0;
    PRINTK("num: ");
    for (i = 0; i < num; i++) {
        PRINTK(" %02d", i);
    }
    PRINTK("\n");
    PRINTK("hex: ");
    for (i = 0; i < num; i++) {
        PRINTK(" %02x", buf[i]);
    }
    PRINTK("\n");
}

/* Task entry: measure test() first in counting mode, then in sampling mode. */
STATIC VOID perfTestHwEvent(VOID)
{
    UINT32 ret;
    CHAR *buf = NULL;
    UINT32 len;
    PerfConfigAttr attr = {
        .eventsCfg = {
            .type        = PERF_EVENT_TYPE_HW,
            .events = {
                [0]      = {PERF_COUNT_HW_CPU_CYCLES, 0xFFFF},
                [1]      = {PERF_COUNT_HW_BRANCH_INSTRUCTIONS, 0xFFFFFF00},
            },
            .eventsNr    = 2,
            .predivided  = 1,             /* cycle counter increase every 64 cycles */
        },
        .taskIds         = {0},
        .taskIdsNr       = 0,
        .needSample      = 0,             /* 0: counting mode */
        .sampleType      = PERF_RECORD_IP | PERF_RECORD_CALLCHAIN,
    };
    ret = LOS_PerfConfig(&attr);
    if (ret != LOS_OK) {
        PRINT_ERR("perf config error %u\n", ret);
        return;
    }
    PRINTK("------count mode------\n");
    LOS_PerfStart(0);
    test(); /* placeholder: replace with the code to be measured */
    LOS_PerfStop();
    PRINTK("--------sample mode------ \n");
    attr.needSample = 1;                  /* 1: sampling mode */
    ret = LOS_PerfConfig(&attr);
    if (ret != LOS_OK) {
        PRINT_ERR("perf config error %u\n", ret);
        return;
    }
    LOS_PerfStart(2);
    test(); /* placeholder: replace with the code to be measured */
    LOS_PerfStop();
    buf = LOS_MemAlloc(m_aucSysMem1, LOSCFG_PERF_BUFFER_SIZE);
    if (buf == NULL) {
        PRINT_ERR("buffer alloc failed\n");
        return;
    }
    /* get sample data */
    len = LOS_PerfDataRead(buf, LOSCFG_PERF_BUFFER_SIZE);
    OsPrintBuff(buf, len); /* print data */
    (VOID)LOS_MemFree(m_aucSysMem1, buf);
}

UINT32 Example_Perf_test(VOID)
{
    UINT32 ret;
    TSK_INIT_PARAM_S perfTestTask;
    /* Create a perf task. */
    memset(&perfTestTask, 0, sizeof(TSK_INIT_PARAM_S));
    perfTestTask.pfnTaskEntry = (TSK_ENTRY_FUNC)perfTestHwEvent;
    perfTestTask.pcName = "TestPerfTsk";    /* Test task name. */
    perfTestTask.uwStackSize  = 0x800;
    perfTestTask.usTaskPrio   = 5;
    perfTestTask.uwResved   = LOS_TASK_STATUS_DETACHED;
    ret = LOS_TaskCreate(&g_perfTestTaskId, &perfTestTask);
    if (ret != LOS_OK) {
        PRINT_ERR("PerfTestTask create failed.\n");
        return LOS_NOK;
    }
    return LOS_OK;
}
/* Register the function that creates the perf task, not the task entry itself. */
LOS_MODULE_INIT(Example_Perf_test, LOS_INIT_LEVEL_KMOD_EXTENDED);
```


#### Kernel-Mode Verification

  The output is as follows:

```
--------count mode----------
[EMG] [cycles] eventType: 0xff: 5466989440
[EMG] [branches] eventType: 0xc: 602166445
------- sample mode----------
[EMG] dump section data, addr: 0x8000000 length: 0x800000
num:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 ...
hex:  00 ef ef ef 00 00 00 00 14 00 00 00 60 00 00 00 00 00 00 00 70 88 36 40 08 00 00 00 6b 65 72 6e 65 6c 00 00 01 00 00 00 cc 55 30 40 08 00 00 00 6b 65 72 6e 65 6c 00 00
```

- In counting mode, the following information is displayed after perf is stopped:
  event name (cycles), event type (0xff), and number of event occurrences (5466989440).

  For hardware PMU events, the displayed event type is the hardware event ID, not the abstract type defined in **enum PmuHWId**.

- In sampling mode, the address and length of the sampled data are displayed after perf is stopped:
  dump section data, addr: (0x8000000) length: (0x800000)

  You can export the data using the JTAG interface and then use the IDE offline tool to analyze the data.

  You can also call **LOS_PerfDataRead** to read data to a specified address for further analysis. In the example, **OsPrintBuff** is a test helper that prints the sampled data byte by byte. **num** indicates the sequence number of the byte, and **hex** indicates the value of the byte.
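
For a first look at such a dump without the IDE tool, the bytes can be regrouped into 32-bit words. The sketch below is a hypothetical host-side helper, not part of the perf module: `g_dump`, `ReadLe32`, and `DumpWords` are names invented here, and the code assumes the target stores data in little-endian order. The record layout is not described in this document, so the helper only regroups bytes and makes no claim about what each field means.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The 56 bytes of the "hex:" line in the sample output, copied verbatim. */
const uint8_t g_dump[] = {
    0x00, 0xef, 0xef, 0xef, 0x00, 0x00, 0x00, 0x00,
    0x14, 0x00, 0x00, 0x00, 0x60, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x70, 0x88, 0x36, 0x40,
    0x08, 0x00, 0x00, 0x00, 0x6b, 0x65, 0x72, 0x6e,
    0x65, 0x6c, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0xcc, 0x55, 0x30, 0x40, 0x08, 0x00, 0x00, 0x00,
    0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x00,
};

/* Assembles four bytes into a 32-bit value, assuming little-endian storage. */
uint32_t ReadLe32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Prints one line per complete word: byte offset, value, and the bytes as ASCII. */
void DumpWords(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + 4 <= len; off += 4) {
        char ascii[5];
        for (int i = 0; i < 4; i++) {
            uint8_t c = buf[off + i];
            ascii[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.'; /* '.' if unprintable */
        }
        ascii[4] = '\0';
        printf("%02u: 0x%08x  %s\n", (unsigned)off, (unsigned)ReadLe32(buf + off), ascii);
    }
}
```

Calling `DumpWords(g_dump, sizeof(g_dump))` prints 14 words. The first word reads as 0xefefef00, the word at offset 12 is 0x60, and the bytes at offsets 28 to 33 are the ASCII string "kernel". Which structure fields these values belong to should be checked against **los_perf.h** rather than inferred from this sketch.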


### User-Mode Development Process

Choose **Driver** > **Enable PERF DRIVER** in **menuconfig** to enable the perf driver. This option is available in **Driver** only after **Enable Perf Feature** is selected in the kernel.

1. Open the **/dev/perf** file and perform read, write, and ioctl operations.

2. Run the **perf** commands in user mode in the **/bin** directory.

   After running **cd bin**, you can use the following commands:

   - **./perf start [*id*]**: starts perf sampling. *id* is optional and is **0** by default.
   - **./perf stop**: stops perf sampling.
   - **./perf read <*nBytes*>**: reads *nBytes* bytes of data from the sampling buffer and displays the data.
   - **./perf list**: lists the events supported by **-e**.
   - **./perf stat/record [*option*] <*command*>**: sets counting (**stat**) or sampling (**record**) parameters and runs *command*.
      - The [*option*] can be any of the following:
         - **-e**: sets sampling events. Events of the same type listed in **./perf list** can be used.
         - **-p**: sets the event sampling interval.
         - **-o**: specifies the path of the file for saving the perf sampling data.
         - **-t**: specifies the task IDs for data collection. Only the contexts of the specified tasks are collected. If this parameter is not specified, all tasks are collected by default.
         - **-s**: specifies the context type for sampling. For details, see **PerfSampleType** defined in **los_perf.h**.
         - **-P**: specifies the process IDs for data collection. Only the contexts of the specified processes are collected. If this parameter is not specified, all processes are collected by default.
         - **-d**: specifies whether to divide the frequency (the value is incremented by 1 each time an event occurs 64 times). This option is valid only for hardware cycle events.
      - *command* specifies the program to be checked by perf.

Examples:

Run the **./perf list** command to display available events.

The output is as follows:


```
cycles                                 [Hardware event]
instruction                            [Hardware event]
dcache                                 [Hardware event]
dcache-miss                            [Hardware event]
icache                                 [Hardware event]
icache-miss                            [Hardware event]
branch                                 [Hardware event]
branch-miss                            [Hardware event]
clock                                     [Timed event]
task-switch                            [Software event]
irq-in                                 [Software event]
mem-alloc                              [Software event]
mux-pend                               [Software event]
```

Run **./perf stat -e cycles os_dump**.

The output is as follows:


```
type: 0
events[0]: 255, 0xffff
predivided: 0
sampleType: 0x0
needSample: 0
usage os_dump [--help | -l | SERVICE]
         --help: shows this help
         -l: only list services, do not dump them
         SERVICE: dumps only service SERVICE
time used: 0.058000(s)
[cycles] eventType: 0xff [core 0]: 21720647
[cycles] eventType: 0xff [core 1]: 13583830
```

Run **./perf record -e cycles os_dump**.

The output is as follows:


```
type: 0
events[0]: 255, 0xffff
predivided: 0
sampleType: 0x60
needSample: 1
usage os_dump [--help | -l | SERVICE]
         --help: shows this help
         -l: only list services, do not dump them
         SERVICE: dumps only service SERVICE
dump perf data, addr: 0x408643d8 length: 0x5000
time used: 0.059000(s)
save perf data success at /storage/data/perf.data
```

> ![icon-note.gif](public_sys-resources/icon-note.gif) **NOTE**<br>
> After running the **./perf stat/record** command, you can run the **./perf start** and **./perf stop** commands multiple times. The sampling event configuration is determined by the parameters set in the most recent **./perf stat/record** command.


#### User-Mode Development Example

This example implements the following:

1. Open the perf character device.

2. Write the perf events.

3. Start perf.

4. Stop perf.

5. Read the perf sampling data.


#### User-Mode Sample Code

  The code is as follows:

```
#include <stdio.h>      /* printf */
#include <stdlib.h>     /* malloc, free, exit */
#include <unistd.h>     /* read, write, close */
#include "fcntl.h"
#include "user_copy.h"
#include "sys/ioctl.h"
#include "fs/driver.h"
#include "los_dev_perf.h"
#include "los_perf.h"
#include "los_init.h"
/* perf ioctl */
#define PERF_IOC_MAGIC     'T'
#define PERF_START         _IO(PERF_IOC_MAGIC, 1)
#define PERF_STOP          _IO(PERF_IOC_MAGIC, 2)
/*
 * test() stands for the code to be measured, and OsPrintBuff() is the byte-dump
 * helper shown in the kernel-mode sample; provide both in your application.
 */
int main(int argc, char **argv)
{
    char *buf = NULL;
    ssize_t len;
    int fd = open("/dev/perf", O_RDWR);
    if (fd == -1) {
        printf("Perf open failed.\n");
        exit(EXIT_FAILURE);
    }
    /* Use the first event type that is enabled in the kernel configuration. */
    PerfConfigAttr attr = {
        .eventsCfg = {
#ifdef LOSCFG_PERF_HW_PMU
            .type = PERF_EVENT_TYPE_HW,
            .events = {
                [0] = {PERF_COUNT_HW_CPU_CYCLES, 0xFFFF},
            },
#elif defined LOSCFG_PERF_TIMED_PMU
            .type = PERF_EVENT_TYPE_TIMED,
            .events = {
                [0] = {PERF_COUNT_CPU_CLOCK, 100},
            },
#elif defined LOSCFG_PERF_SW_PMU
            .type = PERF_EVENT_TYPE_SW,
            .events = {
                [0] = {PERF_COUNT_SW_TASK_SWITCH, 1},
            },
#endif
            .eventsNr = 1, /* 1 event */
            .predivided = 0,
        },
        .taskIds = {0},
        .taskIdsNr = 0,
        .processIds = {0},
        .processIdsNr = 0,
        .needSample = 1, /* sampling mode */
        .sampleType = PERF_RECORD_IP | PERF_RECORD_CALLCHAIN,
    };
    (void)write(fd, &attr, sizeof(PerfConfigAttr)); /* perf config */
    ioctl(fd, PERF_START, NULL); /* perf start */
    test();
    ioctl(fd, PERF_STOP, NULL); /* perf stop */
    buf = (char *)malloc(LOSCFG_PERF_BUFFER_SIZE);
    if (buf == NULL) {
        printf("no memory for read perf 0x%x\n", LOSCFG_PERF_BUFFER_SIZE);
        close(fd); /* do not leak the descriptor on the error path */
        return -1;
    }
    len = read(fd, buf, LOSCFG_PERF_BUFFER_SIZE);
    if (len > 0) {
        OsPrintBuff(buf, len); /* print data */
    }
    free(buf);
    close(fd);
    return 0;
}
```


#### User-Mode Verification

  The output is as follows:

```
[EMG] dump section data, addr: 0x8000000 length: 0x800000
num:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 ...
hex:  00 ef ef ef 00 00 00 00 14 00 00 00 60 00 00 00 00 00 00 00 70 88 36 40 08 00 00 00 6b 65 72 6e 65 6c 00 00 01 00 00 00 cc 55 30 40 08 00 00 00 6b 65 72 6e 65 6c 00 00
```

450