# perf


## Basic Concepts

perf is a performance analysis tool. It uses the performance monitoring unit (PMU) to count and sample events, collects context information, and reports the hot spot distribution and hot paths.


## Working Principles

When the counter for a performance event overflows, it triggers an interrupt. The interrupt handler records the event information, including the current PC, task ID, and call stack.

perf provides two working modes: counting mode and sampling mode.

In counting mode, perf collects only the number of event occurrences and the duration. In sampling mode, perf also collects context data and stores it in a circular buffer. The IDE then analyzes the data and provides information about hotspot functions and paths.


## Available APIs

The perf module of the OpenHarmony LiteOS-A kernel provides the following APIs. For details, see the [API reference](https://gitee.com/openharmony/kernel_liteos_a/blob/master/kernel/include/los_perf.h).

  **Table 1** APIs of the perf module

| Category | Description |
| -------- | -------- |
| Starting or stopping sampling | **LOS_PerfInit**: initializes perf.<br>**LOS_PerfStart**: starts sampling.<br>**LOS_PerfStop**: stops sampling. |
| Configuring perf sampling events | **LOS_PerfConfig**: sets the event type and sampling period. |
| Reading sampling data | **LOS_PerfDataRead**: reads the sampling data. |
| Registering a hook for the sampling data buffer | **LOS_PerfNotifyHookReg**: registers the hook to be called when the buffer waterline is reached.<br>**LOS_PerfFlushHookReg**: registers the hook for flushing the cache in the buffer. |

**PerfConfigAttr** is the structure of the perf sampling event. For details, see [kernel/include/los_perf.h](https://gitee.com/openharmony/kernel_liteos_a/blob/master/kernel/include/los_perf.h).

The sampling data buffer is a circular buffer. Only regions that have already been read can be overwritten.

The buffer has limited space. You can register a hook that is called when the buffer waterline is reached, either to report that the buffer is about to overflow or to read data out of the buffer. The default buffer waterline is 1/2 of the buffer size. The code snippet is as follows:

```c
VOID Example_PerfNotifyHook(VOID)
{
    CHAR buf[LOSCFG_PERF_BUFFER_SIZE] = {0};
    UINT32 len;
    PRINT_DEBUG("perf buffer reach the waterline!\n");
    len = LOS_PerfDataRead(buf, LOSCFG_PERF_BUFFER_SIZE);
    OsPrintBuff(buf, len); /* print data */
}
LOS_PerfNotifyHookReg(Example_PerfNotifyHook);
```

If the perf sampling buffer involves caches across CPUs, you can register a hook that flushes the cache to ensure cache consistency. The code snippet is as follows:

```c
VOID Example_PerfFlushHook(VOID *addr, UINT32 size)
{
    OsCacheFlush(addr, size); /* platform interface */
}
LOS_PerfFlushHookReg(Example_PerfFlushHook);
```

The API used to flush the cache depends on the platform.


## Development Guidelines


### Kernel-Mode Development Process

The typical process of enabling perf is as follows:

1. Configure the macros related to the perf module.
   Configure the perf control macro **LOSCFG_KERNEL_PERF**, which is disabled by default. In the **kernel/liteos_a** directory, run the **make update_config** command, choose **Kernel**, and select **Enable Perf Feature**.

   | Configuration Item | menuconfig Option | Description | Value |
   | -------- | -------- | -------- | -------- |
   | LOSCFG_KERNEL_PERF | Enable&nbsp;Perf&nbsp;Feature | Whether to enable perf. | YES/NO |
   | LOSCFG_PERF_CALC_TIME_BY_TICK | Time-consuming&nbsp;Calc&nbsp;Methods-&gt;By&nbsp;Tick | Whether to use tick as the perf timing unit. | YES/NO |
   | LOSCFG_PERF_CALC_TIME_BY_CYCLE | Time-consuming&nbsp;Calc&nbsp;Methods-&gt;By&nbsp;Cpu&nbsp;Cycle | Whether to use cycle as the perf timing unit. | YES/NO |
   | LOSCFG_PERF_BUFFER_SIZE | Perf&nbsp;Sampling&nbsp;Buffer&nbsp;Size | Size of the buffer used for perf sampling. | INT |
   | LOSCFG_PERF_HW_PMU | Enable&nbsp;Hardware&nbsp;Pmu&nbsp;Events&nbsp;for&nbsp;Sampling | Whether to enable hardware PMU events. The target platform must support the hardware PMU. | YES/NO |
   | LOSCFG_PERF_TIMED_PMU | Enable&nbsp;Hrtimer&nbsp;Period&nbsp;Events&nbsp;for&nbsp;Sampling | Whether to enable high-precision periodical events. The target platform must support the high precision event timer (HPET). | YES/NO |
   | LOSCFG_PERF_SW_PMU | Enable&nbsp;Software&nbsp;Events&nbsp;for&nbsp;Sampling | Whether to enable software events. **LOSCFG_KERNEL_HOOK** must also be enabled. | YES/NO |

2. Call **LOS_PerfConfig** to configure the events to be sampled.
   perf provides two working modes and three types of events.

   - Working modes: counting mode (counts only the number of event occurrences) and sampling mode (collects context information such as task IDs, PC, and backtrace)
   - Event types: CPU hardware events (such as cycle, branch, icache, and dcache), high-precision periodical events (such as CPU clock), and OS software events (such as task switch, mux pend, and IRQ)

3. Call **LOS_PerfStart(UINT32 sectionId)** at the start of the code to be sampled. The input parameter **sectionId** identifies the sampling session.

4. Call **LOS_PerfStop** at the end of the code to be sampled.

5. Call **LOS_PerfDataRead** to read the sampling data and use the IDE to analyze the collected data.


#### Development Example

This example implements the following:

1. Create a perf task.

2. Configure sampling events.

3. Start perf.

4. Run the code to be measured.

5. Stop perf.

6. Export the result.


#### Sample Code

Prerequisites: **Enable Hook Feature** and **Enable Software Events for Sampling** are selected for the perf module in **menuconfig**.

You can compile and verify the sample code in **kernel/liteos_a/testsuites/kernel/src/osTest.c**.

The code is as follows:

```c
#include "los_perf.h"
#define TEST_MALLOC_SIZE 200
#define TEST_TIME        5

/* Workload to be measured: allocate and free memory TEST_TIME times. */
VOID test(VOID)
{
    VOID *p = NULL;
    int i;
    for (i = 0; i < TEST_TIME; i++) {
        p = LOS_MemAlloc(m_aucSysMem1, TEST_MALLOC_SIZE);
        if (p == NULL) {
            PRINT_ERR("test alloc failed\n");
            return;
        }

        (VOID)LOS_MemFree(m_aucSysMem1, p);
    }
}

/* Print the sampled data: the byte indexes first, then the byte values in hex. */
STATIC VOID OsPrintBuff(const CHAR *buf, UINT32 num)
{
    UINT32 i = 0;
    PRINTK("num: ");
    for (i = 0; i < num; i++) {
        PRINTK(" %02d", i);
    }
    PRINTK("\n");
    PRINTK("hex: ");
    for (i = 0; i < num; i++) {
        PRINTK(" %02x", buf[i]);
    }
    PRINTK("\n");
}

STATIC VOID perfTestHwEvent(VOID)
{
    UINT32 ret;
    CHAR *buf = NULL;
    UINT32 len;

    //LOS_PerfInit(NULL, 0);

    PerfConfigAttr attr = {
        .eventsCfg = {
            .type        = PERF_EVENT_TYPE_SW,
            .events = {
                [0]      = {PERF_COUNT_SW_TASK_SWITCH, 0xff}, /* Collect task scheduling information. */
                [1]      = {PERF_COUNT_SW_MEM_ALLOC, 0xff},   /* Collect memory allocation information. */
            },
            .eventsNr    = 2,
            .predivided  = 1,             /* The cycle counter increases every 64 cycles. */
        },
        .taskIds         = {0},
        .taskIdsNr       = 0,
        .needSample      = 0,
        .sampleType      = PERF_RECORD_IP | PERF_RECORD_CALLCHAIN,
    };
    ret = LOS_PerfConfig(&attr);
    if (ret != LOS_OK) {
        PRINT_ERR("perf config error %u\n", ret);
        return;
    }
    PRINTK("------count mode------\n");
    LOS_PerfStart(0);
    test(); /* The code to be measured. */
    LOS_PerfStop();
    PRINTK("--------sample mode------ \n");
    attr.needSample = 1;
    ret = LOS_PerfConfig(&attr);
    if (ret != LOS_OK) {
        PRINT_ERR("perf config error %u\n", ret);
        return;
    }
    LOS_PerfStart(2); // 2: set the section id to 2.
    test(); /* The code to be measured. */
    LOS_PerfStop();
    buf = LOS_MemAlloc(m_aucSysMem1, LOSCFG_PERF_BUFFER_SIZE);
    if (buf == NULL) {
        PRINT_ERR("buffer alloc failed\n");
        return;
    }
    /* Get the sampled data. */
    len = LOS_PerfDataRead(buf, LOSCFG_PERF_BUFFER_SIZE);
    OsPrintBuff(buf, len); /* Print the data. */
    (VOID)LOS_MemFree(m_aucSysMem1, buf);
}

UINT32 Example_Perf_test(VOID)
{
    UINT32 ret;
    TSK_INIT_PARAM_S perfTestTask = {0};
    UINT32 taskID;
    /* Create a perf task. */
    perfTestTask.pfnTaskEntry = (TSK_ENTRY_FUNC)perfTestHwEvent;
    perfTestTask.pcName       = "TestPerfTsk";   /* Test task name. */
    perfTestTask.uwStackSize  = 0x1000; // 0x1000: perf test task stack size
    perfTestTask.usTaskPrio   = 5; // 5: perf test task priority
    ret = LOS_TaskCreate(&taskID, &perfTestTask);
    if (ret != LOS_OK) {
        PRINT_ERR("PerfTestTask create failed. 0x%x\n", ret);
        return LOS_NOK;
    }
    return LOS_OK;
}
LOS_MODULE_INIT(Example_Perf_test, LOS_INIT_LEVEL_KMOD_EXTENDED);
```


#### Verification

  The output is as follows:

```
type: 2
events[0]: 1, 0xff
events[1]: 3, 0xff
predivided: 1
sampleType: 0x60
needSample: 0
------count mode------
[task switch] eventType: 0x1 [core 0]: 0
[mem alloc] eventType: 0x3 [core 0]: 5
time used: 0.005000(s)
--------sample mode------
type: 2
events[0]: 1, 0xff
events[1]: 3, 0xff
predivided: 1
sampleType: 0x60
needSample: 1
dump perf data, addr: 0x402c3e6c length: 0x5000
time used: 0.000000(s)
num:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
hex:  00 ffffffef ffffffef ffffffef 02 00 00 00 14 00 00 00 60 00 00 00 02 00 00 00
```

The print information may vary depending on the running environment.

- In counting mode, the event name, event type, and number of event occurrences are displayed after perf is stopped, for example, `[mem alloc] eventType: 0x3 [core 0]: 5` in the output above.

  For hardware PMU events, the displayed event type is the hardware event ID, not the abstract type defined in **enum PmuHWId**.

- In sampling mode, the address and length of the sampled data are displayed after perf is stopped, for example, `dump perf data, addr: 0x402c3e6c length: 0x5000` in the output above.

  You can export the data through the JTAG interface and then use the IDE offline tool to analyze it.

  You can also call **LOS_PerfDataRead** to read the data to a specified address for further analysis. In the example, **OsPrintBuff** is a test helper that prints the sampled data byte by byte. **num** indicates the sequence number of each byte, and **hex** indicates its value.