======================================
Coresight - HW Assisted Tracing on ARM
======================================

   :Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   :Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of
ARM-based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.
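
The source/link/sink roles described above can be illustrated with a toy
model. The C sketch below is an illustration under stated assumptions, not the
framework's implementation: the ``cs_node`` type, the ``cs_find_path()`` and
``cs_print_path()`` helpers and any example topology are hypothetical. It
walks depth-first from a source through its downstream links until it reaches
the requested sink, recording the components the trace stream would cross.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model; not the kernel's struct coresight_device. */
enum cs_role { CS_SOURCE, CS_LINK, CS_SINK };

#define CS_MAX_OUT 4

struct cs_node {
	const char *name;
	enum cs_role role;
	struct cs_node *out[CS_MAX_OUT];	/* downstream ATB connections */
	int nr_out;
};

/*
 * Depth-first search from @node to the sink named @sink_name.  On success
 * the components on the path are stored in @path (source first) and the
 * number of entries is returned; -1 means no path was found.  @max_depth
 * bounds both the recursion and the number of entries written to @path.
 */
static int cs_find_path(struct cs_node *node, const char *sink_name,
			struct cs_node **path, int depth, int max_depth)
{
	if (depth >= max_depth)
		return -1;
	path[depth] = node;

	/* A sink is an endpoint: either it is the one we want, or a dead end. */
	if (node->role == CS_SINK)
		return strcmp(node->name, sink_name) == 0 ? depth + 1 : -1;

	/* Sources and links: try each downstream connection in turn. */
	for (int i = 0; i < node->nr_out; i++) {
		int len = cs_find_path(node->out[i], sink_name, path,
				       depth + 1, max_depth);
		if (len > 0)
			return len;
	}
	return -1;
}

/* Print a path such as "etm0 -> funnel0 -> replicator0 -> tpiu0". */
static void cs_print_path(struct cs_node **path, int len)
{
	for (int i = 0; i < len; i++)
		printf("%s%s", path[i]->name, i + 1 < len ? " -> " : "\n");
}
```

With a hypothetical topology ``etm0 -> funnel0 -> replicator0``, the
replicator feeding both ``etb0`` and ``tpiu0``, asking for ``tpiu0`` yields a
four-entry path: the search backs out of ``etb0`` before trying ``tpiu0``.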

A typical coresight system would look like this::

  *****************************************************************
 **************************** AMBA AXI  ****************************===||
  *****************************************************************    ||
        ^                    ^                            |            ||
        |                    |                            *            **
     0000000    :::::     0000000    :::::    :::::    @@@@@@@    ||||||||||||
     0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @    || System ||
  |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@     || Memory ||
  |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|   ||||||||||||
  |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   |
  |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   |||||||||
  | |->###       | !   | |->###       | !      | !        .   |   || DAP ||
  | |   #        | !   | |   #        | !      | !        .   |   |||||||||
  | |   .        | !   | |   .        | !      | !        .   |      |  |
  | |   .        | !   | |   .        | !      | !        .   |      |  *
  | |   .        | !   | |   .        | !      | !        .   |      | SWD/
  | |   .        | !   | |   .        | !      | !        .   |      | JTAG
  *****************************************************************<-|
 *************************** AMBA Debug APB ************************
  *****************************************************************
   |    .          !         .          !        !        .    |
   |    .          *         .          *        *        .    |
  *****************************************************************
 ******************** Cross Trigger Matrix (CTM) *******************
  *****************************************************************
   |    .     ^              .                            .    |
   |    *     !              *                            *    |
  *****************************************************************
 ****************** AMBA Advanced Trace Bus (ATB) ******************
  *****************************************************************
   |          !                        ===============         |
   |          *                         ===== F =====<---------|
   |   :::::::::                         ==== U ====
   |-->:: CTI ::<!!                       === N ===
   |   :::::::::  !                        == N ==
   |    ^         *                        == E ==
   |    !  &&&&&&&&&       IIIIIII         == L ==
   |------>&& ETB &&<......II    I         =======
   |    !  &&&&&&&&&       II    I            .
   |    !                  I     I            .
   |    !                  I REP I<............
   |    !                  I     I
   |    !!>&&&&&&&&&       II    I            *Source: ARM ltd.
   |------>& TPIU  &<......II    I            DAP = Debug Access Port
           &&&&&&&&&       IIIIIII            ETM = Embedded Trace Macrocell
               ;                              PTM = Program Trace Macrocell
               ;                              CTI = Cross Trigger Interface
               *                              ETB = Embedded Trace Buffer
          To trace port                       TPIU= Trace Port Interface Unit
                                              SWD = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

PTM:
    Program Trace Macrocell
ETM:
    Embedded Trace Macrocell
STM:
    System Trace Macrocell
ETB:
    Embedded Trace Buffer
ITM:
    Instrumentation Trace Macrocell
TPIU:
    Trace Port Interface Unit
TMC-ETR:
    Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF:
    Trace Memory Controller, configured as Embedded Trace FIFO
CTI:
    Cross Trigger Interface

Classification:

Source:
    ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
    Funnel, replicator (intelligent or not), TMC-ETF
Sinks:
    ETBv1.0, ETBv1.1, TPIU, TMC-ETF, TMC-ETR
Misc:
    CTI


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing, drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any CoreSight-compliant device can
register with the framework as long as it uses the right APIs:

.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc);

.. c:function:: void coresight_unregister(struct coresight_device *csdev);

The registration function takes a ``struct coresight_desc *desc`` and
registers the device with the core framework. The unregister function takes
a reference to the ``struct coresight_device *csdev`` obtained at registration
time.

If everything goes well during the registration process, the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform::

    root:~# ls /sys/bus/coresight/devices/
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:~#

The registration function takes a ``struct coresight_desc``, which looks like
this::

    struct coresight_desc {
            enum coresight_dev_type type;
            struct coresight_dev_subtype subtype;
            const struct coresight_ops *ops;
            struct coresight_platform_data *pdata;
            struct device *dev;
            const struct attribute_group **groups;
    };


The "coresight_dev_type" identifies what the device is, i.e., source, link or
sink, while the "coresight_dev_subtype" characterises that type further.

The ``struct coresight_ops`` is mandatory and tells the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that, ``struct coresight_ops_sink``,
``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been
provided.
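
The role of the per-type operation structures can be sketched in plain C.
Everything below is hypothetical: the ``toy_*`` types only mimic the shape of
``struct coresight_desc`` and ``struct coresight_ops`` and do not reproduce
their real fields or callback signatures, and ``toy_register()`` is an
invented stand-in for ``coresight_register()``. The sketch shows a sink driver
supplying only sink operations, and a registration routine that refuses a
descriptor whose ops table lacks the operations its declared type requires.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-ins that only mimic the shape of struct coresight_desc
 * and struct coresight_ops; the real fields and signatures differ.
 */
enum toy_dev_type { TOY_DEV_TYPE_SINK, TOY_DEV_TYPE_LINK, TOY_DEV_TYPE_SOURCE };

struct toy_ops_sink {
	int (*enable)(void *drvdata);
	void (*disable)(void *drvdata);
};

struct toy_ops_link {
	int (*enable)(void *drvdata, int inport, int outport);
	void (*disable)(void *drvdata, int inport, int outport);
};

struct toy_ops_source {
	int (*enable)(void *drvdata);
	void (*disable)(void *drvdata);
};

/* One sub-structure per device class; a driver fills in only its own. */
struct toy_ops {
	const struct toy_ops_sink *sink_ops;
	const struct toy_ops_link *link_ops;
	const struct toy_ops_source *source_ops;
};

struct toy_desc {
	enum toy_dev_type type;
	const struct toy_ops *ops;	/* mandatory */
	void *drvdata;
};

/* Accept @desc only if its ops table covers what its type requires. */
static int toy_register(const struct toy_desc *desc)
{
	if (desc == NULL || desc->ops == NULL)
		return -1;

	switch (desc->type) {
	case TOY_DEV_TYPE_SINK:
		return desc->ops->sink_ops ? 0 : -1;
	case TOY_DEV_TYPE_LINK:
		return desc->ops->link_ops ? 0 : -1;
	case TOY_DEV_TYPE_SOURCE:
		return desc->ops->source_ops ? 0 : -1;
	}
	return -1;
}

/* A hypothetical sink driver: @drvdata points to an "enabled" flag. */
static int my_sink_enable(void *drvdata)
{
	*(int *)drvdata = 1;
	return 0;
}

static void my_sink_disable(void *drvdata)
{
	*(int *)drvdata = 0;
}

static const struct toy_ops_sink my_sink_ops = {
	.enable = my_sink_enable,
	.disable = my_sink_disable,
};

static const struct toy_ops my_ops = { .sink_ops = &my_sink_ops };
```

Keeping one sub-structure per device class lets each driver fill in only the
callbacks that make sense for it. The registration-time check is this
sketch's own illustration of why the per-type split is useful.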

The next field, ``struct coresight_platform_data *pdata``, is acquired by
calling ``of_get_coresight_platform_data()`` as part of the driver's probe
routine, and ``struct device *dev`` gets the device reference embedded in the
``amba_device``::

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
     ...
     ...
     drvdata->dev = &adev->dev;
     ...
    }

Each class of device (source, link, or sink) has generic operations
that can be performed on it (see ``struct coresight_ops``). The ``groups``
field is a list of sysfs entries pertaining to operations
specific to that component only. "Implementation defined" customisations are
expected to be accessed and controlled using those entries.

Device Naming scheme
--------------------

The devices that appear on the "coresight" bus were named the same as their
parent devices, i.e., the real devices that appear on the AMBA bus or the
platform bus. Thus the names were based on the Linux Open Firmware layer
naming convention, which uses the base physical address of the device followed
by the device type, e.g.::

    root:~# ls /sys/bus/coresight/devices/
     20010000.etf  20040000.funnel      20100000.stm     22040000.etm
     22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
     20070000.etr  20120000.replicator  220c0000.funnel
     23040000.etm  23140000.etm         23340000.etm

However, with the introduction of ACPI support, the names of the real
devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
introduced to use more generic names based on the type of the device. The
following rules apply::

  1) Devices that are bound to CPUs are named based on the CPU logical
     number.

     e.g., the ETM bound to CPU0 is named "etm0"

  2) All other devices follow a pattern, "<device_type_prefix>N", where:

        <device_type_prefix>  - A prefix specific to the type of the device
        N                     - a sequential number assigned based on the order
                                of probing.

        e.g., tmc_etf0, tmc_etr0, funnel0, funnel1

Thus, with the new scheme the devices could appear as::

    root:~# ls /sys/bus/coresight/devices/
     etm0     etm1     etm2         etm3  etm4      etm5      funnel0
     funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

Some of the examples below use the old naming scheme and some the new one,
so that either form seen on a given system can be recognised. Always use the
names exactly as they appear on the system under the locations shown.

How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework:

1. using the perf cmd line tools.
2. interacting directly with the Coresight devices using the sysFS interface.

Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment.
As a generic operation, all devices pertaining to the sink
class have an "enable_sink" entry in sysfs::

    root:/sys/bus/coresight/devices# ls
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range. As such, "enabling" a source will immediately
trigger a trace capture::

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    Depth:          0x2000
    Status:         0x1
    RAM read ptr:   0x0
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    Trigger cnt:    0x0
    Control:        0x1
    Flush status:   0x0
    Flush ctrl:     0x2001
    root:/sys/bus/coresight/devices#

Trace collection is stopped the same way::

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev::

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    of=~/cstrace.bin
    64+0 records in
    64+0 records out
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#

The file cstrace.bin can be decoded using "ptm2human", DS-5 or Trace32.

The following is DS-5 output of an experimental loop that increments a
variable up to a certain value.
The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides::

    Info         Tracing enabled
    Instruction  106378866  0x8026B53C  E52DE004  false  PUSH     {lr}
    Instruction  0          0x8026B540  E24DD00C  false  SUB      sp,sp,#0xc
    Instruction  0          0x8026B544  E3A03000  false  MOV      r3,#0
    Instruction  0          0x8026B548  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Timestamp    Timestamp: 17106715833
    Instruction  319        0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Instruction  9          0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Instruction  10         0x8026B54C  E59D3004  false  LDR      r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP      r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD      r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR      r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE      {pc}-0x10 ; 0x8026b54c
    Instruction  6          0x8026B560  EE1D3F30  false  MRC      p15,#0x0,r3,c13,c0,#1
    Instruction  0          0x8026B564  E1A0100D  false  MOV      r1,sp
    Instruction  0          0x8026B568  E3C12D7F  false  BIC      r2,r1,#0x1fc0
    Instruction  0          0x8026B56C  E3C2203F  false  BIC      r2,r2,#0x3f
    Instruction  0          0x8026B570  E59D1004  false  LDR      r1,[sp,#4]
    Instruction  0          0x8026B574  E59F0010  false  LDR      r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction  0          0x8026B578  E592200C  false  LDR      r2,[r2,#0xc]
    Instruction  0          0x8026B57C  E59221D0  false  LDR      r2,[r2,#0x1d0]
    Instruction  0          0x8026B580  EB07A4CF  true   BL       {pc}+0x1e9344 ; 0x804548c4
    Info         Tracing enabled
    Instruction  13570831   0x8026B584  E28DD00C  false  ADD      sp,sp,#0xc
    Instruction  0          0x8026B588  E8BD8000  true   LDM      sp!,{pc}
    Timestamp    Timestamp: 17107041535

2) Using the perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such, the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled. When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool::

    linaro@linaro-nano:~$ ./perf list pmu

            List of pre-defined events (to be used in -e):

            cs_etm//                                    [Kernel PMU event]

    linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
listed along with configuration options within forward slashes '/'.
Since a Coresight system will typically have more than one sink, the name of
the sink to work with needs to be specified as an event option.
On newer kernels, the available sinks are listed in sysFS under
($SYSFS)/bus/event_source/devices/cs_etm/sinks/::

    root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
    tmc_etf0  tmc_etr0  tpiu0

On older kernels, the sink names have to be taken from the list of coresight
devices, available under ($SYSFS)/bus/coresight/devices/::

    root:~# ls /sys/bus/coresight/devices/
     etm0     etm1     etm2         etm3  etm4      etm5      funnel0
     funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
    root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program

As mentioned above in section "Device Naming scheme", the names of the devices
could look different from what is used in the example above. Always use the
device names as they appear in sysFS.

The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [#third]_.

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.::

    perf record -e cs_etm/@tmc_etr0/u --per-thread program

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see the perf documentation).
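
The sink-selection syntax above lends itself to scripting. The following C
sketch is illustrative only: ``cs_etm_event_spec()``, ``cs_for_each_sink()``
and ``print_perf_cmd()`` are hypothetical helpers, and only the
``cs_etm/@<sink>/u`` event format and the
``/sys/bus/event_source/devices/cs_etm/sinks`` directory are taken from this
document. It enumerates a sink directory and prints a ``perf record`` command
line for each sink found.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a perf event string selecting @sink, e.g. "cs_etm/@tmc_etr0/u".
 * @modifiers is appended after the closing '/' ("u" restricts tracing to
 * user space, as in the examples above).  Returns 0, or -1 if @buf is too
 * small to hold the whole string.
 */
static int cs_etm_event_spec(char *buf, size_t len, const char *sink,
			     const char *modifiers)
{
	int n = snprintf(buf, len, "cs_etm/@%s/%s", sink, modifiers);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Call @cb for every entry of @dir other than "." and "..", e.g. for
 * "/sys/bus/event_source/devices/cs_etm/sinks" on kernels that provide it.
 * Returns the number of entries reported, or -1 if @dir cannot be opened.
 */
static int cs_for_each_sink(const char *dir,
			    void (*cb)(const char *name, void *arg), void *arg)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	int count = 0;

	if (d == NULL)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		cb(de->d_name, arg);
		count++;
	}
	closedir(d);
	return count;
}

/* Example callback: @arg is the program to trace. */
static void print_perf_cmd(const char *name, void *arg)
{
	char spec[128];

	if (cs_etm_event_spec(spec, sizeof(spec), name, "u") == 0)
		printf("perf record -e %s --per-thread %s\n", spec,
		       (const char *)arg);
}
```

For example, calling ``cs_for_each_sink()`` on the sinks directory with
``print_perf_cmd`` and ``"./program"`` would print one
``perf record -e cs_etm/@<sink>/u --per-thread ./program`` line per sink. On
older kernels that directory does not exist and the call returns -1.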

Note that only 64-bit programs are currently supported; further work is
required to support instruction decode of 32-bit Arm programs.


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option, in which case tracing data is
removed and replaced with the synthesized events. For example::

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for AutoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
::

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    5910 ms

    $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    12543 ms
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements
    5806 ms


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as using the tracers; the
only difference is that clients drive the trace capture rather than the
program flow through the code.
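
To make "clients drive the trace capture" concrete, here is a minimal
user-space sketch. It assumes that a sink and the STM source have already been
enabled and that the STM device node is ``/dev/stm0``, as in the example later
in this section; ``stm_emit()`` is a hypothetical helper, not part of the
generic STM API. It simply opens the character device and writes a message to
it. Whether a plain write works without further setup (for instance,
assigning a master/channel policy) depends on the configuration described in
Documentation/trace/stm.rst.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write @msg to the STM character device at @path, e.g. "/dev/stm0".
 * Short writes are retried until the whole message has been written.
 * Returns 0 on success or -1 with errno set on failure.
 */
static int stm_emit(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, msg, len);

		if (n < 0) {
			int saved = errno;

			if (saved == EINTR)
				continue;	/* interrupted before writing: retry */
			close(fd);
			errno = saved;	/* report the write error, not close() */
			return -1;
		}
		msg += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

A client would then call, for example, ``stm_emit("/dev/stm0", "checkpoint
reached")`` at points of interest, with the resulting trace ending up in
whichever sink was enabled.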

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs, with more information on each entry in [#first]_::

    root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
    enable_source   hwevent_select  port_enable     subsystem       uevent
    hwevent_enable  mgmt            port_select     traceid
    root@genericarmv8:~#

As with any other source, a sink needs to be identified and the STM enabled
before it can be used::

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source

From there, user space applications can request and use channels through the
devfs interface provided for that purpose by the generic STM API::

    root@genericarmv8:~# ls -l /dev/stm0
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/stm0
    root@genericarmv8:~#

Details on how to use the generic STM API can be found in [#second]_.

.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm

.. [#second] Documentation/trace/stm.rst

.. [#third] https://github.com/Linaro/perf-opencsd