AutoFDO and ARM Trace {#AutoFDO}
=====================

@brief Using CoreSight trace and perf with OpenCSD for AutoFDO.

## Introduction

Feedback directed optimization (FDO, also known as profile guided
optimization - PGO) uses a profile of a program's execution to guide the
optimizations performed by the compiler. Traditionally, this involves
building an instrumented version of the program, which records a profile of
execution as it runs. The instrumentation adds significant runtime
overhead, possibly changing the behaviour of the program, and it may not be
possible to run the instrumented program in a production environment
(e.g. where performance criteria must be met).

AutoFDO uses facilities in the hardware to sample the behaviour of the
program in the production environment and generate the execution profile.
An improved profile can be obtained by including the branch history
(i.e. a record of the last branches taken) when generating instruction
samples. On Arm systems, the ETM can be used to generate such records.

The process can be broken down into the following steps:

* Record execution trace of the program
* Convert the execution trace to instruction samples with branch histories
* Convert the instruction samples to source level profiles
* Use the source level profile with the compiler

This article describes how to enable ETM trace on Arm targets running Linux
and use the ETM trace to generate AutoFDO profiles and compile an optimized
program.


## Execution trace on Arm targets

Debug and trace of Arm targets is provided by CoreSight. This consists of
a set of components that allow access to debug logic, record (trace) the
execution of a processor and route this data through the system, collecting
it into a store.

To record the execution of a processor, we require the following
components:

* A trace source.
  The core contains a trace unit, called an ETM, that emits
  data describing the instructions executed by the core.
* Trace links. The trace data generated by the ETM must be moved through
  the system to the component that collects the data (sink). Links
  include:
  * Funnels: merge multiple streams of data
  * FIFOs: buffer data to smooth out bursts
  * Replicators: send a stream of data to multiple components
* Sinks. These receive the trace data and store it or send it to an
  external device:
  * ETB: A small circular buffer (64-128 kilobytes) that stores the most
    recent data
  * ETR: A larger (several megabytes) buffer that uses system RAM to
    store data
  * TPIU: Sends data to an off-chip capture device (e.g. Arm DSTREAM)

Each Arm SoC design may have a different layout (topology) of components.
This topology is described to the OS drivers by the platform's devicetree
or (in future) ACPI firmware.

For application profiling, we need to store several megabytes of data
within the system, so we will use the ETR, with the capture tool (perf)
periodically draining the buffer to a file.

Even though we have a large capture buffer, the ETM can still generate a
lot of data very quickly - typically an ETM will generate ~1 bit of data
per instruction (depending on the workload), which results in roughly
256 Mbytes per second for a core running at 2GHz. This leads to problems
storing and decoding such large volumes of data. AutoFDO uses samples of
program execution, so we can avoid this problem by using the ETM's features
to only record small slices of execution - e.g. collect ~5000 cycles of data
every 50M cycles. This reduces the data rate to a manageable level - a few
megabytes per minute. This technique is known as 'strobing'.


## Enabling trace

### Driver support

CoreSight drivers must be built into the kernel to collect the trace.

Typically, the CoreSight trace drivers are enabled in the kernel
configuration. This can be done using the configuration menu (`make
menuconfig`), selecting `Kernel hacking` / `arm64 Debugging` / `CoreSight Tracing Support` and
enabling all options, or by setting the following in the configuration
file:

```
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_LINKS_AND_SINKS=y
CONFIG_CORESIGHT_SINK_TPIU=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CATU=y
```

CoreSight support can also be built as modules.

Compile the kernel for your target in the usual way, e.g.

```
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```

Each target may have a different layout of CoreSight components. To
collect trace into a sink, the kernel drivers need to know which other
devices need to be configured to route data from the source to the sink.
This is described in the devicetree (and in future, the ACPI tables). The
devicetree defines which CoreSight devices are present in the system,
where they are located and how they are connected together. The devicetree
for some platforms includes a description of the platform's CoreSight
components, but in other cases you may have to ask the platform/SoC vendor
to supply it or create it yourself (see Appendix: Describing CoreSight in
Devicetree).

Once the target has been booted with the devicetree describing the
CoreSight devices, you should find the devices in sysfs:

```
# ls /sys/bus/coresight/devices/
etm0  etm2  etm4  etm6  funnel0  funnel2  funnel4      stm0      tmc_etr0
etm1  etm3  etm5  etm7  funnel1  funnel3  replicator0  tmc_etf0
```

If `/sys/bus/coresight/devices/` is empty, check your kernel configuration to make sure your `.config` file includes the CoreSight dependencies, such as the clock.
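
As a quick check, the CoreSight options of a kernel can be listed from its configuration. The following is a minimal sketch, assuming the running kernel exposes its configuration at `/proc/config.gz` (which requires `CONFIG_IKCONFIG_PROC`) or that a `/boot/config-$(uname -r)` file exists; neither is guaranteed on every target. The helper name `list_coresight_opts` is made up for this example.

```shell
# Hypothetical helper: read a kernel config on stdin and print the
# CoreSight options that are set (=y or =m).
list_coresight_opts() {
    grep -E '^CONFIG_CORESIGHT[A-Z0-9_]*=' || echo "no CoreSight options enabled"
}

# Assumption: one of these two locations holds the running kernel's config.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | list_coresight_opts
elif [ -r "/boot/config-$(uname -r)" ]; then
    list_coresight_opts < "/boot/config-$(uname -r)"
else
    echo "kernel config not found; inspect the .config used for the build" >&2
fi
```

If the options from the list above are missing, rebuild the kernel (or load the modules) before looking for a devicetree problem.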

The naming convention for ETM devices can differ depending on the kernel version you are using.
For more information about the naming scheme, see the [Linux Kernel Documentation](https://www.kernel.org/doc/html/latest/trace/coresight/coresight.html#device-naming-scheme).

### Configuration support - enabling strobing.

From kernel version 5.16 onwards, the CoreSight System Configuration
Management infrastructure is available. It allows more complex CoreSight
programming to be loaded on demand in the form of configurations and
features, managed by configfs.

A named configuration is a set of features and default parameters and
presets.

A feature is a set of register programming for a particular CoreSight
component, which may contain some variable parameters that can be set
from configfs, or set by specifying a particular preset from the
configuration.

There is a built-in named configuration called `autofdo`.
This will load a feature called `strobing` onto each ETMv4 used in a
trace session. The strobing feature defines two parameters called
`window` and `period`. These are set prior to trace capture to control
the strobing feature.

When a trace session uses a configuration, the feature programming is
carried out on all devices used during that session, and only on those devices.
This avoids the need to program up all devices individually before starting a session.

For additional information on using CoreSight configurations see the [CoreSight System Configuration Manager](https://www.kernel.org/doc/html/latest/trace/coresight/coresight-config.html) in the Linux Kernel Documentation.

### Older Kernels (before 5.16) {#older_kernels}

For targets using kernels prior to 5.16, CoreSight trace with strobing
is enabled differently.

For these targets, Arm have provided backports of the deprecated CoreSight
drivers and ETM strobing patch at:

  <https://gitlab.arm.com/linux-arm/linux-coresight-backports>

This repository can be cloned with:

```
git clone https://git.gitlab.arm.com/linux-arm/linux-coresight-backports.git
```

You can include these backports in your kernel by either merging the
appropriate branch using git or generating patches (using `git
format-patch`).

For 5.0 to 5.15 based kernels, the only patch which needs to be applied is the one enabling strobing - etm4x: `Enable strobing of ETM`.

For 4.9 based kernels, use the `coresight-4.9-etr-etm_strobe` branch:

```
git merge coresight-4.9-etr-etm_strobe
```

or

```
git format-patch --output-directory /output/dir v4.9..coresight-4.9-etr-etm_strobe
cd my_kernel
git am /output/dir/*.patch # or, without git: for p in /output/dir/*.patch; do patch -p1 < "$p"; done
```

For 4.14 based kernels, use the `coresight-4.14-etm_strobe` branch:

```
git merge coresight-4.14-etm_strobe
```

or

```
git format-patch --output-directory /output/dir v4.14..coresight-4.14-etm_strobe
cd my_kernel
git am /output/dir/*.patch # or, without git: for p in /output/dir/*.patch; do patch -p1 < "$p"; done
```

For these kernels, configuration management is not available, so strobing is built directly into the sysfs of the individual components.
The scripts `set_strobing.sh` and `show_strobing.sh` must be used to program up each individual ETM prior to starting a trace session.

```
sudo ./set_strobing.sh 5000 10000
```

## Perf tools

The perf tool is used to capture execution trace, configuring the trace
sources to generate trace, routing the data to the sink and collecting the
data from the sink.

Arm recommends using the perf version corresponding to the kernel running
on the target.
This can be built from the same kernel sources with:

```
make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```

When specifying `CORESIGHT=1`, perf will be built using the installed OpenCSD library.
If you are cross compiling, then additional setup is required to ensure the build process links against the correct version of the library.

If the post-processing (`perf inject`) of the captured data is not being
done on the target, then the OpenCSD library is not required for this build
of perf.

Trace is captured by collecting the `cs_etm` event from perf. The sink
to collect data into is specified as a parameter of this event. Trace can
also be restricted to user space or kernel space with the 'u' or 'k'
parameters. For example:

```
perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls
```

This records the userspace execution of `/bin/ls` using `tmc_etr0` as the sink.

Alternatively, the sink can be omitted and the system will choose the most appropriate
sink:

```
perf record -e cs_etm//u --per-thread -- /bin/ls
```


### Capturing modes

You can trace a single-threaded program in two different ways:

1. With `--per-thread`: the CoreSight subsystem records only the trace
belonging to the given program.

2. Without `--per-thread`: CPU-wide tracing is enabled. In this scenario
the trace will contain both the target program trace and other workloads
that were executing on the same CPU.



## Processing trace and profiles

perf is also used to convert the execution trace to an instruction profile.
This requires a different build of perf, using the version of perf from
Linux v4.17 or later, as the trace processing code isn't included in the
driver backports.
Trace decode is provided by the OpenCSD library
(<https://github.com/Linaro/OpenCSD>), v0.9.1 or later. This is packaged
for Debian testing (install the libopencsd0, libopencsd-dev packages) or
can be compiled from source and installed.

The AutoFDO tool <https://github.com/google/autofdo> is used to convert the
instruction profiles to source profiles for the GCC and clang/llvm
compilers.


### Recording trace for the profile

Once trace collection using perf is working, we can use it to profile
an application.

The application must be compiled to include sufficient debug information to
map instructions back to source lines. For GCC, use the `-g1` or `-gmlt`
options. For clang/llvm, also add the `-fdebug-info-for-profiling` option.

perf identifies the active program or library using the build identifier
stored in the ELF file. This should be added at link time with the compiler
flag `-Wl,--build-id=sha1`.

The next step is to record the execution trace of the application using the
perf tool. The ETM strobing should be configured before running the perf
tool. There are two parameters:

 * window size: A number of CPU cycles (W)
 * period: Trace is enabled for W cycles every _period_ * W cycles.
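
The relationship between the two parameters can be checked with a little shell arithmetic. This sketch uses the example values that appear elsewhere in this article (a window of 5000 and a period of 10000); the variable names are only for illustration and are not read by perf or the kernel.

```shell
# Illustrative values only: the example window and period used in this article.
window=5000     # W: number of cycles traced in each slice
period=10000    # trace is enabled for W cycles every (period * W) cycles

# Cycles from the start of one traced slice to the start of the next.
cycles_between=$((period * window))

echo "trace ${window} cycles every ${cycles_between} cycles"
echo "about 1 in ${period} cycles is traced"
```

With these values, trace is enabled for 5000 cycles out of every 50,000,000, which matches the "~5000 cycles of data every 50M cycles" figure given earlier.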

To collect trace from an application using ETM strobing with default parameters, run:

```
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
```

To use specific strobing parameters, run:

```
echo 5000 > /configfs/cs-syscfg/features/strobing/window/value
echo 10000 > /configfs/cs-syscfg/features/strobing/period/value
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
```

or alternatively, use one of the built-in parameter presets:

```
perf record -e cs_etm/autofdo,preset=1/u --per-thread -- <your app>
```


The raw trace can be examined using the `perf report` command:

```
perf report -D -i perf.data --stdio
```

Perf needs to be built from the source code of your Linux kernel version, against the OpenCSD library, to be able to read and post-process ETM-gathered samples.
If running `perf report` produces an error like:

```
0x1f8 [0x268]: failed to process type: 70 [Operation not permitted]
Error:
failed to process sample
```
or

```
"file uses a more recent and unsupported ABI (8 bytes extra). incompatible file format".
```

you are probably using a perf build which does not use this library. Install OpenCSD, either by compiling v0.9.1 or later from [source code](https://github.com/Linaro/OpenCSD) or from the Debian packages (libopencsd0, libopencsd-dev), and then compile perf against it.


For example, the raw trace output of a working setup looks like:

```
0x1d370 [0x30]: PERF_RECORD_AUXTRACE size: 0x2003c0 offset: 0 ref: 0x39ba881d145f8639 idx: 0 tid: 4551 cpu: -1

. ... CoreSight ETM Trace data: size 2098112 bytes
    Idx:0; ID:12; I_ASYNC : Alignment Synchronisation.
    Idx:12; ID:12; I_TRACE_INFO : Trace Info.; INFO=0x0
    Idx:17; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
    Idx:48; ID:14; I_ASYNC : Alignment Synchronisation.
    Idx:60; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0
    Idx:65; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
    Idx:96; ID:14; I_ASYNC : Alignment Synchronisation.
    Idx:108; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0
    Idx:113; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
    Idx:122; ID:14; I_TRACE_ON : Trace On.
    Idx:123; ID:14; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000000000407B00; Ctxt: AArch64,EL0, NS;
    Idx:134; ID:14; I_ATOM_F3 : Atom format 3.; ENN
    Idx:135; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
    Idx:136; ID:14; I_ATOM_F5 : Atom format 5.; ENENE
    Idx:137; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
    Idx:138; ID:14; I_ATOM_F3 : Atom format 3.; ENN
    Idx:139; ID:14; I_ATOM_F3 : Atom format 3.; NNE
    Idx:140; ID:14; I_ATOM_F1 : Atom format 1.; E
.....
```

### Generating the profile from the trace

The execution trace is then converted to an instruction profile using
the perf build with trace decode support. This may be done on a different
machine than that which collected the trace (e.g. when cross compiling for
an embedded target).

!! Careful: if you are using a different device from the one used to collect the profiling data,
you'll need to run `perf buildid-cache` as described below.

The `perf inject` command decodes the execution trace and generates
periodic instruction samples, with branch histories:
```
perf inject -i perf.data -o inj.data --itrace=i100000il
```

The `--itrace` option configures the instruction sample behaviour:

* `i100000i` generates an instruction sample every 100000 instructions
  (only instruction count periods are currently supported, future versions
  may support time or cycle count periods)
* `l` includes the branch histories on each sample
* `b` generates a sample on each branch (not used here)

Perf requires the original program binaries to decode the execution trace.
If running the `inject` command on a different system than the trace was
captured on, then the binary and any shared libraries must be added to
perf's cache with:

```
perf buildid-cache -a /path/to/binary_or_library
```

`perf report` can also be used to show the instruction samples:

```
perf report -D -i inj.data --stdio
.......
0x1528 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 4551/4551: 0x434b98 period: 3093 addr: 0
... branch stack: nr:64
.....  0: 0000000000434b58 -> 0000000000434b68 0 cycles P 0
.....  1: 0000000000436a88 -> 0000000000434b4c 0 cycles P 0
.....  2: 0000000000436a64 -> 0000000000436a78 0 cycles P 0
.....  3: 00000000004369d0 -> 0000000000436a60 0 cycles P 0
.....  4: 000000000043693c -> 00000000004369cc 0 cycles P 0
.....  5: 00000000004368a8 -> 0000000000436928 0 cycles P 0
.....  6: 000000000042d070 -> 00000000004368a8 0 cycles P 0
.....  7: 000000000042d108 -> 000000000042d070 0 cycles P 0
.......
..... 57: 0000000000448ee0 -> 0000000000448f24 0 cycles P 0
..... 58: 0000000000448ea4 -> 0000000000448ebc 0 cycles P 0
..... 59: 0000000000448e20 -> 0000000000448e94 0 cycles P 0
..... 60: 0000000000448da8 -> 0000000000448ddc 0 cycles P 0
..... 61: 00000000004486f4 -> 0000000000448da8 0 cycles P 0
..... 62: 00000000004480fc -> 00000000004486d4 0 cycles P 0
..... 63: 0000000000448658 -> 00000000004480ec 0 cycles P 0
 ... thread: program1:4551
 ...... dso: /home/root/program1
.......
```

The instruction samples produced by `perf inject` are then passed to the
autofdo tool to generate source level profiles for the compiler. For
clang/LLVM:

```
create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof
```

The optional `-format extbinary` creates output suitable for FSAFDO, sometimes used in the kernel, which can give improved performance gains. See the AFDO
docs for more details.

And for GCC:

```
create_gcov -binary=/path/to/binary -profile=inj.data -gcov_version=1 -gcov=program.gcov
```

The profiles can be viewed with:

```
llvm-profdata show -sample program.llvmprof
```

Or, for GCC:

```
dump_gcov -gcov_version=1 program.gcov
```

### Using profile in the compiler

The profile produced by the above steps can then be passed to the compiler
to optimize the next build of the program.

For GCC, use the `-fauto-profile` option:

```
gcc -O2 -fauto-profile=program.gcov -o program program.c
```

For Clang, use the `-fprofile-sample-use` option:

```
clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
```


### Summary

The basic commands to run an application and create a compiler profile are:

```
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
perf inject -i perf.data -o inj.data --itrace=i100000il
create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof
clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
```

For GCC, use `create_gcov` in place of `create_llvm_prof`.

## High Level Summary for recording on Arm board and decoding on different host

1. (on Arm board) `perf record -e cs_etm/autofdo/u --per-thread -- <your app>`. <br>
   If you specify `-N, --no-buildid-cache`, perf will only record the target binary and nothing will be copied.<br> If you don't specify it, any recorded dynamic library will be copied to ~/.debug on the board.<br><br>

2. (on Arm board) `perf archive`, which saves all the libraries it finds in a tar file (internally, it looks into the perf.data file and performs a lookup using `perf buildid-list --with-hits`).

3. (on host) `scp` to copy perf.data and the .tar file generated by `perf archive`.<br><br>

4. (on host) Run `tar xvf perf.data.tar.bz2 -C ~/.debug` to populate the buildid-cache.<br><br>

5. (on host) Double check the setup is correct:

   (a) `perf buildid-list -i perf.data`
     - lists the build-ids of the binaries and dynamic libraries whose trace has been recorded and saved in perf.data.<br><br>

   (b) `perf buildid-cache --list` <br>
     - lists the dynamic libraries in the buildid-cache that will be used by `perf inject`.<br>
   Make sure the build-ids in the output of (a) and (b) match for the binaries you want to optimize with AutoFDO.<br><br>

6. (on host) `perf inject -i perf.data -o inj.data --itrace=i100000il` <br>
   - looks up the dynamic libraries in the buildid-cache by build-id and post-processes the trace.<br>
   The build-ids have to be the same, otherwise it won't be possible to post-process the trace. <br><br>

7. (on host) `create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof` <br>
   - takes the output from `perf inject` and transforms it into a format that the compiler can read. <br><br>

8. (on host) `clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c` <br>
   - makes clang use the produced profile.<br>
   If you are confident that your profile is accurate, you can add the `-fprofile-sample-accurate` flag, which penalizes all the callsites without a corresponding profile by marking them as cold.

If you are using the same host for both building the binary to be traced and re-building it with AutoFDO:

1. You won't need to copy back any dynamic libraries from the board (since you already have them), and can use `--no-buildid-cache` when recording.
2. You have to make sure the relevant dynamic libraries to be optimized are present in the buildid-cache.

You can add a dynamic library to the buildid-cache manually by running:

`perf buildid-cache --add <path/to/library/or/binary> -vvv`

You can check what is currently contained in your buildid-cache by running:

`perf buildid-cache --list`

You can check the build-id of a given binary or dynamic library with:

`file <path/to/dynamic/library>`

## References

* AutoFDO tool: <https://github.com/google/autofdo>
* GCC's wiki on autofdo: <https://gcc.gnu.org/wiki/AutoFDO>, <https://gcc.gnu.org/wiki/AutoFDO/Tutorial>
* Google paper: <https://ai.google/research/pubs/pub45290>
* CoreSight kernel docs: <https://www.kernel.org/doc/html/latest/trace/coresight/index.html>


## Appendix: Describing CoreSight in Devicetree


Each component has an entry in the device tree that describes its:

* type: The `compatible` field defines which driver to use
* location: A `reg` defines the component's address and size on the bus
* clocks: The `clocks` and `clock-names` fields state which clock provides
  the `apb_pclk` clock.
* connections to other components: `port` and `ports` fields link the
  component to ports of other components

To create the device tree, some information about the platform is required:

* The memory address of the CoreSight components. This is the address in
  the CPU's address space where the CPU can access each CoreSight
  component.
* The connections between the components.

This information can be found in the SoC's reference manual or you may need
to ask the platform/SoC vendor to supply it.

An ETMv4 source is declared with a section like this:

```
    etm0: etm@22040000 {
        compatible = "arm,coresight-etm4x", "arm,primecell";
        reg = <0 0x22040000 0 0x1000>;

        cpu = <&A72_0>;
        clocks = <&soc_smc50mhz>;
        clock-names = "apb_pclk";
        port {
            cluster0_etm0_out_port: endpoint {
                remote-endpoint = <&cluster0_funnel_in_port0>;
            };
        };
    };
```

This describes an ETMv4 attached to core A72_0, located at 0x22040000, with
its output linked to port 0 of a funnel. The funnel is described with:

```
    funnel@220c0000 { /* cluster0 funnel */
        compatible = "arm,coresight-funnel", "arm,primecell";
        reg = <0 0x220c0000 0 0x1000>;

        clocks = <&soc_smc50mhz>;
        clock-names = "apb_pclk";
        power-domains = <&scpi_devpd 0>;
        ports {
            #address-cells = <1>;
            #size-cells = <0>;

            port@0 {
                reg = <0>;
                cluster0_funnel_out_port: endpoint {
                    remote-endpoint = <&main_funnel_in_port0>;
                };
            };

            port@1 {
                reg = <0>;
                cluster0_funnel_in_port0: endpoint {
                    slave-mode;
                    remote-endpoint = <&cluster0_etm0_out_port>;
                };
            };

            port@2 {
                reg = <1>;
                cluster0_funnel_in_port1: endpoint {
                    slave-mode;
                    remote-endpoint = <&cluster0_etm1_out_port>;
                };
            };
        };
    };
```

This describes a funnel located at 0x220c0000, receiving data from 2 ETMs
and sending the merged data to another funnel.
We continue describing
components with similar blocks until we reach the sink (an ETR):

```
    etr@20070000 {
        compatible = "arm,coresight-tmc", "arm,primecell";
        reg = <0 0x20070000 0 0x1000>;
        iommus = <&smmu_etr 0>;

        clocks = <&soc_smc50mhz>;
        clock-names = "apb_pclk";
        power-domains = <&scpi_devpd 0>;
        port {
            etr_in_port: endpoint {
                slave-mode;
                remote-endpoint = <&replicator_out_port1>;
            };
        };
    };
```

Full descriptions of the properties of each component can be found in the
Linux source at Documentation/devicetree/bindings/arm/ - files whose names start with `arm,coresight-` are the component bindings.

The Arm Juno platform's devicetree (arch/arm64/boot/dts/arm) provides an example CoreSight description.

Many systems include a TPIU for off-chip trace. While this isn't required
for self-hosted trace, it should still be included in the devicetree. This
allows the drivers to access it to ensure it is put into a disabled state,
otherwise it may limit the trace bandwidth, causing data loss.