AutoFDO and ARM Trace   {#AutoFDO}
=====================

@brief Using CoreSight trace and perf with OpenCSD for AutoFDO.

## Introduction

Feedback directed optimization (FDO, also known as profile guided
optimization - PGO) uses a profile of a program's execution to guide the
optimizations performed by the compiler.  Traditionally, this involves
building an instrumented version of the program, which records a profile of
execution as it runs.  The instrumentation adds significant runtime
overhead, possibly changing the behaviour of the program, and it may not be
possible to run the instrumented program in a production environment
(e.g. where performance criteria must be met).

AutoFDO uses facilities in the hardware to sample the behaviour of the
program in the production environment and generate the execution profile.
An improved profile can be obtained by including the branch history
(i.e. a record of the last branches taken) when generating instruction
samples.  On Arm systems, the ETM can be used to generate such records.

The process can be broken down into the following steps:

* Record execution trace of the program
* Convert the execution trace to instruction samples with branch histories
* Convert the instruction samples to source level profiles
* Use the source level profile with the compiler

This article describes how to enable ETM trace on Arm targets running Linux
and use the ETM trace to generate AutoFDO profiles and compile an optimized
program.


## Execution trace on Arm targets

Debug and trace of Arm targets is provided by CoreSight.  This consists of
a set of components that allow access to debug logic, record (trace) the
execution of a processor and route this data through the system, collecting
it into a store.

To record the execution of a processor, we require the following
components:

* A trace source.  The core contains a trace unit, called an ETM, that emits
  data describing the instructions executed by the core.
* Trace links.  The trace data generated by the ETM must be moved through
  the system to the component that collects the data (sink).  Links
  include:
    * Funnels: merge multiple streams of data
    * FIFOs: buffer data to smooth out bursts
    * Replicators: send a stream of data to multiple components
* Sinks.  These receive the trace data and store it or send it to an
  external device:
    * ETB: A small circular buffer (64-128 kilobytes) that stores the most
      recent data
    * ETR: A larger (several megabytes) buffer that uses system RAM to
      store data
    * TPIU: Sends data to an off-chip capture device (e.g. Arm DSTREAM)

Each Arm SoC design may have a different layout (topology) of components.
This topology is described to the OS drivers by the platform's devicetree
or (in future) ACPI firmware.

For application profiling, we need to store several megabytes of data
within the system, so we will use the ETR, with the capture tool (perf)
periodically draining the buffer to a file.

Even though we have a large capture buffer, the ETM can still generate a
lot of data very quickly - typically an ETM will generate ~1 bit of data
per instruction (depending on the workload), which results in roughly
250Mbytes per second for a core running at 2GHz and executing about one
instruction per cycle.  This leads to problems storing and
decoding such large volumes of data.  AutoFDO uses samples of program
execution, so we can avoid this problem by using the ETM's features to
only record small slices of execution - e.g. collect ~5000 cycles of data
every 50M cycles.  This reduces the data rate to a manageable level - a few
megabytes per minute.  This technique is known as 'strobing'.


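The data-rate estimate can be checked with a short shell calculation.  This is only a sketch: the variable names are illustrative, and it assumes exactly 1 bit of trace per instruction and one instruction per cycle, which real workloads will only approximate.

```shell
#!/bin/sh
# Assumptions (illustrative only): 2GHz core, 1 instruction per cycle,
# 1 bit of trace per instruction.
freq_hz=2000000000
bits_per_insn=1

# Continuous trace: bytes of trace generated per second.
full_rate=$(( freq_hz * bits_per_insn / 8 ))

# Strobed trace: ~5000 cycles captured out of every 50M cycles.
window_cycles=5000
interval_cycles=50000000
strobed_rate=$(( full_rate * window_cycles / interval_cycles ))
strobed_per_min=$(( strobed_rate * 60 ))

echo "continuous: $(( full_rate / 1000000 )) MB/s"
echo "strobed:    $(( strobed_rate / 1000 )) kB/s, $strobed_per_min bytes/minute"
```

Under these assumptions the result lands close to the per-second and per-minute figures quoted above; it does not account for synchronisation packets or other overhead at the start of each window.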
## Enabling trace

### Driver support

The kernel must include the CoreSight drivers to collect the trace.

Typically the CoreSight trace drivers are enabled in the kernel
configuration.  This can be done using the configuration menu (`make
menuconfig`), selecting `Kernel hacking` / `arm64 Debugging` / `CoreSight Tracing Support` and
enabling all options, or by setting the following in the configuration
file:

```
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_LINKS_AND_SINKS=y
CONFIG_CORESIGHT_SINK_TPIU=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CATU=y
```

CoreSight support can also be built as modules.

Compile the kernel for your target in the usual way, e.g.

```
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```

Each target may have a different layout of CoreSight components.  To
collect trace into a sink, the kernel drivers need to know which other
devices need to be configured to route data from the source to the sink.
This is described in the devicetree (and in future, the ACPI tables).  The
devicetree defines which CoreSight devices are present in the system,
where they are located and how they are connected together.  The devicetree
for some platforms includes a description of the platform's CoreSight
components, but in other cases you may have to ask the platform/SoC vendor
to supply it or create it yourself (see Appendix: Describing CoreSight in
Devicetree).

Once the target has been booted with the devicetree describing the
CoreSight devices, you should find the devices in sysfs:

```
# ls /sys/bus/coresight/devices/
etm0  etm2  etm4  etm6  funnel0  funnel2  funnel4      stm0      tmc_etr0
etm1  etm3  etm5  etm7  funnel1  funnel3  replicator0  tmc_etf0
```

If `/sys/bus/coresight/devices/` is empty, check your kernel configuration to make sure the `.config` file enables the CoreSight dependencies, such as the clock.

The naming convention for ETM devices can differ according to the kernel version you're using.
For more information about the naming scheme, see the [Linux Kernel Documentation](https://www.kernel.org/doc/html/latest/trace/coresight/coresight.html#device-naming-scheme).

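The sysfs check above can be wrapped in a small helper for use in scripts.  This is a sketch rather than part of any CoreSight tooling: the function name and the optional directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# List the CoreSight devices the kernel has registered, one per line.
# The optional directory argument is an addition for this sketch; it
# defaults to the sysfs location shown above.
list_coresight_devices() {
    dir=${1:-/sys/bus/coresight/devices}
    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    count=0
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue      # unmatched glob: directory is empty
        echo "${dev##*/}"
        count=$((count + 1))
    done
    [ "$count" -gt 0 ] || { echo "no CoreSight devices found in $dir" >&2; return 1; }
}
```

Calling `list_coresight_devices` with no argument on the target returns a non-zero status when the directory is missing or empty, which is the case that points to a kernel configuration or devicetree problem.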
### Configuration support - enabling strobing

Kernel version 5.16 added the CoreSight System Configuration
Management infrastructure, which allows more complex CoreSight
programming to be loaded on demand in the form of configurations and
features, managed by configfs.

A named configuration is a set of features and default parameters and
presets.

A feature is a set of register programming for a particular CoreSight
component, which may contain some variable parameters that can be set
from configfs, or set by specifying a particular preset from the
configuration.

There is a built-in named configuration called `autofdo`.
This will load a feature called `strobing` onto each ETMv4 used in a
trace session. The strobing feature defines two parameters called
`window` and `period`. These are set prior to trace capture to control
the strobing feature.

When a trace session uses a configuration, the feature programming is
carried out on the devices used during that session, and only on those devices.
This avoids the need to program up all devices individually before starting a session.

For additional information on using CoreSight configurations see the [CoreSight System Configuration Manager](https://www.kernel.org/doc/html/latest/trace/coresight/coresight-config.html) in the Linux Kernel Documentation.

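To confirm which features a named configuration will load, configfs can be inspected directly.  The sketch below assumes the layout described in the kernel documentation linked above (`cs-syscfg/configurations/<name>/feature_refs`) and assumes configfs is mounted at `/sys/kernel/config`; both should be verified on your target, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the features referenced by a named CoreSight configuration.
# $1: configuration name (e.g. autofdo)
# $2: configfs mount point (assumed default: /sys/kernel/config)
show_config_features() {
    cfg=$1
    root=${2:-/sys/kernel/config}
    refs="$root/cs-syscfg/configurations/$cfg/feature_refs"
    if [ -r "$refs" ]; then
        cat "$refs"
    else
        echo "configuration '$cfg' not found under $root" >&2
        return 1
    fi
}
```

On a 5.16 or later kernel with the built-in configuration present, `show_config_features autofdo` would be expected to list the `strobing` feature.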
### Older Kernels (before 5.16) {#older_kernels}

For targets using kernels prior to 5.16, CoreSight trace with strobing
is enabled differently.

For these targets, Arm has provided backports of the deprecated CoreSight
drivers and ETM strobing patch at:

  <https://gitlab.arm.com/linux-arm/linux-coresight-backports>

This repository can be cloned with:

```
git clone https://git.gitlab.arm.com/linux-arm/linux-coresight-backports.git
```

You can include these backports in your kernel by either merging the
appropriate branch using git or generating patches (using `git
format-patch`).

For 5.0 to 5.15 based kernels, the only patch which needs to be applied is the one enabling strobing - etm4x: `Enable strobing of ETM`.

For 4.9 based kernels, use the `coresight-4.9-etr-etm_strobe` branch:

```
git merge coresight-4.9-etr-etm_strobe
```

or

```
git format-patch --output-directory /output/dir v4.9..coresight-4.9-etr-etm_strobe
cd my_kernel
git am /output/dir/*.patch
# or, if not using git, apply each patch in order:
# for p in /output/dir/*.patch; do patch -p1 < "$p"; done
```

For 4.14 based kernels, use the `coresight-4.14-etm_strobe` branch:

```
git merge coresight-4.14-etm_strobe
```

or

```
git format-patch --output-directory /output/dir v4.14..coresight-4.14-etm_strobe
cd my_kernel
git am /output/dir/*.patch
# or, if not using git, apply each patch in order:
# for p in /output/dir/*.patch; do patch -p1 < "$p"; done
```

For these kernels, configuration management is not available, so strobing is exposed directly through the sysfs entries of the individual components.
The scripts `set_strobing.sh` and `show_strobing.sh` must be used to program each individual ETM prior to starting a trace session.

```
sudo ./set_strobing.sh 5000 10000
```

## Perf tools

The perf tool is used to capture execution trace, configuring the trace
sources to generate trace, routing the data to the sink and collecting the
data from the sink.

Arm recommends using the perf version corresponding to the kernel running
on the target.  This can be built from the same kernel sources with

```
make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```

When specifying CORESIGHT=1, perf will be built using the installed OpenCSD library.
If you are cross compiling, then additional setup is required to ensure the build process links against the correct version of the library.

If the post-processing (`perf inject`) of the captured data is not being
done on the target, then the OpenCSD library is not required for this build
of perf.

Trace is captured by collecting the `cs_etm` event from perf.  The sink
to collect data into is specified as a parameter of this event.  Trace can
also be restricted to user space or kernel space with 'u' or 'k'
parameters.  For example:

```
perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls
```

This records the userspace execution of '/bin/ls' using tmc_etr0 as the sink.

Alternatively, the sink can be omitted and the system will choose the most appropriate
sink:

```
perf record -e cs_etm//u --per-thread -- /bin/ls
```


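When scripting captures across targets with different sinks, it can help to build the event string in one place.  The helper below is a hypothetical convenience, not part of perf; it only assembles the two event forms shown above.

```shell
#!/bin/sh
# Build a cs_etm event string for `perf record -e`.
# $1: 'u' (user space), 'k' (kernel space) or empty for both
# $2: optional sink name; when omitted the system chooses the sink
cs_etm_event() {
    mode=$1
    sink=$2
    if [ -n "$sink" ]; then
        printf 'cs_etm/@%s/%s\n' "$sink" "$mode"
    else
        printf 'cs_etm//%s\n' "$mode"
    fi
}
```

For example, `perf record -e "$(cs_etm_event u tmc_etr0)" --per-thread -- /bin/ls` is equivalent to the first command above.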
### Capturing modes

You can trace a single-threaded program in two different ways:

1. By specifying `--per-thread`.  In this case the CoreSight subsystem
records only the trace of the given program.

2. By NOT specifying `--per-thread`.  In this case CPU-wide tracing is
enabled, and the trace contains both the target program and any other
workloads that were executing on the same CPU.



## Processing trace and profiles

perf is also used to convert the execution trace to an instruction profile.
This requires a different build of perf, using the version of perf from
Linux v4.17 or later, as the trace processing code isn't included in the
driver backports.  Trace decode is provided by the OpenCSD library
(<https://github.com/Linaro/OpenCSD>), v0.9.1 or later.  This is packaged
for Debian testing (install the libopencsd0, libopencsd-dev packages) or
can be compiled from source and installed.

The AutoFDO tool <https://github.com/google/autofdo> is used to convert the
instruction profiles to source profiles for the GCC and clang/llvm
compilers.


### Recording trace for the profile

Once trace collection using perf is working, we can use it to profile
an application.

The application must be compiled to include sufficient debug information to
map instructions back to source lines.  For GCC, use the `-g1` or `-gmlt`
options.  For clang/llvm, also add the `-fdebug-info-for-profiling` option.

perf identifies the active program or library using the build identifier
stored in the ELF file.  This should be added at link time with the compiler
flag `-Wl,--build-id=sha1`.

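As a sketch of how these options combine on a build command line, the helper below prints the flags for each compiler family.  The function name is hypothetical, and it picks `-g1` where the text above also allows `-gmlt`.

```shell
#!/bin/sh
# Print the extra build flags described above for a compiler family.
# $1: 'gcc' or 'clang'
profiling_build_flags() {
    case "$1" in
        gcc)   echo "-g1 -Wl,--build-id=sha1" ;;
        clang) echo "-g1 -fdebug-info-for-profiling -Wl,--build-id=sha1" ;;
        *)     echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}
```

A build of the program to be traced might then look like `gcc -O2 $(profiling_build_flags gcc) -o program program.c`, where `program.c` stands in for your own sources.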
The next step is to record the execution trace of the application using the
perf tool.  The ETM strobing should be configured before running the perf
tool.  There are two parameters:

  * window size: A number of CPU cycles (W)
  * period: Trace is enabled for W cycles every _period_ * W cycles.

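To make the relationship between the two parameters concrete, the following calculation works through one pair of values.  The window of 5000 and period of 10000 are example values, and the 2GHz clock is an assumption used only to convert cycles to time.

```shell
#!/bin/sh
window=5000      # W: cycles of trace captured per window
period=10000     # trace is enabled for W cycles every period * W cycles

# Cycles from the start of one trace window to the start of the next.
interval=$(( period * window ))
echo "trace on for $window of every $interval cycles"

# Time between windows at an assumed 2GHz core clock, in milliseconds.
freq_hz=2000000000
interval_ms=$(( interval * 1000 / freq_hz ))
echo "one window every $interval_ms ms"
```

With these values, trace is captured for 5000 cycles out of every 50M cycles.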
To collect trace from an application using ETM strobing with default parameters run:

```
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
```

To use specific strobing parameters, write them to configfs before recording.
The configfs mount point (shown here as `/configfs`) depends on your system, and the kernel documentation places the parameter values under the feature's `params` directory:

```
echo 5000 > /configfs/cs-syscfg/features/strobing/params/window/value
echo 10000 > /configfs/cs-syscfg/features/strobing/params/period/value
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
```

or alternatively, use one of the built-in parameter presets:

```
perf record -e cs_etm/autofdo,preset=1/u --per-thread -- <your app>
```


The raw trace can be examined using the `perf report` command:

```
perf report -D -i perf.data --stdio
```

To read and post-process ETM samples, perf must be built from the source code of your Linux kernel version and linked against the OpenCSD library.
If running `perf report` produces an error like:

```
0x1f8 [0x268]: failed to process type: 70 [Operation not permitted]
Error:
failed to process sample
```
or

```
"file uses a more recent and unsupported ABI (8 bytes extra). incompatible file format".
```

then your perf build is probably not using this library.  Install OpenCSD v0.9.1 or later, either by compiling it from [source](https://github.com/Linaro/OpenCSD) or from the Debian packages (libopencsd0, libopencsd-dev), and then compile perf against it.


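One quick way to check whether a perf binary was linked against OpenCSD is to look at its shared library dependencies.  This is a heuristic sketch only: it assumes perf is dynamically linked and that the `perf` found on the path is the real binary rather than a distribution wrapper script, and the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the `ldd` output supplied on stdin mentions the OpenCSD library.
links_opencsd() {
    grep -q 'libopencsd'
}

# Run the check only where both tools are available.
if command -v perf >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
    if ldd "$(command -v perf)" 2>/dev/null | links_opencsd; then
        echo "perf is linked against OpenCSD"
    else
        echo "perf does not appear to link against OpenCSD"
    fi
fi
```

A negative result here is a hint rather than proof, since a statically linked perf would not show the library either.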
For example, the raw trace dump produced by `perf report -D` looks like this:

```
0x1d370 [0x30]: PERF_RECORD_AUXTRACE size: 0x2003c0  offset: 0  ref: 0x39ba881d145f8639  idx: 0  tid: 4551  cpu: -1

. ... CoreSight ETM Trace data: size 2098112 bytes
        Idx:0; ID:12;   I_ASYNC : Alignment Synchronisation.
        Idx:12; ID:12;  I_TRACE_INFO : Trace Info.; INFO=0x0
        Idx:17; ID:12;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
        Idx:48; ID:14;  I_ASYNC : Alignment Synchronisation.
        Idx:60; ID:14;  I_TRACE_INFO : Trace Info.; INFO=0x0
        Idx:65; ID:14;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
        Idx:96; ID:14;  I_ASYNC : Alignment Synchronisation.
        Idx:108; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0
        Idx:113; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFF000008A4991C;
        Idx:122; ID:14; I_TRACE_ON : Trace On.
        Idx:123; ID:14; I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000000000407B00; Ctxt: AArch64,EL0, NS;
        Idx:134; ID:14; I_ATOM_F3 : Atom format 3.; ENN
        Idx:135; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
        Idx:136; ID:14; I_ATOM_F5 : Atom format 5.; ENENE
        Idx:137; ID:14; I_ATOM_F5 : Atom format 5.; NENEN
        Idx:138; ID:14; I_ATOM_F3 : Atom format 3.; ENN
        Idx:139; ID:14; I_ATOM_F3 : Atom format 3.; NNE
        Idx:140; ID:14; I_ATOM_F1 : Atom format 1.; E
.....
```

### Generating the profile from the trace

The execution trace is then converted to an instruction profile using
the perf build with trace decode support.  This may be done on a different
machine than that which collected the trace (e.g. when cross compiling for
an embedded target).  The `perf inject` command
decodes the execution trace and generates periodic instruction samples,
with branch histories:

```
perf inject -i perf.data -o inj.data --itrace=i100000il
```

Careful: if you are running this on a different device from the one used to collect the profiling data,
you'll need to run `perf buildid-cache` first, as described below.

The `--itrace` option configures the instruction sample behaviour:

* `i100000i` generates an instruction sample every 100000 instructions
  (only instruction count periods are currently supported, future versions
  may support time or cycle count periods)
* `l` includes the branch histories on each sample
* `b` generates a sample on each branch (not used here)

Perf requires the original program binaries to decode the execution trace.
If running the `inject` command on a different system than the trace was
captured on, then the binary and any shared libraries must be added to
perf's cache with:

```
perf buildid-cache -a /path/to/binary_or_library
```

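When several binaries and libraries need to be cached, the command above can be wrapped in a loop.  This is a minimal sketch: the function name is hypothetical, and the `PERF` variable is an assumption added so that a specific perf build can be selected.

```shell
#!/bin/sh
# Add each given binary or library to perf's build-id cache.
# PERF (hypothetical override) selects the perf executable; defaults to `perf`.
add_to_buildid_cache() {
    for f in "$@"; do
        ${PERF:-perf} buildid-cache -a "$f" || return 1
    done
}
```

It would be called with the traced program and every shared library it uses, for example `add_to_buildid_cache ./program1 /path/to/libfoo.so`, where both paths are placeholders.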
`perf report` can also be used to show the instruction samples:

```
perf report -D -i inj.data --stdio
.......
0x1528 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 4551/4551: 0x434b98 period: 3093 addr: 0
... branch stack: nr:64
.....  0: 0000000000434b58 -> 0000000000434b68 0 cycles  P   0
.....  1: 0000000000436a88 -> 0000000000434b4c 0 cycles  P   0
.....  2: 0000000000436a64 -> 0000000000436a78 0 cycles  P   0
.....  3: 00000000004369d0 -> 0000000000436a60 0 cycles  P   0
.....  4: 000000000043693c -> 00000000004369cc 0 cycles  P   0
.....  5: 00000000004368a8 -> 0000000000436928 0 cycles  P   0
.....  6: 000000000042d070 -> 00000000004368a8 0 cycles  P   0
.....  7: 000000000042d108 -> 000000000042d070 0 cycles  P   0
.......
..... 57: 0000000000448ee0 -> 0000000000448f24 0 cycles  P   0
..... 58: 0000000000448ea4 -> 0000000000448ebc 0 cycles  P   0
..... 59: 0000000000448e20 -> 0000000000448e94 0 cycles  P   0
..... 60: 0000000000448da8 -> 0000000000448ddc 0 cycles  P   0
..... 61: 00000000004486f4 -> 0000000000448da8 0 cycles  P   0
..... 62: 00000000004480fc -> 00000000004486d4 0 cycles  P   0
..... 63: 0000000000448658 -> 00000000004480ec 0 cycles  P   0
 ... thread: program1:4551
 ...... dso: /home/root/program1
.......
```

The instruction samples produced by `perf inject` are then passed to the
autofdo tool to generate source level profiles for the compiler.  For
clang/LLVM:

```
create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof
```

The optional `-format extbinary` creates output suitable for FSAFDO (flow-sensitive AutoFDO), which is sometimes used in the kernel and can give improved performance gains.  See the AutoFDO
documentation for more details.

And for GCC:

```
create_gcov -binary=/path/to/binary -profile=inj.data -gcov_version=1 -gcov=program.gcov
```

The profiles can be viewed with:

```
llvm-profdata show -sample program.llvmprof
```

Or, for GCC:

```
dump_gcov -gcov_version=1 program.gcov
```

### Using profile in the compiler

The profile produced by the above steps can then be passed to the compiler
to optimize the next build of the program.

For GCC, use the `-fauto-profile` option:

```
gcc -O2 -fauto-profile=program.gcov -o program program.c
```

For Clang, use the `-fprofile-sample-use` option:

```
clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
```


### Summary

The basic commands to run an application and create a compiler profile are:

```
perf record -e cs_etm/autofdo/u --per-thread -- <your app>
perf inject -i perf.data -o inj.data --itrace=i100000il
create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof
clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
```

Use `create_gcov` for gcc.

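The four commands above can be chained in a small script so that a failure in one step stops the rest.  This is a sketch under assumptions: the `run` and `autofdo_pipeline` names and the `DRY_RUN` variable are hypothetical, and the intermediate file names are the ones used in this article.

```shell
#!/bin/sh
# Run a command, or with DRY_RUN=1 just print it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: binary to profile, $2: source file to rebuild with the profile.
autofdo_pipeline() {
    app=$1
    src=$2
    run perf record -e cs_etm/autofdo/u --per-thread -- "$app" &&
    run perf inject -i perf.data -o inj.data --itrace=i100000il &&
    run create_llvm_prof -binary="$app" -format extbinary -profile=inj.data -out=program.llvmprof &&
    run clang -O2 -fprofile-sample-use=program.llvmprof -o program "$src"
}
```

Setting `DRY_RUN=1` before calling `autofdo_pipeline ./program program.c` prints the commands without executing them, which is a convenient way to review the sequence before running it on a target.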
## High Level Summary for recording on Arm board and decoding on different host

1. (on Arm board) `perf record -e cs_etm/autofdo/u --per-thread -- <your app>` <br>
	If you specify `-N, --no-buildid-cache`, perf will just take care of recording the target binary and nothing will be copied.<br>  If you don't specify it, any recorded dynamic library will be copied to ~/.debug on the board.<br><br>

2. (on Arm board) `perf archive`, which saves all the libraries it finds in a tar file (internally, it looks into the perf.data file and performs a lookup using `perf buildid-list --with-hits`).<br><br>

3. (on host) `scp` to copy perf.data and the .tar file generated by `perf archive`.<br><br>

4. (on host) Run `tar xvf perf.data.tar.bz2 -C ~/.debug` to populate the buildid-cache.<br><br>

5. (on host) Double check the setup is correct:

    (a) `perf buildid-list -i perf.data`
       - gives you the list of buildids of the binaries and dynamic libraries whose trace has been recorded and saved in perf.data.<br><br>

    (b) `perf buildid-cache --list` <br>
       - lists the dynamic libraries in the buildid cache that will be used by `perf inject`.<br>
	Make sure the buildid values in the output of (a) and (b) match for the binaries you are interested in optimizing with AutoFDO.<br><br>

6. (on host) `perf inject -i perf.data -o inj.data --itrace=i100000il` <br>
    - will look up the dynamic libraries in the buildid-cache using their buildids and post-process the trace.<br>
    The buildids have to be the same, otherwise it won't be possible to post-process the trace. <br><br>

7. (on host) `create_llvm_prof -binary=/path/to/binary -format extbinary -profile=inj.data -out=program.llvmprof` <br>
    - takes the output from `perf inject` and transforms it into a format that the compiler can read. <br><br>

8. (on host) `clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c` <br>
    - to make clang use the produced profile.<br>
	If you are confident that your profile is accurate, you can add the `-fprofile-sample-accurate` flag, which penalizes all callsites without a corresponding profile by marking them as cold.

If you are using the same host for both building the binary to be traced and re-building it with AutoFDO:

1. You won't need to copy back any dynamic libraries from the board (since you already have them), and can use `--no-buildid-cache` when recording.
2. You have to make sure the relevant dynamic libraries to be optimized are present in the buildid-cache.

You can add a dynamic library manually to the buildid-cache by running:

`perf buildid-cache --add <path/to/library/or/binary> -vvv`

You can check what is currently contained in your buildid-cache by running:

`perf buildid-cache --list`

You can check the buildid of a given binary/dynamic library by running:

`file <path/to/dynamic/library>`

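The comparison between recorded and cached buildids can be automated.  The sketch below assumes both listings have been saved to files and that each line has the form `<buildid> <path>`; the function and file names are hypothetical.

```shell
#!/bin/sh
# Print entries whose buildid appears in the recorded listing ($1) but
# not in the cache listing ($2).  Each line is assumed to start with the
# buildid, followed by the path.
missing_buildids() {
    awk -v cached="$2" '
        BEGIN {
            # Load every buildid present in the cache listing.
            while ((getline line < cached) > 0) {
                split(line, f, " ")
                have[f[1]] = 1
            }
        }
        !($1 in have) { print }
    ' "$1"
}
```

A typical use on the host would be to save `perf buildid-list -i perf.data` to `recorded.txt` and `perf buildid-cache --list` to `cached.txt`, then run `missing_buildids recorded.txt cached.txt`.  Any line printed identifies a binary that `perf inject` will not be able to find in the cache.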
## References

* AutoFDO tool: <https://github.com/google/autofdo>
* GCC's wiki on autofdo: <https://gcc.gnu.org/wiki/AutoFDO>, <https://gcc.gnu.org/wiki/AutoFDO/Tutorial>
* Google paper: <https://ai.google/research/pubs/pub45290>
* CoreSight kernel docs: <https://www.kernel.org/doc/html/latest/trace/coresight/index.html>


## Appendix: Describing CoreSight in Devicetree


Each component has an entry in the device tree that describes its:

* type: The `compatible` field defines which driver to use
* location: A `reg` defines the component's address and size on the bus
* clocks: The `clocks` and `clock-names` fields state which clock provides
  the `apb_pclk` clock.
* connections to other components: `port` and `ports` fields link the
  component to ports of other components

To create the device tree, some information about the platform is required:

* The memory address of the CoreSight components.  This is the address in
  the CPU's address space where the CPU can access each CoreSight
  component.
* The connections between the components.

This information can be found in the SoC's reference manual or you may need
to ask the platform/SoC vendor to supply it.

An ETMv4 source is declared with a section like this:

```
	etm0: etm@22040000 {
		compatible = "arm,coresight-etm4x", "arm,primecell";
		reg = <0 0x22040000 0 0x1000>;

		cpu = <&A72_0>;
		clocks = <&soc_smc50mhz>;
		clock-names = "apb_pclk";
		port {
			cluster0_etm0_out_port: endpoint {
				remote-endpoint = <&cluster0_funnel_in_port0>;
			};
		};
	};
```

This describes an ETMv4 attached to core A72_0, located at 0x22040000, with
its output linked to port 0 of a funnel.  The funnel is described with:

```
	funnel@220c0000 { /* cluster0 funnel */
		compatible = "arm,coresight-funnel", "arm,primecell";
		reg = <0 0x220c0000 0 0x1000>;

		clocks = <&soc_smc50mhz>;
		clock-names = "apb_pclk";
		power-domains = <&scpi_devpd 0>;
		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
				reg = <0>;
				cluster0_funnel_out_port: endpoint {
					remote-endpoint = <&main_funnel_in_port0>;
				};
			};

			port@1 {
				reg = <0>;
				cluster0_funnel_in_port0: endpoint {
					slave-mode;
					remote-endpoint = <&cluster0_etm0_out_port>;
				};
			};

			port@2 {
				reg = <1>;
				cluster0_funnel_in_port1: endpoint {
					slave-mode;
					remote-endpoint = <&cluster0_etm1_out_port>;
				};
			};
		};
	};
```

This describes a funnel located at 0x220c0000, receiving data from 2 ETMs
and sending the merged data to another funnel.  We continue describing
components with similar blocks until we reach the sink (an ETR):

```
	etr@20070000 {
		compatible = "arm,coresight-tmc", "arm,primecell";
		reg = <0 0x20070000 0 0x1000>;
		iommus = <&smmu_etr 0>;

		clocks = <&soc_smc50mhz>;
		clock-names = "apb_pclk";
		power-domains = <&scpi_devpd 0>;
		port {
			etr_in_port: endpoint {
				slave-mode;
				remote-endpoint = <&replicator_out_port1>;
			};
		};
	};
```

Full descriptions of the properties of each component can be found in the
Linux source at Documentation/devicetree/bindings/arm/ - files which start with `arm,coresight-` are the component bindings.

The Arm Juno platform's devicetree (arch/arm64/boot/dts/arm) provides an example description of CoreSight components.

Many systems include a TPIU for off-chip trace.  While this isn't required
for self-hosted trace, it should still be included in the devicetree.  This
allows the drivers to access it to ensure it is put into a disabled state,
otherwise it may limit the trace bandwidth, causing data loss.
