Lines Matching +full:gcc +full:- +full:pgo
9 optimization - PGO) uses a profile of a program's execution to guide the
55 * ETB: A small circular buffer (64-128 kilobytes) that stores the most
59 * TPIU: Sends data to an off-chip capture device (e.g. Arm DSTREAM)
70 lot of data very quickly - typically an ETM will generate ~1 bit of data
75 only record small slices of execution - e.g. collect ~5000 cycles of data
76 every 50M cycles. This reduces the data rate to a manageable level - a few
90 <https://gitlab.arm.com/linux-arm/linux-coresight-backports>
95 git clone https://git.gitlab.arm.com/linux-arm/linux-coresight-backports.git
100 format-patch`).
102 …ards, the only patch which needs to be applied is the one enabling strobing - etm4x: `Enable strob…
104 For 4.9-based kernels, use the `coresight-4.9-etr-etm_strobe` branch:
107 git merge coresight-4.9-etr-etm_strobe
113 git format-patch --output-directory /output/dir v4.9..coresight-4.9-etr-etm_strobe
115 git am /output/dir/*.patch # or cat /output/dir/*.patch | patch -p1 if not using git
118 For 4.14-based kernels, use the `coresight-4.14-etm_strobe` branch:
121 git merge coresight-4.14-etm_strobe
127 git format-patch --output-directory /output/dir v4.14..coresight-4.14-etm_strobe
129 git am /output/dir/*.patch # or cat /output/dir/*.patch | patch -p1 if not using git
151 make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
175 …tation](https://www.kernel.org/doc/html/latest/trace/coresight/coresight.html#device-naming-scheme)
189 make -C tools/perf CORESIGHT=1 VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
195 If the post-processing (`perf inject`) of the captured data is not being
205 perf record -e cs_etm/@tmc_etr0/u --per-thread -- /bin/ls
212 You can trace a single-threaded program in two different ways:
214 1. By specifying `--per-thread`, and in this case the CoreSight subsystem will
217 2. By NOT specifying `--per-thread`, and in this case CPU-wide tracing will
230 for Debian testing (install the libopencsd0, libopencsd-dev packages) or
234 instruction profiles to source profiles for the GCC and clang/llvm
244 map instructions back to source lines. For GCC, use the `-g1` or `-gmlt`
245 options. For clang/llvm, also add the `-fdebug-info-for-profiling` option.
249 flag `-Wl,--build-id=sha1`.
259 and a period of 10000 - this will collect 5000 cycles of trace every 50M
260 cycles. With these proof-of-concept patches, the strobe parameters are
261 configured via sysfs - each ETM will have `strobe_window` and
265 The `set_strobing.sh` script in this directory [`<opencsd>/decoder/tests/auto-fdo`] automates this …
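The relationship between the two strobe parameters can be checked with a little shell arithmetic. This is only a sketch: it assumes, as the 5000/10000 example implies, that the period is counted in windows, and the variable names below are illustrative rather than taken from `set_strobing.sh`.

```shell
# Assumption: the strobe period is expressed in windows, so one window of
# trace is collected out of every `strobe_period` windows of execution.
strobe_window=5000     # cycles traced in each window
strobe_period=10000    # windows per strobe period

# Cycles covered by one full period: 5000 * 10000
cycles_per_period=$((strobe_window * strobe_period))

echo "trace ${strobe_window} cycles every ${cycles_per_period} cycles"
```

With these values only one cycle in every 10000 is traced, which is what brings the data rate down to a manageable level.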
271 perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
277 perf report -D -i perf.data --stdio
280 …st the OpenCSD library in order to be able to properly read ETM-gathered samples and post-process …
295 Otherwise, this project is packaged for Debian (install the libopencsd0, libopencsd-dev packages).
301 …PERF_RECORD_AUXTRACE size: 0x2003c0 offset: 0 ref: 0x39ba881d145f8639 idx: 0 tid: 4551 cpu: -1
333 you'll need to run `perf buildid-cache` as described below.
335 perf inject -i perf.data -o inj.data --itrace=i100000il
338 The `--itrace` option configures the instruction sample behaviour:
352 perf buildid-cache -a /path/to/binary_or_library
358 perf report -D -i inj.data --stdio
362 ..... 0: 0000000000434b58 -> 0000000000434b68 0 cycles P 0
363 ..... 1: 0000000000436a88 -> 0000000000434b4c 0 cycles P 0
364 ..... 2: 0000000000436a64 -> 0000000000436a78 0 cycles P 0
365 ..... 3: 00000000004369d0 -> 0000000000436a60 0 cycles P 0
366 ..... 4: 000000000043693c -> 00000000004369cc 0 cycles P 0
367 ..... 5: 00000000004368a8 -> 0000000000436928 0 cycles P 0
368 ..... 6: 000000000042d070 -> 00000000004368a8 0 cycles P 0
369 ..... 7: 000000000042d108 -> 000000000042d070 0 cycles P 0
371 ..... 57: 0000000000448ee0 -> 0000000000448f24 0 cycles P 0
372 ..... 58: 0000000000448ea4 -> 0000000000448ebc 0 cycles P 0
373 ..... 59: 0000000000448e20 -> 0000000000448e94 0 cycles P 0
374 ..... 60: 0000000000448da8 -> 0000000000448ddc 0 cycles P 0
375 ..... 61: 00000000004486f4 -> 0000000000448da8 0 cycles P 0
376 ..... 62: 00000000004480fc -> 00000000004486d4 0 cycles P 0
377 ..... 63: 0000000000448658 -> 00000000004480ec 0 cycles P 0
388 create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
391 And for GCC:
394 create_gcov -binary=/path/to/binary -profile=inj.data -gcov_version=1 -gcov=program.gcov
400 llvm-profdata show -sample program.llvmprof
403 Or, for GCC:
406 dump_gcov -gcov_version=1 program.gcov
414 For GCC, use the `-fauto-profile` option:
417 gcc -O2 -fauto-profile=program.gcov -o program program.c
420 For Clang, use the `-fprofile-sample-use` option:
423 clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
433 perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>
434 perf inject -i perf.data -o inj.data --itrace=i100000il
435 create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof
436 clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c
439 Use `create_gcov` for GCC.
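For reference, the GCC variant of the same sequence can be listed as a dry run. This is a minimal sketch: the function name `gcc_autofdo_steps` and the `binary` variable are hypothetical, the paths are the same placeholders used above, and the commands are printed rather than executed, since running them needs CoreSight hardware and the AutoFDO tools.

```shell
# Hypothetical helper: prints the GCC flavour of the steps, does not run them.
binary=/path/to/binary   # placeholder path, as in the commands above

gcc_autofdo_steps() {
    printf '%s\n' \
        "perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>" \
        "perf inject -i perf.data -o inj.data --itrace=i100000il" \
        "create_gcov -binary=$binary -profile=inj.data -gcov_version=1 -gcov=program.gcov" \
        "gcc -O2 -fauto-profile=program.gcov -o program program.c"
}

gcc_autofdo_steps
```

Only the last two steps differ from the Clang flow: `create_gcov` replaces `create_llvm_prof`, and `-fauto-profile` replaces `-fprofile-sample-use`.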
446 perf record -e cs_etm/@tmc_etr0/u --per-thread -- <your app>.
447 …If you specify `-N, --no-buildid-cache`, perf will just take care of recording the target binary a…
449 …internally, it looks into the perf.data file and performs a lookup using `perf buildid-list --with-hits`)
451 4. (on host) Run `tar xvf perf_data.tar.bz2 -C ~/.debug` to populate the buildid-cache
454 …a. `perf buildid-list -i perf.data` gives you the build IDs of the dynamic libraries whose trace h…
455 …b. `perf buildid-cache --list` lists the dynamic libraries in the buildid cache that will be used …
458 …-i perf.data -o inj.data --itrace=i100000il` will check for the dynamic libraries using the buildi…
460 …. (on host) `create_llvm_prof -binary=/path/to/binary -profile=inj.data -out=program.llvmprof` tak…
461 8. (on host) `clang -O2 -fprofile-sample-use=program.llvmprof -o program program.c` to make clang u…
462 …If you are confident enough that your profile is accurate, you can add the `-fprofile-sample-accur…
464 If you are using the same host for both building the binary to be traced and re-building it with af…
466 … libraries from the board (since you already have them), and can use `--no-buildid-cache` when rec…
467 2. You have to make sure the relevant dynamic libraries to be optimized are present in the buildid-…
469 You can easily add a dynamic library manually into the build-id cache by running:
471 `perf buildid-cache --add <path/to/library/or/binary> -vvv`
473 You can easily check what is currently contained in your buildid-cache by running:
475 `perf buildid-cache --list`
484 * GCC's wiki on AutoFDO: <https://gcc.gnu.org/wiki/AutoFDO>, <https://gcc.gnu.org/wiki/AutoFDO/Tuto…
496 * clocks: The `clocks` and `clock-names` fields state which clock provides
515 compatible = "arm,coresight-etm4x", "arm,primecell";
520 clock-names = "apb_pclk";
523 remote-endpoint = <&cluster0_funnel_in_port0>;
534 compatible = "arm,coresight-funnel", "arm,primecell";
538 clock-names = "apb_pclk";
539 power-domains = <&scpi_devpd 0>;
541 #address-cells = <1>;
542 #size-cells = <0>;
547 remote-endpoint = <&main_funnel_in_port0>;
554 slave-mode;
555 remote-endpoint = <&cluster0_etm0_out_port>;
562 slave-mode;
563 remote-endpoint = <&cluster0_etm1_out_port>;
576 compatible = "arm,coresight-tmc", "arm,primecell";
581 clock-names = "apb_pclk";
582 power-domains = <&scpi_devpd 0>;
585 slave-mode;
586 remote-endpoint = <&replicator_out_port1>;
597 Many systems include a TPIU for off-chip trace. While this isn't required
598 for self-hosted trace, it should still be included in the devicetree. This