llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

EXAMPLES: benchmarking
----------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name:     ADD64rr
      mode:            latency
      config:          ''
    cpu_name:        haswell
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error:           ''
    info:            'explicit self cycles, selecting one aliasing configuration.
    Snippet:
    ADD64rr R8, R8, R10
    '
    ...
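
The YAML document above can be post-processed with standard shell tools. The
following is a minimal sketch, not part of :program:`llvm-exegesis`: the helper
name ``extract_latency`` is hypothetical, and it assumes the output keeps the
layout shown above (one ``opcode_name`` line followed by a ``latency``
measurement line).

.. code-block:: bash

    # Hypothetical helper (not provided by llvm-exegesis): read one or more
    # YAML benchmark documents on stdin and print "<opcode> <latency>" lines.
    # Assumes the field layout shown in the sample output above.
    extract_latency() {
      awk '
        /opcode_name:/ { opcode = $2 }
        /key: latency/ {
          for (i = 1; i < NF; i++) {
            if ($i == "value:") {
              value = $(i + 1)
              sub(/,$/, "", value)   # drop the trailing comma
              print opcode, value
            }
          }
        }
      '
    }

    # Usage (assumes llvm-exegesis is on PATH):
    #   llvm-exegesis -mode=latency -opcode-name=ADD64rr | extract_latency

A text-based filter like this is fragile if the YAML layout changes; a real
YAML parser would be the more robust choice for anything beyond quick checks.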

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    # The index of the last real opcode is INSTRUCTION_LIST_END - 1.
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      # Keep only the YAML document, starting at the '---' marker.
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.

EXAMPLES: analysis
------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names are resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
is shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. "-"
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).
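
As an illustration of how the options above combine, the following sketch wraps
a latency run in a shell function. The names ``EXEGESIS`` and ``run_latency``
are hypothetical and exist only for this example; the default of 10000
repetitions is an assumption taken from the sample output shown earlier, not a
documented default.

.. code-block:: bash

    # Hypothetical wrapper: EXEGESIS and run_latency are illustrative names,
    # not part of llvm-exegesis. Override EXEGESIS to point at a build tree,
    # e.g. EXEGESIS=./build/bin/llvm-exegesis.
    EXEGESIS="${EXEGESIS:-llvm-exegesis}"

    # run_latency <opcode name> <output file> [repetitions]
    run_latency() {
      local opcode="$1" out="$2" reps="${3:-10000}"
      "${EXEGESIS}" -mode=latency \
        -opcode-name="${opcode}" \
        -num-repetitions="${reps}" \
        -benchmarks-file="${out}"
    }

    # Usage:
    #   run_latency ADD64rr /tmp/benchmarks.yaml 20000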


EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
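
The exit status described above can be checked from a script. The following is
a minimal sketch; ``run_checked`` is a hypothetical helper name, and it works
with any command, not only :program:`llvm-exegesis`.

.. code-block:: bash

    # Hypothetical helper: run a command, report a non-zero exit status on
    # standard error, and pass that status back to the caller.
    run_checked() {
      "$@"
      local status=$?
      if [ "${status}" -ne 0 ]; then
        echo "command failed with exit status ${status}: $*" >&2
      fi
      return "${status}"
    }

    # Usage (assumes llvm-exegesis is on PATH):
    #   run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr || exit 1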