.. role:: raw-html(raw)
   :format: html

=================================
LLVM Code Coverage Mapping Format
=================================

.. contents::
   :local:

Introduction
============

LLVM's code coverage mapping format is used to provide code coverage
analysis using LLVM's and Clang's instrumentation-based profiling
(Clang's ``-fprofile-instr-generate`` option).

This document is aimed at those who use LLVM's code coverage mapping to provide
code coverage analysis for their own programs, and at those who would like
to know how it works under the hood. Prior knowledge of how Clang's
profile-guided optimization works is useful, but not required.

We start by showing how to use LLVM and Clang for code coverage analysis,
then we briefly describe LLVM's code coverage mapping format and the
way that Clang and LLVM's code coverage tool work with this format. After
the basics are down, more advanced features of the coverage mapping format
are discussed, such as the data structures, the LLVM IR representation, and
the binary encoding.

Quick Start
===========

Here's a short story that describes how to generate a code coverage overview
for a sample source file called *test.c*.

* First, compile an instrumented version of your program using Clang's
  ``-fprofile-instr-generate`` option with the additional ``-fcoverage-mapping``
  option:

  ``clang -o test -fprofile-instr-generate -fcoverage-mapping test.c``
* Then, run the instrumented binary. The runtime will produce a file called
  *default.profraw* containing the raw profile instrumentation data:

  ``./test``
* After that, merge the profile data using the *llvm-profdata* tool:

  ``llvm-profdata merge -o test.profdata default.profraw``
* Finally, run LLVM's code coverage tool (*llvm-cov*) to produce the code
  coverage overview for the sample source file:

  ``llvm-cov show ./test -instr-profile=test.profdata test.c``

High Level Overview
===================

LLVM's code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and object files.
It's described in this document as a **mapping** format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

1. When Clang compiles a source file with ``-fcoverage-mapping``, it
   generates the mapping information that describes the mapping between the
   source ranges and the profiling instrumentation counters.
   This information gets embedded into the LLVM IR and conveniently
   ends up in the final executable file when the program is linked.

2. It is also used by *llvm-cov*: the mapping information is extracted from an
   object file and is used to associate the execution counts (the values of the
   profile instrumentation counters) with the source ranges in a file.
   After that, the tool is able to generate various code coverage reports
   for the program.

The coverage mapping format aims to be a "universal format" that would be
suitable for usage by any frontend, and not just by Clang. It also aims to
provide the frontend the possibility of generating the minimal coverage mapping
data in order to reduce the size of the IR and object files. For example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

Advanced Concepts
=================

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

Mapping Region
--------------

The function's coverage mapping data contains an array of mapping regions.
A mapping region stores the `source code range`_ that is covered by this region,
the `file id <coverage file id_>`_, the `coverage mapping counter`_ and
the region's kind.
There are several kinds of mapping regions:

* Code regions associate portions of source code and `coverage mapping
  counters`_. They make up the majority of the mapping regions. They are used
  by the code coverage tool to compute the execution counts for lines,
  highlight the regions of code that were never executed, and to obtain
  the various code coverage statistics for a function.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{    </span> <span class='c1'>// Code Region from 1:40 to 9:2</span>
  <span style='background-color:#4A789C'>                                            </span>
  <span style='background-color:#4A789C'>  if (argc &gt; 1) </span><span style='background-color:#85C1F5'>{                         </span>   <span class='c1'>// Code Region from 3:17 to 5:4</span>
  <span style='background-color:#85C1F5'>    printf("%s\n", argv[1]);              </span>
  <span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{                                </span>   <span class='c1'>// Code Region from 5:10 to 7:4</span>
  <span style='background-color:#F6D55D'>    printf("\n");                         </span>
  <span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'>                                         </span>
  <span style='background-color:#4A789C'>  return 0;                                 </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Skipped regions are used to represent source ranges that were skipped
  by Clang's preprocessor. They don't associate with
  `coverage mapping counters`_, as the frontend knows that they are never
  executed. They are used by the code coverage tool to mark the skipped lines
  inside a function as non-code lines that don't have execution counts.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{               </span> <span class='c1'>// Code Region from 1:12 to 6:2</span>
  <span style='background-color:#85C1F5'>#ifdef DEBUG             </span>   <span class='c1'>// Skipped Region from 2:1 to 4:2</span>
  <span style='background-color:#85C1F5'>  printf("Hello world"); </span>
  <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif                     </span>
  <span style='background-color:#4A789C'>  return 0;                </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Expansion regions are used to represent Clang's macro expansions. They
  have an additional property - *expanded file id*. This property can be
  used by the code coverage tool to find the mapping regions that are created
  as a result of this macro expansion, by checking if their file id matches the
  expanded file id. They don't associate with `coverage mapping counters`_,
  as the code coverage tool can determine the execution count for this region
  by looking up the execution count of the first region with a corresponding
  file id.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{                             </span>
  <span style='background-color:#4A789C'>  #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) &gt; (y)? </span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'>     </span>
  <span style='background-color:#4A789C'>  return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42);                          </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`

.. _source code range:

Source Range:
^^^^^^^^^^^^^

The source range record contains the starting and ending location of a certain
mapping region. Both locations include the line and the column numbers.

.. _coverage file id:

File ID:
^^^^^^^^

The file id is an integer value that tells us
in which source file or macro expansion this region is located.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{        </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span>
<span style='background-color:#4A789C'>  #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'>   </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span>
<span style='background-color:#4A789C'>  if(*str)                          </span>
<span style='background-color:#4A789C'>    </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>;                            </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>;                              </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span>
<span style='background-color:#4A789C'>}</span>
</pre>`
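
The way an expansion region obtains its execution count can be sketched in
Python. This is only an illustration: the ``Region`` tuple, its field names and
the counts are hypothetical and are not part of LLVM's API. It follows the rule
stated earlier, that the count comes from the first region whose file id
matches the expanded file id.

.. code-block:: python

  from collections import namedtuple

  # Hypothetical in-memory form of a mapping region (not LLVM's API).
  Region = namedtuple("Region", "kind file_id expanded_file_id count")

  def expansion_count(expansion, regions):
      """Return the count of the first region whose file id matches the
      expansion's expanded file id, or None if no such region exists."""
      for region in regions:
          if region.file_id == expansion.expanded_file_id:
              return region.count
      return None

  # Mirrors the PUT example above; the counts are made up and assume
  # that func() was called once with an empty string.
  regions = [
      Region("code", 0, None, 1),       # function body, file id 0
      Region("expansion", 0, 1, None),  # PUT at 4:5, expands file id 1
      Region("expansion", 0, 2, None),  # PUT at 5:3, expands file id 2
      Region("code", 1, None, 0),       # macro body for the first PUT
      Region("code", 2, None, 1),       # macro body for the second PUT
  ]

With these made-up counts, the first ``PUT`` resolves to 0 and the second to 1.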

.. _coverage mapping counter:
.. _coverage mapping counters:

Counter:
^^^^^^^^

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression's arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the *else* keyword:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{   </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span>
<span style='background-color:#4A789C'>                                           </span>
<span style='background-color:#4A789C'>  if (argc &gt; 1) </span><span style='background-color:#85C1F5'>{                        </span>   <span class='c1'>// Region's counter is a reference to the profile counter #1</span>
<span style='background-color:#85C1F5'>    printf("%s\n", argv[1]);             </span><span>   </span>
<span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{                               </span>   <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span>
<span style='background-color:#F6D55D'>    printf("\n");                        </span>
<span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'>                                        </span>
<span style='background-color:#4A789C'>  return 0;                                </span>
<span style='background-color:#4A789C'>}</span>
</pre>`

Finally, a coverage mapping counter can also represent an execution count
of zero. The zero counter is used to provide coverage mapping for
unreachable statements and expressions, as in the example below:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{                  </span>
<span style='background-color:#4A789C'>  return 0;                   </span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>;   </span> <span class='c1'>// Unreachable region's counter is zero</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn't possess the frontend's knowledge.
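
The three kinds of counters described in this section can be evaluated with a
small recursive function. The sketch below is illustrative only: the nested
tuple representation and the profile values are assumptions made for this
example, not the data structures used by LLVM.

.. code-block:: python

  def evaluate(counter, profile_counts):
      """Evaluate a counter given in a hypothetical nested-tuple form:
      ("zero",), ("ref", index), ("add", lhs, rhs) or ("sub", lhs, rhs)."""
      kind = counter[0]
      if kind == "zero":
          return 0
      if kind == "ref":
          # Look up the value of the profile instrumentation counter.
          return profile_counts[counter[1]]
      # Expressions: evaluate both arguments first, then combine them.
      lhs = evaluate(counter[1], profile_counts)
      rhs = evaluate(counter[2], profile_counts)
      if kind == "add":
          return lhs + rhs
      if kind == "sub":
          return lhs - rhs
      raise ValueError(f"unknown counter kind: {kind}")

  # Made-up profile: main ran 5 times, the "if" branch 3 times.
  profile_counts = [5, 3]
  else_counter = ("sub", ("ref", 0), ("ref", 1))

With these values the *else* region from the example above evaluates to
``5 - 3 = 2``, and a zero counter always evaluates to 0.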

LLVM IR Representation
======================

The coverage mapping data is stored in the LLVM IR using a single global
constant structure variable called *__llvm_coverage_mapping*
with the *__llvm_covmap* section specifier.

For example, let's consider a C file and how it gets compiled to LLVM:

.. _coverage mapping sample:

.. code-block:: c

  int foo() {
    return 42;
  }
  int bar() {
    return 13;
  }

The coverage mapping variable generated by Clang has 3 fields:

* Coverage mapping header.

* An array of function records.

* Coverage mapping data, which is an array of bytes. Zero padding is added at
  the end to force 8-byte alignment.

.. code-block:: llvm

  @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [2 x { i64, i32, i64 }], [40 x i8] }
  {
    { i32, i32, i32, i32 } ; Coverage map header
    {
      i32 2,  ; The number of function records
      i32 20, ; The length of the string that contains the encoded translation unit filenames
      i32 20, ; The length of the string that contains the encoded coverage mapping data
      i32 2,  ; Coverage mapping format version
    },
    [2 x { i64, i32, i64 }] [ ; Function records
     { i64, i32, i64 } {
       i64 0x5cf8c24cdb18bdac, ; Function's name MD5
       i32 9, ; Function's encoded coverage mapping data string length
       i64 0  ; Function's structural hash
     },
     { i64, i32, i64 } {
       i64 0xe413754a191db537, ; Function's name MD5
       i32 9, ; Function's encoded coverage mapping data string length
       i64 0  ; Function's structural hash
     }],
   [40 x i8] c"..." ; Encoded data (dissected later)
  }, section "__llvm_covmap", align 8

The current version of the format is version 3. The only difference from
version 2 is that a special encoding for column end locations was introduced
to indicate gap regions.

The function record layout has evolved since version 1. In version 1, the
function record for *foo* is defined as follows:

.. code-block:: llvm

     { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name
       i32 3, ; Function's name length
       i32 9, ; Function's encoded coverage mapping data string length
       i64 0  ; Function's structural hash
     }

Coverage Mapping Header:
------------------------

The coverage mapping header has the following fields:

* The number of function records.

* The length of the string in the third field of *__llvm_coverage_mapping* that contains the encoded translation unit filenames.

* The length of the string in the third field of *__llvm_coverage_mapping* that contains the encoded coverage mapping data.

* The format version. The current version is 3 (encoded as a 2).

.. _function records:

Function record:
----------------

A function record is a structure of the following type:

.. code-block:: llvm

  { i64, i32, i64 }

It contains the function name's MD5, the length of the encoded mapping data
for that function, and the function's structural hash value.
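
The two name hashes shown in the sample IR can be reproduced with a few lines
of Python. This is a sketch under an assumption that this document does not
state explicitly: the 64-bit value is taken from the first eight bytes of the
MD5 digest of the function name, read as a little-endian integer. The helper
name ``name_hash`` is made up for this example.

.. code-block:: python

  import hashlib

  def name_hash(name):
      """Return the first 8 bytes of MD5(name) as a little-endian integer.

      Assumption: this is how the i64 "name MD5" field is derived; it
      matches the values shown for foo and bar in the sample above.
      """
      digest = hashlib.md5(name.encode("utf-8")).digest()
      return int.from_bytes(digest[:8], "little")

  foo_hash = name_hash("foo")
  bar_hash = name_hash("bar")

Under that assumption, ``hex(foo_hash)`` and ``hex(bar_hash)`` match the two
``i64`` constants in the function records of the sample.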

Encoded data:
-------------

The encoded data is stored in a single string that contains
the encoded filenames used by this translation unit and the encoded coverage
mapping data for each function in this translation unit.

The encoded data has the following structure:

``[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]``

If necessary, the encoded data is padded with zeroes so that the size
of the data string is rounded up to the nearest multiple of 8 bytes.
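
The padding rule can be illustrated with the byte values of the sample
translation unit used in this document. The helper name ``pad_to_8`` is made
up for this sketch; the three byte strings are the filenames and the two
per-function mapping strings of the sample.

.. code-block:: python

  def pad_to_8(data):
      """Append zero bytes until len(data) is a multiple of 8."""
      return data + b"\x00" * (-len(data) % 8)

  filenames = b"\x01\x12/Users/alex/test.c"       # 20 bytes
  foo = b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02"    # 9 bytes
  bar = b"\x01\x00\x00\x01\x01\x04\x0c\x02\x02"    # 9 bytes

  # 20 + 9 + 9 = 38 bytes, so two zero bytes are appended.
  encoded = pad_to_8(filenames + foo + bar)

The result is 40 bytes long, which agrees with the ``[40 x i8]`` type of the
encoded data in the sample IR.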

Dissecting the sample:
^^^^^^^^^^^^^^^^^^^^^^

Here's an overview of the encoded data that was stored in the
IR for the `coverage mapping sample`_ that was shown earlier:

* The IR contains the following string constant that represents the encoded
  coverage mapping data for the sample translation unit:

  .. code-block:: llvm

    c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00"

* The string contains values that are encoded in the LEB128 format, which is
  used throughout for storing integers. It also contains a string value.

* The length of the substring that contains the encoded translation unit
  filenames is the value of the second field in the coverage mapping header,
  which is 20, thus the filenames are encoded in this string:

  .. code-block:: llvm

    c"\01\12/Users/alex/test.c"

  This string contains the following data:

  * Its first byte has a value of ``0x01``. It stores the number of filenames
    contained in this string.
  * Its second byte stores the length of the first filename in this string.
  * The remaining 18 bytes are used to store the first filename.

* The length of the substring that contains the encoded coverage mapping data
  for the first function is the value of the second field in the first
  structure in the array of `function records`_ stored in the
  second field of the *__llvm_coverage_mapping* structure, which is 9.
  Therefore, the coverage mapping for the first function record is encoded
  in this string:

  .. code-block:: llvm

    c"\01\00\00\01\01\01\0C\02\02"

  This string consists of the following bytes:

  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x01`` | The number of file ids used by this function. There is only one file id used by the mapping data in this function.      |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x00`` | An index into the filenames array which corresponds to the file "/Users/alex/test.c".                                   |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x00`` | The number of counter expressions used by this function. This function doesn't use any expressions.                     |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x01`` | The number of mapping regions that are stored in an array for the function's file id #0.                                |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x01`` | The coverage mapping counter for the first region in this function. The value of 1 tells us that it's a coverage        |
  |          | mapping counter that is a reference to the profile instrumentation counter with an index of 0.                          |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x01`` | The starting line of the first mapping region in this function.                                                         |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x0C`` | The starting column of the first mapping region in this function.                                                       |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x02`` | The number of lines spanned by the first mapping region (*numLines*), so the region ends on line 3.                     |
  +----------+-------------------------------------------------------------------------------------------------------------------------+
  | ``0x02`` | The ending column of the first mapping region in this function.                                                         |
  +----------+-------------------------------------------------------------------------------------------------------------------------+

* The length of the substring that contains the encoded coverage mapping data
  for the second function record is also 9. It's structured like the mapping data
  for the first function record.

* The two trailing bytes are zeroes and are used to pad the coverage mapping
  data to give it the 8 byte alignment.
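
The dissection above can be replayed in Python. The sketch below is not a
general decoder: it assumes exactly the layout of this sample (one file id, no
counter expressions and a single mapping region per function), and the helper
and field names are made up for illustration.

.. code-block:: python

  def read_leb128(buf, pos):
      """Decode one unsigned LEB128 value; return (value, next position)."""
      value = shift = 0
      while True:
          byte = buf[pos]
          pos += 1
          value |= (byte & 0x7F) << shift
          shift += 7
          if not byte & 0x80:
              return value, pos

  data = (b"\x01\x12/Users/alex/test.c"
          b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02"
          b"\x01\x00\x00\x01\x01\x04\x0c\x02\x02\x00\x00")

  # Filenames: a count followed by length-prefixed strings.
  num_files, pos = read_leb128(data, 0)
  filenames = []
  for _ in range(num_files):
      length, pos = read_leb128(data, pos)
      filenames.append(data[pos:pos + length].decode())
      pos += length

  # Field order taken from the table above (valid for this sample only).
  FIELDS = ["num_file_ids", "filename_index", "num_expressions",
            "num_regions", "counter", "line_start", "column_start",
            "num_lines", "column_end"]

  def decode_function(buf):
      """Decode one function's 9-byte mapping string into named fields."""
      values, offset = [], 0
      while offset < len(buf):
          value, offset = read_leb128(buf, offset)
          values.append(value)
      return dict(zip(FIELDS, values))

  foo = decode_function(data[pos:pos + 9])
  bar = decode_function(data[pos + 9:pos + 18])

Here ``pos`` is 20 after the filenames have been read, which matches the
filenames length stored in the coverage mapping header, and each slice length
of 9 comes from the corresponding function record.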

Encoding
========

The per-function coverage mapping data is encoded as a stream of bytes,
with a simple structure. The structure consists of the encoding
`types <cvmtypes_>`_ like variable-length unsigned integers, that
are used to encode `File ID Mapping`_, `Counter Expressions`_ and
the `Mapping Regions`_.

The format of the structure follows:

  ``[file id mapping, counter expressions, mapping regions]``

The translation unit filenames are encoded using the same encoding
`types <cvmtypes_>`_ as the per-function coverage mapping data, with the
following structure:

  ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``

.. _cvmtypes:

Types
-----

This section describes the basic types that are used by the encoding format
and can appear after ``:`` in the ``[foo : type]`` description.

.. _LEB128:

LEB128
^^^^^^

LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
encoding, optimizing for the case where values are small
(1 byte for values less than 128).
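
A minimal Python sketch of unsigned LEB128 encoding and decoding is shown
below. The function names are made up for this example. Each byte carries
seven bits of the value, least significant group first, and the high bit of a
byte is set when more bytes follow.

.. code-block:: python

  def encode_leb128(value):
      """Encode a non-negative integer as unsigned LEB128 bytes."""
      if value < 0:
          raise ValueError("unsigned LEB128 cannot encode negative values")
      out = bytearray()
      while True:
          byte = value & 0x7F
          value >>= 7
          if value:
              out.append(byte | 0x80)  # more 7-bit groups follow
          else:
              out.append(byte)         # last group: high bit clear
              return bytes(out)

  def decode_leb128(buf, pos=0):
      """Decode one value starting at pos; return (value, next position)."""
      value = shift = 0
      while True:
          byte = buf[pos]
          pos += 1
          value |= (byte & 0x7F) << shift
          shift += 7
          if not byte & 0x80:
              return value, pos

Values below 128 occupy a single byte, while 128 needs two bytes
(``0x80 0x01``).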

.. _CoverageStrings:

Strings
^^^^^^^

``[length : LEB128, characters...]``

String values are encoded with a `LEB value <LEB128_>`_ for the length
of the string and a sequence of bytes for its characters.
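
The string layout can be sketched as a pair of Python helpers. The names are
hypothetical, and the use of UTF-8 for the characters is an assumption of this
example; the format itself only specifies a sequence of bytes.

.. code-block:: python

  def encode_leb128(value):
      """Encode a non-negative integer as unsigned LEB128 bytes."""
      out = bytearray()
      while True:
          byte = value & 0x7F
          value >>= 7
          if value:
              out.append(byte | 0x80)
          else:
              out.append(byte)
              return bytes(out)

  def encode_string(text):
      """Return the LEB128 length followed by the string's bytes."""
      raw = text.encode("utf-8")  # assumption: UTF-8 encoded characters
      return encode_leb128(len(raw)) + raw

  def decode_string(buf, pos=0):
      """Decode one string at pos; return (text, next position)."""
      length = shift = 0
      while True:
          byte = buf[pos]
          pos += 1
          length |= (byte & 0x7F) << shift
          shift += 7
          if not byte & 0x80:
              break
      return buf[pos:pos + length].decode("utf-8"), pos + length

  encoded_name = encode_string("/Users/alex/test.c")

For the sample filename this yields the length byte ``0x12`` followed by the
18 characters of the path.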

.. _file id mapping:

File ID Mapping
---------------

``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``

File id mapping in a function's coverage mapping stream
contains the indices into the translation unit's filenames array.

Counter
-------

``[value : LEB128]``

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things: the `tag <counter-tag_>`_,
which is stored in the lowest 2 bits, and the `counter data`_, which is stored
in the remaining bits.

.. _counter-tag:

Tag:
^^^^

The counter's tag encodes the counter's kind
and, if the counter is an expression, the expression's kind.
The possible tag values are:

* 0 - The counter is zero.

* 1 - The counter is a reference to the profile instrumentation counter.

* 2 - The counter is a subtraction expression.

* 3 - The counter is an addition expression.

.. _counter data:

Data:
^^^^^

The counter's data is interpreted in the following manner:

* When the counter is a reference to the profile instrumentation counter,
  then the counter's data is the id of the profile counter.
* When the counter is an expression, then the counter's data
  is the index into the array of counter expressions.
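
Splitting a decoded counter value into its tag and data is a matter of masking
and shifting. The sketch below uses made-up helper names and operates on the
integer obtained after LEB128 decoding.

.. code-block:: python

  TAG_NAMES = {0: "zero", 1: "reference", 2: "subtraction", 3: "addition"}

  def split_counter(value):
      """Return (tag name, data) for a decoded counter value."""
      tag = value & 0x3   # lowest 2 bits
      data = value >> 2   # remaining bits
      return TAG_NAMES[tag], data

  def make_counter(tag, data):
      """Pack a tag (0..3) and its data back into a counter value."""
      return (data << 2) | tag

For example, the value ``0x01`` from the dissected sample is a reference to
profile counter 0, and ``0x05`` would be a reference to profile counter 1.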

.. _Counter Expressions:

Counter Expressions
-------------------

``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``

Counter expressions consist of two counters as they
represent binary arithmetic operations.
The expression's kind is determined from the `tag <counter-tag_>`_ of the
counter that references this expression.

.. _Mapping Regions:

Mapping Regions
---------------

``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``

The mapping regions are stored in an array of sub-arrays where every
region in a particular sub-array has the same file id.

The file id for a sub-array of regions is the index of that
sub-array in the main array; for example, the first sub-array has a file id
of 0.

Sub-Array of Regions
^^^^^^^^^^^^^^^^^^^^

``[numRegions : LEB128, region0, region1, ...]``

The mapping regions for a specific file id are stored in an array that is
sorted in an ascending order by the region's starting location.

Mapping Region
^^^^^^^^^^^^^^

``[header, source range]``

The mapping region record contains two sub-records:
the `header`_, which stores the counter and/or the region's kind,
and the `source range`_ that contains the starting and ending
location of this region.

.. _header:

Header
^^^^^^

``[counter]``

or

``[pseudo-counter]``

The header encodes the region's counter and the region's kind.

The value of the counter's tag distinguishes between the counters and
pseudo-counters: if the tag is zero, then this header contains a
pseudo-counter, otherwise this header contains an ordinary counter.

Counter:
""""""""

A mapping region whose header has a counter with a non-zero tag is
a code region.

Pseudo-Counter:
"""""""""""""""

``[value : LEB128]``

A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like
the ordinary counter. It has the following interpretation:

* bits 0-1: tag, which is always 0.

* bit 2: expansionRegionTag. If this bit is set, then this mapping region
  is an expansion region.

* remaining bits: data. If this region is an expansion region, then the data
  contains the expanded file id of that region.

  Otherwise, the data contains the region's kind. The possible region
  kind values are:

  * 0 - This mapping region is a code region with a counter of zero.
  * 2 - This mapping region is a skipped region.
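
The rules for telling counters and pseudo-counters apart can be sketched as
follows. The function name and the returned tuples are made up for this
example; the bit positions follow the list above.

.. code-block:: python

  def decode_header(value):
      """Classify a decoded region header value.

      Returns ("code", counter value), ("expansion", expanded file id)
      or ("skipped", None).
      """
      if value & 0x3:
          return ("code", value)            # ordinary counter, tag != 0
      if value & 0x4:
          return ("expansion", value >> 3)  # bit 2 set: expansion region
      kind = value >> 3                     # otherwise data is the kind
      if kind == 0:
          return ("code", 0)                # code region, zero counter
      if kind == 2:
          return ("skipped", None)
      raise ValueError(f"unknown region kind: {kind}")

Under this reading, a skipped region is encoded as ``2 << 3 = 0x10`` and an
expansion region with an expanded file id of 1 as ``(1 << 3) | 4 = 0x0C``.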

.. _source range:

Source Range
^^^^^^^^^^^^

``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``

The source range record contains the following fields:

* *deltaLineStart*: The difference between the starting line of the
  current mapping region and the starting line of the previous mapping region.

  If the current mapping region is the first region in the current
  sub-array, then it stores the starting line of that region.

* *columnStart*: The starting column of the mapping region.

* *numLines*: The difference between the ending line and the starting line
  of the current mapping region.

* *columnEnd*: The ending column of the mapping region. If the high bit is set,
  the current mapping region is a gap area. A count for a gap area is only used
  as the line execution count if there are no other regions on a line.
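
Turning the delta-encoded records of one sub-array back into absolute source
ranges can be sketched like this. The function name and the dictionary layout
are made up for the example. The text above only says "high bit", so treating
the column as a 32-bit value, which puts the gap flag in bit 31, is an
assumption of this sketch.

.. code-block:: python

  GAP_BIT = 1 << 31  # assumption: columnEnd is a 32-bit value

  def decode_ranges(records):
      """Convert (deltaLineStart, columnStart, numLines, columnEnd)
      tuples of one sub-array into absolute start and end locations."""
      ranges, prev_line = [], 0
      for delta, col_start, num_lines, col_end in records:
          # For the first region, prev_line is 0, so delta is the line.
          line_start = prev_line + delta
          ranges.append({
              "start": (line_start, col_start),
              "end": (line_start + num_lines, col_end & ~GAP_BIT),
              "gap": bool(col_end & GAP_BIT),
          })
          prev_line = line_start
      return ranges

  # The three code regions of the first example in this document:
  # 1:40 to 9:2, 3:17 to 5:4 and 5:10 to 7:4.
  example = decode_ranges([(1, 40, 8, 2), (2, 17, 2, 4), (2, 10, 2, 4)])

Because the regions in a sub-array are sorted by starting location, every
*deltaLineStart* is non-negative and fits the unsigned LEB128 encoding.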