<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>

<chapter id="cl-format" xreflabel="Callgrind Format Specification">
<title>Callgrind Format Specification</title>

<para>This chapter describes the Callgrind Profile Format, Version 1.</para>

<para>A synonymous name is "Calltree Profile Format". Both names mean
the same thing, since Callgrind was previously named Calltree.</para>

<para>The format description is meant to help users understand the
file contents; more importantly, it is given so that authors of measurement or
visualization tools can write and read this format.</para>

<sect1 id="cl-format.overview" xreflabel="Overview">
<title>Overview</title>

<para>The profile data format is ASCII based.
It is written by Callgrind, and it is upward compatible
with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
be read by callgrind_annotate and KCachegrind.</para>

<para>This chapter gives an overview of format features and examples.
For detailed syntax, look at the format reference.</para>

<sect2 id="cl-format.overview.basics" xreflabel="Basic Structure">
<title>Basic Structure</title>

<para>Each file has a header part consisting of an arbitrary number of lines
of the format "key: value". The lines with key "positions" and "events" define
the meaning of cost lines in the second part of the file: the value of
"positions" is a list of subpositions, and the value of "events" is a list
of event type names.
Cost lines consist of subpositions followed by 64-bit
counters for the events, in the order specified by the "positions" and "events"
header lines.</para>

<para>The "events" header line is always required, in contrast to the optional
line for "positions", which defaults to "line", i.e. a line number of some
source file. In addition, the second part of the file contains position
specifications of the form "spec=name". "spec" can be e.g. "fn" for a
function name or "fl" for a file name. Cost lines are always related to
the function/file specifications given directly before.</para>

</sect2>

<sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
<title>Simple Example</title>

<para>The event names in the following example are quite arbitrary, and are not
related to event names used by Callgrind. In particular, cycle counts matching
real processors will probably never be generated by any Valgrind tool, as these
are bound to simulations of simple machine models for acceptable slowdown.
However, any profiling tool could use the format described in this chapter.</para>

<para>
<screen>events: Cycles Instructions Flops
fl=file.f
fn=main
15 90 14 2
16 20 12</screen></para>

<para>The above example gives profile information for event types "Cycles",
"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
passed by, number of executed instructions, and number of floating point
operations executed while running code corresponding to some source
position. As there is no line specifying the value of "positions", it defaults
to "line", which means that the first number of a cost line is always a line
number.</para>

<para>Thus, the first cost line specifies that in line 15 of source file
<filename>file.f</filename> there is code belonging to function
<function>main</function>.
While running, 90 CPU cycles passed by, and 2 of
the 14 instructions executed were floating point operations. Similarly, the
next line specifies that there were 12 instructions executed in the context
of function <function>main</function> which can be related to line 16 in
file <filename>file.f</filename>, taking 20 CPU cycles. If a cost line
specifies fewer event counts than given in the "events" line, the rest is
assumed to be zero, i.e. there was no floating point instruction executed
relating to line 16.</para>

<para>Note that regular cost lines always give self (also called exclusive)
cost of code at a given position. If you specify multiple cost lines for the
same position, these will be summed up. On the other hand, in the example above
there is no specification of how many times function
<function>main</function> actually was
called: profile data only contains sums.</para>

</sect2>


<sect2 id="cl-format.overview.associations" xreflabel="Associations">
<title>Associations</title>

<para>The most important extension to the original format of Cachegrind is the
ability to specify call relationships among functions. More generally, you
specify associations among positions. For this, the second part of the
file can also contain association specifications. These look similar to
position specifications, but consist of 2 lines. For calls, the format
looks like
<screen>
 calls=(Call Count) (Destination position)
 (Source position) (Inclusive cost of call)
</screen></para>

<para>The destination only specifies subpositions like line number. Therefore,
to be able to specify a call to another function in another source file, you
have to precede the above lines with a "cfn=" specification for the name of the
called function, and a "cfl=" specification if the function is in another
source file.
The 2nd line looks like a regular cost line, with the difference
that the inclusive cost spent inside the function call has to be specified.</para>

<para>Other associations are, for example, (conditional) jumps. See the
reference below for details.</para>

</sect2>


<sect2 id="cl-format.overview.example2" xreflabel="Extended Example">
<title>Extended Example</title>

<para>The following example shows 3 functions, <function>main</function>,
<function>func1</function>, and <function>func2</function>. Function
<function>main</function> calls <function>func1</function> once and
<function>func2</function> 3 times. <function>func1</function> calls
<function>func2</function> 2 times.
<screen>events: Instructions

fl=file1.c
fn=main
16 20
cfn=func1
calls=1 50
16 400
cfl=file2.c
cfn=func2
calls=3 20
16 400

fn=func1
51 100
cfl=file2.c
cfn=func2
calls=2 20
51 300

fl=file2.c
fn=func2
20 700</screen></para>

<para>One can see that in <function>main</function> only code from line 16
is executed, which is also where the other functions are called. The inclusive
cost of <function>main</function> is 820, which is the sum of self cost 20 and
costs spent in the calls: 400 for the single call to <function>func1</function>
and 400 as the sum for the three calls to <function>func2</function>.</para>

<para>Function <function>func1</function> is located in
<filename>file1.c</filename>, the same as <function>main</function>.
Therefore, a "cfl=" specification for the call to <function>func1</function>
is not needed.
The function <function>func1</function> only consists of code
at line 51 of <filename>file1.c</filename>, where <function>func2</function>
is called.</para>

</sect2>


<sect2 id="cl-format.overview.compression1" xreflabel="Name Compression">
<title>Name Compression</title>

<para>With the introduction of association specifications like calls, it
becomes necessary to specify the same function or file name multiple times. As
absolute filenames or symbol names in C++ can be quite long, it is advantageous
to be able to specify integer IDs for position specifications.
Here, the term "position" corresponds to a file name (source or object file)
or function name.</para>

<para>To support name compression, a position specification can be not only of
the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
integer ID to a name, and "spec=(ID)" to reference a previously defined ID
mapping. There is a separate ID mapping for each position specification,
i.e. you can use ID 1 for both a file name and a symbol name.</para>

<para>With name compression, the extended example above looks like this:
<screen>events: Instructions

fl=(1) file1.c
fn=(1) main
16 20
cfn=(2) func1
calls=1 50
16 400
cfl=(2) file2.c
cfn=(3) func2
calls=3 20
16 400

fn=(2)
51 100
cfl=(2)
cfn=(3)
calls=2 20
51 300

fl=(2)
fn=(3)
20 700</screen></para>

<para>As position specifications carry no information themselves, but only change
the meaning of subsequent cost lines or associations, they can appear
anywhere in the file without any negative consequence. In particular, you can
define name compression mappings directly after the header, and before any cost
lines.
Thus, the above example can also be written as
<screen>events: Instructions

# define file ID mapping
fl=(1) file1.c
fl=(2) file2.c
# define function ID mapping
fn=(1) main
fn=(2) func1
fn=(3) func2

fl=(1)
fn=(1)
16 20
...</screen></para>

</sect2>


<sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression">
<title>Subposition Compression</title>

<para>If a Callgrind data file should hold costs for each assembler instruction
of a program, you specify subposition "instr" in the "positions:" header line,
and each cost line has to include the address of some instruction. Addresses
are allowed to have a size of 64 bits to support 64-bit architectures. Thus,
repeating similar, long addresses for almost every line in the data file can
enlarge the file size quite significantly, which
motivates subposition compression: instead of every cost line starting with
a 16 character long address, one is allowed to specify relative addresses.
This relative specification is not only allowed for instruction addresses, but
also for line numbers; both addresses and line numbers are called "subpositions".</para>

<para>A relative subposition is always based on the corresponding subposition
of the last cost line, and starts with a "+" to specify a positive difference,
a "-" to specify a negative difference, or consists of "*" to specify the same
subposition. Because absolute subpositions are always positive (i.e. never
prefixed by "-"), any relative specification is unambiguous; additionally,
absolute and relative subposition specifications can be mixed freely.
Assume the following example (subpositions can always be specified
as hexadecimal numbers, beginning with "0x"):
<screen>positions: instr line
events: ticks

fn=func
0x80001234 90 1
0x80001237 90 5
0x80001238 91 6</screen></para>

<para>With subposition compression, this looks like
<screen>positions: instr line
events: ticks

fn=func
0x80001234 90 1
+3 * 5
+1 +1 6</screen></para>

<para>Remark: For assembler annotation to work, instruction addresses have to
be corrected to correspond to addresses found in the original binary. I.e. for
relocatable shared objects, often a load offset has to be subtracted.</para>

</sect2>


<sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous">
<title>Miscellaneous</title>

<sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information">
<title>Cost Summary Information</title>

<para>For the visualization to be able to show cost percentages, the sum of the
cost of the full run has to be known. Usually, it is assumed that this is the
sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
can specify a "summary:" line in the header giving the full cost for the
profile run. This has another effect: an import filter can show a progress bar
while loading a large data file if it knows the cost sum in advance.</para>

</sect3>

<sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types">
<title>Long Names for Event Types and inherited Types</title>

<para>Event types for cost lines are specified in the "events:" line with an
abbreviated name. For visualization, it makes sense to be able to specify some
longer, more descriptive name.
For an event type "Ir" which means "Instruction
Fetches", this can be specified with the header line
<screen>event: Ir : Instruction Fetches
events: Ir Dr</screen></para>

<para>In this example, "Dr" itself has no long name associated. The order of
"event:" lines and the "events:" line is of no importance. Additionally,
inherited event types can be introduced for which no raw data is available, but
which are calculated from given types. Continuing the last example, you could add
<screen>event: Sum = Ir + Dr</screen>
to specify an additional event type "Sum", which is calculated by adding costs
for "Ir" and "Dr".</para>

</sect3>

</sect2>

</sect1>

<sect1 id="cl-format.reference" xreflabel="Reference">
<title>Reference</title>

<sect2 id="cl-format.reference.grammar" xreflabel="Grammar">
<title>Grammar</title>

<para>
<screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen>
<screen>FormatVersion := "version:" Space* Number "\n"</screen>
<screen>Creator := "creator:" NoNewLineChar* "\n"</screen>
<screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen>
<screen>HeaderLine := (empty line)
  | ('#' NoNewLineChar*)
  | PartDetail
  | Description
  | EventSpecification
  | CostLineDef</screen>
<screen>PartDetail := TargetCommand | TargetID</screen>
<screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen>
<screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen>
<screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen>
<screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen>
<screen>InheritedDef := "=" InheritedExpr</screen>
<screen>InheritedExpr := Name
  | Number Space* ("*" Space*)? Name
  | InheritedExpr Space* "+" Space* InheritedExpr</screen>
<screen>LongNameDef := ":" NoNewLineChar*</screen>
<screen>CostLineDef := "events:" Space* Name (Space+ Name)*
  | "positions:" "instr"? (Space+ "line")?</screen>
<screen>BodyLine := (empty line)
  | ('#' NoNewLineChar*)
  | CostLine
  | PositionSpecification
  | AssociationSpecification</screen>
<screen>CostLine := SubPositionList Costs?</screen>
<screen>SubPositionList := (SubPosition+ Space+)+</screen>
<screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen>
<screen>Costs := (Number Space+)+</screen>
<screen>PositionSpecification := Position "=" Space* PositionName</screen>
<screen>Position := CostPosition | CalledPosition</screen>
<screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen>
<screen>CalledPosition := "cob" | "cfl" | "cfn"</screen>
<screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen>
<screen>AssociationSpecification := CallSpecification
  | JumpSpecification</screen>
<screen>CallSpecification := CallLine "\n" CostLine</screen>
<screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen>
<screen>JumpSpecification := ...</screen>
<screen>Space := " " | "\t"</screen>
<screen>Number := HexNumber | (Digit)+</screen>
<screen>Digit := "0" | ... | "9"</screen>
<screen>HexNumber := "0x" (Digit | HexChar)+</screen>
<screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen>
<screen>Name := Alpha (Digit | Alpha)*</screen>
<screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen>
<screen>NoNewLineChar := all characters without "\n"</screen>
</para>

</sect2>

<sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines">
<title>Description of Header Lines</title>

<para>The header has an arbitrary number of lines of the format
"key: value".
Possible <emphasis>key</emphasis> values for the header are:</para>

<itemizedlist>

  <listitem>
    <para><computeroutput>version: number</computeroutput> [Callgrind]</para>
    <para>This is used to distinguish future profile data formats. A
    major version of 0 or 1 is supposed to be upward compatible with
    Cachegrind's format. It is optional; if absent, version 1
    is assumed. If present, this has to be the first header line.</para>
  </listitem>

  <listitem>
    <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para>
    <para>This specifies the process ID of the supervised application
    for which this profile was generated.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para>
    <para>This specifies the full command line of the supervised
    application for which this profile was generated.</para>
  </listitem>

  <listitem>
    <para><computeroutput>part: number</computeroutput> [Callgrind]</para>
    <para>This specifies a sequentially incremented number for each dump
    generated, starting at 1.</para>
  </listitem>

  <listitem>
    <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para>
    <para>This specifies various information for this dump. For some
    types, the semantics are defined, but any description type is allowed.
    Unknown types should be ignored.</para>
    <para>There are the types "I1 cache", "D1 cache", "LL cache", which
    specify parameters used for the cache simulator. These are the only
    types originally used by Cachegrind. Additionally, Callgrind uses
    the following types: "Timerange" gives a rough range of the basic
    block counter, for which the cost of this dump was collected.
    Type "Trigger" states the reason why this trace was generated.
    E.g.
    program termination or forced interactive dump.</para>
  </listitem>

  <listitem>
    <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para>
    <para>For cost lines, this defines the semantics of the first numbers.
    Any combination of "instr", "bb" and "line" is allowed, but has to be
    in this order, which corresponds to the position numbers at the start of
    the cost lines later in the file.</para>
    <para>If "instr" is specified, the position is the address of an
    instruction whose execution raised the events given later on the
    line. This address is relative to the offset of the binary/shared
    library file, so that relocation info does not have to be specified.
    For "line", the position is the line number of a source file, which is
    responsible for the events raised. Note that the mapping of "instr"
    and "line" positions is given by the debugging line information
    produced by the compiler.</para>
    <para>This field is optional. If not specified, only "line" is
    assumed.</para>
  </listitem>

  <listitem>
    <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para>
    <para>A list of short names of the event types logged in this file.
    The order is the same as in cost lines. The first event type is the
    second or third number in a cost line, depending on the value of
    "positions". Callgrind does not add additional cost types.
    Specify
    exactly once.</para>
    <para>Cost types from original Cachegrind are:
    <itemizedlist>
      <listitem>
        <para><command>Ir</command>: Instruction read access</para>
      </listitem>
      <listitem>
        <para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
      </listitem>
      <listitem>
        <para><command>ILmr</command>: Instruction last-level read cache miss</para>
      </listitem>
      <listitem>
        <para>...</para>
      </listitem>
    </itemizedlist>
    </para>
  </listitem>

  <listitem>
    <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para>
    <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para>
    <para>The total number of events covered by this trace
    file. Both keys have the same meaning, but the "totals:" line
    happens to be at the end of the file, while "summary:" appears in
    the header. This was added to allow postprocessing tools to know
    the total cost in advance. The two lines always give the same cost
    counts.</para>
  </listitem>

</itemizedlist>

</sect2>

<sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines">
<title>Description of Body Lines</title>

<para>There are lines of the form
<computeroutput>spec=position</computeroutput>. The values for position
specifications are arbitrary strings. When starting with "(" and a
digit, it's a string in compressed format. Otherwise it's the real
position string. This allows for file and symbol names as position
strings, as these never start with "(" + <emphasis>digit</emphasis>.
The compressed format is either "(" <emphasis>number</emphasis> ")"
<emphasis>space</emphasis> <emphasis>position</emphasis> or only
"(" <emphasis>number</emphasis> ")".
The first relates
<emphasis>position</emphasis> to <emphasis>number</emphasis> in the
context of the given format specification from this line to the end of
the file; it makes the (<emphasis>number</emphasis>) an alias for
<emphasis>position</emphasis>. Compressed format is always
optional.</para>

<para>Position specifications allowed:</para>
<itemizedlist>

  <listitem>
    <para><computeroutput>ob=</computeroutput> [Callgrind]</para>
    <para>The ELF object where the cost of the next cost lines happens.</para>
  </listitem>

  <listitem>
    <para><computeroutput>fl=</computeroutput> [Cachegrind]</para>
  </listitem>

  <listitem>
    <para><computeroutput>fi=</computeroutput> [Cachegrind]</para>
  </listitem>

  <listitem>
    <para><computeroutput>fe=</computeroutput> [Cachegrind]</para>
    <para>The source file including the code which is responsible for
    the cost of the next cost lines. "fi="/"fe=" is used when the source
    file changes inside of a function, i.e.
    for inlined code.</para>
  </listitem>

  <listitem>
    <para><computeroutput>fn=</computeroutput> [Cachegrind]</para>
    <para>The name of the function where the cost of the next cost lines
    happens.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cob=</computeroutput> [Callgrind]</para>
    <para>The ELF object of the target of the next call cost lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cfl=</computeroutput> [Callgrind]</para>
    <para>The source file including the code of the target of the
    next call cost lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>cfn=</computeroutput> [Callgrind]</para>
    <para>The name of the target function of the next call cost
    lines.</para>
  </listitem>

  <listitem>
    <para><computeroutput>calls=</computeroutput> [Callgrind]</para>
    <para>The number of nonrecursive calls which are responsible for the
    cost specified by the next call cost line. This is the cost spent
    inside of the called function.</para>
    <para>After "calls=" there MUST be a cost line. This is the cost
    spent in the called function. The first number is the source line
    from where the call happened.</para>
  </listitem>

  <listitem>
    <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para>
    <para>Unconditional jump, executed count times, to the given target
    position.</para>
  </listitem>

  <listitem>
    <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para>
    <para>Conditional jump, executed exe.count times with jumpcount
    jumps to the given target position.</para>
  </listitem>

</itemizedlist>

</sect2>

</sect1>

</chapter>