Name |
Date |
Size |
#Lines |
LOC |
||
---|---|---|---|---|---|---|
.. | - | - | ||||
cmake/ | 03-May-2024 | - | 335 | 294 | ||
docs/ | 03-May-2024 | - | 243 | 211 | ||
include/benchmark/ | 03-May-2024 | - | 1,390 | 692 | ||
src/ | 03-May-2024 | - | 4,857 | 3,431 | ||
test/ | 03-May-2024 | - | 3,220 | 2,497 | ||
tools/ | 03-May-2024 | - | 1,046 | 923 | ||
.gitignore | D | 03-May-2024 | 469 | 47 | 38 | |
AUTHORS | D | 03-May-2024 | 1.5 KiB | 44 | 42 | |
CMakeLists.txt | D | 03-May-2024 | 8.6 KiB | 214 | 187 | |
CONTRIBUTING.md | D | 03-May-2024 | 2.4 KiB | 59 | 41 | |
CONTRIBUTORS | D | 03-May-2024 | 2.4 KiB | 63 | 61 | |
LICENSE | D | 03-May-2024 | 11.1 KiB | 203 | 169 | |
README.LLVM | D | 03-May-2024 | 184 | 7 | 5 | |
README.md | D | 03-May-2024 | 32.3 KiB | 936 | 771 |
README.LLVM
1LLVM notes 2---------- 3 4This directory contains the Google Benchmark source code with some unnecessary 5files removed. Note that this directory is under a different license than 6libc++. 7
README.md
1# benchmark 2[](https://travis-ci.org/google/benchmark) 3[](https://ci.appveyor.com/project/google/benchmark/branch/master) 4[](https://coveralls.io/r/google/benchmark) 5[](https://slackin-iqtfqnpzxd.now.sh/) 6 7A library to support the benchmarking of functions, similar to unit-tests. 8 9Discussion group: https://groups.google.com/d/forum/benchmark-discuss 10 11IRC channel: https://freenode.net #googlebenchmark 12 13[Known issues and common problems](#known-issues) 14 15[Additional Tooling Documentation](docs/tools.md) 16 17 18## Building 19 20The basic steps for configuring and building the library look like this: 21 22```bash 23$ git clone https://github.com/google/benchmark.git 24# Benchmark requires GTest as a dependency. Add the source tree as a subdirectory. 25$ git clone https://github.com/google/googletest.git benchmark/googletest 26$ mkdir build && cd build 27$ cmake -G <generator> [options] ../benchmark 28# Assuming a makefile generator was used 29$ make 30``` 31 32Note that Google Benchmark requires GTest to build and run the tests. This 33dependency can be provided three ways: 34 35* Checkout the GTest sources into `benchmark/googletest`. 36* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during 37 configuration, the library will automatically download and build any required 38 dependencies. 39* Otherwise, if nothing is done, CMake will use `find_package(GTest REQUIRED)` 40 to resolve the required GTest dependency. 41 42If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF` 43to `CMAKE_ARGS`. 44 45 46## Installation Guide 47 48For Ubuntu and Debian Based System 49 50First make sure you have git and cmake installed (If not please install it) 51 52``` 53sudo apt-get install git 54sudo apt-get install cmake 55``` 56 57Now, let's clone the repository and build it 58 59``` 60git clone https://github.com/google/benchmark.git 61cd benchmark 62mkdir build 63cd build 64cmake .. -DCMAKE_BUILD_TYPE=RELEASE 65make 66``` 67 68We need to install the library globally now 69 70``` 71sudo make install 72``` 73 74Now you have google/benchmark installed in your machine 75Note: Don't forget to link to pthread library while building 76 77## Stable and Experimental Library Versions 78 79The main branch contains the latest stable version of the benchmarking library; 80the API of which can be considered largely stable, with source breaking changes 81being made only upon the release of a new major version. 82 83Newer, experimental, features are implemented and tested on the 84[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish 85to use, test, and provide feedback on the new features are encouraged to try 86this branch. However, this branch provides no stability guarantees and reserves 87the right to change and break the API at any time. 88 89 90## Example usage 91### Basic usage 92Define a function that executes the code to be measured. 93 94```c++ 95#include <benchmark/benchmark.h> 96 97static void BM_StringCreation(benchmark::State& state) { 98 for (auto _ : state) 99 std::string empty_string; 100} 101// Register the function as a benchmark 102BENCHMARK(BM_StringCreation); 103 104// Define another benchmark 105static void BM_StringCopy(benchmark::State& state) { 106 std::string x = "hello"; 107 for (auto _ : state) 108 std::string copy(x); 109} 110BENCHMARK(BM_StringCopy); 111 112BENCHMARK_MAIN(); 113``` 114 115Don't forget to inform your linker to add benchmark library e.g. through `-lbenchmark` compilation flag. 116 117The benchmark library will reporting the timing for the code within the `for(...)` loop. 118 119### Passing arguments 120Sometimes a family of benchmarks can be implemented with just one routine that 121takes an extra argument to specify which one of the family of benchmarks to 122run. For example, the following code defines a family of benchmarks for 123measuring the speed of `memcpy()` calls of different lengths: 124 125```c++ 126static void BM_memcpy(benchmark::State& state) { 127 char* src = new char[state.range(0)]; 128 char* dst = new char[state.range(0)]; 129 memset(src, 'x', state.range(0)); 130 for (auto _ : state) 131 memcpy(dst, src, state.range(0)); 132 state.SetBytesProcessed(int64_t(state.iterations()) * 133 int64_t(state.range(0))); 134 delete[] src; 135 delete[] dst; 136} 137BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10); 138``` 139 140The preceding code is quite repetitive, and can be replaced with the following 141short-hand. The following invocation will pick a few appropriate arguments in 142the specified range and will generate a benchmark for each such argument. 143 144```c++ 145BENCHMARK(BM_memcpy)->Range(8, 8<<10); 146``` 147 148By default the arguments in the range are generated in multiples of eight and 149the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the 150range multiplier is changed to multiples of two. 151 152```c++ 153BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10); 154``` 155Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ]. 156 157You might have a benchmark that depends on two or more inputs. For example, the 158following code defines a family of benchmarks for measuring the speed of set 159insertion. 160 161```c++ 162static void BM_SetInsert(benchmark::State& state) { 163 std::set<int> data; 164 for (auto _ : state) { 165 state.PauseTiming(); 166 data = ConstructRandomSet(state.range(0)); 167 state.ResumeTiming(); 168 for (int j = 0; j < state.range(1); ++j) 169 data.insert(RandomNumber()); 170 } 171} 172BENCHMARK(BM_SetInsert) 173 ->Args({1<<10, 128}) 174 ->Args({2<<10, 128}) 175 ->Args({4<<10, 128}) 176 ->Args({8<<10, 128}) 177 ->Args({1<<10, 512}) 178 ->Args({2<<10, 512}) 179 ->Args({4<<10, 512}) 180 ->Args({8<<10, 512}); 181``` 182 183The preceding code is quite repetitive, and can be replaced with the following 184short-hand. The following macro will pick a few appropriate arguments in the 185product of the two specified ranges and will generate a benchmark for each such 186pair. 187 188```c++ 189BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}}); 190``` 191 192For more complex patterns of inputs, passing a custom function to `Apply` allows 193programmatic specification of an arbitrary set of arguments on which to run the 194benchmark. The following example enumerates a dense range on one parameter, 195and a sparse range on the second. 196 197```c++ 198static void CustomArguments(benchmark::internal::Benchmark* b) { 199 for (int i = 0; i <= 10; ++i) 200 for (int j = 32; j <= 1024*1024; j *= 8) 201 b->Args({i, j}); 202} 203BENCHMARK(BM_SetInsert)->Apply(CustomArguments); 204``` 205 206### Calculate asymptotic complexity (Big O) 207Asymptotic complexity might be calculated for a family of benchmarks. The 208following code will calculate the coefficient for the high-order term in the 209running time and the normalized root-mean square error of string comparison. 210 211```c++ 212static void BM_StringCompare(benchmark::State& state) { 213 std::string s1(state.range(0), '-'); 214 std::string s2(state.range(0), '-'); 215 for (auto _ : state) { 216 benchmark::DoNotOptimize(s1.compare(s2)); 217 } 218 state.SetComplexityN(state.range(0)); 219} 220BENCHMARK(BM_StringCompare) 221 ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN); 222``` 223 224As shown in the following invocation, asymptotic complexity might also be 225calculated automatically. 226 227```c++ 228BENCHMARK(BM_StringCompare) 229 ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(); 230``` 231 232The following code will specify asymptotic complexity with a lambda function, 233that might be used to customize high-order term calculation. 234 235```c++ 236BENCHMARK(BM_StringCompare)->RangeMultiplier(2) 237 ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; }); 238``` 239 240### Templated benchmarks 241Templated benchmarks work the same way: This example produces and consumes 242messages of size `sizeof(v)` `range_x` times. It also outputs throughput in the 243absence of multiprogramming. 244 245```c++ 246template <class Q> int BM_Sequential(benchmark::State& state) { 247 Q q; 248 typename Q::value_type v; 249 for (auto _ : state) { 250 for (int i = state.range(0); i--; ) 251 q.push(v); 252 for (int e = state.range(0); e--; ) 253 q.Wait(&v); 254 } 255 // actually messages, not bytes: 256 state.SetBytesProcessed( 257 static_cast<int64_t>(state.iterations())*state.range(0)); 258} 259BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10); 260``` 261 262Three macros are provided for adding benchmark templates. 263 264```c++ 265#ifdef BENCHMARK_HAS_CXX11 266#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters. 267#else // C++ < C++11 268#define BENCHMARK_TEMPLATE(func, arg1) 269#endif 270#define BENCHMARK_TEMPLATE1(func, arg1) 271#define BENCHMARK_TEMPLATE2(func, arg1, arg2) 272``` 273 274### A Faster KeepRunning loop 275 276In C++11 mode, a ranged-based for loop should be used in preference to 277the `KeepRunning` loop for running the benchmarks. For example: 278 279```c++ 280static void BM_Fast(benchmark::State &state) { 281 for (auto _ : state) { 282 FastOperation(); 283 } 284} 285BENCHMARK(BM_Fast); 286``` 287 288The reason the ranged-for loop is faster than using `KeepRunning`, is 289because `KeepRunning` requires a memory load and store of the iteration count 290ever iteration, whereas the ranged-for variant is able to keep the iteration count 291in a register. 292 293For example, an empty inner loop of using the ranged-based for method looks like: 294 295```asm 296# Loop Init 297 mov rbx, qword ptr [r14 + 104] 298 call benchmark::State::StartKeepRunning() 299 test rbx, rbx 300 je .LoopEnd 301.LoopHeader: # =>This Inner Loop Header: Depth=1 302 add rbx, -1 303 jne .LoopHeader 304.LoopEnd: 305``` 306 307Compared to an empty `KeepRunning` loop, which looks like: 308 309```asm 310.LoopHeader: # in Loop: Header=BB0_3 Depth=1 311 cmp byte ptr [rbx], 1 312 jne .LoopInit 313.LoopBody: # =>This Inner Loop Header: Depth=1 314 mov rax, qword ptr [rbx + 8] 315 lea rcx, [rax + 1] 316 mov qword ptr [rbx + 8], rcx 317 cmp rax, qword ptr [rbx + 104] 318 jb .LoopHeader 319 jmp .LoopEnd 320.LoopInit: 321 mov rdi, rbx 322 call benchmark::State::StartKeepRunning() 323 jmp .LoopBody 324.LoopEnd: 325``` 326 327Unless C++03 compatibility is required, the ranged-for variant of writing 328the benchmark loop should be preferred. 329 330## Passing arbitrary arguments to a benchmark 331In C++11 it is possible to define a benchmark that takes an arbitrary number 332of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)` 333macro creates a benchmark that invokes `func` with the `benchmark::State` as 334the first argument followed by the specified `args...`. 335The `test_case_name` is appended to the name of the benchmark and 336should describe the values passed. 337 338```c++ 339template <class ...ExtraArgs> 340void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) { 341 [...] 342} 343// Registers a benchmark named "BM_takes_args/int_string_test" that passes 344// the specified values to `extra_args`. 345BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc")); 346``` 347Note that elements of `...args` may refer to global variables. Users should 348avoid modifying global state inside of a benchmark. 349 350## Using RegisterBenchmark(name, fn, args...) 351 352The `RegisterBenchmark(name, func, args...)` function provides an alternative 353way to create and register benchmarks. 354`RegisterBenchmark(name, func, args...)` creates, registers, and returns a 355pointer to a new benchmark with the specified `name` that invokes 356`func(st, args...)` where `st` is a `benchmark::State` object. 357 358Unlike the `BENCHMARK` registration macros, which can only be used at the global 359scope, the `RegisterBenchmark` can be called anywhere. This allows for 360benchmark tests to be registered programmatically. 361 362Additionally `RegisterBenchmark` allows any callable object to be registered 363as a benchmark. Including capturing lambdas and function objects. 364 365For Example: 366```c++ 367auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ }; 368 369int main(int argc, char** argv) { 370 for (auto& test_input : { /* ... */ }) 371 benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input); 372 benchmark::Initialize(&argc, argv); 373 benchmark::RunSpecifiedBenchmarks(); 374} 375``` 376 377### Multithreaded benchmarks 378In a multithreaded test (benchmark invoked by multiple threads simultaneously), 379it is guaranteed that none of the threads will start until all have reached 380the start of the benchmark loop, and all will have finished before any thread 381exits the benchmark loop. (This behavior is also provided by the `KeepRunning()` 382API) As such, any global setup or teardown can be wrapped in a check against the thread 383index: 384 385```c++ 386static void BM_MultiThreaded(benchmark::State& state) { 387 if (state.thread_index == 0) { 388 // Setup code here. 389 } 390 for (auto _ : state) { 391 // Run the test as normal. 392 } 393 if (state.thread_index == 0) { 394 // Teardown code here. 395 } 396} 397BENCHMARK(BM_MultiThreaded)->Threads(2); 398``` 399 400If the benchmarked code itself uses threads and you want to compare it to 401single-threaded code, you may want to use real-time ("wallclock") measurements 402for latency comparisons: 403 404```c++ 405BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime(); 406``` 407 408Without `UseRealTime`, CPU time is used by default. 409 410 411## Manual timing 412For benchmarking something for which neither CPU time nor real-time are 413correct or accurate enough, completely manual timing is supported using 414the `UseManualTime` function. 415 416When `UseManualTime` is used, the benchmarked code must call 417`SetIterationTime` once per iteration of the benchmark loop to 418report the manually measured time. 419 420An example use case for this is benchmarking GPU execution (e.g. OpenCL 421or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot 422be accurately measured using CPU time or real-time. Instead, they can be 423measured accurately using a dedicated API, and these measurement results 424can be reported back with `SetIterationTime`. 425 426```c++ 427static void BM_ManualTiming(benchmark::State& state) { 428 int microseconds = state.range(0); 429 std::chrono::duration<double, std::micro> sleep_duration { 430 static_cast<double>(microseconds) 431 }; 432 433 for (auto _ : state) { 434 auto start = std::chrono::high_resolution_clock::now(); 435 // Simulate some useful workload with a sleep 436 std::this_thread::sleep_for(sleep_duration); 437 auto end = std::chrono::high_resolution_clock::now(); 438 439 auto elapsed_seconds = 440 std::chrono::duration_cast<std::chrono::duration<double>>( 441 end - start); 442 443 state.SetIterationTime(elapsed_seconds.count()); 444 } 445} 446BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime(); 447``` 448 449### Preventing optimisation 450To prevent a value or expression from being optimized away by the compiler 451the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()` 452functions can be used. 453 454```c++ 455static void BM_test(benchmark::State& state) { 456 for (auto _ : state) { 457 int x = 0; 458 for (int i=0; i < 64; ++i) { 459 benchmark::DoNotOptimize(x += i); 460 } 461 } 462} 463``` 464 465`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either 466memory or a register. For GNU based compilers it acts as read/write barrier 467for global memory. More specifically it forces the compiler to flush pending 468writes to memory and reload any other values as necessary. 469 470Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>` 471in any way. `<expr>` may even be removed entirely when the result is already 472known. For example: 473 474```c++ 475 /* Example 1: `<expr>` is removed entirely. */ 476 int foo(int x) { return x + 42; } 477 while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42); 478 479 /* Example 2: Result of '<expr>' is only reused */ 480 int bar(int) __attribute__((const)); 481 while (...) DoNotOptimize(bar(0)); // Optimized to: 482 // int __result__ = bar(0); 483 // while (...) DoNotOptimize(__result__); 484``` 485 486The second tool for preventing optimizations is `ClobberMemory()`. In essence 487`ClobberMemory()` forces the compiler to perform all pending writes to global 488memory. Memory managed by block scope objects must be "escaped" using 489`DoNotOptimize(...)` before it can be clobbered. In the below example 490`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized 491away. 492 493```c++ 494static void BM_vector_push_back(benchmark::State& state) { 495 for (auto _ : state) { 496 std::vector<int> v; 497 v.reserve(1); 498 benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered. 499 v.push_back(42); 500 benchmark::ClobberMemory(); // Force 42 to be written to memory. 501 } 502} 503``` 504 505Note that `ClobberMemory()` is only available for GNU or MSVC based compilers. 506 507### Set time unit manually 508If a benchmark runs a few milliseconds it may be hard to visually compare the 509measured times, since the output data is given in nanoseconds per default. In 510order to manually set the time unit, you can specify it manually: 511 512```c++ 513BENCHMARK(BM_test)->Unit(benchmark::kMillisecond); 514``` 515 516## Controlling number of iterations 517In all cases, the number of iterations for which the benchmark is run is 518governed by the amount of time the benchmark takes. Concretely, the number of 519iterations is at least one, not more than 1e9, until CPU time is greater than 520the minimum time, or the wallclock time is 5x minimum time. The minimum time is 521set as a flag `--benchmark_min_time` or per benchmark by calling `MinTime` on 522the registered benchmark object. 523 524## Reporting the mean, median and standard deviation by repeated benchmarks 525By default each benchmark is run once and that single result is reported. 526However benchmarks are often noisy and a single result may not be representative 527of the overall behavior. For this reason it's possible to repeatedly rerun the 528benchmark. 529 530The number of runs of each benchmark is specified globally by the 531`--benchmark_repetitions` flag or on a per benchmark basis by calling 532`Repetitions` on the registered benchmark object. When a benchmark is run more 533than once the mean, median and standard deviation of the runs will be reported. 534 535Additionally the `--benchmark_report_aggregates_only={true|false}` flag or 536`ReportAggregatesOnly(bool)` function can be used to change how repeated tests 537are reported. By default the result of each repeated run is reported. When this 538option is `true` only the mean, median and standard deviation of the runs is reported. 539Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides 540the value of the flag for that benchmark. 541 542## User-defined statistics for repeated benchmarks 543While having mean, median and standard deviation is nice, this may not be 544enough for everyone. For example you may want to know what is the largest 545observation, e.g. because you have some real-time constraints. This is easy. 546The following code will specify a custom statistic to be calculated, defined 547by a lambda function. 548 549```c++ 550void BM_spin_empty(benchmark::State& state) { 551 for (auto _ : state) { 552 for (int x = 0; x < state.range(0); ++x) { 553 benchmark::DoNotOptimize(x); 554 } 555 } 556} 557 558BENCHMARK(BM_spin_empty) 559 ->ComputeStatistics("max", [](const std::vector<double>& v) -> double { 560 return *(std::max_element(std::begin(v), std::end(v))); 561 }) 562 ->Arg(512); 563``` 564 565## Fixtures 566Fixture tests are created by 567first defining a type that derives from `::benchmark::Fixture` and then 568creating/registering the tests using the following macros: 569 570* `BENCHMARK_F(ClassName, Method)` 571* `BENCHMARK_DEFINE_F(ClassName, Method)` 572* `BENCHMARK_REGISTER_F(ClassName, Method)` 573 574For Example: 575 576```c++ 577class MyFixture : public benchmark::Fixture {}; 578 579BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) { 580 for (auto _ : st) { 581 ... 582 } 583} 584 585BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) { 586 for (auto _ : st) { 587 ... 588 } 589} 590/* BarTest is NOT registered */ 591BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2); 592/* BarTest is now registered */ 593``` 594 595### Templated fixtures 596Also you can create templated fixture by using the following macros: 597 598* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)` 599* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)` 600 601For example: 602```c++ 603template<typename T> 604class MyFixture : public benchmark::Fixture {}; 605 606BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) { 607 for (auto _ : st) { 608 ... 609 } 610} 611 612BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) { 613 for (auto _ : st) { 614 ... 615 } 616} 617 618BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2); 619``` 620 621## User-defined counters 622 623You can add your own counters with user-defined names. The example below 624will add columns "Foo", "Bar" and "Baz" in its output: 625 626```c++ 627static void UserCountersExample1(benchmark::State& state) { 628 double numFoos = 0, numBars = 0, numBazs = 0; 629 for (auto _ : state) { 630 // ... count Foo,Bar,Baz events 631 } 632 state.counters["Foo"] = numFoos; 633 state.counters["Bar"] = numBars; 634 state.counters["Baz"] = numBazs; 635} 636``` 637 638The `state.counters` object is a `std::map` with `std::string` keys 639and `Counter` values. The latter is a `double`-like class, via an implicit 640conversion to `double&`. Thus you can use all of the standard arithmetic 641assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter. 642 643In multithreaded benchmarks, each counter is set on the calling thread only. 644When the benchmark finishes, the counters from each thread will be summed; 645the resulting sum is the value which will be shown for the benchmark. 646 647The `Counter` constructor accepts two parameters: the value as a `double` 648and a bit flag which allows you to show counters as rates and/or as 649per-thread averages: 650 651```c++ 652 // sets a simple counter 653 state.counters["Foo"] = numFoos; 654 655 // Set the counter as a rate. It will be presented divided 656 // by the duration of the benchmark. 657 state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate); 658 659 // Set the counter as a thread-average quantity. It will 660 // be presented divided by the number of threads. 661 state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads); 662 663 // There's also a combined flag: 664 state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate); 665``` 666 667When you're compiling in C++11 mode or later you can use `insert()` with 668`std::initializer_list`: 669 670```c++ 671 // With C++11, this can be done: 672 state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}}); 673 // ... instead of: 674 state.counters["Foo"] = numFoos; 675 state.counters["Bar"] = numBars; 676 state.counters["Baz"] = numBazs; 677``` 678 679### Counter reporting 680 681When using the console reporter, by default, user counters are are printed at 682the end after the table, the same way as ``bytes_processed`` and 683``items_processed``. This is best for cases in which there are few counters, 684or where there are only a couple of lines per benchmark. Here's an example of 685the default output: 686 687``` 688------------------------------------------------------------------------------ 689Benchmark Time CPU Iterations UserCounters... 690------------------------------------------------------------------------------ 691BM_UserCounter/threads:8 2248 ns 10277 ns 68808 Bar=16 Bat=40 Baz=24 Foo=8 692BM_UserCounter/threads:1 9797 ns 9788 ns 71523 Bar=2 Bat=5 Baz=3 Foo=1024m 693BM_UserCounter/threads:2 4924 ns 9842 ns 71036 Bar=4 Bat=10 Baz=6 Foo=2 694BM_UserCounter/threads:4 2589 ns 10284 ns 68012 Bar=8 Bat=20 Baz=12 Foo=4 695BM_UserCounter/threads:8 2212 ns 10287 ns 68040 Bar=16 Bat=40 Baz=24 Foo=8 696BM_UserCounter/threads:16 1782 ns 10278 ns 68144 Bar=32 Bat=80 Baz=48 Foo=16 697BM_UserCounter/threads:32 1291 ns 10296 ns 68256 Bar=64 Bat=160 Baz=96 Foo=32 698BM_UserCounter/threads:4 2615 ns 10307 ns 68040 Bar=8 Bat=20 Baz=12 Foo=4 699BM_Factorial 26 ns 26 ns 26608979 40320 700BM_Factorial/real_time 26 ns 26 ns 26587936 40320 701BM_CalculatePiRange/1 16 ns 16 ns 45704255 0 702BM_CalculatePiRange/8 73 ns 73 ns 9520927 3.28374 703BM_CalculatePiRange/64 609 ns 609 ns 1140647 3.15746 704BM_CalculatePiRange/512 4900 ns 4901 ns 142696 3.14355 705``` 706 707If this doesn't suit you, you can print each counter as a table column by 708passing the flag `--benchmark_counters_tabular=true` to the benchmark 709application. This is best for cases in which there are a lot of counters, or 710a lot of lines per individual benchmark. Note that this will trigger a 711reprinting of the table header any time the counter set changes between 712individual benchmarks. Here's an example of corresponding output when 713`--benchmark_counters_tabular=true` is passed: 714 715``` 716--------------------------------------------------------------------------------------- 717Benchmark Time CPU Iterations Bar Bat Baz Foo 718--------------------------------------------------------------------------------------- 719BM_UserCounter/threads:8 2198 ns 9953 ns 70688 16 40 24 8 720BM_UserCounter/threads:1 9504 ns 9504 ns 73787 2 5 3 1 721BM_UserCounter/threads:2 4775 ns 9550 ns 72606 4 10 6 2 722BM_UserCounter/threads:4 2508 ns 9951 ns 70332 8 20 12 4 723BM_UserCounter/threads:8 2055 ns 9933 ns 70344 16 40 24 8 724BM_UserCounter/threads:16 1610 ns 9946 ns 70720 32 80 48 16 725BM_UserCounter/threads:32 1192 ns 9948 ns 70496 64 160 96 32 726BM_UserCounter/threads:4 2506 ns 9949 ns 70332 8 20 12 4 727-------------------------------------------------------------- 728Benchmark Time CPU Iterations 729-------------------------------------------------------------- 730BM_Factorial 26 ns 26 ns 26392245 40320 731BM_Factorial/real_time 26 ns 26 ns 26494107 40320 732BM_CalculatePiRange/1 15 ns 15 ns 45571597 0 733BM_CalculatePiRange/8 74 ns 74 ns 9450212 3.28374 734BM_CalculatePiRange/64 595 ns 595 ns 1173901 3.15746 735BM_CalculatePiRange/512 4752 ns 4752 ns 147380 3.14355 736BM_CalculatePiRange/4k 37970 ns 37972 ns 18453 3.14184 737BM_CalculatePiRange/32k 303733 ns 303744 ns 2305 3.14162 738BM_CalculatePiRange/256k 2434095 ns 2434186 ns 288 3.1416 739BM_CalculatePiRange/1024k 9721140 ns 9721413 ns 71 3.14159 740BM_CalculatePi/threads:8 2255 ns 9943 ns 70936 741``` 742Note above the additional header printed when the benchmark changes from 743``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does 744not have the same counter set as ``BM_UserCounter``. 745 746## Exiting Benchmarks in Error 747 748When errors caused by external influences, such as file I/O and network 749communication, occur within a benchmark the 750`State::SkipWithError(const char* msg)` function can be used to skip that run 751of benchmark and report the error. Note that only future iterations of the 752`KeepRunning()` are skipped. For the ranged-for version of the benchmark loop 753Users must explicitly exit the loop, otherwise all iterations will be performed. 754Users may explicitly return to exit the benchmark immediately. 755 756The `SkipWithError(...)` function may be used at any point within the benchmark, 757including before and after the benchmark loop. 758 759For example: 760 761```c++ 762static void BM_test(benchmark::State& state) { 763 auto resource = GetResource(); 764 if (!resource.good()) { 765 state.SkipWithError("Resource is not good!"); 766 // KeepRunning() loop will not be entered. 767 } 768 for (state.KeepRunning()) { 769 auto data = resource.read_data(); 770 if (!resource.good()) { 771 state.SkipWithError("Failed to read data!"); 772 break; // Needed to skip the rest of the iteration. 773 } 774 do_stuff(data); 775 } 776} 777 778static void BM_test_ranged_fo(benchmark::State & state) { 779 state.SkipWithError("test will not be entered"); 780 for (auto _ : state) { 781 state.SkipWithError("Failed!"); 782 break; // REQUIRED to prevent all further iterations. 783 } 784} 785``` 786 787## Running a subset of the benchmarks 788 789The `--benchmark_filter=<regex>` option can be used to only run the benchmarks 790which match the specified `<regex>`. For example: 791 792```bash 793$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32 794Run on (1 X 2300 MHz CPU ) 7952016-06-25 19:34:24 796Benchmark Time CPU Iterations 797---------------------------------------------------- 798BM_memcpy/32 11 ns 11 ns 79545455 799BM_memcpy/32k 2181 ns 2185 ns 324074 800BM_memcpy/32 12 ns 12 ns 54687500 801BM_memcpy/32k 1834 ns 1837 ns 357143 802``` 803 804 805## Output Formats 806The library supports multiple output formats. Use the 807`--benchmark_format=<console|json|csv>` flag to set the format type. `console` 808is the default format. 809 810The Console format is intended to be a human readable format. By default 811the format generates color output. Context is output on stderr and the 812tabular data on stdout. Example tabular output looks like: 813``` 814Benchmark Time(ns) CPU(ns) Iterations 815---------------------------------------------------------------------- 816BM_SetInsert/1024/1 28928 29349 23853 133.097kB/s 33.2742k items/s 817BM_SetInsert/1024/8 32065 32913 21375 949.487kB/s 237.372k items/s 818BM_SetInsert/1024/10 33157 33648 21431 1.13369MB/s 290.225k items/s 819``` 820 821The JSON format outputs human readable json split into two top level attributes. 822The `context` attribute contains information about the run in general, including 823information about the CPU and the date. 824The `benchmarks` attribute contains a list of ever benchmark run. Example json 825output looks like: 826```json 827{ 828 "context": { 829 "date": "2015/03/17-18:40:25", 830 "num_cpus": 40, 831 "mhz_per_cpu": 2801, 832 "cpu_scaling_enabled": false, 833 "build_type": "debug" 834 }, 835 "benchmarks": [ 836 { 837 "name": "BM_SetInsert/1024/1", 838 "iterations": 94877, 839 "real_time": 29275, 840 "cpu_time": 29836, 841 "bytes_per_second": 134066, 842 "items_per_second": 33516 843 }, 844 { 845 "name": "BM_SetInsert/1024/8", 846 "iterations": 21609, 847 "real_time": 32317, 848 "cpu_time": 32429, 849 "bytes_per_second": 986770, 850 "items_per_second": 246693 851 }, 852 { 853 "name": "BM_SetInsert/1024/10", 854 "iterations": 21393, 855 "real_time": 32724, 856 "cpu_time": 33355, 857 "bytes_per_second": 1199226, 858 "items_per_second": 299807 859 } 860 ] 861} 862``` 863 864The CSV format outputs comma-separated values. The `context` is output on stderr 865and the CSV itself on stdout. Example CSV output looks like: 866``` 867name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label 868"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942, 869"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115, 870"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06, 871``` 872 873## Output Files 874The library supports writing the output of the benchmark to a file specified 875by `--benchmark_out=<filename>`. The format of the output can be specified 876using `--benchmark_out_format={json|console|csv}`. Specifying 877`--benchmark_out` does not suppress the console output. 878 879## Debug vs Release 880By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use: 881 882``` 883cmake -DCMAKE_BUILD_TYPE=Release 884``` 885 886To enable link-time optimisation, use 887 888``` 889cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true 890``` 891 892If you are using gcc, you might need to set `GCC_AR` and `GCC_RANLIB` cmake cache variables, if autodetection fails. 893If you are using clang, you may need to set `LLVMAR_EXECUTABLE`, `LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` cmake cache variables. 894 895## Linking against the library 896When using gcc, it is necessary to link against pthread to avoid runtime exceptions. 897This is due to how gcc implements std::thread. 898See [issue #67](https://github.com/google/benchmark/issues/67) for more details. 899 900## Compiler Support 901 902Google Benchmark uses C++11 when building the library. As such we require 903a modern C++ toolchain, both compiler and standard library. 904 905The following minimum versions are strongly recommended build the library: 906 907* GCC 4.8 908* Clang 3.4 909* Visual Studio 2013 910* Intel 2015 Update 1 911 912Anything older *may* work. 913 914Note: Using the library and its headers in C++03 is supported. C++11 is only 915required to build the library. 916 917## Disable CPU frequency scaling 918If you see this error: 919``` 920***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. 921``` 922you might want to disable the CPU frequency scaling while running the benchmark: 923```bash 924sudo cpupower frequency-set --governor performance 925./mybench 926sudo cpupower frequency-set --governor powersave 927``` 928 929# Known Issues 930 931### Windows 932 933* Users must manually link `shlwapi.lib`. Failure to do so may result 934in unresolved symbols. 935 936