# benchmark
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit-tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

[Additional Tooling Documentation](docs/tools.md)

## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires GTest as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires GTest to build and run the tests. This
dependency can be provided in three ways:

* Check out the GTest sources into `benchmark/googletest`.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.
* Otherwise, if nothing is done, CMake will use `find_package(GTest REQUIRED)`
  to resolve the required GTest dependency.

## Installation Guide

For Ubuntu and Debian-based systems:

First, make sure you have git and cmake installed. If not, install them:

```
sudo apt-get install git
sudo apt-get install cmake
```

Now clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

Then install the library globally:

```
sudo make install
```

google/benchmark is now installed on your machine.

Note: Don't forget to link against the pthread library when building your benchmarks.

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library.
Its API can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
#include <benchmark/benchmark.h>
#include <string>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to tell your linker to link against the benchmark library, e.g. by passing the `-lbenchmark` flag.

The benchmark library will report the timing for the code within the `for(...)` loop.

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated with a range multiplier of
eight, and the command above selects [ 8, 64, 512, 4k, 8k ]. In the following
code the range multiplier is changed to two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
The generated arguments are now [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

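The two argument lists above can be reproduced with a small standalone sketch. `RangeArguments` is a hypothetical helper, not part of the library, and the rule it encodes (the lower bound, then every power of the multiplier strictly between the bounds, then the upper bound) is an assumption chosen because it matches both lists; the library's actual selection logic may differ in edge cases.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): lists the arguments that
// Range(lo, hi) is assumed to generate for a given multiplier.
// Assumes 1 <= lo <= hi, multiplier >= 2, and values small enough that
// multiplying by the multiplier does not overflow.
std::vector<int64_t> RangeArguments(int64_t lo, int64_t hi,
                                    int64_t multiplier) {
  std::vector<int64_t> args;
  args.push_back(lo);
  // Walk the powers of the multiplier, keeping those strictly between
  // lo and hi.
  for (int64_t power = 1; power < hi; power *= multiplier) {
    if (power > lo) args.push_back(power);
  }
  if (hi != lo) args.push_back(hi);
  return args;
}
```

Under these assumptions, `RangeArguments(8, 8 << 10, 8)` yields 8, 64, 512, 4096 and 8192, and a multiplier of 2 yields the eleven values listed above.
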
You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

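The "product" of per-input value lists can be pictured with the following sketch. `ArgumentProduct` is a hypothetical helper written for illustration only; it takes the value lists explicitly and says nothing about which values `Ranges` itself selects from each range.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper: builds every combination that takes exactly one
// value from each dimension, in the order the dimensions are given.
std::vector<std::vector<int64_t>> ArgumentProduct(
    const std::vector<std::vector<int64_t>>& dimensions) {
  // Start with a single empty combination and extend it one dimension
  // at a time.
  std::vector<std::vector<int64_t>> result(1);
  for (const auto& values : dimensions) {
    std::vector<std::vector<int64_t>> extended;
    for (const auto& prefix : result) {
      for (int64_t value : values) {
        std::vector<int64_t> combination = prefix;
        combination.push_back(value);
        extended.push_back(combination);
      }
    }
    result = extended;
  }
  return result;
}
```

Calling it with the lists `{1<<10, 2<<10, 4<<10, 8<<10}` and `{128, 512}` produces the same eight pairs as the explicit `Args` chain above, in a different order.
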
For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

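To make "coefficient for the high-order term" and "normalized root-mean-square error" concrete, here is a minimal sketch of a one-parameter least-squares fit. `ComplexityFit` and `FitComplexity` are hypothetical names, and the exact formulas (in particular normalizing the error by the mean measured time) are assumptions for illustration rather than a description of the library's internals.

```c++
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct ComplexityFit {
  double coefficient;  // multiplier of the high-order term
  double rms;          // root-mean-square error divided by the mean time
};

// Hypothetical helper: fits time ~= coefficient * curve(n) by least squares.
// Assumes n and time have the same non-zero length and positive times.
ComplexityFit FitComplexity(const std::vector<double>& n,
                            const std::vector<double>& time,
                            const std::function<double(double)>& curve) {
  double sum_time_curve = 0.0, sum_curve_sq = 0.0, sum_time = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double f = curve(n[i]);
    sum_time_curve += time[i] * f;
    sum_curve_sq += f * f;
    sum_time += time[i];
  }
  const double coefficient = sum_time_curve / sum_curve_sq;

  // Measure how far the fitted curve is from the observations.
  double squared_error = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double residual = time[i] - coefficient * curve(n[i]);
    squared_error += residual * residual;
  }
  const double mean_time = sum_time / static_cast<double>(n.size());
  const double rms =
      std::sqrt(squared_error / static_cast<double>(n.size())) / mean_time;
  return {coefficient, rms};
}
```

For perfectly linear timings such as 2, 4 and 8 time units at n = 1, 2 and 4, fitting the curve `n` gives a coefficient of 2 and an error of 0.
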
As shown in the following invocation, asymptotic complexity can also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way: This example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput
in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count every iteration,
whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.

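One way to see why the ranged-for form can keep its counter in a register is to look at a stripped-down state type that supports `for (auto _ : state)`. `MiniState` below is a hypothetical stand-in written for illustration; it is not `benchmark::State` and models only the looping, with no timing or thread synchronization.

```c++
#include <cstdint>

// Hypothetical stand-in for benchmark::State; only the loop is modelled.
class MiniState {
 public:
  explicit MiniState(int64_t max_iterations)
      : max_iterations_(max_iterations) {}

  struct Value {};  // what `auto _` binds to; carries no data

  class Iterator {
   public:
    explicit Iterator(int64_t remaining) : remaining_(remaining) {}
    Value operator*() const { return Value(); }
    Iterator& operator++() {
      --remaining_;
      return *this;
    }
    // The loop continues while iterations remain; the argument, which is
    // always end(), is ignored.
    bool operator!=(const Iterator&) const { return remaining_ != 0; }

   private:
    // A copy owned by the loop's own iterator object, so the compiler is
    // free to keep it in a register.
    int64_t remaining_;
  };

  Iterator begin() const { return Iterator(max_iterations_); }
  Iterator end() const { return Iterator(0); }

 private:
  int64_t max_iterations_;
};

// Counts how many times a ranged-for loop over MiniState runs its body.
int64_t CountIterations(int64_t max_iterations) {
  MiniState state(max_iterations);
  int64_t executed = 0;
  for (auto _ : state) {
    (void)_;  // silence unused-variable warnings
    ++executed;
  }
  return executed;
}
```

Because the countdown lives in the iterator rather than in the shared state object, nothing forces it through memory on each pass, which is the property the assembly listings above illustrate.
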
## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

## Manual timing
For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
#include <chrono>
#include <thread>

static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end   = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i=0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of `<expr>` is only reused. */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

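For readers curious how such barriers can be built, the sketch below shows the general technique on GNU-style compilers: an empty inline assembly statement that claims to read a value and to touch memory. `KeepValue` and `ClobberAllMemory` are hypothetical names used here for illustration; the sketch assumes GCC or Clang and is not presented as the library's own definition.

```c++
// Hypothetical illustrations of the barrier technique. GNU-style inline
// assembly is assumed, so this only compiles with GCC or Clang.
template <class T>
inline void KeepValue(T const& value) {
  // The empty asm statement claims to read `value` and to touch memory,
  // so the compiler has to materialize the value and cannot cache memory
  // contents across this point.
  asm volatile("" : : "r,m"(value) : "memory");
}

inline void ClobberAllMemory() {
  // Claims to read and write arbitrary memory, so pending stores to
  // memory visible at this point must be completed first.
  asm volatile("" : : : "memory");
}

// Sums 0..63 while forcing every partial sum to be computed.
int SumWithBarrier() {
  int x = 0;
  for (int i = 0; i < 64; ++i) {
    x += i;
    KeepValue(x);
  }
  ClobberAllMemory();
  return x;
}
```

The barriers emit no instructions of their own; they only restrict what the optimizer may assume, which is why the computed result is unchanged.
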
### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output is given in nanoseconds by default. To
change the time unit, specify it explicitly:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one iteration and at most 1e9 iterations, and keeps adding
iterations until either the CPU time exceeds the minimum time or the wallclock
time reaches 5x the minimum time. The minimum time is set globally with the
`--benchmark_min_time` flag, or per benchmark by calling `MinTime` on
the registered benchmark object.

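The stopping rule described above can be written down as a small predicate. `EnoughIterations` is a hypothetical function, and both its signature and the exact comparisons are assumptions made for illustration; the library's real logic also decides how many iterations to try next, which is not modelled here.

```c++
#include <cstdint>

// Hypothetical predicate mirroring the rule described in the text:
// at least one iteration, at most 1e9, and otherwise stop once the CPU
// time reaches the minimum time or the wallclock time reaches 5x of it.
bool EnoughIterations(int64_t iterations, double cpu_seconds,
                      double real_seconds, double min_time_seconds) {
  if (iterations < 1) return false;           // always run at least once
  if (iterations >= 1000000000) return true;  // hard cap of 1e9 iterations
  if (cpu_seconds >= min_time_seconds) return true;
  return real_seconds >= 5 * min_time_seconds;
}
```

With a minimum time of 0.5 seconds, a run that has used 0.1 seconds of CPU time but 2.5 seconds of wallclock time would stop, while one at 0.1 and 0.2 seconds would continue.
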
## Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once the mean, median and standard deviation of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}` flag or
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are
reported. Calling `ReportAggregatesOnly(bool)` on a registered benchmark object
overrides the value of the flag for that benchmark.

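As a reminder of what the three aggregates mean for a set of repeated runs, here is a standalone sketch. `Mean`, `Median` and `SampleStdDev` are hypothetical helpers, and the use of the sample (n - 1) form of the standard deviation is an assumption; the text above does not say which form the library reports.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers; each assumes a non-empty vector of run times.
double Mean(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum / static_cast<double>(v.size());
}

double Median(std::vector<double> v) {  // taken by value: sorted locally
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  // For an even count, average the two middle elements.
  return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

double SampleStdDev(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;  // undefined for one sample; report zero
  const double mean = Mean(v);
  double squared = 0.0;
  for (double x : v) squared += (x - mean) * (x - mean);
  return std::sqrt(squared / static_cast<double>(v.size() - 1));
}
```

For three runs measuring 1, 2 and 3 time units, all three helpers agree on a mean of 2, a median of 2 and a standard deviation of 1.
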
## User-defined statistics for repeated benchmarks
While having mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know the largest
observation, e.g. because you have some real-time constraints.
The following code specifies a custom statistic to be calculated, defined
by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts two parameters: the value as a `double`
and a bit flag which allows you to show counters as rates and/or as
per-thread averages:

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```

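The effect of the flags on the displayed number can be sketched with plain arithmetic. `CounterSample` and `PresentedValue` are hypothetical names for illustration, and which duration the library divides by (CPU or wallclock time) is not specified in the text above, so the `elapsed_seconds` parameter is an assumption.

```c++
// Hypothetical model of a counter after the per-thread values were summed.
struct CounterSample {
  double value;      // summed counter value
  bool is_rate;      // corresponds to the "rate" flag
  bool avg_threads;  // corresponds to the "thread average" flag
};

// Returns the number that would be shown under the rules described above:
// divide by the duration for rates, by the thread count for averages,
// and by both when both flags are set.
double PresentedValue(const CounterSample& counter, double elapsed_seconds,
                      int threads) {
  double shown = counter.value;
  if (counter.is_rate) shown /= elapsed_seconds;
  if (counter.avg_threads) shown /= static_cast<double>(threads);
  return shown;
}
```

For a summed value of 100 over 2 seconds on 4 threads, this gives 100 for a plain counter, 50 as a rate, 25 as a thread average, and 12.5 with both flags.
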
When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```
Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. For the ranged-for version of the benchmark
loop, users must explicitly exit the loop, otherwise all iterations will be
performed. Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
which match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

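The sample output above lists both `BM_memcpy/32` and `BM_memcpy/32k`, which suggests that the pattern only needs to match somewhere in the benchmark name. The sketch below illustrates that behaviour with `std::regex_search`; `FilterBenchmarks` is a hypothetical helper, and the choice of the standard library's default regex grammar is an assumption, since the library may use a different regex engine.

```c++
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keeps the names in which the pattern matches
// anywhere, rather than requiring the whole name to match.
std::vector<std::string> FilterBenchmarks(
    const std::vector<std::string>& names, const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const auto& name : names) {
    if (std::regex_search(name, re)) selected.push_back(name);
  }
  return selected;
}
```

Under this model, anchoring the pattern (for example `BM_memcpy/32$`) is one way to select `BM_memcpy/32` without also selecting `BM_memcpy/32k`.
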
## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human readable JSON split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc, you might need to set the `GCC_AR` and `GCC_RANLIB` cmake cache variables if autodetection fails.
If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`, `LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` cmake cache variables.

## Linking against the library
When using gcc, it is necessary to link against pthread to avoid runtime exceptions.
This is due to how gcc implements `std::thread`.
See [issue #67](https://github.com/google/benchmark/issues/67) for more details.

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this warning:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable CPU frequency scaling while running the benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

# Known Issues

### Windows

* Users must manually link `shlwapi.lib`. Failure to do so may result
in unresolved symbols.
