
README.LLVM

LLVM notes
----------

This directory contains the Google Benchmark source code with some unnecessary
files removed. Note that this directory is under a different license than
libc++.

README.md

# benchmark
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)
[![slackin](https://slackin-iqtfqnpzxd.now.sh/badge.svg)](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit-tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

[Additional Tooling Documentation](docs/tools.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires GTest as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires GTest to build and run the tests. This
dependency can be provided in three ways:

* Check out the GTest sources into `benchmark/googletest`.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.
* Otherwise, if nothing is done, CMake will use `find_package(GTest REQUIRED)`
  to resolve the required GTest dependency.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.


## Installation Guide

For Ubuntu and Debian based systems:

First, make sure you have git and cmake installed (if not, please install them):

```
sudo apt-get install git
sudo apt-get install cmake
```

Now, let's clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

We need to install the library globally now:

```
sudo make install
```

Now you have google/benchmark installed on your machine.
Note: don't forget to link against the pthread library while building.

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes being
made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.


## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to tell your linker to link against the benchmark library, e.g. via the `-lbenchmark` compiler flag.

The benchmark library will report the timing for the code within the `for (...)` loop.

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in powers of eight, so
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
Now the generated arguments are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, the asymptotic complexity can also be
deduced automatically from the measurements.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies the asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way: this example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput
in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```
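
For example, `BENCHMARK_TEMPLATE2` instantiates a benchmark over two template
parameters. The following sketch uses a hypothetical `BM_Fill` benchmark (not
part of the library) purely to show the macro's shape:

```c++
// Hypothetical example: fill a container with default-constructed values.
template <class Container, class Value>
void BM_Fill(benchmark::State& state) {
  for (auto _ : state) {
    Container c;
    for (int i = 0; i < state.range(0); ++i)
      c.push_back(Value());
    // Keep the container's contents from being optimized away.
    benchmark::DoNotOptimize(c.data());
  }
}
// Instantiates BM_Fill<std::vector<int>, int> and registers it.
BENCHMARK_TEMPLATE2(BM_Fill, std::vector<int>, int)->Arg(1<<10);
```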

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The reason the range-based for loop is faster than using `KeepRunning` is
that `KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the range-based variant is able to keep the iteration
count in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the range-based variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows for benchmark
tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.


## Manual timing
For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, it can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end   = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i = 0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of `<expr>` is only reused. */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To set the time unit manually, specify it on the benchmark:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one and at most 1e9 iterations, increasing the iteration count
until the CPU time exceeds the minimum time, or the wallclock time exceeds
5x the minimum time. The minimum time is set with the `--benchmark_min_time`
flag, or per benchmark by calling `MinTime` on the registered benchmark object.
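
For example, the per-benchmark form looks like this (a sketch, assuming a
`BM_test` benchmark defined as in the earlier examples):

```c++
// Accumulate at least 2 seconds of CPU time per benchmark
// instead of the default minimum time.
BENCHMARK(BM_test)->MinTime(2.0);
```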

## Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally, the `--benchmark_report_aggregates_only={true|false}` flag or
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are
reported. Calling `ReportAggregatesOnly(bool)` on a registered benchmark object
overrides the value of the flag for that benchmark.
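
For example, both settings can be combined on the registered benchmark object
(again a sketch, assuming a `BM_test` benchmark as defined earlier):

```c++
// Run the benchmark 10 times and report only the mean, median and
// standard deviation across those 10 runs.
BENCHMARK(BM_test)->Repetitions(10)->ReportAggregatesOnly(true);
```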

## User-defined statistics for repeated benchmarks
While having the mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know the largest observation,
e.g. because you have some real-time constraints. This is easy.
The following code specifies a custom statistic to be calculated, defined
by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" to its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo, Bar and Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.
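
For example, the following sketch (a hypothetical benchmark, not one of the
library's examples) reports `Workers=4`, since each of the four threads
contributes 1 to the counter:

```c++
static void BM_CounterSum(benchmark::State& state) {
  for (auto _ : state) {
    // ... per-thread work ...
  }
  // Set on each thread; the reported value is the sum across all threads.
  state.counters["Workers"] = 1;
}
BENCHMARK(BM_CounterSum)->Threads(4);
```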

The `Counter` constructor accepts two parameters: the value as a `double`
and a bit flag which allows you to show counters as rates and/or as
per-thread averages:

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```
Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. For the range-based for version of the
benchmark loop, users must explicitly exit the loop, otherwise all iterations
will be performed. Users may explicitly return to exit the benchmark
immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
      state.SkipWithError("Resource is not good!");
      // KeepRunning() loop will not be entered.
  }
  for (state.KeepRunning()) {
      auto data = resource.read_data();
      if (!resource.good()) {
        state.SkipWithError("Failed to read data!");
        break; // Needed to skip the rest of the iteration.
      }
      do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State & state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
which match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```



## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc and autodetection fails, you might need to set the
`GCC_AR` and `GCC_RANLIB` CMake cache variables.
If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` CMake cache variables.

## Linking against the library
When using gcc, it is necessary to link against pthread to avoid runtime
exceptions. This is due to how gcc implements std::thread.
See [issue #67](https://github.com/google/benchmark/issues/67) for more details.

## Compiler Support

Google Benchmark uses C++11 when building the library. As such, we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this error:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable the CPU frequency scaling while running the benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

# Known Issues

### Windows

* Users must manually link `shlwapi.lib`. Failure to do so may result
  in unresolved symbols.