# Benchmark

[Build Status](https://travis-ci.org/google/benchmark)
[Build status](https://ci.appveyor.com/project/google/benchmark/branch/master)
[Coverage Status](https://coveralls.io/r/google/benchmark)

A library to benchmark code snippets, similar to unit tests. Example:

```c++
#include <benchmark/benchmark.h>

static void BM_SomeFunction(benchmark::State& state) {
  // Perform setup here
  for (auto _ : state) {
    // This code gets timed
    SomeFunction();
  }
}
// Register the function as a benchmark
BENCHMARK(BM_SomeFunction);
// Run the benchmark
BENCHMARK_MAIN();
```

To get started, see [Requirements](#requirements) and
[Installation](#installation). See [Usage](#usage) for a full example and the
[User Guide](#user-guide) for a more comprehensive feature overview.

It may also help to read the [Google Test documentation](https://github.com/google/googletest/blob/master/googletest/docs/primer.md),
as some of the structural aspects of the APIs are similar.

### Resources

[Discussion group](https://groups.google.com/d/forum/benchmark-discuss)

IRC channel: [freenode](https://freenode.net) #googlebenchmark

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)

## Requirements

The library can be used with C++03. However, it requires C++11 to build,
including compiler and standard library support.

The following minimum versions are required to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 14 2015
* Intel 2015 Update 1

See [Platform-Specific Build Instructions](#platform-specific-build-instructions).

## Installation

This describes the installation process using CMake. As prerequisites, you'll
need git and cmake installed.

_See [dependencies.md](dependencies.md) for more details regarding supported
versions of build tools._

```bash
# Check out the library.
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
# Go to the library root directory
$ cd benchmark
# Make a build directory to place the build output.
$ cmake -E make_directory "build"
# Generate build system files with cmake.
$ cmake -E chdir "build" cmake -DCMAKE_BUILD_TYPE=Release ../
# or, starting with CMake 3.13, use a simpler form:
# cmake -DCMAKE_BUILD_TYPE=Release -S . -B "build"
# Build the library.
$ cmake --build "build" --config Release
```

This builds the `benchmark` and `benchmark_main` libraries and tests.
On a Unix system, the build directory should now look something like this:

```
/benchmark
  /build
    /src
      /libbenchmark.a
      /libbenchmark_main.a
    /test
      ...
```

Next, you can run the tests to check the build.

```bash
$ cmake -E chdir "build" ctest --build-config Release
```

If you want to install the library globally, also run:

```
sudo cmake --build "build" --config Release --target install
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided two ways:

* Check out the Google Test sources into `benchmark/googletest` as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.

### Debug vs Release

By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, add
`-DCMAKE_BUILD_TYPE=Release` when generating the build system files, as shown
above. The use of `--config Release` in build commands is needed to properly
support multi-configuration tools (like Visual Studio) and can be
skipped for other build systems (like Makefile).

To enable link-time optimisation, also add `-DBENCHMARK_ENABLE_LTO=true` when
generating the build system files.

If you are using gcc, you might need to set the `GCC_AR` and `GCC_RANLIB` cmake
cache variables if autodetection fails.

If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` cmake cache variables.

### Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Usage

### Basic usage

Define a function that executes the code to measure, register it as a benchmark
function using the `BENCHMARK` macro, and ensure an appropriate `main` function
is available:

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

To run the benchmark, compile and link against the `benchmark` library
(libbenchmark.a/.so). If you followed the build steps above, this library will
be under the build directory you created.

```bash
# Example on linux after running the build steps above. Assumes the
# `benchmark` and `build` directories are under the current directory.
$ g++ mybenchmark.cc -std=c++11 -isystem benchmark/include \
  -Lbenchmark/build/src -lbenchmark -lpthread -o mybenchmark
```

Alternatively, link against the `benchmark_main` library and remove
`BENCHMARK_MAIN();` above to get the same behavior.

The compiled executable will run all benchmarks by default. Pass the `--help`
flag for option information or see the guide below.

### Usage with CMake

If using CMake, it is recommended to link against the project-provided
`benchmark::benchmark` and `benchmark::benchmark_main` targets using
`target_link_libraries`.
It is possible to use `find_package` to import an installed version of the
library.
```cmake
find_package(benchmark REQUIRED)
```
Alternatively, `add_subdirectory` will incorporate the library directly into
your CMake project.
```cmake
add_subdirectory(benchmark)
```
Either way, link to the library as follows.
```cmake
target_link_libraries(MyTarget benchmark::benchmark)
```

## Platform Specific Build Instructions

### Building with GCC

When the library is built using GCC it is necessary to link with the pthread
library due to how GCC implements `std::thread`. Failing to link to pthread will
lead to runtime exceptions (unless you're using libc++), not linker errors. See
[issue #67](https://github.com/google/benchmark/issues/67) for more details. You
can link to pthread by adding `-pthread` to your linker command. Note that you can
also use `-lpthread`, but there are potential issues with the ordering of command
line parameters if you use that.

### Building with Visual Studio 2015 or 2017

The `shlwapi` library (`-lshlwapi`) is required to support a call to `CPUInfo` which reads the registry. Either add `shlwapi.lib` under `[ Configuration Properties > Linker > Input ]`, or use the following:

```
// Alternatively, can add libraries using linker options.
#ifdef _WIN32
#pragma comment ( lib, "Shlwapi.lib" )
#ifdef _DEBUG
#pragma comment ( lib, "benchmarkd.lib" )
#else
#pragma comment ( lib, "benchmark.lib" )
#endif
#endif
```

You can also use the graphical version of CMake:
* Open `CMake GUI`.
* Under `Where to build the binaries`, use the same path as the source plus `build`.
* Under `CMAKE_INSTALL_PREFIX`, use the same path as the source plus `install`.
* Click `Configure`, `Generate`, `Open Project`.
* If the build fails, try deleting the entire directory and starting again, or unticking options to build less.

### Building with Intel 2015 Update 1 or Intel System Studio Update 4

See the instructions for building with Visual Studio. Once built, right click on the solution and change the build to Intel.

### Building on Solaris

If you're running benchmarks on Solaris, you'll want the kstat library linked in
too (`-lkstat`).

## User Guide

### Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

### Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Passing Arguments](#passing-arguments)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster KeepRunning Loop](#a-faster-keep-running-loop)

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)


<a name="output-formats" />

### Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The console format is intended to be human readable. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout.
Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

### Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that specifying
`--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

### Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmark binary with the `--help` switch.

<a name="running-a-subset-of-benchmarks" />

### Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or the `BENCHMARK_FILTER=<regex>`
environment variable) can be used to run only the benchmarks that match
the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

<a name="result-comparison" />

### Result Comparison

It is possible to compare benchmarking results.
See the [Additional Tooling Documentation](docs/tools.md).

<a name="runtime-and-reporting-considerations" />

### Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one and not more than 1e9, and iterations are run until
the CPU time is greater than the minimum time, or the wall-clock time is 5x the
minimum time. The minimum time is set per benchmark by calling `MinTime` on the
registered benchmark object.

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.
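
Both knobs can be set where the benchmark is registered. The following is a
minimal sketch; the `BM_Compress` function is a hypothetical placeholder, while
`MinTime` and `Repetitions` are the builder methods referred to above.

```c++
#include <benchmark/benchmark.h>

// Hypothetical benchmark, used only to illustrate registration options.
static void BM_Compress(benchmark::State& state) {
  for (auto _ : state) {
    // ... code under test ...
  }
}
// Iterate until at least 2 seconds of CPU time have accumulated, and
// repeat the whole measurement 10 times so that aggregates are reported.
BENCHMARK(BM_Compress)->MinTime(2.0)->Repetitions(10);
```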

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="passing-arguments" />

### Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now the arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for(auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    benchmark::DoNotOptimize(v.data());
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now the arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`.
In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

#### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

### Calculating Asymptotic Complexity (Big O)

Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, the asymptotic complexity can also be
deduced automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies the asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });
```

<a name="templated-benchmarks" />

### Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)` `range_x`
times.
It also outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="fixtures" />

### Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(const ::benchmark::State& state) {
  }

  void TearDown(const ::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

#### Templated Fixtures

You can also create templated fixtures using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

<a name="custom-counters" />

### Custom Counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.
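
As a minimal sketch of that summing behavior (the `BM_ThreadedWork` name and
its body are hypothetical placeholders, not part of the library):

```c++
static void BM_ThreadedWork(benchmark::State& state) {
  for (auto _ : state) {
    // ... per-thread work ...
  }
  // Each of the 4 threads sets "Foo" to 1 on its own copy of the counter;
  // the per-thread values are then summed, so the report shows Foo=4.
  state.counters["Foo"] = 1;
}
BENCHMARK(BM_ThreadedWork)->Threads(4);
```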

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants,
and/or finally inverting the result; and a flag specifying the 'unit', i.e.
whether 1k is 1000 (the default, `benchmark::Counter::OneK::kIs1000`) or 1024
(`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds does it take to process one 'foo'?
  state.counters["FooInvRate"] = Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

#### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

<a name="multithreaded-benchmarks"/>

### Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the
thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

### CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on internal implementation details, this
// may measure anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in the single-threaded case, in general
// decreasing with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure no less than the
// time spent by the main thread in the single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```

#### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes it is necessary to do some work inside the
loop, every iteration, without counting that time toward the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

<a name="manual-timing" />

### Manual Timing

For benchmarking something for which neither CPU time nor real-time are
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

### Setting the Time Unit

If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To change this, you can specify the time unit manually:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

<a name="preventing-optimization" />

### Preventing Optimization

To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i=0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known.
For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the below example
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

<a name="reporting-statistics" />

### Statistics: Reporting the Mean, Median and Standard Deviation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median and standard deviation, and possibly
complexity measurements if they were requested) of the runs are reported, to
both reporters: the standard output (console) and the file.
When only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.
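
A minimal sketch of the per-benchmark form (the `BM_Encode` benchmark name is a
hypothetical placeholder):

```c++
// Run 10 repetitions of this benchmark and report only the aggregates
// (mean, median, standard deviation) rather than each individual run.
BENCHMARK(BM_Encode)->Repetitions(10)->ReportAggregatesOnly(true);
```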

<a name="custom-statistics" />

### Custom Statistics

While having the mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know what the largest
observation is, e.g. because you have some real-time constraints. This is easy.
The following code will specify a custom statistic to be calculated, defined
by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

<a name="using-register-benchmark" />

### Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

<a name="exiting-with-an-error" />

### Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of
`KeepRunning()` are skipped. For the ranged-for version of the benchmark loop,
users must explicitly exit the loop, otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

### A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The reason the ranged-for loop is faster than using `KeepRunning` is
that `KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the ranged-for variant is able to keep the iteration
count in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.

<a name="disabling-cpu-frequency-scaling" />

### Disabling CPU Frequency Scaling

If you see this error:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```

you might want to disable CPU frequency scaling while running the benchmark:

```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```