# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setupteardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Templated Benchmarks that take arguments](#templated-benchmarks-with-arguments)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kiB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kiB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MiB/s   290.225k items/s
```

The JSON format outputs human readable json split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run.
Example json
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the 'csv' reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by csv
parsers.

Specifying `--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running
benchmarks with the `--help` switch.

### Dry runs

To confirm that benchmarks can run successfully without needing to wait for
multiple repetitions and iterations, the `--benchmark_dry_run` flag can be
used. This will run the benchmarks as normal, but for 1 iteration and 1
repetition only.

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to only run the benchmarks that match
the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix "DISABLED_". This will cause the benchmark to
be skipped at runtime.
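
For example, a minimal sketch (the function name `BM_SlowOperation` is just a
placeholder for whatever benchmark you want to disable):

```c++
// Renaming "BM_SlowOperation" to "DISABLED_BM_SlowOperation" keeps the
// benchmark registered, but it is skipped when the binary runs.
static void DISABLED_BM_SlowOperation(benchmark::State& state) {
  for (auto _ : state) {
    // ... code under test ...
  }
}
BENCHMARK(DISABLED_BM_SlowOperation);
```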

<a name="result-comparison" />

## Result Comparison

It is possible to compare the benchmarking results.
See [Additional Tooling Documentation](tools.md)

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running. If you do want to add more context, you can use
the `--benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one, not more than 1e9, until CPU time is greater than
the minimum time, or the wallclock time is 5x minimum time. The minimum time is
set per benchmark by calling `MinTime` on the registered benchmark object.

Furthermore, warming up a benchmark might be necessary in order to get
stable results, because of e.g. caching effects of the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will overwrite the value of `--benchmark_min_warmup_time`
for the single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. Per default the warmup phase is set to 0 seconds and is therefore
disabled.

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setupteardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to `Setup`/`Teardown`.

The setup/teardown callbacks will be invoked once for each benchmark.
If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

E.g.:

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);
```

In this example, `DoSetup` and `DoTeardown` will be invoked 4 times each,
specifically, once for each benchmark in this family:

 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now the arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now the arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`. In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    });
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers the same benchmark "BM_takes_args/int_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function
that might be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });
```

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

The invocation will execute the benchmark as before using `BM_memcpy` but changes
the prefix in the report to `memcpy`.

<a name="templated-benchmarks" />

## Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)` `range_x`
times. It also outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
// C++03
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

// C++11 or newer, you can use the BENCHMARK macro with template parameters:
BENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK(func<...>) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="templated-benchmarks-with-arguments" />

## Templated Benchmarks that take arguments

Sometimes there is a need to template benchmarks and provide arguments to them.

```c++
template <class Q> void BM_Sequential_With_Step(benchmark::State& state, int step) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i-=step; )
      q.push(v);
    for (int e = state.range(0); e-=step; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}

BENCHMARK_TEMPLATE1_CAPTURE(BM_Sequential_With_Step, WaitQueue<int>, Step1, 1)->Range(1<<0, 1<<10);
```

<a name="fixtures" />

## Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(::benchmark::State& state) {
  }

  void TearDown(::benchmark::State& state) {
  }
};

// Defines and registers `FooTest` using the class `MyFixture`.
BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

// Only defines `BarTest` using the class `MyFixture`.
BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
// `BarTest` is NOT registered.
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
// `BarTest` is now registered.
```

### Templated Fixtures

You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

// Defines and registers `IntTest` using the class template `MyFixture<int>`.
BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

// Only defines `DoubleTest` using the class template `MyFixture<double>`.
BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
// `DoubleTest` is NOT registered.
BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
// `DoubleTest` is now registered.
```

<a name="custom-counters" />

## Custom Counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`.
Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration counts, and/or as per-thread averages, and/or as iteration invariants,
and/or finally inverting the result; and a flag specifying the 'unit' - i.e.
is 1k a 1000 (default, `benchmark::Counter::OneK::kIs1000`), or 1024
(`benchmark::Counter::OneK::kIs1024`)?

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds it takes to process one 'foo'?
  state.counters["FooInvRate"] = Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

<!-- {% raw %} -->
```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```
<!-- {% endraw %} -->

### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

<a name="multithreaded-benchmarks"/>

## Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index() == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index() == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

To run the benchmark across a range of thread counts, instead of `Threads`, use
`ThreadRange`. This takes two parameters (`min_threads` and `max_threads`) and
runs the benchmark once for values in the inclusive range. For example:

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

will run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

## CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for (int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on the internal implementation detail may
// measure to anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in single-threaded case, in general decreasing
// with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```

### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes, it is necessary to do some work inside of
that loop, every iteration, but without counting that time towards the
benchmark time. That is possible, although it is not recommended, since it has
high overhead.

<!-- {% raw %} -->
```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured.
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

<a name="manual-timing" />

## Manual Timing

For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
        std::chrono::duration_cast<std::chrono::duration<double>>(
            end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

## Setting the Time Unit

If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To set the time unit manually, you can specify it per benchmark:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

Additionally, the default time unit can be set globally with the
`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
affects benchmarks where the time unit is not set explicitly.

<a name="preventing-optimization" />

## Preventing Optimization

To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  // Example 1: `<expr>` is removed entirely.
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  // Example 2: Result of '<expr>' is only reused.
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the below example
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations.
    v.push_back(42);
    benchmark::ClobberMemory();     // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

<a name="reporting-statistics" />

## Statistics: Reporting the Mean, Median and Standard Deviation / Coefficient of Variation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median, standard deviation and coefficient of variation
of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}`,
`--benchmark_display_aggregates_only={true|false}` flags or
`ReportAggregatesOnly(bool)`, `DisplayAggregatesOnly(bool)` functions can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median, standard deviation and coefficient
of variation, and possibly complexity measurements if they were requested) of
the runs are reported, to both reporters: the standard output (console) and the file.
However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.

<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, this may not be enough for everyone.
For example you may want to know what the largest observation is, e.g. because
you have some real-time constraints. This is easy. The following code will
specify a custom statistic to be calculated, defined by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // e.g. the ratio of the first observation to the last one
    return v.front() / v.back();
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for benchmarks, alongside CPU
performance. For this reason, benchmark offers the `RegisterMemoryManager`
method that allows a custom `MemoryManager` to be injected.

If set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be
called at the start and end of benchmark runs to allow user code to fill out
a report on the number of allocations, bytes used, etc.

This data will then be reported alongside other performance data, currently
only when using JSON output.

<a name="profiling" />

## Profiling

It's often useful to also profile benchmarks in particular ways, in addition to
CPU performance. For this reason, benchmark offers the `RegisterProfilerManager`
method that allows a custom `ProfilerManager` to be injected.

If set, the `ProfilerManager::AfterSetupStart` and
`ProfilerManager::BeforeTeardownStop` methods will be called at the start and
end of a separate benchmark run to allow user code to collect and report
user-provided profile metrics.

Output collected from this profiling run must be reported separately.

<a name="using-register-benchmark" />

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere.
This allows for
benchmark tests to be registered programmatically.

Additionally `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:

```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that
run of the benchmark and report the error. Note that only future iterations of
the `KeepRunning()` loop are skipped. For the ranged-for version of the benchmark
loop, users must explicitly break out of the loop, otherwise all iterations will
be performed. Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The reason the ranged-for loop is faster than using `KeepRunning` is that
`KeepRunning` requires a memory load and store of the iteration count every
iteration, whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this error:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable the CPU frequency scaling while running the
benchmark, as well as consider other ways to stabilize the performance of
your system while benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.
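
As a sketch of one common approach on Linux (assuming the `cpupower` utility is
installed and using a placeholder benchmark binary name), the scaling governor
can be switched to `performance` for the duration of the run and restored
afterwards:

```bash
# Switch to the "performance" governor before benchmarking (requires root).
sudo cpupower frequency-set --governor performance
# Run the benchmark binary (placeholder name).
./mybenchmark
# Restore a power-saving governor afterwards, e.g. "powersave".
sudo cpupower frequency-set --governor powersave
```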