# How to Write and Run Benchmarks in Node.js Core

## Table of Contents

* [Prerequisites](#prerequisites)
  * [HTTP Benchmark Requirements](#http-benchmark-requirements)
  * [Benchmark Analysis Requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running Benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP Benchmark Requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`Autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. In order to compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.

`wrk` may be available through one of the available package managers. If not,
it can be easily built [from source][wrk] via `make`.

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

`node benchmark/run.js --set benchmarker=autocannon http`

`node benchmark/http/simple.js benchmarker=autocannon`

#### HTTP/2 Benchmark Requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

`node benchmark/http2/simple.js benchmarker=h2load`

### Benchmark Analysis Requirements

To analyze the results, `R` should be installed. Use one of the available
package managers or download it from <https://www.r-project.org/>.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message reports that a CRAN mirror must be selected first, specify a
mirror by adding the `repo` parameter.

If we used the "<http://cran.us.r-project.org>" mirror, it could look something
like this:

```r
install.packages("ggplot2", repo="http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

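Before running any benchmarks, it can be worth a quick check that the optional
tooling is actually reachable from the shell. The commands below are only an
illustrative sketch, assuming a Unix-like shell:

```console
$ command -v wrk
$ command -v autocannon
$ command -v Rscript
$ Rscript -e 'library(ggplot2); library(plyr)'
```

If the last command exits with an error, the R packages are not installed
correctly.
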
## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or doing a quick performance
measure. But it does not provide the statistical information needed to draw
any conclusions about the performance.

Individual benchmarks can be executed by simply running the benchmark script
with node.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a
separate process. This ensures that benchmark results aren't affected by the
execution order due to V8 optimizations. **The last number is the rate of
operations measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them
in the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be
executed by using the `run.js` tool. To see how to use this script,
run `node benchmark/run.js`. Again, this does not provide the statistical
information needed to draw any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

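As with individual benchmarks, configurations can also be pinned for a whole
group via `--set`, the same option used earlier to select a benchmarker. For
example, a hypothetical run fixing the `n` parameter for the `assert`
benchmarks that have one could look like:

```console
$ node benchmark/run.js --set n=20000 assert
```
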
#### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks or
to exclude specific benchmarks from the execution, respectively.

```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied on the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool.
This will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script,
run `node benchmark/compare.js`.

To demonstrate how to check for a possible performance improvement, the
[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used as
an example. This pull request _claims_ to improve the performance of the
`string_decoder` module.

First build two versions of Node.js, one from the master branch (here called
`./node-master`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, copy the output of each build,
e.g. `cp ./out/Release/node ./node-master`, as in the following example:

```console
$ git checkout master
$ ./configure && make -j4
$ cp ./out/Release/node ./node-master

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a csv file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

*Tip: `benchmark/compare.js` has some useful options. For example, to compare
the benchmark of a single script instead of a whole module, use the
`--filter` option:*

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```

For analyzing the benchmark results, use the `compare.R` tool.

```console
$ cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                          confidence improvement accuracy (*)   (**)  (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'               ***     -3.76 %       ±1.36% ±1.82% ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                 **     -0.81 %       ±0.53% ±0.71% ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'                ***     -2.70 %       ±0.83% ±1.11% ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'         ***     -1.57 %       ±0.83% ±1.11% ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version;
ideally this is positive. _confidence_ tells whether there is enough
statistical evidence to validate the _improvement_. If there is enough
evidence, there will be at least one star (`*`); more stars indicate stronger
evidence. **However, if there are no stars, don't draw any conclusions based
on the _improvement_.** Sometimes this is fine, for example if no improvements
are expected, then there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark
shows a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue. But when considering 20 benchmarks it's expected that one of them
will show significance when it shouldn't. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk
is 1%. If three stars (`***`) are required, the risk is 0.1%. However, this
may require more runs to obtain (can be set with `--runs`).

_For the statistically minded, the R script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._

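For intuition, the test behind the stars is roughly what the following R
snippet performs. This is only a sketch with made-up sample vectors; the real
samples come from the csv produced by `compare.js`:

```r
# Illustrative ops/sec samples for one benchmark configuration,
# one vector per Node.js binary (values are invented for the example).
old <- c(9.61e6, 9.58e6, 9.39e6, 9.67e6, 9.54e6)
new <- c(9.84e6, 9.79e6, 9.72e6, 9.94e6, 9.81e6)

# Unpaired two-group t-test; a p-value below 0.05 corresponds to one star.
t.test(new, old, paired = FALSE)
```
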
The confidence field will show a star if the p-value 315is less than `0.05`._ 316 317The `compare.R` tool can also produce a box plot by using the `--plot filename` 318option. In this case there are 48 different benchmark combinations, and there 319may be a need to filter the csv file. This can be done while benchmarking 320using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering 321results afterwards using tools such as `sed` or `grep`. In the `sed` case be 322sure to keep the first line since that contains the header information. 323 324```console 325$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png 326 327 confidence improvement accuracy (*) (**) (***) 328 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii' *** -3.76 % ±1.36% ±1.82% ±2.40% 329 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii' *** -2.70 % ±0.83% ±1.11% ±1.45% 330 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii' *** -4.06 % ±0.31% ±0.41% ±0.54% 331 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii' *** -1.42 % ±0.58% ±0.77% ±1.01% 332... 333``` 334 335 336 337### Comparing parameters 338 339It can be useful to compare the performance for different parameters, for 340example to analyze the time complexity. 341 342To do this use the `scatter.js` tool, this will run a benchmark multiple times 343and generate a csv with the results. To see how to use this script, 344run `node benchmark/scatter.js`. 345 346```console 347$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv 348``` 349 350After generating the csv, a comparison table can be created using the 351`scatter.R` tool. Even more useful it creates an actual scatter plot when using 352the `--plot filename` option. 353 354```console 355$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log 356 357aggregating variable: inLen 358 359chunkLen encoding rate confidence.interval 360 16 ascii 1515855.1 334492.68 361 16 base64-ascii 403527.2 89677.70 362 16 base64-utf8 322352.8 70792.93 363 16 utf16le 1714567.5 388439.81 364 16 utf8 1100181.6 254141.32 365 64 ascii 3550402.0 661277.65 366 64 base64-ascii 1093660.3 229976.34 367 64 base64-utf8 997804.8 227238.04 368 64 utf16le 3372234.0 647274.88 369 64 utf8 1731941.2 360854.04 370 256 ascii 5033793.9 723354.30 371 256 base64-ascii 1447962.1 236625.96 372 256 base64-utf8 1357269.2 231045.70 373 256 utf16le 4039581.5 655483.16 374 256 utf8 1828672.9 360311.55 375 1024 ascii 5677592.7 624771.56 376 1024 base64-ascii 1494171.7 227302.34 377 1024 base64-utf8 1399218.9 224584.79 378 1024 utf16le 4157452.0 630416.28 379 1024 utf8 1824266.6 359628.52 380``` 381 382Because the scatter plot can only show two variables (in this case _chunkLen_ 383and _encoding_) the rest is aggregated. Sometimes aggregating is a problem, this 384can be solved by filtering. This can be done while benchmarking using the 385`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results 386afterwards using tools such as `sed` or `grep`. In the `sed` case be 387sure to keep the first line since that contains the header information. 
### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool; it will run a benchmark multiple times
and generate a csv with the results. To see how to use this script,
run `node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more useful, it creates an actual scatter plot when
using the `--plot filename` option.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_) the rest is aggregated. Sometimes aggregating is a problem;
this can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

### Running Benchmarks on the CI

To see the performance impact of a Pull Request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. At the moment only the `flags`
  option for specifying command line flags is supported.

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.

The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and WILL NOT run the `main` function.
In this pass, no flags except the ones passed directly on the command line
when running the benchmarks will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags in the command passed when the benchmark was run

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.

```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'], // Custom configurations
  size: [16, 128, 1024] // Custom configurations
};

const options = {
  // Add --expose-internals in order to require internal modules in main
  flags: ['--zero-fill-buffers']
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only flags that have been passed to createBenchmark
  // earlier when main is run will be in effect.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```

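Assuming the script above were saved as, for example,
`benchmark/buffers/buffer-creation.js` (a hypothetical path used only for
illustration), it could then be run directly like any other benchmark,
optionally pinning a subset of the configurations:

```console
$ node benchmark/buffers/buffer-creation.js type=fast size=1024
```
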
### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool to
benchmark HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5
});

function main(conf) {
  const http = require('http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported option keys are:

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available http
  benchmarker

[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/master/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
[wrk]: https://github.com/wg/wrk