# How to write and run benchmarks in Node.js core

## Table of contents

* [Prerequisites](#prerequisites)
  * [HTTP benchmark requirements](#http-benchmark-requirements)
  * [HTTPS benchmark requirements](#https-benchmark-requirements)
  * [HTTP/2 benchmark requirements](#http2-benchmark-requirements)
  * [Benchmark analysis requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP benchmark requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. In order to compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.

`wrk` may be available through your preferred package manager. If not, it can
easily be built [from source][wrk] via `make`.

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

`node benchmark/run.js --set benchmarker=autocannon http`

`node benchmark/http/simple.js benchmarker=autocannon`

#### HTTPS benchmark requirements

To run the `https` benchmarks, one of the `autocannon` or `wrk` benchmarkers
must be used.

`node benchmark/https/simple.js benchmarker=autocannon`

#### HTTP/2 benchmark requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

`node benchmark/http2/simple.js benchmarker=h2load`

### Benchmark analysis requirements

To analyze the results, `R` should be installed. Use one of the available
package managers or download it from <https://www.r-project.org/>.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message states that a CRAN mirror must be selected first, specify a mirror
with the `repo` parameter.

```r
install.packages("ggplot2", repo="http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

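
Both packages should then be loadable from `Rscript`, which is how the analysis
scripts later in this guide are invoked. A quick sanity check such as the
following should print the attached package versions without errors:

```console
$ Rscript -e 'library(ggplot2); library(plyr); sessionInfo()'
```
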
## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or doing a quick performance
measure. But it does not provide the statistical information needed to make any
conclusions about the performance.

Individual benchmarks can be executed by simply running the benchmark script
with node.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a separate
process. This ensures that benchmark results aren't affected by the execution
order due to V8 optimizations. **The last number is the rate of operations
measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them in
the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

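
Multiple parameters can be combined to narrow the run down further. For
example, based on the parameters shown above, the following would run only the
`len=1024`, `arg=true` combination (output omitted):

```console
$ node benchmark/buffers/buffer-tostring.js len=1024 arg=true
```
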
### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be executed
by using the `run.js` tool. To see how to use this script,
run `node benchmark/run.js`. Again, this does not provide the statistical
information needed to make any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

#### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks or
to exclude specific benchmarks from the execution, respectively.

```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied to the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool. This
will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script,
run `node benchmark/compare.js`.

As an example of how to check for a possible performance improvement, the
[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used.
This pull request _claims_ to improve the performance of the
`string_decoder` module.

First, build two versions of Node.js, one from the master branch (here called
`./node-master`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, you need to copy the output of
each build, e.g. `cp ./out/Release/node ./node-master`:

```console
$ git checkout master
$ ./configure && make -j4
$ cp ./out/Release/node ./node-master

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a CSV file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

*Tip: `benchmark/compare.js` has some other useful options. For example, to
compare a single benchmark script instead of a whole module, use the
`--filter` option:*

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```

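
For instance, to collect ten samples of only the `string-decoder.js` benchmark
with a fixed encoding, the options above can be combined with the earlier
command. This is only a usage sketch; the output file name is arbitrary:

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 --runs 10 --filter string-decoder --set encoding=ascii string_decoder > compare-ascii.csv
```
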
For analysing the benchmark results, use the `compare.R` tool.

```console
$ cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                          confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'              ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                **     -0.81 %       ±0.53%  ±0.71%  ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'               ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'        ***     -1.57 %       ±0.83%  ±1.11%  ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version,
which is hopefully positive. _confidence_ tells whether there is enough
statistical evidence to validate the _improvement_. If there is enough evidence,
there will be at least one star (`*`); more stars are better. **However, if
there are no stars, don't make any conclusions based on the
_improvement_.** Sometimes this is fine; for example, if no improvements are
expected, then there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark shows
a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue, but when considering 20 benchmarks it's normal that one of them
will show significance when it shouldn't. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk
is 1%. If three stars (`***`) are required, the risk is 0.1%. However, this
may require more runs to obtain (the number of runs can be set with `--runs`).

_For the statistically minded, the R script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._

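
To illustrate what the script computes, the same kind of test can be run by
hand in R on two sets of measured rates. The numbers below are made up for the
illustration; `compare.R` itself reads its samples from the CSV file produced
by `compare.js`:

```r
# Hypothetical ops/sec samples for one benchmark configuration,
# measured with the old and the new binary.
old <- c(9.4e6, 9.5e6, 9.5e6, 9.6e6, 9.7e6)
new <- c(9.7e6, 9.8e6, 9.8e6, 9.9e6, 10.0e6)

# Unpaired two-group t-test. A p-value below 0.05 corresponds to
# at least one star in the compare.R output.
t.test(new, old, paired = FALSE)
```
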
The `compare.R` tool can also produce a box plot by using the `--plot filename`
option. In this case there are 48 different benchmark combinations, and there
may be a need to filter the CSV file. This can be done while benchmarking
using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering
results afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

```console
$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png

                                                                                          confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'              ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'               ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii'             ***     -4.06 %       ±0.31%  ±0.41%  ±0.54%
 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii'            ***     -1.42 %       ±0.58%  ±0.77%  ±1.01%
...
```

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool, which will run a benchmark multiple
times and generate a CSV file with the results. To see how to use this script,
run `node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the CSV file, a comparison table can be created using the
`scatter.R` tool. Even more usefully, it creates an actual scatter plot when
using the `--plot filename` option.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_), the rest are aggregated. Sometimes aggregating is a problem;
this can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

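
The `--set` approach could look like the following sketch, which pins the
aggregated `inLen` variable while benchmarking (the output file name is
arbitrary); the `sed` example after it filters an already generated CSV file
instead:

```console
$ node benchmark/scatter.js --set inLen=128 benchmark/string_decoder/string-decoder.js > scatter-inlen-128.csv
```
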
```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

![scatter plot](doc_img/scatter-plot.png)

### Running benchmarks on the CI

To see the performance impact of a pull request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. At the moment only the `flags`
  option for specifying command line flags is supported.

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.

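
A stripped-down sketch of that structure is shown below; the complete example
later in this section shows a more realistic benchmark:

```js
'use strict';
const common = require('../common.js');

// A single parameter: how many times to run the timed operation.
const bench = common.createBenchmark(main, { n: [1e5] });

function main(conf) {
  bench.start();
  for (let i = 0; i < conf.n; i++) {
    // The operation being measured goes here.
  }
  // Report how many operations were performed.
  bench.end(conf.n);
}
```
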
The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and WILL NOT run the `main` function.
In this pass, no flags except the ones passed directly on the command line
when running the benchmarks will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags in the command passed when the benchmark was run

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.

```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'], // Custom configurations
  size: [16, 128, 1024] // Custom configurations
};

const options = {
  // Add --expose-internals in order to require internal modules in main
  flags: ['--zero-fill-buffers']
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only the flags that have been passed to createBenchmark
  // are in effect when main is run.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```

### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool that
benchmarks HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5
});

function main(conf) {
  const http = require('http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported option keys are:

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available http
  benchmarker

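
The remaining keys can be passed in the same way. As a sketch based on the
example above, setting the path and duration explicitly and forcing the
`autocannon` benchmarker could look like this (the values are only
illustrative):

```js
bench.http({
  path: '/',
  connections: conf.connections,
  duration: conf.duration,
  benchmarker: 'autocannon',
}, () => {
  server.close();
});
```
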
[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/HEAD/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
[wrk]: https://github.com/wg/wrk