# How to write and run benchmarks in Node.js core

## Table of contents

* [Prerequisites](#prerequisites)
  * [HTTP benchmark requirements](#http-benchmark-requirements)
  * [HTTPS benchmark requirements](#https-benchmark-requirements)
  * [HTTP/2 benchmark requirements](#http2-benchmark-requirements)
  * [Benchmark analysis requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP benchmark requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`Autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. To compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.

`wrk` may be available through your package manager. If not,
it can be easily built [from source][wrk] via `make`.

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

`node benchmark/run.js --set benchmarker=autocannon http`

`node benchmark/http/simple.js benchmarker=autocannon`

#### HTTPS benchmark requirements

To run the `https` benchmarks, either the `autocannon` or the `wrk` benchmarker
must be used.

`node benchmark/https/simple.js benchmarker=autocannon`

#### HTTP/2 benchmark requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

`node benchmark/http2/simple.js benchmarker=h2load`

### Benchmark analysis requirements

To analyze the results statistically, you can use either the
[node-benchmark-compare][] tool or the R script `benchmark/compare.R`.

[node-benchmark-compare][] is a Node.js script that can be installed with
`npm install -g node-benchmark-compare`.

To draw comparison plots when analyzing the results, `R` must be installed.
Use one of the available package managers or download it from
<https://www.r-project.org/>.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message states that a CRAN mirror must be selected first, specify a mirror
with the `repo` parameter.

```r
install.packages("ggplot2", repo="http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

## Running benchmarks

### Running individual benchmarks

Running an individual benchmark can be useful for debugging it or for doing a
quick performance measurement. However, it does not provide the statistical
information needed to draw any conclusions about the performance.

Individual benchmarks can be executed by simply executing the benchmark script
with node.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a separate
process. This ensures that benchmark results aren't affected by the execution
order due to V8 optimizations. **The last number is the rate of operations
measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them in
the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be executed
by using the `run.js` tool. To see how to use this script,
run `node benchmark/run.js`. Again, this does not provide the statistical
information needed to draw any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

#### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks or
to exclude specific benchmarks from the execution, respectively.
```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied to the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool. This
will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script,
run `node benchmark/compare.js`.

As an example of how to check for a possible performance improvement, pull
request [#5134](https://github.com/nodejs/node/pull/5134) will be used here.
This pull request _claims_ to improve the performance of the
`node:string_decoder` module.

First, build two versions of Node.js, one from the `main` branch (here called
`./node-main`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, you need to copy the output of
the build: `cp ./out/Release/node ./node-main`. Check out the following example:

```console
$ git checkout main
$ ./configure && make -j4
$ cp ./out/Release/node ./node-main

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a csv file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-main --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```
_Tip: `benchmark/compare.js` has some useful options. For example,
to compare the benchmarks of a single script instead of a whole
module, you can use the `--filter` option:_

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```

For analyzing the benchmark results, use [node-benchmark-compare][] or the R
scripts:

* `benchmark/compare.R`
* `benchmark/bar.R`

```console
$ node-benchmark-compare compare-pr-5134.csv # or cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                          confidence improvement accuracy (*)    (**)    (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'               ***     -3.76 %       ±1.36%  ±1.82%   ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                 **     -0.81 %       ±0.53%  ±0.71%   ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'                ***     -2.70 %       ±0.83%  ±1.11%   ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'         ***     -1.57 %       ±0.83%  ±1.11%   ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version;
hopefully, this is positive. _confidence_ indicates whether there is enough
statistical evidence to validate the _improvement_. If there is enough evidence,
there will be at least one star (`*`); more stars are better. **However, if
there are no stars, don't draw any conclusions based on the _improvement_.**
Sometimes this is fine, for example, if no improvements are expected, then
there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark shows
a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue. But when considering 20 benchmarks, it is expected that one of them
will show significance when it shouldn't. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk
is 1%. If three stars (`***`) are considered, the risk is 0.1%. However, this
may require more runs (the number of runs can be set with `--runs`).

_For the statistically minded, the script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._

The `compare.R` tool can additionally produce a box plot by using the
`--plot filename` option. In this case there are 48 different benchmark
combinations, and there may be a need to filter the csv file. This can be done
while benchmarking using the `--set` parameter (e.g. `--set encoding=ascii`) or
by filtering results afterwards using tools such as `sed` or `grep`. In the
`sed` case, be sure to keep the first line since that contains the header
information.
```console
$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png

                                                                                       confidence improvement accuracy (*)    (**)    (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'            ***     -3.76 %       ±1.36%  ±1.82%   ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'             ***     -2.70 %       ±0.83%  ±1.11%   ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii'           ***     -4.06 %       ±0.31%  ±0.41%   ±0.54%
 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii'          ***     -1.42 %       ±0.58%  ±0.77%   ±1.01%
...
```

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool; it will run a benchmark multiple times
and generate a csv file with the results. To see how to use this script,
run `node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more useful, it creates an actual scatter plot when
the `--plot filename` option is used.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_) the rest is aggregated. Sometimes aggregating is a problem;
this can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case, be
sure to keep the first line since that contains the header information.
```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

![scatter tool plot](doc_img/scatter-plot.png)

### Running benchmarks on the CI

To see the performance impact of a pull request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method, which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. Supported options:
  * `flags` {Array} Contains node-specific command line flags to pass to
    the child process.
  * `combinationFilter` {Function} Has a single parameter which is an object
    containing a combination of benchmark parameters. It should return `true`
    or `false` to indicate whether the combination should be included or not
    (see the sketch below).

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.

The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and WILL NOT run the `main` function.
In this pass, no flags except the ones passed directly on the command line
when running the benchmark will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags from the command used to run the benchmark

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.
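As a minimal sketch of the `combinationFilter` option, the following skips one
combination and keeps the rest. The parameter names (`n`, `size`), their
values, the filter condition, and the trivial loop in `main` are made up purely
for illustration; they are not an existing benchmark.

```js
'use strict';
const common = require('../common.js');

const configs = {
  // Hypothetical parameters, for illustration only.
  n: [1e3, 1e5],
  size: [16, 1024],
};

const options = {
  // Hypothetical filter: skip only the slowest combination;
  // the other three combinations still run.
  combinationFilter: (config) => !(config.n === 1e5 && config.size === 1024),
};

const bench = common.createBenchmark(main, configs, options);

function main(conf) {
  bench.start();
  // Perform `conf.n` trivial operations so there is something to time.
  let total = 0;
  for (let i = 0; i < conf.n; i++)
    total += conf.size;
  bench.end(conf.n);
}
```

A complete benchmark, which also passes a Node.js flag to the child process,
looks like this: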
```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('node:buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'], // Custom configurations
  size: [16, 128, 1024], // Custom configurations
};

const options = {
  // Add --expose-internals in order to require internal modules in main
  flags: ['--zero-fill-buffers'],
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only flags that have been passed to createBenchmark
  // earlier when main is run will be in effect.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```

### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool to
benchmark HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5,
});

function main(conf) {
  const http = require('node:http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported option keys are:

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available HTTP
  benchmarker

[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/HEAD/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[node-benchmark-compare]: https://github.com/targos/node-benchmark-compare
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes%2C_unequal_variances_%28sX1_%3E_2sX2_or_sX2_%3E_2sX1%29
[wrk]: https://github.com/wg/wrk