# How to write and run benchmarks in Node.js core

## Table of contents

* [Prerequisites](#prerequisites)
  * [HTTP benchmark requirements](#http-benchmark-requirements)
  * [HTTPS benchmark requirements](#https-benchmark-requirements)
  * [HTTP/2 benchmark requirements](#http2-benchmark-requirements)
  * [Benchmark analysis requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP benchmark requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`Autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. In order to compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.

`wrk` may be available through one of the package managers. If not, it can
easily be built [from source][wrk] via `make`.
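
For reference, building `wrk` from source is typically just a clone and a
`make` (a minimal sketch, assuming a Unix-like system and a Git checkout of
the [wrk][] repository):

```console
$ git clone https://github.com/wg/wrk.git
$ cd wrk
$ make
```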

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

```console
$ node benchmark/run.js --set benchmarker=autocannon http

$ node benchmark/http/simple.js benchmarker=autocannon
```
### HTTPS benchmark requirements

To run the `https` benchmarks, one of the `autocannon` or `wrk` benchmarkers
must be used.

```console
$ node benchmark/https/simple.js benchmarker=autocannon
```

### HTTP/2 benchmark requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

```console
$ node benchmark/http2/simple.js benchmarker=h2load
```

### Benchmark analysis requirements

To analyze the results statistically, you can use either the
[node-benchmark-compare][] tool or the R script `benchmark/compare.R`.

[node-benchmark-compare][] is a Node.js script that can be installed with
`npm install -g node-benchmark-compare`.

To draw comparison plots when analyzing the results, `R` must be installed.
Use one of the available package managers or download it from
<https://www.r-project.org/>.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message states that a CRAN mirror must be selected first, specify a
mirror with the `repos` parameter.

```r
install.packages("ggplot2", repos = "http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or for doing a quick performance
measurement, but it does not provide the statistical information needed to
draw any conclusions about the performance.

Individual benchmarks can be executed by running the benchmark script with
`node`.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a
separate process. This ensures that benchmark results aren't affected by the
execution order due to V8 optimizations. **The last number is the rate of
operations measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them
in the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be
executed by using the `run.js` tool. To see how to use this script, run
`node benchmark/run.js`. Again, this does not provide the statistical
information needed to draw any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks
or to exclude specific benchmarks from the execution, respectively.

```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied on the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool.
This will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script, run
`node benchmark/compare.js`.

As an example of how to check for a possible performance improvement, the
[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used.
This pull request _claims_ to improve the performance of the
`node:string_decoder` module.

First build two versions of Node.js, one from the `main` branch (here called
`./node-main`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, you need to copy the output of
the build: `cp ./out/Release/node ./node-main`. Check out the following
example:

```console
$ git checkout main
$ ./configure && make -j4
$ cp ./out/Release/node ./node-main

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a csv file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-main --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

_Tip: `benchmark/compare.js` has some useful options. For example, to compare
the benchmark of a single script instead of a whole module, use the `--filter`
option:_

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```
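
For instance, a sketch of comparing only the `string-decoder.js` benchmark
script from the example above (the exact filter pattern here is an
assumption; it is matched against the benchmark script names):

```console
$ node benchmark/compare.js --old ./node-main --new ./node-pr-5134 \
  --filter string-decoder string_decoder > compare-pr-5134.csv
```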

For analyzing the benchmark results, use [node-benchmark-compare][] or the R
scripts:

* `benchmark/compare.R`
* `benchmark/bar.R`

```console
$ node-benchmark-compare compare-pr-5134.csv # or cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'                  ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                    **     -0.81 %       ±0.53%  ±0.71%  ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'                   ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'            ***     -1.57 %       ±0.83%  ±1.11%  ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version;
hopefully this is positive. _confidence_ tells if there is enough statistical
evidence to validate the _improvement_. If there is enough evidence, there
will be at least one star (`*`); more stars indicate stronger evidence.
**However, if there are no stars, don't draw any conclusions based on the
_improvement_.** Sometimes this is fine; for example, if no improvements are
expected, there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark
shows a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue, but when considering 20 benchmarks it's normal that one of them will
show significance when it shouldn't (20 × 0.05 = 1 expected false positive).
A possible solution is to instead consider at least two stars (`**`) as the
threshold; in that case the risk is 1%. If three stars (`***`) are required,
the risk is 0.1%. However, this may require more runs to obtain (the number
of runs can be set with `--runs`).

_For the statistically minded, the script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._

The `compare.R` tool can additionally produce a box plot by using the
`--plot filename` option. In this case there are 48 different benchmark
combinations, and there may be a need to filter the csv file. This can be done
while benchmarking using the `--set` parameter (e.g. `--set encoding=ascii`)
or by filtering results afterwards using tools such as `sed` or `grep`. In the
`sed` case, be sure to keep the first line since that contains the header
information.

```console
$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png

                                                                                      confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'           ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'            ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii'          ***     -4.06 %       ±0.31%  ±0.41%  ±0.54%
 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii'         ***     -1.42 %       ±0.58%  ±0.77%  ±1.01%
...
```

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool. It will run a benchmark multiple times
and generate a csv with the results. To see how to use this script, run
`node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more usefully, it creates an actual scatter plot when
the `--plot filename` option is used.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_) the rest is aggregated. Sometimes aggregating is a problem;
this can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case, be sure to
keep the first line since that contains the header information.

```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

![scatter tool plot](doc_img/scatter-plot.png)

### Running benchmarks on the CI

To see the performance impact of a pull request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. Supported options:
  * `flags` {Array} Contains node-specific command line flags to pass to
    the child process.
  * `combinationFilter` {Function} Has a single parameter which is an object
    containing a combination of benchmark parameters. It should return `true`
    or `false` to indicate whether the combination should be included or not
    (see the sketch below).
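
As a sketch of a `combinationFilter`, the following `options` object skips
combinations where a hypothetical `len` parameter exceeds a hypothetical
`size` parameter (both parameter names are made up for illustration):

```js
const configs = {
  n: [1024],
  len: [16, 128, 1024], // Hypothetical parameters, for illustration only.
  size: [128, 1024],
};

const options = {
  // Only include combinations where `len` fits within `size`;
  // all other combinations are skipped.
  combinationFilter(config) {
    return config.len <= config.size;
  },
};
```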

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.

The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and will **not** run the `main` function.
In this pass, no flags except the ones passed directly on the command line
when running the benchmarks will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags in the command passed when the benchmark was run

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.

```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('node:buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'],  // Custom configurations
  size: [16, 128, 1024],  // Custom configurations
};

const options = {
  // Add --expose-internals if you need to require internal modules in main.
  // Here, --zero-fill-buffers is passed to the benchmark child process.
  flags: ['--zero-fill-buffers'],
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only flags that have been passed to createBenchmark
  // earlier when main is run will be in effect.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```
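
Assuming the script above were saved as, say,
`benchmark/buffers/buffer-creation.js` (a hypothetical path), it could then
be run directly, optionally with a subset of its parameters, just like the
benchmarks shown earlier:

```console
$ node benchmark/buffers/buffer-creation.js type=fast n=1024
```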

### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool to
benchmark HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5,
});

function main(conf) {
  const http = require('node:http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported options keys are listed below; a short sketch using several of them
follows.

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available http
  benchmarker
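
As a minimal sketch, the `bench.http` call in the example above could set
several of these keys explicitly; the values here simply restate the
documented defaults:

```js
// Inside main(conf), replacing the bench.http call above (a sketch).
bench.http({
  port: common.PORT, // The default port.
  path: '/', // The default path.
  connections: conf.connections,
  duration: conf.duration, // Seconds; defaults to 10 if omitted.
}, () => {
  server.close();
});
```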

[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/HEAD/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[node-benchmark-compare]: https://github.com/targos/node-benchmark-compare
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes%2C_unequal_variances_%28sX1_%3E_2sX2_or_sX2_%3E_2sX1%29
[wrk]: https://github.com/wg/wrk