# How to write and run benchmarks in Node.js core

## Table of contents

* [Prerequisites](#prerequisites)
  * [HTTP benchmark requirements](#http-benchmark-requirements)
  * [HTTPS benchmark requirements](#https-benchmark-requirements)
  * [HTTP/2 benchmark requirements](#http2-benchmark-requirements)
  * [Benchmark analysis requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP benchmark requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. In order to compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.
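
For example, a quick sanity check before each run (optional; not required by
the tooling) is to confirm which binary will be picked up:

```console
$ which node && node --version
```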

`wrk` may be available through your preferred package manager. If not, it can
easily be built [from source][wrk] via `make`.
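
For example, a minimal sketch of a source build (the install location is an
assumption; any directory in the `PATH` works):

```console
$ git clone https://github.com/wg/wrk.git && cd wrk
$ make
$ cp wrk /usr/local/bin/
```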

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

`node benchmark/run.js --set benchmarker=autocannon http`

`node benchmark/http/simple.js benchmarker=autocannon`

### HTTPS benchmark requirements

To run the `https` benchmarks, one of the `autocannon` or `wrk` benchmarkers
must be used.

`node benchmark/https/simple.js benchmarker=autocannon`

### HTTP/2 benchmark requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

`node benchmark/http2/simple.js benchmarker=h2load`
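
On many systems `h2load` is also available from a package manager (the package
names below are assumptions and vary by distribution):

```console
$ sudo apt-get install nghttp2-client  # Debian/Ubuntu
$ brew install nghttp2                 # macOS with Homebrew
```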

### Benchmark analysis requirements

To analyze the results, `R` should be installed. Use one of the available
package managers or download it from <https://www.r-project.org/>.

The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message states that a CRAN mirror must be selected first, specify a mirror
with the `repo` parameter.

```r
install.packages("ggplot2", repo="http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or for taking a quick performance
measurement. However, it does not provide the statistical information needed to
draw any conclusions about the performance.

Individual benchmarks can be executed by running the benchmark script directly
with `node`.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a separate
process. This ensures that benchmark results aren't affected by the execution
order due to V8 optimizations. **The last number is the rate of operations
measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them in
the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be executed
by using the `run.js` tool. To see how to use this script,
run `node benchmark/run.js`. Again, this does not provide the statistical
information needed to draw any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks or
to exclude specific benchmarks from the execution, respectively.

```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied on the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool. This
will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script,
run `node benchmark/compare.js`.

As an example of how to check for a possible performance improvement, the
[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used.
This pull request _claims_ to improve the performance of the
`string_decoder` module.

First build two versions of Node.js, one from the master branch (here called
`./node-master`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, you need to copy the output of
the build: `cp ./out/Release/node ./node-master`. Check out the following
example:

```console
$ git checkout master
$ ./configure && make -j4
$ cp ./out/Release/node ./node-master

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a csv file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

*Tip: `benchmark/compare.js` has some useful options. For example, if you want
to compare the benchmark of a single script instead of a whole module, you can
use the `--filter` option:*

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```
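
For instance, to run only the `string-decoder` script within the
`string_decoder` group (a hedged example; the pattern matches benchmark file
names, as with `run.js`):

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 --filter "string-decoder" string_decoder > compare-pr-5134.csv
```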

For analyzing the benchmark results, use the `compare.R` tool.

```console
$ cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'                  ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                    **     -0.81 %       ±0.53%  ±0.71%  ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'                   ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'            ***     -1.57 %       ±0.83%  ±1.11%  ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version;
hopefully this is positive. _confidence_ tells if there is enough
statistical evidence to validate the _improvement_. If there is enough evidence,
there will be at least one star (`*`); more stars are better. **However, if
there are no stars, don't make any conclusions based on the
_improvement_.** Sometimes this is fine, for example if no improvements are
expected, then there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark shows
a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue. But when considering 20 benchmarks, it's normal for one of them
to show significance when it shouldn't: at the 5% level, 20 × 0.05 = 1 such
false positive is expected. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk
is 1%. If three stars (`***`) is the threshold, the risk is 0.1%. However, this
may require more runs to obtain (configurable with `--runs`).

_For the statistically minded, the R script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._
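
_The same test can be reproduced by hand in R. This is a sketch, assuming
`rates.old` and `rates.new` are vectors of measured rates for one benchmark
configuration:_

```r
# Unpaired two-sample t-test with unequal variances (Welch's),
# the variant described by the article linked above.
t.test(rates.old, rates.new, paired = FALSE, var.equal = FALSE)
```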

The `compare.R` tool can also produce a box plot by using the `--plot filename`
option. In this case there are 48 different benchmark combinations, and there
may be a need to filter the csv file. This can be done while benchmarking
using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering
results afterwards using tools such as `sed` or `grep`. In the `sed` case, be
sure to keep the first line since that contains the header information.

```console
$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png

                                                                                      confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'           ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'            ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii'          ***     -4.06 %       ±0.31%  ±0.41%  ±0.54%
 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii'         ***     -1.42 %       ±0.58%  ±0.77%  ±1.01%
...
```

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool, which will run a benchmark multiple
times and generate a csv with the results. To see how to use this script,
run `node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more usefully, it creates an actual scatter plot when
using the `--plot filename` option.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_) the rest is aggregated. Sometimes aggregating is a problem; this
can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case, be
sure to keep the first line since that contains the header information.

```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

![scatter plot](doc_img/scatter-plot.png)

### Running benchmarks on the CI

To see the performance impact of a pull request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. At the moment only the `flags`
  option for specifying command line flags is supported.

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.
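
A minimal sketch of this pattern (a hypothetical benchmark, shown only to
illustrate the `bench.start()`/`bench.end(n)` calls):

```js
'use strict';
const common = require('../common.js');

// A single configuration: run the loop body n times.
const bench = common.createBenchmark(main, { n: [1e6] });

function main(conf) {
  bench.start();       // start the timer after initialization
  let sum = 0;
  for (let i = 0; i < conf.n; i++)
    sum += i;          // the operation being measured
  bench.end(conf.n);   // report the number of operations performed
}
```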

The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and WILL NOT run the `main` function.
In this pass, no flags except the ones directly passed via the command line
when running the benchmarks will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags in the command passed when the benchmark was run

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.

```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'],  // Custom configurations
  size: [16, 128, 1024]  // Custom configurations
};

const options = {
  // Add --expose-internals in order to require internal modules in main
  flags: ['--zero-fill-buffers']
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only flags that have been passed to createBenchmark
  // earlier when main is run will be in effect.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```

### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool to
benchmark HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5
});

function main(conf) {
  const http = require('http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported options keys are:

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available http
  benchmarker
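
For example, a sketch that passes several of these options explicitly, reusing
`server`, `bench`, and `conf` from the example above (the values are
illustrative):

```js
server.listen(common.PORT, () => {
  bench.http({
    port: common.PORT,          // defaults to common.PORT
    path: '/',                  // defaults to '/'
    connections: conf.connections,
    duration: conf.duration,    // seconds; defaults to 10
    benchmarker: 'autocannon',  // otherwise the first available benchmarker
  }, () => {
    server.close();
  });
});
```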

[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/HEAD/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
[wrk]: https://github.com/wg/wrk