# How to Write and Run Benchmarks in Node.js Core

## Table of Contents

* [Prerequisites](#prerequisites)
  * [HTTP Benchmark Requirements](#http-benchmark-requirements)
  * [Benchmark Analysis Requirements](#benchmark-analysis-requirements)
* [Running benchmarks](#running-benchmarks)
  * [Running individual benchmarks](#running-individual-benchmarks)
  * [Running all benchmarks](#running-all-benchmarks)
  * [Filtering benchmarks](#filtering-benchmarks)
  * [Comparing Node.js versions](#comparing-nodejs-versions)
  * [Comparing parameters](#comparing-parameters)
  * [Running Benchmarks on the CI](#running-benchmarks-on-the-ci)
* [Creating a benchmark](#creating-a-benchmark)
  * [Basics of a benchmark](#basics-of-a-benchmark)
  * [Creating an HTTP benchmark](#creating-an-http-benchmark)

## Prerequisites

Basic Unix tools are required for some benchmarks.
[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
which need to be included in the global Windows `PATH`.

### HTTP Benchmark Requirements

Most of the HTTP benchmarks require a benchmarker to be installed. This can be
either [`wrk`][wrk] or [`autocannon`][autocannon].

`autocannon` is a Node.js script that can be installed using
`npm install -g autocannon`. It will use the Node.js executable that is in the
path. In order to compare two HTTP benchmark runs, make sure that the
Node.js version in the path is not altered.

`wrk` may be available through your preferred package manager. If not, it can
easily be built [from source][wrk] via `make`.
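
A typical from-source build might look like this (the clone location and the
final install step are assumptions; any directory on the `PATH` works):

```console
$ git clone https://github.com/wg/wrk.git
$ cd wrk
$ make
$ sudo cp wrk /usr/local/bin/
```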

By default, `wrk` will be used as the benchmarker. If it is not available,
`autocannon` will be used in its place. When creating an HTTP benchmark, the
benchmarker to be used should be specified by providing it as an argument:

`node benchmark/run.js --set benchmarker=autocannon http`

`node benchmark/http/simple.js benchmarker=autocannon`

#### HTTP/2 Benchmark Requirements

To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
`h2load` tool is a component of the `nghttp2` project and may be installed
from [nghttp2.org][] or built from source.

`node benchmark/http2/simple.js benchmarker=h2load`

### Benchmark Analysis Requirements

To analyze the results, `R` should be installed. Use one of the available
package managers or download it from <https://www.r-project.org/>.

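On Debian- or Ubuntu-based systems, for example, that would typically be the
following (the package name on other platforms is an assumption to verify
locally):

```console
$ sudo apt-get install r-base
```
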
The R packages `ggplot2` and `plyr` are also used and can be installed using
the R REPL.

```console
$ R
install.packages("ggplot2")
install.packages("plyr")
```

If a message is reported stating that a CRAN mirror must be selected first,
specify a mirror with the `repo` parameter.

If we used the "<http://cran.us.r-project.org>" mirror, it could look something
like this:

```r
install.packages("ggplot2", repo="http://cran.us.r-project.org")
```

Of course, use an appropriate mirror based on location.
A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).

## Running benchmarks

### Running individual benchmarks

This can be useful for debugging a benchmark or doing a quick performance
measure. However, it does not provide the statistical information needed to
draw any conclusions about the performance.

Individual benchmarks can be executed by simply executing the benchmark script
with `node`.

```console
$ node benchmark/buffers/buffer-tostring.js

buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
```

Each line represents a single benchmark with parameters specified as
`${variable}=${value}`. Each configuration combination is executed in a separate
process. This ensures that benchmark results aren't affected by the execution
order due to V8 optimizations. **The last number is the rate of operations
measured in ops/sec (higher is better).**

Furthermore, a subset of the configurations can be specified by setting them in
the process arguments:

```console
$ node benchmark/buffers/buffer-tostring.js len=1024

buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
```

### Running all benchmarks

Similar to running individual benchmarks, a group of benchmarks can be executed
by using the `run.js` tool. To see how to use this script,
run `node benchmark/run.js`. Again, this does not provide the statistical
information needed to draw any conclusions.

```console
$ node benchmark/run.js assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
...

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

It is possible to execute more groups by adding extra process arguments.

```console
$ node benchmark/run.js assert async_hooks
```

#### Filtering benchmarks

`benchmark/run.js` and `benchmark/compare.js` have `--filter pattern` and
`--exclude pattern` options, which can be used to run a subset of benchmarks or
to exclude specific benchmarks from the execution, respectively.

```console
$ node benchmark/run.js --filter "deepequal-b" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

$ node benchmark/run.js --exclude "deepequal-b" assert

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
...

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...
```

`--filter` and `--exclude` can be repeated to provide multiple patterns.

```console
$ node benchmark/run.js --filter "deepequal-b" --filter "deepequal-m" assert

assert/deepequal-buffer.js
assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848

assert/deepequal-map.js
assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833

$ node benchmark/run.js --exclude "deepequal-b" --exclude "deepequal-m" assert

assert/deepequal-object.js
assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
...

assert/deepequal-prims-and-objs-big-array-set.js
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Array" strict=0 len=20000 n=25 primitive="string": 865.2977195251661
assert/deepequal-prims-and-objs-big-array-set.js method="notDeepEqual_Array" strict=0 len=20000 n=25 primitive="string": 827.8297281403861
assert/deepequal-prims-and-objs-big-array-set.js method="deepEqual_Set" strict=0 len=20000 n=25 primitive="string": 28,826.618268696366
...
```

If `--filter` and `--exclude` are used together, `--filter` is applied first,
and `--exclude` is applied on the result of `--filter`:

```console
$ node benchmark/run.js --filter "bench-" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569

process/bench-hrtime.js
process/bench-hrtime.js type="raw" n=1000000: 13,178,002.113936031
process/bench-hrtime.js type="diff" n=1000000: 11,585,435.712423025
process/bench-hrtime.js type="bigint" n=1000000: 13,342,884.703919787

$ node benchmark/run.js --filter "bench-" --exclude "hrtime" process

process/bench-env.js
process/bench-env.js operation="get" n=1000000: 2,356,946.0770617095
process/bench-env.js operation="set" n=1000000: 1,295,176.3266261867
process/bench-env.js operation="enumerate" n=1000000: 24,592.32231990992
process/bench-env.js operation="query" n=1000000: 3,625,787.2150573144
process/bench-env.js operation="delete" n=1000000: 1,521,131.5742806569
```

### Comparing Node.js versions

To compare the effect of a new Node.js version, use the `compare.js` tool. This
will run each benchmark multiple times, making it possible to calculate
statistics on the performance measures. To see how to use this script,
run `node benchmark/compare.js`.

As an example of how to check for a possible performance improvement, the
[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used.
This pull request _claims_ to improve the performance of the
`string_decoder` module.

First build two versions of Node.js, one from the master branch (here called
`./node-master`) and another with the pull request applied (here called
`./node-pr-5134`).

To run multiple compiled versions in parallel, you need to copy the output of
the build: `cp ./out/Release/node ./node-master`. Check out the following
example:

```console
$ git checkout master
$ ./configure && make -j4
$ cp ./out/Release/node ./node-master

$ git checkout pr-5134
$ ./configure && make -j4
$ cp ./out/Release/node ./node-pr-5134
```

The `compare.js` tool will then produce a csv file with the benchmark results.

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
```

*Tip: `benchmark/compare.js` has some useful options. For example, if you want
to compare the benchmark of a single script instead of a whole module, you can
use the `--filter` option:*

```console
  --new      ./new-node-binary  new node binary (required)
  --old      ./old-node-binary  old node binary (required)
  --runs     30                 number of samples
  --filter   pattern            string to filter benchmark scripts
  --set      variable=value     set benchmark variable (can be repeated)
  --no-progress                 don't show benchmark progress indicator
```
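
For instance, restricting the comparison to the `string-decoder.js` script
alone might look like this (a sketch composed from the options above):

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 --filter string-decoder string_decoder > compare-pr-5134.csv
```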

For analysing the benchmark results, use the `compare.R` tool.

```console
$ cat compare-pr-5134.csv | Rscript benchmark/compare.R

                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'                  ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8'                    **     -0.81 %       ±0.53%  ±0.71%  ±0.93%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'                   ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii'            ***     -1.57 %       ±0.83%  ±1.11%  ±1.46%
...
```

In the output, _improvement_ is the relative improvement of the new version;
hopefully this is positive. _confidence_ tells if there is enough
statistical evidence to validate the _improvement_. If there is enough evidence,
there will be at least one star (`*`); more stars are better. **However, if
there are no stars, don't draw any conclusions based on the
_improvement_.** Sometimes this is fine, for example if no improvements are
expected, then there shouldn't be any stars.

**A word of caution:** Statistics is not a foolproof tool. If a benchmark shows
a statistically significant difference, there is a 5% risk that this
difference doesn't actually exist. For a single benchmark this is not an
issue. But when considering 20 benchmarks, it's normal that one of them
will show significance when it shouldn't. A possible solution is to instead
consider at least two stars (`**`) as the threshold; in that case the risk
is 1%. If three stars (`***`) is considered, the risk is 0.1%. However, this
may require more runs to obtain (can be set with `--runs`).
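
For example, doubling the number of samples from the default of 30 (shown in
the options above) could look like this:

```console
$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 --runs 60 string_decoder > compare-pr-5134.csv
```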
311
_For the statistically minded, the R script performs an [independent/unpaired
2-group t-test][t-test], with the null hypothesis that the performance is the
same for both versions. The confidence field will show a star if the p-value
is less than `0.05`._
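
_For reference, the statistic behind an unpaired t-test with unequal variances
(Welch's t-test, which the linked article describes) has the form below, where
x̄, s² and n are the sample mean, sample variance and number of runs for each
binary:_

```latex
t = \frac{\bar{x}_{\mathrm{new}} - \bar{x}_{\mathrm{old}}}
         {\sqrt{s_{\mathrm{new}}^2 / n_{\mathrm{new}} + s_{\mathrm{old}}^2 / n_{\mathrm{old}}}}
```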

The `compare.R` tool can also produce a box plot by using the `--plot filename`
option. In this case there are 48 different benchmark combinations, and there
may be a need to filter the csv file. This can be done while benchmarking
using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering
results afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

```console
$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png

                                                                                      confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii'           ***     -3.76 %       ±1.36%  ±1.82%  ±2.40%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii'            ***     -2.70 %       ±0.83%  ±1.11%  ±1.45%
 string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii'          ***     -4.06 %       ±0.31%  ±0.41%  ±0.54%
 string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii'         ***     -1.42 %       ±0.58%  ±0.77%  ±1.01%
...
```

![compare tool boxplot](doc_img/compare-boxplot.png)

### Comparing parameters

It can be useful to compare the performance for different parameters, for
example to analyze the time complexity.

To do this, use the `scatter.js` tool. This will run a benchmark multiple times
and generate a csv with the results. To see how to use this script,
run `node benchmark/scatter.js`.

```console
$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
```

After generating the csv, a comparison table can be created using the
`scatter.R` tool. Even more usefully, it creates an actual scatter plot when
using the `--plot filename` option.

```console
$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

aggregating variable: inLen

chunkLen     encoding      rate confidence.interval
      16        ascii 1515855.1           334492.68
      16 base64-ascii  403527.2            89677.70
      16  base64-utf8  322352.8            70792.93
      16      utf16le 1714567.5           388439.81
      16         utf8 1100181.6           254141.32
      64        ascii 3550402.0           661277.65
      64 base64-ascii 1093660.3           229976.34
      64  base64-utf8  997804.8           227238.04
      64      utf16le 3372234.0           647274.88
      64         utf8 1731941.2           360854.04
     256        ascii 5033793.9           723354.30
     256 base64-ascii 1447962.1           236625.96
     256  base64-utf8 1357269.2           231045.70
     256      utf16le 4039581.5           655483.16
     256         utf8 1828672.9           360311.55
    1024        ascii 5677592.7           624771.56
    1024 base64-ascii 1494171.7           227302.34
    1024  base64-utf8 1399218.9           224584.79
    1024      utf16le 4157452.0           630416.28
    1024         utf8 1824266.6           359628.52
```

Because the scatter plot can only show two variables (in this case _chunkLen_
and _encoding_) the rest is aggregated. Sometimes aggregating is a problem; this
can be solved by filtering. This can be done while benchmarking using the
`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
afterwards using tools such as `sed` or `grep`. In the `sed` case be
sure to keep the first line since that contains the header information.

```console
$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log

chunkLen     encoding      rate confidence.interval
      16        ascii 1302078.5            71692.27
      16 base64-ascii  338669.1            15159.54
      16  base64-utf8  281904.2            20326.75
      16      utf16le 1381515.5            58533.61
      16         utf8  831183.2            33631.01
      64        ascii 4363402.8           224030.00
      64 base64-ascii 1036825.9            48644.72
      64  base64-utf8  780059.3            60994.98
      64      utf16le 3900749.5           158366.84
      64         utf8 1723710.6            80665.65
     256        ascii 8472896.1           511822.51
     256 base64-ascii 2215884.6           104347.53
     256  base64-utf8 1996230.3           131778.47
     256      utf16le 5824147.6           234550.82
     256         utf8 2019428.8           100913.36
    1024        ascii 8340189.4           598855.08
    1024 base64-ascii 2201316.2           111777.68
    1024  base64-utf8 2002272.9           128843.11
    1024      utf16le 5789281.7           240642.77
    1024         utf8 2025551.2            81770.69
```

![scatter tool plot](doc_img/scatter-plot.png)

### Running Benchmarks on the CI

To see the performance impact of a Pull Request by running benchmarks on
the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].

## Creating a benchmark

### Basics of a benchmark

All benchmarks use the `require('../common.js')` module. This contains the
`createBenchmark(main, configs[, options])` method which will set up the
benchmark.

The arguments of `createBenchmark` are:

* `main` {Function} The benchmark function,
  where the code running operations and controlling timers should go
* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
  possible combinations of these parameters, unless specified otherwise.
  Each configuration is a property with an array of possible values.
  The configuration values can only be strings or numbers.
* `options` {Object} The benchmark options. At the moment only the `flags`
  option for specifying command line flags is supported.

`createBenchmark` returns a `bench` object, which is used for timing
the runtime of the benchmark. Run `bench.start()` after the initialization
and `bench.end(n)` when the benchmark is done. `n` is the number of operations
performed in the benchmark.

The benchmark script will be run twice:

The first pass will configure the benchmark with the combination of
parameters specified in `configs`, and WILL NOT run the `main` function.
In this pass, no flags except the ones directly passed via the command line
when running the benchmarks will be used.

In the second pass, the `main` function will be run, and the process
will be launched with:

* The flags passed into `createBenchmark` (the third argument)
* The flags in the command passed when the benchmark was run

Beware that any code outside the `main` function will be run twice
in different processes. This could be troublesome if the code
outside the `main` function has side effects. In general, prefer putting
the code inside the `main` function if it's more than just a declaration.

```js
'use strict';
const common = require('../common.js');
const { SlowBuffer } = require('buffer');

const configs = {
  // Number of operations, specified here so they show up in the report.
  // Most benchmarks just use one value for all runs.
  n: [1024],
  type: ['fast', 'slow'],  // Custom configurations
  size: [16, 128, 1024]  // Custom configurations
};

const options = {
  // Add --expose-internals in order to require internal modules in main
  flags: ['--zero-fill-buffers']
};

// `main` and `configs` are required, `options` is optional.
const bench = common.createBenchmark(main, configs, options);

// Any code outside main will be run twice,
// in different processes, with different command line arguments.

function main(conf) {
  // Only flags that have been passed to createBenchmark
  // earlier when main is run will be in effect.
  // In order to benchmark the internal modules, require them here. For example:
  // const URL = require('internal/url').URL

  // Start the timer
  bench.start();

  // Do operations here
  const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;

  for (let i = 0; i < conf.n; i++) {
    new BufferConstructor(conf.size);
  }

  // End the timer, pass in the number of operations
  bench.end(conf.n);
}
```
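
If the example above were saved as, say, `benchmark/buffers/buffer-creation.js`
(a hypothetical path), it could be run directly like any other benchmark,
including with a subset of its configurations:

```console
$ node benchmark/buffers/buffer-creation.js type=fast size=1024
```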

### Creating an HTTP benchmark

The `bench` object returned by `createBenchmark` implements an
`http(options, callback)` method. It can be used to run an external tool to
benchmark HTTP servers.

```js
'use strict';

const common = require('../common.js');

const bench = common.createBenchmark(main, {
  kb: [64, 128, 256, 1024],
  connections: [100, 500],
  duration: 5
});

function main(conf) {
  const http = require('http');
  const len = conf.kb * 1024;
  const chunk = Buffer.alloc(len, 'x');
  const server = http.createServer((req, res) => {
    res.end(chunk);
  });

  server.listen(common.PORT, () => {
    bench.http({
      connections: conf.connections,
    }, () => {
      server.close();
    });
  });
}
```

Supported option keys are:

* `port` - defaults to `common.PORT`
* `path` - defaults to `/`
* `connections` - number of concurrent connections to use, defaults to 100
* `duration` - duration of the benchmark in seconds, defaults to 10
* `benchmarker` - benchmarker to use, defaults to the first available http
  benchmarker
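
Inside `main` from the example above, a call that sets every key explicitly
might look like the following sketch (the values are illustrative only):

```js
bench.http({
  port: common.PORT,
  path: '/',
  connections: conf.connections,
  duration: conf.duration,
  benchmarker: 'autocannon',
}, () => {
  server.close();
});
```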

[autocannon]: https://github.com/mcollina/autocannon
[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/master/docs/core_benchmarks.md
[git-for-windows]: https://git-scm.com/download/win
[nghttp2.org]: https://nghttp2.org
[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
[wrk]: https://github.com/wg/wrk