# Benchmarks

These are the results of benchmarks comparing this `bc` (at version `2.1.0`) and
GNU `bc` (at version `1.07.1`).

Note: all benchmarks were run four times, and the fastest run is the one shown.
Also, `[bc]` means whichever `bc` was being run, and the assumed working
directory is the root directory of this repository. Also, this `bc` was built at
`-O2`.

Note: some mistakes were made when updating these benchmarks for `2.1.0`.
First, I did not update this `bc`'s version in this file. Second, I ran this
`bc` after compiling with `clang` when the GNU `bc` was almost certainly
compiled with `gcc`. Those mistakes have been fixed.

### Addition

The command used was:

```
tests/script.sh bc add.bc 0 1 1 [bc]
```

For GNU `bc`:

```
Running bc script: add.bc

real 2.06
user 1.09
sys 0.96
```

For this `bc`:

```
Running bc script: add.bc

real 0.95
user 0.90
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 0 1 1 [bc]
```

For GNU `bc`:

```
Running bc script: subtract.bc

real 2.08
user 1.13
sys 0.94
```

For this `bc`:

```
Running bc script: subtract.bc

real 0.92
user 0.88
sys 0.04
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 0 1 1 [bc]
```

For GNU `bc`:

```
Running bc script: multiply.bc

real 5.54
user 3.72
sys 1.81
```

For this `bc`:

```
Running bc script: multiply.bc

real 2.06
user 2.01
sys 0.05
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 0 1 1 [bc]
```

For GNU `bc`:

```
Running bc script: divide.bc

real 2.80
user 1.68
sys 1.11
```

For this `bc`:

```
Running bc script: divide.bc

real 1.45
user 1.42
sys 0.02
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -lq > /dev/null
```

For GNU `bc`:

```
real 11.46
user 11.45
sys 0.00
```

For this `bc`:

```
real 0.75
user 0.75
sys 0.00
```

### Scripts

[This file][1] was downloaded, saved at `../timeconst.bc` and the following
patch was applied:

```
--- tests/bc/scripts/timeconst.bc	2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc	2019-06-07 07:26:36.359913078 -0600
@@ -108,8 +108,10 @@
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
-	halt
 }
 
-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+	timeconst(i)
+}
+
+halt
```

The command used was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.16
user 14.59
sys 0.56
```

For this `bc`:

```
real 11.63
user 11.63
sys 0.00
```

Because this `bc` is faster when doing math, it might be a better comparison to
run a script that is not running any math. As such, I put the following into
`../test.bc`:

```
for (i = 0; i < 100000000; ++i) {
	y = i
}

i
y

halt
```

The command used was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 12.84
user 12.84
sys 0.00
```

For this `bc`:

```
real 21.20
user 21.20
sys 0.00
```

However, when I put the following into `../test2.bc`:

```
i = 0

while (i < 100000000) {
	++i
}

i

halt
```

the results were surprising.

The command used was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 13.80
user 13.80
sys 0.00
```

For this `bc`:

```
real 14.90
user 14.90
sys 0.00
```

It seems that my `bc` runs `while` loops faster than `for` loops.
I don't know why it does that because both loops use the same code under the
hood.

Note that, when running the benchmarks, the optimization used is not the one I
recommend, which is `-O3 -flto -march=native`. This `bc` separates its code into
modules, and optimizing them at link time removes a lot of the inefficiency that
comes from function call overhead. This is most keenly felt with one function:
`bc_vec_item()`, which should turn into just one instruction (on `x86_64`) when
optimized at link time and inlined. There are other functions that matter as
well.

When compiling this `bc` with the recommended optimizations, the results are as
follows.

For the first script (`../timeconst.bc`):

```
real 9.85
user 9.85
sys 0.00
```

For the second script (`../test.bc`):

```
real 18.04
user 18.04
sys 0.00
```

For the third script (`../test2.bc`):

```
real 12.66
user 12.66
sys 0.00
```

This is more competitive.

In addition, when compiling with the above recommendation, this `bc` gets even
faster when doing math.

### Recommended Compiler

When I ran these benchmarks with my `bc` compiled under `clang`, it performed
much better. I recommend compiling this `bc` with `clang`.

[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
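
To compare two builds (for example, a `clang` build against a `gcc` build) the
same way the benchmarks above were run, the "run four times, keep the fastest"
procedure can be sketched as below. This is only an illustration, not the
script that produced the numbers in this file: the helper name `best_of` and
the placeholder command are assumptions, and it measures wall-clock time only
(the `real` line of `time -p`), not `user` or `sys`.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=4):
    """Run `cmd` `runs` times and return the fastest wall-clock time in seconds.

    `best_of` is a hypothetical helper written for this sketch.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output, as the benchmarks do with `> /dev/null`.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # With no arguments, time a trivial placeholder command so the sketch runs
    # anywhere; pass a real command line instead to time a particular `bc`.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"fastest of 4 runs: {best_of(cmd):.2f}s")
```

Saved under a name of your choosing, the script takes the command to time as
its arguments, for example the path to a `bc` binary followed by
`../test2.bc`. Taking the minimum rather than the mean matches the note at the
top of this file and reduces the influence of unrelated system activity.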