# Benchmarks

The results of these benchmarks suggest that building this `bc` with
optimization at `-O3` with link-time optimization (`-flto`) will result in the
best performance. However, using `-march=native` can result in **WORSE**
performance.

*Note*: all benchmarks were run four times, and the fastest run is the one
shown. Also, `[bc]` means whichever `bc` was being run, and the assumed working
directory is the root directory of this repository. Also, this `bc` was at
version `3.0.0` while GNU `bc` was at version `1.07.1`, and all tests were
conducted on an `x86_64` machine running Gentoo Linux with `clang` `9.0.1` as
the compiler.

## Typical Optimization Level

These benchmarks were run with both `bc`'s compiled with the typical `-O2`
optimizations and no link-time optimization.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.54
user 1.21
sys 1.32
```

For this `bc`:

```
real 0.88
user 0.85
sys 0.02
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.51
user 1.05
sys 1.45
```

For this `bc`:

```
real 0.91
user 0.85
sys 0.05
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.15
user 4.69
sys 2.46
```

For this `bc`:

```
real 2.20
user 2.10
sys 0.09
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.36
user 1.87
sys 1.48
```

For this `bc`:

```
real 1.61
user 1.57
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.30
user 11.30
sys 0.00
```

For this `bc`:

```
real 0.73
user 0.72
sys 0.00
```

### Scripts

[This file][1] was downloaded, saved at `../timeconst.bc` and the following
patch was applied:

```
--- ../timeconst.bc 2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc 2019-06-07 07:26:36.359913078 -0600
@@ -110,8 +110,10 @@
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
-	halt
 }
 
-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+	timeconst(i)
+}
+
+halt
```

The command used was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 16.71
user 16.06
sys 0.65
```

For this `bc`:

```
real 13.16
user 13.15
sys 0.00
```

Because this `bc` is faster when doing math, it might be a better comparison to
run a script that is not running any math. As such, I put the following into
`../test.bc`:

```
for (i = 0; i < 100000000; ++i) {
	y = i
}

i
y

halt
```

The command used was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 16.60
user 16.59
sys 0.00
```

For this `bc`:

```
real 22.76
user 22.75
sys 0.00
```

I also put the following into `../test2.bc`:

```
i = 0

while (i < 100000000) {
	i += 1
}

i

halt
```

The command used was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 17.32
user 17.30
sys 0.00
```

For this `bc`:

```
real 16.98
user 16.96
sys 0.01
```

It seems that the improvements to the interpreter helped a lot in certain cases.

Also, I have no idea why GNU `bc` did worse when it is technically doing less
work.
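
To put the `-O2` timings above side by side, here is a small Python sketch that
turns the `real` times into speedup ratios. The numbers are copied from the
results in this section; the names `O2_REAL`, `speedup()`, and `speedups()` are
made up for this illustration and are not part of this repository.

```python
# Real (wall-clock) seconds copied from the -O2 results above,
# stored as (GNU bc, this bc). All names here are illustrative only.
O2_REAL = {
    "add": (2.54, 0.88),
    "subtract": (2.51, 0.91),
    "multiply": (7.15, 2.20),
    "divide": (3.36, 1.61),
    "power": (11.30, 0.73),
    "timeconst.bc": (16.71, 13.16),
    "test.bc (for)": (16.60, 22.76),
    "test2.bc (while)": (17.32, 16.98),
}


def speedup(gnu, this):
    """Return how many times faster this bc ran (below 1.0 means slower)."""
    return gnu / this


def speedups(table):
    """Map each benchmark name to its speedup, rounded to two decimals."""
    return {name: round(speedup(gnu, this), 2)
            for name, (gnu, this) in table.items()}


if __name__ == "__main__":
    for name, ratio in speedups(O2_REAL).items():
        print(f"{name:18s} {ratio:6.2f}x")
```

A ratio above `1.0` means this `bc` was faster. With these numbers, the only
ratio below `1.0` is the `for` loop script, which matches the timings shown
above.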

## Recommended Optimizations from `2.7.0`

Note that the benchmarks above did not use the optimizations I recommended for
version `2.7.0`, which are `-O3 -flto -march=native`.

This `bc` separates its code into modules, and optimizing them together at link
time removes a lot of the inefficiency that comes from function call overhead.
This is most keenly felt with one function: `bc_vec_item()`, which should turn
into just one instruction (on `x86_64`) when optimized at link time and inlined.
There are other functions that matter as well.

I also recommended `-march=native` on the grounds that newer instructions would
increase performance on math-heavy code. We will see if that assumption was
correct. (Spoiler: **NO**.)

When compiling both `bc`'s with the optimizations I recommended for this `bc`
for version `2.7.0`, the results are as follows.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.44
user 1.11
sys 1.32
```

For this `bc`:

```
real 0.59
user 0.54
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.42
user 1.02
sys 1.40
```

For this `bc`:

```
real 0.64
user 0.57
sys 0.06
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 7.01
user 4.50
sys 2.50
```

For this `bc`:

```
real 1.59
user 1.53
sys 0.05
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.26
user 1.82
sys 1.44
```

For this `bc`:

```
real 1.24
user 1.20
sys 0.03
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 11.08
user 11.07
sys 0.00
```

For this `bc`:

```
real 0.71
user 0.70
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.62
user 15.08
sys 0.53
```

For this `bc`:

```
real 10.09
user 10.08
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.76
user 14.75
sys 0.00
```

For this `bc`:

```
real 17.95
user 17.94
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.84
user 14.83
sys 0.00
```

For this `bc`:

```
real 13.53
user 13.52
sys 0.00
```

## Link-Time Optimization Only

Just for kicks, let's see if `-march=native` is even useful.

The optimizations I used for both `bc`'s were `-O3 -flto`.

### Addition

The command used was:

```
tests/script.sh bc add.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.41
user 1.05
sys 1.35
```

For this `bc`:

```
real 0.58
user 0.52
sys 0.05
```

### Subtraction

The command used was:

```
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 2.39
user 1.10
sys 1.28
```

For this `bc`:

```
real 0.65
user 0.57
sys 0.07
```

### Multiplication

The command used was:

```
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 6.82
user 4.30
sys 2.51
```

For this `bc`:

```
real 1.57
user 1.49
sys 0.08
```

### Division

The command used was:

```
tests/script.sh bc divide.bc 1 0 1 1 [bc]
```

For GNU `bc`:

```
real 3.25
user 1.81
sys 1.43
```

For this `bc`:

```
real 1.27
user 1.23
sys 0.04
```

### Power

The command used was:

```
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
```

For GNU `bc`:

```
real 10.50
user 10.49
sys 0.00
```

For this `bc`:

```
real 0.72
user 0.71
sys 0.00
```

### Scripts

The command for the `../timeconst.bc` script was:

```
time -p [bc] ../timeconst.bc > /dev/null
```

For GNU `bc`:

```
real 15.50
user 14.81
sys 0.68
```

For this `bc`:

```
real 10.17
user 10.15
sys 0.01
```

The command for the next script, the `for` loop script, was:

```
time -p [bc] ../test.bc > /dev/null
```

For GNU `bc`:

```
real 14.99
user 14.99
sys 0.00
```

For this `bc`:

```
real 16.85
user 16.84
sys 0.00
```

The command for the next script, the `while` loop script, was:

```
time -p [bc] ../test2.bc > /dev/null
```

For GNU `bc`:

```
real 14.92
user 14.91
sys 0.00
```

For this `bc`:

```
real 12.75
user 12.75
sys 0.00
```

It turns out that `-march=native` can be a problem. As such, I have removed the
recommendation to build with `-march=native`.

## Recommended Compiler

When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
performed much better under `clang`. I recommend compiling this `bc` with
`clang`.

[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
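
As a closing summary of the `-march=native` question, the sketch below compares
the `real` times of this `bc` from the `-O3 -flto -march=native` section with
those from the `-O3 -flto` section. The numbers are copied from the results
above; the names `THIS_BC_REAL`, `percent_change()`, and `changes()` are
hypothetical and exist only for this illustration.

```python
# Real seconds for this bc, copied from the two -O3 sections above,
# stored as (with -march=native, without -march=native).
THIS_BC_REAL = {
    "add": (0.59, 0.58),
    "subtract": (0.64, 0.65),
    "multiply": (1.59, 1.57),
    "divide": (1.24, 1.27),
    "power": (0.71, 0.72),
    "timeconst.bc": (10.09, 10.17),
    "test.bc (for)": (17.95, 16.85),
    "test2.bc (while)": (13.53, 12.75),
}


def percent_change(native, plain):
    """Percent change in real time from adding -march=native.

    Positive values mean the -march=native build was slower.
    """
    return (native - plain) / plain * 100.0


def changes(table):
    """Map each benchmark name to its percent change, rounded to one decimal."""
    return {name: round(percent_change(native, plain), 1)
            for name, (native, plain) in table.items()}


if __name__ == "__main__":
    for name, pct in changes(THIS_BC_REAL).items():
        print(f"{name:18s} {pct:+6.1f}%")
```

With these numbers, the two loop scripts are each more than 6% slower with
`-march=native`, while the other benchmarks differ by at most a few hundredths
of a second in either direction.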