1# Development 2 3Updated: 27 Mar 2023 4 5This document is meant for the day when I (Gavin D. Howard) get [hit by a 6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue. 7 8This document is supposed to contain all of the knowledge necessary to develop 9`bc` and `dc`. 10 11In addition, this document is meant to add to the [oral tradition of software 12engineering][118], as described by Bryan Cantrill. 13 14This document will reference other parts of the repository. That is so a lot of 15the documentation can be closest to the part of the repo where it is actually 16necessary. 17 18## What Is It? 19 20This repository contains an implementation of both [POSIX `bc`][2] and [Unix 21`dc`][3]. 22 23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a 24historical utility that was included in early Unix and even predates both Unix 25and C. They both are arbitrary-precision command-line calculators with their own 26programming languages. `bc`'s language looks similar to C, with infix notation 27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows 28the user to execute strings as though they were functions. 29 30In addition, it is also possible to build the arbitrary-precision math as a 31library, named [`bcl`][156]. 32 33**Note**: for ease, I will refer to both programs as `bc` in this document. 34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I 35am referring to just `dc`. 36 37### History 38 39This project started in January 2018 when a certain individual on IRC, hearing 40that I knew how to write parsers, asked me to write a `bc` parser for his math 41library. I did so. I thought about writing my own math library, but he 42disparaged my programming skills and made me think that I couldn't do it. 43 44However, he took so long to do it that I eventually decided to give it a try and 45had a working math portion in two weeks. It taught me that I should not listen 46to such people. 47 48From that point, I decided to make it an extreme learning experience about how 49to write quality software. 50 51That individual's main goal had been to get his `bc` into [toybox][16], and I 52managed to get my own `bc` in. I also got it in [busybox][17]. 53 54Eventually, in late 2018, I also decided to try my hand at implementing 55[Karatsuba multiplication][18], an algorithm that that unnamed individual 56claimed I could never implement. It took me a bit, but I did it. 57 58This project became a passion project for me, and I continued. In mid-2019, 59Stefan Eßer suggested I improve performance by putting more than 1 digit in each 60section of the numbers. After I showed immaturity because of some burnout, I 61implemented his suggestion, and the results were incredible. 62 63Since that time, I have gradually been improving the `bc` as I have learned more 64about things like fuzzing, [`scan-build`][19], [valgrind][20], 65[AddressSanitizer][21] (and the other sanitizers), and many other things. 66 67One of my happiest moments was when my `bc` was made the default in FreeBSD. 68Another happiest moment was when I found out that my `bc` had shipped with Mac 69OSX Ventura, without my knowledge. 70 71But since I believe in [finishing the software I write][22], I have done less 72work on `bc` over time, though there are still times when I put a lot of effort 73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use 74my `bc`. 75 76And that is why I am writing this document: someday, someone else is going to 77want to change my code, and this document is my attempt to make it as simple as 78possible. 79 80### Values 81 82[According to Bryan Cantrill][10], all software has values. I think he's 83correct, though I [added one value for programming languages in particular][11]. 84 85However, for `bc`, his original list will do: 86 87* Approachability 88* Availability 89* Compatibility 90* Composability 91* Debuggability 92* Expressiveness 93* Extensibility 94* Interoperability 95* Integrity 96* Maintainability 97* Measurability 98* Operability 99* Performance 100* Portability 101* Resiliency 102* Rigor 103* Robustness 104* Safety 105* Security 106* Simplicity 107* Stability 108* Thoroughness 109* Transparency 110* Velocity 111 112There are several values that don't apply. The reason they don't apply is 113because `bc` and `dc` are existing utilities; this is just another 114reimplementation. The designs of `bc` and `dc` are set in stone; there is 115nothing we can do to change them, so let's get rid of those values that would 116apply to their design: 117 118* Compatibility 119* Integrity 120* Maintainability 121* Measurability 122* Performance 123* Portability 124* Resiliency 125* Rigor 126* Robustness 127* Safety 128* Security 129* Simplicity 130* Stability 131* Thoroughness 132* Transparency 133 134Furthermore, some of the remaining ones don't matter to me, so let me get rid of 135those and order the rest according to my *actual* values for this project: 136 137* Robustness 138* Stability 139* Portability 140* Compatibility 141* Performance 142* Security 143* Simplicity 144 145First is **robustness**. This `bc` and `dc` should be robust, accepting any 146input, never crashing, and instead, returning an error. 147 148Closely related to that is **stability**. The execution of `bc` and `dc` should 149be deterministic and never change for the same inputs, including the 150pseudo-random number generator (for the same seed). 151 152Third is **portability**. These programs should run everywhere that POSIX 153exists, as well as Windows. This means that just about every person on the 154planet will have access to these programs. 155 156Next is **compatibility**. These programs should, as much as possible, be 157compatible with other existing implementations and standards. 158 159Then we come to **performance**. A calculator is only usable if it's fast, so 160these programs should run as fast as possible. 161 162After that is **security**. These programs should *never* be the reason a user's 163computer is compromised. 164 165And finally, **simplicity**. Where possible, the code should be simple, while 166deferring to the above values. 167 168Keep these values in mind for the rest of this document, and for exploring any 169other part of this repo. 170 171#### Portability 172 173But before I go on, I want to talk about portability in particular. 174 175Most of these principles just require good attention and care, but portability 176is different. Sometimes, it requires pulling in code from other places and 177adapting it. In other words, sometimes I need to duplicate and adapt code. 178 179This happened in a few cases: 180 181* Option parsing (see [`include/opt.h`][35]). 182* History (see [`include/history.h`][36]). 183* Pseudo-Random Number Generator (see [`include/rand.h`][37]). 184 185This was done because I decided to ensure that `bc`'s dependencies were 186basically zero. In particular, either users have a normal install of Windows or 187they have a POSIX system. 188 189A POSIX system limited me to C99, `sh`, and zero external dependencies. That 190last item is why I pull code into `bc`: if I pull it in, it's not an external 191dependency. 192 193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being 194portable to some platforms. 195 196## Suggested Course 197 198I do have a suggested course for programmers to follow when trying to understand 199this codebase. The order is this: 200 2011. `bc` Spec. 2022. Manpages. 2033. Test suite. 2044. Understand the build. 2055. Algorithms manual. 2066. Code concepts. 2077. Repo structure. 2088. Headers. 2099. Source code. 210 211This order roughly follows this order: 212 2131. High-level requirements 2142. Low-level requirements 2153. High-level implementation 2164. Low-level implementation 217 218In other words, first understand what the code is *supposed* to do, then 219understand the code itself. 220 221## Useful External Tools 222 223I have a few tools external to `bc` that are useful: 224 225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132]. 226* A [repo of `bc` and `dc` scripts][133]. 227* A set of `bash` aliases (see below). 228* A `.bcrc` file with items useful for my `bash` setup (see below). 229 230My `bash` aliases are these: 231 232```sh 233alias makej='make -j16' 234alias mcmake='make clean && make' 235alias mcmakej='make clean && make -j16' 236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \ 237 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 238 -Wno-unreachable-code-return -Wno-missing-noreturn \ 239 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 240 -pedantic -std=c99" ./configure.sh' 241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \ 242 -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \ 243 -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \ 244 -Wall -Wextra -pedantic -std=c99" ./configure.sh' 245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \ 246 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 247 -Wno-unreachable-code-return -Wno-missing-noreturn \ 248 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 249 -pedantic -std=c99" ./configure.sh' 250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \ 251 CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \ 252 -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \ 253 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 254 -pedantic -std=c99" ./configure.sh' 255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS' 256``` 257 258`makej` runs `make` with all of my cores. 259 260`mcmake` runs `make clean` before running `make`. It will take a target on the 261command-line. 262 263`mcmakej` is a combination of `makej` and `mcmake`. 264 265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see 266[Debugging][134] below). 267 268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler) 269using full warnings, with a few really loud and useless warnings turned off. 270 271`bcnoassert` configures `bc` to not have asserts built in. 272 273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug 274mode. 275 276`bcunset` unsets my personal `bc` environment variables, which are set to: 277 278```sh 279export BC_ENV_ARGS="-l $HOME/.bcrc" 280export BC_LINE_LENGTH="74" 281``` 282 283Unsetting these environment variables are necessary for running 284[`scripts/release.sh`][83] because otherwise, it will error when attempting to 285run `bc -s` on my `$HOME/.bcrc`. 286 287Speaking of which, the contents of that file are: 288 289```bc 290define void print_time_unit(t){ 291 if(t<10)print "0" 292 if(t<1&&t)print "0" 293 print t,":" 294} 295define void sec2time(t){ 296 auto s,m,h,d,r 297 r=scale 298 scale=0 299 t=abs(t) 300 s=t%60 301 t-=s 302 m=t/60%60 303 t-=m 304 h=t/3600%24 305 t-=h 306 d=t/86400 307 if(d)print_time_unit(d) 308 if(h)print_time_unit(h) 309 print_time_unit(m) 310 if(s<10)print "0" 311 if(s<1&&s)print "0" 312 s 313 scale=r 314} 315define minutes(secs){ 316 return secs/60; 317} 318define hours(secs){ 319 return secs/3600; 320} 321define days(secs){ 322 return secs/3600/24; 323} 324define years(secs){ 325 return secs/3600/24/365.25; 326} 327define fbrand(b,p){ 328 auto l,s,t 329 b=abs(b)$ 330 if(b<2)b=2 331 s=scale 332 t=b^abs(p)$ 333 l=ceil(l2(t),0) 334 if(l>scale)scale=l 335 t=irand(t)/t 336 scale=s 337 return t 338} 339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)} 340``` 341 342This allows me to use `bc` as part of my `bash` prompt. 343 344## Code Style 345 346The code style for `bc` is...weird, and that comes from historical accident. 347 348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order 349to do that, my `bc` originally had toybox style. Eventually, I changed to using 350tabs, and assuming they were 4 spaces wide, but other than that, I basically 351kept the same style, with some exceptions that are more or less dependent on my 352taste. 353 354However, I later managed to get [ClangFormat][24] to work, so I changed the 355style to that. 356 357### ClangFormat 358 359The style is now defined as whatever [ClangFormat][24] outputs using the 360existing `.clang-format` file. More precisely, the style is whatever is output 361when the following command is run in the root directory: 362 363``` 364./scripts/format.sh 365``` 366 367### Historical Style 368 369The code style used to be: 370 371* Tabs are 4 spaces. 372* Tabs are used at the beginning of lines for indent. 373* Spaces are used for alignment. 374* Lines are limited to 80 characters, period. 375* Pointer asterisk (`*`) goes with the variable (on the right), not the type, 376 unless it is for a pointer type returned from a function. 377* The opening brace is put on the same line as the header for the function, 378 loop, or `if` statement. 379* Unless the header is more than one line, in which case the opening brace is 380 put on its own line. 381* If the opening brace is put on its own line, there is no blank line after it. 382* If the opening brace is *not* put on its own line, there *is* a blank line 383 after it, *unless* the block is only one or two lines long. 384* Code lines are grouped into what I call "paragraphs." Basically, lines that 385 seem like they should go together are grouped together. This one comes down 386 to judgment. 387* Bodies of `if` statements, `else` statements, and loops that are one line 388 long are put on the same line as the statement, unless the header is more than 389 one line long, and/or, the header and body cannot fit into 80 characters with 390 a space inbetween them. 391* If single-line bodies are on a separate line from their headers, and the 392 headers are only a single line, then no braces are used. 393* However, braces are *always* used if they contain another `if` statement or 394 loop. 395* Loops with empty bodies are ended with a semicolon. 396* Expressions that return a boolean value are surrounded by paretheses. 397* Macro backslashes are aligned as far to the left as possible. 398* Binary operators have spaces on both sides. 399* If a line with binary operators overflows 80 characters, a newline is inserted 400 *after* binary operators. 401* Function modifiers and return types are on the same line as the function name. 402* With one exception, `goto`'s are only used to jump to the end of a function 403 for cleanup. 404* All structs, enums, and unions are `typedef`'ed. 405* All constant data is in one file: [`src/data.c`][131], but the corresponding 406 `extern` declarations are in the appropriate header file. 407* All local variables are declared at the beginning of the scope where they 408 appear. They may be initialized at that point, if it does not invoke UB or 409 otherwise cause bugs. 410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable 411 declarations. 412* Besides short `if` statements and loops, there should *never* be more than one 413 statement per line. 414 415## Repo Structure 416 417Functions are documented with Doxygen-style doc comments. Functions that appear 418in headers are documented in the headers, while static functions are documented 419where they are defined. 420 421### `configure` 422 423A symlink to [`configure.sh`][69]. 424 425### `configure.sh` 426 427This is the script to configure `bc` and [`bcl`][156] for building. 428 429This `bc` has a custom build system. The reason for this is because of 430[*portability*][136]. 431 432If `bc` used an outside build system, that build system would be an external 433dependency. Thus, I had to write a build system for `bc` that used nothing but 434C99 and POSIX utilities. 435 436One of those utilities is POSIX `sh`, which technically implements a 437Turing-complete programming language. It's a terrible one, but it works. 438 439A user that wants to build `bc` on a POSIX system (not Windows) first runs 440`configure.sh` with the options he wants. `configure.sh` uses those options and 441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid 442`Makefile`. Then `make` can do the rest. 443 444For more information about the build process, see the [Build System][142] 445section and the [build manual][14]. 446 447For more information about shell scripts, see [POSIX Shell Scripts][76]. 448 449`configure.sh` does the following: 450 4511. It processes command-line arguments and figure out what the user wants to 452 build. 4532. It reads in [`Makefile.in`][70]. 4543. One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form 455 `%%<placeholder_name>%%` based on the [build type][81]. 4564. It appends a list of file targets based on the [build type][81]. 4575. It appends the correct test targets. 4586. It copies the correct manpage and markdown manual for `bc` and `dc` into a 459 location from which they can be copied for install. 4607. It does a `make clean` to reset the build state. 461 462### `.gitattributes` 463 464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in 465the Visual Studio files. 466 467### `.gitignore` 468 469The `.gitignore` 470 471### `LICENSE.md` 472 473This is the `LICENSE` file, including the licenses of various software that I 474have borrowed. 475 476### `Makefile.in` 477 478This is the `Makefile` template for [`configure.sh`][69] to use for generating a 479`Makefile`. 480 481For more information, see [`configure.sh`][69], the [Build System][142] section, 482and the [build manual][14]. 483 484Because of [portability][136], the generated `Makefile.in` should be a pure 485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a 486few snares for the unwary programmer in this file: 487 4881. No extensions allowed, including and especially GNU extensions. 4892. If new headers are added, they must also be added to `Makefile.in`. 4903. Don't delete the `.POSIX:` empty target at the top; that's what tells `make` 491 implementations that pure [POSIX `make`][74] is needed. 492 493In particular, there is no way to set up variables other than the `=` operator. 494There are no conditionals, so all of the conditional stuff must be in 495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the 496first place: [POSIX `make`][74] is barebones and only does a build with no 497configuration. 498 499### `NEWS.md` 500 501A running changelog with an entry for each version. This should be updated at 502the same time that [`include/version.h`][75] is. 503 504### `NOTICE.md` 505 506The `NOTICE` file with proper attributions. 507 508### `README.md` 509 510The `README`. Read it. 511 512### `benchmarks/` 513 514The folder containing files to generate benchmarks. 515 516Each of these files was made, at one time or another, to benchmark some 517experimental feature, so if it seems there is no rhyme or reason to these 518benchmarks, it is because there is none, besides historical accident. 519 520#### `bc/` 521 522The folder containing `bc` scripts to generate `bc` benchmarks. 523 524##### `add.bc` 525 526The file to generate the benchmark to benchmark addition in `bc`. 527 528##### `arrays_and_constants.bc` 529 530The file to generate the benchmark to benchmark `bc` using lots of array names 531and constants. 532 533##### `arrays.bc` 534 535The file to generate the benchmark to benchmark `bc` using lots of array names. 536 537##### `constants.bc` 538 539The file to generate the benchmark to benchmark `bc` using lots of constants. 540 541##### `divide.bc` 542 543The file to generate the benchmark to benchmark division in `bc`. 544 545##### `functions.bc` 546 547The file to generate the benchmark to benchmark `bc` using lots of functions. 548 549##### `irand_long.bc` 550 551The file to generate the benchmark to benchmark `bc` using lots of calls to 552`irand()` with large bounds. 553 554##### `irand_short.bc` 555 556The file to generate the benchmark to benchmark `bc` using lots of calls to 557`irand()` with small bounds. 558 559##### `lib.bc` 560 561The file to generate the benchmark to benchmark `bc` using lots of calls to 562heavy functions in `lib.bc`. 563 564##### `multiply.bc` 565 566The file to generate the benchmark to benchmark multiplication in `bc`. 567 568##### `postfix_incdec.bc` 569 570The file to generate the benchmark to benchmark `bc` using postfix increment and 571decrement operators. 572 573##### `power.bc` 574 575The file to generate the benchmark to benchmark power (exponentiation) in `bc`. 576 577##### `subtract.bc` 578 579The file to generate the benchmark to benchmark subtraction in `bc`. 580 581##### `strings.bc` 582 583The file to generate the benchmark to benchmark `bc` using lots of strings. 584 585#### `dc/` 586 587The folder containing `dc` scripts to generate `dc` benchmarks. 588 589##### `modexp.dc` 590 591The file to generate the benchmark to benchmark modular exponentiation in `dc`. 592 593### `gen/` 594 595A folder containing the files necessary to generate C strings that will be 596embedded in the executable. 597 598All of the files in this folder have license headers, but the program and script 599that can generate strings from them include code to strip the license header out 600before strings are generated. 601 602#### `bc_help.txt` 603 604A text file containing the text displayed for `bc -h` or `bc --help`. 605 606This text just contains the command-line options and a short summary of the 607differences from GNU and BSD `bc`'s. It also directs users to the manpage. 608 609The reason for this is because otherwise, the help would be far too long to be 610useful. 611 612**Warning**: The text has some `printf()` format specifiers. You need to make 613sure the format specifiers match the arguments given to `bc_file_printf()`. 614 615#### `dc_help.txt` 616 617A text file containing the text displayed for `dc -h` or `dc --help`. 618 619This text just contains the command-line options and a short summary of the 620differences from GNU and BSD `dc`'s. It also directs users to the manpage. 621 622The reason for this is because otherwise, the help would be far too long to be 623useful. 624 625**Warning**: The text has some `printf()` format specifiers. You need to make 626sure the format specifiers match the arguments given to `bc_file_printf()`. 627 628#### `lib.bc` 629 630A `bc` script containing the [standard math library][5] required by POSIX. See 631the [POSIX standard][2] for what is required. 632 633This file does not have any extraneous whitespace, except for tabs at the 634beginning of lines. That is because this data goes directly into the binary, 635and whitespace is extra bytes in the binary. Thus, not having any extra 636whitespace shrinks the resulting binary. 637 638However, tabs at the beginning of lines are kept for two reasons: 639 6401. Readability. (This file is still code.) 6412. The program and script that generate strings from this file can remove 642 tabs at the beginning of lines. 643 644For more details about the algorithms used, see the [algorithms manual][25]. 645 646However, there are a few snares for unwary programmers. 647 648First, all constants must be one digit. This is because otherwise, multi-digit 649constants could be interpreted wrongly if the user uses a different `ibase`. 650This does not happen with single-digit numbers because they are guaranteed to be 651interpreted what number they would be if the `ibase` was as high as possible. 652 653This is why `A` is used in the library instead of `10`, and things like `2*9*A` 654for `180` in [`lib2.bc`][26]. 655 656As an alternative, you can set `ibase` in the function, but if you do, make sure 657to set it with a single-digit number and beware the snare below... 658 659Second, `scale`, `ibase`, and `obase` must be safely restored before returning 660from any function in the library. This is because without the `-g` option, 661functions are allowed to change any of the globals. 662 663Third, all local variables in a function must be declared in an `auto` statement 664before doing anything else. This includes arrays. However, function parameters 665are considered predeclared. 666 667Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code 668must not use *any* extensions. It has to work when users use the `-s` or `-w` 669flags. 670 671#### `lib2.bc` 672 673A `bc` script containing the [extended math library][7]. 674 675Like [`lib.bc`][8], and for the same reasons, this file should have no 676extraneous whitespace, except for tabs at the beginning of lines. 677 678For more details about the algorithms used, see the [algorithms manual][25]. 679 680Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary 681programmers when writing code for `lib2.bc`. 682 683#### `strgen.c` 684 685Code for the program to generate C strings from text files. This is the original 686program, although [`strgen.sh`][9] was added later. 687 688The reason I used C here is because even though I knew `sh` would be available 689(it must be available to run `configure.sh`), I didn't know how to do what I 690needed to do with POSIX utilities and `sh`. 691 692Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that 693it *could* be done with `sh` and POSIX utilities. 694 695However, `strgen.c` exists *still* exists because the versions generated by 696[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99 697standard][12], page 21.) This is because [`strgen.sh`][9] generates string 698literals, and in C99, string literals can be limited to 4095 characters, and 699`gen/lib2.bc` is above that. 700 701Fortunately, the limit for "objects," which include `char` arrays, is much 702bigger: 65535 bytes, so that's what `strgen.c` generates. 703 704However, the existence of `strgen.c` does come with a cost: the build needs C99 705compiler that targets the host machine. For more information, see the ["Cross 706Compiling" section][13] of the [build manual][14]. 707 708Read the comments in `strgen.c` for more detail about it, the arguments it 709takes, and how it works. 710 711#### `strgen.sh` 712 713An `sh` script that will generate C strings that uses only POSIX utilities. This 714exists for those situations where a host C99 compiler is not available, and the 715environment limits mentioned above in [`strgen.c`][15] don't matter. 716 717`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean 718the exact same things, so see the comments in [`strgen.c`][15] for more detail 719about that, and see the comments in `strgen.sh` for more details about it and 720how it works. 721 722For more information about shell scripts, see [POSIX Shell Scripts][76]. 723 724### `include/` 725 726A folder containing the headers. 727 728The headers are not included among the source code because I like it better that 729way. Also there were folders within `src/` at one point, and I did not want to 730see `#include "../some_header.h"` or things like that. 731 732So all headers are here, even though only one ([`bcl.h`][30]) is meant for end 733users (to be installed in `INCLUDEDIR`). 734 735#### `args.h` 736 737This file is the API for processing command-line arguments. 738 739#### `bc.h` 740 741This header is the API for `bc`-only items. This includes the `bc_main()` 742function and the `bc`-specific lexing and parsing items. 743 744The `bc` parser is perhaps the most sensitive part of the entire codebase. See 745the documentation in `bc.h` for more information. 746 747The code associated with this header is in [`src/bc.c`][40], 748[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42]. 749 750#### `bcl.h` 751 752This header is the API for the [`bcl`][156] library. 753 754This header is meant for distribution to end users and contains the API that end 755users of [`bcl`][156] can use in their own software. 756 757This header, because it's the public header, is also the root header. That means 758that it has platform-specific fixes for Windows. (If the fixes were not in this 759header, the build would fail on Windows.) 760 761The code associated with this header is in [`src/library.c`][43]. 762 763#### `dc.h` 764 765This header is the API for `dc`-only items. This includes the `dc_main()` 766function and the `dc`-specific lexing and parsing items. 767 768The code associated with this header is in [`src/dc.c`][44], 769[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46]. 770 771#### `file.h` 772 773This header is for `bc`'s internal buffered I/O API. 774 775For more information about `bc`'s error handling and custom buffered I/O, see 776[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the 777notes about version [`3.0.0`][32] in the [`NEWS`][32]. 778 779The code associated with this header is in [`src/file.c`][47]. 780 781#### `history.h` 782 783This header is for `bc`'s implementation of command-line editing/history, which 784is based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 785 786For more information, see the [Command-Line History][189] section. 787 788The code associated with this header is in [`src/history.c`][48]. 789 790#### `lang.h` 791 792This header defines the data structures and bytecode used for actual execution 793of `bc` and `dc` code. 794 795Yes, it's misnamed; that's an accident of history where the first things I put 796into it all seemed related to the `bc` language. 797 798The code associated with this header is in [`src/lang.c`][49]. 799 800#### `lex.h` 801 802This header defines the common items that both programs need for lexing. 803 804The code associated with this header is in [`src/lex.c`][50], 805[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45]. 806 807#### `library.h` 808 809This header defines the things needed for [`bcl`][156] that users should *not* 810have access to. In other words, [`bcl.h`][30] is the *public* header for the 811library, and this header is the *private* header for the library. 812 813The code associated with this header is in [`src/library.c`][43]. 814 815#### `num.h` 816 817This header is the API for numbers and math. 818 819The code associated with this header is in [`src/num.c`][39]. 820 821#### `opt.h` 822 823This header is the API for parsing command-line arguments. 824 825It's different from [`args.h`][31] in that [`args.h`][31] is for the main code 826to process the command-line arguments into global data *after* they have already 827been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually 828parses the command-line arguments, and [`args.h`][31] turns that parsed data 829into flags (bits), strings, and expressions that will be used later. 830 831Why are they separate? Because originally, `bc` used `getopt_long()` for 832parsing, so [`args.h`][31] was the only one that existed. After it was 833discovered that `getopt_long()` has different behavior on different platforms, I 834adapted a [public-domain option parsing library][34] to do the job instead. And 835in doing so, I gave it its own header. 836 837They could probably be combined, but I don't really care enough at this point. 838 839The code associated with this header is in [`src/opt.c`][51]. 840 841#### `parse.h` 842 843This header defines the common items that both programs need for parsing. 844 845Note that the parsers don't produce abstract syntax trees (AST's) or any 846intermediate representations. They produce bytecode directly. In other words, 847they don't have special data structures except what they need to do their job. 848 849The code associated with this header is in [`src/parse.c`][50], 850[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46]. 851 852#### `program.h` 853 854This header defines the items needed to manage the data structures in 855[`lang.h`][38] as well as any helper functions needed to generate bytecode or 856execute it. 857 858The code associated with this header is in [`src/program.c`][53]. 859 860#### `rand.h` 861 862This header defines the API for the [pseudo-random number generator 863(PRNG)][179]. 864 865The PRNG only generates fixed-size integers. The magic of generating random 866numbers of arbitrary size is actually given to the code that does math 867([`src/num.c`][39]). 868 869The code associated with this header is in [`src/rand.c`][54]. 870 871#### `read.h` 872 873This header defines the API for reading from files and `stdin`. 874 875Thus, [`file.h`][55] is really for buffered *output*, while this file is for 876*input*. There is no buffering needed for `bc`'s inputs. 877 878The code associated with this header is in [`src/read.c`][56]. 879 880#### `status.h` 881 882This header has several things: 883 884* A list of possible errors that internal `bc` code can use. 885* Compiler-specific fixes. 886* Platform-specific fixes. 887* Macros for `bc`'s [error handling][97]. 888 889There is no code associated with this header. 890 891#### `vector.h` 892 893This header defines the API for the vectors (resizable arrays) that are used for 894data structures. 895 896Vectors are what do the heavy lifting in almost all of `bc`'s data structures. 897Even the maps of identifiers and arrays use vectors. 898 899The code associated with this header is in [`src/vector.c`][228]. 900 901#### `version.h` 902 903This header defines the version of `bc`. 904 905There is no code associated with this header. 906 907#### `vm.h` 908 909This header defines the API for setting up and running `bc` and `dc`. 910 911It is so named because I think of it as the "virtual machine" of `bc`, though 912that is probably not true as [`program.h`][57] is probably the "virtual machine" 913API. Thus, the name is more historical accident. 914 915The code associated with this header is in [`src/vm.c`][58]. 916 917### `locales/` 918 919This folder contains a bunch of `.msg` files and soft links to the real `.msg` 920files. This is how locale support is implemented in `bc`. 921 922The files are in the format required by the [`gencat`][59] POSIX utility. They 923all have the same messages, in the same order, with the same numbering, under 924the same groups. This is because the locale system expects those messages in 925that order. 926 927The softlinks exist because for many locales, they would contain the exact same 928information. To prevent duplication, they are simply linked to a master copy. 929 930The naming format for all files is: 931 932``` 933<language_code>_<country_code>.<encoding>.msg 934``` 935 936This naming format must be followed for all locale files. 937 938### `manuals/` 939 940This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along 941with a few other manuals. 942 943#### `algorithms.md` 944 945This file explains the mathematical algorithms that are used. 946 947The hope is that this file will guide people in understanding how the math code 948works. 949 950#### `bc.1.md.in` 951 952This file is a template for the markdown version of the `bc` manual and 953manpages. 954 955For more information about how the manpages and markdown manuals are generated, 956and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 957 958#### `bcl.3` 959 960This is the manpage for the [`bcl`][156] library. It is generated from 961[`bcl.3.md`][61] using [`scripts/manpage.sh`][60]. 962 963For the reason why I check generated data into the repo, see 964[`scripts/manpage.sh`][60] and [Manuals][86]. 965 966#### `bcl.3.md` 967 968This is the markdown manual for the [`bcl`][156] library. It is the source for the 969generated [`bcl.3`][62] file. 970 971#### `benchmarks.md` 972 973This is a document that compares this `bc` to GNU `bc` in various benchmarks. It 974was last updated when version [`3.0.0`][32] was released. 975 976It has very little documentation value, other than showing what compiler options 977are useful for performance. 978 979#### `build.md` 980 981This is the [build manual][14]. 982 983This `bc` has a custom build system. The reason for this is because of 984[*portability*][136]. 985 986If `bc` used an outside build system, that build system would be an external 987dependency. Thus, I had to write a build system for `bc` that used nothing but 988C99 and POSIX utilities, including barebones [POSIX `make`][74]. 989 990for more information about the build system, see the [build system][142] 991section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70]. 992 993#### `dc.1.md.in` 994 995This file is a template for the markdown version of the `dc` manual and 996manpages. 997 998For more information about how the manpages and markdown manuals are generated, 999and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 1000 1001#### `development.md` 1002 1003The file you are reading right now. 1004 1005#### `header_bcl.txt` 1006 1007Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper 1008header. 1009 1010For more information about generating manuals, see [`scripts/manpage.sh`][60] 1011and [Manuals][86]. 1012 1013#### `header_bc.txt` 1014 1015Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a 1016proper header. 1017 1018For more information about generating manuals, see [`scripts/manpage.sh`][60] 1019and [Manuals][86]. 1020 1021#### `header_dc.txt` 1022 1023Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a 1024proper header. 1025 1026For more information about generating manuals, see [`scripts/manpage.sh`][60] 1027and [Manuals][86]. 1028 1029#### `header.txt` 1030 1031Used by [`scripts/manpage.sh`][60] to give all generated manpages a license 1032header. 1033 1034For more information about generating manuals, see [`scripts/manpage.sh`][60] 1035and [Manuals][86]. 1036 1037#### `release.md` 1038 1039A checklist that I try to somewhat follow when making a release. 1040 1041#### `bc/` 1042 1043A folder containing the `bc` manuals. 1044 1045Each `bc` manual corresponds to a [build type][81]. See that link for more 1046details. 1047 1048For each manual, there are two copies: the markdown version generated from the 1049template, and the manpage generated from the markdown version. 1050 1051#### `dc/` 1052 1053A folder containing the `dc` manuals. 1054 1055Each `dc` manual corresponds to a [build type][81]. See that link for more 1056details. 1057 1058For each manual, there are two copies: the markdown version generated from the 1059template, and the manpage generated from the markdown version. 1060 1061### `scripts/` 1062 1063This folder contains helper scripts. Most of them are written in pure [POSIX 1064`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and 1065[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is 1066written in C. [`ministat.c`][223] in particular is copied from elsewhere. 1067 1068For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1069 1070#### `afl.py` 1071 1072This script is meant to be used as part of the fuzzing workflow. 1073 1074It does one of two things: checks for valid crashes, or runs `bc` and or `dc` 1075under all of the paths found by [AFL++][125]. 1076 1077See [Fuzzing][82] for more information about fuzzing, including this script. 1078 1079#### `alloc.sh` 1080 1081This script is a quick and dirty script to test whether or not the garbage 1082collection mechanism of the [`BcNum` caching][96] works. It has been little-used 1083because it tests something that is not important to correctness. 1084 1085#### `benchmark.sh` 1086 1087A script making it easy to run benchmarks and to run the executable produced by 1088[`ministat.c`][223] on them. 1089 1090For more information, see the [Benchmarks][144] section. 1091 1092#### `bitfuncgen.c` 1093 1094A source file for an executable to generate tests for `bc`'s bitwise functions 1095in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built 1096with `make bitfuncgen`. It produces the test on `stdout` and the expected 1097results on `stderr`. This means that to generat tests, use the following 1098invokation: 1099 1100``` 1101scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt 1102``` 1103 1104It calls `abort()` if it runs into an error. 1105 1106#### `exec-install.sh` 1107 1108This script is the magic behind making sure `dc` is installed properly if it's 1109a symlink to `bc`. It checks to see if it is a link, and if so, it just creates 1110a new symlink in the install directory. Of course, it also installs `bc` itself, 1111or `dc` when it's alone. 1112 1113#### `functions.sh` 1114 1115This file is a bunch of common functions for most of the POSIX shell scripts. It 1116is not supposed to be run; instead, it is *sourced* by other POSIX shell 1117scripts, like so: 1118 1119``` 1120. "$scriptdir/functions.sh" 1121``` 1122 1123or the equivalent, depending on where the sourcing script is. 1124 1125For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1126 1127#### `fuzz_prep.sh` 1128 1129Fuzzing is a regular activity when I am preparing for a release. 1130 1131This script handles all the options and such for building a fuzzable binary. 1132Instead of having to remember a bunch of options, I just put them in this script 1133and run the script when I want to fuzz. 1134 1135For more information about fuzzing, see [Fuzzing][82]. 1136 1137#### `karatsuba.py` 1138 1139This script has at least one of two major differences from most of the other 1140scripts: 1141 1142* It's written in Python 3. 1143* It's meant for software packagers. 1144 1145For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in 1146Python 3, but they are not meant for the end user or software packagers and are 1147not included in source distributions. But this script is. 1148 1149This script breaks my rule of only POSIX utilities necessary for package 1150maintainers, but there's a very good reason for that: it's only meant to be run 1151*once* when the package is created for the first time, and maybe not even then. 1152 1153You see, this script does two things: it tests the Karatsuba implementation at 1154various settings for `KARATSUBA_LEN`, and it figures out what the optimal 1155`KARATSUBA_LEN` is for the machine that it is running on. 1156 1157Package maintainers can use this script, when creating a package for this `bc`, 1158to figure out what is optimal for their users. Then they don't have to run it 1159ever again. So this script only has to run on the packagers machine. 1160 1161I tried to write the script in `sh`, by the way, and I finally accepted the 1162tradeoff of using Python 3 when it became too hard. 1163 1164However, I also mentioned that it's for testing Karatsuba with various settings 1165of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124], 1166right? 1167 1168Yes, but this script is not part of the [test suite][124]; it's used for testing 1169in the [`scripts/release.sh`][83] script, which is maintainer use only. 1170 1171However, there is one snare with `karatsuba.py`: I didn't want the user to have 1172to install any Python libraries to run it. Keep that in mind if you change it. 1173 1174#### `link.sh` 1175 1176This script is the magic behind making `dc` a symlink of `bc` when both 1177calculators are built. 1178 1179#### `locale_install.sh` 1180 1181This script does what its name says: it installs locales. 1182 1183It turns out that this is complicated. 1184 1185There is a magic environment variable, `$NLSPATH`, that tells you how and where 1186you are supposed to install locales. 1187 1188Yes, *how*. And where. 1189 1190But now is not the place to rant about `$NLSPATH`. For more information on 1191locales and `$NLSPATH`, see [Locales][85]. 1192 1193#### `locale_uninstall.sh` 1194 1195This script does what its name says: it uninstalls locales. 1196 1197This is far less complicated than installing locales. I basically generate a 1198wildcard path and then list all paths that fit that wildcard. Then I delete each 1199one of those paths. Easy. 1200 1201For more information on locales, see [Locales][85]. 1202 1203#### `manpage.sh` 1204 1205This script is the one that generates markdown manuals from a template and a 1206manpage from a markdown manual. 1207 1208For more information about generating manuals, see [Manuals][86]. 1209 1210#### `ministat.c` 1211 1212This is a file copied [from FreeBSD][221] that calculates the standard 1213statistical numbers, such as mean, average, and median, based on numbers 1214obtained from a file. 1215 1216For more information, see the [FreeBSD ministat(1) manpage][222]. 1217 1218This file allows `bc` to build the `scripts/ministat` executable using the 1219command `make ministat`, and this executable helps programmers evaluate the 1220results of [benchmarks][144] more accurately. 1221 1222#### `package.sh` 1223 1224This script is what helps `bc` maintainers cut a release. It does the following: 1225 12261. Creates the appropriate `git` tag. 12272. Pushes the `git` tag. 12283. Copies the repo to a temp directory. 12294. Removes files that should not be included in source distributions. 12305. Creates the tarballs. 12316. Signs the tarballs. 12327. Zips and signs the Windows executables if they exist. 12338. Calculates and outputs SHA512 and SHA256 sums for all of the files, 1234 including the signatures. 1235 1236This script is for `bc` maintainers to use when cutting a release. It is not 1237meant for outside use. This means that some non-POSIX utilities can be used, 1238such as `git` and `gpg`. 1239 1240In addition, before using this script, it expects that the folders that Windows 1241generated when building `bc`, `dc`, and [`bcl`][156], are in the parent 1242directory of the repo, exactly as Windows generated them. If they are not there, 1243then it will not zip and sign, nor calculate sums of, the Windows executables. 1244 1245Because this script creates a tag and pushes it, it should *only* be run *ONCE* 1246per release. 1247 1248#### `radamsa.sh` 1249 1250A script to test `bc`'s command-line expression parsing code, which, while 1251simple, strives to handle as much as possible. 1252 1253What this script does is it uses the test cases in [`radamsa.txt`][98] an input 1254to the [Radamsa fuzzer][99]. 1255 1256For more information, see the [Radamsa][128] section. 1257 1258#### `radamsa.txt` 1259 1260Initial test cases for the [`radamsa.sh`][100] script. 1261 1262#### `randmath.py` 1263 1264This script generates random math problems and checks that `bc`'s and `dc`'s 1265output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have 1266GNU `bc` and `dc` installed before using this script.) 1267 1268One snare: be sure that this script is using the GNU `bc` and `dc`, not a 1269previously-installed version of this `bc` and `dc`. 1270 1271If you want to check for memory issues or failing asserts, you can build the 1272`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any 1273errors or crashes should be caught by the script and given to the user as part 1274of the "checklist" (see below). 1275 1276The basic idea behind this script is that it generates as many math problems as 1277it can, biasing towards situations that may be likely to have bugs, and testing 1278each math problem against GNU `bc` or `dc`. 1279 1280If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it 1281stores that problem. If the output mismatches, it also stores the problem. 1282 1283Then, when the user sends a `SIGINT`, the script stops testing and goes into 1284report mode. One-by-one, it will go through the "checklist," the list of failed 1285problems, and present each problem to the user, as well as whether this `bc` or 1286`dc` crashed, and its output versus GNU. Then the user can decide to add them as 1287test cases, which it does automatically to the appropriate test file. 1288 1289#### `release_settings.txt` 1290 1291A text file of settings combinations that [`release.sh`][83] uses to ensure that 1292`bc` and `dc` build and work with various default settings. [`release.sh`][83] 1293simply reads it line by line and uses each line for one build. 1294 1295#### `release.sh` 1296 1297This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156] 1298through a gauntlet that is mostly meant to be used in preparation for a release. 1299 1300It does the following: 1301 13021. Builds every [build type][81], with every setting combo in 1303 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1304 alone. 13052. Builds every [build type][81], with every setting combo in 1306 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1307 alone for 32-bit. 13083. Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size 1309 Release builds. 13104. Runs the [test suite][124] on every build, if desired. 13115. Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build 1312 type/setting combo. 13136. Runs [`scripts/karatsuba.py`][78] in test mode. 13147. Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone 1315 under [valgrind][20] and errors if there are any memory bugs or memory 1316 leaks. 1317 1318#### `safe-install.sh` 1319 1320A script copied from [musl][101] to atomically install files. 1321 1322#### `test_settings.sh` 1323 1324A quick and dirty script to help automate rebuilding while manually testing the 1325various default settings. 1326 1327This script uses [`test_settings.txt`][103] to generate the various settings 1328combos. 1329 1330For more information about settings, see [Settings][102] in the [build 1331manual][14]. 1332 1333#### `test_settings.txt` 1334 1335A list of the various settings combos to be used by [`test_settings.sh`][104]. 1336 1337### `src/` 1338 1339This folder is, obviously, where the actual heart and soul of `bc`, the source 1340code, is. 1341 1342All of the source files are in one folder; this simplifies the build system 1343immensely. 1344 1345There are separate files for `bc` and `dc` specific code ([`bc.c`][40], 1346[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and 1347[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire 1348source file from a build than to have `#if`/`#endif` preprocessor guards. 1349 1350That said, it was easier in many cases to use preprocessor macros where both 1351calculators used much of the same code and data structures, so there is a 1352liberal sprinkling of them through the code. 1353 1354#### `args.c` 1355 1356Code for processing command-line arguments. 1357 1358The header for this file is [`include/args.h`][31]. 1359 1360#### `bc.c` 1361 1362The code for the `bc` main function `bc_main()`. 1363 1364The header for this file is [`include/bc.h`][106]. 1365 1366#### `bc_lex.c` 1367 1368The code for lexing that only `bc` needs. 1369 1370The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106]. 1371 1372#### `bc_parse.c` 1373 1374The code for parsing that only `bc` needs. This code is the most complex and 1375subtle in the entire codebase. 1376 1377The headers for this file are [`include/parse.h`][181] and 1378[`include/bc.h`][106]. 1379 1380#### `data.c` 1381 1382Due to [historical accident][23] because of a desire to get my `bc` into 1383[toybox][16], all of the constant data that `bc` needs is all in one file. This 1384is that file. 1385 1386There is no code in this file, but a lot of the const data has a heavy influence 1387on code, including the order of data in arrays because that order has to 1388correspond to the order of other things elsewhere in the codebase. If you change 1389the order of something in this file, run `make test`, and get errors, you 1390changed something that depends on the order that you messed up. 1391 1392Almost all headers have `extern` references to items in this file. 1393 1394#### `dc.c` 1395 1396The code for the `dc` main function `dc_main()`. 1397 1398The header for this file is [`include/dc.h`][182]. 1399 1400#### `dc_lex.c` 1401 1402The code for lexing that only `dc` needs. 1403 1404The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182]. 1405 1406#### `dc_parse.c` 1407 1408The code for parsing that only `dc` needs. 1409 1410The headers for this file are [`include/parse.h`][181] and 1411[`include/bc.h`][182]. 1412 1413#### `file.c` 1414 1415The code for `bc`'s implementation of buffered I/O. For more information about 1416why I implemented my own buffered I/O, see [`include/file.h`][55], [Error 1417Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes 1418about version [`3.0.0`][32] in the [`NEWS`][32]. 1419 1420The header for this file is [`include/file.h`][55]. 1421 1422#### `history.c` 1423 1424The code for `bc`'s implementation of command-line editing/history, which is 1425based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 1426 1427For more information, see the [Command-Line History][189] section. 1428 1429The header for this file is [`include/history.h`][36]. 1430 1431#### `lang.c` 1432 1433The data structures used for actual execution of `bc` and `dc` code. 1434 1435While execution is done in [`src/program.c`][53], this file defines functions 1436for initializing, copying, and freeing the data structures, which is somewhat 1437orthogonal to actual execution. 1438 1439Yes, it's misnamed; that's an accident of history where the first things I put 1440into it all seemed related to the `bc` language. 1441 1442The header for this file is [`include/lang.h`][38]. 1443 1444#### `lex.c` 1445 1446The code for the common things that both programs need for lexing. 1447 1448The header for this file is [`include/lex.h`][180]. 1449 1450#### `library.c` 1451 1452The code to implement the public API of the `bcl` library. 1453 1454The code in this file does a lot to ensure that clients do not have to worry 1455about internal `bc` details, especially error handling with `setjmp()` and 1456`longjmp()`. That and encapsulating the handling of numbers are the bulk of what 1457the code in this file actually does because most of the library is still 1458implemented in [`src/num.c`][39]. 1459 1460The headers for this file are [`include/bcl.h`][30] and 1461[`include/library.h`][183]. 1462 1463#### `main.c` 1464 1465The entry point for both programs; this is the `main()` function. 1466 1467This file has no headers associated with it. 1468 1469#### `num.c` 1470 1471The code for all of the arbitrary-precision [numbers][177] and [math][178] in 1472`bc`. 1473 1474The header for this file is [`include/num.h`][184]. 1475 1476#### `opt.c` 1477 1478The code for parsing command-line options. 1479 1480The header for this file is [`include/opt.h`][35]. 1481 1482#### `parse.c` 1483 1484The code for the common items that both programs need for parsing. 1485 1486The header for this file is [`include/parse.h`][181]. 1487 1488#### `program.c` 1489 1490The code for the actual execution engine for `bc` and `dc` code. 1491 1492The header for this file is [`include/program.h`][57]. 1493 1494#### `rand.c` 1495 1496The code for the [pseudo-random number generator (PRNG)][179] and the special 1497stack handling it needs. 1498 1499The PRNG only generates fixed-size integers. The magic of generating random 1500numbers of arbitrary size is actually given to the code that does math 1501([`src/num.c`][39]). 1502 1503The header for this file is [`include/rand.h`][37]. 1504 1505#### `read.c` 1506 1507The code for reading from files and `stdin`. 1508 1509The header for this file is [`include/read.h`][185]. 1510 1511#### `vector.c` 1512 1513The code for [vectors][111], [maps][186], and [slab vectors][187], along with 1514slabs. 1515 1516The header for this file is [`include/vector.h`][174]. 1517 1518#### `vm.c` 1519 1520The code for setting up and running `bc` and `dc`. 1521 1522It is so named because I think of it as the "virtual machine" of `bc`, though 1523that is probably not true as [`program.h`][57] is probably the "virtual machine" 1524code. Thus, the name is more historical accident. 1525 1526The header for this file is [`include/vm.h`][27]. 1527 1528### `tests/` 1529 1530This directory contains the entire [test suite][124] and its infrastructure. 1531 1532#### `all.sh` 1533 1534A convenience script for the `make run_all_tests` target (see the [Group 1535Tests][141] section for more information). 1536 1537#### `all.txt` 1538 1539The file with the names of the calculators. This is to make it easier for the 1540test scripts to know where the standard and other test directories are. 1541 1542#### `bcl.c` 1543 1544The test for the [`bcl`][156] API. For more information, see the [`bcl` 1545Test][157] section. 1546 1547#### `error.sh` 1548 1549The script to run the file-based error tests in `tests/<calculator>/errors/` for 1550each calculator. For more information, see the [Error Tests][151] section. 1551 1552This is a separate script so that each error file can be run separately and in 1553parallel. 1554 1555#### `errors.sh` 1556 1557The script to run the line-based error tests in `tests/<calculator>/errors.txt` 1558for each calculator. For more information, see the [Error Tests][151] section. 1559 1560#### `extra_required.txt` 1561 1562The file with the list of tests which both calculators have that need the [Extra 1563Math build option][188]. This exists to make it easy for test scripts to skip 1564those tests when the [Extra Math build option][188] is disabled. 1565 1566#### `history.py` 1567 1568The file with all of the history tests. For more information, see the [History 1569Tests][155] section. 1570 1571#### `history.sh` 1572 1573The script to integrate [`history.py`][139] into the build system in a portable 1574way, and to skip it if necessary. 1575 1576This script also re-runs the test three times if it fails. This is because 1577`pexpect` can be flaky at times. 1578 1579#### `other.sh` 1580 1581The script to run the "other" (miscellaneous) tests for each calculator. For 1582more information, see the [Other Tests][154] section. 1583 1584#### `read.sh` 1585 1586The script to run the read tests for each calculator. For more information, see 1587the [`read()` Tests][153] section. 1588 1589#### `script.sed` 1590 1591The `sed` script to edit the output of GNU `bc` when generating script tests. 1592For more information, see the [Script Tests][150] section. 1593 1594#### `script.sh` 1595 1596The script for running one script test. For more information, see the [Script 1597Tests][150] section. 1598 1599#### `scripts.sh` 1600 1601The script to help the `make run_all_tests` (see the [Group Tests][141] section) 1602run all of the script tests. 1603 1604#### `stdin.sh` 1605 1606The script to run the `stdin` tests for each calculator. For more information, 1607see the [`stdin` Tests][152] section. 1608 1609#### `test.sh` 1610 1611The script to run one standard test. For more information, see the [Standard 1612Tests][149] section. 1613 1614#### `bc/` 1615 1616The standard tests directory for `bc`. For more information, see the [Standard 1617Tests][149] section. 1618 1619##### `all.txt` 1620 1621The file to tell the build system and `make run_all_tests` (see the [Group 1622Tests][141] section) what standard tests to run for `bc`, as well as in what 1623order. 1624 1625This file just lists the test names, one per line. 1626 1627##### `errors.txt` 1628 1629The initial error test file for `bc`. This file has one test per line. See the 1630[Error Tests][151] section for more information. 1631 1632##### `posix_errors.txt` 1633 1634The file of tests for POSIX compatibility for `bc`. This file has one test per 1635line. For more information, see the [Error Tests][151] section. 1636 1637##### `timeconst.sh` 1638 1639The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6]. 1640For more information, see the [Linux `timeconst.bc` Script][191]section. 1641 1642##### `errors/` 1643 1644The directory with error tests for `bc`, most discovered by AFL++ (see the 1645[Fuzzing][82] section). There is one test per file. For more information, see 1646the [Error Tests][151] section. 1647 1648##### `scripts/` 1649 1650The script tests directory for `bc`. For more information, see the [Script 1651Tests][150] section. 1652 1653###### `all.txt` 1654 1655A file to tell the build system and `make run_all_tests` (see the [Group 1656Tests][141] section) what script tests to run for `bc`, as well as in what 1657order. 1658 1659This file just lists the test names, one per line. 1660 1661#### `dc/` 1662 1663The standard tests directory for `dc`. For more information, see the [Standard 1664Tests][149] section. 1665 1666##### `all.txt` 1667 1668The file to tell the build system and `make run_all_tests` (see the [Group 1669Tests][141] section) what standard tests to run for `dc`, as well as in what 1670order. 1671 1672This file just lists the test names, one per line. 1673 1674##### `errors.txt` 1675 1676The initial error test file for `dc`. This file has one test per line. See the 1677[Error Tests][151] section for more information. 1678 1679##### `read_errors.txt` 1680 1681The file of tests errors with the `?` command (`read()` in `bc`). This file has 1682one test per line. See the [Error Tests][151] section for more information. 1683 1684##### `errors/` 1685 1686The directory with error tests for `dc`, most discovered by AFL++ (see the 1687[Fuzzing][82] section). There is one test per file. For more information, see 1688the [Error Tests][151] section. 1689 1690##### `scripts/` 1691 1692The script tests directory for `dc`. For more information, see the [Script 1693Tests][150] section. 1694 1695###### `all.txt` 1696 1697The file to tell the build system and `make run_all_tests` (see the [Group 1698Tests][141] section) what script tests to run for `dc`, as well as in what 1699order. 1700 1701This file just lists the test names, one per line. 1702 1703#### `fuzzing/` 1704 1705The directory containing the fuzzing infrastructure. For more information, see 1706the [Fuzzing][82] section. 1707 1708##### `bc_afl_continue.yaml` 1709 1710The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a 1711fuzz run. For more information, see the [Convenience][130] subsection of the 1712[Fuzzing][82] section. 1713 1714##### `bc_afl.yaml` 1715 1716The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a 1717fuzz run. For more information, see the [Convenience][130] subsection of the 1718[Fuzzing][82] section. 1719 1720Be aware that this will delete all previous unsaved fuzzing tests in the output 1721directories. 1722 1723##### `bc_inputs1/` 1724 1725The fuzzing input directory for the first third of inputs for `bc`. For more 1726information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1727 1728##### `bc_inputs2/` 1729 1730The fuzzing input directory for the second third of inputs for `bc`. For more 1731information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1732 1733##### `bc_inputs3/` 1734 1735The fuzzing input directory for the third third of inputs for `bc`. For more 1736information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1737 1738##### `dc_inputs/` 1739 1740The fuzzing input directory for the inputs for `dc`. For more information, see 1741the [Corpuses][192] subsection of the [Fuzzing][82] section. 1742 1743### `vs/` 1744 1745The directory containing all of the materials needed to build `bc`, `dc`, and 1746`bcl` on Windows. 1747 1748#### `bcl.sln` 1749 1750A Visual Studio solution file for [`bcl`][156]. This, along with 1751[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to 1752build [`bcl`][156] on Windows. 1753 1754#### `bcl.vcxproj` 1755 1756A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1757and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156] 1758on Windows. 1759 1760#### `bcl.vcxproj.filters` 1761 1762A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1763and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on 1764Windows. 1765 1766#### `bc.sln` 1767 1768A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66] 1769and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on 1770Windows. 1771 1772#### `bc.vcxproj` 1773 1774A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and 1775[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows. 1776 1777#### `bc.vcxproj.filters` 1778 1779A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and 1780[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows. 1781 1782#### `tests/` 1783 1784A directory of files to run tests on Windows. 1785 1786##### `tests_bc.bat` 1787 1788A file to run basic `bc` tests on Windows. It expects that it will be run from 1789the directory containing it, and it also expects a `bc.exe` in the same 1790directory. 1791 1792##### `tests_dc.bat` 1793 1794A file to run basic `dc` tests on Windows. It expects that it will be run from 1795the directory containing it, and it also expects a `bc.exe` in the same 1796directory. 1797 1798## Build System 1799 1800The build system is described in detail in the [build manual][14], so 1801maintainers should start there. This section, however, describes some parts of 1802the build system that only maintainers will care about. 1803 1804### Clean Targets 1805 1806`bc` has a default `make clean` target that cleans up the build files. However, 1807because `bc`'s build system can generate many different types of files, there 1808are other clean targets that may be useful: 1809 1810* `make clean_gen` cleans the `gen/strgen` executable generated from 1811 [`gen/strgen.c`][15]. It has no prerequisites. 1812* `make clean` cleans object files, `*.cat` files (see the [Locales][85] 1813 section), executables, and files generated from text files in [`gen/`][145], 1814 including `gen/strgen` if it was built. So this has a prerequisite on 1815 `make clean_gen` in normal use. 1816* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat` 1817 executable. It has no prerequisites. 1818* `make clean_config` cleans the generated `Makefile` and the manuals that 1819 [`configure.sh`][69] copied in preparation for install. It also depends on 1820 `make clean` and `make clean_benchmarks`, so it cleans those items too. This 1821 is the target that [`configure.sh`][69] uses before it does its work. 1822* `make clean_coverage` cleans the generated coverage files for the [test 1823 suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This 1824 is useful if the code coverage tools are giving errors. 1825* `make clean_tests` cleans *everything*. It has prerequisites on all previous 1826 clean targets, but it also cleans all of the [generated tests][143]. 1827 1828When adding more generated files, you may need to add them to one of these 1829targets and/or add a target for them especially. 1830 1831### Preprocessor Macros 1832 1833`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type: 1834 1835* builds, 1836* works under the [test suite][124], and 1837* excludes as much code as possible from all builds. 1838 1839This section will explain the preprocessor style of `bc` and `dc`, as well as 1840provide an explanation of the macros used. 1841 1842#### Style 1843 1844The style of macro use in `bc` is pretty straightforward: I avoid depending on 1845macro definitions and instead, I set defaults if the macro is not defined and 1846then test the value if the macro with a plain `#if`. 1847 1848(Some examples of setting defaults are in [`include/status.h`][176], just above 1849the definition of the `BcStatus` enum.) 1850 1851In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible. 1852 1853There are a couple of cases where I went with standard stuff instead. 1854 1855#### Standard Macros 1856 1857`BC_ENABLED` 1858 1859: This macro expands to `1` if `bc` is enabled, `0` if disabled. 1860 1861`DC_ENABLED` 1862 1863: This macro expands to `1` if `dc` is enabled, `0` if disabled. 1864 1865`BUILD_TYPE` 1866 1867: The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`, 1868 `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct 1869 the user to the correct markdown manual in the `git.gavinhoward.com` 1870 website. 1871 1872`EXECPREFIX` 1873 1874: This macro expands to the prefix on the executable name. This is used to 1875 allow `bc` and `dc` to skip the prefix when finding out which calculator is 1876 executing. 1877 1878`BC_NUM_KARATSUBA_LEN` 1879 1880: This macro expands to an integer, which is the length of numbers below which 1881 the Karatsuba multiplication algorithm switches to brute-force 1882 multiplication. 1883 1884`BC_ENABLE_EXTRA_MATH` 1885 1886: This macro expands to `1` if the [Extra Math build option][188] is enabled, 1887 `0` if disabled. 1888 1889`BC_ENABLE_HISTORY` 1890 1891: This macro expands to `1` if the [History build option][193] is enabled, `0` 1892 if disabled. 1893 1894`BC_ENABLE_NLS` 1895 1896: This macro expands to `1` if the [NLS build option][193] (for locales) is 1897 enabled, `0` if disabled. 1898 1899`BC_ENABLE_LIBRARY` 1900 1901: This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if 1902 disabled. If this is enabled, building the calculators themselves is 1903 disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero. 1904 1905`BC_ENABLE_MEMCHECK` 1906 1907: This macro expands to `1` if `bc` has been built for use with Valgrind's 1908 [Memcheck][194], `0` otherwise. This ensures that fatal errors still free 1909 all of their memory when exiting. `bc` does not do that normally because 1910 what's the point? 1911 1912`BC_ENABLE_AFL` 1913 1914: This macro expands to `1` if `bc` has been built for fuzzing with 1915 [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more 1916 information. 1917 1918`BC_DEFAULT_BANNER` 1919 1920: This macro expands to the default value for displaying the `bc` banner. 1921 1922`BC_DEFAULT_SIGINT_RESET` 1923 1924: The macro expands to the default value for whether or not `bc` should reset 1925 on `SIGINT` or quit. 1926 1927`BC_DEFAULT_TTY_MODE` 1928 1929: The macro expands to the default value for whether or not `bc` should use 1930 TTY mode when it available. 1931 1932`BC_DEFAULT_PROMPT` 1933 1934: This macro expands to the default value for whether or not `bc` should use a 1935 prompt when TTY mode is available. 1936 1937`DC_DEFAULT_SIGINT_RESET` 1938 1939: The macro expands to the default value for whether or not `dc` should reset 1940 on `SIGINT` or quit. 1941 1942`DC_DEFAULT_TTY_MODE` 1943 1944: The macro expands to the default value for whether or not `dc` should use 1945 TTY mode when it available. 1946 1947`DC_DEFAULT_PROMPT` 1948 1949: This macro expands to the default value for whether or not `dc` should use a 1950 prompt when TTY mode is available. 1951 1952`BC_DEBUG_CODE` 1953 1954: If this macro expands to a non-zero integer, then `bc` is built with *a lot* 1955 of extra debugging code. This is never set by the build system and must be 1956 set by the programmer manually. This should never be set in builds given to 1957 end users. For more information, see the [Debugging][134] section. 1958 1959## Test Suite 1960 1961While the source code may be the heart and soul of `bc`, the test suite is the 1962arms and legs: it gives `bc` the power to do anything it needs to do. 1963 1964The test suite is what allowed `bc` to climb to such high heights of quality. 1965This even goes for fuzzing because fuzzing depends on the test suite for its 1966input corpuses. (See the [Fuzzing][82] section.) 1967 1968Understanding how the test suite works should be, I think, the first thing that 1969maintainers learn after learning what `bc` and `dc` should do. This is because 1970the test suite, properly used, gives confidence that changes have not caused 1971bugs or regressions. 1972 1973That is why I spent the time to make the test suite as easy to use and as fast 1974as possible. 1975 1976To use the test suite (assuming `bc` and/or `dc` are already built), run the 1977following command: 1978 1979``` 1980make test 1981``` 1982 1983That's it. That's all. 1984 1985It will return an error code if the test suite failed. It will also print out 1986information about the failure. 1987 1988If you want the test suite to go fast, then run the following command: 1989 1990``` 1991make -j<cores> test 1992``` 1993 1994Where `<cores>` is the number of cores that your computer has. Of course, this 1995requires a `make` implementation that supports that option, but most do. (And I 1996will use this convention throughout the rest of this section.) 1997 1998I have even tried as much as possible, to put longer-running tests near the 1999beginning of the run so that the entire suite runs as fast as possible. 2000 2001However, if you want to be sure which test is failing, then running a bare 2002`make test` is a great way to do that. 2003 2004But enough about how you have no excuses to use the test suite as much as 2005possible; let's talk about how it works and what you *can* do with it. 2006 2007### Standard Tests 2008 2009The heavy lifting of testing the math in `bc`, as well as basic scripting, is 2010done by the "standard tests" for each calculator. 2011 2012These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162] 2013directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164], 2014[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166], 2015[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and 2016[`tests/dc/read_errors.txt`][175]), which are called the "standard test 2017directories." 2018 2019For every test, there is the test file and the results file. The test files have 2020names of the form `<test>.txt`, where `<test>` is the name of the test, and the 2021results files have names of the form `<test>_results.txt`. 2022 2023If the test file exists but the results file does not, the results for that test 2024are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143] 2025section. 2026 2027The `all.txt` file in each standard tests directory is what tells the test suite 2028and [build system][142] what tests there are, and the tests are either run in 2029that order, or in the case of parallel `make`, that is the order that the 2030targets are listed as prerequisites of `make test`. 2031 2032If the test exists in the `all.txt` file but does not *actually* exist, the test 2033and its results are generated by a GNU-compatible `bc` or `dc`. See the 2034[Generated Tests][143] section. 2035 2036To add a non-generated standard test, do the following: 2037 2038* Add the test file (`<test>.txt` in the standard tests directory). 2039* Add the results file (`<test>_results.txt` in the standard tests directory). 2040 You can skip this step if just the results file needs to be generated. See the 2041 [Generated Tests][147] section for more information. 2042* Add the name of the test to the `all.txt` file in the standard tests 2043 directory, putting it in the order it should be in. If possible, I would put 2044 longer tests near the beginning because they will start running earlier with 2045 parallel `make`. I always keep `decimal` first, though, as a smoke test. 2046 2047If you need to add a generated standard test, see the [Generated Tests][147] 2048section for how to do that. 2049 2050Some standard tests need to be skipped in certain cases. That is handled by the 2051[build system][142]. See the [Integration with the Build System][147] section 2052for more details. 2053 2054In addition to all of the above, the standard test directory is not only the 2055directory for the standard tests of each calculator, it is also the parent 2056directory of all other test directories for each calculator. 2057 2058#### `bc` Standard Tests 2059 2060The list of current (27 February 2023) standard tests for `bc` is below: 2061 2062decimal 2063 2064: Tests decimal parsing and printing. 2065 2066print 2067 2068: Tests printing in every base from decimal. This is near the top for 2069 performance of parallel testing. 2070 2071parse 2072 2073: Tests parsing in any base and outputting in decimal. This is near the top 2074 for performance of parallel testing. 2075 2076lib2 2077 2078: Tests the extended math library. This is near the top for performance of 2079 parallel testing. 2080 2081print2 2082 2083: Tests printing at the extreme values of `obase`. 2084 2085length 2086 2087: Tests the `length()` builtin function. 2088 2089scale 2090 2091: Tests the `scale()` builtin function. 2092 2093shift 2094 2095: Tests the left (`<<`) and right (`>>`) shift operators. 2096 2097add 2098 2099: Tests addition. 2100 2101subtract 2102 2103: Tests subtraction. 2104 2105multiply 2106 2107: Tests multiplication. 2108 2109divide 2110 2111: Tests division. 2112 2113modulus 2114 2115: Tests modulus. 2116 2117power 2118 2119: Tests power (exponentiation). 2120 2121sqrt 2122 2123: Tests the `sqrt()` (square root) builtin function. 2124 2125trunc 2126 2127: Tests the truncation (`$`) operator. 2128 2129places 2130 2131: Tests the places (`@`) operator. 2132 2133vars 2134 2135: Tests some usage of variables. This one came from [AFL++][125] I think. 2136 2137boolean 2138 2139: Tests boolean operators. 2140 2141comp 2142 2143: Tests comparison operators. 2144 2145abs 2146 2147: Tests the `abs()` builtin function. 2148 2149assignments 2150 2151: Tests assignment operators, including increment/decrement operators. 2152 2153functions 2154 2155: Tests functions, specifically function parameters being replaced before they 2156 themselves are used. See the comment in `bc_program_call()` about the last 2157 condition. 2158 2159scientific 2160 2161: Tests scientific notation. 2162 2163engineering 2164 2165: Tests engineering notation. 2166 2167globals 2168 2169: Tests that assigning to globals affects callers. 2170 2171strings 2172 2173: Tests strings. 2174 2175strings2 2176 2177: Tests string allocation in slabs, to ensure slabs work. 2178 2179letters 2180 2181: Tests single and double letter numbers to ensure they behave differently. 2182 Single-letter numbers always be set to the same value, regardless of 2183 `ibase`. 2184 2185exponent 2186 2187: Tests the `e()` function in the math library. 2188 2189log 2190 2191: Tests the `l()` function in the math library. 2192 2193pi 2194 2195: Tests that `bc` produces the right value of pi for numbers with varying 2196 `scale` values. 2197 2198arctangent 2199 2200: Tests the `a()` function in the math library. 2201 2202sine 2203 2204: Tests the `s()` function in the math library. 2205 2206cosine 2207 2208: Tests the `c()` function in the math library. 2209 2210bessel 2211 2212: Tests the `j()` function in the math library. 2213 2214fib 2215 2216: Tests the `fib()` Fibonacci function in the extended math library. 2217 2218arrays 2219 2220: Test arrays. 2221 2222misc 2223 2224: Miscellaneous tests. I named it this because at the time, I struggled to 2225 classify them, but it's really testing multi-line numbers. 2226 2227misc1 2228 2229: A miscellaneous test found by [AFL++][125]. 2230 2231misc2 2232 2233: A miscellaneous test found by [AFL++][125]. 2234 2235misc3 2236 2237: A miscellaneous test found by [AFL++][125]. 2238 2239misc4 2240 2241: A miscellaneous test found by [AFL++][125]. 2242 2243misc5 2244 2245: A miscellaneous test found by [AFL++][125]. 2246 2247misc6 2248 2249: A miscellaneous test found by [AFL++][125]. 2250 2251misc7 2252 2253: A miscellaneous test found by [AFL++][125]. 2254 2255void 2256 2257: Tests void functions. 2258 2259rand 2260 2261: Tests the pseudo-random number generator and its special stack handling. 2262 2263rand_limits 2264 2265: Tests the limits of the pseudo-random number generator `irand()`. 2266 2267recursive_arrays 2268 2269: Tested the slab vector undo ability in used in `bc_parse_name()` when it 2270 existed. Now used as a stress test. 2271 2272divmod 2273 2274: Tests divmod. 2275 2276modexp 2277 2278: Tests modular exponentiation. 2279 2280bitfuncs 2281 2282: Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and 2283 `brshift()` in [`gen/lib2.bc`][26]. 2284 2285leadingzero 2286 2287: Tests the leading zero functionality and the `plz*()` and `pnlz*()` 2288 functions in [`gen/lib2.bc`][26]. 2289 2290is_number 2291 2292: Tests the `is_number()` built-in function. 2293 2294is_string 2295 2296: Tests the `is_number()` built-in function. 2297 2298asciify_array 2299 2300: Tests the ability of `asciify()` to convert an array into a full string. 2301 2302line_by_line1 2303 2304: Tests the line-by-line behavior of `bc` with regards to `quit` in a function 2305 definition. 2306 2307line_by_line2 2308 2309: Tests the line-by-line behavior of `bc` with regards to `quit`. 2310 2311line_loop_quit1 2312 2313: Tests the behavior of `bc` with a `quit` after a single-line loop. 2314 2315line_loop_quit2 2316 2317: Tests the behavior of `bc` with a `quit` after a single-line loop and a 2318 newline escape. 2319 2320#### `dc` Standard Tests 2321 2322The list of current (27 February 2023) standard tests for `dc` is below: 2323 2324decimal 2325 2326: Tests decimal parsing and printing. 2327 2328length 2329 2330: Tests the `length()` builtin function, including for strings and arrays. 2331 2332stack_len 2333 2334: Tests taking the length of the results stack. 2335 2336exec_stack_len 2337 2338: Tests taking the length of the execution stack. 2339 2340add 2341 2342: Tests addition. 2343 2344subtract 2345 2346: Tests subtraction. 2347 2348multiply 2349 2350: Tests multiplication. 2351 2352divide 2353 2354: Tests division. 2355 2356modulus 2357 2358: Tests modulus. 2359 2360divmod 2361 2362: Tests divmod. 2363 2364power 2365 2366: Tests power (exponentiation). 2367 2368sqrt 2369 2370: Tests the `sqrt()` (square root) builtin function. 2371 2372modexp 2373 2374: Tests modular exponentiation. 2375 2376boolean 2377 2378: Tests boolean operators. 2379 2380negate 2381 2382: Tests negation as a command and as part of numbers. 2383 2384trunc 2385 2386: Tests the truncation (`$`) operator. 2387 2388places 2389 2390: Tests the places (`@`) operator. 2391 2392shift 2393 2394: Tests the left (`<<`) and right (`>>`) shift operators. 2395 2396abs 2397 2398: Tests the `abs()` builtin function (the `b` command). 2399 2400scientific 2401 2402: Tests scientific notation. 2403 2404engineering 2405 2406: Tests engineering notation. 2407 2408vars 2409 2410: Tests some usage of variables. This one came from [AFL++][125] I think. 2411 2412misc 2413 2414: Miscellaneous tests. I named it this because at the time, I struggled to 2415 classify them. 2416 2417misc1 2418 2419: More miscellaneous tests. This used to be an error file 2420 (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character. 2421 However, starting with version `6.1.0`, `u` became a valid command 2422 (`is_number()`), so the error file became valid. It was checked manually and 2423 moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`. 2424 2425strings 2426 2427: Tests strings. 2428 2429rand 2430 2431: Tests the pseudo-random number generator and its special stack handling. 2432 2433is_number 2434 2435: Tests the `is_number()` built-in function (the `u` command). 2436 2437is_string 2438 2439: Tests the `is_number()` built-in function (the `t` command). 2440 2441### Script Tests 2442 2443The heavy lifting of testing the scripting of `bc` is done by the "script tests" 2444for each calculator. 2445 2446These tests use the files in the [`tests/bc/scripts/`][169] and 2447[`tests/dc/scripts/`][170] directories (except for 2448[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which 2449are called the "script test directories." 2450 2451To add a script test, do the following: 2452 2453* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory). 2454* Add the results file (`<test>.txt` in the script tests directory). You can 2455 skip this step if just the results file needs to be generated. See the 2456 [Generated Tests][147] section for more information. 2457* Add the name of the test to the `all.txt` file in the script tests directory, 2458 putting it in the order it should be in. If possible, I would put longer tests 2459 near the beginning because they will start running earlier with parallel 2460 `make`. 2461 2462Some script tests need to be skipped in certain cases. That is handled by the 2463[build system][142]. See the [Integration with the Build System][147] section 2464for more details. 2465 2466Another unique thing about the script tests, at least for `bc`: they test the 2467`-g` and `--global-stacks` flags. This means that all of the script tests for 2468`bc` are written assuming the `-g` flag was given on the command-line 2469 2470There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed` 2471script is used to remove an incompatibility with GNU `bc`. 2472 2473If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU 2474`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is 2475correct according to my reading of the `bc` spec, so my `bc` doesn't as well. 2476 2477The `sed` script edits numbers that end with just one digit on a line by itself 2478to put it on the same line as others. 2479 2480#### `bc` Script Tests 2481 2482The list of current (27 February 2023) script tests for `bc` is below: 2483 2484print.bc 2485 2486: Tests printing even harder than the print standard test. 2487 2488multiply.bc 2489 2490: Tests multiplication even harder than the multiply standard test. 2491 2492divide.bc 2493 2494: Tests division even harder than the divide standard test. 2495 2496subtract.bc 2497 2498: Tests subtraction even harder than the subtract standard test. 2499 2500add.bc 2501 2502: Tests addition even harder than the add standard test. 2503 2504parse.bc 2505 2506: Tests parsing even harder than the parse standard test. 2507 2508root.bc 2509 2510: Tests that `root()` and `cbrt()` do not go into an infinite loop on a 2511 pathological case found by a user. 2512 2513array.bc 2514 2515: Tests arrays even harder than the arrays standard test. 2516 2517array2.bc 2518 2519: Implements a test where an array element is passed as a parameter to a 2520 function, and then another array is passed to a later parameter that is 2521 named the same as the first array. This was added because of a bug found 2522 while writing a script in bc. 2523 2524atan.bc 2525 2526: Tests arctangent even harder than the arctangent standard test. 2527 2528bessel.bc 2529 2530: Tests bessel even harder than the bessel standard test. 2531 2532functions.bc 2533 2534: Tests functions even harder than the functions standard test. 2535 2536globals.bc 2537 2538: Tests global stacks directly. 2539 2540len.bc 2541 2542: Tests the `length()` builtin on arrays. 2543 2544rand.bc 2545 2546: Tests the random number generator in the presence of global stacks. 2547 2548references.bc 2549 2550: Tests functions with array reference parameters. 2551 2552screen.bc 2553 2554: A random script provided by an early user that he used to calculate the size 2555 of computer screens 2556 2557strings2.bc 2558 2559: Tests escaping in strings. 2560 2561ifs.bc 2562 2563: Tests proper ending of `if` statements without `else` statements. 2564 2565ifs2.bc 2566 2567: More tests proper ending of `if` statements without `else` statements. 2568 2569#### `dc` Script Tests 2570 2571The list of current (27 February 2023) script tests for `dc` is below: 2572 2573prime.dc 2574 2575: Tests scripting by generating the first 100,000 primes. 2576 2577asciify.dc 2578 2579: Tests the asciify command. 2580 2581stream.dc 2582 2583: Tests the stream command. 2584 2585array.dc 2586 2587: Tests arrays. 2588 2589else.dc 2590 2591: Tests else clauses on conditional execution commands. 2592 2593factorial.dc 2594 2595: Tests scripting with factorial. 2596 2597loop.dc 2598 2599: Tests scripting by implementing loops. 2600 2601quit.dc 2602 2603: Tests the quit command in the presence of tail calls. 2604 2605weird.dc 2606 2607: A miscellaneous test. 2608 2609no_clamp.dc 2610 2611: A test to ensure `dc` has the same behavior as the BSD `dc` with digi 2612 clamping off when parsing numbers. 2613 2614### Error Tests 2615 2616One of the most useful parts of the `bc` test suite, in my opinion, is the heavy 2617testing of error conditions. 2618 2619Just about every error condition I can think of is tested, along with many 2620machine-generated (by [AFL++][125]) ones. 2621 2622However, because the error tests will often return error codes, they require 2623different infrastructure from the rest of the test suite, which assumes that 2624the calculator under test will return successfully. A lot of that infrastructure 2625is in the [`scripts/functions.sh`][105] script, but it basically allows the 2626calculator to exit with an error code and then tests that there *was* an error 2627code. 2628 2629Besides returning error codes, error tests also ensure that there is output from 2630`stderr`. This is to make sure that an error message is always printed. 2631 2632The error tests for each calculator are spread through two directories, due to 2633historical accident. These two directories are the standard test directory (see 2634the [Standard Tests][149] section) and the `errors/` directory directly 2635underneath the standard tests directory. 2636 2637This split is convenient, however, because the tests in each directory are 2638treated differently. 2639 2640The error tests in the standard test directory, which include `errors.txt` for 2641both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`, 2642are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the 2643data through `stdin`. Each line is considered a separate test. For this reason, 2644there can't be any blank lines in the error files in the standard tests 2645directory because a blank line causes a successful exit. 2646 2647On the other hand, the tests in the `errors/` directory below the standard tests 2648directory are run by [`tests/error.sh`][227] and are considered to be one test 2649per file. As such, they are used differently. They are shoved into the 2650calculator through `stdin`, but they are also executed by passing them on the 2651command-line. 2652 2653To add an error test, first figure out which kind you want. 2654 2655Is it a simple one-liner, and you don't care if it's tested through a file? 2656 2657Then put it in one of the error files in the standard test directory. I would 2658only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()` 2659errors in the `read_errors.txt` file for `dc`; all others I would put in the 2660respective `errors.txt` file. 2661 2662On the other hand, if you care if the error is run as a file on the 2663command-line, or the error requires multiple lines to reproduce, then put the 2664test in the respective `errors/` directory and run the [`configure.sh`][69] 2665script again. 2666 2667After that, you are done; the test suite will automatically pick up the new 2668test, and you don't have to tell the test suite the expected results. 2669 2670### `stdin` Tests 2671 2672The `stdin` tests specifically test the lexing and parsing of multi-line 2673comments and strings. This is important because when reading from `stdin`, the 2674calculators can only read one line at a time, so partial parses are possible. 2675 2676To add `stdin` tests, just add the tests to the `stdin.txt` file in the 2677respective standard tests directory, and add the expected results in the 2678`stdin_results.txt` in the respective standard tests directory. 2679 2680### `read()` Tests 2681 2682The `read()` tests are meant to test the `read()` builtin function, to ensure 2683that the parsing and execution is correct. 2684 2685Each line is one test, as that is the nature of using the `read()` function, so 2686to add a test, just add it as another line in the `read.txt` file in the 2687respective standard tests directory, and add its result to the 2688`read_results.txt` file in the respective standard tests directory. 2689 2690### Other Tests 2691 2692The "other" tests are just random tests that I could not easily classify under 2693other types of tests. They usually include things like command-line parsing and 2694environment variable testing. 2695 2696To add an other test, it requires adding the programming for it to 2697[`tests/other.sh`][195] because all of the tests are written specifically in 2698that script. It would be best to use the infrastructure in 2699[`scripts/functions.sh`][105]. 2700 2701### Linux `timeconst.bc` Script 2702 2703One special script that `bc`'s test suite will use is the [Linux `timeconst.bc` 2704script][6]. 2705 2706I made the test suite able to use this script because the reason the 2707[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted 2708to be sure that it would run correctly on the script. 2709 2710However, it is not part of the distribution, nor is it part of the repository. 2711The reason for this is because [`timeconst.bc`][6] is under the GPL, while this 2712repo is under a BSD license. 2713 2714If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it 2715at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will 2716automatically run its tests; otherwise, it will skip it. 2717 2718### History Tests 2719 2720There are automatic tests for history; however, they have dependencies: Python 3 2721and [`pexpect`][137]. 2722 2723As a result, because I need the [test suite to be portable][138], like the rest 2724of `bc`, the history tests are carefully guarded with things to ensure that they 2725are skipped, rather than failing if Python and [`pexpect`][137] are not 2726installed. For this reason, there is a `sh` script, [`tests/history.sh`][140] 2727that runs the actual script, [`tests/history.py`][139]. 2728 2729I have added as many tests as I could to cover as many lines and branches as 2730possible. I guess I could have done more, but doing so would have required a lot 2731of time. 2732 2733I have tried to make it as easy as possible to run the history tests. They will 2734run automatically if you use the `make test_history` command, and they will also 2735use parallel execution with `make -j<cores> test_history`. 2736 2737However, the history tests are meant only to be run by maintainers of `bc`; they 2738are *not* meant to be run by users and packagers. The reason for this is that 2739they only seem to work reliably on Linux; `pexpect` seems to have issues on 2740other platforms, especially timeout issues. 2741 2742Thus, they are excluded from running with `make test` and [`tests/all.sh`][225]. 2743However, they can be run from the [`scripts/release.sh`][83] script. 2744 2745All of the tests are contained in [`tests/history.py`][139]. The reason for this 2746is because they are in Python, and I don't have an easy way of including Python 2747(or at the very least, I am not familiar enough with Python to do that). So they 2748are all in the same file to make it easier on me. 2749 2750Each test is one function in the script. They all take the same number and type 2751of arguments: 2752 27531. `exe`: the executable to run. 27542. `args`: the arguments to pass to the executable. 27553. `env`: the environment. 2756 2757Each function creates a child process with `pexpect.spawn` and then tests with 2758that child. Then the function returns the child to the caller, who closes it 2759and checks its error code against its expected error code. 2760 2761Yes, the error code is not a success all the time. This is because of the UTF-8 2762tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc` 2763is required to handle, per the [standard][2]. 2764 2765So in [`tests/history.py`][139], there are four main arrays: 2766 2767* `bc` test functions, 2768* `bc` expected error codes. 2769* `dc` test functions. 2770* `dc` expected error codes. 2771 2772[`tests/history.py`][139] takes an index as an argument; that index is what test 2773it should run. That index is used to index into the proper test and error code 2774array. 2775 2776If you need to add more history tests, you need to do the following: 2777 27781. Add the function for that test to [`tests/history.py`][139]. 27792. Add the function to the proper array of tests. 27803. Add the expected error code to the proper array of error codes. 27814. Add a target for the test to [`Makefile.in`][70]. 27825. Add that target as a prerequisite to either `test_bc_history` or 2783 `test_dc_history`. 2784 2785You do not need to do anything to add the test to `history_all_tests` (see 2786[Group Tests][141] below) because the scripts will automatically run all of the 2787tests properly. 2788 2789### Generated Tests 2790 2791Some tests are *large*, and as such, it is impractical to check them into `git`. 2792Instead, the tests depend on the existence of a GNU-compatible `bc` in the 2793`PATH`, which is then used to generate the tests. 2794 2795If [`configure.sh`][69] was run with the `-G` argument, which disables generated 2796tests, then `make test` and friends will automatically skip generated tests. 2797This is useful to do on platforms that are not guaranteed to have a 2798GNU-compatible `bc` installed. 2799 2800However, adding a generated test is a complicated because you have to figure out 2801*where* you want to put the file to generate the test. 2802 2803For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to 2804generate a `<test>_results.txt` file in the [standard tests][149] directory 2805(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no 2806`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s 2807test suite will look for a `<test>.bc` or `<test>.dc` file in the [script 2808tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if 2809that exists, it will use that script to generate the `<test>.txt` file in the 2810[standard tests][149] directory after which it will generate the 2811`<test>_results.txt` file in the [standard tests][149] directory. 2812 2813So you can choose to either: 2814 2815* Have a test in the [standard tests][149] directory without a corresponding 2816 `*_results.txt` file, or 2817* Have a script in the [script tests][150] directory to generate the 2818 corresponding file in the standard test directory before generating the 2819 corresponding `*_results.txt` file. 2820 2821Adding a script has a double benefit: the script itself can be used as a test. 2822However, script test results can also be generated. 2823 2824If `bc` is asked to run a script test, then if the script does not exist, `bc`'s 2825test suite returns an error. If it *does* exist, but no corresponding 2826`<test>.txt` file exists in the [script tests][150] directory, then a 2827GNU-compatible `bc` is used to generate the `<test>.txt` results file. 2828 2829If generated tests are disabled through [`configure.sh`][69], then these tests 2830are not generated if they do not exist. However, if they *do* exist, then they 2831are run. This can happen if a `make clean_tests` was not run between a build 2832that generated tests and a build that will not. 2833 2834### Group Tests 2835 2836While the test suite has a lot of targets in order to get parallel execution, 2837there are five targets that allow you to run each section, or all, of the test 2838suite as one unit: 2839 2840* `bc_all_tests` (`bc` tests) 2841* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests) 2842* `dc_all_tests` (`dc` tests) 2843* `history_all_tests` (history tests) 2844* `run_all_tests` (combination of the previous four) 2845 2846In addition, there are more fine-grained targets available: 2847 2848* `test_bc` runs all `bc` tests (except history tests). 2849* `test_dc` runs all `dc` tests (except history tests). 2850* `test_bc_tests` runs all `bc` [standard tests][149]. 2851* `test_dc_tests` runs all `dc` [standard tests][149]. 2852* `test_bc_scripts` runs all `bc` [script tests][150]. 2853* `test_dc_scripts` runs all `dc` [script tests][150]. 2854* `test_bc_stdin` runs the `bc` [`stdin` tests][152]. 2855* `test_dc_stdin` runs the `dc` [`stdin` tests][152]. 2856* `test_bc_read` runs the `bc` [`read()` tests][153]. 2857* `test_dc_read` runs the `dc` [`read()` tests][153]. 2858* `test_bc_errors` runs the `bc` [error tests][151]. 2859* `test_dc_errors` runs the `dc` [error tests][151]. 2860* `test_bc_other` runs the `bc` [other tests][151]. 2861* `test_dc_other` runs the `dc` [other tests][151]. 2862* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6]. 2863* `test_history` runs all history tests. 2864* `test_bc_history` runs all `bc` history tests. 2865* `test_dc_history` runs all `dc` history tests. 2866 2867All of the above tests are parallelizable. 2868 2869### Individual Tests 2870 2871In addition to all of the above, individual test targets are available. These 2872are mostly useful for attempting to fix a singular test failure. 2873 2874These tests are: 2875 2876* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149]. 2877 The name is the name of the test file without the `.txt` extension. It is the 2878 name printed by the test suite when running the test. 2879* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149]. 2880 The name is the name of the test file without the `.txt` extension. It is the 2881 name printed by the test suite when running the test. 2882* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script 2883 test][150]. The name of the test is the name of the script without the `.bc` 2884 extension. 2885* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script 2886 test][150]. The name of the test is the name of the script without the `.dc` 2887 extension. 2888* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`. 2889* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`. 2890 2891### [`bcl`][156] Test 2892 2893When [`bcl`][156] is built, the [build system][142] automatically ensures that 2894`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests. 2895 2896There is only one test, and it is built from [`tests/bcl.c`][158]. 2897 2898The reason the test is in C is because [`bcl`][156] is a C library; I did not 2899want to have to write C code *and* POSIX `sh` scripts to run it. 2900 2901The reason there is only one test is because most of the code for the library is 2902tested by virtue of testing `bc` and `dc`; the test needs to only ensure that 2903the library bindings and plumbing do not interfere with the underlying code. 2904 2905However, just because there is only one test does not mean that it doesn't test 2906more than one thing. The code actually handles a series of tests, along with 2907error checking to ensure that nothing went wrong. 2908 2909To add a [`bcl`][156] test, just figure out what test you want, figure out where 2910in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as 2911much error checking as possible, and use the `err(BclError)` function. Ensure 2912that all memory is freed because that test is run through [Valgrind][159] and 2913[AddressSanitizer][160]. 2914 2915### Integration with the Build System 2916 2917If it was not obvious by now, the test suite is heavily integrated into the 2918[build system][142], but the integration goes further than just making the test 2919suite easy to run from `make` and generating individual and group tests. 2920 2921The big problem the test suite has is that some `bc` code, stuff that is 2922important to test, is only in *some* builds. This includes all of the extra math 2923extensions, for example. 2924 2925So the test suite needs to have some way of turning off the tests that depend on 2926certain [build types][81] when those [build types][81] are not used. 2927 2928This is the reason the is tightly integrated with the [build system][142]: the 2929[build system][142] knows what [build type][81] was used and can tell the test 2930suite to turn off the tests that do not apply. 2931 2932It does this with arguments to the test scripts that are either a `1` or a `0`, 2933depending on whether tests of that type should be enabled or not. These 2934arguments are why I suggest, in the [Test Scripts][148] section, to always use a 2935`make` target to run the test suite or any individual test. I have added a lot 2936of targets to make this easy and as fast as possible. 2937 2938In addition to all of that, the build system is responsible for selecting the 2939`bc`/`dc` tests or the [`bcl` test][157]. 2940 2941### Output Directories 2942 2943During any run of the test suite, the test suite outputs the results of running 2944various tests to files. These files are usually output to `tests/bc_outputs/` 2945and `tests/dc_outputs/`. 2946 2947However, in some cases, it may be necessary to output test results to a 2948different directory. If that is the case, set the environment variable 2949`BC_TEST_OUTPUT_DIR` to the name of the directory. 2950 2951If that is done, then test results will be written to 2952`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`. 2953 2954### Test Suite Portability 2955 2956The test suite is meant to be run by users and packagers as part of their 2957install process. 2958 2959This puts some constraints on the test suite, but the biggest is that the test 2960suite must be as [portable as `bc` itself][136]. 2961 2962This means that the test suite must be implemented in pure POSIX `make`, `sh`, 2963and C99. 2964 2965#### Test Scripts 2966 2967To accomplish the portability, the test suite is run by a bunch of `sh` scripts 2968that have the constraints laid out in [POSIX Shell Scripts][76]. 2969 2970However, that means they have some quirks, made worse by the fact that there are 2971[generated tests][143] and [tests that need to be skipped, but only 2972sometimes][147]. 2973 2974This means that a lot of the scripts take an awkward number and type of 2975arguments. Some arguments are strings, but most are integers, like 2976[`scripts/release.sh`][83]. 2977 2978It is for this reason that I do not suggest running the test scripts directly. 2979Instead, always use an appropriate `make` target, which already knows the 2980correct arguments for the test because of the [integration with the build 2981system][147]. 2982 2983### Test Coverage 2984 2985In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`. 2986 2987If you have them, run the following commands: 2988 2989``` 2990CC=gcc ./configure -gO3 -c 2991make -j<cores> 2992make coverage 2993``` 2994 2995Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in 2996parallel. If you try, you will get errors. And note that `CC=gcc` is used. 2997 2998After running those commands, you can open your web browser and open the 2999`index.html` file in the root directory of the repo. From there, you can explore 3000all of the coverage results. 3001 3002If you see lines or branches that you think you could hit with a manual 3003execution, do such manual execution, and then run the following command: 3004 3005``` 3006make coverage_output 3007``` 3008 3009and the coverage output will be updated. 3010 3011If you want to rerun `make coverage`, you must do a `make clean` and build 3012first, like this: 3013 3014``` 3015make clean 3016make -j<cores> 3017make coverage 3018``` 3019 3020Otherwise, you will get errors. 3021 3022If you want to run tests in parallel, you can do this: 3023 3024``` 3025make -j<cores> 3026make -j<cores> test 3027make coverage_output 3028``` 3029 3030and that will generate coverage output correctly. 3031 3032### [AddressSanitizer][21] and Friends 3033 3034To run the test suite under [AddressSanitizer][21] or any of its friends, use 3035the following commands: 3036 3037``` 3038CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m 3039make -j<cores> 3040make -j<cores> test 3041``` 3042 3043where `<sanitizer>` is the correct name of the desired sanitizer. There is one 3044exception to the above: `UndefinedBehaviorSanitizer` should be run on a build 3045that has zero optimization, so for `UBSan`, use the following commands: 3046 3047``` 3048CFLAGS="-fsanitize=undefined" ./configure -gO0 -m 3049make -j<cores> 3050make -j<cores> test 3051``` 3052 3053### [Valgrind][20] 3054 3055To run the test suite under [Valgrind][20], run the following commands: 3056 3057``` 3058./configure -gO3 -v 3059make -j<cores> 3060make -j<cores> test 3061``` 3062 3063It really is that easy. I have directly added infrastructure to the build system 3064and the test suite to ensure that if [Valgrind][20] detects any memory errors or 3065any memory leaks at all, it will tell the test suite infrastructure to report an 3066error and exit accordingly. 3067 3068## POSIX Shell Scripts 3069 3070There is a lot of shell scripts in this repository, and every single one of them 3071is written in pure [POSIX `sh`][72]. 3072 3073The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX 3074systems are only guaranteed to have a barebones implementation of `sh` 3075available. 3076 3077There are *many* snares for unwary programmers attempting to modify 3078[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or 3079any of the scripts in [`tests/`][77]. Here are some of them: 3080 30811. No `bash`-isms. 30822. Only POSIX standard utilities are allowed. 30833. Only command-line options defined in the POSIX standard for POSIX utilities 3084 are allowed. 30854. Only the standardized behavior of POSIX utilities is allowed. 30865. Functions return data by *printing* it. Using `return` sets their exit code. 3087 3088In other words, the script must only use what is standardized in the [`sh`][72] 3089and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes 3090things like `local` and the `[[ ]]` notation. 3091 3092These are *enormous* restrictions and must be tested properly. I put out at 3093least one release with a change to `configure.sh` that wasn't portable. That was 3094an embarrassing mistake. 3095 3096The lack of `local`, by the way, is why variables in functions are named with 3097the form: 3098 3099``` 3100_<function_name>_<var_name> 3101``` 3102 3103This is done to prevent any clashes of variable names with already existing 3104names. And this applies to *all* shell scripts. However, there are a few times 3105when that naming convention is *not* used; all of them are because those 3106functions are required to change variables in the global scope. 3107 3108### Maintainer-Only Scripts 3109 3110If a script is meant to be used for maintainers (of `bc`, not package 3111maintainers), then rules 2, 3, and 4 don't need to be followed as much because 3112it is assumed that maintainers will be able to install whatever tools are 3113necessary to do the job. 3114 3115## Manuals 3116 3117The manuals for `bc` and `dc` are all generated, and the manpages for `bc`, 3118`dc`, and `bcl` are also generated. 3119 3120Why? 3121 3122I don't like the format of manpages, and I am not confident in my ability to 3123write them. Also, they are not easy to read on the web. 3124 3125So that explains why `bcl`'s manpage is generated from its markdown version. But 3126why are the markdown versions of the `bc` and `dc` generated? 3127 3128Because the content of the manuals needs to change based on the [build 3129type][81]. For example, if `bc` was built with no history support, it should not 3130have the **COMMAND LINE HISTORY** section in its manual. If it did, that would 3131just confuse users. 3132 3133So the markdown manuals for `bc` and `dc` are generated from templates 3134([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there, 3135the manpages are generated from the generated manuals. 3136 3137The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version 3138control, and the generated markdown manuals and manpages for `bc` 3139([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well. 3140 3141This is because generating the manuals and manpages requires a heavy dependency 3142that only maintainers should care about: [Pandoc][92]. Because users [should not 3143have to install *any* dependencies][136], the files are generated, checked into 3144version control, and included in distribution tarballs. 3145 3146If you run [`configure.sh`][69], you have an easy way of generating the markdown 3147manuals and manpages: just run `make manpages`. This target calls 3148[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`. 3149 3150For more on how generating manuals and manpages works, see 3151[`scripts/manpage.sh`][60]. 3152 3153## Locales 3154 3155The locale system of `bc` is enormously complex, but that's because 3156POSIX-compatible locales are terrible. 3157 3158How are they terrible? 3159 3160First, `gencat` does not work for generating cross-compilation. In other words, 3161it does not generate machine-portable files. There's nothing I can do about 3162this except for warn users. 3163 3164Second, the format of `.msg` files is...interesting. Thank goodness it is text 3165because otherwise, it would be impossible to get them right. 3166 3167Third, `.msg` files are not used. In other words, `gencat` exists. Why? 3168 3169Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales. 3170 3171Yes, where and *how*. 3172 3173Obviously, from it's name, it's a path, and that's the where. The *how* is more 3174complicated. 3175 3176It's actually *not* a path, but a path template. It's a format string, and it 3177can have a few format specifiers. For more information on that, see [this 3178link][84]. But in essence, those format specifiers configure how each locale is 3179supposed to be installed. 3180 3181With all those problems, why use POSIX locales? Portability, as always. I can't 3182assume that `gettext` will be available, but I *can* pretty well assume that 3183POSIX locales will be available. 3184 3185The locale system of `bc` includes all files under [`locales/`][85], 3186[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88], 3187[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131], 3188and the parts of the build system needed to activate it. There is also code in 3189[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale. 3190 3191If the order of error messages and/or categories are changed, the order of 3192errors must be changed in the enum, the default error messages and categories in 3193[`src/data.c`][131], and all of the messages and categories in the `.msg` files 3194under [`locales/`][85]. 3195 3196## Static Analysis 3197 3198I do *some* static analysis on `bc`. 3199 3200I used to use [Coverity][196], but I stopped using it when it started giving me 3201too many false positives and also because it had a vulnerability. 3202 3203However, I still use the [Clang Static Analyzer][197] through 3204[`scan-build`][19]. I only use it in debug mode because I have to add some 3205special code to make it not complain about things that are definitely not a 3206problem. 3207 3208The most frequent example of false positives is where a local is passed to a 3209function to be initialized. [`scan-build`][19] misses that fact, so I 3210pre-initialize such locals to prevent the warnings. 3211 3212To run `scan-build`, do the following: 3213 3214``` 3215make clean 3216scan-build make 3217``` 3218 3219`scan-build` will print its warnings to `stdout`. 3220 3221## Fuzzing 3222 3223The quality of this `bc` is directly related to the amount of fuzzing I did. As 3224such, I spent a lot of work making the fuzzing convenient and fast, though I do 3225admit that it took me a long time to admit that it did need to be faster. 3226 3227First, there were several things which make fuzzing fast: 3228 3229* Using [AFL++][125]'s deferred initialization. 3230* Splitting `bc`'s corpuses. 3231* Parallel fuzzing. 3232 3233Second, there are several things which make fuzzing convenient: 3234 3235* Preprepared input corpuses. 3236* [`scripts/fuzz_prep.sh`][119]. 3237* `tmux` and `tmuxp` configs. 3238* [`scripts/afl.py`][94]. 3239 3240### Fuzzing Performance 3241 3242Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as 3243fast as possible is well worth the time. 3244 3245However, there is a caveat to the above: it is easy to make [AFL++][125] crash, 3246be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the 3247performance enhancements are done poorly. 3248 3249To stop [AFL++][125] from crashing on test cases, and to be stable, these are 3250the requirements: 3251 3252* The state at startup must be *exactly* the same. 3253* The virtual memory setup at startup must be *exactly* the same. 3254 3255The first isn't too hard; it's the second that is difficult. 3256 3257`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less 3258than most programs.) After going through an execution run, however, some of that 3259memory, while it could be cleared and reset, is in different places because of 3260vectors. Since vectors reallocate, their allocations are not guaranteed to be in 3261the same place. 3262 3263So to make all three work, I had to set up the deferred initialization and 3264persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`, 3265which is probably what caused the stability to drop below 100%). However, using 3266deferred alone let me put the [AFL++][125] initialization further back. This 3267works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right 3268at that call. Thus, every run has the exact same virtual memory setup, and each 3269run can skip all of the setup code. 3270 3271I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent 3272mode, plus shared memory fuzzing. In order to do it safely, with stability above 327399%, all of that was actually *slower* than using just deferred initialization 3274with the initialization *right before* `stdin` was read. And as a bonus, the 3275stability in that situation is 100%. 3276 3277As a result, my [AFL++][125] setup only uses deferred initialization. That's the 3278`__AFL_INIT()` call. 3279 3280(Note: there is one more big item that must be done in order to have 100% 3281stability: the pseudo-random number generator *must* start with *exactly* the 3282same seed for every run. This is set up with the `tmux` and `tmuxp` configs that 3283I talk about below in [Convenience][130]. This seed is set before the 3284`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without 3285it, stability would be abysmal.) 3286 3287On top of that, while `dc` is plenty fast under fuzzing (because of a faster 3288parser and less test cases), `bc` can be slow. So I have split the `bc` input 3289corpus into three parts, and I set fuzzers to run on each individually. This 3290means that they will duplicate work, but they will also find more stuff. 3291 3292On top of all of that, each input corpus (the three `bc` corpuses and the one 3293`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two 3294reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs 3295recommend 4 parallel fuzzers, at least, to run different "power schedules." 3296 3297### Convenience 3298 3299The preprepared input corpuses are contained in the 3300`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories. 3301There are three `bc` directories and only one `dc` directory because `bc`'s 3302input corpuses are about three times as large, and `bc` is a larger program; 3303it's going to need much more fuzzing. 3304 3305(They do share code though, so fuzzing all of them still tests a lot of the same 3306math code.) 3307 3308The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It 3309assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it 3310automatically configures and builds `bc` with a fuzz-ideal build. 3311 3312A fuzz-ideal build has several things: 3313 3314* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].) 3315* Debug mode, to crash as easily as possible. 3316* Full optimization (including [Link-Time Optimization][126]), for performance. 3317* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above). 3318* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125] 3319 documentation for more information. 3320 3321There is one big thing that a fuzz-ideal build does *not* have: it does not use 3322[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if 3323it fails to allocate memory. I do not want to consider those as crashes because 3324my `bc` does, in fact, handle them gracefully by exiting with a set error code. 3325So `libdislocator.so` is not an option. 3326 3327However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in 3328`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120] 3329and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be 3330used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to 3331adjust the `start_directory` item, once it is adjusted, then using this command: 3332 3333``` 3334tmuxp load tests/fuzzing/bc_afl.yaml 3335``` 3336 3337will start fuzzing. 3338 3339In other words, to start fuzzing, the sequence is: 3340 3341``` 3342./scripts/fuzz_prep.sh 3343tmuxp load tests/fuzzing/bc_afl.yaml 3344``` 3345 3346Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on 3347`bc` and 4 on `dc`. The outputs will be put into the 3348`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories. 3349 3350(Note that loading that config will also delete all unsaved [AFL++][125] output 3351from the output directories.) 3352 3353Sometimes, [AFL++][125] will report crashes when there are none. When crashes 3354are reported, I always run the following command: 3355 3356``` 3357./scripts/afl.py <dir> 3358``` 3359 3360where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the 336116 instances reported the crash. If it was one of the first four (`bc11` through 3362`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I 3363use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`. 3364And if it was `dc`, I use `dc`. 3365 3366The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly 3367reported a crash or not. If so, it will copy the crashing test case to 3368`.test.txt` and tell you whether it was from running it as a file or through 3369`stdin`. 3370 3371From there, I personally always investigate the crash and fix it. Then, when the 3372crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as 3373an error test (if it produces an error) or I create a new 3374`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See 3375[Test Suite][124] for more information about the test suite.) In either case, 3376`<idx>` is the next number for a file in that particular place. For example, if 3377the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I 3378move `.test.txt` to `tests/bc/error/19.txt`. 3379 3380Then I immediately run [`scripts/afl.py`][94] again to find the next crash 3381because often, [AFL++][125] found multiple test cases that trigger the same 3382crash. If it finds another, I repeat the process until it is happy. 3383 3384Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and 3385restart fuzzing. Why do I restart instead of continuing? Because with the 3386changes, the test outputs could be stale and invalid. 3387 3388However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds 3389that every crash reported by [AFL++][125] is invalid. If that's the case, I can 3390just continue with the command: 3391 3392``` 3393tmuxp load tests/fuzzing/bc_afl_continue.yaml 3394``` 3395 3396(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is 3397still running, so often, I don't find a need to continue since there was no 3398stop. However, the capability is there, if needed.) 3399 3400In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs, 3401automatically set up [AFL++][125] power schedules (see [Fuzzing 3402Performance][127] above). They also set up the parallel fuzzing such that there 3403is one fuzzer in each group of 4 that does deterministic fuzzing. It's always 3404the first one in each group. 3405 3406For more information about deterministic fuzzing, see the [AFL++][125] 3407documentation. 3408 3409### Corpuses 3410 3411I occasionally add to the input corpuses. These files come from new files in the 3412[Test Suite][124]. In fact, I use soft links when the files are the same. 3413 3414However, when I add new files to an input corpus, I sometimes reduce the size of 3415the file by removing some redundancies. 3416 3417And then, when adding to the `bc` corpuses, I try to add them evenly so that 3418each corpus will take about the same amount of time to get to a finished state. 3419 3420### [AFL++][125] Quickstart 3421 3422The way [AFL++][125] works is complicated. 3423 3424First, it is the one to invoke the compiler. It leverages the compiler to add 3425code to the binary to help it know when certain branches are taken. 3426 3427Then, when fuzzing, it uses that branch information to generate information 3428about the "path" that was taken through the binary. 3429 3430I don't know what AFL++ counts as a new path, but each new path is added to an 3431output corpus, and it is later used as a springboard to find new paths. 3432 3433This is what makes AFL++ so effective: it's not just blindly thrashing a binary; 3434it adapts to the binary by leveraging information about paths. 3435 3436### Fuzzing Runs 3437 3438For doing a fuzzing run, I expect about a week or two where my computer is 3439basically unusable, except for text editing and light web browsing. 3440 3441Yes, it can take two weeks for me to do a full fuzzing run, and that does *not* 3442include the time needed to find and fix crashes; it only counts the time on the 3443*last* run, the one that does not find any crashes. This means that the entire 3444process can take a month or more. 3445 3446What I use as an indicator that the fuzzing run is good enough is when the 3447number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer 3448instances, except maybe the deterministic instances, is below 50. And even then, 3449I try to let deterministic instances get that far as well. 3450 3451You can see how many pending paths are left in the "path geometry" section of 3452the [AFL++][125] dashboard. 3453 3454Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with 3455`Ctrl+c` or some other method. It will not quit until you tell it to. 3456 3457### Radamsa 3458 3459I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened 3460once. 3461 3462The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier 3463to use with varying command-line arguments, which was needed for testing `bc`'s 3464command-line expression parsing code, and [AFL++][125] is best when testing 3465input from `stdin`. 3466 3467[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but 3468it's not as effective at that, so I don't really use it for that either. 3469 3470[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have 3471not had to touch the command-line expression parsing code since. 3472 3473### [AddressSanitizer][21] with Fuzzing 3474 3475One advantage of using [AFL++][125] is that it saves every test case that 3476generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete 3477them when the user makes it quit. 3478 3479Keeping them around is not a good idea, for several reasons: 3480 3481* They are frequently large. 3482* There are a lot of them. 3483* They go stale; after `bc` is changed, the generated paths may not be valid 3484 anymore. 3485 3486However, before they are deleted, they can definitely be leveraged for even 3487*more* bug squashing by running *all* of the paths through a build of `bc` with 3488[AddressSanitizer][21]. 3489 3490This can easily be done with these four commands: 3491 3492``` 3493./scripts/fuzz_prep.sh -a 3494./scripts/afl.py --asan bc1 3495./scripts/afl.py --asan bc2 3496./scripts/afl.py --asan bc3 3497./scripts/afl.py --asan dc 3498``` 3499 3500(By the way, the last four commands could be run in separate terminals to do the 3501processing in parallel.) 3502 3503These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they 3504run `bc` and `dc` on all of the found crashes and path output corpuses. This is 3505to check that no path or crash has found any memory errors, including memory 3506leaks. 3507 3508Because the output corpuses can contain test cases that generate infinite loops 3509in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far 3510greater than the timeout that [AFL++][125] uses and should be enough to catch 3511any crash. 3512 3513If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory 3514errors on the outputs of [AFL++][125], that is an excellent indicator of very 3515few bugs in `bc`, and a release can be made with confidence. 3516 3517## Code Concepts 3518 3519This section is about concepts that, if understood, will make it easier to 3520understand the code as it is written. 3521 3522The concepts in this section are not found in a single source file, but they are 3523littered throughout the code. That's why I am writing them all down in a single 3524place. 3525 3526### POSIX Mode 3527 3528POSIX mode is `bc`-only. 3529 3530In fact, POSIX mode is two different modes: Standard Mode and Warning Mode. 3531These modes are designed to help users write POSIX-compatible `bc` scripts. 3532 3533#### Standard Mode 3534 3535Standard Mode is activated with the `-s` or `--standard` flags. 3536 3537In this mode, `bc` will error if any constructs are used that are not strictly 3538compatible with the [POSIX `bc` specification][2]. 3539 3540#### Warning Mode 3541 3542Warning Mode is activated with the `-w` or `--warn` flags. 3543 3544In this mode, `bc` will issue warnings, but continue, if any constructs are used 3545that are not strictly compatible with the [POSIX `bc` specification][2]. 3546 3547### Memory Management 3548 3549The memory management in `bc` is simple: everything is owned by one thing. 3550 3551If something is in a vector, it is owned by that vector. 3552 3553If something is contained in a struct, it is owned by that struct with one 3554exception: structs can be given pointers to other things, but only if those 3555other things will outlast the struct itself. 3556 3557As an example, the `BcParse` struct has a pointer to the one `BcProgram` in 3558`bc`. This is okay because the program is initialized first and deallocated 3559last. 3560 3561In other words, it's simple: if a field in a struct is a pointer, then unless 3562that pointer is directly allocated by the struct (like the vector array or the 3563number limb array), that struct does not own the item at that pointer. 3564Otherwise, the struct *does* own the item. 3565 3566### [Async-Signal-Safe][115] Signal Handling 3567 3568`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but 3569`bc` is, by and large, CPU-bound. This has several consequences, but the biggest 3570is that there is no easy way to allow signals to interrupt it. 3571 3572This consequence is not obvious, but it comes from the fact that a lot of I/O 3573operations can be interrupted and return [`EINTR`][198]. This makes such I/O 3574calls natural places for allowing signals to interrupt execution, even when the 3575signal comes during execution, and not interrupting I/O calls. The way this is 3576done is setting a flag in the signal handler, which is checked around the time 3577of the I/O call, when it is convenient. 3578 3579Alternatively, I/O bound programs can use the [self-pipe trick][199]. 3580 3581Neither of these are possible in `bc` because the execution of math code can 3582take a long time. If a signal arrives during this long execution time, setting a 3583flag like an I/O bound application and waiting until the next I/O call could 3584take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a 3585week to calculate a million digits of pi, and it's not slow as far as `bc` 3586implementations go.) 3587 3588Thus, using just the technique of setting the flag just will not work for an 3589interactive calculator. 3590 3591Well, it can, but it requires a lot of code and massive inefficiencies. I know 3592this because that was the original implementation. 3593 3594The original implementation set a flag and just exit the signal handler. Then, 3595on just about every loop header, I have a check for the signal flag. These 3596checks happened on every iteration of every loop. It was a massive waste because 3597it was polling, and [polling is evil][200]. 3598 3599So for version [3.0.0][32], I expended a lot of effort to change the 3600implementation. 3601 3602In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`) 3603to tell the signal handler whether it can use `longjmp()` to stop the current 3604execution. If so, it does. If not, it sets a flag, which then is used by the 3605code outside the signal handler that set the `vm.sig_lock` flag. When that code 3606unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that 3607code executes the `longjmp()` and stops the current execution. 3608 3609Other than that, the rest of the interrupt-based implementation is best 3610described in the [Error Handling][97]. 3611 3612However, there are rules for signal handlers that I must lay out. 3613 3614First, signal handlers can only call [async-signal-safe][115] functions. 3615 3616Second, any field set or read by both the signal handler and normal code must be 3617a `volatile sig_atomic_t`. 3618 3619Third, when setting such fields, they must be set to constants and no math can 3620be done on them. This restriction and the above restriction exist in order to 3621ensure that the setting of the fields is always atomic with respect to signals. 3622 3623These rules exist for *any* code using Unix signal handlers, not just `bc`. 3624 3625#### Vectors and Numbers 3626 3627Vectors and numbers needed special consideration with the interrupt-based signal 3628handling. 3629 3630When vectors and numbers are about to allocate, or *reallocate* their arrays, 3631they need to lock signals to ensure that they do not call `malloc()` and friends 3632and get interrupted by a signal because, as you will see in the [Error 3633Handling][97] section, `longjmp()` cannot be used in a signal handler if it may 3634be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and 3635friends. 3636 3637### Asserts 3638 3639If you asked me what procedure is used the most in `bc`, I would reply without 3640hesitation, "`assert()`." 3641 3642I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with 3643[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes, 3644and a failing `assert()` counts as one. 3645 3646So while a lot of bad bugs might have corrupted data and *not* caused crashes, 3647because I put in so many `assert()`'s, they were *turned into* crashing bugs, 3648and [AFL++][125] found them. 3649 3650By far, the most bugs it found this way was in the `bc` parser. (See the [`bc` 3651Parsing][110] for more information.) And even though I was careful to put 3652`assert()`'s everywhere, most parser bugs manifested during execution of 3653bytecode because the virtual machine assumes the bytecode is valid. 3654 3655Sidenote: one of those bugs caused an infinite recursion when running the sine 3656(`s()`) function in the math library, so yes, parser bugs can be *very* weird. 3657 3658Anyway, the way I did `assert()`'s was like this: whenever I realized that I 3659had put assumptions into the code, I would put an `assert()` there to test it 3660**and** to *document* it. 3661 3662Yes, documentation. In fact, by far the best documentation of the code in `bc` 3663is actually the `assert()`'s. The only time I would not put an `assert()` to 3664test an assumption is if that assumption was already tested by an `assert()` 3665earlier. 3666 3667As an example, if a function calls another function and passes a pointer that 3668the caller previously `assert()`'ed was *not* `NULL`, then the callee does not 3669have to `assert()` it too, unless *also* called by another function that does 3670not `assert()` that. 3671 3672At first glance, it may seem like putting asserts for pointers being non-`NULL` 3673everywhere would actually be good, but unfortunately, not for fuzzing. Each 3674`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on 3675how many branches it covers. If there are too many `assert()`'s, it may think 3676that it is not being effective and that more fuzzing is needed. 3677 3678This means that `assert()`'s show up most often in two places: function 3679preconditions and function postconditions. 3680 3681Function preconditions are `assert()`'s that test conditions relating to the 3682arguments a function was given. They appear at the top of the function, usually 3683before anything else (except maybe initializing a local variable). 3684 3685Function postconditions are `assert()`'s that test the return values or other 3686conditions when a function exits. These are at the bottom of a function or just 3687before a `return` statement. 3688 3689The other `assert()`'s cover various miscellaneous assumptions. 3690 3691If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to 3692document your assumptions. And don't remove them when [AFL++][125] gleefully 3693crashes `bc` and `dc` over and over again. 3694 3695### Vectors 3696 3697In `bc`, vectors mean resizable arrays, and they are the most fundamental piece 3698of code in the entire codebase. 3699 3700I had previously written a [vector implementation][112], which I used to guide 3701my decisions, but I wrote a new one so that `bc` would not have a dependency. I 3702also didn't make it as sophisticated; the one in `bc` is very simple. 3703 3704Vectors store some information about the type that they hold: 3705 3706* The size (as returned by `sizeof`). 3707* An enum designating the destructor. 3708 3709If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a 3710destructor. 3711 3712But by storing the size, the vector can then allocate `size * cap` bytes, where 3713`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again 3714and again until it is bigger than the requested size. 3715 3716But to store items, or to push items, or even to return items, the vector has to 3717figure out where they are, since to it, the array just looks like an array of 3718bytes. 3719 3720It does this by calculating a pointer to the underlying type with 3721`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the 3722desired element, and `size` is the size of the underlying type. 3723 3724Doing that, vectors can avoid undefined behavior (because `char` pointers can 3725be cast to any other pointer type), while calculating the exact position of 3726every element. 3727 3728Because it can do that, it can figure out where to push new elements by 3729calculating `v + (len * size)`, where `len` is the number of items actually in 3730the vector. 3731 3732By the way, `len` is different from `cap`. While `cap` is the amount of storage 3733*available*, `len` is the number of actual elements in the vector at the present 3734point in time. 3735 3736Growing the vector happens when `len` is equal to `cap` *before* pushing new 3737items, not after. 3738 3739To add a destructor, you need to add an enum item to `BcDtorType` in 3740[`include/vector.h`][174] and add the actual destructor in the same place as the 3741enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131]. 3742 3743#### Pointer Invalidation 3744 3745There is one big danger with the vectors as currently implemented: pointer 3746invalidation. 3747 3748If a piece of code receives a pointer from a vector, then adds an item to the 3749vector before they finish using the pointer, that code must then update the 3750pointer from the vector again. 3751 3752This is because any pointer inside the vector is calculated based off of the 3753array in the vector, and when the vector grows, it can `realloc()` the array, 3754which may move it in memory. If that is done, any pointer returned by 3755`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid. 3756 3757This fact was the single most common cause of crashes in the early days of this 3758`bc`; wherever I have put a comment about pointers becoming invalidated and 3759updating them with another call to `bc_vec_item()` and friends, *do **NOT** 3760remove that code!* 3761 3762#### Maps 3763 3764Maps in `bc` are...not. 3765 3766They are really a combination of two vectors. Those combinations are easily 3767recognized in the source because one vector is named `<name>s` (plural), and the 3768other is named `<name>_map`. 3769 3770There are currently three, all in `BcProgram`: 3771 3772* `fns` and `fn_map` (`bc` functions). 3773* `vars` and `var_map` (variables). 3774* `arrs` and `arr_map` (arrays). 3775 3776They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a 3777string and an index. The string is the name of the item, and the index is the 3778index of that item in the `<name>s` vector. 3779 3780Obviously, I could have just done a linear search for items in the `<name>s` 3781vector, but that would be slow with a lot of functions/variables/arrays. 3782Instead, I ensure that whenever an item is inserted into the `<name>_map` 3783vector, the item is inserted in sorted order. This means that the `<name>_map` 3784is always sorted (by the names of the items). 3785 3786So when looking up an item in the "map", what is really done is this: 3787 37881. A binary search is carried out on the names in the `<name>_map` vector. 37892. When one is found, it returns the index in the `<name>_map` vector where the 3790 item was found. 37913. This index is then used to retrieve the `BcId`. 37924. The index from the `BcId` is then used to index into the `<name>s` vector, 3793 which returns the *actual* desired item. 3794 3795Why were the `<name>s` and `<name>_map` vectors not combined for ease? The 3796answer is that sometime, when attempting to insert into the "map", code might 3797find that something is already there. For example, a function with that name may 3798already exist, or the variable might already exist. 3799 3800If the insert fails, then the name already exists, and the inserting code can 3801forego creating a new item to put into the vector. However, if there is no item, 3802the inserting code must create a new item and insert it. 3803 3804If the two vectors were combined together, it would not be possible to separate 3805the steps such that creating a new item could be avoided if it already exists. 3806 3807#### Slabs and Slab Vectors 3808 3809`bc` allocates *a lot* of small strings, and small allocations are the toughest 3810for general-purpose allocators to handle efficiently. 3811 3812Because of that reason, I decided to create a system for allocating small 3813strings using something that I call a "slab vector" after [slab 3814allocators][201]. 3815 3816These vectors allocate what I call "slabs," which are just an allocation of a 3817single page with a length to tell the slab how much of the slab is used. 3818 3819The vector itself holds slabs, and when the slab vector is asked to allocate a 3820string, it attempts to in the last slab. If that slab cannot do so, it allocates 3821a new slab and allocates from that. 3822 3823There is one exception: if a string is going to be bigger than 128 bytes, then 3824the string is directly allocated, and a slab is created with that pointer and a 3825length of `SIZE_MAX`, which tells the slab vector that it is a direct 3826allocation. Then, the last slab is pushed into the next spot and the new special 3827slab is put into the vacated spot. This ensures that a non-special slab is 3828always last. 3829 3830### Command-Line History 3831 3832When I first wrote `bc`, I immediately started using it in order to eat my own 3833dog food. 3834 3835It sucked, and the biggest reason why was because of the lack of command-line 3836history. 3837 3838At first, I just dealt with it, not knowing how command-line history might be 3839implemented. 3840 3841Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had 3842known about for some time. 3843 3844It turned out to be easier than I thought; the hardest part was the tedious 3845renaming of everything to fit the `bc` naming scheme. 3846 3847Understanding command-line history in `bc` is really about understanding VT-100 3848escape codes, so I would start there. 3849 3850Now, the history implementation of `bc` has been adapted far beyond that initial 3851adaptation to make the command-line history implementation perfect for `bc` 3852alone, including integrating it into `bc`'s [Custom I/O][114] and making sure 3853that it does not disturb output that did not end with a newline. 3854 3855On top of that, at one point, I attempted to get history to work on Windows. It 3856barely worked after a lot of work and a lot of portability code, but even with 3857all of that, it does not have at least one feature: multi-line pasting from the 3858clipboard. 3859 3860#### Editline and Readline 3861 3862I also implemented an integration with both editline and readline, though not at 3863the same time. This allows distributions that want to use either one in `bc` to 3864do so. This enables users to use their `.editrc` and `.inputrc` settings. 3865 3866The integration is custom to each library, and it does *not* involve any of 3867`bc`'s custom history code. It also required changes to `bc`'s [Custom 3868I/O][114]. 3869 3870### Error Handling 3871 3872The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it 3873became one of the things that taught me the most about C in particular and 3874programming in general. 3875 3876Before then, error handling was manual. Almost all functions returned a 3877`BcStatus` indicating if an error had occurred. This led to a proliferation of 3878lines like: 3879 3880``` 3881if (BC_ERR(s)) return s; 3882``` 3883 3884In fact, a quick and dirty count of such lines in version `2.7.2` (the last 3885version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line. 3886 3887And that didn't even guarantee that return values were checked *everywhere*. 3888 3889But before I can continue, let me back up a bit. 3890 3891From the beginning, I decided that I would not do what GNU `bc` does on errors; 3892it tries to find a point at which it can recover. Instead, I decided that I 3893would have `bc` reset to a clean slate, which I believed, would reduce the 3894number of bugs where an unclean state caused errors with continuing execution. 3895 3896So from the beginning, errors would essentially unwind the stack until they got 3897to a safe place from which to clean the slate, reset, and ask for more input. 3898 3899Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As 3900such, it had a signal handler that set a flag. But it could not safely interrupt 3901execution, so that's all it could do. 3902 3903In order to actually respond to the signal, I had to litter checks for the flag 3904*everywhere* in the code. And I mean *everywhere*. They had to be checked on 3905every iteration of *every* loop. They had to be checked going into and out of 3906certain functions. 3907 3908It was a mess. 3909 3910But fortunately for me, signals did the same thing that errors did: they unwound 3911the stack to the *same* place. 3912 3913Do you see where I am going with this? 3914 3915It turns out that what I needed was a [async-signal-safe][115] form of what 3916programmers call "exceptions" in other languages. 3917 3918I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement 3919exceptions, so I thought I would learn how to use them. How hard could it be? 3920 3921Quite hard, it turns out, especially in the presence of signals. And that's 3922because there are a few big snares: 3923 39241. The value of any local variables are not guaranteed to be preserved after a 3925 `longjmp()` back into a function. 39262. While `longjmp()` is required to be [async-signal-safe][115], if it is 3927 invoked by a signal handler that interrupted a non-[async-signal-safe][115] 3928 function, then the behavior is undefined. 39293. Any mutation that is not guaranteed to be atomic with respect to signals may 3930 be incomplete when a signal arrives. 3931 3932Oh boy. 3933 3934For number 1, the answer to this is to hide data that must stay changed behind 3935pointers. Only the *pointers* are considered local, so as long as I didn't do 3936any modifying pointer arithmetic, pointers and their data would be safe. For 3937cases where I have local data that must change and stay changed, I needed to 3938*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`. 3939 3940For number 2 and number 3, `bc` needs some way to tell the signal handler that 3941it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile 3942sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal 3943Handling][173] section.) For every function that calls a function that is not 3944async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and 3945afterward, use `BC_SIG_UNLOCK` to unlock signals. 3946 3947Code also need to do this for all global, non-atomic mutation, which means that 3948modifying any part of the `BcVm` global struct. 3949 3950`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and 3951jump if necessary. 3952 3953On top of all of that, *all* functions with cleanup needed to be able to run 3954their cleanup. This meant that `longjmp()` could not just jump to the finish; it 3955had to start what I call a "jump series," using a stack of `jmp_buf`'s 3956(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to 3957execute its jump. Then, if the cleanup code was executed because of a jump, the 3958cleanup code was responsible for continuing the jump series by popping the 3959previous item off the stack and using the new top of the stack for a jump. 3960 3961In this way, C++-style exceptions were implemented in pure C. Not fun, but it 3962works. However, the order of operations matters, especially in the macros that 3963help implement the error handling. 3964 3965For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals. 3966If a signal comes between, that's fine; it will still cause a jump to the right 3967place. However, disabling the lock after could mean that a signal could come 3968*after* checking for signals, but before signals were unlocked, causing the 3969handling of the signal to be delayed. 3970 3971#### Custom I/O 3972 3973Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and 3974`longjmp()` for error handling (see the [Error Handling][97] section), and the 3975buffered I/O in `libc` does not interact well with the use of those procedures; 3976all of the buffered I/O API is basically non-[async-signal-safe][115]. 3977 3978Implementing custom buffered I/O had other benefits. First, it allowed me to 3979tightly integrate history with the I/O code. Second, it allowed me to make 3980changes to history in order to make it adapt to user prompts. 3981 3982### Lexing 3983 3984To simplify parsing, both calculators use lexers to turn the text into a more 3985easily-parsable form. 3986 3987While some tokens are only one character long, others require many tokens, and 3988some of those need to store all of the text corresponding to the token for use 3989by the parsers. Tokens that need to store their corresponding text include, but 3990are not limited to: 3991 3992* Strings. 3993* Numbers. 3994* Identifiers. 3995 3996For this purpose, the lexer has a [vector][111] named `str` to store the data 3997for tokens. This data is overwritten if another token is lexed that needs to 3998store data, so the parsers need to copy the data before calling the lexer again. 3999 4000Both lexers do some of the same things: 4001 4002* Lex identifiers into tokens, storing the identifier in `str`. 4003* Lex number strings into tokens, storing the string in `str`. 4004* Lex whitespace. 4005* Lex comments. 4006 4007Other than that, and some common plumbing, the lexers have separate code. 4008 4009#### `dc` Lexing 4010 4011The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205], 4012[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each, 4013the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which 4014just processes command-line arguments after they are parsed by 4015[`src/opt.c`][51]. 4016 4017For most characters, the `dc` lexer is able to convert directly from the 4018character to its corresponding token. This happens using `dc_lex_tokens[]` in 4019[`src/data.c`][131]. 4020 4021`dc`'s lexer also has to lex the register name after lexing tokens for commands 4022that need registers. 4023 4024And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of 4025the `dc` lexer that is more complex than the `bc` lexer. This is because `dc` 4026strings need to have a balanced number of brackets. 4027 4028#### `bc` Lexing 4029 4030The `bc` lexer is fairly simple. It does the following things: 4031 4032* Lexes `bc` strings. 4033* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords 4034 are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer 4035 allows the common identifier function to take over. 4036* Turns characters and groups of characters into `bc` operator tokens. 4037 4038### Parsing 4039 4040The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is 4041simple, while the `bc` parser is the most complex piece of code in the entire 4042codebase. 4043 4044However, they both do some of the same things. 4045 4046First, the parsers do *not* use [abstract syntax trees][207]; instead, they 4047directly generate the bytecode that will be executed by the `BcProgram` code. 4048Even in the case of `bc`, this heavily simplifies the parsing because the 4049[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish 4050Notation][108], which is basically directly executable. 4051 4052Second, any extra data that the `BcProgram` needs for execution is stored into 4053functions (see the [Functions][208] section). These include constants and 4054strings. 4055 4056#### `dc` Parsing 4057 4058The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness 4059of lexing and parsing [Reverse Polish notation][108] is probably why it was used 4060for `dc` when it was first created at Bell Labs. 4061 4062For most tokens, the `dc` parser is able to convert directly from the token 4063to its corresponding instruction. This happens using `dc_parse_insts[]` in 4064[`src/data.c`][131]. 4065 4066`dc`'s parser also has to parse the register name for commands that need 4067registers. This is the most complex part of the `dc` parser; each different 4068register command needs to be parsed differently because most of them require two 4069or more instructions to execute properly. 4070 4071For example, storing in a register requires a swap instruction and an assignment 4072instruction. 4073 4074Another example are conditional execution instructions; they need to produce the 4075instruction for the condition, and then they must parse a possible "else" part, 4076which might not exist. 4077 4078##### Existing Commands 4079 4080`dc` is based on commands, which are usually one letter. The following table is 4081a table of which ASCII characters are already used: 4082 4083| Characters | Used? | For... | 4084|------------|-------|--------------------------------------------| 4085| Space | x | Separator | 4086| `!` | x | Conditional Execution of Registers | 4087| `"` | x | Bounded Rand Operator | 4088| `#` | x | Comments | 4089| `$` | x | Truncation | 4090| `%` | x | Modulus | 4091| `&` | | | 4092| `'` | x | Rand Operator | 4093| `(` | x | Greater Than Operator | 4094| `)` | x | Less Than Operator | 4095| `*` | x | Multiplication | 4096| `+` | x | Addition | 4097| `,` | x | Depth of Execution Stack | 4098| `-` | x | Subtraction | 4099| `.` | x | Numbers | 4100| `/` | x | Division | 4101| `0-9` | x | Numbers | 4102| `:` | x | Store into Array | 4103| `;` | x | Load from Array | 4104| `<` | x | Conditional Execution of Registers | 4105| `=` | x | Conditional Execution of Registers | 4106| `>` | x | Conditional Execution of Registers | 4107| `?` | x | Ask for User Input | 4108| `@` | x | Places Operator | 4109| `A-F` | x | Numbers | 4110| `G` | x | Equal Operator | 4111| `H` | x | Shift Left | 4112| `I` | x | Push `ibase` onto Stack | 4113| `J` | x | Push `seed` onto Stack | 4114| `K` | x | Push `scale` onto Stack | 4115| `L` | x | Pop off of Register | 4116| `M` | x | Boolean And Operator | 4117| `N` | x | Boolean Not Operator | 4118| `O` | x | Push `obase` onto Stack | 4119| `P` | x | Byte Stream Printing | 4120| `Q` | x | Quit Some Number of Macros | 4121| `R` | x | Pop Top of Stack | 4122| `S` | x | Push onto Register | 4123| `T` | x | Push Max `ibase` onto Stack | 4124| `U` | x | Push Max `obase` onto Stack | 4125| `V` | x | Push Max `scale` onto Stack | 4126| `W` | x | Push Max of `'` Operator | 4127| `X` | x | Scale of a Number | 4128| `Y` | x | Length of Array | 4129| `Z` | x | Number of Significant Digits | 4130| `[` | x | Strings | 4131| `\\` | x | Escaping Brackets in Strings | 4132| `]` | x | Strings | 4133| `^` | x | Power | 4134| `_` | x | Negative Numbers and Negation | 4135| Backtick | | | 4136| `a` | x | Asciify | 4137| `b` | x | Absolute Value | 4138| `c` | x | Clear Stack | 4139| `d` | x | Duplication of Top of Stack | 4140| `e` | x | Else in Conditional Execution of Registers | 4141| `f` | x | Printing the Stack | 4142| `g` | x | Global Settings | 4143| `h` | x | Shift Right | 4144| `i` | x | Set `ibase` | 4145| `j` | x | Set `seed` | 4146| `k` | x | Set `scale` | 4147| `l` | x | Load from Register | 4148| `m` | x | Boolean Or Operator | 4149| `n` | x | Print and Pop | 4150| `o` | x | Set `obase` | 4151| `p` | x | Print with Newline | 4152| `q` | x | Quit Two Macros | 4153| `r` | x | Swap Top Two Items | 4154| `s` | x | Store into Register | 4155| `t` | x | Equivalent of `bc`'s `is_number()` | 4156| `u` | x | Equivalent of `bc`'s `is_string()` | 4157| `v` | x | Square Root | 4158| `w` | | | 4159| `x` | x | Execute String | 4160| `y` | x | Current Depth of a Register | 4161| `z` | x | Current Depth of Stack | 4162| `{` | x | Greater Than or Equal Operator | 4163| `\|` | x | Moduler Exponentiation | 4164| `}` | x | Less Than or Equal Operator | 4165| `~` | x | Division and Modulus Combined | 4166 4167#### `bc` Parsing 4168 4169`bc`'s parser is, by far, the most sensitive piece of code in this software, and 4170there is a very big reason for that: `bc`'s standard is awful and defined a very 4171poor language. 4172 4173The standard says that either semicolons or newlines can end statements. Trying 4174to parse the end of a statement when it can either be a newline or a semicolon 4175is subtle. Doing it in the presence of control flow constructs that do not have 4176to use braces is even harder. 4177 4178And then comes the biggest complication of all: `bc` has to assume that it is 4179*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an 4180*interactive* utility. 4181 4182##### Flags 4183 4184All of this means that `bc` has to be able to partially parse something, store 4185enough data to recreate that state later, and return, making sure to not 4186execute anything in the meantime. 4187 4188*That* is what the flags in [`include/bc.h`][106] are: they are the state that 4189`bc` is saving for itself. 4190 4191It saves them in a stack, by the way, because it's possible to nest 4192structures, just like any other programming language. Thus, not only does it 4193have to store state, it needs to do it arbitrarily, and still be able to 4194come back to it. 4195 4196So `bc` stores its parser state with flags in a stack. Careful setting of these 4197flags, along with properly using them and maintaining the flag stack, are what 4198make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc` 4199parser is the single most subtle, fickle, and sensitive piece of code in the 4200entire codebase. Only one thing came close once: square root, and that was only 4201sensitive because I wrote it wrong. This parser is pretty good, and it is 4202*still* sensitive. And flags are the reason why. 4203 4204For more information about what individual flags there are, see the comments in 4205[`include/bc.h`][106]. 4206 4207##### Labels 4208 4209`bc`'s language is Turing-complete. That means that code needs the ability to 4210jump around, specifically to implement control flow like `if` statements and 4211loops. 4212 4213`bc` handles this while parsing with what I called "labels." 4214 4215Labels are markers in the bytecode. They are stored in functions alongside the 4216bytecode, and they are just indices into the bytecode. 4217 4218When the `bc` parser creates a label, it pushes an index onto the labels array, 4219and the index of the label in that array is the index that will be inserted into 4220the bytecode. 4221 4222Then, when a jump happens, the index pulled out of the bytecode is used to index 4223the labels array, and the label (index) at the index is then used to set the 4224instruction pointer. 4225 4226##### Cond Labels 4227 4228"Cond" labels are so-called because they are used by conditionals. 4229 4230The key to them is that they come *before* the code that uses them. In other 4231words, when jumping to a condition, code is jumping *backwards*. 4232 4233This means that when a cond label is created, the value that should go there is 4234well-known. Cond labels are easy. 4235 4236However, they are still stored on a stack so that the parser knows what cond 4237label to use. 4238 4239##### Exit Labels 4240 4241Exit labels are not so easy. 4242 4243"Exit" labels are so-called because they are used by code "exiting" out of `if` 4244statements or loops. 4245 4246The key to them is that they come *after* the code that uses them. In other 4247words, when jumping to an exit, code is jumping *forwards*. 4248 4249But this means that when an exit label is created, the value that should go 4250there is *not* known. The code that needs it must be parsed and generated first. 4251 4252That means that exit labels are created with the index of `SIZE_MAX`, which is 4253then specifically checked for with an assert in `bc_program_exec()` before using 4254those indices. 4255 4256There should ***NEVER*** be a case when an exit label is not filled in properly 4257if the parser has no bugs. This is because every `if` statement, every loop, 4258must have an exit, so the exit must be set. If not, there is a bug. 4259 4260Exit labels are also stored on a stack so that the parser knows what exit label 4261to use. 4262 4263##### Expression Parsing 4264 4265`bc` has expressions like you might expect in a typical programming language. 4266This means [infix notation][107]. 4267 4268One thing about infix notation is that you can't just generate code straight 4269from it like you can with [Reverse Polish notation][108]. It requires more work 4270to shape it into a form that works for execution on a stack machine. 4271 4272That extra work is called the [Shunting-Yard algorithm][109], and the form it 4273translates infix notation into is...[Reverse Polish notation][108]. 4274 4275In order to understand the rest of this section, you must understand the 4276[Shunting-Yard algorithm][109]. Go do that before you read on. 4277 4278###### Operator Stack 4279 4280In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the 4281output and an explicit operator stack (the `ops` field in `BcParse`) as the 4282operator stack. It stores tokens from `BcLex`. 4283 4284However, there is one **HUGE** hangup: multiple expressions can stack. This 4285means that multiple expressions can be parsed at one time (think an array element 4286expression in the middle of a larger expression). Because of that, we need to 4287keep track of where the previous expression ended. That's what `start` parameter 4288to `bc_parse_operator()` is. 4289 4290Parsing multiple expressions on one operator stack only works because 4291expressions can only *stack*; this means that, if an expression begins before 4292another ends, it must *also* end before that other expression ends. This 4293property ensures that operators will never interfere with each other on the 4294operator stack. 4295 4296Also, the operator precedence is reversed: a lower precedence *value* means a 4297higher precedence. 4298 4299###### Recursion 4300 4301Because expressions can stack, parsing expressions actually requires recursion. 4302Well, it doesn't *require* it, but the code is much more readable that way. 4303 4304This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual 4305expression parsing function) calls can, in turn, call it. 4306 4307###### Expression Flags 4308 4309There is one more big thing: not all expressions in `bc` are equal. 4310 4311Some expressions have requirements that others don't have. For example, only 4312array arguments can be arrays (which are technically not expressions, but are 4313treated as such for parsing), and some operators (in POSIX) are not allowed in 4314certain places. 4315 4316For this reason, functions that are part of the expression parsing 4317infrastructure in `bc`'s parser usually take a `flags` argument. This is meant 4318to be passed to children, and somewhere, they will be checked to ensure that the 4319resulting expression meets its requirements. 4320 4321There are also places where the flags are changed. This is because the 4322requirements change. 4323 4324Maintaining the integrity of the requirements flag set is an important part of 4325the `bc` parser. However, they do not have to be stored on a stack because their 4326stack is implicit from the recursion that expression parsing uses. 4327 4328### Functions 4329 4330Functions, in `bc`, are data structures that contain the bytecode and data 4331produced by the parsers. Functions are what the `BcProgram` program executes. 4332 4333#### Main and Read Functions 4334 4335There are two functions that always exist, which I call the "main" and "read" 4336functions. 4337 4338The "main" function is the function in which any code and data outside other 4339functions is put. Basically, it is the function where the scripting code ends 4340up. 4341 4342The "read" function is the function that is reset and parsed every time a call 4343to the `read()` builtin function happens. 4344 4345#### `dc` Strings 4346 4347In `dc`, strings can be executed, and since there are no actual "functions" in 4348`dc`, strings are handled as functions. In fact, they are effectively translated 4349into functions by parsing. 4350 4351##### Tail Calls 4352 4353Since strings in `dc` are functions, and the fact that `dc` has no native loops, 4354such loops are implemented in `dc` code using strings with conditional execution 4355commands at the end of strings. 4356 4357When such conditional execution, or even unconditional execution, commands are 4358the very last commands in a string, then `dc` can perform a [tail call][202]. 4359 4360This is done by recording the fact that a tail call happened, done by 4361incrementing an integer on a stack. When a string is executed *without* a tail 4362call, a new entry is pushed onto the stack with the integer `1`. 4363 4364When a string finally quits that followed tail calls, its stack entry is popped, 4365eliminating all of those tail calls. 4366 4367Why perform tail calls? Because otherwise, `dc` would be subject to the same 4368thing that plagues [functional programming languages][203]: stack overflow. In 4369`dc`'s case, that would manifest itself as a growing [heap][204], because the 4370execution stack is stored on the heap, until a fatal allocation failure would 4371occur. 4372 4373#### Execution 4374 4375Execution is handled by an interpreter implemented using `BcProgram` and code 4376in [`src/program.c`][53]. 4377 4378The interpreter is a mix between a [stack machine][210] and a [register 4379machine][211]. It is a stack machine in that operations happen on a stack I call 4380the "results stack," but it is a register machine in that items on the stack can 4381be stored to and loaded from "registers" (`dc` terminology), variables (`bc` 4382terminology), and arrays. 4383 4384##### Stacks 4385 4386There are two stacks in the interpreter: 4387 4388* The "results" stack (as mentioned above). 4389* The "execution" stack. 4390 4391The results stack (the `results` field of the `BcProgram` struct) is the stack 4392where the results of computations are stored. It is what makes the interpreter 4393part [stack machine][210]. It is filled with `BcResult`'s. 4394 4395The execution stack (the `stack` field of the `BcProgram` struct) is the stack 4396that tracks the current execution state of the interpreter. It is the presence 4397of this separate stack that allows the interpreter to implement the machine as a 4398loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the 4399"instruction pointers." 4400 4401These instruction pointers have three fields, all integers: 4402 4403* `func`, the index of the function that is currently executing. 4404* `idx`, the index of the next bytecode instruction to execute in the function's 4405 bytecode array. 4406* `len`, which is the length of the results stack when the function started 4407 executing. This is not used by `dc`, but it used by `bc` because functions 4408 in `bc` should never affect the results stack of their callers. 4409 4410With these three fields, and always executing using the instruction pointer at 4411the top of the execution stack, the interpreter can always keep track of its 4412execution. 4413 4414When a function or a string starts executing, a new `BcInstPtr` is pushed onto 4415the execution stack for it. This includes if a function was called recursively. 4416And then, when the function or string returns, its `BcInstPtr` is popped off of 4417the execution stack. 4418 4419##### Bytecode 4420 4421Execution of functions are done through bytecode produced directly by the 4422parsers (see the [Parsing][209]). This bytecode is stored in the `code` 4423[vector][111] of the `BcFunc` struct. 4424 4425This is a vector for two reasons: 4426 4427* It makes it easier to add bytecode to the vector in the parsers. 4428* `bc` allows users to redefine functions. 4429 4430The reason I can use bytecode is because there are less than 256 instructions, 4431so an `unsigned char` can store all the bytecodes. 4432 4433###### Bytecode Indices 4434 4435There is one other factor to bytecode: there are instructions that need to 4436reference strings, constants, variables, or arrays. Bytecode need some way to 4437reference those things. 4438 4439Fortunately, all of those things can be referenced in the same way: with indices 4440because all of the items are in vectors. 4441 4442So `bc` has a way of encoding an index into bytecode. It does this by, after 4443pushing the instruction that references anything, pushing a byte set to the 4444length of the index in bytes, then the bytes of the index are pushed in 4445little-endian order. 4446 4447Then, when the interpreter encounters an instruction that needs one or more 4448items, it decodes the index or indices there and updates the `idx` field of the 4449current `BcInstPtr` to point to the byte after the index or indices. 4450 4451One more thing: the encoder of the indices only pushes as many bytes as 4452necessary to encode the index. It stops pushing when the index has no more bytes 4453with any 1 bits. 4454 4455##### Variables 4456 4457In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of 4458numbers; it is a vector of vector of numbers. The first vector is the vector of 4459variables, the second is the variable stack, and the last level is the actual 4460number. 4461 4462This is because both `bc` and `dc` need variables to be stacks. 4463 4464For `dc`, registers are *defined* to be stacks. 4465 4466For `bc`, variables as stacks is how function arguments/parameters and function 4467`auto` variables are implemented. 4468 4469When a function is called, and a value needs to be used as a function argument, 4470a copy of the value is pushed onto the stack corresponding to the variable with 4471the same name as the function's parameter. For `auto` variables, a new number 4472set to zero is pushed onto each stack corresponding to the `auto` variables. 4473(Zero is used because the [`bc` spec][2] requires that `auto` variables are set 4474to zero.) 4475 4476It is in this way that the old value of the variable, which may not even be 4477related to the function parameter or `auto` variable, is preserved while the 4478variable is used as a function parameter or `auto` variable. 4479 4480When the function returns, of course, the stacks of the variables for the 4481parameters and `auto`'s will have their top item popped, restoring the old value 4482as it was before the function call. 4483 4484##### Arrays 4485 4486Like variables, arrays are also implemented as stacks. However, because they are 4487arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector 4488of vectors of vectors of numbers. The first of the two levels is the vector of 4489arrays, the second the stack of for each array, the third the actual array, and 4490last the numbers in the array. 4491 4492`dc` has no need of this extra stack, but `bc` does because arrays can be 4493function parameters themselves. 4494 4495When arrays are used for function arguments, they are copied with a deep copy; 4496each item of the source vector is copied. This is because in `bc`, according to 4497the [`bc` spec][2], all function arguments are passed by value. 4498 4499However, array references are possible (see below). 4500 4501When arrays are used as `auto`'s, a new vector is pushed with one element; if 4502more elements are needed, the array is grown automatically, and new elements are 4503given the value of zero. 4504 4505In fact, if *any* array is accessed and does not have an element at that index, 4506the array is automatically grown to that size, and all new elements are given 4507the value zero. This behavior is guaranteed by the [`bc` spec][2]. 4508 4509###### Array References 4510 4511Array references had to be implemented as vectors themselves because they must 4512be pushed on the vectors stacks, which, as seen above, expect vectors 4513themselves. 4514 4515So thus, references are implemented as vectors on the vector stacks. These 4516vectors are not vectors of vectors themselves; they are vectors of bytes; in 4517fact, the fact that they are byte vectors and not vector vectors is how a 4518reference vector is detected. 4519 4520These reference vectors always have the same two things pushed: a byte encoding 4521(the same way bytecode indices are) of the referenced vector's index in the 4522`arrs` vector, and a byte encoding of the referenced vectors index in the vector 4523stack. 4524 4525If an item in a referenced vector is needed, then the reference is dereferenced, 4526and the item is returned. 4527 4528If a reference vector is passed to a function that does *not* expect a 4529reference, the vector is dereferenced and a deep copy is done, in the same way 4530as vectors are copied for normal array function parameters. 4531 4532### Callbacks 4533 4534There are many places in `bc` and `dc` where function pointers are used: 4535 4536* To implement destructors in vectors. (See the [Vectors][111] section.) 4537* To select the correct lex and parse functions for `bc` and `dc`. 4538* To select the correct function to execute unary operators. 4539* To select the correct function to execute binary operators. 4540* To calculate the correct number size for binary operators. 4541* To print a "digit" of a number. 4542* To seed the pseudo-random number generator. 4543 4544And there might be more. 4545 4546In every case, they are used for reducing the amount of code. Instead of 4547`if`/`else` chains, such as: 4548 4549``` 4550if (BC_IS_BC) { 4551 bc_parse_parse(vm.parse); 4552} 4553else { 4554 dc_parse_parse(vm.parse); 4555} 4556``` 4557 4558The best example of this is `bc_num_binary()`. It is called by every binary 4559operator. It figures out if it needs to allocate space for a new `BcNum`. If so, 4560it allocates the space and then calls the function pointer to the *true* 4561operation. 4562 4563Doing it like that shrunk the code *immensely*. First, instead of every single 4564binary operator duplicating the allocation code, it only exists in one place. 4565Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a 4566`switch` statement. 4567 4568But perhaps the most important use was for destructors in vectors. 4569 4570Most of the data structures in `bc` are stored in vectors. If I hadn't made 4571destructors available for vectors, then ensuring that `bc` had no memory leaks 4572would have been nigh impossible. As it is, I check `bc` for memory leaks every 4573release when I change the code, and I have not released `bc` after version 4574`1.0.0` with any memory leaks, as far as I can remember anyway. 4575 4576### Numbers 4577 4578In order to do arbitrary-precision math, as `bc` must do, there must be some way 4579of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184] 4580is `bc`'s way of doing that. 4581 4582(Note: the word ["limb"][214] is used below; it has a specific meaning when 4583applied to arbitrary-precision numbers. It means one piece of the number. It can 4584have a single digit, which is what GNU `bc` does, or it can have multiple, which 4585is what this `bc` does.) 4586 4587This struct needs to store several things: 4588 4589* The array of limbs of the number. This is the `num` field. 4590* The location of the decimal point. This is the `rdx` (short for [radix][215]) 4591 field. 4592* The number of limbs the number has. This is the `len` field. 4593* Whether the number is negative or not. This is the least significant bit of 4594 the `rdx` field. More on that later. 4595 4596In addition, `bc`'s number stores the capacity of the limb array; this is the 4597`cap` field. 4598 4599If the number needs to grow, and the capacity of the number is big enough, the 4600number is not reallocated; the number of limbs is just added to. 4601 4602There is one additional wrinkle: to make the usual operations (binary operators) 4603fast, the decimal point is *not* allowed to be in the middle of a limb; it must 4604always be between limbs, after all limbs (integer), or before all limbs (real 4605between -1 and 1). 4606 4607The reason for this is because addition, subtraction, multiplication, and 4608division expect digits to be lined up on the decimal point. By requiring that it 4609be between limbs, no extra alignment is needed, and those operations can proceed 4610without extra overhead. 4611 4612This does make some operations, most notably extending, truncating, and 4613shifting, more expensive, but the overhead is constant, and these operations are 4614usually cheap compared to the binary operators anyway. 4615 4616This also requires something else: `bc` numbers need to know *exactly* how many 4617decimal places they have after the decimal point. If the decimal point must be 4618inbetween limbs, the last decimal place could be in the middle of a limb. The 4619amount of decimal places in a number is carefully tracked and stored in the 4620`scale` field, and this number must always coincide with the `rdx` field by the 4621following formula: 4622 4623``` 4624scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1 4625``` 4626 4627(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on 462864-bit systems and 4 on other systems.) 4629 4630Yes, `rdx` is shifted; that is because the negative bit is stored in the least 4631significant bit of the `rdx` field, and the actual radix (amount of limbs after 4632the decimal/radix point) is stored in the rest of the bits. This is safe because 4633`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least 46342 bits less than `scale`. 4635 4636In addition to `rdx` always matching `scale`, another invariant is that `rdx` 4637must always be less than or equal to `len`. (Because `scale` may be greater than 4638`rdx`, `scale` does not have to be less than or equal to `len`.) 4639 4640Another invariant is that `len` must always be less than or equal to `cap`, for 4641obvious reasons. 4642 4643The last thing programmers need to know is that the limb array is stored in 4644little-endian order. This means that the last decimal places are in the limb 4645stored at index 0, and the most significant digits are stored at index `len-1`. 4646 4647This is done to make the most important operations fast. Addition and 4648subtraction are done from least significant to most significant limbs, which 4649means they can speed through memory in the way most computers are best at. 4650Multiplication does the same, sort of, and with division, it matters less. 4651Comparison does need to go backwards, but that's after exhausting all other 4652alternatives, including for example, checking the length of the integer portion 4653of each limb array. 4654 4655Finally, here are some possible special situations with numbers and what they 4656mean: 4657 4658* `len == 0`: the number equals 0. 4659* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value. 4660 This is the only case where `scale` does not have to coincide with `rdx` 4661 This can happen with division, for example, that sets a specific `scale` for 4662 the result value but may produce 0. 4663* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or 4664 equal to -1. 4665* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not 4666 including 0, although this will be true for 0 as well. However, 0 is always 4667 assumed to be represented by `len == 0`. 4668* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also 4669 equal 0. 4670 4671#### Math Style 4672 4673When I wrote the math for `bc`, I adopted a certain style that, if known, will 4674make it easier to understand the code. The style follows these rules: 4675 4676* `BcNum` arguments always come before arguments of other types. 4677* Among the `BcNum` arguments, the operands always come first, and the `BcNum` 4678 where the result(s) will be stored come last. 4679* Error checking is placed first in the function. 4680* Easy cases are placed next. 4681* Preparation, such as allocating temporaries, comes next. 4682* The actual math. 4683* Cleanup and ensuring invariants. 4684 4685While these rules are not hard and fast, using them as a guide will probably 4686help. 4687 4688#### Digit Clamping 4689 4690There is a question of what to do when parsing numbers and coming across digits 4691that are greater than or equal to the current `ibase`. `bc` and `dc` do one of 4692two things: 4693 4694* Clamp the digit to the maximum correct digit, or 4695* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's 4696 position (starting from 0 at the least significant digit). 4697 4698The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the 4699OpenBSD `bc`, and the OpenBSD `dc` do. 4700 4701It is possible to switch between the two with the `-c` and `-C` command-line 4702options. However, it is important for developers to understand that both 4703behaviors exist and should exist. 4704 4705The library can also do both, and it is set in a global for each thread. 4706 4707### Strings as Numbers 4708 4709Strings can be assigned to variables. This is a problem because the vectors for 4710variable stacks expect `BcNum` structs only. 4711 4712While I could have made a union, I decided that the complexity of adding an 4713entirely new type, with destructor and everything, was not worth it. Instead, I 4714took advantage of the fact that `free()`, when passed a `NULL` pointer, will do 4715nothing. 4716 4717Using that, I made it so `BcNum`'s could store strings instead. This is marked 4718by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should 4719*never* happen with a real number, though the other fields could be 0). 4720 4721The `BcNum` stores the function that stores the string in the `rdx` field, and 4722it stores the index of the string in the `scale` field. This is used to actually 4723load the string if necessary. 4724 4725Note that historically, string information was stored in the `loc` field of 4726the `d` union in a `BcResult`. This was changed recently to standardize; now, 4727all string information are stored in the `n` field of the `d` union regardless. 4728This means that all string information is stored in `BcNum`'s. This removes 4729extra cases. 4730 4731Also, if a temp is made with a string, then the result type should still be 4732`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type 4733checks. 4734 4735### Pseudo-Random Number Generator 4736 4737In order to understand this section, I suggest you read the information in the 4738manpages about the pseudo-random number generator (PRNG) first; that will help 4739you understand the guarantees it has, which is important because this section 4740delves into implementation details. 4741 4742First, the PRNG I use is seeded; this is because most OS's have an excellent 4743cryptographically secure PRNG available via command-line, usually 4744`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s 4745`$RANDOM`, which is essentially a wrapper around C's `rand()`. 4746 4747`rand()` is...bad. It is only guaranteed to return 15 bits of random data. 4748Obviously, getting good random data out of that would be hard with that alone, 4749but implementations also seem to be poor. 4750 4751On top of that, `bc` is an arbitrary-precision calculator; if I made it able to 4752generate random numbers, I could make it generate random numbers of any size, 4753and since it would be seeded, results would be reproducible, when wanted. 4754 4755So to get that, I needed a seeded PRNG with good characteristics. After scouring 4756the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog 4757post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who 4758hates on PCG and its author, but also because PCG seemed to do better when 4759tested by independent parties. 4760 4761After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for 4762the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author, 4763likens the increment to selecting a codebook.) 4764 4765I could, of course, put the entire 255 bits into one massive arbitrary-precision 4766number; `bc` is good at that, after all. But that didn't sit right with me 4767because it would mean any seed selected by users would have the real portion 4768ignored, which is stupid in a program like `bc`. 4769 4770Instead, I decided to make the integer portion the increment (clamped down to 4771size), and the real portion the seed. 4772 4773In most cases, this would be a bad idea because you cannot, in general, know how 4774many decimal places you need to represent any number with `n` real digits in 4775base `b` in another base. However, there is an easy to how many decimal digits 4776after the decimal point it takes to represent reals of base 2 in base 10: the 4777power of two. 4778 4779It turns out that, for base 2 represented in base 10, the power of 2 is 4780*exactly* how many digits are necessary to represent *any* number `n/2^p`, where 4781`p` is the power of 2. This is because at every halving, the number of decimal 4782places increases by 1: 4783 4784``` 47850.5 47860.25 47870.125 47880.0625 47890.03125 47900.015625 4791... 4792``` 4793 4794This happens because because every decimal representation of `1/2^p` ends with a 47955 because 5 is half of 10. Because 5 is odd, half of it always ends with the 4796digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal 4797place further right. 4798 4799So the algorithm to convert all 255 bits of the seed is as follows: 4800 48011. Convert the increment to a `BcNum`. 48022. Convert the seed to a `BcNum`. 48033. Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems, 4804 substitute 64 bits for 128.) 48054. Add the two numbers together. 4806 4807Likewise, the algorithm to convert from a user-supplied number to a seed is: 4808 48091. Truncate a copy of the number. 48102. Subtract the result from #1 from the original number. This gives the real 4811 portion of the number. 48123. Clamp the result of #1 to 127 (or 63) bits. This is the increment. 48134. Multiply the result of #2 by `2^128`. 48145. Truncate the result of #4. This is the seed. 4815 4816#### Generating Arbitrary-Precision Numbers 4817 4818I wrote a function (`bc_rand_bounded()`) that will return unbiased results with 4819any bound below the max that PCG can generate. 4820 4821To generate an integer of arbitrary size using a bound, `bc` simply uses 4822`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as 4823many limbs as needed to satisfy the bigger bound. 4824 4825To generate numbers with arbitrary precision after the decimal point, `bc` 4826merely generates an arbitrary precision integer with the bound `10^p`, where `p` 4827is the desired number of decimal places, then divides in by `10^p` with a 4828`scale` of `p`. 4829 4830## Debug Code 4831 4832Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69], 4833programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable 4834the inclusion of *a lot* of extra code to assist with debugging. 4835 4836For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the 4837[`include/`][212] directory and in the [`src/`][213] directory. 4838 4839Yes, all of the code is guarded by `#if` preprocessor statements; this is 4840because the code should *never* be in a release build, and by making programmers 4841add this manually (not even an option to [`configure.sh`][69]), it is easier to 4842ensure that never happens. 4843 4844However, that said, the extra debug code is useful; that was why I kept it in. 4845 4846## Performance 4847 4848While I have put in a lot of effort to make `bc` as fast as possible, there 4849might be some things users can do to speed it up without changing the code. 4850 4851First, users can probably use [profile-guided optimization][217] to optimize 4852even better, using the test suite to profile. 4853 4854Second, I included macros that might help branch placement and prediction: 4855 4856* `BC_ERR(e)` 4857* `BC_UNLIKELY(e)` 4858* `BC_NO_ERR(e)` 4859* `BC_LIKELY(e)` 4860 4861`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as 4862`BC_LIKELY`; I just added them to also document branches that lead to error 4863conditions or *away* from error conditions. 4864 4865Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation, 4866they expand to nothing but the argument they were given. 4867 4868They can, however, be defined to `__builtin_expect((e), 1)` and 4869`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch 4870prediction and placement. (For more information about `__builtin_expect()` see 4871the [GCC documentation][218].) 4872 4873There might be other compilers that can take advantage of that, but I don't know 4874anything about that. 4875 4876Also, as stated in the [build manual][219], link-time optimization is excellent 4877at optimizing this `bc`. Use it. 4878 4879### Benchmarks 4880 4881To help programmers improve performance, I have built and assembled 4882infrastructure to make benchmarking easy. 4883 4884First, in order to easily run benchmarks, I created 4885[`scripts/benchmark.sh`][220]. 4886 4887Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it 4888easier to judge whether the results are significant or not. 4889 4890Third, I made the `make` clean target `make clean_benchmarks`, to clean 4891`scripts/ministat` and the generated benchmark files. 4892 4893Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory 4894data in a format that is easy for `scripts/ministat` to digest. 4895 4896To add a benchmark, add a script in the right directory to generate the 4897benchmark. Yes, generate. 4898 4899All of the benchmarks are generated first, from `.bc` and `.dc` files in the 4900[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive 4901amounts of data can be generated and then pushed through the calculators. 4902 4903If you need to benchmark `bc` or `dc` with simple loops, have the generator 4904files simply print the loop code. 4905 4906### Caching of Numbers 4907 4908In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s 4909that have the default capacity (`BC_NUM_DEF_SIZE`). 4910 4911It does this by allowing `bc_num_free()` to put the limb array onto a 4912statically-allocated stack (it's just a global array with a set size). Then, 4913when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any 4914are available. If the answer is yes, the one on top of the stack is returned. 4915Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()` 4916a new limb array. 4917 4918When the stack is filled, any numbers that `bc` attempts to put on it are just 4919freed. 4920 4921This setup saved a few percent in my testing for version [3.0.0][32], which is 4922when I added it. 4923 4924## `bcl` 4925 4926At the request of one of my biggest users, I spent the time to make a build mode 4927where the number and math code of `bc` could be wrapped into a library, which I 4928called `bcl`. 4929 4930This mode is exclusive; `bc` and `dc` themselves are *not* built when building 4931`bcl`. 4932 4933The only things in the `bc` math code that is not included is: 4934 4935* Printing newlines (clients do not care about `bc`'s line lenth restriction). 4936* `dc`'s stream print. 4937 4938Even the [pseudo-random number generator][179] is included, with extra support 4939for generating real numbers with it. (In `bc`, such support is in 4940[`lib2.bc`][26].) 4941 4942### Signal Handling 4943 4944`bcl` does not have the capability for signal handling (as of version `6.0.0`). 4945 4946### Threads 4947 4948It is possible to use `bcl` across multiple threads. However, `bcl` must be 4949initialized on each thread where it will be used. 4950 4951This is because `bcl` uses thread-specific data to store the "globals" for each 4952thread. This is to ensure that threads do not stomp on each other's numbers or 4953other data structures. 4954 4955### Contexts 4956 4957Contexts were an idea by the same user that requested `bcl`. They are meant to 4958make it so multiple clients in one program can keep their data separate from 4959each other. 4960 4961### Numbers 4962 4963Numbers in `bcl` are literally indices into an encapsulated array of numbers, 4964hidden in the context. These indices are then passed to clients to refer to 4965numbers later. 4966 4967### Operand Consumption 4968 4969Most math functions in `bcl` "consume" their operand arguments; the arguments 4970are freed, whether or not an error is returned. 4971 4972This is to make it easy to implement math code, like this: 4973 4974``` 4975n = bcl_add(bcl_mul(a, b), bcl_div(c, d)); 4976``` 4977 4978If numbers need to be preserved, they can be with `bcl_dup()`: 4979 4980``` 4981n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d))); 4982``` 4983 4984### Errors 4985 4986Errors can be encoded in the indices representing numbers, and where necessary, 4987clients are responsible for checking those errors. 4988 4989The encoding of errors is this: if an error happens, the value `0-error` is 4990returned. To decode, do the exact same thing. Thus, any index above 4991`0-num_errors` is an error. 4992 4993If an index that represents an error is passed to a math function, that function 4994propagates the error to its result and does not perform the math operation. 4995 4996All of this is to, once again, make it easy to implement the math code as above. 4997 4998However, where possible, errors are returned directly. 4999 5000[1]: https://en.wikipedia.org/wiki/Bus_factor 5001[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top 5002[3]: https://en.wikipedia.org/wiki/Dc_(Unix) 5003[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5004[5]: ./bc/A.1.md#standard-library 5005[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc 5006[7]: ./bc/A.1.md#extended-library 5007[8]: #libbc-2 5008[9]: #strgensh 5009[10]: https://vimeo.com/230142234 5010[11]: https://gavinhoward.com/2019/12/values-for-yao/ 5011[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf 5012[13]: ./build.md#cross-compiling 5013[14]: ./build.md 5014[15]: #strgenc 5015[16]: http://landley.net/toybox/about.html 5016[17]: https://www.busybox.net/ 5017[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm 5018[19]: https://clang-analyzer.llvm.org/scan-build.html 5019[20]: https://www.valgrind.org/ 5020[21]: https://clang.llvm.org/docs/AddressSanitizer.html 5021[22]: https://gavinhoward.com/2019/11/finishing-software/ 5022[23]: #history 5023[24]: https://clang.llvm.org/docs/ClangFormat.html 5024[25]: ./algorithms.md 5025[26]: #lib2bc 5026[27]: #vmh 5027[28]: https://github.com/rain-1/linenoise-mob 5028[29]: https://github.com/antirez/linenoise 5029[30]: #bclh 5030[31]: #argsh 5031[32]: ../NEWS.md#3-0-0 5032[33]: ../NEWS.md 5033[34]: https://github.com/skeeto/optparse 5034[35]: #opth 5035[36]: #historyh 5036[37]: #randh 5037[38]: #langh 5038[39]: #numc 5039[40]: #bcc 5040[41]: #bc_lexc 5041[42]: #bc_parsec 5042[43]: #libraryc 5043[44]: #dcc 5044[45]: #dc_lexc 5045[46]: #dc_parsec 5046[47]: #filec 5047[48]: #historyc 5048[49]: #langc 5049[50]: #lexc 5050[51]: #optc 5051[52]: #parsec 5052[53]: #programc 5053[54]: #randc 5054[55]: #fileh 5055[56]: #readc 5056[57]: #programh 5057[58]: #vmc 5058[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top 5059[60]: #manpagesh 5060[61]: #bcl3md 5061[62]: #bcl3 5062[63]: #bclvcxproj 5063[64]: #bclvcxprojfilters 5064[65]: #bclsln 5065[66]: #bcvcxproj 5066[67]: #bcvcxprojfilters 5067[68]: #bcsln 5068[69]: #configuresh 5069[70]: #makefilein 5070[71]: #functionsh 5071[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top 5072[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18 5073[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top 5074[75]: #versionh 5075[76]: ##posix-shell-scripts 5076[77]: #tests 5077[78]: #karatsubapy 5078[79]: #bc-1 5079[80]: #dc-1 5080[81]: ./build.md#build-type 5081[82]: #fuzzing-1 5082[83]: #releasesh 5083[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02 5084[85]: #locales-1 5085[86]: #manuals-1 5086[87]: #locale_installsh 5087[88]: #locale_uninstallsh 5088[89]: #bc1mdin 5089[90]: #dc1mdin 5090[91]: #bc 5091[92]: https://pandoc.org/ 5092[93]: #release_settingstxt 5093[94]: #aflpy 5094[95]: #randmathpy 5095[96]: #caching-of-numbers 5096[97]: #error-handling 5097[98]: #radamsatxt 5098[99]: https://gitlab.com/akihe/radamsa 5099[100]: #radamsash 5100[101]: https://musl.libc.org/ 5101[102]: ./build.md#settings 5102[103]: #test_settingstxt 5103[104]: #test_settingssh 5104[105]: #functionssh 5105[106]: #bch 5106[107]: https://en.wikipedia.org/wiki/Infix_notation 5107[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5108[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm 5109[110]: #bc-parsing 5110[111]: #vectors 5111[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h 5112[113]: https://en.wikipedia.org/wiki/Signal_(IPC) 5113[114]: #custom-io 5114[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 5115[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html 5116[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html 5117[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw 5118[119]: #fuzz_prepsh 5119[120]: #bc_aflyaml 5120[121]: #bc_afl_continueyaml 5121[122]: https://github.com/tmux/tmux 5122[123]: https://tmuxp.git-pull.com/ 5123[124]: #test-suite 5124[125]: https://aflplus.plus/ 5125[126]: #link-time-optimization 5126[127]: #fuzzing-performance 5127[128]: #radamsa 5128[129]: #afl-quickstart 5129[130]: #convenience 5130[131]: #datac 5131[132]: https://git.gavinhoward.com/gavin/vim-bc 5132[133]: https://git.gavinhoward.com/gavin/bc_libs 5133[134]: #debugging 5134[135]: #asserts 5135[136]: #portability 5136[137]: https://pexpect.readthedocs.io/en/stable/ 5137[138]: #test-suite-portability 5138[139]: #historypy 5139[140]: #historysh 5140[141]: #group-tests 5141[142]: #build-system 5142[143]: #generated-tests 5143[144]: #benchmarks-1 5144[145]: #gen 5145[146]: #test-coverage 5146[147]: #integration-with-the-build-system 5147[148]: #test-scripts 5148[149]: #standard-tests 5149[150]: #script-tests 5150[151]: #error-tests 5151[152]: #stdin-tests 5152[153]: #read-tests 5153[154]: #other-tests 5154[155]: #history-tests 5155[156]: #bcl 5156[157]: #bcl-test 5157[158]: #bclc 5158[159]: #valgrind 5159[160]: #addresssanitizer-and-friends 5160[161]: #bc-2 5161[162]: #dc-2 5162[163]: #alltxt-1 5163[164]: #errorstxt 5164[165]: #posix_errorstxt 5165[166]: #timeconstsh 5166[167]: #alltxt-3 5167[168]: #errorstxt-1 5168[169]: #scripts-1 5169[170]: #scripts-2 5170[171]: #alltxt-2 5171[172]: #alltxt-4 5172[173]: #async-signal-safe-signal-handling 5173[174]: #vectorh 5174[175]: #read_errorstxt 5175[176]: #statush 5176[177]: #numbers 5177[178]: #math-style 5178[179]: #pseudo-random-number-generator 5179[180]: #lexh 5180[181]: #parseh 5181[182]: #dch 5182[183]: #libraryh 5183[184]: #numh 5184[185]: #readh 5185[186]: #maps 5186[187]: #slabs-and-slab-vectors 5187[188]: ./build.md#extra-math 5188[189]: #command-line-history 5189[190]: #scriptsed 5190[191]: #linux-timeconstbc-script 5191[192]: #corpuses 5192[193]: ./build.md#history 5193[194]: https://www.valgrind.org/docs/manual/mc-manual.html 5194[195]: #othersh 5195[196]: https://scan.coverity.com/ 5196[197]: https://clang-analyzer.llvm.org/ 5197[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it 5198[199]: https://cr.yp.to/docs/selfpipe.html 5199[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph 5200[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator 5201[202]: https://en.wikipedia.org/wiki/Tail_call 5202[203]: https://en.wikipedia.org/wiki/Functional_programming_language 5203[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation 5204[205]: #mainc 5205[206]: #argc 5206[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree 5207[208]: #functions 5208[209]: #parsing 5209[210]: https://en.wikipedia.org/wiki/Stack_machine 5210[211]: https://en.wikipedia.org/wiki/Register_machine 5211[212]: #include 5212[213]: #src 5213[214]: https://gmplib.org/manual/Nomenclature-and-Types 5214[215]: https://en.wikipedia.org/wiki/Radix_point 5215[216]: #main-and-read-functions 5216[215]: https://www.pcg-random.org/ 5217[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/ 5218[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization 5219[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect 5220[219]: ./build.md#optimization 5221[220]: #benchmarksh 5222[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c 5223[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html 5224[223]: #ministatc 5225[224]: #dc 5226[225]: #allsh 5227[226]: #errorssh 5228[227]: #errorsh 5229[228]: #vectorc 5230