1# Development 2 3Updated: 13 Mar 2023 4 5This document is meant for the day when I (Gavin D. Howard) get [hit by a 6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue. 7 8This document is supposed to contain all of the knowledge necessary to develop 9`bc` and `dc`. 10 11In addition, this document is meant to add to the [oral tradition of software 12engineering][118], as described by Bryan Cantrill. 13 14This document will reference other parts of the repository. That is so a lot of 15the documentation can be closest to the part of the repo where it is actually 16necessary. 17 18## What Is It? 19 20This repository contains an implementation of both [POSIX `bc`][2] and [Unix 21`dc`][3]. 22 23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a 24historical utility that was included in early Unix and even predates both Unix 25and C. They both are arbitrary-precision command-line calculators with their own 26programming languages. `bc`'s language looks similar to C, with infix notation 27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows 28the user to execute strings as though they were functions. 29 30In addition, it is also possible to build the arbitrary-precision math as a 31library, named [`bcl`][156]. 32 33**Note**: for ease, I will refer to both programs as `bc` in this document. 34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I 35am referring to just `dc`. 36 37### History 38 39This project started in January 2018 when a certain individual on IRC, hearing 40that I knew how to write parsers, asked me to write a `bc` parser for his math 41library. I did so. I thought about writing my own math library, but he 42disparaged my programming skills and made me think that I couldn't do it. 43 44However, he took so long to do it that I eventually decided to give it a try and 45had a working math portion in two weeks. It taught me that I should not listen 46to such people. 47 48From that point, I decided to make it an extreme learning experience about how 49to write quality software. 50 51That individual's main goal had been to get his `bc` into [toybox][16], and I 52managed to get my own `bc` in. I also got it in [busybox][17]. 53 54Eventually, in late 2018, I also decided to try my hand at implementing 55[Karatsuba multiplication][18], an algorithm that that unnamed individual 56claimed I could never implement. It took me a bit, but I did it. 57 58This project became a passion project for me, and I continued. In mid-2019, 59Stefan Eßer suggested I improve performance by putting more than 1 digit in each 60section of the numbers. After I showed immaturity because of some burnout, I 61implemented his suggestion, and the results were incredible. 62 63Since that time, I have gradually been improving the `bc` as I have learned more 64about things like fuzzing, [`scan-build`][19], [valgrind][20], 65[AddressSanitizer][21] (and the other sanitizers), and many other things. 66 67One of my happiest moments was when my `bc` was made the default in FreeBSD. 68Another happiest moment was when I found out that my `bc` had shipped with Mac 69OSX Ventura, without my knowledge. 70 71But since I believe in [finishing the software I write][22], I have done less 72work on `bc` over time, though there are still times when I put a lot of effort 73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use 74my `bc`. 75 76And that is why I am writing this document: someday, someone else is going to 77want to change my code, and this document is my attempt to make it as simple as 78possible. 79 80### Values 81 82[According to Bryan Cantrill][10], all software has values. I think he's 83correct, though I [added one value for programming languages in particular][11]. 84 85However, for `bc`, his original list will do: 86 87* Approachability 88* Availability 89* Compatibility 90* Composability 91* Debuggability 92* Expressiveness 93* Extensibility 94* Interoperability 95* Integrity 96* Maintainability 97* Measurability 98* Operability 99* Performance 100* Portability 101* Resiliency 102* Rigor 103* Robustness 104* Safety 105* Security 106* Simplicity 107* Stability 108* Thoroughness 109* Transparency 110* Velocity 111 112There are several values that don't apply. The reason they don't apply is 113because `bc` and `dc` are existing utilities; this is just another 114reimplementation. The designs of `bc` and `dc` are set in stone; there is 115nothing we can do to change them, so let's get rid of those values that would 116apply to their design: 117 118* Compatibility 119* Integrity 120* Maintainability 121* Measurability 122* Performance 123* Portability 124* Resiliency 125* Rigor 126* Robustness 127* Safety 128* Security 129* Simplicity 130* Stability 131* Thoroughness 132* Transparency 133 134Furthermore, some of the remaining ones don't matter to me, so let me get rid of 135those and order the rest according to my *actual* values for this project: 136 137* Robustness 138* Stability 139* Portability 140* Compatibility 141* Performance 142* Security 143* Simplicity 144 145First is **robustness**. This `bc` and `dc` should be robust, accepting any 146input, never crashing, and instead, returning an error. 147 148Closely related to that is **stability**. The execution of `bc` and `dc` should 149be deterministic and never change for the same inputs, including the 150pseudo-random number generator (for the same seed). 151 152Third is **portability**. These programs should run everywhere that POSIX 153exists, as well as Windows. This means that just about every person on the 154planet will have access to these programs. 155 156Next is **compatibility**. These programs should, as much as possible, be 157compatible with other existing implementations and standards. 158 159Then we come to **performance**. A calculator is only usable if it's fast, so 160these programs should run as fast as possible. 161 162After that is **security**. These programs should *never* be the reason a user's 163computer is compromised. 164 165And finally, **simplicity**. Where possible, the code should be simple, while 166deferring to the above values. 167 168Keep these values in mind for the rest of this document, and for exploring any 169other part of this repo. 170 171#### Portability 172 173But before I go on, I want to talk about portability in particular. 174 175Most of these principles just require good attention and care, but portability 176is different. Sometimes, it requires pulling in code from other places and 177adapting it. In other words, sometimes I need to duplicate and adapt code. 178 179This happened in a few cases: 180 181* Option parsing (see [`include/opt.h`][35]). 182* History (see [`include/history.h`][36]). 183* Pseudo-Random Number Generator (see [`include/rand.h`][37]). 184 185This was done because I decided to ensure that `bc`'s dependencies were 186basically zero. In particular, either users have a normal install of Windows or 187they have a POSIX system. 188 189A POSIX system limited me to C99, `sh`, and zero external dependencies. That 190last item is why I pull code into `bc`: if I pull it in, it's not an external 191dependency. 192 193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being 194portable to some platforms. 195 196## Suggested Course 197 198I do have a suggested course for programmers to follow when trying to understand 199this codebase. The order is this: 200 2011. `bc` Spec. 2022. Manpages. 2033. Test suite. 2044. Understand the build. 2055. Algorithms manual. 2066. Code concepts. 2077. Repo structure. 2088. Headers. 2099. Source code. 210 211This order roughly follows this order: 212 2131. High-level requirements 2142. Low-level requirements 2153. High-level implementation 2164. Low-level implementation 217 218In other words, first understand what the code is *supposed* to do, then 219understand the code itself. 220 221## Useful External Tools 222 223I have a few tools external to `bc` that are useful: 224 225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132]. 226* A [repo of `bc` and `dc` scripts][133]. 227* A set of `bash` aliases (see below). 228* A `.bcrc` file with items useful for my `bash` setup (see below). 229 230My `bash` aliases are these: 231 232```sh 233alias makej='make -j16' 234alias mcmake='make clean && make' 235alias mcmakej='make clean && make -j16' 236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \ 237 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 238 -Wno-unreachable-code-return -Wno-missing-noreturn \ 239 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 240 -pedantic -std=c99" ./configure.sh' 241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \ 242 -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \ 243 -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \ 244 -Wall -Wextra -pedantic -std=c99" ./configure.sh' 245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \ 246 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 247 -Wno-unreachable-code-return -Wno-missing-noreturn \ 248 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 249 -pedantic -std=c99" ./configure.sh' 250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \ 251 CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \ 252 -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \ 253 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 254 -pedantic -std=c99" ./configure.sh' 255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS' 256``` 257 258`makej` runs `make` with all of my cores. 259 260`mcmake` runs `make clean` before running `make`. It will take a target on the 261command-line. 262 263`mcmakej` is a combination of `makej` and `mcmake`. 264 265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see 266[Debugging][134] below). 267 268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler) 269using full warnings, with a few really loud and useless warnings turned off. 270 271`bcnoassert` configures `bc` to not have asserts built in. 272 273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug 274mode. 275 276`bcunset` unsets my personal `bc` environment variables, which are set to: 277 278```sh 279export BC_ENV_ARGS="-l $HOME/.bcrc" 280export BC_LINE_LENGTH="74" 281``` 282 283Unsetting these environment variables are necessary for running 284[`scripts/release.sh`][83] because otherwise, it will error when attempting to 285run `bc -s` on my `$HOME/.bcrc`. 286 287Speaking of which, the contents of that file are: 288 289```bc 290define void print_time_unit(t){ 291 if(t<10)print "0" 292 if(t<1&&t)print "0" 293 print t,":" 294} 295define void sec2time(t){ 296 auto s,m,h,d,r 297 r=scale 298 scale=0 299 t=abs(t) 300 s=t%60 301 t-=s 302 m=t/60%60 303 t-=m 304 h=t/3600%24 305 t-=h 306 d=t/86400 307 if(d)print_time_unit(d) 308 if(h)print_time_unit(h) 309 print_time_unit(m) 310 if(s<10)print "0" 311 if(s<1&&s)print "0" 312 s 313 scale=r 314} 315define minutes(secs){ 316 return secs/60; 317} 318define hours(secs){ 319 return secs/3600; 320} 321define days(secs){ 322 return secs/3600/24; 323} 324define years(secs){ 325 return secs/3600/24/365.25; 326} 327define fbrand(b,p){ 328 auto l,s,t 329 b=abs(b)$ 330 if(b<2)b=2 331 s=scale 332 t=b^abs(p)$ 333 l=ceil(l2(t),0) 334 if(l>scale)scale=l 335 t=irand(t)/t 336 scale=s 337 return t 338} 339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)} 340``` 341 342This allows me to use `bc` as part of my `bash` prompt. 343 344## Code Style 345 346The code style for `bc` is...weird, and that comes from historical accident. 347 348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order 349to do that, my `bc` originally had toybox style. Eventually, I changed to using 350tabs, and assuming they were 4 spaces wide, but other than that, I basically 351kept the same style, with some exceptions that are more or less dependent on my 352taste. 353 354However, I later managed to get [ClangFormat][24] to work, so I changed the 355style to that. 356 357### ClangFormat 358 359The style is now defined as whatever [ClangFormat][24] outputs using the 360existing `.clang-format` file. More precisely, the style is whatever is output 361when the following command is run in the root directory: 362 363``` 364./scripts/format.sh 365``` 366 367### Historical Style 368 369The code style used to be: 370 371* Tabs are 4 spaces. 372* Tabs are used at the beginning of lines for indent. 373* Spaces are used for alignment. 374* Lines are limited to 80 characters, period. 375* Pointer asterisk (`*`) goes with the variable (on the right), not the type, 376 unless it is for a pointer type returned from a function. 377* The opening brace is put on the same line as the header for the function, 378 loop, or `if` statement. 379* Unless the header is more than one line, in which case the opening brace is 380 put on its own line. 381* If the opening brace is put on its own line, there is no blank line after it. 382* If the opening brace is *not* put on its own line, there *is* a blank line 383 after it, *unless* the block is only one or two lines long. 384* Code lines are grouped into what I call "paragraphs." Basically, lines that 385 seem like they should go together are grouped together. This one comes down 386 to judgment. 387* Bodies of `if` statements, `else` statements, and loops that are one line 388 long are put on the same line as the statement, unless the header is more than 389 one line long, and/or, the header and body cannot fit into 80 characters with 390 a space inbetween them. 391* If single-line bodies are on a separate line from their headers, and the 392 headers are only a single line, then no braces are used. 393* However, braces are *always* used if they contain another `if` statement or 394 loop. 395* Loops with empty bodies are ended with a semicolon. 396* Expressions that return a boolean value are surrounded by paretheses. 397* Macro backslashes are aligned as far to the left as possible. 398* Binary operators have spaces on both sides. 399* If a line with binary operators overflows 80 characters, a newline is inserted 400 *after* binary operators. 401* Function modifiers and return types are on the same line as the function name. 402* With one exception, `goto`'s are only used to jump to the end of a function 403 for cleanup. 404* All structs, enums, and unions are `typedef`'ed. 405* All constant data is in one file: [`src/data.c`][131], but the corresponding 406 `extern` declarations are in the appropriate header file. 407* All local variables are declared at the beginning of the scope where they 408 appear. They may be initialized at that point, if it does not invoke UB or 409 otherwise cause bugs. 410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable 411 declarations. 412* Besides short `if` statements and loops, there should *never* be more than one 413 statement per line. 414 415## Repo Structure 416 417Functions are documented with Doxygen-style doc comments. Functions that appear 418in headers are documented in the headers, while static functions are documented 419where they are defined. 420 421### `configure` 422 423A symlink to [`configure.sh`][69]. 424 425### `configure.sh` 426 427This is the script to configure `bc` and [`bcl`][156] for building. 428 429This `bc` has a custom build system. The reason for this is because of 430[*portability*][136]. 431 432If `bc` used an outside build system, that build system would be an external 433dependency. Thus, I had to write a build system for `bc` that used nothing but 434C99 and POSIX utilities. 435 436One of those utilities is POSIX `sh`, which technically implements a 437Turing-complete programming language. It's a terrible one, but it works. 438 439A user that wants to build `bc` on a POSIX system (not Windows) first runs 440`configure.sh` with the options he wants. `configure.sh` uses those options and 441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid 442`Makefile`. Then `make` can do the rest. 443 444For more information about the build process, see the [Build System][142] 445section and the [build manual][14]. 446 447For more information about shell scripts, see [POSIX Shell Scripts][76]. 448 449`configure.sh` does the following: 450 4511. It processes command-line arguments and figure out what the user wants to 452 build. 4532. It reads in [`Makefile.in`][70]. 4543. One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form 455 `%%<placeholder_name>%%` based on the [build type][81]. 4564. It appends a list of file targets based on the [build type][81]. 4575. It appends the correct test targets. 4586. It copies the correct manpage and markdown manual for `bc` and `dc` into a 459 location from which they can be copied for install. 4607. It does a `make clean` to reset the build state. 461 462### `.gitattributes` 463 464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in 465the Visual Studio files. 466 467### `.gitignore` 468 469The `.gitignore` 470 471### `LICENSE.md` 472 473This is the `LICENSE` file, including the licenses of various software that I 474have borrowed. 475 476### `Makefile.in` 477 478This is the `Makefile` template for [`configure.sh`][69] to use for generating a 479`Makefile`. 480 481For more information, see [`configure.sh`][69], the [Build System][142] section, 482and the [build manual][14]. 483 484Because of [portability][136], the generated `Makefile.in` should be a pure 485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a 486few snares for the unwary programmer in this file: 487 4881. No extensions allowed, including and especially GNU extensions. 4892. If new headers are added, they must also be added to `Makefile.in`. 4903. Don't delete the `.POSIX:` empty target at the top; that's what tells `make` 491 implementations that pure [POSIX `make`][74] is needed. 492 493In particular, there is no way to set up variables other than the `=` operator. 494There are no conditionals, so all of the conditional stuff must be in 495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the 496first place: [POSIX `make`][74] is barebones and only does a build with no 497configuration. 498 499### `NEWS.md` 500 501A running changelog with an entry for each version. This should be updated at 502the same time that [`include/version.h`][75] is. 503 504### `NOTICE.md` 505 506The `NOTICE` file with proper attributions. 507 508### `README.md` 509 510The `README`. Read it. 511 512### `benchmarks/` 513 514The folder containing files to generate benchmarks. 515 516Each of these files was made, at one time or another, to benchmark some 517experimental feature, so if it seems there is no rhyme or reason to these 518benchmarks, it is because there is none, besides historical accident. 519 520#### `bc/` 521 522The folder containing `bc` scripts to generate `bc` benchmarks. 523 524##### `add.bc` 525 526The file to generate the benchmark to benchmark addition in `bc`. 527 528##### `arrays_and_constants.bc` 529 530The file to generate the benchmark to benchmark `bc` using lots of array names 531and constants. 532 533##### `arrays.bc` 534 535The file to generate the benchmark to benchmark `bc` using lots of array names. 536 537##### `constants.bc` 538 539The file to generate the benchmark to benchmark `bc` using lots of constants. 540 541##### `divide.bc` 542 543The file to generate the benchmark to benchmark division in `bc`. 544 545##### `functions.bc` 546 547The file to generate the benchmark to benchmark `bc` using lots of functions. 548 549##### `irand_long.bc` 550 551The file to generate the benchmark to benchmark `bc` using lots of calls to 552`irand()` with large bounds. 553 554##### `irand_short.bc` 555 556The file to generate the benchmark to benchmark `bc` using lots of calls to 557`irand()` with small bounds. 558 559##### `lib.bc` 560 561The file to generate the benchmark to benchmark `bc` using lots of calls to 562heavy functions in `lib.bc`. 563 564##### `multiply.bc` 565 566The file to generate the benchmark to benchmark multiplication in `bc`. 567 568##### `postfix_incdec.bc` 569 570The file to generate the benchmark to benchmark `bc` using postfix increment and 571decrement operators. 572 573##### `power.bc` 574 575The file to generate the benchmark to benchmark power (exponentiation) in `bc`. 576 577##### `subtract.bc` 578 579The file to generate the benchmark to benchmark subtraction in `bc`. 580 581##### `strings.bc` 582 583The file to generate the benchmark to benchmark `bc` using lots of strings. 584 585#### `dc/` 586 587The folder containing `dc` scripts to generate `dc` benchmarks. 588 589##### `modexp.dc` 590 591The file to generate the benchmark to benchmark modular exponentiation in `dc`. 592 593### `gen/` 594 595A folder containing the files necessary to generate C strings that will be 596embedded in the executable. 597 598All of the files in this folder have license headers, but the program and script 599that can generate strings from them include code to strip the license header out 600before strings are generated. 601 602#### `bc_help.txt` 603 604A text file containing the text displayed for `bc -h` or `bc --help`. 605 606This text just contains the command-line options and a short summary of the 607differences from GNU and BSD `bc`'s. It also directs users to the manpage. 608 609The reason for this is because otherwise, the help would be far too long to be 610useful. 611 612**Warning**: The text has some `printf()` format specifiers. You need to make 613sure the format specifiers match the arguments given to `bc_file_printf()`. 614 615#### `dc_help.txt` 616 617A text file containing the text displayed for `dc -h` or `dc --help`. 618 619This text just contains the command-line options and a short summary of the 620differences from GNU and BSD `dc`'s. It also directs users to the manpage. 621 622The reason for this is because otherwise, the help would be far too long to be 623useful. 624 625**Warning**: The text has some `printf()` format specifiers. You need to make 626sure the format specifiers match the arguments given to `bc_file_printf()`. 627 628#### `lib.bc` 629 630A `bc` script containing the [standard math library][5] required by POSIX. See 631the [POSIX standard][2] for what is required. 632 633This file does not have any extraneous whitespace, except for tabs at the 634beginning of lines. That is because this data goes directly into the binary, 635and whitespace is extra bytes in the binary. Thus, not having any extra 636whitespace shrinks the resulting binary. 637 638However, tabs at the beginning of lines are kept for two reasons: 639 6401. Readability. (This file is still code.) 6412. The program and script that generate strings from this file can remove 642 tabs at the beginning of lines. 643 644For more details about the algorithms used, see the [algorithms manual][25]. 645 646However, there are a few snares for unwary programmers. 647 648First, all constants must be one digit. This is because otherwise, multi-digit 649constants could be interpreted wrongly if the user uses a different `ibase`. 650This does not happen with single-digit numbers because they are guaranteed to be 651interpreted what number they would be if the `ibase` was as high as possible. 652 653This is why `A` is used in the library instead of `10`, and things like `2*9*A` 654for `180` in [`lib2.bc`][26]. 655 656As an alternative, you can set `ibase` in the function, but if you do, make sure 657to set it with a single-digit number and beware the snare below... 658 659Second, `scale`, `ibase`, and `obase` must be safely restored before returning 660from any function in the library. This is because without the `-g` option, 661functions are allowed to change any of the globals. 662 663Third, all local variables in a function must be declared in an `auto` statement 664before doing anything else. This includes arrays. However, function parameters 665are considered predeclared. 666 667Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code 668must not use *any* extensions. It has to work when users use the `-s` or `-w` 669flags. 670 671#### `lib2.bc` 672 673A `bc` script containing the [extended math library][7]. 674 675Like [`lib.bc`][8], and for the same reasons, this file should have no 676extraneous whitespace, except for tabs at the beginning of lines. 677 678For more details about the algorithms used, see the [algorithms manual][25]. 679 680Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary 681programmers when writing code for `lib2.bc`. 682 683#### `strgen.c` 684 685Code for the program to generate C strings from text files. This is the original 686program, although [`strgen.sh`][9] was added later. 687 688The reason I used C here is because even though I knew `sh` would be available 689(it must be available to run `configure.sh`), I didn't know how to do what I 690needed to do with POSIX utilities and `sh`. 691 692Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that 693it *could* be done with `sh` and POSIX utilities. 694 695However, `strgen.c` exists *still* exists because the versions generated by 696[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99 697standard][12], page 21.) This is because [`strgen.sh`][9] generates string 698literals, and in C99, string literals can be limited to 4095 characters, and 699`gen/lib2.bc` is above that. 700 701Fortunately, the limit for "objects," which include `char` arrays, is much 702bigger: 65535 bytes, so that's what `strgen.c` generates. 703 704However, the existence of `strgen.c` does come with a cost: the build needs C99 705compiler that targets the host machine. For more information, see the ["Cross 706Compiling" section][13] of the [build manual][14]. 707 708Read the comments in `strgen.c` for more detail about it, the arguments it 709takes, and how it works. 710 711#### `strgen.sh` 712 713An `sh` script that will generate C strings that uses only POSIX utilities. This 714exists for those situations where a host C99 compiler is not available, and the 715environment limits mentioned above in [`strgen.c`][15] don't matter. 716 717`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean 718the exact same things, so see the comments in [`strgen.c`][15] for more detail 719about that, and see the comments in `strgen.sh` for more details about it and 720how it works. 721 722For more information about shell scripts, see [POSIX Shell Scripts][76]. 723 724### `include/` 725 726A folder containing the headers. 727 728The headers are not included among the source code because I like it better that 729way. Also there were folders within `src/` at one point, and I did not want to 730see `#include "../some_header.h"` or things like that. 731 732So all headers are here, even though only one ([`bcl.h`][30]) is meant for end 733users (to be installed in `INCLUDEDIR`). 734 735#### `args.h` 736 737This file is the API for processing command-line arguments. 738 739#### `bc.h` 740 741This header is the API for `bc`-only items. This includes the `bc_main()` 742function and the `bc`-specific lexing and parsing items. 743 744The `bc` parser is perhaps the most sensitive part of the entire codebase. See 745the documentation in `bc.h` for more information. 746 747The code associated with this header is in [`src/bc.c`][40], 748[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42]. 749 750#### `bcl.h` 751 752This header is the API for the [`bcl`][156] library. 753 754This header is meant for distribution to end users and contains the API that end 755users of [`bcl`][156] can use in their own software. 756 757This header, because it's the public header, is also the root header. That means 758that it has platform-specific fixes for Windows. (If the fixes were not in this 759header, the build would fail on Windows.) 760 761The code associated with this header is in [`src/library.c`][43]. 762 763#### `dc.h` 764 765This header is the API for `dc`-only items. This includes the `dc_main()` 766function and the `dc`-specific lexing and parsing items. 767 768The code associated with this header is in [`src/dc.c`][44], 769[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46]. 770 771#### `file.h` 772 773This header is for `bc`'s internal buffered I/O API. 774 775For more information about `bc`'s error handling and custom buffered I/O, see 776[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the 777notes about version [`3.0.0`][32] in the [`NEWS`][32]. 778 779The code associated with this header is in [`src/file.c`][47]. 780 781#### `history.h` 782 783This header is for `bc`'s implementation of command-line editing/history, which 784is based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 785 786For more information, see the [Command-Line History][189] section. 787 788The code associated with this header is in [`src/history.c`][48]. 789 790#### `lang.h` 791 792This header defines the data structures and bytecode used for actual execution 793of `bc` and `dc` code. 794 795Yes, it's misnamed; that's an accident of history where the first things I put 796into it all seemed related to the `bc` language. 797 798The code associated with this header is in [`src/lang.c`][49]. 799 800#### `lex.h` 801 802This header defines the common items that both programs need for lexing. 803 804The code associated with this header is in [`src/lex.c`][50], 805[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45]. 806 807#### `library.h` 808 809This header defines the things needed for [`bcl`][156] that users should *not* 810have access to. In other words, [`bcl.h`][30] is the *public* header for the 811library, and this header is the *private* header for the library. 812 813The code associated with this header is in [`src/library.c`][43]. 814 815#### `num.h` 816 817This header is the API for numbers and math. 818 819The code associated with this header is in [`src/num.c`][39]. 820 821#### `opt.h` 822 823This header is the API for parsing command-line arguments. 824 825It's different from [`args.h`][31] in that [`args.h`][31] is for the main code 826to process the command-line arguments into global data *after* they have already 827been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually 828parses the command-line arguments, and [`args.h`][31] turns that parsed data 829into flags (bits), strings, and expressions that will be used later. 830 831Why are they separate? Because originally, `bc` used `getopt_long()` for 832parsing, so [`args.h`][31] was the only one that existed. After it was 833discovered that `getopt_long()` has different behavior on different platforms, I 834adapted a [public-domain option parsing library][34] to do the job instead. And 835in doing so, I gave it its own header. 836 837They could probably be combined, but I don't really care enough at this point. 838 839The code associated with this header is in [`src/opt.c`][51]. 840 841#### `parse.h` 842 843This header defines the common items that both programs need for parsing. 844 845Note that the parsers don't produce abstract syntax trees (AST's) or any 846intermediate representations. They produce bytecode directly. In other words, 847they don't have special data structures except what they need to do their job. 848 849The code associated with this header is in [`src/parse.c`][50], 850[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46]. 851 852#### `program.h` 853 854This header defines the items needed to manage the data structures in 855[`lang.h`][38] as well as any helper functions needed to generate bytecode or 856execute it. 857 858The code associated with this header is in [`src/program.c`][53]. 859 860#### `rand.h` 861 862This header defines the API for the [pseudo-random number generator 863(PRNG)][179]. 864 865The PRNG only generates fixed-size integers. The magic of generating random 866numbers of arbitrary size is actually given to the code that does math 867([`src/num.c`][39]). 868 869The code associated with this header is in [`src/rand.c`][54]. 870 871#### `read.h` 872 873This header defines the API for reading from files and `stdin`. 874 875Thus, [`file.h`][55] is really for buffered *output*, while this file is for 876*input*. There is no buffering needed for `bc`'s inputs. 877 878The code associated with this header is in [`src/read.c`][56]. 879 880#### `status.h` 881 882This header has several things: 883 884* A list of possible errors that internal `bc` code can use. 885* Compiler-specific fixes. 886* Platform-specific fixes. 887* Macros for `bc`'s [error handling][97]. 888 889There is no code associated with this header. 890 891#### `vector.h` 892 893This header defines the API for the vectors (resizable arrays) that are used for 894data structures. 895 896Vectors are what do the heavy lifting in almost all of `bc`'s data structures. 897Even the maps of identifiers and arrays use vectors. 898 899The code associated with this header is in [`src/vector.c`][228]. 900 901#### `version.h` 902 903This header defines the version of `bc`. 904 905There is no code associated with this header. 906 907#### `vm.h` 908 909This header defines the API for setting up and running `bc` and `dc`. 910 911It is so named because I think of it as the "virtual machine" of `bc`, though 912that is probably not true as [`program.h`][57] is probably the "virtual machine" 913API. Thus, the name is more historical accident. 914 915The code associated with this header is in [`src/vm.c`][58]. 916 917### `locales/` 918 919This folder contains a bunch of `.msg` files and soft links to the real `.msg` 920files. This is how locale support is implemented in `bc`. 921 922The files are in the format required by the [`gencat`][59] POSIX utility. They 923all have the same messages, in the same order, with the same numbering, under 924the same groups. This is because the locale system expects those messages in 925that order. 926 927The softlinks exist because for many locales, they would contain the exact same 928information. To prevent duplication, they are simply linked to a master copy. 929 930The naming format for all files is: 931 932``` 933<language_code>_<country_code>.<encoding>.msg 934``` 935 936This naming format must be followed for all locale files. 937 938### `manuals/` 939 940This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along 941with a few other manuals. 942 943#### `algorithms.md` 944 945This file explains the mathematical algorithms that are used. 946 947The hope is that this file will guide people in understanding how the math code 948works. 949 950#### `bc.1.md.in` 951 952This file is a template for the markdown version of the `bc` manual and 953manpages. 954 955For more information about how the manpages and markdown manuals are generated, 956and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 957 958#### `bcl.3` 959 960This is the manpage for the [`bcl`][156] library. It is generated from 961[`bcl.3.md`][61] using [`scripts/manpage.sh`][60]. 962 963For the reason why I check generated data into the repo, see 964[`scripts/manpage.sh`][60] and [Manuals][86]. 965 966#### `bcl.3.md` 967 968This is the markdown manual for the [`bcl`][156] library. It is the source for the 969generated [`bcl.3`][62] file. 970 971#### `benchmarks.md` 972 973This is a document that compares this `bc` to GNU `bc` in various benchmarks. It 974was last updated when version [`3.0.0`][32] was released. 975 976It has very little documentation value, other than showing what compiler options 977are useful for performance. 978 979#### `build.md` 980 981This is the [build manual][14]. 982 983This `bc` has a custom build system. The reason for this is because of 984[*portability*][136]. 985 986If `bc` used an outside build system, that build system would be an external 987dependency. Thus, I had to write a build system for `bc` that used nothing but 988C99 and POSIX utilities, including barebones [POSIX `make`][74]. 989 990for more information about the build system, see the [build system][142] 991section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70]. 992 993#### `dc.1.md.in` 994 995This file is a template for the markdown version of the `dc` manual and 996manpages. 997 998For more information about how the manpages and markdown manuals are generated, 999and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 1000 1001#### `development.md` 1002 1003The file you are reading right now. 1004 1005#### `header_bcl.txt` 1006 1007Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper 1008header. 1009 1010For more information about generating manuals, see [`scripts/manpage.sh`][60] 1011and [Manuals][86]. 1012 1013#### `header_bc.txt` 1014 1015Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a 1016proper header. 1017 1018For more information about generating manuals, see [`scripts/manpage.sh`][60] 1019and [Manuals][86]. 1020 1021#### `header_dc.txt` 1022 1023Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a 1024proper header. 1025 1026For more information about generating manuals, see [`scripts/manpage.sh`][60] 1027and [Manuals][86]. 1028 1029#### `header.txt` 1030 1031Used by [`scripts/manpage.sh`][60] to give all generated manpages a license 1032header. 1033 1034For more information about generating manuals, see [`scripts/manpage.sh`][60] 1035and [Manuals][86]. 1036 1037#### `release.md` 1038 1039A checklist that I try to somewhat follow when making a release. 1040 1041#### `bc/` 1042 1043A folder containing the `bc` manuals. 1044 1045Each `bc` manual corresponds to a [build type][81]. See that link for more 1046details. 1047 1048For each manual, there are two copies: the markdown version generated from the 1049template, and the manpage generated from the markdown version. 1050 1051#### `dc/` 1052 1053A folder containing the `dc` manuals. 1054 1055Each `dc` manual corresponds to a [build type][81]. See that link for more 1056details. 1057 1058For each manual, there are two copies: the markdown version generated from the 1059template, and the manpage generated from the markdown version. 1060 1061### `scripts/` 1062 1063This folder contains helper scripts. Most of them are written in pure [POSIX 1064`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and 1065[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is 1066written in C. [`ministat.c`][223] in particular is copied from elsewhere. 1067 1068For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1069 1070#### `afl.py` 1071 1072This script is meant to be used as part of the fuzzing workflow. 1073 1074It does one of two things: checks for valid crashes, or runs `bc` and or `dc` 1075under all of the paths found by [AFL++][125]. 1076 1077See [Fuzzing][82] for more information about fuzzing, including this script. 1078 1079#### `alloc.sh` 1080 1081This script is a quick and dirty script to test whether or not the garbage 1082collection mechanism of the [`BcNum` caching][96] works. It has been little-used 1083because it tests something that is not important to correctness. 1084 1085#### `benchmark.sh` 1086 1087A script making it easy to run benchmarks and to run the executable produced by 1088[`ministat.c`][223] on them. 1089 1090For more information, see the [Benchmarks][144] section. 1091 1092#### `bitfuncgen.c` 1093 1094A source file for an executable to generate tests for `bc`'s bitwise functions 1095in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built 1096with `make bitfuncgen`. It produces the test on `stdout` and the expected 1097results on `stderr`. This means that to generat tests, use the following 1098invokation: 1099 1100``` 1101scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt 1102``` 1103 1104It calls `abort()` if it runs into an error. 1105 1106#### `exec-install.sh` 1107 1108This script is the magic behind making sure `dc` is installed properly if it's 1109a symlink to `bc`. It checks to see if it is a link, and if so, it just creates 1110a new symlink in the install directory. Of course, it also installs `bc` itself, 1111or `dc` when it's alone. 1112 1113#### `functions.sh` 1114 1115This file is a bunch of common functions for most of the POSIX shell scripts. It 1116is not supposed to be run; instead, it is *sourced* by other POSIX shell 1117scripts, like so: 1118 1119``` 1120. "$scriptdir/functions.sh" 1121``` 1122 1123or the equivalent, depending on where the sourcing script is. 1124 1125For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1126 1127#### `fuzz_prep.sh` 1128 1129Fuzzing is a regular activity when I am preparing for a release. 1130 1131This script handles all the options and such for building a fuzzable binary. 1132Instead of having to remember a bunch of options, I just put them in this script 1133and run the script when I want to fuzz. 1134 1135For more information about fuzzing, see [Fuzzing][82]. 1136 1137#### `karatsuba.py` 1138 1139This script has at least one of two major differences from most of the other 1140scripts: 1141 1142* It's written in Python 3. 1143* It's meant for software packagers. 1144 1145For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in 1146Python 3, but they are not meant for the end user or software packagers and are 1147not included in source distributions. But this script is. 1148 1149This script breaks my rule of only POSIX utilities necessary for package 1150maintainers, but there's a very good reason for that: it's only meant to be run 1151*once* when the package is created for the first time, and maybe not even then. 1152 1153You see, this script does two things: it tests the Karatsuba implementation at 1154various settings for `KARATSUBA_LEN`, and it figures out what the optimal 1155`KARATSUBA_LEN` is for the machine that it is running on. 1156 1157Package maintainers can use this script, when creating a package for this `bc`, 1158to figure out what is optimal for their users. Then they don't have to run it 1159ever again. So this script only has to run on the packagers machine. 1160 1161I tried to write the script in `sh`, by the way, and I finally accepted the 1162tradeoff of using Python 3 when it became too hard. 1163 1164However, I also mentioned that it's for testing Karatsuba with various settings 1165of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124], 1166right? 1167 1168Yes, but this script is not part of the [test suite][124]; it's used for testing 1169in the [`scripts/release.sh`][83] script, which is maintainer use only. 1170 1171However, there is one snare with `karatsuba.py`: I didn't want the user to have 1172to install any Python libraries to run it. Keep that in mind if you change it. 1173 1174#### `link.sh` 1175 1176This script is the magic behind making `dc` a symlink of `bc` when both 1177calculators are built. 1178 1179#### `locale_install.sh` 1180 1181This script does what its name says: it installs locales. 1182 1183It turns out that this is complicated. 1184 1185There is a magic environment variable, `$NLSPATH`, that tells you how and where 1186you are supposed to install locales. 1187 1188Yes, *how*. And where. 1189 1190But now is not the place to rant about `$NLSPATH`. For more information on 1191locales and `$NLSPATH`, see [Locales][85]. 1192 1193#### `locale_uninstall.sh` 1194 1195This script does what its name says: it uninstalls locales. 1196 1197This is far less complicated than installing locales. I basically generate a 1198wildcard path and then list all paths that fit that wildcard. Then I delete each 1199one of those paths. Easy. 1200 1201For more information on locales, see [Locales][85]. 1202 1203#### `manpage.sh` 1204 1205This script is the one that generates markdown manuals from a template and a 1206manpage from a markdown manual. 1207 1208For more information about generating manuals, see [Manuals][86]. 1209 1210#### `ministat.c` 1211 1212This is a file copied [from FreeBSD][221] that calculates the standard 1213statistical numbers, such as mean, average, and median, based on numbers 1214obtained from a file. 1215 1216For more information, see the [FreeBSD ministat(1) manpage][222]. 1217 1218This file allows `bc` to build the `scripts/ministat` executable using the 1219command `make ministat`, and this executable helps programmers evaluate the 1220results of [benchmarks][144] more accurately. 1221 1222#### `package.sh` 1223 1224This script is what helps `bc` maintainers cut a release. It does the following: 1225 12261. Creates the appropriate `git` tag. 12272. Pushes the `git` tag. 12283. Copies the repo to a temp directory. 12294. Removes files that should not be included in source distributions. 12305. Creates the tarballs. 12316. Signs the tarballs. 12327. Zips and signs the Windows executables if they exist. 12338. Calculates and outputs SHA512 and SHA256 sums for all of the files, 1234 including the signatures. 1235 1236This script is for `bc` maintainers to use when cutting a release. It is not 1237meant for outside use. This means that some non-POSIX utilities can be used, 1238such as `git` and `gpg`. 1239 1240In addition, before using this script, it expects that the folders that Windows 1241generated when building `bc`, `dc`, and [`bcl`][156], are in the parent 1242directory of the repo, exactly as Windows generated them. If they are not there, 1243then it will not zip and sign, nor calculate sums of, the Windows executables. 1244 1245Because this script creates a tag and pushes it, it should *only* be run *ONCE* 1246per release. 1247 1248#### `radamsa.sh` 1249 1250A script to test `bc`'s command-line expression parsing code, which, while 1251simple, strives to handle as much as possible. 1252 1253What this script does is it uses the test cases in [`radamsa.txt`][98] an input 1254to the [Radamsa fuzzer][99]. 1255 1256For more information, see the [Radamsa][128] section. 1257 1258#### `radamsa.txt` 1259 1260Initial test cases for the [`radamsa.sh`][100] script. 1261 1262#### `randmath.py` 1263 1264This script generates random math problems and checks that `bc`'s and `dc`'s 1265output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have 1266GNU `bc` and `dc` installed before using this script.) 1267 1268One snare: be sure that this script is using the GNU `bc` and `dc`, not a 1269previously-installed version of this `bc` and `dc`. 1270 1271If you want to check for memory issues or failing asserts, you can build the 1272`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any 1273errors or crashes should be caught by the script and given to the user as part 1274of the "checklist" (see below). 1275 1276The basic idea behind this script is that it generates as many math problems as 1277it can, biasing towards situations that may be likely to have bugs, and testing 1278each math problem against GNU `bc` or `dc`. 1279 1280If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it 1281stores that problem. If the output mismatches, it also stores the problem. 1282 1283Then, when the user sends a `SIGINT`, the script stops testing and goes into 1284report mode. One-by-one, it will go through the "checklist," the list of failed 1285problems, and present each problem to the user, as well as whether this `bc` or 1286`dc` crashed, and its output versus GNU. Then the user can decide to add them as 1287test cases, which it does automatically to the appropriate test file. 1288 1289#### `release_settings.txt` 1290 1291A text file of settings combinations that [`release.sh`][83] uses to ensure that 1292`bc` and `dc` build and work with various default settings. [`release.sh`][83] 1293simply reads it line by line and uses each line for one build. 1294 1295#### `release.sh` 1296 1297This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156] 1298through a gauntlet that is mostly meant to be used in preparation for a release. 1299 1300It does the following: 1301 13021. Builds every [build type][81], with every setting combo in 1303 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1304 alone. 13052. Builds every [build type][81], with every setting combo in 1306 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1307 alone for 32-bit. 13083. Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size 1309 Release builds. 13104. Runs the [test suite][124] on every build, if desired. 13115. Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build 1312 type/setting combo. 13136. Runs [`scripts/karatsuba.py`][78] in test mode. 13147. Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone 1315 under [valgrind][20] and errors if there are any memory bugs or memory 1316 leaks. 1317 1318#### `safe-install.sh` 1319 1320A script copied from [musl][101] to atomically install files. 1321 1322#### `test_settings.sh` 1323 1324A quick and dirty script to help automate rebuilding while manually testing the 1325various default settings. 1326 1327This script uses [`test_settings.txt`][103] to generate the various settings 1328combos. 1329 1330For more information about settings, see [Settings][102] in the [build 1331manual][14]. 1332 1333#### `test_settings.txt` 1334 1335A list of the various settings combos to be used by [`test_settings.sh`][104]. 1336 1337### `src/` 1338 1339This folder is, obviously, where the actual heart and soul of `bc`, the source 1340code, is. 1341 1342All of the source files are in one folder; this simplifies the build system 1343immensely. 1344 1345There are separate files for `bc` and `dc` specific code ([`bc.c`][40], 1346[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and 1347[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire 1348source file from a build than to have `#if`/`#endif` preprocessor guards. 1349 1350That said, it was easier in many cases to use preprocessor macros where both 1351calculators used much of the same code and data structures, so there is a 1352liberal sprinkling of them through the code. 1353 1354#### `args.c` 1355 1356Code for processing command-line arguments. 1357 1358The header for this file is [`include/args.h`][31]. 1359 1360#### `bc.c` 1361 1362The code for the `bc` main function `bc_main()`. 1363 1364The header for this file is [`include/bc.h`][106]. 1365 1366#### `bc_lex.c` 1367 1368The code for lexing that only `bc` needs. 1369 1370The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106]. 1371 1372#### `bc_parse.c` 1373 1374The code for parsing that only `bc` needs. This code is the most complex and 1375subtle in the entire codebase. 1376 1377The headers for this file are [`include/parse.h`][181] and 1378[`include/bc.h`][106]. 1379 1380#### `data.c` 1381 1382Due to [historical accident][23] because of a desire to get my `bc` into 1383[toybox][16], all of the constant data that `bc` needs is all in one file. This 1384is that file. 1385 1386There is no code in this file, but a lot of the const data has a heavy influence 1387on code, including the order of data in arrays because that order has to 1388correspond to the order of other things elsewhere in the codebase. If you change 1389the order of something in this file, run `make test`, and get errors, you 1390changed something that depends on the order that you messed up. 1391 1392Almost all headers have `extern` references to items in this file. 1393 1394#### `dc.c` 1395 1396The code for the `dc` main function `dc_main()`. 1397 1398The header for this file is [`include/dc.h`][182]. 1399 1400#### `dc_lex.c` 1401 1402The code for lexing that only `dc` needs. 1403 1404The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182]. 1405 1406#### `dc_parse.c` 1407 1408The code for parsing that only `dc` needs. 1409 1410The headers for this file are [`include/parse.h`][181] and 1411[`include/bc.h`][182]. 1412 1413#### `file.c` 1414 1415The code for `bc`'s implementation of buffered I/O. For more information about 1416why I implemented my own buffered I/O, see [`include/file.h`][55], [Error 1417Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes 1418about version [`3.0.0`][32] in the [`NEWS`][32]. 1419 1420The header for this file is [`include/file.h`][55]. 1421 1422#### `history.c` 1423 1424The code for `bc`'s implementation of command-line editing/history, which is 1425based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 1426 1427For more information, see the [Command-Line History][189] section. 1428 1429The header for this file is [`include/history.h`][36]. 1430 1431#### `lang.c` 1432 1433The data structures used for actual execution of `bc` and `dc` code. 1434 1435While execution is done in [`src/program.c`][53], this file defines functions 1436for initializing, copying, and freeing the data structures, which is somewhat 1437orthogonal to actual execution. 1438 1439Yes, it's misnamed; that's an accident of history where the first things I put 1440into it all seemed related to the `bc` language. 1441 1442The header for this file is [`include/lang.h`][38]. 1443 1444#### `lex.c` 1445 1446The code for the common things that both programs need for lexing. 1447 1448The header for this file is [`include/lex.h`][180]. 1449 1450#### `library.c` 1451 1452The code to implement the public API of the `bcl` library. 1453 1454The code in this file does a lot to ensure that clients do not have to worry 1455about internal `bc` details, especially error handling with `setjmp()` and 1456`longjmp()`. That and encapsulating the handling of numbers are the bulk of what 1457the code in this file actually does because most of the library is still 1458implemented in [`src/num.c`][39]. 1459 1460The headers for this file are [`include/bcl.h`][30] and 1461[`include/library.h`][183]. 1462 1463#### `main.c` 1464 1465The entry point for both programs; this is the `main()` function. 1466 1467This file has no headers associated with it. 1468 1469#### `num.c` 1470 1471The code for all of the arbitrary-precision [numbers][177] and [math][178] in 1472`bc`. 1473 1474The header for this file is [`include/num.h`][184]. 1475 1476#### `opt.c` 1477 1478The code for parsing command-line options. 1479 1480The header for this file is [`include/opt.h`][35]. 1481 1482#### `parse.c` 1483 1484The code for the common items that both programs need for parsing. 1485 1486The header for this file is [`include/parse.h`][181]. 1487 1488#### `program.c` 1489 1490The code for the actual execution engine for `bc` and `dc` code. 1491 1492The header for this file is [`include/program.h`][57]. 1493 1494#### `rand.c` 1495 1496The code for the [pseudo-random number generator (PRNG)][179] and the special 1497stack handling it needs. 1498 1499The PRNG only generates fixed-size integers. The magic of generating random 1500numbers of arbitrary size is actually given to the code that does math 1501([`src/num.c`][39]). 1502 1503The header for this file is [`include/rand.h`][37]. 1504 1505#### `read.c` 1506 1507The code for reading from files and `stdin`. 1508 1509The header for this file is [`include/read.h`][185]. 1510 1511#### `vector.c` 1512 1513The code for [vectors][111], [maps][186], and [slab vectors][187], along with 1514slabs. 1515 1516The header for this file is [`include/vector.h`][174]. 1517 1518#### `vm.c` 1519 1520The code for setting up and running `bc` and `dc`. 1521 1522It is so named because I think of it as the "virtual machine" of `bc`, though 1523that is probably not true as [`program.h`][57] is probably the "virtual machine" 1524code. Thus, the name is more historical accident. 1525 1526The header for this file is [`include/vm.h`][27]. 1527 1528### `tests/` 1529 1530This directory contains the entire [test suite][124] and its infrastructure. 1531 1532#### `all.sh` 1533 1534A convenience script for the `make run_all_tests` target (see the [Group 1535Tests][141] section for more information). 1536 1537#### `all.txt` 1538 1539The file with the names of the calculators. This is to make it easier for the 1540test scripts to know where the standard and other test directories are. 1541 1542#### `bcl.c` 1543 1544The test for the [`bcl`][156] API. For more information, see the [`bcl` 1545Test][157] section. 1546 1547#### `error.sh` 1548 1549The script to run the file-based error tests in `tests/<calculator>/errors/` for 1550each calculator. For more information, see the [Error Tests][151] section. 1551 1552This is a separate script so that each error file can be run separately and in 1553parallel. 1554 1555#### `errors.sh` 1556 1557The script to run the line-based error tests in `tests/<calculator>/errors.txt` 1558for each calculator. For more information, see the [Error Tests][151] section. 1559 1560#### `extra_required.txt` 1561 1562The file with the list of tests which both calculators have that need the [Extra 1563Math build option][188]. This exists to make it easy for test scripts to skip 1564those tests when the [Extra Math build option][188] is disabled. 1565 1566#### `history.py` 1567 1568The file with all of the history tests. For more information, see the [History 1569Tests][155] section. 1570 1571#### `history.sh` 1572 1573The script to integrate [`history.py`][139] into the build system in a portable 1574way, and to skip it if necessary. 1575 1576This script also re-runs the test three times if it fails. This is because 1577`pexpect` can be flaky at times. 1578 1579#### `other.sh` 1580 1581The script to run the "other" (miscellaneous) tests for each calculator. For 1582more information, see the [Other Tests][154] section. 1583 1584#### `read.sh` 1585 1586The script to run the read tests for each calculator. For more information, see 1587the [`read()` Tests][153] section. 1588 1589#### `script.sed` 1590 1591The `sed` script to edit the output of GNU `bc` when generating script tests. 1592For more information, see the [Script Tests][150] section. 1593 1594#### `script.sh` 1595 1596The script for running one script test. For more information, see the [Script 1597Tests][150] section. 1598 1599#### `scripts.sh` 1600 1601The script to help the `make run_all_tests` (see the [Group Tests][141] section) 1602run all of the script tests. 1603 1604#### `stdin.sh` 1605 1606The script to run the `stdin` tests for each calculator. For more information, 1607see the [`stdin` Tests][152] section. 1608 1609#### `test.sh` 1610 1611The script to run one standard test. For more information, see the [Standard 1612Tests][149] section. 1613 1614#### `bc/` 1615 1616The standard tests directory for `bc`. For more information, see the [Standard 1617Tests][149] section. 1618 1619##### `all.txt` 1620 1621The file to tell the build system and `make run_all_tests` (see the [Group 1622Tests][141] section) what standard tests to run for `bc`, as well as in what 1623order. 1624 1625This file just lists the test names, one per line. 1626 1627##### `errors.txt` 1628 1629The initial error test file for `bc`. This file has one test per line. See the 1630[Error Tests][151] section for more information. 1631 1632##### `posix_errors.txt` 1633 1634The file of tests for POSIX compatibility for `bc`. This file has one test per 1635line. For more information, see the [Error Tests][151] section. 1636 1637##### `timeconst.sh` 1638 1639The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6]. 1640For more information, see the [Linux `timeconst.bc` Script][191]section. 1641 1642##### `errors/` 1643 1644The directory with error tests for `bc`, most discovered by AFL++ (see the 1645[Fuzzing][82] section). There is one test per file. For more information, see 1646the [Error Tests][151] section. 1647 1648##### `scripts/` 1649 1650The script tests directory for `bc`. For more information, see the [Script 1651Tests][150] section. 1652 1653###### `all.txt` 1654 1655A file to tell the build system and `make run_all_tests` (see the [Group 1656Tests][141] section) what script tests to run for `bc`, as well as in what 1657order. 1658 1659This file just lists the test names, one per line. 1660 1661#### `dc/` 1662 1663The standard tests directory for `dc`. For more information, see the [Standard 1664Tests][149] section. 1665 1666##### `all.txt` 1667 1668The file to tell the build system and `make run_all_tests` (see the [Group 1669Tests][141] section) what standard tests to run for `dc`, as well as in what 1670order. 1671 1672This file just lists the test names, one per line. 1673 1674##### `errors.txt` 1675 1676The initial error test file for `dc`. This file has one test per line. See the 1677[Error Tests][151] section for more information. 1678 1679##### `read_errors.txt` 1680 1681The file of tests errors with the `?` command (`read()` in `bc`). This file has 1682one test per line. See the [Error Tests][151] section for more information. 1683 1684##### `errors/` 1685 1686The directory with error tests for `dc`, most discovered by AFL++ (see the 1687[Fuzzing][82] section). There is one test per file. For more information, see 1688the [Error Tests][151] section. 1689 1690##### `scripts/` 1691 1692The script tests directory for `dc`. For more information, see the [Script 1693Tests][150] section. 1694 1695###### `all.txt` 1696 1697The file to tell the build system and `make run_all_tests` (see the [Group 1698Tests][141] section) what script tests to run for `dc`, as well as in what 1699order. 1700 1701This file just lists the test names, one per line. 1702 1703#### `fuzzing/` 1704 1705The directory containing the fuzzing infrastructure. For more information, see 1706the [Fuzzing][82] section. 1707 1708##### `bc_afl_continue.yaml` 1709 1710The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a 1711fuzz run. For more information, see the [Convenience][130] subsection of the 1712[Fuzzing][82] section. 1713 1714##### `bc_afl.yaml` 1715 1716The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a 1717fuzz run. For more information, see the [Convenience][130] subsection of the 1718[Fuzzing][82] section. 1719 1720Be aware that this will delete all previous unsaved fuzzing tests in the output 1721directories. 1722 1723##### `bc_inputs1/` 1724 1725The fuzzing input directory for the first third of inputs for `bc`. For more 1726information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1727 1728##### `bc_inputs2/` 1729 1730The fuzzing input directory for the second third of inputs for `bc`. For more 1731information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1732 1733##### `bc_inputs3/` 1734 1735The fuzzing input directory for the third third of inputs for `bc`. For more 1736information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1737 1738##### `dc_inputs/` 1739 1740The fuzzing input directory for the inputs for `dc`. For more information, see 1741the [Corpuses][192] subsection of the [Fuzzing][82] section. 1742 1743### `vs/` 1744 1745The directory containing all of the materials needed to build `bc`, `dc`, and 1746`bcl` on Windows. 1747 1748#### `bcl.sln` 1749 1750A Visual Studio solution file for [`bcl`][156]. This, along with 1751[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to 1752build [`bcl`][156] on Windows. 1753 1754#### `bcl.vcxproj` 1755 1756A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1757and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156] 1758on Windows. 1759 1760#### `bcl.vcxproj.filters` 1761 1762A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1763and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on 1764Windows. 1765 1766#### `bc.sln` 1767 1768A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66] 1769and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on 1770Windows. 1771 1772#### `bc.vcxproj` 1773 1774A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and 1775[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows. 1776 1777#### `bc.vcxproj.filters` 1778 1779A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and 1780[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows. 1781 1782#### `tests/` 1783 1784A directory of files to run tests on Windows. 1785 1786##### `tests_bc.bat` 1787 1788A file to run basic `bc` tests on Windows. It expects that it will be run from 1789the directory containing it, and it also expects a `bc.exe` in the same 1790directory. 1791 1792##### `tests_dc.bat` 1793 1794A file to run basic `dc` tests on Windows. It expects that it will be run from 1795the directory containing it, and it also expects a `bc.exe` in the same 1796directory. 1797 1798## Build System 1799 1800The build system is described in detail in the [build manual][14], so 1801maintainers should start there. This section, however, describes some parts of 1802the build system that only maintainers will care about. 1803 1804### Clean Targets 1805 1806`bc` has a default `make clean` target that cleans up the build files. However, 1807because `bc`'s build system can generate many different types of files, there 1808are other clean targets that may be useful: 1809 1810* `make clean_gen` cleans the `gen/strgen` executable generated from 1811 [`gen/strgen.c`][15]. It has no prerequisites. 1812* `make clean` cleans object files, `*.cat` files (see the [Locales][85] 1813 section), executables, and files generated from text files in [`gen/`][145], 1814 including `gen/strgen` if it was built. So this has a prerequisite on 1815 `make clean_gen` in normal use. 1816* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat` 1817 executable. It has no prerequisites. 1818* `make clean_config` cleans the generated `Makefile` and the manuals that 1819 [`configure.sh`][69] copied in preparation for install. It also depends on 1820 `make clean` and `make clean_benchmarks`, so it cleans those items too. This 1821 is the target that [`configure.sh`][69] uses before it does its work. 1822* `make clean_coverage` cleans the generated coverage files for the [test 1823 suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This 1824 is useful if the code coverage tools are giving errors. 1825* `make clean_tests` cleans *everything*. It has prerequisites on all previous 1826 clean targets, but it also cleans all of the [generated tests][143]. 1827 1828When adding more generated files, you may need to add them to one of these 1829targets and/or add a target for them especially. 1830 1831### Preprocessor Macros 1832 1833`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type: 1834 1835* builds, 1836* works under the [test suite][124], and 1837* excludes as much code as possible from all builds. 1838 1839This section will explain the preprocessor style of `bc` and `dc`, as well as 1840provide an explanation of the macros used. 1841 1842#### Style 1843 1844The style of macro use in `bc` is pretty straightforward: I avoid depending on 1845macro definitions and instead, I set defaults if the macro is not defined and 1846then test the value if the macro with a plain `#if`. 1847 1848(Some examples of setting defaults are in [`include/status.h`][176], just above 1849the definition of the `BcStatus` enum.) 1850 1851In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible. 1852 1853There are a couple of cases where I went with standard stuff instead. 1854 1855#### Standard Macros 1856 1857`BC_ENABLED` 1858 1859: This macro expands to `1` if `bc` is enabled, `0` if disabled. 1860 1861`DC_ENABLED` 1862 1863: This macro expands to `1` if `dc` is enabled, `0` if disabled. 1864 1865`BUILD_TYPE` 1866 1867: The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`, 1868 `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct 1869 the user to the correct markdown manual in the `git.gavinhoward.com` 1870 website. 1871 1872`EXECPREFIX` 1873 1874: This macro expands to the prefix on the executable name. This is used to 1875 allow `bc` and `dc` to skip the prefix when finding out which calculator is 1876 executing. 1877 1878`BC_NUM_KARATSUBA_LEN` 1879 1880: This macro expands to an integer, which is the length of numbers below which 1881 the Karatsuba multiplication algorithm switches to brute-force 1882 multiplication. 1883 1884`BC_ENABLE_EXTRA_MATH` 1885 1886: This macro expands to `1` if the [Extra Math build option][188] is enabled, 1887 `0` if disabled. 1888 1889`BC_ENABLE_HISTORY` 1890 1891: This macro expands to `1` if the [History build option][193] is enabled, `0` 1892 if disabled. 1893 1894`BC_ENABLE_NLS` 1895 1896: This macro expands to `1` if the [NLS build option][193] (for locales) is 1897 enabled, `0` if disabled. 1898 1899`BC_ENABLE_LIBRARY` 1900 1901: This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if 1902 disabled. If this is enabled, building the calculators themselves is 1903 disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero. 1904 1905`BC_ENABLE_MEMCHECK` 1906 1907: This macro expands to `1` if `bc` has been built for use with Valgrind's 1908 [Memcheck][194], `0` otherwise. This ensures that fatal errors still free 1909 all of their memory when exiting. `bc` does not do that normally because 1910 what's the point? 1911 1912`BC_ENABLE_AFL` 1913 1914: This macro expands to `1` if `bc` has been built for fuzzing with 1915 [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more 1916 information. 1917 1918`BC_DEFAULT_BANNER` 1919 1920: This macro expands to the default value for displaying the `bc` banner. 1921 1922`BC_DEFAULT_SIGINT_RESET` 1923 1924: The macro expands to the default value for whether or not `bc` should reset 1925 on `SIGINT` or quit. 1926 1927`BC_DEFAULT_TTY_MODE` 1928 1929: The macro expands to the default value for whether or not `bc` should use 1930 TTY mode when it available. 1931 1932`BC_DEFAULT_PROMPT` 1933 1934: This macro expands to the default value for whether or not `bc` should use a 1935 prompt when TTY mode is available. 1936 1937`DC_DEFAULT_SIGINT_RESET` 1938 1939: The macro expands to the default value for whether or not `dc` should reset 1940 on `SIGINT` or quit. 1941 1942`DC_DEFAULT_TTY_MODE` 1943 1944: The macro expands to the default value for whether or not `dc` should use 1945 TTY mode when it available. 1946 1947`DC_DEFAULT_PROMPT` 1948 1949: This macro expands to the default value for whether or not `dc` should use a 1950 prompt when TTY mode is available. 1951 1952`BC_DEBUG_CODE` 1953 1954: If this macro expands to a non-zero integer, then `bc` is built with *a lot* 1955 of extra debugging code. This is never set by the build system and must be 1956 set by the programmer manually. This should never be set in builds given to 1957 end users. For more information, see the [Debugging][134] section. 1958 1959## Test Suite 1960 1961While the source code may be the heart and soul of `bc`, the test suite is the 1962arms and legs: it gives `bc` the power to do anything it needs to do. 1963 1964The test suite is what allowed `bc` to climb to such high heights of quality. 1965This even goes for fuzzing because fuzzing depends on the test suite for its 1966input corpuses. (See the [Fuzzing][82] section.) 1967 1968Understanding how the test suite works should be, I think, the first thing that 1969maintainers learn after learning what `bc` and `dc` should do. This is because 1970the test suite, properly used, gives confidence that changes have not caused 1971bugs or regressions. 1972 1973That is why I spent the time to make the test suite as easy to use and as fast 1974as possible. 1975 1976To use the test suite (assuming `bc` and/or `dc` are already built), run the 1977following command: 1978 1979``` 1980make test 1981``` 1982 1983That's it. That's all. 1984 1985It will return an error code if the test suite failed. It will also print out 1986information about the failure. 1987 1988If you want the test suite to go fast, then run the following command: 1989 1990``` 1991make -j<cores> test 1992``` 1993 1994Where `<cores>` is the number of cores that your computer has. Of course, this 1995requires a `make` implementation that supports that option, but most do. (And I 1996will use this convention throughout the rest of this section.) 1997 1998I have even tried as much as possible, to put longer-running tests near the 1999beginning of the run so that the entire suite runs as fast as possible. 2000 2001However, if you want to be sure which test is failing, then running a bare 2002`make test` is a great way to do that. 2003 2004But enough about how you have no excuses to use the test suite as much as 2005possible; let's talk about how it works and what you *can* do with it. 2006 2007### Standard Tests 2008 2009The heavy lifting of testing the math in `bc`, as well as basic scripting, is 2010done by the "standard tests" for each calculator. 2011 2012These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162] 2013directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164], 2014[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166], 2015[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and 2016[`tests/dc/read_errors.txt`][175]), which are called the "standard test 2017directories." 2018 2019For every test, there is the test file and the results file. The test files have 2020names of the form `<test>.txt`, where `<test>` is the name of the test, and the 2021results files have names of the form `<test>_results.txt`. 2022 2023If the test file exists but the results file does not, the results for that test 2024are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143] 2025section. 2026 2027The `all.txt` file in each standard tests directory is what tells the test suite 2028and [build system][142] what tests there are, and the tests are either run in 2029that order, or in the case of parallel `make`, that is the order that the 2030targets are listed as prerequisites of `make test`. 2031 2032If the test exists in the `all.txt` file but does not *actually* exist, the test 2033and its results are generated by a GNU-compatible `bc` or `dc`. See the 2034[Generated Tests][143] section. 2035 2036To add a non-generated standard test, do the following: 2037 2038* Add the test file (`<test>.txt` in the standard tests directory). 2039* Add the results file (`<test>_results.txt` in the standard tests directory). 2040 You can skip this step if just the results file needs to be generated. See the 2041 [Generated Tests][147] section for more information. 2042* Add the name of the test to the `all.txt` file in the standard tests 2043 directory, putting it in the order it should be in. If possible, I would put 2044 longer tests near the beginning because they will start running earlier with 2045 parallel `make`. I always keep `decimal` first, though, as a smoke test. 2046 2047If you need to add a generated standard test, see the [Generated Tests][147] 2048section for how to do that. 2049 2050Some standard tests need to be skipped in certain cases. That is handled by the 2051[build system][142]. See the [Integration with the Build System][147] section 2052for more details. 2053 2054In addition to all of the above, the standard test directory is not only the 2055directory for the standard tests of each calculator, it is also the parent 2056directory of all other test directories for each calculator. 2057 2058#### `bc` Standard Tests 2059 2060The list of current (27 February 2023) standard tests for `bc` is below: 2061 2062decimal 2063 2064: Tests decimal parsing and printing. 2065 2066print 2067 2068: Tests printing in every base from decimal. This is near the top for 2069 performance of parallel testing. 2070 2071parse 2072 2073: Tests parsing in any base and outputting in decimal. This is near the top 2074 for performance of parallel testing. 2075 2076lib2 2077 2078: Tests the extended math library. This is near the top for performance of 2079 parallel testing. 2080 2081print2 2082 2083: Tests printing at the extreme values of `obase`. 2084 2085length 2086 2087: Tests the `length()` builtin function. 2088 2089scale 2090 2091: Tests the `scale()` builtin function. 2092 2093shift 2094 2095: Tests the left (`<<`) and right (`>>`) shift operators. 2096 2097add 2098 2099: Tests addition. 2100 2101subtract 2102 2103: Tests subtraction. 2104 2105multiply 2106 2107: Tests multiplication. 2108 2109divide 2110 2111: Tests division. 2112 2113modulus 2114 2115: Tests modulus. 2116 2117power 2118 2119: Tests power (exponentiation). 2120 2121sqrt 2122 2123: Tests the `sqrt()` (square root) builtin function. 2124 2125trunc 2126 2127: Tests the truncation (`$`) operator. 2128 2129places 2130 2131: Tests the places (`@`) operator. 2132 2133vars 2134 2135: Tests some usage of variables. This one came from [AFL++][125] I think. 2136 2137boolean 2138 2139: Tests boolean operators. 2140 2141comp 2142 2143: Tests comparison operators. 2144 2145abs 2146 2147: Tests the `abs()` builtin function. 2148 2149assignments 2150 2151: Tests assignment operators, including increment/decrement operators. 2152 2153functions 2154 2155: Tests functions, specifically function parameters being replaced before they 2156 themselves are used. See the comment in `bc_program_call()` about the last 2157 condition. 2158 2159scientific 2160 2161: Tests scientific notation. 2162 2163engineering 2164 2165: Tests engineering notation. 2166 2167globals 2168 2169: Tests that assigning to globals affects callers. 2170 2171strings 2172 2173: Tests strings. 2174 2175strings2 2176 2177: Tests string allocation in slabs, to ensure slabs work. 2178 2179letters 2180 2181: Tests single and double letter numbers to ensure they behave differently. 2182 Single-letter numbers always be set to the same value, regardless of 2183 `ibase`. 2184 2185exponent 2186 2187: Tests the `e()` function in the math library. 2188 2189log 2190 2191: Tests the `l()` function in the math library. 2192 2193pi 2194 2195: Tests that `bc` produces the right value of pi for numbers with varying 2196 `scale` values. 2197 2198arctangent 2199 2200: Tests the `a()` function in the math library. 2201 2202sine 2203 2204: Tests the `s()` function in the math library. 2205 2206cosine 2207 2208: Tests the `c()` function in the math library. 2209 2210bessel 2211 2212: Tests the `j()` function in the math library. 2213 2214fib 2215 2216: Tests the `fib()` Fibonacci function in the extended math library. 2217 2218arrays 2219 2220: Test arrays. 2221 2222misc 2223 2224: Miscellaneous tests. I named it this because at the time, I struggled to 2225 classify them, but it's really testing multi-line numbers. 2226 2227misc1 2228 2229: A miscellaneous test found by [AFL++][125]. 2230 2231misc2 2232 2233: A miscellaneous test found by [AFL++][125]. 2234 2235misc3 2236 2237: A miscellaneous test found by [AFL++][125]. 2238 2239misc4 2240 2241: A miscellaneous test found by [AFL++][125]. 2242 2243misc5 2244 2245: A miscellaneous test found by [AFL++][125]. 2246 2247misc6 2248 2249: A miscellaneous test found by [AFL++][125]. 2250 2251misc7 2252 2253: A miscellaneous test found by [AFL++][125]. 2254 2255void 2256 2257: Tests void functions. 2258 2259rand 2260 2261: Tests the pseudo-random number generator and its special stack handling. 2262 2263recursive_arrays 2264 2265: Tested the slab vector undo ability in used in `bc_parse_name()` when it 2266 existed. Now used as a stress test. 2267 2268divmod 2269 2270: Tests divmod. 2271 2272modexp 2273 2274: Tests modular exponentiation. 2275 2276bitfuncs 2277 2278: Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and 2279 `brshift()` in [`gen/lib2.bc`][26]. 2280 2281leadingzero 2282 2283: Tests the leading zero functionality and the `plz*()` and `pnlz*()` 2284 functions in [`gen/lib2.bc`][26]. 2285 2286is_number 2287 2288: Tests the `is_number()` built-in function. 2289 2290is_string 2291 2292: Tests the `is_number()` built-in function. 2293 2294asciify_array 2295 2296: Tests the ability of `asciify()` to convert an array into a full string. 2297 2298line_by_line1 2299 2300: Tests the line-by-line behavior of `bc` with regards to `quit` in a function 2301 definition. 2302 2303line_by_line2 2304 2305: Tests the line-by-line behavior of `bc` with regards to `quit`. 2306 2307line_loop_quit1 2308 2309: Tests the behavior of `bc` with a `quit` after a single-line loop. 2310 2311line_loop_quit2 2312 2313: Tests the behavior of `bc` with a `quit` after a single-line loop and a 2314 newline escape. 2315 2316#### `dc` Standard Tests 2317 2318The list of current (27 February 2023) standard tests for `dc` is below: 2319 2320decimal 2321 2322: Tests decimal parsing and printing. 2323 2324length 2325 2326: Tests the `length()` builtin function, including for strings and arrays. 2327 2328stack_len 2329 2330: Tests taking the length of the results stack. 2331 2332exec_stack_len 2333 2334: Tests taking the length of the execution stack. 2335 2336add 2337 2338: Tests addition. 2339 2340subtract 2341 2342: Tests subtraction. 2343 2344multiply 2345 2346: Tests multiplication. 2347 2348divide 2349 2350: Tests division. 2351 2352modulus 2353 2354: Tests modulus. 2355 2356divmod 2357 2358: Tests divmod. 2359 2360power 2361 2362: Tests power (exponentiation). 2363 2364sqrt 2365 2366: Tests the `sqrt()` (square root) builtin function. 2367 2368modexp 2369 2370: Tests modular exponentiation. 2371 2372boolean 2373 2374: Tests boolean operators. 2375 2376negate 2377 2378: Tests negation as a command and as part of numbers. 2379 2380trunc 2381 2382: Tests the truncation (`$`) operator. 2383 2384places 2385 2386: Tests the places (`@`) operator. 2387 2388shift 2389 2390: Tests the left (`<<`) and right (`>>`) shift operators. 2391 2392abs 2393 2394: Tests the `abs()` builtin function (the `b` command). 2395 2396scientific 2397 2398: Tests scientific notation. 2399 2400engineering 2401 2402: Tests engineering notation. 2403 2404vars 2405 2406: Tests some usage of variables. This one came from [AFL++][125] I think. 2407 2408misc 2409 2410: Miscellaneous tests. I named it this because at the time, I struggled to 2411 classify them. 2412 2413misc1 2414 2415: More miscellaneous tests. This used to be an error file 2416 (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character. 2417 However, starting with version `6.1.0`, `u` became a valid command 2418 (`is_number()`), so the error file became valid. It was checked manually and 2419 moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`. 2420 2421strings 2422 2423: Tests strings. 2424 2425rand 2426 2427: Tests the pseudo-random number generator and its special stack handling. 2428 2429is_number 2430 2431: Tests the `is_number()` built-in function (the `u` command). 2432 2433is_string 2434 2435: Tests the `is_number()` built-in function (the `t` command). 2436 2437### Script Tests 2438 2439The heavy lifting of testing the scripting of `bc` is done by the "script tests" 2440for each calculator. 2441 2442These tests use the files in the [`tests/bc/scripts/`][169] and 2443[`tests/dc/scripts/`][170] directories (except for 2444[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which 2445are called the "script test directories." 2446 2447To add a script test, do the following: 2448 2449* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory). 2450* Add the results file (`<test>.txt` in the script tests directory). You can 2451 skip this step if just the results file needs to be generated. See the 2452 [Generated Tests][147] section for more information. 2453* Add the name of the test to the `all.txt` file in the script tests directory, 2454 putting it in the order it should be in. If possible, I would put longer tests 2455 near the beginning because they will start running earlier with parallel 2456 `make`. 2457 2458Some script tests need to be skipped in certain cases. That is handled by the 2459[build system][142]. See the [Integration with the Build System][147] section 2460for more details. 2461 2462Another unique thing about the script tests, at least for `bc`: they test the 2463`-g` and `--global-stacks` flags. This means that all of the script tests for 2464`bc` are written assuming the `-g` flag was given on the command-line 2465 2466There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed` 2467script is used to remove an incompatibility with GNU `bc`. 2468 2469If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU 2470`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is 2471correct according to my reading of the `bc` spec, so my `bc` doesn't as well. 2472 2473The `sed` script edits numbers that end with just one digit on a line by itself 2474to put it on the same line as others. 2475 2476#### `bc` Script Tests 2477 2478The list of current (27 February 2023) script tests for `bc` is below: 2479 2480print.bc 2481 2482: Tests printing even harder than the print standard test. 2483 2484multiply.bc 2485 2486: Tests multiplication even harder than the multiply standard test. 2487 2488divide.bc 2489 2490: Tests division even harder than the divide standard test. 2491 2492subtract.bc 2493 2494: Tests subtraction even harder than the subtract standard test. 2495 2496add.bc 2497 2498: Tests addition even harder than the add standard test. 2499 2500parse.bc 2501 2502: Tests parsing even harder than the parse standard test. 2503 2504root.bc 2505 2506: Tests that `root()` and `cbrt()` do not go into an infinite loop on a 2507 pathological case found by a user. 2508 2509array.bc 2510 2511: Tests arrays even harder than the arrays standard test. 2512 2513array2.bc 2514 2515: Implements a test where an array element is passed as a parameter to a 2516 function, and then another array is passed to a later parameter that is 2517 named the same as the first array. This was added because of a bug found 2518 while writing a script in bc. 2519 2520atan.bc 2521 2522: Tests arctangent even harder than the arctangent standard test. 2523 2524bessel.bc 2525 2526: Tests bessel even harder than the bessel standard test. 2527 2528functions.bc 2529 2530: Tests functions even harder than the functions standard test. 2531 2532globals.bc 2533 2534: Tests global stacks directly. 2535 2536len.bc 2537 2538: Tests the `length()` builtin on arrays. 2539 2540rand.bc 2541 2542: Tests the random number generator in the presence of global stacks. 2543 2544references.bc 2545 2546: Tests functions with array reference parameters. 2547 2548screen.bc 2549 2550: A random script provided by an early user that he used to calculate the size 2551 of computer screens 2552 2553strings2.bc 2554 2555: Tests escaping in strings. 2556 2557ifs.bc 2558 2559: Tests proper ending of `if` statements without `else` statements. 2560 2561ifs2.bc 2562 2563: More tests proper ending of `if` statements without `else` statements. 2564 2565#### `dc` Script Tests 2566 2567The list of current (27 February 2023) script tests for `dc` is below: 2568 2569prime.dc 2570 2571: Tests scripting by generating the first 100,000 primes. 2572 2573asciify.dc 2574 2575: Tests the asciify command. 2576 2577stream.dc 2578 2579: Tests the stream command. 2580 2581array.dc 2582 2583: Tests arrays. 2584 2585else.dc 2586 2587: Tests else clauses on conditional execution commands. 2588 2589factorial.dc 2590 2591: Tests scripting with factorial. 2592 2593loop.dc 2594 2595: Tests scripting by implementing loops. 2596 2597quit.dc 2598 2599: Tests the quit command in the presence of tail calls. 2600 2601weird.dc 2602 2603: A miscellaneous test. 2604 2605no_clamp.dc 2606 2607: A test to ensure `dc` has the same behavior as the BSD `dc` with digi 2608 clamping off when parsing numbers. 2609 2610### Error Tests 2611 2612One of the most useful parts of the `bc` test suite, in my opinion, is the heavy 2613testing of error conditions. 2614 2615Just about every error condition I can think of is tested, along with many 2616machine-generated (by [AFL++][125]) ones. 2617 2618However, because the error tests will often return error codes, they require 2619different infrastructure from the rest of the test suite, which assumes that 2620the calculator under test will return successfully. A lot of that infrastructure 2621is in the [`scripts/functions.sh`][105] script, but it basically allows the 2622calculator to exit with an error code and then tests that there *was* an error 2623code. 2624 2625Besides returning error codes, error tests also ensure that there is output from 2626`stderr`. This is to make sure that an error message is always printed. 2627 2628The error tests for each calculator are spread through two directories, due to 2629historical accident. These two directories are the standard test directory (see 2630the [Standard Tests][149] section) and the `errors/` directory directly 2631underneath the standard tests directory. 2632 2633This split is convenient, however, because the tests in each directory are 2634treated differently. 2635 2636The error tests in the standard test directory, which include `errors.txt` for 2637both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`, 2638are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the 2639data through `stdin`. Each line is considered a separate test. For this reason, 2640there can't be any blank lines in the error files in the standard tests 2641directory because a blank line causes a successful exit. 2642 2643On the other hand, the tests in the `errors/` directory below the standard tests 2644directory are run by [`tests/error.sh`][227] and are considered to be one test 2645per file. As such, they are used differently. They are shoved into the 2646calculator through `stdin`, but they are also executed by passing them on the 2647command-line. 2648 2649To add an error test, first figure out which kind you want. 2650 2651Is it a simple one-liner, and you don't care if it's tested through a file? 2652 2653Then put it in one of the error files in the standard test directory. I would 2654only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()` 2655errors in the `read_errors.txt` file for `dc`; all others I would put in the 2656respective `errors.txt` file. 2657 2658On the other hand, if you care if the error is run as a file on the 2659command-line, or the error requires multiple lines to reproduce, then put the 2660test in the respective `errors/` directory and run the [`configure.sh`][69] 2661script again. 2662 2663After that, you are done; the test suite will automatically pick up the new 2664test, and you don't have to tell the test suite the expected results. 2665 2666### `stdin` Tests 2667 2668The `stdin` tests specifically test the lexing and parsing of multi-line 2669comments and strings. This is important because when reading from `stdin`, the 2670calculators can only read one line at a time, so partial parses are possible. 2671 2672To add `stdin` tests, just add the tests to the `stdin.txt` file in the 2673respective standard tests directory, and add the expected results in the 2674`stdin_results.txt` in the respective standard tests directory. 2675 2676### `read()` Tests 2677 2678The `read()` tests are meant to test the `read()` builtin function, to ensure 2679that the parsing and execution is correct. 2680 2681Each line is one test, as that is the nature of using the `read()` function, so 2682to add a test, just add it as another line in the `read.txt` file in the 2683respective standard tests directory, and add its result to the 2684`read_results.txt` file in the respective standard tests directory. 2685 2686### Other Tests 2687 2688The "other" tests are just random tests that I could not easily classify under 2689other types of tests. They usually include things like command-line parsing and 2690environment variable testing. 2691 2692To add an other test, it requires adding the programming for it to 2693[`tests/other.sh`][195] because all of the tests are written specifically in 2694that script. It would be best to use the infrastructure in 2695[`scripts/functions.sh`][105]. 2696 2697### Linux `timeconst.bc` Script 2698 2699One special script that `bc`'s test suite will use is the [Linux `timeconst.bc` 2700script][6]. 2701 2702I made the test suite able to use this script because the reason the 2703[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted 2704to be sure that it would run correctly on the script. 2705 2706However, it is not part of the distribution, nor is it part of the repository. 2707The reason for this is because [`timeconst.bc`][6] is under the GPL, while this 2708repo is under a BSD license. 2709 2710If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it 2711at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will 2712automatically run its tests; otherwise, it will skip it. 2713 2714### History Tests 2715 2716There are automatic tests for history; however, they have dependencies: Python 3 2717and [`pexpect`][137]. 2718 2719As a result, because I need the [test suite to be portable][138], like the rest 2720of `bc`, the history tests are carefully guarded with things to ensure that they 2721are skipped, rather than failing if Python and [`pexpect`][137] are not 2722installed. For this reason, there is a `sh` script, [`tests/history.sh`][140] 2723that runs the actual script, [`tests/history.py`][139]. 2724 2725I have added as many tests as I could to cover as many lines and branches as 2726possible. I guess I could have done more, but doing so would have required a lot 2727of time. 2728 2729I have tried to make it as easy as possible to run the history tests. They will 2730run automatically if you use the `make test_history` command, and they will also 2731use parallel execution with `make -j<cores> test_history`. 2732 2733However, the history tests are meant only to be run by maintainers of `bc`; they 2734are *not* meant to be run by users and packagers. The reason for this is that 2735they only seem to work reliably on Linux; `pexpect` seems to have issues on 2736other platforms, especially timeout issues. 2737 2738Thus, they are excluded from running with `make test` and [`tests/all.sh`][225]. 2739However, they can be run from the [`scripts/release.sh`][83] script. 2740 2741All of the tests are contained in [`tests/history.py`][139]. The reason for this 2742is because they are in Python, and I don't have an easy way of including Python 2743(or at the very least, I am not familiar enough with Python to do that). So they 2744are all in the same file to make it easier on me. 2745 2746Each test is one function in the script. They all take the same number and type 2747of arguments: 2748 27491. `exe`: the executable to run. 27502. `args`: the arguments to pass to the executable. 27513. `env`: the environment. 2752 2753Each function creates a child process with `pexpect.spawn` and then tests with 2754that child. Then the function returns the child to the caller, who closes it 2755and checks its error code against its expected error code. 2756 2757Yes, the error code is not a success all the time. This is because of the UTF-8 2758tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc` 2759is required to handle, per the [standard][2]. 2760 2761So in [`tests/history.py`][139], there are four main arrays: 2762 2763* `bc` test functions, 2764* `bc` expected error codes. 2765* `dc` test functions. 2766* `dc` expected error codes. 2767 2768[`tests/history.py`][139] takes an index as an argument; that index is what test 2769it should run. That index is used to index into the proper test and error code 2770array. 2771 2772If you need to add more history tests, you need to do the following: 2773 27741. Add the function for that test to [`tests/history.py`][139]. 27752. Add the function to the proper array of tests. 27763. Add the expected error code to the proper array of error codes. 27774. Add a target for the test to [`Makefile.in`][70]. 27785. Add that target as a prerequisite to either `test_bc_history` or 2779 `test_dc_history`. 2780 2781You do not need to do anything to add the test to `history_all_tests` (see 2782[Group Tests][141] below) because the scripts will automatically run all of the 2783tests properly. 2784 2785### Generated Tests 2786 2787Some tests are *large*, and as such, it is impractical to check them into `git`. 2788Instead, the tests depend on the existence of a GNU-compatible `bc` in the 2789`PATH`, which is then used to generate the tests. 2790 2791If [`configure.sh`][69] was run with the `-G` argument, which disables generated 2792tests, then `make test` and friends will automatically skip generated tests. 2793This is useful to do on platforms that are not guaranteed to have a 2794GNU-compatible `bc` installed. 2795 2796However, adding a generated test is a complicated because you have to figure out 2797*where* you want to put the file to generate the test. 2798 2799For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to 2800generate a `<test>_results.txt` file in the [standard tests][149] directory 2801(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no 2802`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s 2803test suite will look for a `<test>.bc` or `<test>.dc` file in the [script 2804tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if 2805that exists, it will use that script to generate the `<test>.txt` file in the 2806[standard tests][149] directory after which it will generate the 2807`<test>_results.txt` file in the [standard tests][149] directory. 2808 2809So you can choose to either: 2810 2811* Have a test in the [standard tests][149] directory without a corresponding 2812 `*_results.txt` file, or 2813* Have a script in the [script tests][150] directory to generate the 2814 corresponding file in the standard test directory before generating the 2815 corresponding `*_results.txt` file. 2816 2817Adding a script has a double benefit: the script itself can be used as a test. 2818However, script test results can also be generated. 2819 2820If `bc` is asked to run a script test, then if the script does not exist, `bc`'s 2821test suite returns an error. If it *does* exist, but no corresponding 2822`<test>.txt` file exists in the [script tests][150] directory, then a 2823GNU-compatible `bc` is used to generate the `<test>.txt` results file. 2824 2825If generated tests are disabled through [`configure.sh`][69], then these tests 2826are not generated if they do not exist. However, if they *do* exist, then they 2827are run. This can happen if a `make clean_tests` was not run between a build 2828that generated tests and a build that will not. 2829 2830### Group Tests 2831 2832While the test suite has a lot of targets in order to get parallel execution, 2833there are five targets that allow you to run each section, or all, of the test 2834suite as one unit: 2835 2836* `bc_all_tests` (`bc` tests) 2837* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests) 2838* `dc_all_tests` (`dc` tests) 2839* `history_all_tests` (history tests) 2840* `run_all_tests` (combination of the previous four) 2841 2842In addition, there are more fine-grained targets available: 2843 2844* `test_bc` runs all `bc` tests (except history tests). 2845* `test_dc` runs all `dc` tests (except history tests). 2846* `test_bc_tests` runs all `bc` [standard tests][149]. 2847* `test_dc_tests` runs all `dc` [standard tests][149]. 2848* `test_bc_scripts` runs all `bc` [script tests][150]. 2849* `test_dc_scripts` runs all `dc` [script tests][150]. 2850* `test_bc_stdin` runs the `bc` [`stdin` tests][152]. 2851* `test_dc_stdin` runs the `dc` [`stdin` tests][152]. 2852* `test_bc_read` runs the `bc` [`read()` tests][153]. 2853* `test_dc_read` runs the `dc` [`read()` tests][153]. 2854* `test_bc_errors` runs the `bc` [error tests][151]. 2855* `test_dc_errors` runs the `dc` [error tests][151]. 2856* `test_bc_other` runs the `bc` [other tests][151]. 2857* `test_dc_other` runs the `dc` [other tests][151]. 2858* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6]. 2859* `test_history` runs all history tests. 2860* `test_bc_history` runs all `bc` history tests. 2861* `test_dc_history` runs all `dc` history tests. 2862 2863All of the above tests are parallelizable. 2864 2865### Individual Tests 2866 2867In addition to all of the above, individual test targets are available. These 2868are mostly useful for attempting to fix a singular test failure. 2869 2870These tests are: 2871 2872* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149]. 2873 The name is the name of the test file without the `.txt` extension. It is the 2874 name printed by the test suite when running the test. 2875* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149]. 2876 The name is the name of the test file without the `.txt` extension. It is the 2877 name printed by the test suite when running the test. 2878* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script 2879 test][150]. The name of the test is the name of the script without the `.bc` 2880 extension. 2881* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script 2882 test][150]. The name of the test is the name of the script without the `.dc` 2883 extension. 2884* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`. 2885* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`. 2886 2887### [`bcl`][156] Test 2888 2889When [`bcl`][156] is built, the [build system][142] automatically ensures that 2890`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests. 2891 2892There is only one test, and it is built from [`tests/bcl.c`][158]. 2893 2894The reason the test is in C is because [`bcl`][156] is a C library; I did not 2895want to have to write C code *and* POSIX `sh` scripts to run it. 2896 2897The reason there is only one test is because most of the code for the library is 2898tested by virtue of testing `bc` and `dc`; the test needs to only ensure that 2899the library bindings and plumbing do not interfere with the underlying code. 2900 2901However, just because there is only one test does not mean that it doesn't test 2902more than one thing. The code actually handles a series of tests, along with 2903error checking to ensure that nothing went wrong. 2904 2905To add a [`bcl`][156] test, just figure out what test you want, figure out where 2906in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as 2907much error checking as possible, and use the `err(BclError)` function. Ensure 2908that all memory is freed because that test is run through [Valgrind][159] and 2909[AddressSanitizer][160]. 2910 2911### Integration with the Build System 2912 2913If it was not obvious by now, the test suite is heavily integrated into the 2914[build system][142], but the integration goes further than just making the test 2915suite easy to run from `make` and generating individual and group tests. 2916 2917The big problem the test suite has is that some `bc` code, stuff that is 2918important to test, is only in *some* builds. This includes all of the extra math 2919extensions, for example. 2920 2921So the test suite needs to have some way of turning off the tests that depend on 2922certain [build types][81] when those [build types][81] are not used. 2923 2924This is the reason the is tightly integrated with the [build system][142]: the 2925[build system][142] knows what [build type][81] was used and can tell the test 2926suite to turn off the tests that do not apply. 2927 2928It does this with arguments to the test scripts that are either a `1` or a `0`, 2929depending on whether tests of that type should be enabled or not. These 2930arguments are why I suggest, in the [Test Scripts][148] section, to always use a 2931`make` target to run the test suite or any individual test. I have added a lot 2932of targets to make this easy and as fast as possible. 2933 2934In addition to all of that, the build system is responsible for selecting the 2935`bc`/`dc` tests or the [`bcl` test][157]. 2936 2937### Output Directories 2938 2939During any run of the test suite, the test suite outputs the results of running 2940various tests to files. These files are usually output to `tests/bc_outputs/` 2941and `tests/dc_outputs/`. 2942 2943However, in some cases, it may be necessary to output test results to a 2944different directory. If that is the case, set the environment variable 2945`BC_TEST_OUTPUT_DIR` to the name of the directory. 2946 2947If that is done, then test results will be written to 2948`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`. 2949 2950### Test Suite Portability 2951 2952The test suite is meant to be run by users and packagers as part of their 2953install process. 2954 2955This puts some constraints on the test suite, but the biggest is that the test 2956suite must be as [portable as `bc` itself][136]. 2957 2958This means that the test suite must be implemented in pure POSIX `make`, `sh`, 2959and C99. 2960 2961#### Test Scripts 2962 2963To accomplish the portability, the test suite is run by a bunch of `sh` scripts 2964that have the constraints laid out in [POSIX Shell Scripts][76]. 2965 2966However, that means they have some quirks, made worse by the fact that there are 2967[generated tests][143] and [tests that need to be skipped, but only 2968sometimes][147]. 2969 2970This means that a lot of the scripts take an awkward number and type of 2971arguments. Some arguments are strings, but most are integers, like 2972[`scripts/release.sh`][83]. 2973 2974It is for this reason that I do not suggest running the test scripts directly. 2975Instead, always use an appropriate `make` target, which already knows the 2976correct arguments for the test because of the [integration with the build 2977system][147]. 2978 2979### Test Coverage 2980 2981In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`. 2982 2983If you have them, run the following commands: 2984 2985``` 2986CC=gcc ./configure -gO3 -c 2987make -j<cores> 2988make coverage 2989``` 2990 2991Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in 2992parallel. If you try, you will get errors. And note that `CC=gcc` is used. 2993 2994After running those commands, you can open your web browser and open the 2995`index.html` file in the root directory of the repo. From there, you can explore 2996all of the coverage results. 2997 2998If you see lines or branches that you think you could hit with a manual 2999execution, do such manual execution, and then run the following command: 3000 3001``` 3002make coverage_output 3003``` 3004 3005and the coverage output will be updated. 3006 3007If you want to rerun `make coverage`, you must do a `make clean` and build 3008first, like this: 3009 3010``` 3011make clean 3012make -j<cores> 3013make coverage 3014``` 3015 3016Otherwise, you will get errors. 3017 3018If you want to run tests in parallel, you can do this: 3019 3020``` 3021make -j<cores> 3022make -j<cores> test 3023make coverage_output 3024``` 3025 3026and that will generate coverage output correctly. 3027 3028### [AddressSanitizer][21] and Friends 3029 3030To run the test suite under [AddressSanitizer][21] or any of its friends, use 3031the following commands: 3032 3033``` 3034CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m 3035make -j<cores> 3036make -j<cores> test 3037``` 3038 3039where `<sanitizer>` is the correct name of the desired sanitizer. There is one 3040exception to the above: `UndefinedBehaviorSanitizer` should be run on a build 3041that has zero optimization, so for `UBSan`, use the following commands: 3042 3043``` 3044CFLAGS="-fsanitize=undefined" ./configure -gO0 -m 3045make -j<cores> 3046make -j<cores> test 3047``` 3048 3049### [Valgrind][20] 3050 3051To run the test suite under [Valgrind][20], run the following commands: 3052 3053``` 3054./configure -gO3 -v 3055make -j<cores> 3056make -j<cores> test 3057``` 3058 3059It really is that easy. I have directly added infrastructure to the build system 3060and the test suite to ensure that if [Valgrind][20] detects any memory errors or 3061any memory leaks at all, it will tell the test suite infrastructure to report an 3062error and exit accordingly. 3063 3064## POSIX Shell Scripts 3065 3066There is a lot of shell scripts in this repository, and every single one of them 3067is written in pure [POSIX `sh`][72]. 3068 3069The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX 3070systems are only guaranteed to have a barebones implementation of `sh` 3071available. 3072 3073There are *many* snares for unwary programmers attempting to modify 3074[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or 3075any of the scripts in [`tests/`][77]. Here are some of them: 3076 30771. No `bash`-isms. 30782. Only POSIX standard utilities are allowed. 30793. Only command-line options defined in the POSIX standard for POSIX utilities 3080 are allowed. 30814. Only the standardized behavior of POSIX utilities is allowed. 30825. Functions return data by *printing* it. Using `return` sets their exit code. 3083 3084In other words, the script must only use what is standardized in the [`sh`][72] 3085and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes 3086things like `local` and the `[[ ]]` notation. 3087 3088These are *enormous* restrictions and must be tested properly. I put out at 3089least one release with a change to `configure.sh` that wasn't portable. That was 3090an embarrassing mistake. 3091 3092The lack of `local`, by the way, is why variables in functions are named with 3093the form: 3094 3095``` 3096_<function_name>_<var_name> 3097``` 3098 3099This is done to prevent any clashes of variable names with already existing 3100names. And this applies to *all* shell scripts. However, there are a few times 3101when that naming convention is *not* used; all of them are because those 3102functions are required to change variables in the global scope. 3103 3104### Maintainer-Only Scripts 3105 3106If a script is meant to be used for maintainers (of `bc`, not package 3107maintainers), then rules 2, 3, and 4 don't need to be followed as much because 3108it is assumed that maintainers will be able to install whatever tools are 3109necessary to do the job. 3110 3111## Manuals 3112 3113The manuals for `bc` and `dc` are all generated, and the manpages for `bc`, 3114`dc`, and `bcl` are also generated. 3115 3116Why? 3117 3118I don't like the format of manpages, and I am not confident in my ability to 3119write them. Also, they are not easy to read on the web. 3120 3121So that explains why `bcl`'s manpage is generated from its markdown version. But 3122why are the markdown versions of the `bc` and `dc` generated? 3123 3124Because the content of the manuals needs to change based on the [build 3125type][81]. For example, if `bc` was built with no history support, it should not 3126have the **COMMAND LINE HISTORY** section in its manual. If it did, that would 3127just confuse users. 3128 3129So the markdown manuals for `bc` and `dc` are generated from templates 3130([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there, 3131the manpages are generated from the generated manuals. 3132 3133The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version 3134control, and the generated markdown manuals and manpages for `bc` 3135([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well. 3136 3137This is because generating the manuals and manpages requires a heavy dependency 3138that only maintainers should care about: [Pandoc][92]. Because users [should not 3139have to install *any* dependencies][136], the files are generated, checked into 3140version control, and included in distribution tarballs. 3141 3142If you run [`configure.sh`][69], you have an easy way of generating the markdown 3143manuals and manpages: just run `make manpages`. This target calls 3144[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`. 3145 3146For more on how generating manuals and manpages works, see 3147[`scripts/manpage.sh`][60]. 3148 3149## Locales 3150 3151The locale system of `bc` is enormously complex, but that's because 3152POSIX-compatible locales are terrible. 3153 3154How are they terrible? 3155 3156First, `gencat` does not work for generating cross-compilation. In other words, 3157it does not generate machine-portable files. There's nothing I can do about 3158this except for warn users. 3159 3160Second, the format of `.msg` files is...interesting. Thank goodness it is text 3161because otherwise, it would be impossible to get them right. 3162 3163Third, `.msg` files are not used. In other words, `gencat` exists. Why? 3164 3165Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales. 3166 3167Yes, where and *how*. 3168 3169Obviously, from it's name, it's a path, and that's the where. The *how* is more 3170complicated. 3171 3172It's actually *not* a path, but a path template. It's a format string, and it 3173can have a few format specifiers. For more information on that, see [this 3174link][84]. But in essence, those format specifiers configure how each locale is 3175supposed to be installed. 3176 3177With all those problems, why use POSIX locales? Portability, as always. I can't 3178assume that `gettext` will be available, but I *can* pretty well assume that 3179POSIX locales will be available. 3180 3181The locale system of `bc` includes all files under [`locales/`][85], 3182[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88], 3183[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131], 3184and the parts of the build system needed to activate it. There is also code in 3185[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale. 3186 3187If the order of error messages and/or categories are changed, the order of 3188errors must be changed in the enum, the default error messages and categories in 3189[`src/data.c`][131], and all of the messages and categories in the `.msg` files 3190under [`locales/`][85]. 3191 3192## Static Analysis 3193 3194I do *some* static analysis on `bc`. 3195 3196I used to use [Coverity][196], but I stopped using it when it started giving me 3197too many false positives and also because it had a vulnerability. 3198 3199However, I still use the [Clang Static Analyzer][197] through 3200[`scan-build`][19]. I only use it in debug mode because I have to add some 3201special code to make it not complain about things that are definitely not a 3202problem. 3203 3204The most frequent example of false positives is where a local is passed to a 3205function to be initialized. [`scan-build`][19] misses that fact, so I 3206pre-initialize such locals to prevent the warnings. 3207 3208To run `scan-build`, do the following: 3209 3210``` 3211make clean 3212scan-build make 3213``` 3214 3215`scan-build` will print its warnings to `stdout`. 3216 3217## Fuzzing 3218 3219The quality of this `bc` is directly related to the amount of fuzzing I did. As 3220such, I spent a lot of work making the fuzzing convenient and fast, though I do 3221admit that it took me a long time to admit that it did need to be faster. 3222 3223First, there were several things which make fuzzing fast: 3224 3225* Using [AFL++][125]'s deferred initialization. 3226* Splitting `bc`'s corpuses. 3227* Parallel fuzzing. 3228 3229Second, there are several things which make fuzzing convenient: 3230 3231* Preprepared input corpuses. 3232* [`scripts/fuzz_prep.sh`][119]. 3233* `tmux` and `tmuxp` configs. 3234* [`scripts/afl.py`][94]. 3235 3236### Fuzzing Performance 3237 3238Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as 3239fast as possible is well worth the time. 3240 3241However, there is a caveat to the above: it is easy to make [AFL++][125] crash, 3242be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the 3243performance enhancements are done poorly. 3244 3245To stop [AFL++][125] from crashing on test cases, and to be stable, these are 3246the requirements: 3247 3248* The state at startup must be *exactly* the same. 3249* The virtual memory setup at startup must be *exactly* the same. 3250 3251The first isn't too hard; it's the second that is difficult. 3252 3253`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less 3254than most programs.) After going through an execution run, however, some of that 3255memory, while it could be cleared and reset, is in different places because of 3256vectors. Since vectors reallocate, their allocations are not guaranteed to be in 3257the same place. 3258 3259So to make all three work, I had to set up the deferred initialization and 3260persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`, 3261which is probably what caused the stability to drop below 100%). However, using 3262deferred alone let me put the [AFL++][125] initialization further back. This 3263works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right 3264at that call. Thus, every run has the exact same virtual memory setup, and each 3265run can skip all of the setup code. 3266 3267I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent 3268mode, plus shared memory fuzzing. In order to do it safely, with stability above 326999%, all of that was actually *slower* than using just deferred initialization 3270with the initialization *right before* `stdin` was read. And as a bonus, the 3271stability in that situation is 100%. 3272 3273As a result, my [AFL++][125] setup only uses deferred initialization. That's the 3274`__AFL_INIT()` call. 3275 3276(Note: there is one more big item that must be done in order to have 100% 3277stability: the pseudo-random number generator *must* start with *exactly* the 3278same seed for every run. This is set up with the `tmux` and `tmuxp` configs that 3279I talk about below in [Convenience][130]. This seed is set before the 3280`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without 3281it, stability would be abysmal.) 3282 3283On top of that, while `dc` is plenty fast under fuzzing (because of a faster 3284parser and less test cases), `bc` can be slow. So I have split the `bc` input 3285corpus into three parts, and I set fuzzers to run on each individually. This 3286means that they will duplicate work, but they will also find more stuff. 3287 3288On top of all of that, each input corpus (the three `bc` corpuses and the one 3289`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two 3290reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs 3291recommend 4 parallel fuzzers, at least, to run different "power schedules." 3292 3293### Convenience 3294 3295The preprepared input corpuses are contained in the 3296`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories. 3297There are three `bc` directories and only one `dc` directory because `bc`'s 3298input corpuses are about three times as large, and `bc` is a larger program; 3299it's going to need much more fuzzing. 3300 3301(They do share code though, so fuzzing all of them still tests a lot of the same 3302math code.) 3303 3304The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It 3305assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it 3306automatically configures and builds `bc` with a fuzz-ideal build. 3307 3308A fuzz-ideal build has several things: 3309 3310* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].) 3311* Debug mode, to crash as easily as possible. 3312* Full optimization (including [Link-Time Optimization][126]), for performance. 3313* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above). 3314* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125] 3315 documentation for more information. 3316 3317There is one big thing that a fuzz-ideal build does *not* have: it does not use 3318[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if 3319it fails to allocate memory. I do not want to consider those as crashes because 3320my `bc` does, in fact, handle them gracefully by exiting with a set error code. 3321So `libdislocator.so` is not an option. 3322 3323However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in 3324`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120] 3325and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be 3326used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to 3327adjust the `start_directory` item, once it is adjusted, then using this command: 3328 3329``` 3330tmuxp load tests/fuzzing/bc_afl.yaml 3331``` 3332 3333will start fuzzing. 3334 3335In other words, to start fuzzing, the sequence is: 3336 3337``` 3338./scripts/fuzz_prep.sh 3339tmuxp load tests/fuzzing/bc_afl.yaml 3340``` 3341 3342Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on 3343`bc` and 4 on `dc`. The outputs will be put into the 3344`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories. 3345 3346(Note that loading that config will also delete all unsaved [AFL++][125] output 3347from the output directories.) 3348 3349Sometimes, [AFL++][125] will report crashes when there are none. When crashes 3350are reported, I always run the following command: 3351 3352``` 3353./scripts/afl.py <dir> 3354``` 3355 3356where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the 335716 instances reported the crash. If it was one of the first four (`bc11` through 3358`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I 3359use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`. 3360And if it was `dc`, I use `dc`. 3361 3362The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly 3363reported a crash or not. If so, it will copy the crashing test case to 3364`.test.txt` and tell you whether it was from running it as a file or through 3365`stdin`. 3366 3367From there, I personally always investigate the crash and fix it. Then, when the 3368crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as 3369an error test (if it produces an error) or I create a new 3370`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See 3371[Test Suite][124] for more information about the test suite.) In either case, 3372`<idx>` is the next number for a file in that particular place. For example, if 3373the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I 3374move `.test.txt` to `tests/bc/error/19.txt`. 3375 3376Then I immediately run [`scripts/afl.py`][94] again to find the next crash 3377because often, [AFL++][125] found multiple test cases that trigger the same 3378crash. If it finds another, I repeat the process until it is happy. 3379 3380Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and 3381restart fuzzing. Why do I restart instead of continuing? Because with the 3382changes, the test outputs could be stale and invalid. 3383 3384However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds 3385that every crash reported by [AFL++][125] is invalid. If that's the case, I can 3386just continue with the command: 3387 3388``` 3389tmuxp load tests/fuzzing/bc_afl_continue.yaml 3390``` 3391 3392(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is 3393still running, so often, I don't find a need to continue since there was no 3394stop. However, the capability is there, if needed.) 3395 3396In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs, 3397automatically set up [AFL++][125] power schedules (see [Fuzzing 3398Performance][127] above). They also set up the parallel fuzzing such that there 3399is one fuzzer in each group of 4 that does deterministic fuzzing. It's always 3400the first one in each group. 3401 3402For more information about deterministic fuzzing, see the [AFL++][125] 3403documentation. 3404 3405### Corpuses 3406 3407I occasionally add to the input corpuses. These files come from new files in the 3408[Test Suite][124]. In fact, I use soft links when the files are the same. 3409 3410However, when I add new files to an input corpus, I sometimes reduce the size of 3411the file by removing some redundancies. 3412 3413And then, when adding to the `bc` corpuses, I try to add them evenly so that 3414each corpus will take about the same amount of time to get to a finished state. 3415 3416### [AFL++][125] Quickstart 3417 3418The way [AFL++][125] works is complicated. 3419 3420First, it is the one to invoke the compiler. It leverages the compiler to add 3421code to the binary to help it know when certain branches are taken. 3422 3423Then, when fuzzing, it uses that branch information to generate information 3424about the "path" that was taken through the binary. 3425 3426I don't know what AFL++ counts as a new path, but each new path is added to an 3427output corpus, and it is later used as a springboard to find new paths. 3428 3429This is what makes AFL++ so effective: it's not just blindly thrashing a binary; 3430it adapts to the binary by leveraging information about paths. 3431 3432### Fuzzing Runs 3433 3434For doing a fuzzing run, I expect about a week or two where my computer is 3435basically unusable, except for text editing and light web browsing. 3436 3437Yes, it can take two weeks for me to do a full fuzzing run, and that does *not* 3438include the time needed to find and fix crashes; it only counts the time on the 3439*last* run, the one that does not find any crashes. This means that the entire 3440process can take a month or more. 3441 3442What I use as an indicator that the fuzzing run is good enough is when the 3443number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer 3444instances, except maybe the deterministic instances, is below 50. And even then, 3445I try to let deterministic instances get that far as well. 3446 3447You can see how many pending paths are left in the "path geometry" section of 3448the [AFL++][125] dashboard. 3449 3450Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with 3451`Ctrl+c` or some other method. It will not quit until you tell it to. 3452 3453### Radamsa 3454 3455I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened 3456once. 3457 3458The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier 3459to use with varying command-line arguments, which was needed for testing `bc`'s 3460command-line expression parsing code, and [AFL++][125] is best when testing 3461input from `stdin`. 3462 3463[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but 3464it's not as effective at that, so I don't really use it for that either. 3465 3466[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have 3467not had to touch the command-line expression parsing code since. 3468 3469### [AddressSanitizer][21] with Fuzzing 3470 3471One advantage of using [AFL++][125] is that it saves every test case that 3472generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete 3473them when the user makes it quit. 3474 3475Keeping them around is not a good idea, for several reasons: 3476 3477* They are frequently large. 3478* There are a lot of them. 3479* They go stale; after `bc` is changed, the generated paths may not be valid 3480 anymore. 3481 3482However, before they are deleted, they can definitely be leveraged for even 3483*more* bug squashing by running *all* of the paths through a build of `bc` with 3484[AddressSanitizer][21]. 3485 3486This can easily be done with these four commands: 3487 3488``` 3489./scripts/fuzz_prep.sh -a 3490./scripts/afl.py --asan bc1 3491./scripts/afl.py --asan bc2 3492./scripts/afl.py --asan bc3 3493./scripts/afl.py --asan dc 3494``` 3495 3496(By the way, the last four commands could be run in separate terminals to do the 3497processing in parallel.) 3498 3499These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they 3500run `bc` and `dc` on all of the found crashes and path output corpuses. This is 3501to check that no path or crash has found any memory errors, including memory 3502leaks. 3503 3504Because the output corpuses can contain test cases that generate infinite loops 3505in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far 3506greater than the timeout that [AFL++][125] uses and should be enough to catch 3507any crash. 3508 3509If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory 3510errors on the outputs of [AFL++][125], that is an excellent indicator of very 3511few bugs in `bc`, and a release can be made with confidence. 3512 3513## Code Concepts 3514 3515This section is about concepts that, if understood, will make it easier to 3516understand the code as it is written. 3517 3518The concepts in this section are not found in a single source file, but they are 3519littered throughout the code. That's why I am writing them all down in a single 3520place. 3521 3522### POSIX Mode 3523 3524POSIX mode is `bc`-only. 3525 3526In fact, POSIX mode is two different modes: Standard Mode and Warning Mode. 3527These modes are designed to help users write POSIX-compatible `bc` scripts. 3528 3529#### Standard Mode 3530 3531Standard Mode is activated with the `-s` or `--standard` flags. 3532 3533In this mode, `bc` will error if any constructs are used that are not strictly 3534compatible with the [POSIX `bc` specification][2]. 3535 3536#### Warning Mode 3537 3538Warning Mode is activated with the `-w` or `--warn` flags. 3539 3540In this mode, `bc` will issue warnings, but continue, if any constructs are used 3541that are not strictly compatible with the [POSIX `bc` specification][2]. 3542 3543### Memory Management 3544 3545The memory management in `bc` is simple: everything is owned by one thing. 3546 3547If something is in a vector, it is owned by that vector. 3548 3549If something is contained in a struct, it is owned by that struct with one 3550exception: structs can be given pointers to other things, but only if those 3551other things will outlast the struct itself. 3552 3553As an example, the `BcParse` struct has a pointer to the one `BcProgram` in 3554`bc`. This is okay because the program is initialized first and deallocated 3555last. 3556 3557In other words, it's simple: if a field in a struct is a pointer, then unless 3558that pointer is directly allocated by the struct (like the vector array or the 3559number limb array), that struct does not own the item at that pointer. 3560Otherwise, the struct *does* own the item. 3561 3562### [Async-Signal-Safe][115] Signal Handling 3563 3564`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but 3565`bc` is, by and large, CPU-bound. This has several consequences, but the biggest 3566is that there is no easy way to allow signals to interrupt it. 3567 3568This consequence is not obvious, but it comes from the fact that a lot of I/O 3569operations can be interrupted and return [`EINTR`][198]. This makes such I/O 3570calls natural places for allowing signals to interrupt execution, even when the 3571signal comes during execution, and not interrupting I/O calls. The way this is 3572done is setting a flag in the signal handler, which is checked around the time 3573of the I/O call, when it is convenient. 3574 3575Alternatively, I/O bound programs can use the [self-pipe trick][199]. 3576 3577Neither of these are possible in `bc` because the execution of math code can 3578take a long time. If a signal arrives during this long execution time, setting a 3579flag like an I/O bound application and waiting until the next I/O call could 3580take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a 3581week to calculate a million digits of pi, and it's not slow as far as `bc` 3582implementations go.) 3583 3584Thus, using just the technique of setting the flag just will not work for an 3585interactive calculator. 3586 3587Well, it can, but it requires a lot of code and massive inefficiencies. I know 3588this because that was the original implementation. 3589 3590The original implementation set a flag and just exit the signal handler. Then, 3591on just about every loop header, I have a check for the signal flag. These 3592checks happened on every iteration of every loop. It was a massive waste because 3593it was polling, and [polling is evil][200]. 3594 3595So for version [3.0.0][32], I expended a lot of effort to change the 3596implementation. 3597 3598In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`) 3599to tell the signal handler whether it can use `longjmp()` to stop the current 3600execution. If so, it does. If not, it sets a flag, which then is used by the 3601code outside the signal handler that set the `vm.sig_lock` flag. When that code 3602unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that 3603code executes the `longjmp()` and stops the current execution. 3604 3605Other than that, the rest of the interrupt-based implementation is best 3606described in the [Error Handling][97]. 3607 3608However, there are rules for signal handlers that I must lay out. 3609 3610First, signal handlers can only call [async-signal-safe][115] functions. 3611 3612Second, any field set or read by both the signal handler and normal code must be 3613a `volatile sig_atomic_t`. 3614 3615Third, when setting such fields, they must be set to constants and no math can 3616be done on them. This restriction and the above restriction exist in order to 3617ensure that the setting of the fields is always atomic with respect to signals. 3618 3619These rules exist for *any* code using Unix signal handlers, not just `bc`. 3620 3621#### Vectors and Numbers 3622 3623Vectors and numbers needed special consideration with the interrupt-based signal 3624handling. 3625 3626When vectors and numbers are about to allocate, or *reallocate* their arrays, 3627they need to lock signals to ensure that they do not call `malloc()` and friends 3628and get interrupted by a signal because, as you will see in the [Error 3629Handling][97] section, `longjmp()` cannot be used in a signal handler if it may 3630be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and 3631friends. 3632 3633### Asserts 3634 3635If you asked me what procedure is used the most in `bc`, I would reply without 3636hesitation, "`assert()`." 3637 3638I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with 3639[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes, 3640and a failing `assert()` counts as one. 3641 3642So while a lot of bad bugs might have corrupted data and *not* caused crashes, 3643because I put in so many `assert()`'s, they were *turned into* crashing bugs, 3644and [AFL++][125] found them. 3645 3646By far, the most bugs it found this way was in the `bc` parser. (See the [`bc` 3647Parsing][110] for more information.) And even though I was careful to put 3648`assert()`'s everywhere, most parser bugs manifested during execution of 3649bytecode because the virtual machine assumes the bytecode is valid. 3650 3651Sidenote: one of those bugs caused an infinite recursion when running the sine 3652(`s()`) function in the math library, so yes, parser bugs can be *very* weird. 3653 3654Anyway, the way I did `assert()`'s was like this: whenever I realized that I 3655had put assumptions into the code, I would put an `assert()` there to test it 3656**and** to *document* it. 3657 3658Yes, documentation. In fact, by far the best documentation of the code in `bc` 3659is actually the `assert()`'s. The only time I would not put an `assert()` to 3660test an assumption is if that assumption was already tested by an `assert()` 3661earlier. 3662 3663As an example, if a function calls another function and passes a pointer that 3664the caller previously `assert()`'ed was *not* `NULL`, then the callee does not 3665have to `assert()` it too, unless *also* called by another function that does 3666not `assert()` that. 3667 3668At first glance, it may seem like putting asserts for pointers being non-`NULL` 3669everywhere would actually be good, but unfortunately, not for fuzzing. Each 3670`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on 3671how many branches it covers. If there are too many `assert()`'s, it may think 3672that it is not being effective and that more fuzzing is needed. 3673 3674This means that `assert()`'s show up most often in two places: function 3675preconditions and function postconditions. 3676 3677Function preconditions are `assert()`'s that test conditions relating to the 3678arguments a function was given. They appear at the top of the function, usually 3679before anything else (except maybe initializing a local variable). 3680 3681Function postconditions are `assert()`'s that test the return values or other 3682conditions when a function exits. These are at the bottom of a function or just 3683before a `return` statement. 3684 3685The other `assert()`'s cover various miscellaneous assumptions. 3686 3687If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to 3688document your assumptions. And don't remove them when [AFL++][125] gleefully 3689crashes `bc` and `dc` over and over again. 3690 3691### Vectors 3692 3693In `bc`, vectors mean resizable arrays, and they are the most fundamental piece 3694of code in the entire codebase. 3695 3696I had previously written a [vector implementation][112], which I used to guide 3697my decisions, but I wrote a new one so that `bc` would not have a dependency. I 3698also didn't make it as sophisticated; the one in `bc` is very simple. 3699 3700Vectors store some information about the type that they hold: 3701 3702* The size (as returned by `sizeof`). 3703* An enum designating the destructor. 3704 3705If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a 3706destructor. 3707 3708But by storing the size, the vector can then allocate `size * cap` bytes, where 3709`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again 3710and again until it is bigger than the requested size. 3711 3712But to store items, or to push items, or even to return items, the vector has to 3713figure out where they are, since to it, the array just looks like an array of 3714bytes. 3715 3716It does this by calculating a pointer to the underlying type with 3717`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the 3718desired element, and `size` is the size of the underlying type. 3719 3720Doing that, vectors can avoid undefined behavior (because `char` pointers can 3721be cast to any other pointer type), while calculating the exact position of 3722every element. 3723 3724Because it can do that, it can figure out where to push new elements by 3725calculating `v + (len * size)`, where `len` is the number of items actually in 3726the vector. 3727 3728By the way, `len` is different from `cap`. While `cap` is the amount of storage 3729*available*, `len` is the number of actual elements in the vector at the present 3730point in time. 3731 3732Growing the vector happens when `len` is equal to `cap` *before* pushing new 3733items, not after. 3734 3735To add a destructor, you need to add an enum item to `BcDtorType` in 3736[`include/vector.h`][174] and add the actual destructor in the same place as the 3737enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131]. 3738 3739#### Pointer Invalidation 3740 3741There is one big danger with the vectors as currently implemented: pointer 3742invalidation. 3743 3744If a piece of code receives a pointer from a vector, then adds an item to the 3745vector before they finish using the pointer, that code must then update the 3746pointer from the vector again. 3747 3748This is because any pointer inside the vector is calculated based off of the 3749array in the vector, and when the vector grows, it can `realloc()` the array, 3750which may move it in memory. If that is done, any pointer returned by 3751`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid. 3752 3753This fact was the single most common cause of crashes in the early days of this 3754`bc`; wherever I have put a comment about pointers becoming invalidated and 3755updating them with another call to `bc_vec_item()` and friends, *do **NOT** 3756remove that code!* 3757 3758#### Maps 3759 3760Maps in `bc` are...not. 3761 3762They are really a combination of two vectors. Those combinations are easily 3763recognized in the source because one vector is named `<name>s` (plural), and the 3764other is named `<name>_map`. 3765 3766There are currently three, all in `BcProgram`: 3767 3768* `fns` and `fn_map` (`bc` functions). 3769* `vars` and `var_map` (variables). 3770* `arrs` and `arr_map` (arrays). 3771 3772They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a 3773string and an index. The string is the name of the item, and the index is the 3774index of that item in the `<name>s` vector. 3775 3776Obviously, I could have just done a linear search for items in the `<name>s` 3777vector, but that would be slow with a lot of functions/variables/arrays. 3778Instead, I ensure that whenever an item is inserted into the `<name>_map` 3779vector, the item is inserted in sorted order. This means that the `<name>_map` 3780is always sorted (by the names of the items). 3781 3782So when looking up an item in the "map", what is really done is this: 3783 37841. A binary search is carried out on the names in the `<name>_map` vector. 37852. When one is found, it returns the index in the `<name>_map` vector where the 3786 item was found. 37873. This index is then used to retrieve the `BcId`. 37884. The index from the `BcId` is then used to index into the `<name>s` vector, 3789 which returns the *actual* desired item. 3790 3791Why were the `<name>s` and `<name>_map` vectors not combined for ease? The 3792answer is that sometime, when attempting to insert into the "map", code might 3793find that something is already there. For example, a function with that name may 3794already exist, or the variable might already exist. 3795 3796If the insert fails, then the name already exists, and the inserting code can 3797forego creating a new item to put into the vector. However, if there is no item, 3798the inserting code must create a new item and insert it. 3799 3800If the two vectors were combined together, it would not be possible to separate 3801the steps such that creating a new item could be avoided if it already exists. 3802 3803#### Slabs and Slab Vectors 3804 3805`bc` allocates *a lot* of small strings, and small allocations are the toughest 3806for general-purpose allocators to handle efficiently. 3807 3808Because of that reason, I decided to create a system for allocating small 3809strings using something that I call a "slab vector" after [slab 3810allocators][201]. 3811 3812These vectors allocate what I call "slabs," which are just an allocation of a 3813single page with a length to tell the slab how much of the slab is used. 3814 3815The vector itself holds slabs, and when the slab vector is asked to allocate a 3816string, it attempts to in the last slab. If that slab cannot do so, it allocates 3817a new slab and allocates from that. 3818 3819There is one exception: if a string is going to be bigger than 128 bytes, then 3820the string is directly allocated, and a slab is created with that pointer and a 3821length of `SIZE_MAX`, which tells the slab vector that it is a direct 3822allocation. Then, the last slab is pushed into the next spot and the new special 3823slab is put into the vacated spot. This ensures that a non-special slab is 3824always last. 3825 3826### Command-Line History 3827 3828When I first wrote `bc`, I immediately started using it in order to eat my own 3829dog food. 3830 3831It sucked, and the biggest reason why was because of the lack of command-line 3832history. 3833 3834At first, I just dealt with it, not knowing how command-line history might be 3835implemented. 3836 3837Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had 3838known about for some time. 3839 3840It turned out to be easier than I thought; the hardest part was the tedious 3841renaming of everything to fit the `bc` naming scheme. 3842 3843Understanding command-line history in `bc` is really about understanding VT-100 3844escape codes, so I would start there. 3845 3846Now, the history implementation of `bc` has been adapted far beyond that initial 3847adaptation to make the command-line history implementation perfect for `bc` 3848alone, including integrating it into `bc`'s [Custom I/O][114] and making sure 3849that it does not disturb output that did not end with a newline. 3850 3851On top of that, at one point, I attempted to get history to work on Windows. It 3852barely worked after a lot of work and a lot of portability code, but even with 3853all of that, it does not have at least one feature: multi-line pasting from the 3854clipboard. 3855 3856#### Editline and Readline 3857 3858I also implemented an integration with both editline and readline, though not at 3859the same time. This allows distributions that want to use either one in `bc` to 3860do so. This enables users to use their `.editrc` and `.inputrc` settings. 3861 3862The integration is custom to each library, and it does *not* involve any of 3863`bc`'s custom history code. It also required changes to `bc`'s [Custom 3864I/O][114]. 3865 3866### Error Handling 3867 3868The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it 3869became one of the things that taught me the most about C in particular and 3870programming in general. 3871 3872Before then, error handling was manual. Almost all functions returned a 3873`BcStatus` indicating if an error had occurred. This led to a proliferation of 3874lines like: 3875 3876``` 3877if (BC_ERR(s)) return s; 3878``` 3879 3880In fact, a quick and dirty count of such lines in version `2.7.2` (the last 3881version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line. 3882 3883And that didn't even guarantee that return values were checked *everywhere*. 3884 3885But before I can continue, let me back up a bit. 3886 3887From the beginning, I decided that I would not do what GNU `bc` does on errors; 3888it tries to find a point at which it can recover. Instead, I decided that I 3889would have `bc` reset to a clean slate, which I believed, would reduce the 3890number of bugs where an unclean state caused errors with continuing execution. 3891 3892So from the beginning, errors would essentially unwind the stack until they got 3893to a safe place from which to clean the slate, reset, and ask for more input. 3894 3895Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As 3896such, it had a signal handler that set a flag. But it could not safely interrupt 3897execution, so that's all it could do. 3898 3899In order to actually respond to the signal, I had to litter checks for the flag 3900*everywhere* in the code. And I mean *everywhere*. They had to be checked on 3901every iteration of *every* loop. They had to be checked going into and out of 3902certain functions. 3903 3904It was a mess. 3905 3906But fortunately for me, signals did the same thing that errors did: they unwound 3907the stack to the *same* place. 3908 3909Do you see where I am going with this? 3910 3911It turns out that what I needed was a [async-signal-safe][115] form of what 3912programmers call "exceptions" in other languages. 3913 3914I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement 3915exceptions, so I thought I would learn how to use them. How hard could it be? 3916 3917Quite hard, it turns out, especially in the presence of signals. And that's 3918because there are a few big snares: 3919 39201. The value of any local variables are not guaranteed to be preserved after a 3921 `longjmp()` back into a function. 39222. While `longjmp()` is required to be [async-signal-safe][115], if it is 3923 invoked by a signal handler that interrupted a non-[async-signal-safe][115] 3924 function, then the behavior is undefined. 39253. Any mutation that is not guaranteed to be atomic with respect to signals may 3926 be incomplete when a signal arrives. 3927 3928Oh boy. 3929 3930For number 1, the answer to this is to hide data that must stay changed behind 3931pointers. Only the *pointers* are considered local, so as long as I didn't do 3932any modifying pointer arithmetic, pointers and their data would be safe. For 3933cases where I have local data that must change and stay changed, I needed to 3934*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`. 3935 3936For number 2 and number 3, `bc` needs some way to tell the signal handler that 3937it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile 3938sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal 3939Handling][173] section.) For every function that calls a function that is not 3940async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and 3941afterward, use `BC_SIG_UNLOCK` to unlock signals. 3942 3943Code also need to do this for all global, non-atomic mutation, which means that 3944modifying any part of the `BcVm` global struct. 3945 3946`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and 3947jump if necessary. 3948 3949On top of all of that, *all* functions with cleanup needed to be able to run 3950their cleanup. This meant that `longjmp()` could not just jump to the finish; it 3951had to start what I call a "jump series," using a stack of `jmp_buf`'s 3952(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to 3953execute its jump. Then, if the cleanup code was executed because of a jump, the 3954cleanup code was responsible for continuing the jump series by popping the 3955previous item off the stack and using the new top of the stack for a jump. 3956 3957In this way, C++-style exceptions were implemented in pure C. Not fun, but it 3958works. However, the order of operations matters, especially in the macros that 3959help implement the error handling. 3960 3961For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals. 3962If a signal comes between, that's fine; it will still cause a jump to the right 3963place. However, disabling the lock after could mean that a signal could come 3964*after* checking for signals, but before signals were unlocked, causing the 3965handling of the signal to be delayed. 3966 3967#### Custom I/O 3968 3969Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and 3970`longjmp()` for error handling (see the [Error Handling][97] section), and the 3971buffered I/O in `libc` does not interact well with the use of those procedures; 3972all of the buffered I/O API is basically non-[async-signal-safe][115]. 3973 3974Implementing custom buffered I/O had other benefits. First, it allowed me to 3975tightly integrate history with the I/O code. Second, it allowed me to make 3976changes to history in order to make it adapt to user prompts. 3977 3978### Lexing 3979 3980To simplify parsing, both calculators use lexers to turn the text into a more 3981easily-parsable form. 3982 3983While some tokens are only one character long, others require many tokens, and 3984some of those need to store all of the text corresponding to the token for use 3985by the parsers. Tokens that need to store their corresponding text include, but 3986are not limited to: 3987 3988* Strings. 3989* Numbers. 3990* Identifiers. 3991 3992For this purpose, the lexer has a [vector][111] named `str` to store the data 3993for tokens. This data is overwritten if another token is lexed that needs to 3994store data, so the parsers need to copy the data before calling the lexer again. 3995 3996Both lexers do some of the same things: 3997 3998* Lex identifiers into tokens, storing the identifier in `str`. 3999* Lex number strings into tokens, storing the string in `str`. 4000* Lex whitespace. 4001* Lex comments. 4002 4003Other than that, and some common plumbing, the lexers have separate code. 4004 4005#### `dc` Lexing 4006 4007The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205], 4008[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each, 4009the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which 4010just processes command-line arguments after they are parsed by 4011[`src/opt.c`][51]. 4012 4013For most characters, the `dc` lexer is able to convert directly from the 4014character to its corresponding token. This happens using `dc_lex_tokens[]` in 4015[`src/data.c`][131]. 4016 4017`dc`'s lexer also has to lex the register name after lexing tokens for commands 4018that need registers. 4019 4020And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of 4021the `dc` lexer that is more complex than the `bc` lexer. This is because `dc` 4022strings need to have a balanced number of brackets. 4023 4024#### `bc` Lexing 4025 4026The `bc` lexer is fairly simple. It does the following things: 4027 4028* Lexes `bc` strings. 4029* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords 4030 are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer 4031 allows the common identifier function to take over. 4032* Turns characters and groups of characters into `bc` operator tokens. 4033 4034### Parsing 4035 4036The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is 4037simple, while the `bc` parser is the most complex piece of code in the entire 4038codebase. 4039 4040However, they both do some of the same things. 4041 4042First, the parsers do *not* use [abstract syntax trees][207]; instead, they 4043directly generate the bytecode that will be executed by the `BcProgram` code. 4044Even in the case of `bc`, this heavily simplifies the parsing because the 4045[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish 4046Notation][108], which is basically directly executable. 4047 4048Second, any extra data that the `BcProgram` needs for execution is stored into 4049functions (see the [Functions][208] section). These include constants and 4050strings. 4051 4052#### `dc` Parsing 4053 4054The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness 4055of lexing and parsing [Reverse Polish notation][108] is probably why it was used 4056for `dc` when it was first created at Bell Labs. 4057 4058For most tokens, the `dc` parser is able to convert directly from the token 4059to its corresponding instruction. This happens using `dc_parse_insts[]` in 4060[`src/data.c`][131]. 4061 4062`dc`'s parser also has to parse the register name for commands that need 4063registers. This is the most complex part of the `dc` parser; each different 4064register command needs to be parsed differently because most of them require two 4065or more instructions to execute properly. 4066 4067For example, storing in a register requires a swap instruction and an assignment 4068instruction. 4069 4070Another example are conditional execution instructions; they need to produce the 4071instruction for the condition, and then they must parse a possible "else" part, 4072which might not exist. 4073 4074##### Existing Commands 4075 4076`dc` is based on commands, which are usually one letter. The following table is 4077a table of which ASCII characters are already used: 4078 4079| Characters | Used? | For... | 4080|------------|-------|--------------------------------------------| 4081| Space | x | Separator | 4082| `!` | x | Conditional Execution of Registers | 4083| `"` | x | Bounded Rand Operator | 4084| `#` | x | Comments | 4085| `$` | x | Truncation | 4086| `%` | x | Modulus | 4087| `&` | | | 4088| `'` | x | Rand Operator | 4089| `(` | x | Greater Than Operator | 4090| `)` | x | Less Than Operator | 4091| `*` | x | Multiplication | 4092| `+` | x | Addition | 4093| `,` | x | Depth of Execution Stack | 4094| `-` | x | Subtraction | 4095| `.` | x | Numbers | 4096| `/` | x | Division | 4097| `0-9` | x | Numbers | 4098| `:` | x | Store into Array | 4099| `;` | x | Load from Array | 4100| `<` | x | Conditional Execution of Registers | 4101| `=` | x | Conditional Execution of Registers | 4102| `>` | x | Conditional Execution of Registers | 4103| `?` | x | Ask for User Input | 4104| `@` | x | Places Operator | 4105| `A-F` | x | Numbers | 4106| `G` | x | Equal Operator | 4107| `H` | x | Shift Left | 4108| `I` | x | Push `ibase` onto Stack | 4109| `J` | x | Push `seed` onto Stack | 4110| `K` | x | Push `scale` onto Stack | 4111| `L` | x | Pop off of Register | 4112| `M` | x | Boolean And Operator | 4113| `N` | x | Boolean Not Operator | 4114| `O` | x | Push `obase` onto Stack | 4115| `P` | x | Byte Stream Printing | 4116| `Q` | x | Quit Some Number of Macros | 4117| `R` | x | Pop Top of Stack | 4118| `S` | x | Push onto Register | 4119| `T` | x | Push Max `ibase` onto Stack | 4120| `U` | x | Push Max `obase` onto Stack | 4121| `V` | x | Push Max `scale` onto Stack | 4122| `W` | x | Push Max of `'` Operator | 4123| `X` | x | Scale of a Number | 4124| `Y` | x | Length of Array | 4125| `Z` | x | Number of Significant Digits | 4126| `[` | x | Strings | 4127| `\\` | x | Escaping Brackets in Strings | 4128| `]` | x | Strings | 4129| `^` | x | Power | 4130| `_` | x | Negative Numbers and Negation | 4131| Backtick | | | 4132| `a` | x | Asciify | 4133| `b` | x | Absolute Value | 4134| `c` | x | Clear Stack | 4135| `d` | x | Duplication of Top of Stack | 4136| `e` | x | Else in Conditional Execution of Registers | 4137| `f` | x | Printing the Stack | 4138| `g` | x | Global Settings | 4139| `h` | x | Shift Right | 4140| `i` | x | Set `ibase` | 4141| `j` | x | Set `seed` | 4142| `k` | x | Set `scale` | 4143| `l` | x | Load from Register | 4144| `m` | x | Boolean Or Operator | 4145| `n` | x | Print and Pop | 4146| `o` | x | Set `obase` | 4147| `p` | x | Print with Newline | 4148| `q` | x | Quit Two Macros | 4149| `r` | x | Swap Top Two Items | 4150| `s` | x | Store into Register | 4151| `t` | x | Equivalent of `bc`'s `is_number()` | 4152| `u` | x | Equivalent of `bc`'s `is_string()` | 4153| `v` | x | Square Root | 4154| `w` | | | 4155| `x` | x | Execute String | 4156| `y` | x | Current Depth of a Register | 4157| `z` | x | Current Depth of Stack | 4158| `{` | x | Greater Than or Equal Operator | 4159| `\|` | x | Moduler Exponentiation | 4160| `}` | x | Less Than or Equal Operator | 4161| `~` | x | Division and Modulus Combined | 4162 4163#### `bc` Parsing 4164 4165`bc`'s parser is, by far, the most sensitive piece of code in this software, and 4166there is a very big reason for that: `bc`'s standard is awful and defined a very 4167poor language. 4168 4169The standard says that either semicolons or newlines can end statements. Trying 4170to parse the end of a statement when it can either be a newline or a semicolon 4171is subtle. Doing it in the presence of control flow constructs that do not have 4172to use braces is even harder. 4173 4174And then comes the biggest complication of all: `bc` has to assume that it is 4175*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an 4176*interactive* utility. 4177 4178##### Flags 4179 4180All of this means that `bc` has to be able to partially parse something, store 4181enough data to recreate that state later, and return, making sure to not 4182execute anything in the meantime. 4183 4184*That* is what the flags in [`include/bc.h`][106] are: they are the state that 4185`bc` is saving for itself. 4186 4187It saves them in a stack, by the way, because it's possible to nest 4188structures, just like any other programming language. Thus, not only does it 4189have to store state, it needs to do it arbitrarily, and still be able to 4190come back to it. 4191 4192So `bc` stores its parser state with flags in a stack. Careful setting of these 4193flags, along with properly using them and maintaining the flag stack, are what 4194make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc` 4195parser is the single most subtle, fickle, and sensitive piece of code in the 4196entire codebase. Only one thing came close once: square root, and that was only 4197sensitive because I wrote it wrong. This parser is pretty good, and it is 4198*still* sensitive. And flags are the reason why. 4199 4200For more information about what individual flags there are, see the comments in 4201[`include/bc.h`][106]. 4202 4203##### Labels 4204 4205`bc`'s language is Turing-complete. That means that code needs the ability to 4206jump around, specifically to implement control flow like `if` statements and 4207loops. 4208 4209`bc` handles this while parsing with what I called "labels." 4210 4211Labels are markers in the bytecode. They are stored in functions alongside the 4212bytecode, and they are just indices into the bytecode. 4213 4214When the `bc` parser creates a label, it pushes an index onto the labels array, 4215and the index of the label in that array is the index that will be inserted into 4216the bytecode. 4217 4218Then, when a jump happens, the index pulled out of the bytecode is used to index 4219the labels array, and the label (index) at the index is then used to set the 4220instruction pointer. 4221 4222##### Cond Labels 4223 4224"Cond" labels are so-called because they are used by conditionals. 4225 4226The key to them is that they come *before* the code that uses them. In other 4227words, when jumping to a condition, code is jumping *backwards*. 4228 4229This means that when a cond label is created, the value that should go there is 4230well-known. Cond labels are easy. 4231 4232However, they are still stored on a stack so that the parser knows what cond 4233label to use. 4234 4235##### Exit Labels 4236 4237Exit labels are not so easy. 4238 4239"Exit" labels are so-called because they are used by code "exiting" out of `if` 4240statements or loops. 4241 4242The key to them is that they come *after* the code that uses them. In other 4243words, when jumping to an exit, code is jumping *forwards*. 4244 4245But this means that when an exit label is created, the value that should go 4246there is *not* known. The code that needs it must be parsed and generated first. 4247 4248That means that exit labels are created with the index of `SIZE_MAX`, which is 4249then specifically checked for with an assert in `bc_program_exec()` before using 4250those indices. 4251 4252There should ***NEVER*** be a case when an exit label is not filled in properly 4253if the parser has no bugs. This is because every `if` statement, every loop, 4254must have an exit, so the exit must be set. If not, there is a bug. 4255 4256Exit labels are also stored on a stack so that the parser knows what exit label 4257to use. 4258 4259##### Expression Parsing 4260 4261`bc` has expressions like you might expect in a typical programming language. 4262This means [infix notation][107]. 4263 4264One thing about infix notation is that you can't just generate code straight 4265from it like you can with [Reverse Polish notation][108]. It requires more work 4266to shape it into a form that works for execution on a stack machine. 4267 4268That extra work is called the [Shunting-Yard algorithm][109], and the form it 4269translates infix notation into is...[Reverse Polish notation][108]. 4270 4271In order to understand the rest of this section, you must understand the 4272[Shunting-Yard algorithm][109]. Go do that before you read on. 4273 4274###### Operator Stack 4275 4276In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the 4277output and an explicit operator stack (the `ops` field in `BcParse`) as the 4278operator stack. It stores tokens from `BcLex`. 4279 4280However, there is one **HUGE** hangup: multiple expressions can stack. This 4281means that multiple expressions can be parsed at one time (think an array element 4282expression in the middle of a larger expression). Because of that, we need to 4283keep track of where the previous expression ended. That's what `start` parameter 4284to `bc_parse_operator()` is. 4285 4286Parsing multiple expressions on one operator stack only works because 4287expressions can only *stack*; this means that, if an expression begins before 4288another ends, it must *also* end before that other expression ends. This 4289property ensures that operators will never interfere with each other on the 4290operator stack. 4291 4292Also, the operator precedence is reversed: a lower precedence *value* means a 4293higher precedence. 4294 4295###### Recursion 4296 4297Because expressions can stack, parsing expressions actually requires recursion. 4298Well, it doesn't *require* it, but the code is much more readable that way. 4299 4300This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual 4301expression parsing function) calls can, in turn, call it. 4302 4303###### Expression Flags 4304 4305There is one more big thing: not all expressions in `bc` are equal. 4306 4307Some expressions have requirements that others don't have. For example, only 4308array arguments can be arrays (which are technically not expressions, but are 4309treated as such for parsing), and some operators (in POSIX) are not allowed in 4310certain places. 4311 4312For this reason, functions that are part of the expression parsing 4313infrastructure in `bc`'s parser usually take a `flags` argument. This is meant 4314to be passed to children, and somewhere, they will be checked to ensure that the 4315resulting expression meets its requirements. 4316 4317There are also places where the flags are changed. This is because the 4318requirements change. 4319 4320Maintaining the integrity of the requirements flag set is an important part of 4321the `bc` parser. However, they do not have to be stored on a stack because their 4322stack is implicit from the recursion that expression parsing uses. 4323 4324### Functions 4325 4326Functions, in `bc`, are data structures that contain the bytecode and data 4327produced by the parsers. Functions are what the `BcProgram` program executes. 4328 4329#### Main and Read Functions 4330 4331There are two functions that always exist, which I call the "main" and "read" 4332functions. 4333 4334The "main" function is the function in which any code and data outside other 4335functions is put. Basically, it is the function where the scripting code ends 4336up. 4337 4338The "read" function is the function that is reset and parsed every time a call 4339to the `read()` builtin function happens. 4340 4341#### `dc` Strings 4342 4343In `dc`, strings can be executed, and since there are no actual "functions" in 4344`dc`, strings are handled as functions. In fact, they are effectively translated 4345into functions by parsing. 4346 4347##### Tail Calls 4348 4349Since strings in `dc` are functions, and the fact that `dc` has no native loops, 4350such loops are implemented in `dc` code using strings with conditional execution 4351commands at the end of strings. 4352 4353When such conditional execution, or even unconditional execution, commands are 4354the very last commands in a string, then `dc` can perform a [tail call][202]. 4355 4356This is done by recording the fact that a tail call happened, done by 4357incrementing an integer on a stack. When a string is executed *without* a tail 4358call, a new entry is pushed onto the stack with the integer `1`. 4359 4360When a string finally quits that followed tail calls, its stack entry is popped, 4361eliminating all of those tail calls. 4362 4363Why perform tail calls? Because otherwise, `dc` would be subject to the same 4364thing that plagues [functional programming languages][203]: stack overflow. In 4365`dc`'s case, that would manifest itself as a growing [heap][204], because the 4366execution stack is stored on the heap, until a fatal allocation failure would 4367occur. 4368 4369#### Execution 4370 4371Execution is handled by an interpreter implemented using `BcProgram` and code 4372in [`src/program.c`][53]. 4373 4374The interpreter is a mix between a [stack machine][210] and a [register 4375machine][211]. It is a stack machine in that operations happen on a stack I call 4376the "results stack," but it is a register machine in that items on the stack can 4377be stored to and loaded from "registers" (`dc` terminology), variables (`bc` 4378terminology), and arrays. 4379 4380##### Stacks 4381 4382There are two stacks in the interpreter: 4383 4384* The "results" stack (as mentioned above). 4385* The "execution" stack. 4386 4387The results stack (the `results` field of the `BcProgram` struct) is the stack 4388where the results of computations are stored. It is what makes the interpreter 4389part [stack machine][210]. It is filled with `BcResult`'s. 4390 4391The execution stack (the `stack` field of the `BcProgram` struct) is the stack 4392that tracks the current execution state of the interpreter. It is the presence 4393of this separate stack that allows the interpreter to implement the machine as a 4394loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the 4395"instruction pointers." 4396 4397These instruction pointers have three fields, all integers: 4398 4399* `func`, the index of the function that is currently executing. 4400* `idx`, the index of the next bytecode instruction to execute in the function's 4401 bytecode array. 4402* `len`, which is the length of the results stack when the function started 4403 executing. This is not used by `dc`, but it used by `bc` because functions 4404 in `bc` should never affect the results stack of their callers. 4405 4406With these three fields, and always executing using the instruction pointer at 4407the top of the execution stack, the interpreter can always keep track of its 4408execution. 4409 4410When a function or a string starts executing, a new `BcInstPtr` is pushed onto 4411the execution stack for it. This includes if a function was called recursively. 4412And then, when the function or string returns, its `BcInstPtr` is popped off of 4413the execution stack. 4414 4415##### Bytecode 4416 4417Execution of functions are done through bytecode produced directly by the 4418parsers (see the [Parsing][209]). This bytecode is stored in the `code` 4419[vector][111] of the `BcFunc` struct. 4420 4421This is a vector for two reasons: 4422 4423* It makes it easier to add bytecode to the vector in the parsers. 4424* `bc` allows users to redefine functions. 4425 4426The reason I can use bytecode is because there are less than 256 instructions, 4427so an `unsigned char` can store all the bytecodes. 4428 4429###### Bytecode Indices 4430 4431There is one other factor to bytecode: there are instructions that need to 4432reference strings, constants, variables, or arrays. Bytecode need some way to 4433reference those things. 4434 4435Fortunately, all of those things can be referenced in the same way: with indices 4436because all of the items are in vectors. 4437 4438So `bc` has a way of encoding an index into bytecode. It does this by, after 4439pushing the instruction that references anything, pushing a byte set to the 4440length of the index in bytes, then the bytes of the index are pushed in 4441little-endian order. 4442 4443Then, when the interpreter encounters an instruction that needs one or more 4444items, it decodes the index or indices there and updates the `idx` field of the 4445current `BcInstPtr` to point to the byte after the index or indices. 4446 4447One more thing: the encoder of the indices only pushes as many bytes as 4448necessary to encode the index. It stops pushing when the index has no more bytes 4449with any 1 bits. 4450 4451##### Variables 4452 4453In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of 4454numbers; it is a vector of vector of numbers. The first vector is the vector of 4455variables, the second is the variable stack, and the last level is the actual 4456number. 4457 4458This is because both `bc` and `dc` need variables to be stacks. 4459 4460For `dc`, registers are *defined* to be stacks. 4461 4462For `bc`, variables as stacks is how function arguments/parameters and function 4463`auto` variables are implemented. 4464 4465When a function is called, and a value needs to be used as a function argument, 4466a copy of the value is pushed onto the stack corresponding to the variable with 4467the same name as the function's parameter. For `auto` variables, a new number 4468set to zero is pushed onto each stack corresponding to the `auto` variables. 4469(Zero is used because the [`bc` spec][2] requires that `auto` variables are set 4470to zero.) 4471 4472It is in this way that the old value of the variable, which may not even be 4473related to the function parameter or `auto` variable, is preserved while the 4474variable is used as a function parameter or `auto` variable. 4475 4476When the function returns, of course, the stacks of the variables for the 4477parameters and `auto`'s will have their top item popped, restoring the old value 4478as it was before the function call. 4479 4480##### Arrays 4481 4482Like variables, arrays are also implemented as stacks. However, because they are 4483arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector 4484of vectors of vectors of numbers. The first of the two levels is the vector of 4485arrays, the second the stack of for each array, the third the actual array, and 4486last the numbers in the array. 4487 4488`dc` has no need of this extra stack, but `bc` does because arrays can be 4489function parameters themselves. 4490 4491When arrays are used for function arguments, they are copied with a deep copy; 4492each item of the source vector is copied. This is because in `bc`, according to 4493the [`bc` spec][2], all function arguments are passed by value. 4494 4495However, array references are possible (see below). 4496 4497When arrays are used as `auto`'s, a new vector is pushed with one element; if 4498more elements are needed, the array is grown automatically, and new elements are 4499given the value of zero. 4500 4501In fact, if *any* array is accessed and does not have an element at that index, 4502the array is automatically grown to that size, and all new elements are given 4503the value zero. This behavior is guaranteed by the [`bc` spec][2]. 4504 4505###### Array References 4506 4507Array references had to be implemented as vectors themselves because they must 4508be pushed on the vectors stacks, which, as seen above, expect vectors 4509themselves. 4510 4511So thus, references are implemented as vectors on the vector stacks. These 4512vectors are not vectors of vectors themselves; they are vectors of bytes; in 4513fact, the fact that they are byte vectors and not vector vectors is how a 4514reference vector is detected. 4515 4516These reference vectors always have the same two things pushed: a byte encoding 4517(the same way bytecode indices are) of the referenced vector's index in the 4518`arrs` vector, and a byte encoding of the referenced vectors index in the vector 4519stack. 4520 4521If an item in a referenced vector is needed, then the reference is dereferenced, 4522and the item is returned. 4523 4524If a reference vector is passed to a function that does *not* expect a 4525reference, the vector is dereferenced and a deep copy is done, in the same way 4526as vectors are copied for normal array function parameters. 4527 4528### Callbacks 4529 4530There are many places in `bc` and `dc` where function pointers are used: 4531 4532* To implement destructors in vectors. (See the [Vectors][111] section.) 4533* To select the correct lex and parse functions for `bc` and `dc`. 4534* To select the correct function to execute unary operators. 4535* To select the correct function to execute binary operators. 4536* To calculate the correct number size for binary operators. 4537* To print a "digit" of a number. 4538* To seed the pseudo-random number generator. 4539 4540And there might be more. 4541 4542In every case, they are used for reducing the amount of code. Instead of 4543`if`/`else` chains, such as: 4544 4545``` 4546if (BC_IS_BC) { 4547 bc_parse_parse(vm.parse); 4548} 4549else { 4550 dc_parse_parse(vm.parse); 4551} 4552``` 4553 4554The best example of this is `bc_num_binary()`. It is called by every binary 4555operator. It figures out if it needs to allocate space for a new `BcNum`. If so, 4556it allocates the space and then calls the function pointer to the *true* 4557operation. 4558 4559Doing it like that shrunk the code *immensely*. First, instead of every single 4560binary operator duplicating the allocation code, it only exists in one place. 4561Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a 4562`switch` statement. 4563 4564But perhaps the most important use was for destructors in vectors. 4565 4566Most of the data structures in `bc` are stored in vectors. If I hadn't made 4567destructors available for vectors, then ensuring that `bc` had no memory leaks 4568would have been nigh impossible. As it is, I check `bc` for memory leaks every 4569release when I change the code, and I have not released `bc` after version 4570`1.0.0` with any memory leaks, as far as I can remember anyway. 4571 4572### Numbers 4573 4574In order to do arbitrary-precision math, as `bc` must do, there must be some way 4575of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184] 4576is `bc`'s way of doing that. 4577 4578(Note: the word ["limb"][214] is used below; it has a specific meaning when 4579applied to arbitrary-precision numbers. It means one piece of the number. It can 4580have a single digit, which is what GNU `bc` does, or it can have multiple, which 4581is what this `bc` does.) 4582 4583This struct needs to store several things: 4584 4585* The array of limbs of the number. This is the `num` field. 4586* The location of the decimal point. This is the `rdx` (short for [radix][215]) 4587 field. 4588* The number of limbs the number has. This is the `len` field. 4589* Whether the number is negative or not. This is the least significant bit of 4590 the `rdx` field. More on that later. 4591 4592In addition, `bc`'s number stores the capacity of the limb array; this is the 4593`cap` field. 4594 4595If the number needs to grow, and the capacity of the number is big enough, the 4596number is not reallocated; the number of limbs is just added to. 4597 4598There is one additional wrinkle: to make the usual operations (binary operators) 4599fast, the decimal point is *not* allowed to be in the middle of a limb; it must 4600always be between limbs, after all limbs (integer), or before all limbs (real 4601between -1 and 1). 4602 4603The reason for this is because addition, subtraction, multiplication, and 4604division expect digits to be lined up on the decimal point. By requiring that it 4605be between limbs, no extra alignment is needed, and those operations can proceed 4606without extra overhead. 4607 4608This does make some operations, most notably extending, truncating, and 4609shifting, more expensive, but the overhead is constant, and these operations are 4610usually cheap compared to the binary operators anyway. 4611 4612This also requires something else: `bc` numbers need to know *exactly* how many 4613decimal places they have after the decimal point. If the decimal point must be 4614inbetween limbs, the last decimal place could be in the middle of a limb. The 4615amount of decimal places in a number is carefully tracked and stored in the 4616`scale` field, and this number must always coincide with the `rdx` field by the 4617following formula: 4618 4619``` 4620scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1 4621``` 4622 4623(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on 462464-bit systems and 4 on other systems.) 4625 4626Yes, `rdx` is shifted; that is because the negative bit is stored in the least 4627significant bit of the `rdx` field, and the actual radix (amount of limbs after 4628the decimal/radix point) is stored in the rest of the bits. This is safe because 4629`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least 46302 bits less than `scale`. 4631 4632In addition to `rdx` always matching `scale`, another invariant is that `rdx` 4633must always be less than or equal to `len`. (Because `scale` may be greater than 4634`rdx`, `scale` does not have to be less than or equal to `len`.) 4635 4636Another invariant is that `len` must always be less than or equal to `cap`, for 4637obvious reasons. 4638 4639The last thing programmers need to know is that the limb array is stored in 4640little-endian order. This means that the last decimal places are in the limb 4641stored at index 0, and the most significant digits are stored at index `len-1`. 4642 4643This is done to make the most important operations fast. Addition and 4644subtraction are done from least significant to most significant limbs, which 4645means they can speed through memory in the way most computers are best at. 4646Multiplication does the same, sort of, and with division, it matters less. 4647Comparison does need to go backwards, but that's after exhausting all other 4648alternatives, including for example, checking the length of the integer portion 4649of each limb array. 4650 4651Finally, here are some possible special situations with numbers and what they 4652mean: 4653 4654* `len == 0`: the number equals 0. 4655* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value. 4656 This is the only case where `scale` does not have to coincide with `rdx` 4657 This can happen with division, for example, that sets a specific `scale` for 4658 the result value but may produce 0. 4659* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or 4660 equal to -1. 4661* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not 4662 including 0, although this will be true for 0 as well. However, 0 is always 4663 assumed to be represented by `len == 0`. 4664* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also 4665 equal 0. 4666 4667#### Math Style 4668 4669When I wrote the math for `bc`, I adopted a certain style that, if known, will 4670make it easier to understand the code. The style follows these rules: 4671 4672* `BcNum` arguments always come before arguments of other types. 4673* Among the `BcNum` arguments, the operands always come first, and the `BcNum` 4674 where the result(s) will be stored come last. 4675* Error checking is placed first in the function. 4676* Easy cases are placed next. 4677* Preparation, such as allocating temporaries, comes next. 4678* The actual math. 4679* Cleanup and ensuring invariants. 4680 4681While these rules are not hard and fast, using them as a guide will probably 4682help. 4683 4684#### Digit Clamping 4685 4686There is a question of what to do when parsing numbers and coming across digits 4687that are greater than or equal to the current `ibase`. `bc` and `dc` do one of 4688two things: 4689 4690* Clamp the digit to the maximum correct digit, or 4691* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's 4692 position (starting from 0 at the least significant digit). 4693 4694The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the 4695OpenBSD `bc`, and the OpenBSD `dc` do. 4696 4697It is possible to switch between the two with the `-c` and `-C` command-line 4698options. However, it is important for developers to understand that both 4699behaviors exist and should exist. 4700 4701The library can also do both, and it is set in a global for each thread. 4702 4703### Strings as Numbers 4704 4705Strings can be assigned to variables. This is a problem because the vectors for 4706variable stacks expect `BcNum` structs only. 4707 4708While I could have made a union, I decided that the complexity of adding an 4709entirely new type, with destructor and everything, was not worth it. Instead, I 4710took advantage of the fact that `free()`, when passed a `NULL` pointer, will do 4711nothing. 4712 4713Using that, I made it so `BcNum`'s could store strings instead. This is marked 4714by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should 4715*never* happen with a real number, though the other fields could be 0). 4716 4717The `BcNum` stores the function that stores the string in the `rdx` field, and 4718it stores the index of the string in the `scale` field. This is used to actually 4719load the string if necessary. 4720 4721Note that historically, string information was stored in the `loc` field of 4722the `d` union in a `BcResult`. This was changed recently to standardize; now, 4723all string information are stored in the `n` field of the `d` union regardless. 4724This means that all string information is stored in `BcNum`'s. This removes 4725extra cases. 4726 4727Also, if a temp is made with a string, then the result type should still be 4728`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type 4729checks. 4730 4731### Pseudo-Random Number Generator 4732 4733In order to understand this section, I suggest you read the information in the 4734manpages about the pseudo-random number generator (PRNG) first; that will help 4735you understand the guarantees it has, which is important because this section 4736delves into implementation details. 4737 4738First, the PRNG I use is seeded; this is because most OS's have an excellent 4739cryptographically secure PRNG available via command-line, usually 4740`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s 4741`$RANDOM`, which is essentially a wrapper around C's `rand()`. 4742 4743`rand()` is...bad. It is only guaranteed to return 15 bits of random data. 4744Obviously, getting good random data out of that would be hard with that alone, 4745but implementations also seem to be poor. 4746 4747On top of that, `bc` is an arbitrary-precision calculator; if I made it able to 4748generate random numbers, I could make it generate random numbers of any size, 4749and since it would be seeded, results would be reproducible, when wanted. 4750 4751So to get that, I needed a seeded PRNG with good characteristics. After scouring 4752the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog 4753post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who 4754hates on PCG and its author, but also because PCG seemed to do better when 4755tested by independent parties. 4756 4757After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for 4758the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author, 4759likens the increment to selecting a codebook.) 4760 4761I could, of course, put the entire 255 bits into one massive arbitrary-precision 4762number; `bc` is good at that, after all. But that didn't sit right with me 4763because it would mean any seed selected by users would have the real portion 4764ignored, which is stupid in a program like `bc`. 4765 4766Instead, I decided to make the integer portion the increment (clamped down to 4767size), and the real portion the seed. 4768 4769In most cases, this would be a bad idea because you cannot, in general, know how 4770many decimal places you need to represent any number with `n` real digits in 4771base `b` in another base. However, there is an easy to how many decimal digits 4772after the decimal point it takes to represent reals of base 2 in base 10: the 4773power of two. 4774 4775It turns out that, for base 2 represented in base 10, the power of 2 is 4776*exactly* how many digits are necessary to represent *any* number `n/2^p`, where 4777`p` is the power of 2. This is because at every halving, the number of decimal 4778places increases by 1: 4779 4780``` 47810.5 47820.25 47830.125 47840.0625 47850.03125 47860.015625 4787... 4788``` 4789 4790This happens because because every decimal representation of `1/2^p` ends with a 47915 because 5 is half of 10. Because 5 is odd, half of it always ends with the 4792digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal 4793place further right. 4794 4795So the algorithm to convert all 255 bits of the seed is as follows: 4796 47971. Convert the increment to a `BcNum`. 47982. Convert the seed to a `BcNum`. 47993. Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems, 4800 substitute 64 bits for 128.) 48014. Add the two numbers together. 4802 4803Likewise, the algorithm to convert from a user-supplied number to a seed is: 4804 48051. Truncate a copy of the number. 48062. Subtract the result from #1 from the original number. This gives the real 4807 portion of the number. 48083. Clamp the result of #1 to 127 (or 63) bits. This is the increment. 48094. Multiply the result of #2 by `2^128`. 48105. Truncate the result of #4. This is the seed. 4811 4812#### Generating Arbitrary-Precision Numbers 4813 4814I wrote a function (`bc_rand_bounded()`) that will return unbiased results with 4815any bound below the max that PCG can generate. 4816 4817To generate an integer of arbitrary size using a bound, `bc` simply uses 4818`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as 4819many limbs as needed to satisfy the bigger bound. 4820 4821To generate numbers with arbitrary precision after the decimal point, `bc` 4822merely generates an arbitrary precision integer with the bound `10^p`, where `p` 4823is the desired number of decimal places, then divides in by `10^p` with a 4824`scale` of `p`. 4825 4826## Debug Code 4827 4828Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69], 4829programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable 4830the inclusion of *a lot* of extra code to assist with debugging. 4831 4832For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the 4833[`include/`][212] directory and in the [`src/`][213] directory. 4834 4835Yes, all of the code is guarded by `#if` preprocessor statements; this is 4836because the code should *never* be in a release build, and by making programmers 4837add this manually (not even an option to [`configure.sh`][69]), it is easier to 4838ensure that never happens. 4839 4840However, that said, the extra debug code is useful; that was why I kept it in. 4841 4842## Performance 4843 4844While I have put in a lot of effort to make `bc` as fast as possible, there 4845might be some things users can do to speed it up without changing the code. 4846 4847First, users can probably use [profile-guided optimization][217] to optimize 4848even better, using the test suite to profile. 4849 4850Second, I included macros that might help branch placement and prediction: 4851 4852* `BC_ERR(e)` 4853* `BC_UNLIKELY(e)` 4854* `BC_NO_ERR(e)` 4855* `BC_LIKELY(e)` 4856 4857`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as 4858`BC_LIKELY`; I just added them to also document branches that lead to error 4859conditions or *away* from error conditions. 4860 4861Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation, 4862they expand to nothing but the argument they were given. 4863 4864They can, however, be defined to `__builtin_expect((e), 1)` and 4865`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch 4866prediction and placement. (For more information about `__builtin_expect()` see 4867the [GCC documentation][218].) 4868 4869There might be other compilers that can take advantage of that, but I don't know 4870anything about that. 4871 4872Also, as stated in the [build manual][219], link-time optimization is excellent 4873at optimizing this `bc`. Use it. 4874 4875### Benchmarks 4876 4877To help programmers improve performance, I have built and assembled 4878infrastructure to make benchmarking easy. 4879 4880First, in order to easily run benchmarks, I created 4881[`scripts/benchmark.sh`][220]. 4882 4883Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it 4884easier to judge whether the results are significant or not. 4885 4886Third, I made the `make` clean target `make clean_benchmarks`, to clean 4887`scripts/ministat` and the generated benchmark files. 4888 4889Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory 4890data in a format that is easy for `scripts/ministat` to digest. 4891 4892To add a benchmark, add a script in the right directory to generate the 4893benchmark. Yes, generate. 4894 4895All of the benchmarks are generated first, from `.bc` and `.dc` files in the 4896[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive 4897amounts of data can be generated and then pushed through the calculators. 4898 4899If you need to benchmark `bc` or `dc` with simple loops, have the generator 4900files simply print the loop code. 4901 4902### Caching of Numbers 4903 4904In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s 4905that have the default capacity (`BC_NUM_DEF_SIZE`). 4906 4907It does this by allowing `bc_num_free()` to put the limb array onto a 4908statically-allocated stack (it's just a global array with a set size). Then, 4909when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any 4910are available. If the answer is yes, the one on top of the stack is returned. 4911Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()` 4912a new limb array. 4913 4914When the stack is filled, any numbers that `bc` attempts to put on it are just 4915freed. 4916 4917This setup saved a few percent in my testing for version [3.0.0][32], which is 4918when I added it. 4919 4920## `bcl` 4921 4922At the request of one of my biggest users, I spent the time to make a build mode 4923where the number and math code of `bc` could be wrapped into a library, which I 4924called `bcl`. 4925 4926This mode is exclusive; `bc` and `dc` themselves are *not* built when building 4927`bcl`. 4928 4929The only things in the `bc` math code that is not included is: 4930 4931* Printing newlines (clients do not care about `bc`'s line lenth restriction). 4932* `dc`'s stream print. 4933 4934Even the [pseudo-random number generator][179] is included, with extra support 4935for generating real numbers with it. (In `bc`, such support is in 4936[`lib2.bc`][26].) 4937 4938### Signal Handling 4939 4940`bcl` does not have the capability for signal handling (as of version `6.0.0`). 4941 4942### Threads 4943 4944It is possible to use `bcl` across multiple threads. However, `bcl` must be 4945initialized on each thread where it will be used. 4946 4947This is because `bcl` uses thread-specific data to store the "globals" for each 4948thread. This is to ensure that threads do not stomp on each other's numbers or 4949other data structures. 4950 4951### Contexts 4952 4953Contexts were an idea by the same user that requested `bcl`. They are meant to 4954make it so multiple clients in one program can keep their data separate from 4955each other. 4956 4957### Numbers 4958 4959Numbers in `bcl` are literally indices into an encapsulated array of numbers, 4960hidden in the context. These indices are then passed to clients to refer to 4961numbers later. 4962 4963### Operand Consumption 4964 4965Most math functions in `bcl` "consume" their operand arguments; the arguments 4966are freed, whether or not an error is returned. 4967 4968This is to make it easy to implement math code, like this: 4969 4970``` 4971n = bcl_add(bcl_mul(a, b), bcl_div(c, d)); 4972``` 4973 4974If numbers need to be preserved, they can be with `bcl_dup()`: 4975 4976``` 4977n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d))); 4978``` 4979 4980### Errors 4981 4982Errors can be encoded in the indices representing numbers, and where necessary, 4983clients are responsible for checking those errors. 4984 4985The encoding of errors is this: if an error happens, the value `0-error` is 4986returned. To decode, do the exact same thing. Thus, any index above 4987`0-num_errors` is an error. 4988 4989If an index that represents an error is passed to a math function, that function 4990propagates the error to its result and does not perform the math operation. 4991 4992All of this is to, once again, make it easy to implement the math code as above. 4993 4994However, where possible, errors are returned directly. 4995 4996[1]: https://en.wikipedia.org/wiki/Bus_factor 4997[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top 4998[3]: https://en.wikipedia.org/wiki/Dc_(Unix) 4999[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5000[5]: ./bc/A.1.md#standard-library 5001[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc 5002[7]: ./bc/A.1.md#extended-library 5003[8]: #libbc-2 5004[9]: #strgensh 5005[10]: https://vimeo.com/230142234 5006[11]: https://gavinhoward.com/2019/12/values-for-yao/ 5007[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf 5008[13]: ./build.md#cross-compiling 5009[14]: ./build.md 5010[15]: #strgenc 5011[16]: http://landley.net/toybox/about.html 5012[17]: https://www.busybox.net/ 5013[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm 5014[19]: https://clang-analyzer.llvm.org/scan-build.html 5015[20]: https://www.valgrind.org/ 5016[21]: https://clang.llvm.org/docs/AddressSanitizer.html 5017[22]: https://gavinhoward.com/2019/11/finishing-software/ 5018[23]: #history 5019[24]: https://clang.llvm.org/docs/ClangFormat.html 5020[25]: ./algorithms.md 5021[26]: #lib2bc 5022[27]: #vmh 5023[28]: https://github.com/rain-1/linenoise-mob 5024[29]: https://github.com/antirez/linenoise 5025[30]: #bclh 5026[31]: #argsh 5027[32]: ../NEWS.md#3-0-0 5028[33]: ../NEWS.md 5029[34]: https://github.com/skeeto/optparse 5030[35]: #opth 5031[36]: #historyh 5032[37]: #randh 5033[38]: #langh 5034[39]: #numc 5035[40]: #bcc 5036[41]: #bc_lexc 5037[42]: #bc_parsec 5038[43]: #libraryc 5039[44]: #dcc 5040[45]: #dc_lexc 5041[46]: #dc_parsec 5042[47]: #filec 5043[48]: #historyc 5044[49]: #langc 5045[50]: #lexc 5046[51]: #optc 5047[52]: #parsec 5048[53]: #programc 5049[54]: #randc 5050[55]: #fileh 5051[56]: #readc 5052[57]: #programh 5053[58]: #vmc 5054[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top 5055[60]: #manpagesh 5056[61]: #bcl3md 5057[62]: #bcl3 5058[63]: #bclvcxproj 5059[64]: #bclvcxprojfilters 5060[65]: #bclsln 5061[66]: #bcvcxproj 5062[67]: #bcvcxprojfilters 5063[68]: #bcsln 5064[69]: #configuresh 5065[70]: #makefilein 5066[71]: #functionsh 5067[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top 5068[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18 5069[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top 5070[75]: #versionh 5071[76]: ##posix-shell-scripts 5072[77]: #tests 5073[78]: #karatsubapy 5074[79]: #bc-1 5075[80]: #dc-1 5076[81]: ./build.md#build-type 5077[82]: #fuzzing-1 5078[83]: #releasesh 5079[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02 5080[85]: #locales-1 5081[86]: #manuals-1 5082[87]: #locale_installsh 5083[88]: #locale_uninstallsh 5084[89]: #bc1mdin 5085[90]: #dc1mdin 5086[91]: #bc 5087[92]: https://pandoc.org/ 5088[93]: #release_settingstxt 5089[94]: #aflpy 5090[95]: #randmathpy 5091[96]: #caching-of-numbers 5092[97]: #error-handling 5093[98]: #radamsatxt 5094[99]: https://gitlab.com/akihe/radamsa 5095[100]: #radamsash 5096[101]: https://musl.libc.org/ 5097[102]: ./build.md#settings 5098[103]: #test_settingstxt 5099[104]: #test_settingssh 5100[105]: #functionssh 5101[106]: #bch 5102[107]: https://en.wikipedia.org/wiki/Infix_notation 5103[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5104[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm 5105[110]: #bc-parsing 5106[111]: #vectors 5107[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h 5108[113]: https://en.wikipedia.org/wiki/Signal_(IPC) 5109[114]: #custom-io 5110[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 5111[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html 5112[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html 5113[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw 5114[119]: #fuzz_prepsh 5115[120]: #bc_aflyaml 5116[121]: #bc_afl_continueyaml 5117[122]: https://github.com/tmux/tmux 5118[123]: https://tmuxp.git-pull.com/ 5119[124]: #test-suite 5120[125]: https://aflplus.plus/ 5121[126]: #link-time-optimization 5122[127]: #fuzzing-performance 5123[128]: #radamsa 5124[129]: #afl-quickstart 5125[130]: #convenience 5126[131]: #datac 5127[132]: https://git.gavinhoward.com/gavin/vim-bc 5128[133]: https://git.gavinhoward.com/gavin/bc_libs 5129[134]: #debugging 5130[135]: #asserts 5131[136]: #portability 5132[137]: https://pexpect.readthedocs.io/en/stable/ 5133[138]: #test-suite-portability 5134[139]: #historypy 5135[140]: #historysh 5136[141]: #group-tests 5137[142]: #build-system 5138[143]: #generated-tests 5139[144]: #benchmarks-1 5140[145]: #gen 5141[146]: #test-coverage 5142[147]: #integration-with-the-build-system 5143[148]: #test-scripts 5144[149]: #standard-tests 5145[150]: #script-tests 5146[151]: #error-tests 5147[152]: #stdin-tests 5148[153]: #read-tests 5149[154]: #other-tests 5150[155]: #history-tests 5151[156]: #bcl 5152[157]: #bcl-test 5153[158]: #bclc 5154[159]: #valgrind 5155[160]: #addresssanitizer-and-friends 5156[161]: #bc-2 5157[162]: #dc-2 5158[163]: #alltxt-1 5159[164]: #errorstxt 5160[165]: #posix_errorstxt 5161[166]: #timeconstsh 5162[167]: #alltxt-3 5163[168]: #errorstxt-1 5164[169]: #scripts-1 5165[170]: #scripts-2 5166[171]: #alltxt-2 5167[172]: #alltxt-4 5168[173]: #async-signal-safe-signal-handling 5169[174]: #vectorh 5170[175]: #read_errorstxt 5171[176]: #statush 5172[177]: #numbers 5173[178]: #math-style 5174[179]: #pseudo-random-number-generator 5175[180]: #lexh 5176[181]: #parseh 5177[182]: #dch 5178[183]: #libraryh 5179[184]: #numh 5180[185]: #readh 5181[186]: #maps 5182[187]: #slabs-and-slab-vectors 5183[188]: ./build.md#extra-math 5184[189]: #command-line-history 5185[190]: #scriptsed 5186[191]: #linux-timeconstbc-script 5187[192]: #corpuses 5188[193]: ./build.md#history 5189[194]: https://www.valgrind.org/docs/manual/mc-manual.html 5190[195]: #othersh 5191[196]: https://scan.coverity.com/ 5192[197]: https://clang-analyzer.llvm.org/ 5193[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it 5194[199]: https://cr.yp.to/docs/selfpipe.html 5195[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph 5196[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator 5197[202]: https://en.wikipedia.org/wiki/Tail_call 5198[203]: https://en.wikipedia.org/wiki/Functional_programming_language 5199[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation 5200[205]: #mainc 5201[206]: #argc 5202[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree 5203[208]: #functions 5204[209]: #parsing 5205[210]: https://en.wikipedia.org/wiki/Stack_machine 5206[211]: https://en.wikipedia.org/wiki/Register_machine 5207[212]: #include 5208[213]: #src 5209[214]: https://gmplib.org/manual/Nomenclature-and-Types 5210[215]: https://en.wikipedia.org/wiki/Radix_point 5211[216]: #main-and-read-functions 5212[215]: https://www.pcg-random.org/ 5213[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/ 5214[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization 5215[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect 5216[219]: ./build.md#optimization 5217[220]: #benchmarksh 5218[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c 5219[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html 5220[223]: #ministatc 5221[224]: #dc 5222[225]: #allsh 5223[226]: #errorssh 5224[227]: #errorsh 5225[228]: #vectorc 5226