• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Development
2
3Updated: 13 Mar 2023
4
5This document is meant for the day when I (Gavin D. Howard) get [hit by a
6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue.
7
8This document is supposed to contain all of the knowledge necessary to develop
9`bc` and `dc`.
10
11In addition, this document is meant to add to the [oral tradition of software
12engineering][118], as described by Bryan Cantrill.
13
14This document will reference other parts of the repository. That is so a lot of
15the documentation can be closest to the part of the repo where it is actually
16necessary.
17
18## What Is It?
19
20This repository contains an implementation of both [POSIX `bc`][2] and [Unix
21`dc`][3].
22
23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a
24historical utility that was included in early Unix and even predates both Unix
25and C. They both are arbitrary-precision command-line calculators with their own
26programming languages. `bc`'s language looks similar to C, with infix notation
27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows
28the user to execute strings as though they were functions.
29
30In addition, it is also possible to build the arbitrary-precision math as a
31library, named [`bcl`][156].
32
33**Note**: for ease, I will refer to both programs as `bc` in this document.
34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I
35am referring to just `dc`.
36
37### History
38
39This project started in January 2018 when a certain individual on IRC, hearing
40that I knew how to write parsers, asked me to write a `bc` parser for his math
41library. I did so. I thought about writing my own math library, but he
42disparaged my programming skills and made me think that I couldn't do it.
43
44However, he took so long to do it that I eventually decided to give it a try and
45had a working math portion in two weeks. It taught me that I should not listen
46to such people.
47
48From that point, I decided to make it an extreme learning experience about how
49to write quality software.
50
51That individual's main goal had been to get his `bc` into [toybox][16], and I
52managed to get my own `bc` in. I also got it in [busybox][17].
53
54Eventually, in late 2018, I also decided to try my hand at implementing
55[Karatsuba multiplication][18], an algorithm that that unnamed individual
56claimed I could never implement. It took me a bit, but I did it.
57
58This project became a passion project for me, and I continued. In mid-2019,
59Stefan Eßer suggested I improve performance by putting more than 1 digit in each
60section of the numbers. After I showed immaturity because of some burnout, I
61implemented his suggestion, and the results were incredible.
62
63Since that time, I have gradually been improving the `bc` as I have learned more
64about things like fuzzing, [`scan-build`][19], [valgrind][20],
65[AddressSanitizer][21] (and the other sanitizers), and many other things.
66
67One of my happiest moments was when my `bc` was made the default in FreeBSD.
68Another happiest moment was when I found out that my `bc` had shipped with Mac
69OSX Ventura, without my knowledge.
70
71But since I believe in [finishing the software I write][22], I have done less
72work on `bc` over time, though there are still times when I put a lot of effort
73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use
74my `bc`.
75
76And that is why I am writing this document: someday, someone else is going to
77want to change my code, and this document is my attempt to make it as simple as
78possible.
79
80### Values
81
82[According to Bryan Cantrill][10], all software has values. I think he's
83correct, though I [added one value for programming languages in particular][11].
84
85However, for `bc`, his original list will do:
86
87* Approachability
88* Availability
89* Compatibility
90* Composability
91* Debuggability
92* Expressiveness
93* Extensibility
94* Interoperability
95* Integrity
96* Maintainability
97* Measurability
98* Operability
99* Performance
100* Portability
101* Resiliency
102* Rigor
103* Robustness
104* Safety
105* Security
106* Simplicity
107* Stability
108* Thoroughness
109* Transparency
110* Velocity
111
112There are several values that don't apply. The reason they don't apply is
113because `bc` and `dc` are existing utilities; this is just another
114reimplementation. The designs of `bc` and `dc` are set in stone; there is
115nothing we can do to change them, so let's get rid of those values that would
116apply to their design:
117
118* Compatibility
119* Integrity
120* Maintainability
121* Measurability
122* Performance
123* Portability
124* Resiliency
125* Rigor
126* Robustness
127* Safety
128* Security
129* Simplicity
130* Stability
131* Thoroughness
132* Transparency
133
134Furthermore, some of the remaining ones don't matter to me, so let me get rid of
135those and order the rest according to my *actual* values for this project:
136
137* Robustness
138* Stability
139* Portability
140* Compatibility
141* Performance
142* Security
143* Simplicity
144
145First is **robustness**. This `bc` and `dc` should be robust, accepting any
146input, never crashing, and instead, returning an error.
147
148Closely related to that is **stability**. The execution of `bc` and `dc` should
149be deterministic and never change for the same inputs, including the
150pseudo-random number generator (for the same seed).
151
152Third is **portability**. These programs should run everywhere that POSIX
153exists, as well as Windows. This means that just about every person on the
154planet will have access to these programs.
155
156Next is **compatibility**. These programs should, as much as possible, be
157compatible with other existing implementations and standards.
158
159Then we come to **performance**. A calculator is only usable if it's fast, so
160these programs should run as fast as possible.
161
162After that is **security**. These programs should *never* be the reason a user's
163computer is compromised.
164
165And finally, **simplicity**. Where possible, the code should be simple, while
166deferring to the above values.
167
168Keep these values in mind for the rest of this document, and for exploring any
169other part of this repo.
170
171#### Portability
172
173But before I go on, I want to talk about portability in particular.
174
175Most of these principles just require good attention and care, but portability
176is different. Sometimes, it requires pulling in code from other places and
177adapting it. In other words, sometimes I need to duplicate and adapt code.
178
179This happened in a few cases:
180
181* Option parsing (see [`include/opt.h`][35]).
182* History (see [`include/history.h`][36]).
183* Pseudo-Random Number Generator (see [`include/rand.h`][37]).
184
185This was done because I decided to ensure that `bc`'s dependencies were
186basically zero. In particular, either users have a normal install of Windows or
187they have a POSIX system.
188
189A POSIX system limited me to C99, `sh`, and zero external dependencies. That
190last item is why I pull code into `bc`: if I pull it in, it's not an external
191dependency.
192
193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being
194portable to some platforms.
195
196## Suggested Course
197
198I do have a suggested course for programmers to follow when trying to understand
199this codebase. The order is this:
200
2011.	`bc` Spec.
2022.	Manpages.
2033.	Test suite.
2044.	Understand the build.
2055.	Algorithms manual.
2066.	Code concepts.
2077.	Repo structure.
2088.	Headers.
2099.	Source code.
210
211This order roughly follows this order:
212
2131. High-level requirements
2142. Low-level requirements
2153. High-level implementation
2164. Low-level implementation
217
218In other words, first understand what the code is *supposed* to do, then
219understand the code itself.
220
221## Useful External Tools
222
223I have a few tools external to `bc` that are useful:
224
225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132].
226* A [repo of `bc` and `dc` scripts][133].
227* A set of `bash` aliases (see below).
228* A `.bcrc` file with items useful for my `bash` setup (see below).
229
230My `bash` aliases are these:
231
232```sh
233alias makej='make -j16'
234alias mcmake='make clean && make'
235alias mcmakej='make clean && make -j16'
236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \
237    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
238    -Wno-unreachable-code-return -Wno-missing-noreturn \
239    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
240    -pedantic -std=c99" ./configure.sh'
241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \
242    -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \
243    -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \
244    -Wall -Wextra -pedantic -std=c99" ./configure.sh'
245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \
246    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
247    -Wno-unreachable-code-return -Wno-missing-noreturn \
248    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
249    -pedantic -std=c99" ./configure.sh'
250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \
251    CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \
252    -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \
253    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
254    -pedantic -std=c99" ./configure.sh'
255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS'
256```
257
258`makej` runs `make` with all of my cores.
259
260`mcmake` runs `make clean` before running `make`. It will take a target on the
261command-line.
262
263`mcmakej` is a combination of `makej` and `mcmake`.
264
265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see
266[Debugging][134] below).
267
268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler)
269using full warnings, with a few really loud and useless warnings turned off.
270
271`bcnoassert` configures `bc` to not have asserts built in.
272
273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug
274mode.
275
276`bcunset` unsets my personal `bc` environment variables, which are set to:
277
278```sh
279export BC_ENV_ARGS="-l $HOME/.bcrc"
280export BC_LINE_LENGTH="74"
281```
282
283Unsetting these environment variables are necessary for running
284[`scripts/release.sh`][83] because otherwise, it will error when attempting to
285run `bc -s` on my `$HOME/.bcrc`.
286
287Speaking of which, the contents of that file are:
288
289```bc
290define void print_time_unit(t){
291	if(t<10)print "0"
292	if(t<1&&t)print "0"
293	print t,":"
294}
295define void sec2time(t){
296	auto s,m,h,d,r
297	r=scale
298	scale=0
299	t=abs(t)
300	s=t%60
301	t-=s
302	m=t/60%60
303	t-=m
304	h=t/3600%24
305	t-=h
306	d=t/86400
307	if(d)print_time_unit(d)
308	if(h)print_time_unit(h)
309	print_time_unit(m)
310	if(s<10)print "0"
311	if(s<1&&s)print "0"
312	s
313	scale=r
314}
315define minutes(secs){
316	return secs/60;
317}
318define hours(secs){
319	return secs/3600;
320}
321define days(secs){
322	return secs/3600/24;
323}
324define years(secs){
325	return secs/3600/24/365.25;
326}
327define fbrand(b,p){
328	auto l,s,t
329	b=abs(b)$
330	if(b<2)b=2
331	s=scale
332	t=b^abs(p)$
333	l=ceil(l2(t),0)
334	if(l>scale)scale=l
335	t=irand(t)/t
336	scale=s
337	return t
338}
339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)}
340```
341
342This allows me to use `bc` as part of my `bash` prompt.
343
344## Code Style
345
346The code style for `bc` is...weird, and that comes from historical accident.
347
348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order
349to do that, my `bc` originally had toybox style. Eventually, I changed to using
350tabs, and assuming they were 4 spaces wide, but other than that, I basically
351kept the same style, with some exceptions that are more or less dependent on my
352taste.
353
354However, I later managed to get [ClangFormat][24] to work, so I changed the
355style to that.
356
357### ClangFormat
358
359The style is now defined as whatever [ClangFormat][24] outputs using the
360existing `.clang-format` file. More precisely, the style is whatever is output
361when the following command is run in the root directory:
362
363```
364./scripts/format.sh
365```
366
367### Historical Style
368
369The code style used to be:
370
371* Tabs are 4 spaces.
372* Tabs are used at the beginning of lines for indent.
373* Spaces are used for alignment.
374* Lines are limited to 80 characters, period.
375* Pointer asterisk (`*`) goes with the variable (on the right), not the type,
376  unless it is for a pointer type returned from a function.
377* The opening brace is put on the same line as the header for the function,
378  loop, or `if` statement.
379* Unless the header is more than one line, in which case the opening brace is
380  put on its own line.
381* If the opening brace is put on its own line, there is no blank line after it.
382* If the opening brace is *not* put on its own line, there *is* a blank line
383  after it, *unless* the block is only one or two lines long.
384* Code lines are grouped into what I call "paragraphs." Basically, lines that
385  seem like they should go together are grouped together. This one comes down
386  to judgment.
387* Bodies of `if` statements, `else` statements, and loops that are one line
388  long are put on the same line as the statement, unless the header is more than
389  one line long, and/or, the header and body cannot fit into 80 characters with
390  a space inbetween them.
391* If single-line bodies are on a separate line from their headers, and the
392  headers are only a single line, then no braces are used.
393* However, braces are *always* used if they contain another `if` statement or
394  loop.
395* Loops with empty bodies are ended with a semicolon.
396* Expressions that return a boolean value are surrounded by paretheses.
397* Macro backslashes are aligned as far to the left as possible.
398* Binary operators have spaces on both sides.
399* If a line with binary operators overflows 80 characters, a newline is inserted
400  *after* binary operators.
401* Function modifiers and return types are on the same line as the function name.
402* With one exception, `goto`'s are only used to jump to the end of a function
403  for cleanup.
404* All structs, enums, and unions are `typedef`'ed.
405* All constant data is in one file: [`src/data.c`][131], but the corresponding
406  `extern` declarations are in the appropriate header file.
407* All local variables are declared at the beginning of the scope where they
408  appear. They may be initialized at that point, if it does not invoke UB or
409  otherwise cause bugs.
410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable
411  declarations.
412* Besides short `if` statements and loops, there should *never* be more than one
413  statement per line.
414
415## Repo Structure
416
417Functions are documented with Doxygen-style doc comments. Functions that appear
418in headers are documented in the headers, while static functions are documented
419where they are defined.
420
421### `configure`
422
423A symlink to [`configure.sh`][69].
424
425### `configure.sh`
426
427This is the script to configure `bc` and [`bcl`][156] for building.
428
429This `bc` has a custom build system. The reason for this is because of
430[*portability*][136].
431
432If `bc` used an outside build system, that build system would be an external
433dependency. Thus, I had to write a build system for `bc` that used nothing but
434C99 and POSIX utilities.
435
436One of those utilities is POSIX `sh`, which technically implements a
437Turing-complete programming language. It's a terrible one, but it works.
438
439A user that wants to build `bc` on a POSIX system (not Windows) first runs
440`configure.sh` with the options he wants. `configure.sh` uses those options and
441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid
442`Makefile`. Then `make` can do the rest.
443
444For more information about the build process, see the [Build System][142]
445section and the [build manual][14].
446
447For more information about shell scripts, see [POSIX Shell Scripts][76].
448
449`configure.sh` does the following:
450
4511.	It processes command-line arguments and figure out what the user wants to
452	build.
4532.	It reads in [`Makefile.in`][70].
4543.	One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form
455	`%%<placeholder_name>%%` based on the [build type][81].
4564.	It appends a list of file targets based on the [build type][81].
4575.	It appends the correct test targets.
4586.	It copies the correct manpage and markdown manual for `bc` and `dc` into a
459	location from which they can be copied for install.
4607.	It does a `make clean` to reset the build state.
461
462### `.gitattributes`
463
464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in
465the Visual Studio files.
466
467### `.gitignore`
468
469The `.gitignore`
470
471### `LICENSE.md`
472
473This is the `LICENSE` file, including the licenses of various software that I
474have borrowed.
475
476### `Makefile.in`
477
478This is the `Makefile` template for [`configure.sh`][69] to use for generating a
479`Makefile`.
480
481For more information, see [`configure.sh`][69], the [Build System][142] section,
482and the [build manual][14].
483
484Because of [portability][136], the generated `Makefile.in` should be a pure
485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a
486few snares for the unwary programmer in this file:
487
4881.	No extensions allowed, including and especially GNU extensions.
4892.	If new headers are added, they must also be added to `Makefile.in`.
4903.	Don't delete the `.POSIX:` empty target at the top; that's what tells `make`
491	implementations that pure [POSIX `make`][74] is needed.
492
493In particular, there is no way to set up variables other than the `=` operator.
494There are no conditionals, so all of the conditional stuff must be in
495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the
496first place: [POSIX `make`][74] is barebones and only does a build with no
497configuration.
498
499### `NEWS.md`
500
501A running changelog with an entry for each version. This should be updated at
502the same time that [`include/version.h`][75] is.
503
504### `NOTICE.md`
505
506The `NOTICE` file with proper attributions.
507
508### `README.md`
509
510The `README`. Read it.
511
512### `benchmarks/`
513
514The folder containing files to generate benchmarks.
515
516Each of these files was made, at one time or another, to benchmark some
517experimental feature, so if it seems there is no rhyme or reason to these
518benchmarks, it is because there is none, besides historical accident.
519
520#### `bc/`
521
522The folder containing `bc` scripts to generate `bc` benchmarks.
523
524##### `add.bc`
525
526The file to generate the benchmark to benchmark addition in `bc`.
527
528##### `arrays_and_constants.bc`
529
530The file to generate the benchmark to benchmark `bc` using lots of array names
531and constants.
532
533##### `arrays.bc`
534
535The file to generate the benchmark to benchmark `bc` using lots of array names.
536
537##### `constants.bc`
538
539The file to generate the benchmark to benchmark `bc` using lots of constants.
540
541##### `divide.bc`
542
543The file to generate the benchmark to benchmark division in `bc`.
544
545##### `functions.bc`
546
547The file to generate the benchmark to benchmark `bc` using lots of functions.
548
549##### `irand_long.bc`
550
551The file to generate the benchmark to benchmark `bc` using lots of calls to
552`irand()` with large bounds.
553
554##### `irand_short.bc`
555
556The file to generate the benchmark to benchmark `bc` using lots of calls to
557`irand()` with small bounds.
558
559##### `lib.bc`
560
561The file to generate the benchmark to benchmark `bc` using lots of calls to
562heavy functions in `lib.bc`.
563
564##### `multiply.bc`
565
566The file to generate the benchmark to benchmark multiplication in `bc`.
567
568##### `postfix_incdec.bc`
569
570The file to generate the benchmark to benchmark `bc` using postfix increment and
571decrement operators.
572
573##### `power.bc`
574
575The file to generate the benchmark to benchmark power (exponentiation) in `bc`.
576
577##### `subtract.bc`
578
579The file to generate the benchmark to benchmark subtraction in `bc`.
580
581##### `strings.bc`
582
583The file to generate the benchmark to benchmark `bc` using lots of strings.
584
585#### `dc/`
586
587The folder containing `dc` scripts to generate `dc` benchmarks.
588
589##### `modexp.dc`
590
591The file to generate the benchmark to benchmark modular exponentiation in `dc`.
592
593### `gen/`
594
595A folder containing the files necessary to generate C strings that will be
596embedded in the executable.
597
598All of the files in this folder have license headers, but the program and script
599that can generate strings from them include code to strip the license header out
600before strings are generated.
601
602#### `bc_help.txt`
603
604A text file containing the text displayed for `bc -h` or `bc --help`.
605
606This text just contains the command-line options and a short summary of the
607differences from GNU and BSD `bc`'s. It also directs users to the manpage.
608
609The reason for this is because otherwise, the help would be far too long to be
610useful.
611
612**Warning**: The text has some `printf()` format specifiers. You need to make
613sure the format specifiers match the arguments given to `bc_file_printf()`.
614
615#### `dc_help.txt`
616
617A text file containing the text displayed for `dc -h` or `dc --help`.
618
619This text just contains the command-line options and a short summary of the
620differences from GNU and BSD `dc`'s. It also directs users to the manpage.
621
622The reason for this is because otherwise, the help would be far too long to be
623useful.
624
625**Warning**: The text has some `printf()` format specifiers. You need to make
626sure the format specifiers match the arguments given to `bc_file_printf()`.
627
628#### `lib.bc`
629
630A `bc` script containing the [standard math library][5] required by POSIX. See
631the [POSIX standard][2] for what is required.
632
633This file does not have any extraneous whitespace, except for tabs at the
634beginning of lines. That is because this data goes directly into the binary,
635and whitespace is extra bytes in the binary. Thus, not having any extra
636whitespace shrinks the resulting binary.
637
638However, tabs at the beginning of lines are kept for two reasons:
639
6401.	Readability. (This file is still code.)
6412.	The program and script that generate strings from this file can remove
642	tabs at the beginning of lines.
643
644For more details about the algorithms used, see the [algorithms manual][25].
645
646However, there are a few snares for unwary programmers.
647
648First, all constants must be one digit. This is because otherwise, multi-digit
649constants could be interpreted wrongly if the user uses a different `ibase`.
650This does not happen with single-digit numbers because they are guaranteed to be
651interpreted what number they would be if the `ibase` was as high as possible.
652
653This is why `A` is used in the library instead of `10`, and things like `2*9*A`
654for `180` in [`lib2.bc`][26].
655
656As an alternative, you can set `ibase` in the function, but if you do, make sure
657to set it with a single-digit number and beware the snare below...
658
659Second, `scale`, `ibase`, and `obase` must be safely restored before returning
660from any function in the library. This is because without the `-g` option,
661functions are allowed to change any of the globals.
662
663Third, all local variables in a function must be declared in an `auto` statement
664before doing anything else. This includes arrays. However, function parameters
665are considered predeclared.
666
667Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code
668must not use *any* extensions. It has to work when users use the `-s` or `-w`
669flags.
670
671#### `lib2.bc`
672
673A `bc` script containing the [extended math library][7].
674
675Like [`lib.bc`][8], and for the same reasons, this file should have no
676extraneous whitespace, except for tabs at the beginning of lines.
677
678For more details about the algorithms used, see the [algorithms manual][25].
679
680Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary
681programmers when writing code for `lib2.bc`.
682
683#### `strgen.c`
684
685Code for the program to generate C strings from text files. This is the original
686program, although [`strgen.sh`][9] was added later.
687
688The reason I used C here is because even though I knew `sh` would be available
689(it must be available to run `configure.sh`), I didn't know how to do what I
690needed to do with POSIX utilities and `sh`.
691
692Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that
693it *could* be done with `sh` and POSIX utilities.
694
695However, `strgen.c` exists *still* exists because the versions generated by
696[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99
697standard][12], page 21.) This is because [`strgen.sh`][9] generates string
698literals, and in C99, string literals can be limited to 4095 characters, and
699`gen/lib2.bc` is above that.
700
701Fortunately, the limit for "objects," which include `char` arrays, is much
702bigger: 65535 bytes, so that's what `strgen.c` generates.
703
704However, the existence of `strgen.c` does come with a cost: the build needs C99
705compiler that targets the host machine. For more information, see the ["Cross
706Compiling" section][13] of the [build manual][14].
707
708Read the comments in `strgen.c` for more detail about it, the arguments it
709takes, and how it works.
710
711#### `strgen.sh`
712
713An `sh` script that will generate C strings that uses only POSIX utilities. This
714exists for those situations where a host C99 compiler is not available, and the
715environment limits mentioned above in [`strgen.c`][15] don't matter.
716
717`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean
718the exact same things, so see the comments in [`strgen.c`][15] for more detail
719about that, and see the comments in `strgen.sh` for more details about it and
720how it works.
721
722For more information about shell scripts, see [POSIX Shell Scripts][76].
723
724### `include/`
725
726A folder containing the headers.
727
728The headers are not included among the source code because I like it better that
729way. Also there were folders within `src/` at one point, and I did not want to
730see `#include "../some_header.h"` or things like that.
731
732So all headers are here, even though only one ([`bcl.h`][30]) is meant for end
733users (to be installed in `INCLUDEDIR`).
734
735#### `args.h`
736
737This file is the API for processing command-line arguments.
738
739#### `bc.h`
740
741This header is the API for `bc`-only items. This includes the `bc_main()`
742function and the `bc`-specific lexing and parsing items.
743
744The `bc` parser is perhaps the most sensitive part of the entire codebase. See
745the documentation in `bc.h` for more information.
746
747The code associated with this header is in [`src/bc.c`][40],
748[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42].
749
750#### `bcl.h`
751
752This header is the API for the [`bcl`][156] library.
753
754This header is meant for distribution to end users and contains the API that end
755users of [`bcl`][156] can use in their own software.
756
757This header, because it's the public header, is also the root header. That means
758that it has platform-specific fixes for Windows. (If the fixes were not in this
759header, the build would fail on Windows.)
760
761The code associated with this header is in [`src/library.c`][43].
762
763#### `dc.h`
764
765This header is the API for `dc`-only items. This includes the `dc_main()`
766function and the `dc`-specific lexing and parsing items.
767
768The code associated with this header is in [`src/dc.c`][44],
769[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46].
770
771#### `file.h`
772
773This header is for `bc`'s internal buffered I/O API.
774
775For more information about `bc`'s error handling and custom buffered I/O, see
776[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the
777notes about version [`3.0.0`][32] in the [`NEWS`][32].
778
779The code associated with this header is in [`src/file.c`][47].
780
781#### `history.h`
782
783This header is for `bc`'s implementation of command-line editing/history, which
784is based on a [UTF-8-aware fork][28] of [`linenoise`][29].
785
786For more information, see the [Command-Line History][189] section.
787
788The code associated with this header is in [`src/history.c`][48].
789
790#### `lang.h`
791
792This header defines the data structures and bytecode used for actual execution
793of `bc` and `dc` code.
794
795Yes, it's misnamed; that's an accident of history where the first things I put
796into it all seemed related to the `bc` language.
797
798The code associated with this header is in [`src/lang.c`][49].
799
800#### `lex.h`
801
802This header defines the common items that both programs need for lexing.
803
804The code associated with this header is in [`src/lex.c`][50],
805[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45].
806
807#### `library.h`
808
809This header defines the things needed for [`bcl`][156] that users should *not*
810have access to. In other words, [`bcl.h`][30] is the *public* header for the
811library, and this header is the *private* header for the library.
812
813The code associated with this header is in [`src/library.c`][43].
814
815#### `num.h`
816
817This header is the API for numbers and math.
818
819The code associated with this header is in [`src/num.c`][39].
820
821#### `opt.h`
822
823This header is the API for parsing command-line arguments.
824
825It's different from [`args.h`][31] in that [`args.h`][31] is for the main code
826to process the command-line arguments into global data *after* they have already
827been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually
828parses the command-line arguments, and [`args.h`][31] turns that parsed data
829into flags (bits), strings, and expressions that will be used later.
830
831Why are they separate? Because originally, `bc` used `getopt_long()` for
832parsing, so [`args.h`][31] was the only one that existed. After it was
833discovered that `getopt_long()` has different behavior on different platforms, I
834adapted a [public-domain option parsing library][34] to do the job instead. And
835in doing so, I gave it its own header.
836
837They could probably be combined, but I don't really care enough at this point.
838
839The code associated with this header is in [`src/opt.c`][51].
840
841#### `parse.h`
842
843This header defines the common items that both programs need for parsing.
844
845Note that the parsers don't produce abstract syntax trees (AST's) or any
846intermediate representations. They produce bytecode directly. In other words,
847they don't have special data structures except what they need to do their job.
848
849The code associated with this header is in [`src/parse.c`][50],
850[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46].
851
852#### `program.h`
853
854This header defines the items needed to manage the data structures in
855[`lang.h`][38] as well as any helper functions needed to generate bytecode or
856execute it.
857
858The code associated with this header is in [`src/program.c`][53].
859
860#### `rand.h`
861
862This header defines the API for the [pseudo-random number generator
863(PRNG)][179].
864
865The PRNG only generates fixed-size integers. The magic of generating random
866numbers of arbitrary size is actually given to the code that does math
867([`src/num.c`][39]).
868
869The code associated with this header is in [`src/rand.c`][54].
870
871#### `read.h`
872
873This header defines the API for reading from files and `stdin`.
874
875Thus, [`file.h`][55] is really for buffered *output*, while this file is for
876*input*. There is no buffering needed for `bc`'s inputs.
877
878The code associated with this header is in [`src/read.c`][56].
879
880#### `status.h`
881
882This header has several things:
883
884* A list of possible errors that internal `bc` code can use.
885* Compiler-specific fixes.
886* Platform-specific fixes.
887* Macros for `bc`'s [error handling][97].
888
889There is no code associated with this header.
890
891#### `vector.h`
892
893This header defines the API for the vectors (resizable arrays) that are used for
894data structures.
895
896Vectors are what do the heavy lifting in almost all of `bc`'s data structures.
897Even the maps of identifiers and arrays use vectors.
898
899The code associated with this header is in [`src/vector.c`][228].
900
901#### `version.h`
902
903This header defines the version of `bc`.
904
905There is no code associated with this header.
906
907#### `vm.h`
908
909This header defines the API for setting up and running `bc` and `dc`.
910
911It is so named because I think of it as the "virtual machine" of `bc`, though
912that is probably not true as [`program.h`][57] is probably the "virtual machine"
913API. Thus, the name is more historical accident.
914
915The code associated with this header is in [`src/vm.c`][58].
916
917### `locales/`
918
919This folder contains a bunch of `.msg` files and soft links to the real `.msg`
920files. This is how locale support is implemented in `bc`.
921
922The files are in the format required by the [`gencat`][59] POSIX utility. They
923all have the same messages, in the same order, with the same numbering, under
924the same groups. This is because the locale system expects those messages in
925that order.
926
927The softlinks exist because for many locales, they would contain the exact same
928information. To prevent duplication, they are simply linked to a master copy.
929
930The naming format for all files is:
931
932```
933<language_code>_<country_code>.<encoding>.msg
934```
935
936This naming format must be followed for all locale files.
937
938### `manuals/`
939
940This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along
941with a few other manuals.
942
943#### `algorithms.md`
944
945This file explains the mathematical algorithms that are used.
946
947The hope is that this file will guide people in understanding how the math code
948works.
949
950#### `bc.1.md.in`
951
952This file is a template for the markdown version of the `bc` manual and
953manpages.
954
955For more information about how the manpages and markdown manuals are generated,
956and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
957
958#### `bcl.3`
959
960This is the manpage for the [`bcl`][156] library. It is generated from
961[`bcl.3.md`][61] using [`scripts/manpage.sh`][60].
962
963For the reason why I check generated data into the repo, see
964[`scripts/manpage.sh`][60] and [Manuals][86].
965
966#### `bcl.3.md`
967
968This is the markdown manual for the [`bcl`][156] library. It is the source for the
969generated [`bcl.3`][62] file.
970
971#### `benchmarks.md`
972
973This is a document that compares this `bc` to GNU `bc` in various benchmarks. It
974was last updated when version [`3.0.0`][32] was released.
975
976It has very little documentation value, other than showing what compiler options
977are useful for performance.
978
979#### `build.md`
980
981This is the [build manual][14].
982
983This `bc` has a custom build system. The reason for this is because of
984[*portability*][136].
985
986If `bc` used an outside build system, that build system would be an external
987dependency. Thus, I had to write a build system for `bc` that used nothing but
988C99 and POSIX utilities, including barebones [POSIX `make`][74].
989
990for more information about the build system, see the [build system][142]
991section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70].
992
993#### `dc.1.md.in`
994
995This file is a template for the markdown version of the `dc` manual and
996manpages.
997
998For more information about how the manpages and markdown manuals are generated,
999and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
1000
1001#### `development.md`
1002
1003The file you are reading right now.
1004
1005#### `header_bcl.txt`
1006
1007Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper
1008header.
1009
1010For more information about generating manuals, see [`scripts/manpage.sh`][60]
1011and [Manuals][86].
1012
1013#### `header_bc.txt`
1014
1015Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a
1016proper header.
1017
1018For more information about generating manuals, see [`scripts/manpage.sh`][60]
1019and [Manuals][86].
1020
1021#### `header_dc.txt`
1022
1023Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a
1024proper header.
1025
1026For more information about generating manuals, see [`scripts/manpage.sh`][60]
1027and [Manuals][86].
1028
1029#### `header.txt`
1030
1031Used by [`scripts/manpage.sh`][60] to give all generated manpages a license
1032header.
1033
1034For more information about generating manuals, see [`scripts/manpage.sh`][60]
1035and [Manuals][86].
1036
1037#### `release.md`
1038
1039A checklist that I try to somewhat follow when making a release.
1040
1041#### `bc/`
1042
1043A folder containing the `bc` manuals.
1044
1045Each `bc` manual corresponds to a [build type][81]. See that link for more
1046details.
1047
1048For each manual, there are two copies: the markdown version generated from the
1049template, and the manpage generated from the markdown version.
1050
1051#### `dc/`
1052
1053A folder containing the `dc` manuals.
1054
1055Each `dc` manual corresponds to a [build type][81]. See that link for more
1056details.
1057
1058For each manual, there are two copies: the markdown version generated from the
1059template, and the manpage generated from the markdown version.
1060
1061### `scripts/`
1062
1063This folder contains helper scripts. Most of them are written in pure [POSIX
1064`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and
1065[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is
1066written in C. [`ministat.c`][223] in particular is copied from elsewhere.
1067
1068For more information about the shell scripts, see [POSIX Shell Scripts][76].
1069
1070#### `afl.py`
1071
1072This script is meant to be used as part of the fuzzing workflow.
1073
1074It does one of two things: checks for valid crashes, or runs `bc` and or `dc`
1075under all of the paths found by [AFL++][125].
1076
1077See [Fuzzing][82] for more information about fuzzing, including this script.
1078
1079#### `alloc.sh`
1080
1081This script is a quick and dirty script to test whether or not the garbage
1082collection mechanism of the [`BcNum` caching][96] works. It has been little-used
1083because it tests something that is not important to correctness.
1084
1085#### `benchmark.sh`
1086
1087A script making it easy to run benchmarks and to run the executable produced by
1088[`ministat.c`][223] on them.
1089
1090For more information, see the [Benchmarks][144] section.
1091
1092#### `bitfuncgen.c`
1093
1094A source file for an executable to generate tests for `bc`'s bitwise functions
1095in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built
1096with `make bitfuncgen`. It produces the test on `stdout` and the expected
1097results on `stderr`. This means that to generat tests, use the following
1098invokation:
1099
1100```
1101scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt
1102```
1103
1104It calls `abort()` if it runs into an error.
1105
1106#### `exec-install.sh`
1107
1108This script is the magic behind making sure `dc` is installed properly if it's
1109a symlink to `bc`. It checks to see if it is a link, and if so, it just creates
1110a new symlink in the install directory. Of course, it also installs `bc` itself,
1111or `dc` when it's alone.
1112
1113#### `functions.sh`
1114
1115This file is a bunch of common functions for most of the POSIX shell scripts. It
1116is not supposed to be run; instead, it is *sourced* by other POSIX shell
1117scripts, like so:
1118
1119```
1120. "$scriptdir/functions.sh"
1121```
1122
1123or the equivalent, depending on where the sourcing script is.
1124
1125For more information about the shell scripts, see [POSIX Shell Scripts][76].
1126
1127#### `fuzz_prep.sh`
1128
1129Fuzzing is a regular activity when I am preparing for a release.
1130
1131This script handles all the options and such for building a fuzzable binary.
1132Instead of having to remember a bunch of options, I just put them in this script
1133and run the script when I want to fuzz.
1134
1135For more information about fuzzing, see [Fuzzing][82].
1136
1137#### `karatsuba.py`
1138
1139This script has at least one of two major differences from most of the other
1140scripts:
1141
1142* It's written in Python 3.
1143* It's meant for software packagers.
1144
1145For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in
1146Python 3, but they are not meant for the end user or software packagers and are
1147not included in source distributions. But this script is.
1148
1149This script breaks my rule of only POSIX utilities necessary for package
1150maintainers, but there's a very good reason for that: it's only meant to be run
1151*once* when the package is created for the first time, and maybe not even then.
1152
1153You see, this script does two things: it tests the Karatsuba implementation at
1154various settings for `KARATSUBA_LEN`, and it figures out what the optimal
1155`KARATSUBA_LEN` is for the machine that it is running on.
1156
1157Package maintainers can use this script, when creating a package for this `bc`,
1158to figure out what is optimal for their users. Then they don't have to run it
1159ever again. So this script only has to run on the packagers machine.
1160
1161I tried to write the script in `sh`, by the way, and I finally accepted the
1162tradeoff of using Python 3 when it became too hard.
1163
1164However, I also mentioned that it's for testing Karatsuba with various settings
1165of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124],
1166right?
1167
1168Yes, but this script is not part of the [test suite][124]; it's used for testing
1169in the [`scripts/release.sh`][83] script, which is maintainer use only.
1170
1171However, there is one snare with `karatsuba.py`: I didn't want the user to have
1172to install any Python libraries to run it. Keep that in mind if you change it.
1173
1174#### `link.sh`
1175
1176This script is the magic behind making `dc` a symlink of `bc` when both
1177calculators are built.
1178
1179#### `locale_install.sh`
1180
1181This script does what its name says: it installs locales.
1182
1183It turns out that this is complicated.
1184
1185There is a magic environment variable, `$NLSPATH`, that tells you how and where
1186you are supposed to install locales.
1187
1188Yes, *how*. And where.
1189
1190But now is not the place to rant about `$NLSPATH`. For more information on
1191locales and `$NLSPATH`, see [Locales][85].
1192
1193#### `locale_uninstall.sh`
1194
1195This script does what its name says: it uninstalls locales.
1196
1197This is far less complicated than installing locales. I basically generate a
1198wildcard path and then list all paths that fit that wildcard. Then I delete each
1199one of those paths. Easy.
1200
1201For more information on locales, see [Locales][85].
1202
1203#### `manpage.sh`
1204
1205This script is the one that generates markdown manuals from a template and a
1206manpage from a markdown manual.
1207
1208For more information about generating manuals, see [Manuals][86].
1209
1210#### `ministat.c`
1211
1212This is a file copied [from FreeBSD][221] that calculates the standard
1213statistical numbers, such as mean, average, and median, based on numbers
1214obtained from a file.
1215
1216For more information, see the [FreeBSD ministat(1) manpage][222].
1217
1218This file allows `bc` to build the `scripts/ministat` executable using the
1219command `make ministat`, and this executable helps programmers evaluate the
1220results of [benchmarks][144] more accurately.
1221
1222#### `package.sh`
1223
1224This script is what helps `bc` maintainers cut a release. It does the following:
1225
12261.	Creates the appropriate `git` tag.
12272.	Pushes the `git` tag.
12283.	Copies the repo to a temp directory.
12294.	Removes files that should not be included in source distributions.
12305.	Creates the tarballs.
12316.	Signs the tarballs.
12327.	Zips and signs the Windows executables if they exist.
12338.	Calculates and outputs SHA512 and SHA256 sums for all of the files,
1234	including the signatures.
1235
1236This script is for `bc` maintainers to use when cutting a release. It is not
1237meant for outside use. This means that some non-POSIX utilities can be used,
1238such as `git` and `gpg`.
1239
1240In addition, before using this script, it expects that the folders that Windows
1241generated when building `bc`, `dc`, and [`bcl`][156], are in the parent
1242directory of the repo, exactly as Windows generated them. If they are not there,
1243then it will not zip and sign, nor calculate sums of, the Windows executables.
1244
1245Because this script creates a tag and pushes it, it should *only* be run *ONCE*
1246per release.
1247
1248#### `radamsa.sh`
1249
1250A script to test `bc`'s command-line expression parsing code, which, while
1251simple, strives to handle as much as possible.
1252
1253What this script does is it uses the test cases in [`radamsa.txt`][98] an input
1254to the [Radamsa fuzzer][99].
1255
1256For more information, see the [Radamsa][128] section.
1257
1258#### `radamsa.txt`
1259
1260Initial test cases for the [`radamsa.sh`][100] script.
1261
1262#### `randmath.py`
1263
1264This script generates random math problems and checks that `bc`'s and `dc`'s
1265output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have
1266GNU `bc` and `dc` installed before using this script.)
1267
1268One snare: be sure that this script is using the GNU `bc` and `dc`, not a
1269previously-installed version of this `bc` and `dc`.
1270
1271If you want to check for memory issues or failing asserts, you can build the
1272`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any
1273errors or crashes should be caught by the script and given to the user as part
1274of the "checklist" (see below).
1275
1276The basic idea behind this script is that it generates as many math problems as
1277it can, biasing towards situations that may be likely to have bugs, and testing
1278each math problem against GNU `bc` or `dc`.
1279
1280If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it
1281stores that problem. If the output mismatches, it also stores the problem.
1282
1283Then, when the user sends a `SIGINT`, the script stops testing and goes into
1284report mode. One-by-one, it will go through the "checklist," the list of failed
1285problems, and present each problem to the user, as well as whether this `bc` or
1286`dc` crashed, and its output versus GNU. Then the user can decide to add them as
1287test cases, which it does automatically to the appropriate test file.
1288
1289#### `release_settings.txt`
1290
1291A text file of settings combinations that [`release.sh`][83] uses to ensure that
1292`bc` and `dc` build and work with various default settings. [`release.sh`][83]
1293simply reads it line by line and uses each line for one build.
1294
1295#### `release.sh`
1296
1297This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156]
1298through a gauntlet that is mostly meant to be used in preparation for a release.
1299
1300It does the following:
1301
13021.	Builds every [build type][81], with every setting combo in
1303	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1304	alone.
13052.	Builds every [build type][81], with every setting combo in
1306	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1307	alone for 32-bit.
13083.	Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size
1309	Release builds.
13104.	Runs the [test suite][124] on every build, if desired.
13115.	Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build
1312	type/setting combo.
13136.	Runs [`scripts/karatsuba.py`][78] in test mode.
13147.	Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone
1315	under [valgrind][20] and errors if there are any memory bugs or memory
1316	leaks.
1317
1318#### `safe-install.sh`
1319
1320A script copied from [musl][101] to atomically install files.
1321
1322#### `test_settings.sh`
1323
1324A quick and dirty script to help automate rebuilding while manually testing the
1325various default settings.
1326
1327This script uses [`test_settings.txt`][103] to generate the various settings
1328combos.
1329
1330For more information about settings, see [Settings][102] in the [build
1331manual][14].
1332
1333#### `test_settings.txt`
1334
1335A list of the various settings combos to be used by [`test_settings.sh`][104].
1336
1337### `src/`
1338
1339This folder is, obviously, where the actual heart and soul of `bc`, the source
1340code, is.
1341
1342All of the source files are in one folder; this simplifies the build system
1343immensely.
1344
1345There are separate files for `bc` and `dc` specific code ([`bc.c`][40],
1346[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and
1347[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire
1348source file from a build than to have `#if`/`#endif` preprocessor guards.
1349
1350That said, it was easier in many cases to use preprocessor macros where both
1351calculators used much of the same code and data structures, so there is a
1352liberal sprinkling of them through the code.
1353
1354#### `args.c`
1355
1356Code for processing command-line arguments.
1357
1358The header for this file is [`include/args.h`][31].
1359
1360#### `bc.c`
1361
1362The code for the `bc` main function `bc_main()`.
1363
1364The header for this file is [`include/bc.h`][106].
1365
1366#### `bc_lex.c`
1367
1368The code for lexing that only `bc` needs.
1369
1370The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106].
1371
1372#### `bc_parse.c`
1373
1374The code for parsing that only `bc` needs. This code is the most complex and
1375subtle in the entire codebase.
1376
1377The headers for this file are [`include/parse.h`][181] and
1378[`include/bc.h`][106].
1379
1380#### `data.c`
1381
1382Due to [historical accident][23] because of a desire to get my `bc` into
1383[toybox][16], all of the constant data that `bc` needs is all in one file. This
1384is that file.
1385
1386There is no code in this file, but a lot of the const data has a heavy influence
1387on code, including the order of data in arrays because that order has to
1388correspond to the order of other things elsewhere in the codebase. If you change
1389the order of something in this file, run `make test`, and get errors, you
1390changed something that depends on the order that you messed up.
1391
1392Almost all headers have `extern` references to items in this file.
1393
1394#### `dc.c`
1395
1396The code for the `dc` main function `dc_main()`.
1397
1398The header for this file is [`include/dc.h`][182].
1399
1400#### `dc_lex.c`
1401
1402The code for lexing that only `dc` needs.
1403
1404The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182].
1405
1406#### `dc_parse.c`
1407
1408The code for parsing that only `dc` needs.
1409
1410The headers for this file are [`include/parse.h`][181] and
1411[`include/bc.h`][182].
1412
1413#### `file.c`
1414
1415The code for `bc`'s implementation of buffered I/O. For more information about
1416why I implemented my own buffered I/O, see [`include/file.h`][55], [Error
1417Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes
1418about version [`3.0.0`][32] in the [`NEWS`][32].
1419
1420The header for this file is [`include/file.h`][55].
1421
1422#### `history.c`
1423
1424The code for `bc`'s implementation of command-line editing/history, which is
1425based on a [UTF-8-aware fork][28] of [`linenoise`][29].
1426
1427For more information, see the [Command-Line History][189] section.
1428
1429The header for this file is [`include/history.h`][36].
1430
1431#### `lang.c`
1432
1433The data structures used for actual execution of `bc` and `dc` code.
1434
1435While execution is done in [`src/program.c`][53], this file defines functions
1436for initializing, copying, and freeing the data structures, which is somewhat
1437orthogonal to actual execution.
1438
1439Yes, it's misnamed; that's an accident of history where the first things I put
1440into it all seemed related to the `bc` language.
1441
1442The header for this file is [`include/lang.h`][38].
1443
1444#### `lex.c`
1445
1446The code for the common things that both programs need for lexing.
1447
1448The header for this file is [`include/lex.h`][180].
1449
1450#### `library.c`
1451
1452The code to implement the public API of the `bcl` library.
1453
1454The code in this file does a lot to ensure that clients do not have to worry
1455about internal `bc` details, especially error handling with `setjmp()` and
1456`longjmp()`. That and encapsulating the handling of numbers are the bulk of what
1457the code in this file actually does because most of the library is still
1458implemented in [`src/num.c`][39].
1459
1460The headers for this file are [`include/bcl.h`][30] and
1461[`include/library.h`][183].
1462
1463#### `main.c`
1464
1465The entry point for both programs; this is the `main()` function.
1466
1467This file has no headers associated with it.
1468
1469#### `num.c`
1470
1471The code for all of the arbitrary-precision [numbers][177] and [math][178] in
1472`bc`.
1473
1474The header for this file is [`include/num.h`][184].
1475
1476#### `opt.c`
1477
1478The code for parsing command-line options.
1479
1480The header for this file is [`include/opt.h`][35].
1481
1482#### `parse.c`
1483
1484The code for the common items that both programs need for parsing.
1485
1486The header for this file is [`include/parse.h`][181].
1487
1488#### `program.c`
1489
1490The code for the actual execution engine for `bc` and `dc` code.
1491
1492The header for this file is [`include/program.h`][57].
1493
1494#### `rand.c`
1495
1496The code for the [pseudo-random number generator (PRNG)][179] and the special
1497stack handling it needs.
1498
1499The PRNG only generates fixed-size integers. The magic of generating random
1500numbers of arbitrary size is actually given to the code that does math
1501([`src/num.c`][39]).
1502
1503The header for this file is [`include/rand.h`][37].
1504
1505#### `read.c`
1506
1507The code for reading from files and `stdin`.
1508
1509The header for this file is [`include/read.h`][185].
1510
1511#### `vector.c`
1512
1513The code for [vectors][111], [maps][186], and [slab vectors][187], along with
1514slabs.
1515
1516The header for this file is [`include/vector.h`][174].
1517
1518#### `vm.c`
1519
1520The code for setting up and running `bc` and `dc`.
1521
1522It is so named because I think of it as the "virtual machine" of `bc`, though
1523that is probably not true as [`program.h`][57] is probably the "virtual machine"
1524code. Thus, the name is more historical accident.
1525
1526The header for this file is [`include/vm.h`][27].
1527
1528### `tests/`
1529
1530This directory contains the entire [test suite][124] and its infrastructure.
1531
1532#### `all.sh`
1533
1534A convenience script for the `make run_all_tests` target (see the [Group
1535Tests][141] section for more information).
1536
1537#### `all.txt`
1538
1539The file with the names of the calculators. This is to make it easier for the
1540test scripts to know where the standard and other test directories are.
1541
1542#### `bcl.c`
1543
1544The test for the [`bcl`][156] API. For more information, see the [`bcl`
1545Test][157] section.
1546
1547#### `error.sh`
1548
1549The script to run the file-based error tests in `tests/<calculator>/errors/` for
1550each calculator. For more information, see the [Error Tests][151] section.
1551
1552This is a separate script so that each error file can be run separately and in
1553parallel.
1554
1555#### `errors.sh`
1556
1557The script to run the line-based error tests in `tests/<calculator>/errors.txt`
1558for each calculator. For more information, see the [Error Tests][151] section.
1559
1560#### `extra_required.txt`
1561
1562The file with the list of tests which both calculators have that need the [Extra
1563Math build option][188]. This exists to make it easy for test scripts to skip
1564those tests when the [Extra Math build option][188] is disabled.
1565
1566#### `history.py`
1567
1568The file with all of the history tests. For more information, see the [History
1569Tests][155] section.
1570
1571#### `history.sh`
1572
1573The script to integrate [`history.py`][139] into the build system in a portable
1574way, and to skip it if necessary.
1575
1576This script also re-runs the test three times if it fails. This is because
1577`pexpect` can be flaky at times.
1578
1579#### `other.sh`
1580
1581The script to run the "other" (miscellaneous) tests for each calculator. For
1582more information, see the [Other Tests][154] section.
1583
1584#### `read.sh`
1585
1586The script to run the read tests for each calculator. For more information, see
1587the [`read()` Tests][153] section.
1588
1589#### `script.sed`
1590
1591The `sed` script to edit the output of GNU `bc` when generating script tests.
1592For more information, see the [Script Tests][150] section.
1593
1594#### `script.sh`
1595
1596The script for running one script test. For more information, see the [Script
1597Tests][150] section.
1598
1599#### `scripts.sh`
1600
1601The script to help the `make run_all_tests` (see the [Group Tests][141] section)
1602run all of the script tests.
1603
1604#### `stdin.sh`
1605
1606The script to run the `stdin` tests for each calculator. For more information,
1607see the [`stdin` Tests][152] section.
1608
1609#### `test.sh`
1610
1611The script to run one standard test. For more information, see the [Standard
1612Tests][149] section.
1613
1614#### `bc/`
1615
1616The standard tests directory for `bc`. For more information, see the [Standard
1617Tests][149] section.
1618
1619##### `all.txt`
1620
1621The file to tell the build system and `make run_all_tests` (see the [Group
1622Tests][141] section) what standard tests to run for `bc`, as well as in what
1623order.
1624
1625This file just lists the test names, one per line.
1626
1627##### `errors.txt`
1628
1629The initial error test file for `bc`. This file has one test per line. See the
1630[Error Tests][151] section for more information.
1631
1632##### `posix_errors.txt`
1633
1634The file of tests for POSIX compatibility for `bc`. This file has one test per
1635line. For more information, see the [Error Tests][151] section.
1636
1637##### `timeconst.sh`
1638
1639The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6].
1640For more information, see the [Linux `timeconst.bc` Script][191]section.
1641
1642##### `errors/`
1643
1644The directory with error tests for `bc`, most discovered by AFL++ (see the
1645[Fuzzing][82] section). There is one test per file. For more information, see
1646the [Error Tests][151] section.
1647
1648##### `scripts/`
1649
1650The script tests directory for `bc`. For more information, see the [Script
1651Tests][150] section.
1652
1653###### `all.txt`
1654
1655A file to tell the build system and `make run_all_tests` (see the [Group
1656Tests][141] section) what script tests to run for `bc`, as well as in what
1657order.
1658
1659This file just lists the test names, one per line.
1660
1661#### `dc/`
1662
1663The standard tests directory for `dc`. For more information, see the [Standard
1664Tests][149] section.
1665
1666##### `all.txt`
1667
1668The file to tell the build system and `make run_all_tests` (see the [Group
1669Tests][141] section) what standard tests to run for `dc`, as well as in what
1670order.
1671
1672This file just lists the test names, one per line.
1673
1674##### `errors.txt`
1675
1676The initial error test file for `dc`. This file has one test per line. See the
1677[Error Tests][151] section for more information.
1678
1679##### `read_errors.txt`
1680
1681The file of tests errors with the `?` command (`read()` in `bc`). This file has
1682one test per line. See the [Error Tests][151] section for more information.
1683
1684##### `errors/`
1685
1686The directory with error tests for `dc`, most discovered by AFL++ (see the
1687[Fuzzing][82] section). There is one test per file. For more information, see
1688the [Error Tests][151] section.
1689
1690##### `scripts/`
1691
1692The script tests directory for `dc`. For more information, see the [Script
1693Tests][150] section.
1694
1695###### `all.txt`
1696
1697The file to tell the build system and `make run_all_tests` (see the [Group
1698Tests][141] section) what script tests to run for `dc`, as well as in what
1699order.
1700
1701This file just lists the test names, one per line.
1702
1703#### `fuzzing/`
1704
1705The directory containing the fuzzing infrastructure. For more information, see
1706the [Fuzzing][82] section.
1707
1708##### `bc_afl_continue.yaml`
1709
1710The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a
1711fuzz run. For more information, see the [Convenience][130] subsection of the
1712[Fuzzing][82] section.
1713
1714##### `bc_afl.yaml`
1715
1716The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a
1717fuzz run. For more information, see the [Convenience][130] subsection of the
1718[Fuzzing][82] section.
1719
1720Be aware that this will delete all previous unsaved fuzzing tests in the output
1721directories.
1722
1723##### `bc_inputs1/`
1724
1725The fuzzing input directory for the first third of inputs for `bc`. For more
1726information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1727
1728##### `bc_inputs2/`
1729
1730The fuzzing input directory for the second third of inputs for `bc`. For more
1731information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1732
1733##### `bc_inputs3/`
1734
1735The fuzzing input directory for the third third of inputs for `bc`. For more
1736information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1737
1738##### `dc_inputs/`
1739
1740The fuzzing input directory for the inputs for `dc`. For more information, see
1741the [Corpuses][192] subsection of the [Fuzzing][82] section.
1742
1743### `vs/`
1744
1745The directory containing all of the materials needed to build `bc`, `dc`, and
1746`bcl` on Windows.
1747
1748#### `bcl.sln`
1749
1750A Visual Studio solution file for [`bcl`][156]. This, along with
1751[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to
1752build [`bcl`][156] on Windows.
1753
1754#### `bcl.vcxproj`
1755
1756A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1757and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156]
1758on Windows.
1759
1760#### `bcl.vcxproj.filters`
1761
1762A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1763and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on
1764Windows.
1765
1766#### `bc.sln`
1767
1768A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66]
1769and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on
1770Windows.
1771
1772#### `bc.vcxproj`
1773
1774A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and
1775[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows.
1776
1777#### `bc.vcxproj.filters`
1778
1779A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and
1780[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows.
1781
1782#### `tests/`
1783
1784A directory of files to run tests on Windows.
1785
1786##### `tests_bc.bat`
1787
1788A file to run basic `bc` tests on Windows. It expects that it will be run from
1789the directory containing it, and it also expects a `bc.exe` in the same
1790directory.
1791
1792##### `tests_dc.bat`
1793
1794A file to run basic `dc` tests on Windows. It expects that it will be run from
1795the directory containing it, and it also expects a `bc.exe` in the same
1796directory.
1797
1798## Build System
1799
1800The build system is described in detail in the [build manual][14], so
1801maintainers should start there. This section, however, describes some parts of
1802the build system that only maintainers will care about.
1803
1804### Clean Targets
1805
1806`bc` has a default `make clean` target that cleans up the build files. However,
1807because `bc`'s build system can generate many different types of files, there
1808are other clean targets that may be useful:
1809
1810* `make clean_gen` cleans the `gen/strgen` executable generated from
1811  [`gen/strgen.c`][15]. It has no prerequisites.
1812* `make clean` cleans object files, `*.cat` files (see the [Locales][85]
1813  section), executables, and files generated from text files in [`gen/`][145],
1814  including `gen/strgen` if it was built. So this has a prerequisite on
1815  `make clean_gen` in normal use.
1816* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat`
1817  executable. It has no prerequisites.
1818* `make clean_config` cleans the generated `Makefile` and the manuals that
1819  [`configure.sh`][69] copied in preparation for install. It also depends on
1820  `make clean` and `make clean_benchmarks`, so it cleans those items too. This
1821  is the target that [`configure.sh`][69] uses before it does its work.
1822* `make clean_coverage` cleans the generated coverage files for the [test
1823  suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This
1824  is useful if the code coverage tools are giving errors.
1825* `make clean_tests` cleans *everything*. It has prerequisites on all previous
1826  clean targets, but it also cleans all of the [generated tests][143].
1827
1828When adding more generated files, you may need to add them to one of these
1829targets and/or add a target for them especially.
1830
1831### Preprocessor Macros
1832
1833`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type:
1834
1835* builds,
1836* works under the [test suite][124], and
1837* excludes as much code as possible from all builds.
1838
1839This section will explain the preprocessor style of `bc` and `dc`, as well as
1840provide an explanation of the macros used.
1841
1842#### Style
1843
1844The style of macro use in `bc` is pretty straightforward: I avoid depending on
1845macro definitions and instead, I set defaults if the macro is not defined and
1846then test the value if the macro with a plain `#if`.
1847
1848(Some examples of setting defaults are in [`include/status.h`][176], just above
1849the definition of the `BcStatus` enum.)
1850
1851In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible.
1852
1853There are a couple of cases where I went with standard stuff instead.
1854
1855#### Standard Macros
1856
1857`BC_ENABLED`
1858
1859:   This macro expands to `1` if `bc` is enabled, `0` if disabled.
1860
1861`DC_ENABLED`
1862
1863:   This macro expands to `1` if `dc` is enabled, `0` if disabled.
1864
1865`BUILD_TYPE`
1866
1867:   The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`,
1868    `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct
1869    the user to the correct markdown manual in the `git.gavinhoward.com`
1870    website.
1871
1872`EXECPREFIX`
1873
1874:   This macro expands to the prefix on the executable name. This is used to
1875    allow `bc` and `dc` to skip the prefix when finding out which calculator is
1876    executing.
1877
1878`BC_NUM_KARATSUBA_LEN`
1879
1880:   This macro expands to an integer, which is the length of numbers below which
1881    the Karatsuba multiplication algorithm switches to brute-force
1882    multiplication.
1883
1884`BC_ENABLE_EXTRA_MATH`
1885
1886:   This macro expands to `1` if the [Extra Math build option][188] is enabled,
1887    `0` if disabled.
1888
1889`BC_ENABLE_HISTORY`
1890
1891:   This macro expands to `1` if the [History build option][193] is enabled, `0`
1892    if disabled.
1893
1894`BC_ENABLE_NLS`
1895
1896:   This macro expands to `1` if the [NLS build option][193] (for locales) is
1897    enabled, `0` if disabled.
1898
1899`BC_ENABLE_LIBRARY`
1900
1901:   This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if
1902    disabled. If this is enabled, building the calculators themselves is
1903    disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero.
1904
1905`BC_ENABLE_MEMCHECK`
1906
1907:   This macro expands to `1` if `bc` has been built for use with Valgrind's
1908    [Memcheck][194], `0` otherwise. This ensures that fatal errors still free
1909    all of their memory when exiting. `bc` does not do that normally because
1910    what's the point?
1911
1912`BC_ENABLE_AFL`
1913
1914:   This macro expands to `1` if `bc` has been built for fuzzing with
1915    [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more
1916    information.
1917
1918`BC_DEFAULT_BANNER`
1919
1920:   This macro expands to the default value for displaying the `bc` banner.
1921
1922`BC_DEFAULT_SIGINT_RESET`
1923
1924:   The macro expands to the default value for whether or not `bc` should reset
1925    on `SIGINT` or quit.
1926
1927`BC_DEFAULT_TTY_MODE`
1928
1929:   The macro expands to the default value for whether or not `bc` should use
1930    TTY mode when it available.
1931
1932`BC_DEFAULT_PROMPT`
1933
1934:   This macro expands to the default value for whether or not `bc` should use a
1935    prompt when TTY mode is available.
1936
1937`DC_DEFAULT_SIGINT_RESET`
1938
1939:   The macro expands to the default value for whether or not `dc` should reset
1940    on `SIGINT` or quit.
1941
1942`DC_DEFAULT_TTY_MODE`
1943
1944:   The macro expands to the default value for whether or not `dc` should use
1945    TTY mode when it available.
1946
1947`DC_DEFAULT_PROMPT`
1948
1949:   This macro expands to the default value for whether or not `dc` should use a
1950    prompt when TTY mode is available.
1951
1952`BC_DEBUG_CODE`
1953
1954:   If this macro expands to a non-zero integer, then `bc` is built with *a lot*
1955    of extra debugging code. This is never set by the build system and must be
1956    set by the programmer manually. This should never be set in builds given to
1957    end users. For more information, see the [Debugging][134] section.
1958
1959## Test Suite
1960
1961While the source code may be the heart and soul of `bc`, the test suite is the
1962arms and legs: it gives `bc` the power to do anything it needs to do.
1963
1964The test suite is what allowed `bc` to climb to such high heights of quality.
1965This even goes for fuzzing because fuzzing depends on the test suite for its
1966input corpuses. (See the [Fuzzing][82] section.)
1967
1968Understanding how the test suite works should be, I think, the first thing that
1969maintainers learn after learning what `bc` and `dc` should do. This is because
1970the test suite, properly used, gives confidence that changes have not caused
1971bugs or regressions.
1972
1973That is why I spent the time to make the test suite as easy to use and as fast
1974as possible.
1975
1976To use the test suite (assuming `bc` and/or `dc` are already built), run the
1977following command:
1978
1979```
1980make test
1981```
1982
1983That's it. That's all.
1984
1985It will return an error code if the test suite failed. It will also print out
1986information about the failure.
1987
1988If you want the test suite to go fast, then run the following command:
1989
1990```
1991make -j<cores> test
1992```
1993
1994Where `<cores>` is the number of cores that your computer has. Of course, this
1995requires a `make` implementation that supports that option, but most do. (And I
1996will use this convention throughout the rest of this section.)
1997
1998I have even tried as much as possible, to put longer-running tests near the
1999beginning of the run so that the entire suite runs as fast as possible.
2000
2001However, if you want to be sure which test is failing, then running a bare
2002`make test` is a great way to do that.
2003
2004But enough about how you have no excuses to use the test suite as much as
2005possible; let's talk about how it works and what you *can* do with it.
2006
2007### Standard Tests
2008
2009The heavy lifting of testing the math in `bc`, as well as basic scripting, is
2010done by the "standard tests" for each calculator.
2011
2012These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162]
2013directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164],
2014[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166],
2015[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and
2016[`tests/dc/read_errors.txt`][175]), which are called the "standard test
2017directories."
2018
2019For every test, there is the test file and the results file. The test files have
2020names of the form `<test>.txt`, where `<test>` is the name of the test, and the
2021results files have names of the form `<test>_results.txt`.
2022
2023If the test file exists but the results file does not, the results for that test
2024are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143]
2025section.
2026
2027The `all.txt` file in each standard tests directory is what tells the test suite
2028and [build system][142] what tests there are, and the tests are either run in
2029that order, or in the case of parallel `make`, that is the order that the
2030targets are listed as prerequisites of `make test`.
2031
2032If the test exists in the `all.txt` file but does not *actually* exist, the test
2033and its results are generated by a GNU-compatible `bc` or `dc`. See the
2034[Generated Tests][143] section.
2035
2036To add a non-generated standard test, do the following:
2037
2038* Add the test file (`<test>.txt` in the standard tests directory).
2039* Add the results file (`<test>_results.txt` in the standard tests directory).
2040  You can skip this step if just the results file needs to be generated. See the
2041  [Generated Tests][147] section for more information.
2042* Add the name of the test to the `all.txt` file in the standard tests
2043  directory, putting it in the order it should be in. If possible, I would put
2044  longer tests near the beginning because they will start running earlier with
2045  parallel `make`. I always keep `decimal` first, though, as a smoke test.
2046
2047If you need to add a generated standard test, see the [Generated Tests][147]
2048section for how to do that.
2049
2050Some standard tests need to be skipped in certain cases. That is handled by the
2051[build system][142]. See the [Integration with the Build System][147] section
2052for more details.
2053
2054In addition to all of the above, the standard test directory is not only the
2055directory for the standard tests of each calculator, it is also the parent
2056directory of all other test directories for each calculator.
2057
2058#### `bc` Standard Tests
2059
2060The list of current (27 February 2023) standard tests for `bc` is below:
2061
2062decimal
2063
2064:   Tests decimal parsing and printing.
2065
2066print
2067
2068:   Tests printing in every base from decimal. This is near the top for
2069    performance of parallel testing.
2070
2071parse
2072
2073:   Tests parsing in any base and outputting in decimal. This is near the top
2074    for performance of parallel testing.
2075
2076lib2
2077
2078:   Tests the extended math library. This is near the top for performance of
2079    parallel testing.
2080
2081print2
2082
2083:   Tests printing at the extreme values of `obase`.
2084
2085length
2086
2087:   Tests the `length()` builtin function.
2088
2089scale
2090
2091:   Tests the `scale()` builtin function.
2092
2093shift
2094
2095:   Tests the left (`<<`) and right (`>>`) shift operators.
2096
2097add
2098
2099:   Tests addition.
2100
2101subtract
2102
2103:   Tests subtraction.
2104
2105multiply
2106
2107:   Tests multiplication.
2108
2109divide
2110
2111:   Tests division.
2112
2113modulus
2114
2115:   Tests modulus.
2116
2117power
2118
2119:   Tests power (exponentiation).
2120
2121sqrt
2122
2123:   Tests the `sqrt()` (square root) builtin function.
2124
2125trunc
2126
2127:   Tests the truncation (`$`) operator.
2128
2129places
2130
2131:   Tests the places (`@`) operator.
2132
2133vars
2134
2135:   Tests some usage of variables. This one came from [AFL++][125] I think.
2136
2137boolean
2138
2139:   Tests boolean operators.
2140
2141comp
2142
2143:   Tests comparison operators.
2144
2145abs
2146
2147:   Tests the `abs()` builtin function.
2148
2149assignments
2150
2151:   Tests assignment operators, including increment/decrement operators.
2152
2153functions
2154
2155:   Tests functions, specifically function parameters being replaced before they
2156    themselves are used. See the comment in `bc_program_call()` about the last
2157    condition.
2158
2159scientific
2160
2161:   Tests scientific notation.
2162
2163engineering
2164
2165:   Tests engineering notation.
2166
2167globals
2168
2169:   Tests that assigning to globals affects callers.
2170
2171strings
2172
2173:   Tests strings.
2174
2175strings2
2176
2177:   Tests string allocation in slabs, to ensure slabs work.
2178
2179letters
2180
2181:   Tests single and double letter numbers to ensure they behave differently.
2182    Single-letter numbers always be set to the same value, regardless of
2183    `ibase`.
2184
2185exponent
2186
2187:   Tests the `e()` function in the math library.
2188
2189log
2190
2191:   Tests the `l()` function in the math library.
2192
2193pi
2194
2195:   Tests that `bc` produces the right value of pi for numbers with varying
2196    `scale` values.
2197
2198arctangent
2199
2200:   Tests the `a()` function in the math library.
2201
2202sine
2203
2204:   Tests the `s()` function in the math library.
2205
2206cosine
2207
2208:   Tests the `c()` function in the math library.
2209
2210bessel
2211
2212:   Tests the `j()` function in the math library.
2213
2214fib
2215
2216:   Tests the `fib()` Fibonacci function in the extended math library.
2217
2218arrays
2219
2220:   Test arrays.
2221
2222misc
2223
2224:   Miscellaneous tests. I named it this because at the time, I struggled to
2225    classify them, but it's really testing multi-line numbers.
2226
2227misc1
2228
2229:   A miscellaneous test found by [AFL++][125].
2230
2231misc2
2232
2233:   A miscellaneous test found by [AFL++][125].
2234
2235misc3
2236
2237:   A miscellaneous test found by [AFL++][125].
2238
2239misc4
2240
2241:   A miscellaneous test found by [AFL++][125].
2242
2243misc5
2244
2245:   A miscellaneous test found by [AFL++][125].
2246
2247misc6
2248
2249:   A miscellaneous test found by [AFL++][125].
2250
2251misc7
2252
2253:   A miscellaneous test found by [AFL++][125].
2254
2255void
2256
2257:   Tests void functions.
2258
2259rand
2260
2261:   Tests the pseudo-random number generator and its special stack handling.
2262
2263recursive_arrays
2264
2265:   Tested the slab vector undo ability in used in `bc_parse_name()` when it
2266    existed. Now used as a stress test.
2267
2268divmod
2269
2270:   Tests divmod.
2271
2272modexp
2273
2274:   Tests modular exponentiation.
2275
2276bitfuncs
2277
2278:   Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and
2279    `brshift()` in [`gen/lib2.bc`][26].
2280
2281leadingzero
2282
2283:   Tests the leading zero functionality and the `plz*()` and `pnlz*()`
2284    functions in [`gen/lib2.bc`][26].
2285
2286is_number
2287
2288:   Tests the `is_number()` built-in function.
2289
2290is_string
2291
2292:   Tests the `is_number()` built-in function.
2293
2294asciify_array
2295
2296:   Tests the ability of `asciify()` to convert an array into a full string.
2297
2298line_by_line1
2299
2300:   Tests the line-by-line behavior of `bc` with regards to `quit` in a function
2301    definition.
2302
2303line_by_line2
2304
2305:   Tests the line-by-line behavior of `bc` with regards to `quit`.
2306
2307line_loop_quit1
2308
2309:   Tests the behavior of `bc` with a `quit` after a single-line loop.
2310
2311line_loop_quit2
2312
2313:   Tests the behavior of `bc` with a `quit` after a single-line loop and a
2314    newline escape.
2315
2316#### `dc` Standard Tests
2317
2318The list of current (27 February 2023) standard tests for `dc` is below:
2319
2320decimal
2321
2322:   Tests decimal parsing and printing.
2323
2324length
2325
2326:   Tests the `length()` builtin function, including for strings and arrays.
2327
2328stack_len
2329
2330:   Tests taking the length of the results stack.
2331
2332exec_stack_len
2333
2334:   Tests taking the length of the execution stack.
2335
2336add
2337
2338:   Tests addition.
2339
2340subtract
2341
2342:   Tests subtraction.
2343
2344multiply
2345
2346:   Tests multiplication.
2347
2348divide
2349
2350:   Tests division.
2351
2352modulus
2353
2354:   Tests modulus.
2355
2356divmod
2357
2358:   Tests divmod.
2359
2360power
2361
2362:   Tests power (exponentiation).
2363
2364sqrt
2365
2366:   Tests the `sqrt()` (square root) builtin function.
2367
2368modexp
2369
2370:   Tests modular exponentiation.
2371
2372boolean
2373
2374:   Tests boolean operators.
2375
2376negate
2377
2378:   Tests negation as a command and as part of numbers.
2379
2380trunc
2381
2382:   Tests the truncation (`$`) operator.
2383
2384places
2385
2386:   Tests the places (`@`) operator.
2387
2388shift
2389
2390:   Tests the left (`<<`) and right (`>>`) shift operators.
2391
2392abs
2393
2394:   Tests the `abs()` builtin function (the `b` command).
2395
2396scientific
2397
2398:   Tests scientific notation.
2399
2400engineering
2401
2402:   Tests engineering notation.
2403
2404vars
2405
2406:   Tests some usage of variables. This one came from [AFL++][125] I think.
2407
2408misc
2409
2410:   Miscellaneous tests. I named it this because at the time, I struggled to
2411    classify them.
2412
2413misc1
2414
2415:   More miscellaneous tests. This used to be an error file
2416    (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character.
2417    However, starting with version `6.1.0`, `u` became a valid command
2418    (`is_number()`), so the error file became valid. It was checked manually and
2419    moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`.
2420
2421strings
2422
2423:   Tests strings.
2424
2425rand
2426
2427:   Tests the pseudo-random number generator and its special stack handling.
2428
2429is_number
2430
2431:   Tests the `is_number()` built-in function (the `u` command).
2432
2433is_string
2434
2435:   Tests the `is_number()` built-in function (the `t` command).
2436
2437### Script Tests
2438
2439The heavy lifting of testing the scripting of `bc` is done by the "script tests"
2440for each calculator.
2441
2442These tests use the files in the [`tests/bc/scripts/`][169] and
2443[`tests/dc/scripts/`][170] directories (except for
2444[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which
2445are called the "script test directories."
2446
2447To add a script test, do the following:
2448
2449* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory).
2450* Add the results file (`<test>.txt` in the script tests directory). You can
2451  skip this step if just the results file needs to be generated. See the
2452  [Generated Tests][147] section for more information.
2453* Add the name of the test to the `all.txt` file in the script tests directory,
2454  putting it in the order it should be in. If possible, I would put longer tests
2455  near the beginning because they will start running earlier with parallel
2456  `make`.
2457
2458Some script tests need to be skipped in certain cases. That is handled by the
2459[build system][142]. See the [Integration with the Build System][147] section
2460for more details.
2461
2462Another unique thing about the script tests, at least for `bc`: they test the
2463`-g` and `--global-stacks` flags. This means that all of the script tests for
2464`bc` are written assuming the `-g` flag was given on the command-line
2465
2466There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed`
2467script is used to remove an incompatibility with GNU `bc`.
2468
2469If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU
2470`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is
2471correct according to my reading of the `bc` spec, so my `bc` doesn't as well.
2472
2473The `sed` script edits numbers that end with just one digit on a line by itself
2474to put it on the same line as others.
2475
2476#### `bc` Script Tests
2477
2478The list of current (27 February 2023) script tests for `bc` is below:
2479
2480print.bc
2481
2482:   Tests printing even harder than the print standard test.
2483
2484multiply.bc
2485
2486:   Tests multiplication even harder than the multiply standard test.
2487
2488divide.bc
2489
2490:   Tests division even harder than the divide standard test.
2491
2492subtract.bc
2493
2494:   Tests subtraction even harder than the subtract standard test.
2495
2496add.bc
2497
2498:   Tests addition even harder than the add standard test.
2499
2500parse.bc
2501
2502:   Tests parsing even harder than the parse standard test.
2503
2504root.bc
2505
2506:   Tests that `root()` and `cbrt()` do not go into an infinite loop on a
2507    pathological case found by a user.
2508
2509array.bc
2510
2511:   Tests arrays even harder than the arrays standard test.
2512
2513array2.bc
2514
2515:   Implements a test where an array element is passed as a parameter to a
2516    function, and then another array is passed to a later parameter that is
2517    named the same as the first array. This was added because of a bug found
2518    while writing a script in bc.
2519
2520atan.bc
2521
2522:   Tests arctangent even harder than the arctangent standard test.
2523
2524bessel.bc
2525
2526:   Tests bessel even harder than the bessel standard test.
2527
2528functions.bc
2529
2530:   Tests functions even harder than the functions standard test.
2531
2532globals.bc
2533
2534:   Tests global stacks directly.
2535
2536len.bc
2537
2538:   Tests the `length()` builtin on arrays.
2539
2540rand.bc
2541
2542:   Tests the random number generator in the presence of global stacks.
2543
2544references.bc
2545
2546:   Tests functions with array reference parameters.
2547
2548screen.bc
2549
2550:   A random script provided by an early user that he used to calculate the size
2551    of computer screens
2552
2553strings2.bc
2554
2555:   Tests escaping in strings.
2556
2557ifs.bc
2558
2559:   Tests proper ending of `if` statements without `else` statements.
2560
2561ifs2.bc
2562
2563:   More tests proper ending of `if` statements without `else` statements.
2564
2565#### `dc` Script Tests
2566
2567The list of current (27 February 2023) script tests for `dc` is below:
2568
2569prime.dc
2570
2571:   Tests scripting by generating the first 100,000 primes.
2572
2573asciify.dc
2574
2575:   Tests the asciify command.
2576
2577stream.dc
2578
2579:   Tests the stream command.
2580
2581array.dc
2582
2583:   Tests arrays.
2584
2585else.dc
2586
2587:   Tests else clauses on conditional execution commands.
2588
2589factorial.dc
2590
2591:   Tests scripting with factorial.
2592
2593loop.dc
2594
2595:   Tests scripting by implementing loops.
2596
2597quit.dc
2598
2599:   Tests the quit command in the presence of tail calls.
2600
2601weird.dc
2602
2603:   A miscellaneous test.
2604
2605no_clamp.dc
2606
2607:   A test to ensure `dc` has the same behavior as the BSD `dc` with digi
2608    clamping off when parsing numbers.
2609
2610### Error Tests
2611
2612One of the most useful parts of the `bc` test suite, in my opinion, is the heavy
2613testing of error conditions.
2614
2615Just about every error condition I can think of is tested, along with many
2616machine-generated (by [AFL++][125]) ones.
2617
2618However, because the error tests will often return error codes, they require
2619different infrastructure from the rest of the test suite, which assumes that
2620the calculator under test will return successfully. A lot of that infrastructure
2621is in the [`scripts/functions.sh`][105] script, but it basically allows the
2622calculator to exit with an error code and then tests that there *was* an error
2623code.
2624
2625Besides returning error codes, error tests also ensure that there is output from
2626`stderr`. This is to make sure that an error message is always printed.
2627
2628The error tests for each calculator are spread through two directories, due to
2629historical accident. These two directories are the standard test directory (see
2630the [Standard Tests][149] section) and the `errors/` directory directly
2631underneath the standard tests directory.
2632
2633This split is convenient, however, because the tests in each directory are
2634treated differently.
2635
2636The error tests in the standard test directory, which include `errors.txt` for
2637both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`,
2638are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the
2639data through `stdin`. Each line is considered a separate test. For this reason,
2640there can't be any blank lines in the error files in the standard tests
2641directory because a blank line causes a successful exit.
2642
2643On the other hand, the tests in the `errors/` directory below the standard tests
2644directory are run by [`tests/error.sh`][227] and are considered to be one test
2645per file. As such, they are used differently. They are shoved into the
2646calculator through `stdin`, but they are also executed by passing them on the
2647command-line.
2648
2649To add an error test, first figure out which kind you want.
2650
2651Is it a simple one-liner, and you don't care if it's tested through a file?
2652
2653Then put it in one of the error files in the standard test directory. I would
2654only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()`
2655errors in the `read_errors.txt` file for `dc`; all others I would put in the
2656respective `errors.txt` file.
2657
2658On the other hand, if you care if the error is run as a file on the
2659command-line, or the error requires multiple lines to reproduce, then put the
2660test in the respective `errors/` directory and run the [`configure.sh`][69]
2661script again.
2662
2663After that, you are done; the test suite will automatically pick up the new
2664test, and you don't have to tell the test suite the expected results.
2665
2666### `stdin` Tests
2667
2668The `stdin` tests specifically test the lexing and parsing of multi-line
2669comments and strings. This is important because when reading from `stdin`, the
2670calculators can only read one line at a time, so partial parses are possible.
2671
2672To add `stdin` tests, just add the tests to the `stdin.txt` file in the
2673respective standard tests directory, and add the expected results in the
2674`stdin_results.txt` in the respective standard tests directory.
2675
2676### `read()` Tests
2677
2678The `read()` tests are meant to test the `read()` builtin function, to ensure
2679that the parsing and execution is correct.
2680
2681Each line is one test, as that is the nature of using the `read()` function, so
2682to add a test, just add it as another line in the `read.txt` file in the
2683respective standard tests directory, and add its result to the
2684`read_results.txt` file in the respective standard tests directory.
2685
2686### Other Tests
2687
2688The "other" tests are just random tests that I could not easily classify under
2689other types of tests. They usually include things like command-line parsing and
2690environment variable testing.
2691
2692To add an other test, it requires adding the programming for it to
2693[`tests/other.sh`][195] because all of the tests are written specifically in
2694that script. It would be best to use the infrastructure in
2695[`scripts/functions.sh`][105].
2696
2697### Linux `timeconst.bc` Script
2698
2699One special script that `bc`'s test suite will use is the [Linux `timeconst.bc`
2700script][6].
2701
2702I made the test suite able to use this script because the reason the
2703[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted
2704to be sure that it would run correctly on the script.
2705
2706However, it is not part of the distribution, nor is it part of the repository.
2707The reason for this is because [`timeconst.bc`][6] is under the GPL, while this
2708repo is under a BSD license.
2709
2710If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it
2711at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will
2712automatically run its tests; otherwise, it will skip it.
2713
2714### History Tests
2715
2716There are automatic tests for history; however, they have dependencies: Python 3
2717and [`pexpect`][137].
2718
2719As a result, because I need the [test suite to be portable][138], like the rest
2720of `bc`, the history tests are carefully guarded with things to ensure that they
2721are skipped, rather than failing if Python and [`pexpect`][137] are not
2722installed. For this reason, there is a `sh` script, [`tests/history.sh`][140]
2723that runs the actual script, [`tests/history.py`][139].
2724
2725I have added as many tests as I could to cover as many lines and branches as
2726possible. I guess I could have done more, but doing so would have required a lot
2727of time.
2728
2729I have tried to make it as easy as possible to run the history tests. They will
2730run automatically if you use the `make test_history` command, and they will also
2731use parallel execution with `make -j<cores> test_history`.
2732
2733However, the history tests are meant only to be run by maintainers of `bc`; they
2734are *not* meant to be run by users and packagers. The reason for this is that
2735they only seem to work reliably on Linux; `pexpect` seems to have issues on
2736other platforms, especially timeout issues.
2737
2738Thus, they are excluded from running with `make test` and [`tests/all.sh`][225].
2739However, they can be run from the [`scripts/release.sh`][83] script.
2740
2741All of the tests are contained in [`tests/history.py`][139]. The reason for this
2742is because they are in Python, and I don't have an easy way of including Python
2743(or at the very least, I am not familiar enough with Python to do that). So they
2744are all in the same file to make it easier on me.
2745
2746Each test is one function in the script. They all take the same number and type
2747of arguments:
2748
27491.	`exe`: the executable to run.
27502.	`args`: the arguments to pass to the executable.
27513.	`env`: the environment.
2752
2753Each function creates a child process with `pexpect.spawn` and then tests with
2754that child. Then the function returns the child to the caller, who closes it
2755and checks its error code against its expected error code.
2756
2757Yes, the error code is not a success all the time. This is because of the UTF-8
2758tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc`
2759is required to handle, per the [standard][2].
2760
2761So in [`tests/history.py`][139], there are four main arrays:
2762
2763* `bc` test functions,
2764* `bc` expected error codes.
2765* `dc` test functions.
2766* `dc` expected error codes.
2767
2768[`tests/history.py`][139] takes an index as an argument; that index is what test
2769it should run. That index is used to index into the proper test and error code
2770array.
2771
2772If you need to add more history tests, you need to do the following:
2773
27741.	Add the function for that test to [`tests/history.py`][139].
27752.	Add the function to the proper array of tests.
27763.	Add the expected error code to the proper array of error codes.
27774.	Add a target for the test to [`Makefile.in`][70].
27785.	Add that target as a prerequisite to either `test_bc_history` or
2779	`test_dc_history`.
2780
2781You do not need to do anything to add the test to `history_all_tests` (see
2782[Group Tests][141] below) because the scripts will automatically run all of the
2783tests properly.
2784
2785### Generated Tests
2786
2787Some tests are *large*, and as such, it is impractical to check them into `git`.
2788Instead, the tests depend on the existence of a GNU-compatible `bc` in the
2789`PATH`, which is then used to generate the tests.
2790
2791If [`configure.sh`][69] was run with the `-G` argument, which disables generated
2792tests, then `make test` and friends will automatically skip generated tests.
2793This is useful to do on platforms that are not guaranteed to have a
2794GNU-compatible `bc` installed.
2795
2796However, adding a generated test is a complicated because you have to figure out
2797*where* you want to put the file to generate the test.
2798
2799For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to
2800generate a `<test>_results.txt` file in the [standard tests][149] directory
2801(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no
2802`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s
2803test suite will look for a `<test>.bc` or `<test>.dc` file in the [script
2804tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if
2805that exists, it will use that script to generate the `<test>.txt` file in the
2806[standard tests][149] directory after which it will generate the
2807`<test>_results.txt` file in the [standard tests][149] directory.
2808
2809So you can choose to either:
2810
2811* Have a test in the [standard tests][149] directory without a corresponding
2812  `*_results.txt` file, or
2813* Have a script in the [script tests][150] directory to generate the
2814  corresponding file in the standard test directory before generating the
2815  corresponding `*_results.txt` file.
2816
2817Adding a script has a double benefit: the script itself can be used as a test.
2818However, script test results can also be generated.
2819
2820If `bc` is asked to run a script test, then if the script does not exist, `bc`'s
2821test suite returns an error. If it *does* exist, but no corresponding
2822`<test>.txt` file exists in the [script tests][150] directory, then a
2823GNU-compatible `bc` is used to generate the `<test>.txt` results file.
2824
2825If generated tests are disabled through [`configure.sh`][69], then these tests
2826are not generated if they do not exist. However, if they *do* exist, then they
2827are run. This can happen if a `make clean_tests` was not run between a build
2828that generated tests and a build that will not.
2829
2830### Group Tests
2831
2832While the test suite has a lot of targets in order to get parallel execution,
2833there are five targets that allow you to run each section, or all, of the test
2834suite as one unit:
2835
2836* `bc_all_tests` (`bc` tests)
2837* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests)
2838* `dc_all_tests` (`dc` tests)
2839* `history_all_tests` (history tests)
2840* `run_all_tests` (combination of the previous four)
2841
2842In addition, there are more fine-grained targets available:
2843
2844* `test_bc` runs all `bc` tests (except history tests).
2845* `test_dc` runs all `dc` tests (except history tests).
2846* `test_bc_tests` runs all `bc` [standard tests][149].
2847* `test_dc_tests` runs all `dc` [standard tests][149].
2848* `test_bc_scripts` runs all `bc` [script tests][150].
2849* `test_dc_scripts` runs all `dc` [script tests][150].
2850* `test_bc_stdin` runs the `bc` [`stdin` tests][152].
2851* `test_dc_stdin` runs the `dc` [`stdin` tests][152].
2852* `test_bc_read` runs the `bc` [`read()` tests][153].
2853* `test_dc_read` runs the `dc` [`read()` tests][153].
2854* `test_bc_errors` runs the `bc` [error tests][151].
2855* `test_dc_errors` runs the `dc` [error tests][151].
2856* `test_bc_other` runs the `bc` [other tests][151].
2857* `test_dc_other` runs the `dc` [other tests][151].
2858* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6].
2859* `test_history` runs all history tests.
2860* `test_bc_history` runs all `bc` history tests.
2861* `test_dc_history` runs all `dc` history tests.
2862
2863All of the above tests are parallelizable.
2864
2865### Individual Tests
2866
2867In addition to all of the above, individual test targets are available. These
2868are mostly useful for attempting to fix a singular test failure.
2869
2870These tests are:
2871
2872* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149].
2873  The name is the name of the test file without the `.txt` extension. It is the
2874  name printed by the test suite when running the test.
2875* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149].
2876  The name is the name of the test file without the `.txt` extension. It is the
2877  name printed by the test suite when running the test.
2878* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script
2879  test][150]. The name of the test is the name of the script without the `.bc`
2880  extension.
2881* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script
2882  test][150]. The name of the test is the name of the script without the `.dc`
2883  extension.
2884* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`.
2885* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`.
2886
2887### [`bcl`][156] Test
2888
2889When [`bcl`][156] is built, the [build system][142] automatically ensures that
2890`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests.
2891
2892There is only one test, and it is built from [`tests/bcl.c`][158].
2893
2894The reason the test is in C is because [`bcl`][156] is a C library; I did not
2895want to have to write C code *and* POSIX `sh` scripts to run it.
2896
2897The reason there is only one test is because most of the code for the library is
2898tested by virtue of testing `bc` and `dc`; the test needs to only ensure that
2899the library bindings and plumbing do not interfere with the underlying code.
2900
2901However, just because there is only one test does not mean that it doesn't test
2902more than one thing. The code actually handles a series of tests, along with
2903error checking to ensure that nothing went wrong.
2904
2905To add a [`bcl`][156] test, just figure out what test you want, figure out where
2906in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as
2907much error checking as possible, and use the `err(BclError)` function. Ensure
2908that all memory is freed because that test is run through [Valgrind][159] and
2909[AddressSanitizer][160].
2910
2911### Integration with the Build System
2912
2913If it was not obvious by now, the test suite is heavily integrated into the
2914[build system][142], but the integration goes further than just making the test
2915suite easy to run from `make` and generating individual and group tests.
2916
2917The big problem the test suite has is that some `bc` code, stuff that is
2918important to test, is only in *some* builds. This includes all of the extra math
2919extensions, for example.
2920
2921So the test suite needs to have some way of turning off the tests that depend on
2922certain [build types][81] when those [build types][81] are not used.
2923
2924This is the reason the is tightly integrated with the [build system][142]: the
2925[build system][142] knows what [build type][81] was used and can tell the test
2926suite to turn off the tests that do not apply.
2927
2928It does this with arguments to the test scripts that are either a `1` or a `0`,
2929depending on whether tests of that type should be enabled or not. These
2930arguments are why I suggest, in the [Test Scripts][148] section, to always use a
2931`make` target to run the test suite or any individual test. I have added a lot
2932of targets to make this easy and as fast as possible.
2933
2934In addition to all of that, the build system is responsible for selecting the
2935`bc`/`dc` tests or the [`bcl` test][157].
2936
2937### Output Directories
2938
2939During any run of the test suite, the test suite outputs the results of running
2940various tests to files. These files are usually output to `tests/bc_outputs/`
2941and `tests/dc_outputs/`.
2942
2943However, in some cases, it may be necessary to output test results to a
2944different directory. If that is the case, set the environment variable
2945`BC_TEST_OUTPUT_DIR` to the name of the directory.
2946
2947If that is done, then test results will be written to
2948`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`.
2949
2950### Test Suite Portability
2951
2952The test suite is meant to be run by users and packagers as part of their
2953install process.
2954
2955This puts some constraints on the test suite, but the biggest is that the test
2956suite must be as [portable as `bc` itself][136].
2957
2958This means that the test suite must be implemented in pure POSIX `make`, `sh`,
2959and C99.
2960
2961#### Test Scripts
2962
2963To accomplish the portability, the test suite is run by a bunch of `sh` scripts
2964that have the constraints laid out in [POSIX Shell Scripts][76].
2965
2966However, that means they have some quirks, made worse by the fact that there are
2967[generated tests][143] and [tests that need to be skipped, but only
2968sometimes][147].
2969
2970This means that a lot of the scripts take an awkward number and type of
2971arguments. Some arguments are strings, but most are integers, like
2972[`scripts/release.sh`][83].
2973
2974It is for this reason that I do not suggest running the test scripts directly.
2975Instead, always use an appropriate `make` target, which already knows the
2976correct arguments for the test because of the [integration with the build
2977system][147].
2978
2979### Test Coverage
2980
2981In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`.
2982
2983If you have them, run the following commands:
2984
2985```
2986CC=gcc ./configure -gO3 -c
2987make -j<cores>
2988make coverage
2989```
2990
2991Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in
2992parallel. If you try, you will get errors. And note that `CC=gcc` is used.
2993
2994After running those commands, you can open your web browser and open the
2995`index.html` file in the root directory of the repo. From there, you can explore
2996all of the coverage results.
2997
2998If you see lines or branches that you think you could hit with a manual
2999execution, do such manual execution, and then run the following command:
3000
3001```
3002make coverage_output
3003```
3004
3005and the coverage output will be updated.
3006
3007If you want to rerun `make coverage`, you must do a `make clean` and build
3008first, like this:
3009
3010```
3011make clean
3012make -j<cores>
3013make coverage
3014```
3015
3016Otherwise, you will get errors.
3017
3018If you want to run tests in parallel, you can do this:
3019
3020```
3021make -j<cores>
3022make -j<cores> test
3023make coverage_output
3024```
3025
3026and that will generate coverage output correctly.
3027
3028### [AddressSanitizer][21] and Friends
3029
3030To run the test suite under [AddressSanitizer][21] or any of its friends, use
3031the following commands:
3032
3033```
3034CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m
3035make -j<cores>
3036make -j<cores> test
3037```
3038
3039where `<sanitizer>` is the correct name of the desired sanitizer. There is one
3040exception to the above: `UndefinedBehaviorSanitizer` should be run on a build
3041that has zero optimization, so for `UBSan`, use the following commands:
3042
3043```
3044CFLAGS="-fsanitize=undefined" ./configure -gO0 -m
3045make -j<cores>
3046make -j<cores> test
3047```
3048
3049### [Valgrind][20]
3050
3051To run the test suite under [Valgrind][20], run the following commands:
3052
3053```
3054./configure -gO3 -v
3055make -j<cores>
3056make -j<cores> test
3057```
3058
3059It really is that easy. I have directly added infrastructure to the build system
3060and the test suite to ensure that if [Valgrind][20] detects any memory errors or
3061any memory leaks at all, it will tell the test suite infrastructure to report an
3062error and exit accordingly.
3063
3064## POSIX Shell Scripts
3065
3066There is a lot of shell scripts in this repository, and every single one of them
3067is written in pure [POSIX `sh`][72].
3068
3069The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX
3070systems are only guaranteed to have a barebones implementation of `sh`
3071available.
3072
3073There are *many* snares for unwary programmers attempting to modify
3074[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or
3075any of the scripts in [`tests/`][77]. Here are some of them:
3076
30771.	No `bash`-isms.
30782.	Only POSIX standard utilities are allowed.
30793.	Only command-line options defined in the POSIX standard for POSIX utilities
3080	are allowed.
30814.	Only the standardized behavior of POSIX utilities is allowed.
30825.	Functions return data by *printing* it. Using `return` sets their exit code.
3083
3084In other words, the script must only use what is standardized in the [`sh`][72]
3085and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes
3086things like `local` and the `[[ ]]` notation.
3087
3088These are *enormous* restrictions and must be tested properly. I put out at
3089least one release with a change to `configure.sh` that wasn't portable. That was
3090an embarrassing mistake.
3091
3092The lack of `local`, by the way, is why variables in functions are named with
3093the form:
3094
3095```
3096_<function_name>_<var_name>
3097```
3098
3099This is done to prevent any clashes of variable names with already existing
3100names. And this applies to *all* shell scripts. However, there are a few times
3101when that naming convention is *not* used; all of them are because those
3102functions are required to change variables in the global scope.
3103
3104### Maintainer-Only Scripts
3105
3106If a script is meant to be used for maintainers (of `bc`, not package
3107maintainers), then rules 2, 3, and 4 don't need to be followed as much because
3108it is assumed that maintainers will be able to install whatever tools are
3109necessary to do the job.
3110
3111## Manuals
3112
3113The manuals for `bc` and `dc` are all generated, and the manpages for `bc`,
3114`dc`, and `bcl` are also generated.
3115
3116Why?
3117
3118I don't like the format of manpages, and I am not confident in my ability to
3119write them. Also, they are not easy to read on the web.
3120
3121So that explains why `bcl`'s manpage is generated from its markdown version. But
3122why are the markdown versions of the `bc` and `dc` generated?
3123
3124Because the content of the manuals needs to change based on the [build
3125type][81]. For example, if `bc` was built with no history support, it should not
3126have the **COMMAND LINE HISTORY** section in its manual. If it did, that would
3127just confuse users.
3128
3129So the markdown manuals for `bc` and `dc` are generated from templates
3130([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there,
3131the manpages are generated from the generated manuals.
3132
3133The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version
3134control, and the generated markdown manuals and manpages for `bc`
3135([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well.
3136
3137This is because generating the manuals and manpages requires a heavy dependency
3138that only maintainers should care about: [Pandoc][92]. Because users [should not
3139have to install *any* dependencies][136], the files are generated, checked into
3140version control, and included in distribution tarballs.
3141
3142If you run [`configure.sh`][69], you have an easy way of generating the markdown
3143manuals and manpages: just run `make manpages`. This target calls
3144[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`.
3145
3146For more on how generating manuals and manpages works, see
3147[`scripts/manpage.sh`][60].
3148
3149## Locales
3150
3151The locale system of `bc` is enormously complex, but that's because
3152POSIX-compatible locales are terrible.
3153
3154How are they terrible?
3155
3156First, `gencat` does not work for generating cross-compilation. In other words,
3157it does not generate machine-portable files. There's nothing I can do about
3158this except for warn users.
3159
3160Second, the format of `.msg` files is...interesting. Thank goodness it is text
3161because otherwise, it would be impossible to get them right.
3162
3163Third, `.msg` files are not used. In other words, `gencat` exists. Why?
3164
3165Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales.
3166
3167Yes, where and *how*.
3168
3169Obviously, from it's name, it's a path, and that's the where. The *how* is more
3170complicated.
3171
3172It's actually *not* a path, but a path template. It's a format string, and it
3173can have a few format specifiers. For more information on that, see [this
3174link][84]. But in essence, those format specifiers configure how each locale is
3175supposed to be installed.
3176
3177With all those problems, why use POSIX locales? Portability, as always. I can't
3178assume that `gettext` will be available, but I *can* pretty well assume that
3179POSIX locales will be available.
3180
3181The locale system of `bc` includes all files under [`locales/`][85],
3182[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88],
3183[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131],
3184and the parts of the build system needed to activate it. There is also code in
3185[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale.
3186
3187If the order of error messages and/or categories are changed, the order of
3188errors must be changed in the enum, the default error messages and categories in
3189[`src/data.c`][131], and all of the messages and categories in the `.msg` files
3190under [`locales/`][85].
3191
3192## Static Analysis
3193
3194I do *some* static analysis on `bc`.
3195
3196I used to use [Coverity][196], but I stopped using it when it started giving me
3197too many false positives and also because it had a vulnerability.
3198
3199However, I still use the [Clang Static Analyzer][197] through
3200[`scan-build`][19]. I only use it in debug mode because I have to add some
3201special code to make it not complain about things that are definitely not a
3202problem.
3203
3204The most frequent example of false positives is where a local is passed to a
3205function to be initialized. [`scan-build`][19] misses that fact, so I
3206pre-initialize such locals to prevent the warnings.
3207
3208To run `scan-build`, do the following:
3209
3210```
3211make clean
3212scan-build make
3213```
3214
3215`scan-build` will print its warnings to `stdout`.
3216
3217## Fuzzing
3218
3219The quality of this `bc` is directly related to the amount of fuzzing I did. As
3220such, I spent a lot of work making the fuzzing convenient and fast, though I do
3221admit that it took me a long time to admit that it did need to be faster.
3222
3223First, there were several things which make fuzzing fast:
3224
3225* Using [AFL++][125]'s deferred initialization.
3226* Splitting `bc`'s corpuses.
3227* Parallel fuzzing.
3228
3229Second, there are several things which make fuzzing convenient:
3230
3231* Preprepared input corpuses.
3232* [`scripts/fuzz_prep.sh`][119].
3233* `tmux` and `tmuxp` configs.
3234* [`scripts/afl.py`][94].
3235
3236### Fuzzing Performance
3237
3238Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as
3239fast as possible is well worth the time.
3240
3241However, there is a caveat to the above: it is easy to make [AFL++][125] crash,
3242be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the
3243performance enhancements are done poorly.
3244
3245To stop [AFL++][125] from crashing on test cases, and to be stable, these are
3246the requirements:
3247
3248* The state at startup must be *exactly* the same.
3249* The virtual memory setup at startup must be *exactly* the same.
3250
3251The first isn't too hard; it's the second that is difficult.
3252
3253`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less
3254than most programs.) After going through an execution run, however, some of that
3255memory, while it could be cleared and reset, is in different places because of
3256vectors. Since vectors reallocate, their allocations are not guaranteed to be in
3257the same place.
3258
3259So to make all three work, I had to set up the deferred initialization and
3260persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`,
3261which is probably what caused the stability to drop below 100%). However, using
3262deferred alone let me put the [AFL++][125] initialization further back. This
3263works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right
3264at that call. Thus, every run has the exact same virtual memory setup, and each
3265run can skip all of the setup code.
3266
3267I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent
3268mode, plus shared memory fuzzing. In order to do it safely, with stability above
326999%, all of that was actually *slower* than using just deferred initialization
3270with the initialization *right before* `stdin` was read. And as a bonus, the
3271stability in that situation is 100%.
3272
3273As a result, my [AFL++][125] setup only uses deferred initialization. That's the
3274`__AFL_INIT()` call.
3275
3276(Note: there is one more big item that must be done in order to have 100%
3277stability: the pseudo-random number generator *must* start with *exactly* the
3278same seed for every run. This is set up with the `tmux` and `tmuxp` configs that
3279I talk about below in [Convenience][130]. This seed is set before the
3280`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without
3281it, stability would be abysmal.)
3282
3283On top of that, while `dc` is plenty fast under fuzzing (because of a faster
3284parser and less test cases), `bc` can be slow. So I have split the `bc` input
3285corpus into three parts, and I set fuzzers to run on each individually. This
3286means that they will duplicate work, but they will also find more stuff.
3287
3288On top of all of that, each input corpus (the three `bc` corpuses and the one
3289`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two
3290reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs
3291recommend 4 parallel fuzzers, at least, to run different "power schedules."
3292
3293### Convenience
3294
3295The preprepared input corpuses are contained in the
3296`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories.
3297There are three `bc` directories and only one `dc` directory because `bc`'s
3298input corpuses are about three times as large, and `bc` is a larger program;
3299it's going to need much more fuzzing.
3300
3301(They do share code though, so fuzzing all of them still tests a lot of the same
3302math code.)
3303
3304The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It
3305assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it
3306automatically configures and builds `bc` with a fuzz-ideal build.
3307
3308A fuzz-ideal build has several things:
3309
3310* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].)
3311* Debug mode, to crash as easily as possible.
3312* Full optimization (including [Link-Time Optimization][126]), for performance.
3313* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above).
3314* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125]
3315  documentation for more information.
3316
3317There is one big thing that a fuzz-ideal build does *not* have: it does not use
3318[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if
3319it fails to allocate memory. I do not want to consider those as crashes because
3320my `bc` does, in fact, handle them gracefully by exiting with a set error code.
3321So `libdislocator.so` is not an option.
3322
3323However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in
3324`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120]
3325and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be
3326used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to
3327adjust the `start_directory` item, once it is adjusted, then using this command:
3328
3329```
3330tmuxp load tests/fuzzing/bc_afl.yaml
3331```
3332
3333will start fuzzing.
3334
3335In other words, to start fuzzing, the sequence is:
3336
3337```
3338./scripts/fuzz_prep.sh
3339tmuxp load tests/fuzzing/bc_afl.yaml
3340```
3341
3342Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on
3343`bc` and 4 on `dc`. The outputs will be put into the
3344`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories.
3345
3346(Note that loading that config will also delete all unsaved [AFL++][125] output
3347from the output directories.)
3348
3349Sometimes, [AFL++][125] will report crashes when there are none. When crashes
3350are reported, I always run the following command:
3351
3352```
3353./scripts/afl.py <dir>
3354```
3355
3356where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the
335716 instances reported the crash. If it was one of the first four (`bc11` through
3358`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I
3359use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`.
3360And if it was `dc`, I use `dc`.
3361
3362The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly
3363reported a crash or not. If so, it will copy the crashing test case to
3364`.test.txt` and tell you whether it was from running it as a file or through
3365`stdin`.
3366
3367From there, I personally always investigate the crash and fix it. Then, when the
3368crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as
3369an error test (if it produces an error) or I create a new
3370`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See
3371[Test Suite][124] for more information about the test suite.) In either case,
3372`<idx>` is the next number for a file in that particular place. For example, if
3373the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I
3374move `.test.txt` to `tests/bc/error/19.txt`.
3375
3376Then I immediately run [`scripts/afl.py`][94] again to find the next crash
3377because often, [AFL++][125] found multiple test cases that trigger the same
3378crash. If it finds another, I repeat the process until it is happy.
3379
3380Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and
3381restart fuzzing. Why do I restart instead of continuing? Because with the
3382changes, the test outputs could be stale and invalid.
3383
3384However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds
3385that every crash reported by [AFL++][125] is invalid. If that's the case, I can
3386just continue with the command:
3387
3388```
3389tmuxp load tests/fuzzing/bc_afl_continue.yaml
3390```
3391
3392(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is
3393still running, so often, I don't find a need to continue since there was no
3394stop. However, the capability is there, if needed.)
3395
3396In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs,
3397automatically set up [AFL++][125] power schedules (see [Fuzzing
3398Performance][127] above). They also set up the parallel fuzzing such that there
3399is one fuzzer in each group of 4 that does deterministic fuzzing. It's always
3400the first one in each group.
3401
3402For more information about deterministic fuzzing, see the [AFL++][125]
3403documentation.
3404
3405### Corpuses
3406
3407I occasionally add to the input corpuses. These files come from new files in the
3408[Test Suite][124]. In fact, I use soft links when the files are the same.
3409
3410However, when I add new files to an input corpus, I sometimes reduce the size of
3411the file by removing some redundancies.
3412
3413And then, when adding to the `bc` corpuses, I try to add them evenly so that
3414each corpus will take about the same amount of time to get to a finished state.
3415
3416### [AFL++][125] Quickstart
3417
3418The way [AFL++][125] works is complicated.
3419
3420First, it is the one to invoke the compiler. It leverages the compiler to add
3421code to the binary to help it know when certain branches are taken.
3422
3423Then, when fuzzing, it uses that branch information to generate information
3424about the "path" that was taken through the binary.
3425
3426I don't know what AFL++ counts as a new path, but each new path is added to an
3427output corpus, and it is later used as a springboard to find new paths.
3428
3429This is what makes AFL++ so effective: it's not just blindly thrashing a binary;
3430it adapts to the binary by leveraging information about paths.
3431
3432### Fuzzing Runs
3433
3434For doing a fuzzing run, I expect about a week or two where my computer is
3435basically unusable, except for text editing and light web browsing.
3436
3437Yes, it can take two weeks for me to do a full fuzzing run, and that does *not*
3438include the time needed to find and fix crashes; it only counts the time on the
3439*last* run, the one that does not find any crashes. This means that the entire
3440process can take a month or more.
3441
3442What I use as an indicator that the fuzzing run is good enough is when the
3443number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer
3444instances, except maybe the deterministic instances, is below 50. And even then,
3445I try to let deterministic instances get that far as well.
3446
3447You can see how many pending paths are left in the "path geometry" section of
3448the [AFL++][125] dashboard.
3449
3450Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with
3451`Ctrl+c` or some other method. It will not quit until you tell it to.
3452
3453### Radamsa
3454
3455I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened
3456once.
3457
3458The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier
3459to use with varying command-line arguments, which was needed for testing `bc`'s
3460command-line expression parsing code, and [AFL++][125] is best when testing
3461input from `stdin`.
3462
3463[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but
3464it's not as effective at that, so I don't really use it for that either.
3465
3466[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have
3467not had to touch the command-line expression parsing code since.
3468
3469### [AddressSanitizer][21] with Fuzzing
3470
3471One advantage of using [AFL++][125] is that it saves every test case that
3472generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete
3473them when the user makes it quit.
3474
3475Keeping them around is not a good idea, for several reasons:
3476
3477* They are frequently large.
3478* There are a lot of them.
3479* They go stale; after `bc` is changed, the generated paths may not be valid
3480  anymore.
3481
3482However, before they are deleted, they can definitely be leveraged for even
3483*more* bug squashing by running *all* of the paths through a build of `bc` with
3484[AddressSanitizer][21].
3485
3486This can easily be done with these four commands:
3487
3488```
3489./scripts/fuzz_prep.sh -a
3490./scripts/afl.py --asan bc1
3491./scripts/afl.py --asan bc2
3492./scripts/afl.py --asan bc3
3493./scripts/afl.py --asan dc
3494```
3495
3496(By the way, the last four commands could be run in separate terminals to do the
3497processing in parallel.)
3498
3499These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they
3500run `bc` and `dc` on all of the found crashes and path output corpuses. This is
3501to check that no path or crash has found any memory errors, including memory
3502leaks.
3503
3504Because the output corpuses can contain test cases that generate infinite loops
3505in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far
3506greater than the timeout that [AFL++][125] uses and should be enough to catch
3507any crash.
3508
3509If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory
3510errors on the outputs of [AFL++][125], that is an excellent indicator of very
3511few bugs in `bc`, and a release can be made with confidence.
3512
3513## Code Concepts
3514
3515This section is about concepts that, if understood, will make it easier to
3516understand the code as it is written.
3517
3518The concepts in this section are not found in a single source file, but they are
3519littered throughout the code. That's why I am writing them all down in a single
3520place.
3521
3522### POSIX Mode
3523
3524POSIX mode is `bc`-only.
3525
3526In fact, POSIX mode is two different modes: Standard Mode and Warning Mode.
3527These modes are designed to help users write POSIX-compatible `bc` scripts.
3528
3529#### Standard Mode
3530
3531Standard Mode is activated with the `-s` or `--standard` flags.
3532
3533In this mode, `bc` will error if any constructs are used that are not strictly
3534compatible with the [POSIX `bc` specification][2].
3535
3536#### Warning Mode
3537
3538Warning Mode is activated with the `-w` or `--warn` flags.
3539
3540In this mode, `bc` will issue warnings, but continue, if any constructs are used
3541that are not strictly compatible with the [POSIX `bc` specification][2].
3542
3543### Memory Management
3544
3545The memory management in `bc` is simple: everything is owned by one thing.
3546
3547If something is in a vector, it is owned by that vector.
3548
3549If something is contained in a struct, it is owned by that struct with one
3550exception: structs can be given pointers to other things, but only if those
3551other things will outlast the struct itself.
3552
3553As an example, the `BcParse` struct has a pointer to the one `BcProgram` in
3554`bc`. This is okay because the program is initialized first and deallocated
3555last.
3556
3557In other words, it's simple: if a field in a struct is a pointer, then unless
3558that pointer is directly allocated by the struct (like the vector array or the
3559number limb array), that struct does not own the item at that pointer.
3560Otherwise, the struct *does* own the item.
3561
3562### [Async-Signal-Safe][115] Signal Handling
3563
3564`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but
3565`bc` is, by and large, CPU-bound. This has several consequences, but the biggest
3566is that there is no easy way to allow signals to interrupt it.
3567
3568This consequence is not obvious, but it comes from the fact that a lot of I/O
3569operations can be interrupted and return [`EINTR`][198]. This makes such I/O
3570calls natural places for allowing signals to interrupt execution, even when the
3571signal comes during execution, and not interrupting I/O calls. The way this is
3572done is setting a flag in the signal handler, which is checked around the time
3573of the I/O call, when it is convenient.
3574
3575Alternatively, I/O bound programs can use the [self-pipe trick][199].
3576
3577Neither of these are possible in `bc` because the execution of math code can
3578take a long time. If a signal arrives during this long execution time, setting a
3579flag like an I/O bound application and waiting until the next I/O call could
3580take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a
3581week to calculate a million digits of pi, and it's not slow as far as `bc`
3582implementations go.)
3583
3584Thus, using just the technique of setting the flag just will not work for an
3585interactive calculator.
3586
3587Well, it can, but it requires a lot of code and massive inefficiencies. I know
3588this because that was the original implementation.
3589
3590The original implementation set a flag and just exit the signal handler. Then,
3591on just about every loop header, I have a check for the signal flag. These
3592checks happened on every iteration of every loop. It was a massive waste because
3593it was polling, and [polling is evil][200].
3594
3595So for version [3.0.0][32], I expended a lot of effort to change the
3596implementation.
3597
3598In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`)
3599to tell the signal handler whether it can use `longjmp()` to stop the current
3600execution. If so, it does. If not, it sets a flag, which then is used by the
3601code outside the signal handler that set the `vm.sig_lock` flag. When that code
3602unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that
3603code executes the `longjmp()` and stops the current execution.
3604
3605Other than that, the rest of the interrupt-based implementation is best
3606described in the [Error Handling][97].
3607
3608However, there are rules for signal handlers that I must lay out.
3609
3610First, signal handlers can only call [async-signal-safe][115] functions.
3611
3612Second, any field set or read by both the signal handler and normal code must be
3613a `volatile sig_atomic_t`.
3614
3615Third, when setting such fields, they must be set to constants and no math can
3616be done on them. This restriction and the above restriction exist in order to
3617ensure that the setting of the fields is always atomic with respect to signals.
3618
3619These rules exist for *any* code using Unix signal handlers, not just `bc`.
3620
3621#### Vectors and Numbers
3622
3623Vectors and numbers needed special consideration with the interrupt-based signal
3624handling.
3625
3626When vectors and numbers are about to allocate, or *reallocate* their arrays,
3627they need to lock signals to ensure that they do not call `malloc()` and friends
3628and get interrupted by a signal because, as you will see in the [Error
3629Handling][97] section, `longjmp()` cannot be used in a signal handler if it may
3630be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and
3631friends.
3632
3633### Asserts
3634
3635If you asked me what procedure is used the most in `bc`, I would reply without
3636hesitation, "`assert()`."
3637
3638I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with
3639[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes,
3640and a failing `assert()` counts as one.
3641
3642So while a lot of bad bugs might have corrupted data and *not* caused crashes,
3643because I put in so many `assert()`'s, they were *turned into* crashing bugs,
3644and [AFL++][125] found them.
3645
3646By far, the most bugs it found this way was in the `bc` parser. (See the [`bc`
3647Parsing][110] for more information.) And even though I was careful to put
3648`assert()`'s everywhere, most parser bugs manifested during execution of
3649bytecode because the virtual machine assumes the bytecode is valid.
3650
3651Sidenote: one of those bugs caused an infinite recursion when running the sine
3652(`s()`) function in the math library, so yes, parser bugs can be *very* weird.
3653
3654Anyway, the way I did `assert()`'s was like this: whenever I realized that I
3655had put assumptions into the code, I would put an `assert()` there to test it
3656**and** to *document* it.
3657
3658Yes, documentation. In fact, by far the best documentation of the code in `bc`
3659is actually the `assert()`'s. The only time I would not put an `assert()` to
3660test an assumption is if that assumption was already tested by an `assert()`
3661earlier.
3662
3663As an example, if a function calls another function and passes a pointer that
3664the caller previously `assert()`'ed was *not* `NULL`, then the callee does not
3665have to `assert()` it too, unless *also* called by another function that does
3666not `assert()` that.
3667
3668At first glance, it may seem like putting asserts for pointers being non-`NULL`
3669everywhere would actually be good, but unfortunately, not for fuzzing. Each
3670`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on
3671how many branches it covers. If there are too many `assert()`'s, it may think
3672that it is not being effective and that more fuzzing is needed.
3673
3674This means that `assert()`'s show up most often in two places: function
3675preconditions and function postconditions.
3676
3677Function preconditions are `assert()`'s that test conditions relating to the
3678arguments a function was given. They appear at the top of the function, usually
3679before anything else (except maybe initializing a local variable).
3680
3681Function postconditions are `assert()`'s that test the return values or other
3682conditions when a function exits. These are at the bottom of a function or just
3683before a `return` statement.
3684
3685The other `assert()`'s cover various miscellaneous assumptions.
3686
3687If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to
3688document your assumptions. And don't remove them when [AFL++][125] gleefully
3689crashes `bc` and `dc` over and over again.
3690
3691### Vectors
3692
3693In `bc`, vectors mean resizable arrays, and they are the most fundamental piece
3694of code in the entire codebase.
3695
3696I had previously written a [vector implementation][112], which I used to guide
3697my decisions, but I wrote a new one so that `bc` would not have a dependency. I
3698also didn't make it as sophisticated; the one in `bc` is very simple.
3699
3700Vectors store some information about the type that they hold:
3701
3702* The size (as returned by `sizeof`).
3703* An enum designating the destructor.
3704
3705If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a
3706destructor.
3707
3708But by storing the size, the vector can then allocate `size * cap` bytes, where
3709`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again
3710and again until it is bigger than the requested size.
3711
3712But to store items, or to push items, or even to return items, the vector has to
3713figure out where they are, since to it, the array just looks like an array of
3714bytes.
3715
3716It does this by calculating a pointer to the underlying type with
3717`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the
3718desired element, and `size` is the size of the underlying type.
3719
3720Doing that, vectors can avoid undefined behavior (because `char` pointers can
3721be cast to any other pointer type), while calculating the exact position of
3722every element.
3723
3724Because it can do that, it can figure out where to push new elements by
3725calculating `v + (len * size)`, where `len` is the number of items actually in
3726the vector.
3727
3728By the way, `len` is different from `cap`. While `cap` is the amount of storage
3729*available*, `len` is the number of actual elements in the vector at the present
3730point in time.
3731
3732Growing the vector happens when `len` is equal to `cap` *before* pushing new
3733items, not after.
3734
3735To add a destructor, you need to add an enum item to `BcDtorType` in
3736[`include/vector.h`][174] and add the actual destructor in the same place as the
3737enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131].
3738
3739#### Pointer Invalidation
3740
3741There is one big danger with the vectors as currently implemented: pointer
3742invalidation.
3743
3744If a piece of code receives a pointer from a vector, then adds an item to the
3745vector before they finish using the pointer, that code must then update the
3746pointer from the vector again.
3747
3748This is because any pointer inside the vector is calculated based off of the
3749array in the vector, and when the vector grows, it can `realloc()` the array,
3750which may move it in memory. If that is done, any pointer returned by
3751`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid.
3752
3753This fact was the single most common cause of crashes in the early days of this
3754`bc`; wherever I have put a comment about pointers becoming invalidated and
3755updating them with another call to `bc_vec_item()` and friends, *do **NOT**
3756remove that code!*
3757
3758#### Maps
3759
3760Maps in `bc` are...not.
3761
3762They are really a combination of two vectors. Those combinations are easily
3763recognized in the source because one vector is named `<name>s` (plural), and the
3764other is named `<name>_map`.
3765
3766There are currently three, all in `BcProgram`:
3767
3768* `fns` and `fn_map` (`bc` functions).
3769* `vars` and `var_map` (variables).
3770* `arrs` and `arr_map` (arrays).
3771
3772They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a
3773string and an index. The string is the name of the item, and the index is the
3774index of that item in the `<name>s` vector.
3775
3776Obviously, I could have just done a linear search for items in the `<name>s`
3777vector, but that would be slow with a lot of functions/variables/arrays.
3778Instead, I ensure that whenever an item is inserted into the `<name>_map`
3779vector, the item is inserted in sorted order. This means that the `<name>_map`
3780is always sorted (by the names of the items).
3781
3782So when looking up an item in the "map", what is really done is this:
3783
37841.	A binary search is carried out on the names in the `<name>_map` vector.
37852.	When one is found, it returns the index in the `<name>_map` vector where the
3786	item was found.
37873.	This index is then used to retrieve the `BcId`.
37884.	The index from the `BcId` is then used to index into the `<name>s` vector,
3789	which returns the *actual* desired item.
3790
3791Why were the `<name>s` and `<name>_map` vectors not combined for ease? The
3792answer is that sometime, when attempting to insert into the "map", code might
3793find that something is already there. For example, a function with that name may
3794already exist, or the variable might already exist.
3795
3796If the insert fails, then the name already exists, and the inserting code can
3797forego creating a new item to put into the vector. However, if there is no item,
3798the inserting code must create a new item and insert it.
3799
3800If the two vectors were combined together, it would not be possible to separate
3801the steps such that creating a new item could be avoided if it already exists.
3802
3803#### Slabs and Slab Vectors
3804
3805`bc` allocates *a lot* of small strings, and small allocations are the toughest
3806for general-purpose allocators to handle efficiently.
3807
3808Because of that reason, I decided to create a system for allocating small
3809strings using something that I call a "slab vector" after [slab
3810allocators][201].
3811
3812These vectors allocate what I call "slabs," which are just an allocation of a
3813single page with a length to tell the slab how much of the slab is used.
3814
3815The vector itself holds slabs, and when the slab vector is asked to allocate a
3816string, it attempts to in the last slab. If that slab cannot do so, it allocates
3817a new slab and allocates from that.
3818
3819There is one exception: if a string is going to be bigger than 128 bytes, then
3820the string is directly allocated, and a slab is created with that pointer and a
3821length of `SIZE_MAX`, which tells the slab vector that it is a direct
3822allocation. Then, the last slab is pushed into the next spot and the new special
3823slab is put into the vacated spot. This ensures that a non-special slab is
3824always last.
3825
3826### Command-Line History
3827
3828When I first wrote `bc`, I immediately started using it in order to eat my own
3829dog food.
3830
3831It sucked, and the biggest reason why was because of the lack of command-line
3832history.
3833
3834At first, I just dealt with it, not knowing how command-line history might be
3835implemented.
3836
3837Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had
3838known about for some time.
3839
3840It turned out to be easier than I thought; the hardest part was the tedious
3841renaming of everything to fit the `bc` naming scheme.
3842
3843Understanding command-line history in `bc` is really about understanding VT-100
3844escape codes, so I would start there.
3845
3846Now, the history implementation of `bc` has been adapted far beyond that initial
3847adaptation to make the command-line history implementation perfect for `bc`
3848alone, including integrating it into `bc`'s [Custom I/O][114] and making sure
3849that it does not disturb output that did not end with a newline.
3850
3851On top of that, at one point, I attempted to get history to work on Windows. It
3852barely worked after a lot of work and a lot of portability code, but even with
3853all of that, it does not have at least one feature: multi-line pasting from the
3854clipboard.
3855
3856#### Editline and Readline
3857
3858I also implemented an integration with both editline and readline, though not at
3859the same time. This allows distributions that want to use either one in `bc` to
3860do so. This enables users to use their `.editrc` and `.inputrc` settings.
3861
3862The integration is custom to each library, and it does *not* involve any of
3863`bc`'s custom history code. It also required changes to `bc`'s [Custom
3864I/O][114].
3865
3866### Error Handling
3867
3868The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it
3869became one of the things that taught me the most about C in particular and
3870programming in general.
3871
3872Before then, error handling was manual. Almost all functions returned a
3873`BcStatus` indicating if an error had occurred. This led to a proliferation of
3874lines like:
3875
3876```
3877if (BC_ERR(s)) return s;
3878```
3879
3880In fact, a quick and dirty count of such lines in version `2.7.2` (the last
3881version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line.
3882
3883And that didn't even guarantee that return values were checked *everywhere*.
3884
3885But before I can continue, let me back up a bit.
3886
3887From the beginning, I decided that I would not do what GNU `bc` does on errors;
3888it tries to find a point at which it can recover. Instead, I decided that I
3889would have `bc` reset to a clean slate, which I believed, would reduce the
3890number of bugs where an unclean state caused errors with continuing execution.
3891
3892So from the beginning, errors would essentially unwind the stack until they got
3893to a safe place from which to clean the slate, reset, and ask for more input.
3894
3895Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As
3896such, it had a signal handler that set a flag. But it could not safely interrupt
3897execution, so that's all it could do.
3898
3899In order to actually respond to the signal, I had to litter checks for the flag
3900*everywhere* in the code. And I mean *everywhere*. They had to be checked on
3901every iteration of *every* loop. They had to be checked going into and out of
3902certain functions.
3903
3904It was a mess.
3905
3906But fortunately for me, signals did the same thing that errors did: they unwound
3907the stack to the *same* place.
3908
3909Do you see where I am going with this?
3910
3911It turns out that what I needed was a [async-signal-safe][115] form of what
3912programmers call "exceptions" in other languages.
3913
3914I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement
3915exceptions, so I thought I would learn how to use them. How hard could it be?
3916
3917Quite hard, it turns out, especially in the presence of signals. And that's
3918because there are a few big snares:
3919
39201.	The value of any local variables are not guaranteed to be preserved after a
3921	`longjmp()` back into a function.
39222.	While `longjmp()` is required to be [async-signal-safe][115], if it is
3923	invoked by a signal handler that interrupted a non-[async-signal-safe][115]
3924	function, then the behavior is undefined.
39253.	Any mutation that is not guaranteed to be atomic with respect to signals may
3926	be incomplete when a signal arrives.
3927
3928Oh boy.
3929
3930For number 1, the answer to this is to hide data that must stay changed behind
3931pointers. Only the *pointers* are considered local, so as long as I didn't do
3932any modifying pointer arithmetic, pointers and their data would be safe. For
3933cases where I have local data that must change and stay changed, I needed to
3934*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`.
3935
3936For number 2 and number 3, `bc` needs some way to tell the signal handler that
3937it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile
3938sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal
3939Handling][173] section.) For every function that calls a function that is not
3940async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and
3941afterward, use `BC_SIG_UNLOCK` to unlock signals.
3942
3943Code also need to do this for all global, non-atomic mutation, which means that
3944modifying any part of the `BcVm` global struct.
3945
3946`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and
3947jump if necessary.
3948
3949On top of all of that, *all* functions with cleanup needed to be able to run
3950their cleanup. This meant that `longjmp()` could not just jump to the finish; it
3951had to start what I call a "jump series," using a stack of `jmp_buf`'s
3952(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to
3953execute its jump. Then, if the cleanup code was executed because of a jump, the
3954cleanup code was responsible for continuing the jump series by popping the
3955previous item off the stack and using the new top of the stack for a jump.
3956
3957In this way, C++-style exceptions were implemented in pure C. Not fun, but it
3958works. However, the order of operations matters, especially in the macros that
3959help implement the error handling.
3960
3961For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals.
3962If a signal comes between, that's fine; it will still cause a jump to the right
3963place. However, disabling the lock after could mean that a signal could come
3964*after* checking for signals, but before signals were unlocked, causing the
3965handling of the signal to be delayed.
3966
3967#### Custom I/O
3968
3969Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
3970`longjmp()` for error handling (see the [Error Handling][97] section), and the
3971buffered I/O in `libc` does not interact well with the use of those procedures;
3972all of the buffered I/O API is basically non-[async-signal-safe][115].
3973
3974Implementing custom buffered I/O had other benefits. First, it allowed me to
3975tightly integrate history with the I/O code. Second, it allowed me to make
3976changes to history in order to make it adapt to user prompts.
3977
3978### Lexing
3979
3980To simplify parsing, both calculators use lexers to turn the text into a more
3981easily-parsable form.
3982
3983While some tokens are only one character long, others require many tokens, and
3984some of those need to store all of the text corresponding to the token for use
3985by the parsers. Tokens that need to store their corresponding text include, but
3986are not limited to:
3987
3988* Strings.
3989* Numbers.
3990* Identifiers.
3991
3992For this purpose, the lexer has a [vector][111] named `str` to store the data
3993for tokens. This data is overwritten if another token is lexed that needs to
3994store data, so the parsers need to copy the data before calling the lexer again.
3995
3996Both lexers do some of the same things:
3997
3998* Lex identifiers into tokens, storing the identifier in `str`.
3999* Lex number strings into tokens, storing the string in `str`.
4000* Lex whitespace.
4001* Lex comments.
4002
4003Other than that, and some common plumbing, the lexers have separate code.
4004
4005#### `dc` Lexing
4006
4007The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205],
4008[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each,
4009the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which
4010just processes command-line arguments after they are parsed by
4011[`src/opt.c`][51].
4012
4013For most characters, the `dc` lexer is able to convert directly from the
4014character to its corresponding token. This happens using `dc_lex_tokens[]` in
4015[`src/data.c`][131].
4016
4017`dc`'s lexer also has to lex the register name after lexing tokens for commands
4018that need registers.
4019
4020And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of
4021the `dc` lexer that is more complex than the `bc` lexer. This is because `dc`
4022strings need to have a balanced number of brackets.
4023
4024#### `bc` Lexing
4025
4026The `bc` lexer is fairly simple. It does the following things:
4027
4028* Lexes `bc` strings.
4029* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords
4030  are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer
4031  allows the common identifier function to take over.
4032* Turns characters and groups of characters into `bc` operator tokens.
4033
4034### Parsing
4035
4036The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is
4037simple, while the `bc` parser is the most complex piece of code in the entire
4038codebase.
4039
4040However, they both do some of the same things.
4041
4042First, the parsers do *not* use [abstract syntax trees][207]; instead, they
4043directly generate the bytecode that will be executed by the `BcProgram` code.
4044Even in the case of `bc`, this heavily simplifies the parsing because the
4045[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish
4046Notation][108], which is basically directly executable.
4047
4048Second, any extra data that the `BcProgram` needs for execution is stored into
4049functions (see the [Functions][208] section). These include constants and
4050strings.
4051
4052#### `dc` Parsing
4053
4054The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness
4055of lexing and parsing [Reverse Polish notation][108] is probably why it was used
4056for `dc` when it was first created at Bell Labs.
4057
4058For most tokens, the `dc` parser is able to convert directly from the token
4059to its corresponding instruction. This happens using `dc_parse_insts[]` in
4060[`src/data.c`][131].
4061
4062`dc`'s parser also has to parse the register name for commands that need
4063registers. This is the most complex part of the `dc` parser; each different
4064register command needs to be parsed differently because most of them require two
4065or more instructions to execute properly.
4066
4067For example, storing in a register requires a swap instruction and an assignment
4068instruction.
4069
4070Another example are conditional execution instructions; they need to produce the
4071instruction for the condition, and then they must parse a possible "else" part,
4072which might not exist.
4073
4074##### Existing Commands
4075
4076`dc` is based on commands, which are usually one letter. The following table is
4077a table of which ASCII characters are already used:
4078
4079| Characters | Used? | For...                                     |
4080|------------|-------|--------------------------------------------|
4081| Space      | x     | Separator                                  |
4082| `!`        | x     | Conditional Execution of Registers         |
4083| `"`        | x     | Bounded Rand Operator                      |
4084| `#`        | x     | Comments                                   |
4085| `$`        | x     | Truncation                                 |
4086| `%`        | x     | Modulus                                    |
4087| `&`        |       |                                            |
4088| `'`        | x     | Rand Operator                              |
4089| `(`        | x     | Greater Than Operator                      |
4090| `)`        | x     | Less Than Operator                         |
4091| `*`        | x     | Multiplication                             |
4092| `+`        | x     | Addition                                   |
4093| `,`        | x     | Depth of Execution Stack                   |
4094| `-`        | x     | Subtraction                                |
4095| `.`        | x     | Numbers                                    |
4096| `/`        | x     | Division                                   |
4097| `0-9`      | x     | Numbers                                    |
4098| `:`        | x     | Store into Array                           |
4099| `;`        | x     | Load from Array                            |
4100| `<`        | x     | Conditional Execution of Registers         |
4101| `=`        | x     | Conditional Execution of Registers         |
4102| `>`        | x     | Conditional Execution of Registers         |
4103| `?`        | x     | Ask for User Input                         |
4104| `@`        | x     | Places Operator                            |
4105| `A-F`      | x     | Numbers                                    |
4106| `G`        | x     | Equal Operator                             |
4107| `H`        | x     | Shift Left                                 |
4108| `I`        | x     | Push `ibase` onto Stack                    |
4109| `J`        | x     | Push `seed` onto Stack                     |
4110| `K`        | x     | Push `scale` onto Stack                    |
4111| `L`        | x     | Pop off of Register                        |
4112| `M`        | x     | Boolean And Operator                       |
4113| `N`        | x     | Boolean Not Operator                       |
4114| `O`        | x     | Push `obase` onto Stack                    |
4115| `P`        | x     | Byte Stream Printing                       |
4116| `Q`        | x     | Quit Some Number of Macros                 |
4117| `R`        | x     | Pop Top of Stack                           |
4118| `S`        | x     | Push onto Register                         |
4119| `T`        | x     | Push Max `ibase` onto Stack                |
4120| `U`        | x     | Push Max `obase` onto Stack                |
4121| `V`        | x     | Push Max `scale` onto Stack                |
4122| `W`        | x     | Push Max of `'` Operator                   |
4123| `X`        | x     | Scale of a Number                          |
4124| `Y`        | x     | Length of Array                            |
4125| `Z`        | x     | Number of Significant Digits               |
4126| `[`        | x     | Strings                                    |
4127| `\\`       | x     | Escaping Brackets in Strings               |
4128| `]`        | x     | Strings                                    |
4129| `^`        | x     | Power                                      |
4130| `_`        | x     | Negative Numbers and Negation              |
4131| Backtick   |       |                                            |
4132| `a`        | x     | Asciify                                    |
4133| `b`        | x     | Absolute Value                             |
4134| `c`        | x     | Clear Stack                                |
4135| `d`        | x     | Duplication of Top of Stack                |
4136| `e`        | x     | Else in Conditional Execution of Registers |
4137| `f`        | x     | Printing the Stack                         |
4138| `g`        | x     | Global Settings                            |
4139| `h`        | x     | Shift Right                                |
4140| `i`        | x     | Set `ibase`                                |
4141| `j`        | x     | Set `seed`                                 |
4142| `k`        | x     | Set `scale`                                |
4143| `l`        | x     | Load from Register                         |
4144| `m`        | x     | Boolean Or Operator                        |
4145| `n`        | x     | Print and Pop                              |
4146| `o`        | x     | Set `obase`                                |
4147| `p`        | x     | Print with Newline                         |
4148| `q`        | x     | Quit Two Macros                            |
4149| `r`        | x     | Swap Top Two Items                         |
4150| `s`        | x     | Store into Register                        |
4151| `t`        | x     | Equivalent of `bc`'s `is_number()`         |
4152| `u`        | x     | Equivalent of `bc`'s `is_string()`         |
4153| `v`        | x     | Square Root                                |
4154| `w`        |       |                                            |
4155| `x`        | x     | Execute String                             |
4156| `y`        | x     | Current Depth of a Register                |
4157| `z`        | x     | Current Depth of Stack                     |
4158| `{`        | x     | Greater Than or Equal Operator             |
4159| `\|`       | x     | Moduler Exponentiation                     |
4160| `}`        | x     | Less Than or Equal Operator                |
4161| `~`        | x     | Division and Modulus Combined              |
4162
4163#### `bc` Parsing
4164
4165`bc`'s parser is, by far, the most sensitive piece of code in this software, and
4166there is a very big reason for that: `bc`'s standard is awful and defined a very
4167poor language.
4168
4169The standard says that either semicolons or newlines can end statements. Trying
4170to parse the end of a statement when it can either be a newline or a semicolon
4171is subtle. Doing it in the presence of control flow constructs that do not have
4172to use braces is even harder.
4173
4174And then comes the biggest complication of all: `bc` has to assume that it is
4175*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an
4176*interactive* utility.
4177
4178##### Flags
4179
4180All of this means that `bc` has to be able to partially parse something, store
4181enough data to recreate that state later, and return, making sure to not
4182execute anything in the meantime.
4183
4184*That* is what the flags in [`include/bc.h`][106] are: they are the state that
4185`bc` is saving for itself.
4186
4187It saves them in a stack, by the way, because it's possible to nest
4188structures, just like any other programming language. Thus, not only does it
4189have to store state, it needs to do it arbitrarily, and still be able to
4190come back to it.
4191
4192So `bc` stores its parser state with flags in a stack. Careful setting of these
4193flags, along with properly using them and maintaining the flag stack, are what
4194make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc`
4195parser is the single most subtle, fickle, and sensitive piece of code in the
4196entire codebase. Only one thing came close once: square root, and that was only
4197sensitive because I wrote it wrong. This parser is pretty good, and it is
4198*still* sensitive. And flags are the reason why.
4199
4200For more information about what individual flags there are, see the comments in
4201[`include/bc.h`][106].
4202
4203##### Labels
4204
4205`bc`'s language is Turing-complete. That means that code needs the ability to
4206jump around, specifically to implement control flow like `if` statements and
4207loops.
4208
4209`bc` handles this while parsing with what I called "labels."
4210
4211Labels are markers in the bytecode. They are stored in functions alongside the
4212bytecode, and they are just indices into the bytecode.
4213
4214When the `bc` parser creates a label, it pushes an index onto the labels array,
4215and the index of the label in that array is the index that will be inserted into
4216the bytecode.
4217
4218Then, when a jump happens, the index pulled out of the bytecode is used to index
4219the labels array, and the label (index) at the index is then used to set the
4220instruction pointer.
4221
4222##### Cond Labels
4223
4224"Cond" labels are so-called because they are used by conditionals.
4225
4226The key to them is that they come *before* the code that uses them. In other
4227words, when jumping to a condition, code is jumping *backwards*.
4228
4229This means that when a cond label is created, the value that should go there is
4230well-known. Cond labels are easy.
4231
4232However, they are still stored on a stack so that the parser knows what cond
4233label to use.
4234
4235##### Exit Labels
4236
4237Exit labels are not so easy.
4238
4239"Exit" labels are so-called because they are used by code "exiting" out of `if`
4240statements or loops.
4241
4242The key to them is that they come *after* the code that uses them. In other
4243words, when jumping to an exit, code is jumping *forwards*.
4244
4245But this means that when an exit label is created, the value that should go
4246there is *not* known. The code that needs it must be parsed and generated first.
4247
4248That means that exit labels are created with the index of `SIZE_MAX`, which is
4249then specifically checked for with an assert in `bc_program_exec()` before using
4250those indices.
4251
4252There should ***NEVER*** be a case when an exit label is not filled in properly
4253if the parser has no bugs. This is because every `if` statement, every loop,
4254must have an exit, so the exit must be set. If not, there is a bug.
4255
4256Exit labels are also stored on a stack so that the parser knows what exit label
4257to use.
4258
4259##### Expression Parsing
4260
4261`bc` has expressions like you might expect in a typical programming language.
4262This means [infix notation][107].
4263
4264One thing about infix notation is that you can't just generate code straight
4265from it like you can with [Reverse Polish notation][108]. It requires more work
4266to shape it into a form that works for execution on a stack machine.
4267
4268That extra work is called the [Shunting-Yard algorithm][109], and the form it
4269translates infix notation into is...[Reverse Polish notation][108].
4270
4271In order to understand the rest of this section, you must understand the
4272[Shunting-Yard algorithm][109]. Go do that before you read on.
4273
4274###### Operator Stack
4275
4276In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the
4277output and an explicit operator stack (the `ops` field in `BcParse`) as the
4278operator stack. It stores tokens from `BcLex`.
4279
4280However, there is one **HUGE** hangup: multiple expressions can stack. This
4281means that multiple expressions can be parsed at one time (think an array element
4282expression in the middle of a larger expression). Because of that, we need to
4283keep track of where the previous expression ended. That's what `start` parameter
4284to `bc_parse_operator()` is.
4285
4286Parsing multiple expressions on one operator stack only works because
4287expressions can only *stack*; this means that, if an expression begins before
4288another ends, it must *also* end before that other expression ends. This
4289property ensures that operators will never interfere with each other on the
4290operator stack.
4291
4292Also, the operator precedence is reversed: a lower precedence *value* means a
4293higher precedence.
4294
4295###### Recursion
4296
4297Because expressions can stack, parsing expressions actually requires recursion.
4298Well, it doesn't *require* it, but the code is much more readable that way.
4299
4300This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual
4301expression parsing function) calls can, in turn, call it.
4302
4303###### Expression Flags
4304
4305There is one more big thing: not all expressions in `bc` are equal.
4306
4307Some expressions have requirements that others don't have. For example, only
4308array arguments can be arrays (which are technically not expressions, but are
4309treated as such for parsing), and some operators (in POSIX) are not allowed in
4310certain places.
4311
4312For this reason, functions that are part of the expression parsing
4313infrastructure in `bc`'s parser usually take a `flags` argument. This is meant
4314to be passed to children, and somewhere, they will be checked to ensure that the
4315resulting expression meets its requirements.
4316
4317There are also places where the flags are changed. This is because the
4318requirements change.
4319
4320Maintaining the integrity of the requirements flag set is an important part of
4321the `bc` parser. However, they do not have to be stored on a stack because their
4322stack is implicit from the recursion that expression parsing uses.
4323
4324### Functions
4325
4326Functions, in `bc`, are data structures that contain the bytecode and data
4327produced by the parsers. Functions are what the `BcProgram` program executes.
4328
4329#### Main and Read Functions
4330
4331There are two functions that always exist, which I call the "main" and "read"
4332functions.
4333
4334The "main" function is the function in which any code and data outside other
4335functions is put. Basically, it is the function where the scripting code ends
4336up.
4337
4338The "read" function is the function that is reset and parsed every time a call
4339to the `read()` builtin function happens.
4340
4341#### `dc` Strings
4342
4343In `dc`, strings can be executed, and since there are no actual "functions" in
4344`dc`, strings are handled as functions. In fact, they are effectively translated
4345into functions by parsing.
4346
4347##### Tail Calls
4348
4349Since strings in `dc` are functions, and the fact that `dc` has no native loops,
4350such loops are implemented in `dc` code using strings with conditional execution
4351commands at the end of strings.
4352
4353When such conditional execution, or even unconditional execution, commands are
4354the very last commands in a string, then `dc` can perform a [tail call][202].
4355
4356This is done by recording the fact that a tail call happened, done by
4357incrementing an integer on a stack. When a string is executed *without* a tail
4358call, a new entry is pushed onto the stack with the integer `1`.
4359
4360When a string finally quits that followed tail calls, its stack entry is popped,
4361eliminating all of those tail calls.
4362
4363Why perform tail calls? Because otherwise, `dc` would be subject to the same
4364thing that plagues [functional programming languages][203]: stack overflow. In
4365`dc`'s case, that would manifest itself as a growing [heap][204], because the
4366execution stack is stored on the heap, until a fatal allocation failure would
4367occur.
4368
4369#### Execution
4370
4371Execution is handled by an interpreter implemented using `BcProgram` and code
4372in [`src/program.c`][53].
4373
4374The interpreter is a mix between a [stack machine][210] and a [register
4375machine][211]. It is a stack machine in that operations happen on a stack I call
4376the "results stack," but it is a register machine in that items on the stack can
4377be stored to and loaded from "registers" (`dc` terminology), variables (`bc`
4378terminology), and arrays.
4379
4380##### Stacks
4381
4382There are two stacks in the interpreter:
4383
4384* The "results" stack (as mentioned above).
4385* The "execution" stack.
4386
4387The results stack (the `results` field of the `BcProgram` struct) is the stack
4388where the results of computations are stored. It is what makes the interpreter
4389part [stack machine][210]. It is filled with `BcResult`'s.
4390
4391The execution stack (the `stack` field of the `BcProgram` struct) is the stack
4392that tracks the current execution state of the interpreter. It is the presence
4393of this separate stack that allows the interpreter to implement the machine as a
4394loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the
4395"instruction pointers."
4396
4397These instruction pointers have three fields, all integers:
4398
4399* `func`, the index of the function that is currently executing.
4400* `idx`, the index of the next bytecode instruction to execute in the function's
4401  bytecode array.
4402* `len`, which is the length of the results stack when the function started
4403  executing. This is not used by `dc`, but it used by `bc` because functions
4404  in `bc` should never affect the results stack of their callers.
4405
4406With these three fields, and always executing using the instruction pointer at
4407the top of the execution stack, the interpreter can always keep track of its
4408execution.
4409
4410When a function or a string starts executing, a new `BcInstPtr` is pushed onto
4411the execution stack for it. This includes if a function was called recursively.
4412And then, when the function or string returns, its `BcInstPtr` is popped off of
4413the execution stack.
4414
4415##### Bytecode
4416
4417Execution of functions are done through bytecode produced directly by the
4418parsers (see the [Parsing][209]). This bytecode is stored in the `code`
4419[vector][111] of the `BcFunc` struct.
4420
4421This is a vector for two reasons:
4422
4423* It makes it easier to add bytecode to the vector in the parsers.
4424* `bc` allows users to redefine functions.
4425
4426The reason I can use bytecode is because there are less than 256 instructions,
4427so an `unsigned char` can store all the bytecodes.
4428
4429###### Bytecode Indices
4430
4431There is one other factor to bytecode: there are instructions that need to
4432reference strings, constants, variables, or arrays. Bytecode need some way to
4433reference those things.
4434
4435Fortunately, all of those things can be referenced in the same way: with indices
4436because all of the items are in vectors.
4437
4438So `bc` has a way of encoding an index into bytecode. It does this by, after
4439pushing the instruction that references anything, pushing a byte set to the
4440length of the index in bytes, then the bytes of the index are pushed in
4441little-endian order.
4442
4443Then, when the interpreter encounters an instruction that needs one or more
4444items, it decodes the index or indices there and updates the `idx` field of the
4445current `BcInstPtr` to point to the byte after the index or indices.
4446
4447One more thing: the encoder of the indices only pushes as many bytes as
4448necessary to encode the index. It stops pushing when the index has no more bytes
4449with any 1 bits.
4450
4451##### Variables
4452
4453In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of
4454numbers; it is a vector of vector of numbers. The first vector is the vector of
4455variables, the second is the variable stack, and the last level is the actual
4456number.
4457
4458This is because both `bc` and `dc` need variables to be stacks.
4459
4460For `dc`, registers are *defined* to be stacks.
4461
4462For `bc`, variables as stacks is how function arguments/parameters and function
4463`auto` variables are implemented.
4464
4465When a function is called, and a value needs to be used as a function argument,
4466a copy of the value is pushed onto the stack corresponding to the variable with
4467the same name as the function's parameter. For `auto` variables, a new number
4468set to zero is pushed onto each stack corresponding to the `auto` variables.
4469(Zero is used because the [`bc` spec][2] requires that `auto` variables are set
4470to zero.)
4471
4472It is in this way that the old value of the variable, which may not even be
4473related to the function parameter or `auto` variable, is preserved while the
4474variable is used as a function parameter or `auto` variable.
4475
4476When the function returns, of course, the stacks of the variables for the
4477parameters and `auto`'s will have their top item popped, restoring the old value
4478as it was before the function call.
4479
4480##### Arrays
4481
4482Like variables, arrays are also implemented as stacks. However, because they are
4483arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector
4484of vectors of vectors of numbers. The first of the two levels is the vector of
4485arrays, the second the stack of for each array, the third the actual array, and
4486last the numbers in the array.
4487
4488`dc` has no need of this extra stack, but `bc` does because arrays can be
4489function parameters themselves.
4490
4491When arrays are used for function arguments, they are copied with a deep copy;
4492each item of the source vector is copied. This is because in `bc`, according to
4493the [`bc` spec][2], all function arguments are passed by value.
4494
4495However, array references are possible (see below).
4496
4497When arrays are used as `auto`'s, a new vector is pushed with one element; if
4498more elements are needed, the array is grown automatically, and new elements are
4499given the value of zero.
4500
4501In fact, if *any* array is accessed and does not have an element at that index,
4502the array is automatically grown to that size, and all new elements are given
4503the value zero. This behavior is guaranteed by the [`bc` spec][2].
4504
4505###### Array References
4506
4507Array references had to be implemented as vectors themselves because they must
4508be pushed on the vectors stacks, which, as seen above, expect vectors
4509themselves.
4510
4511So thus, references are implemented as vectors on the vector stacks. These
4512vectors are not vectors of vectors themselves; they are vectors of bytes; in
4513fact, the fact that they are byte vectors and not vector vectors is how a
4514reference vector is detected.
4515
4516These reference vectors always have the same two things pushed: a byte encoding
4517(the same way bytecode indices are) of the referenced vector's index in the
4518`arrs` vector, and a byte encoding of the referenced vectors index in the vector
4519stack.
4520
4521If an item in a referenced vector is needed, then the reference is dereferenced,
4522and the item is returned.
4523
4524If a reference vector is passed to a function that does *not* expect a
4525reference, the vector is dereferenced and a deep copy is done, in the same way
4526as vectors are copied for normal array function parameters.
4527
4528### Callbacks
4529
4530There are many places in `bc` and `dc` where function pointers are used:
4531
4532* To implement destructors in vectors. (See the [Vectors][111] section.)
4533* To select the correct lex and parse functions for `bc` and `dc`.
4534* To select the correct function to execute unary operators.
4535* To select the correct function to execute binary operators.
4536* To calculate the correct number size for binary operators.
4537* To print a "digit" of a number.
4538* To seed the pseudo-random number generator.
4539
4540And there might be more.
4541
4542In every case, they are used for reducing the amount of code. Instead of
4543`if`/`else` chains, such as:
4544
4545```
4546if (BC_IS_BC) {
4547	bc_parse_parse(vm.parse);
4548}
4549else {
4550	dc_parse_parse(vm.parse);
4551}
4552```
4553
4554The best example of this is `bc_num_binary()`. It is called by every binary
4555operator. It figures out if it needs to allocate space for a new `BcNum`. If so,
4556it allocates the space and then calls the function pointer to the *true*
4557operation.
4558
4559Doing it like that shrunk the code *immensely*. First, instead of every single
4560binary operator duplicating the allocation code, it only exists in one place.
4561Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a
4562`switch` statement.
4563
4564But perhaps the most important use was for destructors in vectors.
4565
4566Most of the data structures in `bc` are stored in vectors. If I hadn't made
4567destructors available for vectors, then ensuring that `bc` had no memory leaks
4568would have been nigh impossible. As it is, I check `bc` for memory leaks every
4569release when I change the code, and I have not released `bc` after version
4570`1.0.0` with any memory leaks, as far as I can remember anyway.
4571
4572### Numbers
4573
4574In order to do arbitrary-precision math, as `bc` must do, there must be some way
4575of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184]
4576is `bc`'s way of doing that.
4577
4578(Note: the word ["limb"][214] is used below; it has a specific meaning when
4579applied to arbitrary-precision numbers. It means one piece of the number. It can
4580have a single digit, which is what GNU `bc` does, or it can have multiple, which
4581is what this `bc` does.)
4582
4583This struct needs to store several things:
4584
4585* The array of limbs of the number. This is the `num` field.
4586* The location of the decimal point. This is the `rdx` (short for [radix][215])
4587  field.
4588* The number of limbs the number has. This is the `len` field.
4589* Whether the number is negative or not. This is the least significant bit of
4590  the `rdx` field. More on that later.
4591
4592In addition, `bc`'s number stores the capacity of the limb array; this is the
4593`cap` field.
4594
4595If the number needs to grow, and the capacity of the number is big enough, the
4596number is not reallocated; the number of limbs is just added to.
4597
4598There is one additional wrinkle: to make the usual operations (binary operators)
4599fast, the decimal point is *not* allowed to be in the middle of a limb; it must
4600always be between limbs, after all limbs (integer), or before all limbs (real
4601between -1 and 1).
4602
4603The reason for this is because addition, subtraction, multiplication, and
4604division expect digits to be lined up on the decimal point. By requiring that it
4605be between limbs, no extra alignment is needed, and those operations can proceed
4606without extra overhead.
4607
4608This does make some operations, most notably extending, truncating, and
4609shifting, more expensive, but the overhead is constant, and these operations are
4610usually cheap compared to the binary operators anyway.
4611
4612This also requires something else: `bc` numbers need to know *exactly* how many
4613decimal places they have after the decimal point. If the decimal point must be
4614inbetween limbs, the last decimal place could be in the middle of a limb. The
4615amount of decimal places in a number is carefully tracked and stored in the
4616`scale` field, and this number must always coincide with the `rdx` field by the
4617following formula:
4618
4619```
4620scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1
4621```
4622
4623(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on
462464-bit systems and 4 on other systems.)
4625
4626Yes, `rdx` is shifted; that is because the negative bit is stored in the least
4627significant bit of the `rdx` field, and the actual radix (amount of limbs after
4628the decimal/radix point) is stored in the rest of the bits. This is safe because
4629`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least
46302 bits less than `scale`.
4631
4632In addition to `rdx` always matching `scale`, another invariant is that `rdx`
4633must always be less than or equal to `len`. (Because `scale` may be greater than
4634`rdx`, `scale` does not have to be less than or equal to `len`.)
4635
4636Another invariant is that `len` must always be less than or equal to `cap`, for
4637obvious reasons.
4638
4639The last thing programmers need to know is that the limb array is stored in
4640little-endian order. This means that the last decimal places are in the limb
4641stored at index 0, and the most significant digits are stored at index `len-1`.
4642
4643This is done to make the most important operations fast. Addition and
4644subtraction are done from least significant to most significant limbs, which
4645means they can speed through memory in the way most computers are best at.
4646Multiplication does the same, sort of, and with division, it matters less.
4647Comparison does need to go backwards, but that's after exhausting all other
4648alternatives, including for example, checking the length of the integer portion
4649of each limb array.
4650
4651Finally, here are some possible special situations with numbers and what they
4652mean:
4653
4654* `len == 0`: the number equals 0.
4655* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value.
4656  This is the only case where `scale` does not have to coincide with `rdx`
4657  This can happen with division, for example, that sets a specific `scale` for
4658  the result value but may produce 0.
4659* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or
4660  equal to -1.
4661* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not
4662  including 0, although this will be true for 0 as well. However, 0 is always
4663  assumed to be represented by `len == 0`.
4664* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also
4665  equal 0.
4666
4667#### Math Style
4668
4669When I wrote the math for `bc`, I adopted a certain style that, if known, will
4670make it easier to understand the code. The style follows these rules:
4671
4672* `BcNum` arguments always come before arguments of other types.
4673* Among the `BcNum` arguments, the operands always come first, and the `BcNum`
4674  where the result(s) will be stored come last.
4675* Error checking is placed first in the function.
4676* Easy cases are placed next.
4677* Preparation, such as allocating temporaries, comes next.
4678* The actual math.
4679* Cleanup and ensuring invariants.
4680
4681While these rules are not hard and fast, using them as a guide will probably
4682help.
4683
4684#### Digit Clamping
4685
4686There is a question of what to do when parsing numbers and coming across digits
4687that are greater than or equal to the current `ibase`. `bc` and `dc` do one of
4688two things:
4689
4690* Clamp the digit to the maximum correct digit, or
4691* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's
4692  position (starting from 0 at the least significant digit).
4693
4694The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the
4695OpenBSD `bc`, and the OpenBSD `dc` do.
4696
4697It is possible to switch between the two with the `-c` and `-C` command-line
4698options. However, it is important for developers to understand that both
4699behaviors exist and should exist.
4700
4701The library can also do both, and it is set in a global for each thread.
4702
4703### Strings as Numbers
4704
4705Strings can be assigned to variables. This is a problem because the vectors for
4706variable stacks expect `BcNum` structs only.
4707
4708While I could have made a union, I decided that the complexity of adding an
4709entirely new type, with destructor and everything, was not worth it. Instead, I
4710took advantage of the fact that `free()`, when passed a `NULL` pointer, will do
4711nothing.
4712
4713Using that, I made it so `BcNum`'s could store strings instead. This is marked
4714by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should
4715*never* happen with a real number, though the other fields could be 0).
4716
4717The `BcNum` stores the function that stores the string in the `rdx` field, and
4718it stores the index of the string in the `scale` field. This is used to actually
4719load the string if necessary.
4720
4721Note that historically, string information was stored in the `loc` field of
4722the `d` union in a `BcResult`. This was changed recently to standardize; now,
4723all string information are stored in the `n` field of the `d` union regardless.
4724This means that all string information is stored in `BcNum`'s. This removes
4725extra cases.
4726
4727Also, if a temp is made with a string, then the result type should still be
4728`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type
4729checks.
4730
4731### Pseudo-Random Number Generator
4732
4733In order to understand this section, I suggest you read the information in the
4734manpages about the pseudo-random number generator (PRNG) first; that will help
4735you understand the guarantees it has, which is important because this section
4736delves into implementation details.
4737
4738First, the PRNG I use is seeded; this is because most OS's have an excellent
4739cryptographically secure PRNG available via command-line, usually
4740`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s
4741`$RANDOM`, which is essentially a wrapper around C's `rand()`.
4742
4743`rand()` is...bad. It is only guaranteed to return 15 bits of random data.
4744Obviously, getting good random data out of that would be hard with that alone,
4745but implementations also seem to be poor.
4746
4747On top of that, `bc` is an arbitrary-precision calculator; if I made it able to
4748generate random numbers, I could make it generate random numbers of any size,
4749and since it would be seeded, results would be reproducible, when wanted.
4750
4751So to get that, I needed a seeded PRNG with good characteristics. After scouring
4752the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog
4753post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who
4754hates on PCG and its author, but also because PCG seemed to do better when
4755tested by independent parties.
4756
4757After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for
4758the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author,
4759likens the increment to selecting a codebook.)
4760
4761I could, of course, put the entire 255 bits into one massive arbitrary-precision
4762number; `bc` is good at that, after all. But that didn't sit right with me
4763because it would mean any seed selected by users would have the real portion
4764ignored, which is stupid in a program like `bc`.
4765
4766Instead, I decided to make the integer portion the increment (clamped down to
4767size), and the real portion the seed.
4768
4769In most cases, this would be a bad idea because you cannot, in general, know how
4770many decimal places you need to represent any number with `n` real digits in
4771base `b` in another base. However, there is an easy to how many decimal digits
4772after the decimal point it takes to represent reals of base 2 in base 10: the
4773power of two.
4774
4775It turns out that, for base 2 represented in base 10, the power of 2 is
4776*exactly* how many digits are necessary to represent *any* number `n/2^p`, where
4777`p` is the power of 2. This is because at every halving, the number of decimal
4778places increases by 1:
4779
4780```
47810.5
47820.25
47830.125
47840.0625
47850.03125
47860.015625
4787...
4788```
4789
4790This happens because because every decimal representation of `1/2^p` ends with a
47915 because 5 is half of 10. Because 5 is odd, half of it always ends with the
4792digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal
4793place further right.
4794
4795So the algorithm to convert all 255 bits of the seed is as follows:
4796
47971.	Convert the increment to a `BcNum`.
47982.	Convert the seed to a `BcNum`.
47993.	Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems,
4800	substitute 64 bits for 128.)
48014.	Add the two numbers together.
4802
4803Likewise, the algorithm to convert from a user-supplied number to a seed is:
4804
48051.	Truncate a copy of the number.
48062.	Subtract the result from #1 from the original number. This gives the real
4807	portion of the number.
48083.	Clamp the result of #1 to 127 (or 63) bits. This is the increment.
48094.	Multiply the result of #2 by `2^128`.
48105.	Truncate the result of #4. This is the seed.
4811
4812#### Generating Arbitrary-Precision Numbers
4813
4814I wrote a function (`bc_rand_bounded()`) that will return unbiased results with
4815any bound below the max that PCG can generate.
4816
4817To generate an integer of arbitrary size using a bound, `bc` simply uses
4818`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as
4819many limbs as needed to satisfy the bigger bound.
4820
4821To generate numbers with arbitrary precision after the decimal point, `bc`
4822merely generates an arbitrary precision integer with the bound `10^p`, where `p`
4823is the desired number of decimal places, then divides in by `10^p` with a
4824`scale` of `p`.
4825
4826## Debug Code
4827
4828Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69],
4829programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable
4830the inclusion of *a lot* of extra code to assist with debugging.
4831
4832For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the
4833[`include/`][212] directory and in the [`src/`][213] directory.
4834
4835Yes, all of the code is guarded by `#if` preprocessor statements; this is
4836because the code should *never* be in a release build, and by making programmers
4837add this manually (not even an option to [`configure.sh`][69]), it is easier to
4838ensure that never happens.
4839
4840However, that said, the extra debug code is useful; that was why I kept it in.
4841
4842## Performance
4843
4844While I have put in a lot of effort to make `bc` as fast as possible, there
4845might be some things users can do to speed it up without changing the code.
4846
4847First, users can probably use [profile-guided optimization][217] to optimize
4848even better, using the test suite to profile.
4849
4850Second, I included macros that might help branch placement and prediction:
4851
4852* `BC_ERR(e)`
4853* `BC_UNLIKELY(e)`
4854* `BC_NO_ERR(e)`
4855* `BC_LIKELY(e)`
4856
4857`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as
4858`BC_LIKELY`; I just added them to also document branches that lead to error
4859conditions or *away* from error conditions.
4860
4861Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation,
4862they expand to nothing but the argument they were given.
4863
4864They can, however, be defined to `__builtin_expect((e), 1)` and
4865`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch
4866prediction and placement. (For more information about `__builtin_expect()` see
4867the [GCC documentation][218].)
4868
4869There might be other compilers that can take advantage of that, but I don't know
4870anything about that.
4871
4872Also, as stated in the [build manual][219], link-time optimization is excellent
4873at optimizing this `bc`. Use it.
4874
4875### Benchmarks
4876
4877To help programmers improve performance, I have built and assembled
4878infrastructure to make benchmarking easy.
4879
4880First, in order to easily run benchmarks, I created
4881[`scripts/benchmark.sh`][220].
4882
4883Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it
4884easier to judge whether the results are significant or not.
4885
4886Third, I made the `make` clean target `make clean_benchmarks`, to clean
4887`scripts/ministat` and the generated benchmark files.
4888
4889Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory
4890data in a format that is easy for `scripts/ministat` to digest.
4891
4892To add a benchmark, add a script in the right directory to generate the
4893benchmark. Yes, generate.
4894
4895All of the benchmarks are generated first, from `.bc` and `.dc` files in the
4896[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive
4897amounts of data can be generated and then pushed through the calculators.
4898
4899If you need to benchmark `bc` or `dc` with simple loops, have the generator
4900files simply print the loop code.
4901
4902### Caching of Numbers
4903
4904In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s
4905that have the default capacity (`BC_NUM_DEF_SIZE`).
4906
4907It does this by allowing `bc_num_free()` to put the limb array onto a
4908statically-allocated stack (it's just a global array with a set size). Then,
4909when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any
4910are available. If the answer is yes, the one on top of the stack is returned.
4911Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()`
4912a new limb array.
4913
4914When the stack is filled, any numbers that `bc` attempts to put on it are just
4915freed.
4916
4917This setup saved a few percent in my testing for version [3.0.0][32], which is
4918when I added it.
4919
4920## `bcl`
4921
4922At the request of one of my biggest users, I spent the time to make a build mode
4923where the number and math code of `bc` could be wrapped into a library, which I
4924called `bcl`.
4925
4926This mode is exclusive; `bc` and `dc` themselves are *not* built when building
4927`bcl`.
4928
4929The only things in the `bc` math code that is not included is:
4930
4931* Printing newlines (clients do not care about `bc`'s line lenth restriction).
4932* `dc`'s stream print.
4933
4934Even the [pseudo-random number generator][179] is included, with extra support
4935for generating real numbers with it. (In `bc`, such support is in
4936[`lib2.bc`][26].)
4937
4938### Signal Handling
4939
4940`bcl` does not have the capability for signal handling (as of version `6.0.0`).
4941
4942### Threads
4943
4944It is possible to use `bcl` across multiple threads. However, `bcl` must be
4945initialized on each thread where it will be used.
4946
4947This is because `bcl` uses thread-specific data to store the "globals" for each
4948thread. This is to ensure that threads do not stomp on each other's numbers or
4949other data structures.
4950
4951### Contexts
4952
4953Contexts were an idea by the same user that requested `bcl`. They are meant to
4954make it so multiple clients in one program can keep their data separate from
4955each other.
4956
4957### Numbers
4958
4959Numbers in `bcl` are literally indices into an encapsulated array of numbers,
4960hidden in the context. These indices are then passed to clients to refer to
4961numbers later.
4962
4963### Operand Consumption
4964
4965Most math functions in `bcl` "consume" their operand arguments; the arguments
4966are freed, whether or not an error is returned.
4967
4968This is to make it easy to implement math code, like this:
4969
4970```
4971n = bcl_add(bcl_mul(a, b), bcl_div(c, d));
4972```
4973
4974If numbers need to be preserved, they can be with `bcl_dup()`:
4975
4976```
4977n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d)));
4978```
4979
4980### Errors
4981
4982Errors can be encoded in the indices representing numbers, and where necessary,
4983clients are responsible for checking those errors.
4984
4985The encoding of errors is this: if an error happens, the value `0-error` is
4986returned. To decode, do the exact same thing. Thus, any index above
4987`0-num_errors` is an error.
4988
4989If an index that represents an error is passed to a math function, that function
4990propagates the error to its result and does not perform the math operation.
4991
4992All of this is to, once again, make it easy to implement the math code as above.
4993
4994However, where possible, errors are returned directly.
4995
4996[1]: https://en.wikipedia.org/wiki/Bus_factor
4997[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top
4998[3]: https://en.wikipedia.org/wiki/Dc_(Unix)
4999[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5000[5]: ./bc/A.1.md#standard-library
5001[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
5002[7]: ./bc/A.1.md#extended-library
5003[8]: #libbc-2
5004[9]: #strgensh
5005[10]: https://vimeo.com/230142234
5006[11]: https://gavinhoward.com/2019/12/values-for-yao/
5007[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
5008[13]: ./build.md#cross-compiling
5009[14]: ./build.md
5010[15]: #strgenc
5011[16]: http://landley.net/toybox/about.html
5012[17]: https://www.busybox.net/
5013[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
5014[19]: https://clang-analyzer.llvm.org/scan-build.html
5015[20]: https://www.valgrind.org/
5016[21]: https://clang.llvm.org/docs/AddressSanitizer.html
5017[22]: https://gavinhoward.com/2019/11/finishing-software/
5018[23]: #history
5019[24]: https://clang.llvm.org/docs/ClangFormat.html
5020[25]: ./algorithms.md
5021[26]: #lib2bc
5022[27]: #vmh
5023[28]: https://github.com/rain-1/linenoise-mob
5024[29]: https://github.com/antirez/linenoise
5025[30]: #bclh
5026[31]: #argsh
5027[32]: ../NEWS.md#3-0-0
5028[33]: ../NEWS.md
5029[34]: https://github.com/skeeto/optparse
5030[35]: #opth
5031[36]: #historyh
5032[37]: #randh
5033[38]: #langh
5034[39]: #numc
5035[40]: #bcc
5036[41]: #bc_lexc
5037[42]: #bc_parsec
5038[43]: #libraryc
5039[44]: #dcc
5040[45]: #dc_lexc
5041[46]: #dc_parsec
5042[47]: #filec
5043[48]: #historyc
5044[49]: #langc
5045[50]: #lexc
5046[51]: #optc
5047[52]: #parsec
5048[53]: #programc
5049[54]: #randc
5050[55]: #fileh
5051[56]: #readc
5052[57]: #programh
5053[58]: #vmc
5054[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top
5055[60]: #manpagesh
5056[61]: #bcl3md
5057[62]: #bcl3
5058[63]: #bclvcxproj
5059[64]: #bclvcxprojfilters
5060[65]: #bclsln
5061[66]: #bcvcxproj
5062[67]: #bcvcxprojfilters
5063[68]: #bcsln
5064[69]: #configuresh
5065[70]: #makefilein
5066[71]: #functionsh
5067[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top
5068[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18
5069[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top
5070[75]: #versionh
5071[76]: ##posix-shell-scripts
5072[77]: #tests
5073[78]: #karatsubapy
5074[79]: #bc-1
5075[80]: #dc-1
5076[81]: ./build.md#build-type
5077[82]: #fuzzing-1
5078[83]: #releasesh
5079[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
5080[85]: #locales-1
5081[86]: #manuals-1
5082[87]: #locale_installsh
5083[88]: #locale_uninstallsh
5084[89]: #bc1mdin
5085[90]: #dc1mdin
5086[91]: #bc
5087[92]: https://pandoc.org/
5088[93]: #release_settingstxt
5089[94]: #aflpy
5090[95]: #randmathpy
5091[96]: #caching-of-numbers
5092[97]: #error-handling
5093[98]: #radamsatxt
5094[99]: https://gitlab.com/akihe/radamsa
5095[100]: #radamsash
5096[101]: https://musl.libc.org/
5097[102]: ./build.md#settings
5098[103]: #test_settingstxt
5099[104]: #test_settingssh
5100[105]: #functionssh
5101[106]: #bch
5102[107]: https://en.wikipedia.org/wiki/Infix_notation
5103[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5104[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
5105[110]: #bc-parsing
5106[111]: #vectors
5107[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h
5108[113]: https://en.wikipedia.org/wiki/Signal_(IPC)
5109[114]: #custom-io
5110[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
5111[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html
5112[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html
5113[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw
5114[119]: #fuzz_prepsh
5115[120]: #bc_aflyaml
5116[121]: #bc_afl_continueyaml
5117[122]: https://github.com/tmux/tmux
5118[123]: https://tmuxp.git-pull.com/
5119[124]: #test-suite
5120[125]: https://aflplus.plus/
5121[126]: #link-time-optimization
5122[127]: #fuzzing-performance
5123[128]: #radamsa
5124[129]: #afl-quickstart
5125[130]: #convenience
5126[131]: #datac
5127[132]: https://git.gavinhoward.com/gavin/vim-bc
5128[133]: https://git.gavinhoward.com/gavin/bc_libs
5129[134]: #debugging
5130[135]: #asserts
5131[136]: #portability
5132[137]: https://pexpect.readthedocs.io/en/stable/
5133[138]: #test-suite-portability
5134[139]: #historypy
5135[140]: #historysh
5136[141]: #group-tests
5137[142]: #build-system
5138[143]: #generated-tests
5139[144]: #benchmarks-1
5140[145]: #gen
5141[146]: #test-coverage
5142[147]: #integration-with-the-build-system
5143[148]: #test-scripts
5144[149]: #standard-tests
5145[150]: #script-tests
5146[151]: #error-tests
5147[152]: #stdin-tests
5148[153]: #read-tests
5149[154]: #other-tests
5150[155]: #history-tests
5151[156]: #bcl
5152[157]: #bcl-test
5153[158]: #bclc
5154[159]: #valgrind
5155[160]: #addresssanitizer-and-friends
5156[161]: #bc-2
5157[162]: #dc-2
5158[163]: #alltxt-1
5159[164]: #errorstxt
5160[165]: #posix_errorstxt
5161[166]: #timeconstsh
5162[167]: #alltxt-3
5163[168]: #errorstxt-1
5164[169]: #scripts-1
5165[170]: #scripts-2
5166[171]: #alltxt-2
5167[172]: #alltxt-4
5168[173]: #async-signal-safe-signal-handling
5169[174]: #vectorh
5170[175]: #read_errorstxt
5171[176]: #statush
5172[177]: #numbers
5173[178]: #math-style
5174[179]: #pseudo-random-number-generator
5175[180]: #lexh
5176[181]: #parseh
5177[182]: #dch
5178[183]: #libraryh
5179[184]: #numh
5180[185]: #readh
5181[186]: #maps
5182[187]: #slabs-and-slab-vectors
5183[188]: ./build.md#extra-math
5184[189]: #command-line-history
5185[190]: #scriptsed
5186[191]: #linux-timeconstbc-script
5187[192]: #corpuses
5188[193]: ./build.md#history
5189[194]: https://www.valgrind.org/docs/manual/mc-manual.html
5190[195]: #othersh
5191[196]: https://scan.coverity.com/
5192[197]: https://clang-analyzer.llvm.org/
5193[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it
5194[199]: https://cr.yp.to/docs/selfpipe.html
5195[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph
5196[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator
5197[202]: https://en.wikipedia.org/wiki/Tail_call
5198[203]: https://en.wikipedia.org/wiki/Functional_programming_language
5199[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
5200[205]: #mainc
5201[206]: #argc
5202[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
5203[208]: #functions
5204[209]: #parsing
5205[210]: https://en.wikipedia.org/wiki/Stack_machine
5206[211]: https://en.wikipedia.org/wiki/Register_machine
5207[212]: #include
5208[213]: #src
5209[214]: https://gmplib.org/manual/Nomenclature-and-Types
5210[215]: https://en.wikipedia.org/wiki/Radix_point
5211[216]: #main-and-read-functions
5212[215]: https://www.pcg-random.org/
5213[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/
5214[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization
5215[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
5216[219]: ./build.md#optimization
5217[220]: #benchmarksh
5218[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c
5219[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html
5220[223]: #ministatc
5221[224]: #dc
5222[225]: #allsh
5223[226]: #errorssh
5224[227]: #errorsh
5225[228]: #vectorc
5226