• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Development
2
3Updated: 27 Mar 2023
4
5This document is meant for the day when I (Gavin D. Howard) get [hit by a
6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue.
7
8This document is supposed to contain all of the knowledge necessary to develop
9`bc` and `dc`.
10
11In addition, this document is meant to add to the [oral tradition of software
12engineering][118], as described by Bryan Cantrill.
13
14This document will reference other parts of the repository. That is so a lot of
15the documentation can be closest to the part of the repo where it is actually
16necessary.
17
18## What Is It?
19
20This repository contains an implementation of both [POSIX `bc`][2] and [Unix
21`dc`][3].
22
23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a
24historical utility that was included in early Unix and even predates both Unix
25and C. They both are arbitrary-precision command-line calculators with their own
26programming languages. `bc`'s language looks similar to C, with infix notation
27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows
28the user to execute strings as though they were functions.
29
30In addition, it is also possible to build the arbitrary-precision math as a
31library, named [`bcl`][156].
32
33**Note**: for ease, I will refer to both programs as `bc` in this document.
34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I
35am referring to just `dc`.
36
37### History
38
39This project started in January 2018 when a certain individual on IRC, hearing
40that I knew how to write parsers, asked me to write a `bc` parser for his math
41library. I did so. I thought about writing my own math library, but he
42disparaged my programming skills and made me think that I couldn't do it.
43
44However, he took so long to do it that I eventually decided to give it a try and
45had a working math portion in two weeks. It taught me that I should not listen
46to such people.
47
48From that point, I decided to make it an extreme learning experience about how
49to write quality software.
50
51That individual's main goal had been to get his `bc` into [toybox][16], and I
52managed to get my own `bc` in. I also got it in [busybox][17].
53
54Eventually, in late 2018, I also decided to try my hand at implementing
55[Karatsuba multiplication][18], an algorithm that that unnamed individual
56claimed I could never implement. It took me a bit, but I did it.
57
58This project became a passion project for me, and I continued. In mid-2019,
59Stefan Eßer suggested I improve performance by putting more than 1 digit in each
60section of the numbers. After I showed immaturity because of some burnout, I
61implemented his suggestion, and the results were incredible.
62
63Since that time, I have gradually been improving the `bc` as I have learned more
64about things like fuzzing, [`scan-build`][19], [valgrind][20],
65[AddressSanitizer][21] (and the other sanitizers), and many other things.
66
67One of my happiest moments was when my `bc` was made the default in FreeBSD.
68Another happiest moment was when I found out that my `bc` had shipped with Mac
69OSX Ventura, without my knowledge.
70
71But since I believe in [finishing the software I write][22], I have done less
72work on `bc` over time, though there are still times when I put a lot of effort
73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use
74my `bc`.
75
76And that is why I am writing this document: someday, someone else is going to
77want to change my code, and this document is my attempt to make it as simple as
78possible.
79
80### Values
81
82[According to Bryan Cantrill][10], all software has values. I think he's
83correct, though I [added one value for programming languages in particular][11].
84
85However, for `bc`, his original list will do:
86
87* Approachability
88* Availability
89* Compatibility
90* Composability
91* Debuggability
92* Expressiveness
93* Extensibility
94* Interoperability
95* Integrity
96* Maintainability
97* Measurability
98* Operability
99* Performance
100* Portability
101* Resiliency
102* Rigor
103* Robustness
104* Safety
105* Security
106* Simplicity
107* Stability
108* Thoroughness
109* Transparency
110* Velocity
111
112There are several values that don't apply. The reason they don't apply is
113because `bc` and `dc` are existing utilities; this is just another
114reimplementation. The designs of `bc` and `dc` are set in stone; there is
115nothing we can do to change them, so let's get rid of those values that would
116apply to their design:
117
118* Compatibility
119* Integrity
120* Maintainability
121* Measurability
122* Performance
123* Portability
124* Resiliency
125* Rigor
126* Robustness
127* Safety
128* Security
129* Simplicity
130* Stability
131* Thoroughness
132* Transparency
133
134Furthermore, some of the remaining ones don't matter to me, so let me get rid of
135those and order the rest according to my *actual* values for this project:
136
137* Robustness
138* Stability
139* Portability
140* Compatibility
141* Performance
142* Security
143* Simplicity
144
145First is **robustness**. This `bc` and `dc` should be robust, accepting any
146input, never crashing, and instead, returning an error.
147
148Closely related to that is **stability**. The execution of `bc` and `dc` should
149be deterministic and never change for the same inputs, including the
150pseudo-random number generator (for the same seed).
151
152Third is **portability**. These programs should run everywhere that POSIX
153exists, as well as Windows. This means that just about every person on the
154planet will have access to these programs.
155
156Next is **compatibility**. These programs should, as much as possible, be
157compatible with other existing implementations and standards.
158
159Then we come to **performance**. A calculator is only usable if it's fast, so
160these programs should run as fast as possible.
161
162After that is **security**. These programs should *never* be the reason a user's
163computer is compromised.
164
165And finally, **simplicity**. Where possible, the code should be simple, while
166deferring to the above values.
167
168Keep these values in mind for the rest of this document, and for exploring any
169other part of this repo.
170
171#### Portability
172
173But before I go on, I want to talk about portability in particular.
174
175Most of these principles just require good attention and care, but portability
176is different. Sometimes, it requires pulling in code from other places and
177adapting it. In other words, sometimes I need to duplicate and adapt code.
178
179This happened in a few cases:
180
181* Option parsing (see [`include/opt.h`][35]).
182* History (see [`include/history.h`][36]).
183* Pseudo-Random Number Generator (see [`include/rand.h`][37]).
184
185This was done because I decided to ensure that `bc`'s dependencies were
186basically zero. In particular, either users have a normal install of Windows or
187they have a POSIX system.
188
189A POSIX system limited me to C99, `sh`, and zero external dependencies. That
190last item is why I pull code into `bc`: if I pull it in, it's not an external
191dependency.
192
193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being
194portable to some platforms.
195
196## Suggested Course
197
198I do have a suggested course for programmers to follow when trying to understand
199this codebase. The order is this:
200
2011.	`bc` Spec.
2022.	Manpages.
2033.	Test suite.
2044.	Understand the build.
2055.	Algorithms manual.
2066.	Code concepts.
2077.	Repo structure.
2088.	Headers.
2099.	Source code.
210
211This order roughly follows this order:
212
2131. High-level requirements
2142. Low-level requirements
2153. High-level implementation
2164. Low-level implementation
217
218In other words, first understand what the code is *supposed* to do, then
219understand the code itself.
220
221## Useful External Tools
222
223I have a few tools external to `bc` that are useful:
224
225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132].
226* A [repo of `bc` and `dc` scripts][133].
227* A set of `bash` aliases (see below).
228* A `.bcrc` file with items useful for my `bash` setup (see below).
229
230My `bash` aliases are these:
231
232```sh
233alias makej='make -j16'
234alias mcmake='make clean && make'
235alias mcmakej='make clean && make -j16'
236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \
237    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
238    -Wno-unreachable-code-return -Wno-missing-noreturn \
239    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
240    -pedantic -std=c99" ./configure.sh'
241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \
242    -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \
243    -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \
244    -Wall -Wextra -pedantic -std=c99" ./configure.sh'
245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \
246    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
247    -Wno-unreachable-code-return -Wno-missing-noreturn \
248    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
249    -pedantic -std=c99" ./configure.sh'
250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \
251    CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \
252    -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \
253    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
254    -pedantic -std=c99" ./configure.sh'
255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS'
256```
257
258`makej` runs `make` with all of my cores.
259
260`mcmake` runs `make clean` before running `make`. It will take a target on the
261command-line.
262
263`mcmakej` is a combination of `makej` and `mcmake`.
264
265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see
266[Debugging][134] below).
267
268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler)
269using full warnings, with a few really loud and useless warnings turned off.
270
271`bcnoassert` configures `bc` to not have asserts built in.
272
273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug
274mode.
275
276`bcunset` unsets my personal `bc` environment variables, which are set to:
277
278```sh
279export BC_ENV_ARGS="-l $HOME/.bcrc"
280export BC_LINE_LENGTH="74"
281```
282
283Unsetting these environment variables are necessary for running
284[`scripts/release.sh`][83] because otherwise, it will error when attempting to
285run `bc -s` on my `$HOME/.bcrc`.
286
287Speaking of which, the contents of that file are:
288
289```bc
290define void print_time_unit(t){
291	if(t<10)print "0"
292	if(t<1&&t)print "0"
293	print t,":"
294}
295define void sec2time(t){
296	auto s,m,h,d,r
297	r=scale
298	scale=0
299	t=abs(t)
300	s=t%60
301	t-=s
302	m=t/60%60
303	t-=m
304	h=t/3600%24
305	t-=h
306	d=t/86400
307	if(d)print_time_unit(d)
308	if(h)print_time_unit(h)
309	print_time_unit(m)
310	if(s<10)print "0"
311	if(s<1&&s)print "0"
312	s
313	scale=r
314}
315define minutes(secs){
316	return secs/60;
317}
318define hours(secs){
319	return secs/3600;
320}
321define days(secs){
322	return secs/3600/24;
323}
324define years(secs){
325	return secs/3600/24/365.25;
326}
327define fbrand(b,p){
328	auto l,s,t
329	b=abs(b)$
330	if(b<2)b=2
331	s=scale
332	t=b^abs(p)$
333	l=ceil(l2(t),0)
334	if(l>scale)scale=l
335	t=irand(t)/t
336	scale=s
337	return t
338}
339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)}
340```
341
342This allows me to use `bc` as part of my `bash` prompt.
343
344## Code Style
345
346The code style for `bc` is...weird, and that comes from historical accident.
347
348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order
349to do that, my `bc` originally had toybox style. Eventually, I changed to using
350tabs, and assuming they were 4 spaces wide, but other than that, I basically
351kept the same style, with some exceptions that are more or less dependent on my
352taste.
353
354However, I later managed to get [ClangFormat][24] to work, so I changed the
355style to that.
356
357### ClangFormat
358
359The style is now defined as whatever [ClangFormat][24] outputs using the
360existing `.clang-format` file. More precisely, the style is whatever is output
361when the following command is run in the root directory:
362
363```
364./scripts/format.sh
365```
366
367### Historical Style
368
369The code style used to be:
370
371* Tabs are 4 spaces.
372* Tabs are used at the beginning of lines for indent.
373* Spaces are used for alignment.
374* Lines are limited to 80 characters, period.
375* Pointer asterisk (`*`) goes with the variable (on the right), not the type,
376  unless it is for a pointer type returned from a function.
377* The opening brace is put on the same line as the header for the function,
378  loop, or `if` statement.
379* Unless the header is more than one line, in which case the opening brace is
380  put on its own line.
381* If the opening brace is put on its own line, there is no blank line after it.
382* If the opening brace is *not* put on its own line, there *is* a blank line
383  after it, *unless* the block is only one or two lines long.
384* Code lines are grouped into what I call "paragraphs." Basically, lines that
385  seem like they should go together are grouped together. This one comes down
386  to judgment.
387* Bodies of `if` statements, `else` statements, and loops that are one line
388  long are put on the same line as the statement, unless the header is more than
389  one line long, and/or, the header and body cannot fit into 80 characters with
390  a space inbetween them.
391* If single-line bodies are on a separate line from their headers, and the
392  headers are only a single line, then no braces are used.
393* However, braces are *always* used if they contain another `if` statement or
394  loop.
395* Loops with empty bodies are ended with a semicolon.
396* Expressions that return a boolean value are surrounded by paretheses.
397* Macro backslashes are aligned as far to the left as possible.
398* Binary operators have spaces on both sides.
399* If a line with binary operators overflows 80 characters, a newline is inserted
400  *after* binary operators.
401* Function modifiers and return types are on the same line as the function name.
402* With one exception, `goto`'s are only used to jump to the end of a function
403  for cleanup.
404* All structs, enums, and unions are `typedef`'ed.
405* All constant data is in one file: [`src/data.c`][131], but the corresponding
406  `extern` declarations are in the appropriate header file.
407* All local variables are declared at the beginning of the scope where they
408  appear. They may be initialized at that point, if it does not invoke UB or
409  otherwise cause bugs.
410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable
411  declarations.
412* Besides short `if` statements and loops, there should *never* be more than one
413  statement per line.
414
415## Repo Structure
416
417Functions are documented with Doxygen-style doc comments. Functions that appear
418in headers are documented in the headers, while static functions are documented
419where they are defined.
420
421### `configure`
422
423A symlink to [`configure.sh`][69].
424
425### `configure.sh`
426
427This is the script to configure `bc` and [`bcl`][156] for building.
428
429This `bc` has a custom build system. The reason for this is because of
430[*portability*][136].
431
432If `bc` used an outside build system, that build system would be an external
433dependency. Thus, I had to write a build system for `bc` that used nothing but
434C99 and POSIX utilities.
435
436One of those utilities is POSIX `sh`, which technically implements a
437Turing-complete programming language. It's a terrible one, but it works.
438
439A user that wants to build `bc` on a POSIX system (not Windows) first runs
440`configure.sh` with the options he wants. `configure.sh` uses those options and
441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid
442`Makefile`. Then `make` can do the rest.
443
444For more information about the build process, see the [Build System][142]
445section and the [build manual][14].
446
447For more information about shell scripts, see [POSIX Shell Scripts][76].
448
449`configure.sh` does the following:
450
4511.	It processes command-line arguments and figure out what the user wants to
452	build.
4532.	It reads in [`Makefile.in`][70].
4543.	One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form
455	`%%<placeholder_name>%%` based on the [build type][81].
4564.	It appends a list of file targets based on the [build type][81].
4575.	It appends the correct test targets.
4586.	It copies the correct manpage and markdown manual for `bc` and `dc` into a
459	location from which they can be copied for install.
4607.	It does a `make clean` to reset the build state.
461
462### `.gitattributes`
463
464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in
465the Visual Studio files.
466
467### `.gitignore`
468
469The `.gitignore`
470
471### `LICENSE.md`
472
473This is the `LICENSE` file, including the licenses of various software that I
474have borrowed.
475
476### `Makefile.in`
477
478This is the `Makefile` template for [`configure.sh`][69] to use for generating a
479`Makefile`.
480
481For more information, see [`configure.sh`][69], the [Build System][142] section,
482and the [build manual][14].
483
484Because of [portability][136], the generated `Makefile.in` should be a pure
485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a
486few snares for the unwary programmer in this file:
487
4881.	No extensions allowed, including and especially GNU extensions.
4892.	If new headers are added, they must also be added to `Makefile.in`.
4903.	Don't delete the `.POSIX:` empty target at the top; that's what tells `make`
491	implementations that pure [POSIX `make`][74] is needed.
492
493In particular, there is no way to set up variables other than the `=` operator.
494There are no conditionals, so all of the conditional stuff must be in
495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the
496first place: [POSIX `make`][74] is barebones and only does a build with no
497configuration.
498
499### `NEWS.md`
500
501A running changelog with an entry for each version. This should be updated at
502the same time that [`include/version.h`][75] is.
503
504### `NOTICE.md`
505
506The `NOTICE` file with proper attributions.
507
508### `README.md`
509
510The `README`. Read it.
511
512### `benchmarks/`
513
514The folder containing files to generate benchmarks.
515
516Each of these files was made, at one time or another, to benchmark some
517experimental feature, so if it seems there is no rhyme or reason to these
518benchmarks, it is because there is none, besides historical accident.
519
520#### `bc/`
521
522The folder containing `bc` scripts to generate `bc` benchmarks.
523
524##### `add.bc`
525
526The file to generate the benchmark to benchmark addition in `bc`.
527
528##### `arrays_and_constants.bc`
529
530The file to generate the benchmark to benchmark `bc` using lots of array names
531and constants.
532
533##### `arrays.bc`
534
535The file to generate the benchmark to benchmark `bc` using lots of array names.
536
537##### `constants.bc`
538
539The file to generate the benchmark to benchmark `bc` using lots of constants.
540
541##### `divide.bc`
542
543The file to generate the benchmark to benchmark division in `bc`.
544
545##### `functions.bc`
546
547The file to generate the benchmark to benchmark `bc` using lots of functions.
548
549##### `irand_long.bc`
550
551The file to generate the benchmark to benchmark `bc` using lots of calls to
552`irand()` with large bounds.
553
554##### `irand_short.bc`
555
556The file to generate the benchmark to benchmark `bc` using lots of calls to
557`irand()` with small bounds.
558
559##### `lib.bc`
560
561The file to generate the benchmark to benchmark `bc` using lots of calls to
562heavy functions in `lib.bc`.
563
564##### `multiply.bc`
565
566The file to generate the benchmark to benchmark multiplication in `bc`.
567
568##### `postfix_incdec.bc`
569
570The file to generate the benchmark to benchmark `bc` using postfix increment and
571decrement operators.
572
573##### `power.bc`
574
575The file to generate the benchmark to benchmark power (exponentiation) in `bc`.
576
577##### `subtract.bc`
578
579The file to generate the benchmark to benchmark subtraction in `bc`.
580
581##### `strings.bc`
582
583The file to generate the benchmark to benchmark `bc` using lots of strings.
584
585#### `dc/`
586
587The folder containing `dc` scripts to generate `dc` benchmarks.
588
589##### `modexp.dc`
590
591The file to generate the benchmark to benchmark modular exponentiation in `dc`.
592
593### `gen/`
594
595A folder containing the files necessary to generate C strings that will be
596embedded in the executable.
597
598All of the files in this folder have license headers, but the program and script
599that can generate strings from them include code to strip the license header out
600before strings are generated.
601
602#### `bc_help.txt`
603
604A text file containing the text displayed for `bc -h` or `bc --help`.
605
606This text just contains the command-line options and a short summary of the
607differences from GNU and BSD `bc`'s. It also directs users to the manpage.
608
609The reason for this is because otherwise, the help would be far too long to be
610useful.
611
612**Warning**: The text has some `printf()` format specifiers. You need to make
613sure the format specifiers match the arguments given to `bc_file_printf()`.
614
615#### `dc_help.txt`
616
617A text file containing the text displayed for `dc -h` or `dc --help`.
618
619This text just contains the command-line options and a short summary of the
620differences from GNU and BSD `dc`'s. It also directs users to the manpage.
621
622The reason for this is because otherwise, the help would be far too long to be
623useful.
624
625**Warning**: The text has some `printf()` format specifiers. You need to make
626sure the format specifiers match the arguments given to `bc_file_printf()`.
627
628#### `lib.bc`
629
630A `bc` script containing the [standard math library][5] required by POSIX. See
631the [POSIX standard][2] for what is required.
632
633This file does not have any extraneous whitespace, except for tabs at the
634beginning of lines. That is because this data goes directly into the binary,
635and whitespace is extra bytes in the binary. Thus, not having any extra
636whitespace shrinks the resulting binary.
637
638However, tabs at the beginning of lines are kept for two reasons:
639
6401.	Readability. (This file is still code.)
6412.	The program and script that generate strings from this file can remove
642	tabs at the beginning of lines.
643
644For more details about the algorithms used, see the [algorithms manual][25].
645
646However, there are a few snares for unwary programmers.
647
648First, all constants must be one digit. This is because otherwise, multi-digit
649constants could be interpreted wrongly if the user uses a different `ibase`.
650This does not happen with single-digit numbers because they are guaranteed to be
651interpreted what number they would be if the `ibase` was as high as possible.
652
653This is why `A` is used in the library instead of `10`, and things like `2*9*A`
654for `180` in [`lib2.bc`][26].
655
656As an alternative, you can set `ibase` in the function, but if you do, make sure
657to set it with a single-digit number and beware the snare below...
658
659Second, `scale`, `ibase`, and `obase` must be safely restored before returning
660from any function in the library. This is because without the `-g` option,
661functions are allowed to change any of the globals.
662
663Third, all local variables in a function must be declared in an `auto` statement
664before doing anything else. This includes arrays. However, function parameters
665are considered predeclared.
666
667Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code
668must not use *any* extensions. It has to work when users use the `-s` or `-w`
669flags.
670
671#### `lib2.bc`
672
673A `bc` script containing the [extended math library][7].
674
675Like [`lib.bc`][8], and for the same reasons, this file should have no
676extraneous whitespace, except for tabs at the beginning of lines.
677
678For more details about the algorithms used, see the [algorithms manual][25].
679
680Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary
681programmers when writing code for `lib2.bc`.
682
683#### `strgen.c`
684
685Code for the program to generate C strings from text files. This is the original
686program, although [`strgen.sh`][9] was added later.
687
688The reason I used C here is because even though I knew `sh` would be available
689(it must be available to run `configure.sh`), I didn't know how to do what I
690needed to do with POSIX utilities and `sh`.
691
692Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that
693it *could* be done with `sh` and POSIX utilities.
694
695However, `strgen.c` exists *still* exists because the versions generated by
696[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99
697standard][12], page 21.) This is because [`strgen.sh`][9] generates string
698literals, and in C99, string literals can be limited to 4095 characters, and
699`gen/lib2.bc` is above that.
700
701Fortunately, the limit for "objects," which include `char` arrays, is much
702bigger: 65535 bytes, so that's what `strgen.c` generates.
703
704However, the existence of `strgen.c` does come with a cost: the build needs C99
705compiler that targets the host machine. For more information, see the ["Cross
706Compiling" section][13] of the [build manual][14].
707
708Read the comments in `strgen.c` for more detail about it, the arguments it
709takes, and how it works.
710
711#### `strgen.sh`
712
713An `sh` script that will generate C strings that uses only POSIX utilities. This
714exists for those situations where a host C99 compiler is not available, and the
715environment limits mentioned above in [`strgen.c`][15] don't matter.
716
717`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean
718the exact same things, so see the comments in [`strgen.c`][15] for more detail
719about that, and see the comments in `strgen.sh` for more details about it and
720how it works.
721
722For more information about shell scripts, see [POSIX Shell Scripts][76].
723
724### `include/`
725
726A folder containing the headers.
727
728The headers are not included among the source code because I like it better that
729way. Also there were folders within `src/` at one point, and I did not want to
730see `#include "../some_header.h"` or things like that.
731
732So all headers are here, even though only one ([`bcl.h`][30]) is meant for end
733users (to be installed in `INCLUDEDIR`).
734
735#### `args.h`
736
737This file is the API for processing command-line arguments.
738
739#### `bc.h`
740
741This header is the API for `bc`-only items. This includes the `bc_main()`
742function and the `bc`-specific lexing and parsing items.
743
744The `bc` parser is perhaps the most sensitive part of the entire codebase. See
745the documentation in `bc.h` for more information.
746
747The code associated with this header is in [`src/bc.c`][40],
748[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42].
749
750#### `bcl.h`
751
752This header is the API for the [`bcl`][156] library.
753
754This header is meant for distribution to end users and contains the API that end
755users of [`bcl`][156] can use in their own software.
756
757This header, because it's the public header, is also the root header. That means
758that it has platform-specific fixes for Windows. (If the fixes were not in this
759header, the build would fail on Windows.)
760
761The code associated with this header is in [`src/library.c`][43].
762
763#### `dc.h`
764
765This header is the API for `dc`-only items. This includes the `dc_main()`
766function and the `dc`-specific lexing and parsing items.
767
768The code associated with this header is in [`src/dc.c`][44],
769[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46].
770
771#### `file.h`
772
773This header is for `bc`'s internal buffered I/O API.
774
775For more information about `bc`'s error handling and custom buffered I/O, see
776[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the
777notes about version [`3.0.0`][32] in the [`NEWS`][32].
778
779The code associated with this header is in [`src/file.c`][47].
780
781#### `history.h`
782
783This header is for `bc`'s implementation of command-line editing/history, which
784is based on a [UTF-8-aware fork][28] of [`linenoise`][29].
785
786For more information, see the [Command-Line History][189] section.
787
788The code associated with this header is in [`src/history.c`][48].
789
790#### `lang.h`
791
792This header defines the data structures and bytecode used for actual execution
793of `bc` and `dc` code.
794
795Yes, it's misnamed; that's an accident of history where the first things I put
796into it all seemed related to the `bc` language.
797
798The code associated with this header is in [`src/lang.c`][49].
799
800#### `lex.h`
801
802This header defines the common items that both programs need for lexing.
803
804The code associated with this header is in [`src/lex.c`][50],
805[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45].
806
807#### `library.h`
808
809This header defines the things needed for [`bcl`][156] that users should *not*
810have access to. In other words, [`bcl.h`][30] is the *public* header for the
811library, and this header is the *private* header for the library.
812
813The code associated with this header is in [`src/library.c`][43].
814
815#### `num.h`
816
817This header is the API for numbers and math.
818
819The code associated with this header is in [`src/num.c`][39].
820
821#### `opt.h`
822
823This header is the API for parsing command-line arguments.
824
825It's different from [`args.h`][31] in that [`args.h`][31] is for the main code
826to process the command-line arguments into global data *after* they have already
827been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually
828parses the command-line arguments, and [`args.h`][31] turns that parsed data
829into flags (bits), strings, and expressions that will be used later.
830
831Why are they separate? Because originally, `bc` used `getopt_long()` for
832parsing, so [`args.h`][31] was the only one that existed. After it was
833discovered that `getopt_long()` has different behavior on different platforms, I
834adapted a [public-domain option parsing library][34] to do the job instead. And
835in doing so, I gave it its own header.
836
837They could probably be combined, but I don't really care enough at this point.
838
839The code associated with this header is in [`src/opt.c`][51].
840
841#### `parse.h`
842
843This header defines the common items that both programs need for parsing.
844
845Note that the parsers don't produce abstract syntax trees (AST's) or any
846intermediate representations. They produce bytecode directly. In other words,
847they don't have special data structures except what they need to do their job.
848
849The code associated with this header is in [`src/parse.c`][50],
850[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46].
851
852#### `program.h`
853
854This header defines the items needed to manage the data structures in
855[`lang.h`][38] as well as any helper functions needed to generate bytecode or
856execute it.
857
858The code associated with this header is in [`src/program.c`][53].
859
860#### `rand.h`
861
862This header defines the API for the [pseudo-random number generator
863(PRNG)][179].
864
865The PRNG only generates fixed-size integers. The magic of generating random
866numbers of arbitrary size is actually given to the code that does math
867([`src/num.c`][39]).
868
869The code associated with this header is in [`src/rand.c`][54].
870
871#### `read.h`
872
873This header defines the API for reading from files and `stdin`.
874
875Thus, [`file.h`][55] is really for buffered *output*, while this file is for
876*input*. There is no buffering needed for `bc`'s inputs.
877
878The code associated with this header is in [`src/read.c`][56].
879
880#### `status.h`
881
882This header has several things:
883
884* A list of possible errors that internal `bc` code can use.
885* Compiler-specific fixes.
886* Platform-specific fixes.
887* Macros for `bc`'s [error handling][97].
888
889There is no code associated with this header.
890
891#### `vector.h`
892
893This header defines the API for the vectors (resizable arrays) that are used for
894data structures.
895
896Vectors are what do the heavy lifting in almost all of `bc`'s data structures.
897Even the maps of identifiers and arrays use vectors.
898
899The code associated with this header is in [`src/vector.c`][228].
900
901#### `version.h`
902
903This header defines the version of `bc`.
904
905There is no code associated with this header.
906
907#### `vm.h`
908
909This header defines the API for setting up and running `bc` and `dc`.
910
911It is so named because I think of it as the "virtual machine" of `bc`, though
912that is probably not true as [`program.h`][57] is probably the "virtual machine"
913API. Thus, the name is more historical accident.
914
915The code associated with this header is in [`src/vm.c`][58].
916
917### `locales/`
918
919This folder contains a bunch of `.msg` files and soft links to the real `.msg`
920files. This is how locale support is implemented in `bc`.
921
922The files are in the format required by the [`gencat`][59] POSIX utility. They
923all have the same messages, in the same order, with the same numbering, under
924the same groups. This is because the locale system expects those messages in
925that order.
926
927The softlinks exist because for many locales, they would contain the exact same
928information. To prevent duplication, they are simply linked to a master copy.
929
930The naming format for all files is:
931
932```
933<language_code>_<country_code>.<encoding>.msg
934```
935
936This naming format must be followed for all locale files.
937
938### `manuals/`
939
940This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along
941with a few other manuals.
942
943#### `algorithms.md`
944
945This file explains the mathematical algorithms that are used.
946
947The hope is that this file will guide people in understanding how the math code
948works.
949
950#### `bc.1.md.in`
951
952This file is a template for the markdown version of the `bc` manual and
953manpages.
954
955For more information about how the manpages and markdown manuals are generated,
956and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
957
958#### `bcl.3`
959
960This is the manpage for the [`bcl`][156] library. It is generated from
961[`bcl.3.md`][61] using [`scripts/manpage.sh`][60].
962
963For the reason why I check generated data into the repo, see
964[`scripts/manpage.sh`][60] and [Manuals][86].
965
966#### `bcl.3.md`
967
968This is the markdown manual for the [`bcl`][156] library. It is the source for the
969generated [`bcl.3`][62] file.
970
971#### `benchmarks.md`
972
973This is a document that compares this `bc` to GNU `bc` in various benchmarks. It
974was last updated when version [`3.0.0`][32] was released.
975
976It has very little documentation value, other than showing what compiler options
977are useful for performance.
978
979#### `build.md`
980
981This is the [build manual][14].
982
983This `bc` has a custom build system. The reason for this is because of
984[*portability*][136].
985
986If `bc` used an outside build system, that build system would be an external
987dependency. Thus, I had to write a build system for `bc` that used nothing but
988C99 and POSIX utilities, including barebones [POSIX `make`][74].
989
990for more information about the build system, see the [build system][142]
991section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70].
992
993#### `dc.1.md.in`
994
995This file is a template for the markdown version of the `dc` manual and
996manpages.
997
998For more information about how the manpages and markdown manuals are generated,
999and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
1000
1001#### `development.md`
1002
1003The file you are reading right now.
1004
1005#### `header_bcl.txt`
1006
1007Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper
1008header.
1009
1010For more information about generating manuals, see [`scripts/manpage.sh`][60]
1011and [Manuals][86].
1012
1013#### `header_bc.txt`
1014
1015Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a
1016proper header.
1017
1018For more information about generating manuals, see [`scripts/manpage.sh`][60]
1019and [Manuals][86].
1020
1021#### `header_dc.txt`
1022
1023Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a
1024proper header.
1025
1026For more information about generating manuals, see [`scripts/manpage.sh`][60]
1027and [Manuals][86].
1028
1029#### `header.txt`
1030
1031Used by [`scripts/manpage.sh`][60] to give all generated manpages a license
1032header.
1033
1034For more information about generating manuals, see [`scripts/manpage.sh`][60]
1035and [Manuals][86].
1036
1037#### `release.md`
1038
1039A checklist that I try to somewhat follow when making a release.
1040
1041#### `bc/`
1042
1043A folder containing the `bc` manuals.
1044
1045Each `bc` manual corresponds to a [build type][81]. See that link for more
1046details.
1047
1048For each manual, there are two copies: the markdown version generated from the
1049template, and the manpage generated from the markdown version.
1050
1051#### `dc/`
1052
1053A folder containing the `dc` manuals.
1054
1055Each `dc` manual corresponds to a [build type][81]. See that link for more
1056details.
1057
1058For each manual, there are two copies: the markdown version generated from the
1059template, and the manpage generated from the markdown version.
1060
1061### `scripts/`
1062
1063This folder contains helper scripts. Most of them are written in pure [POSIX
1064`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and
1065[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is
1066written in C. [`ministat.c`][223] in particular is copied from elsewhere.
1067
1068For more information about the shell scripts, see [POSIX Shell Scripts][76].
1069
1070#### `afl.py`
1071
1072This script is meant to be used as part of the fuzzing workflow.
1073
1074It does one of two things: checks for valid crashes, or runs `bc` and or `dc`
1075under all of the paths found by [AFL++][125].
1076
1077See [Fuzzing][82] for more information about fuzzing, including this script.
1078
1079#### `alloc.sh`
1080
1081This script is a quick and dirty script to test whether or not the garbage
1082collection mechanism of the [`BcNum` caching][96] works. It has been little-used
1083because it tests something that is not important to correctness.
1084
1085#### `benchmark.sh`
1086
1087A script making it easy to run benchmarks and to run the executable produced by
1088[`ministat.c`][223] on them.
1089
1090For more information, see the [Benchmarks][144] section.
1091
1092#### `bitfuncgen.c`
1093
1094A source file for an executable to generate tests for `bc`'s bitwise functions
1095in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built
1096with `make bitfuncgen`. It produces the test on `stdout` and the expected
1097results on `stderr`. This means that to generat tests, use the following
1098invokation:
1099
1100```
1101scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt
1102```
1103
1104It calls `abort()` if it runs into an error.
1105
1106#### `exec-install.sh`
1107
1108This script is the magic behind making sure `dc` is installed properly if it's
1109a symlink to `bc`. It checks to see if it is a link, and if so, it just creates
1110a new symlink in the install directory. Of course, it also installs `bc` itself,
1111or `dc` when it's alone.
1112
1113#### `functions.sh`
1114
1115This file is a bunch of common functions for most of the POSIX shell scripts. It
1116is not supposed to be run; instead, it is *sourced* by other POSIX shell
1117scripts, like so:
1118
1119```
1120. "$scriptdir/functions.sh"
1121```
1122
1123or the equivalent, depending on where the sourcing script is.
1124
1125For more information about the shell scripts, see [POSIX Shell Scripts][76].
1126
1127#### `fuzz_prep.sh`
1128
1129Fuzzing is a regular activity when I am preparing for a release.
1130
1131This script handles all the options and such for building a fuzzable binary.
1132Instead of having to remember a bunch of options, I just put them in this script
1133and run the script when I want to fuzz.
1134
1135For more information about fuzzing, see [Fuzzing][82].
1136
1137#### `karatsuba.py`
1138
1139This script has at least one of two major differences from most of the other
1140scripts:
1141
1142* It's written in Python 3.
1143* It's meant for software packagers.
1144
1145For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in
1146Python 3, but they are not meant for the end user or software packagers and are
1147not included in source distributions. But this script is.
1148
1149This script breaks my rule of only POSIX utilities necessary for package
1150maintainers, but there's a very good reason for that: it's only meant to be run
1151*once* when the package is created for the first time, and maybe not even then.
1152
1153You see, this script does two things: it tests the Karatsuba implementation at
1154various settings for `KARATSUBA_LEN`, and it figures out what the optimal
1155`KARATSUBA_LEN` is for the machine that it is running on.
1156
1157Package maintainers can use this script, when creating a package for this `bc`,
1158to figure out what is optimal for their users. Then they don't have to run it
1159ever again. So this script only has to run on the packagers machine.
1160
1161I tried to write the script in `sh`, by the way, and I finally accepted the
1162tradeoff of using Python 3 when it became too hard.
1163
1164However, I also mentioned that it's for testing Karatsuba with various settings
1165of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124],
1166right?
1167
1168Yes, but this script is not part of the [test suite][124]; it's used for testing
1169in the [`scripts/release.sh`][83] script, which is maintainer use only.
1170
1171However, there is one snare with `karatsuba.py`: I didn't want the user to have
1172to install any Python libraries to run it. Keep that in mind if you change it.
1173
1174#### `link.sh`
1175
1176This script is the magic behind making `dc` a symlink of `bc` when both
1177calculators are built.
1178
1179#### `locale_install.sh`
1180
1181This script does what its name says: it installs locales.
1182
1183It turns out that this is complicated.
1184
1185There is a magic environment variable, `$NLSPATH`, that tells you how and where
1186you are supposed to install locales.
1187
1188Yes, *how*. And where.
1189
1190But now is not the place to rant about `$NLSPATH`. For more information on
1191locales and `$NLSPATH`, see [Locales][85].
1192
1193#### `locale_uninstall.sh`
1194
1195This script does what its name says: it uninstalls locales.
1196
1197This is far less complicated than installing locales. I basically generate a
1198wildcard path and then list all paths that fit that wildcard. Then I delete each
1199one of those paths. Easy.
1200
1201For more information on locales, see [Locales][85].
1202
1203#### `manpage.sh`
1204
1205This script is the one that generates markdown manuals from a template and a
1206manpage from a markdown manual.
1207
1208For more information about generating manuals, see [Manuals][86].
1209
1210#### `ministat.c`
1211
1212This is a file copied [from FreeBSD][221] that calculates the standard
1213statistical numbers, such as mean, average, and median, based on numbers
1214obtained from a file.
1215
1216For more information, see the [FreeBSD ministat(1) manpage][222].
1217
1218This file allows `bc` to build the `scripts/ministat` executable using the
1219command `make ministat`, and this executable helps programmers evaluate the
1220results of [benchmarks][144] more accurately.
1221
1222#### `package.sh`
1223
1224This script is what helps `bc` maintainers cut a release. It does the following:
1225
12261.	Creates the appropriate `git` tag.
12272.	Pushes the `git` tag.
12283.	Copies the repo to a temp directory.
12294.	Removes files that should not be included in source distributions.
12305.	Creates the tarballs.
12316.	Signs the tarballs.
12327.	Zips and signs the Windows executables if they exist.
12338.	Calculates and outputs SHA512 and SHA256 sums for all of the files,
1234	including the signatures.
1235
1236This script is for `bc` maintainers to use when cutting a release. It is not
1237meant for outside use. This means that some non-POSIX utilities can be used,
1238such as `git` and `gpg`.
1239
1240In addition, before using this script, it expects that the folders that Windows
1241generated when building `bc`, `dc`, and [`bcl`][156], are in the parent
1242directory of the repo, exactly as Windows generated them. If they are not there,
1243then it will not zip and sign, nor calculate sums of, the Windows executables.
1244
1245Because this script creates a tag and pushes it, it should *only* be run *ONCE*
1246per release.
1247
1248#### `radamsa.sh`
1249
1250A script to test `bc`'s command-line expression parsing code, which, while
1251simple, strives to handle as much as possible.
1252
1253What this script does is it uses the test cases in [`radamsa.txt`][98] an input
1254to the [Radamsa fuzzer][99].
1255
1256For more information, see the [Radamsa][128] section.
1257
1258#### `radamsa.txt`
1259
1260Initial test cases for the [`radamsa.sh`][100] script.
1261
1262#### `randmath.py`
1263
1264This script generates random math problems and checks that `bc`'s and `dc`'s
1265output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have
1266GNU `bc` and `dc` installed before using this script.)
1267
1268One snare: be sure that this script is using the GNU `bc` and `dc`, not a
1269previously-installed version of this `bc` and `dc`.
1270
1271If you want to check for memory issues or failing asserts, you can build the
1272`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any
1273errors or crashes should be caught by the script and given to the user as part
1274of the "checklist" (see below).
1275
1276The basic idea behind this script is that it generates as many math problems as
1277it can, biasing towards situations that may be likely to have bugs, and testing
1278each math problem against GNU `bc` or `dc`.
1279
1280If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it
1281stores that problem. If the output mismatches, it also stores the problem.
1282
1283Then, when the user sends a `SIGINT`, the script stops testing and goes into
1284report mode. One-by-one, it will go through the "checklist," the list of failed
1285problems, and present each problem to the user, as well as whether this `bc` or
1286`dc` crashed, and its output versus GNU. Then the user can decide to add them as
1287test cases, which it does automatically to the appropriate test file.
1288
1289#### `release_settings.txt`
1290
1291A text file of settings combinations that [`release.sh`][83] uses to ensure that
1292`bc` and `dc` build and work with various default settings. [`release.sh`][83]
1293simply reads it line by line and uses each line for one build.
1294
1295#### `release.sh`
1296
1297This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156]
1298through a gauntlet that is mostly meant to be used in preparation for a release.
1299
1300It does the following:
1301
13021.	Builds every [build type][81], with every setting combo in
1303	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1304	alone.
13052.	Builds every [build type][81], with every setting combo in
1306	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1307	alone for 32-bit.
13083.	Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size
1309	Release builds.
13104.	Runs the [test suite][124] on every build, if desired.
13115.	Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build
1312	type/setting combo.
13136.	Runs [`scripts/karatsuba.py`][78] in test mode.
13147.	Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone
1315	under [valgrind][20] and errors if there are any memory bugs or memory
1316	leaks.
1317
1318#### `safe-install.sh`
1319
1320A script copied from [musl][101] to atomically install files.
1321
1322#### `test_settings.sh`
1323
1324A quick and dirty script to help automate rebuilding while manually testing the
1325various default settings.
1326
1327This script uses [`test_settings.txt`][103] to generate the various settings
1328combos.
1329
1330For more information about settings, see [Settings][102] in the [build
1331manual][14].
1332
1333#### `test_settings.txt`
1334
1335A list of the various settings combos to be used by [`test_settings.sh`][104].
1336
1337### `src/`
1338
1339This folder is, obviously, where the actual heart and soul of `bc`, the source
1340code, is.
1341
1342All of the source files are in one folder; this simplifies the build system
1343immensely.
1344
1345There are separate files for `bc` and `dc` specific code ([`bc.c`][40],
1346[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and
1347[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire
1348source file from a build than to have `#if`/`#endif` preprocessor guards.
1349
1350That said, it was easier in many cases to use preprocessor macros where both
1351calculators used much of the same code and data structures, so there is a
1352liberal sprinkling of them through the code.
1353
1354#### `args.c`
1355
1356Code for processing command-line arguments.
1357
1358The header for this file is [`include/args.h`][31].
1359
1360#### `bc.c`
1361
1362The code for the `bc` main function `bc_main()`.
1363
1364The header for this file is [`include/bc.h`][106].
1365
1366#### `bc_lex.c`
1367
1368The code for lexing that only `bc` needs.
1369
1370The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106].
1371
1372#### `bc_parse.c`
1373
1374The code for parsing that only `bc` needs. This code is the most complex and
1375subtle in the entire codebase.
1376
1377The headers for this file are [`include/parse.h`][181] and
1378[`include/bc.h`][106].
1379
1380#### `data.c`
1381
1382Due to [historical accident][23] because of a desire to get my `bc` into
1383[toybox][16], all of the constant data that `bc` needs is all in one file. This
1384is that file.
1385
1386There is no code in this file, but a lot of the const data has a heavy influence
1387on code, including the order of data in arrays because that order has to
1388correspond to the order of other things elsewhere in the codebase. If you change
1389the order of something in this file, run `make test`, and get errors, you
1390changed something that depends on the order that you messed up.
1391
1392Almost all headers have `extern` references to items in this file.
1393
1394#### `dc.c`
1395
1396The code for the `dc` main function `dc_main()`.
1397
1398The header for this file is [`include/dc.h`][182].
1399
1400#### `dc_lex.c`
1401
1402The code for lexing that only `dc` needs.
1403
1404The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182].
1405
1406#### `dc_parse.c`
1407
1408The code for parsing that only `dc` needs.
1409
1410The headers for this file are [`include/parse.h`][181] and
1411[`include/bc.h`][182].
1412
1413#### `file.c`
1414
1415The code for `bc`'s implementation of buffered I/O. For more information about
1416why I implemented my own buffered I/O, see [`include/file.h`][55], [Error
1417Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes
1418about version [`3.0.0`][32] in the [`NEWS`][32].
1419
1420The header for this file is [`include/file.h`][55].
1421
1422#### `history.c`
1423
1424The code for `bc`'s implementation of command-line editing/history, which is
1425based on a [UTF-8-aware fork][28] of [`linenoise`][29].
1426
1427For more information, see the [Command-Line History][189] section.
1428
1429The header for this file is [`include/history.h`][36].
1430
1431#### `lang.c`
1432
1433The data structures used for actual execution of `bc` and `dc` code.
1434
1435While execution is done in [`src/program.c`][53], this file defines functions
1436for initializing, copying, and freeing the data structures, which is somewhat
1437orthogonal to actual execution.
1438
1439Yes, it's misnamed; that's an accident of history where the first things I put
1440into it all seemed related to the `bc` language.
1441
1442The header for this file is [`include/lang.h`][38].
1443
1444#### `lex.c`
1445
1446The code for the common things that both programs need for lexing.
1447
1448The header for this file is [`include/lex.h`][180].
1449
1450#### `library.c`
1451
1452The code to implement the public API of the `bcl` library.
1453
1454The code in this file does a lot to ensure that clients do not have to worry
1455about internal `bc` details, especially error handling with `setjmp()` and
1456`longjmp()`. That and encapsulating the handling of numbers are the bulk of what
1457the code in this file actually does because most of the library is still
1458implemented in [`src/num.c`][39].
1459
1460The headers for this file are [`include/bcl.h`][30] and
1461[`include/library.h`][183].
1462
1463#### `main.c`
1464
1465The entry point for both programs; this is the `main()` function.
1466
1467This file has no headers associated with it.
1468
1469#### `num.c`
1470
1471The code for all of the arbitrary-precision [numbers][177] and [math][178] in
1472`bc`.
1473
1474The header for this file is [`include/num.h`][184].
1475
1476#### `opt.c`
1477
1478The code for parsing command-line options.
1479
1480The header for this file is [`include/opt.h`][35].
1481
1482#### `parse.c`
1483
1484The code for the common items that both programs need for parsing.
1485
1486The header for this file is [`include/parse.h`][181].
1487
1488#### `program.c`
1489
1490The code for the actual execution engine for `bc` and `dc` code.
1491
1492The header for this file is [`include/program.h`][57].
1493
1494#### `rand.c`
1495
1496The code for the [pseudo-random number generator (PRNG)][179] and the special
1497stack handling it needs.
1498
1499The PRNG only generates fixed-size integers. The magic of generating random
1500numbers of arbitrary size is actually given to the code that does math
1501([`src/num.c`][39]).
1502
1503The header for this file is [`include/rand.h`][37].
1504
1505#### `read.c`
1506
1507The code for reading from files and `stdin`.
1508
1509The header for this file is [`include/read.h`][185].
1510
1511#### `vector.c`
1512
1513The code for [vectors][111], [maps][186], and [slab vectors][187], along with
1514slabs.
1515
1516The header for this file is [`include/vector.h`][174].
1517
1518#### `vm.c`
1519
1520The code for setting up and running `bc` and `dc`.
1521
1522It is so named because I think of it as the "virtual machine" of `bc`, though
1523that is probably not true as [`program.h`][57] is probably the "virtual machine"
1524code. Thus, the name is more historical accident.
1525
1526The header for this file is [`include/vm.h`][27].
1527
1528### `tests/`
1529
1530This directory contains the entire [test suite][124] and its infrastructure.
1531
1532#### `all.sh`
1533
1534A convenience script for the `make run_all_tests` target (see the [Group
1535Tests][141] section for more information).
1536
1537#### `all.txt`
1538
1539The file with the names of the calculators. This is to make it easier for the
1540test scripts to know where the standard and other test directories are.
1541
1542#### `bcl.c`
1543
1544The test for the [`bcl`][156] API. For more information, see the [`bcl`
1545Test][157] section.
1546
1547#### `error.sh`
1548
1549The script to run the file-based error tests in `tests/<calculator>/errors/` for
1550each calculator. For more information, see the [Error Tests][151] section.
1551
1552This is a separate script so that each error file can be run separately and in
1553parallel.
1554
1555#### `errors.sh`
1556
1557The script to run the line-based error tests in `tests/<calculator>/errors.txt`
1558for each calculator. For more information, see the [Error Tests][151] section.
1559
1560#### `extra_required.txt`
1561
1562The file with the list of tests which both calculators have that need the [Extra
1563Math build option][188]. This exists to make it easy for test scripts to skip
1564those tests when the [Extra Math build option][188] is disabled.
1565
1566#### `history.py`
1567
1568The file with all of the history tests. For more information, see the [History
1569Tests][155] section.
1570
1571#### `history.sh`
1572
1573The script to integrate [`history.py`][139] into the build system in a portable
1574way, and to skip it if necessary.
1575
1576This script also re-runs the test three times if it fails. This is because
1577`pexpect` can be flaky at times.
1578
1579#### `other.sh`
1580
1581The script to run the "other" (miscellaneous) tests for each calculator. For
1582more information, see the [Other Tests][154] section.
1583
1584#### `read.sh`
1585
1586The script to run the read tests for each calculator. For more information, see
1587the [`read()` Tests][153] section.
1588
1589#### `script.sed`
1590
1591The `sed` script to edit the output of GNU `bc` when generating script tests.
1592For more information, see the [Script Tests][150] section.
1593
1594#### `script.sh`
1595
1596The script for running one script test. For more information, see the [Script
1597Tests][150] section.
1598
1599#### `scripts.sh`
1600
1601The script to help the `make run_all_tests` (see the [Group Tests][141] section)
1602run all of the script tests.
1603
1604#### `stdin.sh`
1605
1606The script to run the `stdin` tests for each calculator. For more information,
1607see the [`stdin` Tests][152] section.
1608
1609#### `test.sh`
1610
1611The script to run one standard test. For more information, see the [Standard
1612Tests][149] section.
1613
1614#### `bc/`
1615
1616The standard tests directory for `bc`. For more information, see the [Standard
1617Tests][149] section.
1618
1619##### `all.txt`
1620
1621The file to tell the build system and `make run_all_tests` (see the [Group
1622Tests][141] section) what standard tests to run for `bc`, as well as in what
1623order.
1624
1625This file just lists the test names, one per line.
1626
1627##### `errors.txt`
1628
1629The initial error test file for `bc`. This file has one test per line. See the
1630[Error Tests][151] section for more information.
1631
1632##### `posix_errors.txt`
1633
1634The file of tests for POSIX compatibility for `bc`. This file has one test per
1635line. For more information, see the [Error Tests][151] section.
1636
1637##### `timeconst.sh`
1638
1639The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6].
1640For more information, see the [Linux `timeconst.bc` Script][191]section.
1641
1642##### `errors/`
1643
1644The directory with error tests for `bc`, most discovered by AFL++ (see the
1645[Fuzzing][82] section). There is one test per file. For more information, see
1646the [Error Tests][151] section.
1647
1648##### `scripts/`
1649
1650The script tests directory for `bc`. For more information, see the [Script
1651Tests][150] section.
1652
1653###### `all.txt`
1654
1655A file to tell the build system and `make run_all_tests` (see the [Group
1656Tests][141] section) what script tests to run for `bc`, as well as in what
1657order.
1658
1659This file just lists the test names, one per line.
1660
1661#### `dc/`
1662
1663The standard tests directory for `dc`. For more information, see the [Standard
1664Tests][149] section.
1665
1666##### `all.txt`
1667
1668The file to tell the build system and `make run_all_tests` (see the [Group
1669Tests][141] section) what standard tests to run for `dc`, as well as in what
1670order.
1671
1672This file just lists the test names, one per line.
1673
1674##### `errors.txt`
1675
1676The initial error test file for `dc`. This file has one test per line. See the
1677[Error Tests][151] section for more information.
1678
1679##### `read_errors.txt`
1680
1681The file of tests errors with the `?` command (`read()` in `bc`). This file has
1682one test per line. See the [Error Tests][151] section for more information.
1683
1684##### `errors/`
1685
1686The directory with error tests for `dc`, most discovered by AFL++ (see the
1687[Fuzzing][82] section). There is one test per file. For more information, see
1688the [Error Tests][151] section.
1689
1690##### `scripts/`
1691
1692The script tests directory for `dc`. For more information, see the [Script
1693Tests][150] section.
1694
1695###### `all.txt`
1696
1697The file to tell the build system and `make run_all_tests` (see the [Group
1698Tests][141] section) what script tests to run for `dc`, as well as in what
1699order.
1700
1701This file just lists the test names, one per line.
1702
1703#### `fuzzing/`
1704
1705The directory containing the fuzzing infrastructure. For more information, see
1706the [Fuzzing][82] section.
1707
1708##### `bc_afl_continue.yaml`
1709
1710The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a
1711fuzz run. For more information, see the [Convenience][130] subsection of the
1712[Fuzzing][82] section.
1713
1714##### `bc_afl.yaml`
1715
1716The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a
1717fuzz run. For more information, see the [Convenience][130] subsection of the
1718[Fuzzing][82] section.
1719
1720Be aware that this will delete all previous unsaved fuzzing tests in the output
1721directories.
1722
1723##### `bc_inputs1/`
1724
1725The fuzzing input directory for the first third of inputs for `bc`. For more
1726information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1727
1728##### `bc_inputs2/`
1729
1730The fuzzing input directory for the second third of inputs for `bc`. For more
1731information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1732
1733##### `bc_inputs3/`
1734
1735The fuzzing input directory for the third third of inputs for `bc`. For more
1736information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1737
1738##### `dc_inputs/`
1739
1740The fuzzing input directory for the inputs for `dc`. For more information, see
1741the [Corpuses][192] subsection of the [Fuzzing][82] section.
1742
1743### `vs/`
1744
1745The directory containing all of the materials needed to build `bc`, `dc`, and
1746`bcl` on Windows.
1747
1748#### `bcl.sln`
1749
1750A Visual Studio solution file for [`bcl`][156]. This, along with
1751[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to
1752build [`bcl`][156] on Windows.
1753
1754#### `bcl.vcxproj`
1755
1756A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1757and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156]
1758on Windows.
1759
1760#### `bcl.vcxproj.filters`
1761
1762A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1763and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on
1764Windows.
1765
1766#### `bc.sln`
1767
1768A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66]
1769and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on
1770Windows.
1771
1772#### `bc.vcxproj`
1773
1774A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and
1775[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows.
1776
1777#### `bc.vcxproj.filters`
1778
1779A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and
1780[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows.
1781
1782#### `tests/`
1783
1784A directory of files to run tests on Windows.
1785
1786##### `tests_bc.bat`
1787
1788A file to run basic `bc` tests on Windows. It expects that it will be run from
1789the directory containing it, and it also expects a `bc.exe` in the same
1790directory.
1791
1792##### `tests_dc.bat`
1793
1794A file to run basic `dc` tests on Windows. It expects that it will be run from
1795the directory containing it, and it also expects a `bc.exe` in the same
1796directory.
1797
1798## Build System
1799
1800The build system is described in detail in the [build manual][14], so
1801maintainers should start there. This section, however, describes some parts of
1802the build system that only maintainers will care about.
1803
1804### Clean Targets
1805
1806`bc` has a default `make clean` target that cleans up the build files. However,
1807because `bc`'s build system can generate many different types of files, there
1808are other clean targets that may be useful:
1809
1810* `make clean_gen` cleans the `gen/strgen` executable generated from
1811  [`gen/strgen.c`][15]. It has no prerequisites.
1812* `make clean` cleans object files, `*.cat` files (see the [Locales][85]
1813  section), executables, and files generated from text files in [`gen/`][145],
1814  including `gen/strgen` if it was built. So this has a prerequisite on
1815  `make clean_gen` in normal use.
1816* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat`
1817  executable. It has no prerequisites.
1818* `make clean_config` cleans the generated `Makefile` and the manuals that
1819  [`configure.sh`][69] copied in preparation for install. It also depends on
1820  `make clean` and `make clean_benchmarks`, so it cleans those items too. This
1821  is the target that [`configure.sh`][69] uses before it does its work.
1822* `make clean_coverage` cleans the generated coverage files for the [test
1823  suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This
1824  is useful if the code coverage tools are giving errors.
1825* `make clean_tests` cleans *everything*. It has prerequisites on all previous
1826  clean targets, but it also cleans all of the [generated tests][143].
1827
1828When adding more generated files, you may need to add them to one of these
1829targets and/or add a target for them especially.
1830
1831### Preprocessor Macros
1832
1833`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type:
1834
1835* builds,
1836* works under the [test suite][124], and
1837* excludes as much code as possible from all builds.
1838
1839This section will explain the preprocessor style of `bc` and `dc`, as well as
1840provide an explanation of the macros used.
1841
1842#### Style
1843
1844The style of macro use in `bc` is pretty straightforward: I avoid depending on
1845macro definitions and instead, I set defaults if the macro is not defined and
1846then test the value if the macro with a plain `#if`.
1847
1848(Some examples of setting defaults are in [`include/status.h`][176], just above
1849the definition of the `BcStatus` enum.)
1850
1851In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible.
1852
1853There are a couple of cases where I went with standard stuff instead.
1854
1855#### Standard Macros
1856
1857`BC_ENABLED`
1858
1859:   This macro expands to `1` if `bc` is enabled, `0` if disabled.
1860
1861`DC_ENABLED`
1862
1863:   This macro expands to `1` if `dc` is enabled, `0` if disabled.
1864
1865`BUILD_TYPE`
1866
1867:   The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`,
1868    `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct
1869    the user to the correct markdown manual in the `git.gavinhoward.com`
1870    website.
1871
1872`EXECPREFIX`
1873
1874:   This macro expands to the prefix on the executable name. This is used to
1875    allow `bc` and `dc` to skip the prefix when finding out which calculator is
1876    executing.
1877
1878`BC_NUM_KARATSUBA_LEN`
1879
1880:   This macro expands to an integer, which is the length of numbers below which
1881    the Karatsuba multiplication algorithm switches to brute-force
1882    multiplication.
1883
1884`BC_ENABLE_EXTRA_MATH`
1885
1886:   This macro expands to `1` if the [Extra Math build option][188] is enabled,
1887    `0` if disabled.
1888
1889`BC_ENABLE_HISTORY`
1890
1891:   This macro expands to `1` if the [History build option][193] is enabled, `0`
1892    if disabled.
1893
1894`BC_ENABLE_NLS`
1895
1896:   This macro expands to `1` if the [NLS build option][193] (for locales) is
1897    enabled, `0` if disabled.
1898
1899`BC_ENABLE_LIBRARY`
1900
1901:   This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if
1902    disabled. If this is enabled, building the calculators themselves is
1903    disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero.
1904
1905`BC_ENABLE_MEMCHECK`
1906
1907:   This macro expands to `1` if `bc` has been built for use with Valgrind's
1908    [Memcheck][194], `0` otherwise. This ensures that fatal errors still free
1909    all of their memory when exiting. `bc` does not do that normally because
1910    what's the point?
1911
1912`BC_ENABLE_AFL`
1913
1914:   This macro expands to `1` if `bc` has been built for fuzzing with
1915    [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more
1916    information.
1917
1918`BC_DEFAULT_BANNER`
1919
1920:   This macro expands to the default value for displaying the `bc` banner.
1921
1922`BC_DEFAULT_SIGINT_RESET`
1923
1924:   The macro expands to the default value for whether or not `bc` should reset
1925    on `SIGINT` or quit.
1926
1927`BC_DEFAULT_TTY_MODE`
1928
1929:   The macro expands to the default value for whether or not `bc` should use
1930    TTY mode when it available.
1931
1932`BC_DEFAULT_PROMPT`
1933
1934:   This macro expands to the default value for whether or not `bc` should use a
1935    prompt when TTY mode is available.
1936
1937`DC_DEFAULT_SIGINT_RESET`
1938
1939:   The macro expands to the default value for whether or not `dc` should reset
1940    on `SIGINT` or quit.
1941
1942`DC_DEFAULT_TTY_MODE`
1943
1944:   The macro expands to the default value for whether or not `dc` should use
1945    TTY mode when it available.
1946
1947`DC_DEFAULT_PROMPT`
1948
1949:   This macro expands to the default value for whether or not `dc` should use a
1950    prompt when TTY mode is available.
1951
1952`BC_DEBUG_CODE`
1953
1954:   If this macro expands to a non-zero integer, then `bc` is built with *a lot*
1955    of extra debugging code. This is never set by the build system and must be
1956    set by the programmer manually. This should never be set in builds given to
1957    end users. For more information, see the [Debugging][134] section.
1958
1959## Test Suite
1960
1961While the source code may be the heart and soul of `bc`, the test suite is the
1962arms and legs: it gives `bc` the power to do anything it needs to do.
1963
1964The test suite is what allowed `bc` to climb to such high heights of quality.
1965This even goes for fuzzing because fuzzing depends on the test suite for its
1966input corpuses. (See the [Fuzzing][82] section.)
1967
1968Understanding how the test suite works should be, I think, the first thing that
1969maintainers learn after learning what `bc` and `dc` should do. This is because
1970the test suite, properly used, gives confidence that changes have not caused
1971bugs or regressions.
1972
1973That is why I spent the time to make the test suite as easy to use and as fast
1974as possible.
1975
1976To use the test suite (assuming `bc` and/or `dc` are already built), run the
1977following command:
1978
1979```
1980make test
1981```
1982
1983That's it. That's all.
1984
1985It will return an error code if the test suite failed. It will also print out
1986information about the failure.
1987
1988If you want the test suite to go fast, then run the following command:
1989
1990```
1991make -j<cores> test
1992```
1993
1994Where `<cores>` is the number of cores that your computer has. Of course, this
1995requires a `make` implementation that supports that option, but most do. (And I
1996will use this convention throughout the rest of this section.)
1997
1998I have even tried as much as possible, to put longer-running tests near the
1999beginning of the run so that the entire suite runs as fast as possible.
2000
2001However, if you want to be sure which test is failing, then running a bare
2002`make test` is a great way to do that.
2003
2004But enough about how you have no excuses to use the test suite as much as
2005possible; let's talk about how it works and what you *can* do with it.
2006
2007### Standard Tests
2008
2009The heavy lifting of testing the math in `bc`, as well as basic scripting, is
2010done by the "standard tests" for each calculator.
2011
2012These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162]
2013directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164],
2014[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166],
2015[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and
2016[`tests/dc/read_errors.txt`][175]), which are called the "standard test
2017directories."
2018
2019For every test, there is the test file and the results file. The test files have
2020names of the form `<test>.txt`, where `<test>` is the name of the test, and the
2021results files have names of the form `<test>_results.txt`.
2022
2023If the test file exists but the results file does not, the results for that test
2024are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143]
2025section.
2026
2027The `all.txt` file in each standard tests directory is what tells the test suite
2028and [build system][142] what tests there are, and the tests are either run in
2029that order, or in the case of parallel `make`, that is the order that the
2030targets are listed as prerequisites of `make test`.
2031
2032If the test exists in the `all.txt` file but does not *actually* exist, the test
2033and its results are generated by a GNU-compatible `bc` or `dc`. See the
2034[Generated Tests][143] section.
2035
2036To add a non-generated standard test, do the following:
2037
2038* Add the test file (`<test>.txt` in the standard tests directory).
2039* Add the results file (`<test>_results.txt` in the standard tests directory).
2040  You can skip this step if just the results file needs to be generated. See the
2041  [Generated Tests][147] section for more information.
2042* Add the name of the test to the `all.txt` file in the standard tests
2043  directory, putting it in the order it should be in. If possible, I would put
2044  longer tests near the beginning because they will start running earlier with
2045  parallel `make`. I always keep `decimal` first, though, as a smoke test.
2046
2047If you need to add a generated standard test, see the [Generated Tests][147]
2048section for how to do that.
2049
2050Some standard tests need to be skipped in certain cases. That is handled by the
2051[build system][142]. See the [Integration with the Build System][147] section
2052for more details.
2053
2054In addition to all of the above, the standard test directory is not only the
2055directory for the standard tests of each calculator, it is also the parent
2056directory of all other test directories for each calculator.
2057
2058#### `bc` Standard Tests
2059
2060The list of current (27 February 2023) standard tests for `bc` is below:
2061
2062decimal
2063
2064:   Tests decimal parsing and printing.
2065
2066print
2067
2068:   Tests printing in every base from decimal. This is near the top for
2069    performance of parallel testing.
2070
2071parse
2072
2073:   Tests parsing in any base and outputting in decimal. This is near the top
2074    for performance of parallel testing.
2075
2076lib2
2077
2078:   Tests the extended math library. This is near the top for performance of
2079    parallel testing.
2080
2081print2
2082
2083:   Tests printing at the extreme values of `obase`.
2084
2085length
2086
2087:   Tests the `length()` builtin function.
2088
2089scale
2090
2091:   Tests the `scale()` builtin function.
2092
2093shift
2094
2095:   Tests the left (`<<`) and right (`>>`) shift operators.
2096
2097add
2098
2099:   Tests addition.
2100
2101subtract
2102
2103:   Tests subtraction.
2104
2105multiply
2106
2107:   Tests multiplication.
2108
2109divide
2110
2111:   Tests division.
2112
2113modulus
2114
2115:   Tests modulus.
2116
2117power
2118
2119:   Tests power (exponentiation).
2120
2121sqrt
2122
2123:   Tests the `sqrt()` (square root) builtin function.
2124
2125trunc
2126
2127:   Tests the truncation (`$`) operator.
2128
2129places
2130
2131:   Tests the places (`@`) operator.
2132
2133vars
2134
2135:   Tests some usage of variables. This one came from [AFL++][125] I think.
2136
2137boolean
2138
2139:   Tests boolean operators.
2140
2141comp
2142
2143:   Tests comparison operators.
2144
2145abs
2146
2147:   Tests the `abs()` builtin function.
2148
2149assignments
2150
2151:   Tests assignment operators, including increment/decrement operators.
2152
2153functions
2154
2155:   Tests functions, specifically function parameters being replaced before they
2156    themselves are used. See the comment in `bc_program_call()` about the last
2157    condition.
2158
2159scientific
2160
2161:   Tests scientific notation.
2162
2163engineering
2164
2165:   Tests engineering notation.
2166
2167globals
2168
2169:   Tests that assigning to globals affects callers.
2170
2171strings
2172
2173:   Tests strings.
2174
2175strings2
2176
2177:   Tests string allocation in slabs, to ensure slabs work.
2178
2179letters
2180
2181:   Tests single and double letter numbers to ensure they behave differently.
2182    Single-letter numbers always be set to the same value, regardless of
2183    `ibase`.
2184
2185exponent
2186
2187:   Tests the `e()` function in the math library.
2188
2189log
2190
2191:   Tests the `l()` function in the math library.
2192
2193pi
2194
2195:   Tests that `bc` produces the right value of pi for numbers with varying
2196    `scale` values.
2197
2198arctangent
2199
2200:   Tests the `a()` function in the math library.
2201
2202sine
2203
2204:   Tests the `s()` function in the math library.
2205
2206cosine
2207
2208:   Tests the `c()` function in the math library.
2209
2210bessel
2211
2212:   Tests the `j()` function in the math library.
2213
2214fib
2215
2216:   Tests the `fib()` Fibonacci function in the extended math library.
2217
2218arrays
2219
2220:   Test arrays.
2221
2222misc
2223
2224:   Miscellaneous tests. I named it this because at the time, I struggled to
2225    classify them, but it's really testing multi-line numbers.
2226
2227misc1
2228
2229:   A miscellaneous test found by [AFL++][125].
2230
2231misc2
2232
2233:   A miscellaneous test found by [AFL++][125].
2234
2235misc3
2236
2237:   A miscellaneous test found by [AFL++][125].
2238
2239misc4
2240
2241:   A miscellaneous test found by [AFL++][125].
2242
2243misc5
2244
2245:   A miscellaneous test found by [AFL++][125].
2246
2247misc6
2248
2249:   A miscellaneous test found by [AFL++][125].
2250
2251misc7
2252
2253:   A miscellaneous test found by [AFL++][125].
2254
2255void
2256
2257:   Tests void functions.
2258
2259rand
2260
2261:   Tests the pseudo-random number generator and its special stack handling.
2262
2263rand_limits
2264
2265:   Tests the limits of the pseudo-random number generator `irand()`.
2266
2267recursive_arrays
2268
2269:   Tested the slab vector undo ability in used in `bc_parse_name()` when it
2270    existed. Now used as a stress test.
2271
2272divmod
2273
2274:   Tests divmod.
2275
2276modexp
2277
2278:   Tests modular exponentiation.
2279
2280bitfuncs
2281
2282:   Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and
2283    `brshift()` in [`gen/lib2.bc`][26].
2284
2285leadingzero
2286
2287:   Tests the leading zero functionality and the `plz*()` and `pnlz*()`
2288    functions in [`gen/lib2.bc`][26].
2289
2290is_number
2291
2292:   Tests the `is_number()` built-in function.
2293
2294is_string
2295
2296:   Tests the `is_number()` built-in function.
2297
2298asciify_array
2299
2300:   Tests the ability of `asciify()` to convert an array into a full string.
2301
2302line_by_line1
2303
2304:   Tests the line-by-line behavior of `bc` with regards to `quit` in a function
2305    definition.
2306
2307line_by_line2
2308
2309:   Tests the line-by-line behavior of `bc` with regards to `quit`.
2310
2311line_loop_quit1
2312
2313:   Tests the behavior of `bc` with a `quit` after a single-line loop.
2314
2315line_loop_quit2
2316
2317:   Tests the behavior of `bc` with a `quit` after a single-line loop and a
2318    newline escape.
2319
2320#### `dc` Standard Tests
2321
2322The list of current (27 February 2023) standard tests for `dc` is below:
2323
2324decimal
2325
2326:   Tests decimal parsing and printing.
2327
2328length
2329
2330:   Tests the `length()` builtin function, including for strings and arrays.
2331
2332stack_len
2333
2334:   Tests taking the length of the results stack.
2335
2336exec_stack_len
2337
2338:   Tests taking the length of the execution stack.
2339
2340add
2341
2342:   Tests addition.
2343
2344subtract
2345
2346:   Tests subtraction.
2347
2348multiply
2349
2350:   Tests multiplication.
2351
2352divide
2353
2354:   Tests division.
2355
2356modulus
2357
2358:   Tests modulus.
2359
2360divmod
2361
2362:   Tests divmod.
2363
2364power
2365
2366:   Tests power (exponentiation).
2367
2368sqrt
2369
2370:   Tests the `sqrt()` (square root) builtin function.
2371
2372modexp
2373
2374:   Tests modular exponentiation.
2375
2376boolean
2377
2378:   Tests boolean operators.
2379
2380negate
2381
2382:   Tests negation as a command and as part of numbers.
2383
2384trunc
2385
2386:   Tests the truncation (`$`) operator.
2387
2388places
2389
2390:   Tests the places (`@`) operator.
2391
2392shift
2393
2394:   Tests the left (`<<`) and right (`>>`) shift operators.
2395
2396abs
2397
2398:   Tests the `abs()` builtin function (the `b` command).
2399
2400scientific
2401
2402:   Tests scientific notation.
2403
2404engineering
2405
2406:   Tests engineering notation.
2407
2408vars
2409
2410:   Tests some usage of variables. This one came from [AFL++][125] I think.
2411
2412misc
2413
2414:   Miscellaneous tests. I named it this because at the time, I struggled to
2415    classify them.
2416
2417misc1
2418
2419:   More miscellaneous tests. This used to be an error file
2420    (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character.
2421    However, starting with version `6.1.0`, `u` became a valid command
2422    (`is_number()`), so the error file became valid. It was checked manually and
2423    moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`.
2424
2425strings
2426
2427:   Tests strings.
2428
2429rand
2430
2431:   Tests the pseudo-random number generator and its special stack handling.
2432
2433is_number
2434
2435:   Tests the `is_number()` built-in function (the `u` command).
2436
2437is_string
2438
2439:   Tests the `is_number()` built-in function (the `t` command).
2440
2441### Script Tests
2442
2443The heavy lifting of testing the scripting of `bc` is done by the "script tests"
2444for each calculator.
2445
2446These tests use the files in the [`tests/bc/scripts/`][169] and
2447[`tests/dc/scripts/`][170] directories (except for
2448[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which
2449are called the "script test directories."
2450
2451To add a script test, do the following:
2452
2453* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory).
2454* Add the results file (`<test>.txt` in the script tests directory). You can
2455  skip this step if just the results file needs to be generated. See the
2456  [Generated Tests][147] section for more information.
2457* Add the name of the test to the `all.txt` file in the script tests directory,
2458  putting it in the order it should be in. If possible, I would put longer tests
2459  near the beginning because they will start running earlier with parallel
2460  `make`.
2461
2462Some script tests need to be skipped in certain cases. That is handled by the
2463[build system][142]. See the [Integration with the Build System][147] section
2464for more details.
2465
2466Another unique thing about the script tests, at least for `bc`: they test the
2467`-g` and `--global-stacks` flags. This means that all of the script tests for
2468`bc` are written assuming the `-g` flag was given on the command-line
2469
2470There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed`
2471script is used to remove an incompatibility with GNU `bc`.
2472
2473If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU
2474`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is
2475correct according to my reading of the `bc` spec, so my `bc` doesn't as well.
2476
2477The `sed` script edits numbers that end with just one digit on a line by itself
2478to put it on the same line as others.
2479
2480#### `bc` Script Tests
2481
2482The list of current (27 February 2023) script tests for `bc` is below:
2483
2484print.bc
2485
2486:   Tests printing even harder than the print standard test.
2487
2488multiply.bc
2489
2490:   Tests multiplication even harder than the multiply standard test.
2491
2492divide.bc
2493
2494:   Tests division even harder than the divide standard test.
2495
2496subtract.bc
2497
2498:   Tests subtraction even harder than the subtract standard test.
2499
2500add.bc
2501
2502:   Tests addition even harder than the add standard test.
2503
2504parse.bc
2505
2506:   Tests parsing even harder than the parse standard test.
2507
2508root.bc
2509
2510:   Tests that `root()` and `cbrt()` do not go into an infinite loop on a
2511    pathological case found by a user.
2512
2513array.bc
2514
2515:   Tests arrays even harder than the arrays standard test.
2516
2517array2.bc
2518
2519:   Implements a test where an array element is passed as a parameter to a
2520    function, and then another array is passed to a later parameter that is
2521    named the same as the first array. This was added because of a bug found
2522    while writing a script in bc.
2523
2524atan.bc
2525
2526:   Tests arctangent even harder than the arctangent standard test.
2527
2528bessel.bc
2529
2530:   Tests bessel even harder than the bessel standard test.
2531
2532functions.bc
2533
2534:   Tests functions even harder than the functions standard test.
2535
2536globals.bc
2537
2538:   Tests global stacks directly.
2539
2540len.bc
2541
2542:   Tests the `length()` builtin on arrays.
2543
2544rand.bc
2545
2546:   Tests the random number generator in the presence of global stacks.
2547
2548references.bc
2549
2550:   Tests functions with array reference parameters.
2551
2552screen.bc
2553
2554:   A random script provided by an early user that he used to calculate the size
2555    of computer screens
2556
2557strings2.bc
2558
2559:   Tests escaping in strings.
2560
2561ifs.bc
2562
2563:   Tests proper ending of `if` statements without `else` statements.
2564
2565ifs2.bc
2566
2567:   More tests proper ending of `if` statements without `else` statements.
2568
2569#### `dc` Script Tests
2570
2571The list of current (27 February 2023) script tests for `dc` is below:
2572
2573prime.dc
2574
2575:   Tests scripting by generating the first 100,000 primes.
2576
2577asciify.dc
2578
2579:   Tests the asciify command.
2580
2581stream.dc
2582
2583:   Tests the stream command.
2584
2585array.dc
2586
2587:   Tests arrays.
2588
2589else.dc
2590
2591:   Tests else clauses on conditional execution commands.
2592
2593factorial.dc
2594
2595:   Tests scripting with factorial.
2596
2597loop.dc
2598
2599:   Tests scripting by implementing loops.
2600
2601quit.dc
2602
2603:   Tests the quit command in the presence of tail calls.
2604
2605weird.dc
2606
2607:   A miscellaneous test.
2608
2609no_clamp.dc
2610
2611:   A test to ensure `dc` has the same behavior as the BSD `dc` with digi
2612    clamping off when parsing numbers.
2613
2614### Error Tests
2615
2616One of the most useful parts of the `bc` test suite, in my opinion, is the heavy
2617testing of error conditions.
2618
2619Just about every error condition I can think of is tested, along with many
2620machine-generated (by [AFL++][125]) ones.
2621
2622However, because the error tests will often return error codes, they require
2623different infrastructure from the rest of the test suite, which assumes that
2624the calculator under test will return successfully. A lot of that infrastructure
2625is in the [`scripts/functions.sh`][105] script, but it basically allows the
2626calculator to exit with an error code and then tests that there *was* an error
2627code.
2628
2629Besides returning error codes, error tests also ensure that there is output from
2630`stderr`. This is to make sure that an error message is always printed.
2631
2632The error tests for each calculator are spread through two directories, due to
2633historical accident. These two directories are the standard test directory (see
2634the [Standard Tests][149] section) and the `errors/` directory directly
2635underneath the standard tests directory.
2636
2637This split is convenient, however, because the tests in each directory are
2638treated differently.
2639
2640The error tests in the standard test directory, which include `errors.txt` for
2641both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for `dc`,
2642are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the
2643data through `stdin`. Each line is considered a separate test. For this reason,
2644there can't be any blank lines in the error files in the standard tests
2645directory because a blank line causes a successful exit.
2646
2647On the other hand, the tests in the `errors/` directory below the standard tests
2648directory are run by [`tests/error.sh`][227] and are considered to be one test
2649per file. As such, they are used differently. They are shoved into the
2650calculator through `stdin`, but they are also executed by passing them on the
2651command-line.
2652
2653To add an error test, first figure out which kind you want.
2654
2655Is it a simple one-liner, and you don't care if it's tested through a file?
2656
2657Then put it in one of the error files in the standard test directory. I would
2658only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()`
2659errors in the `read_errors.txt` file for `dc`; all others I would put in the
2660respective `errors.txt` file.
2661
2662On the other hand, if you care if the error is run as a file on the
2663command-line, or the error requires multiple lines to reproduce, then put the
2664test in the respective `errors/` directory and run the [`configure.sh`][69]
2665script again.
2666
2667After that, you are done; the test suite will automatically pick up the new
2668test, and you don't have to tell the test suite the expected results.
2669
2670### `stdin` Tests
2671
2672The `stdin` tests specifically test the lexing and parsing of multi-line
2673comments and strings. This is important because when reading from `stdin`, the
2674calculators can only read one line at a time, so partial parses are possible.
2675
2676To add `stdin` tests, just add the tests to the `stdin.txt` file in the
2677respective standard tests directory, and add the expected results in the
2678`stdin_results.txt` in the respective standard tests directory.
2679
2680### `read()` Tests
2681
2682The `read()` tests are meant to test the `read()` builtin function, to ensure
2683that the parsing and execution is correct.
2684
2685Each line is one test, as that is the nature of using the `read()` function, so
2686to add a test, just add it as another line in the `read.txt` file in the
2687respective standard tests directory, and add its result to the
2688`read_results.txt` file in the respective standard tests directory.
2689
2690### Other Tests
2691
2692The "other" tests are just random tests that I could not easily classify under
2693other types of tests. They usually include things like command-line parsing and
2694environment variable testing.
2695
2696To add an other test, it requires adding the programming for it to
2697[`tests/other.sh`][195] because all of the tests are written specifically in
2698that script. It would be best to use the infrastructure in
2699[`scripts/functions.sh`][105].
2700
2701### Linux `timeconst.bc` Script
2702
2703One special script that `bc`'s test suite will use is the [Linux `timeconst.bc`
2704script][6].
2705
2706I made the test suite able to use this script because the reason the
2707[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted
2708to be sure that it would run correctly on the script.
2709
2710However, it is not part of the distribution, nor is it part of the repository.
2711The reason for this is because [`timeconst.bc`][6] is under the GPL, while this
2712repo is under a BSD license.
2713
2714If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it
2715at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will
2716automatically run its tests; otherwise, it will skip it.
2717
2718### History Tests
2719
2720There are automatic tests for history; however, they have dependencies: Python 3
2721and [`pexpect`][137].
2722
2723As a result, because I need the [test suite to be portable][138], like the rest
2724of `bc`, the history tests are carefully guarded with things to ensure that they
2725are skipped, rather than failing if Python and [`pexpect`][137] are not
2726installed. For this reason, there is a `sh` script, [`tests/history.sh`][140]
2727that runs the actual script, [`tests/history.py`][139].
2728
2729I have added as many tests as I could to cover as many lines and branches as
2730possible. I guess I could have done more, but doing so would have required a lot
2731of time.
2732
2733I have tried to make it as easy as possible to run the history tests. They will
2734run automatically if you use the `make test_history` command, and they will also
2735use parallel execution with `make -j<cores> test_history`.
2736
2737However, the history tests are meant only to be run by maintainers of `bc`; they
2738are *not* meant to be run by users and packagers. The reason for this is that
2739they only seem to work reliably on Linux; `pexpect` seems to have issues on
2740other platforms, especially timeout issues.
2741
2742Thus, they are excluded from running with `make test` and [`tests/all.sh`][225].
2743However, they can be run from the [`scripts/release.sh`][83] script.
2744
2745All of the tests are contained in [`tests/history.py`][139]. The reason for this
2746is because they are in Python, and I don't have an easy way of including Python
2747(or at the very least, I am not familiar enough with Python to do that). So they
2748are all in the same file to make it easier on me.
2749
2750Each test is one function in the script. They all take the same number and type
2751of arguments:
2752
27531.	`exe`: the executable to run.
27542.	`args`: the arguments to pass to the executable.
27553.	`env`: the environment.
2756
2757Each function creates a child process with `pexpect.spawn` and then tests with
2758that child. Then the function returns the child to the caller, who closes it
2759and checks its error code against its expected error code.
2760
2761Yes, the error code is not a success all the time. This is because of the UTF-8
2762tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc`
2763is required to handle, per the [standard][2].
2764
2765So in [`tests/history.py`][139], there are four main arrays:
2766
2767* `bc` test functions,
2768* `bc` expected error codes.
2769* `dc` test functions.
2770* `dc` expected error codes.
2771
2772[`tests/history.py`][139] takes an index as an argument; that index is what test
2773it should run. That index is used to index into the proper test and error code
2774array.
2775
2776If you need to add more history tests, you need to do the following:
2777
27781.	Add the function for that test to [`tests/history.py`][139].
27792.	Add the function to the proper array of tests.
27803.	Add the expected error code to the proper array of error codes.
27814.	Add a target for the test to [`Makefile.in`][70].
27825.	Add that target as a prerequisite to either `test_bc_history` or
2783	`test_dc_history`.
2784
2785You do not need to do anything to add the test to `history_all_tests` (see
2786[Group Tests][141] below) because the scripts will automatically run all of the
2787tests properly.
2788
2789### Generated Tests
2790
2791Some tests are *large*, and as such, it is impractical to check them into `git`.
2792Instead, the tests depend on the existence of a GNU-compatible `bc` in the
2793`PATH`, which is then used to generate the tests.
2794
2795If [`configure.sh`][69] was run with the `-G` argument, which disables generated
2796tests, then `make test` and friends will automatically skip generated tests.
2797This is useful to do on platforms that are not guaranteed to have a
2798GNU-compatible `bc` installed.
2799
2800However, adding a generated test is a complicated because you have to figure out
2801*where* you want to put the file to generate the test.
2802
2803For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to
2804generate a `<test>_results.txt` file in the [standard tests][149] directory
2805(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no
2806`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s
2807test suite will look for a `<test>.bc` or `<test>.dc` file in the [script
2808tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if
2809that exists, it will use that script to generate the `<test>.txt` file in the
2810[standard tests][149] directory after which it will generate the
2811`<test>_results.txt` file in the [standard tests][149] directory.
2812
2813So you can choose to either:
2814
2815* Have a test in the [standard tests][149] directory without a corresponding
2816  `*_results.txt` file, or
2817* Have a script in the [script tests][150] directory to generate the
2818  corresponding file in the standard test directory before generating the
2819  corresponding `*_results.txt` file.
2820
2821Adding a script has a double benefit: the script itself can be used as a test.
2822However, script test results can also be generated.
2823
2824If `bc` is asked to run a script test, then if the script does not exist, `bc`'s
2825test suite returns an error. If it *does* exist, but no corresponding
2826`<test>.txt` file exists in the [script tests][150] directory, then a
2827GNU-compatible `bc` is used to generate the `<test>.txt` results file.
2828
2829If generated tests are disabled through [`configure.sh`][69], then these tests
2830are not generated if they do not exist. However, if they *do* exist, then they
2831are run. This can happen if a `make clean_tests` was not run between a build
2832that generated tests and a build that will not.
2833
2834### Group Tests
2835
2836While the test suite has a lot of targets in order to get parallel execution,
2837there are five targets that allow you to run each section, or all, of the test
2838suite as one unit:
2839
2840* `bc_all_tests` (`bc` tests)
2841* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests)
2842* `dc_all_tests` (`dc` tests)
2843* `history_all_tests` (history tests)
2844* `run_all_tests` (combination of the previous four)
2845
2846In addition, there are more fine-grained targets available:
2847
2848* `test_bc` runs all `bc` tests (except history tests).
2849* `test_dc` runs all `dc` tests (except history tests).
2850* `test_bc_tests` runs all `bc` [standard tests][149].
2851* `test_dc_tests` runs all `dc` [standard tests][149].
2852* `test_bc_scripts` runs all `bc` [script tests][150].
2853* `test_dc_scripts` runs all `dc` [script tests][150].
2854* `test_bc_stdin` runs the `bc` [`stdin` tests][152].
2855* `test_dc_stdin` runs the `dc` [`stdin` tests][152].
2856* `test_bc_read` runs the `bc` [`read()` tests][153].
2857* `test_dc_read` runs the `dc` [`read()` tests][153].
2858* `test_bc_errors` runs the `bc` [error tests][151].
2859* `test_dc_errors` runs the `dc` [error tests][151].
2860* `test_bc_other` runs the `bc` [other tests][151].
2861* `test_dc_other` runs the `dc` [other tests][151].
2862* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6].
2863* `test_history` runs all history tests.
2864* `test_bc_history` runs all `bc` history tests.
2865* `test_dc_history` runs all `dc` history tests.
2866
2867All of the above tests are parallelizable.
2868
2869### Individual Tests
2870
2871In addition to all of the above, individual test targets are available. These
2872are mostly useful for attempting to fix a singular test failure.
2873
2874These tests are:
2875
2876* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149].
2877  The name is the name of the test file without the `.txt` extension. It is the
2878  name printed by the test suite when running the test.
2879* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149].
2880  The name is the name of the test file without the `.txt` extension. It is the
2881  name printed by the test suite when running the test.
2882* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script
2883  test][150]. The name of the test is the name of the script without the `.bc`
2884  extension.
2885* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script
2886  test][150]. The name of the test is the name of the script without the `.dc`
2887  extension.
2888* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`.
2889* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`.
2890
2891### [`bcl`][156] Test
2892
2893When [`bcl`][156] is built, the [build system][142] automatically ensures that
2894`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests.
2895
2896There is only one test, and it is built from [`tests/bcl.c`][158].
2897
2898The reason the test is in C is because [`bcl`][156] is a C library; I did not
2899want to have to write C code *and* POSIX `sh` scripts to run it.
2900
2901The reason there is only one test is because most of the code for the library is
2902tested by virtue of testing `bc` and `dc`; the test needs to only ensure that
2903the library bindings and plumbing do not interfere with the underlying code.
2904
2905However, just because there is only one test does not mean that it doesn't test
2906more than one thing. The code actually handles a series of tests, along with
2907error checking to ensure that nothing went wrong.
2908
2909To add a [`bcl`][156] test, just figure out what test you want, figure out where
2910in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as
2911much error checking as possible, and use the `err(BclError)` function. Ensure
2912that all memory is freed because that test is run through [Valgrind][159] and
2913[AddressSanitizer][160].
2914
2915### Integration with the Build System
2916
2917If it was not obvious by now, the test suite is heavily integrated into the
2918[build system][142], but the integration goes further than just making the test
2919suite easy to run from `make` and generating individual and group tests.
2920
2921The big problem the test suite has is that some `bc` code, stuff that is
2922important to test, is only in *some* builds. This includes all of the extra math
2923extensions, for example.
2924
2925So the test suite needs to have some way of turning off the tests that depend on
2926certain [build types][81] when those [build types][81] are not used.
2927
2928This is the reason the is tightly integrated with the [build system][142]: the
2929[build system][142] knows what [build type][81] was used and can tell the test
2930suite to turn off the tests that do not apply.
2931
2932It does this with arguments to the test scripts that are either a `1` or a `0`,
2933depending on whether tests of that type should be enabled or not. These
2934arguments are why I suggest, in the [Test Scripts][148] section, to always use a
2935`make` target to run the test suite or any individual test. I have added a lot
2936of targets to make this easy and as fast as possible.
2937
2938In addition to all of that, the build system is responsible for selecting the
2939`bc`/`dc` tests or the [`bcl` test][157].
2940
2941### Output Directories
2942
2943During any run of the test suite, the test suite outputs the results of running
2944various tests to files. These files are usually output to `tests/bc_outputs/`
2945and `tests/dc_outputs/`.
2946
2947However, in some cases, it may be necessary to output test results to a
2948different directory. If that is the case, set the environment variable
2949`BC_TEST_OUTPUT_DIR` to the name of the directory.
2950
2951If that is done, then test results will be written to
2952`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`.
2953
2954### Test Suite Portability
2955
2956The test suite is meant to be run by users and packagers as part of their
2957install process.
2958
2959This puts some constraints on the test suite, but the biggest is that the test
2960suite must be as [portable as `bc` itself][136].
2961
2962This means that the test suite must be implemented in pure POSIX `make`, `sh`,
2963and C99.
2964
2965#### Test Scripts
2966
2967To accomplish the portability, the test suite is run by a bunch of `sh` scripts
2968that have the constraints laid out in [POSIX Shell Scripts][76].
2969
2970However, that means they have some quirks, made worse by the fact that there are
2971[generated tests][143] and [tests that need to be skipped, but only
2972sometimes][147].
2973
2974This means that a lot of the scripts take an awkward number and type of
2975arguments. Some arguments are strings, but most are integers, like
2976[`scripts/release.sh`][83].
2977
2978It is for this reason that I do not suggest running the test scripts directly.
2979Instead, always use an appropriate `make` target, which already knows the
2980correct arguments for the test because of the [integration with the build
2981system][147].
2982
2983### Test Coverage
2984
2985In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`.
2986
2987If you have them, run the following commands:
2988
2989```
2990CC=gcc ./configure -gO3 -c
2991make -j<cores>
2992make coverage
2993```
2994
2995Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in
2996parallel. If you try, you will get errors. And note that `CC=gcc` is used.
2997
2998After running those commands, you can open your web browser and open the
2999`index.html` file in the root directory of the repo. From there, you can explore
3000all of the coverage results.
3001
3002If you see lines or branches that you think you could hit with a manual
3003execution, do such manual execution, and then run the following command:
3004
3005```
3006make coverage_output
3007```
3008
3009and the coverage output will be updated.
3010
3011If you want to rerun `make coverage`, you must do a `make clean` and build
3012first, like this:
3013
3014```
3015make clean
3016make -j<cores>
3017make coverage
3018```
3019
3020Otherwise, you will get errors.
3021
3022If you want to run tests in parallel, you can do this:
3023
3024```
3025make -j<cores>
3026make -j<cores> test
3027make coverage_output
3028```
3029
3030and that will generate coverage output correctly.
3031
3032### [AddressSanitizer][21] and Friends
3033
3034To run the test suite under [AddressSanitizer][21] or any of its friends, use
3035the following commands:
3036
3037```
3038CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m
3039make -j<cores>
3040make -j<cores> test
3041```
3042
3043where `<sanitizer>` is the correct name of the desired sanitizer. There is one
3044exception to the above: `UndefinedBehaviorSanitizer` should be run on a build
3045that has zero optimization, so for `UBSan`, use the following commands:
3046
3047```
3048CFLAGS="-fsanitize=undefined" ./configure -gO0 -m
3049make -j<cores>
3050make -j<cores> test
3051```
3052
3053### [Valgrind][20]
3054
3055To run the test suite under [Valgrind][20], run the following commands:
3056
3057```
3058./configure -gO3 -v
3059make -j<cores>
3060make -j<cores> test
3061```
3062
3063It really is that easy. I have directly added infrastructure to the build system
3064and the test suite to ensure that if [Valgrind][20] detects any memory errors or
3065any memory leaks at all, it will tell the test suite infrastructure to report an
3066error and exit accordingly.
3067
3068## POSIX Shell Scripts
3069
3070There is a lot of shell scripts in this repository, and every single one of them
3071is written in pure [POSIX `sh`][72].
3072
3073The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX
3074systems are only guaranteed to have a barebones implementation of `sh`
3075available.
3076
3077There are *many* snares for unwary programmers attempting to modify
3078[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or
3079any of the scripts in [`tests/`][77]. Here are some of them:
3080
30811.	No `bash`-isms.
30822.	Only POSIX standard utilities are allowed.
30833.	Only command-line options defined in the POSIX standard for POSIX utilities
3084	are allowed.
30854.	Only the standardized behavior of POSIX utilities is allowed.
30865.	Functions return data by *printing* it. Using `return` sets their exit code.
3087
3088In other words, the script must only use what is standardized in the [`sh`][72]
3089and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes
3090things like `local` and the `[[ ]]` notation.
3091
3092These are *enormous* restrictions and must be tested properly. I put out at
3093least one release with a change to `configure.sh` that wasn't portable. That was
3094an embarrassing mistake.
3095
3096The lack of `local`, by the way, is why variables in functions are named with
3097the form:
3098
3099```
3100_<function_name>_<var_name>
3101```
3102
3103This is done to prevent any clashes of variable names with already existing
3104names. And this applies to *all* shell scripts. However, there are a few times
3105when that naming convention is *not* used; all of them are because those
3106functions are required to change variables in the global scope.
3107
3108### Maintainer-Only Scripts
3109
3110If a script is meant to be used for maintainers (of `bc`, not package
3111maintainers), then rules 2, 3, and 4 don't need to be followed as much because
3112it is assumed that maintainers will be able to install whatever tools are
3113necessary to do the job.
3114
3115## Manuals
3116
3117The manuals for `bc` and `dc` are all generated, and the manpages for `bc`,
3118`dc`, and `bcl` are also generated.
3119
3120Why?
3121
3122I don't like the format of manpages, and I am not confident in my ability to
3123write them. Also, they are not easy to read on the web.
3124
3125So that explains why `bcl`'s manpage is generated from its markdown version. But
3126why are the markdown versions of the `bc` and `dc` generated?
3127
3128Because the content of the manuals needs to change based on the [build
3129type][81]. For example, if `bc` was built with no history support, it should not
3130have the **COMMAND LINE HISTORY** section in its manual. If it did, that would
3131just confuse users.
3132
3133So the markdown manuals for `bc` and `dc` are generated from templates
3134([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there,
3135the manpages are generated from the generated manuals.
3136
3137The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version
3138control, and the generated markdown manuals and manpages for `bc`
3139([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well.
3140
3141This is because generating the manuals and manpages requires a heavy dependency
3142that only maintainers should care about: [Pandoc][92]. Because users [should not
3143have to install *any* dependencies][136], the files are generated, checked into
3144version control, and included in distribution tarballs.
3145
3146If you run [`configure.sh`][69], you have an easy way of generating the markdown
3147manuals and manpages: just run `make manpages`. This target calls
3148[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`.
3149
3150For more on how generating manuals and manpages works, see
3151[`scripts/manpage.sh`][60].
3152
3153## Locales
3154
3155The locale system of `bc` is enormously complex, but that's because
3156POSIX-compatible locales are terrible.
3157
3158How are they terrible?
3159
3160First, `gencat` does not work for generating cross-compilation. In other words,
3161it does not generate machine-portable files. There's nothing I can do about
3162this except for warn users.
3163
3164Second, the format of `.msg` files is...interesting. Thank goodness it is text
3165because otherwise, it would be impossible to get them right.
3166
3167Third, `.msg` files are not used. In other words, `gencat` exists. Why?
3168
3169Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales.
3170
3171Yes, where and *how*.
3172
3173Obviously, from it's name, it's a path, and that's the where. The *how* is more
3174complicated.
3175
3176It's actually *not* a path, but a path template. It's a format string, and it
3177can have a few format specifiers. For more information on that, see [this
3178link][84]. But in essence, those format specifiers configure how each locale is
3179supposed to be installed.
3180
3181With all those problems, why use POSIX locales? Portability, as always. I can't
3182assume that `gettext` will be available, but I *can* pretty well assume that
3183POSIX locales will be available.
3184
3185The locale system of `bc` includes all files under [`locales/`][85],
3186[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88],
3187[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131],
3188and the parts of the build system needed to activate it. There is also code in
3189[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale.
3190
3191If the order of error messages and/or categories are changed, the order of
3192errors must be changed in the enum, the default error messages and categories in
3193[`src/data.c`][131], and all of the messages and categories in the `.msg` files
3194under [`locales/`][85].
3195
3196## Static Analysis
3197
3198I do *some* static analysis on `bc`.
3199
3200I used to use [Coverity][196], but I stopped using it when it started giving me
3201too many false positives and also because it had a vulnerability.
3202
3203However, I still use the [Clang Static Analyzer][197] through
3204[`scan-build`][19]. I only use it in debug mode because I have to add some
3205special code to make it not complain about things that are definitely not a
3206problem.
3207
3208The most frequent example of false positives is where a local is passed to a
3209function to be initialized. [`scan-build`][19] misses that fact, so I
3210pre-initialize such locals to prevent the warnings.
3211
3212To run `scan-build`, do the following:
3213
3214```
3215make clean
3216scan-build make
3217```
3218
3219`scan-build` will print its warnings to `stdout`.
3220
3221## Fuzzing
3222
3223The quality of this `bc` is directly related to the amount of fuzzing I did. As
3224such, I spent a lot of work making the fuzzing convenient and fast, though I do
3225admit that it took me a long time to admit that it did need to be faster.
3226
3227First, there were several things which make fuzzing fast:
3228
3229* Using [AFL++][125]'s deferred initialization.
3230* Splitting `bc`'s corpuses.
3231* Parallel fuzzing.
3232
3233Second, there are several things which make fuzzing convenient:
3234
3235* Preprepared input corpuses.
3236* [`scripts/fuzz_prep.sh`][119].
3237* `tmux` and `tmuxp` configs.
3238* [`scripts/afl.py`][94].
3239
3240### Fuzzing Performance
3241
3242Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as
3243fast as possible is well worth the time.
3244
3245However, there is a caveat to the above: it is easy to make [AFL++][125] crash,
3246be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the
3247performance enhancements are done poorly.
3248
3249To stop [AFL++][125] from crashing on test cases, and to be stable, these are
3250the requirements:
3251
3252* The state at startup must be *exactly* the same.
3253* The virtual memory setup at startup must be *exactly* the same.
3254
3255The first isn't too hard; it's the second that is difficult.
3256
3257`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less
3258than most programs.) After going through an execution run, however, some of that
3259memory, while it could be cleared and reset, is in different places because of
3260vectors. Since vectors reallocate, their allocations are not guaranteed to be in
3261the same place.
3262
3263So to make all three work, I had to set up the deferred initialization and
3264persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`,
3265which is probably what caused the stability to drop below 100%). However, using
3266deferred alone let me put the [AFL++][125] initialization further back. This
3267works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right
3268at that call. Thus, every run has the exact same virtual memory setup, and each
3269run can skip all of the setup code.
3270
3271I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent
3272mode, plus shared memory fuzzing. In order to do it safely, with stability above
327399%, all of that was actually *slower* than using just deferred initialization
3274with the initialization *right before* `stdin` was read. And as a bonus, the
3275stability in that situation is 100%.
3276
3277As a result, my [AFL++][125] setup only uses deferred initialization. That's the
3278`__AFL_INIT()` call.
3279
3280(Note: there is one more big item that must be done in order to have 100%
3281stability: the pseudo-random number generator *must* start with *exactly* the
3282same seed for every run. This is set up with the `tmux` and `tmuxp` configs that
3283I talk about below in [Convenience][130]. This seed is set before the
3284`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without
3285it, stability would be abysmal.)
3286
3287On top of that, while `dc` is plenty fast under fuzzing (because of a faster
3288parser and less test cases), `bc` can be slow. So I have split the `bc` input
3289corpus into three parts, and I set fuzzers to run on each individually. This
3290means that they will duplicate work, but they will also find more stuff.
3291
3292On top of all of that, each input corpus (the three `bc` corpuses and the one
3293`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two
3294reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs
3295recommend 4 parallel fuzzers, at least, to run different "power schedules."
3296
3297### Convenience
3298
3299The preprepared input corpuses are contained in the
3300`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories.
3301There are three `bc` directories and only one `dc` directory because `bc`'s
3302input corpuses are about three times as large, and `bc` is a larger program;
3303it's going to need much more fuzzing.
3304
3305(They do share code though, so fuzzing all of them still tests a lot of the same
3306math code.)
3307
3308The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It
3309assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it
3310automatically configures and builds `bc` with a fuzz-ideal build.
3311
3312A fuzz-ideal build has several things:
3313
3314* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].)
3315* Debug mode, to crash as easily as possible.
3316* Full optimization (including [Link-Time Optimization][126]), for performance.
3317* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above).
3318* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125]
3319  documentation for more information.
3320
3321There is one big thing that a fuzz-ideal build does *not* have: it does not use
3322[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if
3323it fails to allocate memory. I do not want to consider those as crashes because
3324my `bc` does, in fact, handle them gracefully by exiting with a set error code.
3325So `libdislocator.so` is not an option.
3326
3327However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in
3328`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120]
3329and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be
3330used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to
3331adjust the `start_directory` item, once it is adjusted, then using this command:
3332
3333```
3334tmuxp load tests/fuzzing/bc_afl.yaml
3335```
3336
3337will start fuzzing.
3338
3339In other words, to start fuzzing, the sequence is:
3340
3341```
3342./scripts/fuzz_prep.sh
3343tmuxp load tests/fuzzing/bc_afl.yaml
3344```
3345
3346Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on
3347`bc` and 4 on `dc`. The outputs will be put into the
3348`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories.
3349
3350(Note that loading that config will also delete all unsaved [AFL++][125] output
3351from the output directories.)
3352
3353Sometimes, [AFL++][125] will report crashes when there are none. When crashes
3354are reported, I always run the following command:
3355
3356```
3357./scripts/afl.py <dir>
3358```
3359
3360where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the
336116 instances reported the crash. If it was one of the first four (`bc11` through
3362`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I
3363use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`.
3364And if it was `dc`, I use `dc`.
3365
3366The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly
3367reported a crash or not. If so, it will copy the crashing test case to
3368`.test.txt` and tell you whether it was from running it as a file or through
3369`stdin`.
3370
3371From there, I personally always investigate the crash and fix it. Then, when the
3372crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as
3373an error test (if it produces an error) or I create a new
3374`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See
3375[Test Suite][124] for more information about the test suite.) In either case,
3376`<idx>` is the next number for a file in that particular place. For example, if
3377the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I
3378move `.test.txt` to `tests/bc/error/19.txt`.
3379
3380Then I immediately run [`scripts/afl.py`][94] again to find the next crash
3381because often, [AFL++][125] found multiple test cases that trigger the same
3382crash. If it finds another, I repeat the process until it is happy.
3383
3384Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and
3385restart fuzzing. Why do I restart instead of continuing? Because with the
3386changes, the test outputs could be stale and invalid.
3387
3388However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds
3389that every crash reported by [AFL++][125] is invalid. If that's the case, I can
3390just continue with the command:
3391
3392```
3393tmuxp load tests/fuzzing/bc_afl_continue.yaml
3394```
3395
3396(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is
3397still running, so often, I don't find a need to continue since there was no
3398stop. However, the capability is there, if needed.)
3399
3400In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs,
3401automatically set up [AFL++][125] power schedules (see [Fuzzing
3402Performance][127] above). They also set up the parallel fuzzing such that there
3403is one fuzzer in each group of 4 that does deterministic fuzzing. It's always
3404the first one in each group.
3405
3406For more information about deterministic fuzzing, see the [AFL++][125]
3407documentation.
3408
3409### Corpuses
3410
3411I occasionally add to the input corpuses. These files come from new files in the
3412[Test Suite][124]. In fact, I use soft links when the files are the same.
3413
3414However, when I add new files to an input corpus, I sometimes reduce the size of
3415the file by removing some redundancies.
3416
3417And then, when adding to the `bc` corpuses, I try to add them evenly so that
3418each corpus will take about the same amount of time to get to a finished state.
3419
3420### [AFL++][125] Quickstart
3421
3422The way [AFL++][125] works is complicated.
3423
3424First, it is the one to invoke the compiler. It leverages the compiler to add
3425code to the binary to help it know when certain branches are taken.
3426
3427Then, when fuzzing, it uses that branch information to generate information
3428about the "path" that was taken through the binary.
3429
3430I don't know what AFL++ counts as a new path, but each new path is added to an
3431output corpus, and it is later used as a springboard to find new paths.
3432
3433This is what makes AFL++ so effective: it's not just blindly thrashing a binary;
3434it adapts to the binary by leveraging information about paths.
3435
3436### Fuzzing Runs
3437
3438For doing a fuzzing run, I expect about a week or two where my computer is
3439basically unusable, except for text editing and light web browsing.
3440
3441Yes, it can take two weeks for me to do a full fuzzing run, and that does *not*
3442include the time needed to find and fix crashes; it only counts the time on the
3443*last* run, the one that does not find any crashes. This means that the entire
3444process can take a month or more.
3445
3446What I use as an indicator that the fuzzing run is good enough is when the
3447number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer
3448instances, except maybe the deterministic instances, is below 50. And even then,
3449I try to let deterministic instances get that far as well.
3450
3451You can see how many pending paths are left in the "path geometry" section of
3452the [AFL++][125] dashboard.
3453
3454Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with
3455`Ctrl+c` or some other method. It will not quit until you tell it to.
3456
3457### Radamsa
3458
3459I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened
3460once.
3461
3462The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier
3463to use with varying command-line arguments, which was needed for testing `bc`'s
3464command-line expression parsing code, and [AFL++][125] is best when testing
3465input from `stdin`.
3466
3467[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but
3468it's not as effective at that, so I don't really use it for that either.
3469
3470[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have
3471not had to touch the command-line expression parsing code since.
3472
3473### [AddressSanitizer][21] with Fuzzing
3474
3475One advantage of using [AFL++][125] is that it saves every test case that
3476generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete
3477them when the user makes it quit.
3478
3479Keeping them around is not a good idea, for several reasons:
3480
3481* They are frequently large.
3482* There are a lot of them.
3483* They go stale; after `bc` is changed, the generated paths may not be valid
3484  anymore.
3485
3486However, before they are deleted, they can definitely be leveraged for even
3487*more* bug squashing by running *all* of the paths through a build of `bc` with
3488[AddressSanitizer][21].
3489
3490This can easily be done with these four commands:
3491
3492```
3493./scripts/fuzz_prep.sh -a
3494./scripts/afl.py --asan bc1
3495./scripts/afl.py --asan bc2
3496./scripts/afl.py --asan bc3
3497./scripts/afl.py --asan dc
3498```
3499
3500(By the way, the last four commands could be run in separate terminals to do the
3501processing in parallel.)
3502
3503These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they
3504run `bc` and `dc` on all of the found crashes and path output corpuses. This is
3505to check that no path or crash has found any memory errors, including memory
3506leaks.
3507
3508Because the output corpuses can contain test cases that generate infinite loops
3509in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far
3510greater than the timeout that [AFL++][125] uses and should be enough to catch
3511any crash.
3512
3513If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory
3514errors on the outputs of [AFL++][125], that is an excellent indicator of very
3515few bugs in `bc`, and a release can be made with confidence.
3516
3517## Code Concepts
3518
3519This section is about concepts that, if understood, will make it easier to
3520understand the code as it is written.
3521
3522The concepts in this section are not found in a single source file, but they are
3523littered throughout the code. That's why I am writing them all down in a single
3524place.
3525
3526### POSIX Mode
3527
3528POSIX mode is `bc`-only.
3529
3530In fact, POSIX mode is two different modes: Standard Mode and Warning Mode.
3531These modes are designed to help users write POSIX-compatible `bc` scripts.
3532
3533#### Standard Mode
3534
3535Standard Mode is activated with the `-s` or `--standard` flags.
3536
3537In this mode, `bc` will error if any constructs are used that are not strictly
3538compatible with the [POSIX `bc` specification][2].
3539
3540#### Warning Mode
3541
3542Warning Mode is activated with the `-w` or `--warn` flags.
3543
3544In this mode, `bc` will issue warnings, but continue, if any constructs are used
3545that are not strictly compatible with the [POSIX `bc` specification][2].
3546
3547### Memory Management
3548
3549The memory management in `bc` is simple: everything is owned by one thing.
3550
3551If something is in a vector, it is owned by that vector.
3552
3553If something is contained in a struct, it is owned by that struct with one
3554exception: structs can be given pointers to other things, but only if those
3555other things will outlast the struct itself.
3556
3557As an example, the `BcParse` struct has a pointer to the one `BcProgram` in
3558`bc`. This is okay because the program is initialized first and deallocated
3559last.
3560
3561In other words, it's simple: if a field in a struct is a pointer, then unless
3562that pointer is directly allocated by the struct (like the vector array or the
3563number limb array), that struct does not own the item at that pointer.
3564Otherwise, the struct *does* own the item.
3565
3566### [Async-Signal-Safe][115] Signal Handling
3567
3568`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but
3569`bc` is, by and large, CPU-bound. This has several consequences, but the biggest
3570is that there is no easy way to allow signals to interrupt it.
3571
3572This consequence is not obvious, but it comes from the fact that a lot of I/O
3573operations can be interrupted and return [`EINTR`][198]. This makes such I/O
3574calls natural places for allowing signals to interrupt execution, even when the
3575signal comes during execution, and not interrupting I/O calls. The way this is
3576done is setting a flag in the signal handler, which is checked around the time
3577of the I/O call, when it is convenient.
3578
3579Alternatively, I/O bound programs can use the [self-pipe trick][199].
3580
3581Neither of these are possible in `bc` because the execution of math code can
3582take a long time. If a signal arrives during this long execution time, setting a
3583flag like an I/O bound application and waiting until the next I/O call could
3584take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a
3585week to calculate a million digits of pi, and it's not slow as far as `bc`
3586implementations go.)
3587
3588Thus, using just the technique of setting the flag just will not work for an
3589interactive calculator.
3590
3591Well, it can, but it requires a lot of code and massive inefficiencies. I know
3592this because that was the original implementation.
3593
3594The original implementation set a flag and just exit the signal handler. Then,
3595on just about every loop header, I have a check for the signal flag. These
3596checks happened on every iteration of every loop. It was a massive waste because
3597it was polling, and [polling is evil][200].
3598
3599So for version [3.0.0][32], I expended a lot of effort to change the
3600implementation.
3601
3602In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`)
3603to tell the signal handler whether it can use `longjmp()` to stop the current
3604execution. If so, it does. If not, it sets a flag, which then is used by the
3605code outside the signal handler that set the `vm.sig_lock` flag. When that code
3606unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that
3607code executes the `longjmp()` and stops the current execution.
3608
3609Other than that, the rest of the interrupt-based implementation is best
3610described in the [Error Handling][97].
3611
3612However, there are rules for signal handlers that I must lay out.
3613
3614First, signal handlers can only call [async-signal-safe][115] functions.
3615
3616Second, any field set or read by both the signal handler and normal code must be
3617a `volatile sig_atomic_t`.
3618
3619Third, when setting such fields, they must be set to constants and no math can
3620be done on them. This restriction and the above restriction exist in order to
3621ensure that the setting of the fields is always atomic with respect to signals.
3622
3623These rules exist for *any* code using Unix signal handlers, not just `bc`.
3624
3625#### Vectors and Numbers
3626
3627Vectors and numbers needed special consideration with the interrupt-based signal
3628handling.
3629
3630When vectors and numbers are about to allocate, or *reallocate* their arrays,
3631they need to lock signals to ensure that they do not call `malloc()` and friends
3632and get interrupted by a signal because, as you will see in the [Error
3633Handling][97] section, `longjmp()` cannot be used in a signal handler if it may
3634be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and
3635friends.
3636
3637### Asserts
3638
3639If you asked me what procedure is used the most in `bc`, I would reply without
3640hesitation, "`assert()`."
3641
3642I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with
3643[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes,
3644and a failing `assert()` counts as one.
3645
3646So while a lot of bad bugs might have corrupted data and *not* caused crashes,
3647because I put in so many `assert()`'s, they were *turned into* crashing bugs,
3648and [AFL++][125] found them.
3649
3650By far, the most bugs it found this way was in the `bc` parser. (See the [`bc`
3651Parsing][110] for more information.) And even though I was careful to put
3652`assert()`'s everywhere, most parser bugs manifested during execution of
3653bytecode because the virtual machine assumes the bytecode is valid.
3654
3655Sidenote: one of those bugs caused an infinite recursion when running the sine
3656(`s()`) function in the math library, so yes, parser bugs can be *very* weird.
3657
3658Anyway, the way I did `assert()`'s was like this: whenever I realized that I
3659had put assumptions into the code, I would put an `assert()` there to test it
3660**and** to *document* it.
3661
3662Yes, documentation. In fact, by far the best documentation of the code in `bc`
3663is actually the `assert()`'s. The only time I would not put an `assert()` to
3664test an assumption is if that assumption was already tested by an `assert()`
3665earlier.
3666
3667As an example, if a function calls another function and passes a pointer that
3668the caller previously `assert()`'ed was *not* `NULL`, then the callee does not
3669have to `assert()` it too, unless *also* called by another function that does
3670not `assert()` that.
3671
3672At first glance, it may seem like putting asserts for pointers being non-`NULL`
3673everywhere would actually be good, but unfortunately, not for fuzzing. Each
3674`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on
3675how many branches it covers. If there are too many `assert()`'s, it may think
3676that it is not being effective and that more fuzzing is needed.
3677
3678This means that `assert()`'s show up most often in two places: function
3679preconditions and function postconditions.
3680
3681Function preconditions are `assert()`'s that test conditions relating to the
3682arguments a function was given. They appear at the top of the function, usually
3683before anything else (except maybe initializing a local variable).
3684
3685Function postconditions are `assert()`'s that test the return values or other
3686conditions when a function exits. These are at the bottom of a function or just
3687before a `return` statement.
3688
3689The other `assert()`'s cover various miscellaneous assumptions.
3690
3691If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to
3692document your assumptions. And don't remove them when [AFL++][125] gleefully
3693crashes `bc` and `dc` over and over again.
3694
3695### Vectors
3696
3697In `bc`, vectors mean resizable arrays, and they are the most fundamental piece
3698of code in the entire codebase.
3699
3700I had previously written a [vector implementation][112], which I used to guide
3701my decisions, but I wrote a new one so that `bc` would not have a dependency. I
3702also didn't make it as sophisticated; the one in `bc` is very simple.
3703
3704Vectors store some information about the type that they hold:
3705
3706* The size (as returned by `sizeof`).
3707* An enum designating the destructor.
3708
3709If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a
3710destructor.
3711
3712But by storing the size, the vector can then allocate `size * cap` bytes, where
3713`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again
3714and again until it is bigger than the requested size.
3715
3716But to store items, or to push items, or even to return items, the vector has to
3717figure out where they are, since to it, the array just looks like an array of
3718bytes.
3719
3720It does this by calculating a pointer to the underlying type with
3721`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the
3722desired element, and `size` is the size of the underlying type.
3723
3724Doing that, vectors can avoid undefined behavior (because `char` pointers can
3725be cast to any other pointer type), while calculating the exact position of
3726every element.
3727
3728Because it can do that, it can figure out where to push new elements by
3729calculating `v + (len * size)`, where `len` is the number of items actually in
3730the vector.
3731
3732By the way, `len` is different from `cap`. While `cap` is the amount of storage
3733*available*, `len` is the number of actual elements in the vector at the present
3734point in time.
3735
3736Growing the vector happens when `len` is equal to `cap` *before* pushing new
3737items, not after.
3738
3739To add a destructor, you need to add an enum item to `BcDtorType` in
3740[`include/vector.h`][174] and add the actual destructor in the same place as the
3741enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131].
3742
3743#### Pointer Invalidation
3744
3745There is one big danger with the vectors as currently implemented: pointer
3746invalidation.
3747
3748If a piece of code receives a pointer from a vector, then adds an item to the
3749vector before they finish using the pointer, that code must then update the
3750pointer from the vector again.
3751
3752This is because any pointer inside the vector is calculated based off of the
3753array in the vector, and when the vector grows, it can `realloc()` the array,
3754which may move it in memory. If that is done, any pointer returned by
3755`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid.
3756
3757This fact was the single most common cause of crashes in the early days of this
3758`bc`; wherever I have put a comment about pointers becoming invalidated and
3759updating them with another call to `bc_vec_item()` and friends, *do **NOT**
3760remove that code!*
3761
3762#### Maps
3763
3764Maps in `bc` are...not.
3765
3766They are really a combination of two vectors. Those combinations are easily
3767recognized in the source because one vector is named `<name>s` (plural), and the
3768other is named `<name>_map`.
3769
3770There are currently three, all in `BcProgram`:
3771
3772* `fns` and `fn_map` (`bc` functions).
3773* `vars` and `var_map` (variables).
3774* `arrs` and `arr_map` (arrays).
3775
3776They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a
3777string and an index. The string is the name of the item, and the index is the
3778index of that item in the `<name>s` vector.
3779
3780Obviously, I could have just done a linear search for items in the `<name>s`
3781vector, but that would be slow with a lot of functions/variables/arrays.
3782Instead, I ensure that whenever an item is inserted into the `<name>_map`
3783vector, the item is inserted in sorted order. This means that the `<name>_map`
3784is always sorted (by the names of the items).
3785
3786So when looking up an item in the "map", what is really done is this:
3787
37881.	A binary search is carried out on the names in the `<name>_map` vector.
37892.	When one is found, it returns the index in the `<name>_map` vector where the
3790	item was found.
37913.	This index is then used to retrieve the `BcId`.
37924.	The index from the `BcId` is then used to index into the `<name>s` vector,
3793	which returns the *actual* desired item.
3794
3795Why were the `<name>s` and `<name>_map` vectors not combined for ease? The
3796answer is that sometime, when attempting to insert into the "map", code might
3797find that something is already there. For example, a function with that name may
3798already exist, or the variable might already exist.
3799
3800If the insert fails, then the name already exists, and the inserting code can
3801forego creating a new item to put into the vector. However, if there is no item,
3802the inserting code must create a new item and insert it.
3803
3804If the two vectors were combined together, it would not be possible to separate
3805the steps such that creating a new item could be avoided if it already exists.
3806
3807#### Slabs and Slab Vectors
3808
3809`bc` allocates *a lot* of small strings, and small allocations are the toughest
3810for general-purpose allocators to handle efficiently.
3811
3812Because of that reason, I decided to create a system for allocating small
3813strings using something that I call a "slab vector" after [slab
3814allocators][201].
3815
3816These vectors allocate what I call "slabs," which are just an allocation of a
3817single page with a length to tell the slab how much of the slab is used.
3818
3819The vector itself holds slabs, and when the slab vector is asked to allocate a
3820string, it attempts to in the last slab. If that slab cannot do so, it allocates
3821a new slab and allocates from that.
3822
3823There is one exception: if a string is going to be bigger than 128 bytes, then
3824the string is directly allocated, and a slab is created with that pointer and a
3825length of `SIZE_MAX`, which tells the slab vector that it is a direct
3826allocation. Then, the last slab is pushed into the next spot and the new special
3827slab is put into the vacated spot. This ensures that a non-special slab is
3828always last.
3829
3830### Command-Line History
3831
3832When I first wrote `bc`, I immediately started using it in order to eat my own
3833dog food.
3834
3835It sucked, and the biggest reason why was because of the lack of command-line
3836history.
3837
3838At first, I just dealt with it, not knowing how command-line history might be
3839implemented.
3840
3841Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had
3842known about for some time.
3843
3844It turned out to be easier than I thought; the hardest part was the tedious
3845renaming of everything to fit the `bc` naming scheme.
3846
3847Understanding command-line history in `bc` is really about understanding VT-100
3848escape codes, so I would start there.
3849
3850Now, the history implementation of `bc` has been adapted far beyond that initial
3851adaptation to make the command-line history implementation perfect for `bc`
3852alone, including integrating it into `bc`'s [Custom I/O][114] and making sure
3853that it does not disturb output that did not end with a newline.
3854
3855On top of that, at one point, I attempted to get history to work on Windows. It
3856barely worked after a lot of work and a lot of portability code, but even with
3857all of that, it does not have at least one feature: multi-line pasting from the
3858clipboard.
3859
3860#### Editline and Readline
3861
3862I also implemented an integration with both editline and readline, though not at
3863the same time. This allows distributions that want to use either one in `bc` to
3864do so. This enables users to use their `.editrc` and `.inputrc` settings.
3865
3866The integration is custom to each library, and it does *not* involve any of
3867`bc`'s custom history code. It also required changes to `bc`'s [Custom
3868I/O][114].
3869
3870### Error Handling
3871
3872The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it
3873became one of the things that taught me the most about C in particular and
3874programming in general.
3875
3876Before then, error handling was manual. Almost all functions returned a
3877`BcStatus` indicating if an error had occurred. This led to a proliferation of
3878lines like:
3879
3880```
3881if (BC_ERR(s)) return s;
3882```
3883
3884In fact, a quick and dirty count of such lines in version `2.7.2` (the last
3885version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line.
3886
3887And that didn't even guarantee that return values were checked *everywhere*.
3888
3889But before I can continue, let me back up a bit.
3890
3891From the beginning, I decided that I would not do what GNU `bc` does on errors;
3892it tries to find a point at which it can recover. Instead, I decided that I
3893would have `bc` reset to a clean slate, which I believed, would reduce the
3894number of bugs where an unclean state caused errors with continuing execution.
3895
3896So from the beginning, errors would essentially unwind the stack until they got
3897to a safe place from which to clean the slate, reset, and ask for more input.
3898
3899Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As
3900such, it had a signal handler that set a flag. But it could not safely interrupt
3901execution, so that's all it could do.
3902
3903In order to actually respond to the signal, I had to litter checks for the flag
3904*everywhere* in the code. And I mean *everywhere*. They had to be checked on
3905every iteration of *every* loop. They had to be checked going into and out of
3906certain functions.
3907
3908It was a mess.
3909
3910But fortunately for me, signals did the same thing that errors did: they unwound
3911the stack to the *same* place.
3912
3913Do you see where I am going with this?
3914
3915It turns out that what I needed was a [async-signal-safe][115] form of what
3916programmers call "exceptions" in other languages.
3917
3918I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement
3919exceptions, so I thought I would learn how to use them. How hard could it be?
3920
3921Quite hard, it turns out, especially in the presence of signals. And that's
3922because there are a few big snares:
3923
39241.	The value of any local variables are not guaranteed to be preserved after a
3925	`longjmp()` back into a function.
39262.	While `longjmp()` is required to be [async-signal-safe][115], if it is
3927	invoked by a signal handler that interrupted a non-[async-signal-safe][115]
3928	function, then the behavior is undefined.
39293.	Any mutation that is not guaranteed to be atomic with respect to signals may
3930	be incomplete when a signal arrives.
3931
3932Oh boy.
3933
3934For number 1, the answer to this is to hide data that must stay changed behind
3935pointers. Only the *pointers* are considered local, so as long as I didn't do
3936any modifying pointer arithmetic, pointers and their data would be safe. For
3937cases where I have local data that must change and stay changed, I needed to
3938*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`.
3939
3940For number 2 and number 3, `bc` needs some way to tell the signal handler that
3941it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile
3942sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal
3943Handling][173] section.) For every function that calls a function that is not
3944async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and
3945afterward, use `BC_SIG_UNLOCK` to unlock signals.
3946
3947Code also need to do this for all global, non-atomic mutation, which means that
3948modifying any part of the `BcVm` global struct.
3949
3950`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and
3951jump if necessary.
3952
3953On top of all of that, *all* functions with cleanup needed to be able to run
3954their cleanup. This meant that `longjmp()` could not just jump to the finish; it
3955had to start what I call a "jump series," using a stack of `jmp_buf`'s
3956(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to
3957execute its jump. Then, if the cleanup code was executed because of a jump, the
3958cleanup code was responsible for continuing the jump series by popping the
3959previous item off the stack and using the new top of the stack for a jump.
3960
3961In this way, C++-style exceptions were implemented in pure C. Not fun, but it
3962works. However, the order of operations matters, especially in the macros that
3963help implement the error handling.
3964
3965For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals.
3966If a signal comes between, that's fine; it will still cause a jump to the right
3967place. However, disabling the lock after could mean that a signal could come
3968*after* checking for signals, but before signals were unlocked, causing the
3969handling of the signal to be delayed.
3970
3971#### Custom I/O
3972
3973Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
3974`longjmp()` for error handling (see the [Error Handling][97] section), and the
3975buffered I/O in `libc` does not interact well with the use of those procedures;
3976all of the buffered I/O API is basically non-[async-signal-safe][115].
3977
3978Implementing custom buffered I/O had other benefits. First, it allowed me to
3979tightly integrate history with the I/O code. Second, it allowed me to make
3980changes to history in order to make it adapt to user prompts.
3981
3982### Lexing
3983
3984To simplify parsing, both calculators use lexers to turn the text into a more
3985easily-parsable form.
3986
3987While some tokens are only one character long, others require many tokens, and
3988some of those need to store all of the text corresponding to the token for use
3989by the parsers. Tokens that need to store their corresponding text include, but
3990are not limited to:
3991
3992* Strings.
3993* Numbers.
3994* Identifiers.
3995
3996For this purpose, the lexer has a [vector][111] named `str` to store the data
3997for tokens. This data is overwritten if another token is lexed that needs to
3998store data, so the parsers need to copy the data before calling the lexer again.
3999
4000Both lexers do some of the same things:
4001
4002* Lex identifiers into tokens, storing the identifier in `str`.
4003* Lex number strings into tokens, storing the string in `str`.
4004* Lex whitespace.
4005* Lex comments.
4006
4007Other than that, and some common plumbing, the lexers have separate code.
4008
4009#### `dc` Lexing
4010
4011The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205],
4012[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each,
4013the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which
4014just processes command-line arguments after they are parsed by
4015[`src/opt.c`][51].
4016
4017For most characters, the `dc` lexer is able to convert directly from the
4018character to its corresponding token. This happens using `dc_lex_tokens[]` in
4019[`src/data.c`][131].
4020
4021`dc`'s lexer also has to lex the register name after lexing tokens for commands
4022that need registers.
4023
4024And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of
4025the `dc` lexer that is more complex than the `bc` lexer. This is because `dc`
4026strings need to have a balanced number of brackets.
4027
4028#### `bc` Lexing
4029
4030The `bc` lexer is fairly simple. It does the following things:
4031
4032* Lexes `bc` strings.
4033* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords
4034  are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer
4035  allows the common identifier function to take over.
4036* Turns characters and groups of characters into `bc` operator tokens.
4037
4038### Parsing
4039
4040The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is
4041simple, while the `bc` parser is the most complex piece of code in the entire
4042codebase.
4043
4044However, they both do some of the same things.
4045
4046First, the parsers do *not* use [abstract syntax trees][207]; instead, they
4047directly generate the bytecode that will be executed by the `BcProgram` code.
4048Even in the case of `bc`, this heavily simplifies the parsing because the
4049[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish
4050Notation][108], which is basically directly executable.
4051
4052Second, any extra data that the `BcProgram` needs for execution is stored into
4053functions (see the [Functions][208] section). These include constants and
4054strings.
4055
4056#### `dc` Parsing
4057
4058The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness
4059of lexing and parsing [Reverse Polish notation][108] is probably why it was used
4060for `dc` when it was first created at Bell Labs.
4061
4062For most tokens, the `dc` parser is able to convert directly from the token
4063to its corresponding instruction. This happens using `dc_parse_insts[]` in
4064[`src/data.c`][131].
4065
4066`dc`'s parser also has to parse the register name for commands that need
4067registers. This is the most complex part of the `dc` parser; each different
4068register command needs to be parsed differently because most of them require two
4069or more instructions to execute properly.
4070
4071For example, storing in a register requires a swap instruction and an assignment
4072instruction.
4073
4074Another example are conditional execution instructions; they need to produce the
4075instruction for the condition, and then they must parse a possible "else" part,
4076which might not exist.
4077
4078##### Existing Commands
4079
4080`dc` is based on commands, which are usually one letter. The following table is
4081a table of which ASCII characters are already used:
4082
4083| Characters | Used? | For...                                     |
4084|------------|-------|--------------------------------------------|
4085| Space      | x     | Separator                                  |
4086| `!`        | x     | Conditional Execution of Registers         |
4087| `"`        | x     | Bounded Rand Operator                      |
4088| `#`        | x     | Comments                                   |
4089| `$`        | x     | Truncation                                 |
4090| `%`        | x     | Modulus                                    |
4091| `&`        |       |                                            |
4092| `'`        | x     | Rand Operator                              |
4093| `(`        | x     | Greater Than Operator                      |
4094| `)`        | x     | Less Than Operator                         |
4095| `*`        | x     | Multiplication                             |
4096| `+`        | x     | Addition                                   |
4097| `,`        | x     | Depth of Execution Stack                   |
4098| `-`        | x     | Subtraction                                |
4099| `.`        | x     | Numbers                                    |
4100| `/`        | x     | Division                                   |
4101| `0-9`      | x     | Numbers                                    |
4102| `:`        | x     | Store into Array                           |
4103| `;`        | x     | Load from Array                            |
4104| `<`        | x     | Conditional Execution of Registers         |
4105| `=`        | x     | Conditional Execution of Registers         |
4106| `>`        | x     | Conditional Execution of Registers         |
4107| `?`        | x     | Ask for User Input                         |
4108| `@`        | x     | Places Operator                            |
4109| `A-F`      | x     | Numbers                                    |
4110| `G`        | x     | Equal Operator                             |
4111| `H`        | x     | Shift Left                                 |
4112| `I`        | x     | Push `ibase` onto Stack                    |
4113| `J`        | x     | Push `seed` onto Stack                     |
4114| `K`        | x     | Push `scale` onto Stack                    |
4115| `L`        | x     | Pop off of Register                        |
4116| `M`        | x     | Boolean And Operator                       |
4117| `N`        | x     | Boolean Not Operator                       |
4118| `O`        | x     | Push `obase` onto Stack                    |
4119| `P`        | x     | Byte Stream Printing                       |
4120| `Q`        | x     | Quit Some Number of Macros                 |
4121| `R`        | x     | Pop Top of Stack                           |
4122| `S`        | x     | Push onto Register                         |
4123| `T`        | x     | Push Max `ibase` onto Stack                |
4124| `U`        | x     | Push Max `obase` onto Stack                |
4125| `V`        | x     | Push Max `scale` onto Stack                |
4126| `W`        | x     | Push Max of `'` Operator                   |
4127| `X`        | x     | Scale of a Number                          |
4128| `Y`        | x     | Length of Array                            |
4129| `Z`        | x     | Number of Significant Digits               |
4130| `[`        | x     | Strings                                    |
4131| `\\`       | x     | Escaping Brackets in Strings               |
4132| `]`        | x     | Strings                                    |
4133| `^`        | x     | Power                                      |
4134| `_`        | x     | Negative Numbers and Negation              |
4135| Backtick   |       |                                            |
4136| `a`        | x     | Asciify                                    |
4137| `b`        | x     | Absolute Value                             |
4138| `c`        | x     | Clear Stack                                |
4139| `d`        | x     | Duplication of Top of Stack                |
4140| `e`        | x     | Else in Conditional Execution of Registers |
4141| `f`        | x     | Printing the Stack                         |
4142| `g`        | x     | Global Settings                            |
4143| `h`        | x     | Shift Right                                |
4144| `i`        | x     | Set `ibase`                                |
4145| `j`        | x     | Set `seed`                                 |
4146| `k`        | x     | Set `scale`                                |
4147| `l`        | x     | Load from Register                         |
4148| `m`        | x     | Boolean Or Operator                        |
4149| `n`        | x     | Print and Pop                              |
4150| `o`        | x     | Set `obase`                                |
4151| `p`        | x     | Print with Newline                         |
4152| `q`        | x     | Quit Two Macros                            |
4153| `r`        | x     | Swap Top Two Items                         |
4154| `s`        | x     | Store into Register                        |
4155| `t`        | x     | Equivalent of `bc`'s `is_number()`         |
4156| `u`        | x     | Equivalent of `bc`'s `is_string()`         |
4157| `v`        | x     | Square Root                                |
4158| `w`        |       |                                            |
4159| `x`        | x     | Execute String                             |
4160| `y`        | x     | Current Depth of a Register                |
4161| `z`        | x     | Current Depth of Stack                     |
4162| `{`        | x     | Greater Than or Equal Operator             |
4163| `\|`       | x     | Moduler Exponentiation                     |
4164| `}`        | x     | Less Than or Equal Operator                |
4165| `~`        | x     | Division and Modulus Combined              |
4166
4167#### `bc` Parsing
4168
4169`bc`'s parser is, by far, the most sensitive piece of code in this software, and
4170there is a very big reason for that: `bc`'s standard is awful and defined a very
4171poor language.
4172
4173The standard says that either semicolons or newlines can end statements. Trying
4174to parse the end of a statement when it can either be a newline or a semicolon
4175is subtle. Doing it in the presence of control flow constructs that do not have
4176to use braces is even harder.
4177
4178And then comes the biggest complication of all: `bc` has to assume that it is
4179*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an
4180*interactive* utility.
4181
4182##### Flags
4183
4184All of this means that `bc` has to be able to partially parse something, store
4185enough data to recreate that state later, and return, making sure to not
4186execute anything in the meantime.
4187
4188*That* is what the flags in [`include/bc.h`][106] are: they are the state that
4189`bc` is saving for itself.
4190
4191It saves them in a stack, by the way, because it's possible to nest
4192structures, just like any other programming language. Thus, not only does it
4193have to store state, it needs to do it arbitrarily, and still be able to
4194come back to it.
4195
4196So `bc` stores its parser state with flags in a stack. Careful setting of these
4197flags, along with properly using them and maintaining the flag stack, are what
4198make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc`
4199parser is the single most subtle, fickle, and sensitive piece of code in the
4200entire codebase. Only one thing came close once: square root, and that was only
4201sensitive because I wrote it wrong. This parser is pretty good, and it is
4202*still* sensitive. And flags are the reason why.
4203
4204For more information about what individual flags there are, see the comments in
4205[`include/bc.h`][106].
4206
4207##### Labels
4208
4209`bc`'s language is Turing-complete. That means that code needs the ability to
4210jump around, specifically to implement control flow like `if` statements and
4211loops.
4212
4213`bc` handles this while parsing with what I called "labels."
4214
4215Labels are markers in the bytecode. They are stored in functions alongside the
4216bytecode, and they are just indices into the bytecode.
4217
4218When the `bc` parser creates a label, it pushes an index onto the labels array,
4219and the index of the label in that array is the index that will be inserted into
4220the bytecode.
4221
4222Then, when a jump happens, the index pulled out of the bytecode is used to index
4223the labels array, and the label (index) at the index is then used to set the
4224instruction pointer.
4225
4226##### Cond Labels
4227
4228"Cond" labels are so-called because they are used by conditionals.
4229
4230The key to them is that they come *before* the code that uses them. In other
4231words, when jumping to a condition, code is jumping *backwards*.
4232
4233This means that when a cond label is created, the value that should go there is
4234well-known. Cond labels are easy.
4235
4236However, they are still stored on a stack so that the parser knows what cond
4237label to use.
4238
4239##### Exit Labels
4240
4241Exit labels are not so easy.
4242
4243"Exit" labels are so-called because they are used by code "exiting" out of `if`
4244statements or loops.
4245
4246The key to them is that they come *after* the code that uses them. In other
4247words, when jumping to an exit, code is jumping *forwards*.
4248
4249But this means that when an exit label is created, the value that should go
4250there is *not* known. The code that needs it must be parsed and generated first.
4251
4252That means that exit labels are created with the index of `SIZE_MAX`, which is
4253then specifically checked for with an assert in `bc_program_exec()` before using
4254those indices.
4255
4256There should ***NEVER*** be a case when an exit label is not filled in properly
4257if the parser has no bugs. This is because every `if` statement, every loop,
4258must have an exit, so the exit must be set. If not, there is a bug.
4259
4260Exit labels are also stored on a stack so that the parser knows what exit label
4261to use.
4262
4263##### Expression Parsing
4264
4265`bc` has expressions like you might expect in a typical programming language.
4266This means [infix notation][107].
4267
4268One thing about infix notation is that you can't just generate code straight
4269from it like you can with [Reverse Polish notation][108]. It requires more work
4270to shape it into a form that works for execution on a stack machine.
4271
4272That extra work is called the [Shunting-Yard algorithm][109], and the form it
4273translates infix notation into is...[Reverse Polish notation][108].
4274
4275In order to understand the rest of this section, you must understand the
4276[Shunting-Yard algorithm][109]. Go do that before you read on.
4277
4278###### Operator Stack
4279
4280In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the
4281output and an explicit operator stack (the `ops` field in `BcParse`) as the
4282operator stack. It stores tokens from `BcLex`.
4283
4284However, there is one **HUGE** hangup: multiple expressions can stack. This
4285means that multiple expressions can be parsed at one time (think an array element
4286expression in the middle of a larger expression). Because of that, we need to
4287keep track of where the previous expression ended. That's what `start` parameter
4288to `bc_parse_operator()` is.
4289
4290Parsing multiple expressions on one operator stack only works because
4291expressions can only *stack*; this means that, if an expression begins before
4292another ends, it must *also* end before that other expression ends. This
4293property ensures that operators will never interfere with each other on the
4294operator stack.
4295
4296Also, the operator precedence is reversed: a lower precedence *value* means a
4297higher precedence.
4298
4299###### Recursion
4300
4301Because expressions can stack, parsing expressions actually requires recursion.
4302Well, it doesn't *require* it, but the code is much more readable that way.
4303
4304This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual
4305expression parsing function) calls can, in turn, call it.
4306
4307###### Expression Flags
4308
4309There is one more big thing: not all expressions in `bc` are equal.
4310
4311Some expressions have requirements that others don't have. For example, only
4312array arguments can be arrays (which are technically not expressions, but are
4313treated as such for parsing), and some operators (in POSIX) are not allowed in
4314certain places.
4315
4316For this reason, functions that are part of the expression parsing
4317infrastructure in `bc`'s parser usually take a `flags` argument. This is meant
4318to be passed to children, and somewhere, they will be checked to ensure that the
4319resulting expression meets its requirements.
4320
4321There are also places where the flags are changed. This is because the
4322requirements change.
4323
4324Maintaining the integrity of the requirements flag set is an important part of
4325the `bc` parser. However, they do not have to be stored on a stack because their
4326stack is implicit from the recursion that expression parsing uses.
4327
4328### Functions
4329
4330Functions, in `bc`, are data structures that contain the bytecode and data
4331produced by the parsers. Functions are what the `BcProgram` program executes.
4332
4333#### Main and Read Functions
4334
4335There are two functions that always exist, which I call the "main" and "read"
4336functions.
4337
4338The "main" function is the function in which any code and data outside other
4339functions is put. Basically, it is the function where the scripting code ends
4340up.
4341
4342The "read" function is the function that is reset and parsed every time a call
4343to the `read()` builtin function happens.
4344
4345#### `dc` Strings
4346
4347In `dc`, strings can be executed, and since there are no actual "functions" in
4348`dc`, strings are handled as functions. In fact, they are effectively translated
4349into functions by parsing.
4350
4351##### Tail Calls
4352
4353Since strings in `dc` are functions, and the fact that `dc` has no native loops,
4354such loops are implemented in `dc` code using strings with conditional execution
4355commands at the end of strings.
4356
4357When such conditional execution, or even unconditional execution, commands are
4358the very last commands in a string, then `dc` can perform a [tail call][202].
4359
4360This is done by recording the fact that a tail call happened, done by
4361incrementing an integer on a stack. When a string is executed *without* a tail
4362call, a new entry is pushed onto the stack with the integer `1`.
4363
4364When a string finally quits that followed tail calls, its stack entry is popped,
4365eliminating all of those tail calls.
4366
4367Why perform tail calls? Because otherwise, `dc` would be subject to the same
4368thing that plagues [functional programming languages][203]: stack overflow. In
4369`dc`'s case, that would manifest itself as a growing [heap][204], because the
4370execution stack is stored on the heap, until a fatal allocation failure would
4371occur.
4372
4373#### Execution
4374
4375Execution is handled by an interpreter implemented using `BcProgram` and code
4376in [`src/program.c`][53].
4377
4378The interpreter is a mix between a [stack machine][210] and a [register
4379machine][211]. It is a stack machine in that operations happen on a stack I call
4380the "results stack," but it is a register machine in that items on the stack can
4381be stored to and loaded from "registers" (`dc` terminology), variables (`bc`
4382terminology), and arrays.
4383
4384##### Stacks
4385
4386There are two stacks in the interpreter:
4387
4388* The "results" stack (as mentioned above).
4389* The "execution" stack.
4390
4391The results stack (the `results` field of the `BcProgram` struct) is the stack
4392where the results of computations are stored. It is what makes the interpreter
4393part [stack machine][210]. It is filled with `BcResult`'s.
4394
4395The execution stack (the `stack` field of the `BcProgram` struct) is the stack
4396that tracks the current execution state of the interpreter. It is the presence
4397of this separate stack that allows the interpreter to implement the machine as a
4398loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the
4399"instruction pointers."
4400
4401These instruction pointers have three fields, all integers:
4402
4403* `func`, the index of the function that is currently executing.
4404* `idx`, the index of the next bytecode instruction to execute in the function's
4405  bytecode array.
4406* `len`, which is the length of the results stack when the function started
4407  executing. This is not used by `dc`, but it used by `bc` because functions
4408  in `bc` should never affect the results stack of their callers.
4409
4410With these three fields, and always executing using the instruction pointer at
4411the top of the execution stack, the interpreter can always keep track of its
4412execution.
4413
4414When a function or a string starts executing, a new `BcInstPtr` is pushed onto
4415the execution stack for it. This includes if a function was called recursively.
4416And then, when the function or string returns, its `BcInstPtr` is popped off of
4417the execution stack.
4418
4419##### Bytecode
4420
4421Execution of functions are done through bytecode produced directly by the
4422parsers (see the [Parsing][209]). This bytecode is stored in the `code`
4423[vector][111] of the `BcFunc` struct.
4424
4425This is a vector for two reasons:
4426
4427* It makes it easier to add bytecode to the vector in the parsers.
4428* `bc` allows users to redefine functions.
4429
4430The reason I can use bytecode is because there are less than 256 instructions,
4431so an `unsigned char` can store all the bytecodes.
4432
4433###### Bytecode Indices
4434
4435There is one other factor to bytecode: there are instructions that need to
4436reference strings, constants, variables, or arrays. Bytecode need some way to
4437reference those things.
4438
4439Fortunately, all of those things can be referenced in the same way: with indices
4440because all of the items are in vectors.
4441
4442So `bc` has a way of encoding an index into bytecode. It does this by, after
4443pushing the instruction that references anything, pushing a byte set to the
4444length of the index in bytes, then the bytes of the index are pushed in
4445little-endian order.
4446
4447Then, when the interpreter encounters an instruction that needs one or more
4448items, it decodes the index or indices there and updates the `idx` field of the
4449current `BcInstPtr` to point to the byte after the index or indices.
4450
4451One more thing: the encoder of the indices only pushes as many bytes as
4452necessary to encode the index. It stops pushing when the index has no more bytes
4453with any 1 bits.
4454
4455##### Variables
4456
4457In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of
4458numbers; it is a vector of vector of numbers. The first vector is the vector of
4459variables, the second is the variable stack, and the last level is the actual
4460number.
4461
4462This is because both `bc` and `dc` need variables to be stacks.
4463
4464For `dc`, registers are *defined* to be stacks.
4465
4466For `bc`, variables as stacks is how function arguments/parameters and function
4467`auto` variables are implemented.
4468
4469When a function is called, and a value needs to be used as a function argument,
4470a copy of the value is pushed onto the stack corresponding to the variable with
4471the same name as the function's parameter. For `auto` variables, a new number
4472set to zero is pushed onto each stack corresponding to the `auto` variables.
4473(Zero is used because the [`bc` spec][2] requires that `auto` variables are set
4474to zero.)
4475
4476It is in this way that the old value of the variable, which may not even be
4477related to the function parameter or `auto` variable, is preserved while the
4478variable is used as a function parameter or `auto` variable.
4479
4480When the function returns, of course, the stacks of the variables for the
4481parameters and `auto`'s will have their top item popped, restoring the old value
4482as it was before the function call.
4483
4484##### Arrays
4485
4486Like variables, arrays are also implemented as stacks. However, because they are
4487arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector
4488of vectors of vectors of numbers. The first of the two levels is the vector of
4489arrays, the second the stack of for each array, the third the actual array, and
4490last the numbers in the array.
4491
4492`dc` has no need of this extra stack, but `bc` does because arrays can be
4493function parameters themselves.
4494
4495When arrays are used for function arguments, they are copied with a deep copy;
4496each item of the source vector is copied. This is because in `bc`, according to
4497the [`bc` spec][2], all function arguments are passed by value.
4498
4499However, array references are possible (see below).
4500
4501When arrays are used as `auto`'s, a new vector is pushed with one element; if
4502more elements are needed, the array is grown automatically, and new elements are
4503given the value of zero.
4504
4505In fact, if *any* array is accessed and does not have an element at that index,
4506the array is automatically grown to that size, and all new elements are given
4507the value zero. This behavior is guaranteed by the [`bc` spec][2].
4508
4509###### Array References
4510
4511Array references had to be implemented as vectors themselves because they must
4512be pushed on the vectors stacks, which, as seen above, expect vectors
4513themselves.
4514
4515So thus, references are implemented as vectors on the vector stacks. These
4516vectors are not vectors of vectors themselves; they are vectors of bytes; in
4517fact, the fact that they are byte vectors and not vector vectors is how a
4518reference vector is detected.
4519
4520These reference vectors always have the same two things pushed: a byte encoding
4521(the same way bytecode indices are) of the referenced vector's index in the
4522`arrs` vector, and a byte encoding of the referenced vectors index in the vector
4523stack.
4524
4525If an item in a referenced vector is needed, then the reference is dereferenced,
4526and the item is returned.
4527
4528If a reference vector is passed to a function that does *not* expect a
4529reference, the vector is dereferenced and a deep copy is done, in the same way
4530as vectors are copied for normal array function parameters.
4531
4532### Callbacks
4533
4534There are many places in `bc` and `dc` where function pointers are used:
4535
4536* To implement destructors in vectors. (See the [Vectors][111] section.)
4537* To select the correct lex and parse functions for `bc` and `dc`.
4538* To select the correct function to execute unary operators.
4539* To select the correct function to execute binary operators.
4540* To calculate the correct number size for binary operators.
4541* To print a "digit" of a number.
4542* To seed the pseudo-random number generator.
4543
4544And there might be more.
4545
4546In every case, they are used for reducing the amount of code. Instead of
4547`if`/`else` chains, such as:
4548
4549```
4550if (BC_IS_BC) {
4551	bc_parse_parse(vm.parse);
4552}
4553else {
4554	dc_parse_parse(vm.parse);
4555}
4556```
4557
4558The best example of this is `bc_num_binary()`. It is called by every binary
4559operator. It figures out if it needs to allocate space for a new `BcNum`. If so,
4560it allocates the space and then calls the function pointer to the *true*
4561operation.
4562
4563Doing it like that shrunk the code *immensely*. First, instead of every single
4564binary operator duplicating the allocation code, it only exists in one place.
4565Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a
4566`switch` statement.
4567
4568But perhaps the most important use was for destructors in vectors.
4569
4570Most of the data structures in `bc` are stored in vectors. If I hadn't made
4571destructors available for vectors, then ensuring that `bc` had no memory leaks
4572would have been nigh impossible. As it is, I check `bc` for memory leaks every
4573release when I change the code, and I have not released `bc` after version
4574`1.0.0` with any memory leaks, as far as I can remember anyway.
4575
4576### Numbers
4577
4578In order to do arbitrary-precision math, as `bc` must do, there must be some way
4579of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184]
4580is `bc`'s way of doing that.
4581
4582(Note: the word ["limb"][214] is used below; it has a specific meaning when
4583applied to arbitrary-precision numbers. It means one piece of the number. It can
4584have a single digit, which is what GNU `bc` does, or it can have multiple, which
4585is what this `bc` does.)
4586
4587This struct needs to store several things:
4588
4589* The array of limbs of the number. This is the `num` field.
4590* The location of the decimal point. This is the `rdx` (short for [radix][215])
4591  field.
4592* The number of limbs the number has. This is the `len` field.
4593* Whether the number is negative or not. This is the least significant bit of
4594  the `rdx` field. More on that later.
4595
4596In addition, `bc`'s number stores the capacity of the limb array; this is the
4597`cap` field.
4598
4599If the number needs to grow, and the capacity of the number is big enough, the
4600number is not reallocated; the number of limbs is just added to.
4601
4602There is one additional wrinkle: to make the usual operations (binary operators)
4603fast, the decimal point is *not* allowed to be in the middle of a limb; it must
4604always be between limbs, after all limbs (integer), or before all limbs (real
4605between -1 and 1).
4606
4607The reason for this is because addition, subtraction, multiplication, and
4608division expect digits to be lined up on the decimal point. By requiring that it
4609be between limbs, no extra alignment is needed, and those operations can proceed
4610without extra overhead.
4611
4612This does make some operations, most notably extending, truncating, and
4613shifting, more expensive, but the overhead is constant, and these operations are
4614usually cheap compared to the binary operators anyway.
4615
4616This also requires something else: `bc` numbers need to know *exactly* how many
4617decimal places they have after the decimal point. If the decimal point must be
4618inbetween limbs, the last decimal place could be in the middle of a limb. The
4619amount of decimal places in a number is carefully tracked and stored in the
4620`scale` field, and this number must always coincide with the `rdx` field by the
4621following formula:
4622
4623```
4624scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1
4625```
4626
4627(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on
462864-bit systems and 4 on other systems.)
4629
4630Yes, `rdx` is shifted; that is because the negative bit is stored in the least
4631significant bit of the `rdx` field, and the actual radix (amount of limbs after
4632the decimal/radix point) is stored in the rest of the bits. This is safe because
4633`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least
46342 bits less than `scale`.
4635
4636In addition to `rdx` always matching `scale`, another invariant is that `rdx`
4637must always be less than or equal to `len`. (Because `scale` may be greater than
4638`rdx`, `scale` does not have to be less than or equal to `len`.)
4639
4640Another invariant is that `len` must always be less than or equal to `cap`, for
4641obvious reasons.
4642
4643The last thing programmers need to know is that the limb array is stored in
4644little-endian order. This means that the last decimal places are in the limb
4645stored at index 0, and the most significant digits are stored at index `len-1`.
4646
4647This is done to make the most important operations fast. Addition and
4648subtraction are done from least significant to most significant limbs, which
4649means they can speed through memory in the way most computers are best at.
4650Multiplication does the same, sort of, and with division, it matters less.
4651Comparison does need to go backwards, but that's after exhausting all other
4652alternatives, including for example, checking the length of the integer portion
4653of each limb array.
4654
4655Finally, here are some possible special situations with numbers and what they
4656mean:
4657
4658* `len == 0`: the number equals 0.
4659* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value.
4660  This is the only case where `scale` does not have to coincide with `rdx`
4661  This can happen with division, for example, that sets a specific `scale` for
4662  the result value but may produce 0.
4663* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or
4664  equal to -1.
4665* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not
4666  including 0, although this will be true for 0 as well. However, 0 is always
4667  assumed to be represented by `len == 0`.
4668* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also
4669  equal 0.
4670
4671#### Math Style
4672
4673When I wrote the math for `bc`, I adopted a certain style that, if known, will
4674make it easier to understand the code. The style follows these rules:
4675
4676* `BcNum` arguments always come before arguments of other types.
4677* Among the `BcNum` arguments, the operands always come first, and the `BcNum`
4678  where the result(s) will be stored come last.
4679* Error checking is placed first in the function.
4680* Easy cases are placed next.
4681* Preparation, such as allocating temporaries, comes next.
4682* The actual math.
4683* Cleanup and ensuring invariants.
4684
4685While these rules are not hard and fast, using them as a guide will probably
4686help.
4687
4688#### Digit Clamping
4689
4690There is a question of what to do when parsing numbers and coming across digits
4691that are greater than or equal to the current `ibase`. `bc` and `dc` do one of
4692two things:
4693
4694* Clamp the digit to the maximum correct digit, or
4695* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's
4696  position (starting from 0 at the least significant digit).
4697
4698The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the
4699OpenBSD `bc`, and the OpenBSD `dc` do.
4700
4701It is possible to switch between the two with the `-c` and `-C` command-line
4702options. However, it is important for developers to understand that both
4703behaviors exist and should exist.
4704
4705The library can also do both, and it is set in a global for each thread.
4706
4707### Strings as Numbers
4708
4709Strings can be assigned to variables. This is a problem because the vectors for
4710variable stacks expect `BcNum` structs only.
4711
4712While I could have made a union, I decided that the complexity of adding an
4713entirely new type, with destructor and everything, was not worth it. Instead, I
4714took advantage of the fact that `free()`, when passed a `NULL` pointer, will do
4715nothing.
4716
4717Using that, I made it so `BcNum`'s could store strings instead. This is marked
4718by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should
4719*never* happen with a real number, though the other fields could be 0).
4720
4721The `BcNum` stores the function that stores the string in the `rdx` field, and
4722it stores the index of the string in the `scale` field. This is used to actually
4723load the string if necessary.
4724
4725Note that historically, string information was stored in the `loc` field of
4726the `d` union in a `BcResult`. This was changed recently to standardize; now,
4727all string information are stored in the `n` field of the `d` union regardless.
4728This means that all string information is stored in `BcNum`'s. This removes
4729extra cases.
4730
4731Also, if a temp is made with a string, then the result type should still be
4732`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type
4733checks.
4734
4735### Pseudo-Random Number Generator
4736
4737In order to understand this section, I suggest you read the information in the
4738manpages about the pseudo-random number generator (PRNG) first; that will help
4739you understand the guarantees it has, which is important because this section
4740delves into implementation details.
4741
4742First, the PRNG I use is seeded; this is because most OS's have an excellent
4743cryptographically secure PRNG available via command-line, usually
4744`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s
4745`$RANDOM`, which is essentially a wrapper around C's `rand()`.
4746
4747`rand()` is...bad. It is only guaranteed to return 15 bits of random data.
4748Obviously, getting good random data out of that would be hard with that alone,
4749but implementations also seem to be poor.
4750
4751On top of that, `bc` is an arbitrary-precision calculator; if I made it able to
4752generate random numbers, I could make it generate random numbers of any size,
4753and since it would be seeded, results would be reproducible, when wanted.
4754
4755So to get that, I needed a seeded PRNG with good characteristics. After scouring
4756the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog
4757post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who
4758hates on PCG and its author, but also because PCG seemed to do better when
4759tested by independent parties.
4760
4761After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for
4762the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author,
4763likens the increment to selecting a codebook.)
4764
4765I could, of course, put the entire 255 bits into one massive arbitrary-precision
4766number; `bc` is good at that, after all. But that didn't sit right with me
4767because it would mean any seed selected by users would have the real portion
4768ignored, which is stupid in a program like `bc`.
4769
4770Instead, I decided to make the integer portion the increment (clamped down to
4771size), and the real portion the seed.
4772
4773In most cases, this would be a bad idea because you cannot, in general, know how
4774many decimal places you need to represent any number with `n` real digits in
4775base `b` in another base. However, there is an easy to how many decimal digits
4776after the decimal point it takes to represent reals of base 2 in base 10: the
4777power of two.
4778
4779It turns out that, for base 2 represented in base 10, the power of 2 is
4780*exactly* how many digits are necessary to represent *any* number `n/2^p`, where
4781`p` is the power of 2. This is because at every halving, the number of decimal
4782places increases by 1:
4783
4784```
47850.5
47860.25
47870.125
47880.0625
47890.03125
47900.015625
4791...
4792```
4793
4794This happens because because every decimal representation of `1/2^p` ends with a
47955 because 5 is half of 10. Because 5 is odd, half of it always ends with the
4796digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal
4797place further right.
4798
4799So the algorithm to convert all 255 bits of the seed is as follows:
4800
48011.	Convert the increment to a `BcNum`.
48022.	Convert the seed to a `BcNum`.
48033.	Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems,
4804	substitute 64 bits for 128.)
48054.	Add the two numbers together.
4806
4807Likewise, the algorithm to convert from a user-supplied number to a seed is:
4808
48091.	Truncate a copy of the number.
48102.	Subtract the result from #1 from the original number. This gives the real
4811	portion of the number.
48123.	Clamp the result of #1 to 127 (or 63) bits. This is the increment.
48134.	Multiply the result of #2 by `2^128`.
48145.	Truncate the result of #4. This is the seed.
4815
4816#### Generating Arbitrary-Precision Numbers
4817
4818I wrote a function (`bc_rand_bounded()`) that will return unbiased results with
4819any bound below the max that PCG can generate.
4820
4821To generate an integer of arbitrary size using a bound, `bc` simply uses
4822`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as
4823many limbs as needed to satisfy the bigger bound.
4824
4825To generate numbers with arbitrary precision after the decimal point, `bc`
4826merely generates an arbitrary precision integer with the bound `10^p`, where `p`
4827is the desired number of decimal places, then divides in by `10^p` with a
4828`scale` of `p`.
4829
4830## Debug Code
4831
4832Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69],
4833programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable
4834the inclusion of *a lot* of extra code to assist with debugging.
4835
4836For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the
4837[`include/`][212] directory and in the [`src/`][213] directory.
4838
4839Yes, all of the code is guarded by `#if` preprocessor statements; this is
4840because the code should *never* be in a release build, and by making programmers
4841add this manually (not even an option to [`configure.sh`][69]), it is easier to
4842ensure that never happens.
4843
4844However, that said, the extra debug code is useful; that was why I kept it in.
4845
4846## Performance
4847
4848While I have put in a lot of effort to make `bc` as fast as possible, there
4849might be some things users can do to speed it up without changing the code.
4850
4851First, users can probably use [profile-guided optimization][217] to optimize
4852even better, using the test suite to profile.
4853
4854Second, I included macros that might help branch placement and prediction:
4855
4856* `BC_ERR(e)`
4857* `BC_UNLIKELY(e)`
4858* `BC_NO_ERR(e)`
4859* `BC_LIKELY(e)`
4860
4861`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as
4862`BC_LIKELY`; I just added them to also document branches that lead to error
4863conditions or *away* from error conditions.
4864
4865Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation,
4866they expand to nothing but the argument they were given.
4867
4868They can, however, be defined to `__builtin_expect((e), 1)` and
4869`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch
4870prediction and placement. (For more information about `__builtin_expect()` see
4871the [GCC documentation][218].)
4872
4873There might be other compilers that can take advantage of that, but I don't know
4874anything about that.
4875
4876Also, as stated in the [build manual][219], link-time optimization is excellent
4877at optimizing this `bc`. Use it.
4878
4879### Benchmarks
4880
4881To help programmers improve performance, I have built and assembled
4882infrastructure to make benchmarking easy.
4883
4884First, in order to easily run benchmarks, I created
4885[`scripts/benchmark.sh`][220].
4886
4887Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it
4888easier to judge whether the results are significant or not.
4889
4890Third, I made the `make` clean target `make clean_benchmarks`, to clean
4891`scripts/ministat` and the generated benchmark files.
4892
4893Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory
4894data in a format that is easy for `scripts/ministat` to digest.
4895
4896To add a benchmark, add a script in the right directory to generate the
4897benchmark. Yes, generate.
4898
4899All of the benchmarks are generated first, from `.bc` and `.dc` files in the
4900[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive
4901amounts of data can be generated and then pushed through the calculators.
4902
4903If you need to benchmark `bc` or `dc` with simple loops, have the generator
4904files simply print the loop code.
4905
4906### Caching of Numbers
4907
4908In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s
4909that have the default capacity (`BC_NUM_DEF_SIZE`).
4910
4911It does this by allowing `bc_num_free()` to put the limb array onto a
4912statically-allocated stack (it's just a global array with a set size). Then,
4913when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any
4914are available. If the answer is yes, the one on top of the stack is returned.
4915Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()`
4916a new limb array.
4917
4918When the stack is filled, any numbers that `bc` attempts to put on it are just
4919freed.
4920
4921This setup saved a few percent in my testing for version [3.0.0][32], which is
4922when I added it.
4923
4924## `bcl`
4925
4926At the request of one of my biggest users, I spent the time to make a build mode
4927where the number and math code of `bc` could be wrapped into a library, which I
4928called `bcl`.
4929
4930This mode is exclusive; `bc` and `dc` themselves are *not* built when building
4931`bcl`.
4932
4933The only things in the `bc` math code that is not included is:
4934
4935* Printing newlines (clients do not care about `bc`'s line lenth restriction).
4936* `dc`'s stream print.
4937
4938Even the [pseudo-random number generator][179] is included, with extra support
4939for generating real numbers with it. (In `bc`, such support is in
4940[`lib2.bc`][26].)
4941
4942### Signal Handling
4943
4944`bcl` does not have the capability for signal handling (as of version `6.0.0`).
4945
4946### Threads
4947
4948It is possible to use `bcl` across multiple threads. However, `bcl` must be
4949initialized on each thread where it will be used.
4950
4951This is because `bcl` uses thread-specific data to store the "globals" for each
4952thread. This is to ensure that threads do not stomp on each other's numbers or
4953other data structures.
4954
4955### Contexts
4956
4957Contexts were an idea by the same user that requested `bcl`. They are meant to
4958make it so multiple clients in one program can keep their data separate from
4959each other.
4960
4961### Numbers
4962
4963Numbers in `bcl` are literally indices into an encapsulated array of numbers,
4964hidden in the context. These indices are then passed to clients to refer to
4965numbers later.
4966
4967### Operand Consumption
4968
4969Most math functions in `bcl` "consume" their operand arguments; the arguments
4970are freed, whether or not an error is returned.
4971
4972This is to make it easy to implement math code, like this:
4973
4974```
4975n = bcl_add(bcl_mul(a, b), bcl_div(c, d));
4976```
4977
4978If numbers need to be preserved, they can be with `bcl_dup()`:
4979
4980```
4981n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d)));
4982```
4983
4984### Errors
4985
4986Errors can be encoded in the indices representing numbers, and where necessary,
4987clients are responsible for checking those errors.
4988
4989The encoding of errors is this: if an error happens, the value `0-error` is
4990returned. To decode, do the exact same thing. Thus, any index above
4991`0-num_errors` is an error.
4992
4993If an index that represents an error is passed to a math function, that function
4994propagates the error to its result and does not perform the math operation.
4995
4996All of this is to, once again, make it easy to implement the math code as above.
4997
4998However, where possible, errors are returned directly.
4999
5000[1]: https://en.wikipedia.org/wiki/Bus_factor
5001[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top
5002[3]: https://en.wikipedia.org/wiki/Dc_(Unix)
5003[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5004[5]: ./bc/A.1.md#standard-library
5005[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
5006[7]: ./bc/A.1.md#extended-library
5007[8]: #libbc-2
5008[9]: #strgensh
5009[10]: https://vimeo.com/230142234
5010[11]: https://gavinhoward.com/2019/12/values-for-yao/
5011[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
5012[13]: ./build.md#cross-compiling
5013[14]: ./build.md
5014[15]: #strgenc
5015[16]: http://landley.net/toybox/about.html
5016[17]: https://www.busybox.net/
5017[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
5018[19]: https://clang-analyzer.llvm.org/scan-build.html
5019[20]: https://www.valgrind.org/
5020[21]: https://clang.llvm.org/docs/AddressSanitizer.html
5021[22]: https://gavinhoward.com/2019/11/finishing-software/
5022[23]: #history
5023[24]: https://clang.llvm.org/docs/ClangFormat.html
5024[25]: ./algorithms.md
5025[26]: #lib2bc
5026[27]: #vmh
5027[28]: https://github.com/rain-1/linenoise-mob
5028[29]: https://github.com/antirez/linenoise
5029[30]: #bclh
5030[31]: #argsh
5031[32]: ../NEWS.md#3-0-0
5032[33]: ../NEWS.md
5033[34]: https://github.com/skeeto/optparse
5034[35]: #opth
5035[36]: #historyh
5036[37]: #randh
5037[38]: #langh
5038[39]: #numc
5039[40]: #bcc
5040[41]: #bc_lexc
5041[42]: #bc_parsec
5042[43]: #libraryc
5043[44]: #dcc
5044[45]: #dc_lexc
5045[46]: #dc_parsec
5046[47]: #filec
5047[48]: #historyc
5048[49]: #langc
5049[50]: #lexc
5050[51]: #optc
5051[52]: #parsec
5052[53]: #programc
5053[54]: #randc
5054[55]: #fileh
5055[56]: #readc
5056[57]: #programh
5057[58]: #vmc
5058[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top
5059[60]: #manpagesh
5060[61]: #bcl3md
5061[62]: #bcl3
5062[63]: #bclvcxproj
5063[64]: #bclvcxprojfilters
5064[65]: #bclsln
5065[66]: #bcvcxproj
5066[67]: #bcvcxprojfilters
5067[68]: #bcsln
5068[69]: #configuresh
5069[70]: #makefilein
5070[71]: #functionsh
5071[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top
5072[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18
5073[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top
5074[75]: #versionh
5075[76]: ##posix-shell-scripts
5076[77]: #tests
5077[78]: #karatsubapy
5078[79]: #bc-1
5079[80]: #dc-1
5080[81]: ./build.md#build-type
5081[82]: #fuzzing-1
5082[83]: #releasesh
5083[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
5084[85]: #locales-1
5085[86]: #manuals-1
5086[87]: #locale_installsh
5087[88]: #locale_uninstallsh
5088[89]: #bc1mdin
5089[90]: #dc1mdin
5090[91]: #bc
5091[92]: https://pandoc.org/
5092[93]: #release_settingstxt
5093[94]: #aflpy
5094[95]: #randmathpy
5095[96]: #caching-of-numbers
5096[97]: #error-handling
5097[98]: #radamsatxt
5098[99]: https://gitlab.com/akihe/radamsa
5099[100]: #radamsash
5100[101]: https://musl.libc.org/
5101[102]: ./build.md#settings
5102[103]: #test_settingstxt
5103[104]: #test_settingssh
5104[105]: #functionssh
5105[106]: #bch
5106[107]: https://en.wikipedia.org/wiki/Infix_notation
5107[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5108[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
5109[110]: #bc-parsing
5110[111]: #vectors
5111[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h
5112[113]: https://en.wikipedia.org/wiki/Signal_(IPC)
5113[114]: #custom-io
5114[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
5115[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html
5116[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html
5117[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw
5118[119]: #fuzz_prepsh
5119[120]: #bc_aflyaml
5120[121]: #bc_afl_continueyaml
5121[122]: https://github.com/tmux/tmux
5122[123]: https://tmuxp.git-pull.com/
5123[124]: #test-suite
5124[125]: https://aflplus.plus/
5125[126]: #link-time-optimization
5126[127]: #fuzzing-performance
5127[128]: #radamsa
5128[129]: #afl-quickstart
5129[130]: #convenience
5130[131]: #datac
5131[132]: https://git.gavinhoward.com/gavin/vim-bc
5132[133]: https://git.gavinhoward.com/gavin/bc_libs
5133[134]: #debugging
5134[135]: #asserts
5135[136]: #portability
5136[137]: https://pexpect.readthedocs.io/en/stable/
5137[138]: #test-suite-portability
5138[139]: #historypy
5139[140]: #historysh
5140[141]: #group-tests
5141[142]: #build-system
5142[143]: #generated-tests
5143[144]: #benchmarks-1
5144[145]: #gen
5145[146]: #test-coverage
5146[147]: #integration-with-the-build-system
5147[148]: #test-scripts
5148[149]: #standard-tests
5149[150]: #script-tests
5150[151]: #error-tests
5151[152]: #stdin-tests
5152[153]: #read-tests
5153[154]: #other-tests
5154[155]: #history-tests
5155[156]: #bcl
5156[157]: #bcl-test
5157[158]: #bclc
5158[159]: #valgrind
5159[160]: #addresssanitizer-and-friends
5160[161]: #bc-2
5161[162]: #dc-2
5162[163]: #alltxt-1
5163[164]: #errorstxt
5164[165]: #posix_errorstxt
5165[166]: #timeconstsh
5166[167]: #alltxt-3
5167[168]: #errorstxt-1
5168[169]: #scripts-1
5169[170]: #scripts-2
5170[171]: #alltxt-2
5171[172]: #alltxt-4
5172[173]: #async-signal-safe-signal-handling
5173[174]: #vectorh
5174[175]: #read_errorstxt
5175[176]: #statush
5176[177]: #numbers
5177[178]: #math-style
5178[179]: #pseudo-random-number-generator
5179[180]: #lexh
5180[181]: #parseh
5181[182]: #dch
5182[183]: #libraryh
5183[184]: #numh
5184[185]: #readh
5185[186]: #maps
5186[187]: #slabs-and-slab-vectors
5187[188]: ./build.md#extra-math
5188[189]: #command-line-history
5189[190]: #scriptsed
5190[191]: #linux-timeconstbc-script
5191[192]: #corpuses
5192[193]: ./build.md#history
5193[194]: https://www.valgrind.org/docs/manual/mc-manual.html
5194[195]: #othersh
5195[196]: https://scan.coverity.com/
5196[197]: https://clang-analyzer.llvm.org/
5197[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it
5198[199]: https://cr.yp.to/docs/selfpipe.html
5199[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph
5200[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator
5201[202]: https://en.wikipedia.org/wiki/Tail_call
5202[203]: https://en.wikipedia.org/wiki/Functional_programming_language
5203[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
5204[205]: #mainc
5205[206]: #argc
5206[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
5207[208]: #functions
5208[209]: #parsing
5209[210]: https://en.wikipedia.org/wiki/Stack_machine
5210[211]: https://en.wikipedia.org/wiki/Register_machine
5211[212]: #include
5212[213]: #src
5213[214]: https://gmplib.org/manual/Nomenclature-and-Types
5214[215]: https://en.wikipedia.org/wiki/Radix_point
5215[216]: #main-and-read-functions
5216[215]: https://www.pcg-random.org/
5217[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/
5218[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization
5219[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
5220[219]: ./build.md#optimization
5221[220]: #benchmarksh
5222[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c
5223[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html
5224[223]: #ministatc
5225[224]: #dc
5226[225]: #allsh
5227[226]: #errorssh
5228[227]: #errorsh
5229[228]: #vectorc
5230