%           Copyright Matthew Pulver 2018 - 2019.
% Distributed under the Boost Software License, Version 1.0.
%     (See accompanying file LICENSE_1_0.txt or copy at
%           https://www.boost.org/LICENSE_1_0.txt)

\documentclass{article}
\usepackage{amsmath} %\usepackage{mathtools}
\usepackage{amssymb} %\mathbb
\usepackage{array} % m{} column in tabular
\usepackage{csquotes} % displayquote
\usepackage{fancyhdr}
\usepackage{fancyvrb}
\usepackage[margin=0.75in]{geometry}
\usepackage{graphicx}
\usepackage{hyperref}
%\usepackage{listings}
\usepackage{multirow}
\usepackage[super]{nth}
\usepackage{wrapfig}
\usepackage{xcolor}

\hypersetup{%
  colorlinks=false,% hyperlink text will be black
  linkbordercolor=blue,% hyperlink borders will be blue
  urlbordercolor=blue,%
  pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
}

\pagestyle{fancyplain}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}
\cfoot[]{\thepage\\
\scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
Distributed under the Boost Software License, Version 1.0.\\
(See accompanying file LICENSE\_1\_0.txt or copy at
\url{https://www.boost.org/LICENSE\_1\_0.txt})}

\DeclareMathOperator{\sinc}{sinc}

\begin{document}

\title{Autodiff\\
\large Automatic Differentiation C++ Library}
\author{Matthew Pulver}
\maketitle

%\date{}

%\begin{abstract}
%\end{abstract}

\tableofcontents

%\section{Synopsis}
%\begingroup
%\fontsize{10pt}{10pt}\selectfont
%\begin{verbatim}
% example/synopsis.cpp
%\end{verbatim}
%\endgroup

\newpage

\section{Description}

Autodiff is a header-only C++ library that facilitates the
\href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
mathematical functions of single and multiple variables.

This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion of
an analytic function $f$ at the point $x_0$:

\begin{align*}
f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
  &= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
\end{align*}
The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm
to compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
$f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
its respective order, divided by the factorial of the order.

In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include powers
of $\varepsilon$ greater than $N$ can be discarded. (This is due to the fact that each term in a polynomial depends
only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$ provides a
polynomial-to-polynomial transformation:

\[
f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
    \sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
\]
C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
(\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$. Thus
the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when
written to accept and return variables of a generic (template) type, is also used to calculate the polynomial
$\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the
product of the respective factorial $n!$ and coefficient $y_n$:

\[ \frac{d^nf}{dx^n}(x_0)=n!\,y_n. \]

\section{Examples}

\subsection{Example 1: Single-variable derivatives}

\subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}

In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. The {\tt Order=5}
means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during the ensuing
computation.

Internally, this is modeled by a {\tt std::array<double,6>} whose elements {\tt \{2, 1, 0, 0, 0, 0\}} correspond
to the 6 coefficients of the polynomial upon initialization. Its fourth power, at the end of the computation, is
a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The derivatives are obtained using the formula
$f^{(n)}(2)=n!\cdot{\tt y[n]}$.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;  // retval in operator*() uses x4's memory via NRVO.
  x4 *= x4;      // No copies of x4 are made within operator*=() even when squaring.
  return x4;     // x4 uses y's memory in main() via NRVO.
}

int main() {
  using namespace boost::math::differentiation;

  constexpr unsigned Order = 5;                  // Highest order derivative to be calculated.
  auto const x = make_fvar<double, Order>(2.0);  // Find derivatives at x=2.
  auto const y = fourth_power(x);
  for (unsigned i = 0; i <= Order; ++i)
    std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;
  return 0;
}
/*
Output:
y.derivative(0) = 16
y.derivative(1) = 32
y.derivative(2) = 48
y.derivative(3) = 48
y.derivative(4) = 24
y.derivative(5) = 0
*/
\end{verbatim}
The above calculates

\begin{alignat*}{3}
{\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
{\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
{\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
{\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
{\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
{\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
\end{alignat*}

\subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}
\subsubsection{Calculate $\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
with a precision of about 50 decimal digits,\\
where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}

In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
independent {\tt fvar} variables, with values of 11, 12, 13, and 14, for which the maximum orders of derivatives
to be calculated are 3, 2, 4, and 3, respectively. The order of the variables is important, as it is the same
order used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

using namespace boost::math::differentiation;

template <typename W, typename X, typename Y, typename Z>
promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
  using namespace std;
  return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
}

int main() {
  using float50 = boost::multiprecision::cpp_bin_float_50;

  constexpr unsigned Nw = 3;  // Max order of derivative to calculate for w
  constexpr unsigned Nx = 2;  // Max order of derivative to calculate for x
  constexpr unsigned Ny = 4;  // Max order of derivative to calculate for y
  constexpr unsigned Nz = 3;  // Max order of derivative to calculate for z
  // Declare 4 independent variables together into a std::tuple.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);  // Up to Nw derivatives at w=11
  auto const& x = std::get<1>(variables);  // Up to Nx derivatives at x=12
  auto const& y = std::get<2>(variables);  // Up to Ny derivatives at y=13
  auto const& z = std::get<3>(variables);  // Up to Nz derivatives at z=14
  auto const v = f(w, x, y, z);
  // Calculated from Mathematica symbolic differentiation.
  float50 const answer("1976.319600747797717779881875290418720908121189218755");
  std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
            << "mathematica   : " << answer << '\n'
            << "autodiff      : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
            << std::setprecision(3)
            << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
  return 0;
}
/*
Output:
mathematica   : 1976.3196007477977177798818752904187209081211892188
autodiff      : 1976.3196007477977177798818752904187209081211892188
relative error: 2.67e-50
*/
\end{verbatim}

\subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}
\subsubsection{Calculate Greeks directly from the Black-Scholes pricing function.}

Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
(sigma), time to expiration (tau) and interest rate are template parameters. This means that any Greek based on
these 4 variables can be calculated using autodiff. The example below calculates delta and gamma, where the variable
of differentiation is only the price. For examples of more exotic Greeks, see {\tt example/black\_scholes.cpp}.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

// Equations and function/variable names are from
// https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks

// Standard normal cumulative distribution function
template <typename X>
X Phi(X const& x) {
  return 0.5 * erfc(-one_div_root_two<X>() * x);
}

enum class CP { call, put };

// Assume zero annual dividend yield (q=0).
template <typename Price, typename Sigma, typename Tau, typename Rate>
promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp,
                                                            double K,
                                                            Price const& S,
                                                            Sigma const& sigma,
                                                            Tau const& tau,
                                                            Rate const& r) {
  using namespace std;
  auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  switch (cp) {
    case CP::call:
      return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
    case CP::put:
      return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
  }
}

int main() {
  double const K = 100.0;                    // Strike price.
  auto const S = make_fvar<double, 2>(105);  // Stock price.
  double const sigma = 5;                    // Volatility.
  double const tau = 30.0 / 365;             // Time to expiration in years. (30 days).
  double const r = 1.25 / 100;               // Interest rate.
  auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
  auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);

  std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
            << "black-scholes put  price = " << put_price.derivative(0) << '\n'
            << "call delta = " << call_price.derivative(1) << '\n'
            << "put  delta = " << put_price.derivative(1) << '\n'
            << "call gamma = " << call_price.derivative(2) << '\n'
            << "put  gamma = " << put_price.derivative(2) << '\n';
  return 0;
}
/*
Output:
black-scholes call price = 56.5136
black-scholes put  price = 51.4109
call delta = 0.773818
put  delta = -0.226182
call gamma = 0.00199852
put  gamma = 0.00199852
*/
\end{verbatim}

\section{Advantages of Automatic Differentiation}
The above examples illustrate some of the advantages of using autodiff:
\begin{itemize}
\item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
  of code redundancy, with all the liabilities that come with it:
  \begin{itemize}
    \item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
        consider how much larger and more interdependent the above code base would be if a separate function were
        written for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}
        {each Greek} value.
    \item Dependencies upon a derivative function for a different purpose will break when changes are made to
        the original function. What doesn't need to exist cannot break.
    \item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when
        the code base is smaller and can be intuitively grasped.
  \end{itemize}
\item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
   include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
   small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
   With autodiff, there are no free variables to set, and the accuracy of the answer is generally superior to finite
   difference methods even with the best choice of $\Delta x$.
\end{itemize}

\section{Mathematics}

A basic understanding of the underlying mathematics is helpful for making sense of the autodiff library's usage.

\subsection{Truncated Taylor Series}

Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
$f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
$x_0\in D\subseteq\mathbb{R}$:

\[
f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots
\]
One way of thinking about this form is that, given the value of an analytic function $f(x_0)$ and its derivatives
$f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, the value of the function
$f(x)$ can be obtained at any other point $x\in D$ using the above formula.

Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:

\[
f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots
\]
Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
one does in basic algebra with variables like $x$ or $y$. For example, we can still manipulate entities
like $xy$ and $(1+2x+3x^2)$ without having to assign specific numbers to them.

Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
$f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$
in the resulting computation. The general coefficient for $\varepsilon^n$ is

\[\frac{f^{(n)}(x_0)}{n!}.\]
Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.

\subsubsection{Example}

Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.

The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
$\varepsilon$ as an abstract algebraic entity:

\begin{align*}
f(x_0+\varepsilon) &= f(2+\varepsilon) \\
 &= (2+\varepsilon)^4 \\
 &= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
 &= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
\end{align*}
Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-Taylor expansion
yields the following equalities:

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
\frac{f''(2)}{2!} = 24, \qquad
\frac{f'''(2)}{3!} = 8, \qquad
\frac{f^{(4)}(2)}{4!} = 1, \qquad
\frac{f^{(5)}(2)}{5!} = 0.
\]
Multiplying both sides by the respective factorials gives

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
f''(2) = 48, \qquad
f'''(2) = 48, \qquad
f^{(4)}(2) = 24, \qquad
f^{(5)}(2) = 0.
\]
These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule}
applied to $f(x)=x^4$.

\subsection{Arithmetic}

What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$,
and instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on
values of the form

\[
{\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N
\]
and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
$+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of C++
operators and functions, floating point data types are replaced with data types that represent these polynomials. More
specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>}, which holds the above
$N+1$ coefficients $x_i$ and is wrapped in a {\tt class} that overloads all of the arithmetic operators.

The logic of these arithmetic operators simply mirrors that which is applied to polynomials. We'll look at
each of the 4 arithmetic operators in detail.

\subsubsection{Addition}

The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:

\begin{align*}
{\bf z} &= {\bf x} + {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Subtraction}

Subtraction follows the same form as addition:

\begin{align*}
{\bf z} &= {\bf x} - {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Multiplication}

Multiplication produces higher-order terms:

\begin{align*}
{\bf z} &= {\bf x} \times {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots +
    \left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
 &= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted
by $O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
depend on any of these discarded terms, so there is no loss of precision in the final answer. The only information
lost is the values of higher order derivatives, which we are not interested in anyway. If we were, then
we would simply have chosen a larger value of $N$ to begin with.
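
The truncated product above is an ordinary discrete convolution, and it is straightforward to implement. The
following minimal sketch is independent of the library ({\tt fvar} stores its coefficients in a {\tt std::array}
as described above, but everything here is a standalone illustration); it reproduces the coefficients
{\tt \{16, 32, 24, 8, 1, 0\}} of Example 1 by squaring $2+\varepsilon$ twice:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>
#include <iostream>

// Truncated polynomial product: z_i = sum_{j=0}^{i} x_j * y_{i-j}.
// Terms with powers of epsilon above N are simply dropped.
template <typename T, std::size_t N>
std::array<T, N + 1> mul(std::array<T, N + 1> const& x, std::array<T, N + 1> const& y) {
  std::array<T, N + 1> z{};  // Zero-initialized coefficients.
  for (std::size_t i = 0; i <= N; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      z[i] += x[j] * y[i - j];
  return z;
}

int main() {
  constexpr std::size_t N = 5;
  std::array<double, N + 1> const x{2, 1};  // 2 + epsilon
  auto const x2 = mul<double, N>(x, x);     // (2 + epsilon)^2 = 4 + 4*eps + eps^2
  auto const x4 = mul<double, N>(x2, x2);   // (2 + epsilon)^4
  for (double c : x4)
    std::cout << c << ' ';  // Output: 16 32 24 8 1 0
  std::cout << '\n';
}
\end{Verbatim}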

\subsubsection{Division}

Division is not calculated directly as the others are. Instead, to find the components of
${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields
a recursive formula for the components $z_i$:

\begin{align*}
x_i &= \sum_{j=0}^iy_jz_{i-j} \\
 &= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$
depends on the previously calculated values $z_0, z_1, ..., z_{i-1}$.
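
Continuing the same standalone sketch (again, not the library's actual internals), the recursion translates
directly into code. As a check, the coefficients of $1\div(1+\varepsilon)$ come out as the expected alternating
geometric series:
\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>
#include <iostream>

// Truncated polynomial quotient via the recursion
// z_i = (x_i - sum_{j=1}^{i} y_j * z_{i-j}) / y_0, computed in increasing i.
template <typename T, std::size_t N>
std::array<T, N + 1> div(std::array<T, N + 1> const& x, std::array<T, N + 1> const& y) {
  std::array<T, N + 1> z{};
  for (std::size_t i = 0; i <= N; ++i) {
    T sum = x[i];
    for (std::size_t j = 1; j <= i; ++j)
      sum -= y[j] * z[i - j];
    z[i] = sum / y[0];  // Requires y_0 != 0.
  }
  return z;
}

int main() {
  constexpr std::size_t N = 5;
  std::array<double, N + 1> const one{1};  // 1
  std::array<double, N + 1> const y{1, 1}; // 1 + epsilon
  for (double c : div<double, N>(one, y))
    std::cout << c << ' ';  // Output: 1 -1 1 -1 1 -1
  std::cout << '\n';
}
\end{Verbatim}
This matches the Taylor coefficients of $f(x)=1/(1+x)$ at $x_0=0$, namely $f^{(n)}(0)/n!=(-1)^n$.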

\subsection{General Functions}

Calls to standard mathematical functions such as {\tt log()}, {\tt cos()}, etc.\ should return accurate higher
order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial to
approximate $e^x$ when $0<x<1$. This would mean that the \nth{15} derivative, and all higher order derivatives, would
be 0; however, we know that $\frac{d^{15}}{dx^{15}}e^x=e^x$. How should such functions, whose derivatives are known,
be written to provide accurate higher order derivatives? The answer again comes back to the function's Taylor series.

To simplify notation, for a given polynomial ${\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+
x_N\varepsilon^N$ define

\[
{\bf x}_\varepsilon = x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N = \sum_{i=1}^Nx_i\varepsilon^i.
\]
This allows for a concise expression of a general function $f$ of $\bf x$:

\begin{align*}
f({\bf x}) &= f(x_0 + {\bf x}_\varepsilon) \\
 & = f(x_0) + f'(x_0){\bf x}_\varepsilon + \frac{f''(x_0)}{2!}{\bf x}_\varepsilon^2 + \frac{f'''(x_0)}{3!}{\bf x}_\varepsilon^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}{\bf x}_\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
 & = \sum_{i=0}^N\frac{f^{(i)}(x_0)}{i!}{\bf x}_\varepsilon^i + O\left(\varepsilon^{N+1}\right)
\end{align*}
where $\varepsilon$ has been substituted with ${\bf x}_\varepsilon$ in the $\varepsilon$-Taylor series
for $f(x)$. This form gives a recipe for calculating $f({\bf x})$ in general from regular numeric calculations
$f(x_0)$, $f'(x_0)$, $f''(x_0)$, ... and successive powers of the epsilon terms ${\bf x}_\varepsilon$.

For an application in which we are interested in up to $N$ derivatives in $x$, the data structure to hold
this information is an $(N+1)$-element array {\tt v} whose general element is

\[ {\tt v[i]} = \frac{f^{(i)}(x_0)}{i!} \qquad \text{for}\; i\in\{0,1,2,...,N\}. \]
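
As a quick sanity check of this recipe, take $f(u)=u^2$ and ${\bf x}=x_0+\varepsilon$ with $N=2$, so that
${\bf x}_\varepsilon=\varepsilon$. Then

\[
f({\bf x}) = x_0^2 + 2x_0\varepsilon + \varepsilon^2
\qquad\Longrightarrow\qquad
{\tt v} = \left\{x_0^2,\; 2x_0,\; 1\right\}
        = \left\{f(x_0),\; f'(x_0),\; \frac{f''(x_0)}{2!}\right\},
\]
in agreement with the general element formula above.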

\subsection{Multiple Variables}

In C++, the generalization to mixed partial derivatives with multiple independent variables is conveniently achieved
with recursion. To begin to see the recursive pattern, consider a two-variable function $f(x,y)$. Since $x$
and $y$ are independent, they require their own independent epsilons $\varepsilon_x$ and $\varepsilon_y$,
respectively.

Expand $f(x,y)$ for $x=x_0+\varepsilon_x$:
\begin{align*}
f(x_0+\varepsilon_x,y) &= f(x_0,y)
+ \frac{\partial f}{\partial x}(x_0,y)\varepsilon_x
+ \frac{1}{2!}\frac{\partial^2 f}{\partial x^2}(x_0,y)\varepsilon_x^2
+ \frac{1}{3!}\frac{\partial^3 f}{\partial x^3}(x_0,y)\varepsilon_x^3
+ \cdots
+ \frac{1}{M!}\frac{\partial^M f}{\partial x^M}(x_0,y)\varepsilon_x^M
+ O\left(\varepsilon_x^{M+1}\right) \\
&= \sum_{i=0}^M\frac{1}{i!}\frac{\partial^i f}{\partial x^i}(x_0,y)\varepsilon_x^i + O\left(\varepsilon_x^{M+1}\right).
\end{align*}
Next, expand $f(x_0+\varepsilon_x,y)$ for $y=y_0+\varepsilon_y$:

\begin{align*}
f(x_0+\varepsilon_x,y_0+\varepsilon_y) &= \sum_{j=0}^N\frac{1}{j!}\frac{\partial^j}{\partial y^j}
    \left(\sum_{i=0}^M\varepsilon_x^i\frac{1}{i!}\frac{\partial^if}{\partial x^i}\right)(x_0,y_0)\varepsilon_y^j
    + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right) \\
&= \sum_{i=0}^M\sum_{j=0}^N\frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
   \varepsilon_x^i\varepsilon_y^j + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right).
\end{align*}

Similar to the single-variable case, for an application in which we are interested in up to $M$ derivatives in
$x$ and $N$ derivatives in $y$, the data structure to hold this information is an $(M+1)\times(N+1)$
array {\tt v} whose element at $(i,j)$ is

\[
{\tt v[i][j]} = \frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
    \qquad \text{for}\; (i,j)\in\{0,1,2,...,M\}\times\{0,1,2,...,N\}.
\]
The generalization to additional independent variables follows the same pattern.

\subsubsection{Declaring Multiple Variables}

Internally, independent variables are represented by vectors within orthogonal vector spaces. Because of this,
one must be careful when declaring more than one independent variable so that they do not end up in
parallel vector spaces. This can easily be achieved by following one rule:
\begin{itemize}
\item When declaring more than one independent variable, call {\tt make\_ftuple<>()} once and only once.
\end{itemize}
The values in the returned tuple are independent. Though it is possible to achieve the same result with multiple
calls to {\tt make\_fvar}, this is an easier and less error-prone method. See Section~\ref{multivar} for example
usage.
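For instance, a minimal two-variable sketch in the style of Section~\ref{multivar} (the values are chosen so a
few derivatives can be confirmed by inspection):
\begin{Verbatim}[xleftmargin=2em]
using namespace boost::math::differentiation;

constexpr unsigned Nx = 2;  // Maximum derivative order in x.
constexpr unsigned Ny = 3;  // Maximum derivative order in y.
auto const variables = make_ftuple<double, Nx, Ny>(13, 14);  // One call declares both.
auto const& x = std::get<0>(variables);  // x = 13 + eps_x
auto const& y = std::get<1>(variables);  // y = 14 + eps_y
auto const z = 10 * x * x + 50 * x * y + 100 * y * y;
// z.derivative(i, j) == d^{i+j}z / dx^i dy^j evaluated at (13, 14);
// e.g. z.derivative(2, 0) == 20 and z.derivative(1, 1) == 50.
\end{Verbatim}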

%\section{Usage}
%
%\subsection{Single Variable}
%
%To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be
%specified at compile-time:
%
%\begin{enumerate}
%\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double},
%    {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc.
%\item The maximum derivative order $M$ that is to be calculated with respect to $x$.
%\end{enumerate}
%Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array<T,N>}. {\tt T}
%and {\tt N} must be set at compile-time, but which elements in the array are accessed can be determined at run-time,
%just as the choice of what derivatives to query in autodiff can be made during run-time.
%
%To declare and initialize $x$:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%\end{verbatim}
%where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 +
%\varepsilon$. Internally, the member variable of type {\tt std::array<T,M>} is {\tt v = \{ x0, 1, 0, 0, ... \}},
%consistent with the above mathematical treatise.
%
%To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function
%$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template
%
%\begin{verbatim}
%    template<typename T>
%    T f(T x);
%\end{verbatim}
%Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\
%{\tt boost::math::differentiation::autodiff\_fvar<>} types.
%
%Internal calls to mathematical functions must allow for
%\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions
%are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)}
%from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix.
%
%Calling $f$ and retrieving the calculated value and derivatives:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%    autodiff_fvar<T,M> y = f(x);
%    for (int n=0 ; n<=M ; ++n)
%        std::cout << "y.derivative("<<n<<") == " << y.derivative(n) << std::endl;
%\end{verbatim}
%{\tt y.derivative(0)} returns the undifferentiated value $f(x_0)$, and {\tt y.derivative(n)} returns $f^{(n)}(x_0)$.
%Casting {\tt y} to type {\tt T} also gives the undifferentiated value. In other words, the following 3 values
%are equal:
%
%\begin{enumerate}
%\item {\tt f(x0)}
%\item {\tt y.derivative(0)}
%\item {\tt static\_cast<T>(y)}
%\end{enumerate}
%
%\subsection{Multiple Variables}
%
%Independent variables are represented in autodiff as independent dimensions within a multi-dimensional array.
%This is perhaps best illustrated with examples. The {\tt namespace boost::math::differentiation} is assumed.
%
%The following instantiates a variable of $x=13$ with up to 3 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,3> x = make_fvar<double,3>(13);
%\end{verbatim}
%This instantiates {\bf an independent} value of $y=14$ with up to 4 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,0,4> y = make_fvar<double,0,4>(14);
%\end{verbatim}
%Combining them together {\bf promotes} their data type automatically to the smallest multidimensional array that
%accommodates both.
%
%\begin{verbatim}
%    // z is promoted to autodiff_fvar<double,3,4>
%    auto z = 10*x*x + 50*x*y + 100*y*y;
%\end{verbatim}
%The object {\tt z} holds a 2-dimensional array, thus {\tt derivative(...)} is a 2-parameter method:
%
%\[
%{\tt z.derivative(i,j)} = \frac{\partial^{i+j}f}{\partial x^i\partial y^j}(13,14)
%    \qquad \text{for}\; (i,j)\in\{0,1,2,3\}\times\{0,1,2,3,4\}.
%\]
%A few values of the result can be confirmed through inspection:
%
%\begin{verbatim}
%    z.derivative(2,0) == 20
%    z.derivative(1,1) == 50
%    z.derivative(0,2) == 200
%\end{verbatim}
%Note how the position of the parameters in {\tt derivative(...)} matches how {\tt x} and {\tt y} were declared.
%This will be clarified next.
%
%\subsubsection{Two Rules of Variable Initialization}
%
%In general, there are two rules to keep in mind when dealing with multiple variables:
%
%\begin{enumerate}
%\item Independent variables correspond to parameter position, in both the initialization {\tt make\_fvar<T,...>}
%    and calls to {\tt derivative(...)}.
%\item The last template position in {\tt make\_fvar<T,...>} determines which variable a derivative will be
%   taken with respect to.
%\end{enumerate}
%Both rules are illustrated with an example in which there are 3 independent variables $x,y,z$ and 1 dependent
%variable $w=f(x,y,z)$, though the following code readily generalizes to any number of independent variables, limited
%only by the C++ compiler/memory/platform. The maximum derivative order of each variable is {\tt Nx}, {\tt Ny}, and
%{\tt Nz}, respectively. Then the type for {\tt w} is {\tt boost::math::differentiation::autodiff\_fvar<T,Nx,Ny,Nz>}
%and all possible mixed partial derivatives are available via
%
%\[
%{\tt w.derivative(nx,ny,nz)} =
%    \frac{\partial^{n_x+n_y+n_z}f}{\partial x^{n_x}\partial y^{n_y}\partial z^{n_z} }(x_0,y_0,z_0)
%\]
%for $(n_x,n_y,n_z)\in\{0,1,2,...,N_x\}\times\{0,1,2,...,N_y\}\times\{0,1,2,...,N_z\}$ where $x_0, y_0, z_0$ are
%the numerical values at which the function $f$ and its derivatives are evaluated.
%
%In code:
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    using var = autodiff_fvar<double,Nx,Ny,Nz>; // Nx, Ny, Nz are constexpr size_t.
%
%    var x = make_fvar<double,Nx>(x0);       // x0 is of type double
%    var y = make_fvar<double,Nx,Ny>(y0);    // y0 is of type double
%    var z = make_fvar<double,Nx,Ny,Nz>(z0); // z0 is of type double
%
%    var w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                    << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%Note how {\tt x}, {\tt y}, and {\tt z} are initialized: the last template parameter determines which variable
%$x, y,$ or $z$ a derivative is taken with respect to. In terms of the $\varepsilon$-polynomials
%above, this determines whether to add $\varepsilon_x, \varepsilon_y,$ or $\varepsilon_z$ to
%$x_0, y_0,$ or $z_0$, respectively.
%
%In contrast, the following initialization of {\tt x} would be INCORRECT:
%
%\begin{verbatim}
%    var x = make_fvar<T,Nx,0>(x0); // WRONG
%\end{verbatim}
%Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the
%$y$ variable, and thus the resulting value will be invalid.
%
%\subsubsection{Type Promotion}
%
%The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays,
%and relying on autodiff's automatic type-promotion:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    autodiff_fvar<double,Nx> x = make_fvar<double,Nx>(x0);
%    autodiff_fvar<double,0,Ny> y = make_fvar<double,0,Ny>(y0);
%    autodiff_fvar<double,0,0,Nz> z = make_fvar<double,0,0,Nz>(z0);
%
%    autodiff_fvar<double,Nx,Ny,Nz> w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                    << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%For example, if one of the first steps in the computation of $f$ was {\tt z*z}, then a significantly smaller number
%of multiplications and additions may occur if {\tt z} is declared as {\tt autodiff\_fvar<double,0,0,Nz>} as opposed
%to \\
%{\tt autodiff\_fvar<double,Nx,Ny,Nz>}. There is no loss of precision with the former, since the extra dimensions
%represent 0 values. Once {\tt z} is combined with {\tt x} and {\tt y} during the computation, the types will be
%promoted as necessary.  This is the recommended way to initialize variables in autodiff.

\section{Writing Functions for Autodiff Compatibility}\label{compatibility}

In this section, a general procedure is given for writing new, and transforming existing, C++ mathematical
functions for compatibility with autodiff.

There are 3 categories of functions that require different strategies:
\begin{enumerate}
\item Piecewise-rational functions. These are simply piecewise quotients of polynomials. All that is needed is to
    turn the function parameters and return value into generic (template) types. This will then allow the function
    to accept and return autodiff's {\tt fvar} types, thereby using autodiff's overloaded arithmetic operators
    which calculate the derivatives automatically.
\item Functions that call existing autodiff functions. This is the same as the previous, but may also include
    calls to functions that are in the autodiff library. Examples: {\tt exp()}, {\tt log()}, {\tt tgamma()}, etc.
\item New functions for which the derivatives can be calculated. This is the most general technique, as it
    allows for the development of functions which do not fall into the previous two categories.
\end{enumerate}
Functions written in any of these ways may then be added to the autodiff library.

\subsection{Piecewise-Rational Functions}
\[
f(x) = \frac{1}{1+x^2}
\]
By simply writing this as a template function, autodiff can calculate derivatives for it:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iomanip>
#include <iostream>
#include <limits>

template <typename T>
T rational(T const& x) {
  return 1 / (1 + x * x);
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 10>(0);
  auto const y = rational(x);
  std::cout << std::setprecision(std::numeric_limits<double>::digits10)
            << "y.derivative(10) = " << y.derivative(10) << std::endl;
  return 0;
}
/*
Output:
y.derivative(10) = -3628800
*/
\end{Verbatim}
As simple as $f(x)$ may seem, its derivatives can become increasingly complex as higher orders are taken.
For example, the \nth{10} derivative has the form
\[
f^{(10)}(x) = -3628800\frac{1 - 55x^2 + 330x^4 - 462x^6 + 165x^8 - 11x^{10}}{(1 + x^2)^{11}}.
\]
Derivatives of $f(x)$ are useful, and in fact used, in calculating higher order derivatives of $\arctan(x)$,
for instance, since
\[
\arctan^{(n)}(x) = \left(\frac{d}{dx}\right)^{n-1} \frac{1}{1+x^2}\qquad\text{for}\quad 1\le n.
\]

\subsection{Functions That Call Existing Autodiff Functions}

Many of the standard library math functions are overloaded in autodiff. It is recommended to use
\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL) in order for functions to
be written in a way that is general enough to accommodate standard types ({\tt double}) as well as autodiff types
({\tt fvar}).
\\
Example:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  return 0.5 * std::erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
Though {\tt Phi(x)} is general enough to handle the various fundamental floating point types, this will
not work if {\tt x} is an autodiff {\tt fvar} variable, since {\tt std::erfc} does not include a specialization
for {\tt fvar}. The recommended solution is to remove the namespace prefix {\tt std::} from {\tt erfc}:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
In this form, when {\tt x} is of type {\tt fvar}, the C++ compiler will search for and find a function {\tt erfc}
within the same namespace as {\tt fvar}, which is in the autodiff library, via ADL. Because of the using-declaration,
it will also call {\tt std::erfc} when {\tt x} is a fundamental type such as {\tt double}.
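For example, a short driver (hypothetical, for illustration) exercises both paths:
\begin{Verbatim}[xleftmargin=2em]
int main() {
  using namespace boost::math::differentiation;
  double const p = Phi(1.0);                 // Calls std::erfc via the using-declaration.
  auto const x = make_fvar<double, 1>(1.0);  // 1 + epsilon
  auto const y = Phi(x);                     // Finds autodiff's erfc overload via ADL.
  // y.derivative(0) == p, and y.derivative(1) is the standard normal density at 1.
  return 0;
}
\end{Verbatim}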

\subsection{New Functions For Which The Derivatives Can Be Calculated}\label{new_functions}

Mathematical functions which do not fall into the previous two categories can be constructed using autodiff helper
functions. This requires a separate function for calculating the derivatives. In case you are asking yourself what
good an autodiff library is if one needs to supply the derivatives, the answer is that the new function will fit
in with the rest of the autodiff library, thereby allowing for the creation of additional functions via all of
the arithmetic operators, plus function composition, which was not readily available without the library.

The example given here is for {\tt cos}:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> cos(fvar<RealType, Order> const& cr) {
  using std::cos;
  using std::sin;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = cos(static_cast<root_type>(cr));
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else {
    root_type const d1 = -sin(static_cast<root_type>(cr));
    root_type const derivatives[4]{d0, d1, -d0, -d1};
    return cr.apply_derivatives(order,
                                [&derivatives](size_t i) { return derivatives[i & 3]; });
  }
}
\end{Verbatim}
This uses the helper function {\tt fvar::apply\_derivatives}, which takes two parameters:
\begin{enumerate}
\item The highest order derivative to be calculated.
\item A function that maps derivative order to derivative value.
\end{enumerate}
The highest order derivative necessary to be calculated is generally equal to {\tt fvar::order\_sum}. In the case
of {\tt sin} and {\tt cos}, the derivatives are cyclical with period 4. Thus it is sufficient to store only these
4 values in an array, and take the derivative order modulo 4 as the index into this array.
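
For functions whose derivatives are particularly simple, the overload collapses even further. As a sketch in the
same pattern (consult the library source for the authoritative definition of {\tt exp}): every derivative of $e^x$
evaluated at $x_0$ equals $e^{x_0}$, so the mapping function can ignore the derivative order entirely:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> exp(fvar<RealType, Order> const& cr) {
  using std::exp;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = exp(static_cast<root_type>(cr));  // Every derivative equals e^{x0}.
  return cr.apply_derivatives(order, [&d0](size_t) { return d0; });
}
\end{Verbatim}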

A second helper function, not shown here, is {\tt apply\_coefficients}. It is used in the same way as
{\tt apply\_derivatives}, except that the supplied function calculates coefficients instead of derivatives.
The relationship between a coefficient $C_n$ and derivative $D_n$ for derivative order $n$ is
\[
C_n = \frac{D_n}{n!}.
\]
Internally, {\tt fvar} holds coefficients rather than derivatives, so in case the coefficient values are more readily
available than the derivatives, it can save some unnecessary computation to use {\tt apply\_coefficients}.
See the definition of {\tt atan} for an example.
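To illustrate the idea (a hypothetical sketch, not the library's actual definition of {\tt log}), the Taylor
coefficients of $\log$ about $x_0$ are available in closed form without any factorial bookkeeping, $C_0=\log x_0$
and $C_i=(-1)^{i-1}/\left(i\,x_0^i\right)$ for $1\le i$, so {\tt apply\_coefficients} can consume them directly:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> log(fvar<RealType, Order> const& cr) {
  using std::log;
  using std::pow;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const x0 = static_cast<root_type>(cr);
  root_type const d0 = log(x0);
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else
    return cr.apply_coefficients(order, [&](size_t i) {  // C_i = (-1)^(i-1) / (i*x0^i)
      return i == 0 ? d0 : (i & 1 ? 1 : -1) / (i * pow(x0, static_cast<int>(i)));
    });
}
\end{Verbatim}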

Both of these helper functions use Horner's method when calculating the resulting polynomial {\tt fvar}. This works
well when the derivatives are finite, but in cases where derivatives are infinite, this can quickly result in NaN
values as the computation progresses. In these cases, one can call non-Horner versions of both functions, which
better ``isolate'' infinite values so that they are less likely to evolve into NaN values.

The four helper functions available for constructing new autodiff functions from known coefficients/derivatives are:
\begin{enumerate}
\item {\tt fvar::apply\_coefficients}
\item {\tt fvar::apply\_coefficients\_nonhorner}
\item {\tt fvar::apply\_derivatives}
\item {\tt fvar::apply\_derivatives\_nonhorner}
\end{enumerate}

\section{Function Writing Guidelines}

At a high level there is one fairly simple principle, loosely and intuitively speaking, for writing functions for
which autodiff can effectively calculate derivatives: \\

{\bf Autodiff Function Principle (AFP)}
\begin{displayquote}
A function will work fine with autodiff if its branches in logic correspond to piecewise analytic calculations
over non-singleton intervals, it transitions smoothly between those intervals, and it is free of indeterminate
forms in the calculated value and higher order derivatives.
\end{displayquote}
This could be stated with greater mathematical rigor; however, what seems more practical, in this case, is to give
examples and categories of examples of what works, what doesn't, and how to remedy some of the common problems
that may be encountered. That is the approach taken here.

\subsection{Example 1: $f(x)=\max(0,x)$}

One potential implementation of $f(x)=\max(0,x)$ is:

\begin{verbatim}
    template<typename T>
    T f(const T& x)
    {
        return 0 < x ? x : 0;
    }
\end{verbatim}
Though this is consistent with Section~\ref{compatibility}, there are two problems with it:

\begin{enumerate}
\item {\tt f(nan) = 0}. This problem is independent of autodiff, but is worth addressing anyway. If an
    indeterminate form arises within a calculation and is input into $f$, then it gets ``covered up'' by
    this implementation, leading to an unknowingly incorrect result. It is better for functions in general to
    propagate NaN values, so that the user knows something went wrong and doesn't rely on an incorrect result,
    and likewise the developer can track down where the NaN originated and remedy it.
\item $f'(0) = 0$ when autodiff is applied. This is because {\tt f} returns 0 as a constant when {\tt x==0}, wiping
    out any of the derivatives (or sensitivities) that {\tt x} was holding as an autodiff variable. Instead, let us
    apply the AFP and identify the two intervals over which $f$ is defined: $(-\infty,0]\cup(0,\infty)$.
    Though the function itself is not analytic at $x=0$, we can attempt somewhat to smooth out this transition
    point by averaging the calculation of $f(x)$ at $x=0$ from each interval. If $x<0$ then the result is simply
    0, and if $0<x$ then the result is $x$. The average is $\frac{1}{2}(0 + x)$, which will allow autodiff to
    calculate $f'(0)=\frac{1}{2}$. This is a more reasonable answer.
\end{enumerate}
A better implementation that resolves both issues is:
\begin{verbatim}
    template<typename T>
    T f(const T& x)
    {
        if (x < 0)
            return 0;
        else if (x == 0)
            return 0.5*x;
        else
            return x;
    }
\end{verbatim}
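A quick check of the boundary behavior (a sketch assuming {\tt using namespace boost::math::differentiation},
as in the earlier examples):
\begin{verbatim}
    auto const x = make_fvar<double, 1>(0);
    auto const y = f(x);
    // y.derivative(0) == 0 and y.derivative(1) == 0.5, as desired.
\end{verbatim}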

\subsection{Example 2: $f(x)=\sinc(x)$}

The definition of $\sinc:\mathbb{R}\rightarrow\mathbb{R}$ is

\[
\sinc(x) = \begin{cases}
    1 &\text{if}\; x = 0 \\
    \frac{\sin(x)}{x} &\text{otherwise.}\end{cases}
\]
A potential implementation is:

\begin{verbatim}
    template<typename T>
    T sinc(const T& x)
    {
        using std::sin;
        return x == 0 ? 1 : sin(x) / x;
    }
\end{verbatim}
Though this is again consistent with Section~\ref{compatibility}, and returns correct non-derivative values,
it returns a constant when {\tt x==0}, thereby losing all derivative information contained in {\tt x} and
contributions from $\sinc$. For example, $\sinc''(0)=-\frac{1}{3}$; however, {\tt y.derivative(2) == 0} when
{\tt y = sinc(make\_fvar<double,2>(0))} using the above incorrect implementation. Applying the AFP, the intervals
upon which separate branches of logic are applied are $(-\infty,0)\cup[0,0]\cup(0,\infty)$. The violation occurs
due to the singleton interval $[0,0]$, even though a constant function of 1 is technically analytic. The remedy
is to define a custom $\sinc$ overload and add it to the autodiff library. This has been done. Mathematically, it
is well-defined and free of indeterminate forms, as is the \nth{3} expression in the equalities
\[
\frac{1}{x}\sin(x) = \frac{1}{x}\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}
    = \sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n}.
\]
The autodiff library contains helper functions to help write function overloads when the derivatives of a function
are known. This is an advanced feature; documentation for it may be added at a later time.

For now, it is worth understanding the ways in which indeterminate forms can occur within a mathematical calculation,
and how to avoid them when possible by rewriting the function. Table~\ref{3nans} compares 3 types of indeterminate
forms. Assume the product {\tt a*b} is a positive finite value.

\begin{table}[h]
\centering\begin{tabular}{m{7em}||c|c|c}
 & $\displaystyle f(x)=\left(\frac{a}{x}\right)\times(bx^2)$
 & $\displaystyle g(x)=\left(\frac{a}{x}\right)\times(bx)$
 & $\displaystyle h(x)=\left(\frac{a}{x^2}\right)\times(bx)$ \\[0.618em]
\hline\hline
Mathematical\newline Limit
 & $\displaystyle\lim_{x\rightarrow0}f(x) = 0$
 & $\displaystyle\lim_{x\rightarrow0}g(x) = ab$
 & $\displaystyle\lim_{x\rightarrow0}h(x) = \infty$ \\
\hline
Floating Point\newline Arithmetic
 & {\tt f(0) = inf*0 = nan} & {\tt g(0) = inf*0 = nan} & {\tt h(0) = inf*0 = nan}
\end{tabular}
\caption{Automatic differentiation does not compute limits.
Indeterminate forms must be simplified manually. (These cases are not meant to be exhaustive.)}\label{3nans}
\end{table}

Indeterminate forms result in NaN values within a calculation. Mathematically, if they occur at locally isolated
points, then we generally prefer the mathematical limit as the result, even if it is infinite. As demonstrated in
Table~\ref{3nans}, depending upon the nature of the indeterminate form, the mathematical limit can be 0 (no matter
the values of $a$ or $b$), or $ab$, or $\infty$, but these 3 cases cannot be distinguished by the floating point
result of nan. Floating point arithmetic does not perform limits (directly), and neither does the autodiff library.
Thus it is up to the diligence of the developer to keep a watchful eye over where indeterminate forms can arise.

\subsection{Example 3: $f(x)=\sqrt x$ and $f'(0)=\infty$}

When working with functions that have infinite higher order derivatives, this can very quickly result in NaNs in
higher order derivatives as the computation progresses, since {\tt inf-inf}, {\tt inf/inf}, and {\tt 0*inf} result
in {\tt nan}. See Table~\ref{sqrtnan} for an example.

\begin{table}[h]
\centering\begin{tabular}{c||c|c|c|c}
$f(x)$ & $f(0)$ & $f'(0)$ & $f''(0)$ & $f'''(0)$ \\
\hline\hline
{\tt sqrt(x)} & {\tt 0} & {\tt inf} & {\tt -inf} & {\tt inf} \\
\hline
{\tt sqr(sqrt(x)+1)} & {\tt 1} & {\tt inf} & {\tt nan} & {\tt nan} \\
\hline
{\tt x+2*sqrt(x)+1} & {\tt 1} & {\tt inf} & {\tt -inf}& {\tt inf}
\end{tabular}
\caption{Indeterminate forms in higher order derivatives. {\tt sqr(x) == x*x}.}\label{sqrtnan}
\end{table}

Calling the autodiff-overloaded implementation of $f(x)=\sqrt x$ at the value {\tt x==0} results in the
\nth{1} row (after the header row) of Table~\ref{sqrtnan}, as is mathematically correct. The \nth{2} row shows
$f(x)=(\sqrt{x}+1)^2$ resulting in {\tt nan} values for $f''(0)$ and all higher order derivatives. This is due to
the internal arithmetic in which {\tt inf} is added to {\tt -inf} during the squaring, resulting in a {\tt nan}
value for $f''(0)$ and all higher orders. This is typical of {\tt inf} values in autodiff. Where they show up,
they are correct; however, they can quickly introduce {\tt nan} values into the computation upon the addition of
oppositely signed {\tt inf} values, division by {\tt inf}, or multiplication by {\tt 0}. It is worth noting that
the infection of {\tt nan} only spreads upward in the order of derivatives, since lower orders do not depend upon
higher orders (which is also why dropping higher order terms in an autodiff computation does not result in any
loss of precision for lower order terms).

The resolution in this case is to manually perform the squaring in the computation, replacing the \nth{2} row
with the \nth{3}: $f(x)=x+2\sqrt{x}+1$. Though mathematically equivalent, it allows autodiff to avoid {\tt nan}
values since $\sqrt x$ is more ``isolated'' in the computation. That is, the {\tt inf} values that unavoidably
show up in the derivatives of {\tt sqrt(x)} for {\tt x==0} do not have the chance to interact with other {\tt inf}
values as they do with the squaring.
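
The difference can be seen directly (a sketch assuming {\tt using namespace boost::math::differentiation}, so
that {\tt sqrt} resolves to the autodiff overload; the commented values are those of Table~\ref{sqrtnan}):
\begin{verbatim}
    auto const x = make_fvar<double, 3>(0);
    auto const f1 = (sqrt(x) + 1) * (sqrt(x) + 1);  // f1.derivative(2) == nan
    auto const f2 = x + 2 * sqrt(x) + 1;            // f2.derivative(2) == -inf
\end{verbatim}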

\subsection{Summary}

The AFP gives a high-level unified guiding principle for writing C++ template functions that autodiff can
effectively evaluate derivatives for.

Examples have been given to illustrate some common items to avoid doing:

\begin{enumerate}
\item It is not enough for functions to be piecewise continuous. On boundary points between intervals, consider
    returning the average expression of both intervals, rather than just one of them. Example: $\max(0,x)$ at $x=0$.
    In cases where the limits from both sides must match, and they do not, then {\tt nan} may be a more appropriate
    value depending on the application.
\item Avoid returning individual constant values (e.g.\ $\sinc(0)=1$). Values must be computed uniformly along
    with other values in their local interval. If that is not possible, then the function must be overloaded to
    compute the derivatives precisely using the helper functions from Section~\ref{new_functions}.
\item Avoid intermediate indeterminate values in both the value ($\sinc(x)$ at $x=0$) and derivatives
    ($(\sqrt{x}+1)^2$ at $x=0$). Try to isolate expressions that may contain infinite values/derivatives so
    that they do not introduce NaN values into the computation.
\end{enumerate}

\section{Acknowledgments}

\begin{itemize}
\item Kedar Bhat --- C++11 compatibility, Boost Special Functions compatibility testing, codecov integration,
    and feedback.
\item Nick Thompson --- Initial feedback and help with Boost integration.
\item John Maddock --- Initial feedback and help with Boost integration.
\end{itemize}

\begin{thebibliography}{1}
\bibitem{ad} \url{https://en.wikipedia.org/wiki/Automatic\_differentiation}
\bibitem{ed} Andreas Griewank, Andrea Walther. \textit{Evaluating Derivatives}. SIAM, 2nd ed. 2008.
\end{thebibliography}

\end{document}