% Copyright Matthew Pulver 2018 - 2019.
% Distributed under the Boost Software License, Version 1.0.
% (See accompanying file LICENSE_1_0.txt or copy at
% https://www.boost.org/LICENSE_1_0.txt)

\documentclass{article}
\usepackage{amsmath} %\usepackage{mathtools}
\usepackage{amssymb} %\mathbb
\usepackage{array} % m{} column in tabular
\usepackage{csquotes} % displayquote
\usepackage{fancyhdr}
\usepackage{fancyvrb}
\usepackage[margin=0.75in]{geometry}
\usepackage{graphicx}
\usepackage{hyperref}
%\usepackage{listings}
\usepackage{multirow}
\usepackage[super]{nth}
\usepackage{wrapfig}
\usepackage{xcolor}

\hypersetup{%
    colorlinks=false,% hyperlinks will be black
    linkbordercolor=blue,% hyperlink borders will be blue
    urlbordercolor=blue,%
    pdfborderstyle={/S/U/W 1}% border style will be underline of width 1pt
}

\pagestyle{fancyplain}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}
\cfoot[]{\thepage\\
\scriptsize\color{gray} Copyright \textcopyright\/ Matthew Pulver 2018--2019.
Distributed under the Boost Software License, Version 1.0.\\
(See accompanying file LICENSE\_1\_0.txt or copy at
\url{https://www.boost.org/LICENSE\_1\_0.txt})}

\DeclareMathOperator{\sinc}{sinc}

\begin{document}

\title{Autodiff\\
\large Automatic Differentiation C++ Library}
\author{Matthew Pulver}
\maketitle

%\date{}

%\begin{abstract}
%\end{abstract}

\tableofcontents

%\section{Synopsis}
%\begingroup
%\fontsize{10pt}{10pt}\selectfont
%\begin{verbatim}
%    example/synopsis.cpp
%\end{verbatim}
%\endgroup

\newpage

\section{Description}

Autodiff is a header-only C++ library that facilitates the
\href{https://en.wikipedia.org/wiki/Automatic_differentiation}{automatic differentiation} (forward mode) of
mathematical functions of single and multiple variables.

This implementation is based upon the \href{https://en.wikipedia.org/wiki/Taylor_series}{Taylor series} expansion of
an analytic function $f$ at the point $x_0$:

\begin{align*}
f(x_0+\varepsilon) &= f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots \\
  &= \sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n + O\left(\varepsilon^{N+1}\right).
\end{align*}
The essential idea of autodiff is the substitution of numbers with polynomials in the evaluation of $f(x_0)$. By
substituting the number $x_0$ with the first-order polynomial $x_0+\varepsilon$, and using the same algorithm
to compute $f(x_0+\varepsilon)$, the resulting polynomial in $\varepsilon$ contains the function's derivatives
$f'(x_0)$, $f''(x_0)$, $f'''(x_0)$, ... within the coefficients. Each coefficient is equal to the derivative of
its respective order, divided by the factorial of the order.

In greater detail, assume one is interested in calculating the first $N$ derivatives of $f$ at $x_0$. Without loss
of precision to the calculation of the derivatives, all terms $O\left(\varepsilon^{N+1}\right)$ that include powers
of $\varepsilon$ greater than $N$ can be discarded. (This is because each term in a polynomial depends
only upon equal and lower-order terms under arithmetic operations.) Under these truncation rules, $f$ provides a
polynomial-to-polynomial transformation:

\[
f \qquad : \qquad x_0+\varepsilon \qquad \mapsto \qquad
    \sum_{n=0}^Ny_n\varepsilon^n=\sum_{n=0}^N\frac{f^{(n)}(x_0)}{n!}\varepsilon^n.
\]
C++'s ability to overload operators and functions allows for the creation of a class {\tt fvar}
(\underline{f}orward-mode autodiff \underline{var}iable) that represents polynomials in $\varepsilon$. Thus
the same algorithm $f$ that calculates the numeric value of $y_0=f(x_0)$, when
written to accept and return variables of a generic (template) type, is also used to calculate the polynomial
$\sum_{n=0}^Ny_n\varepsilon^n=f(x_0+\varepsilon)$. The derivatives $f^{(n)}(x_0)$ are then found from the
product of the respective factorial $n!$ and coefficient $y_n$:

\[ \frac{d^nf}{dx^n}(x_0)=n!y_n. \]

\section{Examples}

\subsection{Example 1: Single-variable derivatives}

\subsubsection{Calculate derivatives of $f(x)=x^4$ at $x=2$.}

In this example, {\tt make\_fvar<double, Order>(2.0)} instantiates the polynomial $2+\varepsilon$. The {\tt Order=5}
means that enough space is allocated (on the stack) to hold a polynomial of up to degree 5 during the ensuing
computation.

Internally, this is modeled by a {\tt std::array<double,6>} whose elements {\tt \{2, 1, 0, 0, 0, 0\}} correspond
to the 6 coefficients of the polynomial upon initialization. Its fourth power, at the end of the computation, is
a polynomial with coefficients {\tt y = \{16, 32, 24, 8, 1, 0\}}. The derivatives are obtained using the formula
$f^{(n)}(2)=n!\cdot{\tt y[n]}$.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

template <typename T>
T fourth_power(T const& x) {
  T x4 = x * x;  // retval in operator*() uses x4's memory via NRVO.
  x4 *= x4;      // No copies of x4 are made within operator*=() even when squaring.
  return x4;     // x4 uses y's memory in main() via NRVO.
}

int main() {
  using namespace boost::math::differentiation;

  constexpr unsigned Order = 5;                  // Highest order derivative to be calculated.
  auto const x = make_fvar<double, Order>(2.0);  // Find derivatives at x=2.
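  // x holds the polynomial 2 + epsilon, i.e. coefficients {2, 1, 0, 0, 0, 0}.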
  auto const y = fourth_power(x);
  for (unsigned i = 0; i <= Order; ++i)
    std::cout << "y.derivative(" << i << ") = " << y.derivative(i) << std::endl;
  return 0;
}
/*
Output:
y.derivative(0) = 16
y.derivative(1) = 32
y.derivative(2) = 48
y.derivative(3) = 48
y.derivative(4) = 24
y.derivative(5) = 0
*/
\end{verbatim}
The above calculates

\begin{alignat*}{3}
{\tt y.derivative(0)} &=& f(2) =&& \left.x^4\right|_{x=2} &= 16\\
{\tt y.derivative(1)} &=& f'(2) =&& \left.4\cdot x^3\right|_{x=2} &= 32\\
{\tt y.derivative(2)} &=& f''(2) =&& \left.4\cdot 3\cdot x^2\right|_{x=2} &= 48\\
{\tt y.derivative(3)} &=& f'''(2) =&& \left.4\cdot 3\cdot2\cdot x\right|_{x=2} &= 48\\
{\tt y.derivative(4)} &=& f^{(4)}(2) =&& 4\cdot 3\cdot2\cdot1 &= 24\\
{\tt y.derivative(5)} &=& f^{(5)}(2) =&& 0 &
\end{alignat*}

\subsection{Example 2: Multi-variable mixed partial derivatives with multi-precision data type}\label{multivar}
\subsubsection{Calculate $\frac{\partial^{12}f}{\partial w^{3}\partial x^{2}\partial y^{4}\partial z^{3}}(11,12,13,14)$
with a precision of about 50 decimal digits,\\
where $f(w,x,y,z)=\exp\left(w\sin\left(\frac{x\log(y)}{z}\right)+\sqrt{\frac{wz}{xy}}\right)+\frac{w^2}{\tan(z)}$.}

In this example, {\tt make\_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14)} returns a {\tt std::tuple} of 4
independent {\tt fvar} variables, with values of 11, 12, 13, and 14, whose maximum derivative orders to be
calculated are 3, 2, 4, and 3, respectively. The order of the variables is important, as it is the same
order used when calling {\tt v.derivative(Nw, Nx, Ny, Nz)} in the example below.

\begin{verbatim}
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

using namespace boost::math::differentiation;

template <typename W, typename X, typename Y, typename Z>
promote<W, X, Y, Z> f(const W& w, const X& x, const Y& y, const Z& z) {
  using namespace std;
  return exp(w * sin(x * log(y) / z) + sqrt(w * z / (x * y))) + w * w / tan(z);
}

int main() {
  using float50 = boost::multiprecision::cpp_bin_float_50;

  constexpr unsigned Nw = 3;  // Max order of derivative to calculate for w
  constexpr unsigned Nx = 2;  // Max order of derivative to calculate for x
  constexpr unsigned Ny = 4;  // Max order of derivative to calculate for y
  constexpr unsigned Nz = 3;  // Max order of derivative to calculate for z
  // Declare 4 independent variables together into a std::tuple.
  auto const variables = make_ftuple<float50, Nw, Nx, Ny, Nz>(11, 12, 13, 14);
  auto const& w = std::get<0>(variables);  // Up to Nw derivatives at w=11
  auto const& x = std::get<1>(variables);  // Up to Nx derivatives at x=12
  auto const& y = std::get<2>(variables);  // Up to Ny derivatives at y=13
  auto const& z = std::get<3>(variables);  // Up to Nz derivatives at z=14
  auto const v = f(w, x, y, z);
  // Calculated from Mathematica symbolic differentiation.
  float50 const answer("1976.319600747797717779881875290418720908121189218755");
  std::cout << std::setprecision(std::numeric_limits<float50>::digits10)
            << "mathematica   : " << answer << '\n'
            << "autodiff      : " << v.derivative(Nw, Nx, Ny, Nz) << '\n'
            << std::setprecision(3)
            << "relative error: " << (v.derivative(Nw, Nx, Ny, Nz) / answer - 1) << '\n';
  return 0;
}
/*
Output:
mathematica   : 1976.3196007477977177798818752904187209081211892188
autodiff      : 1976.3196007477977177798818752904187209081211892188
relative error: 2.67e-50
*/
\end{verbatim}

\subsection{Example 3: Black-Scholes Option Pricing with Greeks Automatically Calculated}
\subsubsection{Calculate Greeks directly from the Black-Scholes pricing function.}

Below is the standard Black-Scholes pricing function written as a function template, where the price, volatility
(sigma), time to expiration (tau), and interest rate are template parameters. This means that any Greek based on
these 4 variables can be calculated using autodiff. The example below calculates delta and gamma, where the
variable of differentiation is the price only. For examples of more exotic Greeks, see
{\tt example/black\_scholes.cpp}.

\begin{verbatim}
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

// Equations and function/variable names are from
// https://en.wikipedia.org/wiki/Greeks_(finance)#Formulas_for_European_option_Greeks

// Standard normal cumulative distribution function
template <typename X>
X Phi(X const& x) {
  return 0.5 * erfc(-one_div_root_two<X>() * x);
}

enum class CP { call, put };

// Assume zero annual dividend yield (q=0).
template <typename Price, typename Sigma, typename Tau, typename Rate>
promote<Price, Sigma, Tau, Rate> black_scholes_option_price(CP cp,
                                                            double K,
                                                            Price const& S,
                                                            Sigma const& sigma,
                                                            Tau const& tau,
                                                            Rate const& r) {
  using namespace std;
  auto const d1 = (log(S / K) + (r + sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  auto const d2 = (log(S / K) + (r - sigma * sigma / 2) * tau) / (sigma * sqrt(tau));
  switch (cp) {
    case CP::call:
      return S * Phi(d1) - exp(-r * tau) * K * Phi(d2);
    case CP::put:
      return exp(-r * tau) * K * Phi(-d2) - S * Phi(-d1);
  }
}

int main() {
  double const K = 100.0;                    // Strike price.
  auto const S = make_fvar<double, 2>(105);  // Stock price.
  double const sigma = 5;                    // Volatility.
  double const tau = 30.0 / 365;             // Time to expiration in years. (30 days).
  double const r = 1.25 / 100;               // Interest rate.
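  // Only S is an fvar variable, so derivatives below are with respect to the stock price.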
  auto const call_price = black_scholes_option_price(CP::call, K, S, sigma, tau, r);
  auto const put_price = black_scholes_option_price(CP::put, K, S, sigma, tau, r);

  std::cout << "black-scholes call price = " << call_price.derivative(0) << '\n'
            << "black-scholes put  price = " << put_price.derivative(0) << '\n'
            << "call delta = " << call_price.derivative(1) << '\n'
            << "put  delta = " << put_price.derivative(1) << '\n'
            << "call gamma = " << call_price.derivative(2) << '\n'
            << "put  gamma = " << put_price.derivative(2) << '\n';
  return 0;
}
/*
Output:
black-scholes call price = 56.5136
black-scholes put  price = 51.4109
call delta = 0.773818
put  delta = -0.226182
call gamma = 0.00199852
put  gamma = 0.00199852
*/
\end{verbatim}

\section{Advantages of Automatic Differentiation}
The above examples illustrate some of the advantages of using autodiff:
\begin{itemize}
\item Elimination of code redundancy. The existence of $N$ separate functions to calculate derivatives is a form
    of code redundancy, with all the liabilities that come with it:
    \begin{itemize}
    \item Changes to one function require $N$ additional changes to other functions. In the \nth{3} example above,
        consider how much larger and more interdependent the code base would be if a separate function were
        written for \href{https://en.wikipedia.org/wiki/Greeks\_(finance)#Formulas\_for\_European\_option\_Greeks}
        {each Greek} value.
    \item Dependencies upon a derivative function for a different purpose will break when changes are made to
        the original function. What doesn't need to exist cannot break.
    \item Code bloat, reducing conceptual integrity. Control over the evolution of code is easier/safer when
        the code base is smaller and can be intuitively grasped.
    \end{itemize}
\item Accuracy of derivatives over finite difference methods. Single-iteration finite difference methods always
    include a $\Delta x$ free variable that must be carefully chosen for each application. If $\Delta x$ is too
    small, then numerical errors become large. If $\Delta x$ is too large, then mathematical errors become large.
    With autodiff, there are no free variables to set, and the accuracy of the answer is generally superior to
    finite difference methods even with the best choice of $\Delta x$.
\end{itemize}

\section{Mathematics}

A basic understanding of the underlying mathematics will help in making sense of the autodiff library's usage.

\subsection{Truncated Taylor Series}

Basic calculus courses teach that a real \href{https://en.wikipedia.org/wiki/Analytic_function}{analytic function}
$f : D\rightarrow\mathbb{R}$ is one which can be expressed as a Taylor series at a point
$x_0\in D\subseteq\mathbb{R}$:

\[
f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots
\]
One way of thinking about this form is that given the value of an analytic function $f(x_0)$ and its derivatives
$f'(x_0), f''(x_0), f'''(x_0), ...$ evaluated at a point $x_0$, the value of the function
$f(x)$ can be obtained at any other point $x\in D$ using the above formula.

Let us make the substitution $x=x_0+\varepsilon$ and rewrite the above equation to get:

\[
f(x_0+\varepsilon) = f(x_0) + f'(x_0)\varepsilon + \frac{f''(x_0)}{2!}\varepsilon^2 + \frac{f'''(x_0)}{3!}\varepsilon^3 + \cdots
\]
Now consider $\varepsilon$ as {\it an abstract algebraic entity that never acquires a numeric value}, much like
one does in basic algebra with variables like $x$ or $y$. For example, we can still manipulate entities
like $xy$ and $(1+2x+3x^2)$ without having to assign specific numbers to them.

Using this formula, autodiff goes in the other direction. Given a general formula/algorithm for calculating
$f(x_0+\varepsilon)$, the derivatives are obtained from the coefficients of the powers of $\varepsilon$
in the resulting computation. The general coefficient for $\varepsilon^n$ is

\[\frac{f^{(n)}(x_0)}{n!}.\]
Thus to obtain $f^{(n)}(x_0)$, the coefficient of $\varepsilon^n$ is multiplied by $n!$.

\subsubsection{Example}

Apply the above technique to calculate the derivatives of $f(x)=x^4$ at $x_0=2$.

The first step is to evaluate $f(x_0+\varepsilon)$ and simply go through the calculation/algorithm, treating
$\varepsilon$ as an abstract algebraic entity:

\begin{align*}
f(x_0+\varepsilon) &= f(2+\varepsilon) \\
 &= (2+\varepsilon)^4 \\
 &= \left(4+4\varepsilon+\varepsilon^2\right)^2 \\
 &= 16+32\varepsilon+24\varepsilon^2+8\varepsilon^3+\varepsilon^4.
\end{align*}
Equating the powers of $\varepsilon$ from this result with the above $\varepsilon$-Taylor expansion
yields the following equalities:

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
\frac{f''(2)}{2!} = 24, \qquad
\frac{f'''(2)}{3!} = 8, \qquad
\frac{f^{(4)}(2)}{4!} = 1, \qquad
\frac{f^{(5)}(2)}{5!} = 0.
\]
Multiplying both sides by the respective factorials gives

\[
f(2) = 16, \qquad
f'(2) = 32, \qquad
f''(2) = 48, \qquad
f'''(2) = 48, \qquad
f^{(4)}(2) = 24, \qquad
f^{(5)}(2) = 0.
\]
These values can be directly confirmed by the \href{https://en.wikipedia.org/wiki/Power_rule}{power rule}
applied to $f(x)=x^4$.

\subsection{Arithmetic}

What was essentially done above was to take a formula/algorithm for calculating $f(x_0)$ from a number $x_0$,
and instead apply the same formula/algorithm to a polynomial $x_0+\varepsilon$. Intermediate steps operate on
values of the form

\[
{\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N
\]
and the final return value is of this polynomial form as well. In other words, the normal arithmetic operators
$+,-,\times,\div$ applied to numbers $x$ are instead applied to polynomials $\bf x$. Through the overloading of C++
operators and functions, floating point data types are replaced with data types that represent these polynomials. More
specifically, C++ types such as {\tt double} are replaced with {\tt std::array<double,N+1>}, which holds the above
$N+1$ coefficients $x_i$, and is wrapped in a {\tt class} that overloads all of the arithmetic operators.

The logic of these arithmetic operators simply mirrors that which is applied to polynomials. We'll look at
each of the 4 arithmetic operators in detail.
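
To make these rules concrete before deriving them, here is a minimal, self-contained sketch of such coefficient
arithmetic over a plain {\tt std::array}. This is an illustration only, not the library's implementation; the
names {\tt add}, {\tt mul}, and {\tt divide} are hypothetical.

\begin{Verbatim}[xleftmargin=2em]
#include <array>
#include <cstddef>
#include <iostream>

// N is the number of coefficients: p[i] holds the coefficient of epsilon^i,
// so the truncation order is N-1.
template <typename T, std::size_t N>
std::array<T, N> add(std::array<T, N> const& x, std::array<T, N> const& y) {
  std::array<T, N> z{};
  for (std::size_t i = 0; i < N; ++i)
    z[i] = x[i] + y[i];  // z_i = x_i + y_i (subtraction is analogous)
  return z;
}

template <typename T, std::size_t N>
std::array<T, N> mul(std::array<T, N> const& x, std::array<T, N> const& y) {
  std::array<T, N> z{};  // Zero-initialized; terms beyond the truncation order are dropped.
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j <= i; ++j)
      z[i] += x[j] * y[i - j];  // z_i = sum over j of x_j * y_{i-j}
  return z;
}

template <typename T, std::size_t N>
std::array<T, N> divide(std::array<T, N> const& x, std::array<T, N> const& y) {
  std::array<T, N> z{};
  for (std::size_t i = 0; i < N; ++i) {  // z_i depends on the prior z_0, ..., z_{i-1}.
    T s = x[i];
    for (std::size_t j = 1; j <= i; ++j)
      s -= y[j] * z[i - j];
    z[i] = s / y[0];
  }
  return z;
}

int main() {
  std::array<double, 6> const x{2, 1, 0, 0, 0, 0};  // 2 + epsilon, as in Example 1.
  auto const x4 = mul(mul(x, x), mul(x, x));        // Fourth power.
  for (double c : x4)
    std::cout << c << ' ';  // Prints: 16 32 24 8 1 0
  std::cout << '\n';
  return 0;
}
\end{Verbatim}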

\subsubsection{Addition}

The addition of polynomials $\bf x$ and $\bf y$ is done component-wise:

\begin{align*}
{\bf z} &= {\bf x} + {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) + \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= \sum_{i=0}^N(x_i+y_i)\varepsilon^i \\
z_i &= x_i + y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Subtraction}

Subtraction follows the same form as addition:

\begin{align*}
{\bf z} &= {\bf x} - {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) - \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= \sum_{i=0}^N(x_i-y_i)\varepsilon^i \\
z_i &= x_i - y_i \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}

\subsubsection{Multiplication}

Multiplication produces higher-order terms:

\begin{align*}
{\bf z} &= {\bf x} \times {\bf y} \\
 &= \left(\sum_{i=0}^Nx_i\varepsilon^i\right) \left(\sum_{i=0}^Ny_i\varepsilon^i\right) \\
 &= x_0y_0 + (x_0y_1+x_1y_0)\varepsilon + (x_0y_2+x_1y_1+x_2y_0)\varepsilon^2 + \cdots +
    \left(\sum_{j=0}^Nx_jy_{N-j}\right)\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
 &= \sum_{i=0}^N\sum_{j=0}^ix_jy_{i-j}\varepsilon^i + O\left(\varepsilon^{N+1}\right) \\
z_i &= \sum_{j=0}^ix_jy_{i-j} \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of multiplication, terms involving powers of $\varepsilon$ greater than $N$, collectively denoted
by $O\left(\varepsilon^{N+1}\right)$, are simply discarded. Fortunately, the values of $z_i$ for $i\le N$ do not
depend on any of these discarded terms, so there is no loss of precision in the final answer. The only information
that is lost is the values of higher order derivatives, which we are not interested in anyway. If we were, then
we would have simply chosen a larger value of $N$ to begin with.

\subsubsection{Division}

Division is not calculated directly, as the others are. Instead, to find the components of
${\bf z}={\bf x}\div{\bf y}$ we require that ${\bf x}={\bf y}\times{\bf z}$. This yields
a recursive formula for the components $z_i$:

\begin{align*}
x_i &= \sum_{j=0}^iy_jz_{i-j} \\
 &= y_0z_i + \sum_{j=1}^iy_jz_{i-j} \\
z_i &= \frac{1}{y_0}\left(x_i - \sum_{j=1}^iy_jz_{i-j}\right) \qquad \text{for}\; i\in\{0,1,2,...,N\}.
\end{align*}
In the case of division, the values for $z_i$ must be calculated sequentially, since $z_i$
depends on the previously calculated values $z_0, z_1, ..., z_{i-1}$.

\subsection{General Functions}

Calls to standard mathematical functions such as {\tt log()}, {\tt cos()}, etc. should also return accurate higher
order derivatives. For example, {\tt exp(x)} may be written internally as a specific \nth{14}-degree polynomial to
approximate $e^x$ when $0<x<1$. This would mean that the \nth{15} derivative, and all higher order derivatives, would
be 0; however, we know that $\frac{d^{15}}{dx^{15}}e^x=e^x$. How should such functions, whose derivatives are known,
be written to provide accurate higher order derivatives? The answer again comes back to the function's Taylor series.

To simplify notation, for a given polynomial ${\bf x} = x_0 + x_1\varepsilon + x_2\varepsilon^2 +\cdots+
x_N\varepsilon^N$ define

\[
{\bf x}_\varepsilon = x_1\varepsilon + x_2\varepsilon^2 +\cdots+ x_N\varepsilon^N = \sum_{i=1}^Nx_i\varepsilon^i.
\]
This allows for a concise expression of a general function $f$ of $\bf x$:

\begin{align*}
f({\bf x}) &= f(x_0 + {\bf x}_\varepsilon) \\
 & = f(x_0) + f'(x_0){\bf x}_\varepsilon + \frac{f''(x_0)}{2!}{\bf x}_\varepsilon^2 + \frac{f'''(x_0)}{3!}{\bf x}_\varepsilon^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}{\bf x}_\varepsilon^N + O\left(\varepsilon^{N+1}\right) \\
 & = \sum_{i=0}^N\frac{f^{(i)}(x_0)}{i!}{\bf x}_\varepsilon^i + O\left(\varepsilon^{N+1}\right)
\end{align*}
where $\varepsilon$ has been substituted with ${\bf x}_\varepsilon$ in the $\varepsilon$-Taylor series
for $f(x)$. This form gives a recipe for calculating $f({\bf x})$ in general from regular numeric calculations
$f(x_0)$, $f'(x_0)$, $f''(x_0)$, ... and successive powers of the epsilon terms ${\bf x}_\varepsilon$.

For an application in which we are interested in up to $N$ derivatives in $x$, the data structure to hold
this information is an $(N+1)$-element array {\tt v} whose general element is

\[ {\tt v[i]} = \frac{f^{(i)}(x_0)}{i!} \qquad \text{for}\; i\in\{0,1,2,...,N\}. \]

\subsection{Multiple Variables}

In C++, the generalization to mixed partial derivatives with multiple independent variables is conveniently achieved
with recursion. To begin to see the recursive pattern, consider a two-variable function $f(x,y)$. Since $x$
and $y$ are independent, they require their own independent epsilons $\varepsilon_x$ and $\varepsilon_y$,
respectively.

Expand $f(x,y)$ for $x=x_0+\varepsilon_x$:
\begin{align*}
f(x_0+\varepsilon_x,y) &= f(x_0,y)
+ \frac{\partial f}{\partial x}(x_0,y)\varepsilon_x
+ \frac{1}{2!}\frac{\partial^2 f}{\partial x^2}(x_0,y)\varepsilon_x^2
+ \frac{1}{3!}\frac{\partial^3 f}{\partial x^3}(x_0,y)\varepsilon_x^3
+ \cdots
+ \frac{1}{M!}\frac{\partial^M f}{\partial x^M}(x_0,y)\varepsilon_x^M
+ O\left(\varepsilon_x^{M+1}\right) \\
&= \sum_{i=0}^M\frac{1}{i!}\frac{\partial^i f}{\partial x^i}(x_0,y)\varepsilon_x^i + O\left(\varepsilon_x^{M+1}\right).
\end{align*}
Next, expand $f(x_0+\varepsilon_x,y)$ for $y=y_0+\varepsilon_y$:

\begin{align*}
f(x_0+\varepsilon_x,y_0+\varepsilon_y) &= \sum_{j=0}^N\frac{1}{j!}\frac{\partial^j}{\partial y^j}
    \left(\sum_{i=0}^M\varepsilon_x^i\frac{1}{i!}\frac{\partial^if}{\partial x^i}\right)(x_0,y_0)\varepsilon_y^j
    + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right) \\
&= \sum_{i=0}^M\sum_{j=0}^N\frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
    \varepsilon_x^i\varepsilon_y^j + O\left(\varepsilon_x^{M+1}\right) + O\left(\varepsilon_y^{N+1}\right).
\end{align*}

Similar to the single-variable case, for an application in which we are interested in up to $M$ derivatives in
$x$ and $N$ derivatives in $y$, the data structure to hold this information is an $(M+1)\times(N+1)$
array {\tt v} whose element at $(i,j)$ is

\[
{\tt v[i][j]} = \frac{1}{i!j!}\frac{\partial^{i+j}f}{\partial x^i\partial y^j}(x_0,y_0)
    \qquad \text{for}\; (i,j)\in\{0,1,2,...,M\}\times\{0,1,2,...,N\}.
\]
The generalization to additional independent variables follows the same pattern.

\subsubsection{Declaring Multiple Variables}

Internally, independent variables are represented by vectors within orthogonal vector spaces. Because of this,
one must be careful when declaring more than one independent variable so that they do not end up in
parallel vector spaces.
This can easily be achieved by following one rule:
\begin{itemize}
\item When declaring more than one independent variable, call {\tt make\_ftuple<>()} once and only once.
\end{itemize}
The values in the returned tuple are independent. Though it is possible to achieve the same result with multiple
calls to {\tt make\_fvar}, this is an easier and less error-prone method. See Section~\ref{multivar} for example
usage.

%\section{Usage}
%
%\subsection{Single Variable}
%
%To calculate derivatives of a single variable $x$, at a particular value $x_0$, the following must be
%specified at compile-time:
%
%\begin{enumerate}
%\item The numeric data type {\tt T} of $x_0$. Examples: {\tt double},
%    {\tt boost::multiprecision::cpp\_bin\_float\_50}, etc.
%\item The maximum derivative order $M$ that is to be calculated with respect to $x$.
%\end{enumerate}
%Note that both of these requirements are entirely analogous to declaring and using a {\tt std::array<T,N>}. {\tt T}
%and {\tt N} must be set at compile-time, but which elements in the array are accessed can be determined at run-time,
%just as the choice of what derivatives to query in autodiff can be made during run-time.
%
%To declare and initialize $x$:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%\end{verbatim}
%where {\tt x0} is a run-time value of type {\tt T}. Assuming {\tt 0 < M}, this represents the polynomial $x_0 +
%\varepsilon$. Internally, the member variable of type {\tt std::array<T,M>} is {\tt v = \{ x0, 1, 0, 0, ... \}},
%consistent with the above mathematical treatment.
%
%To find the derivatives $f^{(n)}(x_0)$ for $0\le n\le M$ of a function
%$f : \mathbb{R}\rightarrow\mathbb{R}$, the function can be represented as a template
%
%\begin{verbatim}
%    template<typename T>
%    T f(T x);
%\end{verbatim}
%Using a generic type {\tt T} allows for {\tt x} to be of a regular type such as {\tt double}, but also allows for\\
%{\tt boost::math::differentiation::autodiff\_fvar<>} types.
%
%Internal calls to mathematical functions must allow for
%\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL). Many standard library functions
%are overloaded in the {\tt boost::math::differentiation} namespace. For example, instead of calling {\tt std::cos(x)}
%from within {\tt f}, include the line {\tt using std::cos;} and call {\tt cos(x)} without a namespace prefix.
%
%Calling $f$ and retrieving the calculated value and derivatives:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%    autodiff_fvar<T,M> x = make_fvar<T,M>(x0);
%    autodiff_fvar<T,M> y = f(x);
%    for (int n=0 ; n<=M ; ++n)
%        std::cout << "y.derivative("<<n<<") == " << y.derivative(n) << std::endl;
%\end{verbatim}
%{\tt y.derivative(0)} returns the undifferentiated value $f(x_0)$, and {\tt y.derivative(n)} returns $f^{(n)}(x_0)$.
%Casting {\tt y} to type {\tt T} also gives the undifferentiated value. In other words, the following 3 values
%are equal:
%
%\begin{enumerate}
%\item {\tt f(x0)}
%\item {\tt y.derivative(0)}
%\item {\tt static\_cast<T>(y)}
%\end{enumerate}
%
%\subsection{Multiple Variables}
%
%Independent variables are represented in autodiff as independent dimensions within a multi-dimensional array.
%This is perhaps best illustrated with examples.
%The {\tt namespace boost::math::differentiation} is assumed.
%
%The following instantiates a variable of $x=13$ with up to 3 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,3> x = make_fvar<double,3>(13);
%\end{verbatim}
%This instantiates {\bf an independent} value of $y=14$ with up to 4 orders of derivatives:
%
%\begin{verbatim}
%    autodiff_fvar<double,0,4> y = make_fvar<double,0,4>(14);
%\end{verbatim}
%Combining them together {\bf promotes} their data type automatically to the smallest multidimensional array that
%accommodates both.
%
%\begin{verbatim}
%    // z is promoted to autodiff_fvar<double,3,4>
%    auto z = 10*x*x + 50*x*y + 100*y*y;
%\end{verbatim}
%The object {\tt z} holds a 2-dimensional array, thus {\tt derivative(...)} is a 2-parameter method:
%
%\[
%{\tt z.derivative(i,j)} = \frac{\partial^{i+j}f}{\partial x^i\partial y^j}(13,14)
%    \qquad \text{for}\; (i,j)\in\{0,1,2,3\}\times\{0,1,2,3,4\}.
%\]
%A few values of the result can be confirmed through inspection:
%
%\begin{verbatim}
%    z.derivative(2,0) == 20
%    z.derivative(1,1) == 50
%    z.derivative(0,2) == 200
%\end{verbatim}
%Note how the position of the parameters in {\tt derivative(...)} matches how {\tt x} and {\tt y} were declared.
%This will be clarified next.
%
%\subsubsection{Two Rules of Variable Initialization}
%
%In general, there are two rules to keep in mind when dealing with multiple variables:
%
%\begin{enumerate}
%\item Independent variables correspond to parameter position, in both the initialization {\tt make\_fvar<T,...>}
%    and calls to {\tt derivative(...)}.
%\item The last template position in {\tt make\_fvar<T,...>} determines which variable a derivative will be
%    taken with respect to.
%\end{enumerate}
%Both rules are illustrated with an example in which there are 3 independent variables $x,y,z$ and 1 dependent
%variable $w=f(x,y,z)$, though the following code readily generalizes to any number of independent variables, limited
%only by the C++ compiler/memory/platform. The maximum derivative order of each variable is {\tt Nx}, {\tt Ny}, and
%{\tt Nz}, respectively. Then the type for {\tt w} is {\tt boost::math::differentiation::autodiff\_fvar<T,Nx,Ny,Nz>}
%and all possible mixed partial derivatives are available via
%
%\[
%{\tt w.derivative(nx,ny,nz)} =
%    \frac{\partial^{n_x+n_y+n_z}f}{\partial x^{n_x}\partial y^{n_y}\partial z^{n_z} }(x_0,y_0,z_0)
%\]
%for $(n_x,n_y,n_z)\in\{0,1,2,...,N_x\}\times\{0,1,2,...,N_y\}\times\{0,1,2,...,N_z\}$ where $x_0, y_0, z_0$ are
%the numerical values at which the function $f$ and its derivatives are evaluated.
%
%In code:
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    using var = autodiff_fvar<double,Nx,Ny,Nz>; // Nx, Ny, Nz are constexpr size_t.
%
%    var x = make_fvar<double,Nx>(x0);       // x0 is of type double
%    var y = make_fvar<double,Nx,Ny>(y0);    // y0 is of type double
%    var z = make_fvar<double,Nx,Ny,Nz>(z0); // z0 is of type double
%
%    var w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%Note how {\tt x}, {\tt y}, and {\tt z} are initialized: the last template parameter determines which variable
%$x, y,$ or $z$ a derivative is taken with respect to. In terms of the $\varepsilon$-polynomials
%above, this determines whether to add $\varepsilon_x, \varepsilon_y,$ or $\varepsilon_z$ to
%$x_0, y_0,$ or $z_0$, respectively.
%
%In contrast, the following initialization of {\tt x} would be INCORRECT:
%
%\begin{verbatim}
%    var x = make_fvar<T,Nx,0>(x0); // WRONG
%\end{verbatim}
%Mathematically, this represents $x_0+\varepsilon_y$, since the last template parameter corresponds to the
%$y$ variable, and thus the resulting value will be invalid.
%
%\subsubsection{Type Promotion}
%
%The previous example can be optimized to save some unnecessary computation, by declaring smaller arrays
%and relying on autodiff's automatic type-promotion:
%
%\begin{verbatim}
%    using namespace boost::math::differentiation;
%
%    autodiff_fvar<double,Nx> x = make_fvar<double,Nx>(x0);
%    autodiff_fvar<double,0,Ny> y = make_fvar<double,0,Ny>(y0);
%    autodiff_fvar<double,0,0,Nz> z = make_fvar<double,0,0,Nz>(z0);
%
%    autodiff_fvar<double,Nx,Ny,Nz> w = f(x,y,z);
%
%    for (size_t nx=0 ; nx<=Nx ; ++nx)
%        for (size_t ny=0 ; ny<=Ny ; ++ny)
%            for (size_t nz=0 ; nz<=Nz ; ++nz)
%                std::cout << "w.derivative("<<nx<<','<<ny<<','<<nz<<") == "
%                          << w.derivative(nx,ny,nz) << std::endl;
%\end{verbatim}
%For example, if one of the first steps in the computation of $f$ is {\tt z*z}, then significantly fewer
%multiplications and additions may occur if {\tt z} is declared as {\tt autodiff\_fvar<double,0,0,Nz>} as opposed to \\
%{\tt autodiff\_fvar<double,Nx,Ny,Nz>}. There is no loss of precision with the former, since the extra dimensions
%represent 0 values. Once {\tt z} is combined with {\tt x} and {\tt y} during the computation, the types will be
%promoted as necessary. This is the recommended way to initialize variables in autodiff.

\section{Writing Functions for Autodiff Compatibility}\label{compatibility}

In this section, a general procedure is given for writing new, and transforming existing, C++ mathematical
functions for compatibility with autodiff.

There are 3 categories of functions that require different strategies:
\begin{enumerate}
\item Piecewise-rational functions. These are simply piecewise quotients of polynomials. All that is needed is to
    turn the function parameters and return value into generic (template) types. This will then allow the function
    to accept and return autodiff's {\tt fvar} types, thereby using autodiff's overloaded arithmetic operators,
    which calculate the derivatives automatically.
\item Functions that call existing autodiff functions. This is the same as the previous, but may also include
    calls to functions that are in the autodiff library. Examples: {\tt exp()}, {\tt log()}, {\tt tgamma()}, etc.
\item New functions for which the derivatives can be calculated. This is the most general technique, as it
    allows for the development of functions that do not fall into the previous two categories.
\end{enumerate}
Functions written in any of these ways may then be added to the autodiff library.

\subsection{Piecewise-Rational Functions}
\[
f(x) = \frac{1}{1+x^2}
\]
By simply writing this as a template function, autodiff can calculate derivatives for it:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iomanip>
#include <iostream>

template <typename T>
T rational(T const& x) {
  return 1 / (1 + x * x);
}

int main() {
  using namespace boost::math::differentiation;
  auto const x = make_fvar<double, 10>(0);
  auto const y = rational(x);
  std::cout << std::setprecision(std::numeric_limits<double>::digits10)
            << "y.derivative(10) = " << y.derivative(10) << std::endl;
  return 0;
}
/*
Output:
y.derivative(10) = -3628800
*/
\end{Verbatim}
As simple as $f(x)$ may seem, its derivatives grow increasingly complex as higher orders are taken.
For example, the \nth{10} derivative has the form
\[
f^{(10)}(x) = -3628800\frac{1 - 55x^2 + 330x^4 - 462x^6 + 165x^8 - 11x^{10}}{(1 + x^2)^{11}}.
\]
Derivatives of $f(x)$ are useful, and in fact used, in calculating higher order derivatives of $\arctan(x)$,
for instance, since
\[
\arctan^{(n)}(x) = \left(\frac{d}{dx}\right)^{n-1} \frac{1}{1+x^2}\qquad\text{for}\quad 1\le n.
\]

\subsection{Functions That Call Existing Autodiff Functions}

Many of the standard library math functions are overloaded in autodiff. It is recommended to use
\href{https://en.cppreference.com/w/cpp/language/adl}{argument-dependent lookup} (ADL) in order for functions to
be written in a way that is general enough to accommodate standard types ({\tt double}) as well as autodiff types
({\tt fvar}).
\\
Example:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  return 0.5 * std::erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
Though {\tt Phi(x)} is general enough to handle the various fundamental floating point types, this will
not work if {\tt x} is an autodiff {\tt fvar} variable, since {\tt std::erfc} does not include a specialization
for {\tt fvar}. The recommended solution is to remove the namespace prefix {\tt std::} from {\tt erfc}:
\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>

using namespace boost::math::constants;

// Standard normal cumulative distribution function
template <typename T>
T Phi(T const& x)
{
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}
\end{Verbatim}
In this form, when {\tt x} is of type {\tt fvar}, the C++ compiler will search for and find a function {\tt erfc}
within the same namespace as {\tt fvar}, which is in the autodiff library, via ADL. Because of the using-declaration,
it will also call {\tt std::erfc} when {\tt x} is a fundamental type such as {\tt double}.
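
As a brief usage sketch (hypothetical code, not from the library's examples), the same {\tt Phi} template now
accepts both fundamental and {\tt fvar} arguments:

\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/constants/constants.hpp>
#include <boost/math/differentiation/autodiff.hpp>
#include <cmath>
#include <iostream>

using namespace boost::math::constants;
using namespace boost::math::differentiation;

template <typename T>
T Phi(T const& x) {
  using std::erfc;
  return 0.5 * erfc(-one_div_root_two<T>() * x);
}

int main() {
  double const p = Phi(1.0);                      // Calls std::erfc via the using-declaration.
  auto const y = Phi(make_fvar<double, 1>(1.0));  // Calls autodiff's erfc via ADL.
  std::cout << "Phi(1)  = " << p << '\n'
            << "Phi'(1) = " << y.derivative(1) << '\n';  // Standard normal pdf at 1 (~0.242).
  return 0;
}
\end{Verbatim}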

\subsection{New Functions For Which The Derivatives Can Be Calculated}\label{new_functions}

Mathematical functions which do not fall into the previous two categories can be constructed using autodiff helper
functions. This requires a separate function for calculating the derivatives. In case you are asking yourself what
good an autodiff library is if one needs to supply the derivatives, the answer is that the new function will fit
in with the rest of the autodiff library, thereby allowing for the creation of additional functions via all of
the arithmetic operators, plus function composition, which was not readily available without the library.

The example given here is for {\tt cos}:
\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> cos(fvar<RealType, Order> const& cr) {
  using std::cos;
  using std::sin;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = cos(static_cast<root_type>(cr));
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else {
    root_type const d1 = -sin(static_cast<root_type>(cr));
    root_type const derivatives[4]{d0, d1, -d0, -d1};
    return cr.apply_derivatives(order,
                                [&derivatives](size_t i) { return derivatives[i & 3]; });
  }
}
\end{Verbatim}
This uses the helper function {\tt fvar::apply\_derivatives}, which takes two parameters:
\begin{enumerate}
\item The highest order derivative to be calculated.
\item A function that maps derivative order to derivative value.
\end{enumerate}
The highest order derivative necessary to be calculated is generally equal to {\tt fvar::order\_sum}. In the case
of {\tt sin} and {\tt cos}, the derivatives are cyclical with period 4. Thus it is sufficient to store only these
4 values in an array, and to take the derivative order modulo 4 as the index into this array.

A second helper function, not shown here, is {\tt apply\_coefficients}. It is used in the same way as
{\tt apply\_derivatives}, except that the supplied function calculates coefficients instead of derivatives.
The relationship between a coefficient $C_n$ and derivative $D_n$ for derivative order $n$ is
\[
C_n = \frac{D_n}{n!}.
\]
Internally, {\tt fvar} holds coefficients rather than derivatives, so in case the coefficient values are more readily
available than the derivatives, using {\tt apply\_coefficients} can save some unnecessary computation.
See the definition of {\tt atan} for an example.

Both of these helper functions use Horner's method when calculating the resulting polynomial {\tt fvar}. This works
well when the derivatives are finite, but in cases where derivatives are infinite, it can quickly result in NaN
values as the computation progresses. In these cases, one can call non-Horner versions of both functions, which
better ``isolate'' infinite values so that they are less likely to evolve into NaN values.
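
To reinforce the pattern, a {\tt sin} overload can be written by the same recipe. The sketch below simply mirrors
the {\tt cos} example above, using the period-4 cycle $\sin,\cos,-\sin,-\cos$; the library's actual definition may
differ:

\begin{Verbatim}[xleftmargin=2em]
template <typename RealType, size_t Order>
fvar<RealType, Order> sin(fvar<RealType, Order> const& cr) {
  using std::cos;
  using std::sin;
  using root_type = typename fvar<RealType, Order>::root_type;
  constexpr size_t order = fvar<RealType, Order>::order_sum;
  root_type const d0 = sin(static_cast<root_type>(cr));
  if constexpr (order == 0)
    return fvar<RealType, Order>(d0);
  else {
    root_type const d1 = cos(static_cast<root_type>(cr));
    // Derivatives of sin cycle with period 4: sin, cos, -sin, -cos.
    root_type const derivatives[4]{d0, d1, -d0, -d1};
    return cr.apply_derivatives(order,
                                [&derivatives](size_t i) { return derivatives[i & 3]; });
  }
}
\end{Verbatim}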

The four helper functions available for constructing new autodiff functions from known coefficients/derivatives are:
\begin{enumerate}
\item {\tt fvar::apply\_coefficients}
\item {\tt fvar::apply\_coefficients\_nonhorner}
\item {\tt fvar::apply\_derivatives}
\item {\tt fvar::apply\_derivatives\_nonhorner}
\end{enumerate}

\section{Function Writing Guidelines}

At a high level there is one fairly simple principle, loosely and intuitively speaking, for writing functions
whose derivatives autodiff can effectively calculate: \\

{\bf Autodiff Function Principle (AFP)}
\begin{displayquote}
A function whose branches in logic correspond to piecewise analytic calculations over non-singleton intervals,
with smooth transitions between the intervals, and that is free of indeterminate forms in the calculated value and
higher order derivatives, will work fine with autodiff.
\end{displayquote}
This could be stated with greater mathematical rigor, but what seems more practical in this case is to give
examples, and categories of examples, of what works, what does not, and how to remedy some of the common problems
that may be encountered. That is the approach taken here.

\subsection{Example 1: $f(x)=\max(0,x)$}

One potential implementation of $f(x)=\max(0,x)$ is:

\begin{verbatim}
    template<typename T>
    T f(const T& x)
    {
        return 0 < x ? x : 0;
    }
\end{verbatim}
Though this is consistent with Section~\ref{compatibility}, there are two problems with it:

\begin{enumerate}
\item {\tt f(nan) = 0}. This problem is independent of autodiff, but is worth addressing anyway. If an
    indeterminate form arises within a calculation and is input into $f$, then it gets ``covered up'' by
    this implementation, leading to an unknowingly incorrect result. It is better for functions in general to
    propagate NaN values, so that the user knows something went wrong and does not rely on an incorrect result,
    and likewise so that the developer can track down where the NaN originated and remedy it.
\item $f'(0) = 0$ when autodiff is applied. This is because {\tt f} returns 0 as a constant when {\tt x==0}, wiping
    out any of the derivatives (or sensitivities) that {\tt x} was holding as an autodiff variable. Instead, let us
    apply the AFP and identify the two intervals over which $f$ is defined: $(-\infty,0]\cup(0,\infty)$.
    Though the function itself is not analytic at $x=0$, we can attempt somewhat to smooth out this transition
    point by averaging the calculation of $f(x)$ at $x=0$ from each interval. If $x<0$ then the result is simply
    0, and if $0<x$ then the result is $x$. The average is $\frac{1}{2}(0 + x)$, which allows autodiff to
    calculate $f'(0)=\frac{1}{2}$. This is a more reasonable answer.
\end{enumerate}
A better implementation that resolves both issues is:
\begin{verbatim}
    template<typename T>
    T f(const T& x)
    {
        if (x < 0)
            return 0;
        else if (x == 0)
            return 0.5*x;
        else
            return x;
    }
\end{verbatim}

\subsection{Example 2: $f(x)=\sinc(x)$}

The definition of $\sinc:\mathbb{R}\rightarrow\mathbb{R}$ is

\[
\sinc(x) = \begin{cases}
    1 &\text{if}\; x = 0 \\
    \frac{\sin(x)}{x} &\text{otherwise.}\end{cases}
\]
A potential implementation is:

\begin{verbatim}
    template<typename T>
    T sinc(const T& x)
    {
        using std::sin;
        return x == 0 ? 1 : sin(x) / x;
    }
\end{verbatim}
Though this is again consistent with Section~\ref{compatibility}, and returns correct non-derivative values,
it returns a constant when {\tt x==0}, thereby losing all derivative information contained in {\tt x} and
contributions from $\sinc$. For example, $\sinc''(0)=-\frac{1}{3}$, however {\tt y.derivative(2) == 0} when
{\tt y = sinc(make\_fvar<double,2>(0))} using the above incorrect implementation. Applying the AFP, the intervals
upon which separate branches of logic are applied are $(-\infty,0)\cup[0,0]\cup(0,\infty)$. The violation occurs
due to the singleton interval $[0,0]$, even though a constant function of 1 is technically analytic. The remedy
is to define a custom $\sinc$ overload and add it to the autodiff library. This has been done. Mathematically, it
is well-defined and free of indeterminate forms, as is the \nth{3} expression in the equalities
\[
\frac{1}{x}\sin(x) = \frac{1}{x}\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}
    = \sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n}.
\]
The autodiff library contains helper functions to help write function overloads when the derivatives of a function
are known. This is an advanced feature, and documentation for it may be added at a later time.

For now, it is worth understanding the ways in which indeterminate forms can occur within a mathematical calculation,
so that they can be avoided when possible by rewriting the function. Table~\ref{3nans} compares 3 types of
indeterminate forms. Assume the product {\tt a*b} is a positive finite value.

\begin{table}[h]
\centering\begin{tabular}{m{7em}||c|c|c}
 & $\displaystyle f(x)=\left(\frac{a}{x}\right)\times(bx^2)$
 & $\displaystyle g(x)=\left(\frac{a}{x}\right)\times(bx)$
 & $\displaystyle h(x)=\left(\frac{a}{x^2}\right)\times(bx)$ \\[0.618em]
\hline\hline
Mathematical\newline Limit
 & $\displaystyle\lim_{x\rightarrow0}f(x) = 0$
 & $\displaystyle\lim_{x\rightarrow0}g(x) = ab$
 & $\displaystyle\lim_{x\rightarrow0}h(x) = \infty$ \\
\hline
Floating Point\newline Arithmetic
 & {\tt f(0) = inf*0 = nan} & {\tt g(0) = inf*0 = nan} & {\tt h(0) = inf*0 = nan}
\end{tabular}
\caption{Automatic differentiation does not compute limits.
Indeterminate forms must be simplified manually. (These cases are not meant to be exhaustive.)}\label{3nans}
\end{table}

Indeterminate forms result in NaN values within a calculation. Mathematically, if they occur at locally isolated
points, then we generally prefer the mathematical limit as the result, even if it is infinite. As demonstrated in
Table~\ref{3nans}, depending upon the nature of the indeterminate form, the mathematical limit can be 0 (no matter
the values of $a$ or $b$), or $ab$, or $\infty$, but these 3 cases cannot be distinguished by the floating point
result of {\tt nan}. Floating point arithmetic does not perform limits (directly), and neither does the autodiff
library. Thus it is up to the diligence of the developer to keep a watchful eye over where indeterminate forms can
arise.

\subsection{Example 3: $f(x)=\sqrt x$ and $f'(0)=\infty$}

Working with functions that have infinite higher order derivatives can very quickly result in NaNs in higher order
derivatives as the computation progresses, since {\tt inf-inf}, {\tt inf/inf}, and {\tt 0*inf} all result
in {\tt nan}. See Table~\ref{sqrtnan} for an example.
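
The \nth{2} and \nth{3} rows of the table can be reproduced with a short program (a sketch; the expected values
in the comments are those of Table~\ref{sqrtnan}):

\begin{Verbatim}[xleftmargin=2em]
#include <boost/math/differentiation/autodiff.hpp>
#include <iostream>

using namespace boost::math::differentiation;

int main() {
  auto const x = make_fvar<double, 3>(0);
  auto const f = (sqrt(x) + 1) * (sqrt(x) + 1);  // inf + (-inf) during squaring -> nan
  auto const g = x + 2 * sqrt(x) + 1;            // sqrt(x) is kept "isolated"
  for (unsigned i = 0; i <= 3; ++i)
    std::cout << "f.derivative(" << i << ") = " << f.derivative(i)
              << "   g.derivative(" << i << ") = " << g.derivative(i) << '\n';
  // Expected: f gives 1, inf, nan, nan while g gives 1, inf, -inf, inf.
  return 0;
}
\end{Verbatim}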

\begin{table}[h]
\centering\begin{tabular}{c||c|c|c|c}
$f(x)$ & $f(0)$ & $f'(0)$ & $f''(0)$ & $f'''(0)$ \\
\hline\hline
{\tt sqrt(x)} & {\tt 0} & {\tt inf} & {\tt -inf} & {\tt inf} \\
\hline
{\tt sqr(sqrt(x)+1)} & {\tt 1} & {\tt inf} & {\tt nan} & {\tt nan} \\
\hline
{\tt x+2*sqrt(x)+1} & {\tt 1} & {\tt inf} & {\tt -inf} & {\tt inf}
\end{tabular}
\caption{Indeterminate forms in higher order derivatives. {\tt sqr(x) == x*x}.}\label{sqrtnan}
\end{table}

Calling the autodiff-overloaded implementation of $f(x)=\sqrt x$ at the value {\tt x==0} results in the
\nth{1} row (after the header row) of Table~\ref{sqrtnan}, as is mathematically correct. The \nth{2} row shows
$f(x)=(\sqrt{x}+1)^2$ resulting in {\tt nan} values for $f''(0)$ and all higher order derivatives. This is due to
the internal arithmetic in which {\tt inf} is added to {\tt -inf} during the squaring, resulting in a {\tt nan}
value for $f''(0)$ and all higher orders. This is typical of {\tt inf} values in autodiff. Where they show up,
they are correct; however, they can quickly introduce {\tt nan} values into the computation upon the addition of
oppositely signed {\tt inf} values, division by {\tt inf}, or multiplication by {\tt 0}. It is worth noting that
the infection of {\tt nan} only spreads upward in the order of derivatives, since lower orders do not depend upon
higher orders (which is also why dropping higher order terms in an autodiff computation does not result in any
loss of precision for lower order terms).

The resolution in this case is to manually perform the squaring in the computation, replacing the \nth{2} row
with the \nth{3}: $f(x)=x+2\sqrt{x}+1$. Though mathematically equivalent, it allows autodiff to avoid {\tt nan}
values since $\sqrt x$ is more ``isolated'' in the computation. That is, the {\tt inf} values that unavoidably
show up in the derivatives of {\tt sqrt(x)} for {\tt x==0} do not have the chance to interact with other {\tt inf}
values as they do with the squaring.

\subsection{Summary}

The AFP gives a high-level unified guiding principle for writing C++ template functions whose derivatives autodiff
can effectively evaluate.

Examples have been given to illustrate some common items to avoid doing:

\begin{enumerate}
\item It is not enough for functions to be piecewise continuous. On boundary points between intervals, consider
    returning the average expression of both intervals, rather than just one of them. Example: $\max(0,x)$ at $x=0$.
    In cases where the limits from both sides must match, and they do not, then {\tt nan} may be a more appropriate
    value depending on the application.
\item Avoid returning individual constant values (e.g. $\sinc(0)=1$). Values must be computed uniformly along
    with other values in their local interval. If that is not possible, then the function must be overloaded to
    compute the derivatives precisely using the helper functions from Section~\ref{new_functions}.
\item Avoid intermediate indeterminate values in both the value ($\sinc(x)$ at $x=0$) and the derivatives
    ($(\sqrt{x}+1)^2$ at $x=0$). Try to isolate expressions that may contain infinite values/derivatives so
    that they do not introduce NaN values into the computation.
\end{enumerate}

\section{Acknowledgments}

\begin{itemize}
\item Kedar Bhat --- C++11 compatibility, Boost Special Functions compatibility testing, codecov integration,
    and feedback.
\item Nick Thompson --- Initial feedback and help with Boost integration.
\item John Maddock --- Initial feedback and help with Boost integration.
\end{itemize}

\begin{thebibliography}{1}
\bibitem{ad} \url{https://en.wikipedia.org/wiki/Automatic\_differentiation}
\bibitem{ed} Andreas Griewank, Andrea Walther. \textit{Evaluating Derivatives}. SIAM, 2nd ed., 2008.
\end{thebibliography}

\end{document}