
Lines Matching refs:math

20 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
22 :math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be an
23 :math:`m`-dimensional function of :math:`x`. We are interested in
26 .. math:: \arg \min_x & \frac{1}{2}\|F(x)\|^2 \\
             \text{such that } & L \le x \le U.
30 Where, :math:`L` and :math:`U` are lower and upper bounds on the
31 parameter vector :math:`x`.
34 general :math:`F(x)` is an intractable problem, we will have to settle
37 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
38 :math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
39 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top F(x)`.
45 determine a correction :math:`\Delta x` to the vector :math:`x`. For
47 the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
50 .. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
54 updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
56 the size of the step :math:`\Delta x`. Depending on how the size of
57 the step :math:`\Delta x` is controlled, non-linear optimization
88 1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
2. Solve
91 .. math::
      \arg \min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
      \text{such that} &\|D(x)\Delta x\|^2 \le \mu \\
      &L \le x + \Delta x \le U.
96 3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
      \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 - \|F(x)\|^2}`
99 4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
100 5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
101 6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
104 Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
105 matrix used to define a metric on the domain of :math:`F(x)` and
106 :math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
111 in the value of :math:`\rho`.
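The loop above is compact enough to sketch directly. The NumPy fragment below
is an illustration only, not Ceres's implementation: the ``residual`` and
``jacobian`` callables, the constants, and the use of a damped
Levenberg-Marquardt style solve in place of the exact step 2 subproblem are
all assumptions made for the sketch.

.. code-block:: python

   import numpy as np

   def trust_region(x, residual, jacobian, mu=1.0, eps=1e-3,
                    eta1=0.75, eta2=0.25, max_iter=50):
       # Generic trust region loop. residual(x) -> F(x), jacobian(x) -> J(x).
       for _ in range(max_iter):
           F, J = residual(x), jacobian(x)
           D = np.sqrt(np.diag(np.diag(J.T @ J)))   # scaling matrix D(x)
           # Step 2 (sketch): a damped Levenberg-Marquardt style solve stands
           # in for the exact trust region subproblem.
           dx = np.linalg.solve(J.T @ J + (1.0 / mu) * D.T @ D, -J.T @ F)
           model = np.linalg.norm(J @ dx + F) ** 2 - np.linalg.norm(F) ** 2
           actual = np.linalg.norm(residual(x + dx)) ** 2 - np.linalg.norm(F) ** 2
           rho = actual / model                     # step quality (step 3)
           if rho > eps:                            # accept the step (step 4)
               x = x + dx
           if rho > eta1:                           # expand the trust region (step 5)
               mu = 2.0 * mu
           elif rho < eta2:                         # shrink it (step 6)
               mu = 0.5 * mu
       return x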
116 .. math::
       \arg \min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
       \text{such that} &\|D(x)\Delta x\|^2 \le \mu \\
       &L \le x + \Delta x \le U.
134 :math:`n`. Similarly the presence of loss functions is also
154 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \lambda \|D(x)\Delta x\|^2
156 Where, :math:`\lambda` is a Lagrange multiplier that is inversely
157 related to :math:`\mu`. In Ceres, we solve for
159 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
162 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
163 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
166 will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
167 at the bottom of the matrix :math:`J` and similarly a vector of zeros
168 has been added to the bottom of the vector :math:`f` and the rest of
169 our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
172 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
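As a small illustration of this notational trick, appending a scaled
:math:`D` to :math:`J` and zeros to :math:`f` turns the damped problem into
an ordinary linear least squares problem. The NumPy sketch below uses made-up
random data and only demonstrates the algebra, not how Ceres actually forms
or solves the system.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   J = rng.standard_normal((8, 3))          # Jacobian at the current iterate
   f = rng.standard_normal(8)               # residual vector
   mu = 10.0                                # trust region radius
   D = np.sqrt(np.diag(np.diag(J.T @ J)))   # Jacobi scaling matrix D(x)

   # Stack (1/sqrt(mu)) * D under J and zeros under f ...
   J_aug = np.vstack([J, D / np.sqrt(mu)])
   f_aug = np.concatenate([f, np.zeros(J.shape[1])])

   # ... and the damped step is a plain linear least squares solution.
   dx, *_ = np.linalg.lstsq(J_aug, -f_aug, rcond=None)

   # Same step from the damped normal equations, for comparison.
   dx_normal = np.linalg.solve(J.T @ J + (1.0 / mu) * D.T @ D, -J.T @ f)
   assert np.allclose(dx, dx_normal)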
199 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
202 Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
203 :math:`0 < \eta_k <1` is known as the forcing sequence. [WrightHolt]_
206 sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
207 depends on the choice of the forcing sequence :math:`\eta_k`.
224 .. math::

       \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2 \\
       \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).
229 Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
230 solution to :eq:`linearapprox` and :math:`\Delta x^{\text{Cauchy}}` is the vector that
    minimizes the linear approximation if we restrict ourselves to moving along the direction
233 of the gradient. Dogleg methods find a vector :math:`\Delta x`
234 defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta x^{\text{Cauchy}}`
251 the step computation for a particular choice of :math:`\mu` does not
254 a smaller value of :math:`\mu`. Dogleg, on the other hand, only needs
256 vectors, as neither of them depends on the value of :math:`\mu`.
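For the unconstrained case, the two vectors and the classical dogleg
interpolation between them can be sketched as follows. The blending rule is
the textbook one and the helper is invented for illustration; it is not the
exact strategy Ceres implements.

.. code-block:: python

   import numpy as np

   def dogleg_step(J, f, radius):
       # Gauss-Newton point: minimizer of the linearized problem.
       dx_gn, *_ = np.linalg.lstsq(J, -f, rcond=None)
       # Cauchy point: minimizer along the steepest descent direction.
       g = J.T @ f                                  # gradient of 1/2 ||F||^2
       dx_cauchy = -(np.dot(g, g) / np.dot(J @ g, J @ g)) * g
       if np.linalg.norm(dx_gn) <= radius:          # Gauss-Newton step fits
           return dx_gn
       if np.linalg.norm(dx_cauchy) >= radius:      # even the Cauchy step is
           return radius * dx_cauchy / np.linalg.norm(dx_cauchy)  # too long
       # Walk from the Cauchy point towards the Gauss-Newton point until the
       # trust region boundary is reached (a scalar quadratic in t).
       d = dx_gn - dx_cauchy
       a, b = np.dot(d, d), 2.0 * np.dot(dx_cauchy, d)
       c = np.dot(dx_cauchy, dx_cauchy) - radius ** 2
       t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
       return dx_cauchy + t * d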
271 .. math:: y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}
274 Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
275 :math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.
277 Notice that the expression on the right-hand side is linear in :math:`a_1` and
278 :math:`a_2`, and given any value for :math:`b_1, b_2` and :math:`c_1`,
280 of :math:`a_1` and :math:`a_2`. It's possible to analytically
281 eliminate the variables :math:`a_1` and :math:`a_2` from the problem
297 additional optimization step to estimate :math:`a_1` and :math:`a_2`
302 linear in :math:`a_1` and :math:`a_2`, i.e.,
304 .. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})
310 the :math:`a_1` and :math:`a_2` optimization problems will do. The
311 only constraint on :math:`a_1` and :math:`a_2` (if they are two
316 :math:`(a_1, a_2)`, but decomposing the graph corresponding to the
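To make the elimination idea concrete: for fixed :math:`b_1, b_2, c_1` the
model above is linear in :math:`a_1, a_2`, so their optimal values follow
from a small linear least squares solve. The sketch below uses synthetic data
and made-up parameter values and only illustrates this inner linear solve,
not Ceres's inner iterations machinery.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(1)
   x = np.linspace(0.0, 1.0, 50)
   a1, a2, b1, b2, c1 = 2.0, -1.0, 0.5, -0.3, 0.1        # made-up ground truth
   y = a1 * np.exp(b1 * x) + a2 * np.exp(b2 * x**2 + c1)
   y = y + 0.01 * rng.standard_normal(x.size)            # noisy observations

   def optimal_a(b1, b2, c1):
       # For fixed (b1, b2, c1) the model is linear in (a1, a2), so the best
       # (a1, a2) is the solution of a small linear least squares problem.
       basis = np.column_stack([np.exp(b1 * x), np.exp(b2 * x**2 + c1)])
       a, *_ = np.linalg.lstsq(basis, y, rcond=None)
       return a

   print(optimal_a(0.5, -0.3, 0.1))   # recovers values close to (2.0, -1.0)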
372 1. Given an initial point :math:`x`
373 2. :math:`\Delta x = -H^{-1}(x) g(x)`
374 3. :math:`\mu = \arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
375 4. :math:`x = x + \mu \Delta x`
378 Here :math:`H(x)` is some approximation to the Hessian of the
379 objective function, and :math:`g(x)` is the gradient at
380 :math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
381 different search directions :math:`\Delta x`.
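A minimal sketch of these four steps, using the Gauss-Newton choice
:math:`H = J^\top J` and a crude backtracking loop in place of a proper one
dimensional minimization; the callables and constants are invented for the
illustration.

.. code-block:: python

   import numpy as np

   def line_search_method(x, residual, jacobian, max_iter=50):
       # Generic line search loop. residual(x) -> F(x), jacobian(x) -> J(x).
       for _ in range(max_iter):
           F, J = residual(x), jacobian(x)
           g = J.T @ F                        # gradient of 1/2 ||F(x)||^2
           H = J.T @ J                        # Gauss-Newton Hessian approximation
           dx = np.linalg.solve(H, -g)        # step 2: search direction
           # Step 3 (sketch): crude backtracking instead of an exact
           # one dimensional minimization along dx.
           mu, cost = 1.0, 0.5 * np.dot(F, F)
           while mu > 1e-8:
               if 0.5 * np.sum(residual(x + mu * dx) ** 2) < cost:
                   break
               mu *= 0.5
           x = x + mu * dx                    # step 4: update the iterate
       return x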
384 :math:`\Delta x` is what gives this class of methods its name.
387 direction :math:`\Delta x` and the method used for one dimensional
388 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
393 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to be the identity matrix.
432 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
435 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top f(x)`; for
    notational convenience let us also drop the dependence on
437 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
440 .. math:: H \Delta x = g
452 of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
453 :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
457 .. math:: \Delta x^* = -R^{-1}Q^\top f
468 cases, using a dense QR factorization is inefficient. Let :math:`H = R^\top R` be the Cholesky factorization of the normal equations, where
470 :math:`R` is an upper triangular matrix, then the solution to
473 .. math::

       \Delta x^* = R^{-1} R^{-\top} g .
478 The observant reader will note that the :math:`R` in the Cholesky
479 factorization of :math:`H` is the same upper triangular matrix
480 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
481 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top R`.
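This identity is easy to check numerically. A NumPy sketch on random data
(made up purely for the check) compares the two :math:`R` factors, which
agree up to the signs of their rows because Cholesky fixes a positive
diagonal, and verifies that both factorizations produce the same step.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(2)
   J = rng.standard_normal((10, 4))
   f = rng.standard_normal(10)

   Q, R_qr = np.linalg.qr(J)                        # dense QR of J
   R_chol = np.linalg.cholesky(J.T @ J).T           # upper Cholesky factor of H

   # The two R factors agree up to the signs of their rows.
   signs = np.sign(np.diag(R_qr))
   assert np.allclose(signs[:, None] * R_qr, R_chol)

   # And both factorizations yield the same step.
   dx_qr = np.linalg.solve(R_qr, -Q.T @ f)                                 # -R^{-1} Q^T f
   dx_chol = np.linalg.solve(R_chol, np.linalg.solve(R_chol.T, -J.T @ f))  # R^{-1} R^{-T} g
   assert np.allclose(dx_qr, dx_chol)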
507 Suppose that the SfM problem consists of :math:`p` cameras and
508 :math:`q` points and the variable vector :math:`x` has the block
509 structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
510 :math:`y` and :math:`z` correspond to camera and point parameters,
511 respectively. Further, let the camera blocks be of size :math:`c` and
512 the point blocks be of size :math:`s` (for most problems :math:`c` =
513 :math:`6`--:math:`9` and :math:`s = 3`). Ceres does not impose any constancy
518 no term :math:`f_{i}` that includes two or more point blocks. This in
519 turn implies that the matrix :math:`H` is of the form
521 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
524 where, :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
525 with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
526 \mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
527 of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
528 general block sparse matrix, with a block of size :math:`c\times s`
529 for each observation. Let us now block partition :math:`\Delta x =
530 [\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
533 .. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
             \right] \left[ \begin{matrix} \Delta y \\ \Delta z \end{matrix} \right]
             = \left[ \begin{matrix} v \\ w \end{matrix} \right]\ ,
539 and apply Gaussian elimination to it. As we noted above, :math:`C` is
541 :math:`s\times s`. Thus, calculating the inverse of :math:`C` by
543 :math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top \Delta y)`, giving us
546 .. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
551 .. math:: S = B - EC^{-1}E^\top
553 is the Schur complement of :math:`C` in :math:`H`. It is also known as
556 cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
557 symmetric positive definite matrix, with blocks of size :math:`c\times
558 c`. The block :math:`S_{ij}` corresponding to the pair of images
559 :math:`i` and :math:`j` is non-zero if and only if the two images
563 Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
564 :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
565 obtain the value of :math:`\Delta z`. Thus, the solution of what was
566 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
567 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
569 :math:`pc\times pc` linear system :eq:`schur`. For almost all
571 points, :math:`p \ll q`, thus solving :eq:`schur` is
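A hedged NumPy sketch of this procedure on a small randomly generated system
with the block structure above; the blocks are dense here, so it only checks
the algebra of the elimination and back substitution, not the sparsity that
makes the reduction cheap in practice.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(3)
   p, c, q, s = 3, 2, 5, 3                      # camera/point counts and block sizes
   n = p * c + q * s

   A = rng.standard_normal((n + 5, n))
   H = A.T @ A                                  # random SPD stand-in for J^T J
   B, E, C = H[:p*c, :p*c], H[:p*c, p*c:], H[p*c:, p*c:]
   g = rng.standard_normal(n)
   v, w = g[:p*c], g[p*c:]

   # Eliminate the point parameters Delta z via the Schur complement.
   C_inv = np.linalg.inv(C)                     # block diagonal (hence cheap) in real problems
   S = B - E @ C_inv @ E.T                      # reduced camera matrix
   dy = np.linalg.solve(S, v - E @ C_inv @ w)   # solve the pc x pc reduced system
   dz = C_inv @ (w - E.T @ dy)                  # back substitute for Delta z

   assert np.allclose(H @ np.concatenate([dy, dz]), g)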
580 :math:`S` as a dense matrix [TrefethenBau]_. This method has
581 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
586 But, :math:`S` is typically a fairly sparse matrix, as most images
588 option: Sparse Direct Methods. These methods store :math:`S` as a
608 .. math::
622 reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
623 this is that :math:`S` is a much smaller matrix than :math:`H`, but
624 more importantly, it can be shown that :math:`\kappa(S)\leq
625 \kappa(H)`. Ceres implements PCG on :math:`S` as the
630 The cost of forming and storing the Schur complement :math:`S` can be
632 that computes :math:`S` and runs PCG on it, almost all of its time is
633 spent in constructing :math:`S`; the time spent inside the PCG
635 to :math:`S` via its product with a vector, one way to evaluate
636 :math:`Sx` is to observe that
638 .. math:: x_1 &= E^\top x\\
639           x_2 &= C^{-1} x_1\\
640           x_3 &= Ex_2\\
641           x_4 &= Bx\\
642           Sx &= x_4 - x_3
645 Thus, we can run PCG on :math:`S` with the same computational effort
646 per iteration as PCG on :math:`H`, while reaping the benefits of a
648 :math:`H`, :eq:`schurtrick1` can be implemented using just the columns
649 of :math:`J`.
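The product in :eq:`schurtrick1` is naturally phrased as an operator that
never forms :math:`S`, which is all a Krylov method needs. A NumPy sketch,
with dense blocks standing in for the block sparse ones purely for brevity:

.. code-block:: python

   import numpy as np

   def make_schur_operator(B, E, C):
       # Returns a function computing S x = (B - E C^{-1} E^T) x without ever
       # forming S. In a real solver C is block diagonal, so solves with C are
       # cheap; here a dense Cholesky factor stands in for that.
       L = np.linalg.cholesky(C)
       def apply_S(x):
           x1 = E.T @ x                                        # x1 = E^T x
           x2 = np.linalg.solve(L.T, np.linalg.solve(L, x1))   # x2 = C^{-1} x1
           x3 = E @ x2                                         # x3 = E x2
           x4 = B @ x                                          # x4 = B x
           return x4 - x3                                      # Sx = x4 - x3
       return apply_S

   # Quick check against the explicitly formed Schur complement.
   rng = np.random.default_rng(4)
   A = rng.standard_normal((20, 12))
   H = A.T @ A
   B, E, C = H[:6, :6], H[:6, 6:], H[6:, 6:]
   x = rng.standard_normal(6)
   S = B - E @ np.linalg.inv(C) @ E.T
   assert np.allclose(make_schur_operator(B, E, C)(x), S @ x)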
667 of :math:`H` [Saad]_. A useful upper bound is
668 :math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
669 number of the matrix :math:`H`. For most bundle adjustment problems,
670 :math:`\kappa(H)` is high and a direct application of Conjugate
674 *preconditioned* system. Given a linear system, :math:`Ax =b` and a
675 preconditioner :math:`M` the preconditioned system is given by
676 :math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as
679 matrix :math:`\kappa(M^{-1}A)`.
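A compact sketch of Preconditioned Conjugate Gradients using the simplest
choice discussed further below, the Jacobi preconditioner
:math:`M = \operatorname{diag}(A)`; the block variants differ only in what
applying :math:`M^{-1}` does. The system is synthetic and the code is
illustrative, not Ceres's implementation.

.. code-block:: python

   import numpy as np

   def pcg(A, b, M_inv, x=None, tol=1e-10, max_iter=200):
       # Preconditioned Conjugate Gradients for A x = b, where M_inv(r)
       # applies the preconditioner, i.e. solves M z = r.
       x = np.zeros_like(b) if x is None else x
       r = b - A @ x
       z = M_inv(r)
       p = z.copy()
       rz = np.dot(r, z)
       for _ in range(max_iter):
           Ap = A @ p
           alpha = rz / np.dot(p, Ap)
           x += alpha * p
           r -= alpha * Ap
           if np.linalg.norm(r) < tol:
               break
           z = M_inv(r)
           rz_new = np.dot(r, z)
           p = z + (rz_new / rz) * p
           rz = rz_new
       return x

   # Jacobi preconditioner: M = diag(A), so M^{-1} r is an elementwise division.
   rng = np.random.default_rng(5)
   A = rng.standard_normal((30, 20))
   H = A.T @ A + np.eye(20)                 # a synthetic SPD system
   b = rng.standard_normal(20)
   diag = np.diag(H)
   x = pcg(H, b, lambda r: r / diag)
   assert np.allclose(H @ x, b, atol=1e-6)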
681 The computational cost of using a preconditioner :math:`M` is the cost
682 of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
683 arbitrary vectors :math:`y`. Thus, there are two competing factors to
684 consider: How much of :math:`H`'s structure is captured by :math:`M`
685 so that the condition number :math:`\kappa(HM^{-1})` is low, and the
686 computational cost of constructing and using :math:`M`. The ideal
687 preconditioner would be one for which :math:`\kappa(M^{-1}A)
688 =1`. :math:`M=A` achieves this, but it is not a practical choice, as
691 that the more information :math:`M` has about :math:`H`, the more
698 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
699 block structured matrices like :math:`H` can be generalized to the
703 diagonal preconditioners for :math:`S`. The block diagonal of the
704 matrix :math:`B` [Mandel]_ and the block diagonal of :math:`S`, i.e., the
705 block Jacobi preconditioner for :math:`S`. Ceres implements both of
728 with :math:`O(n)` storage, whereas a bad ordering will result in an
752 .. math::
       x + y &= 3 \\
       2x + 3y &= 7
757 :math:`x` from the two equations, solving for :math:`y` and then back
758 substituting for :math:`x`, or first eliminating :math:`y`, solving
759 for :math:`x` and back substituting for :math:`y`. The user can
762 1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
763 2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
764 3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.
865 Identity. However, [Oren]_ showed that using instead :math:`I *
866 \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
875 .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}
879 .. math:: y_k = \nabla f_{k+1} - \nabla f_k
880 .. math:: s_k = x_{k+1} - x_k
882 Where :math:`f()` is the line search objective and :math:`x` the
908 .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
910 where :math:`\|\cdot\|_\infty` refers to the max norm, and
911 :math:`\Delta x_k` is the step change in the parameter values at
912 the :math:`k`-th iteration.
925 .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]
935 .. math:: \text{new_step_size} >= \text{max_line_search_step_contraction} * \text{step_size}
939 .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1
947 .. math:: \text{new_step_size} <= \text{min_line_search_step_contraction} * \text{step_size}
951 .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1
962 the underlying math), if ``WOLFE`` line search is being used, *and*
964 condition have been found during the current search (in :math:`<=`
993 .. math:: \|f'(\text{step_size})\| <= \text{sufficient_curvature_decrease} * \|f'(0)\|
995 Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
996 of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.
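A sketch of a backtracking search that enforces the sufficient decrease
condition above, using a fixed contraction factor rather than the polynomial
interpolation a production line search would use; the parameter names loosely
mirror the options, but the routine is invented for illustration.

.. code-block:: python

   def armijo_backtracking(f, f_prime, step_size=1.0,
                           sufficient_decrease=1e-4, contraction=0.5,
                           min_step_size=1e-9):
       # f(t) evaluates the line search objective at step size t; f_prime(0.0)
       # is its derivative at t = 0 (negative along a descent direction).
       f0, df0 = f(0.0), f_prime(0.0)
       while step_size > min_step_size:
           if f(step_size) <= f0 + sufficient_decrease * df0 * step_size:
               return step_size               # sufficient decrease condition holds
           step_size *= contraction           # otherwise shrink the step
       return step_size

   # Example: minimize 1/2 (x - 2)^2 from x = 0 along the direction +1.
   phi = lambda t: 0.5 * (t - 2.0) ** 2
   dphi = lambda t: t - 2.0
   print(armijo_backtracking(phi, dphi))      # accepts the full step, 1.0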
1008 .. math:: \text{new_step_size} <= \text{max_step_expansion} * \text{step_size}
1012 .. math:: \text{max_step_expansion} > 1.0
1125 .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} < \text{function_tolerance}
1127 where, :math:`\Delta \text{cost}` is the change in objective
1137 .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty < \text{gradient_tolerance}
1139 where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
1140 is projection onto the bounds constraints and :math:`\boxplus` is
1150 .. math:: \|\Delta x\| < (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}
1152 where :math:`\Delta x` is the step computed by the linear solver in
1331 .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
1472 dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
1480 is dumped as a text file containing :math:`(i,j,s)` triplets, the
1481 vectors :math:`D`, :math:`x` and :math:`f` are dumped as text files
1515 .. math::
1523 typically large (e.g. :math:`10^9`) and when the values are small
1524 (e.g. :math:`10^{-5}`). It is possible to construct *torture cases*
2162 .. math:: y = f(x) + N(0, I)
2164 i.e., the observation :math:`y` is a random non-linear function of the
2165 independent variable :math:`x` with mean :math:`f(x)` and identity
2166 covariance. Then the maximum likelihood estimate of :math:`x` given
2167 observations :math:`y` is the solution to the non-linear least squares
2170 .. math:: x^* = \arg \min_x \|f(x)\|^2
2172 And the covariance of :math:`x^*` is given by
2174 .. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{-1}
2176 Here :math:`J(x^*)` is the Jacobian of :math:`f` at :math:`x^*`. The
2177 above formula assumes that :math:`J(x^*)` has full column rank.
2179 If :math:`J(x^*)` is rank deficient, then the covariance matrix :math:`C(x^*)`
2182 .. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{\dagger}
2185 :math:`y` was identity. This is an important assumption. If this is
2188 .. math:: y = f(x) + N(0, S)
2190 Where :math:`S` is a positive semi-definite matrix denoting the
2191 covariance of :math:`y`, then the maximum likelihood problem to be
2194 .. math:: x^* = \arg \min_x f'(x) S^{-1} f(x)
2196 and the corresponding covariance estimate of :math:`x^*` is given by
2198 .. math:: C(x^*) = \left(J'(x^*) S^{-1} J(x^*)\right)^{-1}
2204 should evaluate :math:`S^{-1/2} f(x)` instead of just :math:`f(x)`,
2205 where :math:`S^{-1/2}` is the inverse square root of the covariance
2206 matrix :math:`S`.
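A NumPy sketch of both statements: the covariance as :math:`(J'J)^{-1}` when
the observation covariance is identity, and the equivalence between whitening
the problem with :math:`S^{-1/2}` and using :math:`\left(J' S^{-1} J\right)^{-1}`
directly. The Jacobian and :math:`S` are synthetic and assumed to have full
rank.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(6)
   J = rng.standard_normal((15, 4))            # Jacobian of f at x*

   # Identity observation covariance: C(x*) = (J^T J)^{-1}.
   cov = np.linalg.inv(J.T @ J)

   # General observation covariance S: whiten the Jacobian with a factor of
   # S^{-1}, then the same formula gives (J^T S^{-1} J)^{-1}.
   A = rng.standard_normal((15, 15))
   S = A @ A.T + np.eye(15)                    # synthetic SPD covariance of y
   L = np.linalg.cholesky(S)                   # S = L L^T
   J_white = np.linalg.solve(L, J)             # L^{-1} J plays the role of S^{-1/2} J
   cov_weighted = np.linalg.inv(J_white.T @ J_white)
   assert np.allclose(cov_weighted, np.linalg.inv(J.T @ np.linalg.inv(S) @ J))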
2240 of :math:`J'J` is not defined and instead a pseudo inverse needs to be
2243 The rank deficiency in :math:`J` can be *structural* -- columns
2257 four dimensional quaternion used to parameterize :math:`SO(3)`,
2267 (SVD) of :math:`J'J`. We do not know how to do this for large
2295 .. math:: U S V^\top = J
2299 .. math:: (J'J)^{\dagger} = V (S^\top S)^{\dagger} V^\top
2308 .. math::
2325 Default: :math:`10^{-14}`
2327 If the Jacobian matrix is near singular, then inverting :math:`J'J`
2330 .. math::
2339 .. math::
.. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}} < \sqrt{\text{min_reciprocal_condition_number}}
2356 where :math:`\sigma_{\text{min}}` and
2357 :math:`\sigma_{\text{max}}` are the minimum and maximum
2358 singular values of :math:`J` respectively.
2362 .. math:: \operatorname{rank}(J) < \operatorname{num\_col}(J)
2364 Here :math:`\operatorname{rank}(J)` is the estimate of the
2375 instead of computing the inverse of :math:`J'J`, the Moore-Penrose
2376 pseudoinverse of :math:`J'J` should be computed.
2378 If :math:`J'J` has the eigen decomposition :math:`(\lambda_i,
2379 e_i)`, where :math:`\lambda_i` is the :math:`i^\textrm{th}`
2380 eigenvalue and :math:`e_i` is the corresponding eigenvector, then
2381 the inverse of :math:`J'J` is
2383 .. math:: (J'J)^{-1} = \sum_i \frac{1}{\lambda_i} e_i e_i'
2393 of the magnitude of :math:`\lambda_i`. If the ratio of the
2400 .. math:: \frac{\lambda_i}{\lambda_{\textrm{max}}} < \textrm{min_reciprocal_condition_number}
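The truncation rule amounts to dropping the small eigenvalues from the sum
above. A NumPy sketch, with a deliberately rank deficient :math:`J` made up
for the illustration:

.. code-block:: python

   import numpy as np

   def truncated_pseudo_inverse(JtJ, min_reciprocal_condition_number=1e-14):
       # Eigendecomposition of the symmetric matrix J^T J.
       lam, e = np.linalg.eigh(JtJ)
       # Keep only eigenvalues that are large relative to the largest one.
       keep = lam / lam.max() > min_reciprocal_condition_number
       # (J^T J)^dagger = sum_i (1 / lambda_i) e_i e_i' over the kept eigenvalues.
       return (e[:, keep] / lam[keep]) @ e[:, keep].T

   # A rank deficient example: the second column of J copies the first.
   J = np.array([[1.0, 1.0, 2.0],
                 [2.0, 2.0, 1.0],
                 [3.0, 3.0, 0.0]])
   pinv = truncated_pseudo_inverse(J.T @ J, 1e-10)
   assert np.allclose(pinv, np.linalg.pinv(J.T @ J))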