doc/statistics/signal_statistics.qbk

[/
  Copyright 2018 Nick Thompson

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:signal_statistics Signal Statistics]

[heading Synopsis]

``
#include <boost/math/statistics/signal_statistics.hpp>

namespace boost::math::statistics {

    template<class Container>
    auto absolute_gini_coefficient(Container & c);

    template<class ForwardIterator>
    auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);

    template<class Container>
    auto sample_absolute_gini_coefficient(Container & c);

    template<class ForwardIterator>
    auto sample_absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);

    template<class Container>
    auto hoyer_sparsity(Container const & c);

    template<class ForwardIterator>
    auto hoyer_sparsity(ForwardIterator first, ForwardIterator last);

    template<class Container>
    auto oracle_snr(Container const & signal, Container const & noisy_signal);

    template<class Container>
    auto oracle_snr_db(Container const & signal, Container const & noisy_signal);

    template<class ForwardIterator>
    auto m2m4_snr_estimator(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);

    template<class Container>
    auto m2m4_snr_estimator(Container const & noisy_signal, typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);

    template<class ForwardIterator>
    auto m2m4_snr_estimator_db(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);

    template<class Container>
    auto m2m4_snr_estimator_db(Container const & noisy_signal,typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);

}
``

[heading Description]

The file `boost/math/statistics/signal_statistics.hpp` is a set of facilities for computing quantities commonly used in signal analysis.

Our examples use `std::vector<double>` to hold the data, but this not required.
In general, you can store your data in an Eigen array, and Armadillo vector, `std::array`, and for many of the routines, a `std::forward_list`.
These routines are usable in float, double, long double, and Boost.Multiprecision precision, as well as their complex extensions whenever the computation is well-defined.


[heading Absolute Gini Coefficient]

The Gini coefficient, first used to measure wealth inequality, is also one of the best measures of the sparsity of an expansion in a basis.
A sparse expansion has most of its norm concentrated in just a few coefficients, making the connection with wealth inequality obvious.
See [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard] for details.
However, for measuring sparsity, the phase of the numbers is irrelevant, so we provide the `absolute_gini_coefficient`:

    using boost::math::statistics::sample_absolute_gini_coefficient;
    using boost::math::statistics::absolute_gini_coefficient;
    std::vector<std::complex<double>> v{{0,1}, {0,0}, {0,0}, {0,0}};
    double abs_gini = sample_absolute_gini_coefficient(v);
    // now abs_gini = 1; maximally unequal

    std::vector<std::complex<double>> w{{0,1}, {1,0}, {0,-1}, {-1,0}};
    abs_gini = absolute_gini_coefficient(w);
    // now abs_gini = 0; every element of the vector has equal magnitude

    std::vector<double> u{-1, 1, -1};
    abs_gini = absolute_gini_coefficient(u);
    // now abs_gini = 0
    // Alternative call useful for computing over subset of the input:
    abs_gini = absolute_gini_coefficient(u.begin(), u.begin() + 1);


The sample Gini coefficient returns unity for a vector which has only one nonzero coefficient.
The population Gini coefficient of a vector with one non-zero element is dependent on the length of the input.

The sample Gini coefficient lacks one desirable property of the population Gini coefficient,
namely that "cloning" a vector has the same Gini coefficient; though cloning holds to very high accuracy with the sample Gini coefficient and can easily be recovered by a rescaling.

If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?),
consider calculating the Hoyer sparsity instead.

[heading Hoyer Sparsity]

The Hoyer sparsity measures a normalized ratio of the \u2113[super 1] and \u2113[super 2] norms.
As the name suggests, it is used to measure the sparsity of an expansion in some basis.

The Hoyer sparsity computes ([radic]/N/ - \u2113[super 1](v)/\u2113[super 2](v))/([radic]N -1).
For details, see [@http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf Hoyer] as well as [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard].

A few special cases will serve to clarify the intended use:
If /v/ has only one nonzero coefficient, the Hoyer sparsity attains its maxima of 1.
If the coefficients of /v/ all have the same magnitude, then the Hoyer sparsity attains its minima of zero.
If the elements of /v/ are uniformly distributed on an interval [0, /b/], then the Hoyer sparsity is approximately 0.133.


Usage:

    std::vector<Real> v{1,0,0};
    Real hs = boost::math::statistics::hoyer_sparsity(v);
    // hs = 1
    std::vector<Real> v{1,-1,1};
    Real hs = boost::math::statistics::hoyer_sparsity(v.begin(), v.end());
    // hs = 0

The container must be forward iterable and the contents are not modified.
Accepts real, complex, and integer inputs.
If the input is an integral type, the output is a double precision float.


[heading Oracle Signal-to-noise ratio]

The function `oracle_snr` computes the ratio \u2016 /s/ \u2016[sub 2][super 2] / \u2016 /s/ - /x/ \u2016[sub 2][super 2], where /s/ is signal and /x/ is a noisy signal.
The function `oracle_snr_db` computes 10`log`[sub 10](\u2016 /s/ \u2016[super 2] / \u2016 /s/ - /x/ \u2016[super 2]).
The functions are so named because in general, one does not know how to decompose a real signal /x/ into /s/ + /w/ and as such /s/ is regarded as oracle information.
Hence this function is mainly useful for unit testing other SNR estimators.

Usage:

    std::vector<double> signal(500, 3.2);
    std::vector<double> noisy_signal(500);
    // fill 'noisy_signal' signal + noise
    double snr_db = boost::math::statistics::oracle_snr_db(signal, noisy_signal);
    double snr = boost::math::statistics::oracle_snr(signal, noisy_signal);

The input can be real, complex, or integral.
Integral inputs produce double precision floating point outputs.
The input data is not modified and must satisfy the requirements of a `RandomAccessContainer`.

[heading /M/[sub 2]/M/[sub 4] SNR Estimation]

Estimates the SNR of a noisy signal via the /M/[sub 2]/M/[sub 4] method.
See [@https://doi.org/10.1109/26.871393  Pauluzzi and N.C. Beaulieu] and [@https://doi.org/10.1109/ISIT.1994.394869 Matzner and Englberger] for details.

    std::vector<double> noisy_signal(512);
    // fill noisy_signal with data contaminated by Gaussian white noise:
    double est_snr_db = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal);

The /M/[sub 2]/M/[sub 4] SNR estimator is an "in-service" estimator, meaning that the estimate is made using the noisy, data-bearing signal, and does not require a background estimate.
This estimator has been found to be work best between roughly -3 and 15db, tending to overestimate the noise below -3db, and underestimate the noise above 15db.
See [@https://www.mdpi.com/2078-2489/8/3/75/pdf Xue et al] for details.

The /M/[sub 2]/M/[sub 4] SNR estimator, by default, assumes that the kurtosis of the signal is 1 and the kurtosis of the noise is 3, the latter corresponding to Gaussian noise.
These parameters, however, can be overridden:

    std::vector<double> noisy_signal(512);
    // fill noisy_signal with the data:
    double signal_kurtosis = 1.5;
    // Noise is assumed to follow Laplace distribution, which has kurtosis of 6:
    double noise_kurtosis = 6;
    double est_snr = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal, signal_kurtosis, noise_kurtosis);

Now, technically the method is a "blind SNR estimator", meaning that the no /a-priori/ information about the signal is required to use the method.
However, the performance of the method is /vastly/ better if you can come up with a better estimate of the signal and noise kurtosis.
How can we do this? Suppose we know that the SNR is much greater than 1.
Then we can estimate the signal kurtosis simply by using the noisy signal kurtosis.
If the SNR is much less than one, this method breaks down as the noisy signal kurtosis will tend to the noise kurtosis-though in this limit we have an excellent estimator of the noise kurtosis!
In addition, if you have a model of what your signal should look like, you can precompute the signal kurtosis.
For example, sinusoids have a kurtosis of 1.5.
See [@http://www.jcomputers.us/vol8/jcp0808-21.pdf here] for a study which uses estimates of this sort to improve the performance of the /M/[sub 2]/M/[sub 4] estimator.


/Nota bene/: The traditional definition of SNR is /not/ mean invariant.
By this we mean that if a constant is added to every sample of a signal, the SNR is changed.
For example, adding DC bias to a signal changes its SNR.
For most use cases, this is really not what you intend; for example a signal consisting of zeros plus Gaussian noise has an SNR of zero,
whereas a signal with a constant DC bias and random Gaussian noise might have a very large SNR.

The /M/[sub 2]/M/[sub 4] SNR estimator is computed from mean-invariant quantities,
and hence it should really be compared to the mean-invariant SNR.

/Nota bene/: This computation requires the solution of a system of quadratic equations involving the noise kurtosis, the signal kurtosis, and the second and fourth moments of the data.
There is no guarantee that a solution of this system exists for all value of these parameters, in fact nonexistence can easily be demonstrated for certain data.
If there is no solution to the system, then failure is communicated by returning NaNs.
This happens distressingly often; if a user is aware of any blind SNR estimators which do not suffer from this drawback, please open a github ticket and let us know.

The author has not managed to fully characterize the conditions under which a real solution with /S > 0/ and /N >0/ exists.
However, a very intuitive example demonstrates why nonexistence can occur.
Suppose the signal and noise kurtosis are equal.
Then the method has no way to distinguish between the signal and the noise, and the solution is non-unique.


[heading References]

* Mallat, Stephane. ['A wavelet tour of signal processing: the sparse way.] Academic press, 2008.
* Hurley, Niall, and Scott Rickard. ['Comparing measures of sparsity.] IEEE Transactions on Information Theory 55.10 (2009): 4723-4741.
* Jensen, Arne, and Anders la Cour-Harbo. ['Ripples in mathematics: the discrete wavelet transform.] Springer Science & Business Media, 2001.
* D. R. Pauluzzi and N. C. Beaulieu, ['A comparison of SNR estimation techniques for the AWGN channel,] IEEE Trans. Communications, Vol. 48, No. 10, pp. 1681-1691, 2000.
* Hoyer, Patrik O. ['Non-negative matrix factorization with sparseness constraints.], Journal of machine learning research 5.Nov (2004): 1457-1469.

[endsect]
[/section:signal_statistics Signal Statistics]