
---

# PFFFT: a pretty fast FFT and fast convolution with PFFASTCONV

---

<!-- toc -->

- [Brief Description](#brief-description)
- [Why does it exist?](#why-does-it-exist)
- [CMake](#cmake)
- [History / Origin / Changes](#history--origin--changes)
- [Comparison with other FFTs](#comparison-with-other-ffts)
- [Dependencies / Required Linux packages](#dependencies--required-linux-packages)
- [Benchmarks and results](#benchmarks-and-results)

<!-- tocstop -->

---

## Brief description:

PFFFT does 1D Fast Fourier Transforms of single precision real and
complex vectors. It tries to do it fast, it tries to be correct, and it
tries to be small. Computations take advantage of SSE1 instructions
on x86 CPUs, AltiVec on PowerPC CPUs, and NEON on ARM CPUs. The
license is BSD-like.

PFFFT is a fork of [Julien Pommier's library on bitbucket](https://bitbucket.org/jpommier/pffft/)
with some changes and additions.


PFFASTCONV does fast convolution (FIR filtering) of single precision
real vectors, utilizing the PFFFT library. The license is BSD-like.

PFDSP contains a few other signal processing functions.
Currently, it contains mixing and carrier generation functions.
It is work in progress - and so is the API!
The fast convolution from PFFASTCONV might get merged into PFDSP.


## Why does it exist:

I (Julien Pommier) was in search of a good performing FFT library,
preferably very small and with a very liberal license.

When one says "FFT library", FFTW ("Fastest Fourier Transform in the
West") is probably the first name that comes to mind -- I guess that
99% of open-source projects that need an FFT use FFTW, and are happy
with it. However, it is quite a large library which does everything
FFT related (2D transforms, 3D transforms, other transformations such
as discrete cosine or fast Hartley). And it is licensed under the
GNU GPL, which means that it cannot be used in non-open-source
products.

An alternative to FFTW that is really small is the venerable FFTPACK
v4, which is available on NETLIB. A more recent version (v5) exists,
but it is larger as it deals with multi-dimensional transforms. This
is a library written in FORTRAN 77, a language that is now
considered a bit antiquated by many. FFTPACK v4 was written in 1985
by Dr Paul Swarztrauber of NCAR, more than 25 years ago! And despite
its age, benchmarks show that it is still a very good performing FFT
library, see for example the 1D single precision benchmarks
[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not
competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML or
Apple vDSP. The reason is that those libraries take advantage of the
SSE SIMD instructions available on Intel CPUs since the days of the
Pentium III. These instructions deal with small vectors of 4 floats
at a time, instead of a single float for a traditional FPU, so when
using these instructions one may expect a 4-fold performance improvement.

The idea was to take this Fortran FFTPACK v4 code, translate it to C,
modify it to deal with those SSE instructions, and check that the
final performance is not completely ridiculous when compared to other
SIMD FFT libraries. Translation to C was performed with
[f2c](http://www.netlib.org/f2c/). The resulting file was edited a bit in
order to remove the thousands of gotos that were introduced by f2c.
You will find the fftpack.h and fftpack.c sources in the
repository; this is a complete translation of
[fftpack](http://www.netlib.org/fftpack/), with the discrete cosine transform
and the test program. There is no license information in the netlib
repository, but it was confirmed to me by the fftpack v5 curators that
the [same terms do apply to fftpack v4](http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html).
This is a "BSD-like" license; it is compatible with proprietary projects.

Adapting fftpack to deal with the SIMD 4-element vectors instead of
scalar single precision numbers was more complex than I originally
thought, especially with the real transforms, and I ended up writing
more code than I planned.


## The code:

### Good old C:
The FFT API is very simple, just make sure that you read the comments in `pffft.h`.
A minimal usage sketch can be found below, after the [CMake](#cmake) section.

The fast convolution API is also very simple, just make sure that you read the comments
in `pffastconv.h`.

### C++:
A simple C++ wrapper is available in `pffft.hpp`.

### Git:
This archive's source can be downloaded with git (without the submodules):
```
git clone https://github.com/marton78/pffft.git
```

### Only two files?:
_"Only two files, in good old C, pffft.c and pffft.h"_

This statement **NO LONGER** holds!

With new functionality and support for AVX, there was a need to restructure the sources.
But you can compile and link **pffft** as a static library.


## CMake:
There's now CMake support to build the static libraries `libPFFFT.a`
and `libPFFASTCONV.a` from the source files, plus the additional
`libFFTPACK.a` library. The latter's sources are there anyway for the benchmark.

There are several CMake options to modify library size and optimization.
You can explore all available options with `cmake-gui` or with `ccmake`,
the console version, after having installed (on Debian/Ubuntu Linux) one of
```
sudo apt-get install cmake-qt-gui
sudo apt-get install cmake-curses-gui
```

Some of the options:
* `PFFFT_USE_TYPE_FLOAT` to activate single precision 'float' (default: ON)
* `PFFFT_USE_TYPE_DOUBLE` to activate 'double' precision float (default: ON)
* `PFFFT_USE_SIMD` to use SIMD (SSE/AVX/NEON/ALTIVEC) CPU features (default: ON)
* `DISABLE_SIMD_AVX` to disable AVX CPU features (default: OFF)
* `PFFFT_USE_SIMD_NEON` to force using NEON on ARM (requires PFFFT_USE_SIMD) (default: OFF)
* `PFFFT_USE_SCALAR_VECT` to use 4-element vector scalar operations (if no other SIMD) (default: ON)

Options can be passed to `cmake` at the command line, e.g.
```
cmake -DPFFFT_USE_TYPE_FLOAT=OFF -DPFFFT_USE_TYPE_DOUBLE=ON
```

My Linux distribution defaults to GCC. With CLANG installed and the bash shell, you can use it with
```
mkdir build
cd build
CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Debug ../
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=~ ../
ccmake .  # or: cmake-gui .
cmake --build .  # or simply: make
ctest  # to execute some tests - including benchmarks
cmake --build . --target install  # or simply: [sudo] make install
```

With MSVC on Windows, you need some different options, e.g. the following to build a 64-bit Release with Visual Studio 2019:
```
mkdir build
cd build
cmake -G "Visual Studio 16 2019" -A x64 ..
cmake --build . --config Release
ctest -C Release
```

see [https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators)
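Once the library is built (e.g. as the static `libPFFFT.a` described above), a minimal use of the C API could look like the following sketch of a real forward transform. It is only an illustration: the transform length, test signal and compile command are example values, and the comments in `pffft.h` remain the authoritative documentation.

```c
/* minimal_pffft_example.c -- compile e.g. with:
   gcc minimal_pffft_example.c libPFFFT.a -lm -o minimal_pffft_example */
#include <stdio.h>
#include "pffft.h"

int main(void)
{
    /* transform length: the SIMD variants require N to be a multiple of
       a small power of two (see the comments in pffft.h for the exact rule) */
    const int N = 1024;

    PFFFT_Setup *setup = pffft_new_setup(N, PFFFT_REAL);

    /* buffers passed to pffft should be allocated with pffft_aligned_malloc,
       so they satisfy the SIMD alignment requirements */
    float *input  = (float*)pffft_aligned_malloc(N * sizeof(float));
    float *output = (float*)pffft_aligned_malloc(N * sizeof(float));
    float *work   = (float*)pffft_aligned_malloc(N * sizeof(float));

    for (int k = 0; k < N; ++k)
        input[k] = (k == 0) ? 1.0f : 0.0f;   /* unit impulse as test signal */

    /* the '_ordered' variant returns the spectrum in canonical order;
       the plain pffft_transform() is faster but uses an internal layout */
    pffft_transform_ordered(setup, input, output, work, PFFFT_FORWARD);

    printf("first two output values: %g %g\n", output[0], output[1]);

    pffft_aligned_free(work);
    pffft_aligned_free(output);
    pffft_aligned_free(input);
    pffft_destroy_setup(setup);
    return 0;
}
```

Note that the transform calls themselves perform no memory allocation; that is why the `work` buffer is passed in. The "unordered" `pffft_transform()` variant is the one intended for Fourier domain convolution (see the comments in `pffft.h`).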
## History / Origin / Changes:
Origin for this code/fork is Julien Pommier's pffft on bitbucket:
[https://bitbucket.org/jpommier/pffft/](https://bitbucket.org/jpommier/pffft/)

Git history shows the following first commits of the major contributors:
* Julien Pommier: November 19, 2011
* Marton Danoczy: September 30, 2015
* Hayati Ayguen: December 22, 2019
* Dario Mambro: March 24, 2020

There are a few other contributors not listed here.

The main changes include:
* improved benchmarking, see [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks)
* double support
* AVX(2) support
* C++ headers (wrapper)
* additional API helper functions
* additional library for fast convolution
* CMake support
* CTest


## Comparison with other FFTs:
The idea was not to break speed records, but to get a decently fast
FFT that is at least 50% as fast as the fastest FFT -- especially on
the slowest computers. I'm more focused on getting the best performance
on slow CPUs (Atom, Intel Core 1, old Athlons, ARM Cortex-A9, ...) than
on getting top performance on today's fastest CPUs.

It can be used in a real-time context as the FFT functions do not
perform any memory allocation -- that is why they accept a 'work'
array in their arguments.

It is also a bit focused on performing 1D convolutions; that is why it
provides "unordered" FFTs and a Fourier domain convolution
operation.

Very interesting is [https://www.nayuki.io/page/free-small-fft-in-multiple-languages](https://www.nayuki.io/page/free-small-fft-in-multiple-languages).
It shows how small an FFT can be - including the Bluestein algorithm - but it is anything but fast.
The whole C++ implementation file is 161 lines, including the copyright header, see
[https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp](https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp)

## Dependencies / Required Linux packages

On Debian/Ubuntu Linux the following packages should be installed:

```
sudo apt-get install build-essential gcc g++ cmake
```


## Benchmarks and results

#### Quicklink
Find results at [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).

#### General
My (Hayati Ayguen's) first look at FFT benchmarks was [benchFFT](http://www.fftw.org/benchfft/)
and especially the benchmark [results](http://www.fftw.org/speed/),
which demonstrate the performance of [FFTW](http://www.fftw.org/).
Looking at the benchmarked computer systems from today's view (2021), these are quite outdated.

Having a look into the [benchFFT source code](http://www.fftw.org/benchfft/benchfft-3.1.tar.gz),
the latest source changes, including competitive FFT implementations, are dated November 2003.

In 2019, when pffft got my attention at [bitbucket](https://bitbucket.org/jpommier/pffft/src/master/),
there were also some benchmark results.
Unfortunately, the results are tables with numbers - without graphical plots.
Without the plots, I could not get an impression. That is why I started
[https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks),
which includes GnuPlot figures.

Today, in June 2021, I realized the existence of [https://github.com/FFTW/benchfft](https://github.com/FFTW/benchfft).
This repository is much more up to date, with a commit in December 2020.
Unfortunately, it does not look so simple to get it running - including the generation of plots.

Is there any website showing benchFFT results of more recent computer systems?

Of course, it's very important that a benchmark can compare a bunch
of different FFT algorithms/implementations.
This requires having these compiled/built and usable.


#### Git submodules for Green-, Kiss- and Pocket-FFT
Sources for [Green-](https://github.com/hayguen/greenffts),
[Kiss-](https://github.com/hayguen/kissfft)
and [Pocket-FFT](https://github.com/hayguen/pocketfft)
can be downloaded directly with the sources of this repository - using git submodules:
```
git clone --recursive https://github.com/marton78/pffft.git
```

The important part is `--recursive`, which also fetches the submodules directly.
But you can retrieve the submodules later, too:
```
git submodule update --init
```

#### Fastest Fourier Transform in the West: FFTW
To allow comparison with FFTW [http://www.fftw.org/](http://www.fftw.org/),
the cmake option `-DPFFFT_USE_BENCH_FFTW=ON` has to be used with the following commands.
The cmake option requires previous setup of the following (Debian/Ubuntu) package:
```
sudo apt-get install libfftw3-dev
```

#### Intel Math Kernel Library: MKL
Intel's MKL [https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html)
currently looks even faster than FFTW.

On Ubuntu Linux it's easy to set up with the package `intel-mkl`.
Similarly on Debian: `intel-mkl-full`.

There are special repositories for the following Linux distributions:
* Debian/apt: [https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html](https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html)
* RedHat/yum: [https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-yum-repo.html](https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-yum-repo.html)
* Gentoo/ebuild: [https://packages.gentoo.org/packages/sci-libs/mkl](https://packages.gentoo.org/packages/sci-libs/mkl)

#### Performing the benchmarks - with CMake
Benchmarks should be prepared by creating a special build folder:
```
mkdir build_benches
cd build_benches
cmake ../bench
```

There are several CMake options to parametrize which FFT implementations should be benchmarked.
You can explore all available options with `cmake-gui` or `ccmake`, see [CMake](#cmake).

Some of the options:
* `BENCH_ID` name of the benchmark - used in the result filename
* `BENCH_ARCH` target architecture passed to the compiler for code optimization
* `PFFFT_USE_BENCH_FFTW` use (system-installed) FFTW3 in fft benchmark? (default: OFF)
* `PFFFT_USE_BENCH_GREEN` use Green FFT in fft benchmark? (default: ON)
* `PFFFT_USE_BENCH_KISS` use KissFFT in fft benchmark? (default: ON)
* `PFFFT_USE_BENCH_POCKET` use PocketFFT in fft benchmark? (default: ON)
* `PFFFT_USE_BENCH_MKL` use Intel MKL in fft benchmark? (default: OFF)

These options can be passed to `cmake` at the command line, e.g.
```
cmake -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
```

The benchmarks are built and executed with
```
cmake --build .
```

You can also specify a different compiler/version with the cmake step, e.g.:

```
CC=/usr/bin/gcc-9 CXX=/usr/bin/g++-9 cmake -DBENCH_ID=gcc9 -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
```

```
CC=/usr/bin/clang-11 CXX=/usr/bin/clang++-11 cmake -DBENCH_ID=clang11 -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
```

With MSVC on Windows, the cmake command needs the generator and architecture options and has to be called from the VS Developer prompt:
```
cmake -G "Visual Studio 16 2019" -A x64 ../bench/
```

see [https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators)


For running with different compiler version(s):
* copy the result file (.tgz), e.g. `cp *.tgz ../`
* delete the build directory: `rm -rf *`
* then continue with the cmake step (a consolidated sketch follows at the end of this section)


#### Benchmark results and contribution
You might contribute by providing us the results of your computer(s).

The benchmark results are stored in a separate git repository:
see [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).

This is to keep this repository's sources small.
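Putting the steps above together, a complete benchmark session on Linux might look like the following sketch, producing a result archive like those in the benchmarks repository. The compiler paths and the `BENCH_ID` value are just example values.

```
# configure in a separate build folder; pick the implementations to bench
mkdir build_benches
cd build_benches
CC=/usr/bin/gcc-9 CXX=/usr/bin/g++-9 cmake -DBENCH_ID=gcc9 -DBENCH_ARCH=native \
    -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench

# build and run the benchmarks
cmake --build .

# keep the result archive (.tgz), then clean up for a run with another compiler
cp *.tgz ../
rm -rf *
```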