1# PFFFT: a pretty fast FFT and fast convolution with PFFASTCONV 2 3## TL;DR 4 5PFFFT does 1D Fast Fourier Transforms, of single precision real and 6complex vectors. It tries do it fast, it tries to be correct, and it 7tries to be small. Computations do take advantage of SSE1 instructions 8on x86 cpus, Altivec on powerpc cpus, and NEON on ARM cpus. The 9license is BSD-like. 10 11 12PFFASTCONV does fast convolution (FIR filtering), of single precision 13real vectors, utilizing the PFFFT library. The license is BSD-like. 14 15PFDSP contains a few other signal processing functions. 16Currently, mixing and carrier generation functions are contained. 17It is work in progress - also the API! 18The fast convolution from PFFASTCONV might get merged into PFDSP. 19 20 21## Why does it exist: 22 23I was in search of a good performing FFT library , preferably very 24small and with a very liberal license. 25 26When one says "fft library", FFTW ("Fastest Fourier Transform in the 27West") is probably the first name that comes to mind -- I guess that 2899% of open-source projects that need a FFT do use FFTW, and are happy 29with it. However, it is quite a large library , which does everything 30fft related (2d transforms, 3d transforms, other transformations such 31as discrete cosine , or fast hartley). And it is licensed under the 32GNU GPL , which means that it cannot be used in non open-source 33products. 34 35An alternative to FFTW that is really small, is the venerable FFTPACK 36v4, which is available on NETLIB. A more recent version (v5) exists, 37but it is larger as it deals with multi-dimensional transforms. This 38is a library that is written in FORTRAN 77, a language that is now 39considered as a bit antiquated by many. FFTPACKv4 was written in 1985, 40by Dr Paul Swarztrauber of NCAR, more than 25 years ago ! And despite 41its age, benchmarks show it that it still a very good performing FFT 42library, see for example the 1d single precision benchmarks 43[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not 44competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML, 45Apple vDSP. The reason for that is that those libraries do take 46advantage of the SSE SIMD instructions available on Intel CPUs, 47available since the days of the Pentium III. These instructions deal 48with small vectors of 4 floats at a time, instead of a single float 49for a traditionnal FPU, so when using these instructions one may expect 50a 4-fold performance improvement. 51 52The idea was to take this fortran fftpack v4 code, translate to C, 53modify it to deal with those SSE instructions, and check that the 54final performance is not completely ridiculous when compared to other 55SIMD FFT libraries. Translation to C was performed with [f2c]( 56http://www.netlib.org/f2c/). The resulting file was a bit edited in 57order to remove the thousands of gotos that were introduced by 58f2c. You will find the fftpack.h and fftpack.c sources in the 59repository, this a complete translation of [fftpack]( 60http://www.netlib.org/fftpack/), with the discrete cosine transform 61and the test program. There is no license information in the netlib 62repository, but it was confirmed to me by the fftpack v5 curators that 63the [same terms do apply to fftpack v4] 64(http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html). This is a 65"BSD-like" license, it is compatible with proprietary projects. 66 67Adapting fftpack to deal with the SIMD 4-element vectors instead of 68scalar single precision numbers was more complex than I originally 69thought, especially with the real transforms, and I ended up writing 70more code than I planned.. 71 72 73## The code: 74 75### Good old C: 76The FFT API is very very simple, just make sure that you read the comments in `pffft.h`. 77 78The Fast convolution's API is also very simple, just make sure that you read the comments 79in `pffastconv.h`. 80 81### C++: 82A simple C++ wrapper is available in `pffft.hpp`. 83 84 85### Git: 86This archive's source can be downloaded with git including the submodules: 87``` 88git clone --recursive https://github.com/hayguen/pffft.git 89``` 90 91With `--recursive` the submodules for Green and Kiss-FFT are also fetched, 92to use them in the benchmark. You can omit the `--recursive`-option. 93 94For retrieving the submodules later: 95``` 96git submodule update --init 97``` 98 99 100## CMake: 101There's now CMake support to build the static libraries `libPFFFT.a` 102and `libPFFASTCONV.a` from the source files, plus the additional 103`libFFTPACK.a` library. Later one's sources are there anyway for the benchmark. 104 105 106## Origin: 107Origin for this code is Julien Pommier's pffft on bitbucket: 108[https://bitbucket.org/jpommier/pffft/](https://bitbucket.org/jpommier/pffft/) 109 110 111## Comparison with other FFTs: 112 113The idea was not to break speed records, but to get a decently fast 114fft that is at least 50% as fast as the fastest FFT -- especially on 115slowest computers . I'm more focused on getting the best performance 116on slow cpus (Atom, Intel Core 1, old Athlons, ARM Cortex-A9...), than 117on getting top performance on today fastest cpus. 118 119It can be used in a real-time context as the fft functions do not 120perform any memory allocation -- that is why they accept a 'work' 121array in their arguments. 122 123It is also a bit focused on performing 1D convolutions, that is why it 124provides "unordered" FFTs , and a fourier domain convolution 125operation. 126 127Very interesting is [https://www.nayuki.io/page/free-small-fft-in-multiple-languages](https://www.nayuki.io/page/free-small-fft-in-multiple-languages). 128It shows how small an FFT can be - including the Bluestein algorithm, but it's everything else than fast. 129The whole C++ implementation file is 161 lines, including the Copyright header, see 130[https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp](https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp) 131 132## Dependencies / Required Linux packages 133 134On Debian/Ubuntu Linux following packages should be installed: 135 136``` 137sudo apt-get install build-essential gcc g++ cmake 138``` 139 140for benchmarking, you should have additional packages: 141``` 142sudo apt-get install libfftw3-dev gnuplot 143``` 144 145run the benchmarks with `./bench_all.sh ON` , to include benchmarks of fftw3 .. 146more details in README of [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks) 147 148 149## Benchmark results 150 151The benchmark results are stored in a separate git-repository: 152See [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks). 153 154This is to keep the sources small. 155 156