• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# PFFFT: a pretty fast FFT and fast convolution with PFFASTCONV
2
3## TL;DR
4
5PFFFT does 1D Fast Fourier Transforms, of single precision real and
6complex vectors. It tries do it fast, it tries to be correct, and it
7tries to be small. Computations do take advantage of SSE1 instructions
8on x86 cpus, Altivec on powerpc cpus, and NEON on ARM cpus. The
9license is BSD-like.
10
11
12PFFASTCONV does fast convolution (FIR filtering), of single precision
13real vectors, utilizing the PFFFT library. The license is BSD-like.
14
15PFDSP contains a few other signal processing functions.
16Currently, mixing and carrier generation functions are contained.
17It is work in progress - also the API!
18The fast convolution from PFFASTCONV might get merged into PFDSP.
19
20
21## Why does it exist:
22
23I was in search of a good performing FFT library , preferably very
24small and with a very liberal license.
25
26When one says "fft library", FFTW ("Fastest Fourier Transform in the
27West") is probably the first name that comes to mind -- I guess that
2899% of open-source projects that need a FFT do use FFTW, and are happy
29with it. However, it is quite a large library , which does everything
30fft related (2d transforms, 3d transforms, other transformations such
31as discrete cosine , or fast hartley). And it is licensed under the
32GNU GPL , which means that it cannot be used in non open-source
33products.
34
35An alternative to FFTW that is really small, is the venerable FFTPACK
36v4, which is available on NETLIB. A more recent version (v5) exists,
37but it is larger as it deals with multi-dimensional transforms. This
38is a library that is written in FORTRAN 77, a language that is now
39considered as a bit antiquated by many. FFTPACKv4 was written in 1985,
40by Dr Paul Swarztrauber of NCAR, more than 25 years ago ! And despite
41its age, benchmarks show it that it still a very good performing FFT
42library, see for example the 1d single precision benchmarks
43[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not
44competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML,
45Apple vDSP. The reason for that is that those libraries do take
46advantage of the SSE SIMD instructions available on Intel CPUs,
47available since the days of the Pentium III. These instructions deal
48with small vectors of 4 floats at a time, instead of a single float
49for a traditionnal FPU, so when using these instructions one may expect
50a 4-fold performance improvement.
51
52The idea was to take this fortran fftpack v4 code, translate to C,
53modify it to deal with those SSE instructions, and check that the
54final performance is not completely ridiculous when compared to other
55SIMD FFT libraries. Translation to C was performed with [f2c](
56http://www.netlib.org/f2c/). The resulting file was a bit edited in
57order to remove the thousands of gotos that were introduced by
58f2c. You will find the fftpack.h and fftpack.c sources in the
59repository, this a complete translation of [fftpack](
60http://www.netlib.org/fftpack/), with the discrete cosine transform
61and the test program. There is no license information in the netlib
62repository, but it was confirmed to me by the fftpack v5 curators that
63the [same terms do apply to fftpack v4]
64(http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html). This is a
65"BSD-like" license, it is compatible with proprietary projects.
66
67Adapting fftpack to deal with the SIMD 4-element vectors instead of
68scalar single precision numbers was more complex than I originally
69thought, especially with the real transforms, and I ended up writing
70more code than I planned..
71
72
73## The code:
74
75### Good old C:
76The FFT API is very very simple, just make sure that you read the comments in `pffft.h`.
77
78The Fast convolution's API is also very simple, just make sure that you read the comments
79in `pffastconv.h`.
80
81### C++:
82A simple C++ wrapper is available in `pffft.hpp`.
83
84
85### Git:
86This archive's source can be downloaded with git including the submodules:
87```
88git clone --recursive https://github.com/hayguen/pffft.git
89```
90
91With `--recursive` the submodules for Green and Kiss-FFT are also fetched,
92to use them in the benchmark. You can omit the `--recursive`-option.
93
94For retrieving the submodules later:
95```
96git submodule update --init
97```
98
99
100## CMake:
101There's now CMake support to build the static libraries `libPFFFT.a`
102and `libPFFASTCONV.a` from the source files, plus the additional
103`libFFTPACK.a` library. Later one's sources are there anyway for the benchmark.
104
105
106## Origin:
107Origin for this code is Julien Pommier's pffft on bitbucket:
108[https://bitbucket.org/jpommier/pffft/](https://bitbucket.org/jpommier/pffft/)
109
110
111## Comparison with other FFTs:
112
113The idea was not to break speed records, but to get a decently fast
114fft that is at least 50% as fast as the fastest FFT -- especially on
115slowest computers . I'm more focused on getting the best performance
116on slow cpus (Atom, Intel Core 1, old Athlons, ARM Cortex-A9...), than
117on getting top performance on today fastest cpus.
118
119It can be used in a real-time context as the fft functions do not
120perform any memory allocation -- that is why they accept a 'work'
121array in their arguments.
122
123It is also a bit focused on performing 1D convolutions, that is why it
124provides "unordered" FFTs , and a fourier domain convolution
125operation.
126
127Very interesting is [https://www.nayuki.io/page/free-small-fft-in-multiple-languages](https://www.nayuki.io/page/free-small-fft-in-multiple-languages).
128It shows how small an FFT can be - including the Bluestein algorithm, but it's everything else than fast.
129The whole C++ implementation file is 161 lines, including the Copyright header, see
130[https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp](https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp)
131
132## Dependencies / Required Linux packages
133
134On Debian/Ubuntu Linux following packages should be installed:
135
136```
137sudo apt-get install build-essential gcc g++ cmake
138```
139
140for benchmarking, you should have additional packages:
141```
142sudo apt-get install libfftw3-dev gnuplot
143```
144
145run the benchmarks with `./bench_all.sh ON` , to include benchmarks of fftw3 ..
146more details in README of [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks)
147
148
149## Benchmark results
150
151The benchmark results are stored in a separate git-repository:
152See [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).
153
154This is to keep the sources small.
155
156