• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1
2---
3
4# PFFFT: a pretty fast FFT and fast convolution with PFFASTCONV
5
6---
7
8<!-- toc -->
9
10- [Brief Description](#brief-description)
11- [Why does it exist?](#why-does-it-exist)
12- [CMake](#cmake)
13- [History / Origin / Changes](#history--origin--changes)
14- [Comparison with other FFTs](#comparison-with-other-ffts)
15- [Dependencies / Required Linux packages](#dependencies--required-linux-packages)
16- [Benchmarks and results](#benchmarks-and-results)
17
18<!-- tocstop -->
19
20---
21
22## Brief description:
23
24PFFFT does 1D Fast Fourier Transforms, of single precision real and
25complex vectors. It tries do it fast, it tries to be correct, and it
26tries to be small. Computations do take advantage of SSE1 instructions
27on x86 cpus, Altivec on powerpc cpus, and NEON on ARM cpus. The
28license is BSD-like.
29
30PFFFT is a fork of [Julien Pommier's library on bitbucket](https://bitbucket.org/jpommier/pffft/)
31with some changes and additions.
32
33
34PFFASTCONV does fast convolution (FIR filtering), of single precision
35real vectors, utilizing the PFFFT library. The license is BSD-like.
36
37PFDSP contains a few other signal processing functions.
38Currently, mixing and carrier generation functions are contained.
39It is work in progress - also the API!
40The fast convolution from PFFASTCONV might get merged into PFDSP.
41
42
43## Why does it exist:
44
45I (Julien Pommier) was in search of a good performing FFT library ,
46preferably very small and with a very liberal license.
47
48When one says "fft library", FFTW ("Fastest Fourier Transform in the
49West") is probably the first name that comes to mind -- I guess that
5099% of open-source projects that need a FFT do use FFTW, and are happy
51with it. However, it is quite a large library , which does everything
52fft related (2d transforms, 3d transforms, other transformations such
53as discrete cosine , or fast hartley). And it is licensed under the
54GNU GPL , which means that it cannot be used in non open-source
55products.
56
57An alternative to FFTW that is really small, is the venerable FFTPACK
58v4, which is available on NETLIB. A more recent version (v5) exists,
59but it is larger as it deals with multi-dimensional transforms. This
60is a library that is written in FORTRAN 77, a language that is now
61considered as a bit antiquated by many. FFTPACKv4 was written in 1985,
62by Dr Paul Swarztrauber of NCAR, more than 25 years ago ! And despite
63its age, benchmarks show it that it still a very good performing FFT
64library, see for example the 1d single precision benchmarks
65[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not
66competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML,
67Apple vDSP. The reason for that is that those libraries do take
68advantage of the SSE SIMD instructions available on Intel CPUs,
69available since the days of the Pentium III. These instructions deal
70with small vectors of 4 floats at a time, instead of a single float
71for a traditionnal FPU, so when using these instructions one may expect
72a 4-fold performance improvement.
73
74The idea was to take this fortran fftpack v4 code, translate to C,
75modify it to deal with those SSE instructions, and check that the
76final performance is not completely ridiculous when compared to other
77SIMD FFT libraries. Translation to C was performed with [f2c](
78http://www.netlib.org/f2c/). The resulting file was a bit edited in
79order to remove the thousands of gotos that were introduced by
80f2c. You will find the fftpack.h and fftpack.c sources in the
81repository, this a complete translation of [fftpack](
82http://www.netlib.org/fftpack/), with the discrete cosine transform
83and the test program. There is no license information in the netlib
84repository, but it was confirmed to me by the fftpack v5 curators that
85the [same terms do apply to fftpack v4]
86(http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html). This is a
87"BSD-like" license, it is compatible with proprietary projects.
88
89Adapting fftpack to deal with the SIMD 4-element vectors instead of
90scalar single precision numbers was more complex than I originally
91thought, especially with the real transforms, and I ended up writing
92more code than I planned..
93
94
95## The code:
96
97### Good old C:
98The FFT API is very very simple, just make sure that you read the comments in `pffft.h`.
99
100The Fast convolution's API is also very simple, just make sure that you read the comments
101in `pffastconv.h`.
102
103### C++:
104A simple C++ wrapper is available in `pffft.hpp`.
105
106### Git:
107This archive's source can be downloaded with git (without the submodules):
108```
109git clone https://github.com/marton78/pffft.git
110```
111
112### Only two files?:
113_"Only two files, in good old C, pffft.c and pffft.h"_
114
115This statement does **NO LONGER** hold!
116
117With new functionality and support for AVX, there was need to restructure the sources.
118But you can compile and link **pffft** as a static library.
119
120
121## CMake:
122There's now CMake support to build the static libraries `libPFFFT.a`
123and `libPFFASTCONV.a` from the source files, plus the additional
124`libFFTPACK.a` library. Later one's sources are there anyway for the benchmark.
125
126There are several CMake options to modify library size and optimization.
127You can explore all available options with `cmake-gui` or `ccmake`,
128the console version - after having installed (on Debian/Ubuntu Linux) one of
129```
130sudo apt-get install cmake-qt-gui
131sudo apt-get install cmake-curses-gui
132```
133
134Some of the options:
135* `PFFFT_USE_TYPE_FLOAT` to activate single precision 'float' (default: ON)
136* `PFFFT_USE_TYPE_DOUBLE` to activate 'double' precision float (default: ON)
137* `PFFFT_USE_SIMD` to use SIMD (SSE/AVX/NEON/ALTIVEC) CPU features? (default: ON)
138* `DISABLE_SIMD_AVX` to disable AVX CPU features (default: OFF)
139* `PFFFT_USE_SIMD_NEON` to force using NEON on ARM (requires PFFFT_USE_SIMD) (default: OFF)
140* `PFFFT_USE_SCALAR_VECT` to use 4-element vector scalar operations (if no other SIMD) (default: ON)
141
142Options can be passed to `cmake` at command line, e.g.
143```
144cmake -DPFFFT_USE_TYPE_FLOAT=OFF -DPFFFT_USE_TYPE_DOUBLE=ON
145```
146
147My Linux distribution defaults to GCC. With installed CLANG and the bash shell, you can use it with
148```
149mkdir build
150cd build
151CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Debug ../
152cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=~ ../
153ccmake .                          # or: cmake-gui .
154cmake --build .                   # or simply: make
155ctest                             # to execute some tests - including benchmarks
156cmake --build . --target install  # or simply: [sudo] make install
157```
158
159With MSVC on Windows, you need some different options. Following ones to build a 64-bit Release with Visual Studio 2019:
160```
161mkdir build
162cd build
163cmake -G "Visual Studio 16 2019" -A x64 ..
164cmake --build . --config Release
165ctest -C Release
166```
167
168see [https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators)
169
170
171## History / Origin / Changes:
172Origin for this code/fork is Julien Pommier's pffft on bitbucket:
173[https://bitbucket.org/jpommier/pffft/](https://bitbucket.org/jpommier/pffft/)
174
175Git history shows following first commits of the major contributors:
176* Julien Pommier: November 19, 2011
177* Marton Danoczy: September 30, 2015
178* Hayati Ayguen: December 22, 2019
179* Dario Mambro: March 24, 2020
180
181There are a few other contributors not listed here.
182
183The main changes include:
184* improved benchmarking, see [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks)
185* double support
186* avx(2) support
187* c++ headers (wrapper)
188* additional API helper functions
189* additional library for fast convolution
190* cmake support
191* ctest
192
193
194## Comparison with other FFTs:
195The idea was not to break speed records, but to get a decently fast
196fft that is at least 50% as fast as the fastest FFT -- especially on
197slowest computers . I'm more focused on getting the best performance
198on slow cpus (Atom, Intel Core 1, old Athlons, ARM Cortex-A9...), than
199on getting top performance on today fastest cpus.
200
201It can be used in a real-time context as the fft functions do not
202perform any memory allocation -- that is why they accept a 'work'
203array in their arguments.
204
205It is also a bit focused on performing 1D convolutions, that is why it
206provides "unordered" FFTs , and a fourier domain convolution
207operation.
208
209Very interesting is [https://www.nayuki.io/page/free-small-fft-in-multiple-languages](https://www.nayuki.io/page/free-small-fft-in-multiple-languages).
210It shows how small an FFT can be - including the Bluestein algorithm, but it's everything else than fast.
211The whole C++ implementation file is 161 lines, including the Copyright header, see
212[https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp](https://github.com/nayuki/Nayuki-web-published-code/blob/master/free-small-fft-in-multiple-languages/FftComplex.cpp)
213
214## Dependencies / Required Linux packages
215
216On Debian/Ubuntu Linux following packages should be installed:
217
218```
219sudo apt-get install build-essential gcc g++ cmake
220```
221
222
223## Benchmarks and results
224
225#### Quicklink
226Find results at [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).
227
228#### General
229My (Hayati Ayguen) first look at FFT-benchmarks was with [benchFFT](http://www.fftw.org/benchfft/)
230and especially the results of the benchmarks [results](http://www.fftw.org/speed/),
231which demonstrate the performance of the [FFTW](http://www.fftw.org/).
232Looking at the benchmarked computer systems from todays view (2021), these are quite outdated.
233
234Having a look into the [benchFFT source code](http://www.fftw.org/benchfft/benchfft-3.1.tar.gz),
235the latest source changes, including competitive fft implementations, are dated November 2003.
236
237In 2019, when pffft got my attention at [bitbucket](https://bitbucket.org/jpommier/pffft/src/master/),
238there were also some benchmark results.
239Unfortunately the results are tables with numbers - without graphical plots.
240Without the plots, i could not get an impression. That was, why i started
241[https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks),
242which includes GnuPlot figures.
243
244Today in June 2021, i realized the existence of [https://github.com/FFTW/benchfft](https://github.com/FFTW/benchfft).
245This repository is much more up-to-date with a commit in December 2020.
246Unfortunately, it looks not so simple to get it run - including the generation of plots.
247
248Is there any website showing benchFFT results of more recent computer systems?
249
250Of course, it's very important, that a benchmark can be compared with a bunch
251of different FFT algorithms/implementations.
252This requires to have these compiled/built and utilizable.
253
254
255#### Git submodules for Green-, Kiss- and Pocket-FFT
256Sources for [Green-](https://github.com/hayguen/greenffts),
257[Kiss-](https://github.com/hayguen/kissfft)
258and [Pocket-FFT](https://github.com/hayguen/pocketfft)
259can be downloaded directly with the sources of this repository - using git submodules:
260```
261git clone --recursive https://github.com/marton78/pffft.git
262```
263
264Important is `--recursive`, that does also fetch the submodules directly.
265But you might retrieve the submodules later, too:
266```
267git submodule update --init
268```
269
270#### Fastest Fourier Transform in the West: FFTW
271To allow comparison with FFTW [http://www.fftw.org/](http://www.fftw.org/),
272cmake option `-DPFFFT_USE_BENCH_FFTW=ON` has to be used with following commands.
273The cmake option requires previous setup of following (debian/ubuntu) package:
274```
275sudo apt-get install libfftw3-dev
276```
277
278#### Intel Math Kernel Library: MKL
279Intel's MKL [https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html)
280currently looks even faster than FFTW.
281
282On Ubuntu-Linux it's easy to setup with the package `intel-mkl`.
283Similar on Debian: `intel-mkl-full`.
284
285There are special repositories for following Linux distributions:
286* Debian/apt: [https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html](https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html)
287* RedHat/yum: [https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-yum-repo.html](https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-yum-repo.html)
288* Gentoo/ebuild: [https://packages.gentoo.org/packages/sci-libs/mkl](https://packages.gentoo.org/packages/sci-libs/mkl)
289
290#### Performing the benchmarks - with CMake
291Benchmarks should be prepared by creating a special build folder
292```
293mkdir build_benches
294cd build_benches
295cmake ../bench
296```
297
298There are several CMake options to parametrize, which fft implementations should be benched.
299You can explore all available options with `cmake-gui` or `ccmake`, see [CMake](#cmake).
300
301Some of the options:
302* `BENCH_ID`         name the benchmark - used in filename
303* `BENCH_ARCH`       target architecture passed to compiler for code optimization
304* `PFFFT_USE_BENCH_FFTW`   use (system-installed) FFTW3 in fft benchmark? (default: OFF)
305* `PFFFT_USE_BENCH_GREEN`  use Green FFT in fft benchmark? (default: ON)
306* `PFFFT_USE_BENCH_KISS`   use KissFFT in fft benchmark? (default: ON)
307* `PFFFT_USE_BENCH_POCKET` use PocketFFT in fft benchmark? (default: ON)
308* `PFFFT_USE_BENCH_MKL`    use Intel MKL in fft benchmark?  (default: OFF)
309
310These options can be passed to `cmake` at command line, e.g.
311```
312cmake -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
313```
314
315The benchmarks are built and executed with
316```
317cmake --build .
318```
319
320You can also specify to use a different compiler/version with the cmake step, e.g.:
321
322```
323CC=/usr/bin/gcc-9 CXX=/usr/bin/g++-9 cmake -DBENCH_ID=gcc9 -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
324```
325
326```
327CC=/usr/bin/clang-11 CXX=/usr/bin/clang++-11 cmake -DBENCH_ID=clang11 -DBENCH_ARCH=native -DPFFFT_USE_BENCH_FFTW=ON -DPFFFT_USE_BENCH_MKL=ON ../bench
328```
329
330For using MSVC/Windows, the cmake command requires/needs the generator and architecture options and to be called from the VS Developer prompt:
331```
332cmake -G "Visual Studio 16 2019" -A x64 ../bench/
333```
334
335see [https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html#visual-studio-generators)
336
337
338
339For running with different compiler version(s):
340* copy the result file (.tgz), e.g. `cp *.tgz ../`
341* delete the build directory: `rm -rf *`
342* then continue with the cmake step
343
344
345#### Benchmark results and contribution
346You might contribute by providing us the results of your computer(s).
347
348The benchmark results are stored in a separate git-repository:
349See [https://github.com/hayguen/pffft_benchmarks](https://github.com/hayguen/pffft_benchmarks).
350
351This is to keep this repositories' sources small.
352
353