Signalsmith FFT
Signalsmith FFT is a small (600-line) C++ header-only FFT implementation.
I wrote it because KISS FFT (the most common one I could find with a commercial-friendly license) was slow, and I thought I could do better. It's not as fast as FFTW (when used properly), but it's pretty good. :)
It only supports some sizes (2N x 1, 3 or 9), and will round up to the closest supported size. It produces permuted results, with a separate de-permutation step.
Source
signalsmith-fft.h - 21kb, 600 lines
You can also view it in Bitbucket.
License
MIT License - if you need something else, let me know.
Performance
Despite its small size, this library:
- Outperforms KISS FFT at all sample sizes
- Competes with FFTW's "estimate" configuration at larger sizes
Here are some benchmarks I ran myself on the machines I had available (64-bit, max optimisation).
The (fairly rough) benchmark code is also in Bitbucket, and if you run it yourself (or add it to any other benchmarks!), please let me know.
How to use
auto fft = getFft<double>(32768); // returns a std::shared_ptr<FFT<double>>
fft->fft(buffer); // In-place (recommended)
fft->ifft(buffer);
fft->fft(input, spectrum);
fft->ifft(spectrum, input);
The .fft()
method produces permuted output, and .ifft()
expects permuted input.
There are methods to re-order the computed spectrum:
fft->permute(permuted, unpermuted); // Out-of-place (recommended)
fft->ipermute(unpermuted, permuted);
fft->permute(result); // In-place
fft->ipermute(result);
Not all sizes are supported. The actual size may be up to 33% larger than requested.
The actual size is returned from .size()
or .setSize()
:
int currentSize = fft.size();
int actualSize = fft.setSize(16000); // returns 16384
You can also provide a second argument to setSize()
/getFft()
for parallel FFTs:
auto fft = getFft<double>(256, 16); // 16 interleaved 256-point FFTs
fft.setSize(512, 8); // 8 interleaved 512-point FFTs
If .setSize()
doesn't actually change the size, it doesn't reallocate/recalculate.
If it does change, it costs as much as constructing a new object with getFft()
.
In a multi-threaded environment, each thread should have its own instance, because of temporary memory used for out-of-place/in-place shuffling.