summaryrefslogtreecommitdiff
path: root/src/math/fft.org
blob: 45e9595003c21b029a8ec444dddb7cd81d47fae5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
#+title: Fast Fourier Transform
#+subtitle: Implementation in Common Lisp
#+author: Thomas Albers Raviola
#+date: <2023-03-06>
#+setupfile: ../../math_options.org

In this article I attept to explain the principles of the fast fourier transform
(FFT) and develop in incremental steps an implementation in common lisp.

*NOTE*: This is part of my personal notes and as such may contain mistakes or
not be correct at all. If you find any issues please contact me at my email
address.

* Introduction

The FFT is an algorithm for computing the discrete fourier transform (DFT) of an
array of values that is meant to be faster than the usual naive implementation.
Instead of having the usual \[ O(n^2) \] behaviour it is an \[ O(n\log(n)) \]
algorithm.

* Derivation

To derive the FFT we first consider the definition of the DFT

Let \[ \xi_k \] be an array of N, possible complex, numbers. The DFT of \[ \xi_k
\] is an array of the same length \[ x_k \] defined as:

{{{beg-eqn}}}
x_k = \sum_{n=0}^{N-1} \xi_n e^{- \frac{2 \pi i}{N} n k}
{{{end-eqn}}}

A possible implementation of this in code would be something like the following:

#+begin_src common-lisp
  (defun dft (x)
    "Naive discrete fourier transform, not fft"
    (flet ((transform (k)
             (loop :with i = (complex 0 1)
                   :for n :below (length x)
                   :sum (* (aref x n) (exp (- (/ (* 2 pi i k n) (length x))))))))
      (map-seq-iota #'transform (length x))))
#+end_src

But we can do a lot better in terms of performance. To this end we first split
the sum {{{ref}}} into even and odd members

{{{beg-align}}}
x_k &= \sum_{n=0}^{N/2-1} \xi_{2n} e^{- \frac{2 \pi i}{N} (2n) k} +
\sum_{n=0}^{N/2-1} \xi_{2n+1} e^{- \frac{2 \pi i}{N} (2n+1) k}\\
&= \sum_{n=0}^{N/2-1} \xi_{2n} e^{- \frac{2 \pi i}{N/2} n k} +
e^{-\frac{2 \pi i}{N}k} \sum_{n=0}^{N/2-1} \xi_{2n+1} e^{- \frac{2 \pi i}{N/2} n
k}\\
&=E_k + e^{-\frac{2 \pi i}{N}k} O_k
{{{end-align}}}

Where \[ E_k \] and \[ O_k \] themselves are FFTs of a smaller array each.
Furthermore notice how thanks to the cyclic nature of the complex exponential
function it is possible to half the number of computations:

{{{beg-align}}}
x_{k+\frac{N}{2}} &= E_{k+\frac{N}{2}} + e^{-\frac{2 \pi i}{N} (k +
\frac{N}{2})} O_{k+\frac{N}{2}}\\
&= E_k - e^{-\frac{2 \pi i}{N} k} O_k
{{{end-align}}}

Thus

{{{beg-eqn}}}
\begin{aligned}
x_k &= E_k + e^{-\frac{2 \pi i}{N} k} O_k\\
x_{k+\frac{N}{2}} &= E_k - e^{-\frac{2 \pi i}{N} k} O_k
\end{aligned}\,\text{,}\quad k=0,\dots,\frac{N}{2}-1
{{{end-eqn}}}

* Implementing the FFT

With the reduction identities we derived for the FFT a preliminary
implementation could be:

#+begin_src common-lisp
  (defun fft (sequence)
    (labels
        ((fft-step (n step offset)
           (let ((ret (make-array n)))
             (if (= n 1)
                 (setf (aref ret 0) (aref sequence offset))
                 (loop
                   :with n2 = (/ n 2)
                   :with ek = (fft-step n2 (* 2 step) offset)
                   :with ok = (fft-step n2 (* 2 step) (+ offset step))
                   :for k :below n2
                   :for c = (exp (- (/ (* 2 pi (complex 0 1) k) n)))
                   :do
                      (setf (aref ret k) (+ (aref ek k) (* c (aref ok k)))
                            (aref ret (+ k n2)) (- (aref ek k) (* c (aref ok k))))))
             ret)))
      (fft-step (length sequence) 1 0)))
#+end_src

The function ~fft-step~ needs to be able to access both even and odd terms, for
this we use the ~step~ and ~offset~ arguments. ~step~ indicate the distance
between consecutive elements of the input to the fft. If we wanted to compute
the FFT of the even terms, this distance would be 2. For the fft inside this
outer fft of the even entries we also want to split further into even an odd
members, this ~step~ needs to be 4 and so on. ~offset~ represents how many
elements must be skipped until the first element of the subarray.

Notice that we are allocating in each step a new array to return. The FFT can
however be computed with a single array allocation. For that we store the values
of \[ E_k \] in the first half of the return array and the values of \[ O_k \]
in the second half. Finally we use the so called "butterfly" operations to
compute the final result.

#+caption: Diagram of the abstract data flow inside the FFT. Observe how the FFT itself is define recursively halving the number of items in each step
[[/svg/fft.svg]]

#+begin_src common-lisp
  (defun fft* (sequence)
    (let ((result (make-array (length sequence))))
      (labels ((fft-step (n step offset roffset)
                 (if (= n 1)
                     (setf (aref result roffset) (aref sequence offset))
                     (let ((n2 (/ n 2)))
                       (fft-step n2 (* 2 step) offset roffset)                 ; e_k
                       (fft-step n2 (* 2 step) (+ offset step) (+ roffset n2)) ; o_k
                       (loop :for k :below n2
                             :for j = (+ roffset k)
                             :for c = (exp (- (/ (* 2 pi (complex 0d0 1d0) k) n)))
                             :for ek = (aref result j)
                             :for ok = (aref result (+ j n2))
                             :do
                                (setf (aref result j)        (+ ek (* c ok))
                                      (aref result (+ j n2)) (- ek (* c ok))))))))
        (fft-step (length sequence) 1 0 0)
        result)))
#+end_src

There are still many optimizations that could be made to speed up our code, like
twiddle factors, cache optimizations and more. But this goes beyond the scope I
planned for this article. Many would probably be also beyond the scope of my
current abilities.