Wavelets in Chemistry. Edited by B. Walczak. © 2000 Elsevier Science B.V. All rights reserved.
CHAPTER 1
Finding Frequencies in Signals: The Fourier Transform
Bas van den Bogaert
Solvay SA, DCRT/ACE, Industrial IT and Statistics, Rue de Ransbeek 310, 1120 Brussels, Belgium
1
Introduction
This is a chapter on the Fourier transform. One may wonder: why speak of Fourier in a book on wavelets? To be honest, there are plenty of people that learn to use and appreciate wavelets without knowing about Fourier. You might be one of them. Yet all those involved in the development of wavelets certainly knew Fourier, and as a consequence, wavelet literature is full of Fourier jargon. So, whereas you may not need to know Fourier to apply wavelets, you probably will need to know it in order to appreciate the literature. The goal of this chapter is to introduce Fourier in a soft way. Fourier has a rather bad reputation amongst chemists, the reputation of something highly mathematical and abstract. We will not argue with that. Part of Fourier is indeed inaccessible to the less mathematically inclined. Another part, however, is easy to grasp and apply. The discrete Fourier transform in particular, as one might use it in digital signal processing, has a simple basic structure and comprehensible consequences. It is also that part of Fourier that links well to the wavelet transform: the discrete wavelet transform, that is, the kind you are most likely to be using in the future. What makes these discrete transforms easy to understand is that they have a geometrical interpretation; in terms of linear algebra, they are basis transformations. Nevertheless, we will take a glance at pure and undiluted Fourier: the transform in integral form. Not that we need it, but it would be odd not to mention it. Moreover, useful notions from the Fourier integrals can be effectively used, if only loosely, for discrete Fourier.
2
The Fourier integral
Let us look the beast in the eye:

$$F(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \qquad (1)$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} F(\omega)\, e^{+i\omega t}\, d\omega \qquad (2)$$

with $i^2 = -1$. We have some function f of t, where t is often associated with time, so that we can think of f as a signal. Eq. (1) transforms f into F, where F is no longer a function of t, but of ω. When we associate t with time, we may think of ω as frequency, as the exponential may be written as:

$$e^{-i\omega t} = \cos(\omega t) - i\,\sin(\omega t) \qquad (3)$$
Eq. (2) does the same thing as Eq. (1), but in the other direction: it takes F of ω and transforms it into f of t. We see that in order to go in the other direction, the sign of the exponent has been swapped from minus to plus. Furthermore, there is a multiplication factor outside the integral. The factor is needed to get back to the same size if we were to go from f to F and back again to f. We could also have defined a set of Fourier integrals putting that factor in the first equation, or dividing it over both. Eqs (1) and (2) have everything to scare off chemists. There are integrals, complex numbers, and ω is said to represent frequency, which leaves us pondering about the meaning of negative values for it. This is pure mathematics, it seems. Yet this form of Fourier is not just a toy for mathematicians. It is useful for mathematical reasoning on models of the real world, and analytical solutions may be obtained for real-world problems. Useful or not, our present vision of the world becomes increasingly digital; we observe and manipulate the real world using digital tools that discretise it. Most often, signals are not continuous and infinitely long, they are discrete
and of finite length. Mathematics exist that allow travelling back and forth between the continuous and the discrete representation. When the continuous Fourier reasoning is to be used for our discrete data, the additional maths do not simplify things. Arguments that are compelling in continuous Fourier may get twisted upon translation to the digital domain. In fact, the discrete representation of Fourier analysis may seem better off without the burden of its continuous ancestor. However, it is possible to loosely apply continuous Fourier reasoning to discrete settings, reasoning that gives a feeling for what happens when one filters a signal, for instance. The most interesting example of such reasoning involves convolution, an operation that is ubiquitous in the domains where Fourier is used. It will be discussed in Section 3.
3
Convolution
We will introduce the concept of convolution using a simple example from systems analysis. We will take a small side step to introduce the basics of the system. Suppose we have a single reactor that has one flow going in and one going out as depicted in Fig. 1. Suppose the reactor is a so-called continuously stirred tank reactor, or CSTR. A CSTR is a well-known theoretical concept. In a CSTR, mixing is by definition perfect. As soon as some material arrives, it is instantaneously completely and homogeneously dispersed throughout the reactor. Imagine there is water flowing through. Now we spike the input with some ink. When the ink arrives at the reactor, we will immediately see it appear in the output. Not as strong as it was, but diluted to the volume of the reactor. After the initial jump, the colour of the output will gradually fade away, as the ink is washed out of the reactor. In the beginning, when the
Fig. 1 A CSTR and its impulse response.
concentration is high, the material is washed away quickly and the concentration drops fast. As the concentration becomes lower, the rate at which the material leaves with the outflow becomes lower, i.e. the concentration drops more slowly. In short: the rate at which the concentration decreases is proportional to the current concentration. This amounts to a simple differential equation. When we solve this equation we obtain a formula for the concentration profile in the output after a single spike in the input. This is called the impulse response of the CSTR:

$$c(t) = c(0)\, e^{-kt} \qquad (4)$$
where k, the time constant, depends on the ratio of flow to reactor volume, and c(0), the initial concentration, depends on the amount of ink introduced and, again, reactor volume. A high k means that the reactor is flushed rapidly and the concentration drops fast. Now, what would happen if we were to spike the input several times, with some time in between? As depicted in Fig. 2, the concentration profile in the output would be the sum of the responses to the individual spikes. When we cut through the output profile at some moment t we see that the contributions correspond to different positions on the basic impulse response. For the first spike, we are already on the tail of the response, for the last we are still close to the top. To get another view, we start by taking the mirror image of the impulse response. Its exponential slope will be to the left and its perpendicular edge will be to the right. The three contributions at time t are obtained by multiplying the input with the mirrored impulse response positioned at time t. In this example the input consists of the three impulses in the midst of zeros. Therefore, the multiplication leads to a sampling of three points from the (mirrored) impulse response. In general, the input is a
Fig. 2 A series of impulses on a CSTR, the responses and their envelope.
continuous signal, which can be regarded as a series of infinitely closely spaced impulses of different amplitude, and there is an infinite number of contributions. In the example, the overall output signal at time t is the sum of the three products. In general, it is the integral of the product of two signals, namely the input and the mirrored impulse response positioned at time t. To obtain the entire output signal, we drag the mirrored impulse response over the input. At each position t of the impulse response, we multiply and sum to get the output signal in t. This dragging process is illustrated by Fig. 3. That operation is called convolution. Any input signal can be thought of as a series of impulses and the output signal will be the convolution of the impulse response and the input. In other words: if we know the impulse response of a system, we can derive what the output will be like given some input. The formal description of a convolution is the convolution integral:

$$g(t) = \int_{-\infty}^{+\infty} f(\tau)\, h(t - \tau)\, d\tau \qquad (5)$$
where g(t) could be the output of a system with impulse response h(t) and input f(t). Impulse responses are usually relatively easy to come by, but the effect of a convolution is often difficult to picture without actually evaluating the convolution integral, which is seldom a simple task. This is where Fourier comes in. The convolution theorem states that a convolution in the t-domain is equivalent to a multiplication in the ω-domain. What we need to do is Fourier transform the input and the impulse response. The product of these functions is the Fourier transform of the output. So if we want the output, we need to transform back.
Fig. 3 Every point of the envelope response can be seen as a multiplication of the input impulses and a mirrored impulse response.
This may not seem to be a simplification, but in many cases it is, and in the following, we will frequently use this property.
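To make the theorem tangible, here is a minimal numerical sketch (Python with NumPy; the rate constant, signal length and spike positions are invented for illustration). It convolves a three-spike input with a CSTR-type exponential impulse response, once directly in the time domain and once as a multiplication of Fourier transforms; the zero-padding anticipates the circular-convolution issue discussed in the next section.

```python
import numpy as np

n = 256
t = np.arange(n)

# CSTR impulse response, Eq. (4): c(t) = c(0) * exp(-k t); k chosen arbitrarily
k = 0.05
h = np.exp(-k * t)

# Input: three spikes, as in Fig. 2
f = np.zeros(n)
f[[20, 60, 90]] = 1.0

# Time-domain route: direct (discretised) convolution, Eq. (5)
g_time = np.convolve(f, h)[:n]

# Frequency-domain route: multiply the transforms, then transform back.
# Zero-padding to 2n makes the FFT's circular convolution equal the linear one.
g_freq = np.real(np.fft.ifft(np.fft.fft(f, 2 * n) * np.fft.fft(h, 2 * n)))[:n]

print(np.allclose(g_time, g_freq))  # True: the convolution theorem at work
```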
4
Convolution and discrete Fourier
In the discrete Fourier setting, the convolution theorem still holds, but with an important modification. The multiplication of discrete Fourier transforms corresponds to a convolution that is circular. One can imagine that the convolution described above, dragging some impulse response along the signal, gets into trouble when we have a finite set of data. At the beginning and the end of the signal, the shape we are dragging along it will stick out, as depicted in Fig. 4. The simplest solution is to exclude those sections of the signal in the output, i.e. to start the convolution at the position where the entire shape encounters signal, and to stop when the front of the shape meets the end of the signal. That would make the output signal shorter than the input. An alternative could be to simply sum the remaining products when, in some position, the shape sticks out. That would be equivalent to assuming that the signal is zero beyond the available data. In the CSTR example above that was a reasonable assumption, but in general it is not.

Fig. 4 Convolution at the beginning of a discrete signal. The impulse response is too long.

Yet another way of solving the problem of missing data at the edges of the signal is to think of the signal as something that repeats itself. After the end of the signal, we will suppose it starts over as at the beginning. Hence, before the start, we will suppose the signal behaved as it does at the end. This is a circular convolution, as depicted in Fig. 5. In a discrete convolution, either we lose part of the signal, or we deform that part. As long as the signal is long compared to the shape it is being convoluted with, we do not worry too much about the deformation. Under those circumstances, we will loosely use the convolution theorem, as if the circular aspect were not there.
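A small sketch of the difference (Python/NumPy, illustrative numbers): multiplying the unpadded discrete Fourier transforms yields the circular convolution, which wraps the part of the response that sticks out back to the start of the signal.

```python
import numpy as np

f = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 4.0])  # signal with energy near its end
h = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])  # impulse response, same length

# Multiplying unpadded DFTs gives the circular convolution:
circ = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

# The linear convolution is zero at the start; the circular one wraps the
# tail of the response around to the first points of the signal:
lin = np.convolve(f, h)[:len(f)]
print(np.round(circ, 2))  # [1.8  0.8  0.5  1.3  2.3  3.3]
print(np.round(lin, 2))   # [0.   0.   0.5  1.3  2.3  3.3]
```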
5
Polynomial approximation and basis transformation
This section will elaborate the following ideas. The Fourier transform can be interpreted as a polynomial approximation of a signal, where the polynomial is a series of sines (and cosines) of increasing frequency. When the degree of the polynomial is high enough, the approximation will be perfect: we will accurately reproduce the entire signal. At that point, the polynomial can be seen as a basis for signal space, and the calculation of the coefficients boils down to a basis transformation. Suppose we have a set of n data points (xi, Yi), a calibration line, for example. The data are plotted in Fig. 6.
Fig. 5 In a circular convolution, the signal is wrapped around to avoid the problem of an impulse response that is too long at the edges of the signal.

Fig. 6 Scatter plot of a set of x-y data.
We wish to describe y as a function of x. A straight line seems okay as a first approximation. In that case the model is a first-order polynomial:

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad (6)$$

where the error term ε describes the fact that the y_i will not perfectly fit our model, due to measurement error and to model insufficiency. We might want to add a quadratic term if we suspect curvature, i.e. go to a second-order polynomial:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \qquad (7)$$
Note that the ε of Eq. (7) is not the same as in Eq. (6). If a second-order model is not sufficient we try a third-order, etc. If we use a polynomial of order n − 1, we are sure to describe the data perfectly; there would be no degrees of freedom left. Fig. 7 shows the first orders of approximation of the data of Fig. 6. In general, a perfect description is not what we aim for. As the responses have not been measured with infinite precision, a perfect description would go beyond describing the process we set out to observe; it would describe the measurement error as well. In a polynomial approximation, we would typically stop at an order well below the limiting n − 1. In other words, we suspect the higher-order terms to be representing noise. That is a general principle we will also encounter in Fourier. For the calculation of the coefficients in our polynomial model we use linear regression, i.e. a least squares projection of the data onto the model. This is very easy to write down in matrix notation. Our model becomes:

$$y = X\beta + \varepsilon \qquad (8)$$
Fig. 7 Polynomial approximation of orders 0, 1, 2 and 10 for the data in Fig. 6.
where y is the n-vector of responses y_1 to y_n, ε the n-vector of residual errors, β the p-vector of coefficients if the polynomial is of order p − 1, and X the n × p model matrix. In the case of a second-order model, X can be constructed as:

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix} \qquad (9)$$
The coefficients β are estimated using:

$$\hat{\beta} = (X^T X)^{-1} X^T y \qquad (10)$$
The matrix inversion is a crucial element. In the ideal situation, the columns of X are orthogonal. That means that X^T X is diagonal and the matrix inversion boils down to simple divisions. We can go one step further and normalise those orthogonal columns, making X^T X the identity matrix and allowing us to write:

$$\hat{\beta} = X^T y \qquad (11)$$
Each coefficient β_j can be calculated independently as the inner product of the response vector y and column j of X:

$$\beta_j = \sum_{i=1}^{n} x_{i,j}\, y_i \qquad (12)$$
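As a sketch of Eqs (8)-(10) in practice (Python/NumPy; the calibration data are invented for illustration), here is a second-order polynomial fit by least squares:

```python
import numpy as np

# Invented calibration data (x_i, y_i), roughly on a straight line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.2, 2.2, 3.5, 4.3, 5.9])

# Model matrix of Eq. (9) for a second-order polynomial
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares coefficients, Eq. (10): beta = (X'X)^-1 X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta  # the error term epsilon of Eq. (8)
print(beta)
print(residuals)
```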
If X is constructed simply by adding increasing powers of the basic x, as in Eq. (9), X^T X is not diagonal. However, it is possible to construct polynomials in x that do result in a diagonal X^T X, i.e. to construct orthogonal polynomials. A simple solution is the Chebychev series. If the x_i are equidistant and we rescale them to {0, 1, ..., n}, the following series of polynomials is orthogonal:

$$p_0 = 1$$
$$p_1 = x - \frac{n}{2}$$
$$p_{k+1} = p_1\, p_k - \frac{k^2\left[(n+1)^2 - k^2\right]}{4\,(4k^2 - 1)}\; p_{k-1}, \quad 1 \le k \le n - 1 \qquad (13)$$
The columns of X contain the successive orders of this polynomial. Orders 0 to 8 are plotted in Fig. 8 for the case of n = 100. With such a large n, the functions become smooth, whereas for small n they are rather ragged. When the Chebychev polynomial series is developed until the order k equals n − 1, i.e. until there are as many polynomial coefficients as there are observations, the matrix X can be considered an orthogonal basis for n-dimensional space. The coefficients β_j are the co-ordinates in this alternative system of axes, this other domain, as it is often called. We could speak of the Chebychev domain in this case. Eq. (10) describes the basis transformation, i.e. the projection of the signal onto the alternative basis. The transform has only changed our perspective on the data; nothing has been changed or lost. So we could also transform back, using the model at the start of all this: y = Xβ.
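The recursion of Eq. (13) is easy to implement; the sketch below (Python/NumPy, our own function name) builds the columns of X on an equidistant grid and checks that X^T X is indeed diagonal.

```python
import numpy as np

def chebychev_basis(n, n_orders):
    """Columns p_0 ... p_{n_orders-1} of Eq. (13) on the grid x = 0, 1, ..., n."""
    x = np.arange(n + 1, dtype=float)
    P = np.empty((n + 1, n_orders))
    P[:, 0] = 1.0               # p_0 = 1
    P[:, 1] = x - n / 2.0       # p_1 = x - n/2
    for k in range(1, n_orders - 1):
        c = k**2 * ((n + 1)**2 - k**2) / (4.0 * (4 * k**2 - 1))
        P[:, k + 1] = P[:, 1] * P[:, k] - c * P[:, k - 1]
    return P

P = chebychev_basis(100, 9)  # orders 0 to 8, as in Fig. 8
G = P.T @ P
print(np.allclose(G, np.diag(np.diag(G))))  # True: the columns are orthogonal
```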
Fig. 8 Some Chebychev polynomials.
People working in chemometrics will be familiar with another kind of basis transformation: principal component analysis (PCA). They may be puzzled by the differences between PCA and orthogonal polynomials, so we will compare the two. An orthogonal polynomial provides a fixed basis for n-dimensional space onto which we can project any individual vector that is n points long. The basis is fixed in the sense that it does not depend on the vectors that will be projected on it. Or, vice versa, we do not need to know those vectors in order to construct any of these polynomial bases. This is in sharp contrast to PCA, which uses a set of vectors to define a new basis, well suited to represent just that set. The Chebychev polynomial is just one possibility for constructing a fixed orthogonal basis for n-dimensional space. There are many others. Interesting members of the family are the Hermite polynomials, Fourier and wavelets. As there are many, the question arises how to choose between them. Before we are able to answer that question, we need to deal with another, more fundamental one: why do a basis transformation in the first place? The purpose of a basis transformation is always to make things easier to see or do. Take PCA, for instance: its main usage is to reduce the number of dimensions of a data set, i.e. to use only a limited set of basis vectors to describe the data. The reduction can be a goal in itself, but it also allows us to concentrate on the main tendencies in the data. It is like the polynomial approximation, where we strive for a model with few terms, typically only the lower-order ones. The purpose of the Fourier transform, or the wavelet transform, is much the same. A dimension reduction, i.e. a description of the data using a subset of basis functions, serves to compress and to improve visibility. When visibility is the issue, the process of dropping irrelevant basis functions is usually referred to as filtering in a Fourier setting, or denoising in wavelets.
6
The Fourier basis
The Fourier polynomial series is not a sequence of increasing powers of x, like the Chebychev polynomial, but a series of sines and cosines of increasing frequency. In fact, there is no longer a notion of x and y, as in the initial example of polynomial approximation, but just y, a series of numbers. We will call it a signal, which may be a variable recorded over time, an absorption recorded as a function of wavenumber, or any vector you like. It is common practice to speak of the Fourier coefficients as the representation of the signal in the Fourier domain or, alternatively, the frequency domain. The signal itself is then said to be a representation in the time domain. This is confusing when our signal is not a function of time but, e.g., an absorption spectrum. In Fourier and wavelet literature, however, these notions of time and frequency are so common that they are unavoidable. If the signal to be transformed is n points long, the terms of the polynomial are defined on {0, 1, ..., n − 1}. This equidistant grid will be called x. For ease of notation we will assume that n is odd. The terms of the polynomial, i.e. the n functions that make up the Fourier basis, are:
$$a_0 = 1$$
$$a_k = \cos(k \cdot 2\pi x / n), \quad k \in \left\{1, \ldots, \frac{n-1}{2}\right\}$$
$$b_k = \sin(k \cdot 2\pi x / n), \quad k \in \left\{1, \ldots, \frac{n-1}{2}\right\} \qquad (14)$$
Fig. 9 gives a plot of the a_k for k ∈ {0, 1, 2, 3} and the b_k for k ∈ {1, 2, 3}, for n = 99. Small n do not give smooth curves. The functions a_k and b_k enter as columns into the matrix X, e.g. as:

$$X = \left[\, a_0 \;\; a_1 \;\; \ldots \;\; a_{(n-1)/2} \;\; b_1 \;\; \ldots \;\; b_{(n-1)/2} \,\right] \qquad (15)$$
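A short sketch of Eqs (14) and (15) (Python/NumPy, assuming odd n as in the text) that builds X and confirms that its columns are orthogonal but not normalised:

```python
import numpy as np

def fourier_basis(n):
    """The matrix X of Eq. (15) for odd n."""
    x = np.arange(n)
    cols = [np.ones(n)]                                                      # a_0
    cols += [np.cos(k * 2 * np.pi * x / n) for k in range(1, (n + 1) // 2)]  # a_k
    cols += [np.sin(k * 2 * np.pi * x / n) for k in range(1, (n + 1) // 2)]  # b_k
    return np.column_stack(cols)

X = fourier_basis(99)
G = X.T @ X
print(np.allclose(G, np.diag(np.diag(G))))  # True: orthogonal columns...
print(np.diag(G)[:3])                       # ...but not normalised: 99, 49.5, 49.5
```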
The Fourier transform can be done using either Eq. (10) or Eq. (11); the β will be the Fourier coefficients. When we use Eq. (11), we should realise that X is orthogonal, not orthonormal. Therefore the coefficients will not be the true β, but X^T X β. When we transform back from the true β, from the result of Eq. (10), we simply use y = Xβ. But when we go back from the result of Eq. (11), we need to use y = X(X^T X)^{-1} β. In short, the use of (X^T X)^{-1} is inevitable, either forwards or backwards.

Fig. 9 Some elements of the Fourier series, i.e. base functions of the Fourier basis.

It should be noted here that the Fourier coefficients will probably not be calculated using Eq. (10) or (11). Most applications of Fourier are based on the so-called Fast Fourier Transform (FFT), which is a clever way of arranging the calculations in order to speed them up. That is of little concern but for two reasons. One is that the FFT works only for signals whose length is an integer power of 2, so we have to do something to the signals that do not happen to have that length. This is not an easy problem: when we cut the signal we lose information, and when we add something to it we introduce artefacts. We will not go into any detail on this, but we note that the fast wavelet transform suffers from the same problem. The other reason the calculation may concern us is that the FFT will return the Fourier coefficients as a series of n complex numbers. This is the most common representation in the world of Fourier, and we should not be bothered by it. The real parts correspond to the cosine terms a_k, the imaginary parts to the sine terms b_k. But there are n complex numbers, and this way we would find 2n coefficients for only n points of the signal. In fact we need only look at the first half of the series of complex numbers, because the series is symmetrical with respect to its central value. The second half is often referred to as the negative frequencies. A pair of a_k and b_k for which k is the same refers to a single frequency; in other words, k can be interpreted as frequency. This is mathematically obvious, as cos(x) = sin(x + π/2), i.e. a cosine is a shifted sine. By taking the sum of a sine and a cosine, we do not create a different frequency; we describe the same basic frequency they have in common, and by changing the ratio of sine and cosine, we set the phase of this frequency. Fig. 10 illustrates this.
Fig. 10 Two linear combinations of a sine and a cosine, resulting in oscillations of different phase.

By adding a bit of a sine to a cosine, we shift the cosine towards the sine; the more we add, the closer we get to the sine. The n coefficients refer to (n + 1)/2 frequencies if n is odd, or 1 + n/2 if n is even. a_0 is zero frequency, the offset of the signal. k = 1 refers to the base frequency of the analysis; all other frequencies are integer multiples of it. We could also say that it is the frequency resolution of our analysis, as we step through the frequency domain in steps the size of the base frequency. The longer the signal (the bigger n), the lower the base frequency, as its period exactly fits the length of the signal, and thus the higher our frequency resolution. The maximum frequency in the Fourier basis is uniquely determined by the sampling frequency and does not depend on the length of the signal. It is always n/2 times the base frequency, i.e. its period is two nths of the length of the signal, i.e. two points.

Having set out the outlines of the Fourier basis, we can start to answer the question why one should want to use it. One answer could be: because there are signals for which it is appropriate. Which changes the question to: what kind of signals are those? We have seen that a polynomial development of x is a logical thing to do when we want to describe data that most likely fall onto a straight line, with potential curvature that would need to be captured. Analogously, we can think of a signal that is basically sinusoidal, with potentially some higher frequencies and certainly noise, as illustrated in Fig. 11(a). The Fourier basis would be an obvious choice for such a signal. In Fig. 11(c), the approximation of the signal using the appropriate frequencies has been plotted.
Fig. 11 (a) A noisy periodic signal; (b) its PSD; (c) reconstruction of the signal using only the two strong frequencies.

Being able to pick the appropriate frequencies is a result of Fourier analysis. When the signal of Fig. 11(a) is transformed, a series of n coefficients is obtained that we know to be grouped in pairs referring to the frequencies in the basis. In this case, we are not interested in the phases of those frequencies, only in their relative importance for describing the signal. The sum of the squares of the coefficients of sine and cosine gives what is called the power of that frequency. We could also look at the amplitude, which is the square root of the power. A plot of power versus frequency is called the power spectrum or power spectral density (PSD). For the signal in Fig. 11(a), the PSD is given in Fig. 11(b). Two frequencies clearly stand out in the spectrum; the others will be considered noise. In this example we knew that the signal was periodic. In practice, this may not be the case: there may, e.g., be seasonal effects in a product property, or resonance effects in a controlled variable in a process. Fourier analysis aids in finding those phenomena. We could extend our view to any signal that is periodic, not just sinusoidal. Let us look at the block wave of Fig. 12(a). A sine as a first approximation of the block is obvious, but then what? Higher frequencies, with their steeper slopes, are required to get into the corners of the block. It can be derived mathematically that it is a beautiful series of discrete frequencies that makes up a block wave, viz. f, 3f, 5f, ..., where f is a sine with the same period as the block. The amplitude diminishes as the frequencies get higher.
Fig. 12 (a) Some periods of a block wave and a sine with the same period and corresponding phase; (b) the PSD of the block wave; (c) approximation of the block wave using the first six frequencies differing from zero in the PSD.

The PSD of the signal in Fig. 12(a) is given in Fig. 12(b). When we do not use all frequencies to reconstruct the block wave, oscillations will remain, as illustrated by Fig. 12(c). We observe that, although Fourier is suitable for periodic signals, it is not particularly efficient for sharp edges. It is obvious why not: the basis functions themselves, the sines/cosines, are smooth. This leads to a general observation: something sharp, sudden or narrow in the time domain will be something wide in the frequency domain.
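In practice one would compute such a PSD from the FFT rather than from the matrix X; a sketch (Python/NumPy, with an invented two-frequency signal in the spirit of Fig. 11(a)):

```python
import numpy as np

n = 99
x = np.arange(n)
rng = np.random.default_rng(0)

# An invented noisy periodic signal
y = (np.sin(2 * np.pi * 3 * x / n)
     + 0.5 * np.sin(2 * np.pi * 7 * x / n)
     + 0.3 * rng.standard_normal(n))

c = np.fft.fft(y)        # complex Fourier coefficients
half = c[: n // 2 + 1]   # first half only; the rest are the negative frequencies
psd = np.abs(half) ** 2  # power of frequency k: a_k^2 + b_k^2 (up to scaling)

print(np.argsort(psd[1:])[-2:] + 1)  # the two dominant frequencies: k = 7 and k = 3
```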
7
Fourier transform: Numerical examples
To get a feeling for Fourier transformation, in this section we will go through a small numerical example. Suppose we want to transform signals that are 9 points long. We need to set up a 9-by-9 transformation matrix. The matrix is constructed following Eq. (15):

$$X = \begin{pmatrix} 1 & \cos(1 \cdot 2\pi \cdot 0/9) & \cdots & \cos(4 \cdot 2\pi \cdot 0/9) & \sin(1 \cdot 2\pi \cdot 0/9) & \cdots & \sin(4 \cdot 2\pi \cdot 0/9) \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & \cos(1 \cdot 2\pi \cdot 8/9) & \cdots & \cos(4 \cdot 2\pi \cdot 8/9) & \sin(1 \cdot 2\pi \cdot 8/9) & \cdots & \sin(4 \cdot 2\pi \cdot 8/9) \end{pmatrix}$$
This evaluates to:

1   1      1      1      1      0      0      0      0
1   0.77   0.17  -0.50  -0.94   0.64   0.98   0.87   0.34
1   0.17  -0.94  -0.50   0.77   0.98   0.34  -0.87  -0.64
1  -0.50  -0.50   1     -0.50   0.87  -0.87   0      0.87
1  -0.94   0.77  -0.50   0.17   0.34  -0.64   0.87  -0.98
1  -0.94   0.77  -0.50   0.17  -0.34   0.64  -0.87   0.98
1  -0.50  -0.50   1     -0.50  -0.87   0.87   0     -0.87
1   0.17  -0.94  -0.50   0.77  -0.98  -0.34   0.87   0.64
1   0.77   0.17  -0.50  -0.94  -0.64  -0.98  -0.87  -0.34
Note: the values have been rounded to two decimals merely to keep the above table small. For actual calculations, more decimals may be needed.

The columns of this matrix, i.e. the Fourier basis functions for signals that are 9 points long, are plotted in Fig. 13. The functions are more ragged than those of Fig. 9.

Now let us take a very simple signal: all zeros except for a one in the middle, i.e. on position 5. This may seem a bit artificial, but it is a very important signal in systems analysis. It is the impulse we can use to perturb a system in order to obtain its impulse response. The calculation of the Fourier coefficients following Eq. (11) is given below:

$$X^T \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ -0.94 \\ 0.77 \\ -0.50 \\ 0.17 \\ 0.34 \\ -0.64 \\ 0.87 \\ -0.98 \end{pmatrix}$$

For this particular signal, the calculations are strongly simplified because of all the zeros. In fact the signal can be said to select just the 5th column of X^T, i.e. the 5th row of X.
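The whole example can be reproduced in a few lines (Python/NumPy; a sketch following Eqs (11), (14) and (15), not code from the original text):

```python
import numpy as np

n = 9
x = np.arange(n)
X = np.column_stack(
    [np.ones(n)]
    + [np.cos(k * 2 * np.pi * x / n) for k in range(1, 5)]   # a_1 ... a_4
    + [np.sin(k * 2 * np.pi * x / n) for k in range(1, 5)]   # b_1 ... b_4
)

y = np.zeros(n)
y[4] = 1.0                 # the impulse on position 5

b = X.T @ y                # Eq. (11): picks out the 5th row of X
power = np.r_[b[0]**2, b[1:5]**2 + b[5:9]**2]
print(np.round(b, 2))      # 1, -0.94, 0.77, -0.5, 0.17, 0.34, -0.64, 0.87, -0.98
print(np.round(power, 2))  # all ones: the impulse contains every frequency
```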
Fig. 13 The columns of the matrix X in the numerical example.
The result is not directly interpretable. We prefer to calculate the power spectrum:

k    a_k     b_k     Power
0     1       -       1
1    -0.94    0.34    1
2     0.77   -0.64    1
3    -0.50    0.87    1
4     0.17   -0.98    1
We see that all frequencies are equally important! This is exactly the reason why the impulse is so popular: it contains all frequencies. When we use it to perturb a system, we excite every frequency. On the other hand, when efficiency of representation is the issue, the sines and cosines of Fourier are clearly not ideal for this completely localised phenomenon in our signal. Now we will Fourier transform a Gaussian that is 9 points long. The transformation matrix remains the same, and the calculation goes as follows:
21
1
1
1
0.77
1
0.17
1
-0.5
1 0 0 0 0
-0.94 0.64 0.98 0.87 0.34
1
1
0.17 -0.5 -0.94 -0.5
1
1
-0.94 0.77 -0.5 1 -0.5 0.77 -0.5 0.17 0.34 0.98 0.87 0.34 -0.87 -0.64 -0.87 0 0.87 -0.64 0.87 -0.98
-0.94 0.77 -0.5 0.17 -0.34 0.64 -0.87 0.98
2.51 1 1 0.0003 -1.85 0.17 0.77 0.0111 -0.5 0.72 -0.94 0.17 0.1353 -0.5 -0.14 -0.5 -0.5 0.6065 1 0 . 7 7 - 0 . 9 4 x 1.0000 = 0.00 -0.5 0.67 0.6065 -0.87 -0.98 -0.64 -0.61 0.1353 0.87 -0.34 -0.98 0.24 0.0111 0 0.87 -0.87 -0.10 0.0003 -0.87 0.64 -0.34 1
The power spectrum is obtained as:

k    a_k     b_k     Power
0     2.51    -       6.28
1    -1.85    0.67    3.86
2     0.72   -0.61    0.89
3    -0.14    0.24    0.08
4     0.00   -0.10    0.00
The spectrum shows us that the Gaussian contains primarily low frequencies. This can be expected, as the shape of a Gaussian is rather smooth and has no very sharp features that would require high frequencies. For continuous, infinitely long signals, it can be derived using Fourier integrals that the transform of a Gaussian is itself a Gaussian, whose width is inversely proportional to that of the original. In other words, the wider the Gaussian in our signal, the narrower its counterpart in the Fourier domain. The spectrum we just calculated contains only half a Gaussian, because our discrete Fourier transformation finds only positive frequencies. If we want to relate our discrete spectrum to the continuous one, all we have to know is that a spectrum is symmetrical in zero frequency. The next thing we will try is to transform back. As we used β = X^T y to transform forwards whilst X was not orthonormal, we have to use y = X(X^T X)^{-1} β to transform backwards. In other words, we have to multiply our coefficients with (X^T X)^{-1}. That is a diagonal matrix, and this multiplication comes down to dividing each coefficient by the sum of squares of the corresponding basis function, which is 9 for the first and 4.5 for the others. The corrected coefficients are:
0.2785  -0.4102  0.1610  -0.0311  0.0022  0.1493  -0.1351  0.0539  -0.0124

When we do not touch these coefficients, going back will simply reproduce the initial Gaussian. Here we will try what happens when we drop the highest frequencies, i.e. we set the coefficients for k = 3 and k = 4 to zero:

$$X \begin{pmatrix} 0.2785 \\ -0.4102 \\ 0.1610 \\ 0 \\ 0 \\ 0.1493 \\ -0.1351 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.0293 \\ -0.0448 \\ 0.1568 \\ 0.6494 \\ 0.9252 \\ 0.6494 \\ 0.1568 \\ -0.0448 \\ 0.0293 \end{pmatrix}$$
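The Gaussian example and its low-pass reconstruction can be verified the same way (a sketch; the 9-point Gaussian exp(−(x − 4)²/2) reproduces the signal values used above):

```python
import numpy as np

n = 9
x = np.arange(n)
X = np.column_stack(
    [np.ones(n)]
    + [np.cos(k * 2 * np.pi * x / n) for k in range(1, 5)]
    + [np.sin(k * 2 * np.pi * x / n) for k in range(1, 5)]
)

y = np.exp(-((x - 4.0) ** 2) / 2.0)  # the 9-point Gaussian
b = X.T @ y                          # forward transform, Eq. (11)
b = b / np.diag(X.T @ X)             # divide by 9 and 4.5: the corrected coefficients

b[[3, 4, 7, 8]] = 0.0                # drop the highest frequencies (k = 3 and 4)
y_rec = X @ b                        # back transform
print(np.round(y_rec, 4))            # 0.0293, -0.0448, 0.1568, 0.6494, 0.9252, ...
```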
To show the consequences, the original Gaussian and the reconstruction made by back transformation after cutting the high frequencies are plotted together in Fig. 14. We still recognise the initial peak in the reconstruction, but the reconstruction is a bit wider and oscillations appear at the foot. The widening is a general result of suppressing high frequencies; the oscillations are due to the sharpness of the cut-off.
8
Fourier and signal processing
Suppose we have a signal consisting of some gaussian peaks and noise, like a chromatogram for instance. A plot of such a signal is given in Fig. 15(a), and its power spectrum in Fig. 15(b).
Fig. 14 A Gaussian and its low-pass reproduction.
Fig. 15 (a) Gaussians with some noise; (b) the PSD; (c) the smoothed version of (a).
The peaks are found at the low frequencies; the high frequencies are primarily noise. We can imagine filtering in the Fourier domain by dropping or attenuating the high frequencies and then transforming back. Let us simply set all frequencies above 40 to zero, as illustrated in Fig. 15(d). The result is given in Fig. 15(c). We have just applied a low-pass filter (LP). The opposite would be a high-pass filter (HP). Under other circumstances, it may be useful to select a band of frequencies somewhere in the frequency range, not necessarily at the high or low extreme. In that case we would be using a band-pass filter (BP). Which filter is appropriate depends on the type of signal and the type of noise.
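A sketch of such a hard low-pass filter (Python/NumPy; the test signal and the cut-off at 40 are illustrative, in the spirit of Fig. 15):

```python
import numpy as np

def lowpass_hard(y, cutoff):
    """Zero all frequencies above `cutoff` and transform back."""
    c = np.fft.fft(y)
    k = np.fft.fftfreq(len(y), d=1.0 / len(y))  # integer frequency index per bin
    c[np.abs(k) > cutoff] = 0.0                 # drop the negative twins as well
    return np.real(np.fft.ifft(c))

rng = np.random.default_rng(1)
x = np.arange(500)
peaks = (np.exp(-((x - 150.0) ** 2) / 50.0)
         + np.exp(-((x - 320.0) ** 2) / 200.0))
noisy = peaks + 0.1 * rng.standard_normal(x.size)

smooth = lowpass_hard(noisy, cutoff=40)  # cf. Fig. 15(c)
```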
A filter can be implemented in the time domain as well. It would be the convolution of the signal with the back transform of the weight function we apply to the frequencies. Vice versa, a filter designed in the time domain can be implemented in the Fourier domain as a multiplication with the Fourier transform of the impulse response of the filter. The hard cut-off we applied in Fig. 15 amounts to a weight function with the shape of a block: ones up to the cut-off frequency, and zeros above. The back transform of a block is a sinc function, i.e. the function sin(x)/x. The wider the block, the narrower the sinc. The equivalent operation in the time domain would thus be a convolution of the signal with this sinc, as illustrated by Fig. 16. The consequences of a filter shape can be visualised most easily by picturing what happens if there is a spike in the signal. The output of the filter then contains a copy of the filter shape at the position of the spike. In other words, we get to see the impulse response of the filter. For a filter shape that is wide and oscillating, phenomena that are purely local in the time domain get spread out and deformed. If we want to have a more reasonable filter shape in the time domain, we have to use a smoother cut-off in the frequency domain. It is the sharpness of the cut that introduces the oscillations, as it disturbs the delicate balance of frequencies required to localise something in time.
Fig. 16 Convolution of the signal from Fig. 15(a) with the sinc function corresponding to the cut-off applied in Fig. 15(d).
25 time. As an example: in order to improve the smoothness of the cut-off we could make the weight drop from 1 to 0 like a sigmoidal rather than in a single blow. The more we soften the drop, the more localised the filter shape will be, as illustrated by Fig. 17. Starting in the time domain we arrive at more or less the same conclusions. A popular filter is the moving average. It calculates the average over a small window on the data, a window that is slid over the data. When the xi are the data points and Yi is the output of the moving average, the calculation for a window of 4 points would be:
$$y_1 = (x_1 + x_2 + x_3 + x_4)/4$$
$$y_2 = (x_2 + x_3 + x_4 + x_5)/4$$
$$y_3 = (x_3 + x_4 + x_5 + x_6)/4$$
The functioning of the moving average over n points can be seen as a convolution of the signal with a block (0 ... 0 1/n ... 1/n 0 ... 0). Therefore, its effect in the frequency domain is a multiplication with, again, the shape of a sinc or, if we evaluate it using a power spectrum, the square of a sinc. Fig. 18 gives the power spectrum of a block 11 points wide applied to a signal 100 points long. The higher frequencies are strongly attenuated, which describes the smoothing effect of the filter, but there are oscillations that ensure that ranges of higher frequencies do get through. These oscillations are due to the sharpness of the filter shape. When we are looking for a compromise, something that is smooth and monotonic in both domains and has a reasonably sharp cut-off, a Gaussian would be a good choice.

Fig. 17 Sigmoidal cut-offs with different slopes and the corresponding filters.

Fig. 18 Power spectrum of an 11-point block in a signal 100 points long.

When the moving average is calculated over fewer points, the power spectrum gets wider, and the oscillations move to frequencies above those in the signal. Fig. 19 shows the power spectrum of a block that is only two points wide. In the case of a moving average over two points, it is easy to understand what happens to the variation that is averaged away. This variation is found by taking the first difference of the data, i.e. taking the difference of two neighbouring points where the moving average takes the sum. When x_i are the data points, y_i is the output of the moving average and z_i the output of the first difference operator, the calculations are:

$$y_1 = (x_1 + x_2)/2 \qquad z_1 = (x_2 - x_1)/2$$
$$y_2 = (x_2 + x_3)/2 \qquad z_2 = (x_3 - x_2)/2$$
$$y_3 = (x_3 + x_4)/2 \qquad z_3 = (x_4 - x_3)/2$$

Fig. 19 Power spectrum of a block of 2 points in a signal 100 points long, of the first difference operator, and the sum of the two spectra.
In fact, when we calculate both y and z, we could drop every second y_i and z_i and still be able to reconstruct the original x_i. The first difference operator can be seen as a convolution of the signal with the sequence (0 ... 0 −1/2 +1/2 0 ... 0). It is a coarse approximation of the first derivative of the noise-free signal. Using Fourier integrals, it can be shown that the transform of the first derivative of a function is equal to the transform of the function multiplied by iω. In other words, taking the first derivative amplifies high frequencies, the higher the more they are amplified, and annihilates zero frequency. The power spectrum of the first difference operator shows such a high-pass effect; see Fig. 19, where it has been plotted together with the spectrum of the moving average over 2. Note that the power spectrum would remain the same if the sign of the impulse response were changed. That would change the sign of the output, but it would not change the filtering effect. When combining the power spectra of the first difference operator and the moving average over two, we see that the two spectra have a constant sum. This corresponds with what we already knew: the information passed by the two operators is complementary. The moving average over 2 and the first difference operator form a special pair of an LP filter and an HP filter that divide the frequency domain in two, right in the middle of the domain. This type of filter pair plays an important role in the discrete wavelet transform. Although this kind of pair has a special property, that does not mean it is hard to come by. There are an infinite number of such pairs. Consider for instance the sharp cut-off we applied in Fig. 15. We could place the cut-off in the middle of the frequency domain. Letting pass everything below the cut-off creates a low-pass filter; letting pass everything above it creates a complementary high-pass filter. This very neat cut-up of the frequency domain has a less appealing counterpart in the time domain, as the shape of the low-pass filter is a sinc much like the one in Fig. 16 (only narrower) and the shape of the high-pass filter is something similar. Sigmoidals as in Fig. 17 offer an alternative. Where the low-pass filter drops following a sigmoidal, a high-pass filter could be made to rise following the mirror image of that sigmoidal. We would again have a pair of complementary LP and HP filters, as illustrated in Fig. 20.

Fig. 20 A pair of high-pass and low-pass filters with sigmoidal cut-off: representation in the frequency domain (left) and time domain (right).
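A minimal sketch of this complementary pair (Python/NumPy; the length of 100 points follows Fig. 19): the power spectra of the moving average over 2 and of the first difference operator sum to a constant.

```python
import numpy as np

n = 100
lp = np.zeros(n)
lp[:2] = [0.5, 0.5]   # moving average over 2
hp = np.zeros(n)
hp[:2] = [-0.5, 0.5]  # first difference operator

psd_lp = np.abs(np.fft.fft(lp)) ** 2
psd_hp = np.abs(np.fft.fft(hp)) ** 2

# The two spectra are mirror images and sum to a constant at every frequency:
print(np.allclose(psd_lp + psd_hp, 1.0))  # True
```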
9
Apodisation
Suppose we have a signal like the one in Fig. 21(a), a signal in which we recognise a strong trend that makes it end much higher or lower than it started. The Fourier transform of such a signal risks being dominated by the trend. The reason is that the Fourier basis does not have any simple shapes at its disposal to describe such a trend efficiently. All it has is zero frequency, which is used to move the sines/cosines to the right level. For the sines and cosines, bridging the distance between the mean and the first value of the signal is just as hard as bridging a sudden jump elsewhere in the signal, and high frequencies are required to describe a sudden jump. Fig. 21(b) shows the power spectrum of the signal in Fig. 21(a). That signal consists of a sine and a trend; the power spectrum shows several frequencies in addition to the one sine at f = 3.

Fig. 21 (a) A sine on a trend; (b) its PSD.

Another way of looking at it starts from the periodic nature of the basis functions. As far as the sines and cosines are concerned, the signal could be just one period of a cyclic phenomenon. When we plot the concatenated signal (Fig. 22(a)), we see that it is dominated by a triangular oscillation, a saw-tooth as it is called. The Fourier transform of the signal will be equally dominated by the transform of that saw-tooth, which, due to the sharp edges in the saw-tooth, contains a lot of high frequencies. To illustrate this, the saw-tooth and its power spectrum have been plotted in Fig. 22(b) and (c). A pragmatic chemist may want to solve the problem by detrending the signal, a common practice in, e.g., data pre-treatment of NIR spectra, but in signal processing the problem is dealt with by multiplying the signal with a weighting function that squeezes the ends to nearly zero: apodisation, as it is called. A multiplication in the time domain corresponds to a convolution in the frequency domain. One might say that this time it is the spectrum itself that is filtered rather than the signal. So apodisation will move the problem from a surplus of high frequencies to a global deformation of the spectrum of the signal. The shape of the apodisation function is chosen so as to minimise this deformation. One candidate is the Gaussian. The very wide Gaussian required to apodise corresponds with a very narrow Gaussian in the frequency domain, and will have a limited filtering effect on the spectrum. Fig. 23(a) shows the signal of Fig. 21(a) after apodisation, and Fig. 23(b) gives the power spectrum. We observe that the power spectrum has been cleaned up, but that the one sine in the signal is broadened.
Fig. 22 (a) The signal of Fig. 21(a) repeated several times; (b) the corresponding saw-tooth; (c) its PSD.
Fig. 23 (a) The signal of Fig. 21(a) after Gaussian apodisation; (b) its power spectrum.
We have introduced apodisation as a weighting of the signal, but we can just as well view it as a weighting of the Fourier basis functions. The sines and cosines become squeezed down at the ends, as illustrated by Fig. 24. To the left it shows a sine base function, a Gaussian apodisation that is chosen narrow in order to amplify its effect, and the resulting apodised base function that has the shape of a ripple. Without apodisation, the basis functions of the Fourier set correspond to sharp pulses in the frequency domain. With apodisation, these pulses become convoluted with the transform of the apodisation function, e.g. a Gaussian. The convolution of a pulse with some shape moves this shape to the position of the pulse. As a consequence, the frequency domain is not cut up into disjoint frequencies, but into a series of overlapping Gaussians. Note that this is no more than a different view of the filtering effect of the transform of the apodisation function.
Fig. 24 (a) A sine base function; (b) a Gaussian apodisation function; (c) the apodised sine; (d) power spectrum of (a); (e) power spectrum of (b); (f) power spectrum of (c).
The effect of apodisation in the frequency domain is illustrated by Fig. 24. To the right we see the power spectra corresponding to the signals on the left: a pulse, i.e. a single sine, for the base function; a decaying signal (in fact half a Gaussian) for the Gaussian; and a Gaussian at the location of the single sine for the apodised base function. The apodised base functions no longer represent single frequencies and, in the case of the Gaussian apodisation, they are no longer orthogonal.
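To close, a sketch of Gaussian apodisation (Python/NumPy; the signal, the trend and the window width are invented, in the spirit of Figs 21 and 23):

```python
import numpy as np

n = 256
x = np.arange(n)
signal = np.sin(2 * np.pi * 3 * x / n) + 0.01 * x  # a sine on a trend, cf. Fig. 21(a)

# A wide Gaussian window squeezing both ends of the signal towards zero
window = np.exp(-0.5 * ((x - (n - 1) / 2) / (n / 6)) ** 2)
apodised = signal * window

psd_raw = np.abs(np.fft.fft(signal)[: n // 2]) ** 2
psd_apo = np.abs(np.fft.fft(apodised)[: n // 2]) ** 2
# psd_raw spreads the trend's saw-tooth over many frequencies (cf. Fig. 22(c));
# psd_apo is cleaner, but the peak at k = 3 is broadened by the convolution
# with the window's own (Gaussian) transform, cf. Fig. 23(b).
```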