A simple noise-reduction method for real data

Physics Letters A 160 (1991) 411—418 North-Holland

PHYSICS LETTERS A

Thomas Schreiber ¹ and Peter Grassberger
Physics Department, University of Wuppertal, W-5600 Wuppertal 1, Gauss-Strasse 20, Germany

Received 17 April 1991; revised manuscript received 1 October 1991; accepted for publication 2 October 1991
Communicated by A.P. Fordy

A method to reduce noise in experimental data with nonlinear time evolution is presented. A locally linear model is used to obtain a trajectory consistent with the dynamics as well as with the measured data. In contrast to previous methods, where the fit to the dynamics and the cleaning are done in separate steps, it is an (iterated) one-step procedure. This is made possible by using data both from the past and from the future in the locally linear model. The method is applied to both artificial and real data. Among other things, it leads to a significant improvement of correlation dimension estimates.

1. Introduction

Encouraged by the increasing theoretical understanding of complex dynamical systems, efforts have been made in the last decades to detect low-dimensional chaotic behaviour in laboratory as well as in field experiments. Nevertheless, the usefulness of the methods of dynamical systems analysis can be strongly restricted by the limitations of the experimental situation. Most tools developed for the analysis of dynamical systems, like Lyapunov exponents and dimensions, are based on assumptions which cannot be rigorously fulfilled by experiments, both due to the finite number of data points and due to noise. Quite independently from this, it would be of great advantage if one could process data such that noise is reduced without distorting the signal. One standard method are linear filters. Although there are cases where a linear filter can improve the quality of the data, there is the danger that it also damps out relevant parts of the signal. Eventually, linear filters can even enhance its complexity [1] (see also fig. 2). Recently [2–4] nonlinear methods of noise reduction have been developed which avoid these problems but are not easy to implement.

The aim of this paper is to present a novel nonlinear method of noise reduction which is much easier to apply, and yet seems to work at least as well as those of refs. [2–4] when applied to real-world data. On artificial cases it works even better.

¹ Current address: Niels Bohr Institute, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark.

2. Linear modelling

This is not the place to review the highly developed theory of linear filters. We only want to single out certain facts, since the method to be introduced later will be a generalization of these. In the simplest form of a low-pass filter, a corrected data point is obtained by taking a linear average of its predecessors and successors:

    x̂_n = Σ_{i=−k}^{k} a_i x_{n+i} .   (1)

The coefficients a_i have fixed values chosen to suppress the highest frequencies in the spectrum. The larger the value of k, the more frequencies are discarded. Applying this method, one has to be sure that the highest frequency of the signal is considerably lower than this cutoff. A low-pass filter can only be used as a noise reduction device if the dominant frequencies of the noise and of the signal are well separated, such as for highly oversampled data with delta-correlated noise.
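As a minimal illustration of eq. (1), the following sketch applies a symmetric linear filter to a series; the function name and the decision to leave the k boundary points unchanged are ours, not the paper's. The binomial weights (1, 2, 1)/4 are the filter quoted for fig. 2(d).

```python
import numpy as np

def lowpass(x, a):
    """Symmetric linear filter of eq. (1): x_hat[n] = sum_i a[i+k] * x[n+i].

    `a` holds the 2k+1 coefficients for i = -k..k; the first and last k
    points are left unchanged for simplicity.
    """
    k = (len(a) - 1) // 2
    y = np.asarray(x, dtype=float).copy()
    for n in range(k, len(x) - k):
        y[n] = sum(a[i + k] * x[n + i] for i in range(-k, k + 1))
    return y

# Binomial weights (1, 2, 1)/4, the moving average used for fig. 2d.
print(lowpass([0.0, 1.0, 0.0, 1.0, 0.0], [0.25, 0.5, 0.25]))
```

Note how an alternating signal at the sampling frequency is flattened entirely; this is exactly the situation described below for the Hénon map, where the sampling frequency is itself a characteristic frequency.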

0375-9601/91/$ 03.50 © 1991 Elsevier Science Publishers B.V. All rights reserved.


On discontinuous time series such as iterates of the Hénon map it does not work at all, because the sampling frequency is at the same time a characteristic frequency of the system. An improvement consists in individually fitting the coefficients in eq. (1), and adding a constant term. This is similar to a linear (autoregressive) predictor, with the difference that the model does not only involve values from the past but also from the future. So the most general first order model of the time evolution is

    x̂_n = Σ_{i=−k}^{k} a_i x_{n+i} + b .   (2)

Now the optimal values of the coefficients are obtained by a least squares fit where the quantity

    σ² = Σ_n (x̂_n − x_n)²   (3)

is minimized. Note that if one tries to fit the optimal value for a_0 from the least squares equations, one would get the trivial solution a_i = δ_{i0}, b = 0. Thus, a_0 has to be fixed. Choosing a_0 = 0 would impose the most radical correction. If the signal is periodic or quasiperiodic, this will eliminate additive noise completely when repeated infinitely often (and with infinitely many points), without reducing or distorting the signal. The proper value of k depends on the number of frequencies involved. For complex time series, the linear equation (2) cannot be expected to hold, but we shall essentially assume it to hold locally in phase space.

3. Nonlinear methods

Let us in the following assume that the noise-free signal satisfies a nonlinear evolution equation

    y_{n+1} = f(y_n, y_{n−1}, …, y_{n−m+1})   (4)

with a chaotic attractor. Here and in the following, we shall always denote the noise-free trajectory by y_n, the noisy one by x_n, and the cleaned one by x̂_n. In all previous nonlinear noise reduction schemes [2–4], the same basic strategy has been used. First, from the data an approximation to the evolution law (4) is obtained, either globally or locally. In a second step, the trajectory x̂_n is constructed such that it both satisfies eq. (4) and is close to the measured trajectory x_n. Notice that the naive predictor x̂_n = f(x_{n−1}, …, x_{n−m}) would not give a good noise reduction scheme: though noise would be reduced in the stable directions (those associated with negative Lyapunov exponents), noise in the unstable directions would be even enhanced.

In practice, in refs. [2,3] locally linear fits to the dynamics were used,

    x̂_{n+1} = a_n · x_n + b_n ,   (5)

where x_n = (x_n, …, x_{n−m+1}) is an m-dimensional state vector constructed from delay coordinates. In the second step, Farmer and Sidorowich [3] compute the trajectory which minimizes the L₂ distance

    Σ_n (x̂_n − x_n)²   (6)

under the constraint that x̂_n satisfies the evolution equation (4) exactly. For N data points this leads to a set of (2N−1)×(2N−1) equations which gets increasingly ill-conditioned for large N. It can only be solved by special methods like manifold decomposition [4] and singular value decomposition. While the requirement that the corrected trajectory has to follow exactly the deterministic dynamics seems to be quite natural, and leads to precise results when the dynamics is known, it is less obvious if the evolution is only obtained locally and from noisy data.

In the method of Kostelich and Yorke [2], determinism is not forced as a constraint, but the deviation from it is minimized together with the deviation from the measured orbit. The relative weight of both is chosen somewhat ad hoc. For a trajectory segment of length N the same number of equations has to be solved for the corrected coordinates. To avoid ill-conditioning, the minimization is only done on short overlapping windows in time. As this will not eliminate noise completely, both steps are repeated several times (with x_n replaced each time by the last x̂_n) until the result seems optimal.

This second step is necessary essentially to make a coordinate which has evolved deterministically from the past consistent with the future development. The knowledge of the past can fix the present point only along the stable direction. Looking back from the future, it can be localized also in the unstable direction (see also the related recent work on shadowing, e.g. ref. [5]). The main idea of the method proposed in the present note is to integrate the information about the future explicitly into the linearized deterministic equation. As it turns out, this can be done in a straightforward way. Let the data be embedded in a (k+l+1)-dimensional phase space with delay and "advance" coordinates,

    x_n = (x_{n−l}, x_{n−l+1}, …, x_n, …, x_{n+k−1}, x_{n+k}) .   (7)

A linear model similar to eq. (2) is made in a (k+l+1)-dimensional neighborhood of each point of the data set. The coefficients then become time-dependent (or rather dependent on the position in phase space):

    x̂_n = Σ_{i=−l, i≠0}^{k} a_i^{(n)} x_{n+i} + b^{(n)} .   (8)

As in the linear case, the optimal values of the coefficients are obtained by least squares fits, but now the approximation needs only to hold in a neighborhood U(x_n) of the point to be corrected. So the extremal condition is

    (σ^{(n)})² = Σ_{x_m ∈ U(x_n)} ( Σ_{i≠0} a_i^{(n)} x_{m+i} + b^{(n)} − x_m )² = min .   (9)

We choose neighborhoods with fixed numbers of elements.
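The scheme of eqs. (7)–(9) can be sketched in a few lines. This is our own minimal rendering, not the authors' code: the function name, the fixed neighborhood size, the sup-norm neighbor search by brute force, and the choice a_0 = 0 (full replacement) are all assumptions for illustration.

```python
import numpy as np

def clean_once(x, k=2, n_neigh=20):
    """One iteration of the locally linear scheme (eqs. (7)-(9)), sketch.

    Embeds each point with k past and k future values (l = k), finds its
    nearest neighbours in that (2k+1)-dimensional space (sup norm), fits
    x_m ~ sum_{i != 0} a_i x_{m+i} + b by least squares over the
    neighbourhood, and replaces x_n by the fitted value (a_0 = 0).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    idx = np.arange(k, N - k)
    emb = np.array([x[n - k: n + k + 1] for n in idx])    # eq. (7)
    y = x.copy()
    for j, n in enumerate(idx):
        d = np.max(np.abs(emb - emb[j]), axis=1)          # sup-norm distances
        neigh = np.argsort(d)[:n_neigh]                   # fixed-size neighborhood
        A = np.delete(emb[neigh], k, axis=1)              # drop centre column (i = 0)
        A = np.hstack([A, np.ones((len(neigh), 1))])      # constant term b
        target = emb[neigh, k]
        coef, *_ = np.linalg.lstsq(A, target, rcond=None) # eq. (9)
        row = np.append(np.delete(emb[j], k), 1.0)
        y[n] = row @ coef                                 # corrected point, eq. (8)
    return y
```

Repeating the call, each time feeding back the cleaned series, mimics the iteration described below; the brute-force distance computation is what the box-assisted search of ref. [6] is meant to replace.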

Each point x_n is now simply replaced by the corrected one x̂_n obtained using the coefficients fitted in its neighborhood. As in the global linear case, a_0 cannot be obtained from the fit. As in that case, and as in the method of ref. [2], the procedure can be repeated several times. So a_0 can be chosen arbitrarily close to one, provided the procedure is repeated often enough. The proof that this method really damps out the noise in a simple strictly hyperbolic system is given in an appendix.

In the following, we choose always l = k. Apart from a_0 there remain essentially two parameters which can be varied in our procedure. The first is the order k of the local linear model (eq. (8)), the second is the size of the neighborhoods. Larger neighborhoods give more stable fits, but if they are chosen too large the locality required for the linear approximation is violated. The value of k is limited from above mainly by the fact that neighbors have to be searched in 2k+1 dimensions. For large k, points can be very sparse for limited data sets. So the optimal values can vary largely with the nature of the data, the number of given points and the kind of noise imposed. The easiest way to optimize this choice is to find the set of parameters for which the average quality of the fits given by σ in eq. (9) is the best. In the cases where the exact dynamics is known this turns out to coincide with the minimum of the mean dynamical error defined below.

Most of the CPU-time required for our method is consumed for finding a given number of nearest neighbors (typically 5–40) in high dimensions (typically 5–11). Therefore it is essential to use a fast algorithm for this task. The box-assisted algorithm described in ref. [6] efficiently finds all the neighbors within a given distance. It can be extended to our case by the following procedure:
— Start with a very fine partition of box length ε.
— Find all neighbors within a distance less than ε.
— Mark all points which have not enough neighbors yet.
— Coarsen the partition by a linear factor of two, ε → 2ε.
— For the remaining points: find all neighbors within a distance less than ε.
— Repeat the last three steps until enough neighbors have been found for all points.
Since the partition is made only with regard to two coordinates, it is optimal to take x_{n−k} and x_{n+k}, which are expected to be most independent.

Adopting this procedure, one iteration of our noise reduction scheme takes 5–15 minutes of CPU-time on a DECstation 2100 for 65536 data points, depending on the order k and the nature of the data.
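One level of the box-assisted search above can be sketched as follows. This is a simplified reading of the procedure, not the implementation of ref. [6]: the function name is ours, only a single partition level is shown, and points with too few neighbors are merely collected for a retry at the next, coarser, level.

```python
from collections import defaultdict

def box_neighbors(p, q, eps, n_min):
    """One level of a box-assisted neighbor search (sketch after ref. [6]).

    Grids two coordinates (p, q) into boxes of side eps; candidates for
    each point lie in its own box and the 8 adjacent ones.  Points with
    fewer than n_min neighbours are returned for a retry with eps doubled.
    """
    boxes = defaultdict(list)
    for i, (a, b) in enumerate(zip(p, q)):
        boxes[(int(a // eps), int(b // eps))].append(i)
    neigh, retry = {}, []
    for i, (a, b) in enumerate(zip(p, q)):
        ia, ib = int(a // eps), int(b // eps)
        cand = [j for da in (-1, 0, 1) for db in (-1, 0, 1)
                for j in boxes[(ia + da, ib + db)] if j != i]
        # keep only candidates truly within eps in both gridded coordinates
        neigh[i] = [j for j in cand
                    if abs(p[j] - a) < eps and abs(q[j] - b) < eps]
        if len(neigh[i]) < n_min:
            retry.append(i)      # handle at the next, coarser, level
    return neigh, retry
```

In the paper's setting the two gridded coordinates would be x_{n−k} and x_{n+k}, with the remaining 2k−1 embedding coordinates checked only for the candidates each box search returns.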

4. Applications

So far we applied our method to three systems, two artificial ones and one experimental. While the amount of noise reduction cannot be easily measured for experimental data, in the artificial examples not only the original (noiseless) trajectory but also the correct dynamics is known. There we can define the mean dynamical error [2] by

    E(x) = ( (1/N) Σ_n [x_n − f(x_{n−1}, …, x_{n−m})]² )^{1/2} ,   (10)

written E(x̂), or simply Ê, for the cleaned trajectory.

The noise reduction is defined by

    r_dyn = E(x)/E(x̂) .

Although this error measure is important, it should not be forgotten that it does not guarantee that the corrected trajectory bears any resemblance to the original trajectory y_n, or is even generic. Thus we also computed the deviation from the true trajectory,

    Ẽ(x) = ( (1/N) Σ_n (x_n − y_n)² )^{1/2} ,

and the corresponding value of r̃ = Ẽ(x)/Ẽ(x̂).

Table 1. Noise reduction on the Hénon map. Here, r is the ratio between the root mean square noise levels before and after the cleaning. For x the noise level is defined as the deviation from the correct dynamics (eq. (10)), while for x̂ it is defined as the deviation from the true trajectory. (Most tabulated entries are illegible in this scan; legible fragments give r = 5.66 for the present method, and 4.76 for ref. [2].)

The first example we considered is the Hénon attractor [7]. It should be easily handled, since the original dynamics is given by a two-dimensional map:

    x_{n+1} = 1 − a x_n² + b x_{n−1} .   (11)

But there are points where the stable and the unstable manifold are tangent to each other, so that a local coordinate system cannot be fixed by the intersection of both. This problem gets visible at the point where the linear approximation is fitted, because the fit equations get singular. In principle one can solve the equations by singular value decomposition, as is done e.g. in ref. [2]. But since there are usually very few problematic points, we simply did not replace the noisy point by the corrected one (i) when the sum of squares in (9) was more than a factor of four higher than on the average, or (ii) when the correction would be larger than a given threshold. The whole procedure is iterated typically 4–20 times, using mostly a_0 = 1. The number of uncorrected points decreases with successive iterations and totals to less than 0.1%. This procedure also postpones the correction of points in very distorted neighborhoods to later stages, when some neighbors have been already corrected.

Footnote: Notice that we use the exact map only in this error estimate, but not in the noise reduction scheme. Thus our results should not be compared to those of refs. [3,4], where the exact f was used also in reducing noise.
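For concreteness, a short sketch of the test setup around eq. (11): generate a Hénon series in delay form, add 1% Gaussian noise, and evaluate the dynamical error of eq. (10). Function names and the transient length are our choices; as in the paper, the exact map enters only the error diagnostic, not any cleaning.

```python
import numpy as np

def henon_series(N, a=1.4, b=0.3, x0=0.1, x1=0.2):
    """Iterate the Hénon map in the delay form of eq. (11),
    x_{n+1} = 1 - a*x_n**2 + b*x_{n-1}, discarding a transient."""
    x = [x0, x1]
    for _ in range(N + 100):
        x.append(1.0 - a * x[-1] ** 2 + b * x[-2])
    return np.array(x[-N:])

def dyn_error(x, a=1.4, b=0.3):
    """Mean dynamical error of eq. (10): rms deviation of the series
    from the exact map (a diagnostic only, as in the text)."""
    pred = 1.0 - a * x[1:-1] ** 2 + b * x[:-2]
    return np.sqrt(np.mean((x[2:] - pred) ** 2))

y = henon_series(5000)
rng = np.random.default_rng(1)
x = y + 0.01 * y.std() * rng.standard_normal(len(y))  # 1% additive noise
print(dyn_error(y), dyn_error(x))
```

The ratio of the two printed errors before and after a cleaning pass is the quantity r_dyn quoted in the text.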

Fig. 1. Local slopes in a log₂ p versus log₂ ε plot for a discretized Mackey–Glass equation with τ = 30. (a) The result for a noisy data set of 32768 points. (b) Result obtained from the same series after noise reduction. Both (a) and (b) show embedding dimensions 2–20. The delay time is 9 units, corresponding to 150 steps in our discretization.

Fig. 2. Phase plots of iterates of the discretized Mackey–Glass equation, x(n+1) versus x(n); panels labelled "exact", "noisy", "our method", "moving average" and "linear model". (a) The noiseless sample, (b) the same with 1% Gaussian noise added, which is scarcely visible. Consequently the corrected sample in (c) looks the same. In (d) the data had passed through a low-pass filter where a corrected point is obtained from the noisy ones by x̂_n = (x_{n−1} + 2x_n + x_{n+1})/4. (e) The result of a global linear model (eq. (2)) of order k = 5.


Table 1 shows the amount of noise reduction we achieved with our model with 65536 data points distorted by 1% Gaussian additive noise. Kostelich and Yorke [2] give a value of r_dyn = 4.76 for the same amount of data. They do not give r̃. Unfortunately, in ref. [3] no quantitative result is given for the case that the dynamics has to be fitted from noisy data. Although the performance of our method increases with the number of data points available, reasonable noise reduction can be obtained already with very few points (e.g. r_dyn = 1.83 with 500 points and k = 1 on the Hénon map).

The second test was performed on the Mackey–Glass delay differential equation [8]

    ẋ(t) = a x(t−τ)/[1 + x(t−τ)¹⁰] − b x(t) .   (12)

If it is integrated with a discrete time step Δt = τ/M, it can be rewritten [9] as an (M+1)-dimensional map,

    x_{n+1} = (2 − bΔt)/(2 + bΔt) x_n + aΔt/(2 + bΔt) [ x_{n−M}/(1 + x_{n−M}¹⁰) + x_{n+1−M}/(1 + x_{n+1−M}¹⁰) ] .   (13)

To be close to the original differential equation, M should be large; in our case it was M = 500. We chose parameter values a = 0.2, b = 0.1 and τ = 30. On the time series obtained from eq. (13), noise reduction would be trivial, because as it stands it would be highly oversampled. To be more realistic, we embedded points with a large delay (150 steps, corresponding to 9 time units), which corresponds to a reasonable sampling rate in an experiment.

Since one of the motivations was to improve dimension estimates, we computed D₂ [10] for noisy and corrected data. When x_n are the embedding vectors and the correlation sum is defined as

    C(ε) = (1/N²) Σ_{n,m} Θ(ε − ‖x_n − x_m‖) ,   (14)

then the correlation dimension is given by

    D₂ = lim_{ε→0} lim_{N→∞} log C(ε)/log ε .   (15)

A considerable literature [11–13] has evolved how to choose "optimal" embeddings and norms for finite and noisy data. Indeed, while D₂ should not depend on them for perfect data, in practice there is usually quite a dependence, which of course casts doubt on the reliability of the entire method. Part of this problem should be avoided if one can get rid of the noise in the system. Indeed, fig. 1 shows a dramatic increase of the scaling region (if there is any in the noisy case) by using a processed time series. To be sure that we end up with the real dimension, we computed D₂ also from

Fig. 3. Local slopes in a log₂ p versus log₂ ε plot for data from a Taylor–Couette flow experiment [14,15]. As in fig. 1, the results for the noisy (a) and the corrected (b) series are displayed. Embedding dimensions 2–20 are shown; the delay is 3 time steps. No improvement could be obtained by choosing as delay the first minimum of the mutual information (7 time steps) or by taking the Euclidean instead of the supremum norm.
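The correlation sum of eq. (14) can be estimated directly from a scalar series. The sketch below is a naive O(N²) version with our own function name and a sup-norm distance; local slopes such as those plotted in figs. 1 and 3 follow from log C(ε) at nearby values of ε.

```python
import numpy as np

def corr_sum(series, m, delay, eps):
    """Correlation sum of eq. (14) for embedding dimension m:
    fraction of distinct pairs of delay vectors closer than eps (sup norm)."""
    n = len(series) - (m - 1) * delay
    emb = np.array([series[i: i + (m - 1) * delay + 1: delay]
                    for i in range(n)])
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    pairs = d[np.triu_indices(n, k=1)]       # each pair counted once
    return np.mean(pairs < eps)

# Local slope between two scales, an estimate of D2 in a scaling region.
rng = np.random.default_rng(3)
u = rng.random(1000)                          # white noise, D2 = m for small m
c1 = corr_sum(u, m=1, delay=1, eps=0.05)
c2 = corr_sum(u, m=1, delay=1, eps=0.1)
print(np.log(c2 / c1) / np.log(2.0))
```

For uncorrelated noise the local slope follows the embedding dimension instead of saturating, which is exactly the behaviour the noisy panels of figs. 1 and 3 show at small ε.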


a noise-free data set of 10⁶ points, which gave D₂ = 3.15 ± 0.05.

The last example we studied was a time sequence from a Taylor–Couette experiment [14,15]. The values of k and of the size of U were optimized to give the best linear fits. Results are shown in fig. 3. While the raw data did not give any indication of a low dimension, the cleaned data show a small but clearly visible scaling region.

In conclusion, we have demonstrated a simple and efficient noise reduction scheme based on ideas from nonlinear dynamics. In spite of the fact that optimal results are obtained iteratively, it is not too time-consuming and comparatively easy to implement.


Acknowledgement

We would like to thank the Institute for Scientific Interchange for the invitation to Turin, Italy in March 1991, where this work was finished. We are especially grateful to T. Buzug and coworkers for data from their Taylor–Couette flow experiment. One of the authors, T.S., receives a grant within the SCIENCE plan of the Commission of the European Communities.

Appendix. A simple case

Let us illustrate in an admittedly trivial case why in principle the method can work. We think of dynamical systems similar to the baker map, where the time evolution is linear except for the modulos performed to keep the values bounded. Then a noiseless time series satisfies

    y_n = b_0 y_{n−1} + b_1 y_{n+1} + c ,   (A.1)

where we dropped the modulos. The measured series is distorted by additive noise, x_n = y_n + η_n. From the tangent map one easily deduces the local Lyapunov numbers μ_{1,2}, which in this simple model do not vary with time. They fulfil the conditions

    b_0 = μ_1 μ_2/(μ_1 + μ_2) ,   b_1 = 1/(μ_1 + μ_2) .   (A.2)

We assume b_0 and b_1 to be known, e.g. from a fit with many data. Let us denote the successively improved time series by z_n^{(j)}. We start with z_n^{(0)} = x_n. If we accept the correction in each step only with a weight 1 − a_0, we have

    z_n^{(j+1)} = a_0 z_n^{(j)} + (1 − a_0)(b_0 z_{n−1}^{(j)} + b_1 z_{n+1}^{(j)} + c) .   (A.3)

Denoting the error remaining at step j by

    w_n^{(j)} = z_n^{(j)} − y_n ,   (A.4)

we find

    w_n^{(j+1)} = a_0 w_n^{(j)} + (1 − a_0)(b_0 w_{n−1}^{(j)} + b_1 w_{n+1}^{(j)}) .   (A.5)

This resembles a diffusive system. For μ_1 μ_2 > 0 (the orientation is preserved at each iteration of the map) the largest eigenvalue corresponds to the uniform state and is

    λ = a_0 + (1 − a_0)(b_0 + b_1) = 1 + a (1 − μ_1)(1 − μ_2)/(μ_1 + μ_2) ,   (A.6)

with a = 1 − a_0. It has magnitude less than one provided a is not too large and there is a stretching and a contracting direction. For μ_1 μ_2 < 0 (the orientation is changed) the relevant eigenstate is alternating, and the eigenvalue is

    λ = a_0 − (1 − a_0)(b_0 + b_1) = 1 − a (1 + μ_1)(1 + μ_2)/(μ_1 + μ_2) ,   (A.7)

which is again contracting under the same assumptions. So the error does not simply diffuse, like in a smoothing procedure, but it is damped out.
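The damping of eq. (A.5) is easy to verify numerically. In the sketch below the Lyapunov numbers μ₁ = 2, μ₂ = 1/2, the weight a₀ = 1/2, the lattice size and the periodic boundaries are all our assumptions for illustration; b₀ and b₁ follow from eq. (A.2).

```python
import numpy as np

# Error-damping iteration of eq. (A.5) for an assumed hyperbolic pair
# of Lyapunov numbers; b0, b1 are taken from eq. (A.2), and periodic
# boundaries (np.roll) keep the lattice finite.
mu1, mu2 = 2.0, 0.5
b0, b1 = mu1 * mu2 / (mu1 + mu2), 1.0 / (mu1 + mu2)
a0 = 0.5

rng = np.random.default_rng(2)
w = rng.standard_normal(200)             # initial error w^(0)
rms0 = np.sqrt(np.mean(w ** 2))
for _ in range(50):
    w = a0 * w + (1 - a0) * (b0 * np.roll(w, 1) + b1 * np.roll(w, -1))
print(np.sqrt(np.mean(w ** 2)) / rms0)   # remaining error fraction
```

For these values eq. (A.6) gives a slowest eigenvalue of 0.9, so after 50 iterations the rms error should have shrunk by roughly 0.9⁵⁰, in line with the printed ratio.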

References

[1] R. Badii, G. Broggi, B. Derighetti, M. Ravani, S. Ciliberto, A. Politi and M.A. Rubio, Phys. Rev. Lett. 60 (1988) 979.
[2] E.J. Kostelich and J.A. Yorke, Phys. Rev. A 38 (1988) 1649.
[3] J.D. Farmer and J. Sidorowich, Physica D 47 (1991) 373.
[4] S.M. Hammel, Phys. Lett. A 148 (1990) 421.
[5] S.M. Hammel, J.A. Yorke and C. Grebogi, Bull. Am. Math. Soc. 19 (1988) 465.
[6] P. Grassberger, Phys. Lett. A 148 (1990) 63.
[7] M. Hénon, Commun. Math. Phys. 50 (1976) 69.
[8] M.C. Mackey and L. Glass, Science 197 (1977) 287.
[9] P. Grassberger and I. Procaccia, Phys. Rev. A 28 (1983) 2591.
[10] P. Grassberger and I. Procaccia, Physica D 9 (1983) 189.
[11] A.M. Fraser and H.L. Swinney, Phys. Rev. A 33 (1986) 1134.
[12] W. Liebert and H.G. Schuster, Phys. Lett. A 142 (1989) 107.
[13] W. Liebert, K. Pawelzik and H.G. Schuster, Europhys. Lett. 14 (1991) 521.
[14] T. Buzug, T. Reimers and G. Pfister, Optimal reconstruction of strange attractors from purely geometrical arguments, preprint (1990).
[15] T. Buzug, T. Reimers and G. Pfister, Dimensions and Lyapunov spectra from measured time series of Taylor–Couette flow, in: Proc. NATO Workshop on Nonlinear evolution of spatio-temporal structures in dissipative continuous systems, to be published.