Physica D 56 (1992) 216-228 North-Holland
An improved estimator of dimension and some comments on providing confidence intervals

Kevin Judd
Department of Mathematical Engineering and Information Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan

Received 17 June 1991
Revised manuscript received 18 November 1991
Accepted 14 January 1992
Communicated by P.E. Rapp
An estimator of dimension is described that has many advantages over the conventional implementation of the Grassberger-Procaccia method. It can in some circumstances provide a confidence interval for the dimension estimate. For sets of high dimension (greater than about 3.5) the dimension estimate has a bias which depends on the structure of the set. If one has suitable knowledge of the structure of the set it is possible to make a bias adjustment. Some numerical calculations are presented which indicate the largest error in the dimension estimate that can be expected. The problem of providing confidence intervals for dimension estimates in general is not solved completely, but we do take a step away from a meaningless number.
1. Introduction

Estimating the dimension of an attractor of a dynamical system can provide useful, even vital, information [1-3]. The Grassberger-Procaccia method is a simple way of making such an estimate [4]. The simplicity of this method means that it is widely applied, but regrettably, also widely misapplied. The single most important failing of the Grassberger-Procaccia method of estimating dimension is that it provides no error or confidence interval for the estimate. To quote a dimension estimate of 3.5 is meaningless if the error is 5-7. It is, if possible, even more meaningless to quote a dimension estimate of 3.5 and give no error estimate at all. The aim of this paper is partly to rectify this situation by formulating a dimension estimator, based on the Grassberger-Procaccia idea of calculating interpoint distances from a trajectory, which in some circumstances can provide a confidence interval for the dimension estimate. The problem of providing confidence intervals is not completely solved here because for high dimension attractors (greater than about 3.5) the dimension estimate has a bias dependent on the structure of the attractor. If one has no suitable knowledge of the structure of the attractor, and hence cannot make a bias adjustment, the confidence intervals are useless. However, some numerical calculations are presented which indicate the largest error in the dimension estimate that can be expected, which might be disappointingly large. With this knowledge we do at least take a step away from a meaningless estimate and provide a dimension estimator which in the future may also provide reliable confidence intervals.

Earlier work that considers the statistical error of dimension estimates using the Grassberger-Procaccia method, such as refs. [5,6], provides a useful background to this paper. The primary interest in these works, however, is in the bias
and variation of estimates with the sample size. This paper is concerned with the more fundamental issue of improving the estimator of dimension.

In general terms a statistic, or estimator, estimates some property of an underlying distribution, given a particular sample from this distribution. To give an error or confidence interval for an estimate, it is necessary either to know or to assume the family from which this particular distribution comes. For example, the sample variance estimator can estimate the variance of a normal distribution coming from the family of normal distributions with the same mean. Only by knowing or assuming the family of distributions can one state confidence intervals. A sample variance, for example, has a χ²-distribution, and confidence intervals can be calculated by consulting a table of distributions. Hence, to provide confidence intervals for an estimate of a dimension, which is a property of the distribution of interpoint distances on the attractor, we must either know or assume the family of distributions for interpoint distances.

In this paper, to avoid technical details, we consider a simple model of an attractor and dynamics, and derive from this the distribution of interpoint distances on the attractor. Although the model is simple, it is indicative of the general situation. Given this distribution, which has the dimension of the attractor as a parameter, we can construct estimators of the dimension for which we can also provide a form of confidence interval.

In the first part of this paper we will indicate the four main failings of the conventional implementation of the Grassberger-Procaccia method of estimating dimension. We will then present the simple attractor model and derive with a minimum of technical detail the distribution of interpoint distances on the attractor. Then we will consider a dimension estimator for which a form of confidence interval can be given if one has sufficient knowledge of the structure of an attractor. Finally, some numerical results will be given that indicate the maximum error that can be
expected of the dimension estimator if one has no knowledge of the structure of the attractor.
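To make the statistical point above concrete: the familiar interval for a sample variance can be written down only because a sampling distribution (here χ²) is assumed. The following minimal sketch is added for illustration and is not part of the original paper; the function name and the 90% level are arbitrary choices.

```python
import numpy as np
from scipy import stats

def variance_confidence_interval(sample, conf=0.90):
    """Interval for the variance of a normal sample, using the fact that
    (n - 1) s^2 / sigma^2 has a chi-squared distribution with n - 1 degrees
    of freedom.  Without assuming that family, no interval could be quoted."""
    n = len(sample)
    s2 = np.var(sample, ddof=1)                      # unbiased sample variance
    alpha = 1.0 - conf
    lower = (n - 1) * s2 / stats.chi2.ppf(1.0 - alpha / 2, n - 1)
    upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1)
    return lower, upper

# e.g. variance_confidence_interval(np.random.default_rng(0).normal(size=100))
```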
2. The conventional Grassberger-Procaccia method

Briefly, the conventional implementation of the Grassberger-Procaccia method of estimating the dimension of an attractor from a sample trajectory {X_i : i = 1,...,n} is first to calculate the correlation function C_n(ε) by
C_n(ε) = [2/(n(n − 1))] Σ_{1 ≤ i < j ≤ n} I(|X_i − X_j| < ε),
where |·| is a suitable norm and I(E) is a function taking the value 1 if the condition E is satisfied and 0 otherwise. It is claimed that the correlation dimension d_c is given by
d_c = lim_{ε→0} lim_{n→∞} log C_n(ε) / log ε
for almost all sample trajectories. The usual procedure for estimating the dimension is to plot the correlation function, C_n(ε), on a log-log graph. The above formula says that in the limit the slope of this graph is an estimate of the dimension. Typically such a graph for a finite-length trajectory looks like fig. 1. This graph has three distinct features: for large ε the graph flattens, at moderate ε the graph is quite linear, and for small ε the graph jumps irregularly. The flattening of the graph for large ε happens because the attractor has a finite size. The linear part of the graph is often referred to as the scaling region [4]. The irregularity for small ε is a result of having only a finite amount of data, so that the contribution to the correlation function by a specific pair of points is seen as a jump in the graph.
Fig. 1. A correlation function plot. The Grassberger-Procaccia correlation function, C_n(ε), plotted as a log-log graph. The data is the embedding in three dimensions of 378 years of recordings of the number of sunspots. Although the graph is relatively smooth, the "scaling region" is curved rather than flat. A conventional dimension estimate will depend on the choice of the scaling region.
Conventional implementations of the Grassberger-Procaccia method assume that most of the information about the dimension is contained in the scaling region and proceed by fitting a straight line to the scaling region. There are four main problems in doing this.

(1) The first problem, which is inherent in all methods based on calculations of interpoint distances, is that from a sample trajectory of length n there are n(n − 1)/2 interpoint distances, but these distances are not all independent. The triangle inequality states that the distance between two points is no more than the sum of their distances to a third point.

(2) The correlation function is an attractive function to use because it is relatively smooth. This smoothness is misleading, because the function contains a lot of statistically correlated information: C_n(ε) is the number of interpoint distances less than ε, which becomes more statistically correlated as ε is increased. Any fit of this function should take this into account by giving the large-ε values less weight.

(3) The scaling region may not reflect information about the dimension of an attractor; examples can be constructed where the scaling region reflects only large-scale properties of an attractor, not the dimension. Also, the scaling region is a subjective, but critical, choice. For some data sets a small change in the scaling region significantly changes the estimate; often a chosen scaling region is not straight, or straight on average, but is curved. Worse, taking a scaling region completely ignores information about small distances between points, which is still information about scaling. There is no good reason to throw this information away if the data is not corrupted by noise.

(4) The final problem with the conventional implementation is that it gives no estimate of the error of the dimension estimate. The standard error in fitting the slope in the scaling region is not an estimate of the error of the process of estimating the dimension, but rather the error of fitting a straight line to part of a graph while assuming a normal distribution of errors in the log-log graph, which is quite wrong in itself.
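For concreteness, a minimal sketch of the conventional procedure criticised above is given below (Python). The choice of scaling region [eps_lo, eps_hi], the embedding step and the function names are illustrative assumptions of this note, not prescriptions from the paper.

```python
import numpy as np

def correlation_sum(x, eps):
    """C_n(eps): fraction of interpoint distances below eps for points x."""
    n = len(x)
    # pairwise Euclidean distances, each pair counted once
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    dists = d[np.triu_indices(n, k=1)]
    return np.array([(dists < e).mean() for e in eps])

def gp_slope(x, eps_lo, eps_hi, n_eps=30):
    """Fit a straight line to log C_n(eps) vs log eps over a chosen
    'scaling region' [eps_lo, eps_hi], the subjective step criticised in the text."""
    eps = np.logspace(np.log10(eps_lo), np.log10(eps_hi), n_eps)
    c = correlation_sum(x, eps)
    mask = c > 0
    slope, _ = np.polyfit(np.log(eps[mask]), np.log(c[mask]), 1)
    return slope

# Example with a hypothetical three-dimensional delay embedding of a scalar series s:
# x = np.column_stack([s[:-2], s[1:-1], s[2:]])
# print(gp_slope(x, eps_lo=0.05, eps_hi=0.5))
```

Running gp_slope with two different but apparently reasonable scaling regions on the same data will often return noticeably different slopes, which is the subjectivity referred to in problem (3).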
3. The new method
The rest of this paper presents a modification of the Grassberger-Procaccia method of estimating the dimension that, in contrast to conventional implementations, considers the distribution of the interpoint distances directly and not the correlation function. This completely avoids problems (2), (3) and (4). However, problem (1) is not avoided with the estimators suggested. In consequence, we cannot provide true confidence intervals, but rather the best possible (that is, the narrowest) confidence intervals the data could support, assuming the attractor model we shall introduce is valid.

In any experiment a dynamical system can only be sampled discretely, say at regular time intervals, and as a result we will only consider discrete dynamical systems. We will assume there is a dynamical system defined by a (piecewise) smooth map Ψ on a smooth manifold M, which has a compact attractor X with basin of attraction Ω. We will also assume that there exists a natural measure μ on the attractor X. The natural measure is defined as follows. Let x ∈ Ω and A ⊂ M.
Define

μ(A, x) = lim_{t→∞} (1/t) Σ_{s=1}^{t} I(Ψ^s x ∈ A).

If μ(A, x) is independent of x ∈ Ω almost everywhere, with respect to Lebesgue measure on M, then μ(A, x) is a probability measure, called the natural measure μ [7]. A dimension estimate is an experiment consisting of choosing an initial point x_0 ∈ Ω, then measuring the resultant trajectory x_{n+1} = Ψ x_n, and from this trajectory we wish to estimate the dimension. Define a sample trajectory {X_n} to be a sequence of random variables X_n where X_0 is distributed as some distribution ν on Ω, which is absolutely continuous with respect to Lebesgue measure on M, and X_{n+1} = Ψ X_n. The distribution ν determines the choice of the initial condition of a trajectory. A consequence of ν being absolutely continuous is that if A ⊂ M then

lim_{n→∞} Pr(X_n ∈ A) = μ(A)

almost surely. The correlation dimension d_c(X, μ) of a set X with a probability measure μ on X can be defined

d_c(X, μ) = lim sup_{ε→0} log Pr(|X − Y| < ε) / log ε,

where X and Y are random variables on X distributed as μ. The correlation function C_n(ε) defined previously is in fact an estimator of Pr(|X − Y| < ε) that is derived from a sample trajectory {X_n}. Clearly, the distribution p_X(ε) = Pr(|X − Y| < ε) of the interpoint distances |X − Y| for random variables X and Y in X chosen by μ includes the correlation dimension as a parameter. In fact, this parameter determines the asymptotic tail of the distribution. (It is well known amongst statisticians that estimating such a tail parameter is difficult.) Our aim should be to determine a family of distributions which contains the interpoint distance distribution p_X and has the correlation dimension d_c as one of its parameters. In general this cannot be done; instead we will consider a simple model of an attractor X which illuminates the essential parameters of the family of interpoint distance distributions. For this model it will be shown that for ε sufficiently small p_X has the form

p_X(ε) = ε^{d_c} p(ε),

where p is a polynomial with degree equal to the topological dimension of the attractor X. Given this distribution family it is possible to construct estimators of d_c from sample trajectories which also provide a form of confidence interval.

4. The fundamental model of an attractor

Simple attractors like equilibria and those of periodic and quasi-periodic behaviour have simple geometries, namely sets of points, closed curves and tori. However, strange attractors of chaotic dynamical systems have fractal geometries. The simplest example of this is the bakers' map defined on M = [0, 1] × [0, 1) by

Ψ(u, z) = (λ_1 u, z/π_1)               if z < π_1,
Ψ(u, z) = (λ_2(1 − u), (1 − z)/π_2)    if z ≥ π_1,

where λ_1 + λ_2 < 1 and π_1 + π_2 = 1. This mapping is illustrated in fig. 2.

Fig. 2. The bakers' map. This map divides the unit square horizontally at π_1 into a lower region Z_1 and an upper region Z_2. These regions are then mapped by affine maps φ_1 and φ_2 into the regions V_1 and V_2. The widths of the regions V_1 and V_2 are λ_1 and λ_2, respectively.

This dynamical system has
an attractor that is the product of a Cantor set and an interval. In general a strange attractor has a local small-scale structure like that of the bakers' map attractor. That is, a strange attractor X can be modelled locally as a set V × Z, where Z is some bounded, connected subset of a smooth manifold and V is a Cantor-like set, and the complete attractor can be modelled as a union of such sets. The important thing to note at this juncture is that as far as calculating the dimension of an attractor is concerned, only the local, small-scale structure is relevant. Hence, to derive an expression for the interpoint distance distribution p_X on an attractor X relevant to the estimation of dimension, we need only consider the distribution for a set X of the form X = V × Z.

For the Cantor-like sets V we will consider the following class of sets. A set V will be called self-similar if there exist diffeomorphisms φ_j such that, if V_j = φ_j V, then for all i ≠ j, V_i ∩ V_j = ∅ and V = ∪_j V_j. Furthermore, V is called finite if the above is true with finitely many diffeomorphisms φ_j, and separable if there exists an open connected set U with V ⊂ U and, for i ≠ j, φ_i U ∩ φ_j U = ∅. To simplify the analysis we will consider the following sub-class of sets. A set V is a (k, λ)-SS set, or a uniform affine separable finite self-similar set, if in addition to the above the φ_j are affine contractions, that is, there exist λ_j such that for all j, |φ_j x − φ_j y| = λ_j |x − y|. The construction of such a (k, λ)-SS set is illustrated in fig. 3.

We will now construct a model dynamical system which generalizes the bakers' map to a system having an attractor of the form V × Z, where Z is some bounded, connected subset of a smooth manifold and V is a self-similar set (actually, for simplicity, a (k, λ)-SS set). It should be noted that a single point is a trivial self-similar set, and so such a system can also model the local structure of the elementary attractors, that is, periodic and quasi-periodic attractors. The model system is defined on a region M = U × Z ⊂ ℝ^n × ℝ^t, where both U and Z are connected, bounded sets and U is closed.
Fig. 3. The construction of a self-similar set. An example of the construction of a (k, λ)-SS set, where k = 3. Here are shown some images of the separating set U. The first images of U are U_j = φ_j U. The set V is the intersection of all the images generated by repeated mappings of U.
(One should keep in mind the bakers' map and refer to fig. 2.) Let there exist affine contractions φ_j, j ∈ K = {1,..., k}, on U such that ∪_j φ_j U ⊂ U and φ_i U ∩ φ_j U = ∅ if i ≠ j. Let 0 < λ_j < 1 be such that |φ_j x − φ_j y| = λ_j |x − y| for all x, y ∈ U. Let there exist Z_j ⊂ Z, j ∈ K, such that Z = ∪_j Z_j and Z_i ∩ Z_j = ∅ if i ≠ j. Also let there exist continuous one-to-one maps ψ_j with ψ_j : U × Z_j → U_j × Z, where ψ_j acts as φ_j on the U component and maps Z_j onto Z. Define Ψ : M → M by Ψ restricted to U × Z_j equal to ψ_j. For x_0 ∈ U × Z we have trajectories {x_n} defined by x_{n+1} = Ψ x_n. The definition of Ψ is such that there exists a (k, λ)-SS set V for which X = V × Z is a strange attractor of the system with basin of attraction Ω = U × Z = M. This attractor has topological dimension t = dim Z.

For the model system let μ be the natural measure, defined in the usual way. It is important to note that μ is a product measure: let A ⊂ U and B ⊂ Z and define probability measures μ_V and μ_Z on U and Z respectively by μ_V(A) = μ(A × Z) and μ_Z(B) = μ(U × B). As a result, it can be shown that μ(A × B) = μ_V(A) μ_Z(B). Note also that the support of μ_V is V, and hence its subscript is V and not U. It can also be shown that μ_Z is
absolutely continuous with respect to Lebesgue measure on Z; it is in fact a Bernoulli measure. The preceding facts regarding the attractor X and its measure μ allow us to make the following statement about its dimension:

d_c(X, μ) = d_c(Z, μ_Z) + d_c(V, μ_V) = t + d_c(V, μ_V).

This is, of course, just as we expect. A more important consequence of the properties of μ relates to p_X, the interpoint distance distribution on X. If p_V and p_Z are the distributions of interpoint distances on V and Z respectively, according to the measures μ_V and μ_Z, and letting g be the density function of p_Z, then

p_X(ε) = ∫₀^ε p_V(√(ε² − s²)) g(s) ds.   (1)

Proof. Since μ_Z is an absolutely continuous measure, so is the distribution p_Z, because the interpoint distance is a continuous mapping on Z × Z; hence the density function g exists. Given two random variables X, X' ∈ X distributed as μ, we can write X = (V, Z) and X' = (V', Z'), where V, V' ∈ V and Z, Z' ∈ Z. In fact V and V' are distributed as μ_V, and Z and Z' are distributed as μ_Z. Furthermore, we have that |X − X'|² = |V − V'|² + |Z − Z'|², where |·| is the Euclidean norm on the respective space V, Z or X, and so

Pr(|X − X'| < ε) = ∫₀^∞ Pr(|X − X'| < ε | |Z − Z'| = s) g(s) ds
                 = ∫₀^ε Pr(|V − V'| < √(ε² − s²)) g(s) ds.  □

The importance of eq. (1) is that we may derive the distributions of interpoint distances on V and Z separately, which is easier to accomplish, and then combine them.

For the model system a sample trajectory X_n can be written as X_n = (V_n, Z_n), where V_n ∈ U and Z_n ∈ Z. The V_n are a sequence of random variables with V_0 determined by the initial distribution ν on U × Z, and

V_{n+1} = Ψ|_U V_n = φ_{A_n} V_n,

where A_n is a sequence of random variables in K = {1,..., k}. The asymptotic distribution of A_n is

lim_{n→∞} Pr(A_n = j) = μ_Z(Z_j)

almost surely.

Proof. From the definition of Ψ, Pr(A_n = j) = Pr(X_n ∈ U × Z_j). By the definition of natural measure, lim_{n→∞} Pr(X_n ∈ U × Z_j) = μ_Z(Z_j) almost surely.  □

The values π_j = μ_Z(Z_j) are important characteristics of the model system; in fact, by the following proposition the asymptotic behaviour of p_V, and hence the dimension of V, is completely determined by the λ_j and π_j.
Proposition 1. For the model system there exists ε_0 > 0 such that for all ε < ε_0,

p_V(ε) = Σ_{l∈K} π_l² p_V(ε/λ_l).
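As a quick illustration of proposition 1 (a worked example added here, not taken from the original), consider the middle-thirds Cantor set, which is a (2, λ)-SS set with λ_1 = λ_2 = 1/3 and π_1 = π_2 = 1/2; seeking a power-law solution of the functional equation recovers its correlation dimension.

```latex
% Worked example (added): middle-thirds Cantor set, \lambda_1=\lambda_2=1/3, \pi_1=\pi_2=1/2.
% Try p_V(\varepsilon)=c\,\varepsilon^{\gamma} in Proposition 1:
\begin{align*}
  c\,\varepsilon^{\gamma}
    = \sum_{l\in K}\pi_l^{2}\,p_V(\varepsilon/\lambda_l)
    = 2\cdot\tfrac14\,c\,(3\varepsilon)^{\gamma}
    = \tfrac{3^{\gamma}}{2}\,c\,\varepsilon^{\gamma}
  \quad\Longrightarrow\quad
  3^{\gamma}=2,
  \qquad
  \gamma=\frac{\log 2}{\log 3}\approx 0.63.
\end{align*}
% This anticipates the corollary below, where the exponent is characterised by
% \sum_{l\in K}\pi_l^{2}\lambda_l^{-\gamma}=1.
```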
The proof of this proposition appears in the appendix, where the following corollary is also proved. In the proof of the proposition a value of ε_0 is stated explicitly to be

ε_0 = inf { |x − y| : x ∈ V_i, y ∈ V_j, i ≠ j },

that is, the scaling property of p_V(ε) holds for ε less than the least distance between any two "components" V_i of V.
Corollary. For the model system there exists ε_0 such that for all ε < ε_0,

p_V(ε) = ε^α f(ε),

where α satisfies

Σ_{l∈K} π_l² λ_l^{−α} = 1

and f is such that lim sup_{ε→0} f(ε) > 0. As a consequence

d_c(V, μ_V) = α.

The first part of the corollary states that for ε sufficiently small p_V(ε) has an exponential tail, and the second part states that the exponent of this tail is the correlation dimension. The function f is introduced because the density of p_V is singular and so p_V decreases discontinuously, though exponentially on average. Proposition 1 also implies that p_V is log-quasi-periodic, and as a result f is also log-quasi-periodic.

It now remains to calculate the density of p_Z; however, this presents some technical difficulties because although μ_Z is an absolutely continuous measure, it is a Bernoulli measure and so its density can be difficult to describe. To save unnecessary complexity we will consider here the simplest case, where μ_Z is uniform. The uniform case arises when π_j = μ_Z(Z_j) = l(Z_j), where l is Lebesgue measure on Z. Having made this assumption we may now apply eq. (1) to proposition 1 to obtain the following proposition, the proof of which is given in the appendix.

Proposition 2. If for the model system d_c(V, μ_V) = α, Z has dimension t, and μ_Z is uniform, then

p_X(ε) = ε^{α+t}(a_0 + ⋯ + a_t ε^t).

The form of this distribution is only approximate because the density function is only absolutely continuous. It is in general neither smooth nor continuous owing to the singularity of μ_V. However, the above form is a reasonable approximation. The most obvious case where it is incomplete is when all the scaling parameters λ_j of V are equal. In this case, by proposition 1, p_V has a log-periodic component superimposed on its scaling; usually p_V has a log-quasi-periodic component. In cases where the log-periodic component is large, it might be advisable to assume a distribution of the form

p_X(ε) = ε^{α + t + A sin(λ log ε + θ)}(a_0 + ⋯ + a_t ε^t)

for some A and θ. However, this should not normally be the case, and a strong log-quasi-periodic component can often be ignored or absorbed into the polynomial.
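To make the model concrete, the sketch below (an addition of this note; the parameter values, burn-in length and function names are choices of mine, not the paper's) iterates the bakers' map of this section and collects the interpoint distances of the resulting trajectory, whose small-ε histogram can then be compared with the form given in proposition 2.

```python
import numpy as np

def bakers_trajectory(n, lam=(0.3, 0.3), pi=(0.5, 0.5), burn=100, seed=0):
    """Iterate the bakers' map Psi(u, z) of section 4 and return n points
    of the trajectory after a short burn-in."""
    rng = np.random.default_rng(seed)
    u, z = rng.random(), rng.random()
    traj = np.empty((n, 2))
    for i in range(burn + n):
        if z < pi[0]:
            u, z = lam[0] * u, z / pi[0]
        else:
            u, z = lam[1] * (1.0 - u), (1.0 - z) / pi[1]
        if i >= burn:
            traj[i - burn] = u, z
    return traj

def interpoint_distances(traj):
    """All n(n-1)/2 Euclidean interpoint distances of a trajectory."""
    diff = traj[:, None, :] - traj[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    return d[np.triu_indices(len(traj), k=1)]

# For lam = (0.3, 0.3) and pi = (0.5, 0.5) the corollary gives the Cantor exponent
# alpha from 2 * (1/2)**2 * 0.3**(-alpha) = 1, i.e. alpha = log 2 / log(1/0.3),
# so the expected correlation dimension of the attractor is d_c = 1 + alpha ~= 1.58.
dists = interpoint_distances(bakers_trajectory(2000))
```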
5. Estimators of correlation dimension
Having obtained in proposition 2 the approximate form of the distribution of the interpoint distances for a simple attractor, which has the correlation dimension as a parameter, we can now devise estimators of the dimension, at least for these simple attractors. However, given that the small-scale structure of a typical attractor is similar to the model system's attractor, we may also conclude that such estimators will apply in general. There is, however, one essential restriction: the distribution form of proposition 2 applies only for distances less than some ε_0. This ε_0 marks the scale at which relevant scaling information begins, and is in a sense equivalent to the upper bound of the scaling region of the conventional implementation of the Grassberger-Procaccia method. However, it is essential to appreciate that ε_0 cannot be determined in the same way as one would choose the upper bound of the scaling region, that is, simply as where the correlation function or distribution appears to be scaling, because this may be an artifact of the large-scale structure of a complex attractor or of an embedding in a high-dimensional space. The value of ε_0 can only be determined by some additional knowledge of the attractor, for example, clear visual indication that a self-similar structure exists in most of the attractor at a sufficiently small scale, hence ensuring that the assumed model applies at a small enough scale. Having established this, the actual value of ε_0 is not critical; provided it is small enough, it does not affect a dimension estimate significantly.
The most efficient and practical method of estimating the correlation dimension, assuming the distribution p_X of proposition 2, is first to calculate an empirical estimate of p_X, that is, a frequency histogram of the interpoint distances from a sample trajectory, and then to estimate the distribution parameters from this. To calculate a suitable empirical distribution, one divides the real line representing interpoint distances into intervals B_i of equal log-length at the relevant small scale, that is, less than ε_0. For example, a suitable choice is B_i = [ε_i, ε_{i−1}) for i > 0, where ε_i = λ^i ε_0 for some 0 < λ < 1. In addition define an interval B_0 = [ε_0, ∞) covering the irrelevant large-scale interpoint distances. Such intervals will be called bins. Given a sample trajectory {X_n} of length N and a set of bins {B_i}, the empirical distribution is found by calculating all interpoint distances and counting the number of distances b_i in each bin, that is,
b_i = Σ_{0 < s < t ≤ N} I(|X_s − X_t| ∈ B_i).
Typically, several hundred non-empty bins provide an enormous data reduction with no significant loss of information. By proposition 2, if X and Y are random variables distributed as μ, then for i > 0
Pr(|X − Y| ∈ B_i) = p_X(ε_{i−1}) − p_X(ε_i) ≈ ε_i^d q(ε_i),

where d is the correlation dimension and q is a polynomial of degree equal to the topological dimension of the attractor. Call the above probability p_i, and let p_0 = Pr(|X − Y| ≥ ε_0). It is possible to show that for almost all sample trajectories {X_n} of length N, in the limit as N → ∞ the {b_i} have a multinomial distribution [8], that is,

Pr({b_i}) = [(Σ_i b_i)! / Π_i b_i!] Π_i p_i^{b_i}.   (2)
We can now obtain an estimator of the correlation dimension using the fact that the {b_i} have a multinomial distribution. First we calculate the {b_i} as stated above and assume the {b_i} have a multinomial distribution with p_i = ε_i^d q(ε_i) for some polynomial q of degree t; then we adjust d and the coefficients of q to obtain a maximum likelihood estimate of d. To obtain the maximum likelihood estimate, one must minimize the log-likelihood function
L(d, q) = −log Pr({b_i}) = −Σ_i b_i log p_i + C,
where it should be noted that the constant C is irrelevant. The minimization is a simple non-linear multi-dimensional minimization for which there are several suitable algorithms [9].

This method of estimating the dimension eliminates most of the problems associated with the conventional implementation of the Grassberger-Procaccia method. However, it does not remove problem (1); that is, it assumes that the interpoint distances obtained from the points of a sample trajectory are independent (see the next section for the consequences of this). At present no method exists to eliminate or allow for these correlations.

This method also introduces two minor difficulties. The first is the choice of the degree of the polynomial q. Proposition 2 suggests that the degree should be equal to the topological dimension of the attractor. Since this might be what we are trying to estimate, it presents a problem. Close examination of the coefficients of p derived in proposition 2 shows that they increase at less than a geometric rate, and since ε is small, this implies the terms of q involving higher powers of ε should quickly become small. Consequently, very high power terms should be negligible. In fact, numerical tests have shown that a degree of 1 or 2 has always been sufficient and is recommended. Certainly, a high-degree polynomial should be avoided for the same reason that high-degree polynomials are avoided when extrapolating
polynomial fits, since we are concerned here with the tail of a fit. With a high-degree polynomial one is in danger of simply fitting a polynomial through the data points, rather than fitting an exponential. The second problem introduced is that the fit of the form ε_i^d q(ε_i) can have multiple global and local minima, especially if q is a high-degree polynomial. However, if one first takes q to be a constant and makes a fit, which gives an estimate similar to a conventional Grassberger-Procaccia implementation, and uses this as the initial parameters for the non-linear minimization, then, provided the degree of the polynomial is not too large, there should not be any difficulty with multiple minima.

Fig. 4 shows the results of dimension estimates for various generalized bakers' map attractors and different length sample trajectories. The estimates shown in fig. 4a are of V × Z attractors with V a simple one-dimensional Cantor set and Z a unit hypercube; for these attractors the dimension is generally underestimated. In fig. 4b the attractors have a unit interval for Z and a multi-dimensional Cantor set for V; for these attractors the points of a sample trajectory are more dispersed and the dimension is overestimated. It is seen that the accuracy of an estimate depends mainly on the structure of the attractor, and that there is a significant bias for attractors of dimension greater than about 3.5.
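The estimator described in this section can be sketched as follows (Python). For simplicity the sketch conditions on distances below ε_0, so the bin B_0 is not used; the bin ratio λ = 0.9, the degree-1 polynomial q, the Nelder-Mead routine and the starting values are implementation choices of this note rather than prescriptions of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def bin_counts(dists, eps0, lam=0.9, n_bins=200):
    """Counts b_i of interpoint distances in bins B_i = [eps_i, eps_{i-1}),
    where eps_i = lam**i * eps0.  Distances >= eps0 are discarded here,
    i.e. we work with the multinomial conditioned on |X - Y| < eps0."""
    edges = eps0 * lam ** np.arange(n_bins + 1)        # eps_0 > eps_1 > ...
    counts, _ = np.histogram(dists, bins=edges[::-1])  # histogram needs increasing edges
    return counts[::-1], edges                         # counts[i-1] is b_i

def neg_log_likelihood(params, counts, edges):
    """-log Pr({b_i}) up to an additive constant, with
    p_i proportional to p_X(eps_{i-1}) - p_X(eps_i),  p_X(eps) = eps**d * q(eps),
    and q(eps) = 1 + c1*eps a degree-1 polynomial as recommended in the text."""
    d, c1 = params
    cdf = edges ** d * (1.0 + c1 * edges)   # p_X at eps_0, eps_1, ...
    p = cdf[:-1] - cdf[1:]                  # unnormalised bin probabilities
    if d <= 0 or np.any(p <= 0):
        return np.inf
    p = p / p.sum()                         # conditional multinomial probabilities
    return -np.sum(counts * np.log(p))

def estimate_dimension(dists, eps0):
    counts, edges = bin_counts(dists[dists < eps0], eps0)
    res = minimize(neg_log_likelihood, x0=[2.0, 0.0],
                   args=(counts, edges), method="Nelder-Mead")
    return res.x  # fitted [d, c1]
```

The preliminary constant-q fit suggested above for avoiding multiple minima could be mimicked by freezing c1 = 0 in a first call to minimize and using its result as the starting point of the full minimization.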
6. A form of confidence interval in some circumstances

An advantage of the dimension estimator described in the preceding section is that we can provide for it some form of confidence interval, because we can assume a parameterized family of distributions to which it applies. However, as illustrated in fig. 4, this can only be done if we know something about the structure of the attractor, for example, that the fractal part is low dimensional. If this is the case then we can correct for the bias in the estimate, and having done so it can be shown that, as the length of a sample trajectory approaches infinity, the estimator has a χ²-distribution.
Fig. 4. Estimating the dimension. Plots of dimension estimates against true dimension for various generalized bakers' map attractors are shown. In (a) the set V is a simple Cantor set with uniform measure and Hausdorff dimension between 0 and 1. The set Z is a unit hypercube chosen so that the dimension of X is between 1 and 10. Estimates are shown for sample trajectories of length 300, 1000 and 10000. In (b) the set Z is the unit interval and the V are n-dimensional Cantor sets with uniform measure and Hausdorff dimension between 0 and n. The resulting dimension of X is between 1 and n + 1.
Hence, we can find a form of confidence interval for the dimension estimate by making four assumptions:

(A) The interpoint distances from which the empirical distribution is derived are independent.
(B) The distribution of the interpoint distances comes from a family of distributions of the form ε^d q(ε) for some polynomial q of degree t.
(C) The estimator has a χ²-distribution.
(D) The bias of the estimator is known.

All of these assumptions are, of course, untrue. Assumption A is included to ignore problem (1) regarding the correlation of interpoint distances calculated from a sample trajectory. A result of this problem, and of ignoring it with assumption A, is that there is less information in the empirical distribution than there would be if the assumption were true. Furthermore, it is likely that the information is biased by some feature of a particular sample trajectory, or by features of sample trajectories on an attractor in general. Assumption B relies heavily on the local structure of the attractor being of the V × Z form, but even in the ideal situation it is only an approximation. A necessary condition for assumption C to be true is that the sample trajectory is sufficiently long; the same can be said of assumptions A and B. Assumption D is by far the most difficult to assert. For experimental data from an unknown system it is not possible. In such a case the maximum error evaluation of the next section must be used. On the other hand, we might have that the attractor is an n-torus of unknown dimension, or that this is a good approximation of the attractor. In this case a bias adjustment for a dimension estimate can be obtained empirically from fig. 4a.

The "confidence interval" obtained by making the preceding four assumptions can be rather unusual. The confidence interval should be interpreted as the narrowest confidence interval the data could support; this is a consequence of assumption A, which implies there is more information present than there actually is.
Fig. 5. Confidence intervals. Here are shown the 90% confidence intervals of dimension estimates calculated using the assumptions described in the text. It is clear that confidence intervals are useless except when the sample trajectory is sufficiently long compared to the complexity of the attractor.

Fig. 5 shows some dimension estimates and their corresponding "90% confidence intervals" as determined above. It has been found from computations that the confidence intervals are consistent with the distribution of estimates calculated from different sample trajectories, but are not consistent with biases introduced by correlations in the interpoint distances. Unless the sample trajectories are sufficiently long the confidence interval cannot be regarded as useful. It is vitally important at present to determine the extent of the correlation of interpoint distances for a given length of sample trajectory on an attractor of a given dimension, and hence determine the
effect of this correlation on the dimension estimate. Only by doing this can reliable confidence intervals be given for a dimension estimate. The dimension estimator presented here and the model attractor of the V × Z form should provide useful information about this problem and present research is being directed towards this goal.
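Under assumptions A-C the interval can be read off from the likelihood of section 5 by treating twice the increase of L above its minimum as approximately χ² with one degree of freedom (a standard profile-likelihood construction; the 90% point 2.71, the bracketing width and the bisection below are choices of this note, and neg_log_likelihood is the function from the sketch in section 5). As stressed above, the result is only the narrowest interval the data could support.

```python
from scipy.optimize import minimize

# neg_log_likelihood(params, counts, edges) and the bin counts come from the
# sketch in section 5.

def profile_nll(d_fixed, counts, edges):
    """Minimise the negative log-likelihood over the polynomial coefficient c1
    with the dimension d held fixed."""
    res = minimize(lambda c: neg_log_likelihood([d_fixed, c[0]], counts, edges),
                   x0=[0.0], method="Nelder-Mead")
    return res.fun

def confidence_interval(counts, edges, d_hat, chi2_90=2.71, width=3.0):
    """'90% confidence interval' for d under assumptions A-C:
    the set of d with 2*(L(d) - L(d_hat)) <= 2.71 (chi-squared, 1 dof)."""
    nll_hat = profile_nll(d_hat, counts, edges)

    def excess(d):
        return 2.0 * (profile_nll(d, counts, edges) - nll_hat) - chi2_90

    def cross(inside, outside, tol=1e-3):
        # bisection between a point inside the interval (excess < 0)
        # and one outside it (excess > 0)
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if excess(mid) > 0:
                outside = mid
            else:
                inside = mid
        return 0.5 * (inside + outside)

    lower = cross(d_hat, max(d_hat - width, 1e-6))
    upper = cross(d_hat, d_hat + width)
    return lower, upper
```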
7. If the attractor is unknown, then how large an error can be expected?
Suppose we have no information about the structure of an attractor and cannot make assumption D of the last section to adjust for bias in the dimension estimate. In this case the main form of error in the estimate is the bias. The dimension estimates shown in fig. 4 give an indication of how large this bias error can be. For example, suppose we have a trajectory on some unknown attractor and we embed this trajectory in an eight-dimensional space. Assuming the V × Z model is a good approximation of the small-scale structure in this embedding, then there are two extreme forms the attractor could have: minimal fractal structure, d_c(V) < 1, and maximal fractal structure, dim Z = 1. Dimension estimates of attractors with these extreme forms give the maximum bias of the estimator; they are taken from fig. 4 and shown together in fig. 6. Now suppose we obtained a dimension estimate of 4.5; then the dimensions of the extreme-form attractors that give this estimate mark the extreme possibilities for the true dimension of the attractor. This true-dimension range is indicated in fig. 6 by the open-arrowed line, and is, roughly, from 3.5 to 5. If the dimension estimate obtained were 5.5, then the corresponding true-dimension range is indicated by the solid-arrowed line in fig. 6, and is, roughly, from 4 to 6.5. It can be seen from fig. 6 that only dimension estimates less than about 3.5 are at all accurate and consistent with the confidence intervals obtained using only assumptions A, B and C of the last section.
Fig. 6. The possible size of errors. Dimension estimates of attractors for an embedding dimension of 8 with the extreme forms of minimal fractal structure, d_c(V) < 1, and maximal fractal structure, dim Z = 1. A dimension estimate of 4.5 could be obtained from an attractor with a true dimension indicated by the open-arrowed line, and for a dimension estimate of 5.5 the true-dimension range is indicated by the solid-arrowed line. Only dimension estimates less than about 3.5 are at all accurate.
8. Conclusions
An estimator of dimension has been described that has many advantages over the conventional implementation of the Grassberger-Procaccia method. Specifically, it eliminates the operator-dependent characteristics, such as choosing a scaling region, it uses the maximum amount of information from a time series, and it can provide a form of confidence interval under ideal circumstances. However, there can be a significant bias of the estimate that is dependent on properties of the attractor. This bias is the most serious flaw of the dimension estimator. This flaw is inherent in any dimension estimator based on the interpoint distances calculated from a sample trajectory and results from the correlation of these distances. Present research is being directed toward the problem of discovering the effect of this correlation and determining, empirically if necessary, the estimator's bias. Because of the simplicity of the V × Z model of an attractor, it is possible that the problem is tractable in this case and that
more useful confidence intervals for the dimension estimator presented here can be provided.
Acknowledgements

This work was supported in part by a Japan Society for the Promotion of Science Post Doctoral Fellowship. I wish also to thank A.I. Mees for encouragement and many helpful comments and suggestions.
Appendix. Proof of propositions

Proof of proposition 1. Let m(V_i) = j if V_i ⊂ φ_j U. We have that

Pr(|V_{i+1} − V_{j+1}| < ε) = Pr(|V_{i+1} − V_{j+1}| < ε, m(V_{i+1}) ≠ m(V_{j+1}))
                            + Σ_{l∈K} π_l² Pr(|V_i − V_j| < ε/λ_l | m(V_{i+1}) = m(V_{j+1}) = l).

For ε < ε_0 the first term is zero, and |V_{i+1} − V_{j+1}| < ε < ε_0 implies m(V_{i+1}) = m(V_{j+1}) = l for some l, and so |V_i − V_j| < ε/λ_l. Letting i and j tend to infinity gives the statement of the proposition.  □

Proof of corollary of proposition 1. Assume that V is non-trivial, so that p_V is not identically zero. By proposition 1 we have for all ε < ε_0 that

p_V(ε) = Σ_{l∈K} π_l² p_V(ε/λ_l).   (A.1)

This implies that

lim sup_{ε→0} p_V(ε)/ε^γ = Σ_{l∈K} π_l² λ_l^{−γ} lim sup_{ε→0} p_V(ε/λ_l)/(ε/λ_l)^γ,

from which it follows that lim sup_{ε→0} p_V(ε)/ε^γ is finite if and only if Σ_{l∈K} π_l² λ_l^{−γ} ≤ 1, and is zero if the inequality is strict. Let α be the unique γ such that Σ_{l∈K} π_l² λ_l^{−γ} = 1. Then lim sup_{ε→0} p_V(ε)/ε^α is finite and non-zero, because p_V(ε) is bounded and by (A.1) is always a convex combination of other finite values. Hence we may write p_V(ε) = ε^α f(ε), where lim sup_{ε→0} f(ε) > 0 for all ε < ε_0. It follows that

d_c(V, μ_V) = lim sup_{ε→0} log p_V(ε)/log ε = α + lim sup_{ε→0} log f(ε)/log ε = α.  □

Proof of proposition 2. Eq. (1) stated that

p_X(ε) = ∫₀^ε p_V(√(ε² − s²)) g(s) ds,

where g(s) is the probability density function of p_Z. First consider the case of Z = [0, L] with a uniform measure. It is easily shown that g(ε) = g_1(ε) = a_0 + a_1 ε, where a_0 = 1/L and a_1 = −1/L². Also suppose that p_V(ε) = ε^α[A_0 + A_1 ε + 𝒪(ε²)][1 + f(ε)], where lim sup_{ε→0} f(ε) = 0. (The term f is introduced to carry the singular part of the distribution density.) Then

p_X(ε) = ∫₀^ε (√(ε² − s²))^α [A_0 a_0 + (A_0 a_1 + A_1 a_0) s] ds + 𝒪(ε^{α+2})
       = A_0 a_0 ε^{α+1} ∫₀^{π/2} (cos θ)^{α+1} dθ + (A_0 a_1 + A_1 a_0) ε^{α+2}/(α + 2) + 𝒪(ε^{α+2}).

It is stated that for any smooth one-dimensional extension Z, with absolutely continuous measure μ_Z, it is sufficient to write g(ε) = g_1(ε) + 𝒪(ε²), where a_0 > 0 and a_1 < 0, to represent the effect of the dimension and boundedness of Z; the higher-order terms depend on the curvature and torsion of Z and the non-uniformity of μ_Z. (For example: if Z is a circle of diameter D then g(ε) = 2/(π√(D² − ε²)); and if Z = [0, L] but μ_Z(x) = x/L then g(ε) = [(L − ε)³/3 + ε(L − ε)²/2]/L³.) The stated result follows because Z can be approximated arbitrarily closely by short straight lines with uniform measure.

The above implies that if Z is a t-dimensional box Z = [0, L_1] × ⋯ × [0, L_t] with a uniform measure on each component subspace, then g(ε) = g_t(ε) = a_0 ε^{t−1} + ⋯ + a_t ε^{2t−1}. This follows from repeated one-dimensional extensions, and by a similar argument to the one-dimensional case, for any smooth t-dimensional extension Z, g(ε) = g_t(ε) + 𝒪(ε^{2t}).

Define a linear space ℱ[x] of integrable functions of the form x^a f(x) with lim sup_{x→0} f(x) > 0, and a bilinear operator on ℱ[x] by

ℬ(q, g)(x) = ∫₀^x q(√(x² − y²)) g(y) dy.

We note that for n ∈ ℤ⁺

ℬ(x^a, x^n)(ε) = A_{a,n} ε^{a+n+1},

where

A_{a,n} = ∫₀^{π/2} (cos θ)^{a+1} dθ        if n = 0,
A_{a,n} = 1/(a + 2)                        if n = 1,
A_{a,n} = [(n − 1)/(a + 2)] A_{a+2,n−2}    if n ≥ 2.

Furthermore, if lim sup_{x→0} f(x) = 0 then

ℬ(x^a(1 + f(x)), x^n)(ε) = ℬ(x^a, x^n)(ε) + o(ε^{a+n+1}).

If d_c(V, μ_V) = α then, as in the corollary of proposition 1, we may write p_V(ε) = ε^α q(ε)[1 + f(ε)], where q is a polynomial of degree t − 1 and lim sup_{ε→0} f(ε) = 0. Thus if a smooth bounded t-dimensional Z has probability density g(ε) = g_t(ε) + 𝒪(ε^{2t}), then

p_X(ε) = ℬ(x^α q(x), g_t(x))(ε) + 𝒪(ε^{α+2t}),

where the first term is a function of the required form.  □
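As a numerical sanity check of the identity ℬ(x^a, x^n)(ε) = A_{a,n} ε^{a+n+1} and of the reduction formula for A_{a,n} used above, the following sketch (an addition of this note, not part of the original appendix) compares direct quadrature with the closed forms.

```python
import numpy as np
from scipy.integrate import quad

def A(a, n):
    """A_{a,n} = integral_0^{pi/2} (cos t)^(a+1) (sin t)^n dt."""
    return quad(lambda t: np.cos(t) ** (a + 1) * np.sin(t) ** n, 0.0, np.pi / 2)[0]

def B(a, n, eps):
    """The bilinear operator of the appendix applied to x^a and x^n:
    B(x^a, x^n)(eps) = integral_0^eps (eps^2 - y^2)^(a/2) * y^n dy."""
    return quad(lambda y: (eps ** 2 - y ** 2) ** (a / 2) * y ** n, 0.0, eps)[0]

a, n, eps = 0.63, 3, 0.2
print(B(a, n, eps), A(a, n) * eps ** (a + n + 1))     # these should agree
print(A(a, n), (n - 1) / (a + 2) * A(a + 2, n - 2))   # reduction formula, n >= 2
```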
References

[1] J.D. Farmer, E. Ott and J.A. Yorke, Physica D 7 (1983) 153-180.
[2] P. Rapp, A.M. Albano, A.I. Mees and G.C. deGuzman, in: Chaos in Biological Systems, eds. H. Degn et al. (Plenum, New York, 1987).
[3] J.P. Zbilut et al., Math. Biosci. 90 (1988) 49-70.
[4] P. Grassberger and I. Procaccia, Phys. Rev. Lett. 50 (1983) 346-349.
[5] J.B. Ramsey and H.-J. Yuan, Phys. Lett. A 134 (1989) 287-297.
[6] J. Theiler, Phys. Rev. A 41 (1990) 3038-3051.
[7] L.-S. Young, IEEE Trans. Circuits and Systems CAS-30 (1983) 599-607.
[8] K.T. Judd, Ph.D. thesis, Department of Mathematics, University of Western Australia (1990).
[9] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Numerical Recipes: The Art of Scientific Computing (Cambridge Univ. Press, Cambridge, 1988).