Ruggedness measures of medical time series using fuzzy-rough sets and fractals


Pattern Recognition Letters 27 (2006) 447–454 www.elsevier.com/locate/patrec

Manish Sarkar *

Department of Computer Science and Engineering, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT 06269-3155, USA

Received 3 April 2004; received in revised form 21 January 2005
Available online 2 November 2005
Communicated by T.K. Ho

Abstract

This paper attempts to characterize a medical time series by quantifying its ruggedness. Two data points that are close on the time axis are similar along that axis, which creates fuzzy similarity. Following the principle "similar causes create similar effects", we expect the magnitudes corresponding to those two data points to be similar as well. Frequently, this is not what is observed in a time series. One reason is that if other features had been considered along with the time information, the two close data points would have looked different. Consequently, the magnitudes corresponding to two apparently similar points differ. Therefore, if we consider closeness along the time axis as a cause, then the effect, i.e., the corresponding magnitudes, can be the same, similar, or completely dissimilar. This makes the cause–effect relationship, i.e., the time versus magnitude relationship, one-to-many. Specifically, the closeness creates fuzziness, the one-to-many relationship creates roughness, and together they form fuzzy-roughness. If the ruggedness is expressed as fuzzy-roughness, then in some time series the fuzzy-roughness of a part of the time series is observed to be similar to that of the whole time series. Specifically, the scaling of the fuzzy-roughness follows the power law of fractal theory. Experiments on ICU data sets show that the ruggedness measure using the fuzzy-rough set based fractal dimension is more robust than some popular measures of ruggedness such as the Hurst exponent.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Characterization; Time series; Fuzzy; Rough; Fuzzy-rough; Hurst exponent and fractal

* Tel.: +1 860 486 2584; fax: +1 860 486 4817. E-mail address: [email protected]

0167-8655/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2005.09.007

1. Introduction

1.1. Motivation

In intensive care units (ICUs), different physiological monitors generate a large number of time-stamped data or time series (Fig. 1). By comparing the time series patterns of the patient with those of a healthy person, physicians can identify the state in which the patient currently is (Morik et al., 2000). The mechanism that human beings use to find the similarity between any two time series is a complex perceptual activity. Although the interpretation of similarity is essentially subjective, we notice that measuring the degree of similarity requires (a) a proper representation or characterization of the time series such that the characterization is not influenced by translation, scaling, noise, outliers and nonstationarity, and (b) a suitable measure of similarity that uses the characterization. In this paper, we focus on the characterization aspect of time series data.

Fig. 1. Data collected from intensive care units (ICUs). The abscissa and ordinate indicate the time in minutes and the systolic blood pressure in mm Hg for a particular patient.

To characterize medical time series, measures of dispersion, Fourier transforms, abstractions, etc., are employed (Addison, 1997). None of these measures is in general sufficient. For instance, a measure of dispersion such as the standard deviation in the time domain increases with increasing interval. The Fourier transform assumes that the time series is stationary, which may not be true in many cases. Due to these drawbacks, often more than one measure is applied to characterize the time series.

All these measures attempt to extract some regularity from the apparent randomness inherent in the time series. In some time series, it has been noticed that if the ruggedness or irregularity of a part of the time series is scaled up, then the resultant ruggedness becomes similar to the ruggedness or irregularity of the whole time series. This clue can act as the regularity in the time series, and thus can be used to characterize the time series. The Hurst exponent (Addison, 1997; Hurst, 1951; Hurst et al., 1965; Vicsek, 1992) intends to quantify this clue such that the quantified values are relatively insensitive to translation, scaling, noise and nonstationarity.

1.2. Objective

While calculating the Hurst exponent, the similarity between a part and the whole time series is interpreted as statistical self-similarity. It is assumed that repeated occurrences of a particular pattern (or of a finite set of patterns), also known as a generator, create both the part and the whole time series. It is further assumed that the part and the whole time series look different because the occurrences of the generators are different. In real data, different generators may generate the part and the whole time series, and the number of generators may be infinite. Hence, it is more realistic to assume that the part and the whole time series look similar when similar generators generate them, not necessarily the same generator with different occurrences. In this spirit, this paper proposes a fractal dimension (Polkowski, 2002; Vicsek, 1992) that quantifies the regularity of the time series by finding how the fuzzy-roughness scales when the time series is observed over longer intervals. For the time series, the proposed dimension may act as a characteristic that is relatively insensitive to translation, scaling, noise and nonstationarity.

1.3. Scope

In this paper, the fractal dimension is investigated in the context of ICU data sets (Fig. 1). In this time series, the blood pressure of a patient is collected at almost uniform intervals. For the sake of simplicity, in the rest of the paper we assume that the Euclidean dimension of the time series is two. However, the results reported in this paper hold for higher dimensions as well.

1.4. Existing works

Although there are various methods to estimate the Hurst exponent, the principle remains the same. Initially, the time axis of the time series is partitioned into equal intervals. A certain statistic, such as the mean value, is computed for each interval, and then it is observed how much this statistic varies across all the intervals. This measure quantifies the ruggedness of the time series for that interval width. Now the length of the interval is changed, and the above procedure is repeated to estimate the ruggedness measure again. The difference between these two values of the ruggedness measure indicates at what rate the ruggedness measure varies when the resolution of the time series changes. If this rate follows a power law, then it becomes an estimate of the Hurst exponent. The following problems are encountered while estimating the Hurst exponent by covering the data points with intervals:

1. The boundary of each interval is rigid. A data point just outside the boundary is not considered a member of the interval.
2. The contributions of all the data points inside an interval are treated equally.

These two factors make the Hurst exponent brittle, and hence the measured value of the exponent varies significantly with a slight change in the position of a data point or in the position of the interval.

1.5. Proposed method

While determining the Hurst exponent, the aim is to find how the roughness of the time series scales up/down with the decrement/increment of the length of each interval. All the points covered by each interval are treated equally, and a data point that is just outside the interval is not considered. In other words, all the data points inside the interval are considered the same, and all the data points outside the interval are considered different. This makes the concept of similarity hard or binary. If the position of the interval is changed slightly, then some data points that were the same become different and vice versa, and thus the measured exponent changes significantly. This problem is reduced if the concept of the interval is changed to a function, for example to a Gaussian, where the boundary is not sharp but fuzzy. This is because all the points (say $\langle x_i, y_i\rangle$) of the time series are treated as similar (with varying degrees) to the point $\langle x_j, y_j\rangle$ around which the Gaussian is built. If the width of the Gaussian is changed, then the similarity does not change abruptly. The closer $\langle x_i, y_i\rangle$ and $\langle x_j, y_j\rangle$ are, the greater is the similarity. The proposed method exploits this trick to derive a fractal dimension that can be used to quantify the ruggedness of a time series. This fractal dimension can also be used as a feature while comparing more than one time series.

2. Backgrounds of Hurst exponent and rough uncertainty

2.1. Hurst exponent

Usual methods of estimating Hurst exponents can be structured into the following three steps:

• Sequence of partitions: The whole time axis is partitioned into equal intervals. Each interval isolates some scale of observations.
• Single scale statistics: It is computed using the following two statistics:
  – Local statistics: A statistic based on the values of the ordinates within a single interval is extracted. For example, the local statistic in the ith interval can be the mean of the ordinates of all the data points that have abscissa values in the ith interval.
  – Partition based statistics: A measure or measures summarizing the local statistics from all the intervals. For instance, it can be the variance of the means in each interval.
• Transscale statistics: Estimates of the Hurst exponent are derived from the partition based statistics over a range of interval lengths. Generally, the transscale statistic is the ratio of the logarithm of the partition-based statistic and the logarithm of the length of the interval.
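The three steps above can be sketched in a few lines. This is only an illustrative sketch, not the estimator used in the paper: it takes the examples given in the list (interval means as local statistics, the variance of those means as the partition based statistic) and reports the slope of a log-log fit as the transscale quantity. The function names `partition_statistic` and `transscale_slope`, and the choice to measure interval length in samples, are assumptions made for the illustration.

```python
import numpy as np


def partition_statistic(y, width):
    """Variance of the interval means for intervals of `width` samples."""
    arr = np.asarray(y, dtype=float)
    n_intervals = len(arr) // width
    if n_intervals < 2:
        raise ValueError("need at least two full intervals")
    # Step 1: partition into equal intervals (a trailing partial interval is dropped).
    blocks = arr[: n_intervals * width].reshape(n_intervals, width)
    # Step 2a: local statistic = mean of the ordinates in each interval.
    local_means = blocks.mean(axis=1)
    # Step 2b: partition based statistic = variance of those means.
    return local_means.var()


def transscale_slope(y, widths):
    """Step 3: slope of log(partition statistic) against log(interval width)."""
    stats = [partition_statistic(y, w) for w in widths]
    slope, _intercept = np.polyfit(np.log(widths), np.log(stats), 1)
    return slope
```

For uncorrelated noise the variance of interval means falls roughly as 1/width, so the fitted slope should come out near -1; how a slope is converted into a Hurst exponent depends on the specific method.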

Two popular approaches to estimate the Hurst exponent are as follows:

Rescaled range or R/S method: It partitions the time series $\{\langle x_1, y_1\rangle, \langle x_2, y_2\rangle, \ldots, \langle x_n, y_n\rangle\}$ into equal intervals, each of length $\delta_l$. Let us call the ith interval $W(i, \delta_l)$. The local statistic in the interval $W(i, \delta_l)$ is defined as $R(i, \delta_l)/S(i, \delta_l)$, where

$$R(i,\delta_l) = \max_{W_j}\{\langle x_j, W_j\rangle \mid x_j \in W(i,\delta_l)\} - \min_{W_j}\{\langle x_j, W_j\rangle \mid x_j \in W(i,\delta_l)\}, \tag{1}$$

$$S(i,\delta_l) = \sqrt{\frac{1}{\|\{\langle x_j, y_j\rangle \mid x_j \in W(i,\delta_l)\}\|}\sum_{\{\langle x_j, y_j\rangle \mid x_j \in W(i,\delta_l)\}} (y_j - \bar{y})^2}, \tag{2}$$

where $W_j = \sum_{k=1}^{j}(y_k - \bar{y})$, $\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k$, and $\|A\|$ indicates the cardinality of the set $A$.

The partition-based statistic $I_{R/S}(\delta_l)$ is the arithmetic mean of $R(i, \delta_l)/S(i, \delta_l)$ over all the intervals. The whole process is repeated for several lengths of the interval $\delta_l$. The transscale statistic is the slope of the linear regression that fits a plot of $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for all $l$. Intuitively, $R(i, \delta_l)$ measures the variation of the cumulative value of the ordinate (i.e., $W$) in the ith interval. Instead of the ordinate ($y$), this method specifically uses $W$ because the effect of noise and outliers is usually less in $W$. If we compute the average of $R$ across all the intervals of length $\delta_l$, then the average value would provide a measure of roughness. However, if the ith interval has many jumps, $R(i, \delta_l)$ would be large, and its large value would dominate in the calculation of the average. Consequently, the computed average value would be very different from the actual average value. To reduce this problem, $R(i, \delta_l)$ is normalized using $S(i, \delta_l)$, which represents the spread of the ordinate values within the ith partition.

Dispersional analysis: It is similar to the R/S method. The differences are:

• the local statistic is the mean of the ordinate values in each interval,
• the partition based statistic is the standard deviation of the means, and
• the transscale statistic is 1.0 plus the slope of the linear regression obtained from the fit of log(partition based statistic) vs. $\log(\delta_l)$.

The dispersional analysis works in the opposite order to the R/S method. In each interval, the R/S method determines the amount of variation of the cumulative values of the ordinate, and then it computes the average of this variation across all the intervals. In contrast, the dispersional analysis finds the average value of the ordinate in each interval, and then it computes how much this average value differs when all the intervals are considered.

2.2. Rough sets

Let $R$ be an equivalence relation on a universal set $X$. Moreover, let $X/R$ denote the family of all equivalence classes induced on $X$ by $R$ (Klir and Yuan, 1995; Pawlak, 1982, 1991; Pawlak et al., 1995). One such equivalence class in $X/R$ that contains $x \in X$ is designated by $[x]_R$. For any output class $C \subseteq X$, we can define the lower $\underline{R}(C)$ and upper $\overline{R}(C)$ approximations, which approach $C$ as closely as possible from the inside and outside, respectively (Pawlak, 1991). Here,

$$\underline{R}(C) = \bigcup\{[x]_R \mid [x]_R \subseteq C \text{ and } x \in X\} \tag{3}$$

is the union of all equivalence classes in $X/R$ that are contained in $C$, and

$$\overline{R}(C) = \bigcup\{[x]_R \mid [x]_R \cap C \neq \emptyset \text{ and } x \in X\} \tag{4}$$

is the union of all equivalence classes in $X/R$ that overlap with $C$. The rough set $\mathcal{R}(C) = \langle \underline{R}(C), \overline{R}(C)\rangle$ is a representation of the given set $C$ by $\underline{R}(C)$ and $\overline{R}(C)$. The set $\overline{R}(C) - \underline{R}(C)$ is a rough description of the boundary of $C$ by the equivalence classes of $X/R$. The approximation is free of rough uncertainty if $\underline{R}(C) = \overline{R}(C)$. Thus, when all the patterns from an equivalence class do not carry the same output class label, rough uncertainty is generated as a manifestation of the one-to-many relationship between that equivalence class and the output class labels.
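The lower and upper approximations of Eqs. (3) and (4) can be illustrated with a small sketch. It assumes, purely for illustration, that the equivalence relation is given by a key function (two elements are equivalent when their keys are equal); the name `rough_approximations` and this way of specifying the relation are not from the paper.

```python
def rough_approximations(universe, key, target):
    """Return (lower, upper) approximations of `target` under the
    equivalence relation 'a ~ b iff key(a) == key(b)'."""
    # Build the equivalence classes X/R.
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)

    target = set(target)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:   # class contained in C, Eq. (3)
            lower |= eq_class
        if eq_class & target:    # class overlaps C, Eq. (4)
            upper |= eq_class
    return lower, upper


# Classes {0, 1}, {2, 3}, {4, 5}; the class {2, 3} only partly lies in C.
lower, upper = rough_approximations(range(6), lambda v: v // 2, {0, 1, 2})
boundary = upper - lower
```

In this toy case the boundary is the class {2, 3}: its members are indistinguishable under the relation, yet only one of them belongs to C, which is the one-to-many situation described above.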

2.3. Fuzzy-rough sets

A rough-fuzzy set (Dubois and Prade, 1990, 1992) is a generalization of the rough set in the sense that here the output class is fuzzy (Zadeh, 1965). Let $X$ be a set, $R$ be an equivalence relation defined on $X$, and the output class $C \subseteq X$ be a fuzzy set. The rough-fuzzy set is a tuple $\langle \underline{R}(C), \overline{R}(C)\rangle$, where the lower approximation $\underline{R}(C)$ and the upper approximation $\overline{R}(C)$ are fuzzy sets of $X/R$, with membership functions defined by Dubois and Prade (1990, 1992)

$$\mu_{\underline{R}(C)}([x]_R) = \inf\{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X \tag{5}$$

and

$$\mu_{\overline{R}(C)}([x]_R) = \sup\{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X. \tag{6}$$

Here, $\mu_{\underline{R}(C)}([x]_R)$ and $\mu_{\overline{R}(C)}([x]_R)$ are the membership values of $[x]_R$ in $\underline{R}(C)$ and $\overline{R}(C)$, respectively.

A fuzzy-rough set is a further generalization of the rough-fuzzy set. When the equivalence classes are not crisp, they are in the form of fuzzy clusters $F_1, F_2, \ldots, F_H$ generated by a fuzzy weak partition (Dubois and Prade, 1990, 1992) of the input set $X$. Here, $H$ is the number of clusters. The term fuzzy weak partition means that each $F_j$ is a normal fuzzy set, i.e.,

$$\sup_x\{\mu_{F_j}(x)\} = 1 \quad\text{and}\quad \inf_x \max_j\{\mu_{F_j}(x)\} > 0 \tag{7}$$

while

$$\sup_x \min\{\mu_{F_i}(x), \mu_{F_j}(x)\} < 1 \quad \forall i, j \in \{1, 2, \ldots, H\},\ i \neq j. \tag{8}$$

Here, $\mu_{F_j}(x)$ is the fuzzy membership function of the pattern $x$ in the cluster $F_j$. In addition, the output class $C$ may be fuzzy too. Given a weak fuzzy partition $F_1, F_2, \ldots, F_H$ on $X$, the description of any fuzzy set $C$ by means of the fuzzy partitions in the form of an upper and a lower approximation $\overline{C}$ and $\underline{C}$ is as follows:

$$\mu_{\underline{C}}(F_j) = \inf_x\{\max(1 - \mu_{F_j}(x), \mu_C(x))\} \quad \forall x \tag{9}$$

$$\mu_{\overline{C}}(F_j) = \sup_x\{\min(\mu_{F_j}(x), \mu_C(x))\} \quad \forall x. \tag{10}$$

The tuple $\langle \underline{C}, \overline{C}\rangle$ is called a fuzzy-rough set. Here, $\mu_C(x) \in [0, 1]$ is the fuzzy membership of the input $x$ to the class $C$. Fuzzy-roughness appears when a fuzzy cluster contains patterns that belong to different classes.

3. Proposed method

3.1. Sources of uncertainty

We can identify the following two uncertainties that can influence the ruggedness of the time series:

Fuzzy uncertainty: It may arise due to the following two factors:

1. How similar any two data points are along the time axis: The similarity decreases as the distance between the data points increases along the time axis. This similarity can be quantified in the form of fuzzy membership functions.
2. How similar the data points are along the ordinate: This similarity can also be quantified in the form of fuzzy membership functions.

Rough uncertainty: It may appear due to the following reason:

Lack of features makes two originally dissimilar points neighbors: When the spatial representations of all the neighbors along the time axis are similar, it is expected that the corresponding ordinate values should also be similar. Due to incomplete knowledge about the process generating the data, the input representation is generally not perfect. As a result, $\langle x_i, y_i\rangle$ and its neighbors appear similar based on the time information, although they may not be similar when other features are added. This makes the input–output relationship one-to-many, and rough uncertainty appears. In some contexts, these kinds of data points are called noisy.

3.2. Measures of fuzzy-roughness

Let us assume that along the time axis, the neighborhood region of each data point (say $\langle x_i, y_i\rangle$, where $x_i \in X$, $y_i \in Y$) is crisp (called $W$). Typically, the neighborhood region is an interval in which $x_i$ lies. If all the neighbors have the same magnitude, then there is no roughness in the neighborhood. However, if any neighbor has an ordinate different from $y_i$, then rough uncertainty arises in $W$. Although the neighbors are similar from the feature perspective, they are not similar from the magnitude perspective. This makes the input–output relationship one-to-many. This uncertainty can be captured using the rough ownership function $r : X \times Y \to [0, 1]$. The rough ownership function for the data point $\langle x_i, y_i\rangle$ with the neighborhood $W$ is defined by Sarkar (2002) as

$$r_{i,W} = \frac{\|W \cap S\|}{\|W\|}, \tag{11}$$

where $S$ is the set of data points in $W$ with magnitude $y_i$, and $\|W\|$ denotes the cardinality of the set $W$. If all the neighbors have magnitude $y_i$, then $r_{i,W}$ is equal to one, indicating that the neighborhood is smooth. In contrast, if $r_{i,W}$ is equal to zero, then possibly $\langle x_i, y_i\rangle$ is an outlier.
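A minimal sketch of the crisp rough ownership function of Eq. (11) follows. Two points are assumptions made for the illustration rather than details stated in the paper: the neighborhood W is taken as a symmetric window of half-width `half_width` around x[i], and the point i itself is not counted as its own neighbor (so that a value of zero is possible for an outlier). The function name is also hypothetical.

```python
def rough_ownership(x, y, i, half_width):
    """Eq. (11): fraction of the neighbors of point i (inside a crisp
    window around x[i]) whose magnitude equals y[i]."""
    # W: neighbors whose abscissa lies within half_width of x[i]; i is excluded (assumption).
    neighborhood = [j for j in range(len(x))
                    if j != i and abs(x[j] - x[i]) <= half_width]
    if not neighborhood:
        raise ValueError("empty neighborhood; widen the window")
    # W intersect S: neighbors with exactly the same magnitude as point i.
    same = [j for j in neighborhood if y[j] == y[i]]
    return len(same) / len(neighborhood)


x = [0, 1, 2, 3, 4]
y = [5, 5, 5, 9, 5]
smooth = rough_ownership(x, y, 0, 2)    # both neighbors share the magnitude 5
mixed = rough_ownership(x, y, 2, 1)     # one neighbor agrees, one does not
outlier = rough_ownership(x, y, 3, 1)   # no neighbor shares the magnitude 9
```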

The confusion is maximum when half of the neighbors have the magnitude $y_i$, and the remaining half of the neighbors have different magnitudes. Thus $r_{i,W} = 0.5$ indicates the maximum roughness. Note that a similar kind of formulation is also used in the literature to measure rough inclusion (Polkowski and Skowron, 1996).

Next we make the situation more complex, but closer to reality. So far we have assumed that each neighbor resides in the structure $W$ equally and completely. Now every training pattern belongs to $W$ with a different degree, i.e., a training pattern closer to $\langle x_i, y_i\rangle$ along the time axis belongs to the neighborhood $W$ to a high degree, and a training pattern far away from $\langle x_i, y_i\rangle$ supports the neighborhood by a negligible amount. Therefore, $W$ spans all the training patterns. The similarity along the time axis can be measured using the value of a Gaussian at the point $\langle x_i, y_i\rangle$. One possible way to define the Gaussian is

$$\mu_x(i, j, \delta_l) = \exp\left(-\frac{(x_i - x_j)^2}{2\delta_l^2}\right), \tag{12}$$

where $\delta_l$ is the width of the Gaussian. Hence the amount of the total roughness at the point $\langle x_i, y_i\rangle$ is

$$s_{i,\delta_l} = \frac{1}{n-1}\sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\,\mu_S(j), \tag{13}$$

where $\mu_S(j) = 1$ if $y_j = y_i$, otherwise $\mu_S(j) = 0$.

We can still fine-tune Eq. (13). So far we have considered only the neighbors that have the same magnitude. We relax this to ask whether the magnitudes of the neighbors are similar or not, i.e., we replace the characteristic function $\mu_S$ by the fuzzy membership function $\mu_y$. Specifically,

$$\mu_y(i, j, \sigma_{y,i,\delta_l}) = \exp\left(-\frac{(y_i - y_j)^2}{2\sigma_{y,i,\delta_l}^2}\right) \tag{14}$$

represents the fuzzy similarity between $\langle x_i, y_i\rangle$ and $\langle x_j, y_j\rangle$ along the ordinate, where

$$\sigma_{y,i,\delta_l} = \sqrt{\frac{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)\,(y_i - y_j)^2}{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)}} \tag{15}$$

indicates the spread of the Gaussian along the ordinate and around the ith data point $\langle x_i, y_i\rangle$. Thus, we incorporate the concept of fuzzy similarity in the rough ownership function to obtain the following fuzzy-rough ownership function:

$$\iota_{i,\delta_l} = \frac{1}{n-1}\sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\,\mu_y(i, j, \sigma_{y,i,\delta_l}). \tag{16}$$

Note that when $\sigma_{y,i,\delta_l} = 0$, Eq. (14) becomes undefined. To avoid this, in this case $\sigma_{y,i,\delta_l}$ is set equal to one. For an absolutely smooth and horizontal time series, the fuzzy-rough ownership value at each data point would be close to one. In contrast, around a sudden jump or around a discontinuity, the fuzzy-rough ownership value would be close to zero. The confusion about the smoothness of the time series is high when the fuzzy-rough ownership value is in between zero and one.

3.3. Scaling of fuzzy-roughness

The ruggedness in terms of fuzzy-roughness at the data point $\langle x_i, y_i\rangle$ is quantified by $\iota_{i,\delta_l}$, and this value varies from point to point. We would like to find the average fuzzy-roughness at any data point. Then we would examine at what rate the average fuzzy-roughness changes when the resolution of the time series is changed. We define a term called the partition function, which measures the average fuzzy-roughness at any data point of the time series. It is

$$I_{FR}(\delta_l) = \frac{1}{n}\sum_{i=1}^{n} \iota_{i,\delta_l}. \tag{17}$$
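Eqs. (12) and (14)-(17) can be combined into a short vectorized sketch. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, all pairwise differences are held in memory at once (quadratic in the number of points), and a zero spread is replaced by one as the text prescribes.

```python
import numpy as np


def fuzzy_rough_ownership(x, y, delta):
    """Fuzzy-rough ownership value of every data point for Gaussian width `delta`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x[:, None] - x[None, :]          # x_i - x_j for all pairs
    dy = y[:, None] - y[None, :]          # y_i - y_j for all pairs

    mu_x = np.exp(-dx**2 / (2.0 * delta**2))                          # Eq. (12)
    sigma = np.sqrt((mu_x * dy**2).sum(axis=1) / mu_x.sum(axis=1))    # Eq. (15)
    sigma[sigma == 0.0] = 1.0             # avoid the undefined case of Eq. (14)
    mu_y = np.exp(-dy**2 / (2.0 * sigma[:, None]**2))                 # Eq. (14)

    product = mu_x * mu_y
    np.fill_diagonal(product, 0.0)        # the sum in Eq. (16) excludes j == i
    return product.sum(axis=1) / (n - 1)                              # Eq. (16)


def partition_function(x, y, delta):
    """Average fuzzy-rough ownership over all data points, Eq. (17)."""
    return fuzzy_rough_ownership(x, y, delta).mean()
```

For a series of roughly two thousand points, as in the ICU data discussed later, the pairwise arrays hold a few million entries, which is manageable; a longer series would need a blockwise variant.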

We next investigate at what rate the average fuzzy-roughness scales down/up when the resolution of the time series is increased/decreased. We are particularly interested to know whether any power law relationship $\left(I_{FR}(\delta_l) \propto \delta_l^{D_{FR}}\right)$ holds. If there is any particular power law, then it can be calculated from

$$D_{FR} = \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{18}$$

Ideally, $D_{FR}$ should be measured when the lengths of the neighborhood intervals are very small, i.e., when $\delta_l \to 0$. Thus,

$$D_{FR} = \lim_{\delta_l \to 0} \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{19}$$

In reality, at the limits of the resolution, $D_{FR}$ does not remain constant. Moreover, $D_{FR}$ fluctuates due to noise and outliers. Therefore, a more refined estimate can be obtained from the slope of the best-fit line that passes through $\langle \log(\delta_l), \log(I_{FR}(\delta_l))\rangle$ $\forall l = 1, 2, \ldots, L$, where $L$ is an integer such that the Gaussian with the width $\delta_L$ is sufficient to cover the time series with a high degree. If we cannot fit a straight line due to the randomness of the data, then most likely the power law does not hold for the time series, and in that case, the fractal dimension cannot characterize the time series. Thus, using the power law, the fractal dimension $D_{FR}$ measures how the fuzzy-roughness scales down/up when the resolution is increased/decreased. The proposed fractal dimension acts as a feature that reflects the self-similar property of the time series. The algorithm is shown in Fig. 2.

Fig. 2. The algorithm to compute the fuzzy-rough set-based fractal dimension $D_{FR}$. Note that we need to compute $\mu_x(i, j, \delta_l)$ for all $i$ and $j$ before computing $\mu_y(i, j, \sigma_{y,i,\delta_l})$.

3.4. Salient aspects of the proposed method

The use of Gaussians enables us to interpret the basic philosophy of the fractal in a different manner. The Hurst exponent is proposed presuming that there exists a set of generators. It is assumed that the part and the whole time series are constructed by the same generator (Mandelbrot, 1983). However, the part and the whole time series look different since in these two cases the occurrences of the generators are different. The Hurst exponent aims to capture how the probability of occurrence changes when the resolution is changed. In reality the number of generators may be infinite, so we may have no way of knowing how many of them are present or what they are, and hence it is difficult to estimate their occurrences. Instead of trying to know the generators, it is more attractive to observe the pattern formed in a local region, and how the fuzzy-roughness evolves when the resolution is increased. The proposed dimension follows this direction. The advantages of the proposed method are as follows:

1. The fractal nature of the time series is viewed as how the fuzzy-roughness changes with decreasing resolution. Unlike the Hurst exponent, the proposed method does not need to assume any particular generator or the occurrence of the generator.
2. A change in the length of the interval does not change the fractal dimension abruptly.
3. When the dimension is computed from the slope of the best-fit line, we do not face any stair-casing effect.
4. The Hurst exponent can be derived from the proposed dimension.
5. Depending on the domain, other results of fuzzy set theory can be incorporated into the proposed dimension.
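The slope-based estimate of $D_{FR}$ described in Section 3.3 amounts to an ordinary least-squares line fit in log-log coordinates. The sketch below assumes the partition function has already been evaluated at a set of Gaussian widths; the function name and the synthetic power-law data used to exercise it are illustrative assumptions, not values from the paper.

```python
import numpy as np


def fractal_dimension_from_scaling(deltas, ifr_values):
    """Slope and intercept of the best-fit line through
    (log(delta_l), log(I_FR(delta_l))), l = 1, ..., L."""
    log_delta = np.log(np.asarray(deltas, dtype=float))
    log_ifr = np.log(np.asarray(ifr_values, dtype=float))
    slope, intercept = np.polyfit(log_delta, log_ifr, 1)
    return slope, intercept


# Synthetic check: an exact power law I = 0.3 * delta**0.9 (made-up numbers).
deltas = np.geomspace(0.5, 50.0, 20)
d_fr, c = fractal_dimension_from_scaling(deltas, 0.3 * deltas**0.9)
```

On exact power-law input the fitted slope recovers the exponent; on real data, how closely the points follow the fitted line indicates whether the power law holds at all.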

3.5. Proposed technique and generalized fuzzy-rough sets

The rough set framework proposed by Pawlak (1982) is based on the equivalence relation. In other words, this rough set framework relies on the reflexivity, symmetry and transitivity properties of a relation. Various modifications of this framework have been proposed by relaxing some of those three properties (Inuiguchi et al., 2003; Intan and Mukaidono, 2002). For example, Slowinski and Vanderpooten (2000) propose a tolerance relation, where reflexivity and symmetry should hold, but transitivity may or may not hold. Using the tolerance relation, the concept of rough sets has been generalized. All three properties (reflexivity, symmetry and transitivity) have been relaxed by Wu et al. (2004), and the concept of rough set is generalized to any kind of binary relation. Let $U$ and $W$ be two finite universes. Suppose that $R$ is an arbitrary relation from $U$ to $W$. We can define a set-valued function $f : U \to \mathcal{P}(W)$ by

$$f(x) = \{y \in W : (x, y) \in R\}, \quad x \in U. \tag{20}$$

Obviously, any set-valued function from $U$ to $W$ defines a binary relation from $U$ to $W$ by setting $R = \{(x, y) \in U \times W : y \in f(x)\}$. For any set $C \subseteq W$, a pair of lower and upper approximations can be defined as

$$\underline{R}(C) = \{x \in U : f(x) \subseteq C\}, \tag{21}$$

$$\overline{R}(C) = \{x \in U : f(x) \cap C \neq \emptyset\}. \tag{22}$$
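Eqs. (20)-(22) translate directly into set operations. The following is a small illustrative sketch in which the relation is passed as a set of pairs; the function name and the toy relation are assumptions made for the example.

```python
def generalized_approximations(relation, universe_u, target):
    """Lower and upper approximations of `target` (a subset of W) under an
    arbitrary binary relation given as a set of (x, y) pairs."""
    target = set(target)
    # Eq. (20): f(x) = {y in W : (x, y) in R}
    f = {x: {y for (a, y) in relation if a == x} for x in universe_u}
    lower = {x for x in universe_u if f[x] <= target}   # Eq. (21)
    upper = {x for x in universe_u if f[x] & target}    # Eq. (22)
    return lower, upper


relation = {(1, "a"), (1, "b"), (2, "b"), (2, "c")}
lower, upper = generalized_approximations(relation, {1, 2, 3}, {"a", "b"})
```

Note that an element with an empty image under the relation (element 3 in the toy data) satisfies Eq. (21) vacuously but not Eq. (22), so without further conditions on the relation the lower approximation need not be contained in the upper one.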

$\mathcal{R}(C) = \langle \underline{R}(C), \overline{R}(C)\rangle$ is referred to as a generalized rough set. The above definition of rough sets can be used to generalize fuzzy-rough sets. Let $R$ be an arbitrary fuzzy relation from $U$ to $W$. Define the mapping $\phi : U \to \mathcal{P}(W)$ by

$$\phi(x, y) = R(x, y), \quad (x, y) \in U \times W. \tag{23}$$

For any $\alpha \in [0, 1]$, $\phi_\alpha : U \to \mathcal{P}(W)$ is defined as

$$\phi_\alpha(x) = \{y \in W : \phi(x, y) \geq \alpha\}, \quad x \in U. \tag{24}$$

For a given $\alpha \in [0, 1]$ and any set $C \subseteq W$, a pair of lower and upper approximations can be defined as

$$\underline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \subseteq C\}, \tag{25}$$

$$\overline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \cap C \neq \emptyset\}. \tag{26}$$

$\langle \underline{\phi}(C), \overline{\phi}(C)\rangle$ is called a generalized fuzzy-rough set, where

$$\underline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left(\alpha \wedge \underline{\phi}_{1-\alpha}(C_{\alpha+})\right), \tag{27}$$

$$\overline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left(\alpha \wedge \overline{\phi}_{\alpha}(C_{\alpha})\right). \tag{28}$$

Here $C_\alpha$ and $C_{\alpha+}$ denote the $\alpha$-cut and the strong $\alpha$-cut of the fuzzy set $C$, respectively. Note that the fuzzy-rough ownership function (Eq. (16)) can be used in the case of generalized fuzzy-rough sets (Eqs. (27) and (28)). Consequently, the computation of the fuzzy-rough set based fractal dimension remains the same when generalized fuzzy-rough sets are used.

4. Results and discussions

The proposed method was tested on different benchmark ICU data sets available in the UCI repository of machine learning databases (Blake and Merz, 1998). In the data set shown in Fig. 1, the ordinate represents systolic arterial pressure in mm Hg, and the abscissa indicates a twelve-hour duration during the ICU treatment of an adult respiratory distress syndrome (ARDS) patient under mechanical ventilation. The blood pressure was monitored continuously, approximately once every twelve seconds. Other dependent parameters like mean airways pressure and tidal volume were recorded occasionally. Each data point is numeric, and the total number of data points is 1985. The collected data set is in compressed form; here values for the continuously monitored parameters remain steady between consecutive recorded measurements, and thus a lack of recorded measurements should not be interpreted as a lack of data. We removed six observations because of the existence of inconsistent data. The remaining 1979 data points were used for the experiments.

We have compared the errors in estimating the proposed dimension and the Hurst exponent. Using Eq. (17), we have calculated $\log(I_{FR}(\delta_l))$ and $\log(\delta_l)$ 100 times (i.e., $L = 100$) for different values of $\delta_l$. The minimum and maximum values of $\delta$ are half of the minimum and maximum distances between any two data points. Assuming q = 1, $\log(I_{FR}(\delta_l))$ is plotted against $\log(\delta_l)$ (Fig. 3). After fitting the best-fit line through these points, we have obtained $D_{FR}$ as 0.9010. The smoothness of the plot around the best-fit line indicates that the time series follows the power law of fractal theory. By plotting $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for different values of $l$, we have obtained the Hurst exponent as 0.5487 (Fig. 4). We can observe that the plot for the Hurst exponent is not as smooth as that of the proposed dimension. This kind of stair-casing effect occurs due to the presence of the crisp boundary in the calculation of the Hurst exponent.

Fig. 3. The plot to determine the proposed fractal dimension for the data set shown in Fig. 1. Here $\delta$ is the width of the Gaussian, and $I_{FR}$ is the partition function of the proposed dimension.

Fig. 4. The plot to determine the Hurst exponent for the data set shown in Fig. 1. Here $\delta$ is the length of the interval, and $I_{R/S}$ is the partition function of the Hurst exponent computed using the R/S method.

To compare the Hurst exponent and the proposed dimension, the error of fit of the proposed dimension is expressed as

$$E = \sum_{l=1}^{L} \left[m \log(\delta_l) + c - \log(I_{FR}(\delta_l))\right]^2, \tag{29}$$

where $m$ is the slope and $c$ is the intercept of the straight line along the ordinate. The error provides a measure of fit: the lower the value of $E$, the better the fit. We have found $E$ for the proposed dimension to be 0.774. Similarly, $E$ for the Hurst exponent is calculated by substituting $\log(I_{R/S}(\delta_l))$ for $\log(I_{FR}(\delta_l))$ in Eq. (29). This results in $E = 1.4667$, which is considerably higher than $E$ of the proposed dimension.

To find how well $D_{FR}$ characterizes and discriminates the time series, we have considered a set of ICU time series from the same patient for the systolic and diastolic blood pressures. The number of data points in each time series varies between 1600 and 2000. We have used a training set consisting of 50 time series of systolic blood pressure and another 50 time series of diastolic blood pressure. Using the Hurst exponent as the sole characteristic of the time series, we have separately found the means of the Hurst exponent for the groups of time series with systolic and diastolic blood pressures. Using these two means, we have classified a set of 100 time series into the systolic and diastolic groups. The result gave a 55.23% classification rate. We repeated the same experiment with $D_{FR}$ as the characteristic of the time series. We observed that the classification rate increased to 62.45%. This experiment shows that the characterization capability of $D_{FR}$ is higher than that of the Hurst exponent.

The above experiment was carried out on data collected from the same subject. Next we collected time series from eleven different ARDS patients. Specifically, we collected three systolic time series and three diastolic time series from each subject. Thus we had 33 time series corresponding to systolic (and diastolic) data. The number of data points in each such time series was between 1600 and 2000. The aim was to find whether $D_{FR}$ computed from these data can be used to discriminate systolic and diastolic data from new patients. We found that the classification rate was 60.03%, which was slightly lower than what we observed on the data collected from a single patient. This shows that the proposed measure retains its discrimination capability even when it is used with different time series collected from different patients.

Acknowledgments

A strategic research Grant RP960351 from the National Science and Technology Board and the Ministry of Education, Singapore, has supported the work of this paper. The encouragement of Prof. Tze-Yun Leong, National University of Singapore, is highly acknowledged.

References

Addison, P.A., 1997. Fractals and Chaos: An Illustrated Course. Institute of Physics Publishing, London.
Blake, C.L., Merz, C.J., 1998. UCI repository of machine learning databases. Available from: .
Dubois, D., Prade, H., 1990. Rough-fuzzy sets and fuzzy-rough sets. Int. J. Gen. Syst. 17 (2–3), 191–209.
Dubois, D., Prade, H., 1992. Putting rough sets and fuzzy sets together. In: Slowinski, R. (Ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht, pp. 1–2.
Hurst, H.E., 1951. Trans. Am. Soc. Civil Eng. 116, 770.
Hurst, H.E., Black, R., Sinaika, Y.M., 1965. Long-Term Storage in Reservoirs: An experimental Study. Constable, London.
Intan, R., Mukaidono, M., 2002. Generalized fuzzy-rough set by conditional probability relations. Int. J. Pattern Recogn. Artif. Intell. 16 (7), 865–881.
Inuiguchi, M., Greco, S., Slowinski, R., Tanino, T., 2003. Possibility and necessity measure specification using modifiers for decision making under fuzziness. Fuzzy Set. Syst. 137, 151–175.
Klir, G.S., Yuan, B., 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Englewood Cliffs, NJ.
Mandelbrot, B., 1983. The Fractal Geometry of Nature. W.H. Freeman Company, New York.
Morik, K., Imboff, M., Brockhausen, P., Joachims, T., Gather, U., 2000. Knowledge discovery and knowledge validation in intensive care. Artif. Intell. Med. 19 (3), 225–249.
Pawlak, Z., 1982. Rough sets. Int. J. Comp. Inform. Sci. 11, 341–356.
Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers.
Pawlak, Z., Grzymala-Busse, J.W., Slowinski, R., Ziarko, W., 1995. Rough Sets. Comm. ACM 38 (11), 89–95.
Polkowski, L., 2002. On fractal dimension in information systems. Toward exact sets in infinite information systems. Fundam. Inform. 50 (3–4), 305–314.
Polkowski, L., Skowron, A., 1996. Rough mereology: A new paradigm for approximate reasoning. Int. J. Approx. Reason. 15 (4), 333–365.
Sarkar, M., 2002. Rough-fuzzy functions in classification. Fuzzy Set. Syst. 132 (3), 353–369.
Slowinski, R., Vanderpooten, D., 2000. A generalized definition of rough approximations based on similarity. IEEE Trans. Knowl. Data Eng. 12 (2), 331–336.
Vicsek, T., 1992. Fractal Growth Phenomenon. World Scientific, New Jersey.
Wu, W.Z., Mi, J.S., Zhang, W.X., 2004. Generalized fuzzy rough sets. Inform. Sci. 160 (1–4), 235–249.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Control, 338–353.
