Pattern Recognition Letters 27 (2006) 447–454 www.elsevier.com/locate/patrec
Ruggedness measures of medical time series using fuzzy-rough sets and fractals Manish Sarkar
*
Department of Computer Science and Engineering, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT 06269-3155, USA Received 3 April 2004; received in revised form 21 January 2005 Available online 2 November 2005 Communicated by T.K. Ho
Abstract This paper attempts to characterize a medical time series by quantifying its ruggedness. The presence of two close data points on the time axis implies that these points are similar along the time axis, which creates fuzzy similarity. Following the principle “similar causes create similar effects”, we expect the magnitudes corresponding to those two data points to be similar as well. Frequently, this is not observed in a time series. One reason is that if other features had been considered along with the time information, those two close data points would have looked different. Consequently, the magnitudes corresponding to those two apparently similar points differ. Therefore, if we consider closeness along the time axis as a cause, then the effect, i.e., the corresponding magnitudes, could be the same, similar, or completely dissimilar. This phenomenon makes the cause–effect relationship, i.e., the time versus magnitude relationship, one-to-many. Specifically, the closeness creates fuzziness, the one-to-many relationship creates roughness, and together they form fuzzy-roughness. If the ruggedness is expressed as the fuzzy-roughness, then in some time series the fuzzy-roughness of a part of the time series is observed to be similar to that of the whole time series. Specifically, the scaling up of the fuzzy-roughness follows the power law of fractal theory. Experiments on ICU data sets show that the ruggedness measure using the fuzzy-rough set based fractal dimension is more robust than some popular measures of ruggedness such as the Hurst exponent. © 2005 Elsevier B.V. All rights reserved. Keywords: Characterization; Time series; Fuzzy; Rough; Fuzzy-rough; Hurst exponent and fractal
* Tel.: +1 860 486 2584; fax: +1 860 486 4817. E-mail address: [email protected]
0167-8655/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2005.09.007
1. Introduction 1.1. Motivation In intensive care units (ICUs), different physiological monitors generate a large number of time-stamped data or time series (Fig. 1). By comparing the time series patterns of the patient with those of a healthy person, physicians can identify the state in which the patient currently is (Morik et al., 2000). The mechanism that human beings use to find the similarity between any two time series is a
complex perceptual activity. Although the interpretation of similarity is essentially subjective, we notice that measuring the degree of similarity requires (a) a proper representation or characterization of the time series such that the characterization is not influenced by translation, scaling, noise, outliers and nonstationarity, and (b) a suitable measure of similarity that uses the characterization. In this paper, we focus on the characterization aspect of time series data. To characterize medical time series, measures of dispersion, Fourier transforms, abstractions, etc., are employed (Addison, 1997). None of these measures are in general sufficient. For instance, the measure of dispersion such as the standard deviation in the time domain increases with increasing interval. The Fourier transform assumes that
M. Sarkar / Pattern Recognition Letters 27 (2006) 447–454
1.3. Scope In this paper, the fractal dimension is investigated in the context of ICU data sets (Fig. 1). In this time series, the blood pressure of a patient is collected at almost uniform intervals. For the sake of simplicity, in the rest of the paper we are assuming that the Euclidean dimension of the time series is two. However, the results reported in this paper hold for higher dimensions as well. 1.4. Existing works
Fig. 1. Data collected from intensive care units (ICUs). The abscissa and ordinate indicate the time in minutes and the systolic blood pressure in mm Hg for a particular patient.
the time series is stationary, which may not be true in many cases. Due to these drawbacks, often more than one measure is applied to characterize the time series. All these measures attempt to extract some regularity from the apparent randomness inherent in the time series. In some time series, it has been noticed that if the ruggedness or irregularity of a part of the time series is scaled up, then the resultant ruggedness becomes similar to the ruggedness or irregularity of the whole time series. This clue can act as the regularity in the time series, and thus can be used to characterize the time series. Hurst exponent (Addison, 1997; Hurst, 1951; Hurst et al., 1965; Vicsek, 1992) intends to quantify this clue such that the quantified values are relatively insensitive to translation, scaling, noise and nonstationarity.
Although there are various methods to estimate the Hurst exponent, the principle remains the same. Initially, the time axis of the time series is partitioned into equal intervals. A statistic such as the mean value is computed for each interval, and then it is observed how much this statistic varies across all the intervals. This variation quantifies the ruggedness of the time series for that interval width. Next, the length of the interval is changed, and the above procedure is repeated to estimate the ruggedness measure again. The difference between these two values of the ruggedness measure indicates the rate at which the ruggedness measure varies when the resolution of the time series changes. If this rate follows a power law, then it becomes an estimate of the Hurst exponent. The following problems are encountered while estimating the Hurst exponent by covering the data points with intervals:
1. The boundary of each interval is rigid. A data point just outside the boundary is not considered a member of the interval.
2. The contributions of all the data points inside an interval are treated equally.
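As a rough illustration of the generic procedure just described, the following Python sketch partitions a uniformly sampled series into equal blocks, uses the block mean as the interval statistic and the standard deviation of the block means as the ruggedness value, and then compares two interval widths on a log-log scale. The function names, the choice of the standard deviation, and the two-width rate are assumptions made for illustration and are not taken from the paper; the result is a scaling rate, not a calibrated Hurst exponent.

```python
import math


def interval_means(y, width):
    """Mean of the ordinates in each consecutive block of `width` samples.

    Samples left over after the last full block are ignored (an assumption).
    """
    return [sum(y[start:start + width]) / width
            for start in range(0, len(y) - width + 1, width)]


def ruggedness(y, width):
    """Standard deviation of the block means for one interval width."""
    means = interval_means(y, width)
    centre = sum(means) / len(means)
    return math.sqrt(sum((m - centre) ** 2 for m in means) / len(means))


def scaling_rate(y, width_a, width_b):
    """Rate of change of log-ruggedness with log-width between two widths."""
    r_a = ruggedness(y, width_a)
    r_b = ruggedness(y, width_b)
    return (math.log(r_b) - math.log(r_a)) / (math.log(width_b) - math.log(width_a))
```

Because the blocks have hard boundaries and every sample inside a block counts equally, shifting the block origin by one sample can change the block means, which is the brittleness that the two listed problems point to.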
1.2. Objective While calculating the Hurst exponent, the similarity between a part and the whole time series is interpreted as the statistical self-similarity. It is assumed that the repeated occurrences of a (or a finite set of) particular pattern (also known as generator) create a part and the whole time series. It is assumed that the part and the whole time series look different because the occurrences of the generators are different. In real data, different generators may generate the part and the whole time series, and the number of generators may be infinite. Hence, it is more realistic to assume that the part and the whole time series look similar when similar generators generate them, not necessarily by the same generator with different occurrences. In this spirit, this paper proposes a fractal dimension (Polkowski, 2002; Vicsek, 1992) that quantifies the regularity of the time series by finding how the fuzzy-roughness scale changes when the time series is observed for longer intervals. For the time series, the proposed dimension may act as a characteristic, which is relatively insensitive to translation, scaling, noise and nonstationarity.
These two factors make the Hurst exponent brittle: the measured value of the exponent varies significantly with a slight change in the position of a data point or of an interval. 1.5. Proposed method While determining the Hurst exponent, the aim is to find how the roughness of the time series scales up/down with the decrement/increment of the length of each interval. All the points covered by an interval are treated equally, and a data point just outside the interval is not considered. In other words, all the data points inside the interval are considered the same, and all the data points outside the interval are considered different. This makes the concept of similarity hard, or binary. If the position of the interval is changed slightly, then some data points that were the same become different and vice versa, and thus the measured exponent changes significantly. This problem is reduced if the interval is replaced by a function, for example a Gaussian, where the boundary is not sharp,
but rather fuzzy. This is because all the points (say $\langle x_i, y_i \rangle$) of the time series are treated as similar, to varying degrees, to the point $\langle x_j, y_j \rangle$ around which the Gaussian is built. If the width of the Gaussian is changed, the similarity does not change abruptly. The closer $\langle x_i, y_i \rangle$ and $\langle x_j, y_j \rangle$ are, the greater the similarity. The proposed method exploits this idea to derive a fractal dimension that can be used to quantify the ruggedness of a time series. This fractal dimension can also be used as a feature when comparing two or more time series. 2. Backgrounds of Hurst exponent and rough uncertainty 2.1. Hurst exponent Usual methods of estimating Hurst exponents can be structured into the following three steps:
• Sequence of partitions: The whole time axis is partitioned into equal intervals. Each interval isolates some scale of observations.
• Single scale statistics: It is computed using the following two statistics:
  – Local statistics: A statistic based on the values of the ordinates within a single interval is extracted. For example, the local statistics in the $i$th interval can be the mean of the ordinates of all the data points whose abscissa values lie in the $i$th interval.
  – Partition-based statistics: A measure or measures summarizing the local statistics from all the intervals. For instance, it can be the variance of the means over the intervals.
• Transscale statistics: Estimates of the Hurst exponent are derived from the partition-based statistics over a range of interval lengths. Generally, the transscale statistics is the ratio of the logarithm of the partition-based statistics and the logarithm of the length of the interval.
The partition-based statistics $I_{R/S}(\delta_l)$ is the arithmetic mean of $R(i, \delta_l)/S(i, \delta_l)$ over all the intervals. The whole process is repeated for several interval lengths $\delta_l$. The transscale statistics is the slope of the linear regression that fits a plot of $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for all $l$. Intuitively, $R(i, \delta_l)$ measures the variation of the cumulative value of the ordinate (i.e., $W$) in the $i$th interval. Instead of the ordinate ($y$), this method uses $W$ because the effect of noise and outliers is usually smaller in $W$. If we compute the average of $R$ across all the intervals of length $\delta_l$, then the average value would provide a measure of roughness. However, if the $i$th interval has many jumps, $R(i, \delta_l)$ would be large, and its large value would dominate the calculation of the average. Consequently, the computed average value would be very different from the actual average value. To reduce this problem, $R(i, \delta_l)$ is normalized using $S(i, \delta_l)$, which represents the spread of the ordinate values within the $i$th partition. Dispersional analysis: It is similar to the R/S method, with the following differences:
• the local statistics is the mean of the ordinate values in each interval,
• the partition-based statistics is the standard deviation of the means, and
• the transscale statistics is 1.0 plus the slope of the linear regression obtained from the fit of log(partition-based statistics) vs. $\log(\delta_l)$.
The dispersional analysis is, in a sense, the opposite of the R/S method. In each interval, the R/S method determines the amount of variation of the cumulative values of the ordinate. Then it computes how large this variation is on average across all the intervals. In contrast, the dispersional analysis finds the average value of the ordinate in each interval, and then it computes how much this average value differs when all the intervals are considered. 2.2. Rough sets
Two popular approaches to estimate the Hurst exponent are as follows: Rescaled range or R/S method: It partitions the time series $\{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_n, y_n \rangle\}$ into equal intervals, each of length $\delta_l$. Let us call the $i$th interval $W(i, \delta_l)$. The local statistics in the interval $W(i, \delta_l)$ is defined as $R(i, \delta_l)/S(i, \delta_l)$, where
$$R(i, \delta_l) = \max_{Y_j}\{\langle x_j, W_j \rangle \mid x_j \in W(i, \delta_l)\} - \min_{Y_j}\{\langle x_j, W_j \rangle \mid x_j \in W(i, \delta_l)\}, \tag{1}$$
$$S(i, \delta_l) = \sqrt{\frac{1}{\|\{\langle x_j, y_j \rangle \mid x_j \in W(i, \delta_l)\}\|} \sum_{\{\langle x_j, y_j \rangle \mid x_j \in W(i, \delta_l)\}} (y_j - \bar{y})^2}, \tag{2}$$
where $W_j = \sum_{k=1}^{j} (y_k - \bar{y})$, $\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k$, and $\|A\|$ indicates the cardinality of the set $A$.
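The local and partition-based statistics of the R/S method can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the series is uniformly sampled so that an interval is a block of `width` consecutive samples, the maximum and minimum in Eq. (1) are taken over the cumulative values $W_j$, leftover samples after the last full block are dropped, and blocks with zero spread are skipped. The function name is hypothetical.

```python
import math


def rs_partition_statistic(y, width):
    """Mean of R(i)/S(i) over consecutive blocks of `width` samples.

    R(i) is the range of the cumulative deviations W_j inside block i (Eq. (1));
    S(i) is the root-mean-square deviation of the ordinates from the overall
    mean inside block i (Eq. (2)). Skipping blocks with S(i) == 0 is an
    assumption of this sketch.
    """
    n = len(y)
    y_bar = sum(y) / n
    w_cum, running = [], 0.0
    for value in y:
        running += value - y_bar
        w_cum.append(running)  # W_j = sum over k <= j of (y_k - y_bar)

    ratios = []
    for start in range(0, n - width + 1, width):
        block = range(start, start + width)
        r = max(w_cum[j] for j in block) - min(w_cum[j] for j in block)
        s = math.sqrt(sum((y[j] - y_bar) ** 2 for j in block) / width)
        if s > 0:
            ratios.append(r / s)
    if not ratios:
        raise ValueError("every block has zero spread")
    return sum(ratios) / len(ratios)
```

Evaluating this function for several widths and regressing the logarithm of the result on the logarithm of the width would give the transscale statistics of the R/S method.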
Let $R$ be an equivalence relation on a universal set $X$. Moreover, let $X/R$ denote the family of all equivalence classes induced on $X$ by $R$ (Klir and Yuan, 1995; Pawlak, 1982, 1991; Pawlak et al., 1995). The equivalence class in $X/R$ that contains $x \in X$ is designated by $[x]_R$. For any output class $C \subseteq X$, we can define the lower $\underline{R}(C)$ and upper $\overline{R}(C)$ approximations, which approach $C$ as closely as possible from the inside and outside, respectively (Pawlak, 1991). Here,
$$\underline{R}(C) = \bigcup \{[x]_R \mid [x]_R \subseteq C \text{ and } x \in X\} \tag{3}$$
is the union of all equivalence classes in $X/R$ that are contained in $C$, and
$$\overline{R}(C) = \bigcup \{[x]_R \mid [x]_R \cap C \neq \emptyset \text{ and } x \in X\} \tag{4}$$
is the union of all equivalence classes in $X/R$ that overlap with $C$. The rough set $R(C) = \langle \underline{R}(C), \overline{R}(C) \rangle$ is a representation
of the given set $C$ by $\underline{R}(C)$ and $\overline{R}(C)$. The set $\overline{R}(C) - \underline{R}(C)$ is a rough description of the boundary of $C$ by the equivalence classes of $X/R$. The approximation is free of rough uncertainty if $\underline{R}(C) = \overline{R}(C)$. Thus, when the patterns from an equivalence class do not all carry the same output class label, rough uncertainty is generated as a manifestation of the one-to-many relationship between that equivalence class and the output class labels.
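The lower and upper approximations of Eqs. (3) and (4) can be illustrated with a few lines of Python. The sketch assumes the equivalence classes are already given as a list of sets; the function name and the toy data are hypothetical.

```python
def rough_approximations(equivalence_classes, target):
    """Lower and upper approximations of `target` (Eqs. (3) and (4)).

    `equivalence_classes` is an iterable of sets that partition the universe;
    `target` is the output class C.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in equivalence_classes:
        block = set(block)
        if block <= target:   # class entirely inside C
            lower |= block
        if block & target:    # class overlaps C
            upper |= block
    return lower, upper


# Toy example: the class {3, 4} straddles the boundary of C = {1, 2, 3}.
lower, upper = rough_approximations([{1, 2}, {3, 4}, {5}], {1, 2, 3})
boundary = upper - lower
```

Here the boundary is non-empty because the patterns 3 and 4 share an equivalence class but not an output label, which is the one-to-many situation described above.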
3. Proposed method
2.3. Fuzzy-rough sets
1. How similar any two data points are along the time axis: The similarity decreases as the distance between the data points increases along the time axis. This similarity can be quantified in the form of fuzzy membership functions. 2. How similar the data points are along the ordinate: This similarity can also be quantified in the form of fuzzy membership functions.
A rough-fuzzy set (Dubois and Prade, 1990, 1992) is a generalization of the rough set in the sense that here the output class is fuzzy (Zadeh, 1965). Let $X$ be a set, $R$ be an equivalence relation defined on $X$, and the output class $C \subseteq X$ be a fuzzy set. The rough-fuzzy set is a tuple $\langle \underline{R}(C), \overline{R}(C) \rangle$, where the lower approximation $\underline{R}(C)$ and the upper approximation $\overline{R}(C)$ are fuzzy sets of $X/R$, with membership functions defined by Dubois and Prade (1990, 1992)
$$\mu_{\underline{R}(C)}([x]_R) = \inf\{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X \tag{5}$$
and
$$\mu_{\overline{R}(C)}([x]_R) = \sup\{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X. \tag{6}$$
Here, $\mu_{\underline{R}(C)}([x]_R)$ and $\mu_{\overline{R}(C)}([x]_R)$ are the membership values of $[x]_R$ in $\underline{R}(C)$ and $\overline{R}(C)$, respectively. A fuzzy-rough set is a further generalization of the rough-fuzzy set. When the equivalence classes are not crisp, they take the form of fuzzy clusters $F_1, F_2, \ldots, F_H$ generated by a fuzzy weak partition (Dubois and Prade, 1990, 1992) of the input set $X$. Here, $H$ is the number of clusters. The term fuzzy weak partition means that each $F_j$ is a normal fuzzy set, i.e.,
$$\sup_{x}\{\mu_{F_j}(x)\} = 1 \quad \text{and} \quad \inf_{x} \max_{j}\{\mu_{F_j}(x)\} > 0, \tag{7}$$
while
$$\sup_{x} \min\{\mu_{F_i}(x), \mu_{F_j}(x)\} < 1 \quad \forall i, j \in \{1, 2, \ldots, H\},\ i \neq j. \tag{8}$$
Here, $\mu_{F_j}(x)$ is the fuzzy membership function of the pattern $x$ in the cluster $F_j$. In addition, the output class $C$ may be fuzzy too. Given a weak fuzzy partition $F_1, F_2, \ldots, F_H$ on $X$, the description of any fuzzy set $C$ by means of the fuzzy partition, in the form of a lower approximation $\underline{C}$ and an upper approximation $\overline{C}$, is as follows:
$$\mu_{\underline{C}}(F_j) = \inf_{x \in X}\{\max(1 - \mu_{F_j}(x), \mu_C(x))\}, \tag{9}$$
$$\mu_{\overline{C}}(F_j) = \sup_{x \in X}\{\min(\mu_{F_j}(x), \mu_C(x))\}. \tag{10}$$
The tuple $\langle \underline{C}, \overline{C} \rangle$ is called a fuzzy-rough set. Here, $\mu_C(x) \in [0, 1]$ is the fuzzy membership of the input $x$ to the class $C$. The fuzzy-roughness appears when a fuzzy cluster contains patterns that belong to different classes.
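For a finite set of patterns, the infimum and supremum in Eqs. (9) and (10) reduce to a minimum and a maximum, so the two approximations of a class with respect to one fuzzy cluster can be sketched as below. The function name and the membership values are hypothetical and chosen only for illustration.

```python
def fuzzy_rough_approximations(cluster_membership, class_membership):
    """Lower and upper membership of a fuzzy cluster F_j in a fuzzy class C.

    Both arguments list one membership value in [0, 1] per pattern x, in the
    same order: mu_{F_j}(x) and mu_C(x). Implements Eqs. (9) and (10) for a
    finite universe.
    """
    pairs = list(zip(cluster_membership, class_membership))
    lower = min(max(1.0 - f, c) for f, c in pairs)
    upper = max(min(f, c) for f, c in pairs)
    return lower, upper


# The second pattern belongs to the cluster to degree 0.5 but to the class
# only to degree 0.2, which pulls the lower approximation down.
lower, upper = fuzzy_rough_approximations([1.0, 0.5, 0.0], [1.0, 0.2, 0.9])
```

When the cluster and the class agree on every pattern, both values equal one and no fuzzy-roughness is present.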
3.1. Sources of uncertainty We can identify the following two uncertainties that can influence the ruggedness of the time series: Fuzzy uncertainty: It may arise due to the following two factors:
Rough uncertainty: It may appear due to the following reason: Lack of features makes two originally dissimilar points neighbors: When the spatial representations of all the neighbors along the time axis are similar, it is expected that the corresponding ordinate values should also be similar. Due to incomplete knowledge about the process generating the data, the input representation is generally not perfect. As a result, $\langle x_i, y_i \rangle$ and its neighbors appear similar based on the time information, although they may not be similar when other features are added. This makes the input–output relationship one-to-many, and the rough uncertainty appears. In some contexts, such data points are called noisy. 3.2. Measures of fuzzy-roughness Let us assume that along the time axis, the neighborhood region of each data point (say $\langle x_i, y_i \rangle$, where $x_i \in X$, $y_i \in Y$) is crisp (called $W$). Typically, the neighborhood region is an interval in which $x_i$ lies. If all the neighbors have the same magnitude, then there is no roughness in the neighborhood. However, if any neighbor has an ordinate different from $y_i$, then rough uncertainty arises in $W$. Although the neighbors are similar from the feature perspective, they are not similar from the magnitude perspective. This makes the input–output relationship one-to-many. This uncertainty can be captured using the rough ownership function $r : X \times Y \to [0, 1]$. The rough ownership function for the data point $\langle x_i, y_i \rangle$ with the neighborhood $W$ is defined by Sarkar (2002) as
$$r_{i,W} = \frac{\|W \cap S\|}{\|W\|}, \tag{11}$$
where $S$ is the set of data points in $W$ with magnitude $y_i$, and $\|W\|$ denotes the cardinality of the set $W$. If all the neighbors have magnitude $y_i$, then $r_{i,W}$ is equal to one, indicating that the neighborhood is smooth. In contrast, if $r_{i,W}$ is equal to zero, then possibly $\langle x_i, y_i \rangle$ is an outlier.
The confusion is maximum when half of the neighbors have the magnitude $y_i$ and the remaining half have different magnitudes. Thus $r_{i,W} = 0.5$ indicates the maximum roughness. Note that a similar formulation is also used in the literature to measure rough inclusion (Polkowski and Skowron, 1996). Next we make the situation more complex, but closer to reality. Until now we have assumed that each neighbor resides in the structure $W$ equally and completely. Now every training pattern belongs to $W$ to a different degree, i.e., a training pattern closer to $\langle x_i, y_i \rangle$ along the time axis belongs to the neighborhood $W$ to a high degree, and a training pattern far away from $\langle x_i, y_i \rangle$ supports the neighborhood by a negligible amount. Therefore, $W$ spans all the training patterns. The similarity along the time axis can be measured using the value of a Gaussian at the point $\langle x_i, y_i \rangle$. One possible way to define the Gaussian is
$$\mu_x(i, j, \delta_l) = \exp\left(-\frac{(x_i - x_j)^2}{2\delta_l^2}\right), \tag{12}$$
where $\delta_l$ is the width of the Gaussian. Hence the amount of the total roughness at the point $\langle x_i, y_i \rangle$ is
$$s_{i,\delta_l} = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\, \mu_S(j), \tag{13}$$
where $\mu_S(j) = 1$ if $y_j = y_i$, otherwise $\mu_S(j) = 0$. We can still fine-tune Eq. (13). Until now we have considered only the neighbors that have the same magnitude. We relax this to ask whether the magnitudes of the neighbors are similar or not, i.e., we replace the characteristic function $\mu_S$ by the fuzzy membership function $\mu_y$. Specifically,
$$\mu_y(i, j, \sigma_{y,i,\delta_l}) = \exp\left(-\frac{(y_i - y_j)^2}{2\sigma_{y,i,\delta_l}^2}\right) \tag{14}$$
represents the fuzzy similarity between $\langle x_i, y_i \rangle$ and $\langle x_j, y_j \rangle$ along the ordinate, where
$$\sigma_{y,i,\delta_l} = \sqrt{\frac{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)(y_i - y_j)^2}{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)}} \tag{15}$$
indicates the spread of the Gaussian along the ordinate and around the $i$th data point $\langle x_i, y_i \rangle$. Thus, we incorporate the concept of fuzzy similarity in the rough ownership function to obtain the following fuzzy-rough ownership function:
$$\iota_{i,\delta_l} = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\, \mu_y(i, j, \sigma_{y,i,\delta_l}). \tag{16}$$
Note that when $\sigma_{y,i,\delta_l} = 0$, Eq. (14) becomes undefined. To avoid this, $\sigma_{y,i,\delta_l}$ is set equal to one in this case. For an absolutely smooth and horizontal time series, the fuzzy-rough ownership value at each data point would be close to one. In contrast, around a sudden jump or around a discontinuity, the fuzzy-rough ownership value would be close to zero. The confusion about the smoothness of the
time series is high when the fuzzy-rough ownership value is between zero and one. 3.3. Scaling of fuzzy-roughness The ruggedness in terms of fuzzy-roughness at the data point $\langle x_i, y_i \rangle$ is quantified by $\iota_{i,\delta_l}$, and this value varies from point to point. We would like to find the average fuzzy-roughness at any data point. Then we examine the rate at which the average fuzzy-roughness changes when the resolution of the time series is changed. We define a term called the partition function, which measures the average fuzzy-roughness at any data point of the time series. It is
$$I_{FR}(\delta_l) = \frac{1}{n} \sum_{i=1}^{n} \iota_{i,\delta_l}. \tag{17}$$
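The quantities in Eqs. (12) and (14)–(17) can be sketched in plain Python as below. This is an illustrative, unoptimized implementation (quadratic in the number of samples per width) rather than the author's code; the function names are hypothetical, and a zero ordinate spread is replaced by one, as the text prescribes.

```python
import math


def time_similarity(x, i, j, delta):
    """Eq. (12): Gaussian similarity of samples i and j along the time axis."""
    return math.exp(-((x[i] - x[j]) ** 2) / (2.0 * delta ** 2))


def ordinate_spread(x, y, i, delta):
    """Eq. (15): similarity-weighted spread of the ordinates around sample i."""
    weights = [time_similarity(x, i, j, delta) for j in range(len(x))]
    weighted = sum(w * (y[i] - y[j]) ** 2 for j, w in enumerate(weights))
    sigma = math.sqrt(weighted / sum(weights))
    return sigma if sigma > 0.0 else 1.0  # the text sets a zero spread to one


def fuzzy_rough_ownership(x, y, i, delta):
    """Eq. (16): average joint similarity of sample i to every other sample."""
    sigma = ordinate_spread(x, y, i, delta)
    total = 0.0
    for j in range(len(x)):
        if j != i:
            mu_y = math.exp(-((y[i] - y[j]) ** 2) / (2.0 * sigma ** 2))  # Eq. (14)
            total += time_similarity(x, i, j, delta) * mu_y
    return total / (len(x) - 1)


def partition_function(x, y, delta):
    """Eq. (17): mean fuzzy-rough ownership over all samples."""
    n = len(x)
    return sum(fuzzy_rough_ownership(x, y, i, delta) for i in range(n)) / n
```

For the same abscissae and width, a series that alternates between two levels yields a smaller partition function than a constant series, because the ordinate similarity of Eq. (14) drops below one for every pair of unequal magnitudes.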
We next investigate the rate at which the average fuzzy-roughness scales down/up when the resolution of the time series is increased/decreased. We are particularly interested in whether a power law relationship $I_{FR}(\delta_l) \propto \delta_l^{D_{FR}}$ holds. If it does, the exponent can be calculated from
$$D_{FR} = \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{18}$$
Ideally, $D_{FR}$ should be measured when the lengths of the neighborhood intervals are very small, i.e., when $\delta_l \to 0$. Thus,
$$D_{FR} = \lim_{\delta_l \to 0} \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{19}$$
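In practice the exponent of a power law is usually read off a log-log plot over several widths rather than from a single ratio, since a single ratio such as Eq. (18) is sensitive to any constant prefactor. The sketch below fits an ordinary least-squares line to the points $\langle \log(\delta_l), \log(I_{FR}(\delta_l)) \rangle$ and returns the slope, the intercept and the sum of squared residuals. The function name is hypothetical, and the partition-function values are assumed to have been computed beforehand.

```python
import math


def fit_scaling_exponent(deltas, partition_values):
    """Least-squares line through (log delta_l, log I(delta_l)).

    Returns (slope, intercept, squared_error); the slope plays the role of the
    scaling exponent and squared_error measures how well a power law fits.
    """
    log_d = [math.log(d) for d in deltas]
    log_i = [math.log(v) for v in partition_values]
    n = len(log_d)
    mean_d = sum(log_d) / n
    mean_i = sum(log_i) / n
    sxx = sum((a - mean_d) ** 2 for a in log_d)
    sxy = sum((a - mean_d) * (b - mean_i) for a, b in zip(log_d, log_i))
    slope = sxy / sxx
    intercept = mean_i - slope * mean_d
    squared_error = sum((slope * a + intercept - b) ** 2
                        for a, b in zip(log_d, log_i))
    return slope, intercept, squared_error


# Synthetic check: values that follow 3 * delta**0.9 exactly.
widths = [0.5, 1.0, 2.0, 4.0]
slope, intercept, err = fit_scaling_exponent(widths, [3.0 * d ** 0.9 for d in widths])
```

A large residual relative to the spread of the points would suggest that no single power law describes the series over the chosen range of widths.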
In reality, at the limits of the resolution, $D_{FR}$ does not remain constant. Moreover, $D_{FR}$ fluctuates due to noise and outliers. Therefore, a more refined estimate can be obtained from the slope of the best-fit line that passes through $\langle \log(\delta_l), \log(I_{FR}(\delta_l)) \rangle$ $\forall l = 1, 2, \ldots, L$, where $L$ is an integer such that the Gaussian with the width $\delta_l$ is sufficient to cover the time series to a high degree. If we cannot fit a straight line due to the randomness of the data, then most likely the power law does not hold for the time series, and in that case the fractal dimension cannot characterize the time series. Thus, using the power law, the fractal dimension $D_{FR}$ measures how the fuzzy-roughness scales down/up when the resolution is increased/decreased. The proposed fractal dimension acts as a feature that reflects the self-similar property of the time series. The algorithm is shown in Fig. 2. 3.4. Salient aspects of the proposed method The use of Gaussians enables us to interpret the basic philosophy of the fractal in a different manner. The Hurst exponent is proposed presuming that there exists a set of generators. It is assumed that the part and the whole time series are constructed by the same generator (Mandelbrot,
Fig. 2. The algorithm to compute the fuzzy-rough set-based fractal dimension $D_{FR}$. Note that we need to compute $\mu_x(i, j, \delta_l)$ for all $i$ and $j$ before computing $\mu_y(i, j, \sigma_{y,i,\delta_l})$.
1983). However, the part and the whole time series look different since the occurrences of the generators differ in the two cases. The Hurst exponent aims to capture how the probability of occurrence changes when the resolution is changed. In reality the number of generators may be infinite; we may have no way of knowing how many of them are present and what they are, and hence it is difficult to estimate their occurrences. Instead of trying to identify the generators, it is more attractive to observe the pattern formed in a local region and how the fuzzy-roughness evolves when the resolution is increased. The proposed dimension follows this direction. The advantages of the proposed method are as follows:
1. The fractal nature of the time series is viewed as how the fuzzy-roughness changes with decreasing resolution. Unlike the Hurst exponent, the proposed method does not need to assume any particular generator or the occurrence of the generator.
2. A change in the length of the interval does not change the fractal dimension abruptly.
3. When the dimension is computed from the slope of the best-fit line, we do not face any stair-casing effect.
4. The Hurst exponent can be derived from the proposed dimension.
5. Depending on the domain, other results of fuzzy set theory can be incorporated into the proposed dimension.
3.5. Proposed technique and generalized fuzzy-rough sets The rough set framework proposed by Pawlak (1982) is based on the equivalence relation. In other words, this rough set framework relies on the reflexivity, symmetry and transitivity properties of a relation. Various modifications of this framework have been proposed by relaxing some of those three properties (Inuiguchi et al., 2003; Intan and Mukaidono, 2002). For example, Slowinski and Vanderpooten (2000) propose a tolerance relation, for which reflexivity and symmetry should hold but transitivity may or may not hold. Using the tolerance relation, the concept of rough sets has been generalized. All three properties (reflexivity, symmetry and transitivity) have been relaxed by Wu et al. (2004), and the concept of rough set is generalized for any kind of binary relation. Let $U$ and $W$ be two finite universes. Suppose that
$R$ is an arbitrary relation from $U$ to $W$. We can define a set-valued function $f : U \to \mathcal{P}(W)$ by
$$f(x) = \{y \in W : (x, y) \in R\}, \quad x \in U. \tag{20}$$
Obviously, any set-valued function $f$ from $U$ to $\mathcal{P}(W)$ defines a binary relation from $U$ to $W$ by setting $R = \{(x, y) \in U \times W : y \in f(x)\}$. For any set $C \subseteq W$, a pair of lower and upper approximations can be defined as
$$\underline{R}(C) = \{x \in U : f(x) \subseteq C\}, \tag{21}$$
$$\overline{R}(C) = \{x \in U : f(x) \cap C \neq \emptyset\}. \tag{22}$$
$R(C) = \langle \underline{R}(C), \overline{R}(C) \rangle$ is referred to as a generalized rough set. The above definition of rough sets can be used to generalize fuzzy-rough sets. Let $R$ be an arbitrary fuzzy relation from $U$ to $W$. Define the mapping $\phi : U \to \mathcal{P}(W)$ by
$$\phi(x, y) = R(x, y), \quad (x, y) \in U \times W. \tag{23}$$
because of the existence of inconsistent data. The remaining 1979 data points were used for the experiments. We have compared the errors in estimating the proposed dimension and the Hurst exponent. Using Eq. (17), we have calculated $\log(I_{FR}(\delta_l))$ and $\log(\delta_l)$ 100 times (i.e., $L = 100$) for different values of $\delta_l$. The minimum and maximum values of $\delta$ are half of the minimum and maximum distances between any two data points, respectively. Assuming q = 1, $\log(I_{FR}(\delta_l))$ is plotted against $\log(\delta_l)$ (Fig. 3). After fitting the best-fit line through these points, we have obtained $D_{FR}$ as 0.9010. The smoothness of the best-fit line indicates that the time series follows the power law of fractal theory. By plotting $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for different values of $l$, we have obtained the Hurst exponent as 0.5487 (Fig. 4). We can observe that the plot for the Hurst exponent is not as smooth as that of the proposed
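For any $\alpha \in [0, 1]$, $\phi_\alpha : U \to \mathcal{P}(W)$ is defined as
$$\phi_\alpha(x) = \{y \in W : \phi(x, y) \geq \alpha\}, \quad x \in U. \tag{24}$$
For a given $\alpha \in [0, 1]$ and any set $C \subseteq W$, a pair of lower and upper approximations can be defined as
$$\underline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \subseteq C\}, \tag{25}$$
$$\overline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \cap C \neq \emptyset\}. \tag{26}$$
$\langle \underline{\phi}(C), \overline{\phi}(C) \rangle$ is called a generalized fuzzy-rough set, where
$$\underline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left(\alpha \wedge \underline{\phi}_{1-\alpha}(C_{\alpha+})\right), \tag{27}$$
$$\overline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left(\alpha \wedge \overline{\phi}_{\alpha}(C_{\alpha})\right). \tag{28}$$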
Note that the fuzzy-rough ownership function (Eq. (16)) can be used in the case of generalized fuzzy-rough sets (Eqs. (27) and (28)). Consequently, the computation of the fuzzy-rough set based fractal dimension remains the same when generalized fuzzy-rough sets are used.
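The level-wise approximations of Eqs. (24)–(26) can be illustrated for a crisp target set with a small Python sketch. The fuzzy relation is stored as a dictionary from pairs to degrees, with missing pairs treated as degree zero; this representation, the function names and the toy relation are assumptions made for illustration. The combination over all levels in Eqs. (27) and (28) is not implemented here.

```python
def alpha_image(relation, x, universe_w, alpha):
    """Eq. (24): elements of W related to x with degree at least alpha."""
    return {y for y in universe_w if relation.get((x, y), 0.0) >= alpha}


def alpha_approximations(relation, universe_u, universe_w, target, alpha):
    """Eqs. (25) and (26): lower and upper approximations of a crisp set C."""
    target = set(target)
    lower, upper = set(), set()
    for x in universe_u:
        image = alpha_image(relation, x, universe_w, alpha)
        if image <= target:
            lower.add(x)
        if image & target:
            upper.add(x)
    return lower, upper


# Toy fuzzy relation between U = {'a', 'b'} and W = {1, 2}.
R = {('a', 1): 0.9, ('a', 2): 0.3, ('b', 1): 0.6, ('b', 2): 0.8}
low_half, up_half = alpha_approximations(R, ['a', 'b'], [1, 2], {1}, 0.5)
```

Raising the level from 0.5 to 0.7 removes the pair ('b', 1) from the cut of the relation, so 'b' no longer reaches the target set and drops out of the upper approximation.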
Fig. 3. The plot to determine the proposed fractal dimension for the data set shown in Fig. 1. Here $\delta$ is the width of the Gaussian, and $I_{FR}$ is the partition function of the proposed dimension.
4. Results and discussions The proposed method was tested on different benchmark ICU data sets available in the UCI repository of machine learning databases (Blake and Merz, 1998). In the data set, as shown in Fig. 1, the ordinate represents systolic arterial pressure in mm Hg, and the abscissa indicates twelve-hour duration during the ICU treatment of an adult respiratory distress syndrome (ARDS) patient under mechanical ventilation. The blood pressure was monitored continuously approximately once in every twelve seconds. Other dependent parameters like mean airways pressure and tidal volume were recorded occasionally. Each data point is numeric, and the total number of data points is 1985. The collected data set is in compressed form; here values for the continuously monitored parameters remain steady between consecutive recorded measurements, and thus a lack of recorded measurements should not be interpreted as a lack of data. We removed six observations
Fig. 4. The plot to determine the Hurst exponent for the data set shown in Fig. 1. Here $\delta$ is the length of the interval, and $I_{R/S}$ is the partition function of the Hurst exponent computed using the R/S method.
dimension. This stair-casing effect occurs due to the presence of the crisp boundary in the calculation of the Hurst exponent. To compare the Hurst exponent and the proposed dimension, the error of fit of the proposed dimension is expressed as
$$E = \sum_{l=1}^{L} \left[m \log(\delta_l) + c - \log(I_{FR}(\delta_l))\right]^2, \tag{29}$$
where $m$ is the slope and $c$ is the intercept of the straight line along the ordinate. The error provides a measure of fit: the lower the value of $E$, the better the fit. We have found $E$ for the proposed dimension to be 0.774. Similarly, $E$ for the Hurst exponent is calculated by substituting $\log(I_{R/S}(\delta_l))$ for $\log(I_{FR}(\delta_l))$ in Eq. (29). This results in $E = 1.4667$, which is considerably higher than $E$ of the proposed dimension. To find how well $D_{FR}$ characterizes and discriminates the time series, we have considered a set of ICU time series from the same patient for the systolic and diastolic blood pressures. The number of data points in each time series varies between 1600 and 2000. We have used a training set consisting of 50 time series of systolic blood pressure and another 50 time series of diastolic blood pressure. Using the Hurst exponent as the sole characteristic of the time series, we have separately found the means of the Hurst exponent for the groups of time series with systolic and diastolic blood pressures. Using these two means, we have classified a set of 100 time series into the systolic and diastolic groups. This gave a classification rate of 55.23%. We repeated the same experiment with $D_{FR}$ as the characteristic of the time series. We observed that the classification rate increased to 62.45%. This experiment shows that the characterization capability of $D_{FR}$ is greater than that of the Hurst exponent. The above experiment was carried out on data collected from the same subject. Next we collected time series from eleven different ARDS patients. Specifically, we collected three systolic time series and three diastolic time series from each subject. Thus we had 33 systolic and 33 diastolic time series. The number of data points in each such time series was between 1600 and 2000. The aim was to find whether $D_{FR}$ computed from these data can be used to discriminate systolic and diastolic data from new patients. We found that the classification rate was 60.03%, which was slightly lower than what we observed on the data collected from a single patient. This shows that the proposed measure retains its discrimination capability even when it is used with different time series collected from different patients.
Acknowledgments A strategic research Grant RP960351 from the National Science and Technology Board and the Ministry of Education, Singapore, has supported the work of this paper. The encouragement of Prof. Tze-Yun Leong, National University of Singapore, is gratefully acknowledged.
References
Addison, P.A., 1997. Fractals and Chaos: An Illustrated Course. Institute of Physics Publishing, London.
Blake, C.L., Merz, C.J., 1998. UCI repository of machine learning databases. Available from: .
Dubois, D., Prade, H., 1990. Rough-fuzzy sets and fuzzy-rough sets. Int. J. Gen. Syst. 17 (2–3), 191–209.
Dubois, D., Prade, H., 1992. Putting rough sets and fuzzy sets together. In: Slowinski, R. (Ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht, pp. 1–2.
Hurst, H.E., 1951. Trans. Am. Soc. Civil Eng. 116, 770.
Hurst, H.E., Black, R., Sinaika, Y.M., 1965. Long-Term Storage in Reservoirs: An experimental Study. Constable, London.
Intan, R., Mukaidono, M., 2002. Generalized fuzzy-rough set by conditional probability relations. Int. J. Pattern Recogn. Artif. Intell. 16 (7), 865–881.
Inuiguchi, M., Greco, S., Slowinski, R., Tanino, T., 2003. Possibility and necessity measure specification using modifiers for decision making under fuzziness. Fuzzy Set. Syst. 137, 151–175.
Klir, G.S., Yuan, B., 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Englewood Cliffs, NJ.
Mandelbrot, B., 1983. The Fractal Geometry of Nature. W.H. Freeman Company, New York.
Morik, K., Imboff, M., Brockhausen, P., Joachims, T., Gather, U., 2000. Knowledge discovery and knowledge validation in intensive care. Artif. Intell. Med. 19 (3), 225–249.
Pawlak, Z., 1982. Rough sets. Int. J. Comp. Inform. Sci. 11, 341–356.
Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers.
Pawlak, Z., Grzymala-Busse, J.W., Slowinski, R., Ziarko, W., 1995. Rough Sets. Comm. ACM 38 (11), 89–95.
Polkowski, L., 2002. On fractal dimension in information systems. Toward exact sets in infinite information systems. Fundam. Inform. 50 (3–4), 305–314.
Polkowski, L., Skowron, A., 1996. Rough mereology: A new paradigm for approximate reasoning. Int. J. Approx. Reason. 15 (4), 333–365.
Sarkar, M., 2002. Rough-fuzzy functions in classification. Fuzzy Set. Syst. 132 (3), 353–369.
Slowinski, R., Vanderpooten, D., 2000. A generalized definition of rough approximations based on similarity. IEEE Trans. Knowl. Data Eng. 12 (2), 331–336.
Vicsek, T., 1992. Fractal Growth Phenomenon. World Scientific, New Jersey.
Wu, W.Z., Mi, J.S., Zhang, W.X., 2004. Generalized fuzzy rough sets. Inform. Sci. 160 (1–4), 235–249.
Zadeh, L.A., 1965. Fuzzy sets. Inform. Control, 338–353.