Fuzzy Sets and Systems 77 (1996) 49-62
Signal analysis in fuzzy information space
Bogdan R. Kosanović a,b,*, Luis F. Chaparro b, Robert J. Sclabassi a,b,c
a Laboratory for Computational Neuroscience, University of Pittsburgh, Pittsburgh, PA 15261, USA
b Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
c Departments of Neurological Surgery and Psychiatry, University of Pittsburgh, Pittsburgh, PA 15261, USA
Abstract
A general approach to signal analysis using fuzzy set theory is presented. A physical process which generates the observed signals is decomposed into several hidden processes that coexist at the same time but to different degrees. A fuzzy information space is constructed from the activities of these hidden processes. It is suggested that the hidden processes characterize the regions of attraction in feature space that are modeled by temporal fuzzy sets. The signal analysis is performed in fuzzy information space, instead of raw signal space, because the motivation is to fit the analysis to the most significant facts about the dynamics instead of fitting the noise in raw data. Analysis of electroencephalographic signals during sleep is presented to illustrate the method.
Keywords: Signal analysis; Systems theory; Temporal fuzzy sets; Hidden Markov modeling; Time series
1. Introduction

In this paper we present a methodology that uses fuzzy set theory [39] to identify important aspects of system dynamics from observed signals. Significant information about the system dynamics is mapped from raw signals into a fuzzy information space $J$ that is constructed using membership functions of temporal fuzzy sets [21, 23]. This mapping is referred to as hidden process modeling [20]. Signal analysis may then be performed in the space $J$, which characterizes the motion of the signal vector in both a qualitative and a quantitative fashion.
Signals are functions of an independent variable, with time being the most common,

\[ \text{signal} := f(t) \quad \text{for } f : \mathcal{T} \to \mathcal{R}, \tag{1} \]

where $t$ is the time variable, $\mathcal{T}$ is the time domain, and $\mathcal{R}$ is the range of $f(\cdot)$. When several physical, or mathematical, variables are used to describe the signal at a single spatial location, the range $\mathcal{R}$ is multivariate, and the signal function becomes a vector-valued function, $f(\cdot)$. A nonlinear lumped system can be described by an $n$th-order scalar differential equation, or its equivalent first-order vector differential equation [36],

\[ \dot{x} = G(x, t), \tag{2} \]
which expands into the set of $n$ scalar differential equations

\[ \dot{x}_i = G_i(x_1, x_2, \ldots, x_n, t) \quad (i = 1, 2, \ldots, n). \tag{3} \]

This work was supported by the National Institute of Mental Health under Grant No. MH41712.
* Corresponding author. E-mail: [email protected]
0165-0114/96/$09.50 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0165-0114(95)00128-X
The vector $x = \{x_1, x_2, \ldots, x_n\}$ in (2) is the state of the system, while its components $x_i$ $(i = 1, 2, \ldots, n)$ are the state variables. The system of interest is specified by the vector-valued function $G(x, t)$. The vector space $\mathcal{X}$, spanned by the state variables, is the state space. Conditions can be specified under which the differential equation (2) has a unique solution, $x(t, x_0, t_0)$, for any admissible fixed initial condition $(x_0, t_0)$ [36]. As pointed out by Siljak [36], the solution $x(t, x_0, t_0)$, which passes through the state $x_0$ at time $t_0$ and is observed at time $t$, may be regarded as a transition function that specifies how the initial state $x(t_0)$ is transformed into $x(t)$. Thus, the solution $x(t, x_0, t_0)$ characterizes the motion of the system from the state $x(t_0)$ to some state $x(t)$. Moreover, it is possible to define axiomatically a continuous-time dynamic system by its motion, using a reasonable set of hypotheses about the functions $x(t, x_0, t_0)$ [30, 23]. The solution $x(t, x_0, t_0)$ specifies all the state variables,
\[ x_i = x_i(t) \quad (i = 1, 2, \ldots, n), \tag{4} \]

which in an $n$-dimensional state space may be used to construct a curve with time $t$ as a parameter. This curve is referred to as the trajectory in the state space. Clearly, Eqs. (4) have the same form as the signals defined by (1); thus, by analogy, we may think of analyzing the motion of a signal vector $f(\cdot)$ as characterizing the dynamic properties of the observed signals. The space spanned by these signals is referred to as the raw signal space or measurement space, $\mathcal{M}$. In practice, complete knowledge of the system structure and order (number of state variables) is not available. Therefore, the measurement space may not completely capture the system dynamics, as the important information about the system behavior may be unevenly distributed over a number of signals. The above discussion summarizes our motivating conceptual approach towards signal analysis in fuzzy information space.

This paper is organized as follows. In Section 2, we introduce and define temporal fuzzy sets. Section 3 then outlines hidden process modeling, which is used in Section 4 to construct the fuzzy information space. An experimental application based on EEG signal analysis is presented in Section 5. Finally, the present work and future developments are discussed in Section 6.

2. Temporal fuzzy sets

A fuzzy set $W$ is characterized by a membership function $\mu_W(\cdot)$, from the universe of objects $X$ into the unit interval [39]. A fuzzy set induced on a universe whose elements are ordered in time is defined as a temporal fuzzy set [21, 23]. That is, if the universe is an ordered set, it would induce the ordered fuzzy set $W^{\prec}$ with a corresponding membership function $\mu_W^{\prec}(\cdot)$ that has a shape characteristic of the particular way the objects in $X$ are ordered through the crisp linear ordering relation $\prec$ [23] (see Fig. 1). Ordered fuzzy sets do not relate the elements of $X$, since the elements $x \in X$ are previously ordered through a crisp ordering $\prec$. Whenever the relation $\prec$ is applied directly on a time variable, as in Fig. 1, it may be called temporal ordering. For example, consider the physical system governed by an ordinary differential equation (2), and its solution $x(t, x_0, t_0)$ in the form of (4). The system of equations (4) may be rewritten in vector form as

\[ x_t = X(t), \tag{5} \]
where the vector function $X(\cdot)$ is the state space trajectory for the specified initial conditions $(x_0, t_0)$. Considering the observation interval $\mathcal{T}_o = [t_0, t_1]$ and using the trajectory $X(\cdot)$, one can generate the universe of objects $X^* = X(\mathcal{T}_o)$ with elements $x_t \in X^*$ in temporal order. Given an arbitrary fuzzy set $W$ in $X^*$, the vector function $X(\cdot)$ induces a fuzzy set $H$ in $\mathcal{T}_o$ as suggested in [39]. That is, the membership function of $H$ is defined as

\[ \mu_H(t) = \mu_W(y), \quad y \in (X^*, \prec), \tag{6} \]

for all $t \in \mathcal{T}_o$ that belong to the inverse image of $y$, i.e. $\forall t \in X^{-1}(y)$. The symbol $\prec$ indicates that the universe $X^*$ is ordered in time. The resulting fuzzy set $H$, induced from the dynamic trajectory $(X^*, \prec)$, is then a temporal fuzzy set, since the observation interval $\mathcal{T}_o$ is ordered in time. For clarity, we use the superscript notation $W^{\prec}$, instead of $H$, to denote a temporal fuzzy set induced from a fuzzy set $W$ in $(X^*, \prec)$. The membership function $\mu_W^{\prec}(\cdot)$ maps the observation interval into $[0, 1] \subset \mathbb{R}$ as determined by (5). Thus, the influence of a property modeled by a fuzzy set $W$ in $X^*$ on the overall dynamic behavior may be
quantified at each time instant, $t \in \mathcal{T}_o$, with the membership value $\mu_W^{\prec}(t) = \mu_W(x_t)$. Clearly, the membership functions of temporal fuzzy sets are functions of time.

Fig. 1. The shape of a membership function $\mu_W^{\prec}$ changes if the elements of a universe are reordered. Both dynamic trajectories, (a) and (b), reside in a two-dimensional feature space $\mathcal{F} = F_1 \times F_2$. (a) Dynamic trajectory $(X, \prec) = \{x_0, x_3, \ldots, x_{30}\}$ with $\Delta t = 3$ s; (b) dynamic trajectory $(Y, \prec) = \{x_0, x_{15}, x_{18}, x_{21}, x_3, x_6, x_9, x_{12}, x_{30}, x_{27}, x_{24}\}$ obtained by reordering the elements of $(X, \prec)$; (c) membership function of a temporal fuzzy set $B^{\prec}$ induced from the dynamic trajectory $(X, \prec)$; (d) membership function of a temporal fuzzy set $B^{\prec}$ induced from the dynamic trajectory $(Y, \prec)$. Clearly, $X = Y$, but $(X, \prec) \neq (Y, \prec)$. The regions of attraction for $(X, \prec)$ and $(Y, \prec)$ are centered around the poles A and B that are marked by bullets. The membership functions are estimated by assuming the class prototypes for a given collection of trajectory samples (see Section 2.1). [Panels (a) and (b): axes $F_1$ and $F_2$; panels (c) and (d): horizontal axis Time [s].]

In practice, the state space trajectory may not be available. Hence, one may consider a collection of feature samples $X$, extracted from various signals, e.g. physiological signals during sleep, over a time interval $\mathcal{T}$. The elements of $X$ lie in a feature space $\mathcal{F}$ that characterizes the process. Together, the samples form a feature space trajectory that is expressed in terms of a vector-valued function $X: \mathcal{T} \to \mathcal{F}$, i.e. $X = X(\mathcal{T})$. Next, let $W$ be a fuzzy set that describes a property, e.g. wakefulness, and $\mu_W(x)$ be the degree of wakefulness for a sample $x \in X$. Since the elements of $X$ are ordered in time, one can induce a fuzzy set $W^{\prec}$ in $\mathcal{T}$ using (6) as

\[ \mu_W^{\prec}(t) = \mu_W(x_t), \quad x_t \in (X, \prec), \tag{7} \]

for all $t \in \mathcal{T}$. At any time $t$ during the observation period $\mathcal{T}$, a temporal fuzzy set $W^{\prec}$ makes it possible to quantify the degree of wakefulness (or some other property relevant to the process of interest) with a real number from the unit interval [0, 1]. When the membership function of a temporal fuzzy set is close to one, the property is almost completely satisfied (i.e. the person is awake), whereas when the values are
close to zero, the property is not present (i.e. the person is asleep). At any other time, the property is partially satisfied to a degree quantified by the membership function of the temporal fuzzy set.

We construct temporal fuzzy sets in such a way that they characterize the regions of attraction in a space where the trajectory lies. In general, that is the feature space $\mathcal{F}$, and the feature space trajectory $(X, \prec)$. The regions of attraction are then considered the areas in space $\mathcal{F}$ "visited" by the feature vector for either prolonged periods of time or quite often, and in a specific pattern or sequence. The fragments of a trajectory that connect the regions of attraction are called transition links. If several links are "close" to one another they form a transition pathway. These concepts are illustrated in Fig. 2. In general, regions and pathways can overlap. Hence, it is often hard to distinguish one from the other. Pathways with a large number of links may "look" like regions of attraction.

Fig. 2. A single dynamic trajectory with two regions of attraction that are centered around the poles A and B. The parts of a trajectory within the regions of attraction are not shown. The transition links b-d are grouped into a transition pathway $P_1$, while the link a is isolated.
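As a minimal sketch of how (7) induces a temporal fuzzy set, the snippet below evaluates a property membership function along a time-ordered list of feature samples. The function name `induce_temporal_fuzzy_set`, the toy membership `mu_w`, and the sample values are hypothetical and chosen only for illustration; they are not taken from the experiments reported in this paper.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]


def induce_temporal_fuzzy_set(
    mu_w: Callable[[Sample], float],
    trajectory: Sequence[Sample],
    dt: float = 3.0,
) -> List[Tuple[float, float]]:
    """Return (t, mu_W(x_t)) pairs for samples spaced dt seconds apart."""
    return [(k * dt, mu_w(x)) for k, x in enumerate(trajectory)]


def mu_w(x: Sample) -> float:
    """Hypothetical property membership: closeness to the point (0, 0)."""
    return 1.0 / (1.0 + x[0] ** 2 + x[1] ** 2)


# The same universe of samples, visited in two different temporal orders.
trajectory_x = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
trajectory_y = [trajectory_x[i] for i in (0, 3, 1, 2)]

profile_x = induce_temporal_fuzzy_set(mu_w, trajectory_x)
profile_y = induce_temporal_fuzzy_set(mu_w, trajectory_y)

for (t, a), (_, b) in zip(profile_x, profile_y):
    print(f"t = {t:4.1f} s   ordering X: {a:.3f}   ordering Y: {b:.3f}")
```

Both profiles contain the same membership values, but in a different temporal order, which is the effect illustrated in Fig. 1.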
2.1. Membership function estimation

To estimate the membership function of a temporal fuzzy set $W^{\prec}$ we start with (7); that is, we determine the membership function $\mu_W(\cdot)$ of a property modeled by $W$. Then, $\mu_W^{\prec}(\cdot)$ is obtained by tracking the feature vector as it travels along the trajectory $(X, \prec)$. In most practical applications, the trajectory is given by a collection of samples, or feature vectors, equally spaced in time. Thus, the membership function estimation problem may be approached as a pattern recognition problem. By partitioning the samples of a trajectory into classes one obtains the regions of attraction. Since, in general, it is not possible to determine crisp boundaries between classes, fuzzy partitioning is a preferred methodology [1, 4].

At this point, there are two ways to proceed. One is to use a priori knowledge about the problem and propose reasonable class prototypes that model the regions of attraction by approximating the corresponding poles (as shown in Fig. 1). The membership functions are computed using an appropriate distance metric within a predefined analytical expression (as in (8) and (9)), or relative to some objective function (which may or may not be optimized for the assumed prototypes). An alternative approach is unsupervised pattern recognition, prompted by a lack of precise knowledge about a physical process. Then, the membership functions are estimated through a suitable unsupervised fuzzy partitioning method, e.g. fuzzy c-means (FCM) [2].

Usually, a major difficulty is to determine the proper number of regions of attraction (i.e. the number of temporal fuzzy sets, or fuzzy clusters) that jointly characterize the dynamic behavior presented by the measured signals. This problem is known as the cluster validity problem and it has been thoroughly investigated by a number of researchers [1, 8, 34, 35, 38]. However, a question that remains is how to include the specific properties of a dynamic trajectory when assessing cluster validity. The initial work reported in [23] uses cardinality-based functionals that perform well for the quasi-stationary processes as defined in [19]. In [23], it is assumed that whenever the cardinality of the crisp clusters of a nearest maximum membership (NMM) partition [1] drops close to zero, an upper bound for the reasonable number of clusters is reached. For the case of samples that are normal mixtures, the cardinality will correspond to the a priori probabilities [8, 9].
Hence, the partitions that are made of classes with extremely small a priori probabilities are considered unreasonable.

An example based on Fig. 1 has been solved presuming the existence of two clusters with the prototypes placed at "reasonable" positions with respect to the trajectory samples. Therefore, the coordinates of the prototypes may differ from the results of unsupervised fuzzy partitioning. The membership functions are computed assuming a simple similarity relationship, based on a distance between the samples $x_t$ and the prototypes $z_i$, resembling the one in [42]:

\[ h_i(x_t) = \frac{1}{1 + d^2(x_t, z_i)}, \quad i \in \{A, B\}, \tag{8} \]

where $d(\cdot, \cdot)$ is the Euclidean metric. The values of $h_i(\cdot)$ are then normalized, for each sample, so that the membership functions $\mu_i(\cdot)$ add to unity, i.e. $\sum_i \mu_i(x_t) = 1$, for all $t$. Specifically,

\[ \mu_i(x_t) \overset{\text{def}}{=} \frac{h_i(x_t)}{\sum_{j \in \{A, B\}} h_j(x_t)}, \quad i \in \{A, B\}, \tag{9} \]

for all $t$. The actual sample and prototype coordinates are listed in Table 1.

If the prototypes are not known in advance, unsupervised fuzzy partitioning is applied [23]. Using the fuzzy c-means clustering algorithm, with fuzzy exponent $m = 2$ and maximum norm error bound set to $\epsilon_{\max} = 5 \times 10^{-5}$, we obtain the solution for this example (Fig. 1) after eight iterations. As expected, the estimated prototypes are at slightly different locations: $z_A = (11.38, 17.69)$, $z_B = (45.62, 22.18)$, compared with $z_A = (14, 18)$ and $z_B = (47, 21)$, which are used in (8). However, the membership functions of the corresponding temporal fuzzy sets are very close at $d_\infty(U, U_{FCM}) = \|U - U_{FCM}\|_\infty = 0.0841$, where $U$ and $U_{FCM}$ are the fuzzy partition matrices [1] obtained with (8) and (9), and the FCM algorithm, respectively. Alternative ways for estimation of temporal fuzzy sets are discussed in [23].

Table 1
The sample ($x_t \in X$, $y_t \in Y$) and prototype ($z_i \in \mathcal{F}$) coordinates for the dynamic trajectories of Fig. 1. The feature space is $\mathcal{F} = F_1 \times F_2$.

(X, ≺)   (Y, ≺)   F1   F2
x_0      y_0      10    8
x_3      y_12      9   14
x_6      y_15      8   20
x_9      y_18     12   24
x_12     y_21     17   22
x_15     y_3      34   20
x_18     y_6      42   19
x_21     y_9      48   16
x_24     y_30     52   20
x_27     y_27     48   26
x_30     y_24     46   32
z_A               14   18
z_B               47   21
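The supervised estimate of (8) and (9) can be sketched in a few lines. The snippet below reads (8) as $h_i = 1/(1 + d^2)$ with the squared Euclidean distance and uses the Table 1 coordinates with the assumed prototypes $z_A = (14, 18)$ and $z_B = (47, 21)$; the function names are illustrative only.

```python
# Table 1 coordinates (F1, F2), listed in the temporal order of (X, <).
SAMPLES = [(10, 8), (9, 14), (8, 20), (12, 24), (17, 22), (34, 20),
           (42, 19), (48, 16), (52, 20), (48, 26), (46, 32)]
PROTOTYPES = {"A": (14, 18), "B": (47, 21)}


def similarity(x, z):
    """Eq. (8): h = 1 / (1 + d^2), with d the Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return 1.0 / (1.0 + d2)


def memberships(x, prototypes=PROTOTYPES):
    """Eq. (9): normalize the similarities so that they add to unity."""
    h = {name: similarity(x, z) for name, z in prototypes.items()}
    total = sum(h.values())
    return {name: value / total for name, value in h.items()}


# Temporal fuzzy sets A and B induced from (X, <), sampled every 3 s.
profile = [(3 * k, memberships(x)) for k, x in enumerate(SAMPLES)]
for t, mu in profile:
    print(f"t = {t:2d} s   mu_A = {mu['A']:.3f}   mu_B = {mu['B']:.3f}")
```

For the sample $x_{15} = (34, 20)$, for instance, the squared distances to the two prototypes are 404 and 170, so $\mu_A(x_{15}) = 171/576 \approx 0.297$.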
3. Hidden process modeling

Recalling Fig. 2 and the definition of the regions of attraction, one can consider temporal fuzzy sets as providing a dynamic profile of a physical process. A collection of temporal fuzzy sets that characterizes the dynamic activity, or profile, of a system is composed of the membership functions and cluster prototypes. Membership functions quantify the activity, while the prototypes provide quantitative physical characterization of the activity. Since the membership functions of temporal fuzzy sets are functions of time, and given the relationship between probability and fuzziness [7, 25], one can think of the activity as a collection of hidden processes that coexist at the same time but to different degrees, i.e., a generalized hidden Markov model [20, 29].

A hidden Markov model (HMM), as defined in [31], "is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can be observed through another set of stochastic processes that produce the sequence of observed symbols". That is, a HMM breaks the physical process into two levels, as shown in Table 2. The upper level (UL) contains a sequence of symbols $\{O_i\}$ that is observable. The lower (hidden) level (LL) is based on the assumption that at each time unit, the process behavior is governed by a single hidden state $q_j$. The elements of the output sequence $\{O_i\}$ belong to a set of admissible observation values $V$ that in general may be uncountable. The set of all admissible states is $Q = \{q_1, \ldots, q_c\}$, where $c \in \mathbb{N}$ is the total number of states.

Table 2
A HMM with $c = 4$ hidden states, and a finite set of admissible observation symbols, $V = \{v_1, v_2, v_3\}$. At each time unit two random experiments are performed: one to determine the output symbol given the current state, and the other to decide about the possible state transition.

Observations (UL)     v1   v3   v2   v2   ...   v3
Hidden states (LL)    q4   q4   q1   q3   ...   q2
Clock time            1    2    3    4    ...   T

Hidden process modeling (HPM) approaches the observed physical process in a different manner. Instead of forcing the process realization into a number of discrete states with a collection of probability distributions that describe the transitions and observations, we start by observing the regions of attraction in feature space. Then, each region is characterized by a temporal fuzzy set, or a hidden process. At any time, the hidden processes coexist to the degrees quantified by their membership functions. Hence, the hidden states of a HMM are replaced by the hidden processes. An important advantage of HPM comes from the observation that complex dynamic processes are very likely to be composed of a large number of parallel processes, all of which are active at all times but to different degrees. This is particularly true for physiological processes, where often several systems act together to produce a summary process. In other words, a HPM appears to be a natural tool in such situations. The set of admissible symbols $V$ of a HMM is replaced by the set of admissible values that a dynamic trajectory $(X, \prec)$ can take during the observation.

A natural question is: where does randomness go? There are three possible situations.
(i) The physical process may be considered deterministic.
(ii) The hidden processes (membership values) at each time constitute an outcome of a fuzzy event [7, 16, 40]. Fuzzy events in this case may be defined using possibility distributions [6, 18], while the transitions between the fuzzy events are random.
(iii) The HPM is reduced to a classical HMM by letting only one hidden process exist at any given time, i.e. by restricting the membership functions of temporal fuzzy sets to a binary range {0, 1} and forcing them to be crisp.

Assumption (i) is especially useful when prediction is not at issue, that is, when considering only signal or system analysis. Hence, it is the one to be explored in the rest of this paper. While the last possibility has been heavily explored under the name of hidden Markov modeling, time prediction as stated in (ii) is another open research problem and will not be directly addressed in this paper. However, one possible solution to problem (ii) will become obvious upon introduction of the concept of fuzzy information space. A recent result of Mohamed and Gader [29] may be useful for further investigation of this problem.

Fig. 3. Raw signals from measurement space $\mathcal{M}$ are used to obtain the dynamic trajectory in feature space $\mathcal{F}$. As a result of the hidden process modeling, the temporal fuzzy sets are induced in the form of a dynamic profile with membership functions that construct the fuzzy information space $J$. [Block diagram: Raw Signals (Measurement Space) → Feature Extraction → Dynamic Trajectory (Feature Space) → Hidden Process Modeling → Temporal Fuzzy Sets (Fuzzy Information Space).]

4. Fuzzy information space
The HPM yields a dynamic profile of the physical process. The space constructed from the membership functions of the resulting temporal fuzzy sets may be called the fuzzy information space $J$. While the raw signal space $\mathcal{M}$ is based on the measurements of individual signals, the fuzzy information space $J$ provides the global information about the system dynamics related to the regions of attraction. Fig. 3 summarizes the construction of fuzzy information spaces. The space $J$ always lies within the unit hypercube $\mathcal{U} = [0, 1]^c \subset \mathbb{R}^c$, where $c \in \mathbb{N}$ is the number of hidden processes (regions of attraction or temporal fuzzy sets). If the membership functions are estimated using the FCM algorithm, the fuzzy information space $J$ is reduced to a convex hull generated by the standard basis in $\mathbb{R}^c$. Then,

\[ J = \operatorname{conv}(B_c), \tag{10} \]

where $B_c = \{e_1, \ldots, e_c\}$ is the standard basis in $\mathbb{R}^c$ with

\[ e_k = (\delta_{1k}, \ldots, \delta_{ck}), \quad \delta_{ik} = \begin{cases} 1, & i = k, \\ 0, & \text{otherwise}, \end{cases} \tag{11} \]

and

\[ \operatorname{conv}(B_c) = \left\{ x \in \mathbb{R}^c \;\middle|\; x = \sum_{i=1}^{c} \alpha_i e_i;\ \sum_{i=1}^{c} \alpha_i = 1;\ \alpha_i \in \mathbb{R}^{+} \cup \{0\};\ e_i \in B_c \right\}. \tag{12} \]
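The statement that FCM memberships lie in $\operatorname{conv}(B_c)$ can be checked numerically. The following is a minimal sketch of a plain fuzzy c-means iteration (Euclidean distance, $m = 2$) run on the Table 1 samples of Section 2.1. The initialization at the first and last samples, the stopping rule on the largest membership change, and all identifiers are assumptions made for this illustration, not the implementation used in [23]. The last lines also harden each membership vector to its nearest vertex $e_k$ (the NMM partition), which corresponds to the crisp HMM-like case.

```python
# Table 1 samples (F1, F2) from Section 2.1, in the order of (X, <).
SAMPLES = [(10, 8), (9, 14), (8, 20), (12, 24), (17, 22), (34, 20),
           (42, 19), (48, 16), (52, 20), (48, 26), (46, 32)]


def fcm(samples, init, m=2.0, eps=5e-5, max_iter=100):
    """Plain fuzzy c-means; returns (memberships U, prototypes Z)."""
    protos = [tuple(float(v) for v in z) for z in init]
    prev = None
    u = []
    for _ in range(max_iter):
        u = []
        for x in samples:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, z)) for z in protos]
            if min(d2) == 0.0:
                # Sample coincides with a prototype: assign it crisply.
                hits = [1.0 if v == 0.0 else 0.0 for v in d2]
                row = [h / sum(hits) for h in hits]
            else:
                w = [v ** (-1.0 / (m - 1.0)) for v in d2]
                row = [v / sum(w) for v in w]
            u.append(row)
        # Prototype update: means weighted by memberships raised to m.
        protos = []
        for i in range(len(init)):
            wts = [row[i] ** m for row in u]
            protos.append(tuple(
                sum(w * x[k] for w, x in zip(wts, samples)) / sum(wts)
                for k in range(len(samples[0]))))
        # Stop when the largest membership change falls below eps.
        if prev is not None and max(
                abs(a - b) for r, s in zip(u, prev)
                for a, b in zip(r, s)) < eps:
            break
        prev = u
    return u, protos


U, Z = fcm(SAMPLES, init=[SAMPLES[0], SAMPLES[-1]])

# Every membership vector is a convex combination of the basis vectors.
in_hull = all(abs(sum(row) - 1.0) < 1e-9 and min(row) >= 0.0 for row in U)

# Nearest-maximum-membership hardening maps each row to a vertex e_k.
labels = [row.index(max(row)) for row in U]
cardinality = [labels.count(i) for i in range(len(Z))]
print(Z, in_hull, cardinality)
```

The crisp cluster cardinalities computed at the end are the quantities used by the cardinality-based validity heuristic mentioned in Section 2.1.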
Fig. 4. Fuzzy information space constructed from the membership functions of temporal fuzzy sets, $J = [0, 1]^c$. If the membership functions $\mu_1^{\prec}, \ldots, \mu_c^{\prec}$ are constrained to sum to unity, the fuzzy information space becomes a convex hull (shaded region) generated by the standard basis in $\mathbb{R}^c$, $J = \operatorname{conv}(B_c)$. Every point within the fuzzy information space quantifies the activities of hidden processes. In the case of HMM, the extreme points of the convex hull, i.e. $Q = \{A_1, \ldots, A_c\} = B_c$, are the admissible states (and they are all crisp). HPM allows several processes to coexist at the same time but to different degrees, i.e. it admits the points anywhere within the fuzzy information space $J$ so that, for instance, point D is admissible.
The geometry of a fuzzy information space is illustrated in Fig. 4. Applying hidden process modeling, every point $x$ on a dynamic trajectory $(X, \prec) \subset \mathcal{F}$ is mapped to a point $\mu$ in $J$. Hence, the dynamic trajectory $(X, \prec)$ is mapped into the fuzzy membership trajectory $(M, \prec)$ that is contained in $J$. For the case of the FCM algorithm (with Euclidean distance in $\mathcal{F} = \mathbb{R}^p$, and fuzzy exponent $m = 2$), the velocity of a fuzzy membership vector $\partial\mu/\partial t$ will simply depend on the radial components of a feature vector velocity $\partial x/\partial t$ relative to the poles of attraction, as shown in [23], i.e.

\[ \frac{\partial \mu_i(x_k)}{\partial t} = \frac{\dfrac{2}{d_{ik}^2} \displaystyle\sum_{j=1}^{c} \dfrac{1}{d_{jk}^2} \left\langle \dfrac{r_{jk}^{\circ}}{d_{jk}} - \dfrac{r_{ik}^{\circ}}{d_{ik}},\ \dfrac{\partial x_k}{\partial t} \right\rangle}{\left( \displaystyle\sum_{j=1}^{c} \dfrac{1}{d_{jk}^2} \right)^{2}}, \tag{13} \]

where $r_{ik}^{\circ} = (x_k - z_i)/\|x_k - z_i\|$ is a radial unit vector oriented from the cluster center $z_i$ (i.e. the pole of attraction $P_i$) towards the sample $x_k$, $d_{ik} = \|x_k - z_i\|$ is the Euclidean distance between sample $x_k$ and prototype $z_i$, and $\langle \cdot, \cdot \rangle$ is the usual inner product on $\mathbb{R}^p$ [27]. Clearly, the membership velocity $\partial\mu/\partial t$, in the vicinity of $x_k$, is determined by the relationship between the radial projections of a feature vector velocity, $\partial x/\partial t$ at $x_k$, to the directions $r_{ik}^{\circ}$ centered at the poles of attraction. Moreover, the fuzzy membership vector $\mu$ approaches an extreme point $A_i$ of the convex hull (10) whenever the feature vector $x$ approaches the corresponding pole of attraction $P_i$, as illustrated in Fig. 5. Each point on the membership trajectory $(M, \prec) \subset J$ determines the degree to which process dynamics is governed by the hidden processes $P_i$.

The similarity between the geometry of a fuzzy information space and fuzzy sets in general [26] is not a coincidence. Namely, whenever the regions of attraction are based on dynamic properties that have a physical meaning, one may construct a meta-universe from these properties to be $\Omega_M = \{P_1, \ldots, P_c\}$. Then, the membership vector $\mu(t) \in J$ may be interpreted as a time-varying fuzzy set induced on $\Omega_M$, that is, a fuzzy set with the time-varying fit values that are determined by the motion of a signal trajectory in $\mathcal{F}$. This set is consequently named the dynamic fuzzy set since its membership values change in time. A dynamic fuzzy set should not be mistaken for a temporal fuzzy set. For example, given a fixed time $t_0$, we have that

\[ \mu(t_0) = [\mu_1^{\prec}(t_0) \ \ \mu_2^{\prec}(t_0) \ \ \cdots \ \ \mu_c^{\prec}(t_0)]^T \tag{14} \]

\[ \phantom{\mu(t_0)} = [\mu_D(P_1) \ \ \mu_D(P_2) \ \ \cdots \ \ \mu_D(P_c)]^T \big|_{t = t_0}, \tag{15} \]

where $[\cdot]^T$ is a matrix transpose. The values $\mu_D(P_i)$ $(i = 1, 2, \ldots, c)$ are the time-varying fit values of a dynamic fuzzy set. They quantify the contribution of the properties from a meta-universe to the dynamics of a system at any given time $t_0$. Therefore, a single observation of the raw signals results in one specific dynamic fuzzy set that captures the most significant aspects of system behavior. In practice, scatter plots are more convenient for the visualization of dynamic fuzzy sets, with the points corresponding to time samples.

Finally, the hidden process modeling map can be formally defined as

\[ \mathrm{HPM} : \mathcal{F} \to J \subset [0, 1]^c, \tag{16} \]

where $c$ is the number of hidden processes, and $\mu = \mathrm{HPM}(x_t)$, $\forall x_t \in (X, \prec) \subset \mathcal{F}$, is calculated through a suitable fuzzy partitioning algorithm. Then, the fuzzy membership trajectory $(M, \prec)$ is an image of the feature space trajectory $(X, \prec)$ under the HPM map, i.e.

\[ (M, \prec) = \mathrm{HPM}[(X, \prec)]. \tag{17} \]

Fig. 5. Hidden process modeling (HPM) maps the feature space trajectory $(X, \prec) \subset \mathcal{F}$ into the fuzzy membership trajectory $(M, \prec) \subset J$. For the case of an FCM algorithm, whenever the feature space trajectory approaches a pole of attraction $P_i$, the fuzzy membership trajectory comes close to the corresponding extreme point of the convex hull $A_i$.
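The dependence of the membership velocity on the radial projections of the feature velocity, stated in (13) for FCM with $m = 2$, can be verified numerically. The sketch below writes the velocity as $\partial\mu_i/\partial t = (2/d_i^2) \sum_j d_j^{-2} \langle r_j^{\circ}/d_j - r_i^{\circ}/d_i, \partial x/\partial t \rangle / (\sum_j d_j^{-2})^2$, which follows from differentiating $\mu_i = d_i^{-2} / \sum_j d_j^{-2}$, and compares it with a central finite difference of the HPM map. The prototypes, the sample, and the velocity vector are hypothetical values chosen for the check.

```python
import math


def fcm_membership(x, protos):
    """HPM map for FCM with m = 2: mu_i = d_i^-2 / sum_j d_j^-2."""
    inv = [1.0 / sum((a - b) ** 2 for a, b in zip(x, z)) for z in protos]
    total = sum(inv)
    return [v / total for v in inv]


def membership_velocity(x, v, protos):
    """Analytic d(mu_i)/dt from radial projections of the velocity v."""
    d = [math.dist(x, z) for z in protos]
    # proj_i = <r_i, v> / d_i, with r_i the unit vector from z_i to x.
    proj = [sum((a - b) * c for a, b, c in zip(x, z, v)) / di ** 2
            for z, di in zip(protos, d)]
    s = sum(1.0 / dj ** 2 for dj in d)
    return [(2.0 / di ** 2)
            * sum((pj - pi) / dj ** 2 for pj, dj in zip(proj, d))
            / s ** 2
            for pi, di in zip(proj, d)]


# Hypothetical poles of attraction, sample, and feature velocity.
protos = [(14.0, 18.0), (47.0, 21.0), (30.0, 40.0)]
x = (34.0, 20.0)
v = (1.0, 0.5)

analytic = membership_velocity(x, v, protos)

# Central finite difference of the membership along the direction v.
h = 1e-6
plus = fcm_membership([a + h * b for a, b in zip(x, v)], protos)
minus = fcm_membership([a - h * b for a, b in zip(x, v)], protos)
numeric = [(p - q) / (2.0 * h) for p, q in zip(plus, minus)]

print(analytic)
print(numeric)
```

Because the memberships add to unity, the components of the velocity add to zero, so the membership trajectory never leaves the convex hull (10).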
The membership function of a dynamic fuzzy set $\mu(t)$ is a multivariate time series. Each variable $\mu_i^{\prec}(t)$ is a membership function of the temporal fuzzy set that characterizes the corresponding region of attraction. Signal analysis may then be performed in fuzzy information space $J$ instead of raw signal space $\mathcal{M}$. Any number of well-established algorithms may be applied to the membership space vector $\mu(t)$, e.g. smoothing, filtering, spectral analysis, prediction, etc.
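As one hedged illustration of processing carried out directly in fuzzy information space, the sketch below smooths a membership trajectory with a centered moving average. The trajectory values, the window width, and the function name are hypothetical. Since each smoothed vector is an average of points of $\operatorname{conv}(B_c)$, it stays inside the convex hull.

```python
from typing import List, Sequence


def moving_average(trajectory: Sequence[Sequence[float]],
                   width: int = 3) -> List[List[float]]:
    """Centered moving average applied to each membership component."""
    half = width // 2
    smoothed = []
    for k in range(len(trajectory)):
        # The window is truncated at both ends of the trajectory.
        window = trajectory[max(0, k - half): k + half + 1]
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


# Hypothetical membership trajectory for c = 3 hidden processes.
M = [[0.8, 0.1, 0.1],
     [0.7, 0.2, 0.1],
     [0.2, 0.7, 0.1],
     [0.6, 0.3, 0.1],
     [0.1, 0.3, 0.6]]

M_smooth = moving_average(M)
for row in M_smooth:
    print([round(v, 3) for v in row], "sum =", round(sum(row), 6))
```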
5. Application

To illustrate the concept of analysis in fuzzy information space, we use electroencephalographic (EEG) signals collected during sleep. Recordings from one channel (C4 referenced to linked mastoids) have been processed in order to develop an initial understanding of the method. The traditional approach to characterizing the sleep process involves visual inspection and classification of each 30 s of data into one of the following stages: awake, 1, 2, 3, 4, or REM (rapid eye movement) sleep [33]. Additional physiological signals may be added to consideration if necessary, e.g. respiration, body temperature, or similar [12].

We have concentrated on the first hour of sleep onset after the "lights out" event. This interval contains a difficult-to-characterize transition of "falling asleep" followed by classic non-REM sleep dominated by high-energy low-frequency δ-waves (0.5-4 Hz). Without making prior assumptions on the type of sleep stages, we applied hidden process modeling to obtain the sleep dynamics in an unsupervised fashion [22, 23]. The feature space $\mathcal{F}$ is constructed based on the energy of the EEG in eight frequency bands [23]. Both the raw data and the enhanced image [24], where the intensity values correspond to the magnitude squared of a short-time Fourier transform (STFT) [32], for a typical segment of the EEG signal, are shown in Fig. 6. Numerically, every feature is bounded to the [0, 1] interval; hence, $\mathcal{F} = [0, 1]^8$. Time quantization of a dynamic trajectory is performed for 3 s intervals. Since the feature space is derived from a positive time-frequency distribution [5] and the EEG signals are considered as quasi-stationary, the cluster prototypes approximate the spectra of the hidden processes [19].

Analysis is started with a reasonable physiological assumption that at least two global processes exist, awake and sleep, i.e. $c = 2$. The FCM algorithm is applied on the feature vectors extracted for the first hour of sleep, and the dynamic profile, calculated with a Euclidean distance norm and fuzzy exponent $m = 2$, is shown in Fig. 7. As pointed out in Section 3, temporal fuzzy sets are characterized by their membership functions and cluster prototypes. In
B.R. Kosanovik et al./Fuzzy Sets and Systems 77 (1996) 49~52
57
sO1 p 3 v l .ime
: • ~.!:~ '
i:: iiiiii~i:;i:i! : i:i~~: : : i i:.~..::ii ~
' ~"~!~:~i ": "
-r
~": ":~'~::~i., ~"
~ I
16 ~ 12
:i::~i:ii~i: : : ' :;! :i~,::~:~:~ .:!;
. . . . . ..i
~ .:::: ::: ::-
. i
::~::..' ::",.~:::.:
? i.
::.~
:::'~-~"--'~.:: :..:~::..~:'::~.
sO1 p 3 v l .img I
t
8
::~i ::
:
:
:
:-
:::
i : :!
i ::
.'~:~:i.:~ . . . . . ~,~:..'!~:i:::.L'. "
:.:',,.:~'..
i
:ii.
i:-i
t
~: i :
,':':,~:..::;:!.:~::.~:!:::::::.:~i.:: ! ..~:i:L~:.'::::i : :
~: ~ .~::i::: . . . .
;
s _ e e g l d.O03
-1
.
0.5
1 00:28:00
:
i
:
i
;
i
0 0 : 2 g :00
00:30:00 Time
00:31:00
Fig. 6. A single channel of EEG showing the "transition to sleep" through the beginning of stage 2: (bottom graph) raw signal from C4 electrode; (middle graph) uniformly quantized and enhanced spectrogram image; (top graph) spectrogram image obtained with adaptive quantization and subsequent enhancement. Low energy is represented with bright pixels, and high energy with dark pixels. Both images are enhanced using range compression [15] as reported in [24].
a) 1
0.6
0.6
u~ 0.4
P~
0.4
0.2
0.2
0.8
0.8
o.6
u¢
o! I
1f
0.6
:'2
0.4
0.2 0
b)
1
I - -
',
[
0.4
0.2 0
15
30
Time [mini
45
60
0
i
0
4
8
12
16
Frequency [Hz]
Fig. 7. A typical dynamic profile for the first hour of normal sleep (At = 3 s) under the assumption of two hidden processes. (a) membership functions ( / ~ , I ~ ) of temporal fuzzy sets (hidden processes), estimated with the FCM algorithm (Euclidean distance, c = 2, m = 2). (b) Cluster prototypes (PI, .°2) of temporal fuzzy sets (hidden processes). The prototypes estimate the normalized spectra corresponding to the poles of attraction.
58
B.R. KosanoviOet al./ Fuzzy Sets and Systems 77 (1996) 49~52
1\ 0.a
\
.
"\ \ \
0.6
p(
\ \\
0.4
\ ",\
0.2
o
" ~
\
0'.2
0'.4 f~
0'.6
0'.8
Fig. 8. A scatter plot of the dynamic fuzzy set $\mu$ within the fuzzy information space J for the case of two hidden processes of Fig. 7.

other words, we are not able to interpret correctly the membership function of a temporal fuzzy set without considering the corresponding cluster prototype that carries its physical meaning, or the characterization of a region of attraction. Using basic physiological facts about sleep [33], process P1 of Fig. 7, with membership function $\mu_1$, is identified as awake, or possibly drowsy, since its prototype suggests the existence of low-energy δ- and θ-waves (4-8 Hz) and a fair amount of α-wave (8-12 Hz) activity. That is, the hidden process P1 is approximated by an autoregressive moving-average (ARMA) model [17] that has such a spectrum. It should be noted at this point that the type of the resulting physical model that characterizes temporal fuzzy sets is determined by the selection of feature variables and the fuzzy partitioning technique. An important fact is that the process P2, in the same figure, was identified as sleep not just because it is the complement of P1 in a set-theoretic sense, i.e.

$$\mu_2 = 1 - \mu_1, \qquad (18)$$

but also because the corresponding prototype exhibited a considerably higher-energy spectrum, with an extremum within the delta range, which is known to correspond to deep sleep. The dynamic fuzzy set for this analysis is shown in Fig. 8. Since the FCM algorithm constrains the sum of membership functions to unity, the fuzzy information space J for c = 2 is a straight line given by (18).
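The unit-sum constraint behind (18) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it runs a basic FCM iteration (Euclidean distance, c = 2, m = 2) on a synthetic feature trajectory. The function name `fcm`, the toy poles `A` and `B`, the noise level, and the sigmoid transition are all assumptions made for illustration; only the algorithm settings and the 3 s sampling step come from the text.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means with Euclidean distance (illustrative sketch).

    X is an (N, d) array of feature vectors. Returns the membership
    matrix U of shape (c, N) and the prototypes V of shape (c, d).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)        # start from a valid fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)   # weighted cluster means
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)           # guard against zero distance
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True) # memberships sum to one
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Hypothetical trajectory: one hour sampled every 3 s, drifting between
# two assumed poles of attraction A and B (toy normalized spectra).
rng = np.random.default_rng(1)
t = np.arange(1200) * 3.0 / 60.0             # time in minutes
s = 1.0 / (1.0 + np.exp(-(t - 20.0) / 3.0))  # assumed smooth transition
A = np.array([0.10, 0.20, 0.60, 0.10])
B = np.array([0.70, 0.20, 0.05, 0.05])
X = (1.0 - s)[:, None] * A + s[:, None] * B
X = X + 0.02 * rng.standard_normal(X.shape)

U, V = fcm(X, c=2, m=2.0)
mu1, mu2 = U                                 # rows are the two membership functions
```

Because each membership update is normalized over the clusters, every column of `U` sums to one, so for c = 2 the points (`mu1`, `mu2`) fall on the straight line described by (18).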
Next, we applied the HPM for c = 3, 4, ..., 16; that is, we assumed the existence of more than just two hidden processes. The results for c = 3 (Fig. 9) show that the awake process did not change considerably (Q1 ≈ P1), while the sleep process has been split into two processes, P2 → Q2 and Q3. This observation is verified by comparing the dynamic profiles (i.e. membership functions and cluster prototypes). It is also observed that the dynamic behavior of the awake process changed considerably 4 min after the "lights out" event. The same change in dynamics was present in the previous analysis for c = 2 (Fig. 7). The sleep transition identified in both cases started about 15 min after "lights out". An arousal which occurred shortly after t = 45 min was also captured. The dynamic fuzzy set for c = 3 is shown in Fig. 10. The change in dynamics detected 4 min after "lights out" is explained using the analysis for c = 6. Namely, the awake process has split into two parts, S1 and S2, as shown in Fig. 11. Only the first three hidden processes of the dynamic profile are plotted. Upon inspection of the cluster prototypes, we concluded that the process S1 is most likely a combination of drowsiness and stage-1, while S2 still represents the awake process (possibly with closed eyes). This characterization is based on the following two facts about the dynamic profile: (i) the prototype of the process S2 exhibits very strong α-waves; (ii) the prototype of S1 has weak α-wave and slightly stronger δ-wave activity. Since the fuzzy information space is six-dimensional (c = 6), the dynamic fuzzy set cannot be easily visualized. However, a pair of three-dimensional projections is shown in Fig. 12. It is important to note that the dynamic profiles calculated on an additional eight subjects have exhibited similar properties.
The corresponding scatter plots matched in shape whenever the cluster prototypes were suggesting similar spectral characteristics for the hidden processes. Nevertheless, actual shapes of the membership functions were quite different, reflecting the individuality and richness of sleep process dynamics. Clearly, the hidden processes identified with an HPM approach do not necessarily correspond to the traditional sleep stages, but provide for an unsupervised partitioning of feature space, resulting in cluster prototypes that are yet to be fully explained.
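One way to make the comparison of cluster prototypes across different values of c concrete is to assign each prototype of the finer partition to its nearest prototype in the coarser one. The sketch below is an illustration under stated assumptions rather than the procedure used in this study: the helper `nearest_parent` and the toy spectra `P` and `Q` are hypothetical, and Euclidean distance between prototypes is chosen only because the FCM runs reported here use it.

```python
import numpy as np

def nearest_parent(V_coarse, V_fine):
    """For each fine prototype, return the index of the closest coarse prototype.

    V_coarse has shape (c1, d) and V_fine has shape (c2, d); distance is Euclidean.
    """
    diff = V_fine[:, None, :] - V_coarse[None, :, :]
    dist = np.linalg.norm(diff, axis=2)      # shape (c2, c1)
    return dist.argmin(axis=1)

# Hypothetical normalized spectra, invented for illustration only.
P = np.array([[0.10, 0.20, 0.60, 0.10],      # P1
              [0.70, 0.20, 0.05, 0.05]])     # P2
Q = np.array([[0.12, 0.20, 0.58, 0.10],      # Q1
              [0.60, 0.30, 0.05, 0.05],      # Q2
              [0.80, 0.10, 0.05, 0.05]])     # Q3

parents = nearest_parent(P, Q)
# Group the fine prototypes under the coarse prototype they are closest to.
children = {i: np.flatnonzero(parents == i).tolist() for i in range(len(P))}
print(children)                              # {0: [0], 1: [1, 2]}
```

In this toy case the first coarse prototype keeps a single descendant while the second acquires two, which mirrors the pattern Q1 ≈ P1 and P2 → Q2, Q3 described above. With real prototypes the assignment should still be checked against the membership functions, since nearness of spectra alone does not establish that one process is a split of another.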
[Fig. 9 graphic: panel (a) shows three membership functions plotted against Time [min]; panel (b) shows three cluster prototypes (Q1, Q2, Q3) plotted against Frequency [Hz].]
Fig. 9. Dynamic profile for the first hour of normal sleep ($\Delta t = 3$ s) under the assumption of three hidden processes. (a) Membership functions of temporal fuzzy sets, estimated with the FCM algorithm (Euclidean distance, c = 3, m = 2). (b) Cluster prototypes (Q1, Q2, Q3) of temporal fuzzy sets. Process Q1 did not change considerably when compared to P1 (Fig. 7), except for a slight drop of energy in the δ range. Processes Q2 and Q3 are observed to be the consequence of splitting process P2 (Fig. 7).

[Fig. 10 graphic: three-dimensional scatter plot over the membership axes.]
Fig. 10. A scatter plot of the dynamic fuzzy set $\mu$ within the fuzzy information space J for the case of the three hidden processes shown in Fig. 9.

6. Discussion

We have presented a general approach to signal analysis based on fuzzy set theory. Temporal fuzzy sets [21, 23] are defined and used in mapping the signal trajectory from the feature space into the membership trajectory of a fuzzy information space. The fuzzy information space is constructed from the membership functions of temporal fuzzy sets. How the approach can be used in practice is illustrated by an experimental application of modeling sleep onset dynamics using EEG data.

Several important observations can be made. The difference between traditional and temporal fuzzy sets comes from the way the universe is treated. In the traditional case, the order in which the elements of a universe appear does not matter. That is, the values of a membership function may be written in any order, or equivalently, the membership function is allowed to change its shape when the elements of the universe are shuffled.¹ On the contrary, a temporal fuzzy set is constructed from the universe in a specific order, determined by the underlying system dynamics. Thus, it is possible to obtain different temporal fuzzy sets from the same universe if the order of elements is changed, as shown in Fig. 1. For both dynamic trajectories in Fig. 1, the temporal fuzzy sets model the regions of attraction characterized by specific dynamic properties represented by the poles of attraction (prototypes) A and B.

¹ One exception is fuzzy numbers. Essentially, they are a special kind of ordered fuzzy sets since their universe is a set of real numbers. Hence, fuzzy numbers are also induced from an ordered universe. However, a set of real numbers as such may not correspond to either dynamics or time.

[Fig. 11 graphic: panel (a) shows three membership functions plotted against Time [min]; panel (b) shows three cluster prototypes plotted against Frequency [Hz].]
Fig. 11. Dynamic profile for the first hour of normal sleep ($\Delta t = 3$ s) under the assumption of six hidden processes. Only the first three temporal fuzzy sets are shown. (a) Membership functions for the first three temporal fuzzy sets, estimated with the FCM algorithm (Euclidean distance, c = 6, m = 2). (b) Cluster prototypes (S1, S2, S3) for the first three temporal fuzzy sets. Process S1 corresponds to drowsiness, process S2 is awake, and process S3 is closest to stage-2 and early stage-3 that appear during transition from awake to sleep.

A map between the feature space and fuzzy information space is called a hidden process model (HPM) [20], since it is a generalization of a hidden Markov model (HMM) [31]. HPM regards signal measurements as certain, while the system structure is unknown and uncertain. Hence, feature extraction is used, as in most pattern recognition problems, to reduce the noise in measurement space. A major advantage of this approach lies in the use of unsupervised pattern recognition when identifying the dynamic properties of the signal trajectory, and in providing quantitative information about the system motion in feature space. While doing so, one attempts to fit the most significant information about the system dynamics instead of fitting the noise in raw signals. This becomes extremely important in situations where the relevant aspects of system dynamics are unevenly distributed over a large number of measured
signals. Another advantage is that the signal analysis in fuzzy information space may be guided by physical properties of the regions of attraction that are characterized by temporal fuzzy sets. Finally, data fusion is easy to accomplish since the feature space may be constructed from a number of different sources while the fuzzy partitioning generally does not have to be based on purely geometric similarity relations in a Hilbert space.

In practice, difficulties may appear in deciding the proper number of hidden processes, or fuzzy clusters. Moreover, as a consequence of the unsupervised partitioning of the feature space, the transition pathways may appear as separate processes if the number of links is high, or if the transient process is slow enough to generate a large number of trajectory samples, as suggested by process S3 of Fig. 11.

Future work will examine the development of more robust validity functionals that make use of the fact that feature samples originate from a dynamic trajectory. Also, the feature space may be modified to include some of the dynamic properties of a trajectory, e.g. gradient or velocity components. A number of alternative algorithms for hidden process modeling can be derived using possibilistic clustering [28], partitioning using similarity relations [3, 13, 14, 37], and derivatives of FCM [10, 11]. It is possible that different aspects of dynamics can be extracted with these algorithms, and that some of them may be more effective for specific applications. In addition, heuristic knowledge about the system may be taken into consideration through a suitable modification of similarity relations. Since the membership function of a dynamic fuzzy set is nothing else but another time series, a number of well-established algorithms for time-series analysis can be used to process further the measured signals. Hence, prediction may be done on a dynamic fuzzy set in fuzzy information space, which is an obvious solution for dealing with fuzzy events that may be introduced by HPM.

We conclude with an interesting observation as to the relationship between the principle of incompatibility [41] and fuzzy information space. Namely, whenever the precision and significance concerning complex system behavior become almost mutually exclusive characteristics, the analysis in fuzzy information space may still provide relevant information about the system dynamics. That is, the raw signals provide for precision, while the dynamic fuzzy sets focus on relevance.

[Fig. 12 graphic: two three-dimensional scatter plots, panels (a) and (b), over membership axes.]
Fig. 12. A scatter plot of the dynamic fuzzy set $\mu$ within the fuzzy information space J for the case of six hidden processes. Only two 3-D projections of the 20 possible are shown: (a) $\mu_1$-$\mu_2$-$\mu_3$; (b) $\mu_4$-$\mu_5$-$\mu_6$.

Acknowledgements

The authors would like to express their gratitude to Ronald E. Dahl, Neal D. Ryan, and Mingui Sun, for numerous and insightful discussions about sleep onset dynamics and signal analysis.
References

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, New York, 1981).
[2] J.C. Bezdek, R. Ehrlich and W. Full, FCM: the fuzzy c-means clustering algorithm, Comput. Geosci. 10 (1984) 191-203.
[3] J.C. Bezdek and J.D. Harris, Fuzzy partitions and relations; an axiomatic basis for clustering, Fuzzy Sets and Systems 1 (1978) 111-127.
[4] J.C. Bezdek and P.K. Sankar, Eds., Fuzzy Models For Pattern Recognition: Methods that Search for Structures in Data (IEEE Press, New York, 1992).
[5] L. Cohen and T.E. Posch, Positive time-frequency distribution functions, IEEE Trans. ASSP 33(1) (1985) 31-38.
[6] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York, 1980).
[7] D. Dubois and H. Prade, Fuzzy sets and probability: misunderstandings, bridges and gaps, in: Proc. 2nd IEEE Internat. Conf. on Fuzzy Systems, Vol. 2, San Francisco, CA, 28 March-1 April 1993 (IEEE Press, New York, 1993) 1059-1068.
[8] I. Gath and A.B. Geva, Fuzzy clustering for the estimation of the parameters of the components of mixtures of normal distributions, Pattern Recognition Lett. 9 (1989) 77-86.
[9] I. Gath and A.B. Geva, Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Machine Intelligence 11 (1989) 773-781.
[10] J.J.D. Gruijter and A.B. McBratney, A modified fuzzy k-means method for predictive classification, in: H.H. Bock, Ed., Classification and Related Methods of Data Analysis, Proc. 1st Conf. of the International Federation of Classification Societies (IFCS), Technical University of Aachen, FRG (New York, 1988) 97-104. International Federation of Classification Societies, Elsevier, Amsterdam.
[11] D.E. Gustafson and W.C. Kessel, Fuzzy clustering with a fuzzy covariance matrix, Proc. IEEE CDC, San Diego, CA (1979) 761-766.
[12] R.M. Harper, R.J. Sclabassi and T. Estrin, Time series analysis and sleep research, IEEE Trans. Automat. Control 19 (1974) 932-943.
[13] R.J. Hathaway and J.C. Bezdek, NERF c-means: non-Euclidean relational fuzzy clustering, Pattern Recognition 27 (1994) 429-437.
[14] R.J. Hathaway, J.W. Davenport and J.C. Bezdek, Relational duals of the c-means algorithms, Pattern Recognition 22 (1989) 205-212.
[15] A.K. Jain, Fundamentals of Digital Image Processing (Prentice-Hall, Englewood Cliffs, NJ, 1989).
[16] A. Kandel and W.J. Byatt, Fuzzy sets, fuzzy algebra, and fuzzy statistics, Proc. IEEE 66 (1978) 1619-1639.
[17] S.M. Kay, Modern Spectral Estimation: Theory and Application (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[18] G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[19] B.R. Kosanović, L.F. Chaparro and R.J. Sclabassi, Modeling of quasi-stationary signals using temporal fuzzy sets and time-frequency distributions, in: Proc. IEEE-SP Internat. Symp. on Time-Frequency and Time-Scale Analysis, Philadelphia, PA (IEEE Press, New York, 1994) 425-428.
[20] B.R. Kosanović, L.F. Chaparro and R.J. Sclabassi, Hidden process modeling, in: Proc. ICASSP-95, Vol. 5, Detroit, MI (IEEE Press, New York, 1995) 2935-2938.
[21] B.R. Kosanović, L.F. Chaparro, M. Sun and R.J. Sclabassi, Physical system modeling using temporal fuzzy sets, in: Proc. Internat. Joint Conf. of NAFIPS/IFIS/NASA '94, San Antonio, TX (IEEE Press, New York, 1994) 429-433.
[22] B.R. Kosanović, R.E. Dahl, N.D. Ryan and R.J. Sclabassi, Sleep onset dynamics characterized by a fuzzy clustering technique, Sleep Res. 23 (1994) 448.
[23] B.R. Kosanović et al., Fuzzy modeling of dynamic processes, submitted.
[24] B.R. Kosanović et al., Adaptive linear quantization and enhancement of the EEG spectrograms during sleep, in preparation.
[25] B. Kosko, Fuzziness vs. probability, Int. J. General Systems 17 (1990) 211-240.
[26] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence (Prentice-Hall, Englewood Cliffs, NJ, 1992).
[27] E. Kreyszig, Introductory Functional Analysis with Applications (Wiley, New York, 1978).
[28] R. Krishnapuram and J.M. Keller, A possibilistic approach to clustering, IEEE Trans. Fuzzy Systems 1(2) (1993) 98-110.
[29] M. Mohamed and P. Gader, Generalization of hidden Markov models using fuzzy integrals, in: Proc. Internat. Joint Conf. of NAFIPS/IFIS/NASA '94, San Antonio, TX (IEEE Press, New York, 1994) 3-7.
[30] V.V. Nemytskii and V.V. Stepanov, Qualitative Theory of Differential Equations (Princeton Univ. Press, Princeton, NJ, 1960).
[31] L.R. Rabiner and B.H. Juang, An introduction to hidden Markov models, IEEE ASSP Mag. (1986) 4-16.
[32] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, Englewood Cliffs, NJ, 1978).
[33] A. Rechtschaffen and A. Kales, A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects, Brain Information Service/Brain Research Institute, UCLA (1968).
[34] F.F. Rivera, E.L. Zapata and J.M. Carazo, Cluster validity based on the hard tendency of the fuzzy classification, Pattern Recognition Lett. 11 (1990) 7-12.
[35] M. Roubens, Pattern classification problems and fuzzy sets, Fuzzy Sets and Systems 1 (1978) 239-253.
[36] D.D. Siljak, Nonlinear Systems: The Parameter Analysis and Design (Wiley, New York, 1969).
[37] S. Tamura, S. Higuchi and K. Tanaka, Pattern classification based on fuzzy relations, IEEE Trans. Systems Man Cybernet. 1(1) (1971) 61-66.
[38] X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Machine Intelligence 13 (1991) 841-847.
[39] L.A. Zadeh, Fuzzy sets, Inform. Control 8 (1965) 338-353.
[40] L.A. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl. 23 (1968) 421-427.
[41] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Systems Man Cybernet. 3 (1973) 28-44.
[42] H.-J. Zimmermann and P. Zysno, Quantifying vagueness in decision models, European J. Oper. Res. 22 (1985) 148-158.