Fuzzy Sets and Systems 130 (2002) 253–264
www.elsevier.com/locate/fss
Fuzzy system modeling in pharmacology: an improved algorithm

Kemal Kilic (a), Beth A. Sproule (b,c,d), I. Burhan Türksen (a,*), Claudio A. Naranjo (b,d,e,f)

(a) Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8
(b) Psychopharmacology Research Program, Sunnybrook & Women's College Health Sciences Centre, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8
(c) Faculty of Pharmacy, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8
(d) Departments of Psychiatry, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8
(e) Pharmacology, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8
(f) Medicine, University of Toronto, 5 King's College Road, Toronto, Ontario, Canada, M5S 3G8

Received 21 August 2000; received in revised form 22 August 2001; accepted 12 September 2001
Abstract

In this paper, we propose an improved fuzzy system modeling algorithm to address some of the limitations of the existing approaches identified during our modeling with pharmacological data. This algorithm differs from the existing ones in its approach to the cluster validity problem (i.e., number of clusters), the projection schema (i.e., input membership assignment and rule determination), and significant input determination. The new algorithm is compared with the Bazoon–Turksen model, which is based on the well-known Sugeno–Yasukawa approach. The comparison was made in terms of predictive performance using two different data sets. The first comparison was with a two-variable nonlinear function prediction problem and the second comparison was with a clinical pharmacokinetic modeling problem. It is shown that the proposed algorithm provides more precise predictions. Determining the degree of significance for each input variable allows the user to distinguish their relative importance. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy sets; Fuzzy logic; Fuzzy system modeling; Pharmacokinetic modeling
1. Introduction

Most of the recent research on fuzzy system modeling [3,5,6,8] is based on the objective determination of the structure in data, whereas in older approaches the structure was determined a priori from other sources
* Corresponding author. Tel.: +1-416-978-6420; fax: +1-416-978-3453. E-mail address: [email protected] (I.B. Türksen).
such as experts' knowledge. Data analysis (and/or data mining) is now one of the basic steps of fuzzy system modeling. Data consist of objects that are defined in terms of some attributes. For example, in the psychopharmacological domain, objects are subjects (individuals whose data are collected during the study), and associated attributes may be scales that measure their mood, drug dosage or weight. The overall goal of data mining is to find the structure of these data in terms of the relationships
0165-0114/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0165-0114(01)00196-8
K. Kilic et al. / Fuzzy Sets and Systems 130 (2002) 253–264
identified in a rule structure. Collected data may provide limited information unless the structure, i.e., the rules hidden in the system, is identified. Fuzzy system modeling determines these structures by using fuzzy if–then rules that relate the inputs to the corresponding outputs. Hence, the first assumption is that the available data provide the necessary basis to determine the structure (information, relationships between inputs and outputs). However, most systems have various sources of uncertainty that make modeling a difficult task. Fuzzy system modeling overcomes these difficulties in a different and, we think, improved manner compared with other modeling tools such as regression or classical expert systems, by handling the uncertainties with the underlying fuzzy set theory.

According to Zimmermann [14], "certainty" implies that a person has quantitatively and qualitatively appropriate information to describe, prescribe and predict a system deterministically and numerically. Situations that are not covered by this definition are called "uncertain". There are various sources of uncertainty. One possible source is "lack of information": the data may not provide the information that governs the system behavior. For example, significant inputs may be missing due to technical difficulties in collecting the data, as with genetic data, historical data or data that cannot be represented in numerical terms. Alternatively, the analyst may be unaware of the existence of such significant input(s). Furthermore, not only missing input variables but also the quality and the quantity of the data may lead to uncertainty. An approach for such situations is a method of "approximation"; that is, one attempts to model the system as far as the data on hand allow. Another source of uncertainty is the "abundance of information" (complexity).
Humans, despite our intelligence, cannot easily identify patterns even in a medium-sized data set. It is usually not easy to perceive the information hidden in raw data by just looking at the numerical figures. One approach is to represent the data in a way that is easier to perceive. Fuzzy system modeling transforms the data into perceivable information by using fuzzy "granularity". The fuzzy if–then rules obtained from the data expose fuzzy information granules, and thus provide a more descriptive explanation of system behavior than raw numeric figures.
A third source of uncertainty may be "conflicting evidence". That is to say, a considerable amount of the data may point to a certain system behavior whereas part of the data may point to a totally different system behavior. There are two possible reasons for this problem. One of them is very close to the first source of uncertainty described above, in that the essential input variable that distinguishes the system characteristics may be missing. Alternatively, the data analysis algorithm itself may not be able to identify the two behaviors due to underlying assumptions in its execution, resulting in inaccurate rule structures.

The goal of this paper is to provide an algorithm that determines the structure of data in a manner that improves upon the ones currently proposed in the literature. In Section 2, the basics of fuzzy system modeling are presented along with the shortcomings of the existing algorithms in the current literature, followed by the details of the proposed new algorithm. In Section 3, we present the results obtained from two different data sets. The first data set is a well-known test problem introduced by Sugeno–Yasukawa [8] that predicts the output of a nonlinear function. The second is a clinical pharmacology data set used in modeling the pharmacokinetics of the sedative-hypnotic alprazolam. Finally, future research objectives are outlined in the conclusion.

2. Literature review and the proposed algorithm

One of the well-known generic frameworks proposed for fuzzy system modeling is the Sugeno–Yasukawa model [8]. The Bazoon–Turksen model was developed based on this modeling approach and was applied successfully to several pharmacological problems [5,6]. Later, Emami et al. [3] proposed a more systematic and algorithmic approach to forming the fuzzy rule base, which uses a parameterized approximate reasoning approach to eliminate some heuristic aspects of fuzzy reasoning and classification.
A typical fuzzy rule base has the following form:

IF $x_1 \in X_1$ isr $A_{11}$ AND $x_2 \in X_2$ isr $A_{12}$ AND ... AND $x_{NV} \in X_{NV}$ isr $A_{1,NV}$ THEN $y \in Y$ isr $B_1$
ALSO
...
ALSO
IF $x_1 \in X_1$ isr $A_{c1}$ AND $x_2 \in X_2$ isr $A_{c2}$ AND ... AND $x_{NV} \in X_{NV}$ isr $A_{c,NV}$ THEN $y \in Y$ isr $B_c$

where $A_{i,j}$ and $B_i$ are fuzzy linguistic labels interpreted as fuzzy sets in the universes of discourse $U_j$ and $V$, respectively, c is the number of rules and NV is the number of variables. Each rule can be written in canonical form as Rule$_i$: $(X_1, X_2, \ldots, X_{NV}, Y)$ isr $R_i$, where $R_i$ is a fuzzy relation in $U_1 \times U_2 \times \cdots \times U_{NV} \times V$ and "isr" (pronounced "easr") means "is related to" (i.e. "fuzzy is"). The relation R is the combination of all the rules.

The following is the fuzzy system modeling algorithm proposed by Emami et al. [3] that may be used to develop a fuzzy rule base and identify the associated parameters:

1. Determine a system model with a training data set
  1.1. Rule Generation; Output Fuzzy Clustering
    (a) Perform agglomerative hierarchical hard clustering for initial prototypes (AHM).
    (b) Perform fuzzy clustering for output data (Fuzzy C-Means, FCM) in order to determine:
      (i) a suitable weighting exponent for clustering output data (m),
      (ii) the optimum number of output clusters (c).
    (c) Project the scatter points of the output clusters into input spaces.
    (d) Form the membership functions for the entire input–output space (Classification) in terms of type 1 membership representation.
  1.2. Input Selection; Input Membership Assignment
    (a) Perform fuzzy line clustering for input membership functions.
    (b) Eliminate ineffective input candidates.
    (c) Determine fuzzy rules.
  1.3. Fuzzy Inference Parameter Optimization
    (a) Obtain the optimum values of fuzzy inference parameters $(p, q, \alpha, \beta)$ with the myopic type 1 formulas.
  1.4. Membership Parameter Tuning
    (a) Adjust the parameters of input and output fuzzy membership functions.
    (b) Execute the model with the adjusted parameters to see whether the adjustment improves the output.
2. Test the model with a test data set.

Briefly, the output variable is first fuzzy clustered, the resulting fuzzy output clusters are projected onto each input variable independently, and the fuzzy rule base is obtained by fitting a line to the input membership values. Next, the parameters of the reasoning mechanism $(p, q, \alpha, \beta)$ are obtained by a constrained optimization methodology. Finally, the rules are fine-tuned with respect to the modeling error. In this paper, we propose significant modifications to the first two steps, which are essentially the determination of the fuzzy if–then rules. Each step will be described in detail to provide information on the shortcomings of the existing techniques and the rationale for our proposed modifications.

When forming the fuzzy rules, both Sugeno–Yasukawa [8] and Emami et al. [3] use the same approach, which is based on clustering the output data and then projecting the output clusters into the input domain. The intuition behind this approach is simple: after determining the fuzzy clusters of the output data, one can group the input data and determine which input clusters lead to which output cluster. Determination of the number of output clusters and the weighting exponent is necessary at this stage. This is specified in 1.1(b) above. Determination of the number of output clusters is referred to as the cluster validity problem. Available techniques, such as the partition coefficient and entropy proposed by Bezdek [1], the Xie–Beni index [13], the Sugeno–Yasukawa index [8] and the algorithm proposed by Emami et al. [3], are based on the selection of an (m, c) pair with respect to a pre-specified function.
However, this selection cannot guarantee the best pair for modeling performance, since the selected pair is only the best with respect to the pre-specified function. This is because these techniques were designed primarily for clustering purposes. Our proposed modification is to select the number of clusters with respect to the performance measure, i.e., the prediction error of the resulting model. The proposed algorithm is as follows:
Algorithm 1
1. For c = 1 to the maximum number of clusters (a prespecified value) do
  1.1. total error := 0
  1.2. For i = 1 to the number of training data do
    1.2.1. Form a rule base using the training data without the ith data vector
    1.2.2. Compute the prediction error for the ith data vector
    1.2.3. total error := total error + error
  1.3. average error := total error / number of training data
2. Select the c value that produces the minimum average error.

Two potential shortcomings of this algorithm may be discussed. The first is its complexity. The second For loop finds the error of each training data vector individually by using the remaining training data to build a rule base, which is known as the take-one-out (leave-one-out) strategy. For our purposes, the proposed fuzzy system modeling is designed to be used in the pharmacological domain, where the number of data vectors available is limited (i.e., often < 200). Hence, this analysis does not take more than a couple of minutes. For larger amounts of data, however, an alternative approach would be to randomly divide the data into two sets, and to use one of them to build the rule base and the other to determine the error, rather than repeating the test individually for each data vector. This approach reduces the complexity by a factor on the order of the number of training data; however, the error found would be less robust because of the random division.

The second potential shortcoming is the upper bound set for the maximum number of clusters. We propose 10 as a typical bound. As mentioned previously, fuzzy system modeling is more descriptive than raw numerical figures because it represents the data in terms of fuzzy clusters. Therefore, the maximum number of clusters must be limited to a number that is reasonable for human perception. Generally speaking, in psychology it is believed that humans are able to perceive 7 ± 2 clusters; hence, we set the maximum number of clusters to 10 in order to cover this range.
One may set the maximum number of clusters as high as the number of data vectors, but increasing this parameter increases the computational time.
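The leave-one-out selection of Algorithm 1 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `select_num_clusters`, the callable `fit_predict` (which stands in for building a rule base with c clusters and predicting one held-out vector) and the use of the absolute error are assumptions introduced here. The toy predictor at the end is a nearest-neighbour placeholder, not a fuzzy model.

```python
import numpy as np

def select_num_clusters(X, y, fit_predict, max_clusters=10):
    """Leave-one-out search over the number of clusters c (sketch of Algorithm 1).

    fit_predict(X_train, y_train, x, c) is a hypothetical callable that builds
    a model with c clusters from the training data and predicts the output at x.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    avg_error = {}
    for c in range(1, max_clusters + 1):
        total_error = 0.0
        for i in range(n):
            keep = np.arange(n) != i          # training data without the ith vector
            y_hat = fit_predict(X[keep], y[keep], X[i], c)
            total_error += abs(y[i] - y_hat)  # assumption: absolute error
        avg_error[c] = total_error / n
    best_c = min(avg_error, key=avg_error.get)
    return best_c, avg_error

def toy_fit_predict(X_train, y_train, x, c):
    """Placeholder model: mean output of the c nearest training vectors."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d, kind="stable")[:c]
    return float(y_train[nearest].mean())

# Illustrative data only: y = x on ten equally spaced points.
X_demo = np.arange(10.0).reshape(-1, 1)
y_demo = np.arange(10.0)
best_c, errors = select_num_clusters(X_demo, y_demo, toy_fit_predict, max_clusters=3)
print(best_c, errors)
```

Passing the model as a callable keeps the selection loop independent of how the rule base is actually built, so the same loop could wrap any rule-generation procedure.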
We also propose a minor modification to the classification schema of Emami et al. [3]. In that schema, the curve fitting violates the basic assumption of the FCM algorithm, namely that the membership degrees of an individual in all clusters must sum to 1. A new classification algorithm, which is quite fast and satisfies the FCM assumption, is proposed. In this approach, the cluster centers are first determined by using a hard clustering algorithm, and the cluster centers of the hard clusters are assigned as the cluster centers of the fuzzy sets. Next, the intermediate values are specified in order to form the fuzzy sets as follows. Let $v_A$ and $v_B$ be the cluster centers of two consecutive fuzzy sets, say fuzzy set A and fuzzy set B. Then the membership degree in A of a point x lying between the two centers is determined as $\mu_A(x) = (x - v_B)/(v_A - v_B)$, and its membership degree in B is $1 - \mu_A(x)$. If point x is closer to the cluster center of fuzzy set A, it has a higher membership degree in fuzzy set A. Furthermore, this assignment guarantees a total membership of 1. Note that the fuzzy sets obtained with this approach allow only two fuzzy sets to intersect with each other, which must be the case to satisfy the FCM assumption. This approach also eliminates the weighting exponent as a parameter: since FCM is not used, the level of fuzziness is fixed.

Finally, we propose another small modification for the boundary fuzzy sets, i.e. the fuzzy set with the minimum cluster center and the fuzzy set with the maximum cluster center. Fuzzy clustering algorithms like FCM determine memberships with respect to closeness to the cluster centers. Cluster centers are typical values, or prototypes, of the clusters; hence, they are neither the highest nor the smallest value in the set but are intermediate. This creates a problem for the boundary fuzzy sets.
For example, if VERY OLD AGE is the fuzzy set at the upper boundary, with a cluster center of 85 years, then the logic behind FCM-like clustering algorithms assigns a membership degree of 1 to the cluster center and smaller values as the age increases or decreases. For ages < 85 this is logical, since they have a smaller membership in VERY OLD AGE and a higher membership in the fuzzy set OLD AGE. However, this is not the case for ages > 85. As age increases, the membership degree in VERY OLD AGE decreases, and the remaining membership degree would be assigned to OLD AGE, which is not valid. A higher age cannot have a higher degree of
membership in a lower fuzzy set, i.e., OLD AGE. So we propose that boundary values should have a membership degree of 1, i.e., the degree of membership in VERY OLD AGE should be 1 beyond the cluster center 85.

The next step in fuzzy system modeling is the determination of the rule base. In the existing literature this is achieved by projecting the output membership clusters into the input space and applying fuzzy line clustering in order to determine the rules. While building the rule base, the inputs are treated individually, as if there were no interactions between them. Any linkage is implied only indirectly in the inference step. However, as previously mentioned, for our purposes the data belong to objects (patients in the study) and each input variable refers to one attribute of the object. Hence there is a natural source of dependency between the values of the input variables in each data vector, since they come from the same individual. In fact, this is the case in most modeling domains.

Furthermore, both Sugeno–Yasukawa [8] and Emami et al. [3] assume that only one convex input cluster corresponds to each output cluster, for each input variable. That is to say, the rules are all of the form

IF $x_1 \in X_1$ isr $A_{i,1}$ AND $x_2 \in X_2$ isr $A_{i,2}$ AND ... AND $x_N \in X_N$ isr $A_{i,N}$ THEN $y \in Y$ isr $B_i$.

However, this is not necessarily true. It is quite possible that the rules have a structure that allows OR, such as

IF ($x_1 \in X_1$ isr $A_{i,1,1}$) OR ($x_1 \in X_1$ isr $A_{i,1,2}$) AND $x_2 \in X_2$ isr $A_{i,2}$ AND ... AND $x_N \in X_N$ isr $A_{i,N}$ THEN $y \in Y$ isr $B_i$.

That is to say, for each output cluster there may be more than one input cluster rather than just one convex input cluster. Both of these limitations of the existing techniques significantly reduce the validity of the rule base obtained by using these algorithms. Our second major proposition addresses these limitations. First, the problem will be explained with a simple example, and then the proposed new approach will be presented.
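Before turning to the example, the membership assignment proposed earlier in this section can be sketched as follows: membership falls linearly from 1 at a cluster center to 0 at the next center, at most two consecutive fuzzy sets are active, and the two boundary sets saturate at 1 beyond their centers. This is an illustrative sketch only; the function name `memberships` and the centers 45 and 65 are assumptions made here (only the center 85 for VERY OLD AGE comes from the text).

```python
import numpy as np

def memberships(x, centers):
    """Membership degrees of a scalar x in fuzzy sets defined by sorted centers.

    Interior points are shared linearly between the two neighbouring sets;
    points beyond the outermost centers belong fully to the boundary set.
    """
    centers = np.sort(np.asarray(centers, dtype=float))
    mu = np.zeros(len(centers))
    if x <= centers[0]:
        mu[0] = 1.0                      # lower boundary set saturates at 1
    elif x >= centers[-1]:
        mu[-1] = 1.0                     # upper boundary set saturates at 1
    else:
        j = np.searchsorted(centers, x, side="right") - 1
        v_a, v_b = centers[j], centers[j + 1]
        mu[j] = (v_b - x) / (v_b - v_a)  # 1 at v_a, 0 at v_b
        mu[j + 1] = 1.0 - mu[j]          # total membership is always 1
    return mu

# Hypothetical age centers; 85 is the VERY OLD AGE center used in the text.
age_centers = [45.0, 65.0, 85.0]
for age in (50.0, 75.0, 95.0):
    print(age, memberships(age, age_centers))
```

With these centers, an age of 95 receives full membership in the last set instead of leaking membership back to the neighbouring set, which is the boundary behaviour argued for above.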
Suppose that the input–output relations obtained from the system generate the data set presented in Table 1, where $X_1$ and $X_2$ are the inputs and Y is the output. As previously stated, the output is clustered first. From the data one can easily determine that there are two fuzzy output clusters: SMALL Y, which is clustered around 10, and LARGE Y, which is clustered around 100. The next step is the determination of the relationship between
Table 1
The data set that relates X1 and X2 to the output Y

X1    X2    Y
1     5     11
2     2     12
3     7     13
4     2     14
5     7     12
6     6     13
7     3     11
3     4     100
4     4     102
4     5     105
5     4     104
5     5     107
Fig. 1. The rectangular points represent the data corresponding to SMALL Y, and the circular points represent the data corresponding to LARGE Y. The triangular point is the test data.
the inputs and the outputs. Fig. 1 illustrates the data in a scatter diagram. From Fig. 1, it is clear that LARGE Y is related to MEDIUM $X_1$ AND MEDIUM $X_2$. This can be presented as a fuzzy if–then rule such as "If $X_1$ isr MEDIUM and $X_2$ isr MEDIUM Then Y is LARGE". SMALL Y, in contrast, is related to a ring-shaped cluster that surrounds the medium region. Hence this relationship cannot be represented with a rule that allows only a single convex input cluster to correspond to the output cluster. Even though this data set was created in order to demonstrate one of the shortcomings of the Sugeno–Yasukawa modeling approach, there are many real-life examples that take this form. For example, in pharmacokinetic modeling the time that has passed since the last dose of drug was taken is an example of an input variable such as $X_2$, where lower drug concentrations are associated with both shorter and longer time frames. This is discussed more fully in Section 3.2.

The existing methodology of clustering the output and projecting it onto the input variables one by one leads to an incorrect rule, which is presented in Fig. 2. The first rule, namely Rule 0, corresponds to SMALL Y, and after the projection it appears that IF ALL $X_1$ AND ALL $X_2$ THEN Y is SMALL, which is obviously not the case. Hence, modeling approaches similar to those of Sugeno–Yasukawa [8] and Emami et al. [3] produce invalid fuzzy rules in such cases.

Fig. 2. The fuzzy if–then rule obtained by using the Bazoon–Turksen modeling approach for the small example in Table 1.

There are several reasons for such invalid rule development. First, treating the inputs as separate entities in the rule development stage may be misleading. Second, the assumption that one convex input fuzzy set corresponds to each output fuzzy set may not be valid for many applications. For example, both children and elderly subjects may be more sensitive to a drug. With the existing approach, however, one may inadvertently produce a fuzzy if–then rule stating that ALL patients are sensitive to the drug, which reduces the strength of fuzzy system modeling.

One possible solution, which avoids the above-mentioned problem, is to treat the input variables as an n-dimensional vector. One can first cluster the output, then determine the corresponding inputs as n-dimensional vectors. We can fuzzy cluster the output and assign linguistic labels, then use the corresponding n-dimensional vectors to obtain fuzzy rules such as "If the inputs are similar to this
n-dimensional vector then the output will be small". So for each fuzzy output cluster again one rule will be generated, but this rule will relate the input space as a whole object to the corresponding output. While inferring the output for the test data, the similarity of its input features to the n-dimensional input cluster is determined, and this similarity is used as the degree of firing to infer the corresponding output. Here similarity is based on a distance measure; in this study we used the Euclidean distance. Other distance measures such as the Mahalanobis or Minkowski distance may be used, with the most suitable one determined by the data.

Input selection, or feature determination, is one of the most crucial parts of any system identification algorithm. Another modification proposed in this paper concerns the determination of the significant inputs. The Sugeno–Yasukawa [8] model splits the training data set into two subsets in order to conduct two experiments. In the first experiment, one subset is used as the training set to determine a model with one variable and the other as the test set, and vice versa in the second experiment. A regularity criterion index (RCI) is determined, which is the total prediction error of the test sets from the two experiments based on two separate models for each input variable. Then, the input variable that yields the best RCI is selected. Next, each of the remaining input variables is tested one by one in combination with the first selected variable, and the combination of two variables that improves the RCI is selected. This routine continues until the RCI no longer improves. Emami et al. [3] proposed determining the significant inputs by checking the core of the fuzzy sets over its range for each rule and determining the ones that have the narrower core, utilizing the boundary axiom of 1 for the t-norms.
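The small example of Table 1 can be checked numerically. The sketch below first projects each output cluster onto the individual inputs, then compares a query point with the training vectors as whole 2-dimensional objects. Two things are assumptions made only for illustration: the crisp split at Y = 50 stands in for the two fuzzy output clusters, and the query point (4, 4.5) is hypothetical (the coordinates of the triangular test point in Fig. 1 are not given in the text).

```python
import numpy as np

# Table 1 data: columns are X1, X2, Y.
data = np.array([
    [1, 5, 11], [2, 2, 12], [3, 7, 13], [4, 2, 14],
    [5, 7, 12], [6, 6, 13], [7, 3, 11],
    [3, 4, 100], [4, 4, 102], [4, 5, 105], [5, 4, 104], [5, 5, 107],
], dtype=float)

# Assumption: a crisp split at Y = 50 stands in for SMALL Y / LARGE Y.
is_large = data[:, 2] > 50
small, large = data[~is_large], data[is_large]

def projected_range(rows, col):
    """Interval covered by one input when a cluster is projected onto it."""
    return float(rows[:, col].min()), float(rows[:, col].max())

for col, name in enumerate(["X1", "X2"]):
    print(name, "SMALL Y:", projected_range(small, col),
          "LARGE Y:", projected_range(large, col))

def nearest_label(query):
    """Output cluster of the training vector closest to query in (X1, X2)."""
    d = np.linalg.norm(data[:, :2] - np.asarray(query, dtype=float), axis=1)
    return "LARGE" if is_large[np.argmin(d)] else "SMALL"

print(nearest_label([4.0, 4.5]))  # hypothetical query point
```

Projected one variable at a time, SMALL Y covers the full observed range of both inputs (1 to 7 and 2 to 7), which is how the "ALL X1 AND ALL X2" rule arises, whereas the comparison in the joint input space assigns the central query point to LARGE Y.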
All of the existing literature in the fuzzy system modeling field assumes that an input variable is either significant or not significant, whereas both reality and the philosophy of fuzzy theory imply that this dichotomous approach is not valid. That is to say, some inputs are more significant and others are less significant, to a certain degree. In our new fuzzy system modeling approach, rather than determining which inputs are significant, the significance of each input is determined. This is achieved as follows:
Algorithm 2
1. Initialize the significance of each input: Sig(j) = 1/NV
2. While the termination criterion is not satisfied do
  2.1. For j = 1 to the number of input variables (NV) do
    2.1.1. Increase the significance of the jth input by an increment $\delta$: Sig(j) := Sig(j) + $\delta$
    2.1.2. Decrease the significance of each of the remaining inputs by $\delta/(NV - 1)$
    2.1.3. temporary error := 0
    2.1.4. For k = 1 to the number of training data do
      2.1.4.1. Form a rule base using the training data without the kth data vector
      2.1.4.2. Compute the prediction error for the kth data vector
      2.1.4.3. temporary error := temporary error + error
    2.1.5. average error := temporary error / number of training data
  2.2. Take the two smallest average errors obtained over the increments of the input significances and select one of the two at random
  2.3. Save the minimum error found up to this stage and the significance combination used to reach it
3. The significance combination is the one that produces the minimum error

Recall that the number of output clusters is selected with respect to the model error. Hence, Algorithm 2 is iterated for each possible cluster size. The random selection in step 2.2 is used in order to allow a wider search space of alternatives. Also, one must be careful at step 2.1.2 to avoid obtaining negative significance degrees. This is achieved by not allowing negative significance and by redistributing and normalizing the significance values at the end of each iteration. In this way, the sum of the significance values of the inputs is 1 after each iteration. Possible termination criteria include a fixed number of iterations, or termination if some number of consecutive iterations does not change the minimum error by a certain amount. In this study, we used the former approach and set the number of iterations to 200. Finally, the best cluster size and significance combination in terms of the average training error is used in the final model.

When inferring the output for the test data, one must determine the degree of firing of each rule by using the similarity of the test data to each input fuzzy cluster. As an example, recall the triangular point in Fig. 1. Its similarity to the corresponding fuzzy input clusters is defined in terms of a distance calculation based on the k-nearest neighbors approach. In this approach, we determine the k training points nearest to the triangular point and then determine its similarity, and therefore its membership degree, to the fuzzy clusters with respect to these k nearest points. This procedure is presented in detail in Algorithms 3.1 and 3.2.

Before presenting the details of the algorithm, note that one possible problem in finding the similarity (or distance) of n-dimensional points is that each dimension may have a different order of magnitude. One dimension may be of the order of 10, and another of the order of 10 000. In order to avoid the problems that this may cause, we normalize each dimension to the range 0–1. In this paper, the similarity of the test data to the input clusters is based on the Euclidean distances to the input vectors that correspond to a certain output fuzzy cluster in each rule.

The proposed k-nearest algorithm is as follows. Let $\{X_1, X_2, \ldots, X_{ND}\}$ be the data set, where $X_k$ is an NV-dimensional object such that each dimension corresponds to a fuzzy linguistic variable (input). Suppose that, by using the structure identification algorithm summarized above, the data are clustered into c patterns and each data point, say $X_k$, is associated with $\mu_A(X_k) = [\mu_{A_1}(X_k), \ldots, \mu_{A_c}(X_k)]$, a c-dimensional membership degree vector in which each dimension specifies the membership degree of $X_k$ in the corresponding rule. Let $X^*$ be the test data vector, which is again an NV-dimensional object.
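The rescaling of each input dimension to the range 0–1 mentioned above can be sketched as a simple min-max normalization. The helper names `fit_min_max` and `apply_min_max` are hypothetical, and fitting the range on the training data only is an assumption made here; the paper does not state how the range is obtained.

```python
import numpy as np

def fit_min_max(X_train):
    """Per-dimension minimum and span, estimated from the training data."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0   # a constant column is mapped to 0 instead of dividing by 0
    return lo, span

def apply_min_max(X, lo, span):
    """Rescale data with a previously fitted range.

    Test vectors outside the training range fall outside [0, 1].
    """
    return (np.asarray(X, dtype=float) - lo) / span

# Illustrative values only: one input of order 10, another of order 10 000.
X_raw = np.array([[10.0, 10000.0], [20.0, 30000.0], [30.0, 20000.0]])
lo, span = fit_min_max(X_raw)
print(apply_min_max(X_raw, lo, span))
print(apply_min_max([15.0, 25000.0], lo, span))
```

After rescaling, a unit difference means the same fraction of the observed range in every dimension, so no single input dominates the distance merely because of its units.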
The following algorithm is used to determine the degree of firing of each rule:

Algorithm 3.1
1. Choose $k^*$
2. For the test object $X^*$, determine the $k^*$ nearest neighbors ($k^*$-NN) $X^{[1]}, \ldots, X^{[k^*]}$ based on the following distance measure:

$d(X_i, X_j) = \sqrt{\sum_{n=1}^{NV} \mathrm{sig}(n)\,(x_{i,n} - x_{j,n})^2}$
which is basically the weighted Euclidean distance, where the weight of each dimension, i.e. input variable, is sig(n), the significance degree of that input variable determined as described earlier.
3. Each $k^*$-NN point $X^{[i]}$ is associated with a c-dimensional vector $\mu_A(X^{[i]})$, each entry of which specifies the membership degree of $X^{[i]}$ in the corresponding cluster:

$\mu_A(X^{[i]}) = [\mu_{A_1}(X^{[i]}), \ldots, \mu_{A_c}(X^{[i]})]$

Hence there are $k^*$ such vectors:

$\mu_A(X^{[1]}) = [\mu_{A_1}(X^{[1]}), \ldots, \mu_{A_c}(X^{[1]})]$
...
$\mu_A(X^{[k^*]}) = [\mu_{A_1}(X^{[k^*]}), \ldots, \mu_{A_c}(X^{[k^*]})]$

4. Assign

$\mu_A(X^*) = \left[\frac{1}{k^*}\sum_{i=1}^{k^*} \mu_{A_1}(X^{[i]}), \ldots, \frac{1}{k^*}\sum_{i=1}^{k^*} \mu_{A_c}(X^{[i]})\right]$

Recall that the structure identification algorithm allows each point to be a member of at most two clusters, based on the constraint that the sum of the membership degrees over the clusters is one [4]. In order to be consistent, a similar constraint is applied to the test data. Algorithm 3.2 provides a methodology to determine a degree of firing that satisfies this constraint.

Algorithm 3.2
5. Determine the sum of the membership degrees of each pair of consecutive clusters:

$\mu_{A_{(j,j+1)}}(X^*) = \mu_{A_j}(X^*) + \mu_{A_{j+1}}(X^*)$

6. Select $j^*$ such that $\mu_{A_{(j^*,j^*+1)}}(X^*) = \max_j \{\mu_{A_{(j,j+1)}}(X^*)\}$
7. Set

$\tau_{j^*}(X^*) = \mu_{A_{j^*}}(X^*) / (\mu_{A_{j^*}}(X^*) + \mu_{A_{j^*+1}}(X^*))$
$\tau_{j^*+1}(X^*) = \mu_{A_{j^*+1}}(X^*) / (\mu_{A_{j^*}}(X^*) + \mu_{A_{j^*+1}}(X^*))$

where $\tau_j(X^*)$ denotes the normalized degree of firing of the jth rule for the test vector $X^*$.

Algorithms 3.1 and 3.2 determine the similarity of the test data to each input fuzzy cluster, and hence the estimated degree of firing of each rule. The final step is inference. Sugeno–Yasukawa suggest a relatively simple approach in which the inferred output is determined by multiplying the centers of gravity of the output clusters by the degree of firing of the corresponding rule and then taking the weighted average of these values. Emami et al. [3] further improve this algorithm by parametric inference.
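Algorithms 3.1 and 3.2, followed by the weighted-average inference just described, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the array layout (one row per training vector, one column per cluster) and the small data set are all hypothetical, and the inputs are assumed to be already scaled to 0–1.

```python
import numpy as np

def firing_degrees(x_star, X_train, A_train, sig, k_star):
    """Degrees of firing for a test vector (sketch of Algorithms 3.1 and 3.2).

    X_train: (ND, NV) inputs, A_train: (ND, c) membership degrees,
    sig: (NV,) significance weights, k_star: number of neighbours.
    """
    X_train = np.asarray(X_train, dtype=float)
    A_train = np.asarray(A_train, dtype=float)
    diff = X_train - np.asarray(x_star, dtype=float)
    # Step 2: significance-weighted Euclidean distance to every training vector.
    d = np.sqrt((np.asarray(sig, dtype=float) * diff ** 2).sum(axis=1))
    nearest = np.argsort(d, kind="stable")[:k_star]
    # Steps 3-4: average the membership vectors of the k* nearest neighbours.
    a = A_train[nearest].mean(axis=0)
    if a.size == 1:
        return np.ones(1)
    # Steps 5-6: pick the consecutive pair of clusters with the largest sum.
    pair_sums = a[:-1] + a[1:]
    j = int(np.argmax(pair_sums))
    # Step 7: normalise so that the two fired rules sum to 1.
    tau = np.zeros_like(a)
    if pair_sums[j] > 0:
        tau[j] = a[j] / pair_sums[j]
        tau[j + 1] = a[j + 1] / pair_sums[j]
    return tau

def infer_output(tau, centers):
    """Weighted average of output cluster centers by degree of firing."""
    return float(np.dot(tau, np.asarray(centers, dtype=float)))

# Hypothetical training data: 4 vectors, 2 inputs, 3 ordered output clusters.
X_demo = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
A_demo = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0], [0.0, 0.3, 0.7]]
tau_demo = firing_degrees([0.05, 0.0], X_demo, A_demo, sig=[0.5, 0.5], k_star=2)
print(tau_demo, infer_output(tau_demo, [10.0, 50.0, 100.0]))
```

Because at most two consecutive rules receive a nonzero degree of firing and those two degrees sum to 1, the inferred output always lies between the centers of two neighbouring output clusters.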
For details of this approach please refer to [12]. At this stage, we propose to use an inference mechanism similar to the one proposed by Sugeno–Yasukawa. The only modification is that our inference schema fires only two rules because of the new output clustering schema. Recall that the output space is clustered and classified by assuming that the total membership degrees add up to 1. This approach restricts each output datum to be a member of only two consecutively ordered output fuzzy sets. Hence, the inputs are also members of only two fuzzy clusters. Our proposal is to determine the two consecutive fuzzy sets in which the test data have the highest sum of belongingness, and then to fire the associated rules by normalizing the belongingness for these two consecutive fuzzy sets, such that

$y^* = \sum_i \tau_i(X^*)\, y^{[i]}$,

where the sum runs over the two fired rules, $y^{[i]}$ is the cluster center of the ith rule and $\tau_i(X^*)$ is the degree of match (or firing) of the test data, i.e., the estimated membership degree of $X^*$ in the n-dimensional cluster $A_i$, the antecedent of the ith rule.

3. Experimental analysis

The proposed algorithm has been tested on two different problems. The first is the nonlinear system that was introduced by Sugeno–Yasukawa [8], and the second uses clinical data for modeling the pharmacokinetics of alprazolam.

3.1. The nonlinear system

The nonlinear system has two input variables $x_1$ and $x_2$, and a single output y, which is defined as follows:

$y = (1 + x_1^{-2} + x_2^{-1.5})^2, \quad 1 \le x_1, x_2 \le 5.$
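The test function is easy to evaluate directly, which helps when reproducing the benchmark. In the sketch below the function name, the uniform sampling on [1, 5] and the random seed are assumptions made for illustration; the text states only that the data vectors were generated randomly.

```python
import numpy as np

def nonlinear_system(x1, x2):
    """y = (1 + x1^-2 + x2^-1.5)^2, defined for 1 <= x1, x2 <= 5."""
    return (1.0 + x1 ** -2.0 + x2 ** -1.5) ** 2

# Assumption: uniform sampling with a fixed seed, for illustration only.
rng = np.random.default_rng(0)
X_sample = rng.uniform(1.0, 5.0, size=(90, 2))
y_sample = nonlinear_system(X_sample[:, 0], X_sample[:, 1])

print(nonlinear_system(1.0, 1.0), nonlinear_system(5.0, 5.0))
print(y_sample.min(), y_sample.max())
```

Since the function decreases in both inputs over the stated domain, its values lie between the value at (5, 5) and the value at (1, 1), which gives a quick sanity check on any generated sample.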
Ten experiments were constructed in order to conduct a statistical analysis. For each experiment, 90 data vectors were generated randomly for $x_1$ and $x_2$, and the associated y was calculated. In order to test the ability of our algorithm to determine the significance of the input variables, dummy input variables $x_3$ and $x_4$ were also randomly generated. The Bazoon–Turksen model [5,6], which is based on the Sugeno–Yasukawa [8] approach, was used for comparison. The weighting exponent, m, was set to 2 for FCM. Sixty data vectors were used as the training set to determine a rule base, and
Table 2. The significant variables determined by the Bazoon–Turksen model for each experiment and the corresponding significance degrees determined by the proposed algorithm (a)

Experiment   B–T (significant inputs)   PA (significance degrees of x1 – x2 – x3 – x4)
Exp. 1       1, 2                       0.45 – 0.54 – 0 – 0.01
Exp. 2       4                          0.55 – 0.45 – 0 – 0
Exp. 3       1, 2, 3                    0.74 – 0.26 – 0 – 0
Exp. 4       1, 2, 3                    0.50 – 0.41 – 0.05 – 0.04
Exp. 5       1, 2                       0.70 – 0.29 – 0.01 – 0
Exp. 6       1, 2                       0.58 – 0.34 – 0.03 – 0.05
Exp. 7       1, 2                       0.35 – 0.58 – 0.03 – 0.04
Exp. 8       2, 3                       0.14 – 0.86 – 0 – 0
Exp. 9       1, 2                       0.53 – 0.47 – 0 – 0
Exp. 10      1, 2, 4                    0.67 – 0.32 – 0.01 – 0
Average                                 0.53 – 0.45 – 0.01 – 0.01
(a) Recall that for B–T the input variables are either significant or not. For the PA, a significance degree is associated with each input variable, and these degrees sum to 1. The bold figures for PA are the ones that have the highest significance.
Table 3. The RMSE comparison

Experiment   B–T    PA
Exp. 1       0.76   0.48
Exp. 2       0.77   0.56
Exp. 3       0.63   0.52
Exp. 4       0.78   0.65
Exp. 5       0.61   0.54
Exp. 6       0.67   0.60
Exp. 7       0.73   0.53
Exp. 8       0.69   0.63
Exp. 9       0.67   0.41
Exp. 10      0.91   0.49
the remaining 30 data vectors were used as the test set. The proposed algorithm successfully identified the dummy variables as the least significant variables in all 10 experiments. The averages of the significance degrees of the input variables over the 10 experiments were (0.53, 0.45, 0.01, 0.01) for x1, x2, x3 and x4, respectively. The Bazoon–Turksen model determined that the first and second variables are the significant ones in only five of the ten experiments. The results of the ten experiments are presented in Table 2 in terms of the significant input determination. Note that in the remaining tables (Tables 2–6), B–T refers to Bazoon–
Table 4. The RMSE and BIAS comparison of the algorithms based on the average of 10 experiments

        B–T     PA
RMSE    0.72    0.54
BIAS    −0.17   0.01
Table 5. The RMSE comparisons of Boomer, Bazoon–Turksen and the proposed algorithm for the 10 experiments (figures in nM/l)

Experiment   Boomer   B–T    PA
Exp. 1       9.2      18.9   13.4
Exp. 2       6.7      5.6    8.4
Exp. 3       25.5     32.8   22.3
Exp. 4       16.3     19.0   8.8
Exp. 5       21.0     8.9    6.7
Exp. 6       13.1     7.0    7.3
Exp. 7       15.2     4.5    11.1
Exp. 8       11.3     4.8    9.1
Exp. 9       16.2     11.1   9.1
Exp. 10      15.3     7.6    5.3
Table 6. Comparison of the average RMSE and BIAS of the predictions for Boomer, Bazoon–Turksen and the proposed algorithm (figures in nM/l)

        Boomer   B–T    PA
RMSE    14.9     12.0   10.1
BIAS    −5.25    5.20   1.23
Turksen algorithm and PA stands for the proposed approach. The predictive performance of the new algorithm was significantly better than that of the Bazoon–Turksen approach. The results of the 10 experiments are tabulated in Table 3 in terms of the root mean square error (RMSE) of the prediction. We also performed a paired t-test; the t-value for the pair Bazoon–Turksen vs. proposed algorithm is 4.86, with significance level p ≤ 0.000. A Wilcoxon signed-rank test was also performed; the corresponding z-value for the pair is 2.80, with significance level p ≤ 0.003. The comparison of the average
Fig. 3. The actual and predicted output for the proposed algorithm (a) and Bazoon-Turksen model (b) for a sample experiment.
RMSE of the ten experiments as well as the average bias is presented in Table 4. The bias is calculated as the mean error. The predicted versus actual output graphs for both approaches are presented in Fig. 3 for a sample experiment.

3.2. The alprazolam pharmacokinetic modeling

Alprazolam is a benzodiazepine drug widely used in the treatment of anxiety disorders. Alprazolam pharmacokinetic data had been collected by our group as part of a drug interaction study [4]. Serial plasma alprazolam concentrations were obtained from 10 healthy volunteers (mean age 24 years) at 0.5, 1, 1.5, 2, 2.5, 3, 4, 6, 8, ∼24 and ∼36 h after a single oral dose of 1 mg alprazolam was administered. We have explored several ways of modeling these data with respect to appropriate input/output variables [7]. The proposed algorithm is compared with two different modeling algorithms, in terms of predictive performance, using all of the following inputs: age, weight, height and time since last dose. The output is the plasma alprazolam concentration (nM/l) at the times of measurement specified above. The first modeling algorithm is a more traditional mathematical model known as a one-compartment pharmacokinetic model, which assumes first-order absorption. The parameters of this one-compartment model were optimized using nonlinear regression analysis with the 'Boomer' software [2]. The second is the Bazoon–Turksen fuzzy system modeling algorithm. The performance of each model is based on the prediction error of the plasma concentrations used to construct the pharmacokinetic curve (i.e., the graph of alprazolam concentration
Fig. 4. The plasma concentration of alprazolam in one individual over time.
versus time) for the test individuals. Before presenting the comparison results, one point should be clarified. After oral administration, a drug is gradually absorbed into the body, so for a period of time the concentration of the drug in the blood increases. At some point, the rate of drug elimination equals the rate of absorption, and the blood concentration reaches a peak. Finally, the elimination rate exceeds the absorption rate, and the blood concentration declines until the drug is completely cleared or another dose is taken. For a drug that is taken orally, the pharmacokinetic curve often takes the shape depicted in Fig. 4. As can be seen in Fig. 4, smaller concentrations occur both at very early time points and at very late time points over the dosage levels. Recall the discussion on single convex input clusters for each output cluster: the rule that would be obtained for smaller values of the output would cover the whole range of times for this particular input, even though this is not the case. The existing algorithms provide rules that are not valid for situations such as this. In a previous study [12], a subjective breakdown of these
Fig. 5. The predicted versus actual pharmacokinetic curves for the proposed algorithm (left), Bazoon–Turksen model (middle) and two compartmental analysis (right).
rules into two convex fuzzy sets was made rather than leaving it a single set. This significantly improved the predictive performance of the algorithm. In this experiment, the objective was to predict the pharmacokinetic concentration versus time curves of the test individuals based on the historical input–output data described above. The performance measure is a comparison of the predicted pharmacokinetic curve with the actual one. These comparisons are presented in terms of the root mean square error (RMSE) of the predicted concentrations. The fuzzy rule base was constructed by using the data of nine of the subjects as the training set. The inputs were their age, weight, height and the time that had passed since the dose was administered. The remaining subject was then used as the test set. This experiment was repeated ten times so that each time a different subject was used as the test set. By doing this we eliminated a possible source of randomness that may affect the results. Based on the average of the ten experiments, the significance vector was determined to be 0.00 – 0.45 – 0.02 – 0.53 for age, weight, height and time since last dose, respectively. The significance degrees obtained from the data appear clinically valid, since it was expected that the time since last dose would be a very important input. The actual measurements and the predictions for the individual presented in Fig. 4 are shown in Fig. 5 as a sample experiment result. Table 5 shows the results of the ten experiments in terms of the RMSE of the predictions. The average RMSE and the average BIAS are presented in Table 6. Table 6 shows that both of the fuzzy system modeling algorithms produced better results than the Boomer software on average, although the clinical significance of these differences may be
minimal. We conducted a paired t-test for each pair of methods in order to determine the statistical significance of the results. The difference between the proposed algorithm and Boomer was statistically significant: the corresponding t-value for the pair Boomer vs. proposed algorithm is 2.82, with significance level p ≤ 0.01. Even though Bazoon–Turksen was better than Boomer on average, the difference was not significant (p ≤ 0.24). Similarly, the difference between the proposed algorithm and Bazoon–Turksen was not statistically significant (p ≤ 0.17). Hence, the statistical tests revealed that only the proposed algorithm was statistically better than the Boomer software.
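The paired t statistic can be checked against the per-experiment RMSE values in Table 5. The sketch below is an illustration that uses only the Python standard library. The function name `paired_t` is an assumption, and because the tabulated values are rounded, the result can only approximate the reported statistic.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic for the differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(len(diffs)))


# Per-experiment RMSE values (nM/l) copied from Table 5.
boomer = [9.2, 6.7, 25.5, 16.3, 21.0, 13.1, 15.2, 11.3, 16.2, 15.3]
bt = [18.9, 5.6, 32.8, 19.0, 8.9, 7.0, 4.5, 4.8, 11.1, 7.6]
pa = [13.4, 8.4, 22.3, 8.8, 6.7, 7.3, 11.1, 9.1, 9.1, 5.3]

if __name__ == "__main__":
    print(f"Boomer vs PA : t = {paired_t(boomer, pa):.2f}")
    print(f"Boomer vs B-T: t = {paired_t(boomer, bt):.2f}")
    print(f"B-T vs PA    : t = {paired_t(bt, pa):.2f}")
```

With these rounded inputs, the Boomer vs. proposed algorithm statistic comes out at about 2.82, in line with the value reported above. Converting a t-value to a p-value additionally requires the Student t distribution with 9 degrees of freedom, which is not part of this sketch.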
4. Conclusion

In this paper we proposed a fuzzy system modeling algorithm that addresses some of the major limitations of the existing algorithms that we identified while modeling our pharmacological data. The first problem was cluster validity, that is, the determination of the optimal number of clusters, c, which is directly related to the number of rules. The existing algorithms select c by minimizing a pre-specified function. This does not guarantee a number of clusters that minimizes the modeling error. We developed a schema that selects the number of clusters contingent on the training error. Secondly, we demonstrated that clustering the output and projecting the output fuzzy clusters onto the input space while neglecting the interactions between the inputs might lead to fuzzy rules that do not adequately represent the system. The requirement of having a single convex input fuzzy
cluster for each fuzzy output cluster also reduces the strength of the fuzzy rules in terms of representing the hidden structure in the system. We have proposed a schema that is based on the n-dimensional input space, where the interactions between the inputs are embedded but unknown. This is achieved by projecting the output clusters onto the n-dimensional input space and building input clusters without any assumptions of convexity. Inference is conducted by determining the similarity of the test data with respect to these input clusters via a k-neighbor search algorithm. Finally, we have proposed a new approach to the selection of significant inputs. We believe that this new approach is a breakthrough in fuzzy system modeling, because it no longer classifies the inputs as either significant or insignificant, but rather assigns a significance degree with respect to their weight in predicting the output. The significance degrees are assigned by a local search algorithm that is randomized to overcome cycles and entrapment in local optima. Other minor improvements in fuzzy system modeling were also introduced, such as a new classification schema that is quick and satisfies the requirement that the total membership of each data value in the fuzzy clusters be 1, overcoming the anomaly caused by the classification and the FCM procedure. An improvement in the membership functions for the fuzzy sets that are at the boundaries has been proposed and the reasons demonstrated with a linguistic example. We are continuing to work on improvements to the algorithm. Future work will introduce a parametric approach that allows the use of more general t-norms and t-conorms in the inference process. Also, the Type II approach proposed by Turksen [9–11] will be implemented and its performance tested. The proposed algorithm will also be tested on other pharmacological modeling problems.

Acknowledgements

This work is supported by The Whitaker Foundation and NSERC.
References

[1] J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernet. 3 (1974) 58–72.
[2] D.W.A. Bourne, BOOMER, a simulation and modeling program for pharmacokinetic and pharmacodynamic data analysis, Comput. Methods Programs Biomed. 29 (1989) 191–195.
[3] M.R. Emami, I.B. Turksen, A.A. Goldenberg, A unified parametrized formulation of reasoning in fuzzy modeling and control, Fuzzy Sets and Systems 108 (1999) 59–81.
[4] P.C. Hassan, B.A. Sproule, C.A. Naranjo, N. Hermann, Dose–response evaluation of the interaction between sertraline and alprazolam in vivo, J. Clin. Psychopharmacol. 20 (2000) 150–158.
[5] C.A. Naranjo, K.E. Bremner, M. Bazoon, I.B. Turksen, Using fuzzy logic to predict response to citalopram in alcohol dependence, Clin. Pharmacol. Therapy 62 (1997) 209–224.
[6] B.A. Sproule, M. Bazoon, K.I. Shulman, I.B. Turksen, C.A. Naranjo, Fuzzy logic pharmacokinetic modeling: application to lithium concentration prediction, Clin. Pharmacol. Therapy 62 (1997) 29–40.
[7] B.A. Sproule, K. Kilic, C.A. Naranjo, I.B. Turksen, Exploring fuzzy logic modeling: Alprazolam experiment [Abstract], Clin. Pharmacol. Ther. 67 (2000) 160.
[8] M. Sugeno, T.A. Yasukawa, A fuzzy logic based approach to qualitative modeling, IEEE Trans. Fuzzy Systems 31 (1993) 7–31.
[9] I.B. Turksen, Type I and Type II fuzzy system modeling, Fuzzy Sets and Systems 106 (1999) 11–34.
[10] I.B. Turksen, Theories of set and logic with crisp of fuzzy information granules, J. Adv. Comput. Intell. 2000, in press.
[11] I.B. Turksen, Computing with descriptive and veristic words; knowledge representation and reasoning, in: P.P. Wang (Ed.), Computing with Words, Elsevier, Amsterdam, 2000, in press.
[12] I.B. Turksen, B.A. Sproule, C.A. Naranjo, K. Kilic, Fuzzy logic pharmacokinetic modeling, Proceedings ERUDIT (European Network in Uncertainty Techniques – Developments for Use in Information Technology) Workshop: Fuzzy Diagnostic and Therapeutic Decision Support, University of Vienna Medical School, Vienna, Austria, May 11–12, 2000, pp. 19–23.
[13] X.L. Xie, G.A. Beni, Validity measure for fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 3 (1991) 841–846.
[14] H.-J. Zimmermann, Fuzzy data analysis, in: O. Kaynak, L.A. Zadeh, B. Turksen, I.J. Rudas (Eds.), Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications, Springer, Berlin, 1998, pp. 198–230.