The application of rough set and Mahalanobis distance to enhance the quality of OSA diagnosis


Expert Systems with Applications 38 (2011) 7828–7836


Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Pa-Chun Wang a,b,c, Chao-Ton Su d,⇑, Kun-Huang Chen d, Ning-Hung Chen e

a Department of Otolaryngology, Cathay General Hospital, Taipei, Taiwan
b Fu Jen Catholic University School of Medicine, Taipei, Taiwan
c Department of Public Health, China Medical University, Taichung, Taiwan
d Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan
e Sleep Center, Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chang Gung University, Taipei, Taiwan

Keywords: Obstructive sleep apnea; Mahalanobis distance; Rough set

Abstract

This study applies an analytical approach based on anthropometry and questionnaire data to detect obstructive sleep apnea (OSA). In recent years, OSA has become a pressing public health problem that demands serious attention. Approximately one in five American adults has at least mild OSA. Access Economics estimated that in 2004 the cost of sleep disorders to the Australian community was over $7 billion, and much of this cost was related to OSA. Traditionally, polysomnography (PSG) is considered to be a well-established and effective diagnostic method for this disorder. However, PSG is time consuming and labor intensive because it requires an overnight evaluation in sleep laboratories with dedicated systems and attending personnel. Our proposed analytical approach integrates rough set (RS) theory and the Mahalanobis distance (MD). RS was used to select important features, while MD was employed to distinguish the pattern of OSA. In this study, data were collected from 86 subjects (62 disease and 24 non-disease) who were referred for clinical suspicion of OSA. To grade the severity of the sleep apnea, the number of events per hour is reported as the apnea–hypopnea index (AHI). In the study, we define AHI < 5 as non-disease and AHI ≥ 5 as disease. According to sensitivity, specificity, and g-means, the results show that our proposed method outperforms other methods such as logistic regression (LR), artificial neural networks (ANNs), support vector machine (SVM), and the C4.5 decision tree. Implementation results show that the proposed method can not only detect OSA effectively but also reduce the cost and time needed for an accurate diagnosis. The proposed approach can be employed by physicians when making clinical decisions for their patients. Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +886 3 574 2936; fax: +886 3 572 2204. E-mail address: [email protected] (C.-T. Su).

1. Introduction

Sleep apnea is a sleep disorder characterized by pauses in breathing during sleep. It is usually associated with snoring and may recur frequently throughout the night. Sleep apnea occurs when the muscle tone of the body relaxes during sleep: at the level of the throat, the human airway is composed of collapsible walls of soft tissue that can obstruct breathing. Young et al. (1993) showed that obstructive sleep apnea (OSA) is a common problem with potentially serious health consequences. Wright, Johns, Watt, Melville, and Sheldon (1997) showed that OSA has been linked to an increased risk of mortality and morbidity due to cardiovascular and neurophysiologic disorders.

Polysomnography (PSG) is considered to be a well-established and effective diagnostic method for this disorder. The PSG of sleep apnea shows pauses in breathing, and it monitors heart, lung, and brain activity, as well as breathing patterns, arm and leg movements, and blood oxygen levels during sleep. However, the PSG method is time consuming and labor intensive because it requires an overnight evaluation in sleep laboratories with dedicated systems and attending personnel. The accurate identification of an apnea event requires the direct measurement of upper airway airflow and respiratory effort. For this reason, it is important to develop a simple and effective predictive method for OSA diagnosis.

Komorowski and Øhrn (1999) noted that rough set (RS) theory is regarded as an effective mathematical tool for handling uncertain, imprecise, noisy, and incomplete information, and it

0957-4174/$ - see front matter Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.12.122


has been successfully applied in knowledge discovery, pattern classification, fault diagnosis, medicine, and so on. Taguchi and Jugulum (2002) showed that the Mahalanobis distance (MD) can be used to distinguish the pattern of a certain group from other groups, which is much like the process by which a doctor determines whether or not a patient has a certain kind of disease. MD is a distance measure based on correlations between variables (features) by which different patterns can be identified and analyzed with respect to a reference baseline. This method has been applied in healthcare, fire alarms, handwriting, medicine, and so on (Pokrajac, Megalooikonomou, Lazarevic, Kontos, & Obradovic, 2005).

This paper aims to develop a predictive method for OSA diagnosis based on anthropometric data (e.g., gender, age, weight, and height) and questionnaire data (e.g., the Epworth sleepiness scale (ESS) and the snore outcomes survey (SOS)). The proposed approach uses RS to select important features and MD to distinguish the pattern of OSA. The proposed method is compared with other available methods, namely logistic regression (LR), back propagation neural network (BPN), learning vector quantisation (LVQ), support vector machine (SVM), the C4.5 decision tree, and RS.

In Section 2, related studies including feature selection, RS, and MD are described. The proposed approach is then presented in Section 3. Section 4 demonstrates a case study that employs our proposed approach, followed by some discussion. The final section presents the conclusion.

2. Related study

2.1. Obstructive sleep apnea

The causes of sleep apnea are either complete obstruction (obstructive apnea) or partial obstruction (obstructive hypopnea) of the airway. Hypopnea refers to abnormally slow or shallow breathing. Through clinical observation, the sleep apnea syndrome (SAS) can be divided into three kinds: (1) OSA, (2) central sleep apnea (CSA), and (3) mixed sleep apnea. OSA comprises more than 85% of SAS cases. Approximately one in five American adults has at least mild OSA. The American Sleep Disorders Association (1990) defines OSA as the condition in which a patient's airway becomes obstructed while sleeping, which in turn reduces oxygen saturation in the blood. A patient with OSA may have symptoms such as excessive drowsiness, obesity, snoring, restless sleep, dry mouth upon awakening, morning headaches, and so on. Therefore, OSA may be diagnosed if a patient presents a combination of the abovementioned symptoms.

Sleep apnea is usually associated with snoring and may recur frequently throughout the night. It occurs when the muscles that support the dorsal soft tissue of the throat (i.e., soft palate, uvula, tonsil, and tongue) relax, which causes the pharynx to collapse. Normally, a diagnosis of OSA requires more than five apnea or hypopnea events per hour (the apnea–hypopnea index, AHI) and/or a drop in oxygen levels below 90%.

Desai, Ellis, Wheatley, and Grunstein (2003) revealed that OSA, if not treated properly, can severely impair driving and lead to motor vehicle accidents. In most of the traffic accidents caused by untreated OSA, the drivers made no effort to brake before the accident because they tended to fall asleep behind the wheel. Accidents without braking can cause more serious damage. As a result, OSA-induced accidents are more often fatal than traffic accidents with other causes. It has been reported that the prevalence of OSA in commercial truck drivers could be as high as 46%. Driving accidents resulting from OSA entail a very large expense (Barthel & Strome, 1999). Hillman, Murphy, and Pezzullo (2006) reported that Access Economics estimated the cost of sleep disorders to the Australian community in 2004 at over $7 billion, and a great deal of this cost was related to OSA.

Nasal continuous positive airway pressure (CPAP), conducted overnight in the sleep laboratory, is considered to be a well-established and effective therapy (Jenkinson, Davies, Mullins, & Stradling, 1999). A pressure of 4 cm H2O is initially given to patients. The pressure is then increased gradually by 1 cm H2O every 20 min until reaching the level at which apnea, hypopnea, snoring, and recurrent oxyhemoglobin desaturations no longer occur.

Presently, hospitals diagnose OSA through questionnaires and PSG. The ESS is a questionnaire intended to measure daytime sleepiness, which can be helpful in diagnosing sleep disorders. The SOS is another recently designed questionnaire for evaluating the snoring of a patient. PSG provides reliable data on OSA and is a multi-parametric test used in sleep medicine. It is a comprehensive recording of the biophysiological changes that occur during sleep, which involves the collection of data from the electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), heartbeat, and lobe oximeter.

An apnea is defined as a blockage of breathing exceeding 10 s. A hypopnea, on the other hand, is defined by a 30% reduction in air circulation in the airway or a 4% reduction in oxyhemoglobin saturation. PSG is considered to be a well-established and effective diagnostic method for this disorder. However, it is time consuming and labor intensive. Many test protocols are available for diagnosing OSA, including the O2 pulse oximeter, body mass index (BMI), and two-stage protocols (BMI with O2 pulse oximeter and questionnaires with O2 pulse oximeter). Table 1 shows these OSA diagnostic methods. However, with these methods patients need to wear the O2 pulse oximeter overnight, which can be very inconvenient. We developed an analytical approach that integrates RS and MD for OSA diagnosis in order to provide a convenient and fast diagnostic method.

2.2. Feature selection

With an enormous amount of data, data mining requires a great deal of resources. To reduce the cost, we have to preprocess the available data. Feature selection, one of the preprocessing techniques, has become the focus of interest in recent years. However, feature selection still needs to deal with problems

Table 1 OSA diagnosis methods.

| Method | Sensitivity | Discrimination | Advantage and shortcoming | Ref. |
|---|---|---|---|---|
| Questionnaire | Medium | Low | Easy to administer, but sensitivity and discrimination are not good | Rowley, Aboussouan, and Badr (2000) |
| O2 pulse oximeter | Medium | Medium | Sensitivity and discrimination are good, but it lacks efficiency | Ryan et al. (1995), Yamashiro and Kryger (1999) |
| BMI | Medium | Low | Sensitivity and discrimination are not good | Rosenthal and Diana (2008a, 2008b) |
| Two stage (BMI + O2 pulse oximeter) | Medium | Medium | Sensitivity and discrimination are medium | Jacob et al. (1995) |
| Two stage (questionnaire + O2 pulse oximeter) | Medium | Medium | Sensitivity and discrimination are medium; in addition, it is easy to administer | Gurubhagavatula and Maislin (2001) |


such as excessive noise and inappropriate features. Removing such features can simplify the classifier, reduce the cost and computational time, and increase the accuracy. The features removed should be ineffectual, unnecessary, or contribute least to predicting unknown data. The problem of feature selection is how to select a subset of m features from a larger set of p features or measurements so as to optimize the value of a criterion over all the subsets of size m. There are $\binom{p}{m} = p!/(m!\,(p - m)!)$ such subsets. Usually, the data dimensionality is large, which makes the classifier parameters harder to estimate. This phenomenon is known as the curse of dimensionality (Theodoridis & Koutroumbas, 2003).

Many methods have been developed to overcome this difficulty. The earliest stepwise techniques and dynamic programming solutions are more efficient because they avoid exhaustive enumeration, but they offer no guarantee that the selected subset yields the best value of the criterion among all subsets of size m. Branch and bound methods are powerful combinatorial optimization tools, and similar formulations have been applied to other problems in pattern recognition such as clustering and nearest neighbor. The branch and bound method attempts to satisfy two criteria: (1) the selected subset should be as small as possible, and (2) a bound is placed on the value calculated by the evaluation function (Doak, 1992). However, it is difficult to choose a suitable evaluation function and criterion value.

In the past few years, RS theory has emerged as a feature selection method, which is used in various synergetic combinations with principal component analysis (PCA), SVM, and so on. Swiniarski and Skowron (2003) used RS and PCA for face recognition, thereby effectively reducing the number of features and increasing the classification accuracy. The RS theory is briefly described in the following section.
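To give a sense of scale for the subset count discussed above, the following is a minimal Python sketch that evaluates the binomial coefficient for one illustrative case. The function name `count_subsets` and the choice of 12 candidate features with subsets of size 6 are assumptions made for illustration only.

```python
from math import comb

def count_subsets(p: int, m: int) -> int:
    """Number of distinct subsets of size m drawn from p candidate features."""
    return comb(p, m)

# Illustrative values (an assumption for this sketch): 12 candidates, subsets of size 6.
print(count_subsets(12, 6))  # 924
```

Even at this small size, exhaustive evaluation means scoring hundreds of subsets, and the count grows quickly with p, which is why non-exhaustive search strategies are attractive.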

2.3. Rough set

The mathematical theory of RS, introduced by Pawlak (1982), has proven to be an important tool for handling the uncertainty that arises from inexact, noisy, or incomplete information. The RS table, also called an information system (IS), is often used to organize sample data (objects and attributes). The IS is called a decision table if its attributes consist of conditional attributes and decision attributes. A more detailed background on RS can be found in the literature (Walczak and Massart, 1999).

An information system is a pair (U, A), where U is a non-empty finite set of objects called the universe and A is a non-empty finite set of condition attributes such that there is an information function $f_a: U \to V_a$ for every $a \in A$, where $V_a$ is the set of values that a can take. The IS is shown as (1):

$$ IS = (U, A) \tag{1} $$

For every set of attributes $B \subseteq A$, IND(B) is the indiscernibility relation: two objects $x_i$ and $x_j$ are indiscernible by the set of attributes B if $b(x_i) = b(x_j)$ for every $b \in B$. An equivalence class of IND(B) is called an elementary set in B because it represents the smallest discernible group of objects. The construction of the elementary sets is the first step in RS classification. IND(B) is shown as (2):

$$ IND(B) = \{(x_i, x_j) \in U^2 \mid \forall b \in B,\; b(x_i) = b(x_j)\} \tag{2} $$

The partition generated by IND(B) is denoted as U/IND(B) and is calculated as follows:

$$ U/IND(B) = \otimes \{\, b \in B : U/IND(\{b\}) \,\} \tag{3} $$

where

$$ U/IND(\{b\}) = \{\, \{x_i \in U \mid b(x_i) = c\} \mid c \in V_b \,\} \tag{4} $$

and, for two families of sets A and B,

$$ A \otimes B = \{ X \cap Y \mid \forall X \in A,\; \forall Y \in B,\; X \cap Y \neq \emptyset \} \tag{5} $$

The RS approach has two basic concepts for data analysis, namely, lower approximations and upper approximations. Let X denote a subset of elements of the universe U ($X \subseteq U$). $\underline{B}X$ is the lower approximation of X in B ($B \subseteq A$), defined as the union of all elementary sets that are contained in X. $\overline{B}X$ is the upper approximation of X, defined as the union of the elementary sets that have a non-empty intersection with X. The boundary $BN_B$ is the difference between $\overline{B}X$ and $\underline{B}X$. With $[x_i]_{IND(B)}$ denoting the elementary set that contains $x_i$, these can be defined as follows:

$$ \underline{B}X = \{ x_i \in U \mid [x_i]_{IND(B)} \subseteq X \} \tag{6} $$

$$ \overline{B}X = \{ x_i \in U \mid [x_i]_{IND(B)} \cap X \neq \emptyset \} \tag{7} $$

$$ BN_B = \overline{B}X - \underline{B}X \tag{8} $$

We can measure inexactness and express the topological characterization of imprecision with the following conditions:

(1) If $\underline{B}X \neq \emptyset$ and $\overline{B}X \neq U$, then X is roughly B-definable.
(2) If $\underline{B}X = \emptyset$ and $\overline{B}X \neq U$, then X is internally B-undefinable.
(3) If $\underline{B}X = \emptyset$ and $\overline{B}X = U$, then X is totally B-undefinable.

Let B and D be attribute sets that induce equivalence relations over U. The positive, negative, and boundary regions can then be defined as follows:

$$ POS_B(D) = \bigcup_{X \in U/D} \underline{B}X \tag{9} $$

$$ NEG_B(D) = U - \bigcup_{X \in U/D} \overline{B}X \tag{10} $$

$$ BND_B(D) = \bigcup_{X \in U/D} \overline{B}X - \bigcup_{X \in U/D} \underline{B}X \tag{11} $$

The quantity k can be used to measure the degree of dependency between B and D. The degree can be defined as follows:

$$ k = \gamma_B(D) = \frac{|POS_B(D)|}{|U|} \tag{12} $$
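The definitions above can be illustrated with a small Python sketch. It is not the authors' implementation: the toy decision table, the attribute names `a`, `b`, `d`, and all function names are hypothetical and chosen only to show how the partition, the lower and upper approximations, and the dependency degree of Eq. (12) can be computed.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/IND(attrs): group objects that share the same values on attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def lower_approx(table, attrs, X):
    """Union of the elementary sets fully contained in X."""
    return set().union(*(b for b in partition(table, attrs) if b <= X))

def upper_approx(table, attrs, X):
    """Union of the elementary sets that intersect X."""
    return set().union(*(b for b in partition(table, attrs) if b & X))

def dependency(table, cond, dec):
    """k = |POS_cond(dec)| / |U|."""
    pos = set()
    for X in partition(table, dec):
        pos |= lower_approx(table, cond, X)
    return len(pos) / len(table)

# Hypothetical toy decision table: condition attributes a, b; decision d.
table = {
    "x1": {"a": 0, "b": 0, "d": 1},
    "x2": {"a": 0, "b": 0, "d": 2},
    "x3": {"a": 1, "b": 0, "d": 2},
    "x4": {"a": 1, "b": 1, "d": 1},
}
X = {obj for obj, row in table.items() if row["d"] == 2}  # {x2, x3}

lower = lower_approx(table, ["a", "b"], X)
upper = upper_approx(table, ["a", "b"], X)
print(sorted(lower), sorted(upper), sorted(upper - lower))
print(dependency(table, ["a", "b"], ["d"]))
```

In this toy table, x1 and x2 are indiscernible on (a, b) but carry different decisions, so they fall into the boundary region and lower the dependency degree.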

Attribute reduction in rough set theory provides a filter-based tool to extract concise knowledge from a domain. It preserves the information content while reducing the number of attributes involved. Based on the dependency degree, a reduct can be defined as follows: let R be a subset of C; then R is said to be a reduct if

$$ \gamma_R(D) = \gamma_C(D) \;\wedge\; \forall R' \subset R,\; \gamma_{R'}(D) < \gamma_C(D) \tag{13} $$

Specifically, a reduct with minimal cardinality is called a minimal reduct. The goal of attribute reduction is to find a minimal reduct. Its objective function is

$$ \min_{R \in \mathcal{R}} |R| \tag{14} $$

where $\mathcal{R}$ is the set that consists of all the reducts of C.

RS is methodologically significant to the artificial intelligence and cognitive science domains, especially in the representation of reasoning with vague or imprecise knowledge, such as in medical diagnosis. Komorowski and Zytkow (1997) noted that the RS approach is suitable for processing qualitative information that is difficult to analyze by standard statistical techniques. It integrates learning-from-example techniques, extracts rules from a data set of interest, and finds data regularities. Numerous rough set-based feature selection methodologies can be found in the literature. Pan, Hong, and Nahavandi (2003) described the application of the RS method to feature selection and reduction


in texture image recognition. Swiniarski and Skowron (2003) presented applications of RS methods for feature selection in pattern recognition. In medical diagnosis, the feature selection approach attempts to eliminate as many features as possible in the problem domain while still obtaining useful and meaningful outcomes with acceptable accuracy. Having a minimal number of features often leads to simple models that can be more easily interpreted. This paper uses RS to obtain the important attributes from the n-dimensional attribute set in order to establish a reduced model.

2.4. Mahalanobis distance

The Mahalanobis distance (MD) was introduced by Mahalanobis in 1936. Taguchi and Jugulum (2002) showed that MD is a distance measure based on the correlations between variables by which different patterns can be identified and analyzed with respect to a reference baseline. It is often used to measure the degree of abnormality as compared to normal conditions. It measures distances in multidimensional spaces, taking into account the correlations that may exist between variables or features. The calculation of MD is given as follows.

First, a reference data set $H = (h_{ij})_{m \times n}$ consisting of m different variables and a sample size of n is selected from sample groups of normal conditions. Then H is normalized as

$$ z_{ij} = \frac{h_{ij} - \mu_i}{s_i} \tag{15} $$

where $\mu_i$ and $s_i$ are the mean and standard deviation of the ith variable, i = 1, ..., m and j = 1, ..., n. The normalized data set is obtained as $Z = (z_{ij})_{m \times n}$.

Second, for a feature vector of unknown state $y^0 = (y^0_1, y^0_2, \ldots, y^0_m)^T$, the normalized feature vector $y = (y_1, y_2, \ldots, y_m)^T$ is obtained using the mean and standard deviation of the reference data set as follows:

$$ y_i = \frac{y^0_i - \mu_i}{s_i} \tag{16} $$

Finally, the $MD^2$ value from the feature vector $y^0$ to the reference data set H is given as

$$ MD^2 = \frac{1}{m}\, y^T C^{-1} y \tag{17} $$

where C is the correlation matrix of Z, and $C = \frac{1}{n-1} Z Z^T$.

Some of the advantages of MD when compared to other classical statistical approaches are as follows (Taguchi & Jugulum, 2002):

1. It takes into account not only the average value but also the variance and the covariance of the variables measured.
2. It accounts for ranges of acceptability (variance) between variables.
3. It compensates for interactions (covariance) between variables.
4. It is dimensionless.

For medical diagnosis, the primary problem reduces to finding a proper threshold T that can distinguish disease from non-disease. If $MD^2 \le T$, the patient is non-disease. If $MD^2 > T$, the patient is disease.
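The calculation in Eqs. (15)–(17) and the threshold rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the reference data are made up, samples are stored as rows (so the correlation matrix is built from column pairs rather than as Z Z^T), the helper names are hypothetical, and the threshold value 2.5 is used purely as an example.

```python
from statistics import mean, stdev

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_reference(H):
    """H: list of reference samples (rows), each with m variable values."""
    n, m = len(H), len(H[0])
    cols = list(zip(*H))
    mu = [mean(c) for c in cols]
    s = [stdev(c) for c in cols]  # sample standard deviation (n - 1)
    Z = [[(row[i] - mu[i]) / s[i] for i in range(m)] for row in H]  # Eq. (15)
    C = [[sum(z[i] * z[k] for z in Z) / (n - 1) for k in range(m)]
         for i in range(m)]
    return mu, s, C

def md2(y0, mu, s, C):
    """Scaled squared Mahalanobis distance of y0 from the reference group."""
    y = [(v - mu_i) / s_i for v, mu_i, s_i in zip(y0, mu, s)]  # Eq. (16)
    x = solve(C, y)  # x = C^{-1} y, avoiding an explicit inverse
    return sum(a * b for a, b in zip(y, x)) / len(y)  # Eq. (17)

def classify(y0, mu, s, C, T):
    return "disease" if md2(y0, mu, s, C) > T else "non-disease"

# Made-up reference (normal) group with two variables.
H = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
mu, s, C = fit_reference(H)
print(md2([3.0, 3.2], mu, s, C))            # distance of the group mean
print(classify([1.0, 6.0], mu, s, C, 2.5))  # a point far from the group
```

A useful sanity check on such an implementation is that the average $MD^2$ over the reference samples equals (n − 1)/n, which is close to 1 for large reference groups.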


3. Proposed approach

In this study, we propose integrating RS and MD to improve the quality of OSA diagnosis. The proposed approach is divided into two stages: the first stage uses RS to select important features and employs MD to distinguish the pattern of OSA, and the second stage applies the model developed in the first stage. Fig. 1 shows the procedure of the proposed approach. A more detailed description is given below.

3.1. The modeling stage

In this stage, the required model is developed in three steps. The first step is to define the problem: we identify the condition attributes (inputs) and the decision attributes (output). Next, we collect the required data and preprocess them. Data preprocessing is necessary for preparing the training and test data sets; it removes the inconsistent observations and missing values from the OSA data. The second step is feature selection, which selects the important features by RS based on the training data. The construction process involves IS, IND(B), $\underline{B}X$, $\overline{B}X$, and $POS_B(D)$. The final step employs MD to distinguish the pattern of OSA. We first identify a "reference" or "normal" group to construct the Mahalanobis space (MS). Then, we use the normal group data and the selected features (obtained in Step 2) to construct the Mahalanobis space MS1, and use the abnormal group data and the selected features to construct the Mahalanobis space MS2. Finding an appropriate threshold to effectively distinguish the normal from the abnormal examples is an important issue. The threshold value is set experimentally based on MS1 and MS2. An effective threshold can enhance the diagnostic and forecasting ability of MD.

3.2. The application stage

After modeling, the developed model can be used to predict the OSA pattern by inputting the OSA data with the selected features.

4. Case study

4.1. The case

To evaluate the effectiveness of the proposed approach, a real case dataset from the Sleep Center of the Chang Gung Memorial Hospital in Taipei, Taiwan, was considered. We collected data from 124 subjects (90 men and 34 women) who were referred for clinical suspicion of OSA. The patients were consecutively recruited from the outpatient clinic. The study was approved by the Institutional Review Board (IRB) of the Chang Gung Memorial Hospital.

In order to evaluate the classifiers in the proposed approach, two metrics, sensitivity and specificity, were used. Sensitivity is defined as the accuracy on the positive examples [true positives/(true positives + false negatives)], whereas specificity is the accuracy on the negative examples [true negatives/(true negatives + false positives)]. The abovementioned "negative" is taken as the majority class, whereas "positive" is the minority class. This study uses the g-means as the main metric to evaluate the performance of the classifiers. The g-means metric suggested by Kubat and Matwin (1997) has been used by several researchers for evaluating classifiers. This metric simultaneously takes sensitivity and specificity into account, and is defined as

$$ g\text{-means} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{18} $$

Moreover, the sensitivity and specificity are also listed separately to give the reader a better idea of the performance of each classifier.

4.2. Implementation

In this case study, an "event" can be either an apnea, characterized by complete cessation of airflow for at least 10 s, or a hypopnea, in which airflow decreases by 50% for 10 s or decreases by 30% if there is an associated decrease in oxygen saturation or arousal from sleep (Anonymous, 1999). To grade the severity of the sleep apnea, the number of events per hour is reported as the

Fig. 1. The proposed approach. The modeling stage: (1) Problem: define the problem, collect data, preprocess the data; (2) Feature selection: use RS, which includes IS, IND(B), and the lower and upper approximations of X, to select important features; (3) Building a predictive model of OSA: use normal group data and selected features to construct MS1, use abnormal group data and selected features to construct MS2, and determine a threshold based on MS1 and MS2. The application stage: Diagnosis and prediction: use the developed threshold to diagnose OSA.

apnea–hypopnea index (AHI). An AHI of less than 5 is considered normal, an AHI of 5–15 is mild, 15–30 is moderate, and more than 30 events per hour characterizes severe sleep apnea. In this study, we define AHI < 5 as non-disease and AHI ≥ 5 as disease. The collected OSA data contain 12 attributes, including anthropometric measurements (e.g., age, gender, height, weight, and BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), frequency of desaturation (DI3, DI4), frequency of paroxysmal leg movement in an hour (PLM), and questionnaire measurements (ESS, SOS). These data were obtained from the computerized data records, which also included the initial AHI from the diagnostic study. The explanation of each attribute is shown in Table 2. The next step is data preprocessing. This task was carried out as follows: (1) inconsistent observations were deleted and (2) missing values were

Table 2 The OSA attributes.

| No. | Item | Description |
|---|---|---|
| A | Gender | Gender (1, 2) |
| B | Age | Years (0–100) |
| C | BW | Weight (body weight, in kg) |
| D | BH | Height (body height, in cm) |
| E | BMI | Body mass index (kg/m2) |
| F | SBP | Systolic blood pressure (mm Hg) |
| G | DBP | Diastolic blood pressure (mm Hg) |
| H | ESS | Daytime sleepiness survey scale (0–24, 24 = worst daytime sleepiness situation) |
| I | SOS | Snoring survey score (0–100, 0 = worst snoring score) |
| J | DI3 | Frequency of desaturation (saturation index < 3% in an hour) |
| K | DI4 | Frequency of desaturation (saturation index < 4% in an hour) |
| L | PLM | Frequency of paroxysmal leg movement in an hour |

ignored in the analysis. Consequently, 86 subjects (62 disease and 24 non-disease) were studied. The subjects ranged from 11 to 78 years old, with an average age of 48 years. The mean BMI was 24.98 kg/m2 (see Table 3). We separated the collected OSA data into two parts: Group I and Group II (see Table 4). Group I was used to establish the model, while Group II was used to test the developed model.

We then employed RS to select the important features. The set of patients U = {1, 2, ..., 57} forms the rows of the table, while the columns denote the conditional attributes A (Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, and PLM) of these subjects and a related decision attribute D. In this IS (Table 5), the objects are classified into one of two categories: 1 (non-disease) and 2 (disease). Patients 1, 2, ..., 16 are non-disease and patients 17, 18, ..., 57 are disease. The condition classes of the objects, as groupings of indiscernible objects, are listed in Table 6. We are interested in the subset X and distinguish this set from the whole data set in the space of the 12 attributes A = (Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, and PLM). Based on Table 6, one can calculate the lower and upper approximations of the set. The elementary sets presented in Table 6 are as follows:

{x1, x4, x8, x9}, {x7, x12}, {x21, x33}, {x29, x37, x49}, ..., {x22, x26}, {x28, x40}

This means that the lower approximation is given by the following set of objects:

Table 3 The statistical measures of used clinical attributes.

| Item | Non-disease | Disease |
|---|---|---|
| Gender | 1.45 ± 0.50 | 1.14 ± 0.35 |
| Age | 46.45 ± 12.49 | 49.03 ± 11.64 |
| BW | 164.47 ± 8.47 | 166.56 ± 6.84 |
| BH | 64.50 ± 11.28 | 70.79 ± 10.91 |
| BMI | 23.77 ± 3.23 | 25.44 ± 2.98 |
| SBP | 122.25 ± 20.55 | 125.56 ± 16.44 |
| DBP | 79.29 ± 10.40 | 81.98 ± 10.46 |
| ESS | 8.29 ± 6.23 | 10.75 ± 6.34 |
| SOS | 63.91 ± 22.24 | 44.93 ± 18.37 |
| DI3 | 19.08 ± 9.65 | 124.75 ± 129.84 |
| DI4 | 9.5 ± 8.91 | 124.58 ± 129.81 |
| PLM | 2.06 ± 6.175 | 2.97 ± 9.50 |

Table 4 The OSA data.

| | Disease | Non-disease | Total |
|---|---|---|---|
| Group I (training data) | 41 | 16 | 57 |
| Group II (testing data) | 21 | 8 | 29 |

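Table 5 The decision table.

| Sample | Gender | Age | BW | BH | ... | DI3 | DI4 | PLM | D |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 52 | 165.5 | 69.2 | ... | 3 | 3 | 0 | 1 |
| 2 | 1 | 54 | 165 | 64 | ... | 10 | 9 | 0 | 1 |
| 3 | 2 | 47 | 154.5 | 53.6 | ... | 2 | 2 | 0 | 1 |
| 4 | 2 | 62 | 167 | 61 | ... | 7 | 7 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | 1 | 40 | 167 | 65 | ... | 46 | 328 | 326 | 2 |
| 57 | 1 | 73 | 168 | 58.1 | ... | 78 | 84 | 84 | 2 |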

Table 6 The indiscernibility relation.

| U/A | Gender | Age | BW | ... | DI3 | DI4 | PLM | D |
|---|---|---|---|---|---|---|---|---|
| {x1, x4, x8, x9} | – | (49.5, Inf) | (159.5, Inf) | ... | (-Inf, 26.0) | (-Inf, 11.0) | – | 1 |
| {x7, x12} | – | (-Inf, 48.5) | (-Inf, 159.5) | ... | (-Inf, 26.0) | (-Inf, 11.0) | – | 1 |
| {x21, x33} | – | (49.5, Inf) | (159.5, Inf) | ... | (-Inf, 26.0) | (11.0, Inf) | – | 2 |
| {x29, x37, x49} | – | (49.5, Inf) | (159.5, Inf) | ... | (26.0, Inf) | (11.0, Inf) | – | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| {x22, x26} | – | (-Inf, 48.5) | (159.5, Inf) | ... | (26.0, Inf) | (11.0, Inf) | – | 2 |
| {x28, x40} | – | (49.5, Inf) | (159.5, Inf) | ... | (26.0, Inf) | (11.0, Inf) | – | 2 |

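Lower approximation: $\underline{B}X = \{x_{29}, x_{37}, x_{49}, x_{28}, x_{40}\}$

Upper approximation: $\overline{B}X = \{x_{21}, x_{33}, \ldots, x_{29}, x_{37}, x_{49}, x_{22}, x_{26}, x_{28}, x_{40}\}$

Table 7 Average and standard deviation of the raw data.

| Sample | Gender | Age | DBP | ESS | SOS | DI3 |
|---|---|---|---|---|---|---|
| 1 | 2 | 52 | 80 | 5 | 64 | 3 |
| 2 | 1 | 54 | 87 | 16 | 20 | 10 |
| 3 | 2 | 47 | 94 | 2 | 65 | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| 16 | 1 | 51 | 65 | 14 | 49 | 9 |
| Avg. (mi) | 1.50 | 47.50 | 78.13 | 9.38 | 65.19 | 11.38 |
| Std. (si) | 0.52 | 13.70 | 11.49 | 6.47 | 21.15 | 10.61 |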

Table 8 Normalized data.

| Sample | Gender | Age | DBP | ESS | SOS | DI3 | MD1 |
|---|---|---|---|---|---|---|---|
| 1 | 0.968 | 0.328 | 0.163 | -0.676 | -0.056 | -0.789 | 0.261 |
| 2 | -0.968 | 0.474 | 0.772 | 1.024 | -2.136 | -0.129 | 1.007 |
| 3 | 0.968 | -0.036 | 1.381 | -1.140 | -0.009 | -0.883 | 0.804 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 16 | -0.968 | 0.255 | -1.142 | 0.715 | -0.765 | -0.223 | 0.556 |
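As a worked check of the normalization step, the short Python sketch below standardizes sample 1 of the normal group using the averages and standard deviations listed in Table 7. The variable and function names are hypothetical; the results agree with the first row of Table 8 up to the rounding of the tabulated means and standard deviations, and a negative value indicates a raw value below the group mean.

```python
# Sample 1 of the normal group and the group statistics, copied from Table 7.
features = ["Gender", "Age", "DBP", "ESS", "SOS", "DI3"]
sample_1 = [2, 52, 80, 5, 64, 3]
avg = [1.50, 47.50, 78.13, 9.38, 65.19, 11.38]
std = [0.52, 13.70, 11.49, 6.47, 21.15, 10.61]

def normalize(values, means, sds):
    """Standardize each value: (value - mean) / standard deviation."""
    return [(v - m) / s for v, m, s in zip(values, means, sds)]

z = dict(zip(features, normalize(sample_1, avg, std)))
for name, value in z.items():
    print(f"{name}: {value:+.3f}")
```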

Table 9 Inverse of the correlation matrix of the normal group.

| | Gender | Age | DBP | ESS | SOS | DI3 |
|---|---|---|---|---|---|---|
| Gender | 1.449 | 0.209 | 0.014 | 0.203 | 0.477 | 0.605 |
| Age | 0.209 | 1.207 | 0.352 | 0.062 | 0.228 | 0.124 |
| DBP | 0.0146 | 0.352 | 1.108 | 0.014 | 0.113 | 0.010 |
| ESS | 0.203 | 0.062 | 0.014 | 1.161 | 0.317 | 0.058 |
| SOS | 0.477 | 0.228 | 0.113 | 0.317 | 1.358 | 0.430 |
| DI3 | 0.605 | 0.124 | 0.010 | 0.058 | 0.430 | 1.352 |

Boundary region: $BN_B = \{x_{21}, x_{33}, \ldots, x_{22}, x_{26}\}$

We have $POS_B(D)$ = (Gender, Age, DBP, ESS, SOS, and DI3), where six important features were used to establish the required model. We then used the normal group data and the six selected features to construct MD1. The 16 normal examples in the training set were designated as the reference (normal) group. First, the average and standard deviation of each feature in the normal group were calculated (Table 7), and the data were normalized by using Eq. (16) (Table 8). We can use the normalized data to construct the correlation matrix and its inverse (see Table 9). Finally, the Mahalanobis distances were calculated using Eq. (17), as shown in Table 8. Similarly, we can calculate the Mahalanobis distance for the 41 disease data using the inverse correlation matrix (Table 9) from the non-disease data. Fig. 2 shows the Mahalanobis distance for both the disease and non-disease patients. A wide range of MD values was observed for the disease patients, whereas the values for the non-disease patients are quite uniform.

For medical diagnosis, the primary problem is to find a proper threshold T that can distinguish disease from non-disease patients. If MD ≤ T, the patient is non-disease; if MD > T, the patient is disease. Next, we calculated the MD of the 16 data in the normal group and the 41 data in the abnormal group. The distribution chart is shown in Fig. 2. The threshold was varied from 0.5 to 3.0 and the g-means was calculated for each value, as shown in Table 10. Based on sensitivity and specificity, we found that the best threshold value is 2.5, at which both sensitivity and specificity are 100%.

Fig. 2. MD distribution of Group I (k = 6).
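The threshold search can be reproduced directly from the confusion counts reported in Table 10. The sketch below assumes that the g-means is the geometric mean of sensitivity and specificity, which is consistent with the tabulated values, and that ties are broken by taking the smallest threshold that reaches the maximum. The tie-breaking rule and the function names are illustrative assumptions.

```python
from math import sqrt

# (threshold, true positives, false negatives, false positives, true negatives)
# taken from Table 10: 41 disease and 16 non-disease training cases.
TABLE_10 = [
    (0.5, 41, 0, 14, 2),
    (1.0, 41, 0, 8, 8),
    (1.5, 41, 0, 3, 13),
    (2.0, 41, 0, 1, 15),
    (2.5, 41, 0, 0, 16),
    (3.0, 41, 0, 0, 16),
]


def metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, g-means) as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sqrt(sensitivity * specificity)


# g-means for every candidate threshold, in percent.
g_means_percent = {t: 100 * metrics(tp, fn, fp, tn)[2]
                   for t, tp, fn, fp, tn in TABLE_10}

# max() returns the first maximal element, i.e. the smallest such threshold.
best_threshold = max(TABLE_10, key=lambda row: metrics(*row[1:])[2])[0]

for t, g in g_means_percent.items():
    print(f"T = {t:.1f}  g-means = {g:.2f}%")
print("selected threshold:", best_threshold)
```

The recomputed values agree with Table 10 to within rounding, and the selected threshold is the 2.5 adopted in the text.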


Table 10 Sensitivity analysis of the threshold.

Threshold   Test result   Disease   Non-disease   Sensitivity (%)   Specificity (%)   g-Means (%)
0.5         Positive      41        14            100               12.50             35.36
            Negative      0         2
1.0         Positive      41        8             100               50                70.71
            Negative      0         8
1.5         Positive      41        3             100               81.25             90.13
            Negative      0         13
2.0         Positive      41        1             100               93.75             96.82
            Negative      0         15
2.5         Positive      41        0             100               100               100
            Negative      0         16
3.0         Positive      41        0             100               100               100
            Negative      0         16

To test the developed model, we used the best threshold value of 2.5 to distinguish the non-disease from the disease patients in the Group II data. The results of the final experiment are summarized in Table 11. Our proposed approach distinguished OSA with a sensitivity of 85.71% and a specificity of 100%.

4.3. A comparison

In this section, we compare the performance of our proposed approach with other popular classification techniques: LR, BPN, LVQ, SVM, the C4.5 decision tree, and RS. LR was first established as an analytical tool in epidemiology and has since become the accepted “gold standard” in different research areas. Artificial neural networks (ANNs) are computer programs modeled after the biological nervous system; based on experience, they are capable of recognizing complex patterns in the data. BPN and LVQ are common types of ANNs. SVMs, described by Corinna and Vapnik (1995), are a classification method based on statistical learning theory. The decision tree is a widely used approach in classification problems.

In this comparison, SVM was implemented using LIBSVM, which provides an efficient parameter selection tool that uses cross-validation via parallel grid search with a radial basis function kernel. The Professional II PLUS software was used to construct BPN and LVQ. Their parameters (learning rate, momentum, and number of hidden nodes) were optimized by trial and error to find the combination with the minimum root mean square error. All C4.5 decision tree results were obtained with the See5 software tool. Finally, RSES was used to implement RS for classification problems.

Table 11 shows the test results on the OSA data set. The g-means of our proposed approach, LR, BPN, LVQ, SVM, the C4.5 decision tree, and RS were 0.9258, 0.8660, 0.7319, 0.4755, 0.8660, 0.7906, and 0.4082, respectively. The results indicate that our proposed approach outperformed the other methods. The proposed approach may therefore help physicians reach the final decision for their patients.

4.4. A comparison

4.4.1. OSA

In this study, six important features (gender, age, DBP, ESS, SOS, and DI3) were identified by RS. These features are discussed in detail below.

Gender has only recently been recognized as a significant factor. Several studies have tried to explain the male predominance in OSA, citing differences in anatomic size, the greater collapsibility of the upper airway, the greater increase in upper airway resistance in men, and the hormonal changes in women (Kapsimalis & Kryger, 2002; Shepertycky, Banno, & Kryger, 2005).

In many studies, age is used in prediction models of OSA (John, Linda, & Brien, 2003; Sharma et al., 2006). OSA has two possible underlying causes: (1) an anatomically vulnerable airway and (2) neurologically unstable breathing control. As people get older, their control of the airway and of breathing weakens. Age therefore has a neurological influence and also affects the airway.

The SOS and ESS are valid, reliable, and sensitive questionnaires for evaluating the severity of sleep-related problems. The ESS is intended to measure daytime sleepiness and is often used clinically to screen for the manifestations of behavioral morbidity associated with OSA (Gliklich & Wang, 2002; Rosenthal & Diana, 2008). The SOS is another recent questionnaire that evaluates patients with snoring problems.

In this study we also found that, among the hemodynamic parameters, diastolic blood pressure (DBP) is more relevant to the development of OSA than systolic blood pressure (SBP).

The DI3 index is the frequency of desaturation (index < 3% in an hour). This index may explain why a more severe desaturation than the one predicted in alveolar hypoventilation has been demonstrated in OSA patients (Gurubhagavatula & Maislin, 2001; Jacob et al., 1995). In other words, Schäfer, Ewig, Hasper, and Lüderitz (1997) showed that oxygen desaturation occurs more often in proportion to the frequency of respiratory disturbances in OSA subjects.

The features not selected by RS are BW, BH, BMI, DI4, SBP, and PLM; we discuss them next. BMI is a statistical measurement that compares the weight and height of a person and is a useful index of the body’s level of obesity. Obesity is often seen in OSA patients, but BMI was not an important feature in our experimental result. There are two reasons for this. First, the World Health Organization defines obesity as a BMI of 30 kg/m² or higher; the mean BMI values were 23 and 25 in this study, which means the subjects were not obese. Second, the average difference in BMI between the non-disease and disease groups was only 2. There were no obvious differences between DI3 and DI4 among the patients. We chose DI3 because its
frequency intensity was lower and relatively more sensitive than that of DI4. The PLM represents the frequency of paroxysmal leg movement per hour during night sleep, indicating the severity of sleep disturbance caused by this particular disease.

Table 11 A comparison.

Method      Selected attributes                                            Sensitivity (%)   Specificity (%)   g-Means (%)
RS and MD   Gender, Age, DBP, ESS, SOS, DI3                                85.71             100.00            92.58
LR          Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, PLM    85.71             87.50             86.60
BPNN        Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, PLM    85.71             62.50             73.19
LVQ         Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, PLM    90.47             25.00             47.55
SVM         Gender, Age, BW, BH, BMI, SBP, DBP, ESS, SOS, DI3, DI4, PLM    85.71             87.50             86.60
C4.5        SOS, DI3                                                       100.00            62.50             79.06
RS          Gender, Age, DBP, ESS, SOS, DI3                                66.67             25.00             40.82
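The g-means reported in Table 11 can be cross-checked against the reported sensitivity and specificity. The short sketch below assumes the same definition used for the threshold analysis, namely the geometric mean of sensitivity and specificity; the dictionary layout, the tolerance, and the variable names are illustrative choices, not part of the original study.

```python
from math import sqrt

# Method: (sensitivity %, specificity %, reported g-means %) from Table 11.
TABLE_11 = {
    "RS and MD": (85.71, 100.00, 92.58),
    "LR": (85.71, 87.50, 86.60),
    "BPNN": (85.71, 62.50, 73.19),
    "LVQ": (90.47, 25.00, 47.55),
    "SVM": (85.71, 87.50, 86.60),
    "C4.5": (100.00, 62.50, 79.06),
    "RS": (66.67, 25.00, 40.82),
}


def g_means_percent(sensitivity, specificity):
    """Geometric mean of two percentages; the result is also a percentage."""
    return sqrt(sensitivity * specificity)


recomputed = {name: g_means_percent(sens, spec)
              for name, (sens, spec, _) in TABLE_11.items()}

# Largest gap between the recomputed and the reported g-means.
max_gap = max(abs(recomputed[name] - reported)
              for name, (_, _, reported) in TABLE_11.items())

best_method = max(recomputed, key=recomputed.get)

for name, value in recomputed.items():
    print(f"{name:10s} recomputed g-means = {value:.2f}%")
print("largest gap vs. Table 11:", round(max_gap, 3))
print("highest g-means:", best_method)
```

Because the g-means multiplies the two rates, a method with very high sensitivity but poor specificity (LVQ in Table 11) scores low, which is why this measure suits data sets with unequal class sizes.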

4.4.2. Methods

Before using LR, the “initial assumptions” of the model must be considered. However, these assumptions were not considered for OSA. The results of an LR analysis cannot be validly interpreted until the initial assumptions have been satisfactorily checked, and relying on a priori statistical properties can be hazardous. If the initial assumptions cannot be established for OSA, the LR result may not be valid.

ANN is a novel method that has been used successfully in OSA applications, but it suffers from disadvantages such as overtraining, over-fitting, and network optimization. Vogt and Bared (1998) noted that ANN models have been criticized for working as a black box and for sometimes over-fitting the data, especially when the sample size is small. This study included 86 subjects, which is a small sample, so the ANN result may not be completely reliable.

SVMs have been successfully applied to classification and regression problems such as character recognition, but they also have some disadvantages. First, an SVM works as a black box, similar to traditional neural networks. Second, Huang and Chiu (2006) showed that three parameters must be predetermined before the training phase and that extra time is needed to adjust them. For this reason, SVM cannot support doctors’ diagnoses immediately. Our proposed approach does not need such parameter adjustment.

The C4.5 program generates a classifier in the form of a “decision tree.” So far, the methods for deciding the optimum options (adjusting parameters) are not clear, so much time is consumed in finding the optimum combination of options. For this reason, our proposed approach outperformed C4.5.

RS theory offers a schematic approach for analyzing data without initial assumptions. This is an advantage because such assumptions cannot be established before data analysis. Medical data are less amenable to such initial assumptions unless we employ a number of initial assumptions and constraint settings. Moreover, Pattaraintakorna and Cerconeb (2008) noted that medical science is not an exact science in which processes can be easily analyzed and modeled. The RS and MD approaches take advantage of a correct and proven philosophy to work with medical data without strong a priori reasoning. It is therefore appropriate to use RS and MD to analyze medical data. In addition, the greater the MD value, the greater the distance from the normal (non-disease) group, so the MD value can help doctors judge a patient’s OSA condition.

5. Conclusion

In recent years, OSA has become an important public health concern. In this study, we applied RS and MD to enhance the quality of OSA diagnosis using demographic information (age and gender), anthropometric measurements (height, weight, BMI, SBP, DBP, frequency of desaturation (DI3, DI4), and frequency of paroxysmal leg movement in an hour (PLM)), and questionnaire measurements (ESS, SOS). The proposed method uses RS to select important features and MD to distinguish the pattern of OSA. Implementation results show that our proposed method can not only effectively detect OSA but also reduce the cost and time needed for an accurate diagnosis. The proposed approach can be employed by physicians when making clinical decisions for their patients.

Acknowledgement

This work was supported in part by the National Science Council, Taiwan, under grant NSC-98-2221-E-007-071-MY3.

References

Anonymous (1999). Sleep-related breathing disorders in adults: Recommendations for syndrome definition and measurement techniques in clinical research. The Report of an American Academy of Sleep Medicine Task Force. Sleep, 22, 667–689.
Barthel, S. W., & Strome, M. S. (1999). Obstructive sleep apnea and surgery. Medical Clinics of North America, 83(1), 85–96.
Corinna, C., & Vapnik, V. N. (1995). Support vector networks. Machine Learning, 20, 1–25.
Desai, A. V., Ellis, E., Wheatley, J. R., & Grunstein, R. R. (2003). Fatal distraction: A case of fatal fall-asleep road accidents and their medicolegal outcomes. The Medical Journal of Australia, 178, 396–399.
Doak, J. (1992). An evaluation of feature selection methods and their application to computer security. University of California, Department of Computer Science.
Gliklich, R. E., & Wang, P. C. (2002). Validation of the snore outcomes survey for patients with sleep-disordered breathing. Archives of Otolaryngology – Head and Neck Surgery, 128, 819–824.
Gurubhagavatula, I., & Maislin, G. (2001). An algorithm to stratify sleep apnea risk in a sleep disorders clinic population. American Journal of Respiratory and Critical Care Medicine, 170, 371–376.
Hillman, D., Murphy, A., & Pezzullo, L. (2006). The economic cost of sleep disorders. Sleep, 29(3), 299–305.
Huang, S. J., & Chiu, N. H. (2006). Optimization of analogy weights by genetic algorithm for software effort estimation. Information and Software Technology, 48, 1034–1045.
Jacob, S. V., Morielli, A., Mograss, M. A., Ducharme, F. M., Schloss, M. D., & Brouillette, R. T. (1995). Home testing for pediatric obstructive sleep apnea syndrome secondary to adenotonsillar hypertrophy. Pediatric Pulmonology, 20, 241–252.
Jenkinson, C., Davies, R. J., Mullins, R., & Stradling, J. R. (1999). Comparison of therapeutic and subtherapeutic nasal continuous positive airway pressure for obstructive sleep apnoea: A randomised prospective parallel trial. Lancet, 353, 2100–2105.
John, B. D., Linda, M. S., & Brien, P. E. (2003). Predicting sleep apnea and excessive day sleepiness in the severely obese. Chest, 123, 1134–1141.
Kapsimalis, F., & Kryger, M. H. (2002). Gender and obstructive sleep apnea syndrome. Sleep, 25, 497–504.
Komorowski, J., & Øhrn, J. (1999). Modelling prognostic power of cardiac tests using rough sets. Artificial Intelligence in Medicine, 15(2), 167–191.
Komorowski, J., & Zytkow, J. (1997). Principles of data mining and knowledge discovery. Springer.
Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced training set: One-sided selection. In Proceedings 14th international conference on machine learning (ICML ’97).
Pan, L., Hong, Z., & Nahavandi, S. (2003). The application of rough set and Kohonen network to feature selection for object extraction. In Proceedings of the 2003 international conference on machine learning and cybernetics (Vol. 21, pp. 185–189).
Pattaraintakorna, P., & Cerconeb, N. (2008). Integrating rough set theory and medical applications. Applied Mathematics Letters, 21, 400–403.
Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Science, 11, 341–356.
Pokrajac, D., Megalooikonomou, V., Lazarevic, A., Kontos, D., & Obradovic, Z. (2005). Applying spatial distribution analysis techniques to classification of 3D medical images. Artificial Intelligence in Medicine, 33(3), 261–280.
Rosenthal, L. D., & Diana, D. C. (2008a). The Epworth sleepiness scale in the identification of obstructive sleep apnea. The Journal of Nervous and Mental Disease, 196, 429–431.
Rosenthal, L. D., & Diana, D. C. (2008b). The Epworth sleepiness scale in the identification of obstructive sleep apnea. The Journal of Nervous and Mental Disease, 196, 429–471.
Rowley, J. A., Aboussouan, L. S., & Badr, M. S. (2000). The use of clinical prediction formulas in the evaluation of obstructive sleep apnea. NCBI, 23, 929–938.
Ryan, P. J., Hilton, M. F., Boldy, D. A., Evans, A., Bradbury, S., Sapiano, S., et al. (1995). Validation of British Thoracic Society guidelines for the diagnosis of the sleep apnoea/hypopnoea syndrome: Can polysomnography be avoided. Chest, 50, 972–975.
Schäfer, H., Ewig, S., Hasper, E., & Lüderitz, B. (1997). Predictive diagnostic value of clinical assessment and nonlaboratory monitoring system recordings in patients with symptoms suggestive of obstructive sleep apnea syndrome. NCBI, 64, 194–199.
Sharma, S. K., Malik, V., Vasudev, C., Banga, A., Mohan, A., Handa, K. K., et al. (2006). Prediction of obstructive sleep apnea in patients presenting to a tertiary care center. Sleep Breath, 10, 147–154.
Shepertycky, M. R., Banno, K., & Kryger, M. H. (2005). Differences between men and women in clinical presentation of patients diagnosed with obstructive sleep apnea syndrome. Sleep, 28, 309–314.


Swiniarski, R. W., & Andrzej, S. (2003). Rough set methods in feature selection and recognition. Pattern Recognition Letters, 24, 833–849.
Swiniarski, R., & Skowron, A. (2003). Rough set methods in feature selection and recognition. Pattern Recognition Letters, 24, 833–849.
Taguchi, G., & Jugulum, R. (2002). The Mahalanobis–Taguchi strategy. New York, USA: John Wiley & Sons.
Theodoridis, S., & Koutroumbas, K. (2003). Pattern recognition. Amsterdam, Holland: Academic Press.
Vogt, A., & Bared, J. G. (1998). Accident models for two-lane rural roads: Segments and intersections. Transportation Research Board of the National Academies, 1635, 18–29.
Walczak, B., & Massart, D. L. (1999). Rough sets theory. Chemometrics and Intelligent Laboratory Systems, 47, 1–16.
Wright, J., Johns, R., Watt, I., Melville, A., & Sheldon, T. (1997). Health effects of obstructive sleep apnea and the effectiveness of continuous positive airway pressure: A systematic review of the research evidence. British Medical Journal, 314, 851–860.
Yamashiro, Y., & Kryger, M. H. (1999). Nocturnal oximetry: Is it a screening tool for sleep disorders. Sleep, 18, 167–171.
Young, T., Palta, M., Dempsey, J., Skatrud, J., Weber, S., & Badr, S. (1993). The occurrence of sleep-disordered breathing among middle-aged adults. New England Journal of Medicine, 328(17), 1230–1235.