Neurocomputing 174 (2016) 194–202
An algorithm for classification over uncertain data based on extreme learning machine☆

Keyan Cao a,b, Guoren Wang b,c, Donghong Han b, Mei Bai b, Shuoru Li b

a Shenyang Jianzhu University, Liaoning, Shenyang 110168, China
b Northeastern University, Liaoning, Shenyang 110819, China
c Key Laboratory of Medical Image Computing (Northeastern University), Ministry of Education, China

☆ This research is supported by the NSFC (Grant nos. 61173029, 61472069, 61332006, 60933001, 75105487 and 61100024), the National Basic Research Program of China (973, Grant no. 2011CB302200-G), the National High Technology Research and Development 863 Program of China (Grant no. 2012AA011004) and the Fundamental Research Funds for the Central Universities (Grant no. N110404011).
Article history: Received 29 September 2014; received in revised form 16 May 2015; accepted 17 May 2015; available online 10 August 2015.

Abstract

In recent years, along with the generation of uncertain data, more and more attention has been paid to mining uncertain data. In this paper, we study the problem of classifying uncertain data using the Extreme Learning Machine (ELM). We first propose the UU-ELM algorithm for the classification of uncertain data that are uniformly distributed. Furthermore, the NU-ELM algorithm is proposed for classifying uncertain data that are non-uniformly distributed. By calculating bounds on the probabilities, the efficiency of the algorithms can be improved. Finally, the performance of our methods is verified through a large number of simulation experiments. The experimental results show that our methods are effective ways to solve the problem of uncertain data classification, reduce execution time and improve efficiency. © 2015 Elsevier B.V. All rights reserved.

Keywords: Extreme learning machine; Classification; Uncertain data; Single hidden layer feedforward neural networks
1. Introduction

In recent years, a large amount of uncertain data has been generated and collected due to new data acquisition techniques, which are widely used in many real-world applications such as wireless sensor networks [1,2], moving object detection [8,34,36], meteorology and mobile telecommunication. However, owing to the intrinsic differences between uncertain and deterministic data, it is difficult to process uncertain data using traditional data mining algorithms designed for deterministic data. Therefore, many researchers have put effort into developing new techniques for processing and mining uncertain data [6,7,32,33].

Uncertain data models can be loosely classified into the following three categories [39]: (1) the most rigorous assumption of uncertainty is conceptually described by a continuous probability density function (pdf) in the data space $D$; given any uncertain tuple $x$ in $D$ and its pdf $f(x)$, $\int_{x \in D} f(x)\,dx = 1$; (2) an uncertain tuple $x_i$ consists of a set of possible values in the data space as its instances, denoted by $x_i = \{x_i^1, x_i^2, \ldots, x_i^j, \ldots, x_i^n\}$, where the number of instances of $x_i$ is denoted by $|x_i| = n$; this can be regarded as the discrete case of a probability density function.
Let $p(x_i^j)$ denote the existence probability of instance $x_i^j$; then $p(x_i^j) > 0$ and $\sum_{j=1}^{n} p(x_i^j) = 1$; (3) it assumes that the standard deviation of each tuple is available [5]; although this assumption of uncertainty is fairly simple and modest, it is not a mainstream model in the uncertain data management field. Note that in this paper we mainly focus on the pdf model, as it has been widely used for modeling uncertain data. Under the pdf model there exist two different distributions: (a) uniformly distributed, where the probabilities of the instances of the same uncertain object are equal, as shown in Fig. 1(a); (b) non-uniformly distributed, where the probabilities of the instances follow some non-uniform distribution, as shown in Fig. 1(b).

Classification is one of the key problems in the data mining area; it can find interesting patterns and has significant applications in many fields. There are many published works on classification methods [3,4,11,12,31,37,38]. Introducing uncertainty into the data makes the problem far more difficult to tackle, as it further limits the accuracy of subsequent classification. Therefore, how to effectively classify uncertain data is of great importance. There are many challenges that affect uncertain data classification.

Challenge 1: What is classification over uncertain data? For deterministic data, the class a given object belongs to is deterministic. For uncertain data, however, the class an uncertain object belongs to is uncertain. Thus, the classification result over uncertain data cannot be defined just based on the definition of classification over deterministic data.
Fig. 1. Uncertain object: (a) uniformly distributed and (b) non-uniformly distributed.
Our contribution. We present a new definition of classification over uncertain data. Keeping the basic idea of the traditional definition of classification, we introduce probability into the new definition: we compute the probability of an uncertain object belonging to each class, and the object is assigned to the class with the maximum probability.

Challenge 2: How can uncertain data be classified efficiently? Each uncertain object contains some possible instances, and every instance has an existence probability. A naive approach is to process all instances of an uncertain object, sum up the probabilities of the instances that belong to the same class, and assign the uncertain object to the class with the maximum probability. This naive approach is infeasible in practice because it costs too much time to process all the instances contained in an uncertain object. A more effective approach to classifying uncertain data is needed.

Our contribution. We propose a pruning-based approach that effectively and efficiently reduces the number of instances to be processed and saves cost. First, based on the ELM method, we propose the Uniformly distributed Uncertain data classification based on ELM (UU-ELM) algorithm for classification over uncertain data that are uniformly distributed; it quickly derives bounds on the probability of an object belonging to each class, which improves efficiency. Second, we propose the NU-ELM algorithm for classification over uncertain data that are non-uniformly distributed. By calculating upper and lower bounds, we can reduce the amount of calculation.

Motivation (sensor data): Sensor networks are frequently used to monitor the surrounding environment, in which each sensor reports its measurements to a central location. In the case of an environmental monitoring sensor network, measurements may include air pressure, temperature and humidity. The true measured values cannot be obtained exactly due to limitations of the measuring equipment. Instead, sensor readings are sent in order to approximate the true value, leading to uncertain objects. Traditional classification algorithms are unable to deal with such challenges. In this paper, we investigate uncertain data classification based on ELM [13,15–17,19–22,26,27].

In the remainder of this paper, we first introduce ELM in Section 2. After that, we formally define our problem in Section 3. We analyze the challenges of classification over uncertain data and develop the two algorithms in Section 4. Section 5 presents an extensive empirical study. In Section 6, we conclude this paper with directions for future work.
2. Brief of extreme learning machine

In this section, we present a brief overview of ELM, developed by Huang et al. [18,23,26,28,29]. ELM is based on a generalized Single Hidden-layer Feedforward Network (SLFN). The interpolation capability and universal approximation capability of ELMs have been investigated [24]. In ELM, the hidden-layer node parameters are mathematically calculated instead of being iteratively tuned, providing good generalization performance at thousands of times higher speeds than traditional popular learning algorithms for feedforward neural networks [24]. The output function of ELM for generalized SLFNs is represented by

$$f_L(x) = \sum_{i=1}^{L} \beta_i\, g_i(x) = \sum_{i=1}^{L} \beta_i\, G(a_i, b_i, x), \quad x \in \mathbb{R}^d,\ \beta_i \in \mathbb{R}^m \tag{1}$$
where $\beta = [\beta_1, \ldots, \beta_L]^T$ is the vector of the output weights between the hidden layer of $L$ nodes and the output node, and $g_i$ denotes the output function $G(a_i, b_i, x)$ of the $i$th hidden node. For additive nodes with activation function $g$, $g_i$ is defined as

$$g_i = G(a_i, b_i, x) = g(a_i \cdot x + b_i), \quad a_i \in \mathbb{R}^d,\ b_i \in \mathbb{R} \tag{2}$$

For Radial Basis Function (RBF) nodes with activation function $g$, $g_i$ is defined as

$$g_i = G(a_i, b_i, x) = g(b_i\, \| x - a_i \|), \quad a_i \in \mathbb{R}^d,\ b_i \in \mathbb{R}^+ \tag{3}$$
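For readers who want to experiment, the two node types translate directly into code. The following sketch (ours, not the authors'; the function name is illustrative) computes the hidden-layer outputs of Eqs. (2) and (3) with NumPy:

```python
import numpy as np

def hidden_mapping(X, A, b, kind="additive", activation=np.tanh):
    """Compute the N x L hidden-layer output matrix H (cf. Eqs. (2)-(3)).

    X: (N, d) input samples; A: (L, d) hidden-node parameters a_i;
    b: (L,) hidden-node parameters b_i.
    """
    if kind == "additive":          # g(a_i . x + b_i)
        return activation(X @ A.T + b)
    elif kind == "rbf":             # g(b_i * ||x - a_i||)
        dists = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)
        return activation(b * dists)
    raise ValueError("kind must be 'additive' or 'rbf'")
```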
The above equations can be written compactly as

$$H\beta = T \tag{4}$$

where

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} G(a_1,b_1,x_1) & \cdots & G(a_L,b_L,x_1) \\ \vdots & & \vdots \\ G(a_1,b_1,x_N) & \cdots & G(a_L,b_L,x_N) \end{bmatrix}_{N \times L} \tag{5}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m} \quad \text{and} \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \tag{6}$$
$H$ is the hidden layer output matrix of the SLFN [13,14,24]; the $i$th column of $H$ is the $i$th hidden node's output with respect to inputs $x_1, x_2, \ldots, x_N$. $h(x) = [G(a_1,b_1,x), \ldots, G(a_L,b_L,x)]$ is called the hidden layer feature mapping, and the $i$th row of $H$ is the hidden layer feature mapping with respect to the $i$th input $x_i$, i.e. $h(x_i)$. It has been proved [24,28] that, from the interpolation capability point of view, if the activation function $g$ is infinitely differentiable in any interval, the hidden layer parameters can be randomly generated.

For the problem of a multiclass classifier with a single output, ELM can approximate any target continuous function, and the output of the ELM classifier $h(x)\beta$ can be as close as possible to the class labels in the corresponding regions [25]. To minimize the training errors $\xi_i$ and the norm of the output weights, the classification problem for the constrained-optimization-based ELM with a single output node can be formulated as [25]

$$\text{Minimize:} \quad L_{P_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \xi_i^2$$
$$\text{Subject to:} \quad h(x_i)\beta = t_i - \xi_i, \quad i = 1, \ldots, N \tag{7}$$
where $C$ is a user-specified parameter. Based on the KKT theorem [10], training ELM is equivalent to solving the following dual optimization problem:

$$L_{D_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\alpha_i\,\big(h(x_i)\beta - t_i + \xi_i\big) \tag{8}$$
where each Lagrange multiplier $\alpha_i$ corresponds to the $i$th training sample. The KKT optimality conditions given in [10] are as follows:

$$\frac{\partial L_{D_{ELM}}}{\partial \beta} = 0 \;\longrightarrow\; \beta = \sum_{i=1}^{N} \alpha_i\, h(x_i)^T = H^T \alpha \tag{9}$$

$$\frac{\partial L_{D_{ELM}}}{\partial \xi_i} = 0 \;\longrightarrow\; \alpha_i = C\,\xi_i, \quad i = 1, \ldots, N \tag{10}$$

$$\frac{\partial L_{D_{ELM}}}{\partial \alpha_i} = 0 \;\longrightarrow\; h(x_i)\beta - t_i + \xi_i = 0, \quad i = 1, \ldots, N \tag{11}$$
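For completeness, substituting (9) and (10) into (11) gives the closed-form solution that reappears later in Eq. (15); this standard step [25] is worth spelling out:

$$h(x_i)\,H^T\alpha + \frac{\alpha_i}{C} - t_i = 0, \quad i = 1, \ldots, N \;\Longleftrightarrow\; \left(\frac{I}{C} + HH^T\right)\alpha = T,$$
$$\text{hence} \quad \beta = H^T\alpha = H^T\left(\frac{I}{C} + HH^T\right)^{-1} T.$$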
where $\alpha = [\alpha_1, \ldots, \alpha_N]$. For a multiclass classifier, ELM requires multiple output nodes instead of a single output node [25]: an $m$-class classifier has $m$ output nodes. If the original class label is $p$, the expected output vector of the $m$ output nodes is $t_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T$; in this case, only the $p$th element of $t_i = [t_{i,1}, \ldots, t_{i,m}]^T$ is one, while the rest of the elements are set to zero. Under these conditions, the classification problem can be formulated as [25]

$$\text{Minimize:} \quad L_{P_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \|\xi_i\|^2$$
$$\text{Subject to:} \quad h(x_i)\beta = t_i^T - \xi_i^T, \quad i = 1, \ldots, N \tag{12}$$
where $\xi_i = [\xi_{i,1}, \ldots, \xi_{i,m}]^T$ is the training error vector of the $m$ output nodes with respect to the training sample $x_i$. Similar to the above, training ELM is equivalent to solving the following dual optimization problem:

$$L_{D_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\|\xi_i\|^2 - \sum_{i=1}^{N}\sum_{j=1}^{m}\alpha_{i,j}\,\big(h(x_i)\beta_j - t_{i,j} + \xi_{i,j}\big) \tag{13}$$

where $\beta_j$ is the vector of the weights linking the hidden layer to the $j$th output node and $\beta = [\beta_1, \ldots, \beta_m]$.

3. Problem definition

In this section, we first introduce the definition of classification on uncertain data, and then summarize the symbols used in this paper in Table 1.

Uncertain data model: Let us assume that an uncertain data set consists of uncertain objects $x_1, x_2, \ldots, x_i, \ldots, x_n$. Each object is described by a pdf; $x_i$ denotes the $i$th object in the data set, and the probability density function of $x_i$ is denoted by $f(x_i)$. Meanwhile, $x_i^j$ is the $j$th instance of object $x_i$. Let $S(x_i)$ denote the set of all instances of $x_i$; $f(x_i^j) > 0$ for any instance $x_i^j$ in $S(x_i)$, and $\int_{x \in S(x_i)} f(x_i)\,dx = 1$.

As an example, $x_a$, $x_b$ and $x_c$ are three uncertain objects, each described by a continuous probability density function. Fig. 2 maps some uncertain objects into a 2D coordinate system; for simplicity, we only consider two attributes in the example in Fig. 2. Let the shaded area and the white area represent two classes, and the range of an uncertain object is classified into the corresponding category. When the whole range of an uncertain object falls into one class, the object belongs to that class, as for $x_a$ and $x_b$. However, when the instances of an uncertain object are classified into different classes, we must consider the probability of each instance, as for $x_c$. For an uncertain object, we need to consider all possible classes to which the object may belong, and the class with maximum probability is chosen.

3.1. Binary classification of ELM

For an uncertain object $x_i$, each sample of $x_i$ is an instance; all the instances in the set $S(x_i)$ are learned using ELM to obtain binary classes.

Definition 1 (Classification over uncertain data). $x_i^j$ denotes any one instance of uncertain object $x_i$, and $f(x_i^j)$ is the probability of $x_i^j$. The class set consists of two classes $c_m$ and $c_n$. If $\sum_{x_i^j \in c_m} f(x_i^j) \ge 0.5$, then $x_i$ is in class $c_m$; otherwise, $x_i$ belongs to class $c_n$.

For the binary classification case, ELM needs only one output node ($m = 1$), and the decision function of the ELM classifier is

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{L}\beta_i\,G(a_i, b_i, x)\right) = \operatorname{sign}(\beta \cdot h(x)) \tag{14}$$

For an uncertain object, its instances may be classified into different classes, as in object $x_c$. Let $c_s$ and $c_w$ be the class labels of the shaded area and the white area respectively. In Fig. 2, all the instances of $x_a$ are in class $c_w$, so $x_a$ belongs to class $c_w$. However, some instances of $x_c$ are in class $c_s$, while the others are in class $c_w$; the class membership is determined by the class that has the higher probability. To simplify this computation, we construct a function as follows:

$$f(x_i) = \begin{cases} 1, & \displaystyle\sum_{x_i^j \in c_1} f(x_i^j)\,\operatorname{sign}(\beta \cdot h(x_i)) \ge 0.5 \\[2ex] 0, & \displaystyle\sum_{x_i^j \in c_0} f(x_i^j)\,\operatorname{sign}(\beta \cdot h(x_i)) < 0.5 \end{cases}$$
3.2. Multiclass classification of ELM

The decision function of multiclass classification of ELM is as follows:

$$f(x) = \operatorname{sign}\left(h(x)\,H^T\left(\frac{I}{C} + HH^T\right)^{-1} T\right) \tag{15}$$

where the expected output matrix is $T = [t_1, \ldots, t_N]^T$.

For multiclass cases, $S(class)$ denotes the set of classes, and $|S(class)|$ is the number of classes and of output nodes of the hidden layer. The predicted class label of a given testing sample is the index of the output node that has the highest output value for that sample. Let $f_k(x_i^j)$ denote the output function of the $k$th ($1 \le k \le |S(class)|$) output node, i.e. $f(x_i^j) = [f_1(x_i^j), \ldots, f_{|S(class)|}(x_i^j)]$; then the predicted class label of sample $x_i^j$ is

$$label(x_i^j) = \arg\max_{\tau \in \{1, \ldots, |S(class)|\}} f_\tau(x_i^j)$$
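To make the pipeline concrete, here is a minimal ELM sketch in Python (ours, not the authors' code), assuming a sigmoid additive hidden layer and one-hot targets; it trains the output weights with the regularized closed-form solution behind Eq. (15) and predicts labels with the arg-max rule above.

```python
import numpy as np

class SimpleELM:
    """Minimal regularized ELM for multiclass classification (a sketch)."""

    def __init__(self, n_hidden=100, C=1.0, rng=None):
        self.L, self.C = n_hidden, C
        self.rng = rng or np.random.default_rng(0)

    def _h(self, X):
        # Random additive nodes with sigmoid activation, Eq. (2).
        return 1.0 / (1.0 + np.exp(-(X @ self.A.T + self.b)))

    def fit(self, X, y, n_classes):
        N, d = X.shape
        self.A = self.rng.normal(size=(self.L, d))   # random a_i
        self.b = self.rng.normal(size=self.L)        # random b_i
        T = np.eye(n_classes)[y]                     # one-hot targets t_i
        H = self._h(X)
        # beta = H^T (I/C + H H^T)^(-1) T, cf. Eq. (15); solve, don't invert.
        self.beta = H.T @ np.linalg.solve(np.eye(N) / self.C + H @ H.T, T)
        return self

    def predict(self, X):
        # Predicted label = index of the output node with the largest value.
        return np.argmax(self._h(X) @ self.beta, axis=1)
```

When $N$ is large, the equivalent form $\beta = (I/C + H^T H)^{-1} H^T T$ is usually preferable, since it solves an $L \times L$ system instead of an $N \times N$ one.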
Due to the nature of classification over uncertain data [30], Definition 1 cannot be applied to uncertain multiclass classification: if an uncertain object is classified into three classes and its probability in each of the three classes is smaller than 0.5, Definition 1 cannot determine the result. For this case, Definition 1 is modified as follows:

Definition 2 (Probability of a classification over instance). Let $x_i^j$ denote the $j$th instance of uncertain object $x_i$; $f(x_i^j)$ is the probability associated with $x_i^j$ based on the pdf $f(x_i)$ of $x_i$. Given a number of classes $|S(class)|$, ELM can classify the instance $x_i^j$ into one of the $|S(class)|$ classes. If the instance $x_i^j$ belongs to class $c_l$ ($1 \le l \le |S(class)|$) and $p^{c_l}(x_i^j)$ is the probability of instance $x_i^j$ belonging to class $c_l$, then $p^{c_l}(x_i^j) = f(x_i^j)$.

Table 1. Frequently used symbols and notations.

Symbol | Interpretation
$U$ | Uncertain object set
$x_i$ | $i$th uncertain object
$x_i^j$ | Any one instance of $x_i$
$S(x_i)$ | Set of all instances of $x_i$
$f(x_i^j)$ | Probability of instance $x_i^j$
$S(class)$ | Set of all classes
$c_l$ | A class label
$p^{c_l}(x_i^j)$ | Probability that instance $x_i^j$ belongs to class $c_l$
$p^{c_l}(x_i)$ | Probability that $x_i$ belongs to class $c_l$
$SS(x_i)$ | Set of cells that are covered by object $x_i$
$g(m,n)$ | A cell label
$S_i(c)$ | Set of all c-cells of $x_i$
$S_i(p)$ | Set of all p-cells of $x_i$
$s$ | Area of any one cell
$p^{c_l}_{g(m,n)}$ | Probability that $g(m,n)$ belongs to $c_l$
$\overline{p}^{c_l}_{g(m,n)}$ | Upper bound of the probability that instances in $g(m,n)$ belong to $c_l$
$\underline{p}^{c_l}_{g(m,n)}$ | Lower bound of the probability that instances in $g(m,n)$ belong to $c_l$
$\overline{f}(g(m,n))$ | Upper bound of the probability of instances in cell $g(m,n)$
$\underline{f}(g(m,n))$ | Lower bound of the probability of instances in cell $g(m,n)$

Definition 3 (Probability of classification over uncertain object). Assume that $x_i$ is an uncertain object described by a continuous probability density function $f(x_i)$. Let $S(x_i)$ be the set of instances of $x_i$, $|S(x_i)|$ the number of instances of $x_i$, and $x_i^j$ the $j$th instance of $x_i$. $S(class)$ is the set of classes and $|S(class)|$ is the number of classes. $p^{c_l}(x_i)$ is the probability of object $x_i$ belonging to class $c_l$ ($1 \le l \le |S(class)|$); it is equal to the sum of the probabilities of the instances that belong to class $c_l$:

$$p^{c_l}(x_i) = \sum_{x_i^j \in c_l} f(x_i^j)$$

Definition 4 (Classification of uncertain data). Let $S(class)$ be the set of classes and $|S(class)|$ the number of classes; $c_l$ is a class label ($1 \le l \le |S(class)|$). $x_i$ is an uncertain object described by a continuous probability density function $f(x_i)$, and $p^{c_l}(x_i)$ is the probability of $x_i$ belonging to class $c_l$. If $p^{c_l}(x_i) \ge p^{c_j}(x_i)$ for all $j$ ($1 \le j \le |S(class)|$, $j \ne l$), then the uncertain object $x_i$ belongs to class $c_l$.

For example, in Fig. 2, let $c_s$ and $c_w$ be the class labels of the shaded area and the white area respectively. According to Definitions 3 and 4, $p^{c_s}(x_c) = \sum_{x_c^j \in c_s} f(x_c^j) = 0.3$ and $p^{c_w}(x_c) = \sum_{x_c^j \in c_w} f(x_c^j) = 0.7$, so $x_c$ belongs to class $c_w$.

Fig. 2. Example of classification over uncertain data.

Property 1. If the probability of $x_i$ belonging to $c_l$ is not smaller than 0.5, i.e. $p^{c_l}(x_i) \ge 0.5$, then the object $x_i$ belongs to class $c_l$.

According to Property 1, in some cases we can quickly determine which class an object belongs to, and unnecessary calculations are then avoided.

Problem: Given an uncertain data set containing training data and testing data, where each uncertain object $x_i$ is described by a continuous probability density function $f(x_i)$, output the class label of each uncertain data object.
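As an illustration of Definitions 3 and 4 (our sketch, not the paper's code), the naive classifier scores every instance with a trained ELM, accumulates the instance probabilities per class, and picks the arg-max class:

```python
import numpy as np

def classify_uncertain_object(instances, probs, elm, n_classes):
    """Naive classification of one uncertain object (Definitions 3 and 4).

    instances: (n, d) instances of the object; probs: (n,) probabilities
    f(x_i^j) summing to 1; elm: a trained classifier with a predict()
    method returning one label per instance (e.g. SimpleELM above).
    """
    labels = elm.predict(instances)           # class of each instance
    p_class = np.zeros(n_classes)
    np.add.at(p_class, labels, probs)         # p^{c_l}(x_i) = sum of f(x_i^j)
    return int(np.argmax(p_class)), p_class
```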
4. Classification algorithm

Although uncertain data can be classified in this way, the approach is naive. In this section, based on the definition of classification over uncertain data, we propose two methods, suitable for uniformly distributed and non-uniformly distributed uncertain data objects respectively.
4.1. Uniformly distributed

First, we propose a grid-based method, named UU-ELM, to classify uncertain data that are uniformly distributed. In this section, we describe the UU-ELM algorithm for classification over uncertain data. In order to present the pruning approach for uncertain data, we first review the structure of the grid. For simplicity, and for lack of space, we only discuss the case where uncertain objects lie in a 2-dimensional space.

Given an uncertain database $U$, we divide its domain by a grid $G$, which can be viewed as a 2-dimensional array of cells; each cell is square, and the area of any one cell is $s$. Let $g(m,n)$ be a cell in $G$, as shown in Fig. 3. Let $SS(x_i)$ be the set of cells that are covered by object $x_i$, and let $area(x_i)$ denote the area of $x_i$. The cells covered by $x_i$ are divided into two categories: completely covered cells and partially covered cells, denoted by c-cells and p-cells respectively. Let $S_i(c)$ denote the set of all c-cells and $S_i(p)$ the set of all p-cells of $x_i$; then $S_i(c) \cup S_i(p) = SS(x_i)$ and $S_i(c) \cap S_i(p) = \emptyset$. All cells are classified by ELM, and the results are available. Let $p^{c_l}_{g(m,n)}$ denote the probability of cell $g(m,n)$ belonging to class $c_l$.

Property 2. Let $|c|_{cell}$ denote the number of c-cells that are covered by object $x_i$, and $S_i(c)$ the set of all c-cells of $x_i$. The probability of all c-cells of $x_i$ belonging to class $c_l$ is

$$p^{c_l}_{cell}(x_i) = \frac{\sum_{g(m,n) \in S_i(c)} p^{c_l}_{g(m,n)}}{|c|_{cell}} \cdot s \cdot |c|_{cell} = \sum_{g(m,n) \in S_i(c)} p^{c_l}_{g(m,n)} \cdot s$$
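A direct reading of Property 2 in code (ours; cell bookkeeping simplified): for a uniformly distributed object, each c-cell's class probabilities are simply weighted by the cell area.

```python
import numpy as np

def c_cell_probability(p_cells, s):
    """Property 2: class-probability mass contributed by the c-cells.

    p_cells: (k, n_classes) array, one row per completely covered cell,
             holding p^{c_l}_{g(m,n)}; s: area of a single cell.
    Returns an (n_classes,) vector p^{c_l}_cell(x_i).
    """
    return p_cells.sum(axis=0) * s
```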
If $p^{c_l}_{cell}(x_i) \ge \frac{1}{2}\,area(x_i)$, then based on Property 1 object $x_i$ belongs to class $c_l$. If we cannot judge the class label of object $x_i$ based on Property 2, we need to consider the p-cells of $x_i$.

Fig. 3. Uncertain objects on grid.

Partially covered cells are divided into four categories, as shown in Fig. 4. In Fig. 4(a), cell $g(m,n)$ is a partially covered cell of object $x_i$; the boundary of $x_i$ intersects cell $g(m,n)$ at points $a$, $b$ and $c$. We define a minimum bounding rectangle (MBR) outside which the object has zero (or negligible) probability of occurrence, denoted by $s(a,b,c,d)$, i.e. $s(a,b,c) \le s(x_i \cap g(m,n)) \le s(a,b,c,d)$. Let $s_{min}$ and $s_{max}$ denote the minimum and maximum values of $s(x_i \cap g(m,n))$ respectively. In Fig. 4(b), the arc is tangent to the grid at points $e$ and $f$, which bounds the area by $s(a,b,e,f)$, so $s_{min} = s(a,b,c,d)$ and $s_{max} = s(a,b,e,f)$. By the same principle, in Fig. 4(c), $s_{min} = s(a,b,c,d)$ and $s_{max} = s(a,b,e,f)$; in Fig. 4(d), $s_{min} = s(a,b,c,d,e)$ and $s_{max} = s(a,b,f,g,e)$.

Fig. 4. Intersection of object $x_i$ and $g(m,n)$: (a) case 1, (b) case 2, (c) case 3 and (d) case 4.

Property 3. Let $s_{min}$ and $s_{max}$ denote the minimum and maximum values of the intersection area of object $x_i$ and $g(m,n)$, where $g(m,n)$ is a p-cell of object $x_i$ whose area is denoted by $s$, and let $p^{c_l}_{g(m,n)}$ denote the probability of cell $g(m,n)$ belonging to class $c_l$. $\overline{p}^{c_l}_{s_{min}}$ and $\underline{p}^{c_l}_{s_{min}}$ denote the upper and lower bounds of the probability that the instances in the range $s_{min}$ belong to class $c_l$, respectively. If

$$\frac{s_{min}}{s} \ge p^{c_l}_{g(m,n)}$$

then

$$\overline{p}^{c_l}_{s_{min}} = p^{c_l}_{g(m,n)}$$

and to determine the lower bound of $p^{c_l}_{s_{min}}$ we consider the relationship between $\frac{s_{min}}{s}$ and $p^{c_l}_{g(m,n)}$: if

$$\frac{s_{min}}{s} \ge 1 - p^{c_l}_{g(m,n)}$$

then

$$\underline{p}^{c_l}_{s_{min}} = \frac{s_{min}}{s} - \left(1 - p^{c_l}_{g(m,n)}\right) = \frac{s_{min}}{s} - 1 + p^{c_l}_{g(m,n)}$$

otherwise

$$\underline{p}^{c_l}_{s_{min}} = 0$$

If

$$\frac{s_{min}}{s} < p^{c_l}_{g(m,n)}$$

then

$$\overline{p}^{c_l}_{s_{min}} = \frac{s_{min}}{s}$$

and the lower bound of $p^{c_l}_{s_{min}}$ is again based on the relationship between $\frac{s_{min}}{s}$ and $p^{c_l}_{g(m,n)}$: if

$$\frac{s_{min}}{s} \ge 1 - p^{c_l}_{g(m,n)}$$

then

$$\underline{p}^{c_l}_{s_{min}} = \frac{s_{min}}{s} - 1 + p^{c_l}_{g(m,n)}$$
otherwise

$$\underline{p}^{c_l}_{s_{min}} = 0$$

The upper and lower bounds of $p^{c_l}_{s_{max}}$ are obtained by the same method as for $s_{min}$; the details are omitted here.

Fig. 5. Non-uniform distribution of an uncertain object: (a) MBR of objects and (b) grids of object.

Table 2. Example of the probability bounds of $x_i$.

Class | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$
Upper bound | 0.28 | 0.5 | 0.26 | 0.2 | 0.21
Lower bound | 0.1 | 0.3 | 0.25 | 0.05 | 0.15
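The case analysis of Property 3 condenses into a few lines; this sketch (ours) returns the bounds for one p-cell and one class, given an intersection-area estimate ($s_{min}$ or $s_{max}$):

```python
def p_cell_bounds(s_est, s, p_cell):
    """Property 3: bounds on the class-probability mass inside a p-cell.

    s_est: estimated intersection area (s_min or s_max); s: cell area;
    p_cell: p^{c_l}_{g(m,n)}, probability that the cell belongs to class c_l.
    Returns (upper, lower) bounds.
    """
    ratio = s_est / s
    upper = p_cell if ratio >= p_cell else ratio     # min(ratio, p_cell)
    lower = max(ratio - (1.0 - p_cell), 0.0)         # mass that must fall in c_l
    return upper, lower
```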
Property 4. Let $p^{c_l}_{cell}$ denote the probability of all c-cells of $x_i$ belonging to class $c_l$, and let $\overline{p}^{c_l}(x_i)$ and $\underline{p}^{c_l}(x_i)$ be the upper and lower bounds of $p^{c_l}(x_i)$ respectively; then

$$\overline{p}^{c_l}(x_i) = p^{c_l}_{cell} + \sum_{g(m,n) \in S_i(p)} \overline{p}^{c_l}_{s_{max}}, \qquad \underline{p}^{c_l}(x_i) = p^{c_l}_{cell} + \sum_{g(m,n) \in S_i(p)} \underline{p}^{c_l}_{s_{min}}$$

Table 3. Example of the list L.

L | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$
Probability | $\underline{p}^{c_2}(x_i)$ | $\underline{p}^{c_3}(x_i)$ | $\underline{p}^{c_5}(x_i)$ | $\underline{p}^{c_1}(x_i)$ | $\underline{p}^{c_4}(x_i)$
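Combining Properties 2-4, the per-object bounds can be assembled as follows (our sketch, reusing the helpers from the earlier sketches; we scale the p-cell bounds by the cell area to keep the same units as Property 2, a normalization the text leaves implicit):

```python
import numpy as np

def object_bounds(p_c_cells, p_cells_partial, areas_min, areas_max, s):
    """Property 4: upper/lower bounds of p^{c_l}(x_i) for every class.

    p_c_cells: (k_c, n_classes) probabilities of completely covered cells;
    p_cells_partial: (k_p, n_classes) probabilities of partially covered cells;
    areas_min, areas_max: (k_p,) intersection-area bounds per p-cell;
    s: area of one cell. Returns (upper, lower), each (n_classes,).
    """
    base = c_cell_probability(p_c_cells, s)       # Property 2
    upper, lower = base.copy(), base.copy()
    for p_row, a_min, a_max in zip(p_cells_partial, areas_min, areas_max):
        for l, p in enumerate(p_row):
            up, _ = p_cell_bounds(a_max, s, p)    # Property 3 with s_max
            _, lo = p_cell_bounds(a_min, s, p)    # Property 3 with s_min
            upper[l] += up * s                    # area-weighted, as in Property 2
            lower[l] += lo * s
    return upper, lower
```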
4.2. Non-uniformly distributed

In this section, we introduce the NU-ELM algorithm (classification over uncertain data that are non-uniformly distributed, based on a grid). For any uncertain object, we define a minimum bounding rectangle (MBR), as shown in Fig. 5(a), and divide the domain of the MBR by a grid $G(MBR)$, which can be viewed as a 2-dimensional array of cells; the area of any one cell is $s$, and $g(m,n)$ denotes a cell in $G(MBR)$. Let $\overline{f}(g(m,n))$ and $\underline{f}(g(m,n))$ denote the upper and lower bounds of the probability of the instances in cell $g(m,n)$. Let $\max f(g(m,n), g(i,j))$ and $\min f(g(m,n), g(i,j))$ denote the maximum and minimum of $f(g(m,n))$ and $f(g(i,j))$ respectively, i.e. if $f(g(m,n)) \ge f(g(i,j))$, then $\max f(g(m,n), g(i,j)) = f(g(m,n))$ and $\min f(g(m,n), g(i,j)) = f(g(i,j))$; otherwise $\max f(g(m,n), g(i,j)) = f(g(i,j))$ and $\min f(g(m,n), g(i,j)) = f(g(m,n))$.

For each cell $g(m,n)$ we know $\overline{f}(g(m,n))$, $\underline{f}(g(m,n))$ and the area of the cell. We optimize the grid according to the following two strategies (a code sketch is given below):

(1) If $\overline{f}(g(m,n)) \cdot s - \underline{f}(g(m,n)) \cdot s > \gamma$, we divide the cell $g(m,n)$ into two cells.

(2) If $\max f(g(m,n), g(i,j)) \cdot [s(g(m,n)) + s(g(i,j))] - \min f(g(m,n), g(i,j)) \cdot [s(g(m,n)) + s(g(i,j))] < \gamma$, the cells $g(m,n)$ and $g(i,j)$ are merged into one cell.

According to these two strategies, Fig. 5(a) is optimized into Fig. 5(b). The set of cells of $x_i$ is denoted by $SS(x_i)$.

Definition 5. Let $\overline{p}^{c_l}_{g(m,n)}$ and $\underline{p}^{c_l}_{g(m,n)}$ denote the upper and lower bounds of the probability of the instances in $g(m,n)$ belonging to class $c_l$, and let $\overline{p}^{c_l}(x_i)$ and $\underline{p}^{c_l}(x_i)$ denote the upper and lower bounds of the probability of object $x_i$ belonging to class $c_l$; then

$$\overline{p}^{c_l}(x_i) = \sum_{g(m,n) \in SS(x_i)} \overline{p}^{c_l}_{g(m,n)}, \qquad \underline{p}^{c_l}(x_i) = \sum_{g(m,n) \in SS(x_i)} \underline{p}^{c_l}_{g(m,n)}$$

Table 4. Uncertain data sets.

Data set | Number of instances | Number of classes
Magic04 | 300 | 2
Waveform | 300 | 3
Pendigits | 300 | 10
Letter | 300 | 26
Pageblocks | 300 | 5
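The split/merge refinement can be sketched as follows (ours; a 1-D cell list for brevity, `split_cell` is a hypothetical helper that halves a cell and re-estimates its bounds, and we read the merged cell's bounds as the max of the upper and min of the lower bounds, since the extraction lost the over/underlines in strategy (2)):

```python
def refine_grid(cells, gamma, split_cell):
    """One refinement pass over the NU-ELM grid (a sketch).

    cells: list of dicts {'s': area, 'f_hi': upper bound, 'f_lo': lower bound};
    gamma: resolution threshold; split_cell: callable cell -> (cell, cell).
    """
    out, i = [], 0
    while i < len(cells):
        c = cells[i]
        if c['f_hi'] * c['s'] - c['f_lo'] * c['s'] > gamma:   # strategy (1): split
            out.extend(split_cell(c))
            i += 1
            continue
        if i + 1 < len(cells):                                # strategy (2): try merge
            d = cells[i + 1]
            span = c['s'] + d['s']
            hi, lo = max(c['f_hi'], d['f_hi']), min(c['f_lo'], d['f_lo'])
            if hi * span - lo * span < gamma:
                out.append({'s': span, 'f_hi': hi, 'f_lo': lo})
                i += 2
                continue
        out.append(c)
        i += 1
    return out
```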
4.3. Bounds of probability

Let $S(class)$ be the set of classes, $|S(class)|$ the number of classes, and $c_l$ a class label ($1 \le l \le |S(class)|$). According to Property 4 and Definition 5, we can obtain the bounds of the probability of $x_i$ belonging to each class.

Theorem 1. Let $L = t_1, t_2, \ldots, t_m$ ($1 \le m \le |S(class)|$) be the list of the lower bounds of the probabilities of $x_i$ belonging to each class, in descending order. Let $\overline{p}(t_j)$ denote the upper bound of the probability at location $t_j$, and $c(t_1)$ the class at location $t_1$. If $\overline{p}(t_j) < \underline{p}(t_1)$ for $j = 2$ to $j = |S(class)|$, then $x_i$ belongs to class $c(t_1)$.

Example: Consider five classes in $S(class)$; the bounds of the probability of $x_i$ belonging to each class are shown in Table 2, and the resulting list $L$ is shown in Table 3. According to Theorem 1, for $j = 2$ to $j = 5$ we have $\overline{p}(t_j) < \underline{p}(t_1)$ (each remaining upper bound 0.28, 0.26, 0.21 and 0.2 is smaller than the lower bound 0.3 at $t_1$), so we can determine that object $x_i$ belongs to class $c(t_1)$, i.e. class $c_2$.
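Theorem 1 gives a cheap dominance test; a sketch (ours):

```python
def prune_by_bounds(upper, lower):
    """Theorem 1: decide the class from probability bounds when possible.

    upper, lower: per-class upper/lower bounds of p^{c_l}(x_i).
    Returns the winning class index, or None if the bounds are
    inconclusive and the instances must be processed exactly.
    """
    order = sorted(range(len(lower)), key=lambda l: lower[l], reverse=True)
    t1 = order[0]
    if all(upper[t] < lower[t1] for t in order[1:]):
        return t1
    return None
```

On the bounds of Table 2 this returns class $c_2$ without processing a single instance.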
Table 5. Training time (s) comparison of SVM, DEC, UU-ELM and NU-ELM.

Data sets | SVM | DEC | UU-ELM sig | UU-ELM hardlim | UU-ELM sin | NU-ELM sig | NU-ELM hardlim | NU-ELM sin
Magic04 | 0.988 | 1.230 | 0.1310 | 0.1716 | 0.7662 | – | – | –
Waveform | 0.848 | 1.045 | 0.0624 | 0.0624 | 0.1256 | 0.0312 | 0.0312 | 0.0312
Pendigits | 1.268 | 0.983 | 0.0942 | 0.0942 | 0.1584 | 0.0468 | 0.7509 | 0.0312
Letter | 1.289 | 0.105 | 0.1032 | 0.1005 | 0.1382 | 0.0453 | 0.5528 | 0.0322
Pageblocks | 1.118 | 0.979 | 0.6942 | 0.5942 | 0.892 | 0.0468 | 0.7509 | 0.0312
Table 6. Testing time (s) comparison of SVM, DEC, UU-ELM and NU-ELM.

Data sets | SVM | DEC | UU-ELM sig | UU-ELM hardlim | UU-ELM sin | NU-ELM sig | NU-ELM hardlim | NU-ELM sin
Magic04 | 1.5231 | 1.9561 | 1.3709 | 1.7129 | 1.4135 | – | – | –
Waveform | 0.7256 | 0.9685 | 0.2808 | 0.4680 | 0.4251 | 0.1408 | 0.2028 | 0.2184
Pendigits | 1.6312 | 2.3262 | 0.6597 | 0.7843 | 0.7786 | 0.5408 | 0.7509 | 0.4160
Letter | 1.389 | 1.1232 | 0.7032 | 0.6986 | 0.6856 | 0.6805 | 0.5123 | 0.5987
Pageblocks | 1.002 | 1.989 | 0.5645 | 0.687 | 0.596 | 0.4689 | 0.5961 | 0.426
5. Performance verification

This section compares the performance of several algorithms (Support Vector Machine (SVM) [9], Dynamic Classifier Ensemble (DEC) [35], UU-ELM and NU-ELM) on real-world benchmark multiclass classification data sets. All the evaluations are carried out on Windows 7 in MATLAB, running on an Intel Core(TM) i3 3.3 GHz machine with 4 GB RAM.

5.1. Data set and experimental setup

Because no real uncertain data set is available, the data sets used in the experiments are synthesized from real data sets: the values of each object of a certain data set are fitted to uncertain data conforming to a Gaussian distribution. All source data sets were taken from the UCI Machine Learning Repository. Table 4 describes the data sets in detail.

Magic04: 3000 uncertain objects are synthesized; each object consists of 300 instances and is described by a pdf.
Waveform: 1200 uncertain objects are synthesized; each object consists of 300 instances and is described by a pdf.
Pendigits: 2000 uncertain objects are synthesized; each object consists of 300 instances and is described by a pdf.
Letter: 5000 uncertain objects are synthesized; each object consists of 300 instances and is described by a pdf.
Pageblocks: 1300 uncertain tuples are synthesized; each object consists of 300 instances and is described by a pdf.
For comparison, we implemented SVM [9] and DEC [35]. DEC is a recently proposed ensemble classifier for uncertain data classification. In order to show the performance of the ELM classifier, three different activation functions (sig, hardlim and sin) are used to execute the algorithms in the experiments.

5.2. Efficiency evaluation

First, we evaluate the efficiency of the methods proposed in this paper. Table 5 shows the training times of SVM, DEC, UU-ELM and NU-ELM; we can see that the training time of UU-ELM and NU-ELM is shorter than that of SVM and DEC. Table 6 shows the testing times of SVM, DEC, UU-ELM and NU-ELM; as we expect, UU-ELM and NU-ELM are faster than DEC.
5.3. Accuracy evaluation

Table 7 shows the accuracy rates of SVM, DEC, UU-ELM and NU-ELM on the five data sets. We adopt the traditional learning principle in the training phase. We compare the accuracy of SVM, DEC, UU-ELM and NU-ELM as shown in Table 7. It can be seen that ELM always achieves performance comparable to SVM and DEC. As seen from Table 7, different output functions of ELM can be used on different data sets in order to obtain similar accuracy on data sets of different sizes, although any output function can be used on all types of data sets. We can see that UU-ELM and DEC perform as well as SVM.

Table 7. Accuracy (%) comparison of SVM, DEC, UU-ELM and NU-ELM.

Data sets | SVM | DEC | UU-ELM sig | UU-ELM hardlim | UU-ELM sin | NU-ELM sig | NU-ELM hardlim | NU-ELM sin
Magic04 | 75.23 | 64.37 | 74.65 | 76.17 | 73.62 | – | – | –
Waveform | 89.63 | 62.15 | 86.69 | 92.49 | 88.93 | 56.37 | 52.96 | 56.33
Pendigits | 80.58 | 53.24 | 81.65 | 88.32 | 85.96 | 48.33 | 50.67 | 46.67
Letter | 81.65 | 82.63 | 80.73 | 83.59 | 82.61 | 67.59 | 69.53 | 68.28
Pageblocks | 76.32 | 82.12 | 79.59 | 80.57 | 82.62 | 60.21 | 62.36 | 62.17
6. Conclusions

In this paper, we studied the problem of classification over uncertain data based on ELM. We first proposed the UU-ELM algorithm, which is suitable for uniformly distributed data and uses a grid for pruning. Furthermore, we proposed the NU-ELM algorithm to classify uncertain data that follow non-uniform distributions; in NU-ELM, we divide the grid based on the MBR of each uncertain object and estimate probability bounds to reduce computation. Finally, experiments were conducted on real data, which showed that the algorithms proposed in this paper are efficient and able to deal with uncertain data classification in a real-time fashion. In our future work, we plan to consider probabilistic data with multiple dimensions and exclusive conditions.

References

[1] J.A. Faradjian, P. Bonnett, GADT: a probability space ADT for representing and querying the physical world, in: ICDE, 2002, pp. 201–211.
[2] C.C. Aggarwal, On density based transforms for uncertain data mining, in: Data Engineering, 2007, pp. 866–875.
[3] A. Ahmadian, A. Mostafa, An efficient texture classification algorithm using Gabor wavelet, Eng. Med. Biol. Soc. 1 (2013) 930–933.
[4] J. Awaka, T. Iguchi, K. Okamoto, Rain type classification algorithm, J. Cardiovasc. Electrophysiol. (2007) 213–224.
[5] C.C. Aggarwal, P.S. Yu, A framework for clustering uncertain data streams, in: ICDE, 2008, pp. 150–159.
[6] C.C. Aggarwal, Managing and mining uncertain data, Adv. Database Syst. 35 (2009) 1–41.
[7] C.C. Aggarwal, P.S. Yu, A survey of uncertain data algorithms and applications, IEEE Trans. Knowl. Data Eng. (2009) 609–623.
[8] L. Chen, M.T. Özsu, V. Oria, Robust and fast similarity search for moving object trajectories, in: SIGMOD, 2005, pp. 491–502.
[9] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[10] R. Fletcher, Practical Methods of Optimization, second ed., Wiley, 1987.
[11] P. Gil-Jiménez, S. Lafuente-Arroyo, S. Maldonado-Bascón, H. Gómez-Moreno, Shape classification algorithm using support vector machines for traffic sign recognition, Comput. Intell. Bioinspired Syst. (2005) 873–880.
[12] H. Liu, L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. 17 (4) (2005) 491–502.
[13] G.-B. Huang, Learning capability and storage capacity of two-hidden-layer feedforward networks, IEEE Trans. Neural Netw. 14 (2) (2003) 274–281.
[14] G.-B. Huang, H.A. Babri, Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions, IEEE Trans. Neural Netw. 9 (1) (1998) 224–229.
[15] G.-B. Huang, L. Chen, Convex incremental extreme learning machine, Neurocomputing 70 (2007) 3056–3062.
[16] G.-B. Huang, L. Chen, Enhanced random search based incremental extreme learning machine, Neurocomputing 71 (2008) 3460–3468.
[17] G.-B. Huang, L. Chen, C.-K. Siew, Universal Approximation Using Incremental Feedforward Networks with Arbitrary Input Weights, Technical Report ICIS, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 2003.
[18] G.-B. Huang, L. Chen, C.-K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. 17 (2006) 879–892.
[19] G.-B. Huang, Y.-Q. Chen, H.A. Babri, Classification ability of single hidden layer feedforward neural networks, IEEE Trans. Neural Netw. 11 (3) (2000) 799.
[20] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Trans. Syst. Man Cybern. Part B 34 (6) (2004) 2284–2292.
[21] G.-B. Huang, P. Saratchandran, N. Sundararajan, A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation, IEEE Trans. Neural Netw. 16 (1) (2005) 57–67.
[22] G.-B. Huang, C.-K. Siew, Extreme learning machine with randomly assigned RBF kernels, Int. J. Inf. Technol. 11 (1) (2005) 16–24.
[23] G.-B. Huang, C.-K. Siew, Extreme learning machine: RBF network case, in: ICARCV, vol. 2, Kunming, China, 6–9 December 2004, pp. 1029–1036.
[24] G.-B. Huang, D.H. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2) (2011) 107–122.
[25] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B 42 (2) (2012) 513–529.
[26] G.-B. Huang, Q.-Y. Zhu, K.Z. Mao, C.-K. Siew, P. Saratchandran, N. Sundararajan, Can threshold networks be trained directly? IEEE Trans. Circuits Syst. II 53 (3) (2006) 187–191.
[27] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme Learning Machine, Technical Report ICIS, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, January 2004.
[28] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[29] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: IJCNN, vol. 2, Budapest, Hungary, 25–29 July 2004, pp. 985–990.
[30] K. Cao, G. Wang, D. Han, J. Ning, X. Zang, Classification of uncertain data streams based on extreme learning machine, Cogn. Comput. 7 (1) (2015) 150–160.
[31] T. Korte, W. Jung, C. Wolpert, S. Spehl, B. Schumacher, B. Esmailzadeh, B. Lüderitz, A new classification algorithm for discrimination of ventricular from supraventricular tachycardia in a dual chamber implantable cardioverter defibrillator, J. Cardiovasc. Electrophysiol. 9 (1) (1998) 70–73.
[32] C. Leung, Mining uncertain data, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. (2011) 316–329.
[33] J. Liu, H. Deng, Outlier detection on uncertain data based on local information, Knowl.-Based Syst. 51 (2013) 60–71.
[34] V. Ljosa, A.K. Singh, APLA: indexing arbitrary probability distributions, in: ICDE, 2007, pp. 946–955.
[35] S. Pan, K. Wu, Y. Zhang, X. Li, Classifier ensemble for uncertain data stream classification, Adv. Knowl. Discov. Data Min. (2010) 488–495.
[36] R. Cheng, D.V. Kalashnikov, S. Prabhakar, Querying imprecise data in moving object environments, IEEE Trans. Knowl. Data Eng. 16 (9) (2004) 1112–1127.
[37] E.A. Rundensteiner, A classification algorithm for supporting object-oriented views, Inf. Knowl. Manag. (1994) 18–25.
[38] B. Xu, D. Jiang, J. Li, HSM: a fast packet classification algorithm, Adv. Inf. Netw. Appl. 1 (2005) 987–992.
[39] C. Zhang, M. Gao, A. Zhou, Tracking high quality clusters over uncertain data streams, in: ICDE, 2009, pp. 1641–1648.
Keyan Cao is a lecturer at Shenyang Jianzhu University; she received her Ph.D. degree from Northeastern University in 2014. Her research interests include data mining, uncertain data management and data stream management.
Guoren Wang is a professor and Ph.D. supervisor at Northeastern University, and a senior member of CCF. His research interests include XML data management, query processing and optimization, probabilistic database, and bioinformatics.
Donghong Han is an associate professor at Northeastern University. Her research interests include data stream management, data mining and uncertain data management. She is a member of CCF.
Mei Bai received her B.Sc. and M.Sc. in computer science and technology from Northeastern University in July 2009 and July 2011, respectively. She is currently a Ph.D. candidate in the Department of Computer Science, Northeastern University. Her research interests include sensory data management and uncertain data management.
Shuoru Li is a graduate student at Northeastern University, China. His research interests include data stream classification and machine learning.