QR factorization based Incremental Extreme Learning Machine with growth of hidden nodes


Pattern Recognition Letters 65 (2015) 177–183


Yibin Ye, Yang Qin — Harbin Institute of Technology, Shenzhen Graduate School, Key Lab of Network Oriented Intelligent Computation, China

Article history: Received 26 January 2015; Available online 7 August 2015
Keywords: Extreme Learning Machine (ELM); Incremental learning; QR factorization

Abstract

In this paper, a computationally competitive incremental algorithm based on QR factorization is proposed to automatically determine the number of hidden nodes in generalized single-hidden-layer feedforward networks (SLFNs). The approach, QR factorization based Incremental Extreme Learning Machine (QRI-ELM), adds random hidden nodes to SLFNs one by one. The computational complexity of the approach is analyzed as well. Simulation results verify that the new approach is fast and effective, with good generalization and accuracy performance.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Extreme Learning Machine (ELM) is a fast learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) [10], whose input weights need not be tuned and can be randomly generated, while the output weights are determined analytically by the least-squares method, allowing a significant reduction in training time. ELM has received increasing attention in recent years, and many variants have been developed to tap its potential. B-ELM optimizes the weights of the output layer with Bayesian linear regression [18]. The optimally pruned Extreme Learning Machine (OP-ELM) extends the original ELM algorithm and wraps it within a methodology that prunes neurons [14]. Derived from the original circular backpropagation architecture, C-ELM benefits from the enhancement provided by the circular input without losing any of the fruitful properties that characterize the basic ELM framework [3]. An online sequential learning algorithm based on ELM (OS-ELM) has also been explored by [13]: the parameters of the hidden nodes are randomly selected just as in ELM, but the output weights are updated analytically from the sequentially arriving data rather than from the whole dataset. Based on OS-ELM, a time-varying version has also been developed by [22].

One of the open problems of ELM is how to assign the number of hidden neurons, which is the only parameter that needs to be set by the user, usually by trial and error.




The first hidden-node incremental algorithm for ELM, referred to as Incremental Extreme Learning Machine (I-ELM), randomly adds nodes to the hidden layer one by one and freezes the output weights of the existing hidden nodes when a new hidden node is added [8]. However, because the input weights of the added hidden nodes are generated randomly, some of them may play only a very minor role in the network output and thus may needlessly increase the network complexity. To avoid this issue, an enhanced method for I-ELM, called EI-ELM, has been proposed by [7]: at each learning step several hidden nodes are randomly generated, and the one leading to the least error is selected and added to the network. Later, another ELM-based hidden-node incremental learning algorithm, referred to as Error Minimized Extreme Learning Machine (EM-ELM), was proposed by [4]. It can add random hidden nodes to SLFNs one by one or even group by group, with all the previous output weights updated accordingly at each step. Compared with I-ELM and EI-ELM, which keep the output weights of existing hidden nodes fixed when a new hidden node is added, EM-ELM attains much better generalization performance.

The basic knowledge on ELM is briefly introduced in Section 2. When EM-ELM is implemented directly, it is more computationally expensive than the original ELM; fortunately, we found a way to modify EM-ELM so as to simplify its computation. These points are discussed in detail in Section 3. To further reduce the computational complexity, and hence the training time, we propose another incremental algorithm based on QR factorization, described in Section 4. Although QR factorization has already been used with ELM in some literature [6,12,19], it has not yet been applied to hidden-node incremental ELM [9]. Comparison simulations among ELM, EM-ELM and the proposed algorithm on several datasets are presented in Section 5. Finally, conclusions are drawn in the last section.


2. Extreme Learning Machine

Assume that an SLFN with I input neurons, K hidden neurons, L output neurons and activation function g(·) is trained to learn N distinct samples (X, T), where X = {x_i[n]} ∈ R^{N×I} and T = {t_l[n]} ∈ R^{N×L} are the input matrix and target matrix respectively, x_i[n] denotes the input data at the ith input neuron at the nth time instant, and t_l[n] denotes the desired output at the lth output neuron at the nth time instant. In ELM, the input weights {w_ik} and hidden biases {b_k} are randomly generated, where w_ik is the weight connecting the ith input neuron to the kth hidden neuron, and b_k is the bias of the kth hidden neuron. Further let w_{0k} = b_k and x_0[n] = 1. Hence, the hidden-layer output matrix H = {h_k[n]} ∈ R^{N×K} can be obtained by:

$$h_k[n] = g\Big(\sum_{i=0}^{I} x_i[n]\, w_{ik}\Big) \qquad (1)$$

Let β = {β_kl} ∈ R^{K×L} be the matrix of output weights, where β_kl denotes the weight connecting the kth hidden neuron to the lth output neuron, and let Y = {y_l[n]} ∈ R^{N×L} be the matrix of network output data, with y_l[n] the output at the lth output neuron at the nth time instant. For the linear output neurons we therefore obtain:

$$y_l[n] = \sum_{k=1}^{K} h_k[n]\, \beta_{kl} \qquad (2)$$

or

$$Y = H \cdot \beta \qquad (3)$$

Thus, given the hidden-layer output matrix H and the target matrix T, to minimize ‖Y − T‖², the output weights can be calculated as the minimum-norm least-squares (LS) solution of the linear system:

$$\hat{\beta} = H^{\dagger} \cdot T, \qquad (4)$$

where H† is the Moore–Penrose generalized inverse of the matrix H [10,16]. By computing the output weights analytically, ELM achieves good generalization performance with a fast training phase.

The key step in ELM is to compute H†, which can generally be done using the singular value decomposition (SVD) [5]. Unfortunately, the cost of this method is dominated by the cost of computing the SVD, which is several times higher than that of a matrix–matrix multiplication, even if a state-of-the-art implementation is used. It has been shown that if the N training data are distinct, H has full column rank (its rank equals its number of columns) with probability one when K ≤ N [10]. In real applications, the number of hidden nodes is always less than the number of training data, so H^T H is invertible and H† can be explicitly expressed as (H^T H)^{-1} H^T [16], where H ∈ R^{N×K} and H† ∈ R^{K×N}. Assuming that multiplication of individual elements has complexity O(1),¹ computing H† in ELM involves operations of complexity O(K²N), O(K³) and O(K²N), i.e. O(2K²N + K³) in total.

¹ If A ∈ R^{m×n} and B ∈ R^{n×p}, there are mnp element multiplications in their matrix product AB ∈ R^{m×p}, whose computational complexity is therefore O(mnp).

Remark 2.1. EM-ELM and our proposed QRI-ELM are based on the assumption that H has full column rank, so that H† = (H^T H)^{-1} H^T. However, there exist situations in which H becomes ill-conditioned ("almost" not of full column rank) as K increases, even though K < N still holds. One consequence is that, even if the matrix H^T H is invertible, a computer algorithm may fail to obtain an approximate inverse, and if it does obtain one, the result may be numerically inaccurate [5,17]. Divergence will inevitably occur if EM-ELM or QRI-ELM continues to be applied in this case.

Remark 2.2. When H is well-conditioned, adding more hidden nodes can be viewed as adding more unknowns to the same number of equations (N is unchanged), which helps H · β get closer to T. On the other hand, as K increases, H may become ill-conditioned; in this case, adding more hidden nodes will not reduce the output error E(H_k) = ‖H_k β_k − T‖ any further. This is also verified in the simulation section later.
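To make the batch training procedure of Eqs. (1)–(4) concrete, a minimal NumPy sketch is given below. This is our illustration rather than the authors' code; the sigmoid activation, the uniform weight range and the SinC data only anticipate choices described in Section 5.

```python
import numpy as np

def elm_train(X, T, K, rng):
    """Batch ELM: random input weights/biases, analytic output weights (Eq. (4))."""
    N, I = X.shape
    W = rng.uniform(-1.0, 1.0, size=(I, K))       # input weights w_{ik}
    b = rng.uniform(-1.0, 1.0, size=(1, K))       # hidden biases b_k (w_{0k} with x_0[n] = 1)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden-layer output matrix H, N x K
    beta = np.linalg.pinv(H) @ T                  # beta_hat = H^dagger T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                               # Y = H beta (Eq. (3))

# Illustrative usage on a SinC regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(1000, 1))
T = np.sinc(X / np.pi)                            # sin(x)/x
W, b, beta = elm_train(X, T, K=15, rng=rng)
Y = elm_predict(X, W, b, beta)
print("training RMSE:", np.sqrt(np.mean((Y - T) ** 2)))
```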

3. Error minimized ELM

The Error Minimized ELM, namely EM-ELM, is designed to update H†_{k+1} iteratively from H†_k, instead of recomputing H†_{k+1} from scratch, when one new node is added to the existing k-hidden-node network [4]. Assume h_{k+1} is the new column of H_{k+1} contributed by the (k+1)th neuron, U_{k+1} is the upper part of H†_{k+1}, and D_{k+1} is the lower part of H†_{k+1}.² The key steps in the original paper are:

$$D_{k+1} = \frac{h_{k+1}^{T}\,(I - H_k H_k^{\dagger})}{h_{k+1}^{T}\,(I - H_k H_k^{\dagger})\, h_{k+1}} \qquad (5)$$

$$U_{k+1} = H_k^{\dagger}\,(I - h_{k+1} D_{k+1}) \qquad (6)$$

$$H_{k+1}^{\dagger} = \begin{bmatrix} U_{k+1} \\ D_{k+1} \end{bmatrix} \qquad (7)$$

² They are δh_k, U_k and D_k in [4], respectively.

Although it is claimed that the training time of EM-ELM is less than that of ELM, simple analysis shows that this is not true if the above formulas are used directly. The most computationally expensive step of EM-ELM is then the multiplication of H_k and H†_k, with complexity O(kN²), which is even more than the O(k²N) of ELM (note that k ≪ N is usually the case). However, with a small modification of EM-ELM we can reduce the computational complexity dramatically, to O(5kN + 2N) even when the output weights β are also updated:³

$$D_{k+1} = \frac{h_{k+1}^{T} - h_{k+1}^{T} H_k H_k^{\dagger}}{h_{k+1}^{T} h_{k+1} - h_{k+1}^{T} H_k H_k^{\dagger}\, h_{k+1}} \qquad (8)$$

$$U_{k+1} = H_k^{\dagger} - H_k^{\dagger} h_{k+1} D_{k+1} \qquad (9)$$

$$\beta_{k+1} = \begin{bmatrix} U_{k+1} \\ D_{k+1} \end{bmatrix} T \qquad (10)$$

³ E.g., since h^T_k ∈ R^{1×N}, H_k ∈ R^{N×k} and H†_k ∈ R^{k×N}, the computational complexity of h^T_k H_k H†_k evaluated from left to right is O(2kN), much less than that of h^T_k (H_k H†_k).

We applied this modified implementation of EM-ELM in the simulations reported later.
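To illustrate how Eqs. (8)–(10) avoid forming H_k H†_k explicitly, here is a small NumPy sketch of a single node addition. It is our own illustration under the full-column-rank assumption of Remark 2.1, not code from [4]; h^T_{k+1} H_k is computed first so that the largest intermediate stays of size k × N.

```python
import numpy as np

def em_elm_add_node(Hk, Hk_pinv, h_new, T):
    """Append one hidden node and update H^dagger and beta via Eqs. (8)-(10)."""
    h = h_new.reshape(-1, 1)                                # h_{k+1}, shape (N, 1)
    hT_HkHkp = (h.T @ Hk) @ Hk_pinv                         # h^T H_k H_k^dagger as (h^T H_k) H_k^dagger
    denom = (h.T @ h - hT_HkHkp @ h).item()
    D = (h.T - hT_HkHkp) / denom                            # Eq. (8), shape (1, N)
    U = Hk_pinv - (Hk_pinv @ h) @ D                         # Eq. (9), shape (k, N)
    Hk1_pinv = np.vstack([U, D])                            # H_{k+1}^dagger
    beta = Hk1_pinv @ T                                     # Eq. (10)
    return np.hstack([Hk, h]), Hk1_pinv, beta

# Hypothetical usage: grow a one-node network to two nodes on random data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
T = rng.standard_normal((200, 1))

def random_column():                                        # one random sigmoid hidden node
    return 1.0 / (1.0 + np.exp(-(X @ rng.uniform(-1, 1, (3, 1)) + rng.uniform(-1, 1))))

H1 = random_column()
H2, H2_pinv, beta2 = em_elm_add_node(H1, np.linalg.pinv(H1), random_column(), T)
```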

4. The proposed QR factorization based incremental ELM

Using QR factorization, the hidden-layer output matrix H can be decomposed as H = Q · R, where Q is an orthogonal matrix and R is an upper triangular matrix [20]. Thus the MP inverse of H can be represented as H† = R^{-1} · Q^T. The basic idea here is to calculate R^{-1}_{k+1} from R^{-1}_k as the number of hidden nodes k increases, and then to obtain the output weights β̂_{k+1} from β̂_k and R^{-1}_k, so as to further reduce the computational complexity.

There are several ways to perform QR factorization, such as the Gram–Schmidt process, Householder transformations, or Givens rotations. After careful investigation and examination, we found the Gram–Schmidt method suitable for the development of an incremental algorithm.
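The identity H† = R^{-1} Q^T that this approach builds on can be checked numerically in a few lines. The snippet below is our illustration only; numpy.linalg.qr uses Householder reflections internally rather than Gram–Schmidt, but the resulting factors serve the same purpose.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 8))          # a full-column-rank hidden-layer matrix (assumed)
T = rng.standard_normal((100, 2))

Q, R = np.linalg.qr(H)                     # reduced QR: Q is 100 x 8, R is 8 x 8 upper triangular
beta_qr = np.linalg.solve(R, Q.T @ T)      # beta = R^{-1} Q^T T, without forming R^{-1} explicitly
beta_pinv = np.linalg.pinv(H) @ T          # beta = H^dagger T

print(np.allclose(beta_qr, beta_pinv))     # True when H has full column rank
```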


In linear algebra, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space [11]. An important application of the Gram–Schmidt process to the column vectors of a full-column-rank matrix yields the QR decomposition. Details of the process are listed below.

Let H = [h_1, h_2, ..., h_k], Q = [q_1, q_2, ..., q_k] and

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ & r_{22} & \cdots & r_{2k} \\ & & \ddots & \vdots \\ & & & r_{kk} \end{bmatrix}.$$

Since H = Q · R, we have

$$[h_1, h_2, \ldots, h_k] = [q_1, q_2, \ldots, q_k] \cdot \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ & r_{22} & \cdots & r_{2k} \\ & & \ddots & \vdots \\ & & & r_{kk} \end{bmatrix} \qquad (11)$$

hence,

$$h_1 = q_1 \cdot r_{11} \qquad (12)$$

$$h_2 = q_1 \cdot r_{12} + q_2 \cdot r_{22} \qquad (13)$$

$$\vdots$$

$$h_k = q_1 \cdot r_{1k} + \cdots + q_k \cdot r_{kk} \qquad (14)$$

Note that Q^T Q = I ⇒ q_i^T q_i = 1 and q_i^T q_j = 0 when i ≠ j, so

$$h_1^T h_1 = r_{11}\, q_1^T q_1\, r_{11} = r_{11}^2 \qquad (15)$$

and we have r_{11} = \sqrt{h_1^T h_1} and q_1 = h_1/r_{11}. For k > i ≥ 1,

$$q_i^T h_k = q_i^T q_i \cdot r_{ik} = r_{ik} \qquad (16)$$

Finally, let δh_k = q_k · r_{kk} = h_k − (q_1 · r_{1k} + ··· + q_{k−1} · r_{k−1,k}), so that r_{kk} = \sqrt{δh_k^T δh_k} and q_k = δh_k/r_{kk}.

Assume now that we add the (k+1)th hidden node to the k-node network. After generating its input weights, we will have an additional column h_{k+1} appended to H_k to form H_{k+1} = [H_k  h_{k+1}]. Consequently, the QR factorization of H_{k+1} becomes H_{k+1} = Q_{k+1} R_{k+1}, where Q_{k+1} = [Q_k  q_{k+1}] and

$$R_{k+1} = \begin{bmatrix} R_k & \delta r_{k+1} \\ 0 & r_{k+1,k+1} \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1k} & r_{1,k+1} \\ & r_{22} & \cdots & r_{2k} & r_{2,k+1} \\ & & \ddots & \vdots & \vdots \\ & & & r_{kk} & r_{k,k+1} \\ & & & & r_{k+1,k+1} \end{bmatrix}.$$

According to (16), we have

$$\delta r_{k+1} = Q_k^T h_{k+1} \qquad (17)$$

$$\delta h_{k+1} = q_{k+1} \cdot r_{k+1,k+1} = h_{k+1} - (q_1 \cdot r_{1,k+1} + \cdots + q_k \cdot r_{k,k+1}) = h_{k+1} - Q_k\, \delta r_{k+1} \qquad (18)$$

$$r_{k+1,k+1} = \sqrt{\delta h_{k+1}^T\, \delta h_{k+1}} \qquad (19)$$

$$q_{k+1} = \delta h_{k+1} / r_{k+1,k+1} \qquad (20)$$

$$R_{k+1}^{-1} = \begin{bmatrix} R_k & \delta r_{k+1} \\ 0 & r_{k+1,k+1} \end{bmatrix}^{-1} = \begin{bmatrix} R_k^{-1} & -R_k^{-1}\, \delta r_{k+1}\, r_{k+1,k+1}^{-1} \\ 0 & r_{k+1,k+1}^{-1} \end{bmatrix} \qquad (21)$$

Finally,

$$\hat{\beta}_{k+1} = H_{k+1}^{\dagger} T = R_{k+1}^{-1} Q_{k+1}^{T} T = \begin{bmatrix} R_k^{-1} & -R_k^{-1}\, \delta r_{k+1}\, r_{k+1,k+1}^{-1} \\ 0 & r_{k+1,k+1}^{-1} \end{bmatrix} \cdot \begin{bmatrix} Q_k^T \\ q_{k+1}^T \end{bmatrix} \cdot T = \begin{bmatrix} \hat{\beta}_k - R_k^{-1}\, \delta r_{k+1}\, \beta_{k+1} \\ \beta_{k+1} \end{bmatrix} \qquad (22)$$

where β_{k+1} = q_{k+1}^T T / r_{k+1,k+1}.

Now our proposed QR factorization based incremental algorithm, namely QRI-ELM, can be summarized as follows. Given a set of training data, assume that a single-hidden-layer neural network is to be trained, starting with one hidden node, up to a maximum number of hidden nodes K_max, with expected learning accuracy ε. Note that R_k^{-1}, rather than R_k, is used as the intermediate variable throughout the recursive process; hence P_k = R_k^{-1} is introduced in the procedure. The whole process is demonstrated in Algorithm 1.

Algorithm 1 QRI-ELM.
1: Randomly generate the single hidden node input weight set {ω_{i1}}_{i=0}^{I}
2: Calculate the hidden-layer output matrix H_1 (= h_1)
3: Calculate the inverse of R_1: P_1 (= p_{11} = 1/r_{11}) = (h_1^T h_1)^{-1/2}
4: Calculate Q_1 = q_1 = p_{11} h_1
5: Calculate the output weights β̂_1 (= β_1) = P_1 Q_1^T T
6: while k = 1 to K_max and E(H_k) = ‖H_k β_k − T‖ > ε do
7:    A new hidden node is added, the corresponding input weight set {ω_{i,k+1}}_{i=0}^{I} is generated, and the corresponding h_{k+1} is calculated.
8:    Update the following variables in sequence:

$$\delta r_{k+1} = Q_k^T h_{k+1} \qquad (23)$$

$$\delta h_{k+1} = h_{k+1} - Q_k\, \delta r_{k+1} \qquad (24)$$

$$p_{k+1,k+1} = \frac{1}{r_{k+1,k+1}} = (\delta h_{k+1}^T\, \delta h_{k+1})^{-\frac{1}{2}} \qquad (25)$$

$$q_{k+1} = p_{k+1,k+1}\, \delta h_{k+1} \qquad (26)$$

$$\beta_{k+1} = p_{k+1,k+1}\, q_{k+1}^T T \qquad (27)$$

$$\hat{\beta}_{k+1} = \begin{bmatrix} \hat{\beta}_k - P_k\, \delta r_{k+1}\, \beta_{k+1} \\ \beta_{k+1} \end{bmatrix} \qquad (28)$$

$$P_{k+1} = \begin{bmatrix} P_k & -P_k\, \delta r_{k+1}\, p_{k+1,k+1} \\ 0 & p_{k+1,k+1} \end{bmatrix} \qquad (29)$$

$$Q_{k+1} = \begin{bmatrix} Q_k & q_{k+1} \end{bmatrix} \qquad (30)$$

9:    k ← k + 1
10: end while

Let us examine the computational complexity of the QRI-ELM algorithm. The most computationally expensive step in updating the output weights after adding a new hidden node is (23), with complexity O(kN). The other steps involved in updating β̂_{k+1} require multiplications of complexity O(kN), O(N), O(N), O(NL) and O(k²) from (24) to (28), respectively. Considering that L = 1 and k ≪ N are usually the case, the total complexity of updating β̂_{k+1} in QRI-ELM is less than O((2k + 3)N). For comparison, recall the complexity O((5k + 2)N) of the modified EM-ELM and O(2k²N + k³) of ELM.
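For readers who prefer code to pseudo-code, the following NumPy sketch mirrors Algorithm 1 and the recursion of Eqs. (23)–(30). It is our reconstruction, not the authors' Matlab implementation; the sigmoid activation, the uniform weight range and the SinC usage at the end are illustrative assumptions.

```python
import numpy as np

def qri_elm(X, T, K_max, eps, rng=None):
    """Grow hidden nodes one by one, recursively updating P_k = R_k^{-1}, Q_k and beta_k."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, I = X.shape
    act = lambda A: 1.0 / (1.0 + np.exp(-A))              # sigmoid hidden nodes (Section 5)

    def new_column():                                     # random node -> one column of H
        w = rng.uniform(-1, 1, (I, 1))
        b = rng.uniform(-1, 1)
        return act(X @ w + b)

    h1 = new_column()                                     # steps 1-5: one-node initialisation
    P = np.array([[1.0 / np.sqrt((h1.T @ h1).item())]])   # P_1 = (h_1^T h_1)^{-1/2}
    Q = P[0, 0] * h1                                      # Q_1 = q_1 = p_11 h_1
    beta = P @ (Q.T @ T)                                  # beta_1 = P_1 Q_1^T T
    H = h1
    for k in range(1, K_max):                             # steps 6-10
        if np.linalg.norm(H @ beta - T) <= eps:           # E(H_k) <= eps: stop growing
            break
        h = new_column()
        dr = Q.T @ h                                      # Eq. (23)
        dh = h - Q @ dr                                   # Eq. (24)
        p = 1.0 / np.sqrt((dh.T @ dh).item())             # Eq. (25)
        q = p * dh                                        # Eq. (26)
        b_new = p * (q.T @ T)                             # Eq. (27), shape (1, L)
        beta = np.vstack([beta - (P @ dr) @ b_new, b_new])            # Eq. (28)
        P = np.block([[P, -(P @ dr) * p],
                      [np.zeros((1, k)), np.full((1, 1), p)]])        # Eq. (29)
        Q = np.hstack([Q, q])                             # Eq. (30)
        H = np.hstack([H, h])
    return beta, Q, P

# Illustrative usage on the SinC problem with at most 15 hidden nodes.
rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, (1000, 1))
T = np.sinc(X / np.pi)
beta, Q, P = qri_elm(X, T, K_max=15, eps=1e-3)
print("hidden nodes used:", beta.shape[0])
```

Keeping P_k = R_k^{-1} rather than R_k means no triangular solve is needed when a node is added; each step costs only the matrix–vector products counted in the complexity analysis above.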


In real applications, the stop criterion can be set as the output error E(H_k) falling below a target error ε, or as |E(H_{k+1}) − E(H_k)| < ε, depending on the specific case.

Remark 4.1. On the differences among the terms "QR (iteration) algorithm", "iterative QR decomposition" and our proposed "QR incremental ELM algorithm": in numerical linear algebra, the "QR algorithm" is an eigenvalue algorithm, i.e. a procedure to calculate the eigenvalues and eigenvectors of a matrix by iteration [5]; in "iterative QR decomposition", the elements (or vectors) of R are calculated sequentially to reduce complexity, with a fixed source matrix H [2]. In our proposed algorithm, the dimension (number of columns) of the matrix H grows at each round and the corresponding output weights are updated accordingly, yet without computing each R_k.

5. Simulation

Before presenting our own simulations, we quote some results from [4], listed in Table 1, which compares the performance of EM-ELM with other incremental algorithms, namely the resource allocation network (RAN) [15], the minimum resource allocation network (MRAN) [23], and the Incremental Extreme Learning Machine (I-ELM), on several datasets [1].⁴

Table 1
Performance comparison among EM-ELM, I-ELM, RAN and MRAN in regression applications, cited from Tables II and III in [4].

Datasets          Algorithms   # Nodes   Training time(s)   Test RMSE
Abalone           EM-ELM       200       0.0193             0.0755
                  I-ELM        200       0.2214             0.0920
                  RAN          186       39.928             0.1183
                  MRAN         68        255.84             0.0906
Boston Housing    EM-ELM       200       0.0073             0.0073
                  I-ELM        200       0.0515             0.1167
                  RAN          41        2.0940             0.1474
                  MRAN         36        22.767             0.1321
Auto Price        EM-ELM       21        0.0053             0.0027
                  I-ELM        200       0.0329             0.0977
                  RAN          24        0.3565             0.1418
                  MRAN         23        2.5015             0.1373
Machine CPU       EM-ELM       21        0.0047             0.0018
                  I-ELM        200       0.0234             0.0504
                  RAN          7         0.1735             0.1069
                  MRAN         7         0.2454             0.1068

⁴ Note that the training time values in Table 1 are for the last round only, while in the simulation results shown later the training time is accumulated over all previous rounds.

There are not many hidden-node incremental algorithms based on ELM (they include EM-ELM, I-ELM and EI-ELM) thus far [9], and EM-ELM is clearly much better than I-ELM and than non-ELM-based algorithms such as RAN or MRAN in terms of accuracy [4,21] (see also Table 1). Therefore, in this section we mainly investigate the performance of the proposed QRI-ELM algorithm on some benchmark applications, comparing it only with EM-ELM and the original ELM. The simulations are conducted in Matlab R2013a on a Linux machine (2.9 GHz dual-core CPU). For all three algorithms, the sigmoid additive activation function g(x) = 1/(1 + e^{−x}) is chosen as the output function of the hidden nodes. The average results are obtained over 50 trials in all cases.

The SinC function is the first benchmark problem on which these algorithms are compared. One thousand training and one thousand test data were randomly generated from the interval (−10, 10) with a uniform distribution. Fig. 1 shows the generalization performance of the original ELM, EM-ELM and our proposed QRI-ELM as a function of training time.

Fig. 1. Performance comparison of original ELM, EM-ELM and proposed QRI-ELM.

It can be seen from Fig. 1(a) that, with the help of growing hidden nodes and incrementally updating the output weights, EM-ELM is much faster than the conventional ELM, while QRI-ELM is much faster than EM-ELM thanks to its more efficient updating mechanism. On the other hand, Fig. 1(b) shows that all three algorithms maintain more or less the same testing root mean square error (RMSE) within 15 hidden nodes. Note that when more nodes are added beyond 15, EM-ELM and QRI-ELM begin to diverge. Recall that both EM-ELM and QRI-ELM require the hidden matrix H not to be ill-conditioned. It can be inferred from Fig. 1(b) that adding more than 15 nodes makes H ill-conditioned, which leads to divergence if training continues with EM-ELM or QRI-ELM. Although both EM-ELM and QRI-ELM risk divergence when the number of hidden nodes becomes large, QRI-ELM still performs better than EM-ELM (it diverges less when divergence occurs).

In fact, the experiment above is only for analysis. In real applications, as noted in the previous section, the network-growing procedure can stop once the network output error decreases very slowly, or even stops decreasing, i.e. E(H_{k+1}) − E(H_k) ≤ 0. In this way the network chooses the optimal number of hidden nodes while avoiding the divergence discussed in Remark 2.2.

Further comparisons have been conducted on the real benchmark classification problems specified in Table 2 [1], which include three classification applications (Diabetes, Satimage, Image Segmentation). In our simulations, the input data have been normalized into [−1, 1]. The performance comparison between EM-ELM and QRI-ELM on the classification cases is given in Table 3.
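To make the stopping remedy discussed above concrete, here is a minimal stand-alone sketch (our illustration only): it grows a SinC network and stops as soon as the training error no longer decreases, refitting by ordinary least squares at each size instead of using the incremental updates.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, (1000, 1))                 # SinC training data, as described above
T = np.sinc(X / np.pi)

def hidden_column():
    w = rng.uniform(-1, 1, (X.shape[1], 1))
    b = rng.uniform(-1, 1)
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

H = hidden_column()
prev_err = np.inf
for k in range(1, 50):
    beta = np.linalg.lstsq(H, T, rcond=None)[0]     # stand-in for the incremental update
    err = np.linalg.norm(H @ beta - T)              # E(H_k)
    if prev_err - err <= 0:                         # error stopped decreasing: stop growing
        break
    prev_err = err
    H = np.hstack([H, hidden_column()])
print("growth stopped at", H.shape[1], "hidden nodes")
```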

Table 2
Specification of benchmark datasets.

Datasets             Attributes   Classes   # Training data   # Test data
Diabetes             8            2         576               192
Satimage             36           6         3217              3218
Image Segmentation   19           7         1500              810

In the implementation of both algorithms, the initial state is one hidden node, and new nodes are added one by one. Growth stops when a given number of hidden nodes is reached. The results indicate that QRI-ELM consumes less time than EM-ELM while maintaining similar test accuracy.

The performance comparison among ELM, EM-ELM and QRI-ELM has also been conducted on these benchmark problems. The target training accuracy is set to 90% or 80%, as marked in Table 4. The optimal number of hidden nodes for the original ELM is obtained by trial and error.


In the implementation of EM-ELM and QRI-ELM, the initial structure is one hidden node, and new random hidden nodes are added one by one. As seen from the simulation results given in Table 4, EM-ELM is faster than the traditional ELM, and QRI-ELM is faster still, as expected, with similar test accuracies. It should be noticed that although the training time of QRI-ELM is smaller than that of EM-ELM, the difference is less pronounced than in Table 3, because the overhead of computing the training accuracy in each round (for comparison with the stop accuracy) is much larger than the time needed by the basic algorithm itself. This points to a problem worth investigating: once the core of the incremental algorithm is fast enough, reducing the cost of evaluating training/test accuracy in classification cases, or finding another efficient stop criterion, becomes the critical challenge. One way to mitigate this is to assess the training accuracy only every few rounds instead of every round. Nonetheless, we can still see a significant training-time reduction when applying QRI-ELM instead of EM-ELM in cases with a large number of hidden nodes (3.71 s versus 5.31 s, with more than 100 hidden nodes, on the Satimage dataset).

Table 3
Performance comparison in benchmark datasets (when certain numbers of hidden nodes are reached).

Datasets             Algorithms   # Nodes   Training time(s)   Training accuracy(%)   Test accuracy(%)
Diabetes             EM-ELM       20        0.0037             78.89                  77.54
                     QRI-ELM      20        0.0020             78.85                  77.46
Satimage             EM-ELM       150       1.08               89.69                  87.22
                     QRI-ELM      150       0.13               89.69                  87.27
Image Segmentation   EM-ELM       20        0.0175             88.54                  87.98
                     QRI-ELM      20        0.0086             88.61                  87.96

Table 4
Performance comparison in benchmark datasets (when stop conditions are met).

Datasets (Stop accuracy)    Algorithms   # Nodes^a   Training time(s)   Test accuracy(%)
Diabetes (80%)              ELM          24.00       0.6934             76.90
                            EM-ELM       30.40       0.2427             76.95
                            QRI-ELM      29.26       0.2297             76.87
Satimage (90%)              ELM          149.5       52.48              87.94
                            EM-ELM       188.2       5.31               88.33
                            QRI-ELM      174.2       3.71               87.88
Image Segmentation (90%)    ELM          23.52       1.60               89.68
                            EM-ELM       27.50       0.27               89.58
                            QRI-ELM      27.42       0.25               89.31

^a The numbers of nodes in Table 4 are not integers because each value is an average over 50 trials.

Fig. 2. Performance comparison of original ELM, EM-ELM and proposed QRI-ELM in Diabetes dataset.


Fig. 3. Performance comparison of original ELM, EM-ELM and proposed QRI-ELM in Satimage dataset.

Fig. 4. Performance comparison of original ELM, EM-ELM and proposed QRI-ELM in Image Segmentation dataset.

Finally, performance comparisons (in terms of training time and test accuracy) of ELM, EM-ELM and QRI-ELM on the Diabetes, Satimage and Image Segmentation datasets are shown in Figs. 2–4, respectively. They all show the same result: with the same or similar test accuracy, the proposed QRI-ELM consumes the least training time of the three algorithms.

6. Conclusions

In this paper we have proposed an incremental ELM method based on QR factorization, namely QRI-ELM. It is an efficient and competitive alternative to EM-ELM. It can automatically determine the number of hidden nodes in generalized single-hidden-layer feedforward networks, and the output weights can be updated incrementally and efficiently as the network grows. Computational complexity analysis and simulation results show that the new approach further reduces the training time while maintaining generalization performance (in terms of test accuracy) similar to that of EM-ELM.

Acknowledgment

This work is supported by the Shenzhen Science and Technology Research Fund under grants JC201005260183A, ZYA201106070013A, and JCYJ20140417172417131.

References

[1] A. Asuncion, D. Newman, The UCI machine learning repository, 2007. URL: http://archive.ics.uci.edu/ml/.
[2] R.-H. Chang, C.-H. Lin, K.-H. Lin, C.-L. Huang, F.-C. Chen, Iterative decomposition architecture using the modified Gram–Schmidt algorithm for MIMO systems, IEEE Trans. Circuits Syst. I: Regul. Pap. 57 (5) (2010) 1095–1102.
[3] S. Decherchi, P. Gastaldo, R. Zunino, E. Cambria, J. Redi, Circular-ELM for the reduced-reference assessment of perceived image quality, Neurocomputing 102 (2013) 78–89.
[4] G. Feng, G.-B. Huang, Q. Lin, R. Gay, Error minimized extreme learning machine with growth of hidden nodes and incremental learning, IEEE Trans. Neural Netw. 20 (8) (2009) 1352–1357, doi:10.1109/TNN.2009.2024147.
[5] G.H. Golub, C.F.V. Loan, Matrix Computations, 3rd ed., Johns Hopkins, 1996.
[6] P. Horata, S. Chiewchanwattana, K. Sunat, Enhancement of online sequential extreme learning machine based on the householder block exact inverse QRD recursive least squares, Neurocomputing 149, Part A (2015) 239–252, doi:10.1016/j.neucom.2013.10.047.
[7] G. Huang, L. Chen, Enhanced random search based incremental extreme learning machine, Neurocomputing 71 (16–18) (2008) 3460–3468.
[8] G. Huang, L. Chen, C. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. 17 (4) (2006) 879.
[9] G. Huang, G.-B. Huang, S. Song, K. You, Trends in extreme learning machines: a review, Neural Netw. 61 (2015) 32–48, doi:10.1016/j.neunet.2014.10.001.
[10] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1–3) (2006) 489–501.
[11] K. Kuttler, Linear Algebra: Theory and Applications, The Saylor Foundation, 2012.
[12] B. Li, J. Wang, Y. Li, Y. Song, An improved on-line sequential learning algorithm for extreme learning machine, in: D. Liu, S. Fei, Z.-G. Hou, H. Zhang, C. Sun (Eds.), Advances in Neural Networks – ISNN 2007, Lecture Notes in Computer Science, vol. 4491, Springer Berlin Heidelberg, 2007, pp. 1087–1093.
[13] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. 17 (6) (2006) 1411–1423.
[14] Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten, A. Lendasse, OP-ELM: optimally pruned extreme learning machine, IEEE Trans. Neural Netw. 21 (1) (2010) 158–162.
[15] J. Platt, A resource-allocating network for function interpolation, Neural Comput. 3 (2) (1991) 213–225.
[16] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and its Applications, vol. 7, Wiley, New York, 1971.
[17] J.D. Riley, Solving systems of linear equations with a positive definite, symmetric, but possibly ill-conditioned matrix, Math. Tables Other Aids Comput. (1955) 96–101.
[18] E. Soria-Olivas, J. Gomez-Sanchis, J. Martin, J. Vila-Frances, M. Martinez, J. Magdalena, A. Serrano, BELM: Bayesian extreme learning machine, IEEE Trans. Neural Netw. 22 (3) (2011) 505–509.
[19] Y. Tan, R. Dong, H. Chen, H. He, Neural network based identification of hysteresis in human meridian systems, Int. J. Appl. Math. Comput. Sci. 22 (3) (2012) 685–694.
[20] L.N. Trefethen, D. Bau III, Numerical Linear Algebra, vol. 50, SIAM, 1997.
[21] Y. Ye, S. Squartini, F. Piazza, Incremental-based extreme learning machine algorithms for time-variant neural networks, Adv. Intell. Comput. Theor. Appl. (2010) 9–16.
[22] Y. Ye, S. Squartini, F. Piazza, Online sequential extreme learning machine in nonstationary environments, Neurocomputing 116 (2013) 94–101, doi:10.1016/j.neucom.2011.12.064.
[23] L. Yingwei, N. Sundararajan, P. Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function neural networks, Neural Comput. 9 (2) (1997) 461–478.