An improved sequential method for principal component analysis


Pattern Recognition Letters 24 (2003) 1409–1415 www.elsevier.com/locate/patrec

Ze Wang a,*, Yin Lee a, Simone Fiori b, Chi-Sing Leung c, Yi-Sheng Zhu a

a Department of Biomedical Engineering, Shanghai Jiao Tong University, HuaShan RD. 1954, Shanghai 200030, China
b Department of Industrial Engineering, University of Perugia, Italy
c Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China

Received 15 August 2002; received in revised form 12 October 2002

Abstract

In sequential principal component (PC) extraction, when increasing numbers of PCs are extracted the accumulated extraction error becomes dominant and makes a reliable extraction of the remaining PCs difficult. This paper presents an improved cascade recursive least squares method for PC extraction. The good features of the proposed approach, which include improved convergence speed and higher extraction accuracy, are illustrated through simulation results.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Cascade recursive least squares (CRLS); Principal component analysis; Vector orthogonalization by subspace deflation

1. Introduction

In the standard numerical approach to principal component analysis (PCA), the data covariance matrix is first estimated and then its eigenvectors and associated eigenvalues are extracted by some well-known numerical algorithm, e.g. the QR decomposition or the SVD algorithm. However, this approach is not practical for large data-sets, because the dimensions of the covariance matrix become too large to be manipulated.

q The work was partially supported by a grant from City University of Hong Kong (No. 7001079). * Corresponding author. E-mail address: [email protected] (Z. Wang).

Moreover, the whole set of eigenvectors has to be evaluated even though only some of them are needed, as in the case of image compression or pattern recognition (Turk and Pentland, 1991). To steer clear of these problems, several on-line neural network approaches (Oja, 1989; Abbas and Fahmy, 1994; Bannour and Azimi-Sadjadi, 1995; Diamantaras and Kung, 1996; Cichocki et al., 1996; Fiori and Piazza, 2000) have been proposed, which find the eigenvectors and the associated eigenvalues one by one directly from the input data vectors, without the need to compute the data covariance matrix. The existing neural algorithms for principal component analysis may be classified into parallel and sequential ones. In the parallel extraction mode, a hierarchic network is trained so that the component neurons encode the wanted principal vectors in a parallel way, while in the sequential


extraction mode each neuron learns a principal component alone, on the basis of input data deflated of the components already extracted by the preceding neurons. As opposed to the parallel-extraction techniques, the sequential approaches have the intrinsic drawback that the neurons near the input cannot help the neurons far from the input to improve their training results. The accumulated error becomes dominant when more and more principal components (PCs) are extracted, which renders the extraction of the remaining components difficult. Extensive training is therefore required while, in practice, only limited training is allowed, so improvements are needed in both convergence speed and accuracy.

According to the mutual orthogonality of the eigenvectors, Wong et al. (2000) proposed that the initial weight vector for the next extraction should be orthogonal to the eigen-subspace spanned by the already extracted weight vectors. In this paper, we combine this approach with the CRLS-PCA algorithm proposed by Cichocki et al. (1996) and, as an extension, after the training of each eigenvector we perform an orthogonalization by subspace deflation. This combination has a twofold effect: firstly, it improves convergence, because it saves the substantial training effort needed to bring a unitary vector into the right subspace; secondly, it provides an accurate and orthogonal extraction of the PCs, which is very important in application fields such as image compression. The CRLS-PCA extraction method with initial-weight deflation and after-learning deflation is presented in the following section; simulations and results are then given in Section 3, and Section 4 concludes the paper.

2. The proposed method

A common practice in signal processing is to find an efficient representation of the data for data reduction. Consider a zero-mean random vector $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T \in \mathbb{R}^n$ generated from a stationary multivariate stochastic process. Given a set of $m$ orthogonal, unitary vectors $W = [w_1, w_2, \ldots, w_m] \in \mathbb{R}^{n \times m}$ (where usually $m \le n$), the projection of $x$ onto $w_i$ is

$$y_i = w_i^T x, \quad \text{or, in matrix notation,} \quad y = W^T x, \tag{1}$$

therefore $x$ can be reconstructed by

$$\hat{x} = \sum_{i=1}^{m} y_i w_i = \sum_{i=1}^{m} (w_i^T x)\, w_i. \tag{2}$$

In the sense of minimal mean-square-error reconstruction ($\mathrm{MSE} = E\{\|x - \hat{x}\|^2\}$), the vectors $w_1, w_2, \ldots, w_m$ tend to the $m$ eigenvectors of the covariance matrix of $x$ associated with the largest eigenvalues. Hence the reconstruction $\hat{x}$ of $x$ can be easily obtained as

$$\hat{x} = W y = W W^T x. \tag{3}$$

These orthogonal vectors are generally ordered decreasingly according to their variance; consequently, they are called principal vectors (or eigenvectors), and the uncorrelated outputs $y(t) = [y_1(t), y_2(t), \ldots, y_m(t)]^T \in \mathbb{R}^m$ are termed PCs.

The classical approaches to PCA extraction first estimate the input data covariance matrix and then evaluate the eigenvalues and the corresponding eigenvectors. This is not practical for large data-sets because the dimensions of the covariance matrix become too large to be handled. As reported by Costa and Fiori (2001), the sequential CRLS approach shows the best performance among up-to-date neural algorithms. As shown in Fig. 1, in the CRLS the principal vectors are extracted one by one through a cascade neural network. For the $i$th neuron in the cascade, the learning equations read

$$e_1(t) = x(t), \tag{4}$$

$$y_i(t) = w_i^T(t-1)\, e_i(t), \tag{5}$$

$$g_i(0) = \frac{1}{N} \sum_{j=1}^{n} \sum_{t=1}^{N} e_{ij}^2(t), \qquad g_i(t) = g_i(t-1) + y_i^2(t), \tag{6}$$

$$w_i(t) = w_i(t-1) + \frac{y_i(t)}{g_i(t)} \bigl( e_i(t) - w_i(t-1)\, y_i(t) \bigr). \tag{7}$$

After each principal component extraction, the remaining signal (the current reconstruction error), which is the input for the next extraction, is computed by the bridge formula

$$e_i(t) = e_{i-1}(t) - y_{i-1}(t)\, w_{i-1}(t). \tag{8}$$
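As a concrete illustration of Eqs. (4)–(8), the following Python sketch implements the sequential CRLS recursion. It is our own minimal rendering for this presentation, not the original CRLS-PCA code; the function name crls_pca, the column-wise data layout and the random initialization are assumptions made for the example.

```python
import numpy as np

def crls_pca(X, m, epochs=1, rng=None):
    """Sequential CRLS-PCA sketch: X holds one n-dimensional sample per column."""
    rng = np.random.default_rng() if rng is None else rng
    n, N = X.shape
    W = np.zeros((n, m))
    E = X.copy()                          # e_1(t) = x(t), Eq. (4)
    for i in range(m):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)            # random unit-norm initial weight
        g = np.sum(E ** 2) / N            # g_i(0), Eq. (6): average residual energy
        for _ in range(epochs):
            for t in rng.permutation(N):  # samples presented in random order
                e = E[:, t]
                y = w @ e                 # y_i(t) = w_i^T(t-1) e_i(t), Eq. (5)
                g += y * y                # g_i(t) = g_i(t-1) + y_i^2(t), Eq. (6)
                w = w + (y / g) * (e - y * w)   # weight update, Eq. (7)
        W[:, i] = w
        E = E - np.outer(w, w @ E)        # bridge formula, Eq. (8): deflate the residual
    return W
```

In this sketch the residual E is deflated in batch after each principal vector is trained, which matches the bridge formula of Eq. (8) when the whole training set is available; deflating each incoming sample on the fly, as in the original on-line setting, is equivalent for batch data.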


Fig. 1. The CRLS-PCA network architecture (Cichocki et al., 1996).

This approach exhibits the best performance with respect to convergence speed, computational burden and memory requirements. However, the same problem of accumulated error still exists as in other sequential methods. When more and more principal components are extracted, the accumulated extraction error degrades the extraction of the remaining principal components, particularly when training is limited. So we have to make the extracted vector converge to the true eigenvector as fast as possible.

Consider the case that, after the first $i$ principal components have been extracted, the extracted principal vectors $\hat{W}_i = [w_1, w_2, \ldots, w_i]$ span an eigen-subspace $S_i$ with a certain degree of accuracy.

Fig. 2. Images used in the simulations. (a) Lena 512 × 512. (b) Peppers 512 × 512. (c) Bird 256 × 256. (d) Bridge 256 × 256.


If the initial value for the next principal vector to be trained is chosen close to the correct one, the convergence should be improved and, consequently, the accumulated error should be decreased. A reasonable choice of this initial value is a vector orthogonal to $S_i$. Such an initial weight vector can be easily obtained by deflating a randomly generated vector $w_{i+1}$ through

$$w_{i+1}(0) = w_{i+1} - \hat{W}_i \hat{W}_i^T w_{i+1}. \tag{9}$$

With this initial step, a substantial amount of the training needed to bring a random vector into the right subspace is saved; thus, the convergence is improved. When the training data-set or the training time is limited, the same deflation step can also be performed on the extracted vector after each extraction; this further ensures the orthogonality of the extracted principal vectors.
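A minimal sketch of the deflation step of Eq. (9) is given below. It is our own illustrative Python fragment (the helper name deflate and the added re-normalization, which keeps the weight vector unitary, are our choices); the same routine serves both for the orthogonal initialization and for the optional after-learning deflation.

```python
import numpy as np

def deflate(v, W_hat):
    """Orthogonalize v against the extracted principal vectors (columns of W_hat), Eq. (9)."""
    if W_hat.shape[1] > 0:
        v = v - W_hat @ (W_hat.T @ v)   # w_{i+1}(0) = w_{i+1} - W_hat W_hat^T w_{i+1}
    return v / np.linalg.norm(v)        # re-normalize so the weight stays unitary

# Hypothetical usage inside the sequential extraction loop:
#   w0 = deflate(rng.standard_normal(n), W[:, :i])   # orthogonal initial weight vector
#   ...CRLS training of w0 as in Eqs. (5)-(7)...
#   W[:, i] = deflate(w, W[:, :i])                   # optional after-learning deflation
```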

3. Simulations and results

In order to evaluate the proposed PCA method, we performed simulations on the standard test images shown in Fig. 2, including two 512 × 512 images (the Lena and the peppers pictures) and two 256 × 256 images (the bird and the bridge pictures). The standard singular value decomposition (SVD), the original CRLS-PCA algorithm and the proposed approach were all tested. The 256-gray-level images are first subdivided into a set of $\sqrt{n} \times \sqrt{n}$ non-overlapping blocks, which are reshaped into vectors of size $n$. The pixel values are normalized to the range between 0 and 1; then the mean value is removed from each pixel to get a zero-mean training set.

Fig. 3. SNR vs. different eigenvectors of the four images when the block size is 4 × 4. (a) Lena 512 × 512. (b) Peppers 512 × 512. (c) Bird 256 × 256. (d) Bridge 256 × 256.


These training vectors are randomly input to the network, as shown in Fig. 1, to extract all principal components. Using different numbers of principal vectors, we reconstructed the original images and normalized the results back by re-scaling the images to their original ranges and adding the mean pixel value. As a performance index, we calculated the signal-to-noise ratio (SNR), defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{N_{\mathrm{row}}} \sum_{j=1}^{N_{\mathrm{col}}} I_{ij}^2}{\sum_{i=1}^{N_{\mathrm{row}}} \sum_{j=1}^{N_{\mathrm{col}}} \bigl( I_{ij} - \hat{I}_{ij} \bigr)^2} \ \mathrm{(dB)}, \tag{10}$$

where $I$ is the original image, $\hat{I}$ is the reconstructed one and $N_{\mathrm{row}} \times N_{\mathrm{col}}$ is the support size. Three block sizes are considered in our experiments: 4 × 4, 8 × 8 and 16 × 16.
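The preprocessing and the evaluation index can be summarized by the short Python sketch below. It is our own illustration (the helper names image_to_blocks and snr_db are not from the paper), assuming a square gray-level image whose side is divisible by the block size; the 4 × 4 block size and the 16 extracted PCs in the commented usage are just example values taken from the experiments.

```python
import numpy as np

def image_to_blocks(img, b):
    """Cut a gray-level image into non-overlapping b x b blocks, one block per column."""
    h, w = img.shape
    blocks = img.reshape(h // b, b, w // b, b).transpose(0, 2, 1, 3).reshape(-1, b * b)
    return blocks.T                          # shape (b*b, number_of_blocks)

def snr_db(original, reconstructed):
    """Signal-to-noise ratio of Eq. (10), in dB."""
    num = np.sum(original.astype(float) ** 2)
    den = np.sum((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(num / den)

# Sketch of the evaluation loop (img is a 2-D uint8 array; crls_pca is the Section 2 sketch):
#   X = image_to_blocks(img / 255.0, 4)      # normalize pixels to [0, 1]
#   mean = X.mean(axis=1, keepdims=True)
#   W = crls_pca(X - mean, m=16)             # sequential extraction of 16 principal vectors
#   X_hat = W @ (W.T @ (X - mean)) + mean    # reconstruction, Eq. (3)
```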


The SNR versus the number of eigenvectors for the four images is shown in Figs. 3–5. In Figs. 3 and 4, parts of the curves are shown in the associated insets on different scales. The legends in each figure correspond to the curves from top to bottom. One epoch is defined as a sequential scan of the image with the considered block size. For the 512 × 512 images there are 16 384, 4096 and 1024 learning patterns within each epoch for the 4 × 4, 8 × 8 and 16 × 16 blocks, respectively, while for the 256 × 256 images there are 4096, 1024 and 256 learning patterns, respectively. According to the number of learning patterns within each epoch, the original CRLS algorithm is tested with 1 epoch for the 4 × 4 blocks, 2 epochs for the 8 × 8 blocks and 10 epochs for the 16 × 16 blocks. Our proposed method is evaluated with 1 epoch for both the 4 × 4 and 8 × 8 blocks, and 2 epochs for the 16 × 16 blocks.

Fig. 4. SNR vs. different eigenvectors of the four images when the block size is 8 × 8. (a) Lena 512 × 512. (b) Peppers 512 × 512. (c) Bird 256 × 256. (d) Bridge 256 × 256.


Fig. 5. SNR vs. different eigenvectors of the four images when the block size is 16 × 16. (a) Lena 512 × 512. (b) Peppers 512 × 512. (c) Bird 256 × 256. (d) Bridge 256 × 256.

The results of the 4 × 4 and 8 × 8 block simulations show that the original CRLS performs similarly to our proposed method, as shown in Figs. 3 and 4. This is because the learning patterns in each epoch are, in these two cases, enough to refine the extracted PCs. However, the accumulated error finally prevents the extraction of the last PCs, as shown in the insets of Figs. 3 and 4. In the case of the 16 × 16 blocks, 1 epoch is enough to get an accurate extraction of all PCs for the 512 × 512 images, as shown in Fig. 5(a) and (b). For the 256 × 256 images, 1 epoch gives a converging extraction of all PCs, and 2 epochs yield nearly the same extraction of all PCs as the SVD approach, as shown in Fig. 5(c) and (d), while the original CRLS approach cannot produce an accurate extraction of all PCs even after 10 epochs of learning, as shown in Fig. 5. When only a few PCs are extracted, such as the 16 and 64 PCs of the 4 × 4 and 8 × 8 block sizes, the accumulated error only affects the extraction of the last PC. But when the number of PCs becomes larger, such as the 256 PCs of the 16 × 16 blocks, the extraction error grows dramatically as more principal vectors are extracted. This is partly due to the insufficient number of training patterns, but the main reason is the accumulated extraction error: the more accurate the required extraction, the more training epochs are needed. Using our approach, 1 epoch is enough for an accurate extraction of all principal vectors, regardless of the number of learning patterns within each epoch.

Our comparison also concerned the computational complexity of the considered algorithms. On a desktop computer with a 1 GHz CPU and 512 MB RAM running Matlab, the elapsed times are listed in Table 1, which shows that the proposed method is faster than the original CRLS method.


Table 1
Elapsed time for PCs extraction

Method           Block size   PCs    Time (s)
                                     Lena    Peppers   Bridge   Bird
Improved CRLS    4 × 4        16      7.8      8.0       2.3     1.9
                 8 × 8        64     17.1     17.5       3.4     3.0
                 16 × 16      256    24.7     26.1      11.1    10.6
Original CRLS    4 × 4        16     10.1     10.2       3.2     3.1
                 8 × 8        64     23.1     24.0       5.8     5.6
                 16 × 16      256    54.0     55.1      18.0    17.7

4. Conclusion

Exploiting the mutual orthogonality of the eigenvectors, we proposed a fast and accurate sequential method for extracting all eigenvectors in PCA. The improvement over the existing CRLS method relies on the introduction of a proper initialization step based on orthogonalization by subspace deflation. Simulation results showed that 1 epoch is usually enough to obtain an accurate extraction of the eigenvectors of any order, and that the proposed approach is faster than the original CRLS approach.

References

Abbas, H.M., Fahmy, M.M., 1994. Neural model for Karhunen–Loeve transform with application to adaptive image compression. IEE Proc. I, Commun. Speech Vision 140 (2), 135–143.
Bannour, S., Azimi-Sadjadi, M.R., 1995. Principal component extraction using recursive least squares learning. IEEE Trans. Neural Networks 6, 457–469.
Cichocki, A., Kasprzak, W., Skarbek, W., 1996. Adaptive learning algorithm for principal component analysis with partial data. Proc. Cybernetics Syst. 2, 1014–1019.
Costa, S., Fiori, S., 2001. Image compression using principal component neural networks. Image and Vision Comput. 19, 649–668.
Diamantaras, K.I., Kung, S.Y., 1996. Principal Component Neural Networks: Theory and Applications. In: Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York.
Fiori, S., Piazza, F., 2000. A general class of ψ-APEX PCA neural algorithms. IEEE Trans. Circuits and Systems, Part I 47 (9), 1394–1398.
Oja, E., 1989. Neural networks, principal components, and subspaces. Int. J. Neural Systems 1, 61–68.
Turk, M., Pentland, A., 1991. Eigenfaces for recognition. J. Cognitive Neurosci. 3, 71–86.
Wong, A.S.Y., Wong, K.W., Leung, C.S., 2000. A practical sequential method for principal component analysis. Neural Process. Lett. 11, 107–112.