Auto-correlation wavelet support vector machine

G.Y. Chen, G. Dudek *
Centre for Intelligent Machines, McGill University, McConnell Building, 3480 University Street, Montreal, Que., Canada H3A 2A7
* Corresponding author. E-mail addresses: [email protected] (G.Y. Chen), [email protected] (G. Dudek).

Image and Vision Computing 27 (2009) 1040–1046. doi:10.1016/j.imavis.2008.09.006

Article history: Received 22 November 2005; received in revised form 20 September 2008; accepted 20 September 2008.

Keywords: Wavelets; Support vector machine; Machine learning; Pattern recognition; Function regression; Auto-correlation

Abstract

A support vector machine (SVM) with the auto-correlation of a compactly supported wavelet as its kernel is proposed in this paper. We prove that this kernel is an admissible support vector kernel. The main advantage of the auto-correlation of a compactly supported wavelet is that it satisfies the translation invariance property, which is very important for signal processing. In addition, the auto-correlation wavelet kernel lets us select a suitable wavelet from different wavelet families: as we demonstrate, the wavelet filter should be chosen to fit the application rather than fixed once and for all. Experiments on signal regression and pattern recognition show that this kernel is a feasible kernel for practical applications. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

The support vector machine (SVM) was first developed by Vapnik for pattern recognition and function regression. It has been applied with great success in many application domains, such as handwritten digit recognition, image classification, face detection, object detection, and text classification [1–3]. An SVM, as typically defined and applied, assumes that all samples in the training set are independent and identically distributed. It uses an approximate implementation of the structural risk minimization principle from statistical learning theory, rather than the empirical risk minimization method. The key operating principle is that a kernel maps the input data into a higher-dimensional feature space in which the (classification) problem becomes linearly separable. The choice of kernel plays a very important role in the performance of an SVM application. The most popular kernels are the Gaussian kernel, the polynomial kernel, the exponential radial basis function kernel, and the spline kernel, among others.

Over the past decade, wavelet transforms have received substantial attention from researchers in numerous application areas. Both discrete and continuous wavelet transforms have shown great promise in such diverse fields as pattern recognition, image noise suppression, signal processing, image compression, and computer graphics, to name only a few. Chen and Xie [4] proposed two SVM kernels using multiwavelet functions. Zhang et al. [5] proposed the scalar wavelet kernel for SVMs and found that it outperforms the Gaussian kernel for function regression and pattern recognition. The scalar wavelet they used is defined as $\psi(x) = \cos(1.75x)e^{-x^2/2}$. However, many other kinds of wavelets offer good properties as well. Is the wavelet used in [5] well suited to SVM applications? Probably not. Since a general wavelet function may not be an admissible kernel for an SVM, it is natural to use the auto-correlation of a wavelet to build SVM kernels. For the basics of wavelets and their applications, the reader is referred to [7–12].

In this paper, we propose to use the auto-correlation of compactly supported wavelets to construct SVM kernels, and we apply these kernels to signal regression and pattern recognition. We prove that this kernel is an admissible support vector kernel. Experimental results demonstrate that our auto-correlation wavelet kernel outperforms the scalar wavelet kernel, the Gaussian kernel, and the exponential radial basis function kernel for signal regression on several prototypical examples of real-life signals. For pattern recognition, we conduct experiments on recognizing handwritten numerals and obtain good recognition rates without exotic algorithmic embellishments; our results are better than the recognition rates reported by Chen et al. [7]. This indicates that the proposed auto-correlation wavelet kernel is a feasible kernel for practical applications. Note that this paper is an extension of our earlier conference paper [6].

The organization of this paper is as follows. Section 2 reviews the SVM for pattern recognition and function regression. Section 3 explains our auto-correlation wavelet SVM. Section 4 presents experiments on pattern recognition and function regression and compares our kernel with other kernels, namely the scalar wavelet kernel, the Gaussian kernel, and the exponential radial basis function kernel. Finally, Section 5 draws conclusions and discusses future work.

2. Review of SVM methods

We briefly review the SVM for function regression and pattern recognition in this section. As noted above, this is a well-established technique with good empirical performance.

2.1. Function regression

SVM function regression is a very important problem [1–3]. Suppose we have independent, uniformly distributed data $(x_1, y_1), \ldots, (x_l, y_l)$ and we want to find a function

$$y = f(x, w) = \langle w, x \rangle + b = w^{T}x + b$$

that minimizes the following risk functional:

$$\frac{1}{2}\|w\|^{2} + \frac{C}{l}\sum_{i=1}^{l}\bigl(|y_{i} - f(x_{i}, w)| - \epsilon\bigr)_{+},$$

where $C > 0$ is a constant, $\epsilon > 0$ is a small number, and $(\cdot)_{+}$ returns its argument when positive and zero otherwise. This minimization problem can be solved by maximizing

$$W(\alpha, \alpha^{*}) = -\epsilon\sum_{i=1}^{l}(\alpha_{i}^{*} + \alpha_{i}) + \sum_{i=1}^{l}(\alpha_{i}^{*} - \alpha_{i})y_{i} - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_{i}^{*} - \alpha_{i})(\alpha_{j}^{*} - \alpha_{j})K(x_{i}, x_{j})$$

subject to

$$\sum_{i=1}^{l}(\alpha_{i}^{*} - \alpha_{i}) = 0, \qquad \alpha_{i}, \alpha_{i}^{*} \in [0, C].$$

The approximated function is given by

$$f(x) = \sum_{i=1}^{l}(\alpha_{i}^{*} - \alpha_{i})K(x, x_{i}) + b, \qquad b = -\frac{1}{2}\Bigl\langle \sum_{i=1}^{l}(\alpha_{i}^{*} - \alpha_{i})x_{i},\; x_{r} + x_{s} \Bigr\rangle,$$

where $K(x, x_{i})$ is the SVM kernel evaluated at $x$ and $x_{i}$, and $x_{r}$ and $x_{s}$ are any two support vectors.

2.2. SVM for pattern recognition

Given an independent and identically distributed (i.i.d.) training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ with $x \in \mathbb{R}^{N}$ and $y \in \{-1, 1\}$, the kernel function maps the training examples from the input space into a feature space in which the mapped examples are linearly separable. To obtain a better classification result, we maximize the margin of separation between the patterns, which can be converted into the following dual optimization problem [1–3]: maximize

$$W(\alpha) = \sum_{i=1}^{n}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}y_{i}\alpha_{j}y_{j}K(x_{i}, x_{j})$$

subject to

$$\sum_{i=1}^{n}\alpha_{i}y_{i} = 0, \qquad \alpha_{i} \in [0, C] \text{ for } i \in [1, n].$$

The decision function for a two-class classification problem becomes

$$f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_{i}y_{i}K(x, x_{i}) + b\Bigr), \qquad b = y_{r} - \sum_{i=1}^{n}\alpha_{i}y_{i}K(x_{r}, x_{i}),$$

where $(x_{r}, y_{r})$ is any training example.
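To make the dual expansion concrete, the following minimal Python sketch (not from the paper; scikit-learn stands in for the authors' Matlab implementation [16], and the Gaussian kernel and helper name are our choices) fits an ε-SVR and then reproduces its predictions directly from the stored dual coefficients and intercept:

```python
# Minimal sketch (assumption: scikit-learn): verify the SVR dual expansion
# f(x) = sum_i c_i K(x, x_i) + b, where c_i are the signed dual coefficients
# (libsvm's convention for alpha_i^* - alpha_i) over the support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(-10, 10, 200).reshape(-1, 1)
y = np.sin(0.5 * X).ravel() + 0.05 * rng.standard_normal(200)

beta = 1.0  # Gaussian kernel parameter
svr = SVR(kernel="rbf", gamma=beta, C=1.0, epsilon=0.05).fit(X, y)

def gaussian_kernel(A, B, beta=1.0):
    # K(x, x') = exp(-beta * ||x - x'||^2), evaluated pairwise.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-beta * d2)

# Rebuild f(x) from support vectors, dual coefficients, and intercept.
K = gaussian_kernel(X, svr.support_vectors_, beta)
f_manual = K @ svr.dual_coef_.ravel() + svr.intercept_
assert np.allclose(f_manual, svr.predict(X))
```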

3. Auto-correlation of a compactly supported wavelet as SVM kernel

The wavelet transform provides a time–frequency representation of a signal, and the dyadic wavelet transform has attracted a lot of attention in the signal processing community. In a dyadic wavelet transform, any finite-energy signal is represented in terms of dilates and translates of a single function called a wavelet. Wavelets satisfy a multiresolution analysis and obey the following two-scale relations:

$$\phi(x) = \sqrt{2}\sum_{k=0}^{L-1} h_{k}\,\phi(2x - k) \tag{1}$$

and

$$\psi(x) = \sqrt{2}\sum_{k=0}^{L-1} g_{k}\,\phi(2x - k), \tag{2}$$

where $g_{k} = (-1)^{k}h_{L-k-1}$, $k = 0, \ldots, L-1$. By the definition of auto-correlation, we have

$$\Phi(x) = \int_{-\infty}^{+\infty}\phi(t)\,\phi(t - x)\,dt, \qquad \Psi(x) = \int_{-\infty}^{+\infty}\psi(t)\,\psi(t - x)\,dt.$$

It was derived in [13] that

$$\Phi(x) = \Phi(2x) + \frac{1}{2}\sum_{l=1}^{L/2} a_{2l-1}\bigl(\Phi(2x - 2l + 1) + \Phi(2x + 2l - 1)\bigr),$$

$$\Psi(x) = \Phi(2x) - \frac{1}{2}\sum_{l=1}^{L/2} a_{2l-1}\bigl(\Phi(2x - 2l + 1) + \Phi(2x + 2l - 1)\bigr),$$

where $\{a_{k}\}$ are the auto-correlation coefficients of the filter $\{h_{0}, \ldots, h_{L-1}\}$:

$$a_{k} = 2\sum_{l=0}^{L-1-k} h_{l}h_{l+k} \quad \text{for } k = 1, \ldots, L-1,$$

and $a_{2k} = 0$ for $k = 1, \ldots, L/2 - 1$. It is not difficult to verify that both $\Phi$ and $\Psi$ have support $[-L+1, L-1]$.

A translation-invariant kernel $K(x, x') = K(x - x')$ is an admissible support vector (SV) kernel if and only if its Fourier transform is non-negative [14]. This condition can be satisfied by defining the following auto-correlation wavelet kernel:

$$K(x, x') = \prod_{i=1}^{N}\Psi\!\left(\frac{x_{i} - x'_{i}}{a}\right),$$

where $N$ is the dimension of the input feature vector and $a$ is the scale factor. Any compactly supported wavelet function can be used to construct the auto-correlation wavelet kernel $K(x, x')$; in our experiments, however, the Daubechies-4 (D4) wavelet performs best for signal regression. Fig. 1 shows the D4 wavelet and its auto-correlation kernel. The wavelet function used here has no explicit closed form. To generate it, we set one wavelet coefficient to 1 and all remaining coefficients to 0; an inverse wavelet transform then produces the desired wavelet function for the selected wavelet filter. Since the wavelet function has only this implicit form, we store it in memory as a one-dimensional array with a relatively large number of sample points. This array needs to be generated only once and can then be reused.

The main reason we use the auto-correlation wavelet kernel is that the auto-correlation of a compactly supported wavelet satisfies the translation invariance property, which is very important in signal processing. The wavelet itself has a limitation here: the wavelet transform produces very different coefficients even when the input signal is shifted slightly. This limitation is overcome by taking the auto-correlation of the wavelet function. The following theorem states that the kernel $K(x, x')$ is an admissible SV kernel.

Theorem. The auto-correlation wavelet kernel $K(x, x')$ is an admissible SV kernel.

Proof. The Fourier transform of the auto-correlation of $\psi(x)$ equals the power spectrum $|\mathcal{F}(\psi(x))|^{2}$ [15]. Since this is non-negative, $\Psi\!\left(\frac{x_{i} - x'_{i}}{a}\right)$ is an admissible SV kernel. The product of admissible SV kernels is itself an admissible SV kernel. Therefore, the auto-correlation wavelet kernel $K(x, x')$ is an admissible SV kernel. This completes the proof. □
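Since Ψ has no closed form, it can be tabulated once and the kernel evaluated by table lookup, mirroring the scheme described above. The following Python sketch is our own illustration (not the authors' code); it assumes PyWavelets is available, that its 'db2' wavelet is the four-tap Daubechies-4 (D4) filter, and the helper name acw_kernel is hypothetical:

```python
# Sketch (assumptions: PyWavelets; 'db2' = four-tap D4 filter).
import numpy as np
import pywt

# Sample the D4 wavelet on a fine grid (implicit function, tabulated once).
_, psi, t = pywt.Wavelet("db2").wavefun(level=10)
dx = t[1] - t[0]

# Psi(x) = integral psi(s) psi(s - x) ds, approximated by discrete correlation.
Psi = np.correlate(psi, psi, mode="full") * dx
lags = (np.arange(Psi.size) - (psi.size - 1)) * dx  # lag axis for Psi

def acw_kernel(X, Y, a=5.0):
    """Auto-correlation wavelet kernel K(x, x') = prod_i Psi((x_i - x'_i)/a).

    X: (n, N) array, Y: (m, N) array -> (n, m) Gram matrix.
    Psi is zero outside its support [-(L-1), L-1].
    """
    diff = (X[:, None, :] - Y[None, :, :]) / a
    vals = np.interp(diff.ravel(), lags, Psi,
                     left=0.0, right=0.0).reshape(diff.shape)
    return vals.prod(axis=-1)
```

As a sanity check on the Theorem, the discrete Fourier transform of the tabulated Ψ should be non-negative up to discretization error, and acw_kernel(X, X) gives the Gram matrix required by the dual problems of Section 2.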

4. Applications of auto-correlation wavelet SVM

In this section, we apply the auto-correlation wavelet SVM to two important applications: function regression and pattern recognition.

4.1. Function regression

We conduct signal regression experiments on a number of signals covering different types of cases: discontinuous, continuous, and infinitely differentiable. Our SVM code is obtained by modifying S. Gunn's Matlab code [16]. The signals used in our experiments are

$$f_1(x) = \begin{cases} 0, & -10 \le x < -7.5,\\ 0.2x + 1.5, & -7.5 \le x < -5.0,\\ 0.5, & -5.0 \le x < -2.5,\\ 0.2x + 1, & -2.5 \le x < 0,\\ -0.2x + 1, & 0 \le x < 2.5,\\ 0.5, & 2.5 \le x < 5.0,\\ -0.2x + 1.5, & 5.0 \le x < 7.5,\\ 0, & 7.5 \le x \le 10, \end{cases}$$

$$f_2(x) = \begin{cases} -2.186x - 12.864, & -10 \le x < -2,\\ 4.246x, & -2 \le x < 0,\\ 10e^{-0.05x - 0.5}\sin\bigl((0.03x + 0.7)x\bigr), & 0 \le x \le 10, \end{cases}$$

$$f_3(x) = \begin{cases} 0, & -10 \le x < -7.5,\\ 1/3, & -7.5 \le x < -5.0,\\ 2/3, & -5.0 \le x < -2.5,\\ 1, & -2.5 \le x < 2.5,\\ 2/3, & 2.5 \le x < 5.0,\\ 1/3, & 5.0 \le x < 7.5,\\ 0, & 7.5 \le x \le 10, \end{cases}$$

$$f_4(x) = \sin\!\left(\frac{6\pi}{20}(10 + x)\right), \quad -10 \le x \le 10.$$

Fig. 2 shows these four original signals, and Fig. 3 shows the regression results for the auto-correlation wavelet kernel with a training sample of 51 points. Each signal is sampled uniformly at 200 points. The training sets in the three experiments contain 51, 26, and 13 points, respectively; the remaining sample points are used for testing.
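For reproducibility, the four signals transcribe directly into code (a plain transcription of the definitions above; Python and NumPy assumed):

```python
# The four test signals of Section 4.1, sampled on 200 uniform points.
import numpy as np

def f1(x):
    # Piecewise-linear profile: linear interpolation through its knots.
    knots_x = [-10, -7.5, -5.0, -2.5, 0.0, 2.5, 5.0, 7.5, 10]
    knots_y = [0, 0, 0.5, 0.5, 1.0, 0.5, 0.5, 0, 0]
    return np.interp(x, knots_x, knots_y)

def f2(x):
    x = np.asarray(x, dtype=float)
    smooth = 10 * np.exp(-0.05 * x - 0.5) * np.sin((0.03 * x + 0.7) * x)
    return np.where(x < -2, -2.186 * x - 12.864,
                    np.where(x < 0, 4.246 * x, smooth))

def f3(x):
    # Symmetric staircase; np.select applies the first matching condition.
    x = np.asarray(x, dtype=float)
    conds = [x < -7.5, x < -5.0, x < -2.5, x < 2.5, x < 5.0, x < 7.5]
    vals = [0.0, 1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]
    return np.select(conds, vals, default=0.0)

def f4(x):
    return np.sin((6 * np.pi / 20) * (10 + np.asarray(x, dtype=float)))

x = np.linspace(-10, 10, 200)  # 200 uniformly sampled points, as in the text
```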

Fig. 1. The Daubechies-4 wavelet function and its auto-correlation kernel.


For comparison, experiments are conducted on the four signal regression problems using the auto-correlation wavelet kernel, the scalar wavelet kernel, the Gaussian kernel, and the exponential radial basis function (RBF) kernel. The Gaussian kernel is defined as $K(x, x') = e^{-\beta\|x - x'\|^{2}}$ and the exponential RBF kernel as $K(x, x') = e^{-\beta\|x - x'\|}$, where in both cases $\beta > 0$ is a parameter chosen by the user. The scalar wavelet kernel is defined as $K(x, x') = \prod_{i}\psi\!\left(\frac{x_{i} - x'_{i}}{a}\right)$, where $a$ is a user-chosen parameter and $\psi(u) = \cos(1.75u)e^{-u^{2}/2}$. Tables 1–3 list the parameters used and the regression mean square errors (MSE) for the four kernels. We adopt $a = 1$ for the scalar wavelet kernel and $\beta = 1$ for both the Gaussian and the exponential RBF kernels; the constants $C$ and $\epsilon$ are set to 1 and 0.05 in all experiments. These parameters are chosen in the same way as in [5], using the widely used cross-validation technique [17,18]. For our auto-correlation wavelet kernel we select $a = 5$, and the D4 wavelet is used to create the kernel.

The auto-correlation wavelet kernel is clearly better than the scalar wavelet kernel, the Gaussian kernel, and the exponential RBF kernel for the first three signals, which are either discontinuous or continuous but not infinitely differentiable. For the infinitely differentiable signal $f_4$, our auto-correlation wavelet kernel does not perform as well as the scalar wavelet kernel. However, infinitely differentiable signals are rare in practice, so we expect the auto-correlation wavelet kernel to perform best for most real-world signals.
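One regression run can then be sketched as follows, reusing the hypothetical acw_kernel and f1–f4 helpers from the earlier sketches; the exact placement of the 51 training points is our assumption, so the resulting errors will only approximate Tables 1–3:

```python
# Sketch of one regression experiment (51 training / 149 test points),
# with scikit-learn standing in for the authors' Matlab code.
import numpy as np
from sklearn.svm import SVR

x = np.linspace(-10, 10, 200).reshape(-1, 1)
idx = np.round(np.linspace(0, 199, 51)).astype(int)  # assumed training grid
test = np.setdiff1d(np.arange(200), idx)

for name, f in [("f1", f1), ("f2", f2), ("f3", f3), ("f4", f4)]:
    y = f(x.ravel())
    svr = SVR(kernel=lambda A, B: acw_kernel(A, B, a=5.0), C=1.0, epsilon=0.05)
    svr.fit(x[idx], y[idx])
    mse = np.mean((svr.predict(x[test]) - y[test]) ** 2)
    print(f"{name}: MSE = {mse:.4f}")
```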

Fig. 2. The four original functions used in the experiments.


Fig. 3. Original functions (solid line) and the resulting approximation by the auto-correlation wavelet kernel (dotted line).

Table 1
Regression error (MSE) using 51 training samples and 149 test samples.

SVM kernel                        SVM parameters            Error, f1  Error, f2  Error, f3  Error, f4
Auto-correlation wavelet kernel   a = 5, C = 1, ε = 0.05    0.0250     0.0461     0.0329     0.0428
Scalar wavelet kernel             a = 1, C = 1, ε = 0.05    0.0387     0.0543     0.0526     0.0353
Gaussian kernel                   β = 1, C = 1, ε = 0.05    0.0399     0.1816     0.0708     0.0369
Exponential RBF kernel            β = 1, C = 1, ε = 0.05    0.0337     0.0609     0.0469     0.0496

Table 2
Regression error (MSE) using 26 training samples and 174 test samples.

SVM kernel                        SVM parameters            Error, f1  Error, f2  Error, f3  Error, f4
Auto-correlation wavelet kernel   a = 5, C = 1, ε = 0.05    0.0314     0.0675     0.0465     0.0472
Scalar wavelet kernel             a = 1, C = 1, ε = 0.05    0.0454     0.1100     0.0506     0.0366
Gaussian kernel                   β = 1, C = 1, ε = 0.05    0.0417     0.1942     0.0730     0.0397
Exponential RBF kernel            β = 1, C = 1, ε = 0.05    0.0410     0.1148     0.0588     0.0703

Table 3
Regression error (MSE) using 13 training samples and 187 test samples.

SVM kernel                        SVM parameters            Error, f1  Error, f2  Error, f3  Error, f4
Auto-correlation wavelet kernel   a = 5, C = 1, ε = 0.05    0.0439     0.2945     0.0675     0.0939
Scalar wavelet kernel             a = 1, C = 1, ε = 0.05    0.0487     0.3518     0.0973     0.0510
Gaussian kernel                   β = 1, C = 1, ε = 0.05    0.0640     0.4024     0.1047     0.0479
Exponential RBF kernel            β = 1, C = 1, ε = 0.05    0.0511     0.3818     0.0750     0.1545

In each table, the lowest error in each column marks the best-performing kernel for that signal.

4.2. Pattern recognition

In this section, we first describe an invariant feature extraction descriptor and then apply it to pattern recognition using the auto-correlation wavelet SVM.

4.2.1. Feature extraction with dual-tree complex wavelets

Kingsbury [19,20] introduced a new kind of wavelet transform, the dual-tree complex wavelet transform, which exhibits approximate shift invariance and improved angular resolution. The success of the transform comes from its use of filters in two trees, a and b. Kingsbury proposed a simple delay of one sample between the level-1 filters of each tree, followed by alternating odd-length and even-length linear-phase filters. As he pointed out, the odd/even filter approach has some difficulties, so he later proposed a Q-shift dual tree [21] in which all filters beyond level 1 have even length. The filters in the two trees are the time-reverses of each other, as are the analysis and reconstruction filters. The new filters are shorter than before, and the transform still offers approximate shift invariance and good directional selectivity in multiple dimensions. As shown below, the dual-tree complex wavelet can be used successfully for invariant feature extraction in pattern recognition.

We propose a novel descriptor that combines the dual-tree complex wavelet transform with the auto-correlation wavelet SVM. To eliminate translation variance, we move the centroid of the pattern to the center of the pattern image, and we normalize the pattern so that it fits into a 32 × 32 image. Because the dual-tree complex wavelet offers shift invariance and good directional selectivity in 2D, we apply the 2D dual-tree complex wavelet transform to the normalized pattern and use the coefficients at different resolution scales as the feature vector for training and testing the SVM; two datasets are therefore needed, one for training and one for testing. Since the dual-tree complex wavelet coefficients have real and imaginary parts, we take the magnitude of each complex coefficient and use it as a feature. The descriptor for each pattern can be summarized as follows:

1. Move the pattern centroid to the center of the pattern image.
2. Scale the pattern so that it fits exactly into a 32 × 32 matrix.
3. Perform the 2D dual-tree complex wavelet transform on the normalized pattern.
4. Train the auto-correlation wavelet SVM with the feature vectors extracted from the training dataset.
5. Test the auto-correlation wavelet SVM to obtain the recognition rates.

The translation invariance of the auto-correlation wavelet kernel, together with the approximate shift invariance and good 2D directional selectivity of the dual-tree complex wavelet, suggests that the new method will compare well with existing pattern recognition methods. Using two-class classifiers, one can construct an n-class classifier as follows [2]:

1. Construct $n$ two-class classification rules, where rule $f_k$ separates the training vectors of class $k$ from the other training vectors ($\operatorname{sign}[f_k(x_i)] = 1$ if vector $x_i$ belongs to class $k$; $\operatorname{sign}[f_k(x_i)] = -1$ otherwise).
2. Construct the $n$-class classifier by choosing the class corresponding to the maximal value of the functions $f_k(x_i)$:

$$m = \arg\max\{f_1(x_i), \ldots, f_n(x_i)\}.$$
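A compact sketch of this one-vs-rest construction (our illustration: scikit-learn's SVC stands in for LIBSVM, and the hypothetical acw_kernel helper from the Section 3 sketch can be passed as the kernel):

```python
# One-vs-rest n-class construction: pick the class whose two-class rule
# f_k attains the maximal value (the argmax rule above).
import numpy as np
from sklearn.svm import SVC

def train_ovr(X, y, kernel, C=1.0, classes=tuple(range(10))):
    # Rule f_k separates class k (label +1) from all other classes (label -1).
    return {k: SVC(kernel=kernel, C=C).fit(X, np.where(y == k, 1, -1))
            for k in classes}

def predict_ovr(rules, X):
    scores = np.column_stack([r.decision_function(X) for r in rules.values()])
    classes = np.array(list(rules.keys()))
    return classes[np.argmax(scores, axis=1)]
```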

4.2.2. Experiments

We conducted experimental evaluations using the CENPARMI handwritten numeral database, which contains the images of 6000 unconstrained handwritten numerals originally collected from dead-letter envelopes by the US Postal Service at different locations. We used 4000 numerals for training and 2000 for testing. A sample of 100 handwritten numerals from this database is shown in Fig. 4. The individual images are already segmented and isolated: within each pattern image there is only one pattern, on a uniform background, and each original pattern is represented by 32 × 32 pixels. Each pattern is first normalized so that it is translation- and scale-invariant, and the dual-tree complex wavelet transform is then applied to the normalized pattern. The wavelet coefficients provide a multi-resolution representation of the original image: the coarse-resolution coefficients capture the global shape of the image, while the fine-resolution coefficients capture its details. Our SVM code was produced by modifying LIBSVM, which is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. This is a 10-class classification problem rather than a 2-class one, and all parameter values are selected using classical cross-validation [17,18]. Using the dual-tree complex wavelet features at different resolution scales and the auto-correlation wavelet kernel, the proposed method achieves a recognition rate of 95.65%. This rate is higher than that reported by Chen et al. [7], who used multiwavelets and neural networks to recognize handwritten numerals and obtained 92.20%. This indicates that the auto-correlation wavelet kernel is a feasible kernel for practical pattern recognition applications.
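The five descriptor steps can be sketched as follows. This is our own illustration under stated assumptions: the Python dtcwt package replaces the authors' transform code, scipy handles the centroid shift, the scale normalization of step 2 is assumed done upstream, and dtcw_features is a hypothetical helper name.

```python
# Sketch of the Section 4.2.1 descriptor (assumes the 'dtcwt' and scipy packages).
import numpy as np
import dtcwt
from scipy.ndimage import center_of_mass, shift

def dtcw_features(img, nlevels=3):
    """Steps 1 and 3: center the 32x32 pattern, then take DT-CWT magnitudes."""
    img = np.asarray(img, dtype=float)
    cy, cx = center_of_mass(img)  # step 1: pattern centroid
    img = shift(img, (img.shape[0] / 2 - cy, img.shape[1] / 2 - cx), order=1)
    pyramid = dtcwt.Transform2d().forward(img, nlevels=nlevels)  # step 3
    # Complex coefficients at each scale/orientation -> magnitude features.
    return np.concatenate([np.abs(h).ravel() for h in pyramid.highpasses])

# Steps 4-5 (numeral images and labels assumed available):
# feats = np.stack([dtcw_features(im) for im in train_images])
# rules = train_ovr(feats, train_labels, kernel=acw_kernel)  # earlier sketch
# preds = predict_ovr(rules, np.stack([dtcw_features(im) for im in test_images]))
```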

Fig. 4. A sample of 100 handwritten numerals in the CENPARMI database.

5. Conclusions and future work

In this paper, we propose using the auto-correlation of a compactly supported wavelet as an SVM kernel and apply it to signal regression and pattern recognition. The kernel is an admissible support vector kernel, and it can be used to approximate arbitrary functions in any dimension. It is not surprising that the auto-correlation wavelet kernel is better than or comparable to other kernels: the auto-correlation of a wavelet exhibits the translation invariance property, whereas the wavelet itself does not. Moreover, for the auto-correlation wavelet kernel we can choose among many families of wavelets, so a wavelet with better properties for a specific application domain can be selected; D4 is exactly such a choice in our experiments. Experiments on signal regression and pattern recognition show that this kernel is a feasible kernel for practical applications. Future work will apply the auto-correlation wavelet kernel to other related practical applications. More generally, the question of how to automatically select the best wavelet family for a specific application remains open.

Acknowledgements

The authors thank the anonymous reviewers whose constructive ideas and suggestions have improved the quality of the paper. This work was supported by a postdoctoral fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Space Agency Postdoctoral Fellowship Supplement. A modified version of the LIBSVM tool (a library for support vector machines), available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, was used for classification.

References


[1] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[2] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[3] C. Cortes, V.N. Vapnik, Support-vector networks, Machine Learning 20 (1995) 273–297.
[4] G.Y. Chen, W.F. Xie, Multiwavelet support vector machines, in: Proceedings of Image and Vision Computing, Dunedin, New Zealand, November 28–29, 2005.
[5] L. Zhang, W. Zhou, L. Jiao, Wavelet support vector machine, IEEE Transactions on Systems, Man, and Cybernetics – Part B 34 (1) (2004) 34–39.
[6] G.Y. Chen, G. Dudek, Auto-correlation wavelet support vector machine and its applications to regression, in: Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, British Columbia, May 9–11, 2005.
[7] G.Y. Chen, T.D. Bui, A. Krzyzak, Contour-based handwritten numeral recognition using multiwavelets and neural networks, Pattern Recognition 36 (7) (2003) 1597–1604.
[8] C.K. Chui, An Introduction to Wavelets, Academic Press, Boston, 1992.
[9] I. Daubechies, Orthonormal bases of compactly supported wavelets, Communications on Pure and Applied Mathematics 41 (1988) 909–996.
[10] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[11] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7) (1989) 674–693.
[12] G. Strang, Wavelets and dilation equations: a brief introduction, SIAM Review 31 (4) (1989) 614–627.
[13] N. Saito, G. Beylkin, Multiresolution representations using the auto-correlation functions of compactly supported wavelets, IEEE Transactions on Signal Processing 41 (1993) 3584–3590.
[14] A. Smola, B. Schölkopf, K.-R. Müller, The connection between regularization operators and support vector kernels, Neural Networks 11 (1998) 637–649.
[15] K.R. Castleman, Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1979.
[16] S. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Image Speech and Intelligent Systems (ISIS) Group, University of Southampton, Southampton, UK, May 1998.
[17] T. Joachims, Estimating the generalization performance of an SVM efficiently, in: Proceedings of the 17th International Conference on Machine Learning, San Francisco, CA, 2000.


[18] M. Kearns, D. Ron, Algorithmic stability and sanity-check bounds for leave-one-out cross-validation, in: Proceedings of the Tenth Annual Conference on Computational Learning Theory, New York, 1997, pp. 152–162.
[19] N.G. Kingsbury, The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement, in: Proceedings of EUSIPCO'98, Rhodes, September 1998, pp. 319–322.
[20] N.G. Kingsbury, Shift invariant properties of the dual-tree complex wavelet transform, in: Proceedings of IEEE ICASSP'99, Phoenix, AZ, March 1999.
[21] N.G. Kingsbury, A dual-tree complex wavelet transform with improved orthogonality and symmetry properties, in: Proceedings of IEEE ICIP, Vancouver, September 11–13, 2000.