Pattern Recognition Letters 24 (2003) 2021–2030 www.elsevier.com/locate/patrec
Shape classification using smooth principal components
R.H. Glendinning *, R.A. Herbert
QinetiQ Ltd., Great Malvern, Worcestershire WR14 3PS, UK
Received 28 March 2002; received in revised form 25 October 2002
Abstract

We suggest and assess a novel approach to shape classification using smooth functional principal components. This gives a rotation, location and scale invariant classifier. Our experiments show that this approach can outperform a number of competitors, including conventional eigenshape methods and time series methods. The degree of smoothing associated with the best classification performance is determined automatically using cross-validation.
© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Shape classification; Smooth functional principal components; Random trigonometric polynomial; Sample spectral function
1. Introduction

Classifying grey level images is a central problem in pattern recognition. Applications include the detection and classification of cancerous structures in mammograms, see Hastie et al. (1999), classifying environmental noise sources from spectrograms (Couvreur et al., 1998), numeral recognition from scanned handwriting (Hastie et al., 1995), image segmentation (Ifarraguerri and Chang, 2000), the classification of sand particles (Drolon et al., 2000) and various object recognition problems, see Glendinning (1999).
This work was carried out as part of Technology Group 10 of the MoD Corporate Research Programme. © QinetiQ 2002.
* Corresponding author. Tel.: +44-1684-894384. E-mail address: [email protected] (R.H. Glendinning).
We focus on procedures based on closed curves extracted from the images of interest. Typical examples are the boundaries of handwritten characters, object silhouettes (Jaggi et al., 1999) or selected level sets of speech spectrograms, see Pinkowski (1993). Formally, let (Ỹ(t), 0 ≤ t < 2π) denote the curve of interest. Our observations are generated by

    X̃(t_k) = Ỹ(t_k) + ζ(t_k),  k = 1, ..., T,    (1)

where (t_k, k = 1, ..., T) are T points sampled from the interval (0 ≤ t < 2π) using an appropriate algorithm. Measurement errors are described by the stationary circular process (ζ(t), 0 ≤ t < 2π). Here X̃(t_k) may be scalar, vector or complex-valued. In the remainder of this document, we assume that X̃(t_k) is the distance from the object centroid. To simplify the notation, we suppress t_k and put X̃ = (X̃_k, k = 1, ..., T) (known as the centroid distance function), with Ỹ describing the analogous quantity for (Ỹ(t_k), k = 1, ..., T).
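A minimal sketch of the centroid distance function X̃ of Eq. (1), assuming the boundary is given as sampled (x, y) points (the function name and data layout are ours, not the paper's):

```python
import numpy as np

def centroid_distance(boundary):
    """boundary: array of shape (T, 2) of points on a closed curve.
    Returns the T distances from the object centroid, i.e. X~_k in Eq. (1)."""
    boundary = np.asarray(boundary, dtype=float)
    centroid = boundary.mean(axis=0)
    return np.linalg.norm(boundary - centroid, axis=1)

# A circle of radius 2 sampled at T = 32 points gives a constant function.
t = 2 * np.pi * np.arange(32) / 32
circle = np.column_stack((2 * np.cos(t), 2 * np.sin(t)))
print(np.allclose(centroid_distance(circle), 2.0))  # prints True
```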
0167-8655/03/$ - see front matter 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0167-8655(03)00040-0
Our aim is to allocate the underlying image to a specified class using the vector X̃. Admissible procedures are typically based on features which are invariant to location, rotation and scale changes. These features characterize significant aspects of the shape of the curves of interest and can be calculated using a range of procedures, see Loncaric (1998), which describes the use of invariant moments, landmarks, deformable templates, time series methods and curve fitting.
2. The features

We adopt a frequency domain viewpoint, see Eom (1998) or Glendinning (1999), to describe the characteristics of a location and scale adjusted version of the sequence X̃. We calculate the sequence X = (X_k, k = 1, ..., T), with X_k = σ̂_X̃⁻¹(X̃_k − ave(X̃)), where ave(X̃) and σ̂_X̃ are the sample mean and variance associated with X̃. We estimate the spectral function f_{X,T}(θ) of X̃ by the sample spectral function or periodogram (I⁽ⁿ⁾_{X,T}(ω_k), k = 1, ..., T), where ω_k = 2πk/T and

    I⁽ⁿ⁾_{X,T}(θ) = (2πT)⁻¹ | Σ_{k=1}^{T} X_k exp(−ikθ) |²,  0 ≤ θ < 2π.    (2)

The underlying motivation of the frequency domain approach is that f_{X,T}(θ) ≈ f_{Y,T}(θ), where the latter is the spectrum of the underlying curve of interest, see Eq. (1). This is a good approximation when the effects of measurement noise are negligible. Our focus is on the vector Z = (I⁽ⁿ⁾_{X,T}(ω_k), ω_k ∈ W), where W is a problem specific window. This is used to limit attention to regions of the spectrum which show significant differences between classes. The vector Z is invariant to rotation, scale and location changes, and is related to the features used in (Drolon et al., 2000) or (Kauppinen et al., 1995). The key step in our approach is to construct a smooth interpolant f_W(θ) of Z over W, and use this function as the fundamental data unit, see Section 3. The choice of the form of f_W(θ) is problem specific, with splines used in (Champely et al., 1997) and Fourier series in our examples, although wavelets may give computational advantages, see Greenshields and Rosiene (1998).

The use of f_W(θ), rather than Z, has a number of advantages. Firstly, it provides a simple means of describing the smoothness of Z or the underlying spectral function f_{Y,T}(θ). The latter is zero outside the Fourier frequencies for stationary circular processes, see Singer and Chellappa (1985). Our approach generates smooth estimates of f_{Y,T}(θ) under less restrictive assumptions than the auto-regressive spectral estimates described in (Eom, 1998; Glendinning, 1999) or Kurita et al. (1994). Auto-regressive estimates are not optimal for data which cannot be adequately described by a low order (or sparse) auto-regression, see Beamish and Priestley (1981). Secondly, we can interpret f_W(θ) as an estimate of the normalized spectral function (over W) associated with the continuous circular process (Ỹ(t), 0 ≤ t < 2π). This is advantageous for data collected at differing sampling rates. In addition, periodic components located between the Fourier frequencies may be less apparent when observations are limited to the Fourier frequencies. The direct use of the trigonometric polynomial I⁽ⁿ⁾_{X,T}(θ) gives enhanced sensitivity in (non-circular) signal detection problems, see Glendinning (1997). We see that f_W(θ) = I⁽ⁿ⁾_{X,T}(θ) when W = [0, 2π) and a Fourier series (with period 2π) is used as an interpolant. In Sections 3 and 4, we show how the function f_W(θ) can be used to generate classification rules, although mixed features of the form (σ̂_X̃⁻¹ ave(X̃), f_W(θ)) can be used with the bivariate analogue of our algorithm. An experiment describing the performance of our approach is described in Section 5, with conclusions summarized in Section 6.
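A minimal sketch of this feature extraction, assuming equally spaced samples (the function name and window representation are ours). The asserts at the end illustrate the claimed invariances: affine (location/scale) changes and circular shifts of the boundary samples, which correspond to rotations of the object, leave Z unchanged.

```python
import numpy as np

# Standardize the centroid distance function and evaluate the periodogram of
# Eq. (2) at the Fourier frequencies omega_k = 2*pi*k/T inside a window W.
def periodogram_features(x_tilde, window):
    """x_tilde: length-T centroid distance function.
    window: indices k with omega_k = 2*pi*k/T in W.
    Returns Z = (I_X(omega_k), k in W)."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    T = len(x_tilde)
    x = (x_tilde - x_tilde.mean()) / x_tilde.std()   # location/scale adjusted
    # |sum_k X_k exp(-i k omega)|^2 / (2 pi T), evaluated via the DFT
    return np.abs(np.fft.fft(x)[np.asarray(window)]) ** 2 / (2 * np.pi * T)

t = 2 * np.pi * np.arange(32) / 32
x = np.sin(t) + 0.3 * np.cos(3 * t)
Z = periodogram_features(x, [1, 2, 3])
# Invariance: affine changes and circular shifts (rotations) leave Z unchanged.
assert np.allclose(Z, periodogram_features(5 * x - 2, [1, 2, 3]))
assert np.allclose(Z, periodogram_features(np.roll(x, 7), [1, 2, 3]))
```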
3. Classifying functional data

Classifying curves presents particular problems due to the high dimensionality of the sampled data and the smooth nature of many curves of interest, see Ramsay and Silverman (1997). An important group of procedures developed for data of this
type adopt a functional viewpoint. These regard the underlying curve as the fundamental data unit, rather than the sampled values. These include penalized discriminant analysis (Hastie et al., 1995), functional canonical correlation (Ramsay and Silverman, 1997), and the direct use of basis function expansions in (Alsberg, 1993). The latter summarizing a range of techniques used in the analysis of chemical spectra. We focus on the general philosophy suggested by Hall et al. (2001). This is based on the use of the principal component scores to describe between class variability for equally sampled data. This technique is phrased in functional data analytic terms, although no smoothing is carried out. Related techniques are described in (Champely et al., 1997), who employ spline based pre-processing and an instrumental variable classifier or Do and Kirk (1999), who use smooth functional principal components to generate smooth estimates of the mean and covariance matrix associates with equally sampled curves. We use an approach suggested by Silverman (1996) to generate smooth functional principal components. This describes the curves of interest ðfW ðhÞÞ using empirically derived eigenfunctions (the functional analogue of eigenvectors). The value of this approach is that the smoothness of the eigenfunctions can be controlled. This is advantageous as it leads to more stable estimates of the associated scores. Recent applications of functional principal components include the analysis of the spatial diversity of ecological population the sample spectral function of time series of differing length, see Bjørnstad et al. (1998). Our approach is related to techniques based on the use simple parametric models of the spectral function, see Eom (1998), Glendinning (1999) or Kurita et al. (1994). However, these procedures are not optimal for data which cannot be described adequately by a low order (or sparse) autoregression, see Beamish and Priestley (1981). 
A novel feature of our algorithm is the use of cross-validation to determine the value of the smoothing parameter which gives the best classification performance. This differs from earlier techniques used to generate an appropriate value of the smoothing parameter for functional principal components in (Silverman, 1996; Do and Kirk, 1999) or Champely et al. (1997), where the degree of smoothing is associated with the best description of the data. In addition, our algorithm can be applied to data sampled at different rates (different values of T), where the direct use of principal components is inappropriate, see Rao (1987). Data of this type is generated by certain boundary sampling schemes (Glendinning, 1999) and decentralized target detection systems with differing communication channel bandwidths.
4. The algorithm

Formally, we have K disjoint classes describing different types of aircraft or machine parts in our experiments. We denote the ith fundamental data unit in the lth class by f_W^{(l(i))}(θ), with θ ∈ W. This curve is observed at the intervals ω_k = 2πk/T in W. The first step in our smooth version of Hall et al. (2001) is to calculate approximations of the form

    S_λ² f_W(θ) ≈ Σ_{j=1}^{V} ⟨γ̃_j(θ), S_λ² f_W(θ)⟩_λ γ̃_j(θ),  θ ∈ W,    (3)

for various values of V. Here S_λ² f_W(θ) is a smoothed version of the fundamental data unit f_W(θ) given by the application of Eq. (5), and (γ̃_j(θ), j = 1, ..., V) is a sequence of V eigenfunctions (whose smoothness depends on λ). This approximation is determined from the pooled training data and the algorithm described in (Silverman, 1996) to extract the eigenfunctions and associated scores (see Appendix A), although there are a number of alternative approaches, see Cardot (2000) or Kneip (1994). We focus on the case where the fundamental data unit is periodic over W, with the more general case described in (Ramsay and Silverman, 1997). All the eigenfunctions and associated inner products in (3) are determined from the coefficients of the interpolant, which is assumed to have integrable second derivatives. We use a Fourier based interpolant for f_W(θ), with the more general case described in (Ramsay and Silverman, 1997). The scores (the coefficients of the eigenfunctions in Eq. (3)) are determined from the relationship
    ⟨g, h⟩_λ = ĝ′ S_λ² ĥ,    (4)

where ĝ and ĥ are the vectors of Fourier coefficients of the functions g and h, and S_λ is the smoothing operator (a diagonal matrix)

    S_λ f = Σ_{v=0}^{2J} s_v f_v φ_v,  s_v = (1 + λ q_v²)^{−1/2},    (5)

where the Fourier representation of f is given by f = Σ_{v=0}^{2J} f_v φ_v and q_v = 4π²v²|W|⁻². We choose J to represent the data Z in the interval W. The scores (⟨γ̃_j(θ), S_λ² f_W(θ)⟩_λ, j = 1, ..., V) are given by (γ̃_j′ f_W, j = 1, ..., V), where γ̃_j and f_W are the vectors of Fourier coefficients associated with γ̃_j(θ) and f_W(θ), see Eq. (4). A Gaussian classifier (based on a multivariate density with unequal covariance matrices and equal priors for each class in our experiments) is constructed from these features, although other classifiers may be appropriate for highly non-Gaussian data, see Hall et al. (2001). The features associated with unclassified curves are determined from the set of eigenfunctions (γ̃_j(θ), j = 1, ..., V) calculated from the training set and f_W(θ), the normalized periodogram associated with the unclassified curve. We select the value of λ and the scores associated with a subset of r eigenfunctions (γ̃_j(θ), j ∈ H(r, λ)) which maximize an estimate of their classification performance. In this way, the smoothing parameter and the number of eigenfunctions and scores are selected using their effect on classification performance. Other approaches to the problem of choosing the number of principal curves are described in (Kneip and Utikal, 2001) or Rajan and Rayner (1997). We use leave-v-out cross-validation, with the percentage of correctly classified spectral functions as the performance measure, although the latter can be replaced by other problem specific metrics. Here v curves in the training set are removed and the eigenfunctions and scores calculated using the remaining data. These scores are used to classify the curves removed from the training set. We take v = 5, and repeat this procedure 50 times to provide stable estimates of classification performance. This process is repeated for a grid of values of λ, typically 0.5 × 10^j for j = 1–5, and an all subset
search of the eigenfunctions, see Prakash and Murty (1995). To reduce computational costs, we only consider the first k eigenfunctions (ordered by the value of their eigenvalues, see Silverman (1996)) for k = 1, ..., T.
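Under our reading of Silverman (1996) and the steps listed in Appendix A, the core of the algorithm might be sketched as follows; all function names and the training-data layout are our own illustrative choices, not the paper's:

```python
import numpy as np

def smoother_weights(n_coef, lam, width=2 * np.pi):
    """Diagonal of S_lambda in Eq. (5): s_v = (1 + lam * q_v^2)^(-1/2),
    with q_v = 4*pi^2*v^2 / |W|^2."""
    v = np.arange(n_coef)
    q = 4 * np.pi**2 * v**2 / width**2
    return (1.0 + lam * q**2) ** -0.5

def fit_eigenfunctions(train_coefs, lam, n_eig):
    """Smoothed functional PCA (Appendix A): half-smooth the Fourier
    coefficients of each training curve, run an ordinary PCA, then smooth
    and renormalize the eigenvectors to obtain eigenfunction coefficients."""
    s = smoother_weights(train_coefs.shape[1], lam)
    smoothed = train_coefs * s                    # apply S_lambda to each curve
    smoothed = smoothed - smoothed.mean(axis=0)   # centre the sample
    _, _, vt = np.linalg.svd(smoothed, full_matrices=False)
    gammas = vt[:n_eig] * s                       # gamma_j = S_lambda a_j
    gammas /= np.linalg.norm(gammas, axis=1, keepdims=True)  # unit norm
    return gammas

def score(curve_coefs, gammas):
    """Scores of one curve: dot products of Fourier coefficient vectors,
    as stated after Eq. (5)."""
    return gammas @ curve_coefs
```

Leave-v-out cross-validation (v = 5, 50 replicates) then simply wraps these functions: for each λ on the grid, refit on the reduced training set, score the held-out curves, classify them with the Gaussian rule, and keep the λ and eigenfunction subset giving the highest average accuracy.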
5. The experiments

We examine the performance of our algorithm using a number of data sets. These include radar returns from helicopter blades in hovering flight, see Glendinning and Goode (2003), selected level sets from speech spectrograms, see Pinkowski (1993), and a number of two-dimensional silhouettes. We present representative results for the two-dimensional silhouettes of selected aircraft and machine parts described in (Eom, 1998). Test and training data are generated by randomly rotating these binary images and extracting a sequence of T points from their boundaries using equi-spaced radial distance sampling, see Kauppinen et al. (1995), where a range of sampling procedures are assessed. We take T = 32 and calculate the distance from the centroid to each point (to give the centroid distance function). This value of T lies in the middle of the range used by Kauppinen et al. (1995), who show that the performance of certain auto-regressive methods tends to improve as T decreases, with the opposite effect for Fourier based procedures. We follow earlier work in this area and contaminate each profile by independent Gaussian noise with zero mean and standard deviation φ times the mean of the centroid distance functions. This is used to generate test and training data with φ = 0.05, φ = 0.1 and, in a limited number of experiments, φ = 0.2. This mimics variations in silhouette boundaries due to changes in pose and pixelation effects. The difficulty of our experiments increases with φ, as the periodogram I⁽ⁿ⁾_{X,T}(θ) moves closer to uniformity (white noise).

We restrict attention to equally sampled data to facilitate comparisons with Eom (1998), and a fixed number of observations for comparisons with the use of (non-functional) principal components, see Rao (1987). The latter is applied to the vector Z = (I⁽ⁿ⁾_{X,T}(ω_k), k = 1, ..., [T/2] − 1) in our experiments, where ω_k = 2πk/T are the Fourier frequencies (with W constructed in an analogous manner). We determine an optimal number of scores using the approach described in Section 4.

5.1. The aircraft silhouettes

We present results for the eight aircraft silhouettes used in the experiments described in (Eom, 1998). These are labeled b1b, DC10, F14a, F16a, (Harrier) GR3, MIG 29, Mirage 5 and Shuttle, see Fig. 1. The training set is made from 50 independent samples of randomly rotated and noise corrupted centroid distance functions for each aircraft type (with the same number of centroid distance functions in the test set). This gives an experiment of approximately the same size as Eom (1998).

5.2. The results

First, we benchmark our algorithm against Eom (1998). This is based on a bivariate representation of the silhouette boundaries, with features extracted from auto-regressive spectral estimates associated with both components (using around 267 points sampled from the silhouette boundaries). The zeros associated with the spectral estimates are the inputs to a neural network classifier.
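The test-data generation described above (random rotation, equi-spaced radial sampling with T = 32, additive Gaussian noise scaled by the mean radius) can be sketched as follows; the paper gives no code, so the interface below is our own illustrative choice:

```python
import numpy as np

def noisy_centroid_profile(radius_fn, T=32, phi=0.05, rng=None):
    """radius_fn(theta) -> boundary radius at angle theta.
    Returns one contaminated centroid distance function as in Eq. (1):
    a random rotation, T equi-spaced radial samples, plus Gaussian noise
    with standard deviation phi times the mean radius."""
    rng = np.random.default_rng(rng)
    offset = rng.uniform(0, 2 * np.pi)               # random rotation
    theta = offset + 2 * np.pi * np.arange(T) / T    # equi-spaced radial grid
    r = radius_fn(theta % (2 * np.pi))
    return r + rng.normal(0.0, phi * r.mean(), size=T)
```

For example, with a constant radius function and phi = 0 the profile is exactly constant, whatever the random rotation.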
All silhouettes are correctly classified for φ = 0.05 using the scores associated with the first eight eigenfunctions smoothed with λ = 10^4. This exceeds the performance (97.8%) reported in (Eom, 1998) for a classifier based on 40 spectral features. The phase information extracted by Eom's sampling scheme can be used by a bivariate analogue of our algorithm.

Next, we compare our approach with the use of (non-functional) principal components on the vectors (I⁽ⁿ⁾_{X,T}(ω_k), k = 1, ..., [T/2] − 1), where ω_k = 2πk/T are the Fourier frequencies. The use of principal components is feasible for uniformly sampled data, see Castro et al. (1986), and indicates the value of smoothing, a key feature of our approach. For φ = 0.05, we note that both procedures give approximately the same performance (100% for our algorithm, against 99.75% for the direct use of the first ten principal component scores). Next, we consider the more challenging scenario generated by φ = 0.1. Here the optimal performance (96% correctly classified) is given by the smoothing parameter λ = 3 × 10^1 and the scores associated with the first seven eigenfunctions, see Table 1. This gives an improvement of 3.5% over the direct use of the first nine principal component scores, see Table 2.

5.3. Machine parts

This experiment is based on the machine parts used in the experiments described in (Eom, 1998).
Fig. 1. Aircraft silhouettes.
Table 1
Confusion matrix for the aircraft silhouettes using functional principal components; results for the scores associated with the first seven principal components are presented for λ = 3 × 10^1 and φ = 0.1

             b1b   DC10   F14a   F16a   GR3   MIG 29   Mirage   Shuttle
b1b           92      0      0      4     2        2        0         0
DC10           0    100      0      0     0        0        0         0
F14a           0      2     98      0     0        0        0         0
F16a           8      0      2     88     0        2        0         0
GR3            0      0      0      0   100        0        0         0
MIG 29         0      0      2      2     0       96        0         0
Mirage         0      0      0      0     0        0       96         4
Shuttle        0      0      0      0     0        0        2        98

Overall, 96% correctly classified.
Table 2
Confusion matrix for aircraft silhouettes using non-functional principal components; results for the scores associated with the first nine principal components are presented for φ = 0.1

             b1b   DC10   F14a   F16a   GR3   MIG 29   Mirage   Shuttle
b1b           88      2      2      4     2        2        0         0
DC10           0    100      0      0     0        0        0         0
F14a           0      2     98      0     0        0        0         0
F16a           6      2     10     80     2        0        0         0
GR3            0      0      0      0   100        0        0         0
MIG 29         6      0      4      6     4       80        0         0
Mirage         0      0      0      0     0        0       98         2
Shuttle        0      0      0      0     0        0        4        96

Overall, 92.5% correctly classified.
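As a quick arithmetic check (ours, not from the paper): with each row of a confusion matrix resting on 50 test curves and summing to 100, the overall percentage correct is simply the mean of the diagonal entries, which reproduces the rates quoted with Tables 1 and 2.

```python
import numpy as np

# Diagonal entries of Tables 1 and 2 (percent correct per class).
table1_diag = [92, 100, 98, 88, 100, 96, 96, 98]   # functional PCs
table2_diag = [88, 100, 98, 80, 100, 80, 98, 96]   # non-functional PCs
print(np.mean(table1_diag))   # prints 96.0
print(np.mean(table2_diag))   # prints 92.5
```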
These are labeled D1–D8 and are presented in Fig. 2. Test and training data are constructed with φ = 0.05 and φ = 0.1. The training set is made from 50 independent samples of randomly rotated and noise corrupted centroid distance functions for each machine part (with the same number of centroid distance functions in the test set). This gives an experiment of approximately the same size as Eom (1998).

5.4. The results

Our approach classifies 97.25% of the machine parts correctly for φ = 0.05 and λ = 5 × 10^1, see Table 3. Here, our classifier uses the scores associated with the first eight eigenfunctions and exceeds the performance (96.3%) of the spectral approach reported in (Eom, 1998) for machine parts with no added noise and 40 features. The use of the scores associated with the first six (non-
Fig. 2. Machine parts.
Table 3
Confusion matrix for machine parts using functional principal components; results for the scores associated with the first eight eigenfunctions are presented for λ = 5 × 10^1 and φ = 0.05

         D1     D2     D3     D4     D5     D6     D7     D8
D1      100      0      0      0      0      0      0      0
D2        0    100      0      0      0      0      0      0
D3        0      0    100      0      0      0      0      0
D4        0      0      0     92      2      0      0      0
D5        0      0      0      0    100      0      0      0
D6        0      0      0      0      0    100      0      0
D7        0      0      0      0      0      0    100      0
D8        0      0      0     14      0      0      0     86

Overall, 97.25% correctly classified.

Table 4
Confusion matrix for machine parts using non-functional principal components; results for the scores associated with the first six principal components are presented for φ = 0.1

         D1     D2     D3     D4     D5     D6     D7     D8
D1      100      0      0      0      0      0      0      0
D2        0    100      0      0      0      0      0      0
D3        0      0    100      0      0      0      0      0
D4        0      0      0     82      2      0      0     16
D5        0      0      0      0    100      0      0      0
D6        0      0      0      0      0    100      0      0
D7        0      0      0      0      0      0    100      0
D8        0      0      0     14      4      0      0     82

Overall, 95.5% correctly classified.
functional) principal components performs less well, with 95.5% correct classification, see Table 4. Next, we consider results for φ = 0.1. Here the optimal performance (87%) is given by the smoothing parameter λ = 10^2 and the scores associated with the first eight eigenfunctions. This gives an improvement of about 1% over the use of the scores associated with the first six (non-functional) principal components.

From the results using aircraft and machine parts, we see that our approach is relatively insensitive to increasing levels of contamination with additive Gaussian noise (with the greatest reduction in performance in the machine part experiments). The performance of Fourier descriptors under increasing noise exhibits similar characteristics in the experiments described in (Kauppinen et al., 1995). This contrasts with the sensitivity of low order auto-regressive shape classifiers. The advantage of smoothing is twofold: measurement noise can be suppressed and the instability of the scores and eigenfunctions reduced (giving improved classification performance). Here instability refers to small changes in the data giving large changes in the values of the scores and eigenfunctions, see Silverman (1996). When silhouettes are subject to substantial levels of additive noise, we suggest the use of a two-stage approach: noise suppression followed by the use of our algorithm on the smoothed data, see Cardot (2000) for related ideas.
6. Conclusions

We suggest a novel shape classification algorithm based on smooth functional principal components. A novel feature of our algorithm is the use of cross-validation to determine the value of the smoothing parameter associated with the best classification performance. This differs from earlier techniques used to generate an appropriate value of the smoothing parameter (Silverman, 1996;
Do and Kirk, 1999) or Champely et al. (1997), where the degree of smoothing is associated with the best description of the data. We benchmark the performance of our approach using (non-functional) principal components analysis (as a representative eigenshape technique) and an algorithm described in (Eom, 1998). The latter is based on the use of a high order auto-regression to estimate the spectral density associated with the components of a bivariate boundary representation. Our approach out-performs both of these techniques in our experiments (which include the same silhouettes used in (Eom, 1998)), although the effect of smoothing is relatively modest. We note that the value of smoothing may be more pronounced in scenarios where the key differences between classes are described by the scores associated with lower order eigenfunctions (associated with directions accounting for a small percentage of the variance), as the latter are more sensitive to smoothing, see Silverman (1996). Note that Prakash and Murty (1995) show that lower order principal components may contain useful information. However, improvements in performance must be balanced against increased computational costs in resource limited applications. The principal contributions to computational costs are feature extraction, the choice of classifier and the construction of an optimal set of features. The costs associated with feature extraction provide a useful way of comparing different algorithms, as techniques can be implemented with different classifiers (low (training) cost classifiers such as a Gaussian classifier or more costly neural network technologies). We estimate that the cost associated with the extraction of functional principal component scores is about three times larger (in terms of FLOPS) than non-functional principal components analysis, the direct competitor of our approach.
The relative costs associated with the extraction of spectral features using autoregressive techniques, see Eom (1998), depend on the order of the models considered, although these are of lower dimension than those used in principal components analysis. While our experiments are based on the same number of equally spaced points from the
boundary of each silhouette, our approach covers unequally sampled data, provided the same number of basis functions are used to describe the curves of interest (and give a non-degenerate matrix decomposition, see Silverman (1996)). When this cannot be guaranteed or there are large numbers of missing values, we suggest the use of the procedures described in (James and Hastie, 2001). Differing numbers of points (T) are generated by certain sampling schemes (see Glendinning (1999)) and decentralized target detection systems with differing communication channel bandwidths. Smooth functional principal components can be used to extract features from scalar or bivariate boundary representations, see Kauppinen et al. (1995), with the analysis of vector-valued curves described in (Ramsay and Silverman, 1997). However, registration is needed to provide rotation and location invariant classification rules. High-dimensional vectors developing smoothly over time (such as the boundary of a moving object or the pixels of the corresponding grey level images, see Murase and Sakai (1996)) also fall within the scope of our general approach. Here our algorithm can be used to classify time trajectories, with functional principal components used to exploit their smoothness (here the focus is on the (high dimensional) trajectories, rather than dimension reduction for each image). Our approach differs from the use of conventional eigenshape methods, see Jain et al. (1998), by the use of smoothing, rotation and location invariance without registration and its applicability to unequally sampled data. Here smooth functional principal components have substantial advantages over the use of non-functional principal components, see Castro et al. (1986) for a general discussion of this issue. The algorithm introduced by Tatum and Hurvich (1993) can be used to calculate a robust analogue of the periodogram. This decreases the effect of local occlusions on the performance of our algorithm.
Wavelets can be used to provide a robust representation of silhouette boundaries in the direct use of functional principal components. Large-scale wavelet coefficients are used to provide an approximation to silhouette boundaries which is insensitive to occlusions, see Jaggi et al. (1999).
Where silhouette boundaries are contaminated by periodic components, we replace the periodogram with the robust estimate described by Von Sachs (1993). Although we make minimal assumptions about the genesis of the curves of interest, we note that our approach can be used to classify data generated by a stochastic process (a time series) using estimates of their spectral function (or its logarithm). Here there are several estimates of f_{X,T}(θ) which can be used as the fundamental data unit, see Diggle and Al Wasel (1997). Note that the characteristic function can be used in an analogous manner for long tailed data. The latter are used to model telecommunications traffic and man-made noise emissions.

Appendix A

We summarize the steps used to calculate the eigenfunctions used in our algorithm. From Silverman (1996), we have:

• Determine the Fourier coefficients associated with each curve.
• Apply the (half-spline) smoother described in Eq. (5) to the sample of Fourier coefficients.
• Perform a standard principal components analysis on the sample of smoothed Fourier coefficients. Let the jth eigenvector be denoted by a_j.
• Smooth the eigenvectors to give an estimate of the Fourier coefficients associated with the corresponding eigenfunctions using γ̃_j = S_λ a_j. Here S_λ is interpreted as a diagonal matrix. Then renormalize to give a unit norm.
• Apply the inverse Fourier transform to γ̃_j to determine the associated eigenfunctions.

References

Alsberg, B.K., 1993. Representation of spectra by continuous functions. J. Chemom. 7, 177–193.
Beamish, N., Priestley, M.B., 1981. A study of AR and window spectral estimation. J. Roy. Statist. Soc. Ser. C 30, 41–58.
Bjørnstad, O.N., Stenseth, N.C., Saitoh, T., Lingjærde, O.C., 1998. Mapping the regional transition to cyclicity in Clethrionomys rufocanus: Spectral densities and functional data analysis. Res. Popul. Ecol. 40, 77–84.
Cardot, H., 2000. Non-parametric estimation of smoothed principal components analysis of sampled noisy functions. J. Nonparametric Statist. 12, 503–538.
Castro, P.E., Lawton, W.H., Sylvestre, E.A., 1986. Principal modes of variation for processes with continuous sample curves. Technometrics 28, 329–337.
Champely, S., Guinand, B., Thioulouse, J., Clermidy, A., 1997. Functional data analysis of curve asymmetry with application to the color pattern of Hydropsyche contubernalis head capsule. Biometrics 53, 294–305.
Couvreur, C., Fontaine, V., Gaunard, P., Mubikangiey, C.G., 1998. Automatic classification of environmental noise events by hidden Markov models. Appl. Acoust. 54, 187–206.
Diggle, P.J., Al Wasel, I., 1997. Spectral analysis of replicated biomedical time series. J. Roy. Statist. Soc. Ser. C 46, 31–71.
Do, K.A., Kirk, K., 1999. Discriminant analysis of event-related potential curves using smoothed principal components. Biometrics 55, 174–181.
Drolon, H., Druaux, F., Faure, A., 2000. Particles shape analysis and classification using the wavelet transform. Pattern Recognition Lett. 21, 473–482.
Eom, K.B., 1998. Shape recognition using spectral features. Pattern Recognition Lett. 19, 189–195.
Glendinning, R.H., 1997. Testing for a jump in the periodogram. J. Statist. Comput. Simul. 56, 117–144.
Glendinning, R.H., 1999. Robust shape classification. Signal Process 77, 121–138.
Glendinning, R.H., Goode, A.J., 2003. Semiparametric classification of noisy curves. Pattern Recognition 36, 35–44.
Greenshields, I.R., Rosiene, J.A., 1998. A fast wavelet-based Karhunen–Loeve transform. Pattern Recognition 31, 839–845.
Hall, P., Poskitt, D.S., Presnell, B., 2001. A functional data-analytic approach to signal discrimination. Technometrics 43, 1–9.
Hastie, T., Buja, A., Tibshirani, R., 1995. Penalized discriminant analysis. Ann. Statist. 23, 73–102.
Hastie, T., Ikeda, D., Tibshirani, R., 1999. Statistical measures for the computer-aided diagnosis of mammographic masses. J. Comput. Graph. Statist. 8, 531–543.
Ifarraguerri, A., Chang, C.I., 2000. Unsupervised hyperspectral image analysis with projection pursuit. IEEE Trans. Geosci. Remote Sensing 38, 2529–2538.
Jaggi, S., Karl, W.C., Mallat, S.G., Willsky, A.S., 1999. Silhouette recognition using high-resolution pursuit. Pattern Recognition 32, 753–771.
Jain, A.K., Zhong, Y., Dubuisson-Jolly, M.P., 1998. Deformable template models: A review. Signal Process 71, 109–129.
James, G.M., Hastie, T.J., 2001. Functional linear discriminant analysis for irregularly sampled curves. J. Roy. Statist. Soc. Ser. B 63, 533–550.
Kauppinen, H., Seppänen, T., Pietikäinen, M., 1995. An experimental comparison of autoregressive and Fourier based descriptors in 2D shape classification. IEEE Trans. Pattern Anal. Machine Intell. 17, 201–207.
Kneip, A., 1994. Nonparametric estimation of common regressors for similar curve data. Ann. Statist. 22, 1386–1427.
Kneip, A., Utikal, K.J., 2001. Inference for density families using functional principal component analysis. J. Amer. Statist. Assoc. 96, 519–542.
Kurita, T., Sekita, I., Otsu, N., 1994. Invariant distance measures for planar shapes based on complex autoregressive model. Pattern Recognition 27, 903–911.
Loncaric, S., 1998. A survey of shape analysis techniques. Pattern Recognition 31, 983–1001.
Murase, H., Sakai, R., 1996. Moving object recognition in eigenspace representation: Gait analysis and lip reading. Pattern Recognition Lett. 17, 155–162.
Pinkowski, B., 1993. Multiscale Fourier descriptors for classifying semivowels in spectrograms. Pattern Recognition 26, 1593–1602.
Prakash, M., Murty, M.N., 1995. A genetic approach for selection of (near-)optimal subsets of principal components for discrimination. Pattern Recognition Lett. 16, 781–787.
Rajan, J.J., Rayner, P.J.W., 1997. Model order selection for the singular value decomposition and the discrete Karhunen–Loeve transform using a Bayesian approach. IEE Proc. Vision Image Signal Process 144, 116–123.
Ramsay, J.O., Silverman, B.W., 1997. Functional Data Analysis. Springer-Verlag, New York.
Rao, C.R., 1987. Prediction of growth curve models (with discussion). Statist. Sci. 2, 434–471.
Silverman, B.W., 1996. Smoothed functional principal components analysis by choice of norm. Ann. Statist. 24, 1–24.
Singer, P.F., Chellappa, R., 1985. Machine perception of partially specified planar shapes. Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, 10–13 June, pp. 497–502.
Tatum, L.G., Hurvich, C.M., 1993. High breakdown methods for time series analysis. J. Roy. Statist. Soc. Ser. B 55, 881–896.
Von Sachs, R., 1993. Estimating the spectrum of a stochastic process in the presence of a contaminating signal. IEEE Trans. Signal Process 41, 323–333.