Nonlinear Analysis 71 (2009) e2934–e2939
Contents lists available at ScienceDirect
Nonlinear Analysis journal homepage: www.elsevier.com/locate/na
SVM-based target recognition from synthetic aperture radar images using target region outline descriptors Georgios C. Anagnostopoulos ∗ Electrical & Computer Engineering Department, Florida Institute of Technology, Melbourne, FL 32901, United States
article
info
Keywords: Pattern recognition Machine vision and scene understanding
abstract The work in this paper explores the discriminatory power of target outline description features in conjunction with Support Vector Machine (SVM) based classification committees, when attempting to recognize a variety of targets from Synthetic Aperture Radar (SAR) images. In specific, approximate target outlines are first determined from SAR images via a simple mathematical morphology-based segmentation approach that discriminates target from radar shadow and ground clutter. Next, the obtained outlines are expressed as truncated Elliptical Fourier Series (EFS) expansions, whose coefficients are utilized as discriminatory features and processed by an ensemble of SVM classifiers. In order to experimentally illustrate the merit of the proposed scheme, this work reports classification results on a 3-class target recognition problem using SAR intensity imagery from the well-known Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. The novel approach was compared to selected methods mentioned in the literature in terms of classification accuracy. The results illustrate that only a small amount of EFS coefficients is necessary to achieve recognition rates that rival other established methods and, thus, target outline information can be a powerful discriminatory feature for automatic target recognition applications relevant to SAR imagery. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction Over the last decade, a sizeable sector of Automatic Target Recognition (ATR) research has focused on recognizing a variety of military ground targets based on Synthetic Aperture Rader (SAR) imagery data. Furthermore, the Moving and Stationary Target Acquisition and Recognition [1] data set, a collection of SAR images taken of soviet made military vehicles, has been extensively used in the past to benchmark proposed ATR approaches. The bulk of successful approaches that have been demonstrated on the MSTAR data can be grouped into two main categories: template-based and target geometrybased. Template-based methods utilize a collection of prototype or reference images that characterize a target class. Rather than using a single prototype for each class, template-based methods utilize reference collections to account for intra-class variability that is due to varying pose or radar illumination of the same target. Previously unseen SAR images are compared in the spatial or frequency domain to the collection of stored templates via the use of some appropriate distance metric and then classified according to the label of the closest prototype as in the 1-Nearest Neighbor rule [2]. Template-based methods themselves can be grouped into two families. The first one, encompasses methods using distance metrics that are based on minimum squared error or maximum likelihood scores. Examples are the Quarter Power, Log Magnitude and Conditionally Gaussian classification models that are surveyed in [3]. Each of these three approaches
∗
Tel.: +1 321 674 7125; fax: +1 321 674 8192. E-mail address:
[email protected].
0362-546X/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.na.2009.07.030
G.C. Anagnostopoulos / Nonlinear Analysis 71 (2009) e2934–e2939
e2935
models pixel intensities in the image as independent identically-distributed random variables. The Quarter Power model assumes a Gamma distribution, the log magnitude model assumes a Log-Normal model, and the conditionally Gaussian model assumes a complex Normal distribution. Subsequently, each pixel is used as a feature in a class membership likelihood calculation. In particular, for the quarter power and log magnitude models, likelihood maximization is shown to be equivalent to squared error minimization, where the error is the difference of intensities between the test image and a template. On occasion all three methods have demonstrated classification accuracies in excess of 95% [3]. Another subfamily of template-based methods employs filter templates and include methods such as Maximum Average Correlation Height (MACH) and the Distance Correlation Classifier Filter (DCCF; see [4]), Extended Maximum Average Correlation Height [5], Polynomial Distance Correlation Classifier Filter (PDCCF; [5]), and Minimum Noise and Correlation Energy (MINACE; [6]). The aforementioned approaches construct correlation filter masks from the training set and label each test pattern according to the strength, with which each class-associated filter responds. Furthermore, each of these filter template methods has demonstrated good performance, usually, in excess of 90%. However, template-matching methods have a few noteworthy disadvantages. The most important one is the inherently elevated space complexity of class templates, which leads to increased computational demands during testing. Apart from the radar’s depression angle, SAR returns of a target also vary significantly with the target’s pose (azimuth) and are, therefore, not invariant to relative rotation of the target. The last fact creates the need to store several templates for the same target to account for a number of different target orientations. For example, [3], consider 72 different target poses for each target class, since the MSTAR dataset provides a target poses for every 5 degrees for a given depression angle. Thus, for the ten-class problem they consider, they need to utilize 720 templates, which decreases significantly the quantity of available training samples per class (about 10 patterns per class in the previous study). The anticipated robustness of the resulting classifier becomes even more questionable, when we take into account the high dimensionality of the patterns employed; each pixel intensity of a SAR image is regarded as a feature, which, for example, leads to 214 features for a 128 × 128 image. A final disadvantage worth mentioning is the fact that the aforementioned methods treat the entire image as information useful for recognition. Obviously, this includes the noisy ground clutter region as well, which, in theory, should not contain any valuable target-discriminating information. The second category of popular approaches to SAR classification include methods that base their decision on the geometrical characteristics of the target and, therefore, significantly reduce the number of classification features employed in comparison to template-matching methods. Rather than considering the raw image intensity values as individual features, these methods preprocess the images and extract a relatively small amount of target geometry-based features that are capable of capturing important information about a target class. For instance, the pose of the target has already been shown to convey information that has a large impact on the classification accuracy of an ATR system [7]. The approximate length and width of the target, the average radar cross section (RCS) and log standard deviation (LSD; see [8]) were also considered as features. The radar cross section is a measure of overall reflectivity of the target, while the LSD measures the intra-pixel variation in intensity in the target region. Using only these 4 features [3] report a classification accuracy of 92%, when the target pose was assumed known a priori. Landmark analysis based on the location and magnitude of intensity peaks was has also been investigated by [9–11,8,12]. Additional features, such as edges and corners have been considered by [13]. An example of features pertaining to the shape of the target region are the Hu Moments [14]. However, the features’ rotationand scale-invariant nature cannot be fully taken advantage of, since in the case of the MSTAR dataset radar returns are not pose-invariant, while the images were constructed using the same scale. Overall, geometry-based approaches have been demonstrating competitive performance to their template-matching alternatives, while limiting the amount of features needed to fulfill the recognition task. In this paper we propose a novel approach that primarily focuses on the economical description of target region outlines via the use of Elliptical Fourier Series (EFS) coefficients, which are subsequently used as discriminatory features by a committee of Support Vector Machine (SVM) classifiers. In this new, geometrical-based method the target region is estimated through a simple and efficient, segmentation pre-processing step, after which the target’s estimated outline is extracted. 26 EFS coefficient-based features are then computed from each target outline and are used for classification. The approach is compared to other established methodologies and experimental results indicate very competitive levels of classification performance, while utilizing only a small number of human-interpretable features. The rest of the paper is organized as follows. Section 2 describes EFS coefficients and some of their important properties, while Section 3 presents the utilized methodology for identifying the target region in a SAR image, the subsequent feature extraction and, finally, the recognition stage. Finally, Section 4 reports on the obtained experimental results and states a few important conclusions. 2. Closed curve description via elliptical Fourier series Assume a parametric, closed curve C on the plane. Since C is closed, then we can assume that that there is a parameterization based on a variable θ , such that C = {(x, y) ⊂ R2 |r(θ ) = [x(θ ) y(θ )]T θ ∈ [0, 2π )} and r(θ + 2π ) = r(θ)∀ θ ∈ R. Let pk (θ ) be the phasor with integer angular frequency k defined as pk (θ )= ˆ
cos(kθ ) sin(kθ )
k = 0, 1, 2, . . . .
(1)
e2936
G.C. Anagnostopoulos / Nonlinear Analysis 71 (2009) e2934–e2939
Under some mild constraints that typically apply to Fourier signal representations, the parametric description r(θ ) of C can be expressed as an Elliptical Fourier Series (EFS), whose synthesis and analysis equations are +∞ X
r(θ ) =
Fk pk (θ )
(2)
k=0
Z Fk =
r(θ )pTk (θ )dθ
2π
k = 0, 1, 2, . . . .
(3)
The matrix quantity Fk ∈ R2×2 is called kth-order EFS coefficient of r(θ ). In essence, this coefficient contains the ordinary kthorder Fourier Series (FS) coefficients of the periodic function x(θ ) in its first row and of y(θ ) in its second row. The coefficient Fo is given as
Fo = ro
|
0
(4)
where ro is the curve’s center of gravity and 0 is the zero column vector. Additionally, Eq. (2) reveals that r(θ ) can be expressed as a superposition of phasors that are rotating at angular speeds that are proportional to their order k. Similar to the ordinary FS representation, the EFS representation features certain important properties. For example, spatial translation of C to C 0 by ∆r yields that r (θ ) = r(θ ) + ∆r 0
⇒
F0o = Fo + ∆r | 0 F0k = Fk k = 1, 2, . . .
(5)
where r0 (θ ) is the representation of the translated curve C 0 . In other words, all EFS coefficients but the 0th-order one remain invariant to translation. Furthermore, any linear transformation A of the curve’s description yields that r0 (θ ) = Ar(θ ) ⇒ F0k = AFk
k = 0, 1, 2, . . . .
(6)
Moreover, if we define the rotation matrix H(φ)= ˆ
cos(φ) sin(φ)
− sin(φ) cos(φ)
φ ∈ (−π , π]
(7)
that rotates a planar vector by an angle φ (in a counter-clockwise direction), we can show that r (φ) = r(φ − ψ) 0
⇒
F0o = Fo F0k = Fk H (kψ)
k = 1, 2, . . . .
(8)
It is also interesting to note that, for non-trivial, planar curves, each EFS coefficient can be associated to an ellipse (or, as a special case, to a circle) with major and minor axis lengths λ1k > 0 and λ2k > 0 respectively with λ1k ≥ λ2k . See, for example, [15]. More specifically, it can be shown that, if the kth EFS coefficient corresponds to a non-trivial ellipse (λ1k ≥ λ2k > 0; for example, unlike Fo , which corresponds to a point), it can be uniquely decomposed as H (φk ) Sk Λk H (ψk ) λk Sk H (ψk )
Fk =
λ1k > λ2k > 0 λ1k = λ2k = λk > 0
k = 1, 2, . . .
(9)
where Λk = diag {λ1k , λ2k } and Sk = diag {1, sk }, where sk = ±1, is a signature matrix. Eq. (9) implies that the kth phasor is moving on the coefficient’s associated ellipse, whose major axis is tilted by φk ∈ (−π /2, π /2] radians with respect to the positive x semi-axis (we will refer to it as rotation). Additionally, if the ellipse’s orientation is adjusted so that φk = 0, for θ = 2nπ , n = 0, ±1, ±2, . . . phasor pk (θ ) will form an angle of ψk ∈ (−π , π] radians with the positive x semi-axis, which we refer to as shift. The phasor’s motion will be counter-clockwise or clockwise, if sk = 1 or sk = −1 respectively. EFS coefficients can be used as shape descriptors to characterize, smoothen or approximate the outline of 2D objects. Among all other Fourier-based descriptors, EFS-based descriptors have certain unique advantages. First, they facilitate an appealing geometric interpretation as just discussed. Secondly, the truncation of the series in Eq. (2) always reconstructs a closed shape boundary, which makes the EFS representation ideal for smoothing outlines in contrast to other Fourier representations (for example, see [16]). Moreover, calculation of EFS coefficients can be easily performed via Fast Fourier Transforms for spatially-sampled curves as illustrated in [17]. Finally, rotation-, scale-, translation- and shift-invariant descriptors can be easily obtained from the EFS form, if desired. 3. Methodology In order for a the target outline to be obtained, a given SAR image needs first to be appropriately segmented, so that its target region can be separated from the shadow and ground clutter regions of the image. An example of these three different regions is shown in Fig. 1. Typically, the target region will consist of a contiguous region of pixel locations indicating strong radar returns. The shadow region consists of all low intensity pixels that appear behind the target, while the clutter region consists of pixels that correspond to noisy radar reflections from the ground surrounding the target. The target region
G.C. Anagnostopoulos / Nonlinear Analysis 71 (2009) e2934–e2939
e2937
Shadow
Clutter
Original Image
Segmented Image
Fig. 1. The left subfigure displays the original SAR image (chip) of a target. The pixel intensities represent magnitudes of the return signals. The right subfigure depicts the typical 3 main subareas in each SAR image containing a target: the target area, which, usually, corresponds to high-magnitude returns, the shadow area, which is due to the particular azimuth and depression angle used when illuminating the target with the radar and, finally, the ground clutter area. Table 1 Confusion matrix for the Quarter Power classification model using the 48 × 48 segmented images. Each data entry of the table, except the ones of the last column and last row, corresponds to the number of instances of the target class designated by the row identifier that were classified as the target class specified by the column identifier. Off-diagonal entries correspond to misclassification errors. The table shows that the most frequent confusions (24 of them) were T62 targets erroneously identified as T72s. Estimated class 2S1 2S1 BMP2 BRDM2 BTR60 BTR70 Actual class D7 T62 T72 ZIL131 ZSU23 4 Estimate accuracy
BMP2
Classification accuracy (%) BRDM2
BTR60
BTR70
D7
T62
T72
ZIL131
ZSU23 4
252 4 6 1 559 7 0 2 246 0 0 0 0 1 5 0 0 0 3 1 0 0 14 0 1 1 0 0 1 0 98.05% 95.88% 93.18% Total classification accuracy
0 2 8 194 0 1 2 0 2 1 92.38%
3 4 9 0 190 1 2 4 6 0 87% 94.10%
0 0 0 0 0 271 0 0 0 2 99.27%
3 0 0 0 0 1 227 7 1 0 95%
2 9 0 1 0 0 24 557 6 9 91.61%
4 2 9 0 0 0 9 0 257 0 91.46%
0 3 0 0 0 0 5 0 0 261 97.03%
91.97 95.23 89.78 99.49 96.94 98.91 83.15 95.70 93.80 95.26
segmentation procedure itself was performed using a two step sequence of basic thresholding and morphological operations in order to keep the implementation simple and computationally efficient. First, a threshold of 90% of the magnitude of the brightest pixel was used to produce a grossly approximated target region. Then, a morphological close operation was applied to the obtained image using a square structuring element. In the case where disconnected regions were obtained, regions that did not contain a pixel that was brighter than 40% of the magnitude of the brightest pixel were discarded. A morphological close operation was applied again on the result with an increasingly larger structuring element until a single target region, whose outline, finally, was extracted. While not being a necessary or required step in the proposed technique, in order to guarantee good segmentation quality, the following procedure was adopted: for each processed image, after segmentation, a new image was constructed from the original, such that the intensities of non-target region pixels (i.e. the ones belonging to the shadow and ground cluster regions) were set to zero. Next, the newly obtained images were classified by an already trained Quarter Power model. If the resulting classification accuracy was maintained, it would be a clear indication that the employed segmentation process preserved almost all discriminatory characteristics of the target and that it provides a reliable estimate of the true target region. Indeed, this was precisely the case, as Table 1 showcases, where the obtained classification results for 48 × 48 images are shown. The accuracy obtained was 94.1% in comparison to 95.5%, which was achieved, when using the raw SAR images. As target region outline descriptors, raw EFS coefficients were employed as classification features. As it was mentioned earlier, target regions strongly depend on the target pose. Additionally, the MSTAR dataset assumes the same scale for all SAR images. Both these facts eliminate the need for scale or rotation invariance as a desirable property. Shift-invariance was not considered in this study, as the method of locating a suitable starting point for the curve was based on estimated pose information and, thus, was deemed, in general, satisfactory. Regarding pose estimation, it is interesting to point out that the first EFS coefficient describes an ellipse, whose orientation and size roughly resemble the ones of the target region, as shown in Fig. 2. Since these EFS coefficients inherently capture pose information, there is no need to estimate the pose in a separate step as is done by other ATR systems. In particular, the first 7 EFS coefficients (orders 0 through 6) were extracted from each
e2938
G.C. Anagnostopoulos / Nonlinear Analysis 71 (2009) e2934–e2939
Fig. 2. Elliptical trajectory of the 1st-order (k = 1) phasor, whose characteristics (size and orientation of axes) are determined by the 1st-order EFS. The major axis orientation may give a reasonable estimation of target pose, while the size of the trajectory is closely related to the size of the target region.
Table 2 Results for the 3-class MSTAR recognition problem using the low-resolution SAR images (48 × 48 pixels). The proposed approach seems to fair quite well, when compared to other established methods. Apparently, it features an advantage of about 3.5% over all other competing approaches. An exception is the Quarter Power-based classification, which seems to indicate a small lead in performance. However, the less than 1% margin turns out not to be significant, whether from a statistical point of view or from a practical significance perspective. Feature set
Classifier
Number of features
Classification accuracy (%)
7 EFS coefficients 80 × 80 normalized image QP normalized image MINACE
DAGSVM SVM Nearest neighbor Max value
26 6400 2304 4096
93.46 90.99 94.10 90.60
target region outline, which resulted in a single 26-dimensional feature vector per SAR image (the 0th-order EFS coefficient contains only 2 non-zero quantities). The Directed Acyclic Graph Support Vector Machine (DAG-SVM; see [18]) multi-class recognition scheme was chosen to fulfill the classification stage role, as SVMs have shown good generalization properties in a variety of domains and recognition settings, especially, when dealing with small cardinality training sets comprised of relatively high-dimensional patterns. 4. Experimental results and conclusions The obtained results for the 3-class MSTAR problem (see [7]) are showcased in Table 2. The table entry associated to an alternative SVM-based classification scheme corresponds to the results of [19]. The MINACE results were taken from [6], while the Quarter Power (QP) results were reported in DeVore and O’Sullivan. We observe that the new EFS-based features in conjunction with the DAG-SVM voting scheme provide competitive performance, when compared to other already established techniques. It seems that the proposed method has an advantage of about 3.5% over the other methods, except the QP method, which appears to perform slightly better (less than 1%). However, it is a method that uses an enormous amount of raw features per image (6400 intensities in contrast to 26 EFS coefficient-related features), its accuracy advantage can be shown to be not statistically significant with 95% confidence and, finally, it does not provide any insight to what humaninterpretable characteristics contribute to accurate target recognition. Finally, we need to note that increasing the number of EFS coefficients to use as features did not significantly improve the recognition results, which may lead us to conclude that, indeed, a small quantity of EFS coefficients may allow for a quite accurate description of the target region outline. On balance, this paper has presented a new geometric-based approach to target recognition using SAR images. The new approach initially estimates and, finally, isolates the target region from a SAR image through an appropriate segmentation procedure that, as shown via an accuracy appraisal of a trained Quarter Power model, is very effective in retaining most (if not all) discriminatory information that is important to recognizing the type of the target. Subsequently, Elliptical Fourier Series (EFS) coefficients computed from the resulting target region outline are used as features in a multiclassSVM classification scheme. It was shown that EFS coefficients have several important properties, which should make them ideal outline descriptors. In this study, 7 coefficients were used resulting into 26 features. The obtained experimental results indicated that the information contained in these 7 coefficients was sufficient to convey the important characteristics of the target region outlines encountered, when using the MSTAR dataset. Furthermore, the specific set of EFS coefficient-based features appears to, indirectly, but successfully, address the target orientation issue, which has been a challenge for other competing methods; the need of target pose estimation is eliminated. Furthermore, the comparative experimental results depict the proposed method as highly competitive in terms of classification performance, while extracting and utilizing a relatively small number of features from the SAR images. In particular, the simple segmentation procedure to segment out the target region, the employment of the Fast Fourier Transform to calculate the required EFS coefficients and the small size
G.C. Anagnostopoulos / Nonlinear Analysis 71 (2009) e2934–e2939
e2939
of the feature set may make the proposed approach even more appealing due to its limited computational burden, when attempting to classify a target in a previously unseen SAR image. Acknowledgements The author acknowledges partial support from the following NSF grants: No. 0647018, No. 0647120, No. 0717680 and No. 0717674. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. Finally, the author would also like to express his gratitude to Louis P. Nicoli for his contributions to this work. References [1] Moving and Stationary Target Acquisition and Recognition (MSTAR) Public Dataset. https://www.sdms.afrl.af.mil/datasets/mstar/index.php. [2] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd ed., John Wiley & Sons. Inc., New York, 2001. [3] M.D. DeVore, J.A. O’Sullivan, A performance-complexity study of several approaches to automatic target recognition from synthetic aperture radar images, IEEE Transactions on Aerospace Electronic Systems 38 (2) (2002) 632–648. [4] A. Mahalanobis, D. Carlson, B.V. Kumar, Evaluation of MACH and DCCF correlation filters for SAR ATR using the MSTAR public database, Proceedings of SPIE, Algorithms for Synthetic Aperture Radar Imagery V 3370 (1998) 460–468. [5] R. Singh, B.V. Kumar, Performance of the extended maximum average correlation height (EMACH) filter and the polynomial distance classifier correlation filter (PDCCF) for multiclass SAR detection and classification, Proceedings of SPIE, Algorithms for Synthetic Aperture Radar Imagery IX 4727 (2002) 265–276. [6] R. Patnaik, D. Casasent, MINACE filter classification algorithms for ATR using MSTAR data, Proceedings of the SPIE, Automatic Target Recognition XV 5807 (2005) 101–111. [7] S. Yijun, Z. Liu, S. Todorovic, J. Li, Synthetic aperture radar automatic target recognition using adaptive boosting, Proceedings of the SPIE, Algorithms for Synthetic Aperture Radar Imagery XII 5808 (2005) 282–293. [8] J. Saghri, C. Guilas, Hausdorff probabilistic feature analysis in SAR image recognition, Proceedings of SPIE, Applications of Digital Image Processing XXVIII 5909 (2005) 21–32. [9] A. Kim, S. Dogan, J. Fisher III, R. Moses, A. Willsky, Attributing scatterer anisotropy for model based ATR, Proceedings of SPIE, Algorithms for Synthetic Aperture Radar Imagery VII 4053 (2000) 176–188. [10] T. Pink, U. Ramanathan, Intelligent selection of useful features for optimal feature-based classification, IEEE International Geoscience & Remote Sensing Symposium 7 (2000) 3012–3014. [11] B. Bhanu, G. Jones III, Exploiting azimuthal variance of scatterers for multiple look SAR recognition, Proceedings of SPIE, Algorithms for Synthetic Aperture Radar Imagery IX 4727 (2002) 290–298. [12] B. Ravichandran, A. Gandhe, R. Smith, XCS for robust automatic target recognition, in: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, 2005, pp. 1803–1810. [13] K. Krawiec, B. Bhanu, Visual learning by co-evolutionary feature synthesis, IEEE Transactions on Systems, Man, And Cybernetics—Part B: Cybernetics 35 (3) (2005) 409–425. [14] Y. Yang, Y. Qiu, C. Lu, 2005. Automatic Target Classification Experiments on the MSTAR SAR Images. SNPD/SAWN, 2–7. [15] F.P. Kuhl, C.R. Giardina, Accuracy of curve approximation by harmonically related vectors with elliptical loci, Computer Graphics and Image Processing 6 (1977) 277–285. [16] E. Persoon, K.S. Fu, Shape discrimination using Fourier descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (3) (1986) 388–397. [17] F.P. Kuhl, C.R. Giardina, Elliptic Fourier features of a closed contour, Computer Graphics and Image Processing 18 (1982) 236–258. [18] J. Platt, N. Cristianini, J. Shawe-Taylor, Advances in Neural Information Processing Systems, 12th ed., MIT Press, Massachusetts, 2000. [19] M. Bryant, F. Garber, SVM classifier applied to the MSTAR public data set, Proceedings of SPIE, Algorithms for Synthetic Aperture Radar Imagery VI 3721 (1999) 355–360.