Model Selection of Sea Clutter Using Cross Validation Method


Procedia Computer Science 158 (2019) 394–400

3rd World Conference on Technology, Innovation and Entrepreneurship (WOCTINE)

Taha Houcine Kerbaa a,b, Amar Mezache a,c,*, Houcine Oudira a,d

a Département d'Electronique, Université Mohamed Boudiaf-M'sila, 28000 M'sila, Algeria
b Laboratoire Analyse des Signaux et des Systèmes (LASS), M'Sila, Algeria
c Laboratoire SISCOM, Constantine 1, Algeria
d Laboratoire de Génie Electrique (LGE), M'Sila, Algeria

Abstract

This work concerns model selection for sea radar clutter, as used in adaptive target detection. Three distributions without thermal noise are considered: K, Pareto type II and compound Gaussian inverse Gaussian (CG-IG), each with scale and shape parameters. The model selection is carried out by comparing the experimental complementary cumulative distribution function (CCDF), drawn from the recorded data intensity, to a set of CCDF curves derived from the underlying models. To do this, the cross validation technique is used after dividing the data set into four segments. The best distribution is the one for which the mean of the mean square errors (MSEs) between the real CCDF curve and the fitted CCDF curve is minimal. To select a suitable statistical model in most cases, fitting comparisons are illustrated using the Intelligent PIxel X-band radar (IPIX) database. From this study, it is shown that the appropriate model is?

© 2019 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 3rd World Conference on Technology, Innovation and Entrepreneurship
Keywords: Model Selection; Cross validation; CCDF; MSE.

1. Introduction

The ability to detect a distant target is one of the key features of radar systems. This is carried out via automatic detection schemes with a high probability of detection and a fixed low false alarm probability. To achieve these properties, correct models of targets and clutter must be selected after investigating backscattered data in terms of radar parameters (i.e., grazing angle, antenna polarization, cell resolution, etc.) and several environmental conditions

* Amar Mezache. E-mail address: [email protected]

1877-0509 © 2019 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 3rd World Conference on Technology, Innovation and Entrepreneurship 10.1016/j.procs.2019.09.067



Taha Houcine Kerbaa et al. / Procedia Computer Science 158 (2019) 394–400


(wind speed, rough sea or land surface, etc.). In this context, radar clutter modeling has attracted the attention of many researchers. Early radar clutter models include the Rayleigh or exponential distribution (Gaussian clutter case) and the Weibull and log-normal distributions (non-Gaussian clutter). The last two models are used when longer tails are observed in the radar backscatter, and are suited to characterizing smooth and rough sea surfaces at specific values of grazing angle and pulse width. With the development of high resolution radars, more useful compound Gaussian models combine both temporal and spatial components. The temporal component is commonly known as speckle and models the constructive and destructive interference effects between multiple scatterers. It is typically associated with the small local wind-driven ripples (capillary waves) on the ocean surface. The spatial component represents changes in the large and medium scale waves which modulate the speckle [1, 2]. In the intensity domain, the speckle component always follows the exponential distribution, whereas the texture component may follow several models such as the gamma, inverse gamma, inverse Gaussian and log-normal distributions [3]. Considering these laws, the K, Pareto type II or generalized Pareto (GP), CG-IG and K-LNT compound Gaussian models were obtained and have been investigated to fit the real IPIX and Ingara databases. It has been found that no single model fits the different data sets in all cases, and sea clutter modeling remains a challenging task. The question is whether one clutter model can achieve the best fit to empirical data in all conditions. This question may be answered in another way by carrying out on-line modeling with an automatic model selection technique, which switches between several clutter models based on a minimum fitting error.

Model selection is a fundamental problem in many data analysis tasks.
It has applications in many signal processing areas including parametric spectrum estimation, system identification, array processing, radar and sonar. Other model selection problems include application of the maximum a posteriori principle, non-linear system modeling, polynomial phase signal modeling, neural networks and biomedical engineering. Among signal processing practitioners, two approaches to model selection have gained popularity and are widely used [1]. These are Akaike's Information Criterion (AIC) [2, 3, 4] and the Bayesian Information Criterion (BIC) [5, 6], which is also known as Rissanen's Minimum Description Length (MDL) [7]. Other model selection methods exist, including the criterion [8], the Cp method [9] and the corrected AIC [10]. Relevant publications on the topic include [11] and references therein. In some isolated cases, many methods produce marginally better (from a practical point of view) model selection results than those based on Akaike's and the MDL criteria. More recently, it has been proposed that the bootstrap be applied to the task of model selection [12]. The main advantage of the bootstrap over classical statistical methods is that it can be applied with minimal assumptions, to scenarios where no information about the underlying distributions involved is available [13]. Although there are many model selection procedures, new techniques that aim to outperform the popular ones continue to be developed. Examples include forward or backward elimination and stepwise methods [14, 15], and the method developed by Shi and Tsai in 1998 based on the generalized Kullback-Leibler information.

In this paper, the cross validation approach is used as a model selection method for sea clutter in adaptive radar detection. The clutter is modeled as a K-distribution, a GP distribution, or a CG-IG distribution with unknown parameters.
This selection method is based on comparing the experimental CCDF to a set of CCDF curves derived from the proposed mathematical models. The best model is the one for which the MSE between the real CCDF curve and the fitted CCDF curve is minimal. From this study, it is shown that the appropriate model is? The rest of the paper is organized as follows. The proposed approach for model selection is explained in section 2. In section 3, the different clutter distribution models are reviewed. In section 4, modeling comparisons using the proposed model selection are presented for real IPIX data. Concluding remarks are given in section 5.

2. Cross Validation Method

Model selection arises in many fields of scientific study, either to corroborate or verify a theory or to prefer one theory among a set of competing hypotheses. Different criteria, both statistical and visual, are used as a basis for selecting one model among several candidate models. This paper proposes a cross validation method for model selection based on curve fitting. Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than other methods. Cross-validation divides the training data into several disjoint parts of approximately equal size. Each


part is selected in turn as the testing data, whereas the remaining parts are used as the training data. The prediction model built on the training data is then applied to predicting the class labels of the testing data. This process is repeated until all parts have been masked once, and then the prediction accuracies across all blinded tests are combined to give an overall performance estimate [16-17]. The proposed CV method in clutter modeling is executed using the following steps:

1. Split the sample into 4 subsets of equal size.
2. For each fold, estimate a model on all the subsets except one.
3. Use the left-out subset to test the model by calculating a CV metric (the MSE).
4. Average the CV metric across subsets to get the CV error.
5. Choose the best model according to the lowest CV error (the average of the MSEs).
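The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic Pareto type II samples, the candidates are limited to the GP model and a plain exponential model (both have closed-form CCDFs), the parameters are fitted by simple moment matching rather than the [zlog(z)] estimator used later in the paper, and the error is taken on log10 CCDF values over the range 10^-3 to 10^-1. All function names (`fit_gp`, `fit_exp`, `ccdf_mse`, `cross_validate`) are hypothetical.

```python
import math
import random

def fit_gp(train):
    """Moment-matching GP fit (an assumption, not the paper's estimator)."""
    n = len(train)
    m1 = sum(train) / n
    m2 = sum(x * x for x in train) / n
    r = max(m2 / (m1 * m1), 2.05)          # guard: r -> 2 is the exponential limit
    alpha = 2.0 * (r - 1.0) / (r - 2.0)    # from E[x^2]/E[x]^2 = 2(alpha-1)/(alpha-2)
    beta = m1 * (alpha - 1.0)              # from E[x] = beta/(alpha-1)
    return lambda t: (beta / (t + beta)) ** alpha

def fit_exp(train):
    """Exponential (Gaussian clutter intensity) fit by the sample mean."""
    m1 = sum(train) / len(train)
    return lambda t: math.exp(-t / m1)

def ccdf_mse(test, model_ccdf, lo=1e-3, hi=1e-1):
    """MSE between log10 empirical and fitted CCDFs where lo <= CCDF <= hi."""
    xs = sorted(test)
    n = len(xs)
    errs = []
    for i, t in enumerate(xs):
        emp = (n - i - 1) / n              # fraction of test samples above t
        if lo <= emp <= hi:
            errs.append((math.log10(emp) - math.log10(model_ccdf(t))) ** 2)
    return sum(errs) / len(errs)

def cross_validate(data, fitters, k=4):
    """Steps 1-4: split into k folds, fit on k-1, test on the left-out fold, average."""
    size = len(data) // k
    folds = [data[i * size:(i + 1) * size] for i in range(k)]
    scores = {}
    for name, fit in fitters.items():
        errs = []
        for i in range(k):
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            errs.append(ccdf_mse(folds[i], fit(train)))
        scores[name] = sum(errs) / k
    return scores

# Synthetic Pareto type II intensity samples via the inverse CCDF (assumed parameters).
random.seed(0)
alpha_true, beta_true = 5.0, 4.0
data = [beta_true * ((1.0 - random.random()) ** (-1.0 / alpha_true) - 1.0)
        for _ in range(20000)]

scores = cross_validate(data, {"GP": fit_gp, "exponential": fit_exp})
best = min(scores, key=scores.get)         # step 5: lowest CV error wins
print(scores, best)
```

Adding the K and CG-IG models would only require two more entries in the `fitters` dictionary, each returning a fitted CCDF function.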

3. Overview of K, GP, and CG-IG Distributions

Compound Gaussian processes describe sea-clutter returns as a rapidly varying speckle component modulated by a slowly varying texture component. In this section, we present three common clutter models, the K, GP (Pareto type II) and CG-IG distributions, which are then fitted to real IPIX data. The K-distribution is characterized by a shape parameter, $\nu$, and a scale parameter, $b$. That is, for a square law detector, the probability density function (PDF) of the envelope, X, is given by Watts [18]:

$$p_X(x) = \int_0^{\infty} p(y)\, p(x \mid y)\, dy = \frac{2c^{\nu+1}}{\Gamma(\nu)}\, x^{(\nu-1)/2}\, K_{\nu-1}\!\left(2c\sqrt{x}\right) \tag{1}$$

where the PDFs of the speckle, X, and the texture, Y, are given respectively by

$$p_{X|Y}(x \mid y) = \frac{\pi}{4y^{2}} \exp\!\left(-\frac{\pi x}{4y^{2}}\right), \quad (0 \le x \le \infty) \tag{2}$$

and

$$p_Y(y) = \frac{2b^{2\nu}}{\Gamma(\nu)}\, y^{2\nu-1} \exp\!\left(-b^{2}y^{2}\right), \quad (0 \le y \le \infty) \tag{3}$$

where $\Gamma(\cdot)$ is the Gamma function, the term $4y^{2}/\pi$ represents the underlying mean intensity of the clutter (which may vary spatially and temporally), $c = b\sqrt{\pi}/2$ and $K_{\nu}(x)$ is a modified Bessel function. For high resolution sea clutter, values of $\nu$ are generally observed in the interval $[0.1, +\infty]$. $\nu = 0.1$ represents very spiky clutter and $\nu = \infty$ represents thermal noise. The corresponding CCDF of (1) is given by

$$P(T) = \int_T^{+\infty} p_X(x)\, dx = \frac{2}{\Gamma(\nu)} \left(c^{2}T\right)^{\nu/2} K_{\nu}\!\left(2c\sqrt{T}\right) \tag{4}$$
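As a numerical check on the K-distribution CCDF, the following sketch evaluates $\frac{2}{\Gamma(\nu)}(c\sqrt{T})^{\nu} K_{\nu}(2c\sqrt{T})$, the form that follows from integrating the PDF in (1), using only the Python standard library. Two things here are assumptions rather than part of the paper: the Bessel function is computed from the standard integral representation $K_{\nu}(x) = \int_0^{\infty} e^{-x\cosh t}\cosh(\nu t)\,dt$ with a trapezoidal rule, and the helper names `bessel_k` and `k_ccdf` are hypothetical. For $\nu = 1/2$ the CCDF reduces to $e^{-2c\sqrt{T}}$, which gives a convenient reference value.

```python
import math

def bessel_k(nu, x, steps=2000):
    """Modified Bessel function K_nu(x), x > 0, by trapezoidal integration of
    exp(-x cosh t) cosh(nu t) over t in [0, t_max]."""
    t_max = math.acosh(1.0 + 700.0 / x)    # beyond t_max the integrand is negligible
    h = t_max / steps
    total = 0.5 * (math.exp(-x)
                   + math.exp(-x * math.cosh(t_max)) * math.cosh(nu * t_max))
    for k in range(1, steps):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total

def k_ccdf(T, nu, c):
    """K-distribution intensity CCDF: 2/Gamma(nu) * (c sqrt(T))^nu * K_nu(2 c sqrt(T))."""
    if T <= 0.0:
        return 1.0
    z = c * math.sqrt(T)
    return 2.0 / math.gamma(nu) * z ** nu * bessel_k(nu, 2.0 * z)

# Reference case nu = 1/2, where the CCDF equals exp(-2 c sqrt(T)).
print(k_ccdf(1.0, 0.5, 1.0), math.exp(-2.0))
```

In practice a library routine such as a scientific-computing package's Bessel function would replace `bessel_k`; the hand-written integral is used here only to keep the sketch self-contained.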

The GP distribution is obtained when the modulation component follows an inverse gamma PDF [19, 20]

$$p_Y(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{-\alpha-1} \exp\!\left(-\frac{\beta}{y}\right) \tag{5}$$

where $\alpha$ is the shape parameter and $\beta$ is the scale parameter. The Pareto type II PDF (without thermal noise) is obtained as [21, 22]

$$p(x) = \frac{\alpha\beta^{\alpha}}{(x+\beta)^{\alpha+1}} \tag{6}$$

Integrating (6) from T to $+\infty$, the corresponding CCDF is obtained in closed form

$$P(T) = \left(\frac{\beta}{T+\beta}\right)^{\alpha} \tag{7}$$
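For reference, the two steps behind the GP expressions can be written out explicitly. The derivation below assumes exponential speckle with mean $y$ in the intensity domain, $p(x \mid y) = \frac{1}{y}e^{-x/y}$, which matches the general statement in the introduction but is not written out for this model in the paper; $\alpha$ and $\beta$ denote the shape and scale of the inverse gamma texture.

```latex
\begin{align*}
p(x) &= \int_0^{\infty} \frac{1}{y}\, e^{-x/y}\,
        \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{-\alpha-1} e^{-\beta/y}\, dy
      = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} y^{-\alpha-2}\, e^{-(x+\beta)/y}\, dy \\
     &= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}}
      = \frac{\alpha\beta^{\alpha}}{(x+\beta)^{\alpha+1}}, \\[4pt]
P(T) &= \int_T^{\infty} \frac{\alpha\beta^{\alpha}}{(x+\beta)^{\alpha+1}}\, dx
      = \beta^{\alpha} \Big[ -(x+\beta)^{-\alpha} \Big]_T^{\infty}
      = \left( \frac{\beta}{T+\beta} \right)^{\alpha}.
\end{align*}
```

The inner integral uses the substitution $u = (x+\beta)/y$, which turns it into $(x+\beta)^{-\alpha-1}\,\Gamma(\alpha+1)$.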

If the inverse Gaussian law is used to describe the modulation component, the CG-IG PDF is obtained. The IG law is presented in [23, 24]

$$p_Y(y) = \frac{\lambda^{1/2}}{\sqrt{2\pi}\, y^{3/2}} \exp\!\left(-\frac{\lambda (y-\mu)^{2}}{2\mu^{2} y}\right) \tag{8}$$

where $\lambda$ is the shape parameter and $\mu$ is the mean. Note that $\lambda$ depends on sea conditions and radar parameters. Spiky clutter corresponds to values of $0 \le \lambda \le 1$, and the exponential distribution or Gaussian clutter is attained for $\lambda \to \infty$. The CG-IG PDF is expressed by

$$p_X(x) = \left[\frac{\sqrt{\lambda}}{(\lambda+2x)^{3/2}} + \frac{\lambda}{\mu(\lambda+2x)}\right] \exp\!\left(\frac{\lambda}{\mu}\left(1 - \sqrt{1+\frac{2x}{\lambda}}\right)\right) \tag{9}$$

The corresponding CCDF of (9) is determined to be

$$P(T) = \left(1+\frac{2T}{\lambda}\right)^{-1/2} \exp\!\left(\frac{\lambda}{\mu}\left(1 - \sqrt{1+\frac{2T}{\lambda}}\right)\right) \tag{10}$$
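The consistency between the CG-IG PDF and CCDF can be checked numerically: the PDF should equal minus the derivative of the CCDF. The sketch below assumes the forms $P(T) = (1+2T/\lambda)^{-1/2}\exp\big(\tfrac{\lambda}{\mu}(1-\sqrt{1+2T/\lambda})\big)$ and $p_X(x) = \big[\sqrt{\lambda}/(\lambda+2x)^{3/2} + \lambda/(\mu(\lambda+2x))\big]\exp\big(\tfrac{\lambda}{\mu}(1-\sqrt{1+2x/\lambda})\big)$; the function names and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def cgig_ccdf(T, lam, mu):
    """CG-IG CCDF with shape lam and mean mu."""
    s = math.sqrt(1.0 + 2.0 * T / lam)
    return math.exp((lam / mu) * (1.0 - s)) / s

def cgig_pdf(x, lam, mu):
    """CG-IG PDF, i.e. minus the derivative of cgig_ccdf with respect to the threshold."""
    s = math.sqrt(1.0 + 2.0 * x / lam)
    bracket = math.sqrt(lam) / (lam + 2.0 * x) ** 1.5 + lam / (mu * (lam + 2.0 * x))
    return bracket * math.exp((lam / mu) * (1.0 - s))

# Central-difference check at an arbitrary threshold (illustrative values).
lam, mu, T, h = 0.8, 1.0, 2.5, 1e-5
slope = -(cgig_ccdf(T + h, lam, mu) - cgig_ccdf(T - h, lam, mu)) / (2.0 * h)
print(cgig_pdf(T, lam, mu), slope)
```

Because both expressions are closed-form, the CG-IG model is inexpensive to evaluate inside a fitting loop compared with the K-distribution, which needs a Bessel function.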

4. Experimental Results

In order to evaluate the performance of the proposed method, real-world IPIX clutter data, collected by the McMaster University (Canada) IPIX radar, are used. IPIX is an experimental X-band search radar, capable of dual polarized and frequency agile operation. The radar site was located to the east of 'Place Polonaise' at Grimsby, Ontario, looking at Lake Ontario from a height of 20 m. The PRF is 1 kHz per polarization, the antenna beam width is 0.9° and there are 60 000 sweeps per cell. We carried out our analysis on many files. The range resolutions of these files are 3, 15 and 30 m, and the number of range cells is 34. The data were previously processed to remove the dc offset and the phase imbalance due to hardware imperfections [25].

The experimental procedure focuses firstly on parameter estimation using the [zlog(z)] estimator and secondly on validation of the models using the real data described above. The MSE values are calculated from the fitted and empirical CCDF curves over the range of CCDF values between $10^{-3}$ and $10^{-1}$. The selected model is the one with the lowest value, which gives the best tail fitting, as several cases show. The MSE values are given in Table 1. For a resolution of 3 m, with HH polarization at the 18th range cell and VV polarization at the 25th range cell, the best modeling performance is obtained by the GP CCDFs, as depicted in Figs. 1 and 4. Note that the K distribution does not accurately fit the tail of the empirical data. Another case uses the same resolution with HH and VV polarizations at the 7th range cell; the same pattern is observed in Figs. 2 and 3. The K distribution gives the lowest MSE values. For a resolution of 15 m with HH polarization at the 12th range cell, the CG-IG distribution provides lower MSE values than the other models, as shown in Fig. 5. The same result is obtained for a resolution of 30 m with VV polarization at the 31st range cell, as depicted in Fig. 6.


Table 1: MSE of K, GP and CG-IG models for HH and VV polarizations.

Model | HH, 3m, 18th cell | HH, 3m, 7th cell | VV, 3m, 25th cell | VV, 3m, 7th cell | HH, 15m, 12th cell | VV, 30m, 28th cell
K     | -2.343            | -3.297           | -3.214            | -3.707           | -2.854             | -1.600
GP    | -2.672            | -3.342           | -3.427            | -3.667           | -2.994             | -1.681
CG-IG | -2.516            | -3.332           | -3.29             | -3.688           | -3.238             | -1.942
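The final selection step, choosing the model with the lowest tabulated error for each data set, can be reproduced directly from the values in Table 1. The short sketch below assumes only that the tabulated numbers are comparable error measures for which a smaller (more negative) value is better; the dictionary layout and key strings are illustrative.

```python
# Values copied from Table 1; keys are polarization, range resolution and range cell.
mse = {
    "HH, 3 m, cell 18":  {"K": -2.343, "GP": -2.672, "CG-IG": -2.516},
    "HH, 3 m, cell 7":   {"K": -3.297, "GP": -3.342, "CG-IG": -3.332},
    "VV, 3 m, cell 25":  {"K": -3.214, "GP": -3.427, "CG-IG": -3.29},
    "VV, 3 m, cell 7":   {"K": -3.707, "GP": -3.667, "CG-IG": -3.688},
    "HH, 15 m, cell 12": {"K": -2.854, "GP": -2.994, "CG-IG": -3.238},
    "VV, 30 m, cell 28": {"K": -1.600, "GP": -1.681, "CG-IG": -1.942},
}

# For each data set, pick the model whose tabulated value is the smallest.
best = {case: min(vals, key=vals.get) for case, vals in mse.items()}

for case, model in best.items():
    print(f"{case}: {model}")
```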

[Figures 1 to 6: each plot shows the real CCDF together with the fitted K, GP and CG-IG CCDFs on a logarithmic vertical axis (CCDFs, from 10^0 down to 10^-3) against the normalized threshold, T (dB), from -10 to 30.]

Fig. 1: Fitted CCDFs for HH polarization, resolution of 3m and 18th range cell.

Fig. 2: Fitted CCDFs for HH polarization, resolution of 3m and 7th range cell.

Fig. 3: Fitted CCDFs for VV polarization, resolution of 3m and 7th range cell.

Fig. 4: Fitted CCDFs for VV polarization, resolution of 3m and 25th range cell.

Fig. 5: Fitted CCDFs for HH polarization, resolution of 15m and 12th range cell.

Fig. 6: Fitted CCDFs for VV polarization, resolution of 30m and 28th range cell.

5. Conclusion

In this paper, we have considered the problem of model selection for sea clutter. The clutter was modeled by the K-distribution, the GP distribution, or the CG-IG distribution. The cross validation approach for model selection was carried out by comparing the experimental CCDF, drawn from the recorded data intensity, to a set of CCDF curves derived from the proposed mathematical models. The best model is the one for which the mean of the mean square errors (MSEs) between the real CCDF curve and the fitted CCDF curve is minimal. To apply the proposed selection method, the real IPIX data were subdivided into four subsets. The cross validation method provides a suitable framework for this selection and quickly produces satisfactory results when a large number of samples is at hand.

References

[1] M. W. Long, Radar Reflectivity of Land and Sea, Third Edition, Artech House, 2001.
[2] L. Rosenberg and G. V. Weinberg, "Performance analysis of Pareto CFAR detectors", International Conference on Radar Systems (Radar 2017), Belfast, UK, Oct 23-26, 2017.
[3] I. Chalabi, A. Mezache, "Estimators of compound Gaussian clutter with log-normal texture", Remote Sensing Letters, Vol. 10, Issue 7, 2019, pp. 709-716.
[4] B. Porat, "Digital Processing of Random Signals. Theory and Methods", Englewood Cliffs, NJ, 1994.
[5] H. Akaike, "Statistical Predictor Identification", Annals of the Institute of Statistical Mathematics, 22: 203–217, 1970.
[6] H. Akaike, "A New Look at the Statistical Model Identification", IEEE Transactions on Automatic Control, 19: 716–723, 1974.
[7] H. Akaike, "Comments on 'On Model Structure Testing in System Identification'", International Journal of Control, 27: 323–324, 1978.
[8] H. Akaike, "An Objective Use of Bayesian Models", Annals of the Institute of Statistical Mathematics, 29: 9–20, 1977.
[9] G. Schwarz, "Estimating the Dimensions of a Model", The Annals of Statistics, 6: 461–464, 1978.
[10] J. Rissanen, "A Universal Prior for Integers and Estimating by Minimum Description Length", The Annals of Statistics, 11: 416–431, 1983.
[11] E. J. Hannan and B. G. Quinn, "The Determination of the Order of an Autoregression", Journal of the Royal Statistical Society, Series B, 41: 190–195, 1979.
[12] C. L. Mallows, "Some comments on Cp", Technometrics, 15: 661–675, 1973.
[13] C. M. Hurvich and C. L. Tsai, "Regression and Time Series Model Selection in Small Samples", Biometrika, 76: 297–307, 1989.
[14] C. R. Rao and Y. Wu, "A Strongly Consistent Procedure for Model Selection in a Regression Problem", Biometrika, 76: 469–474, 1989.
[15] P. M. Djuric, "Using the Bootstrap to Select Models", In Proc. of the IEEE ICASSP 97, vol. V, pp. 3729-3732, Munich, Germany, 1997.
[16] B. Efron and R. Tibshirani, "An Introduction to the Bootstrap", Chapman and Hall, 1993.
[17] H. Linhart and W. Zucchini, "Model Selection", Wiley, 1986.
[18] Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics", Addison Wesley, 3rd Edition, 2002.
[19] H. Shakouri G., S.K.Y. Nikravesh, "A new approach in model selection using fuzzy decision making-a trade off between probability and possibility theories", In Proceedings of the IEEE 2012 International Conference on systems, man and cybernetics, 8-11 Oct. 2000, Nashville, TN, USA.
[20] Knut Baumann, "Cross-validation as the objective function for variable-selection techniques", TrAC Trends in Analytical Chemistry, Vol. 22, no. 6, 395-406 (2003).
[21] S. Watts, "The performance of cell-averaging CFAR systems in sea clutter", IEEE Int. Radar Conf., Alexandria, VA, USA, 7-12 May 2000, pp. 398-403.
[22] A. Balleri, A. Nehorai, J. Wang, "Maximum likelihood estimation for compound-Gaussian clutter with inverse gamma texture", IEEE Trans. Aerosp. Electron. Syst., Vol. 43, no. 2, 775-779 (2007).
[23] L. Rosenberg, S. Bocquet, "The Pareto distribution for high grazing angle sea-clutter", IEEE Int. Geoscience and Remote Sensing Conf., 21-26 July 2013, Melbourne, Australia.
[24] S. Bocquet, "Parameter estimation for Pareto and K distributed clutter with noise", IET Radar Sonar Navig., Vol. 9, no. 1, 104-113 (2014).
[25] M. Sahed, A. Mezache, "Closed-form estimators for the Pareto clutter plus noise parameters based on non-integer positive and negative order moments", IET Radar Sonar Navig., Vol. 11, no. 2, 359-369 (2017).
[26] L. Li, "Joint parameter estimation and target localization for bistatic MIMO radar system in impulsive noise", Signal, Image and Video Processing, Vol. 9, no. 8, 1775–1783 (2015).
[27] R. Zhang, W. Sheng, X. Ma, Y. Han, "Clutter map CFAR detector based on maximal resolution cell", Signal, Image and Video Processing, Vol. 9, no. 5, 1151–1162 (2015).
[28] A. Younsi, M. Greco, F. Gini and A. M. Zoubir, "Performance of the adaptive generalized matched subspace constant false alarm rate detector in non-Gaussian noise: an experimental analysis", IET Radar Sonar and Navigation, Vol. 3, No. 3, pp. 195-202, 2009.