Expert Systems with Applications 38 (2011) 6000–6006
Hybrid intelligent technique for automatic communication signals recognition using Bees Algorithm and MLP neural networks based on the efficient features

Ataollah Ebrahimzadeh Shrme
Faculty of Electrical and Computer Engineering, Babol University of Technology, Shariati Blvd., Babol 4715716467, Iran
Keywords: Communication signal recognition; Bees Algorithm; Multi-layer perceptron; Learning algorithms; Spectral characteristics features; Higher order moments; Higher order cumulants; Feature selection; Optimization
Abstract

Automatic communication signal recognition plays an important role in many novel computer and communication technologies. Most of the proposed techniques can identify only a few kinds of digital signal and/or only low orders of them, and they usually require high signal-to-noise ratios (SNRs). This paper makes two contributions. First, we propose an efficient system that uses a combined set of spectral characteristics, higher order moments up to eighth order and higher order cumulants up to eighth order as the effective features. As the classifier we use a multi-layer perceptron (MLP) neural network. At this stage we investigate different learning algorithms for MLP neural networks, some of which, such as quick prop (QP), extended delta-bar-delta (EDBD), super self-adaptive back propagation (SuperSAB) and conjugate gradient (CG), are applied for the first time in the area of communication signal recognition. Experimental results show that the proposed system discriminates many digital communication signals with high accuracy even at very low SNRs, but it requires a large number of features. Second, in order to reduce the complexity of the recognizer, we propose a novel hybrid intelligent technique in which the classifier design is optimized by the Bees Algorithm (BA), which selects the best features to feed the classifier. Simulation results show that the proposed technique achieves very high recognition accuracy with only seven features selected by the BA.

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction

Automatic communication signal recognition plays an important role in various applications, such as electronic surveillance, interference identification, monitoring, spectrum management, software radios and intelligent modems. Due to the increasing use of digital communication signals in novel technologies, this paper focuses on the recognition of these signals. Automatic signal classification techniques are usually divided into two principal approaches: the decision theoretic (DT) approach and the pattern recognition (PR) approach. DT methods use probabilistic and hypothesis-testing arguments to formulate the recognition problem (Donoho & Huo, 1997; Sue, Jefferson, & Mengchou, 2008; Swami & Sadler, 2000). Pattern recognition approaches, however, do not need such careful treatment and are easy to implement. PR methods can be further divided into two main subsystems: the feature extraction subsystem and the classifier subsystem. The former extracts the features and the latter determines the membership of the signal. Examples of features used are: instantaneous features (Nandi & Azzouz, 1998), higher order statistics (Dobre, Bar-Ness, & Su, 2004; Spooner, 1995), magnitude of the Haar wavelet transform
(Ho, Prokopiw, & Chan, 2000; Ou, Huang, Yuan, & Yang, 2004), constellation shape (Mobasseri, 2000), zero crossings (Hsue & Soliman, 1990), kurtosis of the signal (Deng, Doroslovacki, Mustafa, Jinghao, & Sunggy, 2002), etc. The second subsystem, the pattern recognizer, is responsible for classifying the incoming signal based on the extracted features. It can be implemented in many ways, e.g. as a K-nearest neighbour (KNN) classifier, a fuzzy classifier or a multi-layer perceptron (MLP) neural network (Avci, Hanbay, & Varol, 2007; Chani & Lamontagne, 1993; Iversen, 2003; Mingquan, Xianci, & Leming, 1996; Mingquan, Xianci, & Lemin, 1998; Nandi & Azzouz, 1998; Sehier, 1993). In Chani and Lamontagne (1993), the authors proposed using the MLP neural network with the back-propagation (BP) learning algorithm for automatic signal type identification; they showed that the neural network classifier outperforms other classifiers such as KNN. In Nandi and Azzouz (1998), the authors showed that the neural network classifier has a higher performance than a threshold-based classifier. In Sehier (1993), the author showed that the artificial neural network performs better than the KNN classifier and the well-known binary decision trees. In Avci et al. (2007), the authors carried out a comparative study of feature extraction and classification algorithms based on discrete wavelet decompositions and an adaptive network based fuzzy inference system (ANFIS) for recognition of the considered signal types.
From the published works, it appears clear that in the design of a system for automatic classification of communication signals there are some important issues which, if suitably addressed, may lead to more robust and efficient recognizers. One of these issues is the choice of the classification approach. In particular, we think that despite its great potential, the MLP approach has not received the attention it deserves in the communication signal classification literature. For example, the training algorithm of the neural network classifier has received little attention: most papers have used the BP training algorithm. In this paper we use different learning algorithms for MLP neural networks, some of which are applied for the first time in the area of communication signal recognition. Choosing the right feature set is another issue. Some papers have applied higher order cumulants and others spectral features, separately. In this paper, for the first time, a combined set of spectral features, higher order moments up to eighth order and higher order cumulants up to eighth order is proposed as the effective features. On the other hand, in most of the previous methods the features are not selected in a completely automatic way. Feature selection determines the optimal subset of a given set of features. In this paper we use a novel optimization algorithm, the Bees Algorithm (BA), introduced in Pham et al. (2006): the classifier design is optimized by using the BA to select, from the combined statistical and spectral feature set, the best features that are fed to the classifier. Thus the optimum design of the classifier is another issue that is targeted in this paper. The paper is organized as follows. The feature extraction module is described in Section 2. Section 3 describes the classifier. Section 4 presents the Bees Algorithm and Section 5 describes feature selection using it. Section 6 shows some simulation results. Finally, Section 7 concludes the paper.
2. Feature extraction

In digital communications, based on changes in the frequency, amplitude, phase, or combined amplitude and phase of the message, there are four main digital signal formats: frequency shift keying (FSK), amplitude shift keying (ASK), phase shift keying (PSK) and quadrature amplitude modulation (QAM), respectively (Proakis, 2001). In this paper we consider the following radio signals for recognition: FSK2, FSK4, PSK2, PSK4, ASK4, ASK8, QAM8, QAM16, QAM32, QAM64 and QAM128. To simplify the notation, we denote these signals by P1, P2, P3, P4, P5, P6, P7, P8, P9, P10 and P11, respectively. Finding suitable features is a very important step in the recognition of these radio signals. In this paper a suitable combination of spectral features and higher order moments and cumulants up to eighth order is proposed to characterize the considered signals. The following subsections describe these features.
2.1. Spectral features

Spectral features were demonstrated to be suitable for signals which contain hidden information in a single domain (Nandi & Azzouz, 1998). In this paper, based on the considered digital signals, the following spectral features are selected.

(1) σ_af: the standard deviation of the absolute value of the normalized-centered instantaneous frequency, evaluated over the non-weak segments of the intercepted signal:

$$\sigma_{af} = \sqrt{\frac{1}{C}\left(\sum_{A_n(i)>a_t} f_N^2(i)\right) - \left(\frac{1}{C}\sum_{A_n(i)>a_t} \left|f_N(i)\right|\right)^2} \qquad (1)$$

where $f_N(i) = f_c(i)/r_s$, $f_c(i) = f(i) - m_f$, $m_f = \frac{1}{N}\sum_{i=1}^{N} f(i)$, and C is the number of samples in the non-weak segments, i.e. those whose normalized amplitude $A_n(i)$ exceeds a threshold $a_t$. σ_af can differentiate between the modulation schemes without frequency information and the FSK modulation schemes, and also between FSK2 and FSK4.

(2) c_max: the maximum value of the power spectral density of the normalized-centered instantaneous amplitude of the intercepted signal segment. This feature expresses the character of the signal's envelope and was added to differentiate between the modulation schemes that carry amplitude modulation and those that do not.

(3) σ_fn: the standard deviation of the direct value of the normalized-centered instantaneous frequency, evaluated over the non-weak segments of the intercepted signal:

$$\sigma_{fn} = \sqrt{\frac{1}{C}\left(\sum_{A_n(i)>a_t} f_N^2(i)\right) - \left(\frac{1}{C}\sum_{A_n(i)>a_t} f_N(i)\right)^2} \qquad (2)$$

σ_fn is used to discriminate between FSK2 and FSK4.
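To make these definitions concrete, the following sketch (an illustration added here, not code from the paper) estimates σ_af and σ_fn from a complex base-band segment. The amplitude threshold a_t, the phase-difference frequency estimate and the unit symbol-rate normalization (r_s = 1) are assumptions made for this sketch.

```python
import numpy as np

def sigma_af_fn(s, a_t=1.0):
    """Illustrative estimate of eqs. (1)-(2) for a complex baseband segment s.
    The threshold a_t and the r_s = 1 normalization are assumptions."""
    a_n = np.abs(s) / np.abs(s).mean()                 # normalized instantaneous amplitude
    f = np.diff(np.unwrap(np.angle(s))) / (2 * np.pi)  # instantaneous frequency per sample
    f_n = f - f.mean()                                 # centered (m_f removed), r_s = 1
    mask = a_n[1:] > a_t                               # keep non-weak segments only
    c = mask.sum()
    sigma_af = np.sqrt(np.sum(f_n[mask] ** 2) / c
                       - (np.sum(np.abs(f_n[mask])) / c) ** 2)
    sigma_fn = np.sqrt(np.sum(f_n[mask] ** 2) / c
                       - (np.sum(f_n[mask]) / c) ** 2)
    return sigma_af, sigma_fn
```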
2.2. Higher order moments and higher order cumulants

Probability distribution moments are a generalization of the concept of the expected value. Recall that the general expression for the ith moment of a random variable is given by Nikias and Petropulu (1993):
$$\mu_i = \int_{-\infty}^{\infty} (s - m)^i f(s)\,ds \qquad (3)$$
where m is the mean of the random variable. The definition for the ith moment for a finite length discrete signal is given by:
$$\mu_i = \sum_{k=1}^{N} (s_k - m)^i f(s_k) \qquad (4)$$
where N is the data length. In this study signals are assumed to be zero mean. Thus:
$$\mu_i = \sum_{k=1}^{N} s_k^i f(s_k) \qquad (5)$$
Next, the auto-moment of the random variable may be defined as follows:
$$M_{pq} = E\left[s^{p-q}\,(s^*)^q\right] \qquad (6)$$
where p is called the moment order and s* stands for the complex conjugate of s. Assume a zero-mean discrete base-band signal sequence of the form s_k = a_k + jb_k. Using the definition of the auto-moments, the expressions for different orders may easily be derived. For example:
$$M_{83} = E\left[s^5 (s^*)^3\right] = E\left[(a+jb)^5 (a-jb)^3\right]$$
$$\Rightarrow M_{83} = E\left[\left(a^5 + j5a^4b + j^2 10a^3b^2 + j^3 10a^2b^3 + j^4 5ab^4 + j^5 b^5\right)\left(a^3 - j3a^2b + j^2 3ab^2 - j^3 b^3\right)\right]$$
$$\Rightarrow M_{83} = E\left[a^8 + j2a^7b - j^2 2a^6b^2 - j^3 6a^5b^3 + j^5 6a^3b^5 + j^6 2a^2b^6 - j^7 2ab^7 - j^8 b^8\right]$$
$$\Rightarrow M_{83} = E\left[a^8 + 2a^6b^2 - 2a^2b^6 - b^8\right] \qquad (7)$$

where the remaining purely imaginary cross terms have zero expectation for the symmetric signal constellations considered here.
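As a numerical illustration (added here; not code from the paper), the auto-moments of eq. (6) can be estimated empirically and checked against the PSK2 column of Table 1. The final lines show how a cumulant then follows from moment combinations, using the standard fourth-order relation found in Swami and Sadler (2000).

```python
import numpy as np

def auto_moment(s, p, q):
    """Empirical estimate of M_pq = E[s^(p-q) (s*)^q], eq. (6)."""
    return np.mean(s ** (p - q) * np.conj(s) ** q)

# PSK2 (P3) symbols are +/-1, so every M_pq equals 1 (cf. Table 1).
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=100_000).astype(complex)
print(np.round(auto_moment(s, 8, 3).real, 3))   # ~1.0: M83 = E[a^8 + ...] with b = 0

# Cumulants follow from moment combinations (cf. eq. (11)); e.g. the standard
# fourth-order relation C40 = M40 - 3*M20^2 (Swami & Sadler, 2000):
C40 = auto_moment(s, 4, 0) - 3 * auto_moment(s, 2, 0) ** 2
print(np.round(C40.real, 3))                    # ~-2 for PSK2
```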
Consider a scalar zero-mean random variable s with characteristic function $\hat{f}(t)$. Expanding the logarithm of the characteristic function as a Taylor series, one obtains:
Table 1
Some of the chosen higher order features for a number of the considered digital signal types.

Feature   PSK2    QAM16   QAM64
M41       1       0       0
M61       1       1.32    1.3
M84       1       3.13    3.9
C61       16      2.08    1.797
C80       244     13.99   11.5
C84       244     17.38   0
$$\log \hat{f}(t) = k_1 (jt) + \cdots + \frac{k_r (jt)^r}{r!} + \cdots \qquad (8)$$
The constants k_r in (8) are called the cumulants (of the distribution) of s. The notation for the pth order cumulant is similar to that of the pth order moment. More specifically:
$$C_{pq} = \mathrm{Cum}\big[\underbrace{s,\ldots,s}_{(p-q)\ \text{terms}},\ \underbrace{s^*,\ldots,s^*}_{(q)\ \text{terms}}\big] \qquad (9)$$
For example:

$$C_{81} = \mathrm{Cum}(s, s, s, s, s, s, s, s^*)$$

The nth order cumulant is a function of the moments of orders up to (and including) n. Moments may be expressed in terms of cumulants as:

$$M[s_1, \ldots, s_n] = \sum_{\forall v} \mathrm{Cum}\left[\{s_j\}_{j \in v_1}\right] \cdots \mathrm{Cum}\left[\{s_j\}_{j \in v_q}\right] \qquad (10)$$

where the summation index runs over all partitions v = (v_1, ..., v_q) of the set of indices (1, 2, ..., n), and q is the number of elements in a given partition. Cumulants may also be derived in terms of moments:

$$\mathrm{Cum}[s_1, \ldots, s_n] = \sum_{\forall v} (-1)^{q-1} (q-1)!\; E\Big[\prod_{j \in v_1} s_j\Big] \cdots E\Big[\prod_{j \in v_q} s_j\Big] \qquad (11)$$

where the summation is performed over all partitions v = (v_1, ..., v_q) of the set of indices (1, 2, ..., n). We have considered the second, fourth, sixth and eighth orders of the moments and cumulants as features. These features are: M20, M21, M40, M41, M42, M60, M61, M62, M63, M80, M81, M82, M83, M84, C20, C21, C40, C41, C42, C60, C61, C62, C63, C80, C81, C82, C83 and C84. The total number of statistical features is therefore 28. We have computed all of these features for the considered digital signals. Table 1 shows some of the higher order moments and cumulants for a number of the considered digital signal types. These values are computed under the constraint of unit variance in the noise-free case.

3. Classifier

In this paper an MLP neural network is used as the classifier. An MLP neural network consists of an input layer (of source nodes), one or more hidden layers (of computation nodes) and an output layer (Haykin, 1999). The numbers of nodes in the input and output layers depend on the numbers of input and output variables, respectively. In this paper a single-hidden-layer MLP neural network was chosen as the classifier. Recognition basically consists of two phases: training and testing. In the training stage, weights are calculated according to the chosen learning algorithm. The choice of learning algorithm and its speed is very important for the MLP. The BP algorithm is still one of the most popular algorithms; in BP, a simple gradient descent rule updates the weight values:

$$w_{ij}(t+1) = w_{ij}(t) - \varepsilon \frac{\partial E}{\partial w_{ij}}(t) \qquad (12)$$

where w_{ij} represents the weight from neuron j to neuron i, ε is the learning rate parameter and E represents the error function. However, under certain conditions the BP network classifier can produce non-robust results and easily converge to a local minimum; moreover, its training phase is time consuming. Many new algorithms have been proposed to improve network training, although some of them require much computing power to achieve good training. In this paper different learning algorithms for MLP neural networks are used, some of which are applied for the first time in the area of communication signal recognition. The following subsections briefly describe these algorithms.

3.1. Back propagation with momentum (BP with momentum)

BP (Rumelhart & McClelland, 1986) makes use of gradient descent with a momentum term to smooth out oscillation. It adds an extra momentum parameter, μ, to the weight changes. Eq. (13) gives the weight update for BP with momentum:

$$\Delta w_{ij}(t+1) = -\varepsilon \frac{\partial E}{\partial w_{ij}}(t) + \mu\, \Delta w_{ij}(t-1) \qquad (13)$$

This takes account of the previous weight changes, which leads to a more stable algorithm and accelerates convergence in shallow regions of the cost function.
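As a minimal illustration of eq. (13), the following toy sketch (added here; the ε and μ defaults echo the BP values quoted at the end of Section 3, while the one-dimensional quadratic error is an assumption made for the example) applies the momentum update to a single weight.

```python
def bp_momentum_step(w, grad, prev_dw, eps=0.9, mu=0.6):
    """One update of eq. (13): dw = -eps * dE/dw + mu * dw_prev.
    Defaults follow the BP parameter values quoted later in Section 3."""
    dw = -eps * grad + mu * prev_dw
    return w + dw, dw

# Toy usage on E(w) = w^2, so dE/dw = 2w (eps reduced for this toy problem).
w, dw = 5.0, 0.0
for _ in range(20):
    w, dw = bp_momentum_step(w, 2 * w, dw, eps=0.1, mu=0.6)
print(round(w, 4))  # oscillates but approaches the minimum at w = 0
```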
3.2. Quick prop (QP) algorithm

The quick prop (QP) algorithm does not need much computation. QP calculates the update for each weight separately, using a Newton-like method to minimize the error in that weight dimension. As reported by Fahlman (1988), two major assumptions are made in the development of the QP algorithm: (i) the error-versus-weight curve for each weight is assumed to be a convex parabola, and (ii) the change in the slope of the error curve with respect to each weight is not affected by the other weights that change at the same time. The weight update rule for QP is given by the following expressions:
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) \qquad (14)$$
$$\Delta w_{ij}(t) = \frac{g_{ij}(t)}{g_{ij}(t-1) - g_{ij}(t)}\, \Delta w_{ij}(t-1) \qquad (15)$$
$$g_{ij}(t) = \frac{\partial E}{\partial w_{ij}} \qquad (16)$$
where w_{ij} represents the weight from neuron j to neuron i and E represents the error function. Some heuristics are used with QP. If g_{ij}(t) and g_{ij}(t−1) have the same sign, then a gradient-descent term proportional to g_{ij}(t) is also added to Δw_{ij}(t). The gradient g_{ij}(t) is modified by weight decay, given by:
$$g_{ij}(t) = g_{ij}(t) + 0.0001\, w_{ij}(t) \qquad (17)$$
Also the derivative of the sigmoid function f is altered to be:
$$f'_j = o_j(1 - o_j) + 0.1 \qquad (18)$$
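A compact sketch of the QP step for a single weight follows (an illustration, not the paper's implementation). The secant step follows eqs. (14)-(16); the gradient-descent bootstrap and the step-growth limit are common QP safeguards assumed here rather than details taken from the paper.

```python
def quickprop_step(w, g, g_prev, dw_prev, eps=0.5, max_growth=1.75):
    """One QP update: secant step dw = g(t)/(g(t-1)-g(t)) * dw(t-1), eq. (15),
    applied through eq. (14). eps and max_growth are illustrative values."""
    if dw_prev == 0.0 or g_prev == g:
        dw = -eps * g                        # plain gradient step to bootstrap
    else:
        dw = g / (g_prev - g) * dw_prev      # eq. (15): parabola minimum
        if abs(dw) > max_growth * abs(dw_prev):
            # cap the step so it cannot grow without bound
            dw = max_growth * abs(dw_prev) * (1 if dw > 0 else -1)
        if g * g_prev > 0:                   # same sign: add a descent term
            dw += -eps * g
    return w + dw, dw                        # eq. (14)
```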
3.3. SuperSAB

SuperSAB (Tollenaere, 1990) is an adaptive learning rate algorithm. Each weight w_{ij}, connecting node j with node i, has its own learning rate η_{ij} that can vary in response to the error surface. The weight update is:
$$\Delta w_{ij}(t+1) = -\eta_{ij}(t) \frac{\partial E}{\partial w_{ij}}(t) + \mu\, \Delta w_{ij}(t-1) \qquad (19)$$

$$\eta_{ij}(t) = \begin{cases} \eta^{+}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t)\, \dfrac{\partial E}{\partial w_{ij}}(t-1) > 0 \\[6pt] \eta^{-}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t)\, \dfrac{\partial E}{\partial w_{ij}}(t-1) < 0 \end{cases} \qquad (20)$$

with η+ > 1 > η−. The parameters η+ and η− are the increment and decrement factors, respectively.
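The per-weight rule of eqs. (19) and (20) can be sketched element-wise over a weight vector as follows (illustrative only; the η+, η− and μ values here are assumptions, not the paper's settings).

```python
import numpy as np

def supersab_step(w, g, g_prev, eta, dw_prev, mu=0.5, eta_up=1.05, eta_down=0.5):
    """One SuperSAB update over arrays of weights w, gradients g/g_prev,
    per-weight learning rates eta and previous changes dw_prev."""
    same_sign = g * g_prev > 0
    eta = np.where(same_sign, eta * eta_up, eta * eta_down)  # eq. (20)
    dw = -eta * g + mu * dw_prev                              # eq. (19)
    return w + dw, eta, dw
```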
3.4. Conjugate gradient (CG)

CG (Battiti, 1992) is a method from the field of optimization which makes use of the second derivative of the error surface. If the error surface were quadratic, this method would reach the minimum in N steps, where N is the number of weights. Each weight update is conjugate to the previous one, and requires a line search to calculate its length. The algorithm calculates a search vector which is the direction of the weight step, and the line search finds the minimum point along that vector. For non-quadratic functions the algorithm will not reach the minimum in N steps, and it needs to 'restart' or re-evaluate its parameters. No restart procedure was used in this implementation, since the algorithm always reached the target error in fewer than N steps. Line minimization methods are too lengthy to list in full; however, the accuracy is the important figure. The algorithm used here performed an exact search, locating the minimum within a certain interval whose length was at most 5% of the length of the previous weight step. The update rule is as follows:

$$\Delta w(t+1) = \eta(t)\, d(t) \qquad (21)$$
$$d(t) = -g(t) + \beta(t-1)\, d(t-1) \qquad (22)$$

where g(t) denotes the gradient and η(t) the step length found by the line search. The Polak-Ribiere rule is used for calculating β(t−1), given by:

$$\beta(t-1) = \frac{g(t)^{T}\left(g(t) - g(t-1)\right)}{\left\|g(t-1)\right\|^{2}} \qquad (23)$$

3.5. Extended delta-bar-delta algorithm

As its name implies, this algorithm (Jacobs, 1988) is an extension of DBD (Minai & Williams, 1990). It also aims to decrease the training time for MLPs. The changes in the weights are calculated as:

$$\Delta w_{ij}(t+1) = -\alpha_{ij}(t) \frac{\partial E}{\partial w_{ij}}(t) + \mu_{ij}\, \Delta w_{ij}(t-1) \qquad (24)$$

where α_{ij} and μ_{ij} are the learning and momentum coefficients, respectively. The use of momentum in EDBD is one of the differences between it and DBD. The learning coefficient change is given as:

$$\Delta\alpha_{ij}(t) = \begin{cases} A_a \exp\left(-c_a \left|D_{ij}(t)\right|\right), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) > 0 \\[6pt] -u_a\, \alpha_{ij}(t), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) < 0 \\[6pt] 0, & \text{otherwise} \end{cases} \qquad (25)$$

where A_a, u_a and c_a are positive constants and D_ij is computed as a weighted average of ∂E/∂w_ij(t−1) and ∂E/∂w_ij(t−2), given by D_ij = (1−h) ∂E/∂w_ij(t−1) + h ∂E/∂w_ij(t−2), in which h is the convex factor. The momentum coefficient change is obtained as:

$$\Delta\mu_{ij}(t) = \begin{cases} A_l \exp\left(-c_l \left|D_{ij}(t)\right|\right), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) > 0 \\[6pt] -u_l\, \mu_{ij}(t), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) < 0 \\[6pt] 0, & \text{otherwise} \end{cases} \qquad (26)$$

where A_l, u_l and c_l are positive constants. Note that increments in the learning coefficient are not constant, but vary as an exponentially decreasing function of the magnitude of the weighted average gradient component D_ij. To prevent oscillations in the values of the weights, α_ij and μ_ij are kept below preset upper bounds α_max and μ_max. The values of the training parameters adopted for the algorithms were determined empirically. They were as follows: for BP: ε = 0.9 and μ = 0.6; for EDBD: α_max = μ_max = 2, A_a = 0.095, A_l = 0.01, u_a = 0.1, u_l = 0.01, c_a = 0.0, c_l = 0.0, and h = 0.7.

4. Bees Algorithm

The Bees Algorithm is an optimization algorithm inspired by the natural foraging behavior of honey bees (Pham et al., 2006). Fig. 1 shows the pseudo code for the algorithm in its simplest form. The algorithm requires a number of parameters to be set, namely: the number of scout bees (n), the number of patches selected out of the n visited points (m), the number of elite patches out of the m selected patches (e), the number of bees recruited for the best e patches (nep), the number of bees recruited for the other (m−e) selected patches (nsp), the size of the patches (ngh) and the stopping criterion. The algorithm starts with the n scout bees being placed randomly in the search space. The fitness of the points visited by the scout bees is evaluated in step 2. In step 4, the bees that have the highest fitness are designated as "selected bees" and the sites they visited are chosen for neighborhood search. Then, in steps 5 and 6, the algorithm conducts searches in the neighborhoods of the selected bees, assigning more bees to search near the best e bees. The bees can be chosen directly according to the fitness associated with the points they are visiting; alternatively, the fitness values can be used to determine the probability of a bee being selected. Searches in the neighborhoods of the best e bees, which represent the more promising solutions, are made more detailed by recruiting more bees to follow them than for the other selected bees. Together with scouting, this differential recruitment is a key operation of the Bees Algorithm. In step 6, for each site only the bee with the highest fitness is selected to form the next bee population. In nature there is no such restriction; this constraint is introduced here to reduce the number of points to be explored. In step 7, the remaining bees in the population are assigned randomly around the search space, scouting for new potential solutions. These steps are repeated until a stopping criterion is met. At the end of each iteration the colony has two parts to its new population: representatives from each selected patch, and other scout bees assigned to conduct random searches.

1. Initialize population with random solutions.
2. Evaluate fitness of the population.
3. While (stopping criterion not met): form new population.
4. Select elite bees for neighborhood search. Select other bees for neighborhood search.
5. Recruit bees for selected bees and evaluate fitness.
6. Select the fittest bee from each site.
7. Assign remaining bees to search randomly and evaluate their fitness.
8. End While.

Fig. 1. Pseudo code of the basic Bees Algorithm.
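A minimal sketch of the basic Bees Algorithm of Fig. 1 is shown below for a continuous toy search space, with the parameter values of Table 2 as defaults. The paper applies the same scheme to binary feature-selection strings (Section 5); the toy fitness function and the uniform patch sampling here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bees_algorithm(fitness, dim, bounds, n=25, m=5, e=2, nep=20, nsp=15,
                   ngh=0.1, iters=100):
    """Basic Bees Algorithm (Fig. 1) on a continuous search space;
    defaults follow Table 2."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (n, dim))                    # step 1: random scouts
    for _ in range(iters):                                 # step 3
        pop = pop[np.argsort([-fitness(b) for b in pop])]  # steps 2/4: rank sites
        new_pop = []
        for i in range(m):                                 # steps 4-6: local search
            recruits = nep if i < e else nsp               # more bees on elite sites
            patch = pop[i] + rng.uniform(-ngh, ngh, (recruits, dim))
            patch = np.clip(patch, lo, hi)
            best = max(list(patch) + [pop[i]], key=fitness)
            new_pop.append(best)                           # fittest bee per site
        scouts = rng.uniform(lo, hi, (n - m, dim))         # step 7: fresh scouts
        pop = np.vstack([new_pop, scouts])
    return max(pop, key=fitness)

# Toy usage: maximise -(x^2 + y^2); the optimum is at the origin.
best = bees_algorithm(lambda b: -np.sum(b ** 2), dim=2, bounds=(-5, 5))
print(np.round(best, 3))
```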
5. Bees Algorithm for feature selection

From Section 2 it can be seen that the total number of features is large, which causes high computational complexity in the recognizer.

Table 2
The parameters of the Bees Algorithm.

Bees Algorithm parameter                        Symbol   Value
Number of scout bees                            n        25
Number of selected sites                        m        5
Number of elite bees                            e        2
Initial patch size                              ngh      0.1
Number of bees around elite points              nep      20
Number of bees around other selected points     nsp      15
On the other hand, although some of these features may carry good classification information when treated separately, there is little gain in combining them all, because they share the same information content. The easiest way to reduce the number of features is feature selection. Feature subset selection algorithms can be classified into two categories, based on whether or not feature selection is done independently of the learning algorithm used to construct the classifier. If feature selection is performed independently of the learning algorithm, the technique is said to follow a filter approach; otherwise, it is said to follow a wrapper approach (Kohavi & John, 1997). In this paper we have used the wrapper approach. As mentioned previously, the proposed method employs MLP networks as classifiers to guide the selection of features. This guidance is provided in the form of feedback to the selection process as to how well a given set of features characterizes patterns from different classes. The method also requires a data set for use in the feature selection process. The data set comprises patterns, each with Nt features; the classes of all the patterns in the training set are known. From the original data sets, new data sets can be constructed in which the patterns retain only a subset of the original features. In other words, a pattern in a new data set has Ns features selected from the original set of Nt features. A bee represents a subset of Ns features. It can be uniquely identified by a binary string (e.g. 010110111) in which the total number of bits is Nt and the total number of non-zero bits is Ns. The position of a bit along the string indicates a particular feature: if a feature is selected to form a data set, the corresponding bit is 1; otherwise it is 0. Feature selection starts with the random generation of a population of binary strings (bees). For each string, a data set is constructed using the selected features specified in the string. Part of the data (the training data) is used to train an MLP; the remaining data (the test data) is employed to evaluate the classification accuracy of the trained MLP. As can be seen in Fig. 1, the Bees Algorithm involves neighborhood searching; in this work, this means generating and evaluating neighbors of the fittest bees. Various operators could be employed to create neighbors of a given bee, including monadic operators such as mutation, inversion, swap and insertion (single or multiple). Table 2 shows the parameter values adopted for the Bees Algorithm; the values were decided empirically. A sketch of the resulting fitness evaluation is given below.
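The following sketch shows what the wrapper-style fitness of a single bee could look like. It is an illustration under stated assumptions: scikit-learn's MLPClassifier stands in for the paper's single-hidden-layer MLP, and the split ratio and network settings are assumed, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def bee_fitness(bitstring, X, y):
    """Wrapper fitness of one bee: train an MLP on the features whose bits
    are 1 and return the test-set accuracy as the fitness value."""
    cols = np.flatnonzero(bitstring)          # indices of the selected features
    if cols.size == 0:
        return 0.0                            # an empty subset is worthless
    Xtr, Xte, ytr, yte = train_test_split(X[:, cols], y, test_size=0.3,
                                          random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
    clf.fit(Xtr, ytr)
    return clf.score(Xte, yte)

# A bee such as "010110111..." is simply a 0/1 vector over the Nt = 31
# candidate features, with exactly Ns of its bits set to 1.
```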
Table 3
Performance of the BP MLP recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         78.12      72.35     83.55      83.45
0          83.45      80.24     88.25      87.25
4          86.25      85.42     95.85      95.55
8          97.05      96.45     98.90      98.75
12         98.25      97.65     99.12      99.04
Table 4
Performance of the CG recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         81.24      80.45     82.33      81.25
0          86.46      84.46     86.35      85.55
4          93.88      93.25     95.35      94.44
8          98.95      97.55     98.95      98.95
12         99.25      99.10     99.28      99.15
Table 5
Performance of the SuperSAB recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         80.14      79.85     84.65      83.76
0          87.22      85.45     89.12      88.52
4          95.12      94.25     97.20      96.55
8          97.55      97.15     98.85      98.85
12         98.50      98.45     99.20      99.14
Table 6
Performance of the EDBD recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         85.12      84.12     87.25      86.50
0          90.28      89.42     93.60      93.42
4          97.85      96.78     97.80      97.66
8          98.25      97.96     99.30      99.22
12         99.20      98.85     99.35      99.24
6. Simulation results

This section presents some simulation results for the proposed recognizer. We assumed that the carrier frequencies were estimated correctly (or known), so we only considered complex base-band signals. Gaussian noise was added according to SNRs of -4, 0, 4, 8, and 12 dB. For each signal type 2000 samples were used. The number of output layer nodes is equal to the number of signals. Tan-sigmoid and logistic activation functions were used in the hidden and output layers, respectively. The target MSE was set to 10^-6, and the MLP classifier was allowed to run for up to 2000 training epochs. The number of neurons in the hidden layer was determined manually. A sketch of the noise model is given below.
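The additive-noise setup can be sketched as follows (illustrative; the QPSK-like test sequence and the unit-power normalization are assumptions made for the example).

```python
import numpy as np

rng = np.random.default_rng(2)

def add_awgn(s, snr_db):
    """Add complex white Gaussian noise to the baseband signal s so that the
    resulting sample SNR equals snr_db."""
    p_sig = np.mean(np.abs(s) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(s.shape)
                                    + 1j * rng.standard_normal(s.shape))
    return s + noise

# e.g. 2000 QPSK-like samples at each simulated SNR
s = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 2000)))
for snr in (-4, 0, 4, 8, 12):
    noisy = add_awgn(s, snr)
```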
Table 7
Performance of the QP recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         89.35      88.75     90.86      90.76
0          93.63      92.60     95.76      93.52
4          97.80      97.46     98.78      96.55
8          98.85      98.32     99.38      98.85
12         99.55      99.40     99.62      99.44
Table 8
Performance comparison of the different training algorithms in terms of number of passed epochs and testing accuracy, for NNHL = 30, at different SNR values.

SNR    BP with momentum    CG                 SuperSAB           EDBD               QP
(dB)   Epochs   Testing    Epochs   Testing   Epochs   Testing   Epochs   Testing   Epochs   Testing
-4     1245     72.35      565      80.45     285      79.85     725      84.12     350      88.75
0      685      80.24      442      84.46     280      85.45     324      89.42     128      92.60
4      1575     85.42      320      93.25     450      94.25     565      96.78     425      97.46
8      978      96.45      280      97.55     295      97.15     364      97.96     268      98.32
12     1440     97.65      658      99.10     426      98.45     485      98.85     245      99.40
6.1. Performance of the straight recognizer (SR)

In this subsection we evaluate the performance of the recognizer without optimization (the straight recognizer), i.e., with the full feature set. We selected two different numbers of hidden neurons; Tables 3-7 show the performances of the different training algorithms for these numbers. Table 8 compares the different algorithms in terms of number of epochs and testing accuracy for a number of neurons in the hidden layer (NNHL) equal to 30. From these tables it can be concluded that most of the algorithms that are applied for the first time in the area of communication signals have high recognition accuracy, especially at low SNR. Adaptive learning rate algorithms such as SuperSAB were faster than second order methods such as conjugate gradient, and conjugate gradient was faster than BP; all of the recognizers have higher recognition accuracy (RA) than BP. Another point worth noting is that most of the learning algorithms need fewer hidden neurons than conventional BP: comparing the tables shows that, for the QP algorithm for example, the results obtained using only 30 hidden layer neurons are close to those obtained using 60 hidden layer neurons, with the larger network helping mainly at the lower SNR values. From the results it is clear that QP is the fastest algorithm and that the recognizer with this learning algorithm outperforms all the others; therefore we adopt this training algorithm in the next section.
Table 9
Recognition accuracy (RA) of testing performance for four features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     84.12   89.25   95.65   97.24   98.22

Table 10
Recognition accuracy (RA) of testing performance for seven features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     88.05   92.24   97.22   98.20   99.30

Table 11
Recognition accuracy (RA) of testing performance for twenty features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     88.50   92.45   97.38   98.25   99.35
Table 12
Accuracy matrix of the hybrid intelligent recognizer with the seven selected features at SNR = 4 dB (diagonal entries: correct-recognition rate of each signal type, %).

Signal   P1     P2     P3     P4     P5     P6     P7     P8     P9     P10    P11
RA (%)   98.0   96.2   98.8   98.2   96.5   98.0   97.6   96.9   96.0   96.1   97.1
6.2. Performance of the recognizer using the BA

In this subsection we evaluate the performance of the proposed recognizer using the Bees Algorithm (BA). To start the optimization, one has to specify the number of features, which varies from 1 to 31. Experiments were therefore carried out to investigate the possibility of an even smaller feature set and the compromise in performance that one might observe. The criterion for selecting the minimum number of features is a degradation of the recognition accuracy (RA) of the testing performance (TP) of about 2% at the lowest SNR. Tables 9-11 show the RA of the recognizer for three chosen numbers of features selected by the BA. Comparison of these tables with Table 7 indicates that: (a) selecting four features does not give adequate performance; (b) with seven selected features, the recognizer records a performance degradation of only about 0.5% at SNR = -4 dB, and for the other SNRs the difference is negligible; (c) the performances of the recognizer with seven features and with twenty features (both selected using the BA) differ very little. Therefore it can be said that selecting seven features (using the BA) achieves the desired performance, and there is no need to consider more features. Tables 12 and 13 show the accuracy matrices of the hybrid intelligent technique with the seven selected features at SNR = 4 dB and SNR = 40 dB. It can be seen that the proposed recognizer has a high success rate in recognizing the different digital signal types. This high performance is achieved with only seven selected features, optimized using the BA.

6.3. Performance comparison
As mentioned in Mobasseri (2000), direct comparison with other works is difficult in radio signal recognition, mainly because there is no single unified data set available.
Table 13
Accuracy matrix of the hybrid intelligent recognizer with the seven selected features at SNR = 40 dB (diagonal entries: correct-recognition rate of each signal type, %).

Signal   P1    P2    P3     P4     P5     P6     P7    P8     P9    P10    P11
RA (%)   100   100   99.4   99.2   99.8   99.6   100   99.8   100   99.8   99.6
Different setups of signal types lead to different performance. Besides, many different kinds of benchmarking systems are used for signal quality, which causes difficulties for direct numerical comparison. As for neural network based signal recognizers, the authors in Chani and Lamontagne (1993) reported a generalization rate of 90% and an accuracy of 93% on data sets with SNRs of 15-25 dB; however, the performances for lower SNRs are reported to be less than 80% for a fully connected network and about 90% for a hierarchical network. Mingquan et al. (1996) show an average accuracy of around 90% with 4096 samples per realization and SNRs between 5 and 25 dB; by increasing the number of samples, an increase of 5% in average performance is achieved. In Mingquan et al. (1998), the authors show through computer simulation an average recognition rate of 83%, reaching over 90% for SNR values over 20 dB; however, if the SNR is less than 10 dB, the performance drops below 70%. The hybrid intelligent technique proposed in this paper has many advantages. It recognizes many kinds of digital radio signal, and it can identify the considered digital signals even at very low SNRs; for instance, it has a success rate of around 88% at SNR = -4 dB, and its performance is higher than 98% for SNR > 4 dB. Through feature selection, the input dimension is reduced to less than a quarter of its original size without trading off generalization ability and accuracy.

7. Conclusion

Automatic recognition of digital signal formats is an important subject for novel communication systems. Most of the previous techniques can only recognize a few kinds of digital signal and/or lower orders of digital signals, and they usually need high SNRs for classification of the considered digital signals. These problems are mainly due to the features, the feature selection method and the classifier used in these techniques. In this paper we have proposed a combined set of spectral features and higher order moments and cumulants up to eighth order as the effective features. These features have a high ability to represent the radio signal formats. As the classifier, we have proposed an MLP neural network with different training algorithms. Using the mentioned features and classifier, we have presented a highly efficient recognizer that discriminates many digital signal types with high accuracy even at very low SNRs, but at the cost of a large number of features. In order to reduce the complexity of the recognizer we have therefore used an optimizer, the Bees Algorithm (BA). Using this idea reduces the number of features without trading off the generalization ability and accuracy. The optimized recognizer also has high performance in recognizing the considered kinds of digital signal at all SNRs; this high efficiency is achieved with only seven features, selected using the BA. For future work, the BA could be used to construct the classifier as well as to select the features; another set of digital signal types could also be considered and the technique evaluated for their recognition.
References

Avci, E., Hanbay, D., & Varol, A. (2007). An expert discrete wavelet adaptive network based fuzzy inference system for digital modulation recognition. Expert Systems with Applications, 33, 582-589.
Battiti, R. (1992). First- and second-order methods for learning. Neural Computation, 4, 141-166.
Chani, N., & Lamontagne, R. (1993). Neural networks applied to the classification of spectral features for automatic modulation recognition. Proceedings of MILCOM, 1494-1498.
Deng, H., Doroslovacki, M., Mustafa, H., Jinghao, X., & Sunggy, K. (2002). Automatic digital modulation classification using instantaneous features. Proceedings of ICASSP, 4, IV-4168.
Dobre, O. A., Bar-Ness, Y., & Su, W. (2004). Robust QAM modulation classification algorithm based on cyclic cumulants. Proceedings of WCNC, 745-748.
Donoho, D., & Huo, X. (1997). Large-sample modulation classification using Hellinger representation. Signal Processing and Advanced Wireless Communications, 133-137.
Fahlman, S. (1988). An empirical study of learning speed in back-propagation networks. Tech. Report CMU-CS-88-162, Carnegie Mellon University, Computer Science Department.
Haykin, S. (1999). Neural networks: A comprehensive foundation. New York: Macmillan.
Ho, K. C., Prokopiw, W., & Chan, Y. T. (2000). Modulation identification of digital signals by the wavelet transform. IEE Proceedings - Radar, Sonar and Navigation, 147(4), 169-176.
Hsue, S. Z., & Soliman, S. S. (1990). Automatic modulation classification using zero crossing. IEE Proceedings - Radar, Sonar and Navigation, 137, 459-46.
Iversen, A. (2003). The use of artificial neural networks for automatic modulation recognition. Technical Report, Heriot-Watt University, December 2003.
Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273-324.
Minai, A. A., & Williams, R. D. (1990). Back-propagation heuristics: A study of the extended delta-bar-delta algorithm. In International Joint Conference on Neural Networks, 1, 595-600, San Diego, California, June 17-21.
Mingquan, L., Xianci, X., & Lemin, L. (1998). AR modeling based features extraction of multiple signals for modulation recognition. In Signal Processing Proceedings, Fourth International Conference, Beijing, Vol. 2, 1385-1388.
Mingquan, L., Xianci, X., & Leming, L. (1996). Cyclic spectral features based modulation recognition. Proceedings of ICCT, 792-795.
Mobasseri, B. G. (2000). Digital modulation classification using constellation shape. Signal Processing, 80, 251-277.
Nandi, A. K., & Azzouz, E. E. (1998). Algorithms for automatic modulation recognition of communication signals. IEEE Transactions on Communications, 46(4), 431-436.
Nikias, C. L., & Petropulu, A. P. (1993). Higher-order spectra analysis: A nonlinear signal processing framework. Englewood Cliffs, NJ: PTR Prentice-Hall.
Ou, X., Huang, X., Yuan, X., & Yang, W. (2004). Quasi-Haar wavelet and modulation identification of digital signals. Proceedings of ICASSP, 733-737.
Pham, D. T., Ghanbarzadeh, A., Koc, E., Otri, S., Rahim, S., & Zaidi, S. (2006). The Bees Algorithm, a novel tool for complex optimization problems. In Proceedings of the Second International Virtual Conference on Intelligent Production Machines and Systems (IPROMS 2006).
Proakis, J. G. (2001). Digital communications. New York: McGraw-Hill.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Sehier, C. L. P. (1993). Automatic modulation recognition with a hierarchical neural network. Proceedings of MILCOM, 111-115.
Spooner, C. M. (1995). Classification of cochannel communication signals using cyclic cumulants. In Proceedings of ASILOMAR, 531-536.
Sue, W., Jefferson, L. X., & Mengchou, Z. (2008). Real-time modulation classification based on maximum likelihood. IEEE Communications Letters, 12(11).
Swami, A., & Sadler, B. M. (2000). Hierarchical digital modulation classification using cumulants. IEEE Transactions on Communications, 48, 416-429.
Tollenaere, T. (1990). SuperSAB: Fast adaptive back propagation with good scaling properties. Neural Networks, 3, 561-573.