Expert Systems with Applications 38 (2011) 6000–6006
Hybrid intelligent technique for automatic communication signals recognition using Bees Algorithm and MLP neural networks based on the efficient features

Ataollah Ebrahimzadeh Shrme
Faculty of Electrical and Computer Engineering, Babol University of Technology, Shariati Blvd., Babol 4715716467, Iran
Keywords: Communication signal recognition; Bees Algorithm; Multi-layer perceptron; Learning algorithms; Spectral characteristics features; Higher order moments; Higher order cumulants; Feature selection; Optimization
Abstract

Automatic communication signal recognition plays an important role in many novel computer and communication technologies. Most of the proposed techniques can identify only a few kinds of digital signal and/or only low orders of them, and they usually require high signal-to-noise ratios (SNRs). This paper makes two contributions. First, we propose an efficient system that uses a combined set of spectral characteristics, higher order moments up to eighth order and higher order cumulants up to eighth order as the effective features. As the classifier we use a multi-layer perceptron (MLP) neural network. At this stage we investigate different learning algorithms for MLP neural networks, some of which, such as quick prop (QP), extended delta-bar-delta (EDBD), super self-adaptive back propagation (SuperSAB) and conjugate gradient (CG), are applied for the first time in the area of communication signal recognition. Experimental results show that the proposed system discriminates many digital communication signals with high accuracy even at very low SNRs, but it requires a large number of features. Second, in order to reduce the complexity of the recognizer, we propose a novel hybrid intelligent technique in which the classifier design is optimized by the Bees Algorithm (BA), which selects the best features to feed the classifier. Simulation results show that the proposed technique achieves very high recognition accuracy with only seven features selected by the BA.

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction

Automatic communication signal recognition plays an important role in various applications, such as electronic surveillance, interference identification, monitoring, spectrum management, software radios and intelligent modems. Due to the increasing use of digital communication signals in novel technologies, this paper focuses on the recognition of these signals. Automatic signal classification techniques are usually divided into two principal approaches: the decision theoretic (DT) approach and the pattern recognition (PR) approach. DT methods use probabilistic and hypothesis-testing arguments to formulate the recognition problem (Donoho & Huo, 1997; Sue, Jefferson, & Mengchou, 2008; Swami & Sadler, 2000). Pattern recognition approaches, however, do not need such careful treatment and are easy to implement. PR methods can be further divided into two main subsystems: the feature extraction subsystem and the classifier subsystem. The former extracts the features and the latter determines the membership of the signal. Examples of features used are: instantaneous features (Nandi & Azzouz, 1998), higher order statistics (Dobre, Bar-Ness, & Su, 2004; Spooner, 1995), magnitude of the Haar wavelet transform
(Ho, Prokopiw, & Chan, 2000; Ou, Huang, Yuan, & Yang, 2004), constellation shape (Mobasseri, 2000), zero crossings (Hsue & Soliman, 1990), kurtosis of the signal (Deng, Doroslovacki, Mustafa, Jinghao, & Sunggy, 2002), etc. The second subsystem, the pattern recognizer, is responsible for classifying the incoming signal based on the extracted features. It can be implemented in many ways, e.g. as a K-nearest neighbour (KNN) classifier, a fuzzy classifier or a multi-layer perceptron (MLP) neural network (Avci, Hanbay, & Varol, 2007; Chani & Lamontagne, 1993; Iversen, 2003; Mingquan, Xianci, & Leming, 1996; Mingquan, Xianci, & Lemin, 1998; Nandi & Azzouz, 1998; Sehier, 1993). In Chani and Lamontagne (1993), the authors proposed using the MLP neural network with the back-propagation (BP) learning algorithm for automatic signal type identification; they showed that the neural network classifier outperforms other classifiers such as KNN. In Nandi and Azzouz (1998), the authors showed that the neural network classifier has a higher performance than a threshold-based classifier. In Sehier (1993), the author showed that the artificial neural network performs better than the KNN classifier and the well-known binary decision trees. In Avci et al. (2007), the authors carried out a comparative study of feature extraction and classification algorithms based on discrete wavelet decompositions and an adaptive network based fuzzy inference system (ANFIS) for recognition of the considered signal types.
From the published works, it appears clear that in the design of a system for automatic classification of communication signals there are some important issues which, if suitably addressed, may lead to more robust and efficient recognizers. One of these issues is the choice of the classification approach. In particular, we think that despite its great potential, the MLP approach has not received the attention it deserves in the communication signal classification literature. For example, the training algorithm of the neural network classifier has received little attention: most papers have used the BP training algorithm. In this paper we use different learning algorithms for MLP neural networks, some of which are applied for the first time in the area of communication signal recognition. Choosing the right feature set is another issue. Some papers have applied higher order cumulants and others spectral features, separately. In this paper, for the first time, a combined set of spectral features, higher order moments up to eighth order and higher order cumulants up to eighth order is proposed as the effective features. On the other hand, in most of the previous methods the features are not selected in a completely automatic way. Feature selection determines the optimal subset of a given set of features. In this paper we use a novel optimization algorithm, the Bees Algorithm (BA), introduced in Pham et al. (2006): the classifier design is optimized by using the BA to select, from the combined statistical and spectral feature set, the best features that are fed to the classifier. Thus the optimum design of the classifier is another issue that is targeted in this paper. The paper is organized as follows. The feature extraction module is described in Section 2. Section 3 describes the classifier. Section 4 presents the Bees Algorithm and Section 5 describes feature selection using it. Section 6 shows some simulation results. Finally, Section 7 concludes the paper.
2. Feature extraction

In digital communications, based on changes in the frequency, amplitude, phase, or combined amplitude and phase of the message, there are four main digital signal formats: frequency shift keying (FSK), amplitude shift keying (ASK), phase shift keying (PSK) and quadrature amplitude modulation (QAM), respectively (Proakis, 2001). In this paper we consider the following radio signals for recognition: FSK2, FSK4, PSK2, PSK4, ASK4, ASK8, QAM8, QAM16, QAM32, QAM64 and QAM128. To simplify the notation, we denote these signals by P1, P2, P3, P4, P5, P6, P7, P8, P9, P10 and P11, respectively. Finding suitable features is a very important step in the recognition of these radio signals. In this paper a suitable combination of spectral features and higher order moments and cumulants up to eighth order is proposed to characterize the considered signals. The following subsections describe these features.
2.1. Spectral features

Spectral features were demonstrated to be suitable for signals which contain hidden information in a single domain (Nandi & Azzouz, 1998). In this paper, based on the considered digital signals, the following spectral features are selected.

(1) σ_af: the standard deviation of the absolute value of the normalized-centered instantaneous frequency, evaluated over the non-weak segments of the intercepted signal:

$$\sigma_{af} = \sqrt{\frac{1}{C}\left(\sum_{A_n(i)>a_t} f_N^2(i)\right) - \left(\frac{1}{C}\sum_{A_n(i)>a_t} \left|f_N(i)\right|\right)^2} \qquad (1)$$

where $f_N(i) = f_c(i)/r_s$, $f_c(i) = f(i) - m_f$, $m_f = \frac{1}{N}\sum_{i=1}^{N} f(i)$, and C is the number of samples in the non-weak segments, i.e. those whose normalized amplitude $A_n(i)$ exceeds a threshold $a_t$. σ_af can differentiate between the modulation schemes without frequency information and the FSK modulation schemes, and also between FSK2 and FSK4.

(2) c_max: the maximum value of the power spectral density of the normalized-centered instantaneous amplitude of the intercepted signal segment. This feature expresses the character of the signal's envelope and was added to differentiate between the modulation schemes that carry amplitude modulation and those that do not.

(3) σ_fn: the standard deviation of the direct value of the normalized-centered instantaneous frequency, evaluated over the non-weak segments of the intercepted signal:

$$\sigma_{fn} = \sqrt{\frac{1}{C}\left(\sum_{A_n(i)>a_t} f_N^2(i)\right) - \left(\frac{1}{C}\sum_{A_n(i)>a_t} f_N(i)\right)^2} \qquad (2)$$

σ_fn is used to discriminate between FSK2 and FSK4.
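To make these definitions concrete, the following sketch (an illustration added here, not code from the paper) estimates σ_af and σ_fn from a complex base-band segment. The amplitude threshold a_t, the phase-difference frequency estimate and the unit symbol-rate normalization (r_s = 1) are assumptions made for this sketch.

```python
import numpy as np

def sigma_af_fn(s, a_t=1.0):
    """Illustrative estimate of eqs. (1)-(2) for a complex baseband segment s.
    The threshold a_t and the r_s = 1 normalization are assumptions."""
    a_n = np.abs(s) / np.abs(s).mean()                 # normalized instantaneous amplitude
    f = np.diff(np.unwrap(np.angle(s))) / (2 * np.pi)  # instantaneous frequency per sample
    f_n = f - f.mean()                                 # centered (m_f removed), r_s = 1
    mask = a_n[1:] > a_t                               # keep non-weak segments only
    c = mask.sum()
    sigma_af = np.sqrt(np.sum(f_n[mask] ** 2) / c
                       - (np.sum(np.abs(f_n[mask])) / c) ** 2)
    sigma_fn = np.sqrt(np.sum(f_n[mask] ** 2) / c
                       - (np.sum(f_n[mask]) / c) ** 2)
    return sigma_af, sigma_fn
```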
2.2. Higher order moments and higher order cumulants

Probability distribution moments are a generalization of the concept of the expected value. Recall that the general expression for the ith moment of a random variable is given by Nikias and Petropulu (1993):
$$\mu_i = \int_{-\infty}^{\infty} (s - m)^i f(s)\,ds \qquad (3)$$
where m is the mean of the random variable. The definition for the ith moment for a finite length discrete signal is given by:
$$\mu_i = \sum_{k=1}^{N} (s_k - m)^i f(s_k) \qquad (4)$$
where N is the data length. In this study signals are assumed to be zero mean. Thus:
$$\mu_i = \sum_{k=1}^{N} s_k^i f(s_k) \qquad (5)$$
Next, the auto-moment of the random variable may be defined as follows:
$$M_{pq} = E\left[s^{p-q}\,(s^*)^q\right] \qquad (6)$$
where p is called the moment order and s* stands for the complex conjugate of s. Assume a zero-mean discrete base-band signal sequence of the form s_k = a_k + jb_k. Using the definition of the auto-moments, the expressions for different orders may easily be derived. For example:
$$M_{83} = E\left[s^5 (s^*)^3\right] = E\left[(a+jb)^5 (a-jb)^3\right]$$
$$\Rightarrow M_{83} = E\left[\left(a^5 + j5a^4b + j^2 10a^3b^2 + j^3 10a^2b^3 + j^4 5ab^4 + j^5 b^5\right)\left(a^3 - j3a^2b + j^2 3ab^2 - j^3 b^3\right)\right]$$
$$\Rightarrow M_{83} = E\left[a^8 + j2a^7b - j^2 2a^6b^2 - j^3 6a^5b^3 + j^5 6a^3b^5 + j^6 2a^2b^6 - j^7 2ab^7 - j^8 b^8\right]$$
$$\Rightarrow M_{83} = E\left[a^8 + 2a^6b^2 - 2a^2b^6 - b^8\right] \qquad (7)$$

where the remaining purely imaginary cross terms have zero expectation for the symmetric signal constellations considered here.
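As a numerical illustration (added here; not code from the paper), the auto-moments of eq. (6) can be estimated empirically and checked against the PSK2 column of Table 1. The final lines show how a cumulant then follows from moment combinations, using the standard fourth-order relation found in Swami and Sadler (2000).

```python
import numpy as np

def auto_moment(s, p, q):
    """Empirical estimate of M_pq = E[s^(p-q) (s*)^q], eq. (6)."""
    return np.mean(s ** (p - q) * np.conj(s) ** q)

# PSK2 (P3) symbols are +/-1, so every M_pq equals 1 (cf. Table 1).
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=100_000).astype(complex)
print(np.round(auto_moment(s, 8, 3).real, 3))   # ~1.0: M83 = E[a^8 + ...] with b = 0

# Cumulants follow from moment combinations (cf. eq. (11)); e.g. the standard
# fourth-order relation C40 = M40 - 3*M20^2 (Swami & Sadler, 2000):
C40 = auto_moment(s, 4, 0) - 3 * auto_moment(s, 2, 0) ** 2
print(np.round(C40.real, 3))                    # ~-2 for PSK2
```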
Consider a scalar zero-mean random variable s with characteristic function $\hat{f}(t)$. Expanding the logarithm of the characteristic function as a Taylor series, one obtains:
Table 1
Some of the chosen higher order features for a number of the considered digital signal types.

Feature   PSK2    QAM16   QAM64
M41       1       0       0
M61       1       1.32    1.3
M84       1       3.13    3.9
C61       16      2.08    1.797
C80       244     13.99   11.5
C84       244     17.38   0
$$\log \hat{f}(t) = k_1 (jt) + \cdots + \frac{k_r (jt)^r}{r!} + \cdots \qquad (8)$$
The constants k_r in (8) are called the cumulants (of the distribution) of s. The notation for the pth order cumulant is similar to that of the pth order moment. More specifically:
$$C_{pq} = \mathrm{Cum}\big[\underbrace{s,\ldots,s}_{(p-q)\ \text{terms}},\ \underbrace{s^*,\ldots,s^*}_{(q)\ \text{terms}}\big] \qquad (9)$$
For example:

$$C_{81} = \mathrm{Cum}(s, s, s, s, s, s, s, s^*)$$

The nth order cumulant is a function of the moments of orders up to (and including) n. Moments may be expressed in terms of cumulants as:

$$M[s_1, \ldots, s_n] = \sum_{\forall v} \mathrm{Cum}\left[\{s_j\}_{j \in v_1}\right] \cdots \mathrm{Cum}\left[\{s_j\}_{j \in v_q}\right] \qquad (10)$$

where the summation index runs over all partitions v = (v_1, ..., v_q) of the set of indices (1, 2, ..., n), and q is the number of elements in a given partition. Cumulants may also be derived in terms of moments:

$$\mathrm{Cum}[s_1, \ldots, s_n] = \sum_{\forall v} (-1)^{q-1} (q-1)!\; E\Big[\prod_{j \in v_1} s_j\Big] \cdots E\Big[\prod_{j \in v_q} s_j\Big] \qquad (11)$$

where the summation is performed over all partitions v = (v_1, ..., v_q) of the set of indices (1, 2, ..., n). We have considered the second, fourth, sixth and eighth orders of the moments and cumulants as features. These features are: M20, M21, M40, M41, M42, M60, M61, M62, M63, M80, M81, M82, M83, M84, C20, C21, C40, C41, C42, C60, C61, C62, C63, C80, C81, C82, C83 and C84. The total number of statistical features is therefore 28. We have computed all of these features for the considered digital signals. Table 1 shows some of the higher order moments and cumulants for a number of the considered digital signal types. These values are computed under the constraint of unit variance in the noise-free case.

3. Classifier

In this paper an MLP neural network is used as the classifier. An MLP neural network consists of an input layer (of source nodes), one or more hidden layers (of computation nodes) and an output layer (Haykin, 1999). The numbers of nodes in the input and output layers depend on the numbers of input and output variables, respectively. In this paper a single-hidden-layer MLP neural network was chosen as the classifier. Recognition basically consists of two phases: training and testing. In the training stage, weights are calculated according to the chosen learning algorithm. The choice of learning algorithm and its speed is very important for the MLP. The BP algorithm is still one of the most popular algorithms; in BP, a simple gradient descent rule updates the weight values:

$$w_{ij}(t+1) = w_{ij}(t) - \varepsilon \frac{\partial E}{\partial w_{ij}}(t) \qquad (12)$$

where w_{ij} represents the weight from neuron j to neuron i, ε is the learning rate parameter and E represents the error function. However, under certain conditions the BP network classifier can produce non-robust results and easily converge to a local minimum; moreover, its training phase is time consuming. Many new algorithms have been proposed to improve network training, although some of them require much computing power to achieve good training. In this paper different learning algorithms for MLP neural networks are used, some of which are applied for the first time in the area of communication signal recognition. The following subsections briefly describe these algorithms.

3.1. Back propagation with momentum (BP with momentum)

BP (Rumelhart & McClelland, 1986) makes use of gradient descent with a momentum term to smooth out oscillation. It adds an extra momentum parameter, μ, to the weight changes. Eq. (13) gives the weight update for BP with momentum:

$$\Delta w_{ij}(t+1) = -\varepsilon \frac{\partial E}{\partial w_{ij}}(t) + \mu\, \Delta w_{ij}(t-1) \qquad (13)$$

This takes account of the previous weight changes, which leads to a more stable algorithm and accelerates convergence in shallow regions of the cost function.
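As a minimal illustration of eq. (13), the following toy sketch (added here; the ε and μ defaults echo the BP values quoted at the end of Section 3, while the one-dimensional quadratic error is an assumption made for the example) applies the momentum update to a single weight.

```python
def bp_momentum_step(w, grad, prev_dw, eps=0.9, mu=0.6):
    """One update of eq. (13): dw = -eps * dE/dw + mu * dw_prev.
    Defaults follow the BP parameter values quoted later in Section 3."""
    dw = -eps * grad + mu * prev_dw
    return w + dw, dw

# Toy usage on E(w) = w^2, so dE/dw = 2w (eps reduced for this toy problem).
w, dw = 5.0, 0.0
for _ in range(20):
    w, dw = bp_momentum_step(w, 2 * w, dw, eps=0.1, mu=0.6)
print(round(w, 4))  # oscillates but approaches the minimum at w = 0
```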
3.2. Quick prop (QP) algorithm

The quick prop (QP) algorithm does not need much computation. QP calculates the update for each weight separately, using a Newton-like method to minimize the error in that weight dimension. As reported by Fahlman (1988), two major assumptions are made in the development of the QP algorithm: (i) the error-versus-weight curve for each weight is assumed to be a convex parabola, and (ii) the change in the slope of the error curve with respect to each weight is not affected by the other weights that change at the same time. The weight update rule for QP is given by the following expressions:
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) \qquad (14)$$
$$\Delta w_{ij}(t) = \frac{g_{ij}(t)}{g_{ij}(t-1) - g_{ij}(t)}\, \Delta w_{ij}(t-1) \qquad (15)$$
$$g_{ij}(t) = \frac{\partial E}{\partial w_{ij}} \qquad (16)$$
where w_{ij} represents the weight from neuron j to neuron i and E represents the error function. Some heuristics are used with QP. If g_{ij}(t) and g_{ij}(t−1) have the same sign, then a gradient-descent term proportional to g_{ij}(t) is also added to Δw_{ij}(t). The gradient g_{ij}(t) is modified by weight decay, given by:
$$g_{ij}(t) = g_{ij}(t) + 0.0001\, w_{ij}(t) \qquad (17)$$
Also the derivative of the sigmoid function f is altered to be:
$$f'_j = o_j(1 - o_j) + 0.1 \qquad (18)$$
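A compact sketch of the QP step for a single weight follows (an illustration, not the paper's implementation). The secant step follows eqs. (14)-(16); the gradient-descent bootstrap and the step-growth limit are common QP safeguards assumed here rather than details taken from the paper.

```python
def quickprop_step(w, g, g_prev, dw_prev, eps=0.5, max_growth=1.75):
    """One QP update: secant step dw = g(t)/(g(t-1)-g(t)) * dw(t-1), eq. (15),
    applied through eq. (14). eps and max_growth are illustrative values."""
    if dw_prev == 0.0 or g_prev == g:
        dw = -eps * g                        # plain gradient step to bootstrap
    else:
        dw = g / (g_prev - g) * dw_prev      # eq. (15): parabola minimum
        if abs(dw) > max_growth * abs(dw_prev):
            # cap the step so it cannot grow without bound
            dw = max_growth * abs(dw_prev) * (1 if dw > 0 else -1)
        if g * g_prev > 0:                   # same sign: add a descent term
            dw += -eps * g
    return w + dw, dw                        # eq. (14)
```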
3.3. SuperSAB

SuperSAB (Tollenaere, 1990) is an adaptive learning rate algorithm. Each weight w_{ij}, connecting node j with node i, has its own learning rate η_{ij} that can vary in response to the error surface. The weight update is:
$$\Delta w_{ij}(t+1) = -\eta_{ij}(t) \frac{\partial E}{\partial w_{ij}}(t) + \mu\, \Delta w_{ij}(t-1) \qquad (19)$$

$$\eta_{ij}(t) = \begin{cases} \eta^{+}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t)\, \dfrac{\partial E}{\partial w_{ij}}(t-1) > 0 \\[6pt] \eta^{-}\, \eta_{ij}(t-1), & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t)\, \dfrac{\partial E}{\partial w_{ij}}(t-1) < 0 \end{cases} \qquad (20)$$

with η+ > 1 > η−. The parameters η+ and η− are the increment and decrement factors, respectively.
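The per-weight rule of eqs. (19) and (20) can be sketched element-wise over a weight vector as follows (illustrative only; the η+, η− and μ values here are assumptions, not the paper's settings).

```python
import numpy as np

def supersab_step(w, g, g_prev, eta, dw_prev, mu=0.5, eta_up=1.05, eta_down=0.5):
    """One SuperSAB update over arrays of weights w, gradients g/g_prev,
    per-weight learning rates eta and previous changes dw_prev."""
    same_sign = g * g_prev > 0
    eta = np.where(same_sign, eta * eta_up, eta * eta_down)  # eq. (20)
    dw = -eta * g + mu * dw_prev                              # eq. (19)
    return w + dw, eta, dw
```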
3.4. Conjugate gradient (CG)

CG (Battiti, 1992) is a method from the field of optimization which makes use of the second derivative of the error surface. If the error surface were quadratic, this method would reach the minimum in N steps, where N is the number of weights. Each weight update is conjugate to the previous one, and requires a line search to calculate its length. The algorithm calculates a search vector which is the direction of the weight step, and the line search finds the minimum point along that vector. For non-quadratic functions the algorithm will not reach the minimum in N steps, and it needs to 'restart' or re-evaluate its parameters. No restart procedure was used in this implementation, since the algorithm always reached the target error in fewer than N steps. Line minimization methods are too lengthy to list in full; however, the accuracy is the important figure. The algorithm used here performed an exact search, locating the minimum within a certain interval whose length was at most 5% of the length of the previous weight step. The update rule is as follows:

$$\Delta w(t+1) = \eta(t)\, d(t) \qquad (21)$$
$$d(t) = -g(t) + \beta(t-1)\, d(t-1) \qquad (22)$$

where g(t) denotes the gradient and η(t) the step length found by the line search. The Polak-Ribiere rule is used for calculating β(t−1), given by:

$$\beta(t-1) = \frac{g(t)^{T}\left(g(t) - g(t-1)\right)}{\left\|g(t-1)\right\|^{2}} \qquad (23)$$

3.5. Extended delta-bar-delta algorithm

As its name implies, this algorithm (Jacobs, 1988) is an extension of DBD (Minai & Williams, 1990). It also aims to decrease the training time for MLPs. The changes in the weights are calculated as:

$$\Delta w_{ij}(t+1) = -\alpha_{ij}(t) \frac{\partial E}{\partial w_{ij}}(t) + \mu_{ij}\, \Delta w_{ij}(t-1) \qquad (24)$$

where α_{ij} and μ_{ij} are the learning and momentum coefficients, respectively. The use of momentum in EDBD is one of the differences between it and DBD. The learning coefficient change is given as:

$$\Delta\alpha_{ij}(t) = \begin{cases} A_a \exp\left(-c_a \left|D_{ij}(t)\right|\right), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) > 0 \\[6pt] -u_a\, \alpha_{ij}(t), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) < 0 \\[6pt] 0, & \text{otherwise} \end{cases} \qquad (25)$$

where A_a, u_a and c_a are positive constants and D_ij is computed as a weighted average of ∂E/∂w_ij(t−1) and ∂E/∂w_ij(t−2), given by D_ij = (1−h) ∂E/∂w_ij(t−1) + h ∂E/∂w_ij(t−2), in which h is the convex factor. The momentum coefficient change is obtained as:

$$\Delta\mu_{ij}(t) = \begin{cases} A_l \exp\left(-c_l \left|D_{ij}(t)\right|\right), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) > 0 \\[6pt] -u_l\, \mu_{ij}(t), & \text{if } D_{ij}(t-1)\, \dfrac{\partial E}{\partial w_{ij}}(t) < 0 \\[6pt] 0, & \text{otherwise} \end{cases} \qquad (26)$$

where A_l, u_l and c_l are positive constants. Note that increments in the learning coefficient are not constant, but vary as an exponentially decreasing function of the magnitude of the weighted average gradient component D_ij. To prevent oscillations in the values of the weights, α_ij and μ_ij are kept below preset upper bounds α_max and μ_max. The values of the training parameters adopted for the algorithms were determined empirically. They were as follows: for BP: ε = 0.9 and μ = 0.6; for EDBD: α_max = μ_max = 2, A_a = 0.095, A_l = 0.01, u_a = 0.1, u_l = 0.01, c_a = 0.0, c_l = 0.0, and h = 0.7.

4. Bees Algorithm

The Bees Algorithm is an optimization algorithm inspired by the natural foraging behavior of honey bees (Pham et al., 2006). Fig. 1 shows the pseudo code for the algorithm in its simplest form. The algorithm requires a number of parameters to be set, namely: the number of scout bees (n), the number of patches selected out of the n visited points (m), the number of elite patches out of the m selected patches (e), the number of bees recruited for the best e patches (nep), the number of bees recruited for the other (m−e) selected patches (nsp), the size of the patches (ngh) and the stopping criterion. The algorithm starts with the n scout bees being placed randomly in the search space. The fitness of the points visited by the scout bees is evaluated in step 2. In step 4, the bees that have the highest fitness are designated as "selected bees" and the sites they visited are chosen for neighborhood search. Then, in steps 5 and 6, the algorithm conducts searches in the neighborhoods of the selected bees, assigning more bees to search near the best e bees. The bees can be chosen directly according to the fitness associated with the points they are visiting; alternatively, the fitness values can be used to determine the probability of a bee being selected. Searches in the neighborhoods of the best e bees, which represent the more promising solutions, are made more detailed by recruiting more bees to follow them than for the other selected bees. Together with scouting, this differential recruitment is a key operation of the Bees Algorithm. In step 6, for each site only the bee with the highest fitness is selected to form the next bee population. In nature there is no such restriction; this constraint is introduced here to reduce the number of points to be explored. In step 7, the remaining bees in the population are assigned randomly around the search space, scouting for new potential solutions. These steps are repeated until a stopping criterion is met. At the end of each iteration the colony has two parts to its new population: representatives from each selected patch, and other scout bees assigned to conduct random searches.

1. Initialize population with random solutions.
2. Evaluate fitness of the population.
3. While (stopping criterion not met): form new population.
4. Select elite bees for neighborhood search. Select other bees for neighborhood search.
5. Recruit bees for selected bees and evaluate fitness.
6. Select the fittest bee from each site.
7. Assign remaining bees to search randomly and evaluate their fitness.
8. End While.

Fig. 1. Pseudo code of the basic Bees Algorithm.
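A minimal sketch of the basic Bees Algorithm of Fig. 1 is shown below for a continuous toy search space, with the parameter values of Table 2 as defaults. The paper applies the same scheme to binary feature-selection strings (Section 5); the toy fitness function and the uniform patch sampling here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bees_algorithm(fitness, dim, bounds, n=25, m=5, e=2, nep=20, nsp=15,
                   ngh=0.1, iters=100):
    """Basic Bees Algorithm (Fig. 1) on a continuous search space;
    defaults follow Table 2."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (n, dim))                    # step 1: random scouts
    for _ in range(iters):                                 # step 3
        pop = pop[np.argsort([-fitness(b) for b in pop])]  # steps 2/4: rank sites
        new_pop = []
        for i in range(m):                                 # steps 4-6: local search
            recruits = nep if i < e else nsp               # more bees on elite sites
            patch = pop[i] + rng.uniform(-ngh, ngh, (recruits, dim))
            patch = np.clip(patch, lo, hi)
            best = max(list(patch) + [pop[i]], key=fitness)
            new_pop.append(best)                           # fittest bee per site
        scouts = rng.uniform(lo, hi, (n - m, dim))         # step 7: fresh scouts
        pop = np.vstack([new_pop, scouts])
    return max(pop, key=fitness)

# Toy usage: maximise -(x^2 + y^2); the optimum is at the origin.
best = bees_algorithm(lambda b: -np.sum(b ** 2), dim=2, bounds=(-5, 5))
print(np.round(best, 3))
```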
5. Bees Algorithm for feature selection

From Section 2 it can be seen that the total number of features is large, which causes high computational complexity in the recognizer.

Table 2
The parameters of the Bees Algorithm.

Bees Algorithm parameter                        Symbol   Value
Number of scout bees                            n        25
Number of selected sites                        m        5
Number of elite bees                            e        2
Initial patch size                              ngh      0.1
Number of bees around elite points              nep      20
Number of bees around other selected points     nsp      15
On the other hand, although some of these features may carry good classification information when treated separately, there is little gain in combining them all, because they share the same information content. The easiest way to reduce the number of features is feature selection. Feature subset selection algorithms can be classified into two categories, based on whether or not feature selection is done independently of the learning algorithm used to construct the classifier. If feature selection is performed independently of the learning algorithm, the technique is said to follow a filter approach; otherwise, it is said to follow a wrapper approach (Kohavi & John, 1997). In this paper we have used the wrapper approach. As mentioned previously, the proposed method employs MLP networks as classifiers to guide the selection of features. This guidance is provided in the form of feedback to the selection process as to how well a given set of features characterizes patterns from different classes. The method also requires a data set for use in the feature selection process. The data set comprises patterns, each with Nt features; the classes of all the patterns in the training set are known. From the original data sets, new data sets can be constructed in which the patterns retain only a subset of the original features. In other words, a pattern in a new data set has Ns features selected from the original set of Nt features. A bee represents a subset of Ns features. It can be uniquely identified by a binary string (e.g. 010110111) in which the total number of bits is Nt and the total number of non-zero bits is Ns. The position of a bit along the string indicates a particular feature: if a feature is selected to form a data set, the corresponding bit is 1; otherwise it is 0. Feature selection starts with the random generation of a population of binary strings (bees). For each string, a data set is constructed using the selected features specified in the string. Part of the data (the training data) is used to train an MLP; the remaining data (the test data) is employed to evaluate the classification accuracy of the trained MLP. As can be seen in Fig. 1, the Bees Algorithm involves neighborhood searching; in this work, this means generating and evaluating neighbors of the fittest bees. Various operators could be employed to create neighbors of a given bee, including monadic operators such as mutation, inversion, swap and insertion (single or multiple). Table 2 shows the parameter values adopted for the Bees Algorithm; the values were decided empirically. A sketch of the resulting fitness evaluation is given below.
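The following sketch shows what the wrapper-style fitness of a single bee could look like. It is an illustration under stated assumptions: scikit-learn's MLPClassifier stands in for the paper's single-hidden-layer MLP, and the split ratio and network settings are assumed, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def bee_fitness(bitstring, X, y):
    """Wrapper fitness of one bee: train an MLP on the features whose bits
    are 1 and return the test-set accuracy as the fitness value."""
    cols = np.flatnonzero(bitstring)          # indices of the selected features
    if cols.size == 0:
        return 0.0                            # an empty subset is worthless
    Xtr, Xte, ytr, yte = train_test_split(X[:, cols], y, test_size=0.3,
                                          random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
    clf.fit(Xtr, ytr)
    return clf.score(Xte, yte)

# A bee such as "010110111..." is simply a 0/1 vector over the Nt = 31
# candidate features, with exactly Ns of its bits set to 1.
```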
Table 3
Performance of the BP MLP recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         78.12      72.35     83.55      83.45
0          83.45      80.24     88.25      87.25
4          86.25      85.42     95.85      95.55
8          97.05      96.45     98.90      98.75
12         98.25      97.65     99.12      99.04
Table 4
Performance of the CG recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         81.24      80.45     82.33      81.25
0          86.46      84.46     86.35      85.55
4          93.88      93.25     95.35      94.44
8          98.95      97.55     98.95      98.95
12         99.25      99.10     99.28      99.15
Table 5
Performance of the SuperSAB recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         80.14      79.85     84.65      83.76
0          87.22      85.45     89.12      88.52
4          95.12      94.25     97.20      96.55
8          97.55      97.15     98.85      98.85
12         98.50      98.45     99.20      99.14
Table 6
Performance of the EDBD recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         85.12      84.12     87.25      86.50
0          90.28      89.42     93.60      93.42
4          97.85      96.78     97.80      97.66
8          98.25      97.96     99.30      99.22
12         99.20      98.85     99.35      99.24
6. Simulation results

This section presents some simulation results for the proposed recognizer. We assumed that the carrier frequencies were estimated correctly (or known), so we only considered complex base-band signals. Gaussian noise was added according to SNRs of -4, 0, 4, 8, and 12 dB. For each signal type 2000 samples were used. The number of output layer nodes is equal to the number of signals. Tan-sigmoid and logistic activation functions were used in the hidden and output layers, respectively. The target MSE was set to 10^-6, and the MLP classifier was allowed to run for up to 2000 training epochs. The number of neurons in the hidden layer was determined manually. A sketch of the noise model is given below.
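The additive-noise setup can be sketched as follows (illustrative; the QPSK-like test sequence and the unit-power normalization are assumptions made for the example).

```python
import numpy as np

rng = np.random.default_rng(2)

def add_awgn(s, snr_db):
    """Add complex white Gaussian noise to the baseband signal s so that the
    resulting sample SNR equals snr_db."""
    p_sig = np.mean(np.abs(s) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(s.shape)
                                    + 1j * rng.standard_normal(s.shape))
    return s + noise

# e.g. 2000 QPSK-like samples at each simulated SNR
s = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 2000)))
for snr in (-4, 0, 4, 8, 12):
    noisy = add_awgn(s, snr)
```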
Table 7
Performance of the QP recognizer with different numbers of neurons in the hidden layer (NNHL) at different SNR values.

SNR (dB)   NNHL = 30            NNHL = 60
           Training   Testing   Training   Testing
-4         89.35      88.75     90.86      90.76
0          93.63      92.60     95.76      93.52
4          97.80      97.46     98.78      96.55
8          98.85      98.32     99.38      98.85
12         99.55      99.40     99.62      99.44
Table 8
Performance comparison of the different training algorithms in terms of number of passed epochs and testing accuracy, for NNHL = 30, at different SNR values.

SNR    BP with momentum    CG                 SuperSAB           EDBD               QP
(dB)   Epochs   Testing    Epochs   Testing   Epochs   Testing   Epochs   Testing   Epochs   Testing
-4     1245     72.35      565      80.45     285      79.85     725      84.12     350      88.75
0      685      80.24      442      84.46     280      85.45     324      89.42     128      92.60
4      1575     85.42      320      93.25     450      94.25     565      96.78     425      97.46
8      978      96.45      280      97.55     295      97.15     364      97.96     268      98.32
12     1440     97.65      658      99.10     426      98.45     485      98.85     245      99.40
6.1. Performance of the straight recognizer (SR)

In this subsection we evaluate the performance of the recognizer without optimization (the straight recognizer), i.e., with the full feature set. We selected two different numbers of hidden neurons; Tables 3-7 show the performances of the different training algorithms for these numbers. Table 8 compares the different algorithms in terms of number of epochs and testing accuracy for a number of neurons in the hidden layer (NNHL) equal to 30. From these tables it can be concluded that most of the algorithms that are applied for the first time in the area of communication signals have high recognition accuracy, especially at low SNR. Adaptive learning rate algorithms such as SuperSAB were faster than second order methods such as conjugate gradient, and conjugate gradient was faster than BP; all of the recognizers have higher recognition accuracy (RA) than BP. Another point worth noting is that most of the learning algorithms need fewer hidden neurons than conventional BP: comparing the tables shows that, for the QP algorithm for example, the results obtained using only 30 hidden layer neurons are close to those obtained using 60 hidden layer neurons, with the larger network helping mainly at the lower SNR values. From the results it is clear that QP is the fastest algorithm and that the recognizer with this learning algorithm outperforms all the others; therefore we adopt this training algorithm in the next section.
Table 9
Recognition accuracy (RA) of testing performance for four features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     84.12   89.25   95.65   97.24   98.22

Table 10
Recognition accuracy (RA) of testing performance for seven features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     88.05   92.24   97.22   98.20   99.30

Table 11
Recognition accuracy (RA) of testing performance for twenty features selected using the BA.

SNR (dB)   -4      0       4       8       12
RA (%)     88.50   92.45   97.38   98.25   99.35
Table 12
Accuracy matrix of the hybrid intelligent recognizer with the seven selected features at SNR = 4 dB (diagonal entries: correct-recognition rate of each signal type, %).

Signal   P1     P2     P3     P4     P5     P6     P7     P8     P9     P10    P11
RA (%)   98.0   96.2   98.8   98.2   96.5   98.0   97.6   96.9   96.0   96.1   97.1
6.2. Performance of the recognizer using the BA

In this subsection we evaluate the performance of the proposed recognizer using the Bees Algorithm (BA). To start the optimization, one has to specify the number of features, which varies from 1 to 31. Experiments were therefore carried out to investigate the possibility of an even smaller feature set and the compromise in performance that one might observe. The criterion for selecting the minimum number of features is a degradation of the recognition accuracy (RA) of the testing performance (TP) of about 2% at the lowest SNR. Tables 9-11 show the RA of the recognizer for three chosen numbers of features selected by the BA. Comparison of these tables with Table 7 indicates that: (a) selecting four features does not give adequate performance; (b) with seven selected features, the recognizer records a performance degradation of only about 0.5% at SNR = -4 dB, and for the other SNRs the difference is negligible; (c) the performances of the recognizer with seven features and with twenty features (both selected using the BA) differ very little. Therefore it can be said that selecting seven features (using the BA) achieves the desired performance, and there is no need to consider more features. Tables 12 and 13 show the accuracy matrices of the hybrid intelligent technique with the seven selected features at SNR = 4 dB and SNR = 40 dB. It can be seen that the proposed recognizer has a high success rate in recognizing the different digital signal types. This high performance is achieved with only seven selected features, optimized using the BA.

6.3. Performance comparison
As mentioned in Mobasseri (2000), direct comparison with other works is difficult in radio signal recognition, mainly because there is no single unified data set available.
Table 13
Accuracy matrix of the hybrid intelligent recognizer with the seven selected features at SNR = 40 dB (diagonal entries: correct-recognition rate of each signal type, %).

Signal   P1    P2    P3     P4     P5     P6     P7    P8     P9    P10    P11
RA (%)   100   100   99.4   99.2   99.8   99.6   100   99.8   100   99.8   99.6
Different setups of signal types lead to different performance. Besides, many different kinds of benchmarking systems are used for signal quality, which causes difficulties for direct numerical comparison. As for neural network based signal recognizers, the authors in Chani and Lamontagne (1993) reported a generalization rate of 90% and an accuracy of 93% on data sets with SNRs of 15-25 dB; however, the performances for lower SNRs are reported to be less than 80% for a fully connected network and about 90% for a hierarchical network. Mingquan et al. (1996) show an average accuracy of around 90% with 4096 samples per realization and SNRs between 5 and 25 dB; by increasing the number of samples, an increase of 5% in average performance is achieved. In Mingquan et al. (1998), the authors show through computer simulation an average recognition rate of 83%, reaching over 90% for SNR values over 20 dB; however, if the SNR is less than 10 dB, the performance drops below 70%. The hybrid intelligent technique proposed in this paper has many advantages. It recognizes many kinds of digital radio signal, and it can identify the considered digital signals even at very low SNRs; for instance, it has a success rate of around 88% at SNR = -4 dB, and its performance is higher than 98% for SNR > 4 dB. Through feature selection, the input dimension is reduced to less than a quarter of its original size without trading off generalization ability and accuracy.

7. Conclusion

Automatic recognition of digital signal formats is an important subject for novel communication systems. Most of the previous techniques can only recognize a few kinds of digital signal and/or lower orders of digital signals, and they usually need high SNRs for classification of the considered digital signals. These problems are mainly due to the features, the feature selection method and the classifier used in these techniques. In this paper we have proposed a combined set of spectral features and higher order moments and cumulants up to eighth order as the effective features. These features have a high ability to represent the radio signal formats. As the classifier, we have proposed an MLP neural network with different training algorithms. Using the mentioned features and classifier, we have presented a highly efficient recognizer that discriminates many digital signal types with high accuracy even at very low SNRs, but at the cost of a large number of features. In order to reduce the complexity of the recognizer we have therefore used an optimizer, the Bees Algorithm (BA). Using this idea reduces the number of features without trading off the generalization ability and accuracy. The optimized recognizer also has high performance in recognizing the considered kinds of digital signal at all SNRs; this high efficiency is achieved with only seven features, selected using the BA. For future work, the BA could be used to construct the classifier as well as to select the features; another set of digital signal types could also be considered and the technique evaluated for their recognition.
References

Avci, E., Hanbay, D., & Varol, A. (2007). An expert discrete wavelet adaptive network based fuzzy inference system for digital modulation recognition. Expert Systems with Applications, 33, 582-589.
Battiti, R. (1992). First- and second-order methods for learning. Neural Computation, 4, 141-166.
Chani, N., & Lamontagne, R. (1993). Neural networks applied to the classification of spectral features for automatic modulation recognition. Proceedings of MILCOM, 1494-1498.
Deng, H., Doroslovacki, M., Mustafa, H., Jinghao, X., & Sunggy, K. (2002). Automatic digital modulation classification using instantaneous features. Proceedings of ICASSP, 4, IV-4168.
Dobre, O. A., Bar-Ness, Y., & Su, W. (2004). Robust QAM modulation classification algorithm based on cyclic cumulants. Proceedings of WCNC, 745-748.
Donoho, D., & Huo, X. (1997). Large-sample modulation classification using Hellinger representation. Signal Processing and Advanced Wireless Communications, 133-137.
Fahlman, S. (1988). An empirical study of learning speed in back-propagation networks. Tech. Report CMU-CS-88-162, Carnegie Mellon University, Computer Science Department.
Haykin, S. (1999). Neural networks: A comprehensive foundation. New York: Macmillan.
Ho, K. C., Prokopiw, W., & Chan, Y. T. (2000). Modulation identification of digital signals by the wavelet transform. IEE Proceedings - Radar, Sonar and Navigation, 147(4), 169-176.
Hsue, S. Z., & Soliman, S. S. (1990). Automatic modulation classification using zero crossing. IEE Proceedings - Radar, Sonar and Navigation, 137, 459-46.
Iversen, A. (2003). The use of artificial neural networks for automatic modulation recognition. Technical Report, Heriot-Watt University, December 2003.
Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273-324.
Minai, A. A., & Williams, R. D. (1990). Back-propagation heuristics: A study of the extended delta-bar-delta algorithm. In International Joint Conference on Neural Networks, 1, 595-600, San Diego, California, June 17-21.
Mingquan, L., Xianci, X., & Lemin, L. (1998). AR modeling based features extraction of multiple signals for modulation recognition. In Signal Processing Proceedings, Fourth International Conference, Beijing, Vol. 2, 1385-1388.
Mingquan, L., Xianci, X., & Leming, L. (1996). Cyclic spectral features based modulation recognition. Proceedings of ICCT, 792-795.
Mobasseri, B. G. (2000). Digital modulation classification using constellation shape. Signal Processing, 80, 251-277.
Nandi, A. K., & Azzouz, E. E. (1998). Algorithms for automatic modulation recognition of communication signals. IEEE Transactions on Communications, 46(4), 431-436.
Nikias, C. L., & Petropulu, A. P. (1993). Higher-order spectra analysis: A nonlinear signal processing framework. Englewood Cliffs, NJ: PTR Prentice-Hall.
Ou, X., Huang, X., Yuan, X., & Yang, W. (2004). Quasi-Haar wavelet and modulation identification of digital signals. Proceedings of ICASSP, 733-737.
Pham, D. T., Ghanbarzadeh, A., Koc, E., Otri, S., Rahim, S., & Zaidi, S. (2006). The Bees Algorithm, a novel tool for complex optimization problems. In Proceedings of the Second International Virtual Conference on Intelligent Production Machines and Systems (IPROMS 2006).
Proakis, J. G. (2001). Digital communications. New York: McGraw-Hill.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Sehier, C. L. P. (1993). Automatic modulation recognition with a hierarchical neural network. Proceedings of MILCOM, 111-115.
Spooner, C. M. (1995). Classification of cochannel communication signals using cyclic cumulants. In Proceedings of ASILOMAR, 531-536.
Sue, W., Jefferson, L. X., & Mengchou, Z. (2008). Real-time modulation classification based on maximum likelihood. IEEE Communications Letters, 12(11).
Swami, A., & Sadler, B. M. (2000). Hierarchical digital modulation classification using cumulants. IEEE Transactions on Communications, 48, 416-429.
Tollenaere, T. (1990). SuperSAB: Fast adaptive back propagation with good scaling properties. Neural Networks, 3, 561-573.