Kernel-based parallel multi-user detector for massive-MIMO




ARTICLE IN PRESS

JID: CAEE

[m3Gsc;February 15, 2017;10:27]

Computers and Electrical Engineering 0 0 0 (2017) 1–11

Contents lists available at ScienceDirect

Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng

Kernel-based parallel multi-user detector for massive-MIMO✩

Rangeet Mitra∗, Vimal Bhatia

Discipline of Electrical Engineering, Indian Institute of Technology, Indore, India

Article info

Article history: Received 15 June 2016; Revised 6 February 2017; Accepted 6 February 2017; Available online xxx

Keywords: Massive-MIMO detection; Power-amplifier nonlinearity; Online detection; RKHS techniques

Abstract

One of the proposed solutions to meet the ever-growing demand for data rates in 5G communication systems is to use a large number of antennas (100–1000) in a massive multiple input multiple output (MIMO) communication system. Performing multi-user (MU) detection over massive-MIMO systems presents many challenges, prime among them being the large dimensionality of the received dataset, which can increase the computational complexity of traditional algorithms, and the susceptibility to device impairments such as power amplifier (PA) nonlinearity. Due to these factors, detection of the users' symbols over uplink-MU massive-MIMO systems in a fast and computationally efficient way is an open problem. In this work, a reproducing kernel Hilbert space (RKHS) based block symbol detector is proposed for uplink-MU massive-MIMO systems. It works on decomposed blocks of the observations and selectively decides whether to use an incoming observation, thereby rendering the detector computationally tractable and robust to the PA-nonlinearity encountered in uplink-MU massive-MIMO. Simulations carried out in this work demonstrate superior performance of the proposed approach as compared to batch/iterative least squares based algorithms.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

To cater to the ever-increasing demand for high data rates, and for a growing number of users that can be accommodated reliably, a number of techniques have been suggested for the upcoming 5G standard [1]. As highlighted in the literature [2,3], increasing the number of antennas (as done in massive multiple input multiple output (MIMO) systems) increases the spectral efficiency, quality and reliability of the wireless link, which can be used to accommodate a large number of users with higher data rates. However, massive-MIMO presents several challenges to modern research and nullifies certain pre-existing notions about MIMO detection [4]. For example, multi-user (MU) detection in massive-MIMO is a challenging problem due to the large dimensionality of the received observations [4,5]. More specifically, the least-squares and minimum mean squared error (MMSE) estimators require a Gram matrix to be inverted at the receiver, which can become computationally very difficult when there are 100–1000 antennas [6]. In such scenarios, computationally simple online techniques are preferable to their batch counterparts for detection [5,7]. Also, high dimensionality in uplink-MU massive-MIMO systems increases the eigenvalue spread of the data covariance matrix [5], which calls for rank-reduction based techniques.

✩ Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. M. D. Selvaraj.

∗ Corresponding author. E-mail addresses: [email protected] (R. Mitra), [email protected] (V. Bhatia).

http://dx.doi.org/10.1016/j.compeleceng.2017.02.005 0045-7906/© 2017 Elsevier Ltd. All rights reserved.

Please cite this article as: R. Mitra, V. Bhatia, Kernel-based parallel multi-user detector for massive-MIMO, Computers and Electrical Engineering (2017), http://dx.doi.org/10.1016/j.compeleceng.2017.02.005


Another ubiquitous impairment to detection in uplink massive-MIMO systems is the nonlinear power amplifier (PA) used in most transmitters, which can significantly impair the overall process of MIMO detection, as pointed out in [8]. The work in [8] assumes a special structure of the PA (i.e. a polynomial nonlinearity). However, as surveyed in [9], PAs can have several underlying models and their characteristics may change with time. Hence, there is a need for a detection algorithm that inherently estimates and tracks changes in the PA-nonlinearity.

MU detection for MIMO systems with a large number of antennas falls into two main categories: a) algorithms based on meta-heuristics, and b) algorithms based on iterative descent. Among the meta-heuristic algorithms, computationally simple approaches to MIMO detection for higher-order modulation schemes are based on ant colony optimization, as proposed in [10], and on related methods such as Tabu search [11]. The main limitations of these approaches are: a) they are based on the squared error criterion, which is not sufficient for nonlinear detection, b) they do not consider or compensate for PA-nonlinearity in their system model, and c) they consider far fewer antennas and hence fall under the category of large-MIMO systems; it is therefore doubtful whether algorithms designed for 10–100 antennas could scale up to the 100–1000 antennas of massive-MIMO systems. Lately, iterative descent based techniques have attracted a lot of interest [4,7,12] for MU massive-MIMO; these range from approaches based on MMSE covariance-matrix user updates to the reduced-rank MSER based approach [5] and Neumann updates [13,14]. However, they too do not consider the effect of a PA in the overall system model.

For example, the work in [4] uses a Gram matrix update for the linear channel matrix, which fails under severe PA-nonlinearity. The work in [8] assumes a particular structure of the nonlinearity (Hammerstein model). However, as surveyed in [9], several other nonlinear models are used to characterize the PA. The effect of device impairments such as the nonlinear characteristics of the PA cannot be ignored. As reviewed in [15], as we move towards the millimeter wave spectrum with high bandwidths to cater to an increasing number of users in 5G, the devices at the RF front-end cease to exhibit ideal characteristics. No general solution that can handle a nonlinearity of an arbitrary kind is found in the massive-MIMO literature; providing one is a contribution of this paper.

In this work, a parallel detector is proposed that breaks the huge channel matrix into blocks and processes them individually. The proposed algorithm is inspired by the recent work on the diffusion kernel least mean squares (KLMS) algorithm [16] (which has been validated via simulations and theory) and the quantized KLMS algorithm [17]. The benefits of the proposed algorithm are as follows: a) by the Representer theorem in reproducing kernel Hilbert spaces (RKHS) [18], it has the ability to approximate an arbitrary nonlinearity, which makes it generic for any nonlinear PA characteristic, b) it circumvents the computationally cumbersome detection problem involving a huge channel matrix by subdividing the large number of nonlinear equations into smaller blocks and performing parallel detection over them, and c) as validated by simulations, the proposed parallel detector outperforms the total least squares (TLS) and K-best linear MMSE detectors surveyed in [19].

To the best of the authors' knowledge, the nonlinear parallel detection problem over large channel matrices has not been considered in the literature for MU massive-MIMO in PA impaired scenarios (without a priori knowledge of the PA characteristics at the receiver). The proposed RKHS based detectors have the ability to learn arbitrary PA-nonlinearities and track changes in their characteristics. The proposed detectors also provide significant performance gains for nonlinear uplink-MU massive-MIMO systems as compared to traditional TLS [6] and K-best approaches, whilst maintaining computational simplicity.

This paper is organized as follows: Section 2 provides the system model. Section 3 reviews RKHS based techniques and the KLMS algorithm. Section 4 discusses the proposed parallel detection algorithm, Section 5 presents simulations that validate the performance of the proposed algorithm, and Section 6 concludes the paper.

2. System model

In this section, the system model is provided for the uplink-MU massive-MIMO system considered in this paper. As given in [6], the uplink system model for $M$ users (each equipped with a single antenna, as is typical of a handset), describing a frequency-selective Rayleigh fading channel [20] under the assumption of a transmit-side power-amplifier nonlinearity (given in [9]), can be written as:

$$ y_i = \sum_{s=0}^{T-1} H_s f(x_{i-s}) + n_i \tag{1} $$

where $T$ is the memory of the channel, $y_i \in \mathbb{C}^{N}$ is the received signal, $H_s \in \mathbb{C}^{N \times M}$ is the channel matrix of the $s$th tap, $N$ ($\gg M$) is the number of antennas at the receiver, the subscript $(\cdot)_i$ denotes the value of a quantity at the $i$th time instant, and $M$ denotes the number of users in the MU scenario, as in [4]. For $s \neq 0$, $H_s$ is a complex Gaussian matrix with zero mean and unit variance, denoted as $H_s \sim \mathcal{CN}(0, I)$. For $s = 0$, $H_s$ additionally has a line of sight (LOS) path with a fixed Rice factor, as given in [20]. For a MU massive-MIMO channel, $N \geq 10M$ is chosen [4]. The transmitted vector $x_i \in \mathbb{C}^{M}$ contains the symbols of all $M$ users at time index $i$. The $f(\cdot)$ in (1) denotes the power-amplifier nonlinearity, which is popularly modelled by the Saleh or Rapp models given in [9]. In the paradigm of this paper, it is assumed that the nature of $f(\cdot)$ is not known at the receiver. The channel is assumed to be quasi-static over a block of symbols. The noise $n_i \in \mathbb{C}^{N}$ is independent and identically distributed (i.i.d.) additive white Gaussian noise drawn from $\mathcal{CN}(0, \sigma^2 I)$ at the $i$th time instant, where $\sigma^2$ denotes the variance of the Gaussian random variable.
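As an illustration of (1), the following minimal NumPy sketch generates received vectors for a block of symbols. It is not the authors' simulation code. The function names `rapp` and `simulate_uplink`, the single smoothness value `nu` shared by all users, the zero symbols assumed before time 0, the omission of the LOS component in the first tap, and the QPSK pilots are all assumptions made for this example.

```python
import numpy as np

def rapp(x, gain=1.0, v_sat=0.5, nu=2.0):
    """Rapp-type AM/AM compression of complex samples; the phase is kept (assumed form)."""
    mag = np.abs(x)
    out_mag = gain * mag / (1.0 + (mag / v_sat) ** (2 * nu)) ** (1.0 / (2 * nu))
    return out_mag * np.exp(1j * np.angle(x))

def simulate_uplink(x, H, sigma2, f=rapp, rng=None):
    """Return y with rows y_i = sum_{s=0}^{T-1} H_s f(x_{i-s}) + n_i.

    x: (num_symbols, M) transmitted symbols, H: (T, N, M) channel taps.
    Symbols before time 0 are taken as zero (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, N, M = H.shape
    fx = f(x)                      # the PA acts on each user's symbol before the channel
    n_sym = x.shape[0]
    y = np.zeros((n_sym, N), dtype=complex)
    for s in range(T):
        # tap s acts on the PA output delayed by s symbols
        y[s:] += fx[: n_sym - s] @ H[s].T
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(y.shape)
                                   + 1j * rng.standard_normal(y.shape))
    return y + noise

# Example use with hypothetical sizes; no LOS component is added here.
rng = np.random.default_rng(0)
M, N, T, n = 8, 80, 4, 100
H = (rng.standard_normal((T, N, M)) + 1j * rng.standard_normal((T, N, M))) / np.sqrt(2)
x = (rng.choice([-1, 1], (n, M)) + 1j * rng.choice([-1, 1], (n, M))) / np.sqrt(2)
y = simulate_uplink(x, H, sigma2=0.01, rng=rng)
```

The nonlinearity is passed in as the argument `f`, so another PA model (for example a Saleh-type curve) could be substituted without touching the channel part of the sketch.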


Table 1
List of terms used and their meanings.

| Parameter | Meaning |
| --- | --- |
| $\eta$ | Step-size |
| Superscript index $(u)$ | User no. $1:M$ |
| Superscript index $l$ | Block no. $1:L$ |
| Subscript $i$ | Time-index |
| $\tau$ | VQ constant |
| $\epsilon$ | VQ constant |
| $\gamma$ | Kernel-spread parameter |
| $Y_i^{l(u)}$ | Observation dictionary for $l$th block for the $u$th user |
| $I_i^{l(u)}$ | Error-term dictionary for $l$th block for the $u$th user |
| $D_i^{l(u)}$ | Cartesian product of $I_i^{l(u)}$ and $Y_i^{l(u)}$ |
| $e_i^{l(u)}$ | Instantaneous error-term for $l$th block for the $u$th user |
| $Q(\cdot)$ | Minimum Euclidean distance based detection rule |
| $(j)$ | Index-variable for sets $Y_i^{l(u)}$ and $I_i^{l(u)}$ |
| $j^*$ | Location of the closest point to current observation $y_i$ in the dictionary $Y_i^{l(u)}$ |
| $\hat{x}_i$ | Final estimate |
| $x_i^{(u)}$ | Pilot sample for $u$th user |
| $\lvert \cdot \rvert$ | Cardinality |

In this work, a correlated frequency-selective Rayleigh fading based MU massive-MIMO channel model is also assumed, as in [21]. Channel correlation is one of the limiting factors in MU massive-MIMO detection; it prevents achieving full diversity and degrades the bit error rate (BER) performance [22]. In the considered model, let there be two matrices $R_1 = \mathrm{Toeplitz}([1, \rho^2, \rho^4, \ldots, \rho^{2N-2}])$ and $R_2 = \mathrm{Toeplitz}([1, \rho^2, \rho^4, \ldots, \rho^{2M-2}])$, and a matrix $\tilde{H}_s \in \mathbb{C}^{N \times M}$ drawn from $\mathcal{CN}(0, I)$, defined for all $T$ taps except the first tap (which has an additional LOS component). Then the channel matrix for the $s$th path can be written as $H_s = R_1^{1/2} \tilde{H}_s R_2^{1/2}$. The parameter $\rho$ controls the correlation of the Rayleigh faded channel. Typically, $\rho = 0.5$ has been reported to represent moderate correlation, and $\rho = 0.9$ can be assumed to represent a strongly correlated Rayleigh fading channel. In the next section, we review the KLMS algorithm as a basic detection algorithm and highlight some of its salient properties.

3. Review of KLMS algorithm

The concept of RKHS has been widely used since the late 1990s, when nonlinear support vector machine based approaches were proposed [23]. More recently, many other popular online learning approaches such as least mean squares (LMS), the perceptron and recursive least squares (RLS) were also absorbed into the RKHS framework [24]. This absorption culminated in a new research area called kernel adaptive filtering [24]. RKHS based approaches are advantageous mainly when a separating hyperplane that can separate the classes (the separating hyperplane being the detector) does not exist, and the decision boundary that has to be learnt is instead a nonlinear surface. In such scenarios, with linearly non-separable observations, online RKHS based techniques are a good replacement for conventional techniques like LMS and least squares (LS). Among the kernel adaptive filtering techniques in the literature, the kernel least mean squares (KLMS) based approaches are the most popular learning algorithms for nonlinear signal processing [25]; others include kernel recursive least squares [26] and kernel maximum correntropy [27]. They have been found useful in diverse nonlinear signal processing problems such as nonlinear channel equalization and channel estimation. We now proceed to review the classical KLMS algorithm, which is a popular online RKHS technique.
The KLMS based approaches map the incoming observation at the $i$th instant implicitly by a feature map $\phi : \mathbb{C}^{N} \to \mathcal{H}$, where $\mathcal{H}$ denotes the RKHS, in which any arbitrary function can be approximated as a weighted sum of positive definite kernels, as proved by the Representer theorem in [18]. Let the implicit weights in RKHS at the $i$th instant be denoted by $\Omega_i^{(u)}$ for the $u$th user. A summary of the terminology used in this work is given in Table 1. The estimated output of the KLMS algorithm [25] at the $i$th instant for the $u$th user, $\hat{x}_i^{(u)}$, is then expressed as an inner product in $\mathcal{H}$ (denoted by $\langle \cdot, \cdot \rangle_{\mathcal{H}}$) between the mapped observation $\phi(y_i)$ and $\Omega_i^{(u)}$. This can be written mathematically as,

$$ \hat{x}_i^{(u)} = \langle \Omega_i^{(u)}, \phi(y_i) \rangle_{\mathcal{H}} \tag{2} $$

The cost function $J_{\mathrm{KLMS},i}^{(u)}$ relies on the instantaneous approximation of the mean squared error between $\hat{x}_i^{(u)}$ and $x_i^{(u)}$, and hence can be written as:

$$ J_{\mathrm{KLMS},i}^{(u)} = \left( x_i^{(u)} - \hat{x}_i^{(u)} \right)^2 \tag{3} $$

Upon optimizing (3) in RKHS by stochastic gradient descent, we arrive at the following adaptation equation for KLMS:

$$ \Omega_{i+1}^{(u)} = \Omega_i^{(u)} + \eta \left( x_i^{(u)} - \hat{x}_i^{(u)} \right) \phi(y_i) \tag{4} $$


where $\eta$ is the step-size. These adaptation equations are in the RKHS $\mathcal{H}$, and it seems difficult to implement them without knowing the map $\phi(\cdot)$. However, there exists a solution to this problem in the literature [25] called the "kernel trick". To apply the kernel trick, we first take the inner product on both sides of (4) with the current mapped observation. Assuming zero initial conditions, (4) can then be written as a running summation:

$$ \hat{x}_{i+1}^{(u)} = \eta \sum_{k=1}^{i} \left( x_k^{(u)} - \hat{x}_k^{(u)} \right) \langle \phi(y_k), \phi(y_{i+1}) \rangle_{\mathcal{H}} \tag{5} $$

where,

$$ \langle \phi(y_i), \phi(y_{i+1}) \rangle_{\mathcal{H}} = \exp\left( -\gamma \sum_{\forall p} \left( y_i(p) - y_{i+1}^{*}(p) \right)^2 \right) \tag{6} $$
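The running summation in (5) with the kernel in (6) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names `complex_gaussian_kernel` and `klms`, the real-valued toy data, and the values of $\eta$ and $\gamma$ are assumptions chosen only so that the example runs quickly.

```python
import numpy as np

def complex_gaussian_kernel(a, b, gamma):
    """Kernel of (6): exp(-gamma * sum_p (a(p) - conj(b(p)))**2)."""
    d = np.asarray(a) - np.conj(np.asarray(b))
    return np.exp(-gamma * np.sum(d * d))

def klms(Y, x, eta, gamma):
    """Run KLMS as in (5); returns the estimate made for each sample before it is learnt."""
    centres, coeffs = [], []
    x_hat = np.zeros(len(x), dtype=complex)
    for i, (y_i, x_i) in enumerate(zip(Y, x)):
        # weighted sum of kernels between stored observations and the new one
        x_hat[i] = sum(c * complex_gaussian_kernel(y_k, y_i, gamma)
                       for c, y_k in zip(coeffs, centres))
        # every observation is stored, so memory and cost grow with i
        centres.append(y_i)
        coeffs.append(eta * (x_i - x_hat[i]))
    return x_hat

# Toy example (assumed data): learn a scalar saturating nonlinearity.
rng = np.random.default_rng(0)
Y = rng.uniform(-1, 1, size=(300, 1))
x = np.tanh(2 * Y[:, 0])
x_hat = klms(Y, x, eta=0.2, gamma=1.0)
err = np.abs(x - x_hat) ** 2
print(err[:20].mean(), err[-20:].mean())
```

For real-valued inputs the kernel reduces to the ordinary Gaussian kernel, which is why the toy data above are real. The lists `centres` and `coeffs` grow by one entry per sample, which is the expanding storage requirement that dictionary-based variants aim to curb.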

In (6), $(\cdot)_i(p)$ denotes the $p$th component of the vector at the $i$th time instant and $(\cdot)^{*}$ denotes conjugation. The expression in (6) is the widely used complex Gaussian kernel, which has the capability to represent/invert any nonlinearity by the Representer theorem [18]. The kernel bandwidth parameter $\gamma$ is calculated by Silverman's rule [28], and this value is used in all considered scenarios. Physically, $\gamma$ controls the spread of the Gaussian kernel.

As reported in [29] (and as is evident from (5)), a problem faced by kernel based techniques is that the storage and computational load grow with time as observations $y_i$ arrive. Hence, in [29], finite dictionary based approaches are presented, which curtail the computational complexity of RKHS based techniques and make them feasible for real-time implementation. Also, when the number of regressors is quite large (which is typically the scenario in massive-MIMO), it is advisable to use scalable algorithms. In the context of online RKHS-based learning, two such algorithms, the diffusion-KLMS [16] and the quantized diffusion-KLMS [17], have been proposed in the literature and their convergence has been proven theoretically. The diffusion-KLMS algorithm suffers from the same temporally expanding storage requirement as KLMS, which is circumvented by the quantized diffusion-KLMS.

The detection problem considered in this work, from the system model in (1), has two main characteristics: a) it is a nonlinear detection problem due to PA impairments, and b) it has a huge size because $N \gg M$. Hence, scalable online RKHS techniques (like the quantized diffusion-KLMS), which can break the massive channel matrix into blocks and make inferences over them, are a valid choice of detection algorithm in such a scenario. Such a technique is proposed and described in detail in the next section.

4. Proposed detection technique

In this section, a technique is proposed for dealing with the huge set of equations in (1). This technique is inspired by considering blocks of the channel matrix $H$ and fusing them. This approach is common in the literature for scenarios with a large number of antennas [3,19]. However, in this work, the effect of the PA nonlinearity is also considered in the formulation of the uplink-MU massive-MIMO detection algorithm, by mapping to the RKHS $\mathcal{H}$ so as to implicitly estimate the overall inverse of the channel and the PA nonlinearity (which has not been considered in the literature). The description of the proposed detector is divided into two subsections, namely, a) Block mapping to RKHS, and b) KLMS based detector.

4.1. Block mapping to RKHS

The $l$th block of the channel matrix, $H^{l}$, is defined as:



$$ H = \left[ (H^{1})^{T} \; (H^{2})^{T} \; \ldots \; (H^{l})^{T} \; \ldots \; (H^{L})^{T} \right]^{T} \tag{7} $$

where $L$ is the number of disjoint blocks of size $\frac{N}{L} \times M$ and $(\cdot)^{T}$ denotes the transpose operator. Note that $L$ cannot be made arbitrarily large; it can only be increased as long as $\frac{N}{L} \gg M$ is satisfied. Similarly, the disjoint blocks $y_i^{l}$ of the output $y_i$ at the $i$th time instant are given as:

$$ y_i = \left[ (y_i^{1})^{T} \; (y_i^{2})^{T} \; \ldots \; (y_i^{l})^{T} \; \ldots \; (y_i^{L})^{T} \right]^{T} \tag{8} $$

and hence, from (1), the system model for the $l$th block $y_i^{l}$ is defined as follows:

$$ y_i^{l} = \sum_{s=0}^{T-1} H_s^{l} f(x_{i-s}) + n_i^{l} \tag{9} $$

where $n_i^{l} \in \mathbb{C}^{N/L}$ is the noise vector of the $l$th block. Clearly, detection of $x$ conditioned on the observations $y$ is a linearly non-separable detection problem due to the nonlinearity $f(\cdot)$. Classical stochastic square-law based cost functions such as LS have limited applicability, as they involve learning a separating hyperplane, which may not be sufficient to separate the signals when the decision boundary is non-affine due to the PA-nonlinearity. RKHS based algorithms are quite useful in dealing with such scenarios [24].

In this work, the aim is to use the cost functions corresponding to the $L$ channel blocks to learn, in an online manner (using the kernel trick), dictionaries and error terms corresponding to each block for $L$ independent symbol detections. Finally, the detected symbols are fused by taking the mean of all the $L$ estimates for all users, $\{Q(\hat{x}^{l})\}_{l=1}^{L}$, where $Q(\cdot)$ is the minimum Euclidean distance detection operation. In this context, the notion of a dictionary is introduced for the $l$th block, $D_i^{l(u)} = (I_i^{l(u)}, Y_i^{l(u)})$, at the $i$th time instant, consisting of a dictionary of error terms, $I_i^{l(u)} = \{I_i^{l(u)}(j)\}_{j=1}^{|I_i^{l(u)}|}$, and a dictionary of observations, $Y_i^{l(u)} = \{Y_i^{l(u)}(j)\}_{j=1}^{|Y_i^{l(u)}|}$, for each of the $u = 1, 2, \cdots, M$ users. The symbol $|\cdot|$ denotes the cardinality of the corresponding dictionary. Subsequently, the observation from the $l$th block, $y_i^{l}$, is mapped into RKHS via an implicit feature map, $\phi(y_i^{l})$, and is used to learn and update a dictionary, $Y_i^{l(u)}$, and error terms, $I_i^{l(u)}$, specific to each block for each user. Hence, based on this proposed paradigm, the $L$ dictionaries and error terms are formed by an online vector-quantization (VQ) technique using the novelty criterion (NC) based sparsification rule. The proposed algorithm is referred to as the "KLMS based detector".

Algorithm 1 KLMS based detector: parallel kernel LMS based detector for uplink-MU massive-MIMO using NC.

Initialize step-size $\eta$, kernel width $\gamma$ and quantization threshold $\epsilon > 0$, and initial dictionary $D_0^{l(u)} = \{x_0^{(u)}, y_0^{l(u)}\}$, $\forall l, u$. Initialize $\tau$ and $\epsilon$.
while $i > 0$ do
  for $u = 1 : M$ do
    for $l = 1 : L$ do
      while $|D_i^{l(u)}| \geq 1 \; \forall l$ do
        $\hat{x}_i^{l(u)} = \sum_{j=1}^{|D_i^{l(u)}|} I_i^{l(u)}(j) \, \langle \phi(Y_i^{l(u)}(j)), \phi(y_i^{l}) \rangle_{\mathcal{H}}$
        $e_i^{l(u)} = x_i^{(u)} - \hat{x}_i^{l(u)}$
        $j^{*} = \arg\min_{1 \leq j \leq |D_i^{l(u)}|} \| y_i^{l} - Y_i^{l(u)}(j) \|_2$
        if $\| y_i^{l} - Y_i^{l(u)}(j^{*}) \|_2 \leq \epsilon$ or $|e_i^{l(u)}| < \tau$ then
          $Y_i^{l(u)} = Y_{i-1}^{l(u)}$
          $I_i^{l(u)}(j^{*}) = I_{i-1}^{l(u)}(j^{*}) + \eta \, e_i^{l(u)}$
        else
          $Y_i^{l(u)} = Y_{i-1}^{l(u)} \cup y_i^{l}$, $I_i^{l(u)} = I_{i-1}^{l(u)} \cup e_i^{l(u)}$
        end if
      end while
    end for
  end for
  Output: $\hat{x}_i = \frac{1}{L} \sum_{l} Q(\hat{x}_i^{l})$
end while
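The steps of Algorithm 1 can be sketched in Python as follows. This is a reading of the pseudocode under stated assumptions, not the authors' code: the class name `BlockKLMSDetector`, the QPSK constellation used for $Q(\cdot)$, the split into a pilot-driven `train` step and a `detect` step, and the requirement that $N$ be divisible by $L$ are all choices made for this example.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # assumed constellation

def kernel(a, b, gamma):
    """Complex Gaussian kernel of (6)."""
    d = a - np.conj(b)
    return np.exp(-gamma * np.sum(d * d))

def nearest_symbol(v, constellation=QPSK):
    """Q(.): minimum Euclidean distance decision, applied element-wise."""
    v = np.atleast_1d(v)
    idx = np.argmin(np.abs(v[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

class BlockKLMSDetector:
    def __init__(self, M, L, eta, gamma, eps, tau):
        self.M, self.L = M, L
        self.eta, self.gamma, self.eps, self.tau = eta, gamma, eps, tau
        # one observation dictionary Y and one error-term dictionary I per (block, user)
        self.Y = [[[] for _ in range(M)] for _ in range(L)]
        self.I = [[[] for _ in range(M)] for _ in range(L)]

    def _estimate(self, l, u, y_l):
        return sum(c * kernel(y_j, y_l, self.gamma)
                   for c, y_j in zip(self.I[l][u], self.Y[l][u]))

    def train(self, y, x_pilot):
        """One pilot step: y has length N (divisible by L), x_pilot holds M pilot symbols."""
        for l, y_l in enumerate(np.split(y, self.L)):
            for u in range(self.M):
                Y, I = self.Y[l][u], self.I[l][u]
                if not Y:
                    # initial dictionary {x_0, y_0}
                    Y.append(y_l)
                    I.append(x_pilot[u])
                    continue
                e = x_pilot[u] - self._estimate(l, u, y_l)
                dists = [np.linalg.norm(y_l - y_j) for y_j in Y]
                j_star = int(np.argmin(dists))
                if dists[j_star] <= self.eps or abs(e) < self.tau:
                    I[j_star] += self.eta * e   # dictionary unchanged, closest term updated
                else:
                    Y.append(y_l)               # novel observation: grow both dictionaries
                    I.append(e)

    def detect(self, y):
        """Fuse the per-block decisions Q(x_hat^l) by taking their mean."""
        blocks = np.split(y, self.L)
        decisions = [nearest_symbol(np.array([self._estimate(l, u, blocks[l])
                                              for u in range(self.M)]))
                     for l in range(self.L)]
        return np.mean(decisions, axis=0)
```

Because the blocks and users only share the incoming observation, the two nested loops in `train` could be distributed over parallel workers; they are written sequentially here to keep the sketch short.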

4.2. KLMS based detector

The proposed KLMS based detector is described in detail in Algorithm 1. This algorithm has a structure similar to that of the quantized diffusion KLMS algorithm as proposed in [29]. Separate dictionaries are maintained for all $L$ blocks of the observation and are updated according to the error term $e_i^{l}$ for the $l$th block. The output of the detector at each iteration for all $L$ blocks is calculated in parallel by taking a weighted sum of the kernel inner products of the entries in the dictionary $Y_i^{l(u)}$ (for each of the $u = 1, 2, \cdots, M$ users) with the observation $y_i^{l}$. The inner product in RKHS between the $j$th vector in the dictionary, $Y_i^{l(u)}(j)$, and the current observation $y_i^{l}$, denoted by $\langle \phi(Y_i^{l(u)}(j)), \phi(y_i^{l}) \rangle_{\mathcal{H}}$, is defined in (6). By the Representer theorem [18], the weighted sum of inner products in RKHS has the ability to approximate the overall inverse of the nonlinear PA and the channel; the weights are the entries of the dictionary of error terms $I_i^{l(u)}$ for the $l$th block and the $u$th user.

For the proposed KLMS based detector, the dictionary of observations, $Y_i^{l}$, is first initialized to contain the first observation $y_i^{l}$, and the error-term dictionary $I_i^{l}$ is initialized to the first pilot symbol $x_0^{(u)}$. The output $\hat{x}_i^{l(u)}$ is calculated based on the pre-existing dictionary of observations for the $u$th user. The error term $(x_i^{(u)} - \hat{x}_i^{l(u)})$ is then calculated using $\hat{x}_i^{l}$. If the incoming observation $y_i^{l}$ is significantly similar to the entries in the pre-existing dictionary (i.e. the Euclidean norm of the difference between $y_i^{l}$ and its closest entry in $Y_i^{l(u)}$ is less than a threshold $\epsilon$, that is, $\min_{1 \leq j \leq |Y_i^{l(u)}|} \| y_i^{l} - Y_i^{l(u)}(j) \|_2 < \epsilon$), or the magnitude of the next error term is below a threshold $\tau$, then the dictionary remains unchanged and only the $j^{*}$th error term in $I_i^{l(u)}$ is updated, where $j^{*} = \arg\min_{1 \leq j \leq |Y_i^{l(u)}|} \| y_i^{l} - Y_i^{l(u)}(j) \|_2$. Note that both $y_i^{l}$ and the $j^{*}$th entry in $Y_i^{l(u)}$ (denoted by $Y_i^{l(u)}(j^{*})$) are vectors in $\mathbb{C}^{N/L}$. By this process, only those terms which contribute significantly to the summation in approximating the channel inverse and the nonlinearity are added to the dictionary. $\epsilon$ and $\tau$ are VQ constants, which are initialized by a well-known rule [24] as $\tau = \sigma_e$ and $\epsilon = 0.1 \sqrt{\frac{1}{2\gamma}}$ ($\sigma_e$ is the targeted MSE floor specified by the user, as given in [24]). If the current observation is different (i.e. the Euclidean distance to the nearest entry in the dictionary exceeds $\epsilon$) and the error term exceeds the threshold $\tau$, the observation dictionary $Y_i^{l(u)}$ is updated: the error term for the $i$th time instant and the observation $y_i^{l(u)}$ are appended to $I_i^{l(u)}$ and $Y_i^{l(u)}$, respectively. This process of online vector quantization keeps only the relevant and significantly different observations, whilst discarding the redundant ones. This helps in curtailing the computational demand and the storage requirement of the proposed detector, thereby facilitating computational simplicity when the dimensionality of the incoming observations is large (due to the large number of receive antennas). Finally, the vectors of estimates for the $M$ users from the $L$ blocks (denoted by $Q(\hat{x}_i^{l})$ for the $l$th block) are aggregated and fused by taking the statistical mean. Convergence guarantees for the proposed algorithm stem from Cover's theorem [24], the Representer theorem [18], and the analysis given in [16].

Table 2
Comparison of the storage requirement and computational complexity for all the detection algorithms.

| Algorithm | Computational cost | Storage requirement |
| --- | --- | --- |
| TLS | $O(M^2 N)$ | $M \times N$ |
| K-Best | $O(M^2 K)$ [19] | $M \times N$ |
| KLMS based detector | $O(\lvert D_i \rvert)$ [26] | $\lvert D_i \rvert \frac{N}{L}$ |

5. Simulations

In this section, simulations are presented that validate the proposed KLMS based detector against commonly used detectors for uplink-MU massive-MIMO, such as TLS and K-best, in the presence of PA impairments. The PA nonlinearity $f(\cdot)$ is assumed to follow the Rapp model given in [9],

$$ |f(x_i(p))| = \frac{G \, |x_i(p)|}{\left( 1 + \left| \dfrac{x_i(p)}{V_{sat}} \right|^{2\nu(p)} \right)^{\frac{1}{2\nu(p)}}} \tag{10} $$

where $G$ and $V_{sat}$ are the gain and saturation voltage of the PA, respectively, and $\nu(p)$ controls the severity of the nonlinearity for the $p$th user (each component of the vector $x_i$ is the transmitted signal of one user). For the simulations, $G = 1$ and $V_{sat} = 0.5$ are chosen so as to generate a reasonably strong PA saturation nonlinearity [9]. The $\nu(p)$ for each user is drawn uniformly at random between 0.5 and 50 to model differing UE characteristics. The Rice factor for the LOS path is taken to be −10 dB, i.e. the LOS path is 10 times weaker than the multipath component. Using this PA model, an uplink-MU massive-MIMO system with $M = 8$ users and $N = 320$ receive antennas at the base-station is considered. Four different scenarios are considered: a) $L = 1$, b) $L = 2$, c) $L = 4$ and d) $L = 10$. The proposed techniques are compared with the K-best detector surveyed in [19] and the TLS detector. The K-best algorithm is a polynomial complexity detector, which is a general case of successive interference cancellation, and it minimizes the MMSE objective for large dimensional systems in a computationally simple manner, making it suitable for linear channels. The proposed approaches are also compared against TLS, which is an optimal detector [19] over linear channels. Two fading scenarios are considered in this work: a) i.i.d. Rayleigh fading, in which all entries of the channel matrix are drawn independently (an assumption that frequently fails in MU massive-MIMO scenarios), and b) correlated Rayleigh fading with correlation coefficient $\rho = 0.75$, which represents a reasonably severe correlation [30]. Also, $\tau = \sigma_e$ and $\epsilon = 0.1 \sqrt{\frac{1}{2\gamma}}$, where $\sigma_e$ is the MSE floor specified by the user, as given in [24]. For the K-best detector, the value of $K$ is taken to be 5.

In Figs. 1 and 2, the KLMS based detector is compared in terms of BER performance, for various values of $L$, with the K-best and TLS detectors. The proposed detector is found to outperform the K-best detector and the TLS detector under the assumed Rapp nonlinearity $f(\cdot)$, both for the i.i.d. fading scenario (Fig. 1) and for the correlated fading scenario with $\rho = 0.75$ (Fig. 2). From Fig. 2, it is observed that the performance of all the compared algorithms degrades in the correlated fading scenario as compared to the i.i.d. scenario, which is quite intuitive. Further, it is also observed that the kernel based detector proposed in this work outperforms the K-best and TLS detectors and maintains a reasonable BER performance in the presence of PA impairments. The primary reason for the improved performance of the proposed detector over the K-best and TLS detectors is that the K-best and TLS algorithms are based on the suboptimal linear MMSE criterion, which is invalid in severe PA nonlinearity scenarios. It can also be observed that increasing $L$ indefinitely degrades the BER performance, as the degrees of freedom of the $l$th matrix block available to the detector are reduced as $L$ is increased. Hence, there is a tradeoff between computational cost (receiver complexity) and the BER performance of the detector.

Fig. 1. Performance comparison of the KLMS based detector with L = 1, 2, 4, 8 with K-best detector and TLS in nonlinear power-amplifier scenarios for i.i.d frequency-selective Rayleigh fading for M = 8 and N = 320, K = 5, η = 0.22, T = 4. (Plot of BER on a logarithmic axis versus SNR from 5 to 30; curves: KLMS based detector with L = 1, 2, 4, 8, K-best MMSE, and TLS.)

Fig. 2. Performance comparison of the KLMS based detector with L = 1, 2, 4, 8 with K-best detector and TLS in nonlinear power-amplifier scenarios for correlated frequency-selective Rayleigh fading (ρ = 0.75) for M = 8 and N = 320, K = 5, η = 0.22, T = 4. (Plot of BER on a logarithmic axis versus SNR from 5 to 30; curves: KLMS based detector with L = 1, 2, 4, 8, K-best MMSE, and TLS.)

Further, we compare the proposed detector in terms of computational requirements with the K-best detector and TLS in Table 2. For symbol detection in the TLS detector, an $M \times M$ Gram matrix needs to be inverted, at a computational cost of $O(M^3)$. Also, $M^2 N$ operations are needed to calculate the Gram matrix. Therefore, as $N \gg M$ in massive MIMO, the computational cost is $O(M^2 N)$. Similarly, for the K-best approach, as given in [19], for large $N$ the computational cost of the detector is $O(M^2 K)$. Both the K-best approach and TLS need to store the $N \times M$ channel matrix; hence their storage requirement is $N \times M$. For the proposed detector, the computational requirement at each time instant grows linearly with the dictionary size [26] and hence is $O(|D_i|)$. Also, the storage requirement grows linearly with the number of observations in the dictionary and hence is $|D_i| \frac{N}{L}$. Thus it can be observed from Table 2 that, for large $N$, the storage complexity of all the approaches grows linearly with $N$ and hence they are comparable. The computational requirement also grows linearly with $N$ for TLS and with $K$ for the K-best detector as the number of antennas at the receiver is increased. For the proposed approaches, however, the number of computations saturates at the number of vectors available in the dictionary, $|D_i|$, making it computationally simpler than TLS and the K-best detector.
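To make the orders of growth in Table 2 concrete, the short sketch below evaluates the leading terms for the simulated system size. The function name `operation_counts` and the dictionary size of 200 are assumptions for illustration only; the dictionary size actually reached depends on the data and on the thresholds $\epsilon$ and $\tau$, and constant factors are ignored throughout.

```python
def operation_counts(M, N, K, L, dict_size):
    """Leading-order terms from Table 2 (constants ignored)."""
    return {
        "TLS": {"compute": M**2 * N, "storage": M * N},
        "K-best": {"compute": M**2 * K, "storage": M * N},
        "KLMS based detector": {"compute": dict_size,
                                "storage": dict_size * N // L},
    }

# M, N, K follow Section 5; L = 4 is one of the simulated block counts.
# dict_size = 200 is a hypothetical value, not a reported result.
counts = operation_counts(M=8, N=320, K=5, L=4, dict_size=200)
for name, c in counts.items():
    print(name, c)
```

Doubling `N` in this sketch doubles the TLS compute term and every storage term, while the KLMS compute term changes only if the dictionary size does.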

[Figure: eight panels showing MSE versus iterations and number of multiplications versus iterations for L = 1, 2, 4, 8, each with curves for N = 80, 160, 320, 400.]

Fig. 3. MSE and computational-load comparison for the KLMS-based detector with L = 1, 2, 4, 8 with K-best detector and TLS in nonlinear power-amplifier scenarios for i.i.d. frequency-selective Rayleigh fading for M = 8 and N = 80, 160, 320, 400, and η = 0.22, T = 4.

Finally, the computational complexity and the convergence of the proposed approach are demonstrated by varying the number of receive antennas N in Figs. 3 and 4 for the i.i.d. and correlated frequency-selective fading scenarios, respectively. It is observed from Figs. 3 and 4 that, as the number of receive antennas increases from 80 to 400, the curves converge faster and require a smaller dictionary. This improvement in performance is due to the increase in the number of measurements in the overall system of equations. Also, it can be observed that whenever N/L ≤ 10M, the performance of the proposed detector deteriorates: the algorithm does not achieve a low MSE floor and requires a larger dictionary.
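As a small worked check, the sketch below enumerates which of the simulated configurations fall at or below the threshold. It assumes the threshold compares the per-block observation dimension N/L with 10M; that reading of the condition is an assumption, as are the helper name `per_block_dim` and the text layout of the printout. The values of M, N and L are the ones listed in the captions of Figs. 3 and 4.

```python
M = 8                           # number of users (from the figure captions)
antennas = [80, 160, 320, 400]  # receive antennas N
blocks = [1, 2, 4, 8]           # number of parallel blocks L

def per_block_dim(n, l):
    """Observations available to each of the l parallel blocks (assumed N/L)."""
    return n / l

# Configurations at or below the assumed threshold N/L <= 10M
degraded = {(n, l) for n in antennas for l in blocks
            if per_block_dim(n, l) <= 10 * M}

# One row per N, one column per L; "x" marks a configuration under the threshold
for n in antennas:
    row = ["x" if (n, l) in degraded else "." for l in blocks]
    print(f"N={n:3d}: " + " ".join(row))
```

Under this reading, N = 80 is at or below the threshold for every L, while N = 400 only crosses it at L = 8, which is in line with the observation that more receive antennas leave room for a finer block decomposition.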

[Figure: eight panels showing MSE versus iterations and number of multiplications versus iterations for L = 1, 2, 4, 8, each with curves for N = 80, 160, 320, 400.]

Fig. 4. MSE and computational-load comparison for the proposed KLMS-based detector for correlated frequency-selective Rayleigh fading (ρ = 0.75) for M = 8 and N = 80, 160, 320, 400, and η = 0.22, T = 4.

6. Conclusion

A novel kernel-based multiuser detector for massive-MIMO in the presence of PA nonlinearity is proposed. The proposed detector outperforms the K-best detector and TLS algorithms for the considered massive-MIMO system model. Simulations validate and compare the proposed algorithm against the existing techniques, and suggest a trade-off between BER performance and computational complexity. The simulation results also indicate that the proposed algorithm is scalable and offers a low-complexity solution for detection over nonlinear uplink-MU massive-MIMO channels.

References

[1] Boccardi F, Heath RW, Lozano A, Marzetta TL, Popovski P. Five disruptive technology directions for 5G. Commun Mag IEEE 2014;52(2):74–80.
[2] Rusek F, Persson D, Lau BK, Larsson EG, Marzetta TL, Edfors O, et al. Scaling up MIMO: opportunities and challenges with very large arrays. Signal Process Mag IEEE 2013;30(1):40–60.
[3] Chockalingam A, Rajan BS. Large MIMO systems. Cambridge University Press; 2014.


[4] Larsson E, Edfors O, Tufvesson F, Marzetta T. Massive MIMO for next generation wireless systems. Commun Mag IEEE 2014;52(2):186–95.
[5] Cai Y, de Lamare RC, Champagne B, Qin B, Zhao M. Adaptive reduced-rank receive processing based on minimum symbol-error-rate criterion for large-scale multiple-antenna systems. Commun IEEE Trans 2015;63(11):4185–201.
[6] Rosario F, Monteiro FA, Rodrigues A. Fast matrix inversion updates for massive MIMO detection and precoding. Signal Process Lett IEEE 2016;23(1):75–79.
[7] Qin X, Yan Z, He G. A near-optimal detection scheme based on joint steepest descent and Jacobi method for uplink massive MIMO systems. Commun Lett IEEE 2016;20(2):276–9.
[8] Zou Y, Raeesi O, Antilla L, Hakkarainen A, Vieira J, Tufvesson F, et al. Impact of power amplifier nonlinearities in multi-user massive MIMO downlink. In: 2015 IEEE Globecom workshops (GC wkshps). IEEE; 2015. p. 1–7.
[9] Gharaibeh KM. Nonlinear distortion in wireless systems: modeling and simulation with MATLAB. John Wiley & Sons; 2011.
[10] Mandloi M, Bhatia V. A low-complexity hybrid algorithm based on particle swarm and ant colony optimization for large-MIMO detection. Expert Syst Appl 2016;50:66–74.
[11] Srinidhi N, Datta T, Chockalingam A, Rajan BS. Layered Tabu search algorithm for large-MIMO detection and a lower bound on ML performance. Commun IEEE Trans 2011;59(11):2955–63.
[12] Tang C, Liu C, Yuan L, Xing Z. High precision low complexity matrix inversion based on Newton iteration for data detection in the massive MIMO. Commun Lett IEEE 2016;20(3):490–3.
[13] Wu M, Yin B, Wang G, Dick C, Cavallaro JR, Studer C. Large-scale MIMO detection for 3GPP LTE: algorithms and FPGA implementations. Sel Top Signal Process IEEE J 2014;8(5):916–29.
[14] Zhu D, Li B, Liang P. On the matrix inversion approximation based on Neumann series in massive MIMO systems. In: Communications (ICC), 2015 IEEE international conference on. IEEE; 2015. p. 1763–9.
[15] Bjornson E, Hoydis J, Kountouris M, Debbah M. Massive MIMO systems with non-ideal hardware: energy efficiency, estimation, and capacity limits. Inf Theory IEEE Trans 2014;60(11):7112–39.
[16] Mitra R, Bhatia V. The diffusion-KLMS algorithm. In: Information technology (ICIT), 2014 international conference on. IEEE; 2014. p. 256–9.
[17] Mitra R, Bhatia V. Finite dictionary variants of the diffusion KLMS algorithm. arXiv:1509.02730, 2015.
[18] Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: Computational learning theory. Springer; 2001. p. 416–26.
[19] da Silva MM, Monteiro FA. MIMO processing for 4G and beyond: fundamentals and evolution. CRC Press; 2014.
[20] Gao Z, Dai L, Hu C, Wang Z. Channel estimation for millimeter-wave massive MIMO with hybrid precoding over frequency-selective fading channels. Commun Lett IEEE 2016;20(6):1259–62.
[21] Loyka SL. Channel capacity of MIMO architecture using the exponential correlation matrix. Commun Lett IEEE 2001;5(9):369–71.
[22] Lu L, Li GY, Swindlehurst AL, Ashikhmin A, Zhang R. An overview of massive MIMO: benefits and challenges. Sel Top Signal Process IEEE J 2014;8(5):742–58.
[23] Sebald DJ, Bucklew JA. Support vector machine techniques for nonlinear equalization. Signal Process IEEE Trans 2000;48(11):3217–26.
[24] Liu W, Principe JC, Haykin S. Kernel adaptive filtering: a comprehensive introduction, vol. 57. John Wiley & Sons; 2011.
[25] Liu W, Pokharel PP, Principe JC. The kernel least-mean-square algorithm. Signal Process IEEE Trans 2008;56(2):543–54.
[26] Chen B, Zhao S, Zhu P, Principe JC. Quantized kernel recursive least squares algorithm. Neural Netw Learn Syst IEEE Trans 2013;24(9):1484–91.
[27] Zhao S, Chen B, Principe JC. Kernel adaptive filtering with maximum correntropy criterion. In: Neural networks (IJCNN), the 2011 international joint conference on. IEEE; 2011. p. 2012–17.
[28] Silverman BW. Density estimation for statistics and data analysis, vol. 26. CRC Press; 1986.
[29] Chen B, Zhao S, Zhu P, Principe JC. Quantized kernel least mean square algorithm. Neural Netw Learn Syst IEEE Trans 2012;23(1):22–32.
[30] Nafkha A, Aziz B. Closed-form approximation for the performance of finite sample-based energy detection using correlated receiving antennas. Wirel Commun Lett IEEE 2014;3(6):577–80.


Rangeet Mitra received the B.Tech. degree from Asansol Engineering College, India, in 2008, and the master's degree from IIT Guwahati, India. His main area of interest is the analysis of sparse RKHS-based learning systems. He is currently pursuing a Ph.D. at IIT Indore, India.

Vimal Bhatia received a Ph.D. from the Institute for Digital Communications at the University of Edinburgh (UoE), UK, in 2005. He is currently an associate professor in the Discipline of Electrical Engineering at IIT Indore. His research interests are in algorithms and solutions for future communication and optical systems.
