Digital Signal Processing 99 (2020) 102671
Pseudo-stochastic EM for sub-Gaussian α-stable mixture models
Shaho Zarei¹, Adel Mohammadpour*,² — Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Article history: Available online 28 January 2020.
Keywords: Mixture models; Sub-Gaussian α-stable distribution; Expectation-maximization algorithm; Rejection sampling.
Abstract. Due to the non-existence of a closed-form expression for sub-Gaussian α-stable densities, the M-step of the Expectation-Maximization (EM) algorithm for Sub-Gaussian α-Stable Mixture Models (SGαSMMs) is intractable, and the EM algorithm for the SGαSMM is still an open problem. The SGαSMM can model non-homogeneous Gaussian data and accommodate outliers and high-leverage data points, which are concepts of primary importance in robust mixture modeling. These models are robust and useful tools for modeling heterogeneous data with outlying observations, such as clustering financial or impulsive data. In this paper, a new EM algorithm, based on a combination of the EM (first part) and stochastic EM (second part) algorithms, is used to obtain the maximum likelihood estimators of the parameters of the SGαSMM in the M-step. In the first part, the model parameters, except the α's, are estimated in analytical form via EM. In the second part, based on a stochastic EM, the maximum likelihood estimator of α in each component is calculated from pseudo-simulated data obtained by a suitable rejection sampling. The efficiency of the proposed algorithm is illustrated using both real and simulated data.
1. Introduction

Finite Mixture Models (FMMs) [14,25] are powerful models that allow us to model data sampled from a heterogeneous population composed of distinct subpopulations. An FMM is a convex combination of two or more probability density functions, with many important applications in clustering and classification. A Gaussian Mixture Model (GMM) is a prominent subclass of FMMs; it assumes that all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. The properties of Gaussian models make GMMs a suitable choice in many environments. The central limit theorem justifies the use of Gaussian models in many scenarios. Furthermore, they are stable, meaning that any linear combination of multiple Gaussian random variables is still Gaussian. Unfortunately, real data are often contaminated by atypical observations (outliers). These observations affect the estimation of the model parameters, resulting in a GMM with poor performance. The detection of these atypical observations and the development
of robust methods of parameter estimation are becoming increasingly popular. In this paper, to make the FMM even more robust, we replace the Gaussian distribution with the Sub-Gaussian α-Stable (SGαS) distribution, leading to the SGαSMM. The SGαS distribution is an elliptical generalization of the Gaussian distribution [29], with one additional parameter, α ∈ (0, 2], governing the tail weight. There is no closed-form expression for the density function of SGαS distributions; a common way of defining them is by their characteristic function. If X is a d-variate SGαS random vector, then the joint characteristic function of X is given as follows [23]:
\[
\mathbb{E}\exp(i t^\top X) = \exp\left\{-\left(\tfrac{1}{2}\, t^\top \Sigma t\right)^{\alpha/2} + i t^\top \mu\right\}, \tag{1}
\]
where μ is the location vector and the positive definite matrix Σ is called the dispersion (or shape) matrix. We use the notation X ∼ S_d(α, Σ, μ) to denote that the distribution of X is a d-variate SGαS with tail index α, dispersion matrix Σ, and location parameter μ. From equation (1), the multivariate Gaussian distribution is obtained as a special case with α = 2, while elliptically symmetric distributions with heavier tails are obtained for α < 2. For instance, the multivariate Cauchy distribution is obtained for α = 1. The variance of an SGαS distribution diverges to infinity when α < 2, which allows the model to be more robust than competing heavy-tailed models, such as the t Mixture Model (tMM; [31]).
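As a small illustration of (1), here is a minimal R sketch (our own, not from the authors' psem.r; the helper name cf_sgs and its arguments are ours):

```r
# Evaluate the SGalphaS joint characteristic function (1) at frequency tvec.
cf_sgs <- function(tvec, alpha, Sigma, mu) {
  quad <- as.numeric(t(tvec) %*% Sigma %*% tvec)          # t' Sigma t
  exp(as.complex(-(0.5 * quad)^(alpha / 2)) + 1i * sum(tvec * mu))
}

# alpha = 2 recovers the Gaussian characteristic function:
cf_sgs(c(1, 0), alpha = 2, Sigma = diag(2), mu = c(0, 0))  # = exp(-1/2)
```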
In addition, SGαS distributions obey the stability law, meaning that the sum of any number of independent SGαS random vectors is still SGαS. Furthermore, by the generalized central limit theorem, the normalized sum of independent and identically distributed random variables with infinite variance converges in distribution to a non-Gaussian α-stable distribution; the SGαS is a special case of this family. This convergence in distribution is an essential feature in various applications, such as financial settings [23]. The mentioned properties motivate us to define the SGαSMM. Due to the non-existence of a closed-form expression for the SGαS density functions, the maximization step of the EM technique [11] for finding the maximum likelihood estimate of the SGαSMM parameters is intractable and is still an open problem. This motivated us to develop a method for estimating the parameters of the SGαSMM by using a combination of the EM and Stochastic EM (SEM; [6]) algorithms. We call this method the Pseudo-SEM (PSEM) algorithm.

The rest of the paper is organized as follows. The SGαSMM is introduced in Section 2. Parameter estimation of the SGαSMM with PSEM is discussed in Section 3. A simulation study is carried out in Section 4 to demonstrate the effectiveness of the proposed model in estimating the SGαSMM parameters in the univariate and multivariate cases. Some applications to real data are illustrated in Section 5. Concluding remarks are drawn in Section 6.

2. Sub-Gaussian α-stable mixture model

The univariate α-stable (or stable) distributions are a four-parameter family, with an index of stability α ∈ (0, 2] (also called tail index or characteristic exponent), skewness β ∈ [−1, 1], scale γ > 0, and location δ ∈ ℝ. We denote the density function of a stable random variable by f_α(·|β, γ, δ), and the notation X ∼ S(α, β, γ, δ) shows that the random variable X has the stable distribution with the mentioned parameters. Positive stable distributions have β = 1 and α < 1, while symmetric α-stable distributions have β = 0. For more information on stable distributions, one can refer to [27,30,41]. In addition, some important theoretical and empirical reasons for using stable distributions to model signals are given by [43].

The SGαS distributions are a parametric subclass of the multivariate α-stable distributions. In fact, these distributions are an extension of the univariate symmetric α-stable distributions. Furthermore, if X ∼ S_d(α, Σ, μ), then X can be represented as the product of a d-variate Gaussian random vector and the square root of a positive α-stable random variable. An equivalent way to define an SGαS random vector is as follows [41].

Definition 1. Suppose Z is a zero-mean d-dimensional Gaussian random vector with variance-covariance matrix Σ_{d×d},
\[
P \sim S\!\left(\tfrac{\alpha}{2},\; 1,\; \left(\cos\tfrac{\pi\alpha}{4}\right)^{2/\alpha},\; 0\right)
\]
is a positive stable random variable independent of Z, and μ ∈ ℝ^d is a vector of constants. The multivariate random vector X = μ + √P Z is SGαS.
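Definition 1 gives a direct simulation recipe. Below is a minimal R sketch of it (our own illustration, not the authors' psem.r; it assumes the stabledist and MASS packages, and that rstable's pm = 1 parameterization matches the S(α, β, γ, δ) notation above):

```r
library(stabledist)  # rstable(): stable random variates
library(MASS)        # mvrnorm(): multivariate Gaussian variates

# Simulate n draws of X = mu + sqrt(P) * Z per Definition 1.
rsubgaussian <- function(n, alpha, Sigma, mu) {
  d <- length(mu)
  # positive stable P ~ S(alpha/2, 1, cos(pi*alpha/4)^(2/alpha), 0)
  P <- rstable(n, alpha = alpha / 2, beta = 1,
               gamma = cos(pi * alpha / 4)^(2 / alpha), delta = 0, pm = 1)
  Z <- mvrnorm(n, mu = rep(0, d), Sigma = Sigma)  # zero-mean Gaussian part
  sweep(sqrt(P) * Z, 2, mu, "+")                  # scale each row, shift by mu
}

x <- rsubgaussian(500, alpha = 1.5, Sigma = diag(2), mu = c(0, 5))
```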
We denote the joint density function of X by f_{S_d}(x|α, Σ, μ). The SGαSMM is a convex combination of SGαS distributions with the following mixture model
\[
f(x|\theta) = \sum_{g=1}^{G} \pi_g\, f_{S_d}(x \mid \alpha_g, \Sigma_g, \mu_g), \tag{2}
\]
where G is the number of components of the mixture model and θ = (π_g, α_g, Σ_g, μ_g), g = 1, …, G, is the set of all model parameters. Furthermore, as a special case, the univariate mixture of symmetric α-stable distributions is
\[
h(x|\theta) = \sum_{g=1}^{G} \pi_g\, f_{\alpha_g}(x \mid 0, \gamma_g, \delta_g), \tag{3}
\]
here θ = (π_g, α_g, γ_g, δ_g) for g = 1, …, G. In both cases 0 < π_g < 1 and \(\sum_{g=1}^{G} \pi_g = 1\).

The main drawback of the model given in (2) is the difficulty of estimating the model's parameters, especially the tail indices α_g. Since there is no analytical form for the SGαS density function, the EM algorithm is not directly usable for estimating the parameters of (2). Therefore, there are few works on estimating the parameters of α-stable mixture models. In the univariate case, a finite mixture of α-stable distributions with application to impulsive data was introduced by [38]. Salas-Gonzalez et al. [39,40] introduced MCMC and Gibbs sampling to estimate the parameters of univariate α-stable mixture models in the Bayesian framework. A log-moment method to estimate the parameters of a mixture of symmetric α-stable distributions, with application to video foreground detection, is explained by [4]. Teimouri et al. [44] estimated the parameters of univariate symmetric α-stable mixture models via an EM algorithm, which they called the ECME algorithm. They estimated the model's parameters, except the tail and scale parameters, in closed form; by centralizing the data, dividing by an exponential random variable, and conditioning on the Weibull distribution, they obtained the likelihood function of the tail and scale parameters and estimated them. Salas-Gonzalez et al. [40] suggested extending their Gibbs sampling for a mixture of symmetric α-stable distributions to a mixture of sub-Gaussian random vectors; to the authors' knowledge, this has not been done yet. Teimouri et al. [46], in a recent preprint, tried to extend their EM for symmetric stable mixture models to a mixture of sub-Gaussian random vectors, using their EM estimation of sub-Gaussian random vectors in [45]. That EM is an extension of the methodology explained in [44] for the univariate symmetric stable mixture model. They used a combination of conditional expectation-maximization [26] and SEM to estimate the model parameters of (2). To propose a new estimation method for the mixture of sub-Gaussian parameters, one can use the idea of Teimouri et al. [46], which employs a practical parameter estimation of sub-Gaussian random vectors, and extend it to their mixture models. In each iteration of their approach, they form new groups of data and estimate the tail indices and dispersion matrices by sampling from the Weibull distribution. Since grouping the data in each iteration is based on the parameters estimated in the previous iteration, and the dispersion matrices in each cluster are estimated from the simulated data, we believe this procedure increases the uncertainty in the estimation of the parameters. Zarei et al. [48] presented a stochastic-conditional EM algorithm to estimate the parameters of the Sub-Gaussian α-Stable Cluster-Weighted Model (SGαSCWM). The CWM [16,21] is a regression mixture model with random covariates. In the SGαSCWM, both the response variable and the vector of covariates have SGαS distributions, and the approach explained by [46] is used to estimate the model parameters. The SGαSCWM is suitable for clustering heterogeneous data, having a regression relationship and outliers, into groups (or clusters). Another way to define the SGαSMM is to use an approximation of the SGαS density function. There are a few methods for estimating the SGαS parameters. Kuruoglu et al. [24] proposed an approximation of the SGαS density using a Gaussian mixture model. Their idea can be extended in two ways in our problem.
That is, we can approximate the SGαS distribution with a defined Gaussian mixture in each repetition of the EM algorithm, or directly approximate the SGαS mixture using a Gaussian mixture model. Both are possible; however, since we would have a mixture that is a linear combination of another finite mixture model of the product
of Gaussian and positive α-stable distributions, the computational cost and the model complexity can increase and should be controlled. Kring et al. [23] introduced two analytic estimators for sub-Gaussian parameters, and Nolan [29] suggested a more accurate estimation method for this problem. These interesting estimators cannot be used in the EM algorithm due to the lack of a range-preserving property. Furthermore, the Monte Carlo EM method for multivariate stable distributions [33] (hereafter referred to as MCEM-RQ) allows maximum likelihood estimation of the SGαS parameters through the introduction of an auxiliary random variable, the so-called "augmented variable," as in Buckle's method [5]. MCEM-RQ then requires an expectation to be taken with respect to both the auxiliary variable and the positive stable random variable as latent variables. Their approach draws samples for these two variables by running a Gibbs sampler at each iteration. Since MCEM-RQ is an iterative method, if we used it for the SGαSMM, we would have an additional loop in each iteration and would have to maximize the posterior distribution using data generated from the Gibbs sampling and rejection sampling. This increases the model's complexity and is not suitable for the SGαSMM. In addition, Godsill and Kuruoglu [17] presented two methods, based on stochastic and deterministic iterative schemes, for estimating linear models containing symmetric stable distributions. The first method is based on a Bayesian framework and uses Gibbs sampling to estimate the parameters of interest, and the second is based on the EM algorithm. They showed the efficiency of their algorithms in a linear time series model. They treat the positive stable random variable as a nuisance parameter that should be integrated out to perform correct inference about the model. However, they do not estimate the tail index, which is critical for the SGαSMM.

In this work, we estimate the model parameters, except the α's, in analytical form, and estimate the tail index using PSEM based on pseudo-simulated data generated from the conditional distribution f_{P|X}(p|x, θ) with a suitable rejection sampling algorithm. Details of the proposed method are given in the next section. Furthermore, since the model (3) is a special case of (2), obtained when we replace Σ_g with 2γ_g² and μ_g with δ_g, we explain the method of estimating the parameters only for (2).
3. Parameter estimation in SGαSMM

For estimating the SGαSMM parameters, we use a new EM algorithm. The main idea underlying the EM algorithm is to find a hidden variable whose probability density function depends on the parameter of interest, with the property that maximizing the latent variable's distribution is more accessible than maximizing the main variable's distribution [35]. To this end, we explain the structure of the proposed EM algorithm based on Definition 1. Let X_i, i = 1, …, N, be a random sample from the joint density function f_{S_d}(x|α, Σ, μ), where N is the sample size. According to Definition 1, X_i, for i = 1, …, N, is related to a positive stable random variable and a Gaussian random vector as
\[
X_i \stackrel{d}{=} \mu + \sqrt{P_i}\, G_i, \tag{4}
\]
where P_i is a positive stable random variable as in Definition 1 (with tail index α/2), G_i is a d-dimensional zero-mean Gaussian random vector with d × d variance-covariance matrix Σ, μ is the location parameter for X, and \(\stackrel{d}{=}\) denotes equality in distribution. When the data have a latent source of heterogeneity that splits them into groups (or clusters), we assume that (4) holds for each group. Under this assumption, we can compute the complete likelihood function to obtain the EM algorithm for the SGαSMM.

3.1. Pseudo-stochastic expectation-maximization for SGαSMM

Let x_1, …, x_N, p_1, …, p_N and z_1, …, z_N be the complete data corresponding to (2), where x_i and p_i, i = 1, …, N, are the observed and missing data, respectively. Furthermore, z_1, …, z_N are the G-dimensional component labels, in which z_{ig} = 1 if the ith observation comes from the gth component (z_{ig} = 0 otherwise). We have
\[
X_i \mid P_i = p_i,\, Z_{ig} = 1 \sim N(\mu_g,\; p_i \Sigma_g), \tag{5}
\]
where
\[
P_i \mid Z_{ig} = 1 \sim S\!\left(\tfrac{\alpha_g}{2},\; 1,\; \left(\cos\tfrac{\pi\alpha_g}{4}\right)^{2/\alpha_g},\; 0\right), \tag{6}
\]
for g = 1, …, G and i = 1, …, N. Because of the conditional structure of the complete data distributions (5) and (6), the complete data log-likelihood function is
\[
\begin{aligned}
l_c(\theta) = C &+ \sum_{g=1}^{G}\sum_{i=1}^{N} z_{ig} \log(\pi_g) + \sum_{g=1}^{G}\sum_{i=1}^{N} z_{ig} \log\big(f_P(p_i|\alpha_g)\big) \\
&- \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} z_{ig} \log(|\Sigma_g|) - \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} z_{ig}\, \frac{1}{p_i} (x_i - \mu_g)^\top \Sigma_g^{-1} (x_i - \mu_g),
\end{aligned}
\]
where C is a constant free of θ = (α_g, Σ_g, μ_g, π_g), g = 1, …, G. Furthermore, f_P(·) is the probability density function of the positive α-stable random variable P.

3.2. E-step

The E-step, on the (t+1)th iteration, requires the calculation of
\[
\begin{aligned}
Q(\theta|\theta^{(t)}) &= E_{\theta^{(t)}}\big[l_c(\theta) \mid x_1, \ldots, x_N\big] \\
&= C + \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} \log(\pi_g) + \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)}\, E_{\theta^{(t)}}\big[\log\big(f_P(p_i|\alpha_g)\big) \mid x_i\big] \\
&\quad - \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} \log(|\Sigma_g|) - \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)} (x_i - \mu_g)^\top \Sigma_g^{-1} (x_i - \mu_g). \tag{7}
\end{aligned}
\]
To this end, we should calculate \(e_{1ig}^{(t)} = E_{\theta^{(t)}}[Z_{ig} \mid x_i]\) and \(e_{pig}^{(t)} = E_{\theta^{(t)}}[P_i^{-1} \mid x_i]\) for g = 1, …, G, i = 1, …, N, and \(\theta^{(t)} = (\pi_g^{(t)}, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)})\). It is easily shown that
\[
e_{1ig}^{(t)} = \frac{\pi_g^{(t)}\, f_{S_d}\big(x_i \mid \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)}{\sum_{g'=1}^{G} \pi_{g'}^{(t)}\, f_{S_d}\big(x_i \mid \alpha_{g'}^{(t)}, \Sigma_{g'}^{(t)}, \mu_{g'}^{(t)}\big)}.
\]
Since there is no closed form for stable densities, we calculated \(e_{1ig}^{(t)}\) and \(e_{pig}^{(t)}\) numerically using functions in the STABLE package³ and Monte Carlo simulation. See Appendix A for more details.
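As an illustration of the Monte Carlo step detailed in Appendix A, here is a minimal R sketch (ours, not the authors' code; it assumes the stabledist package) approximating e_{pig} = E[P⁻¹ | x] by averaging over B draws of P:

```r
library(stabledist)  # rstable()

# Monte Carlo approximation of E[P^{-1} | x] (see Appendix A); the stable
# density f_P cancels because the p's are drawn from it.
e_p <- function(x, alpha, Sigma, mu, B = 2000) {
  d <- length(mu)
  p <- rstable(B, alpha = alpha / 2, beta = 1,
               gamma = cos(pi * alpha / 4)^(2 / alpha), delta = 0, pm = 1)
  q <- as.numeric(t(x - mu) %*% solve(Sigma) %*% (x - mu))  # Mahalanobis term
  w <- exp(-q / (2 * p))                                    # Gaussian kernel in p
  sum(p^(-d / 2 - 1) * w) / sum(p^(-d / 2) * w)             # ratio estimator
}
```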
³ Available at http://www.robustanalysis.com.
3.3. M-step

The M-step of the PSEM algorithm, on the same iteration, has two parts.

3.3.1. First part

In the first part, by taking the derivative of (7) with respect to π_g, μ_g, and Σ_g (Appendix B), these parameters are updated for the gth component in the (t+1)th iteration as follows:
\[
\pi_g^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} e_{1ig}^{(t)}, \qquad
\mu_g^{(t+1)} = \frac{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)}\, x_i}{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)}},
\]
\[
\Sigma_g^{(t+1)} = \frac{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)} \big(x_i - \mu_g^{(t+1)}\big)\big(x_i - \mu_g^{(t+1)}\big)^\top}{\sum_{i=1}^{N} e_{1ig}^{(t)}},
\]
for g = 1, …, G.

3.3.2. Second part

In the second part, we update α_g for g = 1, …, G. To this end, we have to maximize \(E_{\theta^{(t)}}[\log(f_P(p_i|\alpha_g)) \mid x_i]\) with respect to α_g. Since f_{P|X}(p_i|x_i) depends on f_P(p_i), this conditional density function does not have a closed form, and we cannot find an analytical form to estimate α_g. Therefore, SEM can be an excellent choice to overcome this problem. In the global structure of the SEM algorithm for mixture models, stochastic simulation is used to eliminate tedious computation in the M-step of EM. In this case, the auxiliary function Q(θ|θ^{(t)}) is approximated via the conditional distribution of the unobserved variable given the observed variables [6]. Then a random sample is generated from this conditional distribution, and a value of the parameter that maximizes the marginal density function of the unobserved variable is an estimate of the parameter of interest. Here, we use SEM only to estimate α_g, and the auxiliary function Q(θ|θ^{(t)}) is as follows:
\[
Q(\alpha_g|\alpha_g^{(t)}) = \int \log\big(f_P(p_i|\alpha_g)\big)\, f_{P|X}\big(p_i \mid x_i, \alpha_g^{(t)}\big)\, dp_i.
\]
According to the SEM structure [35], \(\hat{Q}(\alpha_g|\alpha_g^{(t)}) = \log f_P(p_i^{(t)}|\alpha_g)\), where \(p_i^{(t)}\) is generated from \(f_{P|X}(p_i \mid x_i, \alpha_g^{(t)})\) and \(\alpha_g^{(t+1)} = \operatorname{argmax}_{\alpha_g} f_P(p_i^{(t)}|\alpha_g)\). Since it is impossible to generate samples from f_{P|X}(·|·) directly, samples are generated from this density with rejection sampling [17,40]. Our proposed rejection sampling is as follows. In the tth iteration for the gth component, we have
\[
f\big(p_i \mid x_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)
= \frac{f\big(x_i \mid p_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)\, f_P\big(p_i|\alpha_g^{(t)}\big)}{f_{S_d}\big(x_i \mid \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)}
\le \frac{\max_{p_i} f\big(x_i \mid p_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)\, f_P\big(p_i|\alpha_g^{(t)}\big)}{f_{S_d}\big(x_i \mid \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)}.
\]
It can be shown that
\[
\max_{p_i} f\big(x_i \mid p_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)
= \frac{\exp\{-\tfrac{d}{2}\}\, d^{d/2}}{(2\pi)^{d/2}\, \big|\Sigma_g^{(t)}\big|^{1/2} \Big[\big(x_i - \mu_g^{(t)}\big)^\top \big(\Sigma_g^{(t)}\big)^{-1} \big(x_i - \mu_g^{(t)}\big)\Big]^{d/2}}.
\]
So, if the quotient
\[
M = \frac{\max_{p_i} f\big(x_i \mid p_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)}{f_{S_d}\big(x_i \mid \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}\big)},
\]
then \(f(p_i \mid x_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)}) \le M f_P(p_i|\alpha_g^{(t)})\) for each p_i. Therefore, we propose the following rejection sampling to generate pseudo-samples from \(f(p_i \mid x_i, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)})\):

1. Draw a sample, called p*_j, from the positive stable distribution with the tail index α_g^{(t)}.
2. Draw a sample, called u, from U(0, 1).
3. If u < M, accept p*_j; otherwise go to 1.

We repeat the above algorithm in iteration t until we have N pseudo-samples from the conditional density \(f(p \mid x, \alpha_g^{(t)}, \Sigma_g^{(t)}, \mu_g^{(t)})\) for g = 1, …, G. Concerning the SEM structure, the value of α_g^{(t)} that maximizes f_P(p|α_g) is an estimate of α_g^{(t)} in f(p|x, α_g^{(t)}). Since the tail index of the positive stable distribution is half the tail index of the conditional distribution, the estimated value of the tail index from f_P(p|α_g^{(t)}) should be multiplied by 2 to obtain α_g^{(t)} in f(p|x, α_g^{(t)}). Furthermore, we use the maximum likelihood method [28] to estimate α_g in each iteration. In the next iteration, the updated parameters are used to calculate the values of e_{1ig}^{(t)} and e_{pig}^{(t)} for i = 1, …, N and g = 1, …, G in the E-step, and the model parameters are updated with these new expectation values. This loop is repeated T times, and the average of the estimated values of each parameter after a burn-in time T₀ is taken as the final estimate of each parameter in the models (2) and (3). As a stopping rule, we draw the graph of the differences between the log-likelihood functions in iterations (t+1) and t over the T iterations; according to the trend of this curve, we can choose the best T₀ (see Subsection 4.3 for more details). According to our observations, T = 50 and T₀ = 30 are optimal choices for the PSEM algorithm in most situations.
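A minimal R sketch of the proposed rejection sampler for one observation and one component (our illustration, not the authors' psem.r; f_sgs is a hypothetical numerical evaluator of the SGαS density f_{S_d}, and step 3 follows the acceptance rule exactly as stated above):

```r
library(stabledist)  # rstable()

# Rejection sampling of one pseudo-sample p* from f(p | x, alpha, Sigma, mu).
rp_pseudo <- function(x, alpha, Sigma, mu, f_sgs) {
  d <- length(mu)
  q <- as.numeric(t(x - mu) %*% solve(Sigma) %*% (x - mu))
  fmax <- exp(-d / 2) * d^(d / 2) /                 # max_p f(x | p, ...)
    ((2 * pi)^(d / 2) * sqrt(det(Sigma)) * q^(d / 2))
  M <- fmax / f_sgs(x, alpha, Sigma, mu)            # the bound M defined above
  repeat {
    p_star <- rstable(1, alpha = alpha / 2, beta = 1,        # step 1
                      gamma = cos(pi * alpha / 4)^(2 / alpha),
                      delta = 0, pm = 1)
    u <- runif(1)                                            # step 2
    if (u < M) return(p_star)                                # step 3, as stated
  }
}
```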
3.4. Selecting the number of mixture components

In practice, one can use the Bayesian Information Criterion (BIC; [42]) to select the best model and the number of components. The formulation of this criterion is given by
\[
\mathrm{BIC} = 2 \log\big(L(\hat{\theta})\big) - m \log(N),
\]
where \(\hat{\theta}\) is the ML estimate of θ, \(\log(L(\hat{\theta}))\) is the maximized observed-data log-likelihood, and m is the overall number of free parameters in the model, which is equal to
\[
m = (G - 1) + G + Gd + Gd(d+1)/2, \tag{8}
\]
where the first term is related to the mixture weights, the second term to the tail indices, the third term to the components of the location parameters, and the fourth term to the components of the dispersion matrices. The highest BIC is preferred.
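As a quick worked check of (8) (ours), for the bivariate three-component model used later in Section 4.2 (d = 2, G = 3):
\[
m = \underbrace{(G-1)}_{2} + \underbrace{G}_{3} + \underbrace{Gd}_{6} + \underbrace{Gd(d+1)/2}_{9} = 20,
\]
and likewise m = 13 for the corresponding two-component bivariate model.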
4. A simulation study

In this section, we investigate the performance of the PSEM algorithm in estimating the SGαSMM parameters and clustering data through a simulation study. We consider two situations: in the univariate case, a 3-component mixture of symmetric α-stable distributions, and in the multivariate case, a 3-component SGαSMM. In both cases, to determine the initial values, we use the clustering results of a k-median method with k = 3 (G = 3 in the proposed notation) groups and estimate the initial parameter values to start PSEM using the STABLE package. In fact, in each group an SGαS distribution is fitted, and the estimated parameter values are used as initial values for the parameters of the corresponding component.
4.1. Univariate case

Here, we evaluate the PSEM algorithm with artificial data in two simulation studies. We select the true values of the parameters as in [40].

Simulation 1: We iterate the proposed method 100 times for N = 1000 samples from the following 3-component symmetric 1.4-stable mixture model
\[
h(x) = 0.2\, f_{1.4}(x \mid 0, 0.2, -2) + 0.3\, f_{1.4}(x \mid 0, 0.5, 0) + 0.5\, f_{1.4}(x \mid 0, 0.6, 3).
\]

Simulation 2: For further comparison, and since the GMM is a special case of the symmetric α-stable mixture model, we also reproduce Simulation 3 of [40]. One hundred data sets of size N = 1000 are generated from the following GMM
\[
h(x) = 0.4\, f_{2}(x \mid 0, 0.57, -3) + 0.3\, f_{2}(x \mid 0, 0.57, 0) + 0.3\, f_{2}(x \mid 0, 0.28, 2.5).
\]

In each simulation, the parameters are estimated by PSEM. The average estimated values for each parameter, with the Standard Deviations (SDs) of the estimates, are reported in Table 1.

Table 1. True parameter values along with the means and SDs of the parameter estimates for a 3-component symmetric mixture of α-stable distributions from the 100 runs of the PSEM method, according to Simulations 1 and 2.

            |       Simulation 1        |       Simulation 2
Parameter   | True    Estimated   SD    | True    Estimated   SD
α1          | 1.4     1.31        0.17  | 2       1.85        0.12
α2          | 1.4     1.64        0.12  | 2       1.88        0.11
α3          | 1.4     1.55        0.13  | 2       1.90        0.11
γ1          | 0.2     0.23        0.05  | 0.57    0.57        0.03
γ2          | 0.5     0.49        0.14  | 0.57    0.56        0.03
γ3          | 0.6     0.62        0.03  | 0.28    0.28        0.01
δ1          | −2      −2.11       0.31  | −3      −2.97       0.05
δ2          | 0       −0.05       0.21  | 0       0.01        0.06
δ3          | 3       2.99        0.04  | 2.5     2.49        0.02
π1          | 0.2     0.210       0.06  | 0.4     0.397       0.02
π2          | 0.3     0.283       0.06  | 0.3     0.302       0.01
π3          | 0.5     0.508       0.01  | 0.3     0.301       0.02

Generally speaking, comparing the estimated values with the true values in the simulation study shows that the proposed method has an excellent performance in estimating the location parameters, weights, and scales, and gives relatively good results for the tail indices. Furthermore, the SD values are small and indicate that the proposed method is stable across the repetitions of the simulation. Comparing the proposed method with the fully Bayesian methodology based on Gibbs sampling explained by [40] (referred to as the Bayes method) shows that the Bayes method is more accurate in estimating the tail indices, but the two methods have approximately the same precision in estimating the other parameters. It should be noted that implementing Markov chain Monte Carlo algorithms can be very complicated for α-stable distributions, due to the non-analytic distribution function expressions, and it is very hard to extend that model to the multivariate case.

4.2. Multivariate case: SGαSMM

4.2.1. Parameter estimation

To investigate the performance of the PSEM algorithm in the multivariate case, we generate N = 1500 samples from a 3-component SGαSMM with the true parameter values reported in Table 2 and repeat this process 50 times. We chose the true values of the parameters (except the α's) as in [31]. Table 2 shows the mean and SD of the estimated values for each parameter.

Table 2. True parameter values along with the means and SDs of the estimated parameters for the 3-component SGαSMM from the 50 runs (Σ's written as [row 1; row 2]).

Parameter   True values              Mean estimates                 SDs
π1          0.333                    0.339                          0.014
π2          0.333                    0.331                          0.011
π3          0.333                    0.329                          0.015
α1          1.450                    1.580                          0.092
α2          1.250                    1.341                          0.143
α3          1.800                    1.716                          0.211
μ1          (0, 5)                   (0.051, 5.018)                 (0.039, 0.026)
μ2          (5, 0)                   (5.000, −0.014)                (0.025, 0.015)
μ3          (−5, 0)                  (−4.960, −0.013)               (0.037, 0.021)
Σ1          [2, 0.5; 0.5, 0.5]       [2.023, 0.510; 0.510, 0.513]   [0.161, 0.049; 0.049, 0.029]
Σ2          [1, 0; 0, 1]             [0.925, 0.011; 0.011, 1.007]   [0.137, 0.038; 0.038, 0.086]
Σ3          [2, −0.5; −0.5, 0.5]     [2.042, −0.495; −0.495, 0.505] [0.111, 0.022; 0.022, 0.043]
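For concreteness, a short R sketch (ours) of how one Simulation 1 data set can be generated; for β = 0, a standard symmetric stable variate scales as γZ + δ:

```r
library(stabledist)  # rstable()

# One Simulation 1 data set: N = 1000 from the 3-component
# symmetric 1.4-stable mixture (component labels drawn first).
N       <- 1000
pi_g    <- c(0.2, 0.3, 0.5)
gamma_g <- c(0.2, 0.5, 0.6)
delta_g <- c(-2, 0, 3)
z  <- sample(1:3, N, replace = TRUE, prob = pi_g)  # true component labels
x0 <- rstable(N, alpha = 1.4, beta = 0, pm = 1)    # standard symmetric stable
x  <- gamma_g[z] * x0 + delta_g[z]                 # per-component scale/location
```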
Fig. 1. Comparing the SGαS, t, and Gaussian MMs via the ARI in 50 runs.
Fig. 2. Scatter plot of the 3-component SGαSMM simulated data based on the true values in Table 2, contaminated with noisy data; the added uniform noise points are denoted by •.
Similar to the univariate case, comparing the estimated values with the true values in our simulation study shows that PSEM has a very good performance in estimating the location parameters, weights, and dispersion matrices, and provides relatively good results for the tail indices. Furthermore, the SD values are small and indicate again that the proposed algorithm is stable across the repetitions of the simulation.

4.2.2. Clustering simulated data

We also study the feasibility of using the SGαSMM, in comparison with the tMM and GMM, in clustering simulated data. We compute the Adjusted Rand Index (ARI; [20]) for each simulation as a measure of performance for all the competing models. We recall that the expected value of the ARI is 0, and the value 1 shows a perfect classification. Fig. 1 shows the boxplots of the estimated ARI values. The averages of the ARI values for the GMM, tMM, and SGαSMM are 0.610, 0.901, and 0.928, respectively. These results confirm that the SGαSMM is significantly better than the GMM and tMM in clustering data that have an SGαS distribution in each component.

4.2.3. Sensitivity analysis

The third analysis aims to evaluate the impact of unstructured/noisy data on clustering with the GMM, tMM, and SGαSMM. To this end, similar to [32], fifty noise points from a uniform distribution in the range −50 to 50 are added to each variable of the simulated data. Therefore, the noisy data fall inside the square that contains most of the simulated data. Fig. 2 shows the contaminated data, in which the bullets denote the uniform noise points. The quality of the competing models (only on the original simulated data, i.e., without the noisy points) is evaluated based on the ARI. The average ARI values for the GMM, tMM, and SGαSMM are 0.523, 0.668, and 0.919, respectively. The high ARI value shows that a small number of noisy points has little effect on the accuracy of the SGαSMM in clustering noisy symmetric α-stable data.
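A sketch of how this comparison can be computed in R (ours; it assumes the mclust package, with z_true and z_hat holding the simulated and predicted labels, and x the N × 2 simulated data matrix):

```r
library(mclust)  # adjustedRandIndex()

# ARI between true and estimated labels: 1 = perfect, ~0 = random.
ari <- adjustedRandIndex(z_true, z_hat)

# Contamination as in the sensitivity analysis: 50 uniform noise points.
noise   <- matrix(runif(50 * 2, min = -50, max = 50), ncol = 2)
x_noisy <- rbind(x, noise)
```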
Fig. 3. Determining the burn-in time T₀. The red vertical dashed line shows the start of the graph's convergence and is a good choice for the burn-in time. Since SEM (and consequently PSEM) has a lesser tendency to get trapped in local maxima, the graph is almost convergent. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
4.3. Determining the burn-in time and stopping criterion

The differences between the log-likelihood functions in iterations (t+1) and t, for t = 0, …, 50, where t = 0 corresponds to the initial values, are shown in Fig. 3. Fig. 3 shows that after the 12th iteration the graph changes little and the algorithm has converged, so we can choose T₀ = 12. Note that T should be determined before running the program, and Fig. 3 shows that even T = 30 is enough. Furthermore, Fig. 3 indicates that PSEM, like the SEM algorithm, inherits a lesser tendency to get trapped in local maxima, yielding improved global convergence properties, and the graph is almost convergent. Since PSEM has closed forms for all parameters except the α's, the differences between the log-likelihood functions in successive repetitions approximately converge to zero. In the univariate case the α's have more effect on this convergence; in this case, one can draw a graph of the differences of the auxiliary function (7) without the term
\[
\sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)}\, E_{\theta^{(t)}}\big[\log\big(f_P(p_i|\alpha_g)\big) \mid x_i\big],
\]
and determine T and T₀ based on the trend of this graph. Since in the PSEM algorithm pseudo-samples are generated only for the α's, we can say that PSEM typically mimics SEM's behavior; i.e., under mild regularity conditions [10], SEM (and as a result PSEM), like the EM algorithm, converges to a maximum of the likelihood function, at least a local one.
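A sketch of the stopping diagnostic in R (ours; loglik is assumed to hold the stored log-likelihood values over the T iterations):

```r
# Plot successive log-likelihood differences and read off the burn-in T0.
plot(diff(loglik), type = "b",
     xlab = "iteration t", ylab = "log-likelihood difference")
abline(v = 12, lty = 2, col = "red")  # e.g. T0 = 12, as chosen from Fig. 3
```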
Fig. 4. The estimated values of the α's do not converge to unique values; therefore, for pointwise convergence, the estimates are averaged over iterations once approximate stationarity has been reached.

Table 3. Estimated parameters with the PSEM algorithm for three different real data sets.

Parameter   Enzyme data   Acidity data   Galaxy data
α1          1.38          1.63           1.82
α2          1.86          1.58           1.07
γ1          0.05          0.23           0.29
γ2          0.34          0.38           1.58
δ1          0.18          4.31           9.68
δ2          1.25          6.27           21.11
π1          0.61          0.59           0.08
π2          0.39          0.41           0.92
Fig. 4 shows the estimated values of each α for one of the runs of the PSEM algorithm related to the simulation performed in Subsection 4.2. In the SEM algorithm, the estimated values in different repetitions form a homogeneous Markov chain, which, under mild regularity conditions, converges to a stationary density function [35]. Therefore, in the PSEM algorithm, as in SEM, pointwise convergence can be obtained by averaging the estimates over iterations once approximate stationarity has been reached (i.e., after the burn-in time).

5. Real data analysis

In this section, real univariate and multivariate data sets are used to evaluate the performance of the PSEM method. A comparison with the Bayes method, the GMM, and the tMM is also provided.

5.1. Univariate real data

The first analysis uses the Enzyme, Acidity, and Galaxy data sets included in the mixAK package [1]. These data sets are well known in the mixture-model literature, and the authors of [40] used them to evaluate the Bayes method. The Enzyme data set [34,3] comes from a survey of n = 245 unrelated individuals and involves the distribution of enzymatic activity in the blood. The Acidity data set [34,8,9] involves a log-scale acidity index measured in a sample of 155 lakes in the Northeastern states of the USA. The Galaxy data set [34,36] includes the velocities of 82 distant galaxies diverging from the Milky Way. Table 3 shows the parameters estimated by the PSEM algorithm. To compare the proposed method with the Bayes method, we use the estimated values of the PSEM method and the Bayesian method (explained in [40]) to cluster these real data sets.
Fig. 5. The predicted SGαS mixture densities for each data set, with the data density curves.
Table 4. Values of the silhouette and Dunn indices in clustering the univariate real data sets with the PSEM and Bayes methods. The bold numbers in the original highlight the best performance for each index.

            Silhouette index      Dunn index
            PSEM      Bayes       PSEM      Bayes
Enzyme      0.757     0.727       0.024     0.058
Acidity     0.728     0.768       0.033     0.069
Galaxy      0.752     0.585       0.312     0.028
Since the real labels of the data are unknown, we use the Dunn [12] and silhouette [37] indices to compare the clustering results. Higher values of these indices show a better predictive ability of the clustering. Table 4 shows the values of the Dunn and silhouette indices for the PSEM and Bayes methods. It should be noted that the best numbers of components for clustering the Enzyme, Acidity, and Galaxy data sets according to the Bayes method are 2, 2, and 3, respectively [40]; according to the PSEM method, they are 2 for all data sets. The predicted SGαS mixture density functions for each data set, with the data density curves, are drawn in Fig. 5. Fig. 5 shows that the SGαSMM can accurately model these data even when the sample size is not so large. From Table 4, we see that the Bayes method is slightly more efficient than PSEM in clustering the Acidity data. However, for the Enzyme data, according to the silhouette index PSEM is more accurate than Bayes, while according to the Dunn index the Bayes method is more accurate than PSEM. Furthermore, the PSEM method performs better in clustering the Galaxy data set, even though its number of components is smaller than that of the Bayes method. However, as mentioned before, the proposed method can be faster than the Bayes method, and can be used for the mixture of multivariate symmetric α-stable distributions, i.e., the SGαSMM. To assess the performance of the BIC in selecting the true number of components, and also the effect of a wrong number of components on the resulting clustering structure, we compute the BIC as well as the silhouette index in clustering the Galaxy data. Table 5 shows that the best number of components for the SGαSMM, according to both the BIC and the silhouette index, is G = 2. Furthermore, the clustering performance, as measured by the silhouette index, decreases when the number of components is chosen incorrectly.
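A sketch of these internal validity indices in R (ours; it assumes the cluster and clValid packages, with lab the predicted labels and x the data):

```r
library(cluster)  # silhouette()
library(clValid)  # dunn()

D   <- dist(x)                        # pairwise distances
sil <- mean(silhouette(lab, D)[, 3])  # average silhouette width
dnn <- dunn(D, lab)                   # Dunn index
```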
Table 5. Selecting the true number of components with BIC and silhouette (G ∈ {2, 3, 4, 5}) for the Galaxy data. The bold numbers in the original highlight the best values for each index.

            G = 2     G = 3     G = 4     G = 5
BIC         −454.65   −474.27   −464.66   −477.51
Silhouette  0.752     0.597     0.654     0.512

5.2. Multivariate real data

5.2.1. Faithful data set

The second analysis uses the Faithful data set [2], concerning the waiting time between eruptions and the duration of the eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. This data set contains 272 instances of 2 variables, the eruption time and the waiting time to the next eruption (both in minutes). The Faithful data set is well known in the literature on model-based clustering [18] and GMMs. The model-based GMM clustering analysis of this data set by the mclust algorithm [13] shows that the best model, based on the BIC, is a 3-component mixture with equal volume, shape, and orientation, denoted by EEE [13]. However, our model selection based on the BIC is a 2-component SGαSMM. For further comparison, we report the parameter estimates of the 2- and 3-component SGαSMM and the 3-component GMM in Table 6. The means and variance-covariance matrices of the 3-component GMM in Table 6 indicate that there is no significant difference between the mean vectors of the first and third components, and our next analysis shows that merging these two components improves the clustering results.

Table 6. A bivariate density estimate for the Faithful data set according to the 2- and 3-component SGαSMM and the 3-component GMM (Σ's written as [row 1; row 2]).

Parameter   3-com. GMM                  3-com. SGαSMM               2-com. SGαSMM
π1          0.46                        0.64                        0.649
π2          0.36                        0.35                        0.351
π3          0.18                        0.01                        -
α1          -                           1.67                        1.67
α2          -                           1.52                        1.71
α3          -                           1.80                        -
μ1          (4.47, 80.89)ᵀ              (4.34, 80.21)ᵀ              (4.32, 80.02)ᵀ
μ2          (2.04, 54.49)ᵀ              (1.97, 53.38)ᵀ              (2.00, 54.14)ᵀ
μ3          (3.81, 77.65)ᵀ              (3.45, 78.00)ᵀ              -
Σ1          [0.08, 0.47; 0.47, 33.74]   [0.11, 0.42; 0.42, 29.89]   [0.16, 0.77; 0.77, 32.73]
Σ2          [0.08, 0.47; 0.47, 33.74]   [0.05, 0.33; 0.33, 32.05]   [0.03, 0.37; 0.37, 29.12]
Σ3          [0.08, 0.47; 0.47, 33.74]   [0.48, 0.73; 0.73, 118.29]  -

Since the labels of the data are unknown, we use the Dunn and silhouette indices to compare the quality of the clusterings of the different models, given in Table 7. According to the values of the BIC, silhouette, and Dunn indices, the best selection of the cluster number in the SGαSMM is 2. The silhouette and Dunn indices also show better clustering results for both the 2- and 3-component SGαSMM than for the 3-component GMM, especially in the 2-component case. A smaller number of mixture components is very important to avoid overfitting in complex models and to obtain a simpler and more interpretable model.

Table 7. BIC, silhouette, and Dunn indices for comparing the SGαSMM and GMM in clustering the Faithful data set. The bold numbers in the original highlight the best model for each index.

Mixture-type            m     BIC        Silhouette   Dunn
3-com. Gaussian (EEE)   -     -          0.296        0.003
2-com. SGαS             13    −2437.49   0.710        0.034
3-com. SGαS             20    −2512.11   0.468        0.002
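The GMM reference fit above can be reproduced with mclust's standard interface; a short sketch (ours):

```r
library(mclust)  # Mclust(): model-based Gaussian mixture fitting

data(faithful)   # 272 x 2: eruptions, waiting (base R datasets)
fit <- Mclust(faithful, G = 3, modelNames = "EEE")  # equal vol/shape/orient.
summary(fit, parameters = TRUE)  # mixing proportions, means, covariances
```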
5.2.2. Uranium exploration data set

The second analysis of multivariate data uses the uranium exploration data included in the copula R package [19], previously analyzed by [7,15,47]. The data come from the hydrogeochemical stream and sediment reconnaissance project sponsored by the U.S. Department of Energy. Although this survey involves log concentrations of seven chemical measurements from N = 655 petroleum samples collected near Grand Junction, Colorado, the following analysis concerns only the five variables U, K, Cs, Sc, and Ti. These variables have some outlier points that are suitable for illustrative purposes. Table 8 reports the values of the indices for each mixture model and indicates that the SGαSMM performs better than the other two mixture models. Furthermore, the tMM performs better than the GMM.

Table 8. Silhouette and Dunn indices for comparing the SGαSMM, tMM, and GMM in clustering the uranium exploration data set. The bold numbers in the original highlight the best performance among the mixture models.

Mixture-type   Silhouette   Dunn
SGαSMM         0.404        0.038
tMM            0.394        0.036
GMM            0.250        0.018

6. Conclusions

In this paper, the symmetric α-stable distributions in the univariate and multivariate cases were used to build the SGαSMM, considering a fixed number of mixture components. The PSEM algorithm, which is a combination of the SEM and EM algorithms, was
used to estimate the parameters under the maximum likelihood approach. The simulation and real data analyses revealed that the proposed method has a high level of precision in estimating the model parameters and shows better performance, compared to the GMM and tMM, in clustering multivariate symmetric α-stable data. As shown in the analyses of the Galaxy and Faithful data sets, as for the wider class of mixture models, one can use an information criterion, such as the BIC, to select the correct number of mixture components. Concerning (8), in the multivariate model the number of α's is small in comparison with the number of other parameters. Because the algorithm estimates all the parameters except the α's very well, we expect the clustering of α-stable data using the proposed algorithm to be very accurate. However, when M is too small (especially in high-dimensional settings), the computational cost increases; in these cases, one can add a small amount to M. The SGαSMM is more realistic than a GMM, because the SGαS distribution captures the tail dependencies between variables and allows more efficient inference than the Gaussian distribution. However, the parameter α_g in each mixture component needs to be calculated by numerical methods, which has a negative impact, in terms of computational time, on the PSEM algorithm. As mentioned above, and since rejection sampling works well only in low-dimensional settings [22], as possible future work one can extend the SGαSMM to the high-dimensional case using dimension-reduction methods. Furthermore, the code is accessible on GitHub at the following address: https://github.com/adelmohammadpour/PSEM/blob/master/psem.r.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank the editor, an associate editor, and the anonymous referees for their helpful comments and careful reading, which significantly improved the article.

Appendix A

Suppose X ∼ f_{S_d}(x|α, Σ, μ). Therefore, X = μ + √P Z, where Z is a d-variate zero-mean Gaussian random vector with variance-covariance matrix Σ and P ∼ S(α/2, 1, (cos(πα/4))^{2/α}, 0) is a positive stable random variable. To compute E₁ = E[P⁻¹ | x, α, Σ, μ], we should compute
\[
f(p|x) = \frac{f(x, p)}{f(x)} = \frac{f(p)\, f(x|p)}{\int_0^\infty f(p)\, f(x|p)\, dp}.
\]
Since X | P = p ∼ N(μ, pΣ), we have
\[
E_1 = \frac{\int_0^\infty p^{-d/2-1}\, f_P(p|\alpha)\, \exp\!\left\{\frac{-(x-\mu)^\top \Sigma^{-1} (x-\mu)}{2p}\right\} dp}{\int_0^\infty p^{-d/2}\, f_P(p|\alpha)\, \exp\!\left\{\frac{-(x-\mu)^\top \Sigma^{-1} (x-\mu)}{2p}\right\} dp}.
\]
To approximate E₁, we use a Monte Carlo method, generating B samples from the density function of P and computing the terms under the integrals. If p_1, …, p_B is a random sample from f_P(p|α), then the approximate value of E₁ is
\[
\hat{E}_1 = \frac{\sum_{i=1}^{B} p_i^{-d/2-1} \exp\!\left\{\frac{-(x-\mu)^\top \Sigma^{-1} (x-\mu)}{2p_i}\right\}}{\sum_{i=1}^{B} p_i^{-d/2} \exp\!\left\{\frac{-(x-\mu)^\top \Sigma^{-1} (x-\mu)}{2p_i}\right\}}.
\]
We take B = 2000 and update e_{pig}^{(t)} in iteration t for i = 1, …, N and g = 1, …, G.

Appendix B

On differentiating (7) with respect to π_g (g = 1, …, G − 1), since \(\sum_{g=1}^{G} \pi_g = 1\), we obtain
\[
\begin{aligned}
\frac{\partial Q(\theta|\theta^{(t)})}{\partial \pi_g}
&= \frac{\partial}{\partial \pi_g} \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} \log(\pi_g) \\
&= \frac{\partial}{\partial \pi_g} \left( \sum_{g=1}^{G-1}\sum_{i=1}^{N} e_{1ig}^{(t)} \log(\pi_g) + \sum_{i=1}^{N} e_{1iG}^{(t)} \log(1 - \pi_1 - \cdots - \pi_{G-1}) \right) \\
&= \sum_{i=1}^{N} \left( \frac{e_{1ig}^{(t)}}{\pi_g} - \frac{e_{1iG}^{(t)}}{\pi_G} \right), \qquad g = 1, \ldots, G - 1.
\end{aligned}
\]
Then, equating to zero and multiplying throughout by π_g, we have
\[
\sum_{i=1}^{N} e_{1ig}^{(t)} - \pi_g\, \frac{1}{\pi_G} \sum_{i=1}^{N} e_{1iG}^{(t)} = 0, \qquad g = 1, \ldots, G - 1. \tag{9}
\]
As (9) also holds for g = G, we can sum over g = 1, …, G; since \(\sum_{g=1}^{G} e_{1ig}^{(t)} = 1\) and \(\sum_{g=1}^{G} \pi_g = 1\), it results that
\[
\frac{1}{\pi_G} \sum_{i=1}^{N} e_{1iG}^{(t)} = N. \tag{10}
\]
Substituting (10) in (9), we get the estimate of the weight π_g at the (t+1)th iteration as
\[
\pi_g^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} e_{1ig}^{(t)}.
\]
Furthermore, on differentiating (7) with respect to μ_g, based on the vector derivative rules, we obtain
\[
\frac{1}{2} \sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)} \left[ (\Sigma_g^{-1})^\top + \Sigma_g^{-1} \right] (x_i - \mu_g).
\]
Since Σ_g⁻¹ is a symmetric matrix, equating to zero and multiplying throughout by Σ_g, we have
\[
\mu_g^{(t+1)} = \frac{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)}\, x_i}{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)}}.
\]
In addition, to maximize (7) with respect to Σ_g, we first write equation (7) as
\[
Q(\theta|\theta^{(t)}) = C + \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} \log\big(|\Sigma_g^{-1}|\big) - \frac{1}{2} \sum_{g=1}^{G}\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)} (x_i - \mu_g)^\top \Sigma_g^{-1} (x_i - \mu_g), \tag{11}
\]
where C is constant and free of Σ_g. On differentiating (11) with respect to Σ_g⁻¹, based on the matrix derivative rules, we obtain
\[
\Sigma_g^{(t+1)} = \frac{\sum_{i=1}^{N} e_{1ig}^{(t)} e_{pig}^{(t)} \big(x_i - \mu_g^{(t+1)}\big)\big(x_i - \mu_g^{(t+1)}\big)^\top}{\sum_{i=1}^{N} e_{1ig}^{(t)}}.
\]
References

[1] K. Arnost, K. Lenka, Capabilities of R package mixAK for clustering based on multivariate continuous and discrete longitudinal data, J. Stat. Softw. 59 (12) (2014) 1–38.
[2] A. Azzalini, A.W. Bowman, A look at some data on the Old Faithful geyser, Appl. Stat. 39 (3) (1990) 357–365.
[3] Y.C. Bechtel, C. Bonaiti-Pellie, N. Poisson, J. Magnette, P.R. Bechtel, A population and family study of N-acetyltransferase using caffeine urinary metabolites, Clin. Pharmacol. Ther. 54 (2) (1993) 134–141.
[4] H. Bhaskar, L. Mihaylova, A. Achim, Video foreground detection based on symmetric alpha-stable mixture models, IEEE Trans. Circuits Syst. Video Technol. 20 (8) (2010) 1133–1138.
[5] D.J. Buckle, Bayesian inference for stable distribution, J. Am. Stat. Assoc. 90 (430) (1995) 605–613.
[6] G. Celeux, J. Diebolt, The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Comput. Stat. 2 (1) (1985) 73–82.
[7] R.D. Cook, M.E. Johnson, A family of distributions for modelling non-elliptically symmetric multivariate data, J. R. Stat. Soc., Ser. B, Methodol. 43 (2) (1981) 210–218.
[8] S.L. Crawford, An application of the Laplace method to finite mixture distributions, J. Am. Stat. Assoc. 89 (425) (1994) 259–267.
[9] S.L. Crawford, M.H. DeGroot, J.B. Kadane, M.J. Small, Modeling lake chemistry distributions: approximate Bayesian methods for estimating a finite mixture model, Technometrics 34 (4) (1992) 441–453.
[10] B. Delyon, M. Lavielle, E. Moulines, Convergence of a stochastic approximation version of the EM algorithm, Ann. Stat. 27 (1) (1999) 94–128.
[11] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc., Ser. B, Methodol. 39 (1) (1977) 1–38.
[12] J.C. Dunn, Well-separated clusters and optimal fuzzy partitions, J. Cybern. 4 (1) (1974) 95–104.
[13] C. Fraley, A.E. Raftery, Model-based clustering, discriminant analysis, and density estimation, J. Am. Stat. Assoc. 97 (458) (2002) 611–631.
[14] S. Frühwirth-Schnatter, Finite Mixture and Markov Switching Models, Springer, New York, 2006.
[15] C. Genest, L.P. Rivest, Statistical inference procedures for bivariate Archimedean copulas, J. Am. Stat. Assoc. 88 (423) (1993) 1034–1043.
[16] N. Gershenfeld, Nonlinear inference and cluster-weighted modeling, Ann. N.Y. Acad. Sci. 808 (1) (1997) 18–24.
[17] S. Godsill, E.E. Kuruoglu, Bayesian inference for time series with heavy-tailed symmetric α-stable noise processes, in: Proc. Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, 1999.
[18] M.S. Handcock, A.E. Raftery, J.M. Tantrum, Model-based clustering for social networks, J. R. Stat. Soc., Ser. A, Stat. Soc. 170 (2) (2007) 301–354.
[19] M. Hofert, I. Kojadinovic, M. Maechler, J. Yan, copula: multivariate dependence with copulas, R package version 0.999-18, https://CRAN.R-project.org/package=copula, 2017.
[20] L. Hubert, P. Arabie, Comparing partitions, J. Classif. 2 (1) (1985) 193–218.
[21] S. Ingrassia, S.C. Minotti, A. Punzo, Model-based clustering via linear cluster-weighted models, Comput. Stat. Data Anal. 71 (2014) 159–182.
[22] W.S. Jank, The EM algorithm, its stochastic implementation and global optimization: some challenges and opportunities for operations research, in: Topics in Modeling, Optimization, and Decision Technologies: Honoring Saul Gass' Contributions to Operations Research, Springer Verlag, New York, 2006.
[23] S. Kring, S.T. Rachev, M. Höchstötter, F.J. Fabozzi, Estimation of α-stable sub-Gaussian distributions for asset returns, Risk Assess. 12 (2) (2009) 111–152.
[24] E.E. Kuruoglu, C. Molina, W.J. Fitzgerald, Approximation of α-stable probability densities using finite Gaussian mixtures, in: 9th European Signal Processing Conference (EUSIPCO 1998), 1998, pp. 1–4.
[25] G.J. McLachlan, D. Peel, Finite Mixture Models, John Wiley & Sons, 2004.
[26] X.L. Meng, D.B. Rubin, Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika 80 (2) (1993) 267–278.
[27] J.P. Nolan, Parameterizations and modes of stable distributions, Stat. Probab. Lett. 38 (2) (1998) 187–195.
[28] J.P. Nolan, Maximum likelihood estimation and diagnostics for stable distributions, in: Lévy Processes, Birkhäuser, Boston, MA, 2013, pp. 379–400.
[29] J.P. Nolan, Multivariate elliptically contoured stable distributions: theory and estimation, Comput. Stat. 28 (5) (2013) 2067–2089.
[30] J.P. Nolan, Stable Distributions: Models for Heavy-Tailed Data, Birkhäuser, Boston, 2016, unfinished manuscript, Chapter 1 online at http://fs2.american.edu/jpnolan/www/stable/stable.html.
[31] D. Peel, G.J. McLachlan, Robust mixture modelling using the t distribution, Stat. Comput. 10 (4) (2000) 339–348.
[32] A. Punzo, P.D. McNicholas, Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model, J. Classif. 34 (2) (2017) 249–293.
[33] N. Ravishanker, Z. Qiou, Monte Carlo EM estimation for multivariate stable distributions, Stat. Probab. Lett. 45 (4) (1999) 335–340.
[34] S. Richardson, P.J. Green, On Bayesian analysis of mixtures with an unknown number of components, J. R. Stat. Soc. B 59 (4) (1997) 731–792.
[35] A. Roche, EM algorithm and variants: an informal tutorial, arXiv preprint, arXiv:1105.1476, 2011.
[36] K. Roeder, Density estimation with confidence sets exemplified by superclusters and voids in the galaxies, J. Am. Stat. Assoc. 85 (411) (1990) 617–624.
[37] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.
[38] D. Salas-Gonzalez, E.E. Kuruoglu, D.P. Ruiz, Estimation of mixtures of symmetric alpha-stable distributions with an unknown number of components, in: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 5, IEEE, 2006.
[39] D. Salas-Gonzalez, E.E. Kuruoglu, D.P. Ruiz, Finite mixture of α-stable distributions, Digit. Signal Process. 19 (2) (2009) 250–264.
[40] D. Salas-Gonzalez, E.E. Kuruoglu, D.P. Ruiz, Modelling with mixture of symmetric stable distributions using Gibbs sampling, Signal Process. 90 (3) (2010) 774–783.
[41] G. Samorodnitsky, M.S. Taqqu, Stable Non-Gaussian Random Processes, Chapman and Hall, New York, 1994.
[42] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (2) (1978) 461–464.
[43] M. Shao, C.L. Nikias, Signal processing with fractional lower order moments: stable processes and their applications, Proc. IEEE 81 (7) (1993) 986–1010.
[44] M. Teimouri, S. Rezakhah, A. Mohammadpour, EM algorithm for symmetric stable mixture model, Commun. Stat., Simul. Comput. 47 (2) (2018) 582–604.
[45] M. Teimouri, S. Rezakhah, A. Mohammadpour, Parameter estimation using the EM algorithm for symmetric stable random variables and sub-Gaussian random vectors, J. Stat. Theory Appl. 17 (3) (2018) 439–461.
[46] M. Teimouri, S. Rezakhah, A. Mohammadpour, Robust mixture modelling using sub-Gaussian stable distribution, arXiv preprint, arXiv:1701.06749, 2017.
[47] W.L. Wang, T.I. Lin, Maximum likelihood inference for the multivariate t mixture model, J. Multivar. Anal. 149 (2016) 54–64.
[48] S. Zarei, A. Mohammadpour, S. Ingrassia, A. Punzo, On the use of the sub-Gaussian α-stable distribution in the cluster-weighted model, Iran. J. Sci. Technol., Trans. A, Sci. (2018) 1–11.
Shaho Zarei received his Ph.D. in statistics in 2018. He was a visiting Ph.D. scholar at Catania University in winter and spring 2017. He is currently an assistant professor of statistics at the University of Kurdistan. In spring 2020, he will be a visiting professor at Sapienza University. His research interests are clustering and classification, especially for high-dimensional data, applications of alpha-stable distributions, and small area estimation.

Adel Mohammadpour received his B.Sc., M.Sc., and Ph.D. in Statistics from Shiraz University in 1993, 1995, and 2000, respectively. Since 2000, he has been a faculty member with the Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic). He was a post-doc in the Laboratoire des Signaux et Systèmes at Supélec (2003–2006). His current research interests include statistical inference for heavy-tailed and large-scale data.