Pseudo-Marginal MCMC for Parameter Estimation in α-Stable Distributions*

17th IFAC Symposium on System Identification (SYSID), Beijing International Convention Center, October 19-21, 2015, Beijing, China

IFAC-PapersOnLine 48-28 (2015) 472–477

Marina Riabiz, Fredrik Lindsten, and Simon Godsill

Signal Processing and Communications Laboratory, Engineering Department, University of Cambridge, Cambridge, CB21PZ, UK (e-mail: [email protected], [email protected], [email protected]).

Abstract: The α-stable distribution is very useful for modelling data with extreme values and skewed behaviour. The distribution is governed by two key parameters, tail thickness and skewness, in addition to scale and location. Inferring these parameters is difficult due to the lack of a closed form expression of the probability density. We develop a Bayesian method, based on the pseudo-marginal MCMC approach, that requires only unbiased estimates of the intractable likelihood. To compute these estimates we build an adaptive importance sampler for a latent-variable representation of the α-stable density. This representation has previously been used in the literature for conditional MCMC sampling of the parameters, and we compare our method with this approach.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

1. INTRODUCTION

The α-stable distribution was introduced by Lévy (1925) to describe the behaviour of normalized sums of random variables, under assumptions milder than the finite mean and variance required by the Central Limit Theorem. This generalization leads to flexibility in modelling phenomena that present extreme values more likely towards positive or negative extremes. The four parameters that characterize the distribution are the tail thickness α ∈ (0, 2], the skewness β ∈ [−1, 1], the location μ ∈ R and the scale σ > 0, the first two being determinant in representing the heavy tailedness (the lower α is, the more probable are extreme events) and asymmetry. In particular, in the following we denote by Y ∼ Sα^1(σ, β, μ) a variable with α-stable distribution, in the parametrization introduced by Zolotarev (1986) by means of its characteristic function φ(t), such that

  log φ(t) = −σ^α |t|^α exp(−iβ sign(t) (π/2) K(α)) + iμt,   if α ≠ 1,
  log φ(t) = −σ |t| (π/2 + iβ sign(t) log|t|) + iμt,          if α = 1,   (1)

where K(α) = α − 1 + sign(1 − α).

For a review of the most commonly used parametrizations we refer to Nolan (1998) and Weron (1996). While the characteristic function is given explicitly by (1), there is no closed form expression for the density function, with only a few exceptions including the Gaussian distribution, obtained for α = 2. This limited the application of the stable distribution, until the work of Mandelbrot (1963) and Fama (1965), who showed the possibility of modelling the sudden changes of stock prices in financial time series by means of the stable distribution, leading to an increase of interest in the distribution. From then on, it became an instrument employed in various other fields, such as network analysis and communications (see Berger and Mandelbrot, 1963; Ma and Nikias, 1995; Achim et al., 2010).

The estimation of the parameters of the distribution is an important step for the successful application of the model. Several techniques have been developed in the frequentist framework, e.g., based on the quantiles of the distribution (McCulloch, 1986), its logarithmic moments (Kuruoglu, 2001), the empirical characteristic function (Koutrouvelis, 1980), or a maximum likelihood estimator (Nolan, 2001). However, all of these procedures introduce a number of numerical approximations, and, with the exception of the last case, they give only a point estimate of the quantities of interest.

The Bayesian approach to inference addresses both these issues, providing (asymptotically) exact methods for describing the posterior distribution of the parameters. Some Bayesian techniques (Godsill, 2000) are based on a product property of stable laws applied to the symmetric case (β = 0), others on the inversion of the characteristic function (Lombardi, 2007) or on the Poisson series representation (Lemke and Godsill, 2011, 2012). The present work exploits a latent variables representation of the α-stable density function that was introduced by Buckle (1995) for a conditional Markov Chain Monte Carlo (MCMC) Gibbs sampling algorithm. We propose instead a marginal MCMC sampler, using the pseudo-marginal approach, firstly presented by O'Neill et al. (2000) and further developed by Beaumont (2003) and Andrieu and Roberts (2009). In Section 2 we present a general formulation of the problem, showing the difference between marginal and conditional MCMC schemes. We then give a brief review of the pseudo-marginal method in Section 3. Thereafter, we detail the application of these methods to the problem of inferring the parameters of the α-stable distribution in Section 4, and present numerical results in Section 5.

* The first author acknowledges partial funding from EPSRC-DTG-2013/EU, while the second and third authors were supported by the EPSRC BTaRoT project EP/K020153/1; the third author was also funded by the Learning of complex dynamical systems project (Contract number: 637-2014-466) from the Swedish Research Council.
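The characteristic function (1) is straightforward to evaluate numerically. The sketch below is our own illustration, not part of the paper; the function name `log_phi` and its default arguments are assumptions. It codes both branches of (1) directly in NumPy; at α = 2 we get K(2) = 0, so the β term vanishes and log φ(t) = −σ²t² + iμt, recovering the Gaussian special case noted in the introduction.

```python
import numpy as np

def log_phi(t, alpha, beta=0.0, mu=0.0, sigma=1.0):
    """Log characteristic function of S_alpha^1(sigma, beta, mu), eq. (1)."""
    t = np.asarray(t, dtype=float)
    if alpha != 1.0:
        K = alpha - 1.0 + np.sign(1.0 - alpha)  # K(alpha) = alpha - 1 + sign(1 - alpha)
        return (-sigma**alpha * np.abs(t)**alpha
                * np.exp(-1j * beta * np.sign(t) * (np.pi / 2) * K)
                + 1j * mu * t)
    # alpha == 1 branch (valid for t != 0, where log|t| is finite)
    return (-sigma * np.abs(t)
            * (np.pi / 2 + 1j * beta * np.sign(t) * np.log(np.abs(t)))
            + 1j * mu * t)
```

For instance, `log_phi(np.array([1.0]), 2.0, 0.9)` returns −1 regardless of β, since K(2) = 0.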

Copyright © IFAC 2015
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.12.173


2. PROBLEM FORMULATION

Here we introduce the Bayesian parameter inference framework for latent variable models. Denote with θ ∈ Θ the parameters we are inferring, with y = {yi}_{i=1}^N ∈ Y the data and with x = {xi}_{i=1}^N ∈ X a set of latent, non-observed, variables. The parameters are modelled as random variables with prior distribution π(θ). We are interested in sampling from their posterior distribution, after observing the realization of the data, with likelihood pθ(y). According to Bayes' theorem, the parameter posterior distribution is given by π(θ|y) ∝ pθ(y)π(θ). In the setting of α-stable data, the likelihood is not available in closed form, but we have a representation of it as the marginal of a higher dimensional distribution fθ(y, x) = gθ(y|x)fθ(x) (see Section 4.1). Here, fθ(y, x) is the joint distribution of the data y and the latent variables x, and the data likelihood is given by

  pθ(y) = ∫ gθ(y|x) fθ(x) dx.   (2)
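To make the marginal-likelihood integral (2) concrete, here is a minimal sketch on a toy model of our own (not the α-stable one): with latent x ~ N(0, 1) playing the role of fθ(x) and gθ(y|x) = N(y; x, 1), the marginal pθ(y) = N(y; 0, 2) is available in closed form, so a plain Monte Carlo average of the integrand can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(y, x):
    # observation density g_theta(y | x): here y | x ~ N(x, 1)
    return np.exp(-0.5 * (y - x)**2) / np.sqrt(2 * np.pi)

def estimate_likelihood(y, M=200_000):
    """Plain Monte Carlo estimate of p(y) = ∫ g(y|x) f(x) dx with f = N(0, 1)."""
    x = rng.standard_normal(M)   # draws from the latent prior f_theta(x)
    return g(y, x).mean()        # unbiased estimate of the integral

# exact marginal for this toy model: y ~ N(0, 2)
exact = np.exp(-0.8**2 / 4) / np.sqrt(4 * np.pi)
```

The estimator is unbiased for any M, which is the property the pseudo-marginal method below relies on.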

Using this latent variable representation, common algorithms can be used for marginal and conditional sampling, as explained below.

2.1 Sampling schemes

Metropolis-Hastings (Hastings, 1970) and Gibbs sampling (Geman and Geman, 1984) are the most commonly used algorithms to draw samples from a Bayesian posterior distribution (see also Robert and Casella, 2004). They are based on generating a Markov chain {θ^(k)}_k with limiting distribution π(θ|y). The sample path of the Markov chain can then be used to estimate expectations w.r.t. the posterior distribution by computing Monte Carlo averages.

The Metropolis-Hastings algorithm draws a new sample θ′ given the current state of the chain θ from the proposal distribution q(θ′|θ). The proposed value is accepted with probability

  ρ = 1 ∧ [pθ′(y)π(θ′) q(θ|θ′)] / [pθ(y)π(θ) q(θ′|θ)],   (3)

otherwise the value is rejected and the chain remains at its current state.

The Gibbs sampler operates by simulating the variables of the model from their respective full conditional distributions. If the target distribution is d-dimensional, then at each step k it draws single components θi^(k) iteratively from the univariate full-conditional distributions πi(·|θ−i^(k), y), where θ−i^(k) := (θ1^(k), …, θ(i−1)^(k), θ(i+1)^(k−1), …, θd^(k−1)), for i = 1, …, d. Observe that we assume that it is possible to sample from πi(·|θ−i^(k), y). If this is not the case, a mixed scheme, Metropolis-within-Gibbs (Muller, 1991), can be used to target the full-conditionals. Observe that in general the full-conditionals are not restricted to be univariate, and the components θi^(k) can be vectors. Note moreover that the target distribution need not be limited to the parameters.
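A minimal random-walk Metropolis-Hastings sketch (our own illustration, with a hypothetical standard-normal target standing in for π(θ|y)): with a symmetric proposal, the ratio q(θ|θ′)/q(θ′|θ) in (3) cancels, so acceptance depends only on the target ratio.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    return -0.5 * theta**2   # unnormalized log density of N(0, 1)

def metropolis_hastings(n_iter=50_000, step=1.0):
    """Random-walk Metropolis: accept with probability 1 ∧ π(θ')/π(θ)."""
    theta = 0.0
    chain = np.empty(n_iter)
    for k in range(n_iter):
        prop = theta + step * rng.standard_normal()  # symmetric proposal
        if np.log(rng.random()) < log_target(prop) - log_target(theta):
            theta = prop                              # accept
        chain[k] = theta                              # (on reject, chain repeats theta)
    return chain

chain = metropolis_hastings()
```

The empirical mean and standard deviation of `chain` should match the N(0, 1) target up to Monte Carlo error.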

In particular, the conditional sampler for the latent variables framework (2) augments the state space by incorporating the latent variables to obtain Θ′ = Θ × X. It then samples iteratively, through a Gibbs algorithm, from the full conditionals π(x|θ^(k−1), y) and πi(θi|θ−i^(k), x^(k), y). The main advantage of this scheme is that computing the acceptance ratio requires only evaluating the integrand of expression (2), which is feasible. It is frequent, however, for conditional MCMC samplers to have a slow mixing of the chains, due to high posterior correlation existing between θ and x. This is reduced in a marginal sampler, where the latent variables are ideally integrated out to compute pθ(y), appearing in the expression of the full conditionals. However, when the integral in (2) cannot be computed analytically, this approach requires numerical integration which is often prohibitively costly. Furthermore, using a numerical integration directly in the sampling algorithms would result in inexact MCMC schemes, with biases that are difficult to assess. An alternative solution to the implementation of marginal sampling schemes is provided by the pseudo-marginal method.

3. PSEUDO-MARGINAL SAMPLER VIA IMPORTANCE SAMPLING

The pseudo-marginal approach has its origin in an approximate sampler proposed by O'Neill et al. (2000), where a Monte Carlo estimate of the likelihood term was computed twice for each MCMC step, once for θ′ and once for θ. This provides an approximation of the acceptance probability, which can be used to construct a practical algorithm, that however is inexact, due to the approximation error in the acceptance probability. Nevertheless, Andrieu and Roberts (2009) showed that by considering the augmented state (θ, z), where z is a nonnegative, unbiased estimate of the likelihood, meaning that E[z|θ] = pθ(y), it is possible to build a sampler whose marginal distribution is exactly the target of the ideal marginal sampler. In particular, let π(θ, z) = fθ(z)π(θ) be the distribution of the expanded state.
It can be shown that an exact MCMC scheme for the target distribution proportional to zfθ(z)π(θ) has acceptance probability

  ρ = 1 ∧ [z′fθ′(z′)π(θ′) / (zfθ(z)π(θ))] · [fθ(z)q(θ|θ′) / (fθ′(z′)q(θ′|θ))],

in which at each step only an estimate of the likelihood is required. Thanks to the unbiasedness of z, marginalizing the samples to the θ component corresponds to drawing from the desired target posterior π(θ|y), because ∫ zfθ(z)π(θ) dz = π(θ)pθ(y).
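The scheme above can be sketched on a toy conjugate model (our own illustration; the model and all names are assumptions, not the paper's α-stable setting). With latent x ~ N(θ, 1), observation y|x ~ N(x, 1) and prior θ ~ N(0, 1), the exact posterior for a single observation y is N(y/3, 2/3), so we can check that replacing pθ(y) with an unbiased estimate z leaves the θ-marginal exact. Note that the estimate Z is recycled with the state and never recomputed at the current θ.

```python
import numpy as np

rng = np.random.default_rng(2)
y_obs = 1.5   # single observation; latent model: x ~ N(theta, 1), y | x ~ N(x, 1)

def likelihood_estimate(theta, M=30):
    """Unbiased MC estimate z of p_theta(y): average g(y|x) over x ~ f_theta."""
    x = theta + rng.standard_normal(M)
    return np.exp(-0.5 * (y_obs - x)**2).mean() / np.sqrt(2 * np.pi)

def log_prior(theta):
    return -0.5 * theta**2   # theta ~ N(0, 1)

def pseudo_marginal_mh(n_iter=60_000, step=1.0):
    theta, z = 0.0, likelihood_estimate(0.0)
    chain = np.empty(n_iter)
    for k in range(n_iter):
        prop = theta + step * rng.standard_normal()
        z_prop = likelihood_estimate(prop)   # fresh estimate for the proposal only
        # accept with probability 1 ∧ [z' π(θ')] / [z π(θ)] (symmetric random walk q)
        if np.log(rng.random()) < (np.log(z_prop) + log_prior(prop)
                                   - np.log(z) - log_prior(theta)):
            theta, z = prop, z_prop          # the old z is discarded only on accept
        chain[k] = theta
    return chain

chain = pseudo_marginal_mh()
# exact posterior for this conjugate toy model: N(y_obs / 3, 2/3)
```

Despite the noisy likelihood (only M = 30 samples per estimate), the chain targets the exact posterior, illustrating the exactness property discussed above.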

The most common way to build an unbiased estimator of pθ(y) is by means of an importance sampler for the conditional distribution pθ(x|y), as detailed in Algorithm 1. The importance-sampling-based pseudo-marginal Metropolis–Hastings sampler (or the Metropolis-within-Gibbs step) can then be implemented following Algorithm 2. We remark that at each step we either accept or reject a move to the augmented proposed state (θ′, Z′) from the previous state (θ(k−1), Z(k−1)), and that the probability of accepting this move is inversely proportional to Z(k−1), the noisy realization of the likelihood at the previous parameter values. This is a critical point for the implementation of the pseudo-marginal method. Specifically, if at some iteration k−1 we accept a proposed value of the parameters together with a large positive noise realization in the likelihood estimate (Z(k−1) ≫ pθ(k−1)(y)), then the algorithm tends to get stuck at θ(k−1) for many iterations. To avoid this poor mixing of the chains it is necessary to make the importance sampler produce likelihood estimates with small variance, by carefully choosing ζθ,yj(x), the proposal distribution for the latent variables.

Algorithm 1 IS(θ, y) - Importance sampler
1: for j = 1, ..., N do
   (a) Simulate Xi ∼ ζθ,yj(x), for i = 1, ..., M
   (b) Compute Wi = gθ(yj|Xi)fθ(Xi) / ζθ,yj(Xi), for i = 1, ..., M
   (c) Compute Zj = (1/M) Σ_{i=1}^{M} Wi
2: return Z = ∏_{j=1}^{N} Zj

Algorithm 2 IS-based pseudo-marginal MH
1: Set θ(0) arbitrarily.
2: Compute Z(0) ← IS(θ(0), y)
3: for k = 1, ..., Nit do
   (a) Propose θ′ ∼ q(·|θ(k−1))
   (b) Compute Z′ ← IS(θ′, y)
   (c) Set (θ(k), Z(k)) = (θ′, Z′) with probability ρ, and (θ(k), Z(k)) = (θ(k−1), Z(k−1)) otherwise, where
       ρ = 1 ∧ (Z′/Z(k−1)) · [π(θ′)q(θ(k−1)|θ′)] / [π(θ(k−1))q(θ′|θ(k−1))]

In the following we report a marginal representation of the α-stable density, presented in the literature for a conditional Gibbs sampling scheme. We then apply the pseudo-marginal method to this representation (see Algorithm 4) and, in particular, construct an efficient importance sampler based on adaptive envelopes (see Algorithm 3). The latter development is instrumental to the application of the pseudo-marginal method to inference for α-stable parameters, but we note that this adaptive scheme could also find application in other contexts.

4. INFERENCE FOR α-STABLE PARAMETERS

4.1 A marginal representation of the α-stable density

Marginal representations of the α-stable density were introduced by Zolotarev (1966). Nolan (1997) applies some modifications to suit the S0-parametrization, while Buckle (1995) gives the expression valid for the S1-parametrization (1) that we refer to. He proves that the function f : (−∞, 0)×[−1/2, lαβ] ∪ (0, ∞)×[lαβ, 1/2] → (0, ∞),

f(y, x|α, β) = (α/|α−1|) |y/tαβ(x)|^{α/(α−1)} exp(−|y/tαβ(x)|^{α/(α−1)}) (1/|y|),   (4)

is a bivariate probability density function for (y, x), where

tαβ(x) = [sin(παx + ηαβ) / (cos(πx))^{1/α}] [cos((α−1)πx + ηαβ)]^{(1−α)/α},

ηαβ = βK(α)π/2, and lαβ = −ηαβ/(πα). Moreover, the marginal distribution of y is S¹α(β, 0, 1). Observe that for simplicity we work with standardized stable data, aiming at inferring θ = (α, β), the most characteristic parameters of the α-stable distribution. A generalization to the four parameters (α, β, σ, μ) is straightforward, but likely to increase the variance of the estimates, as well as the computational time.

4.2 A conditional scheme for the stable distribution

Based on representation (4), Buckle (1995) implements a Gibbs scheme that samples (x, α, β) iteratively, where {xi}_{i=1}^{N} is a vector of latent variables, one for each standard stable data point yi. The kth Gibbs step samples

(a) x(k): xi(k) ∼ π(x|α(k−1), β(k−1), yi);
(b) α(k) ∼ π(α|β(k−1), x(k), y);
(c) β(k) ∼ π(β|α(k), x(k), y).

The expressions of the full conditionals of the latent variables and the parameters are given by Buckle (1995). We note that the first sub-step is performed through an accept-reject algorithm based on adaptive proposals, which we take as a basis for the importance sampler described in Section 4.3. Moreover, a Metropolis-within-Gibbs step is used for points (b) and (c), where a change of variables from xi(k) to vi(k) = tα,β(xi(k)) is suggested. This adds computational complexity, as it requires evaluating the Jacobian of the inverse transformation at the newly computed x variables each time a new value of the parameters is proposed. However, the new target full conditionals π(α|β(k−1), v(k), y) and π(β|α(k), v(k), y) are less "peaky" than the original ones, owing to the weaker correlation between the parameters and the latent variables. The overall effect is a much better mixing of the chains, as the simulation results in Section 5 show.

4.3 Pseudo-marginal approach for the stable distribution

Here we detail the novel contribution of the present work, the application of the pseudo-marginal method to the inference of the α and β parameters of the stable distribution. The importance sampling step is defined by Algorithm 1, once ζθ,yj(x), the proposal distribution for the latent variables, is set. In building these proposals, we take inspiration from the mechanism that Buckle (1995) adopts to draw samples from the full conditionals π(x|α, β, yi). Specifically, it can be proven that each π(x|α, β, yi) is a unimodal function with maximum point ximax = t−1αβ(yi) and maximum value πimax = α/(|α−1||yi| exp(1)); its compact support is Xi = [−1/2, lαβ] if yi < 0, or Xi = [lαβ, 1/2] if yi > 0. Unimodality is used to adapt a

piecewise constant envelope function by sampling a first group of G (not identically distributed) latent variables {Xjg}_{j=1}^{G}, starting from the uniform envelope ζ0θ,yj(x) = πimax 1Xi(x). Assume e.g. that yi < 0: a first point X1g is sampled uniformly from ζ0θ,yj(x) and used to add a level, forming a two-piece constant envelope function

ζ1θ,yj(x) = π1 1A1(x) + π2 1A2(x)  if tαβ(X1g) < yi,
ζ1θ,yj(x) = π2 1A1(x) + π1 1A2(x)  if tαβ(X1g) > yi,

where π1 = π(X1g|α, β, yi), A1 = (−1/2, X1g), π2 = πimax, A2 = (X1g, lαβ). The procedure is repeated by sampling the next point X2g uniformly over one of the two segments, where the segment is selected from a categorical distribution with parameters (p1, p2) given by the normalized areas of the two segments. This procedure is then iterated until the final (G+1)-level envelope

ζGθ,yj(x) = Σ_{j=1}^{G+1} πj 1Aj(x),

with πj and Aj appropriately defined, is obtained. When normalized, this forms the proposal distribution ζθ,yj(x), from which the M i.i.d. samples Xi in Algorithm 1 are drawn. For clarity the procedure is summarized in Algorithm 3. Figure 1 shows an example of an envelope with G+1 = 21 levels, built on a not too peaky conditional distribution π(x|α, β, yi). Peaked full conditionals are observed when the absolute value of the stable data point yi is either close to zero or very large. They are responsible for a high variance of the likelihood estimate Z if the parameters G and M are not suitably tuned; in particular, too small a value of G may cause the procedure to sample from low-probability regions of the full conditionals. Observe that the adaptive envelope procedure adopted here could be used in other problems where unimodality is observed. On the other hand, we could reach our goal by means of different proposals for the latent variables, built for example with a Laplace approximation of the full conditionals (see Tierney and Kadane, 1986).

Algorithm 3 Adaptive envelope for LV sampling
1: Set ζ0θ,yj(x) = πimax 1Xi(x)
2: for j = 1, ..., G do
   (a) Sample Xjg ∼ ζj−1θ,yj(x)
   (b) if tαβ(Xjg) < yi then add the segment πj 1Aj(x) to the left of the mode
   (c) else add the segment πj 1Aj(x) to the right of the mode
3: return ζGθ,yj(x)

Fig. 1. Proposal distribution for the latent variables, ζα,β,yj(x), consisting of a (G+1) = 21 level piecewise constant envelope on the full conditional π(x|α, β, yi), for α = 1.2, β = 0.7, yj = −2.6525; the sampled points Xjg are also shown.

Once we have defined how to produce an estimate of the likelihood, providing details on the importance sampler, the pseudo-marginal Algorithm 2 can be used either directly, with θ = (α, β), or as a Metropolis-within-Gibbs step, as detailed in Algorithm 4. We refer to the first scheme as PM-MH and to the second one as PM-MH-GS. In general, tuning PM-MH-GS is simpler than tuning PM-MH, because the two parameters are proposed and accepted or rejected separately. However, this results in twice the simulation time, since the likelihood estimate is computed twice per iteration. Simulation results for parameter estimation through the presented conditional and pseudo-marginal schemes are shown in the next section.

Algorithm 4 PM-IS based MH within Gibbs (α, β)
1: Set (α(0), β(0)) arbitrarily
2: Compute Z(0) ← IS(α(0), β(0), y)
3: for k = 1, ..., Nit do
   (1) Given (α(k−1), β(k−1), Z(k−1)):
       (a) Propose α′ ∼ qα(·|α(k−1))
       (b) Compute Z′ ← IS(α′, β(k−1), y)
       (c) Set (α(k), Z″) = (α′, Z′) with prob. ρα, and (α(k), Z″) = (α(k−1), Z(k−1)) else, where
           ρα = 1 ∧ (Z′/Z(k−1)) · [π(α′, β(k−1)) qα(α(k−1)|α′)] / [π(α(k−1), β(k−1)) qα(α′|α(k−1))]
   (2) Given (α(k), β(k−1), Z″):
       (a) Propose β′ ∼ qβ(·|β(k−1))
       (b) Compute Z‴ ← IS(α(k), β′, y)
       (c) Set (β(k), Z(k)) = (β′, Z‴) with prob. ρβ, and (β(k), Z(k)) = (β(k−1), Z″) else, where
           ρβ = 1 ∧ (Z‴/Z″) · [π(α(k), β′) qβ(β(k−1)|β′)] / [π(α(k), β(k−1)) qβ(β′|β(k−1))]

5. SIMULATION RESULTS AND COMPARISONS

In both the conditional and the marginal scheme, the proposal and prior distributions for α and β have to be chosen by the user. On this point our implementation differs from that presented by Buckle, who alleviates the problem of proposal selection by further building adaptive rejection sampling envelopes for them, referring to the methodology suggested by Gilks and Wild (1992) and its extensions for non-log-concave distributions. However, this adds computational complexity to the sampling procedure, so we prefer using simpler proposal distributions in this evaluation. Additionally, our version of the pseudo-marginal method requires setting values for the parameters G and M: the first determines the quality of the approximation of π(x|α, β, yi), the second the accuracy of the Monte Carlo estimate of its integral in the importance sampling.
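The role of G can be illustrated by sketching the adaptive envelope construction of Algorithm 3 for a generic unimodal target. This is a simplified Python illustration under our own bookkeeping (the function names, the Beta-shaped example target, and the segment representation are ours, not the paper's): each refinement picks a segment by its normalized area, samples a point uniformly within it, and lowers the envelope on the side of the point away from the mode, which unimodality justifies.

```python
import bisect
import random

def adaptive_envelope(target, mode, lo, hi, G, rng):
    """Piecewise-constant envelope of a unimodal `target` on [lo, hi],
    refined G times in the spirit of Algorithm 3 (a generic sketch)."""
    xs = [lo, hi]        # segment breakpoints
    hs = [target(mode)]  # height of each segment; starts as the flat maximum
    for _ in range(G):
        areas = [(xs[i + 1] - xs[i]) * hs[i] for i in range(len(hs))]
        i = rng.choices(range(len(hs)), weights=areas)[0]  # categorical by area
        x = rng.uniform(xs[i], xs[i + 1])                  # uniform within segment
        h = target(x)
        xs.insert(i + 1, x)
        if x < mode:
            # target is non-decreasing left of the mode, so target <= h on (xs[i], x)
            hs[i:i + 1] = [min(hs[i], h), hs[i]]
        else:
            # target is non-increasing right of the mode, so target <= h on (x, xs[i+1])
            hs[i:i + 1] = [hs[i], min(hs[i], h)]
    return xs, hs

# Example: unnormalized Beta(3, 5)-shaped unimodal target on [0, 1], mode 1/3
target = lambda x: x ** 2 * (1 - x) ** 4
rng = random.Random(0)
xs, hs = adaptive_envelope(target, mode=1 / 3, lo=0.0, hi=1.0, G=20, rng=rng)
assert len(hs) == 21  # (G + 1) levels
for k in range(1, 1000):  # the envelope dominates the target everywhere
    x = k / 1000
    seg = min(bisect.bisect_right(xs, x) - 1, len(hs) - 1)
    assert hs[seg] >= target(x) - 1e-12
```

Normalizing the returned piecewise-constant function by its total area turns it into the proposal distribution for the latent variables.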

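The role of M — the accuracy of the per-point Monte Carlo integral — can be checked directly. The sketch below implements one factor Zj of Algorithm 1 in Python with a plain uniform proposal ζθ,yj(x) over the support (in place of the adaptive envelope), using the bivariate density of representation (4). The convention K(α) = α for α < 1 and α − 2 for α > 1 inside ηαβ is an assumption on our part, since the paper's definition of K is not reproduced in this excerpt; the estimate is cross-checked against a deterministic quadrature of the same integral.

```python
import numpy as np

def t_ab(x, a, b):
    """t_{alpha,beta}(x) of representation (4). The form of K(alpha) is an
    assumed convention: K(a) = a for a < 1, a - 2 for a > 1."""
    eta = b * (a if a < 1 else a - 2) * np.pi / 2
    return (np.sin(np.pi * a * x + eta) / np.cos(np.pi * x) ** (1 / a)
            * np.cos((a - 1) * np.pi * x + eta) ** ((1 - a) / a))

def f_joint(y, x, a, b):
    """Bivariate density f(y, x | alpha, beta) of (4); zero off the support."""
    with np.errstate(all="ignore"):
        r = y / t_ab(x, a, b)  # positive on the support of x given y
        p = a / (a - 1)
        val = abs(a / (a - 1)) * r ** p * np.exp(-(r ** p)) / abs(y)
    return np.where(np.isfinite(val) & (r > 0), val, 0.0)

def z_j(y, a, b, M, rng):
    """One factor Z_j of Algorithm 1, with a flat proposal zeta over X_i."""
    eta = b * (a if a < 1 else a - 2) * np.pi / 2
    l_ab = -eta / (np.pi * a)
    lo, hi = (-0.5, l_ab) if y < 0 else (l_ab, 0.5)
    x = rng.uniform(lo, hi, size=M)
    return (hi - lo) * np.mean(f_joint(y, x, a, b))  # W_i = f / (1/(hi - lo))

rng = np.random.default_rng(0)
y, a, b = -2.6525, 1.2, 0.7  # the setting shown in Figure 1
z_mc = z_j(y, a, b, M=200_000, rng=rng)

# Deterministic cross-check: trapezoidal quadrature of the same x-integral
eta = b * (a - 2) * np.pi / 2
grid = np.linspace(-0.5 + 1e-7, -eta / (np.pi * a) - 1e-7, 20_001)
f = f_joint(y, grid, a, b)
z_tr = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))
assert z_tr > 0 and abs(z_mc - z_tr) / z_tr < 0.05
```

For peaky full conditionals (|yi| near zero or very large) a flat proposal like this one gives high-variance weights, which is precisely what the adaptive envelope of Algorithm 3 is designed to avoid.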
As for the first issue, observe that the characteristic function of the parametrization (1) has a pole at α = 1, and some of the analytical expressions used are not well defined for α = 0. For this reason, it is common in the α-stable literature to make inference on α assuming that it belongs to a bounded domain Sα ∈ {(0.1, 0.9), (1.1, 2)}. Moreover, we decide to bound also the support of the prior and proposal distributions of β to Sβ ∈ {(−1, 0), (0, 1)}, even if this restriction could be relaxed. Specifically, we make sure that Sα and Sβ contain (αT, βT), the values of the parameters used to generate the dataset of N = 1000 stable points. These are obtained through the version of the Chambers-Mallows-Stuck algorithm (see Chambers et al., 1976) that suits the representation (4) of the stable density, as described by Weron (1996). In particular, we use what he calls the Sα1(σ2 = 1, β2 = 0.7, μ = 0) version of the S1 parametrization, noting that, if the Sα1(σ, β, μ = 0) parametrization is used, a change of variables has to be performed from (σ, β) to (σ2, β2).

For simplicity the parameters are then assumed to be a priori independent and uniformly distributed on Sα × Sβ. Furthermore, in both the marginal and the conditional schemes we make use of proposal distributions for the parameters that are a tunable mixture of a truncated Gaussian random walk (with probability p) and an independent move (with probability 1 − p). The first component corresponds to a local exploration of the state space, and requires tuning of the variances σα², σβ² (and of the covariance σαβ for the joint sampling in the PM-MH scheme), while the second represents an attempt at a more global move.

We have performed simulations for the two sets (α1T, β1T) = (0.5, 0.7) and (α2T, β2T) = (1.2, 0.7), corresponding to two different tail weights and a positively skewed distribution. The respective initial values of the chains are (α1(0), β1(0)) = (0.8, 0.4) and (α2(0), β2(0)) = (1.7, 0.4), and the algorithms are run for Nit = 5000 iterations.

At the top of Figure 2 we show the trace plots of the chains for α (left) and β (right), obtained for the two parametrizations of the conditional scheme and for the PM-MH-GS scheme, for (α1T, β1T). We use p = 0.85, σα² = σβ² = 10−3, and G = 50, M = 50. As a general observation, the conditional Gibbs sampler suffers from correlation between the parameters and the latent variables x: the target full conditionals are too peaky and not spread around (αT, βT). This effect is successfully reduced by means of the re-parametrization to the latent variables v. The pseudo-marginal method with the chosen G and M achieves, on the other hand, performance similar to this second scheme, without the need to transform the latent variables. The sample autocorrelation functions are displayed in the centre of Figure 2, and they appear comparable.

A more quantitative analysis, based on the lag corresponding to the first crossing of the rejection band, is presented in Table 1 for the combinations G ∈ {20, 50}, M ∈ {50, 100}. The lags for the v-conditional Gibbs algorithm are (Lα, Lβ) = (17, 15), and a comparison with the pseudo-marginal values indicates that envelopes with 50 levels allow us to generate chains with a similar effective sample size.

Fig. 2. Trace plot (top), sample autocorrelation function (centre) and posterior distributions (bottom) of α and β, varying the sampling scheme (con(x), con(v), ps-mar(x)).

Table 1. Lag of the first rejection-band crossing, for the α and β sample posterior autocorrelations.

(Lα; Lβ)   M = 50    M = 100
G = 20     (50; 42)  (44; 22)
G = 50     (20; 14)  (17; 15)

At the bottom of Figure 2 we show the posterior distributions of the parameters, produced with the v-Gibbs and with the PM-MH-GS schemes, after a 500-iteration burn-in. The red vertical lines represent (α1T, β1T); the posterior means obtained with the first algorithm are E[α|y] = 0.497, E[β|y] = 0.72, and E[α|y] = 0.496, E[β|y] = 0.72 with the second one, revealing a small difference in precision between the two methods.

6. CONCLUSIONS

We have shown that the proposed adaptive-envelope-based pseudo-marginal method is able to achieve results comparable to the Bayesian approach presented by Buckle (1995). This was obtained using an off-the-shelf implementation of the pseudo-marginal method, i.e., without requiring the application-specific re-parametrization used by Buckle to enable the mixing of the conditional scheme. Furthermore, the pseudo-marginal method has the advantage that the design parameters G and M directly influence the mixing; larger values result in better mixing, and these parameters can thus be chosen as large as possible w.r.t. the available computational resources. The mixing of the conditional scheme, however, is strictly limited by the dependencies between the latent variables and the parameters. It is not possible to improve upon this unless the user is able to come up with an even "better"


re-parametrization. The latter point could prove to be an even larger merit of the pseudo-marginal method in more challenging scenarios, e.g. for models including more parameters, or where the posterior distribution has significant probability mass on both disconnected subsets of the support Sα, forcing the sampler to jump between these subsets. The conditioning on v may prohibit large moves in the parameter space, in particular between disconnected subsets of the posterior support, which is not an issue for a marginal sampler employing global (i.e., independent) proposals. Investigating the performance of the methods in such more challenging scenarios is a topic for future work.

However, it is necessary to draw the reader's attention to the major drawback of the novel method, namely an increased computational cost. For each iteration, this is on average G × N (the computationally expensive part of the importance sampler being the G adaptations of each of the N envelopes), compared to 6 × N for the conditional scheme implemented by Buckle: in the conditional sampler each latent variable is drawn by rejection sampling, and each acceptance requires, on average, no more than 6 candidate points.

The mixing of the conditional sampler was significantly improved by changing parametrization from x to v. An interesting topic for future work is therefore to build a pseudo-marginal scheme based on the re-parametrized latent variables v. Their support is not bounded, as Im(tα,β(x)) = (−∞, +∞), which makes the 'envelope strategy' no longer applicable. Were a less expensive approach for the importance sampling step in the new parametrization to be found, this would likely improve the state of the art of the Bayesian approach to inference of the α-stable parameters based on the representation (4). Matlab source code for the presented methods is available at http://www-sigproc.eng.cam.ac.uk/Main/MR622.
REFERENCES

Achim, A., Buxton, B., Tzagkarakis, G., and Tsakalides, P. (2010). Compressive sensing for ultrasound RF echoes using α-stable distributions. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, 4304–4307. IEEE.
Andrieu, C. and Roberts, G.O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2), 697–725.
Beaumont, M.A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics, 164(3), 1139–1160.
Berger, J. and Mandelbrot, B. (1963). A new model for error clustering in telephone circuits. IBM Journal of Research and Development, 7(3), 224–236.
Buckle, D.J. (1995). Bayesian inference for Stable distributions. Journal of the American Statistical Association, 90(430), 605–613.
Chambers, J.M., Mallows, C.L., and Stuck, B.W. (1976). A method for simulating Stable random variables. Journal of the American Statistical Association, 71(354), 340–344.
Fama, E.F. (1965). The behavior of stock-market prices. The Journal of Business, 38(1), 34–105.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-6(6), 721–741.


Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society. Series C (Applied Statistics), 41(2), 337–348.
Godsill, S. (2000). Inference in symmetric alpha-stable noise using MCMC and the slice sampler. In Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on, volume 6, 3806–3809.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
Koutrouvelis, I.A. (1980). Regression-type estimation of the parameters of Stable laws. Journal of the American Statistical Association, 75(372), 918–928.
Kuruoglu, E. (2001). Density parameter estimation of skewed α-stable distributions. Signal Processing, IEEE Transactions on, 49(10), 2192–2201.
Lemke, T. and Godsill, S. (2011). Enhanced Poisson sum representation for alpha-stable processes. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, 4100–4103.
Lemke, T. and Godsill, S. (2012). Linear Gaussian computations for near-exact Bayesian Monte Carlo inference in skewed alpha-stable time series models. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 3737–3740.
Lévy, P. (1925). Calcul des Probabilités. PCMI collection. Gauthier-Villars.
Lombardi, M.J. (2007). Bayesian inference for α-stable distributions: a random walk MCMC approach. Computational Statistics & Data Analysis, 51(5), 2688–2700.
Ma, X. and Nikias, C. (1995). Parameter estimation and blind channel identification in impulsive signal environments. Signal Processing, IEEE Transactions on, 43(12), 2884–2897.
Mandelbrot, B. (1963). New methods in statistical economics. Journal of Political Economy, 71(5), 421–440.
McCulloch, J.H. (1986). Simple consistent estimators of stable distribution parameters. Communications in Statistics - Simulation and Computation, 15(4), 1109–1136.
Muller, P. (1991). A generic approach to posterior integration and Gibbs sampling. Technical report, Purdue University, West Lafayette, Indiana.
Nolan, J.P. (1997). Numerical calculation of Stable densities and distribution functions. Comm. Statist. Stochastic Models, 13(4), 759–774.
Nolan, J.P. (2001). Maximum likelihood estimation of stable parameters. Lévy processes: Theory and applications, 379–400.
Nolan, J. (1998). Parameterizations and modes of Stable distributions. Statistics & Probability Letters, 38(2), 187–195.
O'Neill, P.D., Balding, D.J., Becker, N.G., Eerola, M., and Mollison, D. (2000). Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. Journal of the Royal Statistical Society. Series C (Applied Statistics), 49, 517–542.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer.
Tierney, L. and Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393), 82–86.
Weron, R. (1996). On the Chambers-Mallows-Stuck method for simulating skewed Stable random variables. Statistics & Probability Letters, 28(2), 165–171.
Zolotarev, V.M. (1966). On representation of Stable laws by integrals. Selected Translations in Mathematical Statistics and Probability, 6(1), 84–5.
Zolotarev, V. (1986). One-Dimensional Stable Distributions. Translations of Mathematical Monographs. American Mathematical Society.
Muller, P. (1991). A generic approach to posterior integration and Gibbs sampling. Technical report, Purdue University, West Lafayette, Indiana. Nolan, J.P. (1997). Numerical calculation of Stable densities and distribution functions. Comm. Statist. Stochastic Models, 13(4), 759–774. Nolan, J.P. (2001). Maximum likelihood estimation of stable parameters. Levy processes: Theory and applications, 379–400. Nolan, J. (1998). Parameterizations and modes of Stable distributions . Statistics & Probability Letters, 38(2), 187 – 195. ONeill, P.D., Balding, D.J., Becker, N.G., Eerola, M., and Mollison, D. (2000). Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. Journal of The Royal Statistical Society Series C-applied Statistics, 49, 517–542. Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer. Tierney, L. and Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393), 82–86. Weron, R. (1996). On the Chambers-Mallows-Stuck method for simulating skewed Stable random variables. Statistics & Probability Letters, 28(2), 165 – 171. Zolotarev, V.M. (1966). On representation of Stable laws by integrals. Selected Translations in Mathematical Statistics and Probability, 6(1), 84–5. Zolotarev, V. (1986). One-Dimensional Stable Distributions. Translations of mathematical monographs. American Mathematical Society.