On the choice of high-dimensional regression parameters in Gaussian random tomography

On the choice of high-dimensional regression parameters in Gaussian random tomography

Results in Applied Mathematics 3 (2019) 100067 Contents lists available at ScienceDirect Results in Applied Mathematics journal homepage: www.journa...

362KB Sizes 0 Downloads 30 Views

Results in Applied Mathematics 3 (2019) 100067

Contents lists available at ScienceDirect

Results in Applied Mathematics journal homepage: www.journals.elsevier.com/ results-in-applied-mathematics/

On the choice of high-dimensional regression parameters in Gaussian random tomography Christian Rau Albrecht-Dürer-Str. 7, 65428, Rüsselsheim, Germany

a r t i c l e i n f o

a b s t r a c t

Article history: Received 24 July 2019 Received in revised form 12 August 2019 Accepted 15 August 2019

The stochastic Radon transform was introduced by Panaretos in order to estimate a biophysical particle, modelled as a three-dimensional probability density subject to random and unknown rotations before being projected onto the imaging plane. Assuming that the particle has a Gaussian mixture distribution, the question arises as to how to recover the modes of the mixture, and the mixing weights. Panaretos and Konis proposed to do this in a sequential manner, using high-dimensional regression. Their approach has two drawbacks: first, the mixing weight estimates often have incorrect rank; second, much information inherent in the coefficients from the penalized regression approach (Lasso, or more generally elastic net) is lost. We propose a procedure, based on the asymptotic precision matrix, that exploits the information from the high-dimensional regression more efficiently, and yields in particular a method for choosing the parameter that balances the Lasso with ridge regression. © 2019 The Author. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Elastic net Landmarks Lasso Mixture Random tomography Rotations MSC: 60D05 62H35

1. Introduction The classical problem of three-dimensional tomography, from a mathematical point of view, is how to recover a trivariate real-valued function rðxÞ ¼ rðx1 ; x2 ; x3 Þ from the function

rðx; xÞ ¼

Z

þ∞ ∞

rðx þ txÞ dt ;

x2S2 ;

x2x⊥ ;

(1)

where x⊥ is the plane in R3 through the origin with normal vector x and S2 is the unit sphere in R3 . In the context of applications, the function r is used to model the unknown object which is to be reconstructed, with the argument x of  rðxÞ representing the orientation at which r is observed. , called the Radon transform (or X-ray transform) of r, is known only In classical Radon inversion problems, the function r for a finite set of values of x and, due to discretization of the imaging plane, of x. There exist a number of procedures to . For further information see [1, p. 142] and the references therein. (approximately) obtain r from such observations of r In random tomography, as introduced in [2] and studied from the computational viewpoint in [3] to tackle the problem of cryo-electron microscopy (cryo-EM), the goal is to recover a biophysical particle r from observations at orientations which are, unlike in the classical case of (1), random and unknown. This extra layer of ill-posedness makes the problem considerably E-mail address: [email protected]. https://doi.org/10.1016/j.rinam.2019.100067 2590-0374/© 2019 The Author. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/ licenses/by-nc-nd/4.0/).

2

C. Rau / Results in Applied Mathematics 3 (2019) 100067

more difficult. (The exact meaning of the term ‘orientation’ in this context is given in Definition 1 below.) Throughout the present paper, we adopt the assumption made in [2,3] that the rotations are distributed according to Haar measure n on the three-dimensional rotation group SOð3Þ. A random rotation A is Haar distributed if the distribution of Q A (and then also automatically that of AQ ) is the same as that of A, for any fixed Q 2SOð3Þ. This property makes n the most symmetrical, and hence most ‘natural’, probability measure on SOð3Þ, and an analogue of the uniform distribution on the circle. The measure n may be given explicitly in various parametrizations of SOð3Þ; for the so-called Euler angles, which are defined in Subsection 2.1 and play a major role in this paper, this description is contained in the integration formula of Proposition 3. It was shown in [2] that in the above setup, the degree of ill-posedness is more serious than in the classical Radon transform, as only the so-called shape of r,

½r :¼ fAr; A 2 SOð3Þg ;

(2)

can be recovered; here ArðxÞ :¼ rðA1 xÞ for x2R3. The shape ½r is an equivalence class; passing from r to ½r identifies r with all of its rotated versions. In order to make the problem statistically tractable, in [2,3] as well as in the present paper, an approximation of r by a Gaussian mixture with scalar, identical covariance matrices is used:

rðxÞ ¼

K X

qk 43 ðx  mk Þ ;

qk > 0 ;

k¼1

K X

qk ¼ 1 ;

(3)

k¼1

 where 4d ð ,Þ ¼ 4d ð ,m; s2 Þ, d ¼ 2; 3, is the d-dimensional Gaussian density with mean m and covariance matrix s2 I, and I is the identity matrix. Here the points m1 ;…; mK , called landmarks (LMs) in the sequel, are assumed to be distinct, and s is assumed known. In the following definition, diam A is the diameter of the set A, and H is the matrix I without the last row. Definition 1. (See Panaretos, 2009) Let r have its centroid at the origin, and assume diam supp r  2p. The function PfrgðAÞ defined by

ðPfrgðAÞÞðxÞ :¼ ¼

K X

Z

þ∞ ∞





Arðx1 ; x2 ; x3 Þ dx3

qk 42 x  mAk ;

(4)

mAk :¼ HAmk ;

k¼1  is called the tomographic projection or profile, of r at orientation A2SOð3Þ. A sample fðPfrgðAm ÞÞgN m¼1 , with A1 ; …; AN independent and distributed as n each, is called the stochastic Radon transform (SRT) of r. Equation (4) shows that the tomographic projection of a Gaussian mixture in R3 is, conditional on A or the random LMs mA , k a Gaussian mixture in R2 . To remove effects of location, any estimators of the LMs are shifted so that they sum to zero.

Remark 2. In contrast to Eqn. (1), the orientation used in the tomographic projection of the SRT has one additional parameter. In the Euler angle parametrization introduced in Subsection 2.1, the extra parameter is essentially the precession angle j, that is the within-plane rotation of the projection [2, pp. 3275e3276]. The fact that only ½r, and not r itself, can be recovered implies in the context of the normal mixture model (4) that not the LMs mk themselves, but their Gram matrix

G :¼



mi ; mj

K i;j¼1

;

is to be estimated, where 〈 ,; ,〉 denotes the standard inner product. Indeed, it is easy to check that G does not change if mk is replaced by Amk for k ¼ 1; …; K. (In fact A may also be a reflection, so that chirality, or handedness, is lost by passing to the Gram matrix. Loss of chirality is a well-known phenomenon in cryo-EM, see for example [4, p. 1095].) In practice, the random field PfrgðAÞ is observed on a finite set of points in the bounded imaging plane. That is, the observations lie within the set fx2R2 :  p  x1 ;x2  pg. The discretization adds another asymptotic (of the ‘infill’ type, see [5]) to the problem, besides the one coming from the length N of the SRT. Typically, and as in [3], the observation points lie on a discrete square grid of finite extent with say T points along each of the two axes, so that one has T 2 observations in total. In theoretical considerations such as in the laws of large numbers and central limit theorems (CLTs) presented in [2], T ¼ TN diverges together with N . Also, in [2] a likelihood function for the problem was formulated, which, besides the randomness coming from the rotations, imposed additive, independent and normally distributed noise. More precisely, the model for ðmÞ P ðmÞ ¼ fP s;t g, the m'th profile in the sample ð1  m  N Þ, is

  ðmÞ ðmÞ P s;t ¼ PfrgðAm Þ us;t þ εs;t ;

s; t ¼ 1; …; T :

(5)

C. Rau / Results in Applied Mathematics 3 (2019) 100067

3

In the above mentioned case of a square grid, the indices s and t may be taken as the row and column of the pixel center

ðmÞ us;t . The rotations A1 ; …; AN are independent and distributed as n; and the random errors εs;t are independent and have 2 2 the normal Nð0; sε Þ distribution for all s; t, and fixed variance sε > 0. Also, the rotations and errors are independent. This “white-noise” assumption approximates reasonably well the practical situation in biophysical analysis [6, p. 1446]. While the assumption of gridded design is convenient for graphical purposes, the problem at hand suggests the use of a polar grid, where the points us;t are assumed to be the centers of polar pixels:

us;t ¼ rs ðcos zt ; sin zt Þ ;

1sP;

1tM;

(6)

where frs ; s ¼ 1 ; …; Pg is a set of increasing radii with r1 < … < rP  p (recall that 2p is assumed to be the maximally allowed extent of the support of r), and zt ¼ 2pt=M form a set of equispaced angles. The notion of polar pixels has previously found use in tomographic problems, see [7]. While there the main idea was the reduction of computation cost, the main attractive feature in the present context is to take advantage of the symmetry of the problem in order to reduce the adverse effects that any discretization has on the reconstruction. We shall adopt polar pixels in the numerical experiments presented in Section 3. We point out that for the Lasso, or more generally the elastic net, procedure that is used to carry out the reconstruction, the square grid assumption is not essential, as noted in [8, p. 320]. The log-likelihood associated with Eqn. (5) is given for the case of R2 in [2, Eqn. (4.6), p. 3287] and in the case of R3 in [3, m Eqn. (3.4), p. 2579)], where in the latter case, an error in notation can be corrected (recall that mA was defined in (4)): k

n [

mAk m

f

1

o

! k ¼ 1; …; K ; fqk gk¼1;…;K m ¼ 1; …; N

(7)

N X K T n  o2 X X ðmÞ P s;t  qk 42 us;t  mAk m :

N T 2 m¼1

s;t¼1

k¼1

Limit theory for a hybrid of likelihood and method-of-moments estimation, based on the log-likelihood [, was developed in [2, Sec. 4]. However, it was noted there that the resulting estimators are far from easily tractable. These difficulties were further elaborated upon in [3], where a different method for reconstructing ½r from [2] was proposed to (at least partly) overcome these issues. That method was based on a relaxation to a sparse regression problem, in which the unknown locations and mixing weights were estimated separately. Relaxation means that not the precise location of the component mode is sought, but only the index of the (Cartesian or, in the context of the present paper, polar) pixel containing that mode. In this connection, we should recall the fact that a Gaussian mixture can have more modes than components [9], and that closed-form expressions for key quantities associated with Gaussian mixtures are essentially only known in the case of two components [10,9]. As this plays a crucial role in the present paper, we now turn to a brief description of sparse regression. In a linear model with a vector Y of n response variables and a design matrix X2Rnp with p explanatory variables, the elastic net (or in the b¼b b ðlÞ of a regression coefficient vector b2 context of the implementation in the R software [11], the glmnet [12]) estimate b Rp is of the form [13] (see also [8])



.



bb ðl1 ; l2 Þ :¼ argmin kY  Xbk22 n þ l1 jjbjj1 þ l2 kbk22 ; b

(8)

np where X2R   is the design matrix and l1 ; l2  0 are penalty parameters. Thus for the elastic net, the penalty is a sum of jj,jj1 - and ,j22 -penalties. The case l2 ¼ 0 gives the Lasso, which was exclusively used in the relaxation in [3], while l1 ¼ 0 gives ridge regression, also known as Tikhonov regularisation in the applied mathematics literature. It is known that Lasso tends to favour single non-zero (in the current estimation, where a density is to be estimated, positive) coefficients, while ridge regression tends to favour clusters of non-zero pixels [12, p. 3]. The present paper addresses two key problems that arise when applying the procedure of [3] (if one considers the elastic net, and not just the Lasso, in the reconstruction problem). First, simulations show that whenever two landmarks are close to each other, the ranks of the associated mixing weights qk may be wrongly assigned; for example, if the mixing weights qk and their sample analogues b q k are in increasing order, as done in [3] and also in the simulations in Section 3 of the present paper, it may be that the estimated weight b q K does not correspond to the true largest component. Evidently, this problem affects the

4

C. Rau / Results in Applied Mathematics 3 (2019) 100067

whole subsequent recovery procedure. Second, the information from the regression coefficients was essentially discarded, as stated in [3, p. 2585], and the mixing weights estimated in a ‘fresh’ second run on the entire data. It is plausible to surmise that the information which was discarded may help to set the parameters l1 and l2 in (8) d or, more simply, the relative weights of these penalties as considered in [12], based on the representation

l1 ¼ al ;

l2 ¼ ð1  aÞl :

(9)

The problem of estimating l is being taken care of by the cross-validation of the glmnet package of [12], via the cv.glmnet command. On the other hand, the problem of estimating the crucial parameter a in (9) does not seem to have been investigated in any depth. The vignette for the R glmnet package of [12] suggests averaging several samples in order to suppress the randomness from the cross-validation, and comparing the resulting regularisation curves. The present paper highlights a situation where the averaging is not only over the folds of the cross-validation, but also over the space of additive error ðmÞ variables εs;t from Ref. [5] and the rotations Am . The distinct roles of these probability spaces was pointed out previously in [2, p. 3288] as the underpinning of the limit theorems presented there. The expressions that we suggest in order to overcome the above two key problems appear as precision matrix (inverse covariance matrix) entries in the central limit theorem (CLT) for the mixing weights in [2, Thm. 4.4, p. 3293]; these entries are shown as the quantities fij in Subsection 2.2. These entries also arise from applying a second-order differential operator to the image d not in the image domain, but in the parameter domain of the Gaussian mixture d, as an elementary computation, see [2, p. 3294], shows. As the indices of the Gaussian peaks participating in the precision matrix entry are interchangeable (that is, the precision matrix is symmetric), the problem of incorrectly assigned ranks is circumvented. As to the second key problem, the approach presented here makes use of the whole set of non-zero regression coefficients, and not just the ranks of the mixing weights computed therefrom as in [3]. Once the attention is limited to rotations within the observation ðx1 x2 Þ plane, we have essentially a problem that can exploit invariance, modulo the errors that arise from the discretization and noise, under such rotations. This has some similarities with the use of symmetries in classical Radon transform problems, see for example [7]. However, an impediment to exploiting this symmetry is that the implementation of the elastic net in the R package glmnet, see [12], is not invariant under rotation of the image [14]. The method in this paper also allows a refinement of an initial estimation by assessing its result on the ‘zoned’ precision matrix, where the ‘zones’ are essentially defined by the Euler nutation angle of the rotation. Another viewpoint on the problem discussed in the present paper is that of a two-step procedure, where the Lasso (or more generally, elastic net) coefficients are the input in a further estimation task (here, the estimation of the precision matrix). There does not appear to be much literature on two-step methods for the Lasso at present. One exception is [15]. In that paper, the Lasso was used to obtain the coefficients of a single-index model which was then used in monotone regression. However, not only our object of estimation, but already our model is markedly different. Section 2 gives an account of Euler angles, and then delves into the main computations. Section 3 shows the application of the procedure to the artificial example in [2,3]. We emphasise that our methodology applies to other cases, and indeed beyond the Haar measure situation, since the ‘zones’ may be examined each by themselves and are not assumed to follow any fixed probability distribution (the latter may be imposed by the experimenter through the selection of images). We conclude this section with notation. We align our notation with that in [3] where this is possible, although on occasion we shall modify it for clarity. Generic vectors in R3 are usually written as ðx; y; zÞT . The spherical coordinates of ðx; y; zÞT 2 S2 are [1, p. 108] 8 > >
¼ sin q cos 4 ; y ¼ sin q sin 4 ; > > : z ¼ cos q ;

(10)

where the longitude 4 (measured counterclockwise as the oriented angle with the yz-plane) and the colatitude q (measured as the angle with e3 ¼ ð0; 0; 1ÞT ) satisfy 0  4 < 2p and 0 < q < p; we arbitrarily set 4 ¼ 0 for q2 f0; pg. We write ðx; y; zÞ  ð4; qÞ. (No confusion will be possible between 4 and the Gaussian density 4d.) We denote by Eij the 3 3 matrix with 1 in the ði; jÞ-entry and 0 elsewhere. For a nonzero vector v, write v0 for the vector of the same direction and unit Euclidean length. The number of elements of a finite set C is denoted by #ðCÞ. For a random variable X and any set B, we write EðX; BÞ ¼ EðX1B Þ, where 1B is the indicator function of B. 2. Methodology A useful tool in this paper for describing rotations are the Euler angles, considered in the first subsection.

C. Rau / Results in Applied Mathematics 3 (2019) 100067

5

2.1. Euler angles We represent A2SOð3Þ by its ZYZ Euler angles [16, pp. 125e126]:

A ¼ EuZYZ ð4; q; jÞ ¼ expðjY 3 ÞexpðqY2 Þexpð4Y3 Þ ; Y1 ¼ E32  E23 ; Y2 ¼ E13  E31 ; Y3 ¼ E21  E12 ;

(11)

where ð4; q; jÞ 2½0; 2pÞ  ½0; p  ½0; 2pÞ. An explicit expression for (11) is given in [16, p. 126], which we do not display here. The ZYZ Euler angles are well-defined outside a n-nullset, as are the ZXZ Euler angles, which are defined analogously via the representation

A ¼ EuZXZ ð4; q; jÞ ¼ expðjY3 ÞexpðqY 1 Þexpð4Y 3 Þ : In this paper we will exclusively use ZYZ Euler angles and simply write Eu ¼ EuZYZ . The Euler angles j; q; 4 are called, respectively, precession angle, nutation angle, and angle of proper rotation [17, p. 136]. These terms are used for both ZYZ and ZXZ Euler angles. Returning to the setting of the normal mixture model (3), for A2SOð3Þ and 1  i; j  K with isj, let

    4ij ; qij ; jij ¼ Eu1 AQ ij ;

(12)

where Q ij 2SOð3Þ is any rotation such that Q Tij e3 ¼ ðmi  mj Þ0 . Note that the ‘nutation angle with shift’ qij , which will be the most important Euler angle for our considerations, is n-almost everywhere well-defined. Indeed, the exception set of rotations A for which (12) is not well-defined, meaning that qij 2f0; pg, is just a shift, by a fixed rotation, of the n-nullset where the original Euler angles (11) are not well-defined. We write

Bn : ¼ An Q ij ;

n ¼ 1; …; N :

(13)

We shall need the following integration formula for Euler angles from [17, Prop. 7.4.2, p. 136]. (There the result is formulated for ZXZ Euler angles, but the simple relation [16, p. 126] EuZYZ ð4; q; jÞ ¼ EuZXZ ð4 þ p=2; q; j  p=2Þ gives the formula for ZYZ Euler angles.) Proposition 3.

Z

Assume that F  0, or that F is n-integrable. Then

FðAÞ nðdAÞ ¼

Z p

1 8p2

0

Z 2p Z 2p sin q dq d4 dj FðEuð4; q; jÞÞ : 0

0

2.2. Glmnet-based estimates for precision matrix entries The entries of the precision matrix F ¼ ðfij Þ appearing in the CLT in [2, Thm. 4.4, p. 3293] were shown to equal

fij :¼

1 2ps2ε

Z E

        42 xHAmi ; s2 42 xHAmj ; s2 dx :

R2

Recall that s2ε is the variance of the noise in model (5). Here and below, the expectation is taken with respect to Haar measure n. There is an invariance inherent in the expression for fij, commonly referred to as bi-invariance. This invariance, as well as other facts stated in this section, concern the space S2 . This space plays a crucial role in random tomography, as it is the homogeneous space of the rotation group SOð3Þ, via the group action x1Ax for A2SOð3Þ and x2S2 . This close connection is also visible, for example, in the proof of the ‘shape inversion formula’ of [2, Thm. 4.1, p. 3285]. First we recall a classical formula for integration of functions on S2 ; see [17, Prop. 9.1.2, p. 189]. Proposition 4. Let f : S2 /R be an integrable function; we use the same notation if f is considered as a function in spherical coordinates (10). Assume that there exists a function F : ½ 1; 1/R such that f ð4; qÞ ¼ Fðcos qÞ. Then

6

C. Rau / Results in Applied Mathematics 3 (2019) 100067

Z

f ðxÞ dsðxÞ

S2

Z p Z 1 1 Fðcos qÞsin q dq ¼ pffiffiffi ¼ pffiffiffi 2 p 0 2 p

1 1

FðtÞ dt :

A function f with the property from Proposition 4 is called zonal. The following proposition concerns general bi-invariant functions. Proposition 5. For arbitrary rotationally invariant “radial basis functions” in place of 42, the entry fij is a function of the entries in Gjfi;jg , the submatrix of G with row and column indices fi; jg. Proof. By [17, Lemma 9.5.2, pp. 204e205], the kernel ðx; hÞ142 ðxjxÞ42 ðxjhÞ, where x; h2S2 , is a function of 〈x; h〉. Now let x ¼ m0i , h ¼ m0j . For notational ease, on occasion we do not make the dependence of subsequently defined quantities on i; j explicit.    Assume now that the rotationally invariant radial basis function is the Gaussian 42 $m; s2 . To find the concrete simplifications for fij under this assumption, we use a simple identity for the bivariate normal density, stated in the next lemma. Lemma 6.

         u þ v s2    2 2 2 ; : 42 yu; s 42 yv; s ¼ 42 u  v0; 2s 42 y 2 2

Proof. The proof follows directly from the parallelogram law. The quantity fij is now computed by successively applying the defining invariance property of Haar measure, Lemma 6, the integration formula for SOð3Þ from Proposition 3, and Proposition 4, as follows (recall that Q ij was defined below (12)):

Z

        42 ymi ; s2 42 ymj ; s2 dy

E R2

Z

¼E

        42 yQ ij mi ; s2 42 yQ ij mj ; s2 dy

R2

¼

¼

1

Z p

1

sinqij dqij 4s2  2 !   1  t2 mi  mj   dt ; 4s2

exp 

8ps2

0

Z

1

exp

4ps2

0

(14)

!   m  m 2 sin2 qij i j

for any isj. Evidently, this ‘weighted heat kernel’ does have the property stated in Proposition 5. More generally, writing Qij for the random version of the nutation angle qij defined in (12) and defining I m ¼ ½qm ; qmþ1 Þ, one obtains analogously (by replacing f in Proposition 4 by f 1I m )

Z

        42 ymi ; s2 42 ymj ; s2 dy ; Qij 2I

E R2

¼

1 8ps2

Z

(

cos qm cos qmþ1

exp

 

m

2 )  1  t2 mi  mj  4s2

(15) dt :

A Let b b s;t :¼ bb s;t denote the regression coefficients from the Lasso problem (8), where the penalty l is chosen to minimise a criterion; in the glmnet package of [12], the choices for this are the mean-squared error (MSE) and the mean absolute error (MAE). 2 In the present context, P is the vectorised image of length T 2 , and the design matrix X ¼ ðxj;p Þ2RT #ðM Þ arises from the Gaussian (heat) kernel; here M is the set of candidate component centers, cf. [3]:

        xj;p ¼ 42 uj up ; s2 ¼ 42 uj  up 0; s2 :

C. Rau / Results in Applied Mathematics 3 (2019) 100067

7

 First we look at how define a (function-valued) estimator of a single factor 42 ð ,mA ; s2 Þ. Here and in the sequel, we use the word ‘estimator’ with the understanding that A is fixed, so that the randomness, at this point, solely comes from the error variables εs;t and the cross-validation from the high-dimensional regression (8). The estimator of the mean in [3, p. 2584] suggests two possibilities for the estimation from a single profile as in (5). First, the ‘plug-in’ estimator

T

1

         cA 2 42 , mAk ; s2 ¼ 42 ,  m ;s ; k

(16)

where



mcAk ¼ bq Ak

1

X

A bb s;t us;t ;

ðs;tÞ 2 C~

X

A b q k :¼

ðs;tÞ 2 C~

A k

A bb s;t :

(17)

A k

(For simplicity, we assume that s is known; otherwise, similar to [3], an estimate of s may be refined in the course of the A estimation procedure proposed here.) The notation C~ k indicates that these “clusters” of non-zero coefficients, as we shall call them b . following [3] (where two pixels are connected if they share either an edge or a corner), are estimates, just as the coefficients b s;t Note that in (16) as well as (18) below, the ‘center’ mA is not estimated explicitly; we think of the estimator as a nonparametric k density ‘centered’ at mA . k An alternative to (16) is the estimator

T

2

     1 A  42 ,mAk ; s2 ¼ b qk

X

   A b b s;t 42 , us;t ; s2 :

(18)

A ðs;tÞ 2 C~ k

For the analysis in [3], it does not matter whether we adopt (16) or (18), as the only information about the components that is used in that paper is the locations of their ‘centers’ (recall the paragraph below (7) for related discussion). However, in the present paper this is not so.   We now turn to the approximation of the product 42 ð ,mi ; s2 Þ42 ð ,mj ; s2 Þ. First of all, and as in (14), the Haar measure assumption allows us to replace mi ; mj by Q ij mi ; Q ij mj. The approximation uses all nm profiles whose observed nutation angle falls in I m :

T

         42 ,Q ij mi ; s2 42 ,Q ij mj ; s2           ¼ T 1 42 ,Q ij mi ; s2 T 1 42 ,Q ij mj ; s2 

¼



X

1

nm

1

ðnÞ 2 ij

n: Q

I

X

m

B Bn b qi n b qj

Bn Bn bb 0 0 bb 00 s ;t

s ;t 00

        42 ,us0 ;t 0 ; s2 42 ,us00 ;t 00 ; s2 ;

Bn ðs0 ;t 0 Þ 2 C~ i Bn ðs00 ;t 00 Þ 2 C~ j

where Bn was defined in (13). The identity (6) gives

tm :¼

Z T

 

        42 ymi ; s2 42 ymj ; s2 dy

R2

¼

1

nm

X ðnÞ

n: Qij 2 I

1 m

T X

B Bn b q j a;b¼T qi n b

X Bn ðs0 ;t 0 Þ 2 C~ i Bn ðs ;t Þ 2 C~ j 00

00

ðs0 ;t 0 Þðs00 ;t 00 Þ¼ða;bÞ Bn b B0 n 0 b b s ;t b s00 ;t 00

    42 ua;b 0; 2s2 :

(19)

8

C. Rau / Results in Applied Mathematics 3 (2019) 100067

The quantity tm approximates the conditional probability

Z E

          42 ymi ; s2 42 ymj ; s2 dyQij 2I

m

;

R2

which is computed from (15) through dividing by

 pm : ¼ P Qij 2 I

 m

¼ ðcos qm  cos qmþ1 Þ=2 ;

where the latter probability may be computed from Proposition 3. Remark 7. The summation for the innermost sum on the right-hand side in (19) shows that, in contrast to the estimators Bn Bn in (17), it now the ‘slopes’ between two pixels of the respective clusters C~ i ; C~ j that contribute to the sum, rather than individual pixels. In view of the problem mentioned in Section 1 regarding non-invariance of the glmnet implementation under rotations of the image, it is of interest to consider a shift in the random angle of precession j, so that j is replaced by jþ j0 for some fixed nonrandom j0 . 3. Artificial example of Panaretos and Konis The artifical example in [2,3] is a normal mixture as in (3), with parameters

s ¼ 0:46 ;

q ¼ ð0:18; 0:26; 0:21; 0:35Þ ;

m01 ¼ ð0; 0:8; 0:3ÞT ; m03 ¼ ð0:7; 0:4; 0:3ÞT ;

m02 ¼ ð0:7; 0:4; 0:3ÞT ; m04 ¼ ð0; 0; 0:8ÞT :

The problem was found to not be strongly affected by noise, as is well known for penalized regression and pointed out in [3, p. 2586]. We let sε ¼ 0:025. P We applied the centring mk ¼ m0k þ ð0; 0; 0:025ÞT for k ¼ 1 …; 4, so that Kk¼1 mk ¼ 0; the same was done for the sample analogues. We further ordered the LMs according to ascending mixing weights. For these data, approximating the distance between any two LMs by their average r z 1.378, (14) becomes

1

Z

4ps2

1

exp 0

  ! 1  t2 r2  dt z 0:1791 4s2

for isj;

and (15) likewise takes a single value for any isj. For space reasons, we only report the results for the whole data, that is we take I m ¼ ½0; pÞ. The effective sample size is smaller than the actual sample size due to the occurrence of indistinguishable mixing components, which we did not attempt to separate. Note that even well-separated LMs may be affected by the problem of incorrectly identified mixing weight ranks mentioned in Section 1. We chose i ¼ 2 and j ¼ 4, corresponding to the LMs with the largest and second-largest mixing weights. Our computations were carried out in the R language [11], using the glmnet package of [12]. We used a polar grid of the form (6) with ðP;MÞ ¼ ð40; 32Þ, and the MSE criterion. A small simulation study with sample size N ¼ 300 yielded Table 1, which calls in question the use of the pure Lasso in [3]. Table 1 shows the results for different values of a ¼ 1  a and of the shift j0 of angle of precession from the end of Section 2 ðj0 ¼ 0; j0 ¼ p=3Þ. In addition, the effective sample size nm is shown in parentheses in each case. The deviation from tm showing in the optimum appears to be due to discretization error. Nevertheless, the results are surprisingly good for the moderate sample size. The situation is mostly similar if instead I m is chosen as a sub-interval of ½0; pÞ. Table 1 Simulation of tm ¼ 0:1077 from (19) for different values of a : ¼ 1  a, with a as in (9). (Values of l not shown.) Shift angle j0

a ¼0

a ¼ 0:0005

a ¼ 0:0010

a ¼ 0:0015

a ¼ 0:0020

a ¼ 0:0025

0

0.0918 (123) 0.0944 (137)

0.0924 (122) 0.0996 (129)

0.0957 (129) 0.0964 (133)

0.0979 (122) 0.0914 (130)

0.0928 (112) 0.0954 (120)

0.0942 (116) 0.0948 (129)

p =3

C. Rau / Results in Applied Mathematics 3 (2019) 100067

9

Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Conflict of interest None. Acknowledgment We are grateful for the helpful comments of an anonymous reviewer. References  upper half-plane. Second ed. New York: Springer; [1] Terras A. In: Harmonic analysis on symmetric spacesdEuclidean space, the sphere, and the Poincare 2013. https://doi.org/10.1007/978-1-4614-7972-7. [2] Panaretos V. On random tomography with unobservable projection angles. Ann Stat 2009;37:3272e306. https://doi.org/10.1214/08-AOS673. [3] Panaretos V, Konis K. Sparse approximations of protein structure from noisy random projections. Ann Appl Stat 2011;5:2572e602. https://doi.org/10. 1214/11-AOAS479. [4] Shkolnisky Y, Singer A. Viewing direction estimation in cryo-EM using synchronization. SIAM J Imaging Sci 2012;5:1088e110. https://doi.org/10.1137/ 120863642. [5] Cressie NAC. Statistics for spatial data. New York: Wiley; 1993. https://doi.org/10.1111/jtsa.12168. n J, Singer A. Structural variability from noisy tomographic projections. SIAM J Imaging Sci 2018;11:1441e92. https://doi.org/10.1137/ [6] Ande 17M1153509. [7] Girard D. Optimal regularized reconstruction in computerized tomography. SIAM J Sci Stat Comput 1987;8:934e50. https://doi.org/10.1137/0908076. € fling H, Tibshirani R. Pathwise coordinate optimization. Ann Appl Stat 2007;1:302e32. https://doi.org/10.1214/07-AOAS131. [8] Friedman J, Hastie T, Ho ndola C, Engstro € m A, Haase C. Maximum number of modes of Gaussian mixtures. Inf Inference 2019. to appear, https://doi.org/10.1093/imaiai/ [9] Ame iaz013. [10] Ray S, Lindsay B. The topography of multivariate normal mixtures. Ann Stat 2005;33:2042e65. https://doi.org/10.1214/009053605000000417. [11] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019. https://www. R-project.org/. [12] Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1e22. http://www. jstatsoft.org/v33/i01/. https://doi.org/10.18637/jss.v033.i01. [13] Bühlmann P, van de Geer S. Statistics for high-dimensional data: methods, theory and applications. New York: Springer; 2010. https://doi.org/10.1007/ 978-3-642-20192-9. [14] Hastie T. Personal communication. 2019. [15] Neykov M. Isotonic regression meets LASSO. Electron J Stat 2019;13:710e46. https://doi.org/10.1214/19-EJS1537. [16] Chirikjian G, Kyatkin A. Engineering applications of noncommutative harmonic analysis: with emphasis on rotation and motion groups. Boca Raton, FL: CRC Press; 2001. https://doi.org/10.1201/9781420041767. [17] Faraut J. Analysis on Lie groups: an introduction. Cambridge Studies in advanced mathematics. vol. 110. Cambridge: Cambridge University Press; 2008. https://doi.org/10.1017/CBO9780511755170.