PhaseLift is robust to a constant fraction of arbitrary errors

JID:YACHA AID:1110 /COR [m3L; v1.172; Prn:20/01/2016; 9:21] P.1 (1-13) Appl. Comput. Harmon. Anal. ••• (••••) •••–••• Contents lists available at S...

Download PDF

726KB Sizes 2 Downloads 26 Views

Report

PDF Reader
Full Text

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.1 (1-13) Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

Contents lists available at ScienceDirect

Applied and Computational Harmonic Analysis www.elsevier.com/locate/acha

Letter to the Editor

PhaseLift is robust to a constant fraction of arbitrary errors Paul Hand ∗ Rice University, Department of Computational and Applied Mathematics, MS-134, 6100 Main Street, Houston, TX 77005, United States

a r t i c l e

i n f o

Article history: Received 17 April 2015 Received in revised form 13 December 2015 Accepted 3 January 2016 Available online xxxx Communicated by Karlheinz Gröchenig Keywords: Phase retrieval PhaseLift Matrix completion Robust principal component analysis

a b s t r a c t Consider the task of recovering an unknown n-vector from phaseless linear measurements. This nonconvex problem may be convexiﬁed into a semideﬁnite rank-one matrix recovery problem, known as PhaseLift. Under a linear number of Gaussian measurements, PhaseLift recovers the unknown vector exactly with high probability. Under noisy measurements, the solution to a variant of PhaseLift has error proportional to the 1 norm of the noise. In the present paper, we study the robustness of this variant of PhaseLift to gross, arbitrary corruptions. We prove that PhaseLift can tolerate noise and a small, ﬁxed fraction of gross errors, even in the highly underdetermined regime where there are only O(n) measurements. The lifted phase retrieval problem can be viewed as a rank-one robust Principal Component Analysis (PCA) problem under generic rank-one measurements. From this perspective, the proposed convex program is simpler than the semideﬁnite version of the sparse-plus-low-rank formulation standard in the robust PCA literature. Speciﬁcally, the rank penalization through a trace term is unnecessary, and the resulting optimization program has no parameters that need to be chosen. © 2016 Elsevier Inc. All rights reserved.

1. Introduction This paper establishes robustness of an algorithm for recovering a vector x0 ∈ Rn from phaseless linear measurements that contain noise and a constant fraction of gross, arbitrary errors. That is, for ﬁxed measurement vectors ai ∈ Rn for i = 1 . . . m, our task is to ﬁnd x0 satisfying bi = |x0 , ai |2 + εi + ηi

(1)

for known bi ∈ R, known ai , and unknown ηi and εi . Here ηi will represent noise in the measurements, and εi will represent gross, arbitrary corruptions. This recovery problem is known as phase retrieval. Measurements of form (1) arise in several applications, such as X-ray crystallography, optics, and microscopy [13,19,1]. In * Fax: +1 713 348 5318. E-mail address: [email protected]. http://dx.doi.org/10.1016/j.acha.2016.01.001 1063-5203/© 2016 Elsevier Inc. All rights reserved.

JID:YACHA 2

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.2 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

such applications, gross corruptions in some measurements may be due to sensor failure, occlusions, or other eﬀects. Recently, researchers have introduced lifting-based algorithms for the phase retrieval problem that have provable recovery guarantees [1,5]. The insight of these methods is that the phase retrieval problem can be convexiﬁed by lifting it to the space of matrices. That is, instead of searching for the vector x0, one can search for the lifted matrix x0 xt0 . The quadratic measurements (1) then become linear measurements on this lifted matrix. As the desired matrix is semideﬁnite and rank-one, one can write a rank minimization problem under the semideﬁnite and data constraints, which has a convex relaxation known as PhaseLift. In the noiseless case, PhaseLift is the program min tr(X) subject to X 0, {ati Xai = bi }i=1...m . X

(2)

Here, the trace of X is a convex proxy for the rank of a positive semideﬁnite X. An estimate for the underlying signal x0 can be found by computing the leading eigenvector of the optimizer of (2). Similar to [5,8,2], we will seek recovery guarantees for independent identically distributed Gaussians ai ∼ N (0, In ) ∈ Rn . Under this data model, [8] and [2] have shown that (2) can be simpliﬁed to the feasibility problem ﬁnd X 0 such that {ati Xai = bi }i=1...m . This feasibility problem succeeds at ﬁnding x0 xt0 exactly with high probability when m ≥ cn for a suﬃciently large c [2]. This scaling is quite surprising because there are only O(n) measurements for an O(n2 ) dimensional object. As discussed in [5,8], the semideﬁnite cone is suﬃciently ‘pointy’ that the high-dimensional aﬃne space of data-consistent matrices intersects the semideﬁnite cone only at exactly one point. In the noisy case without gross errors, that is for ε = 0, [2] showed that the PhaseLift variant min |ati Xai − bi | subject to X 0 (3) i

ˆ successfully recovers a matrix near x0 xt0 with high probability. Speciﬁcally, they prove that the solution X t ˆ ˆ to (3) satisﬁes X − x0 x0 F ≤ C0 η1 /m with high probability. From X, an estimate of x0 can be obtained ˆ1, u ˆ1u ˆ When specialized to by x ˆ0 = λ ˆ1 , where (λ ˆ1 ) is the leading eigenvalue and eigenvector pair for X. η1 k the real-valued case, [2] proves that mink∈{0,1} ˆ x0 − (−1) x0 ≤ C0 mx0 for some C0 . The contribution of the present paper is to show that the program (3) is additionally robust against a constant fraction of arbitrary errors. For a ﬁxed set of coeﬃcients that contain gross errors, we show that approximate recovery succeeds with high probability for arbitrary signals and arbitrary values of the gross errors. Theorem 1. There exist positive numbers fmin , γ, c, C, C such that the following holds. Let m ≥ cn. Fix a set S ⊂ {1 . . . m} such that |S|/m ≤ fmin . On an event of probability at least 1 − e−γm , for any x0 ∈ Rn ˆ to (3) satisﬁes and for any ε with supp(ε) ⊆ S, the minimizer X ˆ − x0 xt F ≤ C X 0

η1 . m

The resulting estimate for x0 satisﬁes min ˆ x0 − (−1)k x0 ≤ C

k∈{0,1}

η1 . mx0

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.3 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

3

Note that this high-probability result is universal over signals x0 and corruptions ε with ﬁxed support. That is, a single realization of the measurement vectors ai will allow recovery with high probability for an arbitrary and even adversarial x0 . Further, for a ﬁxed set of corrupted entries, the values of the gross corruptions can be arbitrary and even adversarial. In the case of no gross errors, in which ε = 0, this theorem reduces to the result in [2] and the rank-one case of [15]. In the noiseless case, in which η = 0, the theorem guarantees exact recovery of x0 with high probability under a linear number of measurements, of which a constant fraction are corrupted. We now explore the optimality of this theorem. The scaling of m versus n is information theoretically optimal and has no unnecessary logarithmic factors. The noise scaling is the same as in [2], and its optimality was established there. For arbitrary errors, the ﬁxed fraction of gross errors cannot be extended to a case where fmin ≥ 1/2 because one could build a problem where half of the measurements are due to an x0 and the other half are due to some x1 . In such a case, recovery would be impossible. As stated, Theorem 1 only applies to real-valued signals. As the applications of phase retrieval typically involve complex measurements, it is an important follow-up question to determine if the corresponding theorem holds for complex measurements. We leave this question for future research. See Section 2.3 for a brief comment on this extension. 1.1. Comparison with other phase retrieval algorithms There are several algorithms for phase retrieval in the literature. The most popular in practice are alternating direction methods, such as Gerchberg–Saxton and Fienup, which involve inverting an oversampled Fourier Transform or solving a least squares problem [11,9,1]. These methods do not currently have rigorous recovery guarantees in the case of uncorrupted measurements, and they are sensitive to outliers in the inverse Fourier transform or least squares steps. In order to solve phase retrieval with corrupted measurements, the outliers would need to be explicitly ﬁltered or the algorithms would need to be modiﬁed. Of recent interest is a nonconvex algorithm for phase retrieval called Wirtinger ﬂow [4]. In this algo rithm, a descent is performed down the nonconvex objective i (|ai , x|2 − bi )2 under the initialization t of the leading eigenvector of the matrix i bi ai ai . This method enjoys theoretical guarantees and fast numerical implementations. In the presence of data corruption, the initialization and descent steps are susceptible to corrupted data. As a result, the Wirtinger ﬂow algorithm is not corruption tolerant. Perhaps a corruption-tolerant variant of Wirtinger ﬂow could be found. We leave this as the subject of future research. To the best of the author’s knowledge, PhaseLift is the only method for phase retrieval that can provably tolerate data corruption without explicit outlier detection. Because PhaseLift lifts an n-dimensional problem to an n × n semideﬁnite program, it is computationally expensive for large problems. As a result, there is currently a speed vs. corruption-tolerance tradeoﬀ in the theoretical performance of phase retrieval methods: the faster Wirtinger ﬂow algorithm has theoretical guarantees but is not corruption-tolerant, whereas the slower PhaseLift algorithm has theoretical guarantees and is corruption-tolerant. 1.2. Relation to robust PCA Much recent work in matrix completion has studied the recovery of low-rank matrices from arbitrary corruptions to its entries. This is a task known as robust Principal Component Analysis (PCA). Results in this framework typically involve measuring some of the entries of a low rank n × n matrix X0 and assuming that some fraction of those measurements are arbitrarily corrupted, giving the data matrix A. The matrix X0 can then be recovered under certain conditions by a sparse-plus-low-rank convex program: min λX∗ + E1 such that P(X + E) = P(A)

(4)

JID:YACHA 4

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.4 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

where X∗ is the nuclear norm of X, λ is a constant, E1 is the 1 norm of the vectorization of E, and P is the projection of a matrix onto the observed entries. [3,6,14,23,10,7,17]. Results from this formulation have been quite surprising. For example, under an appropriate choice of λ and under an incoherence assumption, the sparse-plus-low-rank decomposition succeeds for suﬃciently low rank X when O(n2 ) entries are measured and a small fraction of them have arbitrary errors [3]. Subsequent results have been proved that only require m rn polylog(n) measurements, where r is the rank of X [7,17]. This result is information theoretically optimal except for the polylogarithmic factor. The present paper can be viewed as a rank-one semideﬁnite robust PCA problem under generic rank-one measurements. From this perspective, we would naturally formulate the PhaseLift problem under gross errors by min λ tr(X) +

|ati Xai − bi | subject to X 0.

(5)

i

The present paper shows that explicit rank penalization by the trace term is not fundamental for exact recovery in the presence of arbitrary errors. That is, (5) can be simpliﬁed by taking λ = 0. The resulting program has no free parameters that need to be explicitly tuned. As in [8,2], the positive semideﬁnite cone provides enough of a constraint to enforce low-rankness. The present work diﬀers from the standard robust PCA literature in that the measurements are generic and are not direct samples of the entries of the unknown matrix. As Gaussian measurements are easier to analyze, we can provably achieve information theoretic scaling without additional logarithmic factors. 1.3. Numerical simulation We now use numerical simulation to provide empirical evidence for the performance of (3). Let the signal length n vary from 5 to 50, and let the number of measurements m vary from 10 to 250. Let x0 = e1 . For each (n, m), we consider measurements such that

bi ∼ Uniform([0, 104 ])

if 1 ≤ i ≤ 0.05m ,

bi = |x0 , ai |2

otherwise.

(6)

We attempt to recover x0 xt0 by solving (3) using the SDPT3 solver [20,21] and YALMIP [18]. For a given ˆ deﬁne the capped relative error as optimizer X, ˆ − x0 xt F /x0 xt F , 1). min(X 0 0 Fig. 1 plots the average capped relative error over 10 independent trials. It provides empirical evidence that the matrix recovery problem (3) succeeds under a linear number of measurements, even when a constant fraction of them contain very large errors. 2. Proofs Let A : S n → Rm be deﬁned by the mapping X → (ati Xai )i=1...m , where S n is the space of symmetric real-valued n × n matrices. Note that A∗ λ = i λi ai ati . Let AS be the restriction of A onto the coeﬃcients given by the set S. Let ei be the ith standard basis vector. Let X0 = x0 xt0 . We can write the measurements (1) as b = AX0 + ε + η.

(7)

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.5 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

5

Fig. 1. Recovery error for the PhaseLift matrix recovery problem (3) as a function of n and m, when 5% of measurements contain large errors. Black represents an average recovery error of 100%. White represents zero average recovery error. Each block corresponds to the average from 10 independent trials. The solid curve depicts when the number of measurements equals the number of degrees of freedom in a symmetric n × n matrix. The number of measurements required for successful recovery appears to be linear in n, even with a small fraction of large errors.

Similarly, the optimization program (3) can be written as min AX − b1 such that X 0. We introduce the following notation. Let X∗ be the nuclear norm of the matrix X. When X 0, X∗ = tr(X). Denote the Frobenius and spectral norms of X as XF and X, respectively. Let X, Y = tr(Y t X) be the Hilbert–Schmidt inner product. Given x0 , let Tx0 = {yxt0 + x0 y t | y ∈ Rn }. Note that Te1 is the space of symmetric matrices supported on their ﬁrst row and column. The orthogonal complement Te⊥1 is then the space of matrices supported in the lower-right n − 1 × n − 1 block. When x0 is clear, we will simply write T instead of Tx0 . Let I be the identity matrix, and let 1(E) be the indicator function of the event E. 2.1. Recovery by dual certiﬁcates The proof of Theorem 1 will be based on dual certiﬁcates, as in [12,5,8,2]. A dual certiﬁcate is an optimal variable for the dual problem to (3). Its existence certiﬁes the correctness of a minimizer to (3). The ﬁrst order optimality conditions at X0 for (3) are given by ˜ Y˜ = A∗ λ

(8)

˜ ∈ ∂ · 1 (−ε) λ

(9)

Y˜ 0

(10)

Y˜ , X0 = 0,

(11)

where ∂ · 1 (−ε) is the subgradient of the 1 norm evaluated at −ε. Note that (10) and (11) imply Y˜T = 0. Such a Y˜ would be dual certiﬁcate for (3). Unfortunately, constructing such a Y˜ that exactly satisﬁes these conditions is diﬃcult. As in [12,5,8,2], we seek an inexact dual certiﬁcate, which approximately satisﬁes these conditions. Speciﬁcally, we will build a dual certiﬁcate Y = A∗ λ that satisﬁes YT ⊥ IT ⊥

(12)

YT F ≤ 1/2 7 sgn(εi ) if εi = 0 λi = − m

(13)

|λi | ≤

7 m

if εi = 0.

(14)

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.6 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

6

To prove that existence of such a Y will guarantee successful recovery of x0 xt0 with high probability, we will rely on two technical lemmas. The ﬁrst technical lemma provides 1 -isometry bounds on A and was proven in [5]. Lemma 1. (See [5].) There exist constants c0 , γ0 such that if m ≥ c0 n, then with probability at least 1 −e−γ0 m , 1 1 A(X)1 ≤ 1 + X∗ for all X, (15) m 16 1 1 A(X)1 ≥ 0.94 1 − X for all symmetric, rank-2 X. (16) m 16 We will need simultaneous control of the 1 -isometry properties over several subsets of measurements. Lemma 2. There exists a constant γ˜0 such that the following holds. Let m ≥ 100c0 n, and ﬁx a support set S with |S| = 0.01m . There is an event ES with probability at least 1 − e−˜γ0 m on which 1 1 c AS X1 ≥ 0.94 1 − X for all symmetric rank-2 X, (17) |S c | 16 1 1 AS X1 ≤ 1 + X∗ for all X, (18) |S| 16 1 1 AX1 ≤ 1 + X∗ for all X. (19) |m| 16 The proof of Lemma 2 is immediate from Lemma 1. In order to prove that a dual certiﬁcate guarantees recovery, we establish a technical result that an optimal solution X0 + H to (3) lies near the cone HT ⊥ ∗ ≥ 0.56HT F with high probability. Lemma 3. Fix a support set S with |S| = 0.01m and let m ≥ 100c0 n. On the event ES from Lemma 2, 2 for all x and for all ε with supp(ε) ⊆ S, any optimal X0 + H satisﬁes HT ⊥ ∗ ≥ 0.56HT F − m η1 . Proof. Let X0 + H be a minimizer for (3), which implies that A(X0 + H) − b1 ≤ AX0 − b1 . By (7), AH − ε − η1 ≤ ε + η1 . Breaking these 1 norms into the coeﬃcients belonging to S and S c , we have AS c H − ηS c 1 + AS H − ε − ηS 1 ≤ ε + ηS 1 + ηS c 1 .

(20)

By the triangle inequality, we have AS c H − ηS c 1 + ε + ηS 1 − AS H1 ≤ ε + ηS 1 + ηS c 1 ,

(21)

AS c H1 ≤ 2ηS c 1 + AS H1 .

(22)

which implies

Breaking (22) into its components on T and T ⊥ and applying the triangle inequality, we have AS c HT 1 − AS c HT ⊥ 1 ≤ 2ηS c 1 + AS HT 1 + AS HT ⊥ 1 ⇒AS c HT 1 ≤ 2ηS c 1 + AS HT 1 + AHT ⊥ 1

(23) (24)

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.7 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

7

We now apply the 1 isometry bounds from Lemma 2 on each term of (24). On the event ES , 1 |S c |HT AS c HT 1 ≥ 0.94 1 − 16 1 |S c | √ HT F , ≥ 0.94 1 − 16 2

(25)

where the second inequality follows because HT has rank at most 2. On the event ES , 1 HT ∗ AS HT 1 ≤ |S| 1 + 16 1 √ 2HT F ≤ |S| 1 + 16

(27)

(26)

(28)

On the event ES , 1 HT ⊥ ∗ AHT ⊥ 1 ≤ m 1 + 16 Combining (24)–(29), we have 0.94 1 − √ 2 1+ Thus, 0.56HT F ≤

2 m η1

1 16

1 16

|S c | √ |S| − 2 m m

HT F ≤

+ HT ⊥ ∗ on the event ES .

2 ηS c 1 + HT ⊥ ∗ m

(29)

(30)

2

We may now prove that existence of an inexact dual certiﬁcate (12)–(14) will guarantee successful recovery of a matrix near X0 = x0 xt0 with high probability, provided that there are few enough arbitrary errors. Lemma 4. There exists a C such that the following holds. Fix S such that |S| = 0.01m . Let m ≥ 100c0 n. Fix x0 ∈ Rn . Fix ε ∈ Rm such that supp(ε) ⊆ S. Suppose that there exists Y = A∗ λ satisfying (12)–(14). ˆ of (3) satisﬁes X ˆ − X0 F ≤ C η1 . Then, on the event ES from Lemma 2, a minimizer X m ˆ = X0 + H be a minimizer for (3), which implies that A(X0 + H) − b1 ≤ AX0 − b1 . By Proof. Let X (7), AH − ε − η1 ≤ ε + η1 . Letting α = 7/m, condition (14) gives λ/α ∈ ∂ · 1 (−ε)

(31)

Hence, − ε1 + λ/α, AH − η ≤ ε1 + η1 ⇒

λ, AH − η ≤ αη1

(32) (33)

⇒

Y, H ≤ λ, η + αη1

(34)

⇒

Y, H ≤ 2αη1

(35)

Decomposing (35) into T and T ⊥ , we have

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.8 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

8

YT , HT ≤ −YT ⊥ , HT ⊥ + 2αη1 ,

(36)

YT ⊥ , HT ⊥ ≤ |YT , HT | + 2αη1 .

(37)

and thus,

By conditions (12)–(13) and HT ⊥ 0, we have HT ⊥ ∗ ≤ |YT ⊥ , HT ⊥ | ≤ |YT , HT | + 2αη1 ≤

1 HT F + 2αη1 2

(38)

By Lemma 3, on the event ES , HT ⊥ ∗ ≥ 0.56HT F −

2 η1 . m

(39)

Combining (38) and (39), and using α = 7/m, we get 0.56HT F ≤

2 14 η1 + 0.5HT F + η1 m m

(40)

So, HT F ≤

C C η1 and thus by (37), HT ⊥ F ≤ η1 . m m

ˆ − X0 F = HF ≤ We conclude X

C m η1

(41)

for some C. 2

2.2. Construction of the dual certiﬁcate We now construct the dual certiﬁcate for arbitrary x0. Our construction will be a modiﬁcation to the dual certiﬁcate in [2]. Also similar to [2], we will build dual certiﬁcates with high probability on a net of x0 . We will then use a continuity argument to get a dual certiﬁcate for a arbitrary x0 . Let S + and S − be disjoint supersets of the indices over which ε is positive or negative, respectively. Let S = S + ∪ S − . For pedagogical purposes, S + and S − should be thought of as exactly the indices over which ε is positive or negative. For technical reasons, we let them be supersets of cardinality linear in n, in order to use standard probability bounds. For a ﬁxed choice of S + and S − , let the inexact dual certiﬁcate Y be deﬁned by Y =

1 x0 2 x0 | 1 |ai , | ≤ 3 ai ati −7ai ati + 7ai ati + β0 − |ai , m x0 x0 c + − i∈S

i∈S

(42)

i∈S

where β0 = Ez 4 1(|z| ≤ 3) ≈ 2.6728 and z is a standard normal random variable. We will refer to each of the terms of in the right hand side of (42) as Y+ , Y− , and Y0 , respectively. The form of Y0 is due to [2], and the intuition behind it is as follows. Note that E(ai ati ) = In and m ˜ ˜ 0 1 E(|ai , e1 |2 ai ati ) = ( β00 In−1 ), where β˜0 = Ez 4 for a standard normal z. The construction m i=1 [β0 − 2 t ˜ |ai , e1 | ]ai a thus has expected value (β0 −1)IT ⊥ , which would provide the exact dual certiﬁcate conditions i

(10)–(11). As shown in [2], a satisfactory inexact dual certiﬁcate can be built with m = O(n) and coeﬃcients that are truncated to be no larger than 7/m. In the present formulation, the terms Y+ and Y− are then set to have coeﬃcients ∓7/m in order to satisfy (14). We now show that a dual certiﬁcate exists with high probability for a ﬁxed signal and a ﬁxed pattern of signs of ε.

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.9 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

9

Lemma 5. There exists constants c, γ ∗ such that the following holds. Let m ≥ cn. Fix x0 ∈ Rn and ε ∈ Rm . Let S + and S − be ﬁxed disjoint sets of cardinality 0.001m . Then the dual certiﬁcate Y from (42) satisﬁes ∗ (14), YT F ≤ 0.25, and YT ⊥ − 1.7IT ⊥ ≤ 0.3 with probability at least 1 − e−γ m . Proof. Without loss of generality, it suﬃces to assume x0 = e1 . It suﬃces to show that with high probability Y0,T ⊥ − 1.7IT ⊥ ≤ 0.2,

(43)

Y0,T F ≤ 0.15,

(44)

Y±,T ⊥ ≤ 0.05,

(45)

Y±,T F ≤ 0.05.

(46)

First, we establish (43)–(44). By Lemma 2.3 in [2], there exist c˜, γ˜ such that if |S c | ≥ c˜n, then with c probability at least 1 − e−˜γ |S | , m Y − 1.7I ⊥ ⊥ T ≤ 0.1, and |S c | 0,T m |S c | Y0,T ≤ 0.15.

(47) (48)

Thus, c c Y0,T ⊥ − 1.7IT ⊥ ≤ Y0,T ⊥ − |S | 1.7IT ⊥ + 1 − |S | 1.7 ≤ 0.2 m m which establishes (43). By (48), we get (44) immediately. Next, we establish (45). Let a be the vector formed by the last n − 1 components of a. Observe that ∗ i∈S + ai ai is a Wishart random matrix. Standard estimates for singular values of random matrices with Gaussian i.i.d. entries, such as Corollary 5.35 in [22], apply. If |S + | = 0.001m ≥ c˜0 n, with probability at least 1 − e−˜γ1 m for some γ˜1 , 1 ∗ ai ai − IT ⊥ ≤ 1/2 + |S | + i∈S

Hence, 1 ∗ ai ai ≤ 3/2 + |S | + i∈S

Thus, Y+,T ⊥ ≤

+ 3 |S | 2 m ,

and we arrive at (45). The bound for the Y−,T ⊥ term is identical. + Next, we establish (46). Note that Y+ = −7 |Sm | · |S1+ | i∈S + ai ati . As per Lemma 6, if |S + | = 0.001m ≥

c˜1 n, then Y+,T F ≤ 7 |Sm | · 5 ≤ 7 · 0.001 · 5 ≤ 0.05, with probability at least 1 − e−˜γ1 m . ∗ Thus (43)–(46) hold simultaneously with probability at least 1 − e−γ m for some γ ∗ provided that c ≥ max(2˜ c, 1000˜ c0 , 1000˜ c1 ). 2 +

The behavior of Y±,T relies on the following probability estimate for the behavior of a Gaussian Wishart matrix on T .

JID:YACHA 10

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.10 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

Lemma 6. Let x0 = e1 . Let A = probability at least 1 − e−˜γ1 m .

1 m

m i=1

ai ati . There exists c˜1 , γ˜1 such that if m ≥ c˜1 n then AT F ≤ 5 with

Proof. Let y be the (1, 1) entry of A, and let y be the rest of the ﬁrst column of A. Then AT 2F = y 2 +2y 22 . m 2 1 2 So, y = m i=1 ai (1). Hence, my ∼ χm . Standard results on the concentration of chi-squared variables, such as Lemma 1 in [16], give P(my ≥ 4m) ≤ e−˜γ1,2 m for some γ˜1,2 . Hence P(y 2 ≥ 16) ≤ e−˜γ1,2 m . 1 Now, it remains to bound y 22 . We can write y = m Z c, where Z = [a1 , . . . , am ] and ci = ai (1), where 2 Z and c are independent. Note that c2 is a Chi-squared random variable with m degrees of freedom. Hence, with probability at least 1 − e−˜γ1,2 m , c22 ≤ 4m For ﬁxed x2 = 1, Z x22 ∼ χ2n−1 , and hence with probability at least 1 − e−γ1,3 m Z x22 ≤ m when m ≥ c˜1 n. Hence, m2 y 22 ≤ 4m · m with probability at least 1 − 2e−˜γ1,4 m . So, y 2 < 2 with probability at least 1 − 2e−˜γ1,4 m . We conclude AT 2F ≤ 25, and hence AT F ≤ 5 with probability at least 1 − e−˜γ1 m for some γ˜1 . 2 We now show that for a ﬁxed signal and support set of gross errors, there is a high probability that a dual certiﬁcate exists simultaneously for all gross errors. Lemma 7. Fix x0 and a support set S. If m ≥ cn and |S|/m ≤ min(0.001, γ ∗ /2 log 2), then there is an ˜S,x on which for all ε with supp(ε) ⊆ S, there exists a Y satisfying (14), YT F ≤ 0.25, and event E 0 ˜S,x is at least 1 − e−γ ∗ m/2 . YT ⊥ − 1.7IT ⊥ ≤ 0.3. The probability of E 0 Proof. Consider all of the 2|S| possible assignments of sign to the entries of ε on S. For each, choose an S + and S − that are disjoint, have cardinality 0.001m , and are supersets of the indices assigned a positive or ˜S,x be the event on which all sign assignments yield a Y satisfying (14), negative sign, respectively. Let E 0 YT F ≤ 0.25, and YT ⊥ − 1.7IT ⊥ ≤ 0.3. By Lemma 5, this event has probability at least 1 − 2|S| e−γ

∗

m

≥ 1 − e−γ

∗

m/2

.

2

We now show that for a ﬁxed support set of gross errors, there is a high probability that a dual certiﬁcate exists simultaneously for all signals and for all gross errors. Lemma 8. Fix a support set S. If m ≥ max(c, 4 log(201)/γ ∗ )n and |S|/m ≤ min(0.001, γ ∗ /2 log 2), then on ∗ an event of probability at least 1 − e−γ m/4 , for all x0 and for all ε with supp(ε) ⊆ S, there exists Y = A∗ λ satisfying (14) and YT F ≤ 0.4 and YT ⊥ − 1.7IT ⊥ ≤ 0.5. Proof. By Lemma 7, for any ﬁxed all x0 such that x0 = 1, for all ε with supp(ε) ⊆ S, there exists a Y = A∗ λ such that

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.11 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

7 m 7 sgn(εS ) λS = m

λ∞ ≤

11

(49) (50)

YT ⊥ − 1.7IT ⊥ ≤ 0.3

(51)

YT F ≤ 0.25

(52)

∗

˜S,x , which has probability at least 1 − e−γ m/2 . By Lemma 5.2 in [22], there exists an ε˜-net on the event E 0 Nε˜ of the unit sphere such that |Nε˜| ≤ (1 + 2/˜ ε)n . Hence, such a Y exists simultaneously for all x0 ∈ Nε˜ ∗ on an event of probability at least 1 − (1 + 2/˜ ε)n e−γ m/2 . If m > 4n log(1 + 2/˜ ε)/γ ∗ , then such a Y exists ∗ ∗ simultaneously for all x0 ∈ Nε˜ with probability at least 1 − e−γ cn/4 ≥ 1 − e−γ m/4 . We now appeal to a continuity argument to show that a dual certiﬁcate exists for points not on the net Nε˜. For an arbitrary x such that x2 = 1, we consider the Y corresponding to the nearest x0 ∈ Nε˜. Note that x − x0 ≤ ε˜ by deﬁnition of the net Nε˜. We now closely follow the proof and notation of Corollary 2.4 in [2] to show that Y is a satisfactory approximate dual certiﬁcate for x. Note that Y ≤ 2.5. Let Δ = xxt − x0 xt0 and note that ΔF ≤ 2˜ ε. Let T = Tx . Now we have YT ⊥ − 1.7IT ⊥ = YTx⊥ − 1.7ITx⊥ − R1

(53)

YT = YTx0 + R2

(54)

R1 = ΔY (I − x0 xt0 ) + (I − x0 xt0 )Y Δ − ΔY Δ − 1.7Δ

(55)

R2 = ΔY (I − x0 xt0 ) + (I − x0 xt0 )Y Δ − ΔY Δ

(56)

0

0

where

We observe that R1 ≤ 2Y ΔI − x0 xt0 + Y Δ2 + 1.7Δ ≤ 13.4ε + 10ε2 √ √ √

R2 F ≤ 2R2 ≤ 2 2Y ΔI − x0 xt0 + Y Δ2 ≤ 10 2(ε + ε2 )

(57) (58)

If we choose ε˜ = 0.01, R1 ≤ 0.135 and R2 F ≤ 0.143, and YT ⊥ − 1.7IT ⊥ ≤ 0.3 + 0.135 ≤ 0.5 YT F ≤ 0.25 + 0.143 ≤ 0.4

(59) 2

(60)

ˆ to the recovery error for We now prove a lemma relating the recovery error in the recovered matrix X the unlifted signal x ˆ0 . Let {λj }j=1...n be the nonincreasing sequence of eigenvalues for a matrix X 0, and let {uj } be the corresponding orthonormal eigenvectors. Lemma 9. Let x0 ∈ Rn . Let X0 = x0 xt0 . If X 0 satisﬁes X − X0 F ≤ ξx0 2 for ξ > 0, then min x0 − (−1)k

k∈{0,1}

√ λ1 u1 2 ≤ (1 + 2 2)ξx0

Proof. Without loss of generality, take x0 = 1. Write X =

j

λj uj utj . By Weyl’s perturbation theorem,

max{|1 − λ1 |, λ2 , . . . , λn } ≤ ξ

(61)

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.12 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

12

n Writing X0 −u1 ut1 = X0 −X + (λ1 −1)u1 ut1 + j=2 λj uj utj , we use (61) and the assumption X −X0 F ≤ ξ to establish X0 − u1 ut1 ≤ 2ξ. Note that 1 − |x0 , u1 |2 = x0 − u1 ut1 2 ≤ 4ξ 2 , and let k = 1 if x0 , u1 < 0 and let k = 0 otherwise. We thus have (−1)k x0 − u1 2 = 2 − 2|x0 , u1 | ≤ 2(1 − |x0 , u1 |2 ) ≤ 8ξ 2 We can then conclude (−1)k x0 −

λ1 u1 2 ≤ (−1)k x0 − u1 + u1 −

Applying (61) once more, we have (−1)k x0 −

√

√ √ λ1 u1 ≤ 2 2ξ + |1 − λ1 | ≤ 2 2ξ + |1 − λ1 |

√ λ1 u1 2 ≤ (1 + 2 2)ξ. 2

We can now prove Theorem 1. Proof of Theorem 1. Assume that |S|/m ≤ min(0.001, γ ∗ /2 log 2) and m ≥ max(c, 4 log(201)/γ ∗ ). By ∗ Lemma 8, there is an event of probability at least 1 − e−γ m/4 such that for all x0 and for all ε with supp(ε) ⊆ S, there exists a Y = A∗ λ satisfying (12)–(14). Choose a superset S such that |S| = 0.01m. On ˆ − X0 F ≤ Cη1 /m. The intersection the intersection of this event with ES , Lemma 4 guarantees that X −γm of these events has probability at least 1 − e for some γ. η1 ˆ for ξ = The proof of the error estimate ˆ x − x0 ≤ C mx follows by applying Lemma 9 to X 0

η1 C mx 2. 0

2

2.3. Comment on the complex-valued case Extending Theorem 1 to the complex-valued case may be possible. In the proofs above, the most central use of the real-valued assumption is in the concentration estimates of the dual certiﬁcate. The certiﬁcate is broken into pieces where ε is positive, negative, and zero. In the complex case, there are an inﬁnite number of direction that the values of ε can take. Hence, the complex case will require a more involved analysis. Acknowledgments The author acknowledges generous funding by the grant NSF DMS-1464525. References [1] E. Candès, Y. Eldar, T. Strohmer, V. Voroninski, Phase retrieval via matrix completion, SIAM J. Imaging Sci. 6 (1) (2013) 199–225. [2] Emmanuel J. Candès, Xiaodong Li, Solving quadratic equations via PhaseLift when there are about as many equations as unknowns, Found. Comput. Math. (2012) 1–10. [3] Emmanuel J. Candès, Xiaodong Li, Yi Ma, John Wright, Robust principal component analysis?, J. ACM 58 (3) (2011) 11. [4] Emmanuel J. Candes, Xiaodong Li, Mahdi Soltanolkotabi, Phase retrieval via Wirtinger ﬂow: theory and algorithms, IEEE Trans. Inform. Theory 61 (4) (2015) 1985–2007. [5] Emmanuel J. Candès, Thomas Strohmer, Vladislav Voroninski, PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming, Comm. Pure Appl. Math. 66 (8) (2013) 1241–1274. [6] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, Alan S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM J. Optim. 21 (2) (2011) 572–596. [7] Yudong Chen, Ali Jalali, Sujay Sanghavi, Constantine Caramanis, Low-rank matrix recovery from errors and erasures, IEEE Trans. Inform. Theory 59 (7) (2013) 4324–4337. [8] Laurent Demanet, Paul Hand, Stable optimizationless recovery from phaseless linear measurements, J. Fourier Anal. Appl. 20 (1) (2014) 199–221. [9] J.R. Fienup, Phase retrieval algorithms: a comparison, Appl. Opt. 21 (1982) 2758–2768.

JID:YACHA

AID:1110 /COR

[m3L; v1.172; Prn:20/01/2016; 9:21] P.13 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••

13

[10] Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candes, Yi Ma, Dense error correction for low-rank matrices via principal component pursuit, in: 2010 IEEE International Symposium on Information Theory Proceedings, ISIT, IEEE, 2010, pp. 1513–1517. [11] R.W. Gerchberg, W.O. Saxton, A practical algorithm for the determination of phase from image and diﬀraction plane pictures, Optik 35 (1972) 237–246. [12] David Gross, Recovering low-rank matrices from few coeﬃcients in any basis, IEEE Trans. Inform. Theory 57 (3) (2011) 1548–1566. [13] Robert W. Harrison, Phase problem in crystallography, J. Opt. Soc. Amer. A 10 (5) (1993) 1046–1055. [14] Daniel Hsu, Sham M. Kakade, Tong Zhang, Robust matrix decomposition with sparse corruptions, IEEE Trans. Inform. Theory 57 (11) (2011) 7221–7234. [15] Richard Kueng, Holger Rauhut, Ulrich Terstiege, Low rank matrix recovery from rank one measurements, CoRR, arXiv:1410.6913 [abs], 2014. [16] Beatrice Laurent, Pascal Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. (2000) 1302–1338. [17] Xiaodong Li, Compressed sensing and matrix completion with constant proportion of corruptions, Constr. Approx. 37 (1) (2013) 73–99. [18] J. Löfberg, Yalmip: a toolbox for modeling and optimization in MATLAB, in: Proceedings of the CACSD Conference, Taipei, Taiwan, 2004. [19] R.P. Millane, Phase retrieval in crystallography and optics, J. Opt. Soc. Amer. A 7 (3) (1990) 394–411. [20] K.C. Toh, M.J. Todd, R.H. Tutuncu, SDPT3 – a Matlab software package for semideﬁnite programming, Optim. Methods Softw. 11 (1998) 545–581. [21] R.H. Tutuncu, K.C. Toh, M.J. Todd, Solving semideﬁnite-quadratic-linear programs using SDPT3, Math. Program. Ser. B 95 (2003) 189–217. [22] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, in: Y.C. Eldar, G. Kutyniok (Eds.), Compressed Sensing: Theory and Applications, Cambridge University Press, 2012. [23] John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization, in: Advances in Neural Information Processing Systems, 2009, pp. 2080–2088.

PhaseLift is robust to a constant fraction of arbitrary errors

PhaseLift is robust to a constant fraction of arbitrary errors

Recommend Documents