JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.1 (1-13) Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
Contents lists available at ScienceDirect
Applied and Computational Harmonic Analysis www.elsevier.com/locate/acha
Letter to the Editor
PhaseLift is robust to a constant fraction of arbitrary errors Paul Hand ∗ Rice University, Department of Computational and Applied Mathematics, MS-134, 6100 Main Street, Houston, TX 77005, United States
a r t i c l e
i n f o
Article history: Received 17 April 2015 Received in revised form 13 December 2015 Accepted 3 January 2016 Available online xxxx Communicated by Karlheinz Gröchenig Keywords: Phase retrieval PhaseLift Matrix completion Robust principal component analysis
a b s t r a c t Consider the task of recovering an unknown n-vector from phaseless linear measurements. This nonconvex problem may be convexified into a semidefinite rank-one matrix recovery problem, known as PhaseLift. Under a linear number of Gaussian measurements, PhaseLift recovers the unknown vector exactly with high probability. Under noisy measurements, the solution to a variant of PhaseLift has error proportional to the 1 norm of the noise. In the present paper, we study the robustness of this variant of PhaseLift to gross, arbitrary corruptions. We prove that PhaseLift can tolerate noise and a small, fixed fraction of gross errors, even in the highly underdetermined regime where there are only O(n) measurements. The lifted phase retrieval problem can be viewed as a rank-one robust Principal Component Analysis (PCA) problem under generic rank-one measurements. From this perspective, the proposed convex program is simpler than the semidefinite version of the sparse-plus-low-rank formulation standard in the robust PCA literature. Specifically, the rank penalization through a trace term is unnecessary, and the resulting optimization program has no parameters that need to be chosen. © 2016 Elsevier Inc. All rights reserved.
1. Introduction This paper establishes robustness of an algorithm for recovering a vector x0 ∈ Rn from phaseless linear measurements that contain noise and a constant fraction of gross, arbitrary errors. That is, for fixed measurement vectors ai ∈ Rn for i = 1 . . . m, our task is to find x0 satisfying bi = |x0 , ai |2 + εi + ηi
(1)
for known bi ∈ R, known ai , and unknown ηi and εi . Here ηi will represent noise in the measurements, and εi will represent gross, arbitrary corruptions. This recovery problem is known as phase retrieval. Measurements of form (1) arise in several applications, such as X-ray crystallography, optics, and microscopy [13,19,1]. In * Fax: +1 713 348 5318. E-mail address:
[email protected]. http://dx.doi.org/10.1016/j.acha.2016.01.001 1063-5203/© 2016 Elsevier Inc. All rights reserved.
JID:YACHA 2
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.2 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
such applications, gross corruptions in some measurements may be due to sensor failure, occlusions, or other effects. Recently, researchers have introduced lifting-based algorithms for the phase retrieval problem that have provable recovery guarantees [1,5]. The insight of these methods is that the phase retrieval problem can be convexified by lifting it to the space of matrices. That is, instead of searching for the vector x0, one can search for the lifted matrix x0 xt0 . The quadratic measurements (1) then become linear measurements on this lifted matrix. As the desired matrix is semidefinite and rank-one, one can write a rank minimization problem under the semidefinite and data constraints, which has a convex relaxation known as PhaseLift. In the noiseless case, PhaseLift is the program min tr(X) subject to X 0, {ati Xai = bi }i=1...m . X
(2)
Here, the trace of X is a convex proxy for the rank of a positive semidefinite X. An estimate for the underlying signal x0 can be found by computing the leading eigenvector of the optimizer of (2). Similar to [5,8,2], we will seek recovery guarantees for independent identically distributed Gaussians ai ∼ N (0, In ) ∈ Rn . Under this data model, [8] and [2] have shown that (2) can be simplified to the feasibility problem find X 0 such that {ati Xai = bi }i=1...m . This feasibility problem succeeds at finding x0 xt0 exactly with high probability when m ≥ cn for a sufficiently large c [2]. This scaling is quite surprising because there are only O(n) measurements for an O(n2 ) dimensional object. As discussed in [5,8], the semidefinite cone is sufficiently ‘pointy’ that the high-dimensional affine space of data-consistent matrices intersects the semidefinite cone only at exactly one point. In the noisy case without gross errors, that is for ε = 0, [2] showed that the PhaseLift variant min |ati Xai − bi | subject to X 0 (3) i
ˆ successfully recovers a matrix near x0 xt0 with high probability. Specifically, they prove that the solution X t ˆ ˆ to (3) satisfies X − x0 x0 F ≤ C0 η1 /m with high probability. From X, an estimate of x0 can be obtained ˆ1, u ˆ1u ˆ When specialized to by x ˆ0 = λ ˆ1 , where (λ ˆ1 ) is the leading eigenvalue and eigenvector pair for X. η1 k the real-valued case, [2] proves that mink∈{0,1} ˆ x0 − (−1) x0 ≤ C0 mx0 for some C0 . The contribution of the present paper is to show that the program (3) is additionally robust against a constant fraction of arbitrary errors. For a fixed set of coefficients that contain gross errors, we show that approximate recovery succeeds with high probability for arbitrary signals and arbitrary values of the gross errors. Theorem 1. There exist positive numbers fmin , γ, c, C, C such that the following holds. Let m ≥ cn. Fix a set S ⊂ {1 . . . m} such that |S|/m ≤ fmin . On an event of probability at least 1 − e−γm , for any x0 ∈ Rn ˆ to (3) satisfies and for any ε with supp(ε) ⊆ S, the minimizer X ˆ − x0 xt F ≤ C X 0
η1 . m
The resulting estimate for x0 satisfies min ˆ x0 − (−1)k x0 ≤ C
k∈{0,1}
η1 . mx0
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.3 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
3
Note that this high-probability result is universal over signals x0 and corruptions ε with fixed support. That is, a single realization of the measurement vectors ai will allow recovery with high probability for an arbitrary and even adversarial x0 . Further, for a fixed set of corrupted entries, the values of the gross corruptions can be arbitrary and even adversarial. In the case of no gross errors, in which ε = 0, this theorem reduces to the result in [2] and the rank-one case of [15]. In the noiseless case, in which η = 0, the theorem guarantees exact recovery of x0 with high probability under a linear number of measurements, of which a constant fraction are corrupted. We now explore the optimality of this theorem. The scaling of m versus n is information theoretically optimal and has no unnecessary logarithmic factors. The noise scaling is the same as in [2], and its optimality was established there. For arbitrary errors, the fixed fraction of gross errors cannot be extended to a case where fmin ≥ 1/2 because one could build a problem where half of the measurements are due to an x0 and the other half are due to some x1 . In such a case, recovery would be impossible. As stated, Theorem 1 only applies to real-valued signals. As the applications of phase retrieval typically involve complex measurements, it is an important follow-up question to determine if the corresponding theorem holds for complex measurements. We leave this question for future research. See Section 2.3 for a brief comment on this extension. 1.1. Comparison with other phase retrieval algorithms There are several algorithms for phase retrieval in the literature. The most popular in practice are alternating direction methods, such as Gerchberg–Saxton and Fienup, which involve inverting an oversampled Fourier Transform or solving a least squares problem [11,9,1]. These methods do not currently have rigorous recovery guarantees in the case of uncorrupted measurements, and they are sensitive to outliers in the inverse Fourier transform or least squares steps. In order to solve phase retrieval with corrupted measurements, the outliers would need to be explicitly filtered or the algorithms would need to be modified. Of recent interest is a nonconvex algorithm for phase retrieval called Wirtinger flow [4]. In this algo rithm, a descent is performed down the nonconvex objective i (|ai , x|2 − bi )2 under the initialization t of the leading eigenvector of the matrix i bi ai ai . This method enjoys theoretical guarantees and fast numerical implementations. In the presence of data corruption, the initialization and descent steps are susceptible to corrupted data. As a result, the Wirtinger flow algorithm is not corruption tolerant. Perhaps a corruption-tolerant variant of Wirtinger flow could be found. We leave this as the subject of future research. To the best of the author’s knowledge, PhaseLift is the only method for phase retrieval that can provably tolerate data corruption without explicit outlier detection. Because PhaseLift lifts an n-dimensional problem to an n × n semidefinite program, it is computationally expensive for large problems. As a result, there is currently a speed vs. corruption-tolerance tradeoff in the theoretical performance of phase retrieval methods: the faster Wirtinger flow algorithm has theoretical guarantees but is not corruption-tolerant, whereas the slower PhaseLift algorithm has theoretical guarantees and is corruption-tolerant. 1.2. Relation to robust PCA Much recent work in matrix completion has studied the recovery of low-rank matrices from arbitrary corruptions to its entries. This is a task known as robust Principal Component Analysis (PCA). Results in this framework typically involve measuring some of the entries of a low rank n × n matrix X0 and assuming that some fraction of those measurements are arbitrarily corrupted, giving the data matrix A. The matrix X0 can then be recovered under certain conditions by a sparse-plus-low-rank convex program: min λX∗ + E1 such that P(X + E) = P(A)
(4)
JID:YACHA 4
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.4 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
where X∗ is the nuclear norm of X, λ is a constant, E1 is the 1 norm of the vectorization of E, and P is the projection of a matrix onto the observed entries. [3,6,14,23,10,7,17]. Results from this formulation have been quite surprising. For example, under an appropriate choice of λ and under an incoherence assumption, the sparse-plus-low-rank decomposition succeeds for sufficiently low rank X when O(n2 ) entries are measured and a small fraction of them have arbitrary errors [3]. Subsequent results have been proved that only require m rn polylog(n) measurements, where r is the rank of X [7,17]. This result is information theoretically optimal except for the polylogarithmic factor. The present paper can be viewed as a rank-one semidefinite robust PCA problem under generic rank-one measurements. From this perspective, we would naturally formulate the PhaseLift problem under gross errors by min λ tr(X) +
|ati Xai − bi | subject to X 0.
(5)
i
The present paper shows that explicit rank penalization by the trace term is not fundamental for exact recovery in the presence of arbitrary errors. That is, (5) can be simplified by taking λ = 0. The resulting program has no free parameters that need to be explicitly tuned. As in [8,2], the positive semidefinite cone provides enough of a constraint to enforce low-rankness. The present work differs from the standard robust PCA literature in that the measurements are generic and are not direct samples of the entries of the unknown matrix. As Gaussian measurements are easier to analyze, we can provably achieve information theoretic scaling without additional logarithmic factors. 1.3. Numerical simulation We now use numerical simulation to provide empirical evidence for the performance of (3). Let the signal length n vary from 5 to 50, and let the number of measurements m vary from 10 to 250. Let x0 = e1 . For each (n, m), we consider measurements such that
bi ∼ Uniform([0, 104 ])
if 1 ≤ i ≤ 0.05m ,
bi = |x0 , ai |2
otherwise.
(6)
We attempt to recover x0 xt0 by solving (3) using the SDPT3 solver [20,21] and YALMIP [18]. For a given ˆ define the capped relative error as optimizer X, ˆ − x0 xt F /x0 xt F , 1). min(X 0 0 Fig. 1 plots the average capped relative error over 10 independent trials. It provides empirical evidence that the matrix recovery problem (3) succeeds under a linear number of measurements, even when a constant fraction of them contain very large errors. 2. Proofs Let A : S n → Rm be defined by the mapping X → (ati Xai )i=1...m , where S n is the space of symmetric real-valued n × n matrices. Note that A∗ λ = i λi ai ati . Let AS be the restriction of A onto the coefficients given by the set S. Let ei be the ith standard basis vector. Let X0 = x0 xt0 . We can write the measurements (1) as b = AX0 + ε + η.
(7)
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.5 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
5
Fig. 1. Recovery error for the PhaseLift matrix recovery problem (3) as a function of n and m, when 5% of measurements contain large errors. Black represents an average recovery error of 100%. White represents zero average recovery error. Each block corresponds to the average from 10 independent trials. The solid curve depicts when the number of measurements equals the number of degrees of freedom in a symmetric n × n matrix. The number of measurements required for successful recovery appears to be linear in n, even with a small fraction of large errors.
Similarly, the optimization program (3) can be written as min AX − b1 such that X 0. We introduce the following notation. Let X∗ be the nuclear norm of the matrix X. When X 0, X∗ = tr(X). Denote the Frobenius and spectral norms of X as XF and X, respectively. Let X, Y = tr(Y t X) be the Hilbert–Schmidt inner product. Given x0 , let Tx0 = {yxt0 + x0 y t | y ∈ Rn }. Note that Te1 is the space of symmetric matrices supported on their first row and column. The orthogonal complement Te⊥1 is then the space of matrices supported in the lower-right n − 1 × n − 1 block. When x0 is clear, we will simply write T instead of Tx0 . Let I be the identity matrix, and let 1(E) be the indicator function of the event E. 2.1. Recovery by dual certificates The proof of Theorem 1 will be based on dual certificates, as in [12,5,8,2]. A dual certificate is an optimal variable for the dual problem to (3). Its existence certifies the correctness of a minimizer to (3). The first order optimality conditions at X0 for (3) are given by ˜ Y˜ = A∗ λ
(8)
˜ ∈ ∂ · 1 (−ε) λ
(9)
Y˜ 0
(10)
Y˜ , X0 = 0,
(11)
where ∂ · 1 (−ε) is the subgradient of the 1 norm evaluated at −ε. Note that (10) and (11) imply Y˜T = 0. Such a Y˜ would be dual certificate for (3). Unfortunately, constructing such a Y˜ that exactly satisfies these conditions is difficult. As in [12,5,8,2], we seek an inexact dual certificate, which approximately satisfies these conditions. Specifically, we will build a dual certificate Y = A∗ λ that satisfies YT ⊥ IT ⊥
(12)
YT F ≤ 1/2 7 sgn(εi ) if εi = 0 λi = − m
(13)
|λi | ≤
7 m
if εi = 0.
(14)
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.6 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
6
To prove that existence of such a Y will guarantee successful recovery of x0 xt0 with high probability, we will rely on two technical lemmas. The first technical lemma provides 1 -isometry bounds on A and was proven in [5]. Lemma 1. (See [5].) There exist constants c0 , γ0 such that if m ≥ c0 n, then with probability at least 1 −e−γ0 m , 1 1 A(X)1 ≤ 1 + X∗ for all X, (15) m 16 1 1 A(X)1 ≥ 0.94 1 − X for all symmetric, rank-2 X. (16) m 16 We will need simultaneous control of the 1 -isometry properties over several subsets of measurements. Lemma 2. There exists a constant γ˜0 such that the following holds. Let m ≥ 100c0 n, and fix a support set S with |S| = 0.01m . There is an event ES with probability at least 1 − e−˜γ0 m on which 1 1 c AS X1 ≥ 0.94 1 − X for all symmetric rank-2 X, (17) |S c | 16 1 1 AS X1 ≤ 1 + X∗ for all X, (18) |S| 16 1 1 AX1 ≤ 1 + X∗ for all X. (19) |m| 16 The proof of Lemma 2 is immediate from Lemma 1. In order to prove that a dual certificate guarantees recovery, we establish a technical result that an optimal solution X0 + H to (3) lies near the cone HT ⊥ ∗ ≥ 0.56HT F with high probability. Lemma 3. Fix a support set S with |S| = 0.01m and let m ≥ 100c0 n. On the event ES from Lemma 2, 2 for all x and for all ε with supp(ε) ⊆ S, any optimal X0 + H satisfies HT ⊥ ∗ ≥ 0.56HT F − m η1 . Proof. Let X0 + H be a minimizer for (3), which implies that A(X0 + H) − b1 ≤ AX0 − b1 . By (7), AH − ε − η1 ≤ ε + η1 . Breaking these 1 norms into the coefficients belonging to S and S c , we have AS c H − ηS c 1 + AS H − ε − ηS 1 ≤ ε + ηS 1 + ηS c 1 .
(20)
By the triangle inequality, we have AS c H − ηS c 1 + ε + ηS 1 − AS H1 ≤ ε + ηS 1 + ηS c 1 ,
(21)
AS c H1 ≤ 2ηS c 1 + AS H1 .
(22)
which implies
Breaking (22) into its components on T and T ⊥ and applying the triangle inequality, we have AS c HT 1 − AS c HT ⊥ 1 ≤ 2ηS c 1 + AS HT 1 + AS HT ⊥ 1 ⇒AS c HT 1 ≤ 2ηS c 1 + AS HT 1 + AHT ⊥ 1
(23) (24)
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.7 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
7
We now apply the 1 isometry bounds from Lemma 2 on each term of (24). On the event ES , 1 |S c |HT AS c HT 1 ≥ 0.94 1 − 16 1 |S c | √ HT F , ≥ 0.94 1 − 16 2
(25)
where the second inequality follows because HT has rank at most 2. On the event ES , 1 HT ∗ AS HT 1 ≤ |S| 1 + 16 1 √ 2HT F ≤ |S| 1 + 16
(27)
(26)
(28)
On the event ES , 1 HT ⊥ ∗ AHT ⊥ 1 ≤ m 1 + 16 Combining (24)–(29), we have 0.94 1 − √ 2 1+ Thus, 0.56HT F ≤
2 m η1
1 16
1 16
|S c | √ |S| − 2 m m
HT F ≤
+ HT ⊥ ∗ on the event ES .
2 ηS c 1 + HT ⊥ ∗ m
(29)
(30)
2
We may now prove that existence of an inexact dual certificate (12)–(14) will guarantee successful recovery of a matrix near X0 = x0 xt0 with high probability, provided that there are few enough arbitrary errors. Lemma 4. There exists a C such that the following holds. Fix S such that |S| = 0.01m . Let m ≥ 100c0 n. Fix x0 ∈ Rn . Fix ε ∈ Rm such that supp(ε) ⊆ S. Suppose that there exists Y = A∗ λ satisfying (12)–(14). ˆ of (3) satisfies X ˆ − X0 F ≤ C η1 . Then, on the event ES from Lemma 2, a minimizer X m ˆ = X0 + H be a minimizer for (3), which implies that A(X0 + H) − b1 ≤ AX0 − b1 . By Proof. Let X (7), AH − ε − η1 ≤ ε + η1 . Letting α = 7/m, condition (14) gives λ/α ∈ ∂ · 1 (−ε)
(31)
Hence, − ε1 + λ/α, AH − η ≤ ε1 + η1 ⇒
λ, AH − η ≤ αη1
(32) (33)
⇒
Y, H ≤ λ, η + αη1
(34)
⇒
Y, H ≤ 2αη1
(35)
Decomposing (35) into T and T ⊥ , we have
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.8 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
8
YT , HT ≤ −YT ⊥ , HT ⊥ + 2αη1 ,
(36)
YT ⊥ , HT ⊥ ≤ |YT , HT | + 2αη1 .
(37)
and thus,
By conditions (12)–(13) and HT ⊥ 0, we have HT ⊥ ∗ ≤ |YT ⊥ , HT ⊥ | ≤ |YT , HT | + 2αη1 ≤
1 HT F + 2αη1 2
(38)
By Lemma 3, on the event ES , HT ⊥ ∗ ≥ 0.56HT F −
2 η1 . m
(39)
Combining (38) and (39), and using α = 7/m, we get 0.56HT F ≤
2 14 η1 + 0.5HT F + η1 m m
(40)
So, HT F ≤
C C η1 and thus by (37), HT ⊥ F ≤ η1 . m m
ˆ − X0 F = HF ≤ We conclude X
C m η1
(41)
for some C. 2
2.2. Construction of the dual certificate We now construct the dual certificate for arbitrary x0. Our construction will be a modification to the dual certificate in [2]. Also similar to [2], we will build dual certificates with high probability on a net of x0 . We will then use a continuity argument to get a dual certificate for a arbitrary x0 . Let S + and S − be disjoint supersets of the indices over which ε is positive or negative, respectively. Let S = S + ∪ S − . For pedagogical purposes, S + and S − should be thought of as exactly the indices over which ε is positive or negative. For technical reasons, we let them be supersets of cardinality linear in n, in order to use standard probability bounds. For a fixed choice of S + and S − , let the inexact dual certificate Y be defined by Y =
1 x0 2 x0 | 1 |ai , | ≤ 3 ai ati −7ai ati + 7ai ati + β0 − |ai , m x0 x0 c + − i∈S
i∈S
(42)
i∈S
where β0 = Ez 4 1(|z| ≤ 3) ≈ 2.6728 and z is a standard normal random variable. We will refer to each of the terms of in the right hand side of (42) as Y+ , Y− , and Y0 , respectively. The form of Y0 is due to [2], and the intuition behind it is as follows. Note that E(ai ati ) = In and m ˜ ˜ 0 1 E(|ai , e1 |2 ai ati ) = ( β00 In−1 ), where β˜0 = Ez 4 for a standard normal z. The construction m i=1 [β0 − 2 t ˜ |ai , e1 | ]ai a thus has expected value (β0 −1)IT ⊥ , which would provide the exact dual certificate conditions i
(10)–(11). As shown in [2], a satisfactory inexact dual certificate can be built with m = O(n) and coefficients that are truncated to be no larger than 7/m. In the present formulation, the terms Y+ and Y− are then set to have coefficients ∓7/m in order to satisfy (14). We now show that a dual certificate exists with high probability for a fixed signal and a fixed pattern of signs of ε.
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.9 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
9
Lemma 5. There exists constants c, γ ∗ such that the following holds. Let m ≥ cn. Fix x0 ∈ Rn and ε ∈ Rm . Let S + and S − be fixed disjoint sets of cardinality 0.001m . Then the dual certificate Y from (42) satisfies ∗ (14), YT F ≤ 0.25, and YT ⊥ − 1.7IT ⊥ ≤ 0.3 with probability at least 1 − e−γ m . Proof. Without loss of generality, it suffices to assume x0 = e1 . It suffices to show that with high probability Y0,T ⊥ − 1.7IT ⊥ ≤ 0.2,
(43)
Y0,T F ≤ 0.15,
(44)
Y±,T ⊥ ≤ 0.05,
(45)
Y±,T F ≤ 0.05.
(46)
First, we establish (43)–(44). By Lemma 2.3 in [2], there exist c˜, γ˜ such that if |S c | ≥ c˜n, then with c probability at least 1 − e−˜γ |S | , m Y − 1.7I ⊥ ⊥ T ≤ 0.1, and |S c | 0,T m |S c | Y0,T ≤ 0.15.
(47) (48)
Thus, c c Y0,T ⊥ − 1.7IT ⊥ ≤ Y0,T ⊥ − |S | 1.7IT ⊥ + 1 − |S | 1.7 ≤ 0.2 m m which establishes (43). By (48), we get (44) immediately. Next, we establish (45). Let a be the vector formed by the last n − 1 components of a. Observe that ∗ i∈S + ai ai is a Wishart random matrix. Standard estimates for singular values of random matrices with Gaussian i.i.d. entries, such as Corollary 5.35 in [22], apply. If |S + | = 0.001m ≥ c˜0 n, with probability at least 1 − e−˜γ1 m for some γ˜1 , 1 ∗ ai ai − IT ⊥ ≤ 1/2 + |S | + i∈S
Hence, 1 ∗ ai ai ≤ 3/2 + |S | + i∈S
Thus, Y+,T ⊥ ≤
+ 3 |S | 2 m ,
and we arrive at (45). The bound for the Y−,T ⊥ term is identical. + Next, we establish (46). Note that Y+ = −7 |Sm | · |S1+ | i∈S + ai ati . As per Lemma 6, if |S + | = 0.001m ≥
c˜1 n, then Y+,T F ≤ 7 |Sm | · 5 ≤ 7 · 0.001 · 5 ≤ 0.05, with probability at least 1 − e−˜γ1 m . ∗ Thus (43)–(46) hold simultaneously with probability at least 1 − e−γ m for some γ ∗ provided that c ≥ max(2˜ c, 1000˜ c0 , 1000˜ c1 ). 2 +
The behavior of Y±,T relies on the following probability estimate for the behavior of a Gaussian Wishart matrix on T .
JID:YACHA 10
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.10 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
Lemma 6. Let x0 = e1 . Let A = probability at least 1 − e−˜γ1 m .
1 m
m i=1
ai ati . There exists c˜1 , γ˜1 such that if m ≥ c˜1 n then AT F ≤ 5 with
Proof. Let y be the (1, 1) entry of A, and let y be the rest of the first column of A. Then AT 2F = y 2 +2y 22 . m 2 1 2 So, y = m i=1 ai (1). Hence, my ∼ χm . Standard results on the concentration of chi-squared variables, such as Lemma 1 in [16], give P(my ≥ 4m) ≤ e−˜γ1,2 m for some γ˜1,2 . Hence P(y 2 ≥ 16) ≤ e−˜γ1,2 m . 1 Now, it remains to bound y 22 . We can write y = m Z c, where Z = [a1 , . . . , am ] and ci = ai (1), where 2 Z and c are independent. Note that c2 is a Chi-squared random variable with m degrees of freedom. Hence, with probability at least 1 − e−˜γ1,2 m , c22 ≤ 4m For fixed x2 = 1, Z x22 ∼ χ2n−1 , and hence with probability at least 1 − e−γ1,3 m Z x22 ≤ m when m ≥ c˜1 n. Hence, m2 y 22 ≤ 4m · m with probability at least 1 − 2e−˜γ1,4 m . So, y 2 < 2 with probability at least 1 − 2e−˜γ1,4 m . We conclude AT 2F ≤ 25, and hence AT F ≤ 5 with probability at least 1 − e−˜γ1 m for some γ˜1 . 2 We now show that for a fixed signal and support set of gross errors, there is a high probability that a dual certificate exists simultaneously for all gross errors. Lemma 7. Fix x0 and a support set S. If m ≥ cn and |S|/m ≤ min(0.001, γ ∗ /2 log 2), then there is an ˜S,x on which for all ε with supp(ε) ⊆ S, there exists a Y satisfying (14), YT F ≤ 0.25, and event E 0 ˜S,x is at least 1 − e−γ ∗ m/2 . YT ⊥ − 1.7IT ⊥ ≤ 0.3. The probability of E 0 Proof. Consider all of the 2|S| possible assignments of sign to the entries of ε on S. For each, choose an S + and S − that are disjoint, have cardinality 0.001m , and are supersets of the indices assigned a positive or ˜S,x be the event on which all sign assignments yield a Y satisfying (14), negative sign, respectively. Let E 0 YT F ≤ 0.25, and YT ⊥ − 1.7IT ⊥ ≤ 0.3. By Lemma 5, this event has probability at least 1 − 2|S| e−γ
∗
m
≥ 1 − e−γ
∗
m/2
.
2
We now show that for a fixed support set of gross errors, there is a high probability that a dual certificate exists simultaneously for all signals and for all gross errors. Lemma 8. Fix a support set S. If m ≥ max(c, 4 log(201)/γ ∗ )n and |S|/m ≤ min(0.001, γ ∗ /2 log 2), then on ∗ an event of probability at least 1 − e−γ m/4 , for all x0 and for all ε with supp(ε) ⊆ S, there exists Y = A∗ λ satisfying (14) and YT F ≤ 0.4 and YT ⊥ − 1.7IT ⊥ ≤ 0.5. Proof. By Lemma 7, for any fixed all x0 such that x0 = 1, for all ε with supp(ε) ⊆ S, there exists a Y = A∗ λ such that
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.11 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
7 m 7 sgn(εS ) λS = m
λ∞ ≤
11
(49) (50)
YT ⊥ − 1.7IT ⊥ ≤ 0.3
(51)
YT F ≤ 0.25
(52)
∗
˜S,x , which has probability at least 1 − e−γ m/2 . By Lemma 5.2 in [22], there exists an ε˜-net on the event E 0 Nε˜ of the unit sphere such that |Nε˜| ≤ (1 + 2/˜ ε)n . Hence, such a Y exists simultaneously for all x0 ∈ Nε˜ ∗ on an event of probability at least 1 − (1 + 2/˜ ε)n e−γ m/2 . If m > 4n log(1 + 2/˜ ε)/γ ∗ , then such a Y exists ∗ ∗ simultaneously for all x0 ∈ Nε˜ with probability at least 1 − e−γ cn/4 ≥ 1 − e−γ m/4 . We now appeal to a continuity argument to show that a dual certificate exists for points not on the net Nε˜. For an arbitrary x such that x2 = 1, we consider the Y corresponding to the nearest x0 ∈ Nε˜. Note that x − x0 ≤ ε˜ by definition of the net Nε˜. We now closely follow the proof and notation of Corollary 2.4 in [2] to show that Y is a satisfactory approximate dual certificate for x. Note that Y ≤ 2.5. Let Δ = xxt − x0 xt0 and note that ΔF ≤ 2˜ ε. Let T = Tx . Now we have YT ⊥ − 1.7IT ⊥ = YTx⊥ − 1.7ITx⊥ − R1
(53)
YT = YTx0 + R2
(54)
R1 = ΔY (I − x0 xt0 ) + (I − x0 xt0 )Y Δ − ΔY Δ − 1.7Δ
(55)
R2 = ΔY (I − x0 xt0 ) + (I − x0 xt0 )Y Δ − ΔY Δ
(56)
0
0
where
We observe that R1 ≤ 2Y ΔI − x0 xt0 + Y Δ2 + 1.7Δ ≤ 13.4ε + 10ε2 √ √ √
R2 F ≤ 2R2 ≤ 2 2Y ΔI − x0 xt0 + Y Δ2 ≤ 10 2(ε + ε2 )
(57) (58)
If we choose ε˜ = 0.01, R1 ≤ 0.135 and R2 F ≤ 0.143, and YT ⊥ − 1.7IT ⊥ ≤ 0.3 + 0.135 ≤ 0.5 YT F ≤ 0.25 + 0.143 ≤ 0.4
(59) 2
(60)
ˆ to the recovery error for We now prove a lemma relating the recovery error in the recovered matrix X the unlifted signal x ˆ0 . Let {λj }j=1...n be the nonincreasing sequence of eigenvalues for a matrix X 0, and let {uj } be the corresponding orthonormal eigenvectors. Lemma 9. Let x0 ∈ Rn . Let X0 = x0 xt0 . If X 0 satisfies X − X0 F ≤ ξx0 2 for ξ > 0, then min x0 − (−1)k
k∈{0,1}
√ λ1 u1 2 ≤ (1 + 2 2)ξx0
Proof. Without loss of generality, take x0 = 1. Write X =
j
λj uj utj . By Weyl’s perturbation theorem,
max{|1 − λ1 |, λ2 , . . . , λn } ≤ ξ
(61)
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.12 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
12
n Writing X0 −u1 ut1 = X0 −X + (λ1 −1)u1 ut1 + j=2 λj uj utj , we use (61) and the assumption X −X0 F ≤ ξ to establish X0 − u1 ut1 ≤ 2ξ. Note that 1 − |x0 , u1 |2 = x0 − u1 ut1 2 ≤ 4ξ 2 , and let k = 1 if x0 , u1 < 0 and let k = 0 otherwise. We thus have (−1)k x0 − u1 2 = 2 − 2|x0 , u1 | ≤ 2(1 − |x0 , u1 |2 ) ≤ 8ξ 2 We can then conclude (−1)k x0 −
λ1 u1 2 ≤ (−1)k x0 − u1 + u1 −
Applying (61) once more, we have (−1)k x0 −
√
√ √ λ1 u1 ≤ 2 2ξ + |1 − λ1 | ≤ 2 2ξ + |1 − λ1 |
√ λ1 u1 2 ≤ (1 + 2 2)ξ. 2
We can now prove Theorem 1. Proof of Theorem 1. Assume that |S|/m ≤ min(0.001, γ ∗ /2 log 2) and m ≥ max(c, 4 log(201)/γ ∗ ). By ∗ Lemma 8, there is an event of probability at least 1 − e−γ m/4 such that for all x0 and for all ε with supp(ε) ⊆ S, there exists a Y = A∗ λ satisfying (12)–(14). Choose a superset S such that |S| = 0.01m. On ˆ − X0 F ≤ Cη1 /m. The intersection the intersection of this event with ES , Lemma 4 guarantees that X −γm of these events has probability at least 1 − e for some γ. η1 ˆ for ξ = The proof of the error estimate ˆ x − x0 ≤ C mx follows by applying Lemma 9 to X 0
η1 C mx 2. 0
2
2.3. Comment on the complex-valued case Extending Theorem 1 to the complex-valued case may be possible. In the proofs above, the most central use of the real-valued assumption is in the concentration estimates of the dual certificate. The certificate is broken into pieces where ε is positive, negative, and zero. In the complex case, there are an infinite number of direction that the values of ε can take. Hence, the complex case will require a more involved analysis. Acknowledgments The author acknowledges generous funding by the grant NSF DMS-1464525. References [1] E. Candès, Y. Eldar, T. Strohmer, V. Voroninski, Phase retrieval via matrix completion, SIAM J. Imaging Sci. 6 (1) (2013) 199–225. [2] Emmanuel J. Candès, Xiaodong Li, Solving quadratic equations via PhaseLift when there are about as many equations as unknowns, Found. Comput. Math. (2012) 1–10. [3] Emmanuel J. Candès, Xiaodong Li, Yi Ma, John Wright, Robust principal component analysis?, J. ACM 58 (3) (2011) 11. [4] Emmanuel J. Candes, Xiaodong Li, Mahdi Soltanolkotabi, Phase retrieval via Wirtinger flow: theory and algorithms, IEEE Trans. Inform. Theory 61 (4) (2015) 1985–2007. [5] Emmanuel J. Candès, Thomas Strohmer, Vladislav Voroninski, PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming, Comm. Pure Appl. Math. 66 (8) (2013) 1241–1274. [6] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, Alan S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM J. Optim. 21 (2) (2011) 572–596. [7] Yudong Chen, Ali Jalali, Sujay Sanghavi, Constantine Caramanis, Low-rank matrix recovery from errors and erasures, IEEE Trans. Inform. Theory 59 (7) (2013) 4324–4337. [8] Laurent Demanet, Paul Hand, Stable optimizationless recovery from phaseless linear measurements, J. Fourier Anal. Appl. 20 (1) (2014) 199–221. [9] J.R. Fienup, Phase retrieval algorithms: a comparison, Appl. Opt. 21 (1982) 2758–2768.
JID:YACHA
AID:1110 /COR
[m3L; v1.172; Prn:20/01/2016; 9:21] P.13 (1-13) P. Hand / Appl. Comput. Harmon. Anal. ••• (••••) •••–•••
13
[10] Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candes, Yi Ma, Dense error correction for low-rank matrices via principal component pursuit, in: 2010 IEEE International Symposium on Information Theory Proceedings, ISIT, IEEE, 2010, pp. 1513–1517. [11] R.W. Gerchberg, W.O. Saxton, A practical algorithm for the determination of phase from image and diffraction plane pictures, Optik 35 (1972) 237–246. [12] David Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Trans. Inform. Theory 57 (3) (2011) 1548–1566. [13] Robert W. Harrison, Phase problem in crystallography, J. Opt. Soc. Amer. A 10 (5) (1993) 1046–1055. [14] Daniel Hsu, Sham M. Kakade, Tong Zhang, Robust matrix decomposition with sparse corruptions, IEEE Trans. Inform. Theory 57 (11) (2011) 7221–7234. [15] Richard Kueng, Holger Rauhut, Ulrich Terstiege, Low rank matrix recovery from rank one measurements, CoRR, arXiv:1410.6913 [abs], 2014. [16] Beatrice Laurent, Pascal Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. (2000) 1302–1338. [17] Xiaodong Li, Compressed sensing and matrix completion with constant proportion of corruptions, Constr. Approx. 37 (1) (2013) 73–99. [18] J. Löfberg, Yalmip: a toolbox for modeling and optimization in MATLAB, in: Proceedings of the CACSD Conference, Taipei, Taiwan, 2004. [19] R.P. Millane, Phase retrieval in crystallography and optics, J. Opt. Soc. Amer. A 7 (3) (1990) 394–411. [20] K.C. Toh, M.J. Todd, R.H. Tutuncu, SDPT3 – a Matlab software package for semidefinite programming, Optim. Methods Softw. 11 (1998) 545–581. [21] R.H. Tutuncu, K.C. Toh, M.J. Todd, Solving semidefinite-quadratic-linear programs using SDPT3, Math. Program. Ser. B 95 (2003) 189–217. [22] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, in: Y.C. Eldar, G. Kutyniok (Eds.), Compressed Sensing: Theory and Applications, Cambridge University Press, 2012. [23] John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization, in: Advances in Neural Information Processing Systems, 2009, pp. 2080–2088.