Linear Algebra and its Applications 537 (2018) 221–249
Contents lists available at ScienceDirect
Linear Algebra and its Applications www.elsevier.com/locate/laa
Pseudo-skeleton approximations with better accuracy estimates A.I. Osinsky a,c , N.L. Zamarashkin a,b,c,∗ a
Institute of Numerical Mathematics RAS, Moscow, Gubkina str., 8, Russia Moscow State University, Moscow, Leninskie Gory, 1, Russia Moscow Institute of Physics and Technology, Dolgoprudny, Institutsky per., 9, Russia b c
a r t i c l e
i n f o
Article history: Received 29 July 2016 Accepted 29 September 2017 Available online 2 October 2017 Submitted by V. Mehrmann MSC: 65F30 65F99 65D05
a b s t r a c t We propose a priori accuracy estimates for low-rank matrix approximations that use just a small number of the rows and columns. This number is greater than the approximation rank, unlike the existing methods of pseudo-skeleton approximation. But the estimates are more accurate than previously known ones. This paper generalizes the results of [12,13]. © 2017 Elsevier Inc. All rights reserved.
Keywords: Low rank approximations Pseudoskeleton approximations Maximum volume principle
1. Introduction In the modern computational mathematics, the low-rank approximations of the dense matrices built using a small number of their elements play an important role as effective algorithms for working with very large dense matrices [1–5] and as a part of so called * Corresponding author. E-mail addresses:
[email protected] (A.I. Osinsky),
[email protected] (N.L. Zamarashkin). https://doi.org/10.1016/j.laa.2017.09.032 0024-3795/© 2017 Elsevier Inc. All rights reserved.
222
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
cross approximation methods of low-parametrical approximations of multidimensional data (see, e.g. [6–9]). The availability of reliable a priori accuracy estimates for low-rank matrix approximations play a crucial role in the justification of the developed algorithms. The problem of theoretical justification for such methods becomes significantly more complicated while moving from matrices to multidimensional matrices (tensors), i.e. from 2 to 3 and more dimensions. The known accuracy estimates for the low-parametrical approximations of multidimensional data can hardly be considered satisfactory [6–8]. One of the major obstacles to improve them are additional factors in the elementwise estimates for the low-rank approximation of matrices. As a rule, these factors increase with the growth of the approximation rank [12,13]. Consider a matrix A ∈ CM ×N , where A = Z+F with rank Z = r and F 2 ε. Let us choose m columns C ∈ CM ×m and n rows R ∈ Cn×N in the matrix A. We are interested in the accuracy estimates for the best or nearly the best rank r approximations of the matrix A by its CGR approximation with some kernel G ∈ Cn×m (see, e.g. [11–13]). Apparently, the first systematic study of such matrix approximations was conducted by Gu, Eisenstat in [10] and Goreinov, Tyrtyshnikov, and Zamarashkin in [11,12]. The main difference is that Gu and Eisenstat supposed the use of all matrix elements in their algorithm, while Goreinov, Tyrtyshnikov, and Zamarashkin constructed the CGR approximation assuming available just a small part of matrix elements. They proved the possibility of using CGR approximations for the particular case m = n = r and obtained several estimates in the spectral matrix norm. In addition, the relation between the quasi-optimal pseudo-skeleton approximations and the maximum volume submatrices ˆ = was established. By definition, the volume of a square r × r submatrix Aˆ is V2 (A) ˆ | det(A)|. Developing these ideas Goreinov and Tyrtyshnikov proved the following theorem. Theorem 1. [13] Consider a matrix A in the block form A=
A11 A21
A12 A22
,
where A11 ∈ Cr×r has the maximum volume among all r × r submatrices in A. Then the following element wise accuracy estimate holds: A22 − A21 A−1 A12 (r + 1) σr+1 (A) , 11 C
(1)
where σr+1 (A) is the (r + 1)th singular value of A. One can easily see that Theorem 1 demonstrates a quasi-optimal accuracy of CGR approximations. But the presence of the factor r+1 in (1) devalues the result for deriving practically useful accuracy estimates for multidimensional arrays: in the d-dimensional case a factor like (r + 1)d−1 arises (see, e.g. [7]) and seems to be improved to
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
223
(r + 1)log2 (d−1) , which is done in [8] under rather strong additional assumptions that can be checked only a posteriori. In this work we propose the new accuracy estimates that are similar to (1), but with the factor r + 1 replaced by a constant arbitrarily close to 1. For this purpose we slightly increase the number of rows and columns used to construct a cross approximation of rank r. Of course, the cross approximations with the rows/columns number exceeding the approximation rank is not a new idea. For example, the randomized algorithms constructing CU R approximations are arranged this way (see, e.g. [17,20–22]). However, it seems that the randomized algorithms may overestimate the number of rows and columns that guarantee the high approximation quality with the probability close to 1 (see, e.g. [20–22]). Unlike the randomized algorithms, the new approach is mostly deterministic. The columns and rows in the CGR approximation are chosen by searching the maximum of a special function. For the particular case m = n = r, considered in [12,13], this function coincides with the volume of a submatrix at the intersection of the selected rows and columns. An advantage of this approach is that it requires only a small number of matrix elements. It is especially important when they are difficult to compute and anyway no large storage is needed to keep them. The present work extends the results of [12,13] to the case of m and n greater than r. It is worth noting a work of A. Mikhalev [14] which is particularly interesting by the idea of various generalizations of the concept of volume that makes the maximal-volume principle introduced in [13] more flexible. Another example of deterministic algorithm that uses more than r (in fact (r + 1)) columns/rows for the rank-r approximation is the adaptive cross approximation method by Bebendorf [2–4]. This approach is now very popular as a method that can be used in practice. But the extra column/row is used there just for the verifying the stopping criterion. Following [14], we generalize the concept of the volume to the case of rectangular matrices. Since for a square matrix B ∈ Cp×p the volume coincides with the product p of its singular numbers V(B) = σi (B), we define the volume a rectangular matrix B size B ∈ C
m×n
as V(B) =
p
i=1
σi (B), where p = min(m, n).
i=1
Let us introduce one more important functional of the singular values. For a rectangular matrix B we call r-projective (projective) volume Vr (B) the product of its r largest r singular values Vr (B) = σi (B). If any size of B is less than r, then let Vr (B) = 0. i=1
The following notation is used. For any matrix B such that rank (B) ≥ r, we define the matrices Br and Br† using the singular value decomposition B = U ΣV ∗ as follows: Br = U Σr V ∗ ,
σi (Σr ) =
σi (Σ) , 0,
if i r, otherwise.
224
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Br†
=V
Σ†r U ∗ ,
σi Σ†r
=
σi−1 (Σ) , 0,
if i r, otherwise.
If r = rank B, we write B † instead of Br† , and B † is the classical pseudo-inverse of B. Finally, for a threshold τ > 0 we introduce matrices [B]τ and [B]†τ : σi (Σ) , if σi (Σ) τ, [B]τ = U [Σ]τ V ∗ , σi ([Σ]τ ) = 0, otherwise. σi−1 (Σ) , if σi (Σ) τ, † [B]τ = V [Σ]†τ U ∗ , σi [Σ]+ = τ 0, otherwise. The work is organized as follows. Section 2 describes some properties of the volume and r-projective volume. In Section 3 we prove the existence theorems for quasi-optimal CGR approximations of rank r with the number of selected rows and columns exceeding the approximation rank. It should be remarked that Theorems 2 and 3 of Section 3 were previously formulated and proved in [14]. However, the proof technique used in [14] seems somewhat confusing and as if the connection with the original work [12] is completely lost. Now we provide a proof entirely in the spirit of the work [12]. The principal results of this paper are in Section 4. Herein we present an elementwise error estimate for the CGR approximation using the maximal volume submatrix, and moreover, a similar result based on the projective volume of submatrices. Finally, Section 5 describes an idea of the search for submatrices with a large projective volume and as well contains the results of some numerical experiments. 2. The volumes and matrices with bounded inverses As we have already mentioned, the maximal volume principle is crucial to construct the skeleton matrix approximations of guaranteed accuracy. We capitalize on the idea that better estimates can be based on various generalizations of the concept of volume and, as the main contribution, we present the new ones that we call projective volumes. In this section we consider the volume of rectangular matrices and introduce and study the concept of the r-projective volume. The volume V(A) = V2 (A) of a rectangular matrix A ∈ Cm×n is defined as
min(m,n)
V(A) =
σi (A),
i=1
where σ1 (A) ≥ σ2 (A) ≥ . . . are the singular values ordered by descending. This definition m slightly differs from the one in [14] which states the volume to be σi (A). In our case i=1
the volume may be nonzero when m > n (and is equivalent to the volume from [14] for the conjugate matrix A∗ ). If m n, then one may calculate V(A) as follows: V (A) = det (A∗ A).
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
225
Lemma 1. Let A ∈ Cm×n , B ∈ Cn×k . Then the volume possesses the following properties: 1. If m = n k or k = n m, then V(AB) = V(A)V(B). If k = m, then V(AB) V(A)V(B). In the latter case, the equality holds provided the rows of A are linear combinations of the columns of B. 2. If m k then m
V(AB) V(A) B2 . If k m then k
V(AB) A2 V(B). 3. Let W = A
B ∈ Cm×(n+k) , n + k m. Then V(W ) V(A)V(B),
and if A∗ B = 0 then
the equality holds. 4. Let W = A b ∈ Cm×(n+1) , n m and let A be of maximal volume among all m × n submatrices of W . Then n+1 V(W ) V(A) n−m+1 and, besides that, A† b22
m . n−m+1
5. Suppose that U ∈ Cm×N is a matrix with orthonormal rows, i.e. U U ∗ = Im , and let ˆ be of maximal volume among all m × n submatrices in U . Then m n and U ˆ † 2 1 + (N − n)m . U n−m+1 Proof. 1. Consider the case m = n. We have V 2 (AB) = det (ABB ∗ A∗ ) = |det (A)| det (BB ∗ ) |det (A∗ )| = = det (AA∗ ) det (BB ∗ ) = V 2 (A) V 2 (B) .
226
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
The required expression is obtained by extracting the square root. The case k = n follows from it after replacing the matrix A by B ∗ and the matrix B by A∗ , since V(AB) = V (B ∗ A∗ ) . In this case k and m are interchanged. Let us turn to the case m = k. If, moreover, k < n, then the product AB is not a full rank matrix, and so the left-hand side is always 0. For the case k n we make use of the following singular value decompositions of A and B: A = UA ΣA VA∗ , UA ∈ Cm×m , ΣA ∈ Cm×m , VA ∈ Cn×m , B = UB ΣB VB∗ , UB ∈ Cn×k , ΣB ∈ Ck×k , VB ∈ Ck×k . Then V (AB) = V (UA ΣA VA∗ UB ΣB VB∗ ) = |det (UA ΣA VA∗ UB ΣB VB∗ )| = = |det (ΣA )| |det (ΣB )| |det (VA∗ UB )| = = V (A) V (B) |det (VA∗ UB )| . Now we estimate |det (VA∗ UB )|: |det (VA∗ UB )| VA∗ UB 2 VA∗ 2 UB 2 = 1. n
n
n
So we obtain the required inequality substituting 1 instead of |det (VA∗ UB )|. Finally, if the rows of A are linear combinations of the columns of B then T
A = A∗ = BP for some square matrix P ∈ Cn×n . Then we have V (AB) = V (P ∗ B ∗ B) = V (P ) V (B ∗ B) = V (P ) det (B ∗ B) = = V (P ) V 2 (B) = V (BP ) V (B) = V (A) V (B) . Here we have used the already proven case m = n. 2. The second inequality follows directly from the first one, because V(AB) = V(B ∗ A∗ ) and we can simply redefine B ∗ as A, A∗ as B, and interchange m and k. So let us prove the first inequality. The case m > n is trivial since the matrix is not full-rank and so the left-hand side equals zero. The cases where one of the inequalities turns into an equality are already analyzed in the item 1. Consequently,
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
227
m
V(AB) V(A)V(B) V(A) B2 , where the trivial inequality is used:
min(n.k)
V(B) =
min(m,k)
σi (B) =
i=1
m
σi (B) B2 .
i=1
To finish the proof of the item 2 we will use well-known inequality related to the singular values of the product AB of two matrices: σi (AB) = σi (A)B2 .
(2)
Taking the product of all singular values of AB, we obtain
min(m,k)
V (AB) =
m
σi (AB) B2
i=1
m
m
σi (A) V (A) B2 .
i=1
3. First we consider the case A∗ B = 0
A∗ A 0k×n
0n×k B∗B
Extracting the square root we get the required equality.
∗
V (W ) = det (W W ) = det 2
= V 2 (A) V 2 (B) .
∗
If A B = 0 then multiply W by a unity-volume matrix P =
In×n 0k×n
−A† B . By the Ik×k
item 1, we have V (W ) = V (W P ) =
I −A+ B n×n =V A B 0k×n Ik×k
=V A (I − AA+ ) B . Consider a matrix
W = A B = A (I − AA+ ) B . Since A∗ B = 0, we can use the already discussed case A∗ B = 0 applied to the matrix W . Finally we obtain V (W ) = V (W ) = V (A) V I − AA+ B k V (A) V (B) I − AA+ 2 = V (A) V (B) .
228
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
We also take advantage of the item 2 and the fact that the spectral norm of orthogonal projector I − AA† is equal to 1. 4. For completeness, here we present a proof similar to the one outlined in [14] (note that a different approach was earlier proposed by F.R. De Hoog and R.M.M. Mattheij [15,16]). First of all we prove that det(W W ∗ ) =
n+1 1 det(Ai A∗i ), n − m + 1 i=1
where the summation is over all m × n submatrices Ai . By the Binet–Cauchy theorem, det(W W ∗ ) =
det(Wj ) det(Wj∗ ) =
j
det(Wj Wj∗ ),
j
where the summation is over all m × n submatrices Wj of W . The same is true for the matrix Ai A∗i : det(Ai A∗i ) =
det(Wj Wj∗ ),
(3)
j
where the summation is over all m × m submatrices of Ai . It can be easily checked that in (3) every summand Wj (a submatrix of size m × m in W ) is counted n − m + 1 times (the number of submatrices Wj in the matrix Ai ). Thus we have 1 det(Ai A∗i ). n − m + 1 i=1 n
det(W W ∗ ) =
Since the submatrix A is of maximal volume, we conclude that 1 det(Ai A∗i ) n − m + 1 i=1 n
V 2 (W ) = det(W W ∗ ) =
(4)
n+1 n+1 det(AA∗ ) = V 2 (A). n−m+1 n−m+1 The square root extracting gives us the required inequality. To finish the proof we estimate det(W W ∗ ) in a different way: det(W W ∗ ) = det(AA∗ + bb∗ ) = det(AA∗ ) det(I + (AA∗ )−1 bb∗ ).
(5)
The matrix I + (AA∗ )−1 bb∗ is the sum of the identity matrix and a rank-1 matrix, thus only one eigenvalue is not equal to 1. This eigenvalue must coincide with the determinant. So we have
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
229
det(I + (AA∗ )−1 bb∗ ) = det(I + (A† )∗ A† bb∗ ) = = tr(I + (A† )∗ A† bb∗ ) − (m − 1) = 1 + tr((A† )∗ A† bb∗ ) = = 1 + tr((A† b)∗ A† b) = 1 + A† b22 . By substituting this expression into (5) and comparing it with (4), we find that n+1 det(AA∗ ) det(W W ∗ ) = (1 + A† b22 ) det(AA∗ ). n−m+1 By reducing both sides by det(AA∗ ) and expressing A† b22 , we get the desired inequality. ˆ is located in 5. Without loss of generality it can be assumed that the submatrix U the first n columns of U . In this case, the matrix U can be represented in the form
ˆ bn+1 · · · bN . U= U Then 2 2 ˆ + ˆ + ˆ + 2 2 U = U = U U 2 = U 2 2 2
2 ˆ+ ˆ ˆ+ + ˆ = U U U bn+1 · · · U bN = 2 ˆ† ˆ ˆ∗ ˆ† ∗ ˆ† ˆ † )∗ + · · · + U ˆ † bN b∗ (U ˆ † )∗ = U U U (U ) + U bn+1 b∗n+1 (U N N ˆ + 2 ˆ + ˆ 2 U + U U bi 1 + (N − n) 2
i=n+1
2
(6)
2
m . n−m+1
Here we have used the already proven item 4 and as well the properties of the 2-norm. Extracting square the root leads us to the desired inequality. 2 Let us introduce one more functional of the singular values that will play a significant role in our constructions. We define the r-projective volume of a matrix A ∈ Cm×n as Vr (A) =
r
σi (A).
i=1
When it does not lead to an ambiguity, we will omit the reference to r and simply call it the projective volume. If r is greater than min(m, n), then by definition we assume that Vr (A) = 0. Lemma 2. Let A ∈ Cm×n , B ∈ Cn×k . Then the following properties of the r-projective volume hold: 1. Vr (AB) Vr (A)Br2 , Vr (AB) Vr (B)Ar2 .
230
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
2. Consider matrix W = A b ∈ Cm×(n+1) , and its submatrix A of the maximal r-projective volume among the m × n submatrices of W (with r n). Then Vr (W ) Vr (A)
n+1 . n−r+1
(7)
Proof. 1. The proof is similar to the one of the item 2 of Lemma 1, where the volume is defined as the product of all singular values. Using (2) we obtain Vr (AB) =
r
r
σi (AB) B2
i=1
r
r
σi (A) = Vr (A) B2 ,
i=1
and the second inequality is derived in the same way. 2. Consider the singular value decomposition W = U ΣV ∗ with the square factors U and V . Then the submatrix A can be represented in the form A = U ΣVˆ ∗ , where Vˆ ∈ Cn×m is composed of the first n rows of V . Let Ur , Vr and Vˆr denote the columns of U , V and Vˆ , corresponding to the r largest singular values of W (here and further we assume that σi σi+1 ), and Σr 0 denote the first r rows of Σ. In this case, we can write a sequence of transformations as follows: Vr (A) Vr (Ur∗ A) = Vr ( Σr
0 Vˆ ∗ ) =
= V(Σr Vˆr∗ ) = Vr (W ) · V(Vˆr ),
(8)
where the inequality is satisfied by the property 1 for the projective volume and the last equation is due to the property 1 of the volume. If the matrix A corresponds to the maximal volume submatrix Vˆr (among all submatrices Vr of the same size), then by the item 4 of Lemma 1 we would have 1 = V(Vr ) V(Vˆr )
n+1 . n−r+1
This is equivalent to the inequality V(Vˆr )
n−r+1 . n+1
(9)
Comparing (8) and (9), one can conclude that a submatrix A in W always exists such that (7) holds. Consequently, the inequality (7) is surely satisfied when A is a submatrix with maximal r-projective volume (among all submatrices of this size). 2 Remark 1. The property 1 of the volume can be generalized to the r-projective volume as follows:
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
231
Vr (AB) Vr (A)Vr (B). It is a direct consequence of the well-known Horne inequalities for the singular values of the product of two matrices (see, e.g. [23], Theorem 3.3.4). However, we will not use it hereinafter. 3. CGR approximations with better accuracy estimates By definition, the CGR approximation of a matrix A is its approximation as a product of three matrices: matrix C, composed of some of the columns of A, matrix R, composed of the rows of A, and some specially selected matrix G. If G coincides with the inverse to the submatrix at the intersection of C and R, the approximation is called skeleton. If any kind of pseudoinverse is used, we call it pseudoskeleton. In any case this is what is called a cross approximation. Let U ∈ CN ×r be a unitary matrix, and Mn (U ) be the set of all n × r submatrices of U . Similarly to [12,14], we define a function t(r, N, n) as follows: t(r, N, n) =
1 min U
max
ˆ ∈Mn (U ) U
ˆ) σmin (U
.
(10)
By the property 5 of Lemma 1 of the section 2, the following inequality holds: t(r, N, n)
1+
(N − n)r . n−r+1
(11)
We observe that the value of t(r, N, n) decreases with the growth of n. That allows us to build a cross approximation of rank r with an improved accuracy estimates as we will show below. We begin with the following theorem. Theorem 2. [12,14] Assume that A ∈ CM ×N , A = Z + F , rank Z = r, and F 2 ε. Then there exist m columns and n rows of A which determine a pseudo-skeleton CGR approximation such that 2 . A − CGR2 ε 1 + t(r, M, m) + t(r, N, n)
(12)
Proof. Consider the singular value decomposition of Z: Z = U ΣV, U ∈C
M ×r
, Σ∈C
r×r
, V ∈C
(13) r×N
,
ˆ ∈ Cm×r and Vˆ ∈ Cr×n of the matrices U and V respectively, and any submatrices U ˆ † 2 t(r, M, m) and Vˆ † 2 t(r, N, n). Submatrices with such estimates such that U always exist according to (10).
232
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
ˆ For the CGR approximation let us take the rows and columns corresponding to U ˆ and V . Denote by C and FC the M × n submatrices of A and F in the selected columns. Analogically, denote by R and FR the m × N submatrices in the selected rows. Finally, denote by Aˆ and Fˆ the m × n submatrices of A and F standing at the intersection of the selected rows and columns. In our notations the CGR approximation with the kernel G can be written as ˆ ΣV + FR = U ΣVˆ GU ˆ ΣV + E, CGR = U ΣVˆ + FC G U
(14)
ˆ ΣV + FR − FC GFR . E = U ΣVˆ + FC GFR + FC G U
(15)
where
ˆ † and Vˆ † and note that U ˆ †U ˆ = Ir and Vˆ Vˆ † = Ir . Then Consider the pseudoinverses U we transform (15) to ˆ† U ˆ ΣVˆ G FR + FC GU ˆ ΣVˆ Vˆ † V + FC GFR = E = UU ˆ † (ΦG) FR + FC (GΦ) Vˆ † V + FC GFR , = UU
(16)
where ˆ ΣVˆ = Aˆ − Fˆ . Φ=U Similarly using Φ we rewrite (13) and (14) as follows: ˆU ˆ † ΦVˆ † V Z =A−F =U
(17)
ˆ † (ΦGΦ) Vˆ † V + E. CGR = U U
(18)
Now we define the kernel of the pseudo-skeleton approximation by G = [Φ]†τ .
(19)
Φ[Φ]†τ Φ = [Φ]τ ,
(20)
Φ[Φ]†τ 2 = [Φ]†τ Φ2 1,
(21)
Since
and, moreover,
then for E2 in (16) we obtain ˆ † 2 + Vˆ † 2 + ε . E2 ε U τ
(22)
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
233
Using this inequality together with (17), (18), and (20) we obtain the estimate ˆ † 2 Vˆ † 2 + A − CGR2 ≤ ε + τ U Now by setting τ =
ε ˆ † 2 Vˆ † 2 U
ε2 ˆ † 2 + εVˆ † 2 . + εU τ
(23)
in (23) we complete the proof of the theorem. 2
Note that the kernel G in the form (19) requires an explicit representation of the perturbation matrix F . In the following Theorem 3 the parameter τ is completely determined by F 2 , where F can be regarded as an error of the best possible approximation in the 2-norm. Theorem 3. [12,14] Assume that A ∈ CM ×N , A = Z + F , rank Z = r, and F 2 ε. Then there exists m ×n submatrix Aˆ such the CGR approximation defined by the selected m columns and n rows satisfy ˆ † R2 ε (2 + 2t(r, N, n) + 2t(r, M, m) + 5t(r, N, n)t(r, M, m)) . A − C[A] ε
(24)
Proof. Define the matrices C and R analogously as in Theorem 2, and rewrite (16), (17) and (18) in the terms of Aˆ and Fˆ : ˆ † AG ˆ E =U U FR + FC GAˆ Vˆ † V + FC GFR − ˆ † Fˆ GFR − FC GFˆ Vˆ † V, −U U ˆ † AˆVˆ † V − U U ˆ † Fˆ Vˆ † V, A − F =U U ˆ † AG ˆ Aˆ Vˆ † V + E− CGR =U U ˆ † AG ˆ Fˆ Vˆ † V + ˆ † Fˆ GAˆ Vˆ † V − U U −U U ˆ † Fˆ GFˆ Vˆ † V. +U U In order to get a pseudo-skeleton approximation we set ˆ †τ , G = [A] then the relations similar to (20) and (21) hold: ˆ A] ˆ † Aˆ = [A] ˆ τ, A[ τ ˆ † ˆ ˆ ˆ † A[A]τ 1, [A] τ A 1. Thus
234
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
ˆ † ˆ † ˆ † ˆ † ˆ † R A − A˜ = V − C[ A] ε + τ U + ε A U V + τ 2 2
2
2
2
ε2 ε2 ε2 ˆ † ˆ † + U +εU + εVˆ † + + Vˆ † + τ τ τ 2 2 2 2 2 ε ˆ † ˆ † ˆ † ˆ † +2εU V + U V . τ 2 2 2 2
2
By selecting the threshold τ = ε, we obtain the estimate (24). 2 Remark 2. The above theorems in the case of m = n = r were proved in [12]. For √ sufficiently small r the right-hand side expression is proportional to r in Theorem 2, and to r in Theorem 3. However, for example, for n = m = 2r the coefficient will only decreases with the increase of r. In this case, while constructing the approximations of higher ranks we will not get a stronger growth compared to the estimate for small ranks. Theorem 4. [11] Assume that A ∈ CM ×N , A = Z + F , rank Z = r, F 2 ε. Then there exists a submatrix Aˆ ∈ Cm×n (m n) such that the τ -pseudoskeleton approximation A˜ = Cˆ[A]†τ R satisfies the inequality
˜ 2ε A − A
2 . 1 + t2 (r, M, m) 1 + t (m, M, m) + t (m, N, n)
(25)
Proof. Consider the singular value decomposition of A in the block form A = U ΣV = U
Σ1 0
0 Σ2
V,
Σ1 ∈ Cr×r .
Then we can assume that Z=U
Σ1 0
0 V. 0
We also represent the matrix U in the block form:
U=
U11 U21
U12 U22
,
U11 ∈ Cm×r .
The similar block representations will be used for A, Z, and F . Without loss of generality ˆ , and thus U ˆ † 2 t(r, M, m). we can assume that U11 = U Consider a new error matrix 0 0 m×m m×(M −m) F. F˜ = † −U21 U11 I(M −m)×(M −m) Thus matrix Z˜ = A − F˜ can be represented as
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Z˜ = =
A11 † Z21 + U21 U11 F11
A12 † Z22 + U21 U11 F12
235
=
A11 A12 † † † † U21 U11 Z11 + U21 U11 F11 U21 U11 Z12 + U21 U11 F12
Im×m = A A 11 12 , † U21 U11
=
so rank (A − F˜ ) m and the following estimate for F˜ is true: 2 ∗ 2 † † ∗ 2 † F˜ F 2 U U U U + I I ε + 1 U 21 21 11 11 11 2 2 ε 1 + t2 (r, M, m) .
2
2
2
˜ Because of the coincidence of the rows Apply the Theorem 2 to the matrices A and Z. ˆ it turns out that Aˆ = Z˜ and the CGR approximation in Theorem 2 coincides with the τ -pseudoskeleton approximation. In this case, r is replaced with m and ε is with ε 1 + t2 (r, M, m), and we finally get (25). 2 Remark 3. This theorem was proved in [11] for the particular case n = m = r. Lemma 3. [7] Assume that A ∈ CM ×N , A = Z + F , rank Z = r, and F 2 ε. Then there exists a CW approximation using n columns such that A − CW 2 ε (1 + t (r, N, n)) . Proof. Let us choose the columns analogously as in Theorem 2, so that Vˆ † 2 t(r, N, n). Then Z =A − F = U ΣV = U ΣVˆ Vˆ † V = (C − FC ) Vˆ † V = =C Vˆ † V − FC Vˆ † V. Set W = Vˆ † V . Then A − CW 2 = F − FC Vˆ † V ε + εVˆ † ε (1 + t (r, N, n)) . 2
2
Remark 4. The theorem was proved in [7] for the case n = r. Remark 5. The matrix W can also be represented as † W = Z11 Z11
Z12 .
2
236
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
We can see that † ∗ Z11 [Z11 Z12 ] = Vˆ † Σ−1 U11 U11 ΣV = Vˆ † V,
Vˆ , Z11 ∈ Cr×n , Thus, if
Z11
Z12
= A11
Σ, U11 ∈ Cr×r ,
V ∈ Cr×N .
A12 , then the CW is a cross approximation.
Applying Lemma 3 instead of Theorem 2 in the proof of Theorem 4 gives a more precise estimate as below. Theorem 5. [14] Assume that A ∈ CM ×N , A = Z + F , rank Z = r, and F 2 ε. Then there exists a submatrix Aˆ ∈ Cm×n (m n) such the corresponding pseudoskeleton approximation A˜ = C Aˆ† R satisfies the inequality A − A˜ ε 1 + t2 (r, M, m) (1 + t (m, N, n)) . 2
(26)
This estimate is interesting because here we have τ = 0. Moreover it is better than any previously known estimate for the 2-norm of the CGR approximations with G defined by Aˆ in case of m = n = r. Remark 6. The work [14] contains an alternative proof for Theorem 5, but with a missing term 1 in the right-hand side of (26). Remark 7. One can use an expression
1+
(M −m)r n−r+1
instead of
† 2 right-hand side. It follows from (6), and since U21 U11 2 ˜ unnecessary term 1 when calculating F 2 .
1 + t2 (r, M, m) in the
(M −m)r n−r+1 ,
we get rid of the
4. Main results In this section we formulate and prove two theorems concerning the elementwise accuracy estimates of the cross approximations with the number of rows m and the number of columns n exceeding the approximation rank r. Chebyshev norm of a matrix is defined as follows: AC = max |Aij |. i,j
Theorem 6. Let a matrix A ∈ CM ×N be represented in a block form A=
A11 A21
A12 A22
,
where the submatrix A11 ∈ Cm×n (with m n, rank A11 = n) has the maximal volume among all m × n submatrices of A, and let a rank-n pseudoskeleton CGR approximation
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
be constructed using the columns C =
A11 A21
and rows R = A11
237
A12
and the matrix
A†11 as the kernel G. Then √ † A − CA11 R 1 + n · 1 + C
n · σn+1 (A) . m−n+1
(27)
Proof. Let the C-norm of the error matrix A − CA†11 R be reached at the position of an element a in the matrix A. First we note that if the element a stands in the columns C, then its representation is exact. Therefore it is necessary to prove the estimate only for the elements out of the columns C (and analogously out of the rows R). By a submatrix Aˆ we denote a bordering of A11 by a column and a row:
Aˆ =
A11 c∗
b a
∈ C(m+1)×(n+1) ,
b ∈ Cm ,
c ∈ Cn .
Consider a quantity γ = a − c∗ A†11 b that is equal to the element of A − CA†11 R whose absolute value we are going to evaluate. † A 0 n×1 11 ˆ= ∈ C(n+1)×(m+1) . Using Property 3 of the volume Define a matrix B 01×m 1 we obtain 0 n † † ˆ =V A V B = V A11 = V −1 (A11 ) . 11 1 2
Here we take into account that the volume of a column coincides with its 2-norm. ˆ and applying the Property 1 of the volume we get After multiplying Aˆ by B V Aˆ ˆ Aˆ V B ˆ V Aˆ = V B . V (A11 ) ˆ A: ˆ Now we calculate the volume of a product B
A11 b 0n×1 In×n A†11 b = , c∗ a 1 c∗ a In×n A†11 b ∗ † ˆ ˆ ˆ ˆ V B A = det B A = det = a − c A b 11 = |γ| . c∗ a ˆ Aˆ = B
A†11 01×m
Finally we obtain
|γ|
V Aˆ V (A11 )
.
238
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249 ˆ V A
Therefore, it is sufficient to evaluate V(A11 ) . For that, we assume that V Aˆ > 0, since otherwise the estimate is trivial. In order to estimate the ratio matrix A:
ˆ V A
V(A11 ) ,
we use the (n + 1)-th singular value of the
−1 ∗ ˆ ˆ = σ1 A A
1 1 = = σn+1 (A) σn+1 Aˆ σn+1 Aˆ∗ Aˆ 1
−1 −1 −1 ∗ ∗ ∗ ˆ ˆ ˆ ˆ ˆ ˆ . = A A A A (n + 1) A A 2
F
(28)
C
Any n × n submatrix of Aˆ∗ Aˆ can be represented in the form C1∗ C2 , where C1 and C2 ˆ The matrices C1 and C2 have their own maximal are some (m + 1) × n submatrices of A. volume m × n submatrices. But the volumes of these submatrices still will not exceed V(A11 ), so the Property 4 of the volume leads us to V (C1,2 ) V (A11 )
1+
n , m−n+1
where C1,2 means that the inequality is true both for C1 and C2 . Using the Property 4 of the volume, we can now obtain an estimate for C1∗ C2 : V
(C1∗ C2 )
n V (C1 ) V (C2 ) V (A11 ) 1 + m−n+1
2
.
Now we apply the formula for the inverse matrix element and obtain an estimate for the −1 maximum modulus element in Aˆ∗ Aˆ : ∗ det (C1∗ C2 ) ∗ −1 = max V (C1 C2 ) Aˆ Aˆ = max C1 ,C2 C det Aˆ∗ Aˆ C1 ,C2 V 2 Aˆ V 2 (A11 ) n 1+ . m−n+1 V 2 Aˆ Substituting this estimate in (28), we get the following: 1 σn+1 (A)
(n + 1) 1 +
=
n m−n+1
V (A11 ) √ 1+n· V Aˆ
1+
V 2 (A11 ) = V 2 Aˆ
n . m−n+1
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
239
After some reorganizations it becomes V Aˆ V (A11 )
√ 1+n·
1+
n · σn+1 (A) . m−n+1
ˆ V A
As it was mentioned above |γ| V(A11 ) , thus the estimate (27) holds for this case of position of a. The last case is when the element a stands in the rows R but not in the columns C. To evaluate the absolute value of γ, we consider the following extension A˜ of the matrix A11 : A˜ = A11
b ∈ Cm×(n+1) ,
such that a stands in the column b. The column b defined as b = A11 A†11 b, will correspond to b in A˜ in the pseudo-skeleton approximant CA†11 R. The error for the element a is estimated above by the 2-norm of error rate of the column b which equals † † I − A11 A11 b = V I − A11 A11 b . 2
Note that the similar expression was obtained in the Property 3 of the volume. Choosing W = A˜ we obtain V A˜ = V (A11 ) V I − A11 A†11 b , V A˜ † . V I − A11 A11 b = V (A11 ) The further proof is completely analogous to the previous case. Instead of (28) we obtain
1 σn+1 (A)
−1 (n + 1) A˜∗ A˜ . C
The sizes of C1 and C2 coincide with the size of A11 . Therefore V C˜1,2 V (A11 ) , and the final estimate contains the only one of the roots which makes it even better. √ n So, wherever the element a stands, its error will not exceed 1 + n · 1 + m−n+1 · σn+1 (A). This proves the theorem. 2 Now let us formulate the main result of the work.
240
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Theorem 7. Let a matrix A ∈ CM ×N be represented in the block form A=
A11 A21
A12 A22
,
where the submatrix A11 ∈ Cm×n (with rank A11 r) has the maximal r-projective volume among all m × n submatrices of A. a rank-r pseudoskeleton approxima Consider
A11 , rows R = A11 A12 and the matrix tion constructed using the columns C = A21 (A11 )†r as the kernel G. Then the following inequality holds: A − C(A11 )†r R C
1+
r · m−r+1
1+
r · σr+1 (A) . n−r+1
(29)
Theorem 7 leads to the following corollaries. Corollary 1. If any size of A11 equals r (for example, r = m), then the r-projective volume coincides with the volume, and for the maximal volume submatrices the following estimate holds: √ r A − CA+ R r + 1 · 1 + · σr+1 . 11 C n−r+1 In case of n = 2r − 1 it gives † A − CA11 R 2 (r + 1) · σr+1 . C
Corollary 2. If A11 is a square submatrix of order n = m = 2r − 1, then (29) turns into A − C(A11 )+ r R C 2σr+1 (A) . Proof of Theorem 7. We consider several cases depending on the position of the element a in the matrix A on which the C-norm error of the pseudo-skeleton approximation is attained. If a is an element of the submatrix A11 then † A11 − A11 (A11 )r A11 = A11 − (A11 )r 2 = σr+1 (A11 ) σr+1 (A) . 2
Let us assume that a is out of the columns C and the rows R. By Aˆ we denote a bordering of A11 by the column and row containing a: Aˆ =
A11 c∗
b a
∈ C(m+1)×(n+1) ,
b ∈ Cm ,
c ∈ Cn .
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
241
The element of the error matrix in the position corresponding to a can be written as †
γ = a − c∗ (A11 )r b = a − c∗ V Σ−1 U ∗ b, U ∈ Cm×r , V ∈ Cn×r , Σ ∈ Cr×r , where U ΣV ∗ is the singular valuedecomposition of (A11 )r . 0r×1 U∗ V 0n×1 ˆ We multiply A on the left by , and on the right by , 01×m 1 01×r 1 and apply the item 1 of Lemma 2 to get 0r×1 U∗ A11 b V 0n×1 r+1 r+1 ˆ V = A V 01×m 1 01×r 1 c∗ a Σ U ∗b Σ U ∗b r+1 det =V = = a a c∗ V c∗ V Ir×r U ∗b = det det (Σ) = c∗ V Σ−1 a = a − c∗ V Σ−1 U ∗ b Vr (A11 ) . ˆ = Vr (A), ˆ and come up with Now we express |γ| from above, use the equality V r+1 (A) V r+1 Aˆ
Vr Aˆ
Vr Aˆ = σr+1 Aˆ σr+1 (A) . |γ| Vr (A11 ) Vr (A11 ) Vr (A11 )
(30)
Let B be a submatrix of the maximal r-projective volume among all m × (n + 1) subˆ Furthermore, in B we can find an m ×n submatrix of maximal r-projective matrices of A. volume. The volume of this submatrix does not exceed Vr (A11 ), so the item 2 of the Lemma 2 implies n+1 Vr (A11 ) . Vr (B) (31) n−r+1 Considering Aˆ as a bordering of the matrix B and again using Lemma 2, we obtain the inequality m+1 Vr (B) . Vr Aˆ (32) m−r+1 Combining the above inequalities we obtain m+1 n+1 · · σr+1 (A) , |γ| m−r+1 n−r+1 which is equivalent to (29).
242
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
To complete the proof, it remains to consider the case when a is located in the rows R, but out of the columns C, or vice versa. Both cases are treated similarly. Consider a bordering A˜ of A11 by one column: A˜ = A11
b ∈ Cm×(n+1) .
We have † γ = b − A11 (A11 )r b = (I − U U ∗ ) b2 = V ((I − U U ∗ ) b) . 2
We multiply A˜ on the right by
V 01×r
0n×1 1
and apply the item 1 of Lemma 2 to get
the inequality V r+1 A˜ V UΣ
b
.
(33)
In the proof of the property 3 of the volume it was shown that V
A B
= V (A) V
I − AA† B .
After substituting A = U Σ and B = b we obtain the equality V
UΣ b
= Vr (A11 ) V ((I − U U ∗ ) b) .
The estimate for γ follows from this and from (33): γ
Vr A˜ V r+1 A˜ r σr+1 (A) 1 + · σr+1 (A) , Vr (A11 ) Vr (A11 ) n−r+1
where the last inequality also follows from Lemma 2. Since this inequality is not worse than (29), the theorem is proved. 2 Corollary 3. Since all the augmentations of A11 used in the proof attach to it only one column and/or row, it is sufficient that A11 has the maximal volume only in the submatrices that differ from it by one column and/or row. Corollary 4. If Vr Aˆ is α times less than the maximal volume, then the estimate is no more than α times worse. Proof. It was shown that |γ|
ˆ Vr (A) σr+1 (A). Vr (A11 )
(34)
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
243
Even if the C-norm be reached on an element within C and R, it is still possible to ˆ Hereat the volume can not decrease. expand the submatrix A˜ to A. ˆ we obtain from (34) an additional Replacing A11 by a maximal volume submatrix in A, factor α in the expression for |γ|. In this case the right-hand side expression for the maximal volume submatrix is estimated by the theorem. Accounting the coefficient α will increase the bound in α times. 2 These corollaries show that it is not necessary to choose the maximal volume submatrix to get a good approximation. It can explain the effectiveness of approximations by the submatrices that may not have a maximal projective volume. However, the Corollary 4 claims that for the greater projective volume the better accuracy can be guaranteed. It should be recognized that, in practice, the existing algorithms, which search submatrices with a large volume or projective volume, give no guarantee that α is close to 1 or at least is not too big. Therefore the accuracy estimate is not directly applicable. However, as will be seen hereinafter, the numerical experiments still show a fairly high accuracy of the approximation. Theorem 8. [18] Assume the conditions of Theorem 7 are fulfilled, and consider a matrix A = Z + E where Z is its best rank r approximation in the C-norm. Then (m + 1)(n + 1) † EC . A − CA11r R C (m − r + 1)(n − r + 1)
(35)
Proof. It is sufficient to prove (35) for an arbitrary extension Aˆ of the matrix A11 by one row and one column. From Theorem 7 it follows that (m + 1)(n + 1) ˆ ˆ † ˆ ˆ · σr+1 (A), A − CA11r R C (m − r + 1)(n − r + 1) ˆ are the corresponding submatrices of C and R. where Cˆ and R ˆ Since it equals the 2-norm of Now let us evaluate the (r + 1)-th singular value of A. the best rank r approximation error, we obtain ˆ E ˆ 2 EF σr+1 (A)
ˆ C, (m + 1)(n + 1)E
(36)
ˆ denotes the error matrix for the best rank r approximation of submatrix Aˆ in where E the C-norm. Its C-norm definitely will not exceed the C-norm of the matrix E, which is the error matrix for the best approximation of the matrix A in the C-norm. Thus ˆ σr+1 (A)
(m + 1)(n + 1)EC .
(37)
Substituting (37) in (36), we find that the inequality of the form (35) holds for any submatrix of A, which finishes the proof. 2
244
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Remark 8. In [18], Theorem 8 was proved for the case n = m = r. Corollary 5. Taking n = m = 2r − 1 in (35) we conclude that A − CA+ R 4rEC . 11r C 5. Towards the numerical algorithm This section provides a preliminary and short description of how to find a quasiextreme submatrix with a rather large projective volume. Also we present the numerical results and compare the cross approximation errors for the submatrices of various sizes. In order to find a submatrix with a large projective volume, we propose to start with some r × r submatrix and then pad it with rows and columns. Note that an arbitrary nonsingular submatrix is not a very good choice of the initial submatrix. To be better off, we propose to begin with a submatrix obtained after the sequential use of the maxvol algorithm [19]. The main idea of the maxvol algorithm is to find an extremal submatrix [19] step by step among the submatrices that differ from the current one in a column or row. Then we need to determine rows and columns for adding to the current submatrix to increase the projective volume. The following lemma is helpful for that.
Lemma 4. Consider a matrix W = A b ∈ Cm×(n+1) , where the submatrix A has the maximal projective volume among all m × n submatrices of W . Then Vr (W ) Vr (A)
1 + A†r b22 .
Proof. We consider r < min(m, n), since otherwise the lemma is trivial. Let the matrix A have the singular value decomposition A = U ΣV ∗ with the square matrices U and V . Denote by Σr a square submatrix in Σ containing the r largest singular values. For the corresponding submatrices Ur and Vr we have Ar = Ur Σr Vr∗ . To estimate Vr (W ) consider Vr (Ur∗ W ). The following expressions were obtained in the proof of Property 4 of the volume (see Lemma 1): 2 det(W W ∗ ) = det(AA∗ ) 1 + A† b2 , 2 V(W ) = V(A) 1 + A† b2 . Using them along with the item 1 of Lemma 2 we obtain the required inequality
(38)
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
245
Fig. 1. The Chebyshev norm of the approximation errors depending on the rank r (averaged over 100 experiments). The first r singular values of the matrix A were set as 100 and the rest ones as 1.
Vr (W ) Vr (Ur∗ W ) = V =V = Ur∗ A Ur∗ b Σr Vr∗ Ur∗ b 2 2 † ∗ ∗ ∗ =V(Σr Vr ) 1 + (Σr Vr ) Ur b = Vr (A) 1 + A†r b . 2
2
2
Thus, we can introduce the matrix Cˆ = Aˆ†r R and choose a column of the maximum 2-norm. The equality (38) guarantees that this choice will lead to a guaranteed increase of volume. Similar extension can be carried out for the columns. This is exactly a way how we searched for a large projective volume submatrix. Note that an efficient algorithm of a submatrix padding in case of the standard volume of rectangular matrices (called maxvol2 ) was presented in [14]. Now we will describe our numerical experiments. We used the matrices of two types: with a sharp jump of the singular values after the r-th one (that guarantees the existence of an accurate rank r approximation) and with a smooth decrease of the singular values. The singular values of the matrices were set in advance, and the unitary factors U and V were constructed randomly by the orthogonalization of random Gaussian matrices Ω1 and Ω2 . In each experiment, the new matrices were generated. The experiments were performed for 100 ×100 matrices, with r changing from 1 to 50. The results are presented in Figs. 1 and 2. Note that the graphics corresponding to the maxvol algorithm contain only the errors that do not exceed the estimate (1). Otherwise, the error was assigned as (1). We see that in most cases it is a rough estimate, but if a chosen submatrix is far from the maximal volume, the approximation accuracy falls significantly. In case of the low ranks this fall is very prominent.
246
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Fig. 2. The Chebyshev norm of the approximation errors depending on the rank r (averaged over 100 experiments). The singular values of the matrix A were set as σk (A) = 100/k.
This simplification was done because we were interested in comparison of these approximations with the large volume submatrices results, rather than in the examination of the submatrix search algorithms. For the search of large volume (2r − 1) × (2r − 1) submatrix we took the submatrix with the large r-projective volume as an initial guess. That helped us to reduce the probability to get a small volume submatrix: the first r singular values were already large enough. Fig. 1 demonstrates that the average error is obviously less than the estimate (29), but the increase of the submatrix size (while preserving the approximation rank) can significantly reduce it. Moreover, we see that for the multidimensional arrays this reduction will exponentially depend on the dimension. Furthermore, it is noticeable that the error increases with the approximation rank growth though the singular values do not change. In case of the maximal volume square submatrices this growth is the most significant, while in case of the large r-projective volume the approximation accuracy is almost independent of the rank. Also it should be mentioned that for volume-based approximations the accuracy is hardly improved with increasing number of rows/columns, and it even becomes worse for the higher ranks. Again, that is because of the fact that the singular values do not change, and the coefficient grows with r increasing. The situation in Fig. 2 is slightly different. In case of the low ranks, while the coefficient at σr (A) in (1) is low, the use of a higher rank approximation is beneficial. However starting from r = 6 the best approximation corresponds to a submatrix with large r-projective volume. Also in case of the low ranks there is a high error for the approximations by square r×r submatrices. As noted above, this is because the submatrices are close to singular.
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
247
Fig. 3. Error distribution for the approximations of rank r = 9 per 1000 experiments. The first r singular values of the matrix A were set as 100 and the rest ones as 1.
It is worth noting that in cases of approximations based on large projective volume submatrices as well as large volume rectangular submatrices of rank r, the error decreases monotonically (though slowly, due to the slow decay of the singular numbers). And in case of the approximations based on square submatrices of large volume, the error almost stops changing after r ∼ 20 and starts to increase significantly for r ∼ 50. It means that the singular value decline does not balance the factor growth. Finally it makes sense to analyze the error distributions. Fig. 3 demonstrates the distributions of the Chebyshev error norms for the submatrices of various sizes. It can be seen that the smallest deviation corresponds exactly to the projective volume case: in 1000 experiments, the error norm never exceeded 0.9 while the factor provided by the Theorem 7 is 2. In case of the volume we see rather large deviations. The errors exceeding the value 2 are represented in the last column of the diagram. Thus, the use of submatrices with large projective volume can significantly increase the approximation accuracy and prevent the inverse of almost singular submatrices. 6. Conclusion In this paper, the new estimates for the pseudoskeleton approximations accuracy were obtained. The r-projective (or simply projective) volume was introduced as a generalization of the volume concept. The properties of the volume and the projective volume were stated. The estimates for the CGR approximations were refined. Finally, it was proved that the use of square submatrices with a large projective volume leads to a significantly higher accuracy. This fact was confirmed experimentally.
248
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
Let us also mention the meaning of Theorem 7 for numerical methods. Firstly, the bound (27) implies the equivalence of the row and column spaces for the low-rank approximations. In other words, there is no reason to consider approximations with different numbers of rows and columns, and hence a rectangular kernel matrix does not help (see, e.g. [14]). Secondly, it follows from Theorem 7 that finding submatrices with a large projective volume (but not necessarily large volume) is the most important part of CGR-like approximation methods. Acknowledgment The work was supported by the Russian Science Foundation, Grant 14-11-00806. References [1] E.E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing 64 (4) (2000) 367–380. [2] M. Bebendorf, Approximation of boundary element matrices, Numer. Math. 86 (4) (2000) 565–589. [3] M. Bebendorf, S. Rjasanow, Adaptive low-rank approximation of collocation matrices, Computing 70 (2003) 1–24. [4] M. Bebendorf, R. Grzhibovskis, Accelerating Galerkin BEM for linear elasticity using adaptive cross approximation, Math. Methods Appl. Sci. 29 (2006) 1721–1747. [5] S.L. Stavtsev, Application of the method of incomplete cross approximation to a nonstationary problem of vortex rings dynamics, Russian J. Numer. Anal. Math. Modelling 27 (3) (2012) 303–320. [6] I.V. Oseledets, E.E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays, Linear Algebra Appl. 432 (2010) 70–78. [7] S.A. Goreinov, On cross approximation of multi-index arrays, Dokl. Math. 77 (3) (2010) 404–406. [8] D.V. Savostyanov, Quasioptimality of maximum-volume cross interpolation of tensors, Linear Algebra Appl. 458 (2014) 217–244. [9] I.V. Oseledets, D.V. Savostyanov, E.E. Tyrtyshnikov, Tucker dimensionality reduction of threedimensional arrays in linear time, SIAM J. Matrix Anal. Appl. 30 (3) (2008) 939–956. [10] M. Gu, S.C. Eisenstat, Efficient algorithm for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput. 17 (4) (1996) 848–869. [11] S.A. Goreinov, N.L. Zamarashkin, E.E. Tyrtyshnikov, Pseudo-skeleton approximations, Dokl. Akad. Nauk 343 (2) (1995) 151–152. [12] S.A. Goreinov, E.E. Tyrtyshnikov, N.L. Zamarashkin, A theory of pseudo-skeleton approximations, Linear Algebra Appl. 261 (1997) 1–21. [13] S.A. Goreinov, E.E. Tyrtyshnikov, The maximal-volume concept in approximation by low-rank matrices, Contemp. Math. 268 (2001) 47–51. [14] A.Yu. Mikhalev, A Method of Constructing of Block Low-Rank Matrix Approximations, PhD, 2014, Moscow. [15] F.R. De Hoog, R.M.M. Mattheij, Subset selection for matrices, Linear Algebra Appl. 422 (2) (2007) 349–359. [16] F.R. De Hoog, R.M.M. Mattheij, A note on subset selection for matrices, Linear Algebra Appl. 434 (8) (2011) 1845–1850. [17] C. Boutsidis, D.P. Woodruff, Optimal CUR matrix decompositions, in: Proceedings of the 46th Annual ACM Symposium on Theory of Computing, 2014, pp. 353–362. [18] S.A. Goreinov, E.E. Tyrtyshnikov, Quasioptimality of skeleton approximation of a matrix in the Chebyshev norm, Dokl. Math. 83 (3) (2011) 1–2. [19] S.A. Goreinov, I.V. Oseledets, D.V. Savostyanov, et al., How to find a good submatrix, in: V. Olshevsky, E. Tyrtyshnikov (Eds.), Matrix Methods: Theory, Algorithms, Applications, World Scientific Publishing, 2010, pp. 247–256. [20] P. Drienas, R. Kannan, M.W. Mahoney, Fast Monte Carlo algorithms for matrices: computing a low-rank approximation to a matrix, SIAM J. Comput. 36 (2006) 132–157.
A.I. Osinsky, N.L. Zamarashkin / Linear Algebra and its Applications 537 (2018) 221–249
249
[21] M.W. Mahoney, P. Drineas, CUR matrix decompositions for improved data analysis, Proc. Natl. Acad. Sci. 106 (3) (2009) 697–702. [22] M.W. Mahoney, P. Drineas, S. Muthukrishan, Relative-error CUR matrix decompositions, SIAM J. Matrix Anal. Appl. 30 (2) (2008) 844–881. [23] A.W. Marshall, I. Olkin, Inequalities: Theory of Majorization and Its Applications, Mathematics in Science and Engineering, vol. 143, Academic Press, 1979.