Sparse block factorization of saddle point matrices


Linear Algebra and its Applications

Contents lists available at ScienceDirect
www.elsevier.com/locate/laa

S. Lungten*,1, W.H.A. Schilders, J.M.L. Maubach

Centre for Analysis, Scientific Computing and Applications (CASA), Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB, Eindhoven, The Netherlands

Article info

Article history: Received 27 November 2014; Accepted 18 July 2015; Available online xxxx. Submitted by V. Mehrmann.
MSC: 15A23, 65F50
Keywords: Saddle point matrices; Sparse matrices; Transformation matrix; Block partitioning; Block factorization; Schilders' factorization

Abstract

The factorization method presented in this paper takes advantage of the special structures and properties of saddle point matrices. A variant of Gaussian elimination, equivalent to the Cholesky factorization, is suggested and implemented for factorizing saddle point matrices block-wise with small blocks of orders 1 and 2. Applied at block level, this Gaussian elimination also induces a block 3 × 3 structured factorization whose blocks have special properties. We compare the new block factorization with the Schilders factorization in terms of sparsity and computational complexity. The factorization can be used as a direct method, and also lends itself to preconditioning techniques. © 2015 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (S. Lungten), [email protected] (W.H.A. Schilders), [email protected] (J.M.L. Maubach).
1 The research work for this author was supported by the Erasmus Mundus IDEAS project and CASA, Eindhoven University of Technology.
http://dx.doi.org/10.1016/j.laa.2015.07.042
0024-3795/© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Indefinite matrices with special forms which occur in many scientific and engineering problems can be exploited efficiently by taking advantage of the structures and properties


of their blocks. We consider symmetric indefinite linear systems of the form (see the notation)

$$\mathring{\mathcal{A}}\,\mathring{u} = \mathring{b} \quad \text{with} \quad \mathring{\mathcal{A}} = \begin{bmatrix} \mathring{A} & \mathring{B}^T \\ \mathring{B} & 0 \end{bmatrix}, \quad \mathring{u} = \begin{bmatrix} \mathring{x} \\ \mathring{y} \end{bmatrix}, \quad \mathring{b} = \begin{bmatrix} \mathring{f} \\ \mathring{g} \end{bmatrix}, \tag{1}$$

where $\mathring{A} \in \mathbb{R}^{n\times n}$ is symmetric positive definite; $\mathring{B} \in \mathbb{R}^{m\times n}$ has full rank and m < n; $\mathring{x}, \mathring{f} \in \mathbb{R}^n$; and $\mathring{y}, \mathring{g} \in \mathbb{R}^m$.² In applications, the coefficient matrix $\mathring{\mathcal{A}}$ is usually sparse and large; its order can easily reach a million. Systems of the form (1) are known as saddle point problems, which result from the discretization of PDEs or coupled PDEs such as the Stokes problem, generally in the context of mixed finite element methods. Saddle point systems also arise in electronic circuit simulations [28,32], Maxwell's equations [22], economic models, and constrained optimization problems [9,14,15,17,30]. For example, consider the equality-constrained quadratic programming problem:

$$\min_{\mathring{x}}\; p(\mathring{x}) = \frac{1}{2}\mathring{x}^T \mathring{A}\, \mathring{x} - \mathring{x}^T \mathring{f} \quad \text{subject to} \quad \mathring{B}\mathring{x} = \mathring{g}. \tag{2}$$

The Karush–Kuhn–Tucker (KKT) conditions [14,36] for the solution to (2) give rise to the system (1), where the components of $\mathring{y}$ are the associated Lagrange multipliers. Thus the coefficient matrix $\mathring{\mathcal{A}}$ is also known as a KKT matrix, and it is nonsingular if (i) $\mathring{B}$ has full row rank and (ii) the reduced Hessian matrix $\mathring{Z}^T \mathring{A} \mathring{Z}$ is positive definite, where $\mathring{Z} \in \mathbb{R}^{n\times(n-m)}$ is a matrix whose columns span $\ker(\mathring{B})$ [26, p. 443].

Numerous solution methods for saddle point systems of the form (1) can be found in the literature, and many of them focus on preconditioning techniques for Krylov subspace iterative solvers [2,6,11,17,20–22,25,29]. A wide range of iterative methods, including preconditioning techniques, is covered in the detailed survey by Benzi, Golub and Liesen [1]. As direct methods (as opposed to iterative solvers), various techniques based on the symmetric indefinite factorization $P^T \mathring{\mathcal{A}} P = LDL^T$ can be found in [8,13,19,31,32,35], where P is a permutation matrix, L is a unit lower triangular matrix, and D is a block diagonal matrix with blocks of order 1 or 2. The permutation matrix P is introduced for (i) dynamic pivoting and (ii) reducing the fill-in in L if $\mathring{\mathcal{A}}$ is sparse. The block diagonal pivoting strategies are mainly due to Bunch and Kaufman [3], Bunch and Parlett [4], and Bunch, Kaufman and Parlett (BKP) [5].

In this paper, we propose a different transformation $\tau^T \mathring{\mathcal{A}}\, \tau = \mathcal{A}$, followed by a block Gaussian elimination factorization $P_\pi^T \mathcal{A}\, P_\pi = L_b D_b^{-1} L_b^T$, where:

(i) $L_b$ is block lower triangular with blocks of orders 1 and 2, and $D_b = \mathrm{diag}(L_b)$ is the block diagonal part of $L_b$ with blocks of orders 1 and 2.

² The original system (1) undergoes a transformation, so we use the symbol '˚' in its notation in order to represent the transformed system more conveniently without the symbol '˚'.


(ii) τ is an (n + m) × (n + m) transformation (possibly a permutation) matrix, which follows from a transformation of the linear constraint matrix $\mathring{B}$. The operator τ is chosen such that the $L_b D_b^{-1} L_b^T$ factorization is stable and has a sparse factor $L_b$.

(iii) $P_\pi$ is a simple, predefined (n + m) × (n + m) permutation matrix for a priori pivoting of $\mathcal{A}$.

The proposed $L_b D_b^{-1} L_b^T$ factorization method exploits the structure and properties of $\mathcal{A}$. For instance, the first m blocks of the block diagonal $D_b$ are the 2 × 2 pivots inheriting the structure and properties of $\mathcal{A}$, and the remaining n − m blocks are the 1 × 1 pivots. At scalar level, the $L_b D_b^{-1} L_b^T$ factorization has the same computational efficiency as the Cholesky factorization for symmetric positive definite matrices, as shown in Appendix A of this paper. Whenever we come across features related to the scalar level factorization, we refer to Appendix A.

There are also several other block factorization methods for $\mathring{\mathcal{A}}$ with larger blocks of order n, m or n − m, which are mostly based on either the Schur complement matrix $\mathring{B}\mathring{A}^{-1}\mathring{B}^T$ or the reduced Hessian matrix $\mathring{Z}^T\mathring{A}\mathring{Z}$. For example, the Schilders factorization [32] is a block 3 × 3 structured factorization with blocks of orders m and n − m, applied to $\tau^T \mathring{\mathcal{A}}\, \tau$ for a different τ. Later, in [9,10,12], it was used as a basis for implicit factorization in constructing different families of constraint preconditioners for saddle point matrices. We also produce such a 3 × 3 structured block factorization from the $L_b D_b^{-1} L_b^T$ factorization, and it turns out to be different from the ones in [1,9,10,32]. Since the work in this paper was originally dedicated to improving on the work by Schilders [32], we compare the new factorization with the Schilders factorization in terms of sparsity and computational complexity. The remaining sections of the paper are organized as follows.
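As a quick numerical illustration of the two KKT nonsingularity conditions recalled in the introduction (full row rank of $\mathring{B}$ and positive definiteness of the reduced Hessian), the following sketch builds a small saddle point matrix and checks both; sizes, seed and variable names are our choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite (1,1) block
B = rng.standard_normal((m, n))    # constraint block, full row rank

K = np.block([[A, B.T], [B, np.zeros((m, m))]])   # saddle point / KKT matrix

assert np.linalg.matrix_rank(B) == m                 # condition (i)
Z = np.linalg.svd(B)[2][m:].T                        # columns span ker(B)
assert np.all(np.linalg.eigvalsh(Z.T @ A @ Z) > 0)   # condition (ii)
assert np.linalg.matrix_rank(K) == n + m             # hence K is nonsingular
```

With both conditions satisfied, the KKT matrix is indefinite but nonsingular, which is the setting assumed throughout the paper.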
In Section 2, we discuss the required properties for τ and show that such matrices exist for symmetric saddle point problems. Section 3 is the main part of the paper, in which we present the proposed factorization $P_\pi^T \mathcal{A}\, P_\pi = L_b D_b^{-1} L_b^T$. It covers the existence, sparsity, computational complexity and stability of $L_b$, and the steps for solving (1) using $L_b$. A comparison of the new block factorization with the Schilders block factorization is given in Section 4. Numerical results for this comparison are provided in Section 5.

2. Determination of the transformation matrix τ

There are different ways to choose the transformation matrix τ, depending on the requirements on the transformed matrix B. For example, in [16,32], τ is chosen such that it results in $B_1$ being an m × m upper triangular matrix. With our aim of obtaining a stable and sparse block $L_b D_b^{-1} L_b^T$ factorization, we choose a transformation matrix

$$\tau = \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix}, \tag{3}$$


where P is an n × n permutation matrix and Q is an m × m orthogonal (possibly a permutation) matrix such that

$$Q^T \mathring{B} P = B = [\, B_1 \;\; B_2 \,], \tag{4}$$

a lower trapezoidal form with $B_1$ an m × m nonsingular lower triangular matrix and $B_2$ the remaining m × (n − m) part. The choice of τ here also ensures that $\mathring{A}$ is merely permuted by P, which is an essential property for the sparsity of the transformed saddle point matrix. In the following, we give a brief overview of how to determine a τ satisfying these conditions.

Typical $\mathring{B}$. If $\mathring{B} = [\mathring{b}_{ij}]$ is an incidence matrix with $\mathring{b}_{ij} \in \{-1, 0, 1\}$, which has at most two non-zeros (−1 and/or 1) in each column [28,32], then one can obtain an m × m row permutation matrix $P_r$ and an n × n column permutation matrix $P_c$ such that $P_r^T \mathring{B} P_c = B$ is lower trapezoidal. Thus τ in (3) is an (n + m) × (n + m) permutation matrix with $P = P_c$ and $Q = P_r$. An algorithm for such a permutation, which even yields a sparse $B_1^{-1}$, was developed in [23]. Systems of the form (1) with an incidence matrix $\mathring{B}$ arise in resistor network modeling [28], the Stokes equations, and many other applications with network topology.

General $\mathring{B}$. If $\mathring{B}$ is of more general form, then by applying a sparse QR transformation to it from the left, one can obtain an m × m orthogonal matrix $Q_1$ and an n × n permutation matrix $P_1$ such that

$$Q_1^T \mathring{B} P_1 = \tilde{B} = [\, \tilde{B}_1 \;\; \tilde{B}_2 \,].$$

Define an n × n permutation matrix

$$P_2 = \begin{bmatrix} I_a & 0 \\ 0 & I_{n-m} \end{bmatrix},$$

where $I_a = [\, e_m, \ldots, e_1 \,]$ is the flipped identity matrix, $e_i$ being the ith unit vector in $\mathbb{R}^m$. Then

$$I_a \tilde{B} P_2 = [\, I_a \tilde{B}_1 I_a \;\; I_a \tilde{B}_2 \,] = [\, B_1 \;\; B_2 \,] = B.$$

Hence, the required τ in (3) is determined by choosing the m × m orthogonal matrix $Q = Q_1 I_a$ and the n × n permutation matrix $P = P_1 P_2$. More details about the sparse QR transformation can be found in [7, pp. 82–95].

3. Block $L_b D_b^{-1} L_b^T$ factorization

Although the transformed saddle point matrix $\mathcal{A}$ has a special block structure with blocks of orders m and n − m, we do not factorize it using these blocks directly, for two reasons. Firstly, more computational work has to be spent on computing the


inverses, products, and sums of large blocks during the factorization, and this approach requires separate algorithms for the block matrix operations. Secondly, it has to go through a certain type of conjecture and may lead to a number of factorizations which differ only slightly from each other. Our aim is to first exploit the structure of $\mathcal{A}$, partition it into small blocks of orders 1 and 2, and then carry out a unique factorization in exact arithmetic using a simple, inexpensive and robust algorithm.

3.1. Block partitioning

We consider the transformation matrix τ of size (n + m) × (n + m) defined in (3), which is applied to the saddle point matrix $\mathring{\mathcal{A}}$ as follows:

$$\mathcal{A} = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} = \begin{bmatrix} P^T & 0 \\ 0 & Q^T \end{bmatrix} \begin{bmatrix} \mathring{A} & \mathring{B}^T \\ \mathring{B} & 0 \end{bmatrix} \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix} = \tau^T \mathring{\mathcal{A}}\, \tau, \tag{5}$$









where B is the lower trapezoidal form in (4) and A is the permuted form of $\mathring{A}$ due to P. Because of the lower trapezoidal form of B, the transformed matrix $\mathcal{A}$ can be partitioned into a block 3 × 3 structure:

$$\mathcal{A} = \begin{bmatrix} A_{11} & A_{12} & B_1^T \\ A_{21} & A_{22} & B_2^T \\ B_1 & B_2 & 0 \end{bmatrix},$$

related to Mastronardi and Van Dooren [24], and Pestana and Wathen [27]. Besides $B_1$ being lower triangular, the blocks $A_{11}$ and $A_{22}$ already have nice properties: they are symmetric positive definite and sparse. By taking advantage of the diagonal parts of $A_{11}$ and $B_1$, we reorder $\mathcal{A}$ by applying a simple permutation matrix $P_\pi$ of size (n + m) × (n + m) such that the permuted $\mathcal{A}$ can be partitioned into a block n × n structure with blocks of orders 1 and 2. The partitioning leads to four types of blocks: 2 × 2, 1 × 2, 2 × 1, and 1 × 1 (scalar) blocks, residing in their respective domains such that

$$n^2 = \underbrace{m^2}_{2\times 2 \text{ blocks}} \;+\; \underbrace{2m(n-m)}_{1\times 2 \;\&\; 2\times 1 \text{ blocks}} \;+\; \underbrace{(n-m)^2}_{1\times 1 \text{ blocks}}.$$

The most significant feature of this partitioned form is that its block diagonal part is given by the direct sum

$$\bigoplus_{k=1}^{m} \begin{bmatrix} a_{kk} & b_{kk} \\ b_{kk} & 0 \end{bmatrix} \;\oplus\; \bigoplus_{k=2m+1}^{n+m} a_{kk},$$

which forms the a priori pivots for the block $L_b D_b^{-1} L_b^T$ factorization. Furthermore, all the 2 × 2 blocks of the block lower triangular part of the partitioned matrix are retained


in the factor $L_b$. This ensures the sparsity of the first 2m rows of $L_b$. When it comes to partitioning and factorization, one needs to be careful with the indices of the blocks of orders 1 and 2, which are defined according to the elements of the transformed matrices A and B, i.e., $A = [a_{ij}]$, 1 ≤ i, j ≤ n, and $B = [b_{ij}]$, 1 ≤ i ≤ m, 1 ≤ j ≤ n. This becomes clearer in the examples and discussions in the following parts of the section. The permutation matrix $P_\pi$ that we use here is defined as in [18,32].

Definition 1. Consider the positive integers n and m in (1). Let $N_{n+m} = \{1, 2, \ldots, n+m\}$. Define a permutation $\pi : N_{n+m} \to N_{n+m}$ by

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & \cdots & 2m-1 & 2m & 2m+1 & \cdots & n+m \\ 1 & n+1 & 2 & n+2 & \cdots & m & n+m & m+1 & \cdots & n \end{pmatrix}.$$

The (n + m) × (n + m) permutation matrix $P_\pi$ related to π is given by

$$P_\pi = [\, e_1,\; e_{n+1},\; e_2,\; e_{n+2},\; \ldots,\; e_m,\; e_{n+m},\; e_{m+1},\; \ldots,\; e_n \,], \tag{6}$$

where $e_i$ is the ith unit vector of length n + m.

To gain insight into how the application of $P_\pi$ to $\mathcal{A}$ partitions it into a block n × n structure, consider a saddle point matrix $\mathring{\mathcal{A}}$ of size 8 × 8 with n = 5 and m = 3. The transformed $\mathcal{A}$ (block 3 × 3 structure) and the partitioned $P_\pi^T \mathcal{A}\, P_\pi$ (block 5 × 5 structure with blocks of orders 1 and 2) are as follows:

$$\mathcal{A} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & b_{11} & b_{21} & b_{31} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & 0 & b_{22} & b_{32} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & 0 & 0 & b_{33} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & b_{14} & b_{24} & b_{34} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & b_{15} & b_{25} & b_{35} \\
b_{11} & 0 & 0 & b_{14} & b_{15} & 0 & 0 & 0 \\
b_{21} & b_{22} & 0 & b_{24} & b_{25} & 0 & 0 & 0 \\
b_{31} & b_{32} & b_{33} & b_{34} & b_{35} & 0 & 0 & 0
\end{bmatrix}$$

and

$$P_\pi^T \mathcal{A}\, P_\pi = \begin{bmatrix}
a_{11} & b_{11} & a_{12} & b_{21} & a_{13} & b_{31} & a_{14} & a_{15} \\
b_{11} & 0 & 0 & 0 & 0 & 0 & b_{14} & b_{15} \\
a_{21} & 0 & a_{22} & b_{22} & a_{23} & b_{32} & a_{24} & a_{25} \\
b_{21} & 0 & b_{22} & 0 & 0 & 0 & b_{24} & b_{25} \\
a_{31} & 0 & a_{32} & 0 & a_{33} & b_{33} & a_{34} & a_{35} \\
b_{31} & 0 & b_{32} & 0 & b_{33} & 0 & b_{34} & b_{35} \\
a_{41} & b_{14} & a_{42} & b_{24} & a_{43} & b_{34} & a_{44} & a_{45} \\
a_{51} & b_{15} & a_{52} & b_{25} & a_{53} & b_{35} & a_{54} & a_{55}
\end{bmatrix}, \tag{7}$$

where the permutation matrix for this case is $P_\pi = [\, e_1, e_6, e_2, e_7, e_3, e_8, e_4, e_5 \,]$. We observe that by applying $P_\pi$ symmetrically to $\mathcal{A}$, entries with matching indices get coupled into small blocks. In fact, the permutation matrix $P_\pi$ shuffles the first and the last m columns of $\mathcal{A}$, and similarly the rows. For general n and m, let $F = P_\pi^T \mathcal{A}\, P_\pi$. Then the blocks $F_{ij}$ of orders 1 and 2 for 1 ≤ i, j ≤ n are given by:


$$F_{ij} = \begin{cases}
\begin{bmatrix} a_{ii} & b_{ii} \\ b_{ii} & 0 \end{bmatrix}, & 1 \le i = j \le m, \\[6pt]
\begin{bmatrix} a_{ij} & 0 \\ b_{ij} & 0 \end{bmatrix}, & 1 \le j < i \le m, \\[6pt]
[\, a_{ij} \;\; b_{ji} \,], & 1 \le j \le m < i \le n, \\[4pt]
[\, a_{ij} \,], & m < j \le i \le n,
\end{cases}$$

and the blocks above the block diagonal follow by symmetry, $F_{ji} = F_{ij}^T$; in particular, $F_{ij} = \begin{bmatrix} a_{ij} \\ b_{ij} \end{bmatrix}$ is a 2 × 1 block for $1 \le i \le m < j \le n$.
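The interleaving action of $P_\pi$ is easy to reproduce programmatically. The following sketch (the helper name perm_pi is ours, not the paper's) builds the 0-based index order of Definition 1 and checks it against the n = 5, m = 3 example above:

```python
import numpy as np

def perm_pi(n, m):
    """0-based index order [1, n+1, 2, n+2, ..., m, n+m, m+1, ..., n] of Definition 1."""
    order = []
    for k in range(m):
        order += [k, n + k]          # interleave row k of A with row k of B
    order += list(range(m, n))       # trailing rows that become 1x1 blocks
    return order

n, m = 5, 3
order = perm_pi(n, m)
assert order == [0, 5, 1, 6, 2, 7, 3, 4]     # i.e. P_pi = [e1, e6, e2, e7, e3, e8, e4, e5]

P_pi = np.eye(n + m)[:, order]
rng = np.random.default_rng(0)
A = rng.standard_normal((n + m, n + m))
A = A + A.T                                  # any symmetric matrix will do here
F = P_pi.T @ A @ P_pi
assert np.allclose(F, A[np.ix_(order, order)])   # symmetric application = index shuffle
```

Since $P_\pi$ is a permutation, applying it symmetrically is just a simultaneous row and column shuffle, which is how the blocks $F_{ij}$ are assembled in practice.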

3.2. Existence, uniqueness and sparsity of the block $L_b D_b^{-1} L_b^T$ factorization

Based on the structure and properties of the blocks of orders 1 and 2, we show that the block $L_b D_b^{-1} L_b^T$ factorization of $P_\pi^T \mathcal{A}\, P_\pi$ exists and is unique.

Theorem 1. Suppose the (n + m) × (n + m) permutation matrix $P_\pi$ is defined as in (6) and $F = P_\pi^T \mathcal{A}\, P_\pi$, where $\mathcal{A}$ is the (n + m) × (n + m) transformed symmetric indefinite matrix in (5). Then there exists an (n + m) × (n + m) nonsingular block lower triangular matrix $L_b$ with blocks of orders 1 and 2 such that

$$F = L_b D_b^{-1} L_b^T \tag{8}$$

and the 2 × 2 blocks satisfy $L_{ij} = F_{ij}$, 1 ≤ j ≤ i ≤ m, where $D_b = \mathrm{diag}(L_b)$ is the block diagonal part of $L_b$. The factorization is unique with respect to $P_\pi$.

Proof. Similar to (A.2) in Appendix A, one can deduce iteratively

$$L_{ij} = F_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{kk}^{-1} L_{jk}^T, \tag{9}$$

where

$$L_{ij} = \begin{cases} 2 \times 2 \text{ blocks} & \text{for } 1 \le j \le i \le m; \\ 1 \times 2 \text{ blocks} & \text{for } 1 \le j \le m < i \le n; \\ 1 \times 1 \text{ blocks} & \text{for } m < j \le i \le n. \end{cases} \tag{10}$$

We show the existence of (9) by induction on j ≤ i ≤ n as follows.

(i) 2 × 2 blocks. Note that for each k = 1, ..., m,

$$F_{kk}^{-1} = \begin{bmatrix} a_{kk} & b_{kk} \\ b_{kk} & 0 \end{bmatrix}^{-1} = \frac{1}{b_{kk}^2} \begin{bmatrix} 0 & b_{kk} \\ b_{kk} & -a_{kk} \end{bmatrix},$$

which exists since $b_{kk} \neq 0$. All the 2 × 2 blocks satisfy $L_{ij} = F_{ij}$, 1 ≤ j ≤ i ≤ m, which is shown in the following.


For j = 1, ..., m and i = j, ..., m,

$$L_{ij} = F_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{kk}^{-1} L_{jk}^T
= F_{ij} - \sum_{k=1}^{j-1} \begin{bmatrix} a_{ik} & 0 \\ b_{ik} & 0 \end{bmatrix} \frac{1}{b_{kk}^2} \begin{bmatrix} 0 & b_{kk} \\ b_{kk} & -a_{kk} \end{bmatrix} \begin{bmatrix} a_{jk} & b_{jk} \\ 0 & 0 \end{bmatrix}
= F_{ij} - \sum_{k=1}^{j-1} \frac{1}{b_{kk}^2} \begin{bmatrix} 0 & a_{ik} b_{kk} \\ 0 & b_{ik} b_{kk} \end{bmatrix} \begin{bmatrix} a_{jk} & b_{jk} \\ 0 & 0 \end{bmatrix} = F_{ij},$$

since each term of the sum vanishes.

(ii) 1 × 2 blocks. Let the 1 × 2 blocks be $L_{ij} = [\, \alpha_{ij} \;\; \beta_{ji} \,]$, where 1 ≤ j ≤ m < i ≤ n.

From (9) and (10), for j = 1, ..., m and i = m + 1, ..., n, we obtain

$$L_{ij} = F_{ij} - \sum_{k=1}^{j-1} \underbrace{L_{ik}}_{1\times 2}\, \underbrace{L_{kk}^{-1}}_{2\times 2}\, \underbrace{L_{jk}^T}_{2\times 2}
= [\, a_{ij} \;\; b_{ji} \,] - \sum_{k=1}^{j-1} [\, \alpha_{ik} \;\; \beta_{ki} \,]\, \frac{1}{b_{kk}^2} \begin{bmatrix} 0 & b_{kk} \\ b_{kk} & -a_{kk} \end{bmatrix} \begin{bmatrix} a_{jk} & b_{jk} \\ 0 & 0 \end{bmatrix}
= [\, a_{ij} \;\; b_{ji} \,] - \sum_{k=1}^{j-1} \Big[\, \frac{a_{jk}}{b_{kk}}\, \beta_{ki} \;\;\; \frac{b_{jk}}{b_{kk}}\, \beta_{ki} \,\Big].$$

Hence

$$\alpha_{ij} = a_{ij} - \sum_{k=1}^{j-1} \frac{a_{jk}}{b_{kk}}\, \beta_{ki} \quad \text{and} \quad \beta_{ji} = b_{ji} - \sum_{k=1}^{j-1} \frac{b_{jk}}{b_{kk}}\, \beta_{ki}, \qquad 1 \le j \le m < i \le n, \tag{11}$$

which exist since $b_{kk} \neq 0$.

(iii) 1 × 1 blocks. Let the 1 × 1 blocks be $L_{ij} = [\, \alpha_{ij} \,]$, where m < j ≤ i ≤ n. From (9), (10) and (11), for j = m + 1, ..., n and i = j, ..., n, we obtain:


$$L_{ij} = F_{ij} - \sum_{k=1}^{m} \underbrace{L_{ik}}_{1\times 2}\, \underbrace{L_{kk}^{-1}}_{2\times 2}\, \underbrace{L_{jk}^T}_{2\times 1} - \sum_{k=m+1}^{j-1} \underbrace{L_{ik}}_{1\times 1}\, \underbrace{L_{kk}^{-1}}_{1\times 1}\, \underbrace{L_{jk}^T}_{1\times 1}$$
$$= a_{ij} - \sum_{k=1}^{m} [\, \alpha_{ik} \;\; \beta_{ki} \,]\, \frac{1}{b_{kk}^2} \begin{bmatrix} 0 & b_{kk} \\ b_{kk} & -a_{kk} \end{bmatrix} \begin{bmatrix} \alpha_{jk} \\ \beta_{kj} \end{bmatrix} - \sum_{k=m+1}^{j-1} \alpha_{ik}\, \alpha_{kk}^{-1}\, \alpha_{jk}$$
$$= a_{ij} - \sum_{k=1}^{m} \frac{1}{b_{kk}^2} \big( \alpha_{ik} b_{kk} \beta_{kj} + \beta_{ki} b_{kk} \alpha_{jk} - \beta_{ki} a_{kk} \beta_{kj} \big) - \sum_{k=m+1}^{j-1} \frac{\alpha_{ik} \alpha_{jk}}{\alpha_{kk}}.$$

Thus the 1 × 1 blocks (scalars) are given by

$$\alpha_{ij} = a_{ij} + \sum_{k=1}^{m} \Big( \frac{\beta_{ki} \beta_{kj}}{b_{kk}^2}\, a_{kk} - \frac{\alpha_{ik} \beta_{kj} + \alpha_{jk} \beta_{ki}}{b_{kk}} \Big) - \sum_{k=m+1}^{j-1} \frac{\alpha_{ik} \alpha_{jk}}{\alpha_{kk}}, \qquad m < j \le i \le n, \tag{12}$$

which exist only if $\alpha_{ii} \neq 0$ for each i = j = m + 1, ..., n. To see this, let a rectangular matrix $\mathcal{H} \in \mathbb{R}^{(n-m)\times n}$ be defined by $\mathcal{H} = [\, H \;\; I_{n-m} \,]$, where $I_{n-m}$ is the (n − m) × (n − m) identity matrix and $H = [h_{kj}]$ is such that

$$h_{kj} = -\frac{\beta_{ji}}{b_{jj}}, \qquad 1 \le j \le m, \quad 1 \le k \le n-m, \quad i = k + m.$$

Define $G = \mathcal{H} A \mathcal{H}^T$. Then the elements of the matrix $G = [g_{rs}]$ are given by

$$g_{rs} = a_{ij} + \sum_{k=1}^{m} \Big( \frac{\beta_{ki} \beta_{kj}}{b_{kk}^2}\, a_{kk} - \frac{\alpha_{ik} \beta_{kj} + \alpha_{jk} \beta_{ki}}{b_{kk}} \Big), \qquad 1 \le r, s \le n-m, \; i = r + m, \; j = s + m. \tag{13}$$

Computation of $g_{rs}$ is demonstrated in Appendix B for n = 4, m = 2. Combining (12) and (13), we get

$$\alpha_{ij} = g_{rs} - \sum_{k=m+1}^{j-1} \frac{\alpha_{ik} \alpha_{jk}}{\alpha_{kk}}, \qquad 1 \le r, s \le n-m, \; i = r + m, \; j = s + m. \tag{14}$$

From (14), it is evident that the 1 × 1 block elements come from a lower triangular matrix L such that $L D^{-1} L^T = G$, where $D = \mathrm{diag}(L)$. Since the matrix $\mathcal{H}$ has full rank n − m, the matrix G is symmetric positive definite. Hence, by the theorem given in Appendix A, the factorization $L D^{-1} L^T = G$ exists and $\alpha_{ii} > 0$ for each i = j = m + 1, ..., n. The uniqueness also follows from the same theorem in Appendix A. □


From (i), (ii) and (iii), the blocks of $L_b$ are given by

$$L_{ij} = \begin{cases}
\begin{bmatrix} a_{ii} & b_{ii} \\ b_{ii} & 0 \end{bmatrix}, & 1 \le i = j \le m, \\[6pt]
\begin{bmatrix} a_{ij} & 0 \\ b_{ij} & 0 \end{bmatrix}, & 1 \le j < i \le m, \\[6pt]
[\, \alpha_{ij} \;\; \beta_{ji} \,], & 1 \le j \le m < i \le n, \\[4pt]
[\, \alpha_{ij} \,], & m < j \le i \le n,
\end{cases} \tag{15}$$

where the block pivots come from

$$D_b = \mathrm{diag}(L_b) = \bigoplus_{k=1}^{m} \begin{bmatrix} a_{kk} & b_{kk} \\ b_{kk} & 0 \end{bmatrix} \;\oplus\; \bigoplus_{k=2m+1}^{n+m} \alpha_{kk}.$$

For example, the $L_b D_b^{-1} L_b^T$ factorization of F in (7) gives:

$$L_b = \begin{bmatrix}
a_{11} & b_{11} & 0 & 0 & 0 & 0 & 0 & 0 \\
b_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & b_{22} & 0 & 0 & 0 & 0 \\
b_{21} & 0 & b_{22} & 0 & 0 & 0 & 0 & 0 \\
a_{31} & 0 & a_{32} & 0 & a_{33} & b_{33} & 0 & 0 \\
b_{31} & 0 & b_{32} & 0 & b_{33} & 0 & 0 & 0 \\
\alpha_{41} & \beta_{14} & \alpha_{42} & \beta_{24} & \alpha_{43} & \beta_{34} & \alpha_{44} & 0 \\
\alpha_{51} & \beta_{15} & \alpha_{52} & \beta_{25} & \alpha_{53} & \beta_{35} & \alpha_{54} & \alpha_{55}
\end{bmatrix}. \tag{16}$$

Obviously, only the 1 × 2 and 1 × 1 blocks have to be computed, while the 2 × 2 blocks are taken straight from F. Hence, the first 2m rows of $L_b$ are sparse, since their elements come directly from the sparse blocks $A_{11}$ and $B_1$. Algorithm 1 gives a MATLAB version of the factorization $F = L_b D_b^{-1} L_b^T$. The function s involved in the algorithm is defined by

$$s(l) = \begin{cases}
[\, 2l-1, \; 2l \,] & \text{if } l \text{ is a row index and } 1 \le l \le m, \\
[\, 2l-1, \; 2l \,]^T & \text{if } l \text{ is a column index and } 1 \le l \le m, \\
[\, l+m \,] & \text{if } m+1 \le l \le n.
\end{cases}$$

It determines the positions of the elements of the lth block. We use $V(s(k), :)$ to optimize the computational complexity (see Appendix A).

Below, in Definition 2, we introduce a permutation σ that swaps the two rows of every 2 × 2 block row of $L_b$. This is done in order to obtain a lower triangular matrix $\hat{L}$, which can be used to solve the system (1) through backward substitution.


Algorithm 1. For a block structured symmetric indefinite matrix $F \in \mathbb{R}^{(n+m)\times(n+m)}$ with blocks of orders 1 and 2, the algorithm computes a nonsingular block lower triangular matrix $L_b \in \mathbb{R}^{(n+m)\times(n+m)}$ with blocks of orders 1 and 2 such that $F = L_b D_b^{-1} L_b^T$. For j = 1, ..., n, the jth block column $L_b(s(j) : n+m,\, j)$ overwrites $F(s(j) : n+m,\, s(j))$.

1: for j = 1 : n
2:   for k = 1 : j − 1
3:     V(s(k), :) = F(s(k), s(k))⁻¹ F(s(j), s(k))ᵀ
4:   end for
5:   for i = j : n
6:     F(s(i), s(j)) = F(s(i), s(j)) − Σ_{l=1}^{j−1} F(s(i), s(l)) V(s(l), :)
7:   end for
8: end for
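For concreteness, here is a hedged Python transcription of the factorization: it follows recursion (9) block by block, without the in-place overwriting and the V caching of Algorithm 1, and with a 0-based analogue of the function s. It is an illustrative sketch, not the authors' code:

```python
import numpy as np

def s(l, m):
    # 0-based positions of block l: a 2x2 block for l < m, a 1x1 block otherwise
    return [2 * l, 2 * l + 1] if l < m else [l + m]

def block_ldl(F, n, m):
    """Compute Lb with F = Lb @ inv(Db) @ Lb.T, following recursion (9)."""
    Lb = np.zeros_like(F)
    for j in range(n):
        sj = s(j, m)
        for i in range(j, n):
            si = s(i, m)
            blk = F[np.ix_(si, sj)].copy()
            for k in range(j):
                sk = s(k, m)
                Lkk = Lb[np.ix_(sk, sk)]
                blk -= Lb[np.ix_(si, sk)] @ np.linalg.solve(Lkk, Lb[np.ix_(sj, sk)].T)
            Lb[np.ix_(si, sj)] = blk
    return Lb

# demo on a transformed saddle point matrix with n = 5, m = 3
rng = np.random.default_rng(0)
n, m = 5, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                       # symmetric positive definite
B = rng.standard_normal((m, n))
B[:, :m] = np.tril(B[:, :m]) + 3.0 * np.eye(m)    # lower trapezoidal, nonzero diagonal
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
order = [0, 5, 1, 6, 2, 7, 3, 4]                  # the permutation pi for n = 5, m = 3
F = K[np.ix_(order, order)]

Lb = block_ldl(F, n, m)
Db = np.zeros_like(Lb)                            # block diagonal part of Lb
for l in range(n):
    sl = s(l, m)
    Db[np.ix_(sl, sl)] = Lb[np.ix_(sl, sl)]
assert np.allclose(Lb @ np.linalg.inv(Db) @ Lb.T, F)
```

Running it on the n = 5, m = 3 example reproduces the sparsity pattern of (16): the first 2m rows of $L_b$ contain only entries copied from F.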

Definition 2. Let a permutation $\sigma : N_{n+m} \to N_{n+m}$ be defined by

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & 2m-1 & 2m & 2m+1 & \cdots & n+m \\ 2 & 1 & \cdots & 2m & 2m-1 & 2m+1 & \cdots & n+m \end{pmatrix}.$$

The related permutation matrix $P_\sigma$ of size (n + m) × (n + m) is given by

$$P_\sigma = [\, e_2,\; e_1,\; \ldots,\; e_{2m},\; e_{2m-1},\; e_{2m+1},\; \ldots,\; e_{n+m} \,]. \tag{17}$$

It is easy to see that

$$P_\sigma^T P_\pi^T \mathcal{A}\, P_\pi P_\sigma = \hat{L} D_b^{-1} \hat{L}^T, \tag{18}$$

where $\hat{L} = P_\sigma^T L_b$ is a lower triangular matrix whose block elements, say $\tilde{L}_{ij}$, are given by

$$\tilde{L}_{ij} = \begin{cases}
\begin{bmatrix} b_{ii} & 0 \\ a_{ii} & b_{ii} \end{bmatrix}, & 1 \le i = j \le m; \\[6pt]
\begin{bmatrix} b_{ij} & 0 \\ a_{ij} & 0 \end{bmatrix}, & 1 \le j < i \le m; \\[6pt]
[\, \alpha_{ij} \;\; \beta_{ji} \,], & 1 \le j \le m < i \le n; \\[4pt]
[\, \alpha_{ij} \,], & m < j \le i \le n,
\end{cases} \tag{19}$$

meaning that each of the first m 2 × 2 block rows of $L_b$ is row-interchanged, while the other n − m block rows remain unaltered. From the example of $L_b$ in (16), we obtain


$$P_\sigma = [\, e_2, e_1, e_4, e_3, e_6, e_5, e_7, e_8 \,],$$

which gives

$$\hat{L} = P_\sigma^T L_b = \begin{bmatrix}
b_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{11} & b_{11} & 0 & 0 & 0 & 0 & 0 & 0 \\
b_{21} & 0 & b_{22} & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & b_{22} & 0 & 0 & 0 & 0 \\
b_{31} & 0 & b_{32} & 0 & b_{33} & 0 & 0 & 0 \\
a_{31} & 0 & a_{32} & 0 & a_{33} & b_{33} & 0 & 0 \\
\alpha_{41} & \beta_{14} & \alpha_{42} & \beta_{24} & \alpha_{43} & \beta_{34} & \alpha_{44} & 0 \\
\alpha_{51} & \beta_{15} & \alpha_{52} & \beta_{25} & \alpha_{53} & \beta_{35} & \alpha_{54} & \alpha_{55}
\end{bmatrix}.$$

Hence

$$\hat{L} D_b^{-1} \hat{L}^T u = d \iff \hat{L} v = d, \quad w = D_b v, \quad \hat{L}^T u = w,$$

where $d = P_\sigma^T P_\pi^T \tau^T \mathring{d}$ and $\mathring{u} = \tau P_\pi P_\sigma u$, which solves the system (1) through forward and backward substitution. The block diagonal $D_b$ does not need to be inverted, since the vector w is obtained directly from the matrix-vector product $D_b v$. We only need to extract the block diagonal matrix $D_b$ from $L_b$ using the function s.

3.3. Computational complexity of floating point operations (flops)

We determine the total number of flops required for the whole computational process involved in the new factorization and compare it with the Cholesky flop count required for factorizing $\mathring{\mathcal{A}}$. With regard to the determination of the transformation matrix τ, the number of floating point operations required in the worst case (for instance, a left Householder QR transformation of $\mathring{B}$ to B with m < n) is

$$\sim 2m^2 n - \frac{2}{3} m^3. \tag{20}$$
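Returning to the solution steps above: once $\hat{L}$ and $D_b$ are available, the solve costs only two triangular substitutions and one block diagonal product. A minimal sketch follows, with 1 × 1 diagonal blocks only (a simplification of the mixed 1 × 1 / 2 × 2 case) and Lhat standing in for $P_\sigma^T L_b$:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
N = 8
Lhat = np.tril(rng.standard_normal((N, N)), -1) + np.diag(2.0 + rng.random(N))
Db = np.diag(np.diag(Lhat))          # stand-in block diagonal (1x1 blocks only)
K = Lhat @ np.linalg.inv(Db) @ Lhat.T
d = rng.standard_normal(N)

v = solve_triangular(Lhat, d, lower=True)      # Lhat v = d    (forward substitution)
w = Db @ v                                     # w = Db v      (no inversion of Db)
u = solve_triangular(Lhat.T, w, lower=False)   # Lhat^T u = w  (backward substitution)
assert np.allclose(K @ u, d)
```

The point of the identity $\hat{L} D_b^{-1} \hat{L}^T u = d$ is visible here: $D_b$ appears only in a product, never in an explicit inverse.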

The choice of τ in (3) results not only in a sparse factor $L_b$, but also in a reduced number of flops. In the following, we show that the total number of flops required for transforming $\mathring{\mathcal{A}}$ and factorizing $\mathcal{A}$ is less than that of Cholesky by $O(m^3)$. Since the 2 × 2 blocks do not have to be computed, we count flops only for the 1 × 2 and 1 × 1 blocks.

1 × 2 blocks. Rewriting (ii) in Theorem 1 by using Algorithm 2 in Appendix A gives:

For j = 1, ..., m:
  for k = 1, ..., j − 1:
    $[\, v_1(k) \;\; v_2(k) \,] = \Big[\, \dfrac{a_{jk}}{b_{kk}} \;\; \dfrac{b_{jk}}{b_{kk}} \,\Big]$
  for i = m + 1, ..., n:
    $[\, \alpha_{ij} \;\; \beta_{ji} \,] = \Big[\, a_{ij} - \sum_{k=1}^{j-1} v_1(k)\beta_{ki} \;\;\; b_{ji} - \sum_{k=1}^{j-1} v_2(k)\beta_{ki} \,\Big]$

Therefore

$$\mathrm{flops}(1 \times 2 \text{ blocks}) = \sum_{j=1}^{m} 2\big[ (j-1) + 2(n-m)(j-1) \big] \sim 2\big( m^2 n - m^3 \big). \tag{21}$$

1 × 1 blocks. From (iii) in Theorem 1, we obtain:

For j = m + 1, ..., n:
  for k = 1, ..., m:
    $u_1(k) = \dfrac{\beta_{kj}}{b_{kk}}, \qquad u_2(k) = \dfrac{\alpha_{jk} - a_{kk}\, u_1(k)}{b_{kk}}$
  for k = m + 1, ..., j − 1:
    $w(k) = \dfrac{\alpha_{jk}}{\alpha_{kk}}$
  for i = j, ..., n:
    $\alpha_{ij} = a_{ij} - \sum_{k=1}^{m} \big( \alpha_{ik} u_1(k) + \beta_{ki} u_2(k) \big) - \sum_{k=m+1}^{j-1} \alpha_{ik} w(k)$

Hence

$$\mathrm{flops}(1 \times 1 \text{ blocks}) = \sum_{j=1}^{n-m} \big[ 5m + (j-1) + 4m(n-m-j+1) + 2(n-m-j+1)(j-1) \big] \sim \frac{1}{3} n^3 + \frac{5}{3} m^3 - 3m^2 n + mn^2. \tag{22}$$

The sum of (20), (21) and (22) gives the total number of floating point operations:

$$\mathrm{flops}(\text{total}) \sim \frac{1}{3} n^3 - m^3 + m^2 n + mn^2. \tag{23}$$


The Cholesky flop count for factorizing $\mathring{\mathcal{A}}$ is

$$\frac{1}{3}(n+m)^3 + \frac{1}{2}(n+m)^2 - \frac{1}{6}(n+m) \sim \frac{1}{3} n^3 + \frac{1}{3} m^3 + m^2 n + mn^2.$$

Thus the number of operations in (23) is less than that of Cholesky by $\sim \frac{4}{3} m^3$.
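The bookkeeping behind this comparison can be cross-checked mechanically. The sketch below compares three times the polynomial in (23) with three times the leading Cholesky terms, in exact integer arithmetic (function names are ours):

```python
# cross-check of the leading-order flop counts; the factor of 3 keeps everything integer
def flops3_new(n, m):
    return n**3 - 3 * m**3 + 3 * m**2 * n + 3 * m * n**2    # 3 x formula (23)

def flops3_cholesky(n, m):
    return n**3 + m**3 + 3 * m**2 * n + 3 * m * n**2        # 3 x leading terms of (n+m)^3/3

for n, m in [(100, 30), (1000, 300), (5000, 1200)]:
    # the gap is exactly 4 m^3 at leading order, i.e. (4/3) m^3 in flops
    assert flops3_cholesky(n, m) - flops3_new(n, m) == 4 * m**3
```

This confirms the stated $\frac{4}{3} m^3$ advantage for any n and m at leading order.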

3.4. Numerical stability

Pivoting strategies are required in order to address the fundamental issue of bounding the growth factor ρ that occurs during Gaussian elimination, defined by

$$\rho(A) = \frac{\max_{i,j,l} |a_{ij}^{(l)}|}{\max_{i,j} |a_{ij}|},$$

where $a_{ij}^{(l)}$, l = 1, ..., n, are the elements of the reduced matrix $A^{(l)}$ at the lth step of Gaussian elimination of an n × n matrix $A = A^{(1)}$.

In (11) and (12), the factors $b_{ki}/b_{kk}$, 1 ≤ k ≤ j − 1, may lead to arbitrarily large updates of $\alpha_{ij}$ and $\beta_{ji}$, for each 1 ≤ j ≤ m < i ≤ n and m < j ≤ i ≤ n. So, for a stable $L_b D_b^{-1} L_b^T$ factorization, we require that $\mathring{B}$ be transformed into a lower trapezoidal $B = [\, B_1 \;\; B_2 \,]$ such that the diagonal elements of $B_1$ satisfy the condition $|b_{kk}| \ge |b_{ki}|$, 1 ≤ i ≤ n, for each k = 1, ..., m. This condition is sufficient for bounding the growth factor ρ, as shown in Theorem 2. Again, all the 2 × 2 blocks come directly from $\mathcal{A}$, so we only have to bound the elements of the 1 × 2 and 1 × 1 blocks.

Theorem 2. Suppose a saddle point matrix $\mathring{\mathcal{A}}$ is symmetrically transformed into $\mathcal{A}$ as in (5) such that the lower trapezoidal form $B = [\, B_1 \;\; B_2 \,]$ satisfies the condition $|b_{kk}| \ge |b_{ki}|$, 1 ≤ i ≤ n, for each k = 1, ..., m. Let $P_\pi$ be the permutation matrix in (6). If the block factorization $P_\pi^T \mathcal{A}\, P_\pi = L_b D_b^{-1} L_b^T$ runs to completion, then the growth factor ρ is bounded by

$$\rho(\mathcal{A}) = \frac{\max_{i,j,l} \big\{ |a_{ij}^{(l)}|, |b_{ji}^{(l)}| \big\}}{\max_{i,j} \{ |a_{ij}|, |b_{ij}| \}} \le 2^{2m}, \qquad 1 \le l \le n. \tag{24}$$

Proof. From (11) and (12), it is easy to see that, for j = 1, ..., m and i = m + 1, ..., n,

$$[\, |\alpha_{ij}| \;\; |\beta_{ji}| \,] \le 2^{j-1} \max_{i,j} \{ |a_{ij}|, |b_{jj}| \}. \tag{25}$$


Since we do not know whether $\max_{i,j} |a_{ij}|$ or $\max_j |b_{jj}|$ is the larger, the common upper bound on the elements of the 1 × 2 blocks, 1 ≤ j ≤ m < i ≤ n, is

$$\max_{i,j} \big\{ |a_{ij}^{(j)}|, |b_{ji}^{(j)}| \big\} = \max_{i,j} \{ |\alpha_{ij}|, |\beta_{ji}| \} \le 2^{m-1} \max_{i,j} \{ |a_{ij}|, |b_{jj}| \}. \tag{26}$$

The 1 × 1 blocks of $L_b$ are the elements decomposed from the symmetric positive definite matrix $G \in \mathbb{R}^{(n-m)\times(n-m)}$ defined in (13). Since the Gaussian elimination growth factor for a symmetric positive definite matrix without pivoting is equal to 1 [34, p. 239], the bound on the 1 × 1 blocks $\alpha_{ij}$, m < i, j ≤ n, in (14) is the same as that on G. It therefore suffices to show that the elements of G are bounded. From (13) and (25), for 1 ≤ r, s ≤ n − m, i = r + m, j = s + m, we get

$$|g_{rs}| \le |a_{ij}| + \sum_{k=1}^{m} \frac{|\beta_{ki}|\,|\beta_{kj}|}{|b_{kk}|^2}\, |a_{kk}| + \sum_{k=1}^{m} \frac{|\alpha_{ik}|\,|\beta_{kj}| + |\alpha_{jk}|\,|\beta_{ki}|}{|b_{kk}|}$$
$$\le |a_{ij}| + \sum_{k=1}^{m} \big( 2^{2(k-1)} |a_{kk}| + 2^{k-1} |\alpha_{ik}| + 2^{k-1} |\alpha_{jk}| \big)$$
$$\le |a_{ij}| + 3 \max_{i,j} |a_{ij}| \sum_{k=1}^{m} 2^{2k-2} \le 2^{2m} \max_{i,j} |a_{ij}|. \tag{27}$$

Combining (26) and (27), we obtain

$$\max_{i,j,l} \big\{ |a_{ij}^{(l)}|, |b_{ji}^{(l)}| \big\} \le 2^{2m} \max_{i,j} \{ |a_{ij}|, |b_{jj}| \} = 2^{2m} \max_{i,j} \{ |a_{ij}|, |b_{ij}| \}, \qquad 1 \le l \le n. \;\; \square \tag{28}$$

For Gaussian elimination of a matrix of size (n + m) × (n + m) with partial pivoting, Wilkinson [37] showed that $\rho \le 2^{n+m-1}$. The upper bound derived in (28) is sharper for m ≤ n − 1, which complies with our assumption m < n, but only under the sufficient condition on B stated in Theorem 2. We prefer to use the above-mentioned transformation of $\mathring{B}$ only if $\mathring{\mathcal{A}}$ is close to ill-conditioned, since it might not ensure a sparse B. In fact, such a transformation is equivalent to partial pivoting of $\mathring{\mathcal{A}}$ without having to pivot the symmetric positive definite matrix $\mathring{A}$.

4. The new block factorization versus the Schilders' block factorization

The Schilders factors consist of a block 3 × 3 structure with blocks of orders m and n − m, which are computed directly from the blocks of similar orders of the transformed saddle point matrix $\tau^T \mathring{\mathcal{A}}\, \tau$, for a different τ. For the sake of comparison, we show that the block $L_b D_b^{-1} L_b^T$ factorization can also be induced to such a block 3 × 3


structured factorization with blocks of orders m and n − m. Our persuasion here is solely based on application of the inverse of the permutation π. In other words, we do not go for any further computations in order to form the blocks of orders m and n − m from the factors Lb and Db^{-1}. The induced block factors are different from and much sparser than the Schilders' factors for large n and m. We provide both theoretical and numerical aspects which distinguish these two factorizations.

4.1. Induced block factorization

Let P_{π^{-1}} be the permutation matrix of size (n + m) × (n + m) defined by the inverse permutation π^{-1}. Applying P_{π^{-1}} congruently to F = Pπ^T A Pπ restores it to A, and with it the blocks Aij and Bi, i = 1, 2. However, the question here is: if Lb is reformed with a similar application of P_{π^{-1}}, will there be well-defined blocks of orders m and n − m in P_{π^{-1}}^T Lb P_{π^{-1}} which can be related to the blocks Aij and Bi? To answer this question, consider the example of F with n = 5, m = 3 in (7). The positions (indices) of the elements of Lb and Db^{-1} are inherited from the elements of A. Congruent application of π^{-1} to Lb and Db^{-1} means taking all their elements back to their inherited positions in A. For instance, by applying π^{-1} to Lb in (16), the entries a32 and β41 are moved from their current positions (5, 3) and (7, 2) to their inherited positions (3, 2) and (4, 6), respectively. Applying π^{-1} to all other entries of Lb gives:

$$
P_{\pi^{-1}}^T L_b P_{\pi^{-1}} =
\begin{bmatrix}
a_{11} & 0 & 0 & 0 & 0 & b_{11} & 0 & 0\\
a_{21} & a_{22} & 0 & 0 & 0 & 0 & b_{22} & 0\\
a_{31} & a_{32} & a_{33} & 0 & 0 & 0 & 0 & b_{33}\\
\alpha_{41} & \alpha_{42} & \alpha_{43} & \alpha_{44} & 0 & \beta_{14} & \beta_{24} & \beta_{34}\\
\alpha_{51} & \alpha_{52} & \alpha_{53} & \alpha_{54} & \alpha_{55} & \beta_{15} & \beta_{25} & \beta_{35}\\
b_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
b_{21} & b_{22} & 0 & 0 & 0 & 0 & 0 & 0\\
b_{31} & b_{32} & b_{33} & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

and similarly,

$$
P_{\pi^{-1}}^T D_b^{-1} P_{\pi^{-1}} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & b_{11}^{-1} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & b_{22}^{-1} & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & b_{33}^{-1}\\
0 & 0 & 0 & \alpha_{44}^{-1} & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \alpha_{55}^{-1} & 0 & 0 & 0\\
b_{11}^{-1} & 0 & 0 & 0 & 0 & -a_{11} b_{11}^{-2} & 0 & 0\\
0 & b_{22}^{-1} & 0 & 0 & 0 & 0 & -a_{22} b_{22}^{-2} & 0\\
0 & 0 & b_{33}^{-1} & 0 & 0 & 0 & 0 & -a_{33} b_{33}^{-2}
\end{bmatrix},
$$


which are indeed as expected and have blocks with special structures and properties. For general m and n, let

$$
P_{\pi^{-1}}^T L_b P_{\pi^{-1}} = \mathcal{L} =
\begin{bmatrix} L_A & 0 & D_B \\ M & L & H \\ B_1 & 0 & 0 \end{bmatrix}
\quad\text{and}\quad
P_{\pi^{-1}}^T D_b^{-1} P_{\pi^{-1}} = \mathcal{D} =
\begin{bmatrix} 0 & 0 & D_B^{-1} \\ 0 & D & 0 \\ D_B^{-1} & 0 & -D_A D_B^{-2} \end{bmatrix}.
$$

Then, using the above example inductively, the blocks of orders m and n − m can be defined as follows:

LA = [aij], 1 ≤ j ≤ i ≤ m, an m × m lower triangular matrix;
L = [αij], m < j ≤ i ≤ n, an (n − m) × (n − m) lower triangular matrix;
M = [αij], 1 ≤ j ≤ m < i ≤ n, an (n − m) × m rectangular matrix;          (29)
H = [βji], 1 ≤ j ≤ m < i ≤ n, an (n − m) × m rectangular matrix;
DB = diag(B1), an m × m diagonal matrix;
DA = diag(LA), an m × m diagonal matrix; and
D^{-1} = diag(L), an (n − m) × (n − m) diagonal matrix.

All these blocks come directly from Lb; they all exist and are well defined with respect to the permutation inverse π^{-1}. Also, LA is the lower triangular part of the block A11, and L is the lower triangular factor of the symmetric positive definite matrix G in (13), so they are nonsingular. Using these blocks, we easily prove Lemma 1, which gives the induced block factorization of A with blocks of orders m and n − m.

Lemma 1. Consider the transformed symmetric indefinite matrix A ∈ R^{(n+m)×(n+m)} in (5) and the permutation matrix Pπ in (6). Then the factorization Pπ^T A Pπ = Lb Db^{-1} Lb^T can be induced to a block factorization with blocks of orders m and n − m such that

$$
\begin{bmatrix} A_{11} & A_{12} & B_1^T \\ A_{21} & A_{22} & B_2^T \\ B_1 & B_2 & 0 \end{bmatrix}
=
\underbrace{\begin{bmatrix} L_A & 0 & D_B \\ M & L & H \\ B_1 & 0 & 0 \end{bmatrix}}_{\mathcal{L}}
\underbrace{\begin{bmatrix} 0 & 0 & D_B^{-1} \\ 0 & D & 0 \\ D_B^{-1} & 0 & -D_A D_B^{-2} \end{bmatrix}}_{\mathcal{D}}
\underbrace{\begin{bmatrix} L_A^T & M^T & B_1^T \\ 0 & L^T & 0 \\ D_B & H^T & 0 \end{bmatrix}}_{\mathcal{L}^T},
\tag{30}
$$

where LA, L, M, H, DA, DB and D are as in (29).

Proof. Since the blocks LA, L, M, H, DA, DB and D defined in (29) exist due to the existence of Lb, we only need to show that (30) holds. Applying P_{π^{-1}} congruently to Pπ^T A Pπ and using Theorem 1, we obtain:


$$
A = P_{\pi^{-1}}^T \big( P_\pi^T A P_\pi \big) P_{\pi^{-1}}
  = P_{\pi^{-1}}^T \big( L_b D_b^{-1} L_b^T \big) P_{\pi^{-1}}
  = \big( P_{\pi^{-1}}^T L_b P_{\pi^{-1}} \big)\big( P_{\pi^{-1}}^T D_b^{-1} P_{\pi^{-1}} \big)\big( P_{\pi^{-1}}^T L_b^T P_{\pi^{-1}} \big)
  = \mathcal{L}\mathcal{D}\mathcal{L}^T. \qquad\square
$$
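The block pattern asserted by Lemma 1 can be exercised numerically. The following NumPy sketch is illustrative only: the blocks are randomly generated with the shapes and triangular structure prescribed in (29) (not taken from an actual Lb), and the script checks that 𝓛𝓓𝓛^T has the saddle point structure, including relation (31) below.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 3, 4                      # k = n - m
n = m + k
Z = np.zeros
# blocks with the structure prescribed in (29); values are arbitrary
LA = np.tril(rng.standard_normal((m, m))) + 3 * np.eye(m)  # lower triangular, nonsingular
Lg = np.tril(rng.standard_normal((k, k))) + 3 * np.eye(k)  # the block L
M = rng.standard_normal((k, m))
H = rng.standard_normal((k, m))
B1 = np.tril(rng.standard_normal((m, m))) + 3 * np.eye(m)
DA, DB = np.diag(np.diag(LA)), np.diag(np.diag(B1))
D = np.linalg.inv(np.diag(np.diag(Lg)))                    # D^{-1} = diag(L)

calL = np.block([[LA, Z((m, k)), DB],
                 [M,  Lg,        H],
                 [B1, Z((m, k)), Z((m, m))]])
DBi = np.linalg.inv(DB)
calD = np.block([[Z((m, m)), Z((m, k)), DBi],
                 [Z((k, m)), D,         Z((k, m))],
                 [DBi,       Z((m, k)), -DA @ DBi @ DBi]])
S = calL @ calD @ calL.T

assert np.allclose(S, S.T)                     # symmetric
assert np.allclose(S[n:, n:], 0)               # zero (3,3) block
assert np.allclose(S[n:, :m], B1)              # constraint block restored
assert np.allclose(S[:m, :m], LA + LA.T - DA)  # relation (31)
```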

As in the proof of the Schilders' factorization, comparing the right- and left-hand sides of (30), the blocks can be related by the following equations, in which, by (35), E coincides with the block H in (30). Note that the product of any two diagonal matrices is commutative.

$$A_{11} = L_A + L_A^T - D_A, \tag{31}$$

$$A_{21} = M + E D_B^{-1} L_A^T - E D_A D_B^{-1}, \tag{32}$$

$$A_{12} = M^T + L_A D_B^{-1} E^T - D_A D_B^{-1} E^T, \tag{33}$$

$$A_{22} = M D_B^{-1} E^T + L D L^T + E D_B^{-1} M^T - E D_A D_B^{-2} E^T, \tag{34}$$

$$B_2 = B_1 D_B^{-1} E^T \quad\text{or}\quad E = B_2^T B_1^{-T} D_B. \tag{35}$$

Working out equations (31) through (35), one can obtain the relation

$$L D L^T = \begin{bmatrix} -E & I_{n-m} \end{bmatrix} A \begin{bmatrix} -E & I_{n-m} \end{bmatrix}^T. \tag{36}$$
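Relation (35) is easy to check numerically. A small NumPy sanity check with randomly generated blocks (B1 lower triangular and nonsingular; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 3, 4                                   # k = n - m
B1 = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)  # nonsingular lower triangular
B2 = rng.standard_normal((m, k))
DB = np.diag(np.diag(B1))

E = B2.T @ np.linalg.inv(B1.T) @ DB           # E = B2^T B1^{-T} DB, as in (35)
# the two forms of (35) are equivalent:
assert np.allclose(B1 @ np.linalg.inv(DB) @ E.T, B2)
```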

It is quite evident from (30) that the induced block factors 𝓛 and 𝓓 are different from the ones in [1,9,10,32], since they are deduced from A by using a different transformation operator τ. Furthermore, in [1,9,10], suggestions for the blocks of L and D are provided such that LDL^T approximates the block A while keeping the constraint matrix B intact, which is known as implicit factorization for a constraint preconditioner. In contrast, the induced block factorization in (30) and the Schilders' factorization give the exact factorization of A. This is another reason that motivates us to draw a comparison between the Schilders' factorization and the induced block factorization.

4.2. Comparison with the Schilders' factorization

According to the Schilders' factorization [32, Lemma 4.1], a symmetric indefinite matrix Å ∈ R^{(n+m)×(n+m)} is transformed into Ã, partitioned into a block 3 × 3 structure and decomposed into the following form:

$$
\begin{bmatrix} \tilde A_{11} & \tilde A_{12} & \tilde B_1^T \\ \tilde A_{21} & \tilde A_{22} & \tilde B_2^T \\ \tilde B_1 & \tilde B_2 & 0 \end{bmatrix}
=
\underbrace{\begin{bmatrix} \tilde B_1^T & 0 & \tilde L_1 \\ \tilde B_2^T & \tilde L_2 + I_{n-m} & \tilde M \\ 0 & 0 & I_m \end{bmatrix}}_{\tilde L}
\underbrace{\begin{bmatrix} \tilde D_1 & 0 & I_m \\ 0 & \tilde D_2 & 0 \\ I_m & 0 & 0 \end{bmatrix}}_{\tilde D}
\tilde L^T,
\tag{37}
$$


Table 1
Comparison of the blocks from the Schilders' factorization and the induced block factorization in (30).

| Schilders' L̃ D̃ L̃^T | Current L D L^T |
| --- | --- |
| L̃1 = B̃1^T strict lower(B̃1^{-T} Ã11 B̃1^{-1}) | LA = lower triangular(A11) |
| D̃1 = diag(B̃1^{-T} Ã11 B̃1^{-1}) | DA = diag(A11) |
| M̃ = (Ã21 − B̃2^T L̃1^T) B̃1^{-1} − B̃2^T D̃1 | M = A21 + B2^T B1^{-T} (DA − LA^T) |
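The right-hand column of Table 1 can be exercised directly. The following NumPy sketch (with hypothetical, randomly generated blocks) computes the induced block M and confirms that it vanishes when A is diagonal, since then LA = DA and A21 = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 3, 4                                   # k = n - m
n = m + k
# diagonal A block, as in resistor network modeling
Ad = np.diag(rng.uniform(1.0, 2.0, n))
A11, A21 = Ad[:m, :m], Ad[m:, :m]             # A21 = 0 for diagonal A
B1 = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)
B2 = rng.standard_normal((m, k))

LA = np.tril(A11)                             # lower triangular part of A11
DA = np.diag(np.diag(A11))
# induced block M per Table 1
M = A21 + B2.T @ np.linalg.inv(B1.T) @ (DA - LA.T)
assert np.allclose(M, 0)                      # M vanishes when A is diagonal
```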

where B̃1 is an m × m upper triangular matrix; L̃1 and L̃2 are respectively m × m and (n − m) × (n − m) strictly lower triangular matrices; M̃ is an (n − m) × m rectangular matrix; and D̃1 and D̃2 are respectively m × m and (n − m) × (n − m) diagonal matrices. By working out the left- and right-hand sides of (37), the blocks D̃1, L̃1 and M̃ are computed from the following equations (details can be found in [32]):

$$\tilde D_1 = \mathrm{diag}\big( \tilde B_1^{-T} \tilde A_{11} \tilde B_1^{-1} \big), \tag{38}$$

$$\tilde L_1 = \tilde B_1^T\, \text{strict lower}\big( \tilde B_1^{-T} \tilde A_{11} \tilde B_1^{-1} \big), \tag{39}$$

$$\tilde M = \big( \tilde A_{21} - \tilde B_2^T \tilde L_1^T \big) \tilde B_1^{-1} - \tilde B_2^T \tilde D_1, \tag{40}$$

whereas the blocks L̃2 and D̃2 are determined from the Cholesky factorization of the reduced Hessian matrix Z^T Ã Z, which is similar to the matrix on the right-hand side of (36), where Z = [−B̃2^T B̃1^{-T}  I_{n−m}]^T. Although the blocks Ã11 and B̃1 are in general sparse, the product B̃1^{-T} Ã11 B̃1^{-1} gets more fill-in, since B̃1^{-1} is dense. Ultimately, the lower triangular block L̃1 in (39) and the rectangular block M̃ in (40) turn out to be substantially full. It is also expensive to compute these blocks. We can see in Table 1 that the induced blocks DA, LA and M can be determined with minimal involvement of the factor B1^{-1}, which makes them sparser and computationally cheaper than the corresponding Schilders' blocks D̃1, L̃1 and M̃.

The induced block factorization in (30) has an even greater advantage over the Schilders' factorization if the block A is a diagonal matrix, which occurs in applications such as resistor network modeling and some convex quadratic programming problems. It is clear from Table 1 that if A is diagonal, then the block M = 0, since LA = DA and A21 = 0. In contrast, the Schilders' block M̃ = −B̃2^T (L̃1^T B̃1^{-1} + D̃1), which cannot be zero unless B̃2 is the null matrix.

5. Numerical experiments

We carried out numerical experiments on two categories of saddle point matrices, based on the two types of constraint matrix B̊ described in Section 2. For the typical B̊, we conducted our tests on saddle point matrices which arise in resistor network modeling systems. For the more general form of B̊, we examined the


Fig. 1. A realistic industrial resistor network, RNC20, and its associated saddle point matrix Å.

saddle point matrices provided in the repository of the University of Florida (UF) sparse matrix collection [33], maintained by Timothy A. Davis.

For resistor network modeling, the matrix A is a diagonal matrix containing the resistance values of the n resistors, and B̊ is an incidence matrix having full row rank m. There are m + 1 nodes in a resistor network, and one node is grounded. The row related to the ground node is deleted from the incidence matrix B̊, which makes the system stable [28]. As a result, B̊ has at most two nonzero elements in each column and can be permuted into lower trapezoidal form [23]. In order to understand the structures and sparsity patterns of A, Pπ^T A Pπ and the factor Lb, we consider a small saddle point matrix Å of size 113 × 113 from an industrial resistor network problem, consisting of 44 nodes (including the ground node) and 70 arcs. The visual representations of the resistor network and its associated saddle point matrix for this example are shown in Fig. 1. The transformation operator τ applied to Å here is a permutation matrix of order 113. The transformed matrix A and its block partitioned form Pπ^T A Pπ, with blocks of orders 1 and 2, are shown in Fig. 2. From Fig. 3, we see that the block lower triangular factor Lb and its block diagonal part Db have the same structure as Pπ^T A Pπ. We also obtained the factor L from Lb, which is compared with the Schilders' factor L̃ in Fig. 4. The induced block diagonal factor D and the Schilders' block diagonal factor D̃ are shown in Fig. 5. Even for this small matrix, we observe that there is more fill-in in L̃ compared to L.

The numerical results for larger Å from resistor network problems are presented in Table 2. Regarding the more general form of B̊, we experimented on 10 saddle point matrices available in the UF sparse matrix collection, coming from various applications, which are presented in Table 3.
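The incidence structure described above (ground-row deletion, at most two nonzeros per column, full row rank) can be illustrated on a tiny hypothetical network. The following NumPy sketch builds the incidence matrix of a made-up 4-node, 5-arc network and checks these properties; the arc list is purely illustrative:

```python
import numpy as np

# hypothetical 4-node, 5-arc resistor network; arcs as (from, to), node 0 grounded
arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
nodes, n = 4, len(arcs)
Binc = np.zeros((nodes, n))
for j, (u, v) in enumerate(arcs):
    Binc[u, j], Binc[v, j] = 1.0, -1.0    # column j: arc j leaves u, enters v

B = Binc[1:, :]                           # delete the row of the grounded node
assert np.linalg.matrix_rank(B) == B.shape[0]          # full row rank m = 3
assert (np.count_nonzero(B, axis=0) <= 2).all()        # <= 2 nonzeros per column
```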
Since the numerical tests were carried out in MATLAB R2013b, we have selected only moderate-sized matrices. For all the matrices we have chosen, we were able to transform the B̊ part into trapezoidal form such that Å and A have the same sparsity. We applied a sparse QR transformation from the left


Fig. 2. Transformed matrix A and the block partitioned matrix Pπ^T A Pπ with blocks of orders 1 and 2.

Fig. 3. Factors of Pπ^T A Pπ — block lower triangular matrix Lb and its diagonal part with blocks of orders 1 and 2.

on B̊ for the ones which could not be permuted. In all these cases there was no fill-in in the orthogonal matrices Q, and the transformed B's were as sparse as the original B̊'s. No fill-in in Q here means it is as sparse as a permutation matrix, with the number of nonzero elements equal to m. In order to observe the growth in the elements of L, we computed the growth factor by taking the ratio of the largest absolute element of L to the largest absolute element of A; see Table 4. The growth factor is acceptable in all these examples.
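The growth factor reported in Table 4 is simply a ratio of largest absolute entries. A minimal sketch (here a Cholesky factor of a toy matrix serves only as a stand-in for the factor L computed by the new method):

```python
import numpy as np

def growth_factor(A, L):
    """Ratio of the largest absolute entry of the factor L
    to the largest absolute entry of A (as used for Table 4)."""
    return np.abs(L).max() / np.abs(A).max()

# toy example; np.linalg.cholesky is a stand-in for the paper's factor L
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = np.linalg.cholesky(A)
rho = growth_factor(A, L)   # max|L| = 2, max|A| = 4, so rho = 0.5
```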


Fig. 4. Induced block factor L and the Schilders' factor L̃.

Fig. 5. Induced block diagonal factor D and the Schilders' block diagonal factor D̃.

Table 2
Nonzero counts of L̃ and L for saddle point systems from a realistic industrial resistor network problem.

| Network | n | m | nnz(A) | nnz(L̃) | nnz(L) |
| --- | --- | --- | --- | --- | --- |
| RNB28 | 1751 | 1058 | 8549 | 31,137 | 13,936 |
| RNB27 | 2272 | 1380 | 11,124 | 40,591 | 20,848 |
| RNB26 | 2430 | 1474 | 11,934 | 50,845 | 22,240 |
| RNB1 | 7876 | 4835 | 39,374 | 2,445,185 | 101,732 |
| RNC2 | 18,762 | 11,782 | 93,804 | 12,446,959 | 192,646 |
| RNB5 | 23,221 | 12,820 | 116,099 | 20,767,772 | 463,294 |
| RNB4 | 25,593 | 16,052 | 127,959 | 18,212,250 | 456,245 |
| RNC1 | 58,054 | 36,392 | 290,264 | 168,846,313 | 1,646,337 |


Table 3
Nonzero counts of L̃ and L for saddle point matrices from the UF sparse matrix collection.

| Matrix | n | m | nnz(Å) | nnz(A) | nnz(L̃) | nnz(L) |
| --- | --- | --- | --- | --- | --- | --- |
| ncvxqp1 | 7111 | 5000 | 73,963 | 73,963 | 1,107,332 | 1,070,385 |
| tuma2 | 7515 | 5477 | 49,365 | 49,365 | 1,991,029 | 1,967,493 |
| stokes64 | 8450 | 4096 | 140,034 | 140,034 | 5,501,502 | 4,503,680 |
| ncvxqp9 | 9054 | 7500 | 54,040 | 54,040 | 100,293 | 94,597 |
| tuma1 | 13,360 | 9607 | 87,760 | 87,760 | 6,493,093 | 6,452,659 |
| Qpband | 15,000 | 5000 | 45,000 | 45,000 | 40,000 | 40,000 |
| mario001 | 23,130 | 15,304 | 204,912 | 204,912 | 8,954,944 | 5,759,189 |
| aug3dcqp | 27,543 | 8000 | 128,115 | 128,115 | 3,839,064 | 3,547,086 |
| blockqp1 | 40,011 | 20,001 | 640,033 | 640,033 | 600,750,077 | 560,113 |
| mario002 | 234,128 | 155,746 | 2,097,566 | 2,097,566 | 1,990,093,350 | 16,620,482 |

Table 4
Growth factor ρ(A) for the new method, calculated as the ratio of the maximum absolute element of L to the maximum absolute element of A, for the saddle point matrices from the UF sparse matrix collection.

| Matrix | ρ(A) | Matrix | ρ(A) |
| --- | --- | --- | --- |
| ncvxqp1 | 1.0000 | mario001 | 40.3018 |
| tuma2 | 2.3370 | aug3dcqp | 2.0000 |
| stokes64 | 1.3533e+04 | blockqp1 | 3.3340e+07 |
| ncvxqp9 | 4.9018e+06 | mario002 | 57.5445 |
| tuma1 | 54.9584 | | |
| Qpband | 1.0455 | | |

6. Conclusions

Symmetric indefinite matrices arising from saddle point problems can be congruently transformed into a special block 3 × 3 structure. The transformed matrices can be exploited efficiently by taking advantage of the structure and properties of their blocks. We defined a transformation operator τ such that the constraint matrix B̊ is transformed into lower trapezoidal form, while the block matrix Å is only permuted. The transformed saddle point matrix A is partitioned into a block n × n structure with blocks of orders 1 and 2 by applying a simple, predefined permutation π. We have shown that the block Lb Db^{-1} Lb^T factorization of the block partitioned matrix exists. The transformation operator τ that we have chosen results not only in a sparse factor Lb, but also leads to a computationally cheaper block factorization. The permutation π ensures a priori pivots for the factorization. In case the saddle point matrices are badly conditioned, sparsity of B may have to be traded off against its sufficient condition for pivoting. We have also shown that a block 3 × 3 structured factorization with blocks of orders m and n − m can be induced from the block Lb Db^{-1} Lb^T factorization. The induced block factorization is much sparser and cheaper compared to the Schilders' factorization. The new factorization can be used as a direct method as well as a basis for preconditioning techniques.


Acknowledgement

We would like to thank Tyrone Rees and Jennifer Scott, Rutherford Appleton Laboratory, Oxford, for their valuable suggestions and comments during the drafting of the manuscript. They inspired us to add sections on the numerical stability and computational complexity of the proposed block factorization. We would also like to thank the anonymous referees for carefully reading the manuscript and providing helpful comments.

Appendix A. LD^{-1}L^T factorization for a symmetric positive definite matrix

We show that the LD^{-1}L^T factorization of a symmetric positive definite matrix A ∈ R^{n×n} has the same computational efficiency as the Cholesky factorization, where L is a lower triangular matrix and D = diag(L) is the diagonal part of L. Since this factorization does not require square roots, unlike the Cholesky factorization, it is much more convenient for symmetric indefinite matrices. We present a rigorous proof of the existence and uniqueness of this factorization.

Theorem. Let A ∈ R^{n×n} be a symmetric positive definite matrix. Then there exists a unique nonsingular lower triangular matrix L ∈ R^{n×n} such that

$$A = L D^{-1} L^T, \tag{A.1}$$

where D = diag(L) is the diagonal part of L.

Proof. We develop an algorithm for (A.1) and prove that the algorithm does not break down. Considering A = [aij] of size 3 × 3, we have

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} l_{11}^{-1} & 0 & 0 \\ 0 & l_{22}^{-1} & 0 \\ 0 & 0 & l_{33}^{-1} \end{bmatrix}
\begin{bmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{bmatrix}.
$$

Through direct computations, one can obtain

$$ l_{11} = a_{11}, \qquad l_{21} = a_{21}, \qquad l_{22} = a_{22} - \frac{l_{21}^2}{l_{11}}, $$

$$ l_{31} = a_{31}, \qquad l_{32} = a_{32} - \frac{l_{31} l_{21}}{l_{11}}, \qquad l_{33} = a_{33} - \frac{l_{31}^2}{l_{11}} - \frac{l_{32}^2}{l_{22}}. $$

For general n,

$$ l_{ij} = a_{ij} - \sum_{k=1}^{j-1} \frac{l_{ik} l_{jk}}{l_{kk}}, \qquad 1 \le j \le i \le n, \tag{A.2} $$


which exists if and only if the diagonal entries

$$ l_{ii} = a_{ii} - \sum_{k=1}^{i-1} \frac{l_{ik}^2}{l_{kk}}, \qquad 1 \le i \le n, \tag{A.3} $$

are not equal to zero. This can be shown by induction on i = j = 1, …, n. In the following part of the proof, let L(j+1, j) = [l_{j+1,j}, …, l_{n,j}]^T denote the jth column vector of L, of length n − j. For i = j = 1, l11 = a11 > 0; let A be partitioned as follows:

$$ A = \begin{bmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}. $$

Let 0 ≠ v1 ∈ R^{n−1}. Define v = [v0, v1^T]^T ∈ R^n such that v0 = −A21^T v1 / a11. Then

$$
\begin{aligned}
0 < v^T A v &= \begin{bmatrix} v_0 & v_1^T \end{bmatrix}
\begin{bmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \end{bmatrix}
= v_0 a_{11} v_0 + v_0 A_{21}^T v_1 + v_1^T A_{21} v_0 + v_1^T A_{22} v_1 \\
&= \frac{v_1^T A_{21}}{a_{11}}\, a_{11}\, \frac{A_{21}^T v_1}{a_{11}}
 - \frac{v_1^T A_{21}}{a_{11}}\, A_{21}^T v_1
 - \frac{v_1^T A_{21}}{a_{11}}\, A_{21}^T v_1
 + v_1^T A_{22} v_1 \\
&= v_1^T A_{22} v_1 - v_1^T \frac{A_{21} A_{21}^T}{a_{11}} v_1
= v_1^T \left( A_{22} - \frac{L(2,1)\, L(2,1)^T}{l_{11}} \right) v_1.
\end{aligned} \tag{A.4}
$$

Setting A^{(2)} = A22 − L(2,1) L(2,1)^T / l11, (A.4) implies that A^{(2)} ∈ R^{(n−1)×(n−1)} is symmetric and positive definite. The entries of A^{(2)} are:

$$ a_{rs}^{(2)} = a_{ij} - \frac{l_{i1} l_{j1}}{l_{11}}, \qquad 1 \le r, s \le n-1, \quad\text{where } i = r+1, \; j = s+1. $$

Thus for i = j = 2,

$$ 0 < a_{11}^{(2)} = a_{22} - \frac{l_{21}^2}{l_{11}} = l_{22}. $$

Assume that (A.4) holds up to i = j = n − 1, i.e.,

$$ A^{(n-1)} = A_{22}^{(n-2)} - \frac{L(n-1, n-2)\, L(n-1, n-2)^T}{l_{n-2,n-2}} \in \mathbb{R}^{2\times 2} $$

is symmetric and positive definite, with entries:


$$ a_{rs}^{(n-1)} = a_{ij} - \sum_{k=1}^{n-2} \frac{l_{ik} l_{jk}}{l_{kk}}, \qquad 1 \le r, s \le 2, \quad i = r + n - 2, \; j = s + n - 2, $$

giving

$$ a_{11}^{(n-1)} = a_{n-1,n-1} - \sum_{k=1}^{n-2} \frac{l_{n-1,k}^2}{l_{kk}} = l_{n-1,n-1} > 0. $$

For i = j = n, apply a similar partitioning to A^{(n−1)}. Define u = [u0, u1]^T ∈ R^2 such that u1 ≠ 0 and u0 = −A21^{(n−1)} u1 / a11^{(n−1)}. Then

$$
\begin{aligned}
0 < u^T A^{(n-1)} u &= \begin{bmatrix} u_0 & u_1 \end{bmatrix}
\begin{bmatrix} a_{11}^{(n-1)} & (A_{21}^{(n-1)})^T \\ A_{21}^{(n-1)} & A_{22}^{(n-1)} \end{bmatrix}
\begin{bmatrix} u_0 \\ u_1 \end{bmatrix} \\
&= u_1^T \left( A_{22}^{(n-1)} - \frac{A_{21}^{(n-1)} (A_{21}^{(n-1)})^T}{a_{11}^{(n-1)}} \right) u_1
= u_1^T \left( A_{22}^{(n-1)} - \frac{L(n, n-1)\, L(n, n-1)^T}{l_{n-1,n-1}} \right) u_1
= u_1^T A^{(n)} u_1.
\end{aligned}
$$

So, A^{(n)} ∈ R^{1×1} is symmetric and positive definite, hence

$$
a_{11}^{(n)} = a_{nn} - \sum_{k=1}^{n-2} \frac{l_{nk}^2}{l_{kk}} - \frac{l_{n,n-1}^2}{l_{n-1,n-1}}
 = a_{nn} - \sum_{k=1}^{n-1} \frac{l_{nk}^2}{l_{kk}} = l_{nn} > 0.
$$

For uniqueness, suppose there exist two nonsingular lower triangular matrices L1 and L2 satisfying (A.1) for the same A. Then

$$ L_1 D_1^{-1} L_1^T = L_2 D_2^{-1} L_2^T \iff L_2^{-1} L_1 D_1^{-1} = D_2^{-1} L_2^T L_1^{-T}. \tag{A.5} $$

In (A.5), the left-hand side is a lower triangular matrix while the right-hand side is upper triangular. This is possible only if L2^{-1} L1 is a diagonal matrix. Therefore, let L2^{-1} L1 = D, where D is a diagonal matrix, and hence D^{-T} = L2^T L1^{-T}. Since the diagonal of a product of any two upper triangular matrices is equal to the product of their diagonals, we get

$$ D_2^T D_1^{-T} = \mathrm{diag}(L_2^T)\,\mathrm{diag}(L_1^{-T}) = \mathrm{diag}(L_2^T L_1^{-T}) = D^{-T}. $$

Therefore, from (A.5),

$$ D D_1^{-1} = D_2^{-1} D^{-T} \iff D = D_2^{-1} D^{-T} D_1 $$


Algorithm 2. For a given symmetric positive definite matrix A ∈ R^{n×n}, the algorithm computes a nonsingular lower triangular matrix L ∈ R^{n×n} such that A = L diag^{-1}(L) L^T. For j = 1, …, n, the column L(j:n, j) overwrites A(j:n, j).

1: for j = 1 : n
2:   for k = 1 : j − 1
3:     ν(k) = A(j, k)/A(k, k)
4:   end for
5:   for i = j : n
6:     A(i, j) = A(i, j) − Σ_{l=1}^{j−1} ν(l) A(i, l)
7:   end for
8: end for

$$ \iff D = D_2^{-1} D_2^T D_1^{-T} D_1 = I \iff L_1 = L_2. \qquad\square $$

Since lkk > 0, (A.3) implies that lkk ≤ akk for each k = 1, …, n. Again from (A.3),

$$ a_{ii} = \sum_{k=1}^{i} \frac{l_{ik}^2}{l_{kk}} \ge \frac{l_{ik}^2}{l_{kk}} \ge \frac{l_{ik}^2}{a_{kk}}, \qquad 1 \le k \le i \le n, $$

hence l_{ik}^2 ≤ a_{ii} a_{kk}, and therefore

$$ |l_{ij}| \le \max_i a_{ii}, \qquad 1 \le i, j \le n. $$

From (A.2), observe that every time the jth column of L is updated, the quotient ljk/lkk is computed repeatedly, (n − j + 1)(j − 1) times in total. This recurrence causes extra computational cost. Therefore, by computing the quotients ljk/lkk for k = 1, …, j − 1 and storing them in a vector ν of length j − 1 before each update, the number of divisions is reduced to j − 1, as shown in Step 3 of Algorithm 2. Step 6 involves (n − j + 1)(j − 1) multiplications and (n − j + 1)(j − 1) additions. Therefore, Algorithm 2 requires a total flop count precisely equal to

$$ \sum_{j=1}^{n} \big[ (j-1) + 2(n-j+1)(j-1) \big] = \frac{1}{3}n^3 + \frac{1}{2}n^2 - \frac{5}{6}n, $$

which is equal to the flop count of the Cholesky factorization.
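Algorithm 2 translates almost line by line into code. The following NumPy version is an illustrative sketch (the paper's experiments were done in MATLAB); it overwrites a copy of the lower triangle of A with L, with the vector ν of Step 3 computed once per column:

```python
import numpy as np

def ldl_rootfree(A):
    """Root-free factorization A = L diag(L)^{-1} L^T (Algorithm 2).

    Overwrites a copy of the lower triangle of A with L, column by column."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for j in range(n):
        # Step 3: precompute the quotients nu(k) = A(j,k)/A(k,k), k < j
        nu = A[j, :j] / np.diag(A)[:j]
        # Step 6: update column j of the lower triangle
        A[j:, j] -= A[j:, :j] @ nu
    return np.tril(A)

# usage: verify L diag(L)^{-1} L^T = A on a random SPD matrix
rng = np.random.default_rng(3)
X = rng.standard_normal((5, 5))
A = X @ X.T + 5 * np.eye(5)            # symmetric positive definite
L = ldl_rootfree(A)
D = np.diag(np.diag(L))
assert np.allclose(L @ np.linalg.inv(D) @ L.T, A)
```

No square roots are taken anywhere, which is what makes the same recurrence usable for the indefinite block factorizations in the body of the paper.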


Appendix B. Example for computing the elements of the matrix G in (14) for n = 4, m = 2

$$
\underbrace{\begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}}_{G}
=
\underbrace{\begin{bmatrix} -\frac{\beta_{13}}{b_{11}} & -\frac{\beta_{23}}{b_{22}} & 1 & 0 \\ -\frac{\beta_{14}}{b_{11}} & -\frac{\beta_{24}}{b_{22}} & 0 & 1 \end{bmatrix}}_{H}
\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} -\frac{\beta_{13}}{b_{11}} & -\frac{\beta_{14}}{b_{11}} \\ -\frac{\beta_{23}}{b_{22}} & -\frac{\beta_{24}}{b_{22}} \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{H^T}
$$

where

$$
g_{11} = a_{33} + \frac{\beta_{13}^2}{b_{11}^2} a_{11} + \frac{\beta_{23}^2}{b_{22}^2} a_{22}
 - 2\,\frac{a_{31}\beta_{13}}{b_{11}}
 - 2\,\frac{\beta_{23}}{b_{22}} \left( a_{32} - \frac{\beta_{13} a_{21}}{b_{11}} \right)
= a_{33} + \frac{\beta_{13}^2}{b_{11}^2} a_{11} + \frac{\beta_{23}^2}{b_{22}^2} a_{22}
 - 2\,\frac{\alpha_{31}\beta_{13}}{b_{11}} - 2\,\frac{\alpha_{32}\beta_{23}}{b_{22}}
\quad\text{using (11);}
$$

$$
g_{21} = g_{12} = a_{43} + \frac{\beta_{13}\beta_{14}}{b_{11}^2} a_{11} + \frac{\beta_{23}\beta_{24}}{b_{22}^2} a_{22}
 - \frac{a_{31}\beta_{14} + a_{41}\beta_{13}}{b_{11}}
 - \frac{\beta_{24}}{b_{22}} \left( a_{32} - \frac{\beta_{13} a_{21}}{b_{11}} \right)
 - \frac{\beta_{23}}{b_{22}} \left( a_{42} - \frac{\beta_{14} a_{21}}{b_{11}} \right)
$$

$$
= a_{43} + \frac{\beta_{13}\beta_{14}}{b_{11}^2} a_{11} + \frac{\beta_{23}\beta_{24}}{b_{22}^2} a_{22}
 - \frac{a_{31}\beta_{14} + a_{41}\beta_{13}}{b_{11}}
 - \frac{\alpha_{32}\beta_{24} + \alpha_{42}\beta_{23}}{b_{22}},
\quad\text{again by (11);}
$$

$$
g_{22} = a_{44} + \frac{\beta_{14}^2}{b_{11}^2} a_{11} + \frac{\beta_{24}^2}{b_{22}^2} a_{22}
 - 2\,\frac{a_{41}\beta_{14}}{b_{11}}
 - 2\,\frac{\beta_{24}}{b_{22}} \left( a_{42} - \frac{\beta_{14} a_{21}}{b_{11}} \right)
= a_{44} + \frac{\beta_{14}^2}{b_{11}^2} a_{11} + \frac{\beta_{24}^2}{b_{22}^2} a_{22}
 - 2\,\frac{\alpha_{41}\beta_{14}}{b_{11}} - 2\,\frac{\alpha_{42}\beta_{24}}{b_{22}}.
$$

References

[1] M. Benzi, G.H. Golub, J. Liesen, Numerical solution of saddle point problems, Acta Numer. 14 (2005) 1–137.
[2] M. Benzi, J. Liu, Block preconditioning for saddle point systems with indefinite (1, 1) block, Int. J. Comput. Math. 84 (8) (2007).
[3] J.R. Bunch, L. Kaufman, Some stable methods for calculating inertia and solving symmetric linear systems, Math. Comp. 31 (137) (1977) 163–179.
[4] J.R. Bunch, B.N. Parlett, Direct methods for solving symmetric indefinite systems of linear equations, SIAM J. Numer. Anal. 8 (1971) 639–655.
[5] J.R. Bunch, L. Kaufman, B.N. Parlett, Decomposition of a symmetric matrix, Numer. Math. 31 (1976) 31–48.
[6] Z.H. Cao, A class of constraint preconditioners for nonsymmetric saddle point problems, Numer. Math. 103 (2006) 47–61.
[7] T.A. Davis, Direct Methods for Sparse Linear Systems, SIAM, 2006.


[8] A.C. de Niet, F.W. Wubs, Numerically stable LDL^T-factorization of F-type saddle point matrices, IMA J. Numer. Anal. 29 (2009) 208–234.
[9] H.S. Dollar, Iterative linear algebra for constrained optimization, DPhil thesis, Oxford University Computing Laboratory, 2005.
[10] H.S. Dollar, N.I.M. Gould, W.H.A. Schilders, A.J. Wathen, Implicit factorization preconditioning and iterative solvers for regularized saddle-point systems, SIAM J. Matrix Anal. Appl. 28 (2006) 170–189.
[11] H.S. Dollar, N.I.M. Gould, W.H.A. Schilders, A.J. Wathen, Using constraint preconditioners with regularized saddle-point problems, Comput. Optim. Appl. 36 (2007) 249–270.
[12] H.S. Dollar, A.J. Wathen, Approximate factorization constraint preconditioners for saddle-point matrices, SIAM J. Sci. Comput. 27 (5) (2006) 1555–1572.
[13] I.S. Duff, S. Pralet, Strategies for scaling and pivoting for sparse symmetric indefinite problems, SIAM J. Matrix Anal. Appl. 27 (2) (2005) 313–340.
[14] A. Forsgren, P.E. Gill, M.H. Wright, Interior methods for nonlinear optimization, SIAM Rev. 44 (4) (2002) 525–597.
[15] M.P. Friedlander, D. Orban, A primal–dual regularized interior-point method for convex quadratic programs, Math. Program. Comput. 4 (2012) 71–107.
[16] E. Galligani, L. Zanni, Error analysis of an algorithm for equality-constrained quadratic programming problems, Computing 58 (1997) 47–67.
[17] P.E. Gill, W. Murray, D.B. Ponceleón, M.A. Saunders, Preconditioners for indefinite systems arising in optimization, SIAM J. Matrix Anal. Appl. 13 (1) (1992) 292–311.
[18] G.H. Golub, C. Greif, On solving block-structured indefinite linear systems, SIAM J. Sci. Comput. 24 (6) (2003) 2076–2092.
[19] N.I.M. Gould, On modified factorizations for large-scale linearly constrained optimization, SIAM J. Optim. 9 (4) (1999) 1041–1063.
[20] C. Keller, N.I.M. Gould, A.J. Wathen, Constraint preconditioning for indefinite linear systems, SIAM J. Matrix Anal. Appl. 21 (4) (2000) 1300–1317.
[21] L. Komzsik, P. Poschmann, I. Sharapov, A preconditioning technique for indefinite linear systems, Finite Elem. Anal. Des. 26 (1997) 253–258.
[22] Q. Liu, New preconditioning techniques for saddle point problems arising from the time-harmonic Maxwell equations, ISRN Appl. Math. 2013 (2013), 8 pages.
[23] S. Lungten, W.H.A. Schilders, J.M.L. Maubach, Sparse inverse incidence matrices for Schilders' factorization applied to resistor network modeling, Numer. Algebra Control Optim. 4 (3) (2014) 227–239.
[24] N. Mastronardi, P. Van Dooren, The antitriangular factorization of symmetric matrices, SIAM J. Matrix Anal. Appl. 34 (1) (2013) 173–196.
[25] M.F. Murphy, G.H. Golub, A.J. Wathen, A note on preconditioning for indefinite linear systems, SIAM J. Sci. Comput. 21 (6) (2000) 1969–1972.
[26] J. Nocedal, S. Wright, Numerical Optimization, Springer-Verlag, New York, 1999.
[27] J. Pestana, A.J. Wathen, The antitriangular factorization of saddle point matrices, SIAM J. Matrix Anal. Appl. 35 (2) (2014) 339–353.
[28] J. Rommes, W.H.A. Schilders, Efficient methods for large resistor networks, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 29 (2010) 28–39.
[29] T. Rusten, R. Winther, A preconditioned iterative method for saddle point problems, SIAM J. Matrix Anal. Appl. 13 (1992) 887–904.
[30] A. Shahzad, E.C. Kerrigan, G.A. Constantinides, A stable and efficient method for solving a convex quadratic program with application to optimal control, SIAM J. Optim. 22 (4) (2012) 1369–1393.
[31] O. Schenk, K. Gärtner, On fast factorization pivoting methods for sparse symmetric indefinite systems, Electron. Trans. Numer. Anal. 23 (2006) 158–179.
[32] W.H.A. Schilders, Solution of indefinite linear systems using an LQ decomposition for the linear constraints, Linear Algebra Appl. 431 (2009) 381–395.
[33] UF Sparse Matrix Collection: GHS_indef group, http://www.cise.ufl.edu/research/sparse/matrices/GHS_indef/index.html.
[34] G.W. Stewart, Matrix Algorithms, Vol. 1: Basic Decompositions, SIAM, 1998.
[35] M. Tuma, A note on the LDL^T decomposition of matrices from saddle-point problems, SIAM J. Matrix Anal. Appl. 23 (2002) 903–915.
[36] S.J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA, 1997.
[37] J.H. Wilkinson, Error analysis of direct methods of matrix inversion, J. ACM 8 (1961) 281–330.