On stochasticity preserving methods for the computation of the matrix pth root


Mathematics and Computers in Simulation xxx (2014) xxx–xxx

Original Articles

Tiziano Politi a, Marina Popolizio b,∗

a Dipartimento di Ingegneria Elettrica e dell’Informazione, Politecnico di Bari, Via Orabona 4, 70125 Bari, Italy
b Dipartimento di Matematica e Fisica “Ennio De Giorgi”, Università del Salento, Via per Arnesano, 73100 Lecce, Italy

Received 15 March 2013; received in revised form 12 November 2013; accepted 3 January 2014

Abstract

In this paper we consider the numerical computation of the matrix pth root of stochastic matrices. In particular, a collection of theoretical results concerning the pth root of stochastic matrices is reported and some numerical methods are described. The aim of the paper is to highlight the properties of such methods when they are applied to compute the pth root of a stochastic matrix, in the cases where that root is expected to be stochastic too. © 2014 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Stochastic matrices; Matrix functions; Matrix pth root function

1. Introduction

The numerical computation of matrix functions is nowadays one of the most interesting topics in numerical analysis. Starting from the classical 1978 paper by Moler and Van Loan [26] about the matrix exponential, many works have been devoted to the numerical computation of f(A), where A is a complex matrix of order n and f an analytic complex-valued function [8,9,15,23,24,27,30–32]. Different functions have been considered (e.g. trigonometric functions, the pth root function, sector functions and others), and the applications which involve these matrix functions are numerous and cover very different scientific fields. Probably the most common application area is the numerical solution of matrix differential systems. Here we recall the use of exponential integrators to numerically solve partial differential equations, but also integration schemes for boundary value problems, which require the knowledge of trigonometric matrix functions, and the numerical integration of fractional differential equations [11,12]. Starting from the computation of matrix functions, other numerical problems arise in this field, e.g. the computation of the vector f(A)b, where b is a vector of order n with n ≫ 1 (see [31] and references therein).

In this paper we consider the problem of numerically computing a matrix function while preserving an algebraic property. Indeed, there has recently been a growing interest in the development of algorithms able to preserve (or to exploit) the structure of the matrix. For example, it is well known that the exponential map of a skew-symmetric matrix (i.e. a matrix such that A^T = −A) is an orthogonal matrix and, in general, the exponential function maps a Lie algebra into its own Lie group (see [10]

夽 This work was supported by the GNCS-INdAM 2012 project “Integratori esponenziali per equazioni differenziali di ordine frazionario”.
∗ Corresponding author.
E-mail addresses: [email protected] (T. Politi), [email protected] (M. Popolizio).

0378-4754/$36.00 © 2014 IMACS. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.matcom.2014.01.002

Please cite this article in press as: T. Politi, M. Popolizio. On stochasticity preserving methods for the computation of the matrix pth root, Math. Comput. Simul. (2014), http://dx.doi.org/10.1016/j.matcom.2014.01.002


and references therein). The challenge is to obtain numerical methods preserving the structure of the matrix function (see [10,24]) or the geometric properties of the theoretical solution of matrix differential systems (see [7]). Another interesting research field is the exploitation of the structure of the matrix A in order to efficiently compute f(A) using well-known methods (usually methods based on the Schur decomposition of the matrix; see [21,22] for centrosymmetric matrices and [25] for circulant matrices).

The aim of this paper is to highlight some properties of numerical methods for the matrix pth root when the argument is a stochastic matrix. Stochastic matrices arise in mathematical models based on Markov chains. Usually they describe the transition probabilities from a state vector at time t to the state vector at time t + Δt, where Δt is the shortest period over which a transition matrix can be estimated. If a shorter-term transition matrix is needed, it can be obtained by computing a pth root (e.g. see [6] for the description of a Markov model in healthcare, but also [14,20] and references therein).

A nice example is a Markov chain for the state of the weather conditions at an airport. In this situation it is indeed common to predict the state of the weather on a quarter-hour basis, that is, every 15 minutes. However, information about the weather conditions at the airport is usually collected and recorded every hour, thus resulting, at the end of each hour, in a transition matrix A which includes the existing data. To track the state of the airport on a quarter-hour basis one needs to find a transition matrix B such that A = B^4, that is, a stochastic root of a stochastic matrix is required. Another example arises in portfolio risk management: also in this case the important data, namely the companies’ credit ratings, are recorded yearly, thus defining, at the end of each year, a transition matrix with all the recorded information.
However, the investment horizon is usually shorter than a year, typically one month or a quarter, thus requiring the computation of stochastic roots of stochastic matrices.

The paper is organized as follows: in Section 2 some theoretical results concerning the matrix pth root and stochastic matrices are reported. In Section 3 some classical numerical methods for the computation of matrix pth roots are recalled. The theoretical behavior of these methods with respect to stochastic and doubly stochastic matrices is described in Sections 4 and 6, while some numerical tests are shown in Sections 5 and 7, where also the numerical computation of vectors like A^{1/p} v is taken into account. Some results described in this paper will be verified just numerically, since we hope that the paper can be a starting point for a deeper study of numerical methods suitable to compute the stochastic pth root of a stochastic matrix, and of theoretical issues on the stochasticity preserving property of the existing numerical methods.

2. Matrix functions and stochastic matrices

In this paper we consider the solution of the nonlinear equation X^p = A, where p is a positive integer and A is an order n stochastic matrix. The basic theoretical result on its solution is the following theorem (see [15]).

Theorem 2.1 (Classification of pth roots of nonsingular matrices). Let the nonsingular matrix A ∈ C^{n×n} have the Jordan canonical form Z^{−1} A Z = J = diag(J_1, J_2, . . ., J_m), with Jordan blocks J_k = J_k(λ_k) ∈ C^{m_k×m_k}, and let λ_1, . . ., λ_s, s ≤ m, be the distinct eigenvalues of A. Let L_k^{(j_k)} = L_k^{(j_k)}(λ_k), k = 1, . . ., m, denote the p pth roots of J_k given by

L_k^{(j_k)}(λ_k) = \begin{bmatrix} f_{j_k}(λ_k) & f'_{j_k}(λ_k) & \cdots & \dfrac{f^{(m_k-1)}_{j_k}(λ_k)}{(m_k-1)!} \\ & f_{j_k}(λ_k) & \ddots & \vdots \\ & & \ddots & f'_{j_k}(λ_k) \\ & & & f_{j_k}(λ_k) \end{bmatrix}

where j_k ∈ {1, 2, . . ., p} denotes the branch of the pth root function f(z) = z^{1/p}. Then A has precisely p^s pth roots that are expressible as polynomials in A, given by

X_j = Z diag(L_1^{(j_1)}, L_2^{(j_2)}, . . ., L_m^{(j_m)}) Z^{−1},   j = 1, . . ., p^s,   (1)

corresponding to all possible choices of j_1, . . ., j_m, subject to the constraint that j_i = j_k whenever λ_i = λ_k. If s < m, then A has additional pth roots that form parametrized families

X_j(U) = Z U diag(L_1^{(j_1)}, L_2^{(j_2)}, . . ., L_m^{(j_m)}) U^{−1} Z^{−1},   j = p^s + 1, . . ., p^m,   (2)


where j_k ∈ {1, 2, . . ., p}, U is an arbitrary nonsingular matrix that commutes with J, and for each j there exist i and k, depending on j, such that λ_i = λ_k while j_i ≠ j_k.

In the theory of matrix functions the roots (1) are called primary functions of A, and the roots in (2) are called nonprimary functions. For a deeper discussion of the theoretical issues see [15]. The focus of this paper is on stochastic matrices, and we recall here their definition and some properties (see, e.g. [17]).

Definition 2.2. A matrix A is said to be a stochastic matrix if it is nonnegative and all its row sums are 1.

From a mathematical point of view, a nonnegative matrix A is stochastic if and only if

Ae = e,   for e = (1, . . ., 1)^T.   (3)
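To make Theorem 2.1 concrete in the simplest setting, the p^s primary pth roots of a diagonalizable matrix with distinct eigenvalues can be enumerated by choosing a branch of z^{1/p} for each eigenvalue. The following is a minimal Python/NumPy sketch, not from the paper (the 2 × 2 test matrix and the helper name are ours); it also checks which of the roots satisfy the stochasticity characterization (3):

```python
import itertools
import numpy as np

def primary_pth_roots(A, p):
    """Enumerate the p^s primary pth roots X = Z diag(branches) Z^{-1} of a
    diagonalizable matrix A with distinct eigenvalues (here s = n)."""
    lam, Z = np.linalg.eig(A)
    Zinv = np.linalg.inv(Z)
    roots = []
    for branches in itertools.product(range(p), repeat=len(lam)):
        d = [lam[i] ** (1.0 / p) * np.exp(2j * np.pi * b / p)
             for i, b in enumerate(branches)]
        roots.append(Z @ np.diag(d) @ Zinv)
    return roots

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # stochastic, eigenvalues 1 and 0.3
roots = primary_pth_roots(A, 2)     # 2^2 = 4 square roots
for X in roots:
    assert np.allclose(X @ X, A)    # each one squares back to A
# Keep the real, nonnegative roots with unit row sums, i.e. X >= 0 and Xe = e.
stochastic = [X.real for X in roots
              if np.allclose(X.imag, 0)
              and (X.real >= -1e-12).all()
              and np.allclose(X.real @ np.ones(2), np.ones(2))]
print(len(roots), len(stochastic))
```

In this particular example more than one of the four square roots passes the stochasticity filter, illustrating that a stochastic root of a stochastic matrix, when it exists, need not be unique.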

Stochastic matrices are endowed with special spectral properties, which we recall here (see [20]).

Theorem 2.3. Let A ∈ R^{n×n} be stochastic and let ρ(A) denote its spectral radius. Then

1. ρ(A) = 1;
2. 1 is a semisimple eigenvalue of A and has a corresponding eigenvector e;
3. if A is irreducible, then 1 is a simple eigenvalue of A.

A natural question concerns the stochasticity of f(A) for a stochastic matrix A and a given function f: the first condition the matrix f(A) has to fulfill is nonnegativity. The second one, thanks to the characterization (3) of stochastic matrices, is that f(A)e = e. For any matrix function g, it holds that g(A)v = g(λ)v for any eigenpair (v, λ) of A. It is thus sufficient that f(1) = 1 to conclude that f(A)e = e. This requirement on f clearly rules out the majority of functions. Moreover, even for those functions f such that f(1) = 1, the stochasticity preservation is not guaranteed. More hypotheses are needed, as shown in the following result.

Theorem 2.4 ([20]). If the stochastic matrix A ∈ R^{n×n} is the inverse of an M-matrix, then A^{1/p} exists and is stochastic for all p ∈ N.

Example: The matrix

A = \begin{pmatrix} 1 & & & \\ 1/2 & 1/2 & & \\ \vdots & & \ddots & \\ 1/n & 1/n & \cdots & 1/n \end{pmatrix}   (4)

fulfills the hypotheses of the theorem above for any n ∈ N. Indeed, it is the inverse of

\begin{pmatrix} 1 & & & & \\ -1 & 2 & & & \\ 0 & -2 & 3 & & \\ \vdots & & \ddots & \ddots & \\ 0 & 0 & \cdots & -(n-1) & n \end{pmatrix}

which is clearly an M-matrix. In the following we will use matrices like (4) to test the numerical approximations to A^{1/p}, whose exact value, according to Theorem 2.4, is stochastic too.
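The matrix (4) and its bidiagonal inverse are straightforward to generate and check numerically (a small Python/NumPy sketch; the function name is ours, not from the paper):

```python
import numpy as np

def matrix_4(n):
    """The stochastic matrix (4): row i (1-based) holds 1/i in its first i entries."""
    return np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]

n = 6
A = matrix_4(n)
assert (A >= 0).all() and np.allclose(A @ np.ones(n), np.ones(n))  # stochastic

M = np.linalg.inv(A)
# M is bidiagonal with diagonal 1, 2, ..., n and subdiagonal -1, -2, ..., -(n-1):
assert np.allclose(np.diag(M), np.arange(1, n + 1))
assert np.allclose(np.diag(M, k=-1), -np.arange(1, n))
# Nonpositive off-diagonal entries: the sign pattern required of an M-matrix.
assert (M - np.diag(np.diag(M)) <= 1e-10).all()
```

Under the hypotheses of Theorem 2.4 one can then expect a routine such as `scipy.linalg.fractional_matrix_power(A, 1/p)` to return a (numerically) stochastic matrix; this is the test family used in the numerical experiments below.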


3. Numerical methods for the matrix pth root

Several methods have been proposed over the years to numerically evaluate A^{1/p}, for any matrix A whose pth root is well defined. In this section we give a brief overview of some numerical methods, with the aim of analyzing them in the following sections to verify whether they preserve stochasticity.

3.1. Matrix iterations

The simplified Newton iteration for the matrix pth root (see e.g. [18]) is defined as

X_{k+1} = ((p − 1) X_k + A X_k^{1−p}) / p.   (5)

Unfortunately, as discussed in [19], “there is no hope to find a pure rational matrix iteration, that is, an iterative scheme like X_{k+1} = ϕ(X_k), converging to the matrix pth root”. A variant of this method was proposed in [18] to improve the stability: it reads

X_{k+1} = X_k ( ((p − 1)I + N_k) / p ),
N_{k+1} = ( ((p − 1)I + N_k) / p )^{−p} N_k,   (6)

with initial approximations X_0 = I and N_0 = A. Moreover, it has been proved that (X_k, N_k) converges quadratically to (A^{1/p}, I) for each A whose eigenvalues belong to the set D = {z ∈ C : Re z > 0, |z| ≤ 1}. The matrix iteration (6) has a computational cost of O(n^3 log p).

3.2. Schur method

The algorithm in the Matlab package by Higham [15] to compute the pth root of a matrix is based on the Schur method proposed by Smith [35], who also proved that it is numerically stable. This method starts from a real Schur decomposition of the matrix argument A, say A = QTQ^T, where T is block upper quasi-triangular, each block T_{ii} is either 1 × 1 or 2 × 2 with complex conjugate eigenvalues, and Q is real orthogonal. A pth root U of T is computed by means of a recursion involving the blocks of T, and then a pth root X of A is given by X = QUQ^T. The Schur decomposition can be calculated by numerically stable techniques at a cost of 25n^3 flops. The computation of U requires p^2 n^3 / 6 flops and the formation of X = QUQ^T requires 3n^3 flops.

3.3. Contour integral

A common way to define a matrix function involves the Cauchy integral: given a closed contour Γ lying in the region of analyticity of f and containing the spectrum of A, f(A) is defined as

f(A) = (1 / (2πi)) ∫_Γ f(z) (zI − A)^{−1} dz.   (7)


The usual way to exploit this formulation for a numerical approximation of f(A) is to apply a quadrature formula to a suitable parametrization of the contour Γ. Thus, as shown in [36], this is equivalent to a rational approximation, say f(A) ≈ R(A). Several methods derive from this integral representation of f, differing both in the choice of the contour and in the quadrature formula. In the following we focus on the approach proposed by Bini et al. [2], specific to the pth root, and the more recent method by Hale et al. [13], which works for functions like A^α, log(A) and related ones. Numerical methods to approximate (7) require the inversion of several matrices. This could represent a problem for ill-conditioned matrices: in these cases other approaches should be preferred.

3.3.1. Bini–Higham–Meini

In [2] the approximation to A^{1/p} starts from the integral representation

A^{1/p} = (p sin(π/p) / π) A ∫_0^∞ (x^p I + A)^{−1} dx,

which the authors reformulate as a problem of complex integration around the unit circle. The truncation of the Euler–MacLaurin expansion of the transformed integrand leads to the approximation

X_N = (2p sin(π/p) / N) A Σ_{j=0}^{N−1} [ w_N^j / (1 + w_N^j)^2 ] ( A − ((1 − w_N^j)/(1 + w_N^j))^p I )^{−1}   (8)

for w_N = cos(2π/N) + i sin(2π/N) and p = 2q, q ∈ N, q odd. The approximation error of this method is ‖A^{1/p} − X_N‖ = O(r^{2N}) for a given r < 1 [2]. The numerical method proposed in [2] is an iterative procedure which, given a starting value N and a tolerance ε, computes X_N by (8); it stops if the error ‖A − X_N^p‖ < ε, otherwise it continues by doubling N. For our tests we fix N, according to suitable scalar tests, and we just compute the corresponding X_N. Moreover, we save a matrix–matrix product by using the equivalent expression

X_N = (2p sin(π/p) / N) ( Σ_{j=0}^{N−1} τ_j I + Σ_{j=0}^{N−1} τ_j ξ_j^p (A − ξ_j^p I)^{−1} )

with τ_j = w_N^j / (1 + w_N^j)^2 and ξ_j = (1 − w_N^j)/(1 + w_N^j). Unfortunately, the coefficients τ_j and ξ_j do not allow further simplification of the expression, not even for a real argument A.

3.3.2. Hale–Higham–Trefethen

The paper by Hale, Higham and Trefethen [13] deals with the numerical approximation of a wide class of matrix functions f(A), including the real matrix powers. They consider a matrix A whose spectrum is on or near the positive real axis and evaluate the contour integral (7); more precisely, they claim that in several cases, as for the power function, it is better to obtain f(A) as A · A^{−1} · f(A). Thus the integral they consider is

f(A) = (A / (2πi)) ∫_Γ z^{−1} f(z) (zI − A)^{−1} dz.

For the power function they rewrite everything in the new variable w = z^{1/2} and, by means of a conformal map, the region of analyticity of the integrand is transplanted to an annulus. The midpoint rule is then applied for the numerical integration on this contour; they fix N equally spaced points in (−K + iK′/2, K + iK′/2) defined as

t_j = −K + iK′/2 + 2 (j − 1/2) K / N,   1 ≤ j ≤ N,



where K and K′ are complete elliptic integrals. The resulting approximation is

f_N(A) = −(8K (mM)^{1/4} / (πNk)) A Im( Σ_{j=1}^{N} σ_j (w(t_j)^2 I − A)^{−1} )   (9)

where m and M are such that the spectrum of A is contained in [m, M],

σ_j = f(w(t_j)^2) cn(t_j) dn(t_j) / ( w(t_j) (k^{−1} − sn(t_j))^2 ),

and sn, cn, dn are Jacobi elliptic functions in standard notation (see [1] for their definitions and properties). The approximation error of this method is

‖f(A) − f_N(A)‖ = O( e^{−π^2 N / (log(M/m) + 6)} )

for any m and M. Also in this case, thanks to the hypothesis that A is real, we can save the cost due to the premultiplication by A. Indeed,

f_N(A) = −(8K (mM)^{1/4} / (πNk)) ( −Σ_{j=1}^{N} Im(σ_j) I + Σ_{j=1}^{N} Im( σ_j (I − w(t_j)^{−2} A)^{−1} ) ).   (10)

A drawback of this method is that it requires the extremal eigenvalues of A: they are usually not known and their numerical computation is expensive. Moreover, the hypothesis on the real spectrum cannot be relaxed, as we have experimentally tested. Another shortcoming is that formula (10) requires the inversion of N matrices: this is clearly a weakness of the approach. However, the authors suggest using this process to compute f(A)b rather than f(A), as we will discuss in the following tests. For this reason the authors restrict the applicability of the approach to the “backslash matrices”, that is, “large sparse matrices for which systems of equations (zI − A)x = b can be solved efficiently by sparse direct methods but Krylov subspace iterations and Schur reduction to triangular form are impractical” [13].

3.3.3. Computational costs

The expressions in (8) and (10) require essentially the same computational cost when applied to matrix arguments; indeed, in our improved implementation, they involve N matrix inversions for a computational cost of 2Nn^3.

4. Stochasticity preserving methods

Several applications, as discussed in the Introduction, require the computation of A^{1/p} for a stochastic A and, under the hypothesis that A^{1/p} is stochastic too, they need a numerical approximation with this feature. In this section we analyze the numerical schemes described in the previous section to check whether they preserve this property.

4.1. Matrix iterations

To verify the stochasticity of the limit of the sequences defined by (5) and (6) we could use the characterization (3): it requires X_k to be nonnegative and to possess the eigenpair (e, 1). The latter is easier to check, and it holds for both iterations (5) and (6), as we are going to discuss.

Theorem 4.1.
If Ae = e and X_k is the sequence of the simplified Newton iteration (5) with X_0 e = e, then X_k e = e for any k.

Proof. We prove the claim by induction on k. The case k = 0 is true by hypothesis; assume the claim is true for k; then

X_{k+1} e = ((p − 1) X_k + A X_k^{1−p}) e / p = ((p − 1) X_k e + A (X_k^{1−p} e)) / p = e,

since X_k e = e implies X_k^{1−p} e = e (e is an eigenvector of every integer power of X_k, with eigenvalue 1). □
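Theorem 4.1 is easy to confirm in floating point: every Newton iterate keeps unit row sums, up to roundoff. A Python/NumPy sketch (the test matrix and the number of iterations are our own choices; we start from X_0 = I so that X_0 e = e):

```python
import numpy as np

A = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])     # stochastic: Ae = e
p, n = 3, 3
e = np.ones(n)

X = np.eye(n)                       # X0 = I, hence X0 e = e
row_sum_defects = []
for _ in range(25):                 # simplified Newton iteration (5)
    X = ((p - 1) * X + A @ np.linalg.matrix_power(X, 1 - p)) / p
    row_sum_defects.append(np.abs(X @ e - e).max())

assert max(row_sum_defects) < 1e-12        # X_k e = e at every step
assert np.linalg.norm(np.linalg.matrix_power(X, p) - A) < 1e-10   # X^p ~ A
```

The eigenvalues of this particular A are real and positive, so the scalar Newton recurrence underlying (5) converges on each of them; the row-sum defect stays at roundoff level throughout, as the theorem predicts.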

The analogue for (6) follows:

Theorem 4.2. If Ae = e and X_k is the sequence in (6), then X_k e = e for any k.

Proof. We start by proving that N_k e = e for any k, by induction on k; the case k = 0 is true by the hypothesis on A. Assume N_k e = e; then

N_{k+1} e = ( ((p − 1)I + N_k) / p )^{−p} N_k e = ( ((p − 1)I + N_k) / p )^{−p} e = e.

It is well known that any eigenvector of a given matrix is an eigenvector of its inverse and of its powers; the eigenvalue changes into the reciprocal or the power of the original one, and in this case no change occurs since the eigenvalue is 1. Indeed,

( ((p − 1)I + N_k) / p ) e = e,

thus the claim for N_{k+1} follows. Analogously we prove the claim for X_k. It is trivial for k = 0; assume it is true for k; then

X_{k+1} e = X_k ( ((p − 1)I + N_k) / p ) e = X_k ( ((p − 1)e + e) / p ) = X_k e = e,

so the proof follows. □

When the iteration converges, the nonnegativity of the X_k’s is eventually guaranteed: indeed, by the definition of limit and by the fact that A is nonnegative, from a certain index k onward the matrix X_k is certainly nonnegative, since it converges to A^{1/p}. We can thus summarize these observations in the following theorem.

Theorem 4.3. For any stochastic matrix A, the approximation X_k given by the matrix iteration (6) is stochastic for a sufficiently large k.

More can be said for the matrix iteration (6): indeed, not only does the iteration converge, but the following theorem guarantees the quadratic order.

Theorem 4.4. The sequence (X_k, N_k) given by the matrix iteration (6) converges quadratically to (A^{1/p}, I) for any stochastic matrix A.

Proof. The thesis easily follows since the spectrum of any stochastic matrix is contained in D = {z ∈ C : Re z > 0, |z| ≤ 1}, as stated in Theorem 2.3. □

We can thus conclude that the numerical approximation to A^{1/p} given by X_k in (6) is stochastic for a sufficiently large iteration index k.

4.2. Contour integral

The check on the stochasticity of the approximations given by (8) and (10) is very challenging.
Here we consider the case p = 2 for the approach by Bini and coauthors (8), while we just give some empirical results for the method by Hale and coauthors (10).

4.2.1. Bini–Higham–Meini and Hale–Higham–Trefethen

The analysis for the numerical method in (8) simplifies a bit when one is interested in the matrix square root; in fact, at least in this simplified situation, we can infer that the numerical approximation has the eigenpair (e, 1), as we are going to show by making use of the following lemma.

Lemma 4.5. For p = 2, i.e. for the square root, and for every N ∈ N,

(2p sin(π/p) / N) Σ_{j=0}^{N−1} τ_j / (1 − ξ_j^p) = 1.
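The identity in Lemma 4.5, and the “for any p” variant observed numerically below, can be checked directly in floating point. A Python/NumPy sketch (the chosen values of p and N are ours; we take N odd so that no node hits w_N^j = −1, where τ_j and ξ_j are individually singular):

```python
import numpy as np

def weight_sum(p, N):
    """(2 p sin(pi/p)/N) * sum_{j=0}^{N-1} tau_j / (1 - xi_j^p),
    with tau_j = w^j/(1+w^j)^2, xi_j = (1-w^j)/(1+w^j), w = e^{2 pi i / N}."""
    k = np.arange(N)
    w = np.exp(2j * np.pi * k / N)
    tau = w / (1 + w) ** 2
    xi = (1 - w) / (1 + w)
    return 2 * p * np.sin(np.pi / p) / N * np.sum(tau / (1 - xi ** p))

for p, N in [(2, 9), (2, 63), (6, 63), (10, 129)]:
    print(p, N, abs(weight_sum(p, N) - 1))
```

For p = 2 every term equals 1/4 exactly (as the lemma's proof shows), so the sum is 1 for any N; for the other admissible values p = 2q, q odd, the computed deviations shrink rapidly as N grows, consistent with the numerical observation reported after Theorem 4.6.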


Table 1
Errors with respect to expm(1/p*logm(A)) for p = 2 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM        mpower     sqrtm
10    3.5e−015   4e−015     3.5e−015   3.4e−015   1.1e−013   4e−015
20    7.3e−015   7.6e−015   7.3e−015   1e−013     8.3e−010   7.6e−015
40    2.6e−014   2.6e−014   2.6e−014   1.3e−009   5          2.6e−014
80    2e−013     2e−013     2e−013     3.5e−007   3.4e+019   2e−013
160   1.1e−011   8.4e−013   8.4e−013   1.3e−005   2.1e+081   8.4e−013

Proof. The jth term in the sum is

τ_j / (1 − ξ_j^2) = [ w_N^j / (1 + w_N^j)^2 ] · [ (1 + w_N^j)^2 / ( (1 + w_N^j)^2 − (1 − w_N^j)^2 ) ] = w_N^j / (4 w_N^j) = 1/4.

Thus the quantity we are analyzing becomes (4/N) Σ_{j=0}^{N−1} (1/4), which is clearly equal to 1. □

Theorem 4.6. If Ae = e and X_N is the approximation in (8) for p = 2, then X_N e = e for any N.

Proof. Since Ae = e, we have (A − ξ_j^2 I)e = (1 − ξ_j^2)e, thus

X_N e = (4/N) ( Σ_{j=0}^{N−1} τ_j e + Σ_{j=0}^{N−1} τ_j ξ_j^2 (A − ξ_j^2 I)^{−1} e ) = (4/N) Σ_{j=0}^{N−1} τ_j ( 1 + ξ_j^2 / (1 − ξ_j^2) ) e,

which results in e, thanks to the lemma above. □

Numerically, for any p,

(2p sin(π/p) / N) Σ_{j=0}^{N−1} τ_j / (1 − ξ_j^p) = 1,

which could mean that X_N e = e. We intend to deepen this feature to offer theoretical evidence for it. Moreover, it should be proved that X_N is nonnegative. This aspect is clearly challenging, due to the presence in the sum of matrix inverses and complex scalars.

Applying approximation (9) to a stochastic matrix, to preserve the property it should be f_N(A)e = e, hence the following equation should hold:

−(8K (mM)^{1/4} / (πNk)) Σ_{j=1}^{N} Im( σ_j / (w(t_j)^2 − 1) ) = 1,

provided that f_N(A) is nonnegative. We verified numerically that the equality holds within machine precision, but in a forthcoming paper we aim to analyze this method in depth, together with the one in (8), to completely disclose their properties.

5. Numerical tests

5.1. On the approximation error

We present a numerical test to give a flavor of the approximation quality of the methods under investigation. In particular, in Tables 1–3 we report the error with respect to the Matlab matrix expm(1/p*logm(A)), which we consider as the exact value of A^{1/p} for A as in (4). The reason for this choice lies in the alternative equivalent way to define A^{1/p} and in the reliability of the results computed by the Matlab functions expm and logm. We compare this matrix


Table 2
Errors with respect to expm(1/p*logm(A)) for p = 6 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM        mpower
10    1.6e−015   2.9e−015   1.5e−015   3.3e−015   1.1e−013
20    3.5e−015   4.1e−015   3.2e−015   8.6e−013   1.3e−008
40    1.1e−014   1.1e−014   1.1e−014   3.2e−011   16
80    3.5e−013   8.4e−014   8.4e−014   3.3e−010   2.6e+020
160   7.4e−011   3.5e−013   3.5e−013   1.8e−009   3.8e+082

Table 3
Errors with respect to expm(1/p*logm(A)) for p = 10 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM        mpower
10    1.1e−015   2.7e−015   9.4e−016   5e−009     9.3e−014
20    3.4e−015   3.4e−015   2e−015     5.2e−008   1e−008
40    7.7e−015   7.4e−015   6.8e−015   2.4e−007   6.3
80    4.8e−013   5.3e−014   5.3e−014   5.9e−007   7.8e+020
160   1.1e−010   2.2e−013   2.2e−013   1.1e−006   1.7e+083

with those given by: the Bini–Higham–Meini approximation (8) with N = 80, labeled BHM; the Hale–Higham–Trefethen approximation (10) with N = 20, labeled HHT; the matrix iteration (6) with k = 10 iterations, labeled MatrixIt; the function rootm, based on Smith’s Schur procedure; and the result of the Matlab built-in function mpower. We stress that we consider p = 2, 6, 10 because of the restriction imposed by the Bini–Higham–Meini method. The dimensions considered, at most 160 × 160, are quite large for these tests, since all these methods have computational costs proportional to n^3. However, our aim is to give an idea of the different performances.

The striking result is that for this example the Matlab built-in function mpower produces errors which become huge as the dimension grows, making the approximation meaningless. For completeness, we report the results given by the Matlab built-in function sqrtm for the special case p = 2: the resulting approximation is extremely accurate, as one would expect. We may thus guess that the command mpower is not optimally implemented to deal with generic real exponents. For the other methods, we stress that those based on the contour integral, namely HHT and BHM, are less accurate than the others and their accuracy deteriorates as the matrix dimension gets larger; the reason for this is presumably the matrix inversions involved in the methods. However, for the smallest dimensions the errors are very small for all methods.

5.2. To measure the stochasticity of the numerical approximation

In order to check whether the numerical approximations given by the methods under investigation are stochastic, we measure the quantity ‖Me − e‖_2, with e = (1, . . ., 1)^T and M denoting the numerical approximation given by the corresponding method. Theoretically this value should be zero. For this example we omit the Matlab mpower routine because of the poor results presented in Tables 1–3.
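Both quantities tracked in these tests, the error against a reference root and the stochasticity defect ‖X_N e − e‖_2, can be reproduced with a compact implementation of the rewritten form of (8). A Python/NumPy/SciPy sketch (our own choices, not from the paper: scipy.linalg.fractional_matrix_power as the reference instead of Matlab's expm/logm, and an odd N so that no node hits w^m = −1):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def bhm_root(A, p, N):
    """Approximation (8) in the equivalent form
    (2 p sin(pi/p)/N) * sum_j tau_j (I + xi_j^p (A - xi_j^p I)^{-1})."""
    n = A.shape[0]
    I = np.eye(n)
    S = np.zeros((n, n), dtype=complex)
    for m in range(N):
        w = np.exp(2j * np.pi * m / N)
        tau = w / (1 + w) ** 2
        xi_p = ((1 - w) / (1 + w)) ** p
        S += tau * (I + xi_p * np.linalg.inv(A - xi_p * I))
    return (2 * p * np.sin(np.pi / p) / N) * S

n, p, N = 10, 2, 81
A = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]   # matrix (4)
X = bhm_root(A, p, N)
assert np.allclose(X.imag, 0, atol=1e-10)   # imaginary parts cancel
X = X.real

err = np.linalg.norm(X - fractional_matrix_power(A, 1.0 / p))
defect = np.linalg.norm(X @ np.ones(n) - np.ones(n))
print(err, defect)                          # both should be tiny
```

The row-sum defect is small independently of the approximation error, in line with Theorem 4.6: the eigenpair (e, 1) is preserved by construction, while the accuracy of X_N depends on N and on the spectrum of A.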
From Tables 4–6 we can appreciate the tiny errors ‖Me − e‖_2 for all methods, thus confirming the theoretical findings about their stochasticity preservation.

Table 4
Errors ‖Me − e‖_2, where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T, p = 2 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM
10    4.3e−016   3.1e−015   3.3e−016   5.6e−016
20    1.5e−015   5.7e−015   5.6e−016   7.9e−016
40    1.8e−015   7.8e−015   1.3e−015   1.3e−015
80    1.6e−015   1.6e−014   2.1e−015   1.7e−015
160   3.5e−015   5.2e−014   3.7e−015   4.4e−015


Table 5
Errors ‖Me − e‖_2, where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T, p = 6 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM
10    1.4e−015   2.4e−015   4e−016     1.7e−015
20    1.7e−015   4.6e−015   6.6e−016   2.7e−015
40    6.6e−015   1.1e−014   1.1e−015   3.5e−015
80    1.4e−014   1.3e−014   2.2e−015   4.6e−015
160   5.7e−015   5e−014     3.6e−015   1.4e−014

6. Double-stochasticity preserving methods

Doubly stochastic matrices represent a special subset of stochastic ones; many theoretical results have been deduced for these matrices and many works have been devoted to them (see, e.g. [17,33,34] and references therein). These matrices are widely used in some telecommunication applications, such as the satellite-switched, time-division multiple-access systems (SS/TDMA systems) of communication theory (see [3–5]); doubly stochastic matrices are also used as a basic data structure for storing and processing uncertain information in multisensor data correlation (see [38,39]). In this section we extend our analysis to the doubly stochastic case, after recalling some basic features (see, e.g. [17]).

Definition 6.1. A matrix A is said to be a doubly stochastic matrix if A and A^T are stochastic.

Hence a nonnegative matrix A is doubly stochastic if and only if (3) holds and

e^T A = e^T,   for e = (1, . . ., 1)^T.   (11)

Example: A simple way to get doubly stochastic matrices is to start from orthogonal matrices. Indeed, given an orthogonal matrix A with entries a_{ij}, the matrix U defined by the entries u_{ij} = |a_{ij}|^2 is doubly stochastic [17]. More generally, doubly stochastic matrices can be obtained starting from any positive square matrix A since, as shown by Sinkhorn [33], there always exist two diagonal matrices D_1 and D_2 with positive main diagonals such that D_1 A D_2 is doubly stochastic. The matrices D_1 and D_2 are unique up to a scalar factor, and they can be obtained by applying the Sinkhorn–Knopp algorithm derived in [34]: in practice, they are the limit of the sequence of matrices generated by alternately normalizing the rows and columns of A.

Our aim is to investigate whether the numerical methods recalled so far for the matrix pth root preserve double stochasticity. The first step is thus to understand when the exact value is doubly stochastic.

Theorem 6.2. If the matrix A ∈ R^{n×n} is doubly stochastic and is the inverse of an M-matrix, then A^{1/p} exists and is doubly stochastic for all p ∈ N.

Proof. Theorem 2.4 guarantees that the matrix A^{1/p} exists and is stochastic. Let M be the M-matrix such that A = M^{−1}; then

A^T = (M^{−1})^T = (M^T)^{−1},

and M^T is clearly an M-matrix. Thus, from Theorem 2.4, (A^T)^{1/p} exists and is stochastic; moreover,

(A^T)^{1/p} = (A^{1/p})^T

Table 6
Errors ||Me − e||_2 where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T, p = 10 and A, of dimension n × n, as in (4).

n     HHT        rootm      MatrixIt   BHM
10    1.3e−015   2.4e−015   4e−016     9.6e−012
20    6.5e−015   4.7e−015   8.6e−016   1.4e−011
40    7.4e−015   1e−014     1.6e−015   1.9e−011
80    5.7e−015   1.3e−014   2.9e−015   2.7e−011
160   1.5e−014   4.8e−014   4.3e−015   3.8e−011

Please cite this article in press as: T. Politi, M. Popolizio. On stochasticity preserving methods for the computation of the matrix pth root, Math. Comput. Simul. (2014), http://dx.doi.org/10.1016/j.matcom.2014.01.002


for the properties of matrix functions. □

Matrices satisfying the hypotheses of the previous theorem can serve as test arguments to numerically approximate the matrix pth root and verify the preservation of double stochasticity.

6.1. Matrix iterations

We start our analysis with the matrix iterations discussed in Section 3.1. Since we have already proved that they preserve stochasticity, we just have to prove that the vector e is a left eigenvector, corresponding to the eigenvalue 1, for the approximation to A^{1/p}.

Theorem 6.3. If e^T A = e^T and X_k is the result of the matrix iteration in (5), then e^T X_k = e^T for any k.

Proof. The proof is similar to the stochastic case. □

Theorem 6.4. If e^T A = e^T and X_k is the result of the matrix iteration in (6), then e^T X_k = e^T for any k.

Proof. The proof is similar to the stochastic case. □

The previous results trivially extend to the limit of the matrix iterations; we can then conclude that the approximations to A^{1/p} given by (5) and (6) are doubly stochastic whenever A is doubly stochastic.

6.2. Contour integral

The analysis for the approximations given by (8) and (10) is intricate and deserves a deeper study, which we intend to tackle in a future paper. For the approximation (8) by Bini and coauthors we restrict our attention to the square root, for which we can conclude that it actually preserves double stochasticity. Indeed, as before, since the stochasticity preservation has already been proven, we just need to show that Eq. (11) holds:

Theorem 6.5. If e^T A = e^T and X_k is the sequence in (8) for p = 2, then e^T X_k = e^T for any k.

Proof. The proof is similar to the stochastic case. □

The approximation (8) to √A is then doubly stochastic when the matrix argument A is so. We will confirm this finding by numerical tests. For the doubly stochastic case we use test matrices obtained by applying the Sinkhorn–Knopp method to positive matrices, as discussed at the beginning of this section.
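The Sinkhorn–Knopp normalization used to build these test matrices can be sketched in a few lines of NumPy (the tolerance and iteration cap are illustrative choices, not from the paper):

```python
import numpy as np

def sinkhorn_knopp(A, tol=1e-13, max_iter=10000):
    """Alternately normalize the rows and columns of a positive matrix
    until it is doubly stochastic (Sinkhorn-Knopp algorithm, [34])."""
    A = np.array(A, dtype=float)
    for _ in range(max_iter):
        A /= A.sum(axis=1, keepdims=True)   # scale every row to sum 1
        A /= A.sum(axis=0, keepdims=True)   # scale every column to sum 1
        if (np.abs(A.sum(axis=1) - 1).max() < tol
                and np.abs(A.sum(axis=0) - 1).max() < tol):
            break
    return A

rng = np.random.default_rng(0)
S = sinkhorn_knopp(rng.random((5, 5)) + 0.1)   # any positive matrix works
print(S.sum(axis=0), S.sum(axis=1))            # all (approximately) ones
```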
Specifically, we consider the matrix A = (ZZ^T).^2 (the entrywise square, in Matlab notation), with Z a square normally distributed random matrix of dimension 10, 20, 40, 80, 160 (this choice guarantees that A is positive and symmetric). In Fig. 1 we report, as the matrix dimension varies, the error ||e^T M − e^T||_2, where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T and p = 2, 10. All the approximations show very small errors, slightly smaller for p = 2 than for p = 10, but in any case smaller than 10^{−10}. The errors for HHT, MatrixIt and rootm are always very similar, while those of BHM and Higham present noticeable differences depending on p. In any case, all the methods seem to preserve double stochasticity. Very similar plots result when using the doubly stochastic matrix obtained by applying the Sinkhorn–Knopp algorithm to the tridiagonal matrix with entries 1, 10, 1, as shown in Fig. 2. A possible explanation for the deep similarity of the plots could rely on the similar spectral properties of the matrices under investigation, since both must have a (cramped) spectrum in (0, 1].

7. On the approximation of A^{1/p}v

When the interest is in vectors of the form A^{1/p}v for a given vector v and a small matrix argument A, one can simply compute the matrix A^{1/p} and then multiply it by the vector v. The situation is more challenging when the dimension of A is not small since, as described before, the available methods to compute A^{1/p} are expensive. However, methods like HHT and BHM, which in the matrix case involve matrix inversions, only require the solution of linear systems when computing A^{1/p}v, thus becoming more competitive and more stable.


Fig. 1. Behavior, as the matrix dimension varies, of the error ||e^T M − e^T||_2, where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T and p = 2 (left), p = 10 (right).
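The experiment behind Fig. 1 can be reproduced in outline with NumPy. The paper's HHT/rootm/MatrixIt/BHM implementations are not available here, so the principal pth root is computed by an eigendecomposition (valid because the test matrix is symmetric positive definite); all names and the iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 10
Z = rng.standard_normal((n, n))
A = (Z @ Z.T) ** 2                   # positive and symmetric, as in the text

# Sinkhorn-Knopp: alternate row/column normalization until doubly stochastic.
for _ in range(5000):
    A /= A.sum(axis=1, keepdims=True)
    A /= A.sum(axis=0, keepdims=True)

# Principal pth root via eigendecomposition; a hedged stand-in for the
# methods compared in the figure (A is symmetric positive definite here).
A = (A + A.T) / 2
w, Q = np.linalg.eigh(A)
w = np.clip(w, 1e-15, None)          # guard against rounding below zero
M = Q @ np.diag(w ** (1.0 / p)) @ Q.T

e = np.ones(n)
print(np.linalg.norm(e @ M - e))     # small: double stochasticity preserved
```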


Fig. 2. Behavior, as the matrix dimension varies, of the error ||e^T M − e^T||_2, where M is the approximation to A^{1/p} by the different methods, e = (1, . . ., 1)^T and p = 2 (left), p = 10 (right).


Table 7
Errors ||ẽ − e||_2 for A as in (4), e = (1, . . ., 1)^T and p = 2.

n     HHT        Arnoldi    BHM
100   6.5e−015   8.4e−015   5e−015
200   1.2e−014   1.9e−014   1.4e−014
400   9.4e−014   6.4e−014   3.4e−014
800   3.6e−013   1.7e−013   7.6e−014

Table 8
Errors ||ẽ − e||_2 for A as in (4), e = (1, . . ., 1)^T and p = 6.

n     HHT        Arnoldi    BHM
100   3.5e−015   6.6e−015   5e−015
200   4.2e−015   1.2e−014   7.2e−015
400   1.5e−014   4.2e−014   1e−014
800   9.8e−013   1.3e−013   1.6e−014

In general, ad hoc strategies are often used when just vectors like A^{1/p}v are needed. An important approach to approximate f(A)v for a given matrix A and a suitable function f relies on Krylov subspace methods. The basic idea is to project the problem onto a possibly much smaller space, in such a way as to drastically reduce the problem dimension and to make its solution simpler. Their favorable computational and approximation properties have made Krylov subspace methods extensively used; see, among others, [16,23,24,27,32]. Over the years several refinements have been added to these techniques to make them more effective, both in terms of computational cost and of memory requirements; see, e.g., [28,29,31,37].

The approximation spaces for these techniques are defined as

K_m(A, v) = span{v, Av, . . ., A^{m−1}v}.

Since the basis given by the vectors v, Av, . . ., A^{m−1}v can be very ill-conditioned, one usually applies the modified Gram–Schmidt method to get an orthonormal basis, with starting vector v_1 = v/||v||. Thus, if these vectors v_1, . . ., v_m are the columns of a matrix V_m and the upper Hessenberg matrix H_m collects the coefficients h_{i,j} of the orthonormalization process, the following Arnoldi formula holds:

A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T,

with e_m denoting the mth column of the identity matrix. An approximation to f(A)v may then be obtained as

y_m = ||v|| V_m f(H_m) e_1.

The procedure reduces to the three-term Lanczos recurrence when A is symmetric, which results in a tridiagonal matrix H_m. One still has to face the issue of evaluating a matrix function but, if m ≪ n, for the much smaller matrix argument H_m, which is m × m. Several approaches can then be tried, for example the built-in function funm in Matlab, based on the Schur decomposition of the matrix argument and on the Schur–Parlett algorithm to evaluate the function of the triangular factor [15].

Table 9
Errors ||ẽ − e||_2 for A as in (4), e = (1, . . ., 1)^T and p = 10.

n     HHT        Arnoldi    BHM
100   3.5e−015   6.4e−015   3e−011
200   5.5e−015   1.1e−014   4.3e−011
400   1.3e−014   2.9e−014   6.1e−011
800   9.7e−013   1.1e−013   8.6e−011
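The Arnoldi procedure described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the pth root of the small factor H_m is computed by an eigendecomposition (playing the role of funm/Schur–Parlett; this is legitimate here because the test matrix is symmetric, so H_m is numerically symmetric tridiagonal), and the test matrix and breakdown tolerance are assumptions:

```python
import numpy as np

def arnoldi_f_times_v(A, v, m, f):
    """Approximate f(A) v by projection onto the Krylov space K_m(A, v):
    y_m = ||v|| V_m f(H_m) e_1, with V_m, H_m from the Arnoldi process."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:         # (happy) breakdown: space exhausted
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    Hm = H[:m, :m]
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * V[:, :m] @ (f(Hm) @ e1)

def root_p(Hm, p=10):
    # Eigendecomposition in place of funm; Hm is numerically symmetric here.
    w, Q = np.linalg.eigh((Hm + Hm.T) / 2)
    return Q @ np.diag(w ** (1.0 / p)) @ Q.T

# Doubly stochastic, symmetric positive definite test matrix.
n = 50
A = 0.5 * np.eye(n) + np.full((n, n), 0.5 / n)
e = np.ones(n)
y = arnoldi_f_times_v(A, e, 20, root_p)
print(np.linalg.norm(y - e))   # tiny: the Arnoldi approximation keeps e
```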


In Tables 7–9 we report the error ||ẽ − e||_2, where ẽ is the approximation to A^{1/p}e given by the Arnoldi, HHT and BHM methods and the matrix A is the one defined in (4), for the dimension n varying from 100 to 800. We stress that for this test we considerably enlarged the problem dimension, since the methods are much more effective than in the previous cases. Indeed, Arnoldi requires just matrix–vector multiplications, while HHT and BHM reduce to the solution of linear systems (the Matlab backslash operator has been used to solve them). The results in Tables 7–9 confirm the stochasticity preservation of the HHT and BHM methods. Moreover, they show that the Arnoldi method also preserves stochasticity, in a sense still to be fully understood.

8. Conclusions and future work

In this paper we have addressed the numerical computation of the stochastic pth root of a stochastic matrix. We have considered different classes of methods, such as matrix iterations, contour integrals and Schur methods. The analysis for methods based on contour integrals is challenging; however, they look promising since they can be applied to problems of much larger dimension when the interest is in the computation of the vector A^{1/p}v; indeed, in this case the most demanding part is the solution of linear systems, and the computational costs can be reduced by applying iterative methods. In the future we aim to deepen some theoretical issues about the stochasticity preserving properties of these and other methods known in the literature.

References

[1] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1972.
[2] D.A. Bini, N.J. Higham, B. Meini, Algorithms for the matrix pth root, Numerical Algorithms 39 (4) (2005) 349–378.
[3] R.A. Brualdi, Notes on the Birkhoff algorithm for doubly stochastic matrices, Canadian Mathematical Bulletin 25 (2) (1982) 191–199.
[4] R.A. Brualdi, Some applications of doubly stochastic matrices, Linear Algebra and its Applications 107 (1988) 77–100.
[5] R.A. Brualdi, Combinatorial Matrix Classes, vol. 13, Cambridge University Press, Cambridge, 2006.
[6] T. Charitos, P.R. de Waal, L.C. van der Gaag, Computing short-interval transition matrices of a discrete-time Markov chain from partially observed data, Statistics in Medicine 27 (2008) 905–921.
[7] N. Del Buono, L. Lopez, Geometric integration on manifold of square oblique rotation matrices, SIAM Journal on Matrix Analysis and Applications 23 (4) (2002) 974–989.
[8] N. Del Buono, L. Lopez, A survey on methods for computing matrix exponentials in numerical schemes for ODEs, in: Computational Science – ICCS 2003, Part II, vol. 2658, Springer, 2003, pp. 111–120.
[9] N. Del Buono, L. Lopez, R. Peluso, Computation of the exponential of large sparse skew-symmetric matrices, SIAM Journal on Scientific Computing 27 (2005) 278–293.
[10] N. Del Buono, L. Lopez, T. Politi, Computation of functions of Hamiltonian and skew-symmetric matrices, Mathematics and Computers in Simulation 79 (4) (2008) 1284–1297.
[11] R. Garrappa, M. Popolizio, On the use of matrix functions for fractional partial differential equations, Mathematics and Computers in Simulation 81 (5) (2011) 1045–1056.
[12] R. Garrappa, M. Popolizio, Evaluation of generalized Mittag–Leffler functions on the real line, Advances in Computational Mathematics 39 (1) (2013) 205–225.
[13] N. Hale, N.J. Higham, L.N. Trefethen, Computing A^α, log(A), and related matrix functions by contour integrals, SIAM Journal on Numerical Analysis 46 (2008) 2505–2523.
[14] Q.-M. He, E. Gunn, A note on the stochastic roots of stochastic matrices, Journal of Systems Science and Systems Engineering 12 (2) (2003) 210–223.
[15] N.J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, 2008.
[16] M. Hochbruck, C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM Journal on Numerical Analysis 34 (5) (1997) 1911–1925.
[17] R. Horn, C. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1990.
[18] B. Iannazzo, On the Newton method for the matrix pth root, SIAM Journal on Matrix Analysis and Applications 25 (2006) 503–523.
[19] B. Iannazzo, A family of rational iterations and its application to the computation of the matrix pth root, SIAM Journal on Matrix Analysis and Applications 30 (2008) 1445–1462.
[20] L. Lin, Roots of Stochastic Matrices and Fractional Matrix Powers, Ph.D. thesis, The University of Manchester, 2011.
[21] Z. Liu, H. Chen, H. Cao, Computation of the principal square root of centrosymmetric H-matrices, Applied Mathematics and Computation 175 (2006) 319–329.
[22] Z. Liu, Y. Zhang, R. Ralha, Computing the square root of matrices with central symmetry, Applied Mathematics and Computation 186 (2007) 715–726.
[23] L. Lopez, V. Simoncini, Analysis of projection methods for rational function approximation to the matrix exponential, SIAM Journal on Numerical Analysis 44 (2006) 613–635.
[24] L. Lopez, V. Simoncini, Preserving geometric properties of the exponential matrix by block Krylov subspace methods, BIT Numerical Mathematics 46 (2006) 813–830.
[25] C. Lu, C. Gu, The computation of the square root of circulant matrices, Applied Mathematics and Computation 217 (2011) 6819–6829.
[26] C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Review 20 (1978) 801–836.
[27] I. Moret, Rational Lanczos approximations to the matrix square root and related functions, Numerical Linear Algebra with Applications 16 (2009) 431–445.
[28] I. Moret, P. Novati, RD-rational approximations of the matrix exponential, BIT Numerical Mathematics 44 (2004) 595–615.
[29] I. Moret, M. Popolizio, The restarted shift-and-invert Krylov method for matrix functions, Numerical Linear Algebra with Applications 21 (2014) 68–80.
[30] T. Politi, M. Popolizio, Schur decomposition methods for the computation of rational matrix functions, in: Computational Science – ICCS 2006, vol. 3994, 2006, pp. 708–715.
[31] M. Popolizio, V. Simoncini, Acceleration techniques for approximating the matrix exponential operator, SIAM Journal on Matrix Analysis and Applications 30 (2008) 657–683.
[32] Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM Journal on Numerical Analysis 29 (1992) 209–228.
[33] R. Sinkhorn, Diagonal equivalence to matrices with prescribed row and column sums, American Mathematical Monthly 74 (1967) 402–405.
[34] R. Sinkhorn, P. Knopp, Concerning nonnegative matrices and doubly stochastic matrices, Pacific Journal of Mathematics 21 (1967) 343–348.
[35] M.I. Smith, A Schur algorithm for computing matrix pth roots, SIAM Journal on Matrix Analysis and Applications 24 (4) (2004) 971–989.
[36] L.N. Trefethen, J.A.C. Weideman, T. Schmelzer, Talbot quadratures and rational approximations, BIT Numerical Mathematics 46 (2006) 653–670.
[37] J. van den Eshof, M. Hochbruck, Preconditioning Lanczos approximations to the matrix exponential, SIAM Journal on Scientific Computing 27 (4) (2006) 1438–1457.
[38] F. Witte, Doubly stochastic matrices and sequential data association, in: Information Linkage Between Applied Mathematics and Industry, Part I, Academic Press, New York, 1979, pp. 641–646.
[39] F. Witte, D. Lucas, Probabilistic tracking in a multitarget environment, in: IEEE, 1978.
