Information Processing Letters 17 (1983) 145-148
North-Holland

ON THE ARITHMETIC COMPLEXITY OF MATRIX KRONECKER POWERS

Gadiel SEROUSSI * and Fai MA
Mathematical Sciences Department, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.

Communicated by Michael A. Harrison
Received 8 January 1983
Revised 31 March 1983
In this paper we study the arithmetic complexity of computing the pth Kronecker power of an n × n matrix. We first analyze a straightforward inductive computation which requires an asymptotic average of p multiplications and p − 1 additions per computed output. We then apply efficient methods for matrix multiplication to obtain an algorithm that achieves the optimal rate of one multiplication per output at the expense of increasing the number of additions, and an algorithm that requires O(log p) multiplications and O(log² p) additions per output.
Keywords: Algebraic computational complexity, Kronecker powers
1. Introduction

Let v = (v1, v2, ..., vn)' be a column vector of indeterminates, and, for any positive integer p, let v^p denote the column vector whose entries are all the different monomials of degree p in the entries of v, assuming commutativity between indeterminates. For example, if v = (v1, v2)', then v^3 = (v1^3, v1^2 v2, v1 v2^2, v2^3)'. By a simple combinatorial argument, it can be shown that the dimension of v^p is the binomial coefficient C(n + p − 1, p). We shall denote this dimension by m(n, p). We shall index the entries of v^p by vectors I = (i1, i2, ..., ip) with 1 ≤ i1 ≤ i2 ≤ ... ≤ ip ≤ n, so that (v^p)_I = v_{i1} v_{i2} ... v_{ip}; such a vector I will be called an (n, p)-vector.
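As a quick illustration, the dimension count and the monomial indexing can be sketched in Python (the function names `mnp` and `index_vectors` are ours, not from the paper):

```python
from itertools import combinations_with_replacement
from math import comb

def mnp(n, p):
    # Dimension of v^p: number of degree-p monomials in n commuting
    # indeterminates, i.e., the binomial coefficient C(n + p - 1, p).
    return comb(n + p - 1, p)

def index_vectors(n, p):
    # The (n, p)-vectors I = (i1, ..., ip) with 1 <= i1 <= ... <= ip <= n
    # that index the entries of v^p.
    return list(combinations_with_replacement(range(1, n + 1), p))

# For v = (v1, v2)' and p = 3: the monomials v1^3, v1^2 v2, v1 v2^2, v2^3.
print(mnp(2, 3))            # 4
print(index_vectors(2, 3))  # [(1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
```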
* On leave from Technion, Israel Institute of Technology, Haifa, Israel.

0020-0190/83/$3.00 © 1983, Elsevier Science Publishers B.V. (North-Holland)
Let A = (Aij) be an n × n matrix, and let u = Av. Then

u^p = A(p) v^p,

where A(p) is an m(n, p) × m(n, p) matrix whose entries are polynomials in the entries of A. A(p) is called the pth Kronecker power of A [1]. As an example, if A = (Aij) is a 2 × 2 matrix, then its Kronecker square is

A(2) = | A11^2     2 A11 A12             A12^2   |
       | A11 A21   A11 A22 + A12 A21     A12 A22 |
       | A21^2     2 A21 A22             A22^2   |
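A minimal Python check of the defining property u^2 = A(2) v^2 for the 2 × 2 example above (0-based lists; the helper names are ours, not from the paper):

```python
def squares_vector(v):
    # v^2 for a 2-vector v = (v1, v2): the monomials (v1^2, v1*v2, v2^2).
    v1, v2 = v
    return [v1 * v1, v1 * v2, v2 * v2]

def kron_square_2x2(A):
    # The Kronecker square A(2) of a 2x2 matrix A, as displayed above.
    (a11, a12), (a21, a22) = A
    return [
        [a11 * a11, 2 * a11 * a12,         a12 * a12],
        [a11 * a21, a11 * a22 + a12 * a21, a12 * a22],
        [a21 * a21, 2 * a21 * a22,         a22 * a22],
    ]

# Check u^2 = A(2) v^2 with u = A v, for one numeric instance.
A = [[1, 2], [3, 4]]
v = [5, 7]
u = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
lhs = squares_vector(u)
rhs = [sum(c * x for c, x in zip(row, squares_vector(v)))
       for row in kron_square_2x2(A)]
assert lhs == rhs
```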
We shall index the entries of A(p) by pairs of (n, p)-vectors I, J corresponding to the indexing of u^p and v^p, i.e.,

(u^p)_I = Σ_J A(p)_{I,J} (v^p)_J.

The concept of matrix Kronecker powers [1] naturally arises in many areas of applied mathematics. It has been used in the study of probability moments [2], in feedback control [3] and in stability analysis [4], to mention a few recent applications. As pointed out in [5], the chief virtue of Kronecker algebra lies in the fact that its properties sometimes enable one to bypass many of the
difficulties due to noncommutativity. One of the most significant properties of Kronecker powers is that

(AB)(p) = A(p) B(p)

for any two conformable square matrices A and B [5]. It is important to notice that, despite many similar algebraic properties, A(p) is not the same as the p-fold direct Kronecker product A ⊗ A ⊗ ... ⊗ A (p times), where ⊗ is defined by A ⊗ B = (Aij B). To some extent, A(p) is a 'reduced' version of A ⊗ A ⊗ ... ⊗ A [1]. This reduction, however, does not preserve distributivity of powers with respect to multiplication, and, in general, it is not true that A(p + q) = A(p) ⊗ A(q).

Despite its many applications, we have not found in the literature any investigation of the complexity of computing A(p). In this paper we study the arithmetic complexity of A(p), i.e., we investigate the number of arithmetic operations needed to compute the entries of A(p) given the entries of A. We shall emphasize the multiplicative complexity, but we shall also mention the additive complexity. In Section 2 we analyze the computation of A(p) using a straightforward inductive method. We show that this computation requires an asymptotic average of p multiplications per computed output (in short, m/o) and p − 1 additions per computed output (in short, a/o). Here asymptotic means "as n → ∞ with p << n". In Section 3 we use efficient methods for matrix multiplication [6] to show how A(p) can be computed with an asymptotic rate of 1 m/o, which is optimal. This proposed computation, however, increases considerably the number of additions. Using instead a slightly improved version [8] of the results of [7] for matrix multiplication, we obtain an algorithm that performs an asymptotic average of O(log p) m/o and O(log² p) a/o.

2. The straightforward computation

Let I = (i1, i2, ..., ip) be an (n, p)-vector. Then

(u^p)_I = u_{i1} u_{i2} ... u_{ip}
        = (A_{i1,1} v1 + A_{i1,2} v2 + ... + A_{i1,n} vn) × ... × (A_{ip,1} v1 + A_{ip,2} v2 + ... + A_{ip,n} vn).

For an (n, p)-vector J = (j1, j2, ..., jp), the coefficient of (v^p)_J in the above expression is

A(p)_{I,J} = Σ_{σ ∈ S(J)} Π_{k=1}^{p} A_{ik, σk}.   (1)

Here S(J) denotes the set of all distinct permutations of the elements of the (n, p)-vector J (notice that the cardinality of S(J) depends on J, varying from |S(J)| = 1 when j1 = j2 = ... = jp to |S(J)| = p! when all the elements of J are different). Hence, the entries of A(p) are sums of monomials of degree p in the entries of A, and it can be readily verified that each monomial appears in exactly one entry of A(p).

Let I' be the (n, p − 1)-vector (i2, i3, ..., ip), let j be the value of an entry of J, and let J'(j) be the (n, p − 1)-vector obtained from J by removing the first occurrence of j. Letting j run through all the different values of the entries of J, we can rewrite (1) as follows:

A(p)_{I,J} = Σ_j A_{i1,j} Σ_{σ ∈ S(J'(j))} Π_{k=2}^{p} A_{ik, σ(k−1)}
           = Σ_j A_{i1,j} A(p − 1)_{I',J'(j)}.   (2)

Assume A(p − 1) has already been computed, and let d(J) denote the number of distinct values of entries of J. Then, computing A(p)_{I,J} directly from the expression at the right-hand side of (2) requires d(J) multiplications and d(J) − 1 additions. Denoting by μ(n, p) and α(n, p), respectively, the total number of multiplications and additions performed to compute all the entries of A(p), we have

μ(n, p) = Σ_{I,J} d(J) + μ(n, p − 1)
        ≤ p·m(n, p)² + μ(n, p − 1)
        ≤ ... ≤ Σ_{k=2}^{p} k·m(n, k)²,   p > 1,   (3)

and

α(n, p) = Σ_{I,J} (d(J) − 1) + α(n, p − 1)
        ≤ (p − 1)·m(n, p)² + α(n, p − 1)
        ≤ ... ≤ Σ_{k=2}^{p} (k − 1)·m(n, k)²,   p > 1.   (4)

(Clearly, μ(n, 1) = α(n, 1) = 0.) To obtain the average number of operations per output, we divide μ(n, p) and α(n, p) by m(n, p)². Since m(n, p) ≈ n^p/p! when n >> p, the asymptotic contribution of the terms corresponding to k < p in the right-hand sides of (3) and (4) is negligible, and the overall asymptotic complexity is p m/o and p − 1 a/o.
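The inductive computation can be sketched as follows (a Python sketch with 0-based indices; the function name `kron_power` and the dictionary representation are our own choices, not from the paper). The multiplication counter follows the recurrence μ(n, p) = Σ_{I,J} d(J) + μ(n, p − 1):

```python
from itertools import combinations_with_replacement

def kron_power(A, p):
    # Straightforward inductive computation of A(p) via recurrence (2).
    # Entries are stored in a dictionary keyed by pairs (I, J) of
    # nondecreasing index tuples. Returns (A(p), multiplication count).
    n = len(A)
    idx = lambda q: combinations_with_replacement(range(n), q)
    Ap = {((i,), (j,)): A[i][j] for i in range(n) for j in range(n)}  # A(1) = A
    mults = 0
    for q in range(2, p + 1):
        prev, Ap = Ap, {}
        for I in idx(q):
            for J in idx(q):
                s = 0
                for j in sorted(set(J)):   # the d(J) distinct values of J
                    Jp = list(J)
                    Jp.remove(j)           # J'(j): first occurrence of j removed
                    s += A[I[0]][j] * prev[(I[1:], tuple(Jp))]
                    mults += 1
                Ap[(I, J)] = s
    return Ap, mults

A = [[1, 2], [3, 4]]
A2, mu = kron_power(A, 2)
print(A2[((0, 0), (0, 1))])  # 2*A11*A12 = 4
print(mu)                    # sum over I, J of d(J) = 3 * (1 + 2 + 1) = 12
```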
3. Improved algorithms

Eq. (2) implies that the entries of A(p) are bilinear forms in the entries of A and A(p − 1). In this section, we show how to partition this set of bilinear forms into many matrix multiplication problems, and then use efficient matrix multiplication algorithms to reduce the complexity of the computation of A(p).

Let J be a fixed (n, p)-vector, and let i2 be a fixed integer in the range 1 ≤ i2 ≤ n. Let i1 run through all the integer values in the range 1 ≤ i1 ≤ i2, and let I' run through all (n, p − 1)-vectors having i2 as their first component. The number of such vectors is m(n − i2 + 1, p − 2). Under these assumptions, we can regard the right-hand side of (2) as expressing the multiplication of an i2 × d(J) submatrix B(i2, J) of A by a d(J) × m(n − i2 + 1, p − 2) submatrix C(i2, J) of the transpose of A(p − 1). From this matrix multiplication we obtain i2·m(n − i2 + 1, p − 2) entries A(p)_{I,J} of the Jth column of A(p), namely, those for which the index I has the fixed value i2 as its second coordinate. To obtain all the entries of the Jth column, we perform one matrix multiplication for each value of i2, 1 ≤ i2 ≤ n.

In [6] Brockett and Dobkin show how to multiply an N × s matrix by an s × N matrix in N² + o(N²) multiplications (i.e., asymptotically, one m/o), provided s ≤ log₂ N. This result can be extended to the product of an N × s by an s × M matrix if, in addition to the above condition on N and s, we also have sN/M → 0 as N → ∞ (in fact, s(M mod N)/M → 0 would suffice, but we shall use the stronger condition). In the case of the matrix multiplications B(i2, J) × C(i2, J), we can use the algorithm of [6] whenever i2 ≥ 2^{d(J)} and

d(J)·i2 / m(n − i2 + 1, p − 2) → 0   as n → ∞.

Since d(J) ≤ p, we can satisfy the first condition by requiring i2 ≥ 2^p, and to satisfy the second condition, we require i2 ≤ n − n^ε for some ε such that 1/(p − 2) < ε < 1. (We assume p ≥ 4. The cases p = 2 and p = 3 will be treated separately.) For values of i2 outside the range 2^p ≤ i2 ≤ n − n^ε, we use the straightforward algorithm for matrix multiplication, which requires d(J) ≤ p m/o. Let β(n, p) denote the cardinality of the set

{I | 1 ≤ i2 < 2^p or n − n^ε < i2 ≤ n}.

Then, the average number of m/o in the algorithm is upperbounded by

(1·[m(n, p) − β(n, p)] + p·β(n, p)) / m(n, p).

Using the approximation m(n, p) ≈ n^p/p!, it can be readily verified that if p·2^p/n → 0 as n → ∞, then

p·β(n, p)/m(n, p) → 0   as n → ∞.

Hence, the contribution of the 'marginal' i2's to the average complexity of an entry of A(p) is negligible, and therefore, the computation of A(p) takes asymptotically one m/o when p ≥ 4.

It remains to prove that the same asymptotic rate can be achieved for p = 3 and p = 2. For p = 3, choose, say, k(n) = ⌊n^{1/3}⌋. When i2 ≥ k(n) and n − i2 + 1 ≥ k(n) (as will happen most of the time when i2 varies in the range 1 ≤ i2 ≤ n), partition B(i2, J) into blocks of dimension k(n) × d(J) (with some remainder), and C(i2, J) into blocks of dimension d(J) × k(n). The block multiplications can be done at the rate of one m/o. Again, the contribution of the 'marginal' i2's is negligible.
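The partition of (2) into matrix products can be illustrated numerically for p = 3: for a fixed column J and a fixed i2, multiplying the submatrix B(i2, J) of A by the submatrix C(i2, J) built from A(p − 1) yields the corresponding entries of A(p). A sketch (0-based indices; all helper names are ours, not from the paper):

```python
from itertools import combinations_with_replacement

def kron_power(A, p):
    # Inductive A(p) as in Section 2 (0-based indices).
    n = len(A)
    idx = lambda q: combinations_with_replacement(range(n), q)
    Ap = {((i,), (j,)): A[i][j] for i in range(n) for j in range(n)}
    for q in range(2, p + 1):
        prev, Ap = Ap, {}
        for I in idx(q):
            for J in idx(q):
                s = 0
                for j in sorted(set(J)):
                    Jp = list(J)
                    Jp.remove(j)
                    s += A[I[0]][j] * prev[(I[1:], tuple(Jp))]
                Ap[(I, J)] = s
    return Ap

n, p = 4, 3
A = [[i * n + j + 1 for j in range(n)] for i in range(n)]
A2, A3 = kron_power(A, 2), kron_power(A, 3)

J = (0, 1, 1)                               # a fixed (n, 3)-vector (0-based)
i2 = 2                                      # fixed second coordinate of I
dJ = sorted(set(J))                         # the d(J) distinct values of J
cols = [(i2, i3) for i3 in range(i2, n)]    # vectors I' with first component i2
B = [[A[i1][j] for j in dJ] for i1 in range(i2 + 1)]     # (i2+1) x d(J)
def Jprime(j):
    Jp = list(J)
    Jp.remove(j)
    return tuple(Jp)
C = [[A2[(Ip, Jprime(j))] for Ip in cols] for j in dJ]   # d(J) x len(cols)

# Entry (i1, I') of B*C equals the entry of A(3) at row (i1,) + I', column J.
for i1 in range(i2 + 1):
    for c, Ip in enumerate(cols):
        prod = sum(B[i1][k] * C[k][c] for k in range(len(dJ)))
        assert prod == A3[((i1,) + Ip, J)]
```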
For p = 2, the rate of 1 m/o is achieved using the following scheme:

A(2)_{qr,tt} = A_{qt} A_{rt},
A(2)_{qr,st} = A_{qs} A_{rt} + A_{qt} A_{rs}
             = (A_{qs} + A_{qt})(A_{rs} + A_{rt}) − A_{qs} A_{rs} − A_{qt} A_{rt},   s < t,

where the products A_{qs} A_{rs} and A_{qt} A_{rt} are reused from the previously computed entries A(2)_{qr,ss} and A(2)_{qr,tt}. The total number of multiplications in the above computation is μ(n, 2) = m(n, 2)² + O(n³). Clearly,

μ(n, 2)/m(n, 2)² → 1 m/o   as n → ∞.

The above discussion can be summarized as follows.

Theorem. There is an algorithm for computing A(p) which requires m(n, p)² + o(m(n, p)²) multiplications, provided that p·2^p/n → 0 as n → ∞.

It should be noted that the rate of 1 m/o is optimal for the computation of A(p). This is due to the fact that, since each monomial of degree p in the Aij's appears in exactly one entry of A(p), all the entries are linearly independent in the sense of [9]. Hence, by [9, Theorem 1] the minimal number of multiplications in any algorithm computing A(p) is m(n, p)².

The optimal multiplicative complexity of the algorithm for A(p) is achieved at the expense of a considerable increase in the number of additions (this is inherited from the construction in [6], which is inefficient in the number of additions). If p is unbounded as n goes to infinity, we can obtain more balanced algorithms by using instead the results of [7], where an efficient algorithm is presented for the computation of the product of an N × N^α matrix by an N^α × N matrix, when α < 2 log 2/(5 log 5) = 0.17227. (The results of [7] deal with the product of N × N by N × N^α matrices. By well-known duality properties of bilinear forms [9], the two problems are computationally equivalent.) The asymptotic complexity of the algorithm in [7] is upperbounded by O(log² N) m/o and O(log³ N) a/o. Using a slight modification, Coppersmith [8] has improved these bounds to O(log N) m/o and O(log² N) a/o. These complexities can be carried over to the computation of A(p), by partitioning B(i2, J) into blocks of dimension, say, p⁶ × d(J), and C(i2, J) into blocks of dimension d(J) × p⁶. This partition is useful for values of i2 such that

i2 ≥ p⁶   and   m(n − i2 + 1, p − 2) ≥ p⁶.

This will happen for most values of i2 if n grows rapidly enough with respect to p (e.g., n ≥ p^x with x > 6). For each block multiplication, we can use the algorithm of [7], with O(log p) m/o and O(log² p) a/o. As in the previous cases, the complexity contribution of the i2's for which the partition is not useful is negligible. Notice also that in this algorithm, n is required to grow polynomially in p, as opposed to the exponential growth required in the algorithm based on [6].

Acknowledgment

We are grateful to Don Coppersmith for many useful conversations, and for suggesting using the results of [6,7,8].

References

[1] R. Bellman, Introduction to Matrix Analysis (McGraw-Hill, New York, 1960).
[2] R. Bellman, T.T. Soong and R. Vasudevan, On moment behavior of a class of stochastic difference equations, J. Math. Anal. Appl. 40 (1972) 286-299.
[3] P.N. Paraskevopoulos and R.E. King, A Kronecker product approach to pole assignment by output feedback, Internat. J. Control 24 (3) (1976) 325-334.
[4] F. Ma and T.K. Caughey, On the stability of linear and nonlinear stochastic transformations, Internat. J. Control 34 (3) (1981) 501-511.
[5] R. Bellman, Limit theorems for non-commutative operations, Part I, Duke Math. J. 21 (1954) 491-500.
[6] R.W. Brockett and D. Dobkin, On the number of multiplications required for matrix multiplication, SIAM J. Comput. 5 (1976) 624-628.
[7] D. Coppersmith, Rapid multiplication of rectangular matrices, SIAM J. Comput. 11 (1982) 467-471.
[8] D. Coppersmith, Personal communication.
[9] S. Winograd, Arithmetic Complexity of Computations, CBMS-NSF Regional Conf. Series Appl. Math. 33 (SIAM, Philadelphia, PA, 1980).
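The one-fresh-multiplication scheme for p = 2 described in Section 3 can be checked numerically; a Python sketch (0-based indices; the function name is ours, not from the paper):

```python
from itertools import combinations_with_replacement

def kron_square(A):
    # p = 2 scheme: one fresh multiplication per computed entry,
    # m(n, 2)^2 fresh multiplications in total. The diagonal-column
    # entries A(2)_{qr,ss} = A_qs * A_rs are computed first and reused.
    n = len(A)
    rows = list(combinations_with_replacement(range(n), 2))
    out = {}
    for (q, r) in rows:
        for s in range(n):
            out[((q, r), (s, s))] = A[q][s] * A[r][s]
    for (q, r) in rows:
        for s in range(n):
            for t in range(s + 1, n):
                out[((q, r), (s, t))] = ((A[q][s] + A[q][t]) * (A[r][s] + A[r][t])
                                         - out[((q, r), (s, s))]
                                         - out[((q, r), (t, t))])
    return out

# Check against the direct coefficients of v_s v_t in u_q u_r:
# column (s, s) -> A_qs A_rs; column (s, t), s < t -> A_qs A_rt + A_qt A_rs.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = kron_square(A)
for (q, r) in combinations_with_replacement(range(3), 2):
    for (s, t) in combinations_with_replacement(range(3), 2):
        direct = (A[q][s] * A[r][s] if s == t
                  else A[q][s] * A[r][t] + A[q][t] * A[r][s])
        assert K[((q, r), (s, t))] == direct
```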