Chapter V Matrices 1. Introduction
In developing the theory of matrices we are concerned with two main topics which are of practical importance, viz. the solution of a system of linear algebraic equations and the determination of the eigenvalues of a matrix in the special case when they are all distinct. Sets of simultaneous linear differential equations which often occur in practice can, with certain approximations, be replaced by a set of linear algebraic equations, thus giving rise to the above-mentioned problems. Usually the process of solving such a system of equations in many variables is laborious and so methods suitable for computation on a desk machine have been included, notably the Choleski-Turing method. 1.1. Definitions
A rectangular array of numbers (real or complex) of the form /
*12
*13 · · · «1»
\
I a21
a22
a2Z...
\
a
a
\ am\
m2
a2n
m3 · · · amn /
is called an (m x n)-matrix (or a matrix of order m x n), having m rows and n columns. We seek to define the operations of addition, subtraction, multiplication, and division treating a matrix as a single entity. Before giving these definitions we make some remarks on the rectangular array A. Sometimes the matrix A is denoted by (αί;·) where, in the absence of more The matrix notation used in this chapter is summarized in Table V on page 303.
250
251
1.1. DEFINITIONS
precise information, it is assumed that 1 ^ t ^ m, 1 ^ / ^ n. If m = 1 then the matrix A reduces to the single row (ana12als
.. ,aln)
and is called a row matrix. In n-dimensional space the (real) numbers an, a12,... aln can be taken to be the components of a vector, and so a row matrix is sometimes called a vector of the first kind, thus distinguishing it from a vector of the second kind which is a column matrix, i.e. a matrix with one column and m rows. To print a column matrix as a column takes too much space and so we adopt the usual convention that a column matrix is written horizontally and enclosed in { }; a row matrix is enclosed in [ ]. Addition
Let A = (a^), B = (b^); then if A and B are matrices of the same order we define
(
Ä
ll + £>n
«12 + &12 · · · aln + hn
a21 +b21
a22+b22
...a2n
+ b2n
#ml + ^wl
#m2 + bm2 . . . amn + bmn
Subtraction
As for addition, subtraction is only defined when A and B are of the same order, and then A — B = (oij -
bij)
Null matrix
If every element of a matrix is zero, i.e. if a^ = 0 for all i and / satisfying 1 ^ i ^ m, 1 ^ / ^ n, then the matrix is called a null matrix and is represented by (0). It is equivalent to the number zero in ordinary algebra. Equality of matrices
Two matrices A and B are said to be equal if they are of the same order and if A - B = (0), the njill matrix.
252
V. MATRICES
Transposition A matrix Ä can be derived from A by writing the rows of A as the columns of Ä. This operation is known as transposition and the matrix Ä is called the transpose of A. The transpose of Ä is clearly A. Multiplication Multiplication of A by a scalar λ multiplies every element of A by A, i.e. X(a{j) = {λα^). The definition of the multiplication of a matrix by a matrix may seem a little peculiar at first sight. It arises quite naturally, however, in the transformation from one coordinate system to another. Consider a very simple case of a transformation from the (xvx2)-plane to the (y lf ;y 2 )-plane given by y±
=
Û>\\X\ ~T~ # 1 2 ^ 2
y2
=
# 2 1 ^ 1 ~T~ # 2 2 X2
This transformation is characterized by the matrix I an
a12
\ #21
#22
which is called the matrix of the transformation. A second transformation from the (yvy2)-pla.ne to the (^,^)-plane is h = *11>Ί + H = &213Ί +
h 12J2 h
22y2
with matrix è
B =
ll
b2i
b
12
b22
and yields a direct connection between the (xvx2)Z
and the ( z ^ ) - p l a n e , viz.
l = M # l l * l + #12^2) + M#21*l + #22*2) = (b11a11 + b12a2A)x1 + (bna12 +
Z
2 ~ =
b12a22)x2
*2l(Äll^l
i #12^2)
1 ^ 2 2 \ # 2 1 Λ ' 1 ~Γ # 2 2 ^ 2 /
(Mu
^22Ä21/*1
1 \ ^ 2 1 # 1 2 T~ ^ 2 2 ^ 2 2 ) ^ 2
+
1.1.
253
DEFINITIONS
The matrix of the last transformation is I ^11^11 i ^12^21
^11^12
i~ ^12^22
\
^21^12
'
^22^2
and is obtained from the transformation A followed by that of B, or briefly, the transformation could be denoted by BA. Hence we can think of C as being equivalent to the product BA, i.e. b
U
^21
h
Ä
12 \ K l α
α
^22 / \ 2 1
/δ11«11 +
12\ _
a
22/
\*21 ll +
&12α21
δ
11 Λ 12 +
δ
α
^22^21
12 Λ 2
^22 α 2
&21 12 +
The essence of this equation is generalized in the following definition: Let A be an (m x n) matrix and B a (p x q) matrix, then the product AB is defined only when n — p. When this condition is fulfilled the matrices A and B are said to be conformable. The product of two conformable matrices A = (a-), B = (b^) of orders (m x n)f (n x q), respectively, is defined to be a matrix C = (c^) of order (m x q), where n
Cij = anbij + ai2b2j + . . . + ainbnj = ^
(1.1)
aikbkj
k=\
a process of multiplication which is conveniently referred to as the multiplication of rows into columns. As an example suppose that A is a (2 x 3) and B a (3 x 2) matrix, say ^4 = ( ^11 ^12 V*21
Λ
22
Λχ3 Ä
23,
1
B =
Then A and 5 are conformable and the general element c- of the product matrix C [of order (2 x 2)] is, from (1.1) Cij = anbij + ai2b2j + alSb3j i.e.
,4£ = C =
a Λ
lA l +
Λ
12 δ 21 +
α
ΐ3^31
Λ
21^11 ι ^22^21 ~Γ 23^31
Λ
11^12 +
Λ
Λ
12 δ 22
21^12 ~Γ #22^22
+ ι ^23^32 y
Notice that even if the product AB exists, the product BA need not. For if A B exists then, being conformable we may suppose that A is of order
254
V. MATRICES
(m x n) and B of order (n x q) so that, in the order BA they are not conformable unless q = m. Thus both AB and BA are defined if, and only if, A is an (m x n) and ß an (« x w) matrix, and even then can be equal only in certain circumstances, and never equal when m Φ n; for AB is an (m x m) whilst BA is an (n x n) matrix and equality of matrices demands that they should be of the same order. Nor must it be presumed that if A and B are square (n x n) matrices that AB = BA. A simple example shows this to be false. Let
A=(Λ then
1
0\
/0 1 Β= \ 1 0
0 0/ ,
0 Λ£= I 0
Λ
1\ 0 /),
and
/0 £.4 - \ 1
0 0
ΑΒφΒΑ.
Clearly then the order of multiplication of matrices is important and only certain types of matrices are commutative, i.e. satisfy the equation AB = BA. To be precise we say that, in the product AB, B is premultiplied by A, or alternatively that A is postmultiplied by B. Extended product of matrices
Assuming that, in the order Av A2, A3,■.'.., Ap two adjacent matrices are conformable, it is easy to define the continued product AXA2.. .Ap by induction in the following way. Let A x be of order (m x n^, A2 of order ini X n2),... Ar of order (nr_t x nr),..., and Ap of order (np_x x n). Then it can be shown that Αλ(Α2Α3) = (A1A2)A3 and so either of these can be taken to be AXA2A^\ this single product matrix is then multiplied by AA and so on. The final matrix is of order (m X n). ΙΐΑχ = Α2= . . . =Ap then we have defined Ap. Transpose of a product
Let
(cii) = C = AB = (aij)(blj) n
then
dj = ^
aikbkj.
255
1.1. DEFINITIONS
Denote by α^ the elements of the transpose Ä of A, and similarly for B and C, so that ä(j = α^ etc. Then n
en = £
n
Ojkbki = ^Λ hkäkj
i.e.
C = BÄ.
Thus (ABC) = C(AB) == CEÄ
and in general if P = AXA2..
.^
then
P = Ap . . . A2AV Square matrices
Square matrices occupy a special place in matrix algebra. Later, we see that only square matrices possess reciprocal matrices. There are several important types of square matrices. (i) Diagonal matrix. If all the elements of a square matrix A = (atj) other than those on the principal diagonal are zero, i.e. if a{j = 0 for i φ /, and not all aü zero, then A is called a diagonal matrix, and is denoted by diag. (au a22 . . . a (ii) Scalar matrix. If A is a diagonal matrix, and if aü = a for all i, then the matrix is called a scalar matrix. (iii) Unit matrix. If A is a scalar matrix with all the diagonal elements equal to unity then it is called a unit matrix and is denoted by / . The scalar matrix defined in (ii) is clearly al. Since there exists a unit matrix of order (n x n) for all n ^ I we can form the products IB and BI for any matrix B. It is easily verified that IB = BI = B Thus / commutes with any matrix. Furthermore, since Ip = I for p a positive integer, it is apparent that / takes the place of unity in- matrix algebra. Examples on the product of matrices
1. We consider the properties of the so-called Pauli spin matrices, viz.
256
V. MATRICES
which are
/
so that
0\
.
/ - / 0'
axGy + σγσχ = 0,
oxay — σγσχ = 2jaz.
2
σχ = I = ay2 = az2
In addition, 2. Let 1 0
]/2Mx=
1
;
2
Then
2{M
]/2My
2
+ My + M ) =
Ί Again
2MxMy = ( 0 j
whence
2
'2 0 0 ( 0 2 0
0 -/' 0 0 0 -j
2(MxMy — MyMx)
i.e. MxMy — MyMx = jMz. Partitioned matrices
It has been tacitly assumed that the elements of a matrix are numbers, real or complex. However, the elements of a matrix could themselves be matrices, and the rules for addition, subtraction, and multiplication could be formulated. The composite matrix is called a partitioned matrix, and the matrix elements are called its submatrices. Addition ^11
^12\
„
/^ll
^12
where A{; and B- are submatrices of the same order. Then IJ
IJ
Ä =
Mii±*u
A12
\^21 ± #21
^22 ± #22,
The definition is readily generalized.
±B12
257
1.1. DEFINITIONS
Multiplication
AB =
^11^11 + ^12^21
^11^12 + ^ 112^22 2^!
^ 2 1 ^ 1 1 ~^~ ^22^*21
^21^12 "Γ" ^ 2 2 ^ 2 2
provided that the products AikBkj are defined. These concepts are often used in technical literature and are included here for the sake of completeness ; they are not used later. Examples on partitioned matrices
1. Using the σ spin matrices we define ((0)
(0)
ox
(0) (0)/'
(0)/'
\<>x ( 0 ) / '
β =
(0)\ (0)
where the null and unit submatrices are of order (2 x 2), then laxay whence
(0)\
«^ = [(0) o*,y a.x
(σνσχ
(0)
**** = [ m a*, α.χβ + βχχ = (0)
α,« = I = α,« = a,2 = /?2
and 2 3 1 -3 -1 2 4 -2 3
0 1 2 3 4 5 5 3 1
1 X 0 + [2 3]{3 5} { - 3 4} ( 0 ) + / - l 2 -2 3
[1 2 ] + [ 2 3 ] / 4 5\ {-3 4}[1 2 ] + / - l
2\/4
Elementary operations
The truth of the following statements is verified by means of examples, but no proof is given.
258
V. MATRICES
(a) Let In(k,l) be that matrix derived from the unit (n X n) matrix by interchanging the kth and the /th rows. Then premultiplication of a conformable matrix A by In(k,l) produces a matrix which differs from A in that the £th and /th rows of A are interchanged. The procedure can be generalized to effect the interchange of more than two rows. (b) Similarly, any permutation of the columns of A can be effected by postmultiplying A by a matrix derived from / by the same permutation of its columns. Exercise 1
(a) / 8 (1.3)(α ί; ·)=
(»)
/0 0 1 0 1 0 1 0 0
(ai;-)/3(l,3) -
*11
"12
*31
"32
a21
*22
(c) Let In(k + λί) be the matrix derived from the unit (n x n)-matrix by adding λ times the /th row to the &th row. Then premultiplication of A with In(k + λΐ) produces a matrix which is derived from A by adding λ times the /th row of A to the &th row. This result is easily generalized. (d) If A is postmultiplied by a matrix derived from the unit matrix by adding a constant multiple of one column to another column, then the product matrix reflects a similar operation performed on A. (c)
(l 1 00 λ\λλ Ian η al2 \ 0 1 0 ] 1 «21 *22 0 0 ly ^32
Γ
'W
(ί/)
a13\ α
23 I ~
Ä
33/
/αη + λα31 I
A
\
α
α12 + λα32
21
Λ
31
Λ
a13 + λα,
22
α
23
32
Ä
33
faΛn η α12 / 1 μ λ\ / Λ Π μ<ζη + «12 λαη + α13 12 βα13 « \\ α21 α22 «23 1 ! 0 1 0 I = I α21 μ# 21 + α22 λα21 + α23 \α31 μα31 + α32 λα31 +α33/ α31 α32 «33/ κ0 0 1 /
The operations (a), (b), (c), (d) are called the elementary operations.
259
2. DETERMINANTS
2. Determinants
Before defining the operation of division it is necessary to introduce the idea of a determinant. Thus the determinant of a square matrix A = (a^) is denoted by «In
*n
#2n
\A\ = #nl
Λ„2 · · ·
a
nn
and is defined as the following summation
= Σ ± K- OLZjdzk the meaning of which is now discussed. Notice that the row suffices appear in normal order while the column suffices i,jtk,... (n altogether) appear as some permutation of the normal order 1, 2, 3 , . . . , n. If this permutation is even (cyclic), i.e. one which can be derived from the normal order by an even number of interchanges of adjacent suffices, then a positive sign is prefixed to this term in the summation. If, however, the permutation of the column suffices is odd (anticyclic), i.e. one derivable from the normal order by an odd number of interchanges of adjacent suffices, then a minus sign is attached to that term of the summation. The summation, therefore, extends over n ! permutations of i,j,k,... half of which are even and half odd. This is a formal definition of a determinant and the reader who is unfamiliar with the manipulation of determinants can take comfort from the fact that the definition is seldom used in practice for evaluating a determinant. The definition is, however, used for proving the properties of determinants. To clarify the definition we evaluate a third-order determinant, \A\ =
ΛΛΛ
U-to
#1?
*21
*22
*23
= £,
± #1^2/03*
The summation extends over 3! = 6 permutations of 1,2,3. There are three even permutations, viz. (1, 2, 3), (2, 3, 1), (3, 1, 2) yielding the positive terms, and three odd permutations, viz. (3, 2, 1), (2, 1, 3), (1,3, 2) yielding the negative terms. Hence — #11 Ä 22 Ä 33 H" Λ 12 Α 23 Λ 31 ~Γ" Λ 13 Λ 21 Λ 32
( Λ 13 Α 22 Λ 31 "Γ" Λ 12 Λ 21 Α 33 "Γ" Λ 11 Λ 23 α 32)
260
V. MATRICES
Properties of d e t e r m i n a n t s
(I) The determinants of a matrix and its transpose are equal. (II) The interchange of any two rows changes the sign of the determinant. (III) If two rows of a determinant are identical then the value of the determinant is zero. (IV) Multiplication by λ of all the elements in a row of a determinant, \A\, yields a determinant of value λ\Α\. (V) The addition of a constant multiple of a row to the corresponding elements of any other row leaves the value of the determinant unchanged. Because of the first of these five properties, the remaining four remain valid when column replaces row. For the sake of completeness we give a brief outline of the proofs of these theorems. (I) Since \A\ = Σ ± ( Λ ^ ^ - % . . . ) , then \Ä\ = Σ ± Κ ι Λ ; Α 3 · · ·)· I x i,j,kt... are now arranged in normal order then \Ä\ = Σ ± {αηα2μ- · ·) sav > where if (i,j,k,...) is even (or odd) permutation of ( 1 , 2 , 3 , . . . , « ) then (λ,μ,ν,...) is also even (or odd). (Π) ZJ
±
(aUa2f
· - -
~
= t ( a l » Ä 2 / · · · <^rp^{r+l)q . . . ) = —
2^
\A\
i.e. the interchange of two adjacent rows (known as inversion) changes the .sign of the determinant. It can be verified that 2^
± (aUa2j
· · . f l ( f + / ) s . · 'arp- · · ) = (— l )
2
' "
1
= {-l)*-l\A\
^
± (aUa2j.
· · «f/r. · ^(r+t)s-
· ·)
= - \A\,
i.e. the interchange of the rth and the (r + /)th rows requires (2t — 1) inversions. (Ill)
\A\ = Σ
±
(aua2j...arp...arq...)
is the determinant in which two rows, say the rth and the (r + /)th rows are identical. Interchanging these two rows leaves the determinant unchanged, but by (II) changes the sign of the determinant. Hence \A | = — |^41,
and so
\A \ = 0
2.
261
DETERMINANTS
(IV) This follows from Σ (V)
Σ±
± {aUa2j · · · (farp) - - ·} = ^ Σ Since 27 ± {aua2j...
(aua2j.
(α^ +
. .arP. . .
± (Ä1*'Ä2; - - · ^ . . .) =
fa{r+t)q).. t -a 2 /
λ\Α\.
.} is equal to
· · -a(r+t)q-
· .fl(r+*)<7- * 0
= \Α\ + λΟ
from (IV),
the theorem follows. N o t e the relationship between these theorems and the elementary operations mentioned earlier.
2.1. Evaluation of determinants a. Expansion by rows The coefficient of an
in \A\=
^
± (Äl**2/*3*...)
is Σ ± (&9jaw · ·) w n e r e (j,k,...) is some permutation of the numbers (2, 3 , . . . , w). This last summation is, therefore, some determinant of order (n — 1) and is derived from \A\ by omitting the first row and first column. To find the coefficient of alr in the determinant \A\, bring alr by a series of (r — 1) inversions into the first column. By (II) the sign of the determinant is (— l ) r _ 1 . Using the above result derived for the coefficient an, we see t h a t the coefficient of alr in the expansion of \A | is (— l ) r _ 1 times the (n — l)st order determinant obtained from \A | by the omission of the first row and rth column. Such a determinant, omitting the factor (— l ) ' - 1 is called the minor of alr and is denoted by Mlr. Generalizing, the minor M·· of at·· is obtained from \A | by omitting the zth row and the / t h column, and (— l)*' H - 2 M i ; · is the coefficient of a^ in the expansion of \A\. In the expression « i i ^ u + ( - I ) 2 _ 1 « i 2 ^ i 2 + ( - 1 ) 3 - S 3 ^ i 3 + . . · + ( - l)n-lalnMln
(2.1.1)
each of the minors contains (n — 1) ! terms and so altogether there are (n)(n — 1)! distinct terms which occur with the same sign as in \A\, i.e.
262
V. MATRICES
(2.1.1) is equal to \A\. Often (2.1.1) is written in terms of cofactors, where the cofactor A{- of a{- is the signed minor of a^, namely An = ( - l)*+>-2Mt7 = ( -
\)^ΜΨ
Then (2.1.1) becomes n
\A | = anAn
a
+ a12A12 + . . . + ainAln = J£
\rAlr,
r=l
and is the expansion of the determinant by its first row. The expansion by the ith row is thus n
\A | = anAn + ai2Ai2 + . . . + ainAin = J>J airAir
(2.1.2)
and similarly the expansion by the yth column is n α Α
\A | = aijAij + «2/^2; + . . . + anjAnj = Σ
ηπ
(2Λ.3)
Reversing the process, we see by comparison with (2.1.2) that the expression ^ l ^ n + #t2^41? -f- . . . -f- ainAin = £
(2.1.4)
airA\r
where i is any integer between 2 and n, is equal to the determinant an
aii · · · #w
α
Λ
21
22 * · * Ä 2n
an
ai2 . . . tfiM
#nl
a
n2 . . . an
which is zero, from (IV), since the first and the *th rows are identical. The general result, of which (2.1.4) is a particular case, is A UnAki
+
A ai2Ak2
+
A . - - + üifAkn
V = JZJ
A airAkr
\ 0 — ] i i
H .r
T=
ky^t .
(2.1.5)
2.1. EVALUATION OF DETERMINANTS
263
Similarly, by expanding by columns instead of rows, we deduce W
f
if if
(2.1.6)
In the cases k Φ i, k Φ j in (2.1.5) and (2.1.6), respectively, the summation is called an expansion in alien cofactors. (These two equations are important when we come to consider the reciprocal of a matrix.) In the cases k = i, k = / the determinant \A | of order n is expanded into a sum of n determinants of order (n — 1). b. Pivotal condensation
This method of evaluating a determinant makes a systematic use of the properties already proved. The method is, by the subtraction of multiples of a fixed row from all the others, to produce a determinant with all the elements of a column, except one, equal to zero. Choose any non-zero element of the determinant as the pivot. By interchanging rows and columns, if necessary, this pivotal element can be brought into the leading position with, at most, a change in the sign of the determinant. In fact, the effect of bringing a{j into the leading position is to multiply the original determinant by (— l ) t + ; . Thus without loss of generality, we may assume that an φ 0 and take this to be the pivotal element. Divide the first row of \A\ by an to give [see (IV)] 1
a
l2Kl
a
izlail · · ' tfl«Kl #2n
*23
\A\ = ar dm
an2
<*n3
Carry out simultaneously the operations R{ — aa Rv 2 ^ i ^. η, (where R{ stands for the ith row). This series of operations leaves the value of \A\ unchanged by (V), and so
\A\ = ar
1 0 0
a
\l\aYL
als/au .
K%
b
&32
Ô
23 33
Vn3
· «
• «lnMii
• •
hn hn
264
V. MATRICES
where ^22
== Ä
22
( α 12 Λ 21// α ΐ1»
^32 ~
Λ
( Λ 12 Α 3ΐ)/ Λ 11'
32
and in general bij =
dij — (ΛΙ,·Λ,·Ι)/ΛΙΙ
1
=
an
axj
= Ci//an
α,ι α,·;·
say,
for 2 < * < w, 2 < / < «. |i4| = ( l / a u ) — 2 | ^ - |
Thus
where |c t; | is single determinant of order (n — 1). This method is obviously more satisfactory than (§ 2.La), Example
\A\ =
0 6 13 2 5
5 -4 -3 14 6
3 11 2 -6 7
-7 9 8 1 4
2 -5 10 4 3
The element au in the fourth row, fourth column, is taken as pivot, but to illustrate the theory and make the arithmetic less heavy, the determinant is simplified by performing successively the operations R2 — R3, R3 + Rv R1 + 2R5, where R{ + kR^ stands for the operation of adding k x ; t h row to the original tth row to form a new *th row. Then 17 9 5 -6 7
= ( - I) 4
17 -1 2 14 6 23 15 11 31
10 -7 13 2 5 3 -15 -12 -50
1 1 1 4 8 -9 11 -3
8 -15 12 4 3 4 19 8 13
the determinant being reduced in order by the operations Rx — i? 4 , R2 — R^, R3 - R„ R5 - 4tf4. Taking Cx + C 2 , C 2 + C 3 , C 3 - C 4
\A\ =
26 0 -1 19
11 -24 -1 -53
4 10 3 10
4 -19 8 -13
265
3. RECIPROCAL OF A SQUARE MATRIX
and, with a31 as pivot 82 10 -47
-15 -24 -34
\A\ = (_ l)3+i ( _ x
212 -19 -165
-15 = (-i) 24 -34
82 10 -47
-4 -1 44
the last step being effected by taking C3 — 3C2 — 2CV Using a23 as pivot, the order of the determinant can be further reduced giving ΜΙ =
81 42 1090 393
(-ΐ)2+3(-ΐ)(-ΐ)
= -77,613
The reader is advised to work through the steps in detail.
3. Reciprocal of a square matrix
Let A = (a|;) be a square matrix of order n and determinant \A\. We define a related matrix called the adjoint (adjugate) of A which is denoted by Â. Thus, if A- are the cofactors of aif- in \A\ for all i and /, then L
ll
. . Ani
A„ l
22
l
2n
A =
= (AH) = (Ätj) A3
..A.
Consider the product of A when postmultiplied by Â, and suppose
AÂ =
P=(pii)
then, from the definition of multiplication in § 1.1, n
pij = aaAji + ai2Aj2 + . . . + ainAjn = ^
airAjr.
Comparing this with (2.1.5) we see that Pa =
0 if \A | if
χφ] i= j
or alternatively, since pu = \A\, all the elements of the principal diagonal of P are each equal to \A\ and all the other elements are zero. Thus AÂ is a scalar matrix which can be written as \A \I, where / is the unit (n x n)
266
V. MATRICES
matrix. In a similar way the reader can verify using (2.1.6), that A A = \A \I, and so (3.1)
AÂ = ÂA = \A\I.
If \A | = 0, then A is said to be a singular matrix, and the products of (3.1) are equal to the null matrix. If \Α\φ 0, then A is said to be nonsingular and in this case we can divide throughout by \A\ in (3.1), thereby defining a matrix A~x as A-* = (AH\\A\). The reason for this notation is immediately apparent since now (3.1) is replaced by A-^A = AA-1 = /
(or A0 = I)
i.e. A~x is a matrix which can be interpreted as the reciprocal of A. Note that A'1 is defined only when the matrix A is square and non-singular. We are now in a position to define the operation of division. Division
If A is a square, non-singular matrix, then A~XB is the operation of predivision and BA~X is the operation of postdivision of B by A. The two quotients exist provided that the matrices A-1, B and B, ^4 - 1 are conformable. Even when A~XB, BA~X both exist they are not necessarily equal, e.g. if A is an (n x n) and B an (n x q) matrix, then Α~λΒ is defined but BA~X is not defined unless q = n. We conclude this section by noting that the reciprocal of a continued matrix product obeys the same reversal rule as that which holds for the transpose, viz. (ABC ... PQ)-1 = Q-^P-1...
C-^-M"1.
This can be proved inductively by noting that if AB = C then B^A^ABC-1
= B^A-KC-1,
i.e.
C" 1 = B^A-1, A'
1
since 1
A =B- B
E x a m p l e on continued product and reversal rule
'-(! X X:i)(« I)
= CC-1 = I.
3.1. DETERMINANT OF THE ADJOINT MATRIX
167 35\ ,119 25/
then
4 15
A-* =
I W / 5 -1\-V3 4/ \11 - 2 / \4
-15
whence
4/i-ll
2.5 -11.9
-
2.5
-
A-1 A
5/1
3.5\ 16.7/
11.9
267
nr
: -t
-1
3.5W167
35
16.7/\119
25
2
1 -1
It is also readily verified that Ä =
/4
15
\l
4
X
5 11 - 1 - 2
167 35
119 25
either by direct multiplication or from (i) above. 3.1. Determinant of the adjoint matrix
It should be mentioned that it can be deduced from the previous section that the determinant of the adjoint matrix |^i| whose elements are the cofactors of a^ in \A\, is equal to |^4| w-1 . Defining the determinant of a product of square matrices to be equal to the product of their determinants it follows from (3.1) that if \A\ ψ 0
O MPI
o
= \A\
and so \Â\ = |^4| n_1 . (The equation is trivial if \A\ — 0.) A peculiarity of matrix algebra is that AB — (0) does not imply that either A = (0) or B = (0) ; for example, consider
(o
o)(o
o)=(o
oH°>-
268
V. MATRICES
4. Solution of simultaneous linear equations Inhomogeneous equations
Consider n linear, inhomogeneous equations in the n unknowns xvx2,.. xn> say alxxx + a12x2 + alsx3 + . . . + alnxn = hx ] a21xx + a22x2 + a23xs + . . . + a2nxn = h2
·,
(4.1)
anlxx + an2x2 + an$xz + . . . + annxn = hn Denote by A the matrix (at;) of the coefficients in (4.1); by X the column matrix {xxx2.. .xn}, and by H the column matrix {AjA2...An}. Then, by the rule for multiplication (4.2)
AX = H
is a concise representation of all the equations (4.1). If A is a non-singular matrix then A~x exists, and premultiplying (4.2) by A~x gives A~1AX = A-1H i.e. since A~lA = I and IX = X X = A~1H or
(4.3)
{*!*2 ...*„} = M | j {/*A ... K)
a matrix equation which gives the values xvx2, · . . , xn, which are the solutions of (4.1). Thus, for 1 < r < n n
Xr = ( V l r + M Î T + . · ■ + KAnr)\\A\ = ( ^ M i r ) / | 4 |
(4.4)
If A1,A2,..., hn are replaced by alr,a2r,.. ., anr, respectively, then from (2.1.6) the numerator of the right-hand side of (4.4) is the expansion of \A\ by the rth column, from which it follows that n Σ
k A
s sr
4 . SIMULTANEOUS
269
EQUATIONS
is the expansion by the rth column of a determinant derived from \A\ by replacing the rth column of \A\ by H = {hxh2.. .hn}. For brevity, denote by A$ the matrix derived from A by replacing its rth column by H, then from (4.4) for
xr=\AHW\l\A\
l
This is Cramer's rule. For example, if n — 3, then the rule gives
K
*12
hx
*13 X9
Ht *32
*33
=
ax Xo
\A\
1
*1 *22
where in this example \A\ =
*12 *21
*23
Cramer's rule, although giving an explicit solution in determinantal form, is not so useful in its application to the numerical solution of equations with a large number of unknowns due to the labor involved in evaluating determinants. Later, in § 4.1, a more suitable method is given. Homogeneous equations In this case the matrix H is a null matrix and the n equations of (4.1) reduce to aixxx + ai2x2 + · · · + OLinxn = 0,
1< i <
(4.5)
From Cramer's rule, if \A\ φ 0 then all the xr = 0 since \A$\ contains a column of zeros. Hence the necessary condition for non-trivial solutions of (4.5) is that \A\ = 0. In this case at least one of the set χνχ2, · . ·, xn is non-zero. Without loss of generality we may suppose xn Φ 0, and then the equations (4.6) can be reduced to the inhomogeneous case by dividing all the equations by xn. If we substitute yr = xT\xn 1 ^ r ^ {n — 1), (4.5) becomes Λ*1)Ί + ^ 2 y 2 +
· · · «»(n-l)y(H-l)
=
!<*<(»-!)
(4.6)
(omitting the equation for i = n) a system of (n — 1) inhomogeneous equations in (n — 1) unknowns which can be solved by previous methods provided
270
V. MATRICES 01(»-1) *21
*22
^2(n-l)
^o #(n-l)l#(n-l)2 · · · #(n-l)(n-l)
If, in the final equation of (4.5), viz. that arising from i = n, the values for yr as found from (4.6) are substituted, then the left-hand side is merely the expansion by the nth row of the determinant \A | and the equation reduces to the assumed condition, \A\ = 0. 4.1. Choleski-Turing method
Of all the methods for solving systems of linear algebraic equations involving a large number of unknowns, we shall be content to describe just one—the Choleski-Turing method. This method makes use of two types of triangular matrices, namely an upper triangular matrix U, and a lower triangular matrix L. Thus U is defined as a matrix whose elements below the principal diagonal are all zero, and L as a matrix whose elements above the principal diagonal are all zero. In the particular cases when the elements in the principal diagonal are all equal to unity, the other non-zero elements being arbitrary, attention is drawn to this fact by denoting U and L by U(l) and L(l), respectively, and referring to these special matrices as unit upper and unit lower triangular matrices, respectively. The system of equations %\ T~ ^12-^2 i ^13-^3 l H +
U
2ZXZ +
· · · "Γ -· - +
w
l n # n = R>\
U2nXn =
k2
Xn — Rn
can therefore be represented by a matrix equation of the form U(l)X = K
(4.1.1)
where X = {xxx2' · ·%η}, Κ = {^2- · -^η}· The method of solution of such a system of equations as (4.1.1) is immediately apparent. Thus xn is found from the last equation, xn_x from the penultimate equation (since xn is known), and so on until, knowing x2,xz>..., xn it is possible to find xx from the first equation. This process is known as back-substitution. Sim-
4.1. CHOLESKI-TURING METHOD
271
ilarly a system of equations L{\)X = K can be solved by forwardsubstitution. The idea of the Choleski-Turing method is to replace the system of equations (4.1) viz. with
AX = H
\A\^0,
(4.1.2)
by an equivalent system of the form (4.1.1), i.e. U(1)X = K. This is effected by expressing A as the product of a non-singular matrix L and a unit upper triangular matrix U(l), i.e. A = 117(1), so that (4.1.2) now becomes LU(l)X = H. Using (4.1.1) we can write the last equation in the form LK = H and so the column matrix K can be found by forward substitution. Whence, from (4.1.1) the column matrix X is found by backward substitution, giving the solution of (4.1.2). The basis of the method is contained in the theorem. Theorem
(i) If the discriminants Am (1 ^ m ^ n) of an (n x n) matrix A are non-singular, then there exists a unique lower triangular matrix L and a unique unit upper triangular matrix U{\) such that A = LU (I). (The discriminant Am of a matrix A is the square array formed from the first m rows and m columns of A.) (ii) The matrix L can be uniquely expressed as the product of a unit lower triangular matrix <$f(l) and a diagonal matrix D = diag. (dxd2.. .d%n), and hence A = &(1)DU(1). Proof. Let hi L =
I
hi ^31
^22 ^32
^33
*wl
ln2
InS
272
V. MATRICES
and «13 · . . U\n
«12
o
1/(1) =
1
«23 ' . . W2„ . - Uin
1 .
Forming the product LU (I) and equating to A we show that it is possible to find unique values for l^ and «t·.. Multiplying the first row of L into U(l) and equating to the corresponding elements in the first row of A gives a
il
Λ
12
^11 ~~
and
Λ
ni«12 ~
^11«13 =
12»
(4.1.3)
or briefly 2 2
uij=aijlan,
(4.1.4)
on using (4.1.3), since by hypothesis a n — |zJa| 7^ 0. Multiplying the second row of Z, into £7(1) and equating to the corresponding elements in the second row of A we have ^21 = <21«13 i ^22«23
=
^23»
^21«12 ~r" ^22 = = ^22
^21»
^21«14 ' ^22«24
= r
(4.1.5)
^24 > · · · > ^21«l n "i ^22«*-w ~ ^ 2 Μ
where the last system of equations can be written briefly as ^21«l? + ^22«2; = #2;>
3 ^ / ^ W.
(4.1.6)
Thus /21 = a21 from (4.1.5) and then l22 = a22 — u12a21, or on substituting for u12 from (4.1.4) ^22 = ( α 22^11 — Λ 1 2 α 2 ΐ ) / α 1 1 ·
ί4·1·7)
Now the equations (4.1.6) are simple equations in the unknowns «2- since ^2i> ^22' «l; a r e known from (4.1.5,7) and (4.1.4) respectively. Thus u2j =
α2ι·αη — α2ίαυ ^11^22
^21^12
=
α^αη — Λ2ιαΐ; r-τη 1^21
the denominator being non-zero by hypothesis.
Ζ 22 =|Λ|/|ΛΙ#0
,
ο^ ·^ „
ό ^ ] ^ η .
Also from (4.1.7)
(4.1.8)
273
4.1. CHOLESKI-TURING METHOD
We have now found all the elements in the first two rows of both the matrices L and U(\) (incidentally showing that they are unique), and have proved that l22 = |^J2|/j^d1J. Proceeding in this way all the elements /· ; and u{can be determined uniquely and it can be shown inductively that
(ii) It is now easy to prove the second part of the theorem. This depends on the obvious identity
I ^3ifin ~ I hlfill Inlfiw
hzfivz 1 W^22 W^33 Itâfivz
^sfisZ
i.e. L —- J?(1)D where D = diag. (lnl22- · -^««)· Since L is unique so also are JS?(1) and D. This completes the proof of the theorem, i.e. A = LU(l) = SC(l)DU(l). Note on the theorem. It should be noted that the conditions imposed on the Am in the theorem need not be verified before the method is applied. If the conditions are not satisfied, then at some stage of the calculation of L a diagonal element lü = 0 is found, thereby showing the stage at which the method appears to fail. However, when this stage is reached with the ith row it is only necessary to interchange this with some subsequent row, i.e. to take the equations in a different order. The method proceeds in the usual way until a further lkk = 0 is found when this row is interchanged with a subsequent row. It is seen from the following example that the numerical work involved in arriving at an lü = 0 is not wasted. E x a m p l e on t h e Choleski-Turing m e t h o d
To solve the simultaneous equations 34
(1)
#5
-58
(2)
3 * x + 11* 2 + 13*3+ * 4 —12* 5
8
(3)
— 1 0 ; ^ + 2x2 — 7x3 — 9%4 +
274
V. MATRICES
χχ + 23*2 + 23*3 + 4*4 — 26% =
2
(4) (5)
TABLE I
L 5
— 10
-1
5 5
5
5
/22 = 0; Interchange (2) and (5)
5 58 — 5
1
116 44 —
1
6
0
3
-10
2
0
0 22
— 3
O
ά
0
-1
0
13
0
6
0
0
0
0
1 1 — 3
0
0
0
0 1
7
2
75
63
— 22
22
22 0
0
0
0 0
1
4
0
0 0
0
L
4 7 -3 -2
0 6 5 52
361
1
5
23
1
3
2
6
5
5
5
5
2
7 2
63
75
22
22
0
0
1
0
0
0
0
1
0
0
0
0
1
4
0
0
0
0
349
~ΊΪ2 21
3
1
1 -12
8
-2
13
34
1
13
2
2 189 22
3
349 ~~~22~
1
-
349
2
4 -26
23
409
1 -58
-9
5
4 5
AM
0
19
-1
17(1)
0
5
34
II
1
52
6
3 - 2
2 -7
1
0
T
11
-10
~349
0
22
-1
3
0
5
168
-1
2
TABLE
5
5-1
X
K
0 /44 = 0; Interchange remaining two equations. 349
116 44 — 5 3
H
A
17(1)
361 349 1
5 -1
3
-2
6
4 _2
2
-4
9
7
9
15
-3
-3
-3
11
8
-12
-2
-2
34
31
-8
-28
275
4.1. CHOLESKI-TURING METHOD TABLE
0 6
-1
5
-10
0
5
5
0
1
Y
0
0
0
1
0
0
0
0
1
4
0
0
0
0
0
1
0
0
0
0 349
-1
116
1
7
5
0
Y
5
~YT
44
Y
0
1
Y
10
1
41
34
~22
22
4
L 0
4 7
5
0
1
Y
0
0
0
1
0
0
0
0
1
4
0
0
0
0
~Y
0
0
0
52
22
Y
0
19
349
168
-2
Y ~~ΥΓ 21
-8
1
349
~~ΥΓ
4
11 -2
27
28
16
-15
-24
-23
47
51
25
24
IV
5
1
5
-10
1
7
0
52
-3
12
-5
-3
14
3
~349
4
0
5
-2
-1
5
7
A (CR)
U(l)lR)
0
6
4
5
Y ~Y
TABLE
5
11
4
0
22
58
3
A
l/<*>
L 5
III
1 10
11 5 1
3
~Y
41
34
~~22
22 12 ~349 1
5
4
7
5
11
4
2
4
0
9
7
16
31
28
25
-3
8
16
4
2
-2
32
63
55
27
Method (i)
If au Φ 0 as in this example take the equations in given order. First element of L = first element of A.
(n)
Write down first row of U(l) = (First row A) -r- an.
(hi)
Calculate second row of L. It is found that l22 = 0. Interchange equations (2) and (5) and proceed. Do not reject the values calculated in the previous second row.
276
V. MATRICES
(iv)
Continue with calculation of the new second row.
(v)
Note that the elements in the two rejected rows in which lH = 0 reappear in their new position, in this instance, in the rows 4 and 5. Explanation of tables and method of checking T A B L E I I . Matrix ÜC) = (/£>)
This m a t r i x is formed in t h e following w a y : /(C) - / hi
=
hj + hj
i.e. t h e element in t h e second row / t h column is t h e s u m of t h e t w o elements in t h e first row / t h column a n d t h e second row / t h column of L. Similarly hj = hj + hj + hj
and in general
i Hj
The product L{C)U(l)
=
,
hj
= A{C) where A{C) = {a^) a n d (C)
ciij
e.g.
/
i
V
= JSJ k=l
akj
a{$ = a13 + a23 + α 33 + a 4 3 = 3 — 1 + 13 — 7 = 8. T A B L E I I I . T h e m a t r i x U{R) = (u[R)) is formed in t h e following w a y :
k= l
where «t-· is t h e general element of U(l).
Thus, for example
(R) , 63 75 Λ Λ ^35; = «ai + «32 + «33 + «34 + «35 = ° + 0 + 1 ~ ^ + £2 T h e product LU{R) is equal t o A { R ) where A { R ) = {a{R)) a n d (R)
dij
e.g.
=
i V JSJ
k=l
aik,
aif = 3 + 11 + 13 + 1 - 12 = 16.
=
34 22
277
4.1. CHOLESKI-TURING METHOD TABLE
IV L(C)U(R)
=
i
where
a^
^(CRU
A(CR) =
= ^
j
^
P=\
aPq.
q=\
For example 3
R)
4
3
α
a& = Σ Σ Ρ< = Σ p=\
q=l
«n = 5 - 5 + 28 = 28
(6)
p=l
or alternatively, a{^R) is equal to the sum of the elements in the (3 X 4) matrix formed from A by taking the first three rows and four columns. Example. Suppose the first three rows of L and £7(1) have been found and it is thought necessary to check. By Table II. The third row of L{C) into each column of £7(1) should give the elements 4?» αΏ > αΏ > au> αΏ7-1 = 4 ? ;
7 ( - 1/5) + (52/5)(l) = 9 = a$ = - 1 - 1 + 11
7(3/5) + (52/5)(1/3) + (22/3)(1) = 15 = 4 ^ = 3 - 1 + 13, etc. By Table III. The third row of L into each column of U{R) should give the elements 4 ΐ ) · · · α 3 5 ) e-£4 ? = 3 · 1 + (58/5)(10/3) + (22/3)(- 41/22) = 28 = 3 + 11 + 13 + 1 ; 4£> = 3(11/5) + (58/5) ( - 1/6) + (22/3)(34/22) = 16 = 3 + 11 + 13 + 1 - 12. By Table IV. Multiplication of the third row of L(C) into u{R) should given the elements a^{R)... af5R) e.g. 3
4™> = 7(11/5) + (52/5)(- 1/6) + (22/3)(34/22) = 25 = ]T a™
cf. (6)
= 1 1 - 2 + 16 = 25. Normally L and £7(1) are determined without any intermediate checks, but a final check of the multiplication of the last row of L into the last column of U in Table IV should give the last element of A. If, however, the results disagree the error can be located by a systematic use of one of the methods contained in Tables II, III, or IV.
278
V. MATRICES
Evaluation of a determinant using the Choleski-Turing method
It can be verified using the above method that the matrix, A, of the example in § 2.1b becomes 3
0 67 3 19
11 2
^
17 3
7
0
0
1
Ύ
5
0
0
0
0
0
1
18 ~~ 67
™ 67
0
0
0
0
1
0
0
0
1 -
0
0
0
0 6 13
= -757
566 ~67~ 233 ~67~
24
-6
0
16,755 0 757 8073 77,613 757 16,755
whence
7
~T
104 ~ ~67~ 190 757
2
ΊΓ
37 67 815 757 10,862 16,755
1
\A\ =
Note that from this triangulation we can derive
\M =
3 11
Kl =
|= -67,
|Zl3| = · 3 11 2
3 11 2 -6
0 6 13
5 -4 -3 14
9
7 9 8 1
5 -4 -3
= - 1C3
So that the reader can verify more easily the product of the triangular matrices, the elements have been left in fractional form. In practice, however, it is customary to use decimal notation with a suitable choice of the number of significant figures. Inversion of a matrix
Since it is relatively easy to invert a triangular matrix, the CholeskiTuring method has an obvious application to finding the inverse of a matrix A. Find matrices L, U(l) such that
LU(l)=A and then the problem is reduced to that of finding Z, - 1 and C/_1(l) since
4-1=
U-\\)L-\
279
4.2. A SYMMETRIC MATRIX
To evaluate the inverse of the lower triangular matrix L postmultiply it by a lower triangular matrix A = (Ai;), and solve the matrix equation LA = I, or explicitly
'hi / =
I ^31
Q ^32 ^33
^ 'nl . ^n2
^»3 · · · ^n
from which the unknowns λί;· can be found. For example hiKi = l* S i v i n g λιι> hi ^11 + ^22 ^21 = °> g i v i n g ^21 \ 122^22= l>
giving
^
_1
and so on. Thus A = L is found. Similarly, to invert an upper triangular matrix U, it should be premultiplied by an upper triangular matrix (μ^) to give the equation /^ll
/^12
^13 * * · /^l*
/^22
/^23 * * · ^ 2 n
Ö4 J
_
I
_
r
O thus yielding a system of equations which can be solved successively for the μφ 4.2. A special case: t h e m a t r i x A is s y m m e t r i c
The matrix A is said to be symmetric if A = Ä i.e. at;· = α;ϊ. For example, the matrix 5 6 15 6 3 29 15 29 10 is s y m m e t r i c .
When A is symmetric the process of finding its inverse can be shortened a little. For it is now possible to express A in the form A
=LL
where the elements of L are found as in the Choleski-Turing method. The matrix L is not necessarily unique, nor are its elements necessarily real
280
V. MATRICES
even when the elements of A are real. The reader can verify this by constructing a simple numerical example. All that is now required is to find Z, -1 , since (L)~x = (£ _1 ) follows immediately, determined.
A~x = (Z) _ 1 L - 1
and so
is
5. Eigenvalues (Latent roots) The eigenvalues of A are defined in the following way. Consider the system of equations (5.1)
ΑΧ = λΧ = λΙΧ
where A is a scalar constant, A an (n x n) matrix, X = {xxx2. . .xn} is a column matrix, and I the unit (n x n) matrix. Equation (5.1) is equivalent to (A - λΙ)Χ = 0 which represents a system of n homogeneous equations in the n numbers xvx2,.. ·, xn- This system of equations has non-trivial roots only if (cf. § 4) \A - JJ\ = 0 i.e. if an — λ #21 I
#nl
a12 Λ
22 0Ln2
a13 ... λ
α
23 ' ' '
a\n a
2n
#n3 · · · ^ηη ~
s\ λ
/K 9 \
\
When expanded (5.2) is a polynomial of degree n in λ which yields n values for λ, say λνλ2,. . .,λη, the so-called eigenvalues, or latent roots, of A. For each value λτ of λ there corresponds a column matrix Xr, called an eigenvector (modal column) of A. The polynomial of degree n in λ derived from the expansion of (5.2) is called the characteristic equation of A and is represented by i* - a ^ - 1 + a ^ » - 2 - . . . + ( - l)MaM = 0
(5.3)
where, for example, ax = an + a22 + . . . + ann and, from the theory of equations λΎ + λ2 + . . . + λη = QLV The coefficient ax is called the spur (trace) of A and has the property that it is invariant under a transformation (see the next paragraph for the formal definition of a transformation).
5. EIGENVALUES (LATENT ROOTS)
281
Suppose that T is a non-singular square matrix, then the matrix TAT~X is called the transform of A by T. Since T~1(TAT~1)T = A, A is the transform of TA T~l by Γ - 1 . It is now shown that, if the eigenvalues of the (n x n) matrix A are distinct, then it can be transformed into a diagonal matrix. Let Xr be any one of the n eigenvalues and Xr the corresponding eigenvector, i.e. AXr = XrXr
(5.4)
Construct the square matrix T which has the n eigenvectors (modal columns) as columns. This (n x n) matrix is called the modal matrix of A and can be written as the row matrix T=
[XxX2...Xn]
(5.5)
Thus AT = A [ΧχΧ2 · * · Xn] — r^1X1A2^2 · · * AnXn] on using (5.4), and the multiplication rule for partitioned matrices, i.e. A T = \XXX%. . .Xn] diag (λ,λ2.. Λ ) = T diag (λ,λ2.. .λη), and premultiplying both sides by T~l, Τ-ΐΑΤ
=
άΜξ.(λιλ2...λη).
Thus the modal matrix T = [XiX2 · · · ^ n ] transforms A into a diagonal matrix whose diagonal elements are the eigenvalues of A. 5.1. The Cayley-Hamilton theorem
This theorem states that every matrix satisfies its own characteristic equation. This is a convenient way of expressing the fact that, if the characteristic equation of an (n x n) matrix A is λη - α ^ - 1 + α2λ*-2 - . . . + ( - 1) η α η = 0
(5.1.1)
then A satisfies the equation An - ^A»-1 +
(5.1.2)
This theorem is the basis of a method for obtaining the characteristic equation without actually expanding the determinant (5.2).
282
V. MATRICES
Proof. The theorem is proved here only in the case when the eigenvalues of A are all distinct (the theorem remains valid, however, even when this restriction is removed). In this case A can be transformed into a diagonal matrix. Consequently we first prove the theorem when A is a diagonal matrix D, and then generalize. If D = diag. (λ 1 λ 2 .. .λη), Dx = diag. (μ1μ2-. ·μη) are two diagonal matrices of order n, then their product is DD1 = diag. (λ1μ1 λ2μ2 . . . λημη)
(5.1.3)
where, if Xr = 0, then λΫμτ = 0; or if μ$ = 0, λ&μ5 = 0. The characteristic equation of the diagonal matrix D is obtained by expanding the determinant \D — λΙ\ = 0, and is clearly (λ-λ1)(λ-λ2)...(λ-λη)
= 0
which replaces (5.1.1) in the case when A = D. Thus the left-hand side of (5.1.2) is replaced by (D - XJ)(D - V ) . . · ( £ - U)
(5.1.4)
which is now shown to be the null matrix. Consider the general factor in (5.1.4), viz. D — XrI = diag. (λχλ2.. Λη) — diag. (λΧ..
.λτ)
= diag. (λχ — λτ λ2 — Xr . . . 0 . . . λη — λτ) which has a zero in the rth row, rth column. Hence the left-hand side of (5.1.4) is equal to the null matrix, on using (5.1.3). This proves the theorem for A = D, a diagonal matrix. Now suppose that A is any matrix whose eigenvalues are distinct. Then there exists a non-singular matrix T such that T~lAT = D, a diagonal matrix. Hence T-\A and
-λΙ)Τ=
T-^AT - Τ-^λΙΤ =
\Τ~Λ\Α - M\\T\ = \D -
Ό-λΙ
λΙ\.
Thus, since T is non-singular, the matrices A and D have the same eigenvalues and by the first part of the proof, D satisfies D*-(tlD"-1+ But
... + ( - l ) w a r t / = ( 0 ) . .
Dr = ( Γ - Μ T ) r = ( Γ - Μ Γ ) ( Γ ~ Μ T)... (T^A T) =
(5.1.5) T^A'T
283
5.2. THE ITERATION METHOD
and substituting in (5.1.5) with r = 1, 2 , . . . , n T-*AnT - ο ^ Γ - Μ - ^ Γ + . . . + ( - l)MaM/ - (0). Pre- and postmultiplying this last equation by T and T~l respectively, we have An - ^A»-1 + . . . + ( - l)MaM7 = (0), (5.1.6) as required.
This completes to proof of the theorem.
A derivation of the characteristic equation To find the characteristic equation, let Y0 = {^oi^· · -Jon) be any arbitrary column matrix, e.g. Y0 = {1 0 0 . . .0} and let Y1 = AY0> where
Y2 = AY1 = A*Y0,...,Y»
Yr = {yriyr2 . . . y™},
= AYn-1
= A»Y0 (5.1.7)
r = 0, 1, 2 , . . . n,
so that all the y's are known. Postmultiply (5.1.6) by Y0 and substitute the values (5.1.7) in the result to give Y n - O ^ - D + . . . + ( - l)«awY0 = (0) This matrix equation yields n simultaneous equations in the n unknowns a^Oj,..., aw viz. yns — Win-i)* + a2y(«-2)5 . . . + (— l) n a„y0s = 0,
1 < s < n,
which can be solved by methods already outlined. Having obtained the a's, the characteristic equation (5.1.1) can be solved by the usual methods. An alternative approach particularly useful in finding the dominant eigenvalue (i.e. that eigenvalue having the greatest modulus) is that shown in § 5.2. 5.2. Iterative method for determination of eigenvalues The method depends upon the use of matrix polynomials which are now defined. Let P{X) ~pm*m + pm-lX™-1 + . . . + A where p0,pv..., pm are scalar constants, be any polynomial in the scalar variable x. The expression P(A) = PmA" + Pn-iA*»-1
+..m+pj
284
V. MATRICES
where A is an n x n matrix, is called a matrix polynomial of degree m in A. This definition implies that, if an identity exists between scalar polynomials, then a corresponding identity exists for the matrix polynomials obtained by replacing xr by Ar, r ^ 1, and x° by / . Thus, if Ρχ(χ), ^ Μ » Pz(x),... Pt(x), P(x) are polynomials such that P(x)=ITPr(x), then it follows that P(A)=nPr(A). Suppose now that P(x) is a polynomial of degree (m — 1) or less, and m
that Q(x) = Π(χ — λτ) where λν λ 2 ,.. .Xm are m distinct numbers. Then P(x)IQ(x) can be expressed in partial fractions, i.e. as m
P(x) _
P(x)
v
Ar
-Στ^Τ,
(5.2.1)
s=l
where ΛνΛ2,..., Am are constants to be determined. Multiplying both sides of (5.2.1) by (x — λτ) and then letting x -> Xr we have = /!,,
for
1 < r < w.
5=1
Substitution of ΛΓ in (5.2.1) and multiplication by Q(x) yields the identity known in this context as the Lagrange interpolation formula, viz.
From this identity there follows the corresponding matrix equation (5.2.2) provided that
degree of P(A) ^ (w — 1)
(5.2.3)
285
5.2. THE ITERATION METHOD
From (5.2.2), (5.2.3) and the Cayley-Hamilton theorem we deduce a special case of Sylvester's theorem. Sylvester's t h e o r e m
/ / the eigenvalues λνλ2,...,
λη of an (n x n) matrix A are distinct, and
if P(A) is any polynomial in A, then •(5.2.4) Proof. Note the differences between (5.2.2) and (5.2.4) ; m is replaced by n\ Q(X) = (λ — AX)(A — λ2).. .{λ — λη) = 0 is now the characteristic equation of the matrix A, denoted as usual by Q{X) ~λη-
α ^ - 1 + α2λΜ"2 . . . + ( - 1)ηαΛ = 0
(5.2.5)
the polynomial P(A) is no longer restricted by (5.2.3) with n replacing m. The proof of the theorem follows immediately from (5.2.2) if it can be shown that any polynomial can be expressed in terms of powers of A less than, or equal to, n — 1. That this is true is a consequence of the Cayley-Hamilton theorem, namely An = ^A»-1
- OL2A»-2 + . . . + ( - ly-^J
(5.2.6)
Multiply both sides of (5.2.6) by A so that An+1
= ^A"
- OL^A"-1 + . . .
+ ( - l) n - 1 a M ^.
Substitute for An from (5.2.6) to give An+1
= OL^A»-1
- oL2An~2 + . . . + ( - l ^ - W ) - *2An~x + . . . + (-l)n-loLnA
(5.2.7)
i.e. a polynomial of degree (n — 1) or less in A. Continuing this process we see that An,An+1,.. . can be expressed as polynomials of degree at most n — 1 in A. Since any polynomial P(A) is merely a sum of scalar multiples of powers of A, it is apparent that, by a continuous use of (5.2.6) etc., that P(A) can be expressed as a polynomial of degree at jnost n — 1. Similarly Ρ(λ) is reducible to a form of degree at most n — 1 in λ with the same coefficients. Then (5.2.4) follows from (5.2.2) and the theorem is proved. (This theorem needs modification if any eigenvalue is a repeated root of the characteristic equation.)
286
V. MATRICES
In particular, if P(A) = Am where m is a positive integer, then Sylvester's theorem states (5.2.8) From (5.2.8) we can deduce an iterative method for finding the real dominant eigenvalue in the following way. Let m be a large positive integer and λχ the required dominant root (supposed real). Then, to a close approximation, Am is equal to the first term in the summation of (5.2.8), i.e. Α"=λΓΠ[λ
_χ
j-λ^Β,
say
(5.2.9)
where B is an (n x n) matrix. Let Y0 = {y0iyo2 · · · >Ό»} De a n arbitrary column matrix, e.g. (10 0...0), and postmultiply (5.2.9) by Y0 giving (5.2.10)
Α^Υ0 = λ^ΒΥ0 where each side is a (1 x n) matrix. Then, writing A m Y0 = {amlam2... amn}
and
B Y0 = {bxb2... bn}
where the as and the b's are known, the matrix equation (5.2.10) gives Similarly
amr = X1mbry 4*+ιγ0
l
=
(5.2.11) (5.2.12)
m+1
where we put the column matrix A
Y0 = {a{m+1)1.. -tf (m+1)w }, so that
+1
a{m+i)r=Xr br, Dividing (5.2.11) and (5.2.13)
l
λλ = a{m+i)rlamr,
1< r< n
(5.2.13)
which means that the real dominant eigenvalue is given as the ratio of any two corresponding elements of the known matrices AmY0, Am+1Y0. The iteration process is carried out until this ratio remains constant. If l^il ^ 1^1 where λ2 is the subdominant root, then this process requires relatively few steps. The iterative method accomplishes more; for we assert that the column matrix AmY0 is the eigenvector Xx corresponding to the eigenvalue λν This becomes apparent by a change in notation. Thus, if we write AmY0 = X,
287
5.2. THE ITERATION METHOD
(5.2.10) and (5.2.12) become respectively Xx = and
λ^ΒΥ0
ΑΧλ = λ^λ^ΒΥ^)
= λλΧχ
from the previous equation. Hence X1 is the required eigenvector, as asserted. The method can be adapted to the case when the characteristic equation is such that λχ and λ2 are conjugate complex numbers whose modulus is greater than that of any other eigenvalue. In this case we have from Sylvester's theorem (5.2.4)
where now we choose, Ρ(Α)=Α™(Α-λ2Ι) Ρ(λ2) = X2m(l2 - λ2) = 0
Since
(5.2.14)
P(A) is approximately equal to the first term of the summation since the second term arising from λ = λ2 vanishes by (5.2.14), so that
Am(A - λ2Ι) = λ ^ - λ2) Π (A ~~ Y ) ΞΞ V C
say
(5.2.15)
where C is an (n x n) matrix. Let Y0 = {y01.. .y0n} be an arbitrary column matrix with which we postmultiply (5.2.15), giving A*{A - λ2Ι)Υ0 = X^CYQ.
(5.2.16)
Using the notation AmY0 = {amlam2.. .amn} as before, and putting CY0 = {c1c2...cn} where now the matrix elements are complex, we have »(m+l)r — Kamr = λ{*0τ,
Similarly and
Am+\A
1 < Ύ < ».
(5.2.17)
- λ2Ι)Υ0 = If^CY^
^ 2 ) r — A2Ä(m+i)r= A1m+1cr,
(5.2.18)
1^ r ^ n
Dividing (5.2.17), (5.2.19) 3
i ~
i.e.
#(m+2)r — Λ2Λ(>η+1)>·
ϊ
Α ^ α ^ — (λχ + A2)«(w+i)r + a(m+2)r = 0,
1 ^ r < n.
(5.2.19)
288
V.
MATRICES
Only two of these equations are required to find λλ and λ2. The corresponding eigenvector arising from λλ can also be found. Again the derivation is apparent by a change of notation. Let X, = A™{A - V ) Y0 = hmt
?ο
(5-2-20)
then (5.2.18) becomes AX,
= MXfCYo)
=
λιΧ1
and so the required eigenvector is Χλ given by (5.2.20). Alternative procedure. Sometimes it is convenient to iterate by postmultiplying an arbitrary row matrix, say X= [ 1 0 0 . . . 0 ] by A. By this procedure we obtain the modal row corresponding to the dominant eigenvalue. An example to illustrate these methods is deferred until after the next section. 5.3. Evaluation of subdominant eigenvalue This involves the construction of a new matrix A{1) whose eigenvalues are 0, λ2,λ3,. . ., λη where we suppose \λ2\ > \λ3\ > . . . > \λη\ and where, as usual λνλ2,. . ., λη (\λλ\ > \λ2\) are the distinct eigenvalues of A. To this end we introduce the modal row X defined as the row matrix satisfying the equation XA =λΧ For non-trivial solutions of this equation, \A - λΙ\ = 0 yielding n eigenvalues λλ,λ2,...,λη identical with the solutions of (5.2). Let Xr be the modal row corresponding to the eigenvalues λ„ so that (5.3.1)
XrA =-- lrXr Let Xs be the eigenvector (modal column) satisfying
(5.3.2)
AXS = XSXS. Then we show that XrXs = 0
if
r^s.
(5.3.3)
To prove (5.3.3), postmultiply (5.3.1) by Xs and premultiply (5.3.2) by Xr to give X^rAXg = /.rXrXs, XrAXs = AsXrXs
5 . 3 . EVALUATION O F SUBDOMINANT
EIGENVALUE
289
and then, by subtraction of these last two equations =0
(Us)XrXs implying, since λτ φ λ,, that
XrXs = 0,
as was to be proved. Proceeding with the construction of Α{λ) we write A j = \ # ι ι # 2 1 · . . Xn\)
as the eigenvector corresponding to λ1 (which has already been found by the iterative method), and suppose xsl is a non-zero element of the column matrix. Form the matrix / 0 0. ' 0 0.
z=
0.
0
\ 0 0.
■^21/^1 ·
1 XnllXs:
.0 .0 .0 -
where the sth column of Z is the eigenvector X1 divided by the element xsl, and the elements of the remaining (n — 1) columns of Z are all zero. If Xr = lxr\xr2· - -Xm\ where now Y ^ 2 is a modal row of A, then, by matrix multiplication, XrZ
=
[ΟΟ...Χ,Χ1...0]/Λ:51
where XrXx appears in the sth column of the resulting row matrix. But, by (5.3.3), ^ ^ = 0 since r Φ 1, and hence XrZ is equal to the null matrix [0], It follows that (5.3.4)
Xr(I -Z) = Xr. Substituting for Xr from (5.3.4) in the left-hand side of (5.3.1) i.e. Writing
Xr{I - Z)A = XrXr Xr [(/ - Z)A - λτΐ] = 0
(5.3.5)
(/ - Z)A = A^
(5.3.6)
(5.3.5) becomes Xr(A{l) — XrI) = 0, the eigenvalues of which are given by the determinantal equation \ΑΜ-λτΙ\
=0
where
2 < r < n.
290
V.
MATRICES
Hence (n — I) of the n eigenvalues are λ2,λ3,..., being zero since, from (5.3.6) |4(i)| = \I-Z\\A\
λη, the remaining one
=0
since (/ — Z) has a row of zeros for the sth row. To find the subdominant root λ2 of A the iteration is performed on A{1). The method can obviously be extended to find all the eigenvalues in descending order of magnitude, provided that they are all real and distinct. In Example 2 below we find the dominant and subdominant eigenvalues of a (4 x 4) matrix by a variation of the above method, that is by iterating starting with an arbitrary row, constructing a new matrix A$ (R for row) to give the subdominant eigenvalue. To find the remaining eigenvalues the matrix A is inverted and the dominant and subdominant eigenvalues of A-1, say μν μ2 are found using an arbitrary column for the iteration as described in the text. Then the matrix A^ (C for column) is constructed to find the subdominant root of A*1. Then μ^1 and μ2ι are those eigenvalues of A having the two least moduli. Example 1
As a straightforward application of the foregoing methods we consider a simple matrix / 3-1 l' A = I -21 7 3 \ 7 -2 2, and determine the dominant eigenvalue and the characteristic equation. Iteration Y0 1 0 0
Y, = AY0 3 -21 7
Y2 = A*Y0 37 -189 77
y* = A*Y» 377 -1869 791
Y* = A*Y0 3791 -18627 7959
Y5 = A*Y0 37959 -186123 79709
Taking the ratio of corresponding elements of Y5 and Y4, which are 10.01, 9.99, 10.01 the dominant eigenvalue is 10.00, taking the mean of these ratios. The coefficients of the characteristic equation are given as the solutions of the system of simultaneous equations 377 — 37ax + 3a2 — a3 = 0 -1869 + 1 8 9 a i - 2 1 a 2 =0 7 9 1 - 7 7 a x + 7a2 =0
or or
8 9 - 9ax + a2 = 0 113 — llo^ + Og = 0
291
5.3. EVALUATION OF SUBDOMINANT EIGENVALUE
whence
OL1 = 12,
α2 = 19,
α3 = — 10
and the characteristic equation can be solved to find the other eigenvalues. The eigenvector corresponding to λχ = 10 is from the Yh column {1 —4.903 2.100}. The correct eigenvector { ^ A ^ } can be found by solving the system of equations (3 — λ)χ1 — x2 + * 3 = 0 7*! — 2 * 2 + (2 — λ)χ3 = 0 with λ = 10, xx = 1, and is {1 —4.9 2.1}. The characteristic equation as found by expanding the determinant \A - λΙ\ = 0 P - 12A2 + 19A + 10 = (λ - 10)(P - 2λ - 1) = 0.
is
Example 2
A further example mentioned in the text is to find the eigenvalues λν λ2, λ3, Α4 of ' 8 - 2 4 0.75N -1 9 6 - 8 A - ' 2 - 4 0 - 6 4 -8 -4 14, Let X0 = [1 0 0 . . .0] be a row matrix with which we premultiply A. Note that each row Xr is divided by that power of 10 which reduces the leading element to a number between 1 and 10. This power of 10 is shown at the start of each row. 1
*0
x
=
*1
ΧϋΑ* = X2 10
0
8
-2
0
0
4
0.75
7.7
-5.6
1.7
0.85
2
7.4
-7.94
-0.62
5.23
3
8.558
-12.562
-3.896
14.601
1.3164
-2.3140
-9.9544
3.3471
10«
2.4242
-4.6253
-2.2097
7.2330
7
4.8549
-9.5537
-4.6986
15.3280
X0A* = Z 8 10»
1.0031
-1.9952
-0.9922
3.2287
10
2.0951
-4.1825
-2.0874
6.7870
X»AZ = Z 3 10
X0A* = Xi 10 5 XçA5 = x5 10 X0A«
=
^6
X0A* = X 7 10
X0A» = ^ 9 10
292
V. MATRICES
The ratios of corresponding elements in XQ and X8 are 20.89
20.96
21.04
21.02
so that, to a close approximation, λλ = 21 is the dominant eigenvalue. The corresponding modal row is [1
-2
-1
3.25]
We now construct the matrix A^(1) whose eigenvalues are 0, λ₂, λ₃, λ₄. The auxiliary matrix Z is formed by placing the modal row, divided by its third element −1, in the third row:

Z = (  0    0    0    0
       0    0    0    0
      -1    2    1   -3.25
       0    0    0    0 )

so that

I − Z = ( 1    0    0    0
          0    1    0    0
          1   -2    0    3.25
          0    0    0    1 )

and

A^(1) = A(I − Z) = ( 12   -10    0   13.75
                      5    -3    0   11.50
                      2    -4    0   -6
                      0     0    0    1 )

Iterating again with X₀ = [1 0 0 0]:

                           power of 10
  X₀                                        1        0        0       0
  X₀A^(1)    = X₁^(1)           1           1.2     -1.0      0       1.375
  X₁^(1)A^(1) = X₂^(1)          1           9.4     -9.0      0       6.375
  X₂^(1)A^(1) = X₃^(1)          2           6.78    -6.70     0       3.213
  X₃^(1)A^(1) = X₄^(1)          3           4.786   -4.770    0       1.939
  X₄^(1)A^(1) = X₅^(1)          4           3.363   -3.359    0       1.295
  X₅^(1)A^(1) = X₆^(1)          5           2.356   -2.355    0       0.8908
  X₆^(1)A^(1) = X₇^(1)          6           1.650   -1.650    0       0.6201

The ratios of corresponding non-zero elements of X₇^(1) and X₆^(1) are

7.002   7.003   6.96

and we take λ₂ = 7.
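The construction of A^(1) and the second iteration can be sketched as follows (NumPy; not part of the original text, and the index s, the loop length and the scaling are illustrative choices consistent with the working above).

    import numpy as np

    A = np.array([[ 8.0, -2.0,  4.0,  0.75],
                  [-1.0,  9.0,  6.0, -8.0 ],
                  [ 2.0, -4.0,  0.0, -6.0 ],
                  [ 4.0, -8.0, -4.0, 14.0 ]])

    w = np.array([1.0, -2.0, -1.0, 3.25])    # modal row belonging to the dominant root 21
    s = 2                                    # third element of w, used to scale Z
    Z = np.zeros((4, 4))
    Z[s, :] = w / w[s]                       # sth row of Z is the modal row scaled so its sth element is 1
    A1 = A @ (np.eye(4) - Z)                 # deflated matrix A^(1); its sth column is zero

    x = np.array([1.0, 0.0, 0.0, 0.0])       # arbitrary starting row
    for _ in range(12):
        x = x @ A1
        x = x / abs(x).max()                 # keep the numbers of moderate size
    print((x @ A1)[0] / x[0])                # about 7, the subdominant eigenvalue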
To find the other two eigenvalues the matrix A is triangulated and inverted. The least eigenvalue of A becomes the dominant eigenvalue of A⁻¹. This is found by iteration using an arbitrary column.

Triangulation. A = LU, with L lower triangular and U unit upper triangular:

L = (  8      0      0      0
      -1      8.75   0      0
       2     -3.50   1.60   0
       4     -7     -0.80   2.625 )

U = (  1     -0.25   0.50     0.09375
       0      1      0.74286 -0.90357
       0      0      1       -5.84375
       0      0      0        1 )

whence

|A| = 8 × 8.75 × 1.60 × 2.625 = 294.
Inversion.

L⁻¹ = (  0.1250   0        0        0
         0.0143   0.1143   0        0
        -0.1250   0.2500   0.6250   0
        -0.1905   0.3810   0.1905   0.3810 )

U⁻¹ = (  1   0.25  -0.6858  -3.8750
         0   1     -0.7429  -3.4375
         0   0      1        5.8438
         0   0      0        1 )

Thus

A⁻¹ = U⁻¹L⁻¹ = (  0.9524  -1.6190  -1.1667  -1.4762
                  0.7619  -1.3810  -1.1191  -1.3095
                 -1.2381   2.4762   1.7381   2.2262
                 -0.1905   0.3810   0.1905   0.3810 )
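The triangulation and the inversion can be reproduced compactly; the sketch below (NumPy, without pivoting) is a restatement for checking purposes rather than the desk-machine layout of the Choleski-Turing method itself.

    import numpy as np

    A = np.array([[ 8.0, -2.0,  4.0,  0.75],
                  [-1.0,  9.0,  6.0, -8.0 ],
                  [ 2.0, -4.0,  0.0, -6.0 ],
                  [ 4.0, -8.0, -4.0, 14.0 ]])
    n = len(A)
    L = np.zeros((n, n))                     # lower triangular factor
    U = np.eye(n)                            # unit upper triangular factor
    for k in range(n):
        L[k:, k] = A[k:, k] - L[k:, :k] @ U[:k, k]                      # column k of L
        U[k, k+1:] = (A[k, k+1:] - L[k, :k] @ U[:k, k+1:]) / L[k, k]    # row k of U

    print(np.prod(np.diag(L)))                                 # 294.0 = |A|
    print(np.round(np.linalg.inv(U) @ np.linalg.inv(L), 4))    # agrees with A^{-1} above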
Iterate using the column Y₀ = {1 0 0 0}:

      Y₀    Y₁ = A⁻¹Y₀   Y₂ = A⁻¹Y₁   Y₃ = A⁻¹Y₂   Y₄ = A⁻¹Y₃   Y₅ = A⁻¹Y₄   Y₆ = A⁻¹Y₅   Y₇ = A⁻¹Y₆
      1       0.9524       1.40         1.69         1.853        1.936        1.977        1.997
      0       0.7619       1.31         1.61         1.777        1.860        1.902        1.922
      0      -1.2381      -1.87        -2.19        -2.357       -2.440       -2.482       -2.502
      0      -0.1905      -0.20        -0.20        -0.202       -0.202       -0.202       -0.202
The eigenvalue determined by the ratio of corresponding elements is taken to be 1.00. Hence the eigenvalue of A having the least modulus is 1.00⁻¹ = 1. To find the subdominant root of the inverse matrix, construct

Z_C = (  1.000   0   0   0
         0.963   0   0   0
        -1.247   0   0   0
        -0.100   0   0   0 )

(taking the eigenvector corresponding to the root 1.00 as {1.00  0.963  -1.247  -0.100}) and the new matrix for iteration, namely

A_C^(1) = (I − Z_C)A⁻¹ = (  0        0       0       0
                           -0.1553   0.1781  0.0044  0.1121
                           -0.0505   0.4573  0.2832  0.3854
                           -0.0953   0.2191  0.0738  0.2334 )

Using Y₀ = {1 0 0 0} for the iteration:

      Y₀    Y₁^(1)    Y₂^(1)    Y₃^(1)    Y₄^(1)          Y₅^(1)           Y₆^(1)
      1      0         0         0           0               0                0
      0     -0.1553   -0.0386   -0.0141     -6.40 × 10⁻³    -3.109 × 10⁻³    -1.549 × 10⁻³
      0     -0.0505   -0.1220   -0.0753    -39.9  × 10⁻³   -20.39  × 10⁻³   -10.31  × 10⁻³
      0     -0.0953   -0.0600   -0.0315    -16.0  × 10⁻³    -8.080 × 10⁻³    -4.072 × 10⁻³

The subdominant eigenvalue of A⁻¹ can be taken to be 0.50 and hence the remaining eigenvalue of A is 0.50⁻¹ = 2. As a check, the product of all the eigenvalues is equal to the determinant of A, i.e. |A| = 21 × 7 × 2 × 1 = 294, agreeing with the value of |A| obtained by the triangulation process.
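The complete set of roots and the determinant check can be confirmed directly; the following short NumPy check is not part of the original text.

    import numpy as np

    A = np.array([[ 8.0, -2.0,  4.0,  0.75],
                  [-1.0,  9.0,  6.0, -8.0 ],
                  [ 2.0, -4.0,  0.0, -6.0 ],
                  [ 4.0, -8.0, -4.0, 14.0 ]])

    print(np.sort(np.linalg.eigvals(A).real))   # approximately [1, 2, 7, 21]
    print(np.linalg.det(A))                     # approximately 294 = 21 x 7 x 2 x 1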
6. Special types of matrices

In this section we mention three important types of matrices and discuss briefly some of their properties. They are the Orthogonal matrix, the Hermitian matrix, and the Unitary matrix.
6.1. Orthogonal matrix
A matrix A is said to be orthogonal if

AĀ = I    (6.1.1)

This equation implies ĀA = I. For, on premultiplying (6.1.1) by A⁻¹,

A⁻¹ = A⁻¹AĀ = Ā

whence

ĀA = A⁻¹A = I.

Also from (6.1.1)

|A| |Ā| = 1

and so, since |A| = |Ā|, we have

|A| = ±1.

If X and Y are column vectors connected by the equation

Y = AX    (6.1.2)

then this transformation is called orthogonal if A is orthogonal. If, in addition, |A| = +1, then the transformation is called a rotation. The choice of nomenclature is clear when we consider a simple example. Let X = {x₁ x₂ x₃}, Y = {y₁ y₂ y₃}, and

A = (  cos θ   sin θ   0
      -sin θ   cos θ   0
        0        0     1 )

so that |A| = 1; then (6.1.2) is equivalent to

y₁ = x₁ cos θ + x₂ sin θ
y₂ = -x₁ sin θ + x₂ cos θ
y₃ = x₃

which represents that transformation of the coordinate system which is obtained by a simple rotation of the x₁- and x₂-axes about the x₃-axis through an angle θ.
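A short numerical check (NumPy; not in the original, and the angle is an arbitrary choice) confirms that this matrix satisfies AĀ = I and |A| = +1, and, anticipating the theorem that follows, that its eigenvalues all have unit modulus.

    import numpy as np

    theta = 0.7                                   # any angle will do
    A = np.array([[ np.cos(theta), np.sin(theta), 0.0],
                  [-np.sin(theta), np.cos(theta), 0.0],
                  [ 0.0,           0.0,           1.0]])

    print(np.allclose(A @ A.T, np.eye(3)))        # True: A times its transpose is I
    print(round(np.linalg.det(A), 12))            # 1.0, so the transformation is a rotation
    print(np.abs(np.linalg.eigvals(A)))           # [1. 1. 1.]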
An important property of orthogonal matrices is embodied in the following.

Theorem. Every eigenvalue of a real orthogonal matrix has unit modulus.

Proof. We have to show that all values of λ satisfying

AX = λX    (6.1.3)

are such that

λλ̄ = 1

where λ̄ is the complex conjugate of λ. Taking the transpose of the complex conjugate of (6.1.3), an operation denoted by an asterisk, one finds

X*A* = λ*X*    (6.1.4)

Form the product of (6.1.3) and (6.1.4) as shown below:

X*A*AX = λ*λX*X    (6.1.5)

Since A is real, A* = Ā, and so A*A = ĀA = I; since λ is a scalar, λ*λ = λλ̄; since X*X ≠ 0, it follows from (6.1.5) that

λλ̄ = 1

proving the theorem.

6.2. Hermitian matrix
The matrix A is said to be Hermitian if

A = A*

i.e.

(aᵣₛ) = (āₛᵣ)

From this definition it follows that the diagonal elements of a Hermitian matrix are real, since aᵣᵣ = āᵣᵣ. The Pauli spin matrices (see § 1.1) are Hermitian. If a matrix A is such that A = −A* then A is called a skew-Hermitian matrix, and the definition implies that its diagonal elements are imaginary. If A is skew-Hermitian then jA is Hermitian.

Theorem

The eigenvalues of a Hermitian matrix are real.

Proof. From (6.1.3) and (6.1.4) of the previous theorem

AX = λX    (6.2.1)

X*A* = λ*X* = λ̄X*    (6.2.2)
Since A = A*, (6.2.2) can be written

X*A = λ̄X*    (6.2.3)

Premultiplying (6.2.1) by X* and postmultiplying (6.2.3) by X,

X*AX = λX*X,    X*AX = λ̄X*X,

whence, since X*X is not zero, λ = λ̄, i.e. λ is real.

Corollary. If A is skew-Hermitian, jA is Hermitian and so the eigenvalues of A are imaginary.

6.3. Unitary matrix
This matrix is a generalization of the real orthogonal matrix. A matrix A is said to be a unitary matrix if it satisfies

A*A = AA* = I.

Theorem

The eigenvalues of a unitary matrix have unit modulus.

Proof. This is an immediate consequence of (6.1.5) since, with A*A = I,

λλ* = λλ̄ = |λ|² = 1
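Both results are easy to check numerically. In the sketch below (NumPy; not from the text) the Hermitian matrix is an arbitrary example and the unitary matrix is manufactured from a QR factorization of a random complex matrix.

    import numpy as np

    H = np.array([[2.0, 1 - 3j],
                  [1 + 3j, -1.0]])                 # Hermitian: H equals its conjugate transpose
    print(np.linalg.eigvals(H))                    # both eigenvalues have zero imaginary part

    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))[0]   # a unitary matrix
    print(np.allclose(U.conj().T @ U, np.eye(3)))  # True: U*U = I
    print(np.abs(np.linalg.eigvals(U)))            # all moduli equal to 1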
Lorentz transformation

An example of an orthogonal matrix is given by the matrix of the Lorentz transformation. Here

X' = AX

where

X = {x₁ x₂ x₃ x₄},    X' = {x₁' x₂' x₃' x₄'}

the first three elements being the usual rectangular coordinates of three-dimensional space and the fourth elements

x₄ = jct,    x₄' = jct'    (j = √−1)

where c is the velocity of light and t, t' are time coordinates. The matrix A is

A = ( 1   0         0                          0
      0   1         0                          0
      0   0    1/√(1 − v²/c²)       (jv/c)/√(1 − v²/c²)
      0   0   -(jv/c)/√(1 − v²/c²)   1/√(1 − v²/c²) )
where v is the velocity of the primed system in the x₃- (or x₃'-) direction relative to the unprimed system. It is readily verified that ĀA = AĀ = I. Explicitly the transformation is

x₁' = x₁,    x₂' = x₂,    x₃' = α(x₃ − vt),    t' = α(t − vx₃/c²)

where α = (1 − v²/c²)^(−1/2). Thus

x' = Ax,   i.e.   xμ' = Σ_{ν=1}^{4} aμν xν,    μ = 1, 2, 3, 4.    (6.2.4)

Hence Āx' = ĀAx = x, since ĀA = I, and so

xν = Σ_{λ=1}^{4} āνλ xλ' = Σ_{λ=1}^{4} aλν xλ',    ν = 1, 2, 3, 4.    (6.2.5)
Substituting for xν from (6.2.5) in (6.2.4) and equating the coefficients of xμ', we have

Σ_{ν=1}^{4} aμν aλν = δμλ = 1 if λ = μ,  0 if λ ≠ μ

(δμν is called the Kronecker delta). Similarly

Σ_{ν=1}^{4} aνμ aνλ = δμλ.
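A small numerical check (NumPy; not in the original, with illustrative values of v and c) confirms that ĀA = AĀ = I for the Lorentz matrix, the transpose here being the plain transpose and not the conjugate one.

    import numpy as np

    c, v = 3.0e8, 1.0e8                           # illustrative values only
    a = 1.0 / np.sqrt(1.0 - v**2 / c**2)          # the factor alpha of the text
    A = np.array([[1, 0, 0,               0             ],
                  [0, 1, 0,               0             ],
                  [0, 0, a,               1j * v * a / c],
                  [0, 0, -1j * v * a / c, a             ]])

    print(np.allclose(A.T @ A, np.eye(4)))        # True
    print(np.allclose(A @ A.T, np.eye(4)))        # True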
7. Simultaneous diagonalization of two symmetric matrices
It has been shown in § 5 how a matrix with distinct eigenvalues can be reduced in a unique way to a diagonal form by means of an orthogonal matrix T. Such a reduction is still possible if A is a symmetric matrix even though the eigenvalues are no longer distinct. (There are, of course, other ways of diagonalizing A if the restriction that T should be orthogonal is removed.) Because of its importance in the theory of normal modes, for example, we discuss, in this concluding section, the simultaneous reduction of two symmetric matrices A and B to diagonal form. The result is embodied in the following theorem.
Theorem
Two symmetric matrices A = (aᵢⱼ) and B = (bᵢⱼ) can be diagonalized simultaneously provided either (i) the roots of the determinantal equation

|A − λB| = 0    (7.1)

considered as an equation in λ, are distinct; or (ii) all the discriminantal determinants of at least one of the matrices are positive.

Proof. (i) Let λ₁, λ₂, ..., λₙ be the distinct roots of (7.1) and let λᵣ and Xᵣ be connected by the equation

AXᵣ = λᵣBXᵣ    (7.2)

On transposing, this last equation is

X̄ᵣĀ = λᵣX̄ᵣB̄

or, since A and B are symmetric,

X̄ᵣA = λᵣX̄ᵣB

This equation when postmultiplied by Xₛ gives

X̄ᵣAXₛ = λᵣX̄ᵣBXₛ    (7.3)

From (7.2), by a change of notation,

AXₛ = λₛBXₛ

which, when premultiplied by X̄ᵣ, yields

X̄ᵣAXₛ = λₛX̄ᵣBXₛ

Comparing this last equation with (7.3) we have, since λᵣ ≠ λₛ if r ≠ s,

X̄ᵣBXₛ = 0    (7.4)

and hence

X̄ᵣAXₛ = 0    (7.5)

Let

X̄ᵣAXᵣ = αᵣ,    X̄ₛBXₛ = βₛ    (7.6)
and let T be the (1 × n)-matrix [X₁, X₂, ..., Xₙ], so that T̄ is the (n × 1)-matrix {X̄₁, X̄₂, ..., X̄ₙ}. Treating A as a (1 × 1)-matrix we have that T̄AT is an (n × 1)(1 × 1)(1 × n)-, i.e. an (n × n)-matrix. In fact

T̄AT = (X̄ᵣAXₛ) = diag. (α₁, α₂, ..., αₙ)

on using (7.5) and (7.6). Similarly, from (7.4) and (7.6),

T̄BT = (X̄ᵣBXₛ) = diag. (β₁, β₂, ..., βₙ)

thus completing the proof of (i) of the theorem.

(ii) It is immediate that, if A and B are (1 × 1)-matrices, then the simultaneous reduction is possible. The proof is completed by induction, the inductive hypothesis being that the simultaneous reduction is possible for any two symmetric {(n − 1) × (n − 1)}-matrices provided that all the discriminantal determinants of B (say), namely

|Δ₁| = b₁₁,  |Δ₂| = b₁₁b₂₂ − b₁₂b₂₁,  ...,  |Δₙ| = |B|,

are positive. Let λ₁ be a root of the equation

|A − λB| = 0    (7.7)

and let X₁ satisfy the equation

AX₁ = λ₁BX₁    (7.8)

Consider the equation

X̄₁BX = 0

where

X = {x₁, x₂, ..., xₙ}

This is a linear equation in the n unknowns x₁, x₂, ..., xₙ, to which there is an infinity of possible solutions. Choose (n − 1) linearly independent solutions, thereby defining (n − 1) linearly independent matrices X₂, X₃, ..., Xₙ, all of which satisfy

X̄₁BXₛ = 0  if  2 ≤ s ≤ n    (7.9)

a result which is assumed for the moment and the proof of the theorem completed. Let T now be defined as the (1 × n) non-singular matrix [X₁, X₂, ..., Xₙ] and form the product

T̄BT = (X̄ᵣBXₛ)
But, from (7.7), (7.8), and (7.9), the (n × n)-matrix (X̄ᵣBXₛ) is such that X̄₁BX₁ = γ, say, and all the other elements in the first row and first column are zero, i.e.

T̄BT = ( γ    0
         0   Bₙ₋₁ )
and hence the (n × n)-matrix B is reducible by the inductive hypothesis to diagonal form, since Bₙ₋₁ is an {(n − 1) × (n − 1)}-matrix. A similar relation holds when A is transformed. This completes the proof of (ii) of the theorem.

Proof of (7.9): Quadratic forms. Let B = (bᵢⱼ) and X = {x₁, x₂, ..., xₙ} be (n × n)- and (n × 1)-matrices respectively. Then, if B is symmetric,

X̄BX = Σᵢ Σⱼ bᵢⱼ xᵢ xⱼ

is a quadratic form in the n variables x₁, ..., xₙ. Written in extenso,

X̄BX = b₁₁x₁² + 2b₁₂x₁x₂ + 2b₁₃x₁x₃ + ... + 2b₁ₙx₁xₙ
            + b₂₂x₂²    + 2b₂₃x₂x₃ + ... + 2b₂ₙx₂xₙ
                        + b₃₃x₃²   + ... + 2b₃ₙx₃xₙ
                                   + ...
                                        + bₙₙxₙ²

This quadratic form can be reduced to a sum of squares by the elementary process of completing the square. Thus, completing the square on the terms in x₁, we have, symbolically,

X̄BX = b₁₁(x₁ + u₁₂x₂ + u₁₃x₃ + ... + u₁ₙxₙ)²
            + b'₂₂x₂² + 2b'₂₃x₂x₃ + ... + 2b'₂ₙx₂xₙ
            + ...
            + b'ₙₙxₙ²    (7.10)

where, for example, u₁ᵣ = b₁ᵣ/b₁₁. By successively completing the square on the terms in x₂, then in x₃, and so on, we deduce that

X̄BX = c₁y₁² + c₂y₂² + ... + cₙyₙ²,   say
where

y₁ = x₁ + u₁₂x₂ + u₁₃x₃ + ... + u₁ₙxₙ
y₂ =       x₂  + u₂₃x₃ + ... + u₂ₙxₙ
.......................................
yₙ =                               xₙ

or, in matrix notation,

Y = UX    (7.11)

where U is the upper unit triangular matrix of the uᵢⱼ's. If C = diag. (c₁, c₂, ..., cₙ) then

c₁y₁² + c₂y₂² + ... + cₙyₙ² = ȲCY = X̄ŪCUX = X̄BX

from (7.10) and (7.11). Hence

ŪCU = B    (7.12)

Since U and Ū each have units in the diagonal, it follows that |U| = |Ū| = 1 and hence that |C| = |B|, i.e.

c₁c₂c₃ ... cₙ = |Δₙ|    (7.13)

in the notation of the theorem. In the original quadratic form (7.10) put xₙ = 0, and repeat the process outlined above. The matrix B in (7.12) is replaced by an {(n − 1) × (n − 1)}-matrix derived from B by taking the first (n − 1) rows and columns, so that the determinant of this new matrix is |Δₙ₋₁|. Similarly C is replaced by the matrix diag. (c₁, c₂, ..., cₙ₋₁), so that the equation corresponding to (7.13) is

c₁c₂ ... cₙ₋₁ = |Δₙ₋₁|

showing that

cₙ = |Δₙ|/|Δₙ₋₁|

Proceeding in this way we can show that

cᵣ = |Δᵣ|/|Δᵣ₋₁|

Since all the |Δᵣ| are positive, so also are all the cᵣ, and so

c₁y₁² + c₂y₂² + ... + cₙyₙ²

is essentially positive apart from the trivial case when y₁ = y₂ = ... = yₙ = 0. In this case the quadratic form is called a positive definite form. This proves the assertion made in equation (7.9).
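Part (i) of the theorem is easily illustrated numerically. In the sketch below (NumPy; not from the text) A and B are arbitrary symmetric matrices chosen so that the roots of |A − λB| = 0 are distinct and the discriminantal determinants of B are positive; the columns of T are the vectors Xᵣ.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 5.0, 1.0],
                  [0.0, 1.0, 1.0]])
    B = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    # A X_r = lambda_r B X_r  is equivalent to  (B^{-1} A) X_r = lambda_r X_r
    lam, T = np.linalg.eig(np.linalg.solve(B, A))
    print(np.round(lam, 4))              # three distinct roots of |A - lambda B| = 0
    print(np.round(T.T @ A @ T, 8))      # diagonal: diag(alpha_1, alpha_2, alpha_3)
    print(np.round(T.T @ B @ T, 8))      # diagonal: diag(beta_1, beta_2, beta_3)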
Notation

Since the notation used in this chapter differs from that used by other authors, it is summarized for reference in Table V.

TABLE V

Matrix A = (aᵢⱼ); determinant of A = |A|
Row matrix (vector of first kind) = [x₁x₂ ... xₙ]
Column matrix (vector of second kind) = {x₁x₂ ... xₙ}
Diagonal matrix diag. (a₁₁a₂₂ ... aₙₙ)
adj A is the adjoint (adjugate) of A: |adj A| = |A|ⁿ⁻¹
A⁻¹ is the reciprocal of A, i.e. AA⁻¹ = A⁻¹A = I
Ā is the transpose of A
A* is the transpose of the conjugate complex of A
A is orthogonal if AĀ = ĀA = I
A is unitary if AA* = A*A = I
A is Hermitian if A = A*
L, U are lower and upper triangular matrices, respectively
L(1), U(1) are lower and upper unit triangular matrices, respectively
Problems
1. (i) Show that the product of the matrices

(  a + jb   c + jd        (  α − jβ   γ − jδ
  -c + jd   a − jb )  and  -γ − jδ   α + jβ )

is of the form

(  x − jy   u − jv
  -u − jv   x + jy )

where a, b, c, d, α, β, γ, δ, x, y, u, v are real and j = √−1. Hence, by taking the determinants of each expression, deduce that the product of two numbers,
each of which is a sum of four squares, can be expressed as the sum of four squares.

(ii) If

β = (  I    (0)           αᵢ = ( (0)   σᵢ
      (0)   −I ),                σᵢ   (0) ),   i = 1, 2, 3,

where I = ( 1  0 ; 0  1 ) and the σᵢ are the Pauli spin matrices (see Example, § 1.1 of the text), evaluate the products −jβαᵢ, i = 1, 2, 3, denoting them by γᵢ. Show also that, if γ₄ = β, then

½(γᵣγₛ + γₛγᵣ) = δᵣₛ I,    r, s = 1, 2, 3, 4;

here I is the unit (4 × 4) matrix, and δᵣₛ is the Kronecker delta (i.e. δᵣₛ = 0 unless r = s, when δᵣᵣ = 1).
- « 3
I «3 \
and evaluate ÄA.
a
4
-«3
Deduce that A-^AKVS
+ KS + OS + OLS)
If X = { Λ ^ Λ ^ Λ ^ } , Y = {y^^a)^} and if AX = Y, prove that
(<*!» + α^ + «,» + « 4 2)(V +
X>» +
^2
^2)
+
=
(y^ + y ^ + y^ + y^)
by premultiplying AX by its transpose. 3. Evaluate the following determinants 22 -12 0 12
16 10 -6 12
50 -36 16 34
28 -12 -6 34
23 -19 9 5
-9 1 -33 -15
36 -5 -27 51
-14 19 -18 9
4. If A = (aᵢⱼ) and Aᵢⱼ is the cofactor of aᵢⱼ in the determinant |A|, show, by considering the product A₁A₂ of suitably bordered matrices formed from A and X = {x₁, x₂, ..., xₙ}, that

| a₁₁  a₁₂  ...  a₁ₙ  x₁ |
| a₂₁  a₂₂  ...  a₂ₙ  x₂ |
| ...................... |  =  − Σ_{r=1}^{n} Σ_{s=1}^{n} Aᵣₛ xᵣ xₛ
| aₙ₁  aₙ₂  ...  aₙₙ  xₙ |
| x₁   x₂   ...  xₙ   0  |
5. (i) Find the values of λ for which the equations

(x + 2y + z)/x = (4x − y − 3z)/y = (x + 3y + 2z)/z = λ

have non-trivial solutions.

(ii) Find the values of λ such that the equations

2x + (λ − 1)y − (λ − 1)z = 0
(λ − 1)x + {(λ − 1)² + 1}y − z = 0
(λ − 1)x + y + z = 0

have a solution other than x = y = z = 0.
6. If

A = (  cos α   sin α
      -sin α   cos α )

and B = (I − A)(I + A)⁻¹, where I is the unit (2 × 2) matrix, show that B is skew-symmetric [(bᵢⱼ) is skew-symmetric if bᵢⱼ = −bⱼᵢ]. If C = (I − B³)(I + B³)⁻¹, show that C̄C = I.
cos φ ( — sin î/r
0
sin 0 0 cos φ 0
° l,
find the product ABC =■- D, say. that it is equal to D.
(
cos <£ — sin φ 0
sin φ 0 cos <> / 0 0 1,
Evaluate the product CᎠand verify
8. The two values of x derived from the equation ax² + bx + c = 0 are substituted in

y = (a₁x² + b₁x + c₁)/(a₂x² + b₂x + c₂)

to give values of y denoted by y₁, y₂. Show that, if y₁ = y₂, then either b² = 4ac, or

| a    b    c  |
| a₁   b₁   c₁ |  = 0
| a₂   b₂   c₂ |
9. Find the adjoint of the matrix

A = (  ·    ·    2    1
      -2    0    3   -6
      11  -10   -1    ·
      -4   -1    6  -10 )
and hence find A⁻¹.

10. Solve the equations

-24x₁ - 17x₂ + 66x₃ + 116x₄ =   3
 10x₁ +  3x₂ - 30x₃ -  44x₄ = -12
-26x₁ - 10x₂ + 76x₃ + 118x₄ =  27
 10x₁ +  2x₂ - 30x₃ -  43x₄ = -15
by the Choleski-Turing method or otherwise.

11. By triangulating the matrix

M = ( -24  -17   66  116
       10    3  -30  -44
      -26  -10   76  118
       10    2  -30  -43 )

find its reciprocal.

12. Using the iterative method, find all the eigenvalues of the matrix of Problem 11, i.e. find the values of λ satisfying
| -24 - λ   -17       66      116     |
|   10      3 - λ    -30      -44     |
|  -26      -10      76 - λ    118    |  =  0
|   10        2      -30     -43 - λ  |
13. Find the characteristic equation of the matrix M of Problem 11 by the method of § 5.1 and hence find the latent roots.

14. Find values of the unknowns which satisfy the equations

 3.0x₁ -  1.0x₂ + 5.3x₃ - 7.3x₄ + 6.9x₅ =  ·
-1.5x₁ +  0.5x₂ + 1.7x₃ -  · x₄ +  · x₅ =  ·
-1.5x₁ +  9.8x₂ +  · x₃ +  · x₄ +  · x₅ =  ·
 0.7x₁ +  4.1x₂ - 3.6x₃ + 8.2x₄ + 5.6x₅ =  ·
 5.1x₁ + 11.3x₂ -  · x₃ +  · x₄ +  · x₅ =  ·
15. Evaluate the following determinant by the Choleski-Turing method:

|  2.700   17.728   6.290  13.583 |
|  4.800  110.212  22.040  98.192 |
|  1.700    7.358   2.890   4.913 |
|  3.800   63.092  14.440  54.872 |

16. Find the reciprocal of the matrix T. Show that the transform of the matrix

A = ( b + c   c − a   b − a
      c − b   c + a   a − b
      b − c   a − c   a + b )

by T, i.e. TAT⁻¹, is a diagonal matrix. Determine the eigenvalues of the matrix A.
17. Verify the Cayley-Hamilton theorem for the matrix

A = (  ·   -2   -2
       0    1    ·
       1    ·    0 )
Find the eigenvalues of the matrix and deduce the matrix which reduces A to a diagonal form.

18. (i) If A is a square matrix such that A + ½I, A − ½I are both orthogonal, show that A is skew-symmetric and that A² = −¾I. Deduce that A is of even order.

(ii) If A is skew-symmetric, show that (I − A)⁻¹(I + A) is orthogonal provided (I − A)⁻¹ exists. Verify the result when

A = (  0   -2   -1
       2    0    3
       1   -3    0 )
19. If the matrices

A = ( 0   1   0   0          B = ( b   1   0   0
      0   0   1   0                0   b   1   0
      0   0   0   1                0   0   b   1
      α   β   γ   δ ),             0   0   0   b )

have the same characteristic equation, find α, β, γ, δ knowing b. Show also that, if

T = ( 1     0     0    0
      b     1     0    0
      b²    2b    1    0
      b³    3b²   3b   1 ),

then A = TBT⁻¹.
20. Find the eigenvalues of the matrix

(  3.249313   3.626040  -0.182506   6.239686   1.570559    0
   3.626040  13.053520   1.791109   0.034851  -0.477647    0
  -0.182506   1.791109   8.656182   1.188276   0.363851    0
   6.239686   0.034851   1.188276  19.234164   0.131216    0
   1.570559  -0.477647   0.363851   0.131216  17.996882    0
   0          0          0          0          0          27.739174 )
GENERAL REFERENCES
Frazer, R. A., Duncan, W. J., and Collar, A. R., Elementary Matrices, Cambridge Univ. Press, London and New York, 1938.
Turnbull, H. W., and Aitken, A. C., Canonical Matrices, Blackie, London, 1950.