CHAPTER XVIII
Markoff Matrices and Probability Theory

In this chapter we wish to call attention to a rather remarkable class of matrices which provide a tool for the study of certain types of probabilistic problems. A considerable body of theory has been developed concerning these matrices. We shall not attempt even to survey this knowledge, since to do so would require much more space than appears justified here. Instead, our intent is to provide the means for recognizing a problem for which this approach is appropriate, to state briefly how such a problem is set up, and to show some of the properties of its solution.

We consider random or stochastic processes of a restricted type known as Markoff processes or chains. A Markoff process is one in which the future is determined only by the present. We do not need to know anything about the past history to compute the statistics of its future. This may appear a rather drastic restriction. However, there are a great many problems which concern processes that are at least approximately Markoff. Others may be put in this category by properly defining the present state. That is, the description and categorization of the state at time t may be made to include not only the observable situation at time t, but the manner in which it was reached.

The general class of Markoff processes includes those with a continuum of possible states, and with continuous transition from one state to another. Brownian motion, in which the motion of a set of masses (ideally point masses) is determined by a set of unpredictable, and hence random, factors acting discretely but at random intervals, is an example. We shall not here concern ourselves with such a broad class of processes. Instead, we shall be concerned with processes in which transitions occur at fixed intervals of time (or of whatever is the independent variable). We shall assume, furthermore, that these transitions are among a finite set of possible states. At least we shall assume that the physical problem can be so approximated.

As is so often the case in probability problems, we can best illustrate this by a problem of gambling. Let the state of a gambler's pocketbook at time t be N(t). That is, he has $N. Once a minute he makes a bet. On each bet, there is a known probability that he will win $K, where K is
one of some set of integers. If K is negative, then this is a probability of loss. Hence, if he has $N(t) at time t, we know the probability that he will have $(N + K) at time (t + 1). We can then ask various questions. If, for example, he starts with $N_0 at t = 0, and vows to quit when he has $M, we can ask: What is the probability that he will go broke before reaching his goal? (We cannot, however, ask the probability that he will keep his vow if he is lucky!) This is the so-called Problem of the Gambler's Ruin.

There are many areas where similar problems arise that are of greater cultural significance. For example, consider an atomic pile. A neutron is generated at some point inside it. This neutron, after being moderated to thermal energy, bounces around in a random fashion. It collides with various types of nuclei. At each collision, it may simply bounce, or it may be absorbed without further effect, or it may be absorbed with subsequent fission and the release of new neutrons. Furthermore, if it bounces long enough, it may irreversibly escape from the boundary. We may say that the neutron has "won" if it induces a new fission, "lost" if it is captured without fission or escapes, and remains in a neutral state if it simply bounces. We are vitally interested in the probability that a neutron generated at a random point in the pile will win. If the probability of winning is high enough, taking into account the average number of neutrons generated in a fission, then the pile will run away. If it is too small, then the pile will be quenched. It is obviously of great interest to know just where the balance point is.

Again, for example, consider a length of transmission line terminated in a known impedance. There are joints in it every so often. Each joint contributes some degree of reflection, the exact amount and its phase being a consequence of manufacturing tolerance and therefore known only as a probability distribution. The problem can usually be approximated sufficiently well if we quantize the possible reflections. That is, we assume that the reflections have one of a finite set of magnitudes and a finite set of phases, with known probability distributions. We can then ask: What is the probable input standing wave ratio, and the probable variance around this value? This is a question of how the various individual reflections will successively combine. That the rules of combination are somewhat complicated (vectorial addition if the reflections are small enough, and worse if they are large, or accumulate to a large standing wave ratio) may be nasty in detail, but does not alter the principle.

A large number of other problems could be cited. Some are of tremendous importance. (For example, Darwin's principle of survival of the fittest is, in detail, a Markoff process involving the successive generations.) However, suffice it to say these problems arise whenever
one wishes to know something about the end result of the successive application of random processes.

We may categorize all such problems as involving a system that can exist in a finite number of states. At intervals, which may be regular or not, events happen which may change the state of the system. The effect of a given event cannot be predicted exactly, but only as the set of probabilities associated with the possible transitions. These probabilities are, in general, functions of the state preceding the event, but are not functions of any other observable data. We want to know something about the probable situation after a given number of such events. Or we want to know the probable number of events that will occur before the system is in a given state or set of states. Such a problem is called a Markoff or random walk or drunkard's walk problem.

1. STATE VECTOR
To tackle such a problem, we set up a state vector whose dimensionality equals the total number of possible states of the system. In the gambler's case, he may have anywhere from $0 to $M in, say, $1 steps. Hence we establish an (M + 1)-dimensional vector, each component representing one possible state. We may order these components in any way. In the gambler's case, it is natural to order the states according to the gambler's bankroll. The first component may represent his having the maximum amount, $M, the next $(M - 1), and so on, to the last component which represents him being broke, with $0. Or vice versa. This is a matter of convenience only, however, and has no mathematical significance. In the cases of the atomic pile or the reflections on a transmission line, there is no such simple order suggested. But all that is required is that we know what state corresponds to each entry.

If we know that at a given time the gambler has exactly $N, then we set the Nth component of the vector equal to unity, and all other components equal to zero. We call such a situation a pure state. If we do not know exactly what the gambler has, but only the probabilities, then we enter those probabilities. If the probability that he has $N is p_N, then we write the vector

    x = col(p_M, p_{M-1}, ..., p_1, p_0)                (1)
We call this a mixed state.
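The bookkeeping can be made concrete with a few lines of code. The following is a minimal sketch in Python with numpy; the choice M = 4 and the particular mixed distribution are illustrative assumptions, not taken from the text:

```python
import numpy as np

M = 4                # the gambler quits at $4; states are $4, $3, $2, $1, $0
n_states = M + 1     # the (M + 1)-dimensional state vector of the text

# Pure state: we know he holds exactly $2.
pure = np.zeros(n_states)
pure[M - 2] = 1.0    # the component for $2 under the $M-first ordering

# Mixed state: we know only a probability distribution over his holdings.
mixed = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Both are probability vectors: real, nonnegative, components summing to unity.
for x in (pure, mixed):
    assert np.all(x >= 0) and np.isclose(x.sum(), 1.0)
```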
For a given vector x to represent a state, whether pure or mixed, it must have certain properties:

(a) Its components are all real.
(b) Every component is nonnegative.
(c) The sum of its components must equal unity, since the system must be in some one of the states listed.

A vector with these properties we call a probability vector.

We need also to consider vectors that are not probability vectors. In particular, we shall need vectors that are the difference between two probability vectors. We define the norm of a vector as the algebraic sum of its components:

    ||x|| = Σ_i x_i                (2)

This is a somewhat unusual definition of norm, but it is the one that suits our purpose here. We recall the definition of a norm given on page 61. The vectors that we will want to consider will all have norms that are nonnegative definite. The norm as given is a linear function, so that ||ax|| = a||x||. And Minkowski's inequality does hold. In fact, we can go further and replace the inequality by an equality:
    ||x + y|| = ||x|| + ||y||                (3)
A probability vector has unit norm. A vector formed as the difference of two probability vectors has zero norm. Hence such a vector cannot be nonnegative; i.e., not all of its components can be nonnegative.
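A short sketch of this norm and the two facts just stated (the vectors chosen are arbitrary illustrations):

```python
import numpy as np

def norm(x):
    # The chapter's norm, Eq. (2): the algebraic (signed) sum of the components.
    return x.sum()

p = np.array([0.25, 0.25, 0.5])   # a probability vector
q = np.array([0.5, 0.25, 0.25])   # another probability vector

print(norm(p))       # 1.0 -- a probability vector has unit norm
print(norm(p - q))   # 0.0 -- a difference of probability vectors has zero norm
```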
2. TRANSITION OR MARKOFF MATRIX

We next consider what happens when our gambler plays the game (or the neutron collides, or we move our attention one length back along the transmission line). Suppose, first, that he started with exactly $n. After one play there are various amounts that he may have, with various probabilities. Suppose that the probability that, after one play, he ends up with exactly $m is p_mn. Or p_mn is the probability that the neutron is at position m after a collision, if it started at position n. Or p_mn is the probability that the complex reflection coefficient has the value that we index as #m, if it has the value indicated by #n one section further along the line.
The value p_mn is the transition probability from state n to state m.

Suppose, now, we do not know that the gambler had, initially, exactly $n. Suppose we only know that there was a probability x_n that he had this sum. Then after one play, the combined probability that he started with $n and ended with $m is the product of the probabilities, p_mn x_n. Then the total probability y_m that he has exactly $m after one play is the sum of all such quantities, over the range of possible initial states:

    y_m = Σ_n p_mn x_n                (4)

Equation (4) is the expression for the product of a matrix by a vector. Hence we can write the totality of such equations for all values of m by the vector equation

    y = Px                (5)
where P is the matrix of transition probabilities. The matrix P is called a transition or Markoff matrix.

Suppose we consider the situation after s plays, and indicate the gambler's state by x(s). Let x_k be the pure state in which the kth component is unity, all others zero. This corresponds to exact knowledge of the amount of money the gambler has at that time. After one more play, the gambler is in the state x(s + 1). If he started in x_k, he has

    x(s + 1) = Px_k                (6)

Since this is a possible situation, the vector x(s + 1) must be a probability vector, even though one describing a mixed state. But it is simply the kth column of P. Hence a necessary and sufficient condition for P to be a Markoff matrix is that each of its columns shall be a probability vector: all its components are real and nonnegative, and the sum of the coefficients in each of its columns must be unity.¹

¹ There is some confusion in the literature as to just what is called a Markoff matrix. Sometimes p_mn is taken as the probability that state m arose from state n, and the Markoff matrix is the matrix of these probability terms. In this case, each row must be a probability vector, since state m must have arisen from some state.
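As a numerical sketch of Eq. (5), the following builds the transition matrix for a small version of the gambler's game; the stake limit of $3 and the win probability p = 0.4 are illustrative assumptions, not taken from the text:

```python
import numpy as np

p = 0.4   # probability of winning an even-money $1 bet (assumed)

# Columns are indexed by the present state, rows by the next state,
# ordered $3, $2, $1, $0; the states $3 and $0 are absorbing.
#             from $3  from $2   from $1   from $0
P = np.array([[1.0,     p,        0.0,      0.0],    # to $3
              [0.0,     0.0,      p,        0.0],    # to $2
              [0.0,     1.0 - p,  0.0,      0.0],    # to $1
              [0.0,     0.0,      1.0 - p,  1.0]])   # to $0

assert np.allclose(P.sum(axis=0), 1.0)   # every column is a probability vector

x = np.array([0.0, 1.0, 0.0, 0.0])       # pure state: he holds $2
y = P @ x                                # Eq. (5): the state after one play
print(y)                                 # [0.4, 0., 0.6, 0.]
```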
3. EIGENVECTORS OF A MARKOFF MATRIX
In the following we shall assume simple structure. This is not necessarily true, and the possibility of generalized eigenvectors can be taken account of without too much elaboration, but we shall confine ourselves to the simpler case.

Consider an eigenvector x_i which is also a probability vector. Then it is a possible state of the system, and x(s) in Eq. (5) may be this vector: x(s) = x_i. Then

    Px_i = λ_i x_i

and

    x(s + 1) = λ_i x_i                (7)
However, x(s + 1) is then also a probability vector. Since the norms of both x(s) and x(s + 1) must be unity, λ_i must be unity. The eigenvalue corresponding to any eigenvector that is also a probability vector must be unity.

This does not prove that there is necessarily such a vector that is both an eigenvector of P and a probability vector. This can be proven, however. It can also be proven that there can be no eigenvectors whose eigenvalue has a magnitude greater than unity. This is perhaps not unreasonable. The existence of such a vector means that there exists a limiting situation. If the game is repeated a large number of times, it will generally approach asymptotically a constant distribution of probabilities. (There are exceptions to this, however.)

Consider as a specific example the gambler who plays until he is either broke or else has accumulated $M, when he, hypothetically, quits. In this case, there are two eigenvectors with unit eigenvalues representing these two pure states:

    x_1 = col(1, 0, 0, ..., 0)                (8)

    x_2 = col(0, 0, ..., 0, 1)                (9)
When he reaches either state, the game goes on without him, so that the state is unchanged by further action of the game. Hence these are eigenvectors with unit eigenvalue.

Suppose, in addition, there are the eigenvectors x_3, ..., x_n making up a complete set. Then any initial vector, whether a pure state or not, can be expanded on these eigenvectors:

    x(0) = Σ_i a_i x_i = a_1 x_1 + a_2 x_2 + ...                (10)
After playing the game once, the gambler's state is

    x(1) = Σ_i a_i λ_i x_i                (11)

and after m times

    x(m) = Σ_i a_i λ_i^m x_i                (12)
The initial state was a probability vector. So, too, are the states after each successive play. Hence ||x(m)|| = 1 for all m. It is evident, and can be proven rigorously, that every eigenvector whose eigenvalue is not unity must have zero norm:

    ||x_i|| = 0   if λ_i ≠ 1                (13)
Otherwise we cannot hope to maintain the norm at unity with successive applications of P.

Consider, furthermore, the x_i whose λ_i is the largest of the set of λ's which are not equal to unity. If this |λ_i| > 1, then as m increases, this eigenvector must eventually dominate all others. Since the norm of this eigenvector is zero, at least one of its components must be negative. Hence, if such a vector existed, x(m) would eventually have a negative component. Since x(m) is a probability vector, this is impossible. Hence all the eigenvalues of P must have magnitude not greater than unity:

    |λ_i| ≤ 1   for all λ_i                (14)
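Equation (14) is easy to check numerically; in this sketch the Markoff matrix is generated at random by normalizing the columns of a positive matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
P = A / A.sum(axis=0)        # normalize each column to sum to unity

eigvals = np.linalg.eigvals(P)
print(np.abs(eigvals))       # one eigenvalue at unity; none exceeds it
assert np.all(np.abs(eigvals) <= 1 + 1e-12)
```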
In the case of the gambler, if x_1 and x_2 are the only eigenvectors with unit eigenvalue, then after a sufficiently large number of plays, the contributions of the x_3, ..., x_n parts of x(m) in Eq. (10) become vanishingly small, so that

    lim_{m→∞} x(m) = a_1 x_1 + a_2 x_2                (15)
The probability of his ultimately winning $M is a_1, and that of his ultimately going broke is a_2, the coefficients of the corresponding vectors in the expansion of the initial state on the eigenvectors.

The above is not intended to be a rigorous proof but is rather a plausibility statement. The theorem can, however, be rigorously proven. To summarize, we state the theorem as follows. If P is a Markoff matrix that is semisimple, then it has at least one eigenvector with unit eigenvalue which can be chosen to be a probability vector. All other eigenvectors have eigenvalues whose magnitudes are not greater than unity. Those with eigenvalues whose magnitude is less than unity have zero norm.
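The limit of Eq. (15) can be exposed numerically by simply powering P, which kills every component whose eigenvalue has magnitude less than unity; the two surviving components are then the win and ruin probabilities a_1 and a_2. The game parameters below (even-money $1 bets, p = 0.4, quitting at $3) are again illustrative assumptions:

```python
import numpy as np

p = 0.4
P = np.array([[1.0,  p,        0.0,      0.0],
              [0.0,  0.0,      p,        0.0],
              [0.0,  1.0 - p,  0.0,      0.0],
              [0.0,  0.0,      1.0 - p,  1.0]])   # states $3, $2, $1, $0

x0 = np.array([0.0, 1.0, 0.0, 0.0])        # he starts with $2

xm = np.linalg.matrix_power(P, 200) @ x0   # the a1*x1 + a2*x2 part survives
print(xm)   # ~[0.5263, 0, 0, 0.4737]: P(win $3) and P(ruin) from $2
```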
It must be possible to find at least one eigenvector with unit eigenvalue that is a probability vector. This need not be a pure state. Neither does it follow that every eigenvector with unit eigenvalue is normalizable to a probability vector. In the gambler's problem, we had two eigenvectors with unit eigenvalue that were pure states, x_1 and x_2, given by Eqs. (8) and (9). Since they are degenerate, any linear combination of them is an eigenvector with unit eigenvalue. But (x_1 - x_2), while still such an eigenvector, has zero norm and so is unnormalizable. Also, (2x_1 - x_2) is such an eigenvector which does have unit norm, but is not nonnegative, and so is not a probability vector.

As another example, consider the transition matrix

    P = [ 0  1 ]
        [ 1  0 ]                (16)

This is a Markoff matrix since each column adds to unity and all terms are nonnegative. Each pure state is turned into the other by its application. The eigenvector with unit eigenvalue, adjusted by a scalar factor to have unit norm, is

    x_1 = col(1/2, 1/2)                (17)

and indicates an equal probability of the two pure states. The other eigenvector is
    x_2 = col(1/2, -1/2)                (18)

and has an eigenvalue of -1. Its norm is, as expected, zero.

We may note that in this case, an arbitrary initial vector does not approach a determined limit. Its expansion in terms of x_1 and x_2 will in general contain some finite amount of x_2. Since |λ_2| = 1, successive application of P does not cause this component to decay, but simply to alternate its sign. Hence the state oscillates. Even in such a case, where the system does not approach a limiting condition, we can show rigorously that its average state, averaged over successive applications of P, does approach a limit which is composed of the eigenvectors of eigenvalue unity.
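A short sketch of this oscillating example, and of the averaged limit just mentioned (the six-step horizon is arbitrary):

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # the matrix of Eq. (16)
x = np.array([1.0, 0.0])        # a pure state

avg = np.zeros(2)
for m in range(6):
    print(m, x)                 # alternates between (1, 0) and (0, 1)
    avg += x
    x = P @ x

print(avg / 6)                  # [0.5, 0.5]: the average tends to x1 of Eq. (17)
```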
4. RECIPROCAL EIGENVECTORS

It is of interest to consider the reciprocal eigenvectors. These, it will be remembered, are the vectors y_i such that

    y_i† x_j = δ_ij                (19)
Then, since

    y_i† P x_j = λ_j y_i† x_j = λ_j δ_ij                (20)

it follows that

    y_i† P = μ_i y_i†                (21)

or

    P† y_i = μ_i* y_i                (22)

so that the y_i are the eigenvectors of the adjoint matrix under the unitary inner product relation. Then, since

    y_i† P x_i = μ_i y_i† x_i = λ_i y_i† x_i

it follows that μ_i = λ_i.

Since the columns of P add to unity, it is immediately evident that the vector

    y_1 = col(1, 1, ..., 1)

is a reciprocal eigenvector of P with eigenvalue unity, or an eigenvector of the adjoint, P†. We are able to write down immediately and explicitly one of the reciprocal eigenvectors of any Markoff matrix. It is, furthermore, an eigenvector with unit eigenvalue and unit norm.
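This is easily verified; in the sketch below the Markoff matrix is generated at random:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 5))
P = A / A.sum(axis=0)        # an arbitrary column-stochastic (Markoff) matrix

ones = np.ones(5)            # the all-ones reciprocal eigenvector noted above
assert np.allclose(ones @ P, ones)   # a left eigenvector with eigenvalue unity
```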
5. NONNEGATIVE AND POSITIVE MATRICES

The class of Markoff matrices is a special case of a broader class known as nonnegative matrices. A matrix is said to be nonnegative if all of its coefficients are real and ≥ 0. If all of the terms are positive, none being zero, the matrix is said to be positive. Note that this has nothing to do with positive definiteness.

Such matrices have an important subclassification according to whether they are reducible or not. We use the term reducible here in a similar, but slightly different, sense from that we mentioned before (Chapter VIII, Section 1). We call a Markoff matrix reducible if we can permute the order of its basis vectors in such a way as to put it into the form

    [ A  B ]
    [ 0  C ]                (23)

where A and C are square matrices. Otherwise, we call it irreducible.
Such a permutation of the basis vectors can be accomplished by a similarity transformation with an operator T that is a permutation matrix, i.e., a matrix in which each row and each column has only a single nonzero term, which is unity. We do not, however, permit a general change of basis, since this might upset the property of a given vector being a probability vector.

The class of irreducible nonnegative matrices has some remarkable spectral properties that are a generalization of those we have cited for Markoff matrices. These are given by the theorem of Perron as generalized by Frobenius, which we shall not prove:

Theorem. An irreducible nonnegative matrix always has a positive real eigenvalue Λ that has multiplicity one. The magnitudes of all other eigenvalues do not exceed Λ. To Λ there corresponds a reciprocal eigenvector that can be normalized to have coordinates that are all real and positive. Furthermore, if the matrix has k eigenvalues, Λ_0, Λ_1, ..., Λ_{k-1}, all of magnitude Λ, then these eigenvalues are all distinct and are roots of

    λ^k - Λ^k = 0                (24)

Likewise, all other eigenvalues are simple and occur in sets that are similarly related. If k > 1, then the matrix can, by a permutation, be put into the form

    [ 0     A_12  0     ...  0         ]
    [ 0     0     A_23  ...  0         ]
    [ .     .     .     ...  .         ]                (25)
    [ 0     0     0     ...  A_{k-1,k} ]
    [ A_k1  0     0     ...  0         ]

where the 0's along the main diagonal are scalars or square matrices.

The simple matrix of Eq. (16) is irreducible. The only possible permutation of the order of the states does not change it, so that it cannot, in this manner, be put into the form of Eq. (23). The matrix of the gambler's problem, on the other hand, must be reducible. If the gambler goes broke, or wins his predetermined amount, then no further change can occur. Hence, if the states of x are ordered with $0 at the bottom and the maximum amount at the top, the first column of P has all zeros except the top element, which is unity. Likewise
the last column is all zeros except the bottom element, which is unity. Hence, with this sequence of the basis vectors, the matrix is in the form of Eq. (23), since it can be partitioned into 1 × 1 and (n - 1) × (n - 1) blocks on the main diagonal that are square, and the (n - 1) × 1 block in the lower left is null. We have already seen that in this particular case the unit eigenvalue has at least multiplicity two, and generates at least two eigenvectors.

A weaker form of the theorem of Perron and Frobenius can be proven for reducible matrices. Specifically, we can show that, in this case, there still exists a Λ which is real and positive, and such that it is not exceeded by the magnitude of any other eigenvalue, and such that there corresponds to it a nonnegative reciprocal vector. We do not know that Λ is nondegenerate, nor do we know the interrelation between eigenvalues of the same magnitude.
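A convenient numerical test of irreducibility (a standard criterion, not stated in the text) is that a nonnegative n × n matrix A is irreducible exactly when (I + A)^(n-1) has no zero entries. A sketch, applied to the two examples above:

```python
import numpy as np

def is_irreducible(A):
    n = A.shape[0]
    B = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(B > 0))

swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])     # the matrix of Eq. (16): irreducible
print(is_irreducible(swap))       # True

p = 0.4                           # the gambler's matrix: reducible
G = np.array([[1.0,  p,        0.0,      0.0],
              [0.0,  0.0,      p,        0.0],
              [0.0,  1.0 - p,  0.0,      0.0],
              [0.0,  0.0,      1.0 - p,  1.0]])
print(is_irreducible(G))          # False
```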
6. CONCLUSIONS

The class of Markoff matrices gives us the means of studying various problems which involve probabilistically determined processes. In addition, they provide a class of matrices about which a great deal can be inferred on general grounds. We know, for example, much about the spectrum of eigenvalues and the corresponding eigenvectors or reciprocal eigenvectors. It is therefore sometimes worthwhile, even in nonprobabilistic problems, to consider if the basis can be renormalized so as to make the matrices involved either Markoff matrices or their transposes. If we can do this, then the theory of this class of matrices can be employed even though the physical problem has nothing to do with a Markoff process.

We have not attempted here the mathematical proof of the theorems involved. Instead, we have discussed them on the basis of a hypothetical game and attempted in this way to make them plausible.
Exercises
1. Suppose a gambler has $200 to begin with. Suppose he quits when he is either broke or has $400. Suppose the game pays even money on any bet, and gives him a probability p of winning any single bet. What is his expectation of winning the $400 he wants if he (a) bets his $200 on a single play? (b) makes $100 bets? (c) makes $50 bets? How should he play if the game is favorable (p > 1/2)? How if it is unfavorable (p < 1/2)?
2. Consider a Markoff matrix of the form

        [ a  b  0  0  ...  0  0 ]
        [ c  a  b  0  ...  0  0 ]
    A = [ 0  c  a  b  ...  0  0 ]
        [ .  .  .  .  ...  .  . ]
        [ 0  0  0  0  ...  a  b ]
        [ 0  0  0  0  ...  c  a ]

where a, b, c are real and positive and

    a + b + c = 1
Find an explicit expression for the eigenvalues (cf. Chapter XVII, Exercise 1). Show that they are real and have magnitude less than or equal to unity. Find the eigenvectors. Show that their norms have the values deduced in the text.
3. Show that if P and Q are Markoff matrices, then

    X = aP + (1 - a)Q

is a Markoff matrix, where a is in the range 0 ≤ a ≤ 1.

4. Prove that if P and Q are Markoff matrices, then so is PQ.
5. Show that if u is a real vector whose norm [in the sense of Eq. (2)] is zero, then the algebraic sum of the components of any column of uv^T, where v is any real vector, is zero. Use the dyad expansion to show that if A is a positive semisimple matrix whose eigenvectors are x_1, x_2, ..., x_n, with eigenvalues λ_1 = 1 and λ_2, λ_3, ..., λ_n any suitable values, and with norms ||x_1|| = 1, ||x_2|| = ||x_3|| = ... = ||x_n|| = 0, then A is a Markoff matrix.