INFORMATION AND CONTROL 9, 469-520 (1966)
Equivalences between Probabilistic and Deterministic Sequential Machines*
CARL V. PAGE
Department of Information Science, University of North Carolina, Chapel Hill, North Carolina

This paper is concerned with behavioral equivalences between machines which have random internal state transitions and machines which have deterministic internal state transitions. Section I contains a method of constructing a finite deterministic machine (if one exists) which is expectation equivalent to a given probabilistic sequential machine (Theorem 1.3 and Theorem 1.7). In Section II the method is extended, giving invariant subspace conditions for the existence of an input-state calculable machine (deterministic transitions but random output) which has the same first N central moments of output as a given probabilistic sequential machine (Theorems 2.4 and 2.5). The common practice of using behaviorally equivalent machines as interchangeable submachines of other machines is formalized in Section III. Indistinguishability is the only equivalence which meets the interchangeability condition (Theorem 3.4). It is also proven that some finite deterministic machine can distinguish between two tape equivalent machines which are not expectation equivalent (Theorem 3.2). Section IV provides bounds on the length of input strings necessary for deciding whether the following relations hold between arbitrary members of their domains: $\equiv_E$ (Theorem 4.3), $\equiv_N$ (Theorem 4.5), and the reduction relations $R_E$ (Theorem 4.4) and $R_N$ (Corollary 4.5).

LIST OF SYMBOLS
$\equiv_E$ Expectation equivalence. Def. 1.2
$\equiv_N$ N-moment equivalence. Def. 2.5
$\equiv_D$ Distribution equivalence. Def. 2.2
* Most of this research was conducted at the University of Michigan Logic of Computers Group with the support of the U. S. Army Research Office (Durham), the National Science Foundation, the National Institutes of Health, and the Office of Naval Research.
PAGE
$\equiv_T$ Tape acceptance equivalence. Def. 3.3
$\equiv_I$ Indistinguishability. Def. 3.9
$\equiv_E^A$ Equivalence of distributions (or states) of machine A associated with $\equiv_E$. Def. 4.2
$E_A(x)$ Expectation of output for input sequence x of machine A. Def. 1.1
$O_A^*(x)$ The output random variable for input string x in machine A, $O_A^*(x) = O_A(rp_A(x))$. Def. 2.1
$O_A^R(S)$ Random output attached to machine A depending on state S
$\mu_i^A(x)$ ith central moment of output for input string x in machine A. Def. 2.3
$P_A^I(y/x)$ The probability that output sequence y will be observed given input sequence x in machine A started from initial distribution I. Def. 3.5 and Remark 3.7
$R_E$ Reduction relation for $\equiv_E$. Def. 1.3
$R_N$ Reduction relation for $\equiv_N$. Def. 2.5
$Y = \bigcup_{i=1}^{n} \{F_i\}$ Set of output codes for a machine A. Remark 2
$T(A, \lambda)$ The set of input tapes accepted by probabilistic sequential machine A with cutpoint $\lambda$. Remark 3

INTRODUCTION

The notion of behavioral equivalence is a fundamental part of the study of automata theory. Two superficially different definitions of behavioral equivalence occur in the literature for deterministic machines. One, discussed by Burks (1961), which we will write $\equiv_I$, calls two machines behaviorally equivalent if they define the same function from input strings to output strings. The other, part of Rabin-Scott (1959) automata theory, which we will write $\equiv_T$, calls two machines behaviorally equivalent if they accept the same set of tapes. For any deterministic machines D and D' with the same input alphabets and binary outputs, $D \equiv_I D'$ holds if and only if $D \equiv_T D'$ holds. However, for the generalizations of $\equiv_I$ (Carlyle, 1961) and $\equiv_T$ (Rabin, 1964) to probabilistic machines A and A', we observe that $A \equiv_T A'$ does not imply $A \equiv_I A'$. This paper is concerned with properties of several behavioral equivalences between probabilistic machines.
The major question considered is whether there exists a finite deterministic machine, perhaps with random number generator outputs, which is behaviorally equivalent to a given
BEHAVIORAL EQUIVALENCES BETWEEN MACHINES
probabilistic sequential machine. In the last section, bounds are found for the experimental determination of whether a given behavioral equivalence holds between two machines. In order to gain insight into the kinds of equivalences which will be studied, two models of probabilistic sequential machines will be presented later in this section.

A. THE CONCEPT OF PROBABILISTIC SEQUENTIAL MACHINE
By a probabilistic sequential machine is meant a system which satisfies one of the following definitions:

DEFINITION 0.1. A (Moore-type) probabilistic sequential machine A is a system
$$A = \langle n, I, S, \Sigma, \{A(\sigma) : \sigma \in \Sigma\}, F, O \rangle$$
where
n: a natural number, the number of states
I: the initial distribution, an n-component row vector whose components are probabilities
S: set of state vectors $= \{S_1 = (1, 0, \ldots, 0), \ldots, S_n = (0, \ldots, 0, 1)\}$
$\Sigma$: input alphabet. Usually $\Sigma = \{0, 1, 2, \ldots, k-1\}$
$A(\sigma)$: for $\sigma \in \Sigma$, an $n \times n$ transition matrix for input symbol $\sigma$. $A(\sigma)_{lm}$ is the probability of a transition from state l to state m via symbol $\sigma$. We frequently call $A(\sigma)$ a "symbol matrix."
F: output vector, an n-component column vector whose entries are real numbers. $F_i$ is the output from state $S_i$.
O: output function $O(S_i) = F_i = S_i \cdot F$ (where "$\cdot$" indicates matrix multiplication).
When clear from context, some of the parts of the formal definition may be omitted for the definition of a particular machine.

DEFINITION 0.2. A (Mealy-type) probabilistic sequential machine is a system
$$A = \langle n, I, S, \Sigma, \{A(\sigma) : \sigma \in \Sigma\}, W, P \rangle$$
where n, I, S, $\Sigma$, and $A(\sigma) : \sigma \in \Sigma$ are as in Definition 0.1, W is the set of outputs, and the output function P satisfies
$$P(S_i, \sigma) = W_q, \quad S_i \in S,\ \sigma \in \Sigma,\ W_q \in W.$$
It is an easy matter to show that Definition 0.1 and Definition 0.2 are equivalent in the following sense: for every Moore-type probabilistic sequential machine there is a Mealy-type sequential machine which is
indistinguishable in an intuitive sense, and vice versa. Consequently, we will be concerned only with the properties of Moore-type probabilistic sequential machines, which from now on will be called "probabilistic sequential machines" or just "machines."

Many systems like probabilistic sequential machines occur in fields of study not historically associated with automata theory. Brains and Svcchinsky (1962) discuss a system like Def. 0.1 in their paper on learning theory. If one takes the cartesian product of machines of Def. 0.2, one gets Markov processes with rewards and alternatives as studied in sequential decision theory as presented by Howard (1960). Matrix games as discussed by Thrall (1957) can be considered as instances of Definition 0.1 in which I and F are strategy vectors and the game matrix A(x) is defined by a string x. A simple correspondence shows that the noisy discrete channel of Shannon (1948) is equivalent to the system of Definition 0.2. Someday probabilistic sequential machines may become a unifying concept, providing results for diverse fields.

Probabilistic sequential machines were devised as slight generalizations of the probabilistic automata of Rabin (1964). If the initial distribution I is restricted to elements of S and $F_i = 0$ or $1$ for $i = 1, \ldots, n$, then Definition 0.1 defines a probabilistic automaton. Following Rabin, we describe how the transition matrices for sequences of inputs are generated by the symbol matrices.

Remark 1. Let $x = i_1 \cdots i_r$, $i_j \in \Sigma$, $j = 1, \ldots, r$. Then $A(x) = A(i_1) \cdots A(i_r)$, i.e., the transition matrix for a string x is found by multiplying the matrices for the symbols of x together in order.

Remark 2. Sometimes the real numbers which are the outputs of a probabilistic sequential machine will be regarded as codes for symbolic outputs.

Remark 3. The expectation of output for input string x of machine A is just $E_A(x) = I A(x) F$.
For any real number $\lambda$, the set of tapes accepted by A with cutpoint $\lambda$ is $T(A, \lambda) = \{x \in \Sigma^* : E_A(x) \ge \lambda\}$.

B. NOTATIONAL CONVENTIONS

In what follows it will be convenient to use certain notational conventions. With regard to subscripts, note that state $S_i$ is identified with the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$ whose ith component is 1. We use $A(x)_i$ to mean the ith row of the matrix A(x). On the other hand, $A(x)^i$ means the ith power of the matrix A(x). Matrix multiplication is indicated in the customary ways, using
both "$\cdot$" and juxtaposition. The following notational identities are used frequently:
$$S_i \cdot A(x) \cdot F = A(x)_i \cdot F = (A(x) \cdot F)_i$$
When necessary, identifying superscripts or subscripts will be added to basic symbols, e.g., $F^B$ is the output vector of machine B. If $\Gamma$ is an alphabet, all sequences of length r are denoted by $\Gamma^r$. Following the conventions of automata theory, given an alphabet $\Gamma$, $\Gamma^*$ denotes the set of all finite sequences (or strings or tapes, as they are often called) of symbols from $\Gamma$. It will always be assumed that $\Gamma^*$ contains an identity string $\Lambda_\Gamma$ so that $\Lambda_\Gamma x = x$, $x \in \Gamma^*$. Furthermore, we will assume that the input alphabet $\Sigma$ contains a symbol $\Lambda$ such that $A(\Lambda) = E(n)$, the n-dimensional identity matrix. We do not require that $\Lambda x = x$, although $A(\Lambda x) = A(x)$. The length of a string x is the length of the sequence of symbols which it denotes and will be written lg.(x). Contrary to normal notation, the $\Lambda$ defined above has length 1, lg.$(\Lambda) = 1$, whereas in general lg.$(\Lambda_\Gamma) = 0$. Concatenation between strings will be indicated by juxtaposition. Hence if x and y are strings, lg.$(xy) =$ lg.$(x) +$ lg.$(y)$. The exponential notation on strings will be used to indicate repetition n times,
i.e., $x^n = x \cdots x$ (n times), so that lg.$(x^n) = n \cdot$ lg.$(x)$. An abstract machine (a machine with no initial state specified) will be indicated by leaving a blank in the definition of the machine, e.g.:
$$A = \langle n, \;\_\;, S, \Sigma, A(\sigma) : \sigma \in \Sigma, F, O \rangle$$

C. MODELS OF PROBABILISTIC SEQUENTIAL MACHINES
Two models will be considered, one of which is probabilistic and one of which is deterministic, although both fall within the axiomatic framework of probabilistic sequential machines.

Example 0.1. Probabilistic internal operation: A slot-machine. A simple model of a probabilistic sequential machine is a slot-machine. The static position of the dials represents the present state of the machine. Usually there are 20 different positions on the dial and 3 dials, for a total of $20^3 = 8000$ states. The input consists of putting in a coin and pulling a lever, causing the machine to travel transiently through many states until it settles down in one state. An output is associated with each state. Nothing (which is associated with 0) comes out unless the dials all display the same object. In that case, some change tumbles out (which is associated with the corresponding real number), usually dependent only on the kind of object being displayed, i.e., the state of the machine. Such a machine, whose output is controlled by its states, is known as a "Moore machine" (Moore, 1956). Each state can be associated with a number between 1 and 8000, and the output for each state can be tabulated in a column vector or $8000 \times 1$ matrix. In the formalism, this column vector will be called the "output vector" and designated by the symbol "F." The output for state i will be written as "$F_i$." Spring loading prevents the enormous number of distinct ways the lever can be pulled from significantly influencing the outcome. Hence a normal pull of the lever L produces only one kind of state transition law, which could in principle be determined and tabulated in a "switching" matrix A(L). The behavior of a slot-machine A could be described using a finite state Markov chain with rewards and transition matrix A(L), but for the fact that various nonstandard but repeatable inputs have been developed by players of such machines. A more complete description requires some finite number of additional transition laws associated with the nonstandard inputs to the machine. We associate such inputs with additional input symbols. Consider how the dials of the machine might be found initially. If the dials can be completely observed, the initial state $S_i$ is represented by a vector I (or a $1 \times 8000$ matrix) with a 1 in the ith component and zeros elsewhere.
On the other hand, the dials may not be completely visible, we may wish to specify the average behavior of a large number of machines run simultaneously, or we may wish to consider the average return from playing one machine only when it is left by other players in one of a set of preferred states. In any one of these cases, I can be a stochastic vector $(I_1, \ldots, I_{8000})$ where $I_i$ is the probability of being in state $S_i$ at time $t_0$. In the general case, the next state probabilities starting with an initial state vector I and an input string x are given by $I \cdot A(x)$. Hence the expected value of output of a machine A starting with initial state distribution I and output vector F after a string x of inputs has occurred is
just
$$E_A(x) = I \cdot A(x) \cdot F$$
which is a bilinear form in I and F with form matrix A(x). The variance in output and other higher moments can be defined analogously.

Example 0.2. Deterministic internal structure: Chemical production cell.
Suppose a chemical tank A is divided into several isolated compartments $A_1, \ldots, A_n$ by partitions which are interconnected by an electronically controlled system of pumps and valves. Suppose that there is a finite set of controls $\Sigma = \{0, 1, \ldots, K-1\}$ and that for each control c a fixed fraction $v_{ij}$ of the chemical in compartment $A_i$ is pumped into compartment $A_j$. For all controls c in $\Sigma$, the full influence on redistribution of liquid in the tank can be described by an $n \times n$ matrix A(c) with $v_{ij}$ being $A(c)_{ij}$. Furthermore, suppose that the liquid being pumped between compartments is a catalyst which causes production of a desired end product in each compartment with a different efficiency, i.e., if the mass fraction of catalyst in $A_i$ is $P_i$ and $F_i$ is the efficiency of $A_i$, then the output of end product is $P_i F_i$. Note that it is assumed that the output of the compartment depends linearly on the catalyst present. The initial state I is an n-component vector with the ith component $I_i$ being the mass fraction of catalyst in compartment i. Note that $\sum_{i=1}^{n} I_i = 1$, since the tank is a closed system as far as the catalyst is concerned. The distribution of mass fractions of catalyst over the compartments after a sequence of controls $x = i_1 \cdots i_m$ is just
$$I \cdot A(i_1) \cdots A(i_m) = I \cdot A(x)$$
That is, $(I \cdot A(x))_i$ is the mass fraction of catalyst in compartment i after starting with initial distribution I of catalyst fractions over compartments and the string of control inputs $x = i_1 \cdots i_m$. The total end product from the tank is the sum of the outputs from each compartment, $\sum_{i=1}^{n} (I \cdot A(x))_i F_i$, which can be written $I \cdot A(x) \cdot F$ in matrix notation. This expression has the same form as the expectation of output for the probabilistic slot-machine, but there are no overt probabilities involved here. The mass fractions of catalyst play the same role as the probabilities in the first example. However, the output will still be written like an expectation, as $E_A(x)$. The total end product accumulated, $T_x$, for the string of controls x from time $t_0$ to time $t_0 + m$ is given by adding the output from each initial substring (prefix) of x, i.e.,
$$T_x = E_A(i_1) + E_A(i_1 i_2) + \cdots + E_A(i_1 i_2 \cdots i_m)$$

I. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS EXPECTATION EQUIVALENT TO A FINITE DETERMINISTIC MACHINE

A. THE CONCEPT OF EXPECTATION EQUIVALENCE
In the two models discussed in the introduction, the expected value of output, $E_A(x)$, played an important role in the physical interpretations. Let us repeat the definition of the expected value of output.

DEFINITION 1.1. The expected value of output for an input string x of a probabilistic sequential machine A is given by
$$E_A(x) = I \cdot A(x) \cdot F \quad \text{for } x \in \Sigma^*$$
DEFINITION 1.2. Machines A and A' are expectation equivalent, written $A \equiv_E A'$, iff
$$E_A(x) = E_{A'}(x) \quad \text{for all } x \in \Sigma^*$$
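Definitions 1.1 and 1.2 translate directly into a computation. The following is a minimal sketch, not taken from the paper: it assumes Python with NumPy, represents a machine as a tuple (I, dict of symbol matrices, F), and the three machines are hypothetical. The helper `first_difference` searches every input string up to a chosen length for a string on which two expectations differ; such a bounded search can refute $A \equiv_E A'$ but does not by itself establish it (Section IV concerns how long the strings must be).

```python
import itertools
import numpy as np

def expectation(I, mats, F, x):
    """E_A(x) = I . A(x) . F (Definition 1.1); A(x) is the ordered product of
    the symbol matrices of x (Remark 1), and the empty string gives I . F."""
    v = np.asarray(I, dtype=float)
    for sym in x:                      # I A(i_1) A(i_2) ... A(i_r), left to right
        v = v @ mats[sym]
    return float(v @ F)

def first_difference(A, B, alphabet, max_len, tol=1e-9):
    """Search all x with lg.(x) <= max_len for E_A(x) != E_B(x) (Definition 1.2).
    Returns the first such x as a string, or None if no difference was found."""
    for r in range(max_len + 1):
        for x in itertools.product(alphabet, repeat=r):
            if abs(expectation(*A, x) - expectation(*B, x)) > tol:
                return "".join(x)
    return None

# Hypothetical 3-state machine: under "0", state 1 moves to state 2 or 3 with
# probability 1/2 each, and states 2 and 3 return to state 1; "1" changes nothing.
A = (np.array([1.0, 0.0, 0.0]),
     {"0": np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
      "1": np.eye(3)},
     np.array([1.0, 0.0, 0.0]))

# Hypothetical 2-state machine obtained by merging states 2 and 3 of A.
A_merged = (np.array([1.0, 0.0]),
            {"0": np.array([[0.0, 1.0], [1.0, 0.0]]), "1": np.eye(2)},
            np.array([1.0, 0.0]))

# Same as A_merged except that "0" no longer moves the machine.
A_stuck = (np.array([1.0, 0.0]),
           {"0": np.eye(2), "1": np.eye(2)},
           np.array([1.0, 0.0]))

print(first_difference(A, A_merged, "01", 8))  # None
print(first_difference(A, A_stuck, "01", 8))   # 0
```

Here `A_merged` lumps two states of `A` that behave identically, so the two machines have different symbol matrices and different numbers of states yet agree on every expectation checked.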
Recall from Example 0.2 that $E_A(x)$ was the actual output of the chemical cell and not an expectation. Hence the basic concept of expectation equivalence is analogous to the definition of behavioral equivalence $\equiv_I$ for Example 0.2. However, for Example 0.1, the slot-machine, expectation equivalence is not the generalization of this kind of behavioral equivalence. Instead, the concept of indistinguishability discussed in Section III seems to be the appropriate generalization.

Example 1.1. Machines A and A' which are expectation equivalent:
$$I A(x) F = I A'(x) F' \quad \forall x \in \Sigma^*,$$
where $A = \langle I, A(0), A(1), F \rangle$ and $A' = \langle I, A'(0), A'(1), F' \rangle$. [The entries of the symbol matrices A(0), A(1), A'(0), A'(1) and of the output vectors F, F' are not legible in the source.]
These machines are expectation equivalent from any initial probability distribution I over the states. The previous example shows that two machines can have very different symbol matrices and still be expectation equivalent. Some graph theoretic properties of the transition matrices which are important to Markov theory, such as the accessibility of a state, depend on the location of the zeros. This example shows that the location of the zeros is not the only relevant factor in the study of expectation equivalence. Consideration of the interplay between the state transitions and the real number outputs attached to states requires the use of elementary linear algebra.

B. THE REDUCTION RELATION $R_E$
In this section a congruence relation on input sequences, $R_E$, will be defined so that a quotient machine can be constructed. If the rank of $R_E$ happens to be finite, the constructed machine has a finite number of states. States of the quotient machine will correspond to values of expectation which occur for input strings. By attaching a deterministic output device to each state of the constructed machine, an expectation equivalent deterministic machine is obtained. If the rank of $R_E$ is finite, some class of the relation must contain infinitely many strings, since $\Sigma^*$ is infinite. A necessary condition for $R_E$ to be finite in rank is that it be nontrivial, i.e., at least two different strings are contained in some class. This weak necessary condition requires the symbol matrices to satisfy certain strong conditions.

DEFINITION 1.3. The reduction relation $R_E$ is given by
$$x R_E y \iff E_A(xz) = E_A(yz) \quad \forall z \in \Sigma^*,\ \forall I \in S$$
$R_E$ is a right congruence relation on $\Sigma^*$ because of the reflexivity, symmetry, and transitivity of "=" and the substitution property in its definition: if $x R_E y$, then $xw R_E yw$ for every $w \in \Sigma^*$, since each $wz$ is itself a continuation. It follows that strings x and y which are in the same class of the relation $R_E$ will have equal expectations from any initial state of the machine and will continue to have equal expectations for any finite input continuation z. As far as expectation of output is concerned, the behavior of the machine A is the same after either string x or string y.

C. CONSTRUCTION OF THE QUOTIENT MACHINE
DEFINITION 1.4. The equivalence class of x' in R, an equivalence relation, is given by
$$R[x'] = \{x : x R x'\}$$
It is a well known result from Rabin and Scott (1959) that, given a right congruence relation R on $\Sigma^*$, one can construct a quotient automaton with no output,
$$T(R) = \langle a, \mathcal{S}, M \rangle,$$
where $a = R[\Lambda]$, $\mathcal{S} = \{R[x] : x \in \Sigma^*\}$, and M is a function from $\mathcal{S} \times \Sigma$ into $\mathcal{S}$ such that
$$M(R[x], \sigma) = R[x\sigma], \quad x \in \Sigma^*,\ \sigma \in \Sigma.$$
DEFINITION 1.5. Let $\beta \subseteq \Sigma^*$. A congruence R refines $\beta$ if
$$x R y \implies (x \in \beta \iff y \in \beta)$$

THEOREM 1.1 (Rabin and Scott, 1959). Let $\beta$ be a subset of $\Sigma^*$. $\beta$ is the behavior of a finite (deterministic) automaton $A = \langle T(R), \mathcal{F} \rangle$ over $\Sigma$, where $\mathcal{F} = \{R[x] : x \in \beta\}$, iff there exists a right congruence relation R of finite rank which refines $\beta$.

THEOREM 1.2. If the congruence relation $R_E$ has finite rank, then for any $\lambda$ there is a finite deterministic automaton A' such that the tapes accepted by A' are $T(A, \lambda)$.

Proof: Let $\beta = T(A, \lambda) = \{x : E_A(x) \ge \lambda\}$. Note that $R_E$ refines $\beta$, i.e., $x R_E y \implies (x \in T(A, \lambda) \iff y \in T(A, \lambda))$: taking $z = \Lambda$ in Definition 1.3 gives $A(x)F = A(y)F$, hence $E_A(x) = IA(x)F = IA(y)F = E_A(y)$. If $R_E$ has finite rank, by definition $R_E$ has a finite number of classes. Using Theorem 1.1 we construct
$$T(R_E) = \langle a, \mathcal{S}, M \rangle \quad \text{and} \quad A' = \langle a, \mathcal{S}, M, \mathcal{F} \rangle,$$
which accepts $T(A, \lambda)$. Q.E.D.
D. CONSTRUCTION OF AN EXPECTATION EQUIVALENT FINITE DETERMINISTIC MACHINE

The quotient machine construction will be used to obtain a sufficient condition for the reduction of a probabilistic sequential machine into an expectation equivalent finite deterministic machine whose output function is either a constant C(s) for each state s or a random device $O_A^R(s)$ with expectation $E(O_A^R(s)) = C(s)$.

DEFINITION 1.6. $rp_A(x)$ is the response of A to input string x. If A is deterministic, $rp_A(x)$ is the state of A after an input of x. If A is probabilistic, $rp_A(x)$ is a random variable taking on values which are states with distribution $I \cdot A(x)$.

THEOREM 1.3. The reduction relation $R_E$ defined by a probabilistic machine A has finite rank if and only if there exists a finite deterministic machine A' with a deterministic output $O_{A'}$ such that $O_{A'}(rp_{A'}(x)) = E_A(x)$ for all $x \in \Sigma^*$.

Proof (sufficiency): By Theorem 1.1 let $A' = \langle a, \mathcal{S}, M, \varnothing \rangle$, where $\varnothing$ is the empty set. Note that any congruence R refines $\varnothing$ vacuously. We attach an output function $O_{A'}$ to elements of $\mathcal{S}$:
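$$O_{A'}(s) = E_A(x), \quad s = R_E[x],$$
which is well defined because $x R_E y$ implies $E_A(x) = E_A(y)$ (take $z = \Lambda$ in Definition 1.3).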
For a deterministic machine, M is extended to $M^*$, which operates on strings rather than symbols, by
$$M^*(s, \sigma) = M(s, \sigma), \quad s \in \mathcal{S},\ \sigma \in \Sigma$$
$$M^*(s, \sigma x) = M^*(M^*(s, \sigma), x), \quad x \in \Sigma^*$$
We note that $M^*(a, x) = rp_{A'}(x)$, so we need only show that $rp_{A'}(x) = R_E[x]$. Let $x = i_1 i_2 \cdots i_m$ for $i_j \in \Sigma$, $j = 1, 2, \ldots, m$. Then
$$\begin{aligned}
rp_{A'}(x) &= M^*(a, x) = M^*(M^*(a, i_1), i_2 \cdots i_m) = M^*(M(a, i_1), i_2 \cdots i_m) \\
&= M^*(M(R_E[\Lambda], i_1), i_2 \cdots i_m) = M^*(R_E[\Lambda i_1], i_2 \cdots i_m) \\
&= M^*(M(R_E[i_1], i_2), i_3 \cdots i_m) = \cdots = R_E[i_1 i_2 \cdots i_m] = R_E[x]
\end{aligned}$$
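Hence the constructed sequential machine is $A' = \langle a, \mathcal{S}, M, O_{A'} \rangle$.

Necessity: Given $A' = \langle a, \mathcal{S}, M, O_{A'} \rangle$ such that
$$O_{A'}(rp_{A'}(x)) = E_A(x) \quad \forall x \in \Sigma^*,$$
we have
$$O_{A'}(rp_{A'}(xz)) = E_A(xz) \quad \forall z \in \Sigma^*.$$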
Let $rp_{A'}(x) = S_x$, $x \in \Sigma^*$. Define
$$S_x R_0 S_y \iff x R_E y.$$
Let n' be the cardinality of $\mathcal{S}$, which is finite. Then rank $R_0$ = rank $R_E$ and rank $R_0 \le n'$. Hence rank $R_E$ is finite. Q.E.D.

COROLLARY 1.3. The reduction relation $R_E$ defined by a probabilistic machine A has finite rank $\iff$ there exists a finite deterministic machine A' such that $A \equiv_E A'$.

Proof: The machine A' of Theorem 1.3 meets the condition of the corollary since
$$E_{A'}(x) = O_{A'}(rp_{A'}(x)) = E_A(x) \quad \forall x \in \Sigma^*. \quad \text{Q.E.D.}$$
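The sufficiency half of Theorem 1.3 can be sketched as a breadth-first enumeration of the classes of $R_E$. This is an illustration under stated assumptions, not the author's procedure: it uses floating-point NumPy arithmetic with a tolerance `tol`, and it decides $x R_E y$ through a reformulation of Definition 1.3 (letting I range over S amounts to comparing rows): $x R_E y$ iff $A(x)A(z)F = A(y)A(z)F$ for every z, i.e. iff $A(x)$ and $A(y)$ agree on the subspace spanned by all column vectors $A(z)F$. That subspace is obtained by closing $\{F\}$ under every symbol matrix, which adds at most n vectors. The names `output_closure`, `quotient_machine`, and `run`, the cap `max_states`, and the three-state machine at the bottom are hypothetical.

```python
import numpy as np

def output_closure(mats, F, tol=1e-9):
    """Orthonormal columns spanning {A(z)F : z in Sigma*}.

    Starts from F and keeps applying every symbol matrix; a vector is kept
    only if it enlarges the span, so at most n vectors are ever added."""
    basis = np.zeros((len(F), 0))
    frontier = [np.asarray(F, dtype=float)]
    while frontier:
        v = frontier.pop()
        r = v - basis @ (basis.T @ v)          # part of v outside the current span
        if np.linalg.norm(r) > tol:
            basis = np.column_stack([basis, r / np.linalg.norm(r)])
            frontier.extend(M @ v for M in mats.values())
    return basis

def quotient_machine(I, mats, F, max_states=1000, tol=1e-9):
    """Breadth-first construction of Sigma*/R_E with outputs E_A(x).

    Two strings share a class iff A(x) B == A(y) B, where B spans the closure
    above. Returns (outputs, delta); class 0 is the class of the empty string."""
    B = output_closure(mats, F, tol)
    keys, reps, outputs, delta = [], [], [], {}

    def class_of(Ax):
        key = Ax @ B
        for s, k in enumerate(keys):
            if np.allclose(key, k, atol=tol):
                return s, False
        if len(keys) == max_states:
            raise RuntimeError("more than max_states classes; R_E may have infinite rank")
        keys.append(key)
        reps.append(Ax)                        # A(x) for a representative x
        outputs.append(float(I @ Ax @ F))      # O_A'(R_E[x]) = E_A(x)
        return len(keys) - 1, True

    class_of(np.eye(len(F)))
    queue = [0]
    while queue:
        s = queue.pop(0)
        for sym, M in mats.items():
            t, is_new = class_of(reps[s] @ M)
            delta[(s, sym)] = t                # M(R_E[x], sym) = R_E[x sym]
            if is_new:
                queue.append(t)
    return outputs, delta

def run(outputs, delta, x):
    """Output of the deterministic quotient machine after reading x."""
    s = 0
    for sym in x:
        s = delta[(s, sym)]
    return outputs[s]

# Hypothetical 3-state machine (not from the paper).
I = np.array([1.0, 0.0, 0.0])
mats = {"0": np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
        "1": np.eye(3)}
F = np.array([1.0, 0.0, 0.0])

outputs, delta = quotient_machine(I, mats, F)
print(outputs)                        # [1.0, 0.0]
print(run(outputs, delta, "0101"))    # 1.0
```

If $R_E$ has finite rank r, the loop discovers each class once and stops after r classes; reaching `max_states` only suggests, and does not prove, infinite rank. `run` plays the role of $O_{A'}(rp_{A'}(x))$, so `run(outputs, delta, x)` should equal $E_A(x)$ for every x.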
Instead of the deterministic function $O_{A'}$, a random device $O_{A'}^R(s)$ such that $E(O_{A'}^R(s)) = E_A(x)$, $s = R_E[x]$, could have been used in the construction.

E. THE PARTITION OF THE SET OF ACCESSIBLE STATE DISTRIBUTIONS INDUCED BY $R_E$
DEFINITION 1.7. $V(A) = \{IA(x) : x \in \Sigma^*\}$, the set of all stochastic vectors which can occur as distributions over the states of A. We sometimes call V(A) the "state vectors accessible in A."

DEFINITION 1.8. A set of vectors $V = \{v_1, v_2, \ldots\}$ is convex if for any finite set of indices J and real numbers $c_j \ge 0$, $j \in J$, with $\sum_{j \in J} c_j = 1$, we have $\sum_{j \in J} c_j v_j \in V$. The convex closure of a set of vectors V is written
$$V^+ = \Big\{v' : v' = \sum_{j \in J} c_j v_j,\ \sum_{j \in J} c_j = 1,\ c_j \ge 0 \text{ and } v_j \in V\Big\}.$$
It is clear that $V(A) \subseteq S^+$.

THEOREM 1.4. If $R_E$ has finite rank r, there exists a partition $\Pi = \{\Pi_1, \ldots, \Pi_r\}$ on V(A) and an integer valued function g(l, m) such that
$$\Pi_i A(\sigma) \subseteq \Pi_{g(i, \sigma)}, \quad i = 1, \ldots, r;\ \sigma \in \Sigma$$
Proof: $R_E$ induces an equivalence on the set of stochastic vectors accessible by the machine. Since $R_E$ has finite rank, form a set of one arbitrary representative from each congruence class, say $x_1, \ldots, x_r$, where $x_i$ and $x_j$ are not related by $R_E$ for $i = 1, 2, \ldots, r$; $j < i$.
Define
$$\Pi_i = \bigcup_{x \in R_E[x_i]} \{IA(x)\}.$$
We show that $(\Pi_1, \ldots, \Pi_r)$ is a partition of V(A). Let
$$W = \bigcup_{i=1}^{r} \Pi_i.$$
Then $IA(x') \in W \implies IA(x') \in V(A)$, and
$$IA(x') \in V(A) \implies x' \in R_E[x_k] \text{ for some } k = 1, \ldots, r \implies IA(x') \in \Pi_k \text{ for some } k = 1, \ldots, r.$$
Hence
$$W = \bigcup_{i=1}^{r} \Pi_i = V(A).$$
We show $\Pi_i \cap \Pi_j = \varnothing$, $i \ne j$. Suppose that $IA(y) \in \Pi_i \cap \Pi_j$. Then
$$IA(y) \in \Pi_i \implies y \in R_E[x_i] \implies y R_E x_i,$$
$$IA(y) \in \Pi_j \implies y \in R_E[x_j] \implies y R_E x_j.$$
Hence $y R_E x_j \implies x_j R_E y$ by symmetry, and transitivity of $R_E$ gives $x_i R_E x_j \implies x_i \in R_E[x_j]$. But $x_i$ and $x_j$ are representatives and there is only one representative from each class, so $x_i = x_j$ with $i \ne j$,
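which is a contradiction.

Finally we show there exists an integer valued function $g(i, \sigma)$ such that $\Pi_i A(\sigma) \subseteq \Pi_{g(i, \sigma)}$, $\sigma \in \Sigma$. We have
$$v_1 \in \Pi_i \implies v_1 = IA(w_1) \text{ for some } w_1 \in \Sigma^*,$$
$$v_1 A(\sigma) = IA(w_1)A(\sigma) = IA(w_1\sigma) \in \Pi_j$$
for some j, as has been shown above. Similarly,
$$v_2 \in \Pi_i \implies v_2 = IA(w_2) \text{ for some } w_2 \in \Sigma^*,$$
and $v_2 A(\sigma) = IA(w_2\sigma) \in \Pi_j$ for the same j, since elements of $R_E$ have the substitution property, i.e.,
$$w_1 R_E x_i \implies w_1\sigma R_E x_i\sigma, \quad \sigma \in \Sigma,$$
$$w_2 R_E x_i \implies w_2\sigma R_E x_i\sigma, \quad \sigma \in \Sigma.$$
Thus $x_i\sigma$ is an element of a class with representative $x_j$ for some j, and j depends only on $x_i$ and $\sigma$. So there is a function g(l, m) such that $g(i, \sigma) = j$, $\sigma \in \Sigma$. Q.E.D.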
F. NECESSARY AND SUFFICIENT CONDITIONS THAT STRINGS BE IN THE SAME $R_E$ CLASS

The relation $R_E$ has occupied an important place in the development of this theory. The structure of the transition matrices of strings which are in the same $R_E$ class will now be studied. Our results will be similar to the results presented by Mostow et al. (1963) for quotient spaces.

DEFINITION 1.9. A relation R is nontrivial if there exist x and y in the domain of R with $x \ne y$ such that $x R y$.

DEFINITION 1.10. The kernel of F is
$$\text{Kern.}(F) = \{v \in R^n : v \cdot F = 0\},$$
where R is the set of reals.

DEFINITION 1.11. The span of a set of vectors $\{v_1, \ldots, v_n\}$ is denoted by
$$\langle\{v_1, \ldots, v_n\}\rangle = \Big\{\sum_i c_i v_i : c_i \in R\Big\}$$
A necessary and sufficient condition for x and y to be in the same class of the reduction relation $R_E$ is given by the next theorem.

THEOREM 1.5. $x R_E y \iff$ there exists a subspace U of Kern.(F) such that
(i) $U \cdot A(z) \subseteq \text{Kern.}(F)$ for all $z \in \Sigma^*$;
(ii) $A(x) = A(y) + \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$ with $u_i \in U$, $i = 1, \ldots, n$.
Proof:
$$x R_E y \iff IA(xz)F = IA(yz)F \quad \forall I \in S,\ \forall z \in \Sigma^*,$$
hence
$$A(x)F = A(y)F \qquad (1)$$
because $S = \{(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)\}$ and $\Lambda \in \Sigma^*$.
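Using elementary linear algebra, the solution of (1) consists of a particular solution and a kernel:
$$A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \quad \text{where } h_i \in \text{Kern.}(F),\ i = 1, 2, \ldots, n.$$
Multiplying by A(z),
$$A(x)A(z) = A(y)A(z) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z) \quad \forall z \in \Sigma^*,$$
$$A(xz) = A(yz) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z).$$
Multiplying by an arbitrary distribution I and the output vector F,
$$IA(xz)F = IA(yz)F + I \cdot \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z)F, \quad I \in S^+.$$
But since x and y are in the same $R_E$ class, $IA(xz)F = IA(yz)F$ for all $z \in \Sigma^*$ and $I \in S^+$ (the equality holds for each $I \in S$ and hence, by linearity, for their convex combinations). Hence
$$I \cdot \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z)F = 0 \implies h_i A(z) \in \text{Kern.}(F), \quad i = 1, \ldots, n.$$
Let $U = \langle\{h_1, \ldots, h_n\}\rangle$. We get
$$UA(z) \subseteq \text{Kern.}(F) \quad \forall z \in \Sigma^*.$$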
To show the opposite implication, let
$$H = \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \quad \text{where } h_i \in U \subseteq \text{Kern.}(F),\ i = 1, 2, \ldots, n,$$
and suppose
$$A(x) = A(y) + H.$$
Multiplying by A(z) on the right for an arbitrary $z \in \Sigma^*$,
$$A(xz) = A(yz) + HA(z).$$
Multiplying by I on the left and F on the right,
$$IA(xz)F = IA(yz)F + I \cdot \begin{pmatrix} h_1 A(z) F \\ \vdots \\ h_n A(z) F \end{pmatrix}, \quad I \in S,$$
but $h_i A(z) F = 0$ since $h_i A(z) \in \text{Kern.}(F)$, $i = 1, \ldots, n$. Hence $IA(xz)F = IA(yz)F$. Q.E.D.

Part (i) of Theorem 1.5 will now be restricted to the finite class of symbols rather than the unbounded class of strings.

THEOREM 1.6. Let
$$U = \Big\langle \bigcup_{x R_E y} \{A(x)_i - A(y)_i : i = 1, \ldots, n\} \Big\rangle.$$
Then $U \cdot A(z) \subseteq \text{Kern.}(F)$ for all $z \in \Sigma^*$ $\iff$ there exists a subspace V of $R^n$ such that
(i) $UA(\sigma) \subseteq V$ for all $\sigma \in \Sigma$;
(ii) $VA(\sigma) \subseteq V \subseteq \text{Kern.}(F)$ for all $\sigma \in \Sigma$.
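Proof: Suppose $UA(z) \subseteq \text{Kern.}(F)$ for all $z \in \Sigma^*$. Let
$$V = \langle\{uA(z) : u \in U,\ z \in \Sigma^*\}\rangle.$$
Then $VA(\sigma)$ is spanned by $\{uA(z)A(\sigma) : u \in U,\ z \in \Sigma^*\} = \{uA(z\sigma) : u \in U,\ z \in \Sigma^*\}$, so $VA(\sigma) \subseteq V$. Consider an arbitrary $v \in V$. By definition of V there must be some set of indices J and constants $c_j$ such that $v = \sum_{j \in J} c_j u_j A(z_j)$. Then
$$v \cdot F = \Big(\sum_{j \in J} c_j u_j A(z_j)\Big) \cdot F = \sum_{j \in J} c_j u_j A(z_j) F.$$
But $u_j A(z_j) F = 0$ by $UA(z) \subseteq \text{Kern.}(F)$, so $v \cdot F = 0$. Hence $V \subseteq \text{Kern.}(F)$, and $UA(\sigma) \subseteq V$ by definition of V. Conversely, if V satisfies (i) and (ii), then $UA(z) \subseteq V$ for every $z \in \Sigma^*$ by induction on lg.(z) (for the empty string, (i) with $\sigma = \Lambda$ gives $U \subseteq V$), and therefore $UA(z) \subseteq \text{Kern.}(F)$. Q.E.D.

G. INVARIANT SUBSPACE CHARACTERIZATION OF $R_E$

DEFINITION 1.12. A subspace V is invariant under a set of linear transformations $\{T_i : i = 1, 2, \ldots, m\}$ if
$$V T_i \subseteq V, \quad i = 1, 2, \ldots, m.$$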
Theorems 1.5 and 1.6 yield the following directly:

THEOREM 1.7. Strings x and y are in the same class of $R_E$ if and only if there exists a subspace V of Kern.(F) such that
(i) V is invariant under $\{A(\sigma) : \sigma \in \Sigma\}$;
(ii) $A(x) = A(y) + H$, where the rows $H_i \in V$, $i = 1, \ldots, n$.

H. NECESSARY AND SUFFICIENT CONDITIONS THAT $R_E$ BE NONTRIVIAL
A very weak necessary condition that $R_E$ have finite rank is that it be at least nontrivial. From Theorem 1.7 it is immediate that:

COROLLARY 1.8. The reduction relation $R_E$ is nontrivial $\iff$ there exist a subspace V of Kern.(F) and strings x, y such that
(i) V is invariant under $\{A(\sigma) : \sigma \in \Sigma\}$;
(ii) $A(x) = A(y) + H$, where $H_i \in V$, $i = 1, \ldots, n$;
(iii) $x \ne y$.

Hence we now know that, given strings x and y in the same class of $R_E$, the differences between the rows of the matrices A(x) and A(y) must be elements of a subspace V which has special properties. Namely, V must be invariant under all symbol matrices and contained in the kernel of the output vector.

THEOREM 1.9. A necessary condition that $R_E$ be nontrivial is that the matrices $A(\sigma)$, $\sigma \in \Sigma$, be reducible for the same change of basis. In other words, there exist a subspace V and a linear transformation W of the state vectors S to a basis whose first vectors form a basis for V, such that
$$W A(\sigma) W^{-1} = \begin{pmatrix} A_1^\sigma & 0 \\ A_2^\sigma & A_3^\sigma \end{pmatrix} \quad \text{for all } \sigma \in \Sigma,$$
where 0 denotes a submatrix of zeros and $A_1^\sigma$, $A_2^\sigma$, and $A_3^\sigma$ are submatrices which for all $\sigma$ in $\Sigma$ have the same numbers of columns and rows.

Proof: By Theorem 1.7 and standard matrix theory (see Jacobson (1952), pp. 116-117).

Theorem 1.9 gives us a strong matrix reformulation of the statement that $R_E$ be nontrivial. A sufficient condition is obtained if in addition we require that $V \subseteq \text{Kern.}(F)$.
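The invariant subspace conditions of Theorems 1.7 and 1.9 can also be checked numerically. A minimal sketch, assuming NumPy, floating-point tolerances, and a hypothetical three-state machine rather than Example 1.2: any V satisfying Theorem 1.7 lies inside $V_{\max} = \{v : vA(z)F = 0 \ \forall z \in \Sigma^*\}$, which is itself invariant under every $A(\sigma)$ and contained in Kern.(F), so it suffices to test the rows of $A(x) - A(y)$ against $V_{\max}$. The code obtains $V_{\max}$ as the orthogonal complement of the span of the columns $A(z)F$ (this characterization is our reformulation, not stated in the text), then builds an orthogonal change of basis whose first rows span $V_{\max}$ to exhibit the zero block of Theorem 1.9. All function names are hypothetical.

```python
import numpy as np

def output_closure(mats, F, tol=1e-9):
    """Orthonormal columns spanning {A(z)F : z in Sigma*}."""
    basis = np.zeros((len(F), 0))
    frontier = [np.asarray(F, dtype=float)]
    while frontier:
        v = frontier.pop()
        r = v - basis @ (basis.T @ v)          # part of v outside the current span
        if np.linalg.norm(r) > tol:
            basis = np.column_stack([basis, r / np.linalg.norm(r)])
            frontier.extend(M @ v for M in mats.values())
    return basis

def largest_invariant_kernel_subspace(mats, F, tol=1e-9):
    """Orthonormal rows spanning V_max = {v : v A(z) F = 0 for all z},
    plus the closure basis B (whose columns are orthogonal to V_max)."""
    B = output_closure(mats, F, tol)
    _, s, vt = np.linalg.svd(B.T)              # null space of B.T is V_max
    rank = int(np.sum(s > tol))
    return vt[rank:], B

def same_class(Ax, Ay, V, tol=1e-9):
    """Theorem 1.7 test: every row of A(x) - A(y) lies in V."""
    H = Ax - Ay
    residual = H - (H @ V.T) @ V               # remove the component in V
    return bool(np.allclose(residual, 0.0, atol=tol))

# Hypothetical machine (not Example 1.2): states 2 and 3 behave identically.
A0 = np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
mats = {"0": A0, "1": np.eye(3)}
F = np.array([1.0, 0.0, 0.0])

V, B = largest_invariant_kernel_subspace(mats, F)
print(V.shape)                                 # (1, 3): V is spanned by (0, 1, -1), up to scale
print(same_class(A0 @ A0, np.eye(3), V))       # True:  "00" R_E the empty string
print(same_class(A0, np.eye(3), V))            # False: "0" is in a different class

# Theorem 1.9: with the rows of V first, each A(sigma) becomes block
# lower-triangular, i.e. its upper-right block is zero (row-vector convention).
T = np.vstack([V, B.T])                        # orthogonal matrix, so T^-1 = T.T
k = V.shape[0]
for sym, M in mats.items():
    print(sym, np.allclose((T @ M @ T.T)[:k, k:], 0.0))   # True for both symbols
```

Because the new basis is orthonormal, the inverse in the change of basis is just the transpose; with a non-orthogonal basis one would use `np.linalg.inv` instead.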
FIG. 1. State diagram for the machine of Example 1.2
Example 1.2. We construct an expectation equivalent finite deterministic machine from a probabilistic sequential machine A, illustrating Theorem 1.3, Corollary 1.3, and Theorem 1.7. The machine is
$$A = \langle I, A(0), A(1), F \rangle,$$
with six states. [The entries of I, of the $6 \times 6$ symbol matrices A(0) and A(1), and of the output vector F are not legible in the source.]
The state diagram for A is shown in Fig. 1. The following labeling conventions are used: $p(K)$, $p \in [0, 1]$, $K \in \Sigma$, means a transition with probability p via symbol K; $l : F_l$ means that output $F_l$ occurs when the machine is in state l. Two transitions between the same pair of states labeled $P_1(K_1)$ and $P_2(K_2)$ are replaced by a single arc labeled $P_1(K_1), P_2(K_2)$.
It will now be demonstrated that for machine A, $00 R_E 0$. Computing $A(00) = A(0)A(0)$ [the entries are not legible in the source] and subtracting A(0) gives
$$(A(00) - A(0))F = (0, 0, 0, 0, 0, 0)^T.$$
Hence $A(00)F = A(0)F$, or $IA(00)F = IA(0)F$ for all I. Furthermore, for all $P \in [0, 1]$,
$$(0, 0, P, 0, 1-P, 0)A(0) = (0, 0, P, 0, 1-P, 0)$$
$$(0, 0, P, 0, 1-P, 0)A(1) = (0, 0, P, 0, 1-P, 0)$$
that is, $W = \langle\{(0, 0, P, 0, 1-P, 0)\}\rangle$ is invariant under the symbol matrices A(0) and A(1). Then
$$V = \langle\{(0, 0, P, 0, -P, 0)\}\rangle \subseteq W$$
and
$$VA(0) = V, \quad VA(1) = V.$$
By Theorem 1.7 we know $00 R_E 0$, but let us verify this fact. For $z \in \Sigma^*$,
$$(A(00) - A(0))A(z) = C_z D,$$
where $C_z$ is a constant depending on the string z and D is a fixed $6 \times 6$ matrix [entries not legible in the source], and
$$(A(00) - A(0))A(z)F = C_z DF = 0.$$
Consequently, for all $z \in \Sigma^*$ and all $I \in S^+$,
$$IA(00)A(z)F = IA(0)A(z)F, \quad \text{or} \quad E_A(00z) = E_A(0z),$$
which shows $00 R_E 0$.
By the same method one can show that
$$10\,R_E\,1, \qquad 011\,R_E\,11, \qquad 01011\,R_E\,11, \qquad 111\,R_E\,11, \qquad 01010\,R_E\,0,$$
so all strings are in the classes $R_E[\Lambda]$, $R_E[0]$, $R_E[1]$, $R_E[11]$, $R_E[01]$, $R_E[010]$, $R_E[0101]$, which means that $R_E$ has finite rank. Following Theorem 1.3, we compute the expectations $E_A(x) = IA(x)F$ and construct the expectation equivalent deterministic machine $A'$. Note that the values of the expectation depend on the initial distribution $I$.
$$E_A(\Lambda) = IA(\Lambda)F = IF = 8.6, \qquad E_A(0) = 4.6, \qquad E_A(1) = 1.1, \qquad E_A(01) = 1.9,$$
$$E_A(10) = 1.1 = E_A(1) \;\; (\text{since } 10\,R_E\,1), \qquad E_A(11) = 1.0,$$
$E_A(010) = 1.9$, $E_A(0101) = 9.1$. The expectation equivalent deterministic machine of Corollary 1.3 is shown in Fig. 2. We note that $A'$ has 7 states while $A$ has just 6 states. The deterministic cycle 0101 appears in both machines.

FIG. 2. $A' = \Sigma^*/R_E$ for Example 1.2.

II. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS N-MOMENT EQUIVALENT TO AN INPUT-STATE CALCULABLE MACHINE

A. INTRODUCTION

In this chapter the concept of expectation equivalence is generalized to N-moment equivalence. A congruence relation $R_N$ is defined which
partitions the set of input strings into classes. All members of a particular class produce the same expectation and the same first $N - 1$ central moments for the machine defining $R_N$. If $R_N$ has finite rank, a finite quotient machine can be constructed which is deterministic, with each state corresponding to a congruence class. Each state can be connected to a random device having the same expectation and $N - 1$ central moments as the class represented by the state, giving a deterministic machine with random outputs. The constructed input-state calculable machine is N-moment equivalent to the probabilistic machine. After the first theorem, which gives a necessary and sufficient condition for two strings to be in the same $R_N$ class, a simple substitution yields generalizations of some results of Section I; these generalizations are therefore presented in this section without proofs.

B. DISTRIBUTION EQUIVALENCE: $\equiv_D$
The random variable structure of probabilistic sequential machines will be investigated in this section. DEFINITION 2.1. $O_A^*(x)$ is the output random variable of the machine A after a string $x$ has occurred as input. Using Definition 1.6 we note that
$$O_A^*(x) = O(rp_A(x)).$$
DEFINITION 2.2. A and A' are distribution equivalent, written $A \equiv_D A'$, if for $J_A = \{j : (IA(x))_j F_j \neq 0\}$ there is a one-to-one map $h$ between $J_{A'}$ and $J_A$ such that
$$(IA(x))_{h(j)} = (I'A'(x))_j, \qquad F_{h(j)} = F'_j, \qquad j \in J_{A'},\; x \in \Sigma^*.$$
Distribution equivalence corresponds to the conventional definition of equivalence for discrete random variables, except for random variables with $F_i = F_j$ for some $i \neq j$. Referring back to Example 0.2, two chemical cells are distribution equivalent if
(1) we neglect those partitioned areas which have either zero efficiency or a zero fraction of the catalyst;
(2) of the remaining partitioned areas, there is a correspondence between the partitioned areas of one cell and those of the other such that corresponding areas have the same fraction of catalyst regardless of the sequence of controls entering the cells; and
(3) corresponding partitioned areas have the same efficiencies.

C. MOMENTS OF THE OUTPUT RANDOM VARIABLE
DEFINITION 2.3. Let $F = (F_1, F_2, \dots, F_n)^T$, $F_i \in \mathbb{R}$, $i = 1, 2, \dots, n$. Then the $i$th central moment of $O_A^*(x)$ is
$$\mu_i^A(x) = E[(O_A^*(x) - E_A(x))^i], \qquad i = 2, 3, \dots$$
THEOREM 2.1.
$$\mu_i^A(x) = \sum_{k=0}^{i-1} (-1)^k \binom{i}{k} IA(x)(F^{i-k})\, E_A(x)^k + (-1)^i E_A(x)^i, \qquad i = 2, 3, \dots,$$
where $F^m$ denotes the column vector $(F_1^m, \dots, F_n^m)^T$.
Proof: By the binomial theorem
$$\mu_i^A(x) = E\Big[\sum_{k=0}^{i} (-1)^k \binom{i}{k} O_A^*(x)^{i-k} E_A(x)^k\Big].$$
To compute the expectation of the discrete random variable $O_A^*(x)^{i-k}$, note that it has the same distribution over states as $O_A^*(x)$ but takes on the values $F_1^{i-k}, \dots, F_n^{i-k}$; hence $E[O_A^*(x)^{i-k}] = IA(x)(F^{i-k})$ for $i \neq k$, while the term $k = i$ equals $(-1)^i E_A(x)^i$. Therefore
$$\mu_i^A(x) = \sum_{k=0}^{i-1} (-1)^k \binom{i}{k} IA(x)(F^{i-k})\, E_A(x)^k + (-1)^i E_A(x)^i. \qquad \text{Q.E.D.}$$

D. SPECIAL PROPERTIES OF RABIN PROBABILISTIC AUTOMATA
DEFINITION 2.4. A Rabin probabilistic automaton (1964) is a probabilistic sequential machine such that $I \in S$ and $F_i = 0$ or $F_i = 1$, $i = 1, 2, \dots, n$.
Rabin probabilistic automata have rather special features as far as the random variable of the output is concerned.
COROLLARY 2.1. For a Rabin probabilistic automaton A,
$$\mu_i^A(x) = \sum_{k=0}^{i-1} (-1)^k \binom{i}{k} E_A(x)^{k+1} + (-1)^i E_A(x)^i, \qquad i = 2, 3, \dots$$
Proof: $F_j = 0$ or $1$, hence $F^{i-k} = F$ for $i > k$, so $IA(x)(F^{i-k}) = E_A(x)$, and the result follows from Theorem 2.1.
COROLLARY 2.2. If $E_A(x) = E_A(y)$ for some Rabin probabilistic automaton A, then all central moments for $x$ and $y$ are equal also, i.e., $\mu_i^A(x) = \mu_i^A(y)$ for $i = 2, 3, \dots$ Note: for $i = 2$ this says that the variances of the outputs are equal.
COROLLARY 2.3. If two Rabin probabilistic automata A and A' are expectation equivalent, then
$$\mu_i^A(x) = \mu_i^{A'}(x), \qquad i = 2, 3, \dots,\; \forall x \in \Sigma^*.$$
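Theorem 2.1 and Corollary 2.1 can be checked numerically. The sketch below is a minimal illustration, assuming a state distribution $IA(x)$ and an output vector $F$ given as Python lists of exact fractions; the function names and the sample numbers are our own, not the paper's. It evaluates $\mu_i^A(x)$ from Definition 2.3 and from the formula of Theorem 2.1, and evaluates the Rabin formula of Corollary 2.1, which depends only on $E_A(x)$.

```python
from fractions import Fraction
from math import comb


def expectation(dist, F):
    """E_A(x) = IA(x) F, where dist is the state distribution IA(x)."""
    return sum(p * f for p, f in zip(dist, F))


def central_moment_direct(dist, F, i):
    """mu_i^A(x) = E[(O_A*(x) - E_A(x))^i], straight from Definition 2.3."""
    E = expectation(dist, F)
    return sum(p * (f - E) ** i for p, f in zip(dist, F))


def central_moment_thm21(dist, F, i):
    """mu_i^A(x) via Theorem 2.1:
    sum_{k=0}^{i-1} (-1)^k C(i,k) IA(x)(F^{i-k}) E^k + (-1)^i E^i."""
    E = expectation(dist, F)
    total = Fraction(0)
    for k in range(i):
        raw = sum(p * f ** (i - k) for p, f in zip(dist, F))  # IA(x)(F^{i-k})
        total += (-1) ** k * comb(i, k) * raw * E ** k
    return total + (-1) ** i * E ** i


def rabin_moment(E, i):
    """Corollary 2.1: with outputs in {0, 1} the moments depend only on E."""
    return sum((-1) ** k * comb(i, k) * E ** (k + 1) for k in range(i)) + (-1) ** i * E ** i


# Illustrative three-state distribution and output vector (hypothetical values).
dist = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
F = [Fraction(3), Fraction(-1), Fraction(2)]
print(central_moment_thm21(dist, F, 2))  # variance: 13/4
```

Exact fractions are used so that the two formulas can be compared with `==` rather than within a floating-point tolerance.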
E. THE CONCEPT OF N-MOMENT EQUIVALENCE: $\equiv_N$
Even if two machines are expectation equivalent, the statistics of their behavior may be so different that for many purposes we would not want to consider the machines behaviorally equivalent. Returning to Example 0.1, two slot machines can be expectation equivalent, meaning that the average payoff is the same for both, but one can be much more desirable than the other for a player of limited resources, since such a player might have a far longer average time until "gambler's ruin" on one machine than on the other. Hence, in order to place in the same class machines whose statistics of behavior are somewhat alike, the notion of N-moment equivalence will be introduced.
DEFINITION 2.5. Probabilistic sequential machines A and A' are N-moment equivalent, written $A \equiv_N A'$, if
$$E_A(x) = E_{A'}(x), \qquad \mu_i^A(x) = \mu_i^{A'}(x), \quad i = 2, \dots, N,$$
for all $x$ in $\Sigma^*$.
Example 2.1. Probabilistic sequential machines A and A' such that $A \equiv_N A'$ for any initial distribution $I$, i.e., $E_A(x) = E_{A'}(x)$
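and $\mu_i^A(x) = \mu_i^{A'}(x)$ for all $x \in \Sigma^*$, $i = 2, 3, \dots$, and all $I \in S$. Here $A(0)$, $A(1)$, $A'(0)$, and $A'(1)$ are $2 \times 2$ stochastic matrices (their entries are not legible in this copy). For both machines
$$F = (F_1, F_2)^T$$
for $F_1$, $F_2$ arbitrary real numbers.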
F. THE RELATIONSHIP BETWEEN $\equiv_D$ AND $\equiv_N$

THEOREM 2.2. For probabilistic sequential machines A and A',
$$A \equiv_D A' \Rightarrow A \equiv_N A' \quad \text{for all finite } N.$$
Proof: Distribution equivalence means there exists an $h$ such that
$$F_{h(j)} = F'_j, \qquad (IA(x))_{h(j)} = (I'A'(x))_j, \qquad \forall x \in \Sigma^*,$$
whenever $(I'A'(x))_j F'_j \neq 0$. Hence
$$\sum_j (IA(x))_{h(j)} F_{h(j)} = \sum_j (I'A'(x))_j F'_j, \quad \text{or} \quad E_A(x) = E_{A'}(x),$$
which is expectation equivalence. For any finite $N$, $F_{h(j)}^N = (F'_j)^N$. The fact that $\mu_N^A(x) = \mu_N^{A'}(x)$ then comes from inspection of Theorem 2.1. Symbolically, we have shown $A \equiv_D A' \Rightarrow A \equiv_N A'$ for any $N$. How close one can come to a converse to Theorem 2.2 depends on the form of the entries of $F$.
LEMMA 2.1 (Gantmacher (1959)). Given a sequence $m_0, m_1, \dots$ of real numbers, if there exists a set of positive numbers $r_1 > 0, r_2 > 0, \dots, r_n > 0$ and $V_n > V_{n-1} > \dots > V_1 > 0$ which is a solution to each of the equations
$$m_p = \sum_{j=1}^{n} r_j V_j^p \qquad (p = 0, 1, 2, \dots) \qquad (*)$$
then the solution to $(*)$ is unique.
We can apply the lemma to get a partial converse to Theorem 2.2.
THEOREM 2.3. If machines A and A' meet the following requirements (letting $h(i) = i$ without loss of generality):
(i) $(IA(x))_i F_i = 0$ iff $(I'A'(x))_i F'_i = 0$, $i = 1, \dots, n$, $\forall x \in \Sigma^*$;
(ii) all states in a given machine have distinct output symbols;
(iii) $E_A(x) = E_{A'}(x)$, $\forall x \in \Sigma^*$, and $\mu_i^A(x) = \mu_i^{A'}(x)$, $i = 2, 3, \dots$;
then A and A' are distribution equivalent.
Proof: We use Lemma 2.1. Since the central moments of $O_A^*(x)$ and $O_{A'}^*(x)$ are equal, so are their moments about zero, for any string:
$$m_0 = \sum_{i : F_i \neq 0} (IA(x))_i, \qquad m_1 = E_A(x) = E_{A'}(x), \qquad m_2 = \mu_2^A(x) + E_A(x)^2 = \mu_2^{A'}(x) + E_{A'}(x)^2, \quad \dots$$
We discard those components whose contribution to the moment is zero and relabel the nonzero components by the index $j$. Let $J = \{i : (IA(x))_i F_i \neq 0\}$. Because of assumption (i) we also have $J = \{i : (I'A'(x))_i F'_i \neq 0\}$. Hence
$$m_p = \sum_{j \in J} (IA(x))_j (F_j)^p = \sum_{j \in J} (I'A'(x))_j (F'_j)^p, \qquad p = 0, 1, 2, \dots$$
By the lemma the solution is unique:
$$(IA(x))_j = (I'A'(x))_j, \qquad F_j = F'_j, \qquad j \in J.$$
Therefore A and A' are distribution equivalent.
Example 2.2. Condition (ii) of Theorem 2.3 is necessary, as shown by the following:
$$IA(x) = (0.5, 0.3, 0.2), \qquad I'A'(x) = (0.5, 0.4, 0.1),$$
$$E_A(x) = IA(x)F = 0.5, \qquad E_{A'}(x) = I'A'(x)F' = 0.5.$$
Since A and A' are Rabin automata, by Corollary 2.3
$$\mu_i^A(x) = \mu_i^{A'}(x), \qquad i = 2, 3, \dots$$
However, A and A' have different distributions over states for the string $x$.
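Example 2.2 can be replayed in a few lines. Because the output vector is not legible in the source, the sketch below assumes $F = F' = (0, 1, 1)^T$, one Rabin output vector consistent with $E_A(x) = E_{A'}(x) = 0.5$. Under that assumption the moments agree while the per-state data required by Definition 2.2 do not. The helper names are hypothetical, and the check covers a single string $x$ only, so it gives a necessary condition for $\equiv_D$, not a full test.

```python
from collections import Counter
from fractions import Fraction


def moments(dist, F, N):
    """[E, mu_2, ..., mu_N] of the output random variable for a state distribution."""
    E = sum(p * f for p, f in zip(dist, F))
    return [E] + [sum(p * (f - E) ** i for p, f in zip(dist, F)) for i in range(2, N + 1)]


def support_pairs(dist, F):
    """Multiset of (probability, output) pairs over J = {j : p_j F_j != 0}."""
    return Counter((p, f) for p, f in zip(dist, F) if p * f != 0)


def matches_at(dist, F, dist2, F2):
    """Definition 2.2 at one string x: a one-to-one h between the index sets
    with equal probabilities and outputs exists iff the two multisets agree.
    (Distribution equivalence needs a single h that works for every x.)"""
    return support_pairs(dist, F) == support_pairs(dist2, F2)


IAx = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]   # IA(x), Example 2.2
IApx = [Fraction(5, 10), Fraction(4, 10), Fraction(1, 10)]  # I'A'(x), Example 2.2
F = [0, 1, 1]  # ASSUMED Rabin output vector; not given legibly in the source

print(moments(IAx, F, 4) == moments(IApx, F, 4))  # True: same E and central moments
print(matches_at(IAx, F, IApx, F))                # False: distributions over states differ
```

The two states with output 1 carry probabilities 0.3, 0.2 in one machine and 0.4, 0.1 in the other; the output random variable cannot tell them apart, which is exactly why condition (ii) is needed.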
G. THE N-MOMENT REDUCTION RELATION

DEFINITION 2.5. The N-moment reduction relation $R_N$: $x R_N y$ if for all $I$ in $S$
$$E_A(xz) = E_A(yz) \quad \text{and} \quad \mu_i^A(xz) = \mu_i^A(yz), \qquad \forall z \in \Sigma^*,\; i = 2, 3, \dots, N.$$
The relation $R_N$ is a congruence relation, and $R_N = R_E$ for $N = 1$.
H. INPUT-STATE CALCULABLE MACHINES

A probabilistic sequential machine has randomness associated with its switching, or state transitions, and a deterministic output function $O$. For some problems it is convenient to view a randomly behaving machine as having deterministic switching but a random output device. We now study connections between these two viewpoints.
DEFINITION 2.6 (Carlyle (1965)). A machine A is input-state calculable if, knowing the state at time $t$ and the input at time $t$, the state at time $t + 1$ can be calculated by a deterministic function. As Carlyle has pointed out, the class of finite input-state calculable machines consists of exactly those machines which have finite deterministic switching and random outputs.
DEFINITION 2.7. A random output depending on the state $S_i$ with parameters $r_1, \dots, r_k$ will be written
$$S_i : | r_1, \dots, r_k |$$

I. CHARACTERIZATION OF INPUT-STATE CALCULABLE MACHINES EQUIVALENT BY $\equiv_N$ TO PROBABILISTIC SEQUENTIAL MACHINES
We obtain a generalization of Theorem 1.3.
THEOREM 2.4. Let $R_N$ be the N-moment reduction relation defined by a probabilistic sequential machine A. Then
$$\text{Rank } |R_N| = r \text{ finite} \iff \text{there exists an input-state calculable machine } A' \text{ such that } A \equiv_N A'.$$
Proof: Using the quotient construction of Theorem 1.1, obtain $A'' = \Sigma^*/R_N$, with a function $M$ analogous to the function $M$ in Theorem 1.3. Elements in the same congruence class of $R_N$ have equal expectations and equal first $N - 1$ central moments. Hence the machine $A''$ can have random devices attached to its states (which are the classes $R_N[x]$) such that the expectation and the first $N - 1$ central moments of each device are the same as those of the congruence class represented by the state. The resulting machine $A'$ has deterministic switching and random output functions and is equivalent by $\equiv_N$ to the probabilistic machine defining $R_N$. The details of the proof parallel the proof of Theorem 1.3. Q.E.D.

J. A NECESSARY AND SUFFICIENT CONDITION FOR THE N-MOMENT REDUCTION RELATION TO HOLD
In the previous section we have seen the importance of the N-moment reduction relation $R_N$ in characterizing those probabilistic sequential machines for which there is an input-state calculable machine equivalent by $\equiv_N$. Let us now obtain invariant subspace conditions for strings to be in the same class, analogous to those of the theorems of Section I.
THEOREM 2.5.
$$x R_N y \iff A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix},$$
where
$$\langle\{h_1, \dots, h_n\}\rangle \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i) \quad \text{and} \quad \langle\{h_1, \dots, h_n\}\rangle A(z) \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i), \qquad \forall z \in \Sigma^*.$$
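Proof: Suppose that $R_N$ holds for $x$ and $y$. Then
$$E_A(x) = E_A(y) \iff IA(x)F = IA(y)F \;\; \forall I \in S \iff A(x) = A(y) + \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}, \qquad r_j \in \mathrm{Kern}(F),\; j = 1, \dots, n.$$
Also
$$\mu_2^A(x) = IA(x)(F^2) - E_A(x)^2, \qquad \mu_2^A(y) = IA(y)(F^2) - E_A(y)^2;$$
since $x R_N y$ we have $\mu_2^A(x) = \mu_2^A(y)$, which gives $IA(x)(F^2) = IA(y)(F^2)$ for all $I \in S$, so that
$$A(x) = A(y) + \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}, \qquad r_j \in \mathrm{Kern}(F) \cap \mathrm{Kern}(F^2),\; j = 1, 2, \dots, n.$$
For any $i$, $\mu_i^A(x)$ can be written as a recursive function of $IA(x)(F^i)$ and smaller powers of $F$, i.e.,
$$\mu_i^A(x) = IA(x)(F^i) + \sum_{k=1}^{i-1} (-1)^k \binom{i}{k} IA(x)(F^{i-k}) E_A(x)^k + (-1)^i E_A(x)^i.$$
Hence, by induction, we assume
$$IA(x)(F^k) = IA(y)(F^k), \qquad k = 1, 2, \dots, i - 1;\; \forall I \in S. \qquad (1)$$
Then, with $\beta$ denoting the common value of the remaining terms (by (1) and $E_A(x) = E_A(y)$),
$$\mu_i^A(x) = IA(x)(F^i) + \beta, \qquad \mu_i^A(y) = IA(y)(F^i) + \beta,$$
so that
$$\mu_i^A(x) = \mu_i^A(y) \iff [IA(x)(F^j) = IA(y)(F^j)\;\; \forall I \in S,\; j \le i] \iff A(x) = A(y) + \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}, \qquad (2)$$
where $r_j \in \bigcap_{k=1}^{i} \mathrm{Kern}(F^k)$, $j = 1, 2, \dots, n$,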
which completes the induction. The rest of the proof is analogous to that of Theorem 1.5. Q.E.D.
If we substitute $R_N$ for $R_E$ and $\bigcap_{i=1}^{N} \mathrm{Kern}(F^i)$ for $\mathrm{Kern}(F)$, the proofs of Theorems 1.4, 1.6, 1.8, and 1.9 go through exactly as before, and we state the dual theorems which are obtained.
THEOREM 1.4D. If $R_N$ has finite rank $r$, there exists a partition $\pi = (\pi_1, \dots, \pi_r)$ on $V(A)$ and an integer valued function $g(i, m)$ such that
THEOREM 1.6D. Let
$$U = \Big\langle \bigcup \{A(x)_i - A(y)_i : i = 1, 2, \dots, n \text{ and } x R_N y\} \Big\rangle;$$
then for any $z \in \Sigma^*$
$$UA(z) \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i),$$
and there exists $V$, a subspace of $\mathbb{R}^n$, such that for any $\sigma \in \Sigma$: (i) $UA(\sigma) \subset V$; (ii) $VA(\sigma) \subset V \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i)$.
THEOREM 1.8D. $R_N$ is nontrivial $\iff$ there exists $V$, a subspace of $\mathbb{R}^n$, such that (i) $V \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i)$; (ii) $V$ is invariant under $A(\sigma)$, $\forall \sigma \in \Sigma$; (iii) $A(x) = A(y) + H$, where each row $H_i \in V$ and some $H_i \neq 0$.
THEOREM 1.9D. $R_N$ is nontrivial $\iff$ there exists a subspace $V$ such that the symbol matrices $A(\sigma)$, $\sigma \in \Sigma$, are reducible by the same change of basis for $V$, i.e., there exists a linear transformation $W$ from the state basis $S$ to a basis for $\mathbb{R}^n$ such that
$$W^{-1} A(\sigma) W = \begin{pmatrix} A_1(\sigma) & 0 \\ A_2(\sigma) & A_3(\sigma) \end{pmatrix},$$
where 0 denotes a block of all zeros of the same size for all symbols $\sigma$, and
$$V \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i).$$
Example 2.3. We extend Example 1.2 to illustrate Theorems 2.4, 2.5, and 1.8D. In Example 1.2,
$$\langle\{(0, 0, p, 0, -p, 0)\}\rangle \subset \bigcap_{i=1}^{N} \mathrm{Kern}(F^i)$$
for any finite $N$. Note that in this case the classes of $R_E$ are also the classes of $R_N$. Hence we can replace the output from any state of the machine in Fig. 2 with a random device possessing the same first $N$ central moments as the probabilistic sequential machine. Let us compute the variances.
$$\mu_2^A(\Lambda) = I(F^2) - (8.6)^2 = 8.84.$$
Likewise, we get
$$\mu_2^A(0) = 1.44, \qquad \mu_2^A(01) = 0.09, \qquad \mu_2^A(1) = 0.09, \qquad \mu_2^A(10) = 0.09,$$
$$\mu_2^A(11) = IA(11)(F^2) - (1.0)^2 = 0.0, \qquad \mu_2^A(010) = 0.09, \qquad \mu_2^A(0101) = IA(0101)(F^2) - (9.1)^2 = 7.29.$$
A machine $A'$ which has the same expectation and variance for each string, and deterministic switching, will be constructed using random output devices, symbolized by
$$S' : | e, V |,$$
attached to states $S'$, which supply random numbers with mean $e$ and variance $V$. The machine $A'$, shown in Fig. 3, is the machine of Example 1.2 with the outputs connected to random devices such as the above rather than to deterministic outputs.

FIG. 3. Input-state calculable machine $A'$ which has the same expectation and variance for all strings as the probabilistic machine A of Example 1.2. The marked state is the initial state of $A'$.

III. THE NOTION OF INDISTINGUISHABILITY AS A CRITERION OF BEHAVIORAL EQUIVALENCE

Suppose probabilistic sequential machines A and A' are behaviorally equivalent in an intuitive sense. Taking into consideration how machines are built and repaired, one would expect them to be interchangeable as submachines of any larger machine. Indistinguishability of two machines in any machine into which they can be plugged is a strong criterion, the ramifications of which will be investigated. The following example of Arnold (1964) illustrates how the notion of distribution equivalence, $\equiv_D$, fails to meet the interchangeability requirement.

A. EXAMPLE OF TWO DISTRIBUTION EQUIVALENT MACHINES WHICH ARE NOT INTERCHANGEABLE AS COMPONENTS OF A MACHINE
$$A_1 = \langle I_1, A_1(0), A_1(1), F_1\rangle, \qquad A_2 = \langle I_2, A_2(0), A_2(1), F_2\rangle,$$
where $I_2 = I_1 = (1, 0, 0, 0, 0)$, $F_2 = F_1$, and $A_1(0) = A_1(1)$ and $A_2(0) = A_2(1)$ are $5 \times 5$ stochastic matrices (their entries, and those of $F_1$, are not legible in this copy).
Machines $A_1$ and $A_2$ happen to be independent of the input, i.e., they are Markov processes, since $A_1(0) = A_1(1)$ and $A_2(0) = A_2(1)$. Table I establishes that $A_1 \equiv_D A_2$.

TABLE I. COMPARISON OF MACHINES $A_1$ AND $A_2$

x                     I1A1(x)            E_A1(x)   I2A2(x)            E_A2(x)
Λ                     (1, 0, 0, 0, 0)    0         (1, 0, 0, 0, 0)    0
0 or 1                (0, ½, ½, 0, 0)    ½         (0, ½, ½, 0, 0)    ½
00, 01, 10, or 11     (0, 0, 0, ½, ½)    ½         (0, 0, 0, ½, ½)    ½
all x: lg(x) ≥ 3      (0, 0, 0, 0, 1)    0         (0, 0, 0, 0, 1)    0

Later a machine will be shown which behaves differently with $A_1$ as a submachine than it does with $A_2$ as a submachine, even though the state behaviors of $A_1$ and $A_2$ are Markov processes.
DEFINITION 3.1. $A \to B$ denotes the machine obtained by plugging the output of A into the input of B, subject to the provision that the input symbols of B include the output symbols of A.
DEFINITION 3.2. The set of tapes accepted by machine A with cutpoint $\lambda$, written $T(A, \lambda)$, is $T(A, \lambda) = \{x : E_A(x) \ge \lambda\}$.
DEFINITION 3.3. A and A' are tape equivalent machines, written $A \equiv_T A'$, if for some specified $\lambda_1$ and $\lambda_2$, $T(A, \lambda_1) = T(A', \lambda_2)$.
DEFINITION 3.4. A and A' are tape indistinguishable for a class $\mathcal{C}$ of machines if $T(A \to C, \lambda) = T(A' \to C, \lambda)$ for all $\lambda$ and $C \in \mathcal{C}$. The class $\mathcal{C}$ could be something more special than finite deterministic or probabilistic automata, e.g., the class of definite automata.
THEOREM 3.1. If probabilistic sequential machines A and A' are distribution equivalent, they are not necessarily tape-indistinguishable for the
class of finite deterministic automata.
Proof (by example): Let C be a finite deterministic machine which accepts 01 and 10 with probability 1 and all other tapes with probability 0. We tabulate the expectations of $A_1 \to C$ and $A_2 \to C$ in Table II. Hence $T(A_1 \to C, \lambda) \neq T(A_2 \to C, \lambda)$ for any $\lambda \in (\tfrac12, \tfrac14)$. The reason for this difference is that the conditional probabilities of the output random variables differ for $A_1$ and $A_2$. For example,
$$\text{Prob}\{O_{A_1}^*(01) = 1\} = 1 \text{ given } O_{A_1}^*(1) = 0, \quad \text{while} \quad \text{Prob}\{O_{A_2}^*(01) = 1\} = \tfrac12 \text{ given } O_{A_2}^*(1) = 0.$$

TABLE II. EXPECTATION OF $A_1 \to C$ AND $A_2 \to C$ FOR STRINGS x OF LENGTH 2

y     P^A1(y/x)   P^A2(y/x)
00    0           ¼
01    ½           ¼
10    ½           ¼
11    0           ¼

(The columns giving $E_{A_1 \to C}(x)$ and $E_{A_2 \to C}(x)$ are not legible in this copy.)

THEOREM 3.2. For probabilistic sequential machines A and A', if for all finite deterministic machines C and any cutpoint $\lambda$, $T(A \to C, \lambda) = T(A' \to C, \lambda)$, then $A \equiv_E A'$.
Proof: Suppose $E_A(x) \neq E_{A'}(x)$ for some tape $x$ of length $k$. Without loss of generality choose $E_A(x) > E_{A'}(x)$. Since the rationals are dense in the reals, let $\lambda_C$ be a rational such that $E_A(x) > \lambda_C > E_{A'}(x)$. Let C be
a deterministic machine which, beginning at time $k$, computes the number $i_k - \lambda_C$, where $i_k$ is the input at time $k$. Since $\lambda_C$ is rational, C needs only a finite number of states. C accepts the string $x$ iff $i_k - \lambda_C \ge 0$, which can be determined in a finite number of steps. For any machine B,
$$x \in T(B \to C, \lambda_C) \iff E_{B \to C}(x) \ge \lambda_C,$$
but since C is deterministic,
$$x \in T(B \to C, \lambda_C) \iff E_B(x) \ge \lambda_C.$$
Hence, letting $B = A$ and $B = A'$,
$$x \in T(A \to C, \lambda_C) \quad \text{and} \quad x \notin T(A' \to C, \lambda_C),$$
so $T(A \to C, \lambda_C) \neq T(A' \to C, \lambda_C)$. By contraposition we have shown, for the class $\mathcal{C}$ of finite deterministic machines,
$$(\lambda)(C)[T(A \to C, \lambda) = T(A' \to C, \lambda)] \Rightarrow (x)[E_A(x) = E_{A'}(x)]. \qquad \text{Q.E.D.}$$
By the example presented in Theorem 3.1 we know the converse is not true.

B. A MORE SATISFACTORY TECHNICAL NOTION OF INDISTINGUISHABILITY

The example at the beginning of this section shows that machine equivalences such as distribution equivalence, $\equiv_D$, fail to have the substitution property with respect to the composition of machines. To obtain a more satisfactory definition of behavioral equivalence, the conditional probability structure of probabilistic sequential machines must be explored. A stronger concept of equivalence, called indistinguishability, based upon equality for the two machines of the probabilities of all possible output strings given all possible input strings, will be formulated, following the development of Carlyle (1961). In what follows it is assumed that $\Sigma$ contains a symbol $\Lambda$ such that $A(\Lambda) = E(n)$, the $n$-dimensional identity matrix, so that the output from the initial state can be ignored.
DEFINITION 3.5. The conditional probability of a sequence of outputs $y = y_1 y_2 \cdots y_m$ given a sequence of inputs $x = x_1 \cdots x_m$, starting from an initial distribution $\Pi = (\Pi_1, \Pi_2, \dots, \Pi_n)$ of a machine A, will be written $P_\Pi^A(y/x)$ or, if the machine involved is clear from context, just $P_\Pi(y/x)$. Table II shows how machines $A_1$ and $A_2$ differ with respect to Definition 3.5. The symbols of the output alphabet are real numbers which occur as components of the output column vector $F$, i.e., the output alphabet $Y$ can be written
$$Y = \bigcup_{i=1}^{n} \{F_i\}.$$
As usual, the set of all finite sequences of symbols from $Y$ will be denoted by $Y^*$.
DEFINITION 3.6. The probability of a sequence of transitions $S_i \to S_j \to \cdots \to S_{j'}$ with output sequence $y$ because of input sequence $x$ will be written $P_{S_i \to \cdots \to S_{j'}}(y/x)$.
DEFINITION 3.7. The conditional probability transition matrix $A(y_i/\sigma)$ is formed from $A(\sigma)$ by zeroing out all columns except those corresponding to states with output $y_i$. More formally, let
$$J_{y_i} = \{j : F_j = y_i\}, \qquad y_i \in Y,$$
and let $Q^{y_i}$ be the matrix with $[Q^{y_i}]_{jj} = 1$ for $j \in J_{y_i}$ and all other entries 0. Then
$$A(y_i/\sigma) = A(\sigma) Q^{y_i}, \qquad y_i \in Y,\; \sigma \in \Sigma.$$
Note that $[A(y_k/\sigma)]_{ij}$ is just $P_{S_i \to S_j}(y_k/\sigma)$.
Remark 3.6. Let $y \in Y^*$, $x \in \Sigma^*$, $y_i \in Y$, $\sigma \in \Sigma$ such that $\lg(y) = \lg(x)$. Then
$$A(y y_i / x\sigma) = A(y/x) A(y_i/\sigma).$$
By definition $[A(y y_i/x\sigma)]_{lm}$ is $P_{S_l \to S_m}(y y_i/x\sigma)$. For any state $S_k$,
$$P_{S_l \to S_k \to S_m}(y y_i/x\sigma) = P_{S_l \to S_k}(y/x)\, P_{S_k \to S_m}(y_i/\sigma).$$
Since transitions to different states $S_k$ are mutually exclusive events,
$$P_{S_l \to S_m}(y y_i/x\sigma) = \sum_{k=1}^{n} P_{S_l \to S_k}(y/x)\, P_{S_k \to S_m}(y_i/\sigma).$$
Using the definitions again,
$$[A(y y_i/x\sigma)]_{lm} = \sum_{k=1}^{n} [A(y/x)]_{lk} [A(y_i/\sigma)]_{km},$$
or in matrix form
$$A(y y_i/x\sigma) = A(y/x) A(y_i/\sigma).$$
Hence the conditional probability transition matrices for output strings given input strings can be generated by the conditional probability transition matrices for output symbols given input symbols, analogous to the case for the transition matrices $A(x)$.
Remark 3.7. Given an initial distribution over states $\Pi$, the probability of getting output string $y$ from input string $x$ is just
$$P_\Pi^A(y/x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \Pi_i [A(y/x)]_{ij}.$$
With $U = (1, 1, \dots, 1)^T$ we can write $P_\Pi^A(y/x) = \Pi A(y/x) U$.
Remark 3.8. We note the following identity:
$$P_\Pi^A(y/x) = \sum_{y_i \in Y} P_\Pi^A(y y_i/x\sigma) \qquad \text{for all } \sigma \in \Sigma,$$
since
$$\sum_{y_i \in Y} P_\Pi^A(y y_i/x\sigma) = \sum_{y_i \in Y} \Pi A(y/x) A(y_i/\sigma) U = \Pi A(y/x) \sum_{y_i \in Y} A(y_i/\sigma)\, U = \Pi A(y/x) A(\sigma) U.$$
But for any $n \times n$ stochastic matrix $C$, $CU = U$. Hence
$$\Pi A(y/x) A(\sigma) U = \Pi A(y/x) U = P_\Pi^A(y/x).$$
DEFINITION 3.8. The terminal distribution $\Pi^*(y/x)$ for a sequence of outputs $y$ given inputs $x$ (assuming $P_\Pi^A(y/x) > 0$) is
$$\Pi^*(y/x) = \frac{\Pi A(y/x)}{\Pi A(y/x) U}.$$
The $i$th component of $\Pi^*(y/x)$ is the probability of being in state $i$ after input string $x$ has occurred and output string $y$ has been observed. The following identity holds whenever $P_\Pi^A(y/x) > 0$:
$$P_\Pi^A(y y_i/x\sigma) = P_\Pi^A(y/x)\, P_{\Pi^*(y/x)}^A(y_i/\sigma), \qquad y_i \in Y,\; \sigma \in \Sigma,\; x \in \Sigma^*,\; y \in Y^*.$$
DEFINITION 3.9. Machines A and A' are indistinguishable, written $A \equiv_I A'$, if
$$P_\Pi^A(y/x) = P_{\Pi'}^{A'}(y/x), \qquad \forall x \in \Sigma^*,\; \forall y \in Y^*.$$
The concept of indistinguishability for machines depends on observable identity when both machines are started from their initial state distributions.
DEFINITION 3.10. Machines A and A' are k-indistinguishable if
$$P_\Pi^A(y/x) = P_{\Pi'}^{A'}(y/x), \qquad x \in \Sigma^m,\; y \in Y^m, \text{ for } m = 0, 1, \dots, k.$$
DEFINITION 3.11. In a machine A, two initial state distributions $\Pi$ and $\Pi'$ are indistinguishable if
$$P_\Pi^A(y/x) = P_{\Pi'}^A(y/x), \qquad \forall y \in Y^*,\; \forall x \in \Sigma^*.$$
DEFINITION 3.12. In a machine A, two initial state distributions $\Pi$ and $\Pi'$ are k-indistinguishable if
$$P_\Pi^A(y/x) = P_{\Pi'}^A(y/x), \qquad \forall x \in \Sigma^k,\; \forall y \in Y^k.$$
Checking whether the indistinguishability definition for machines (3.9) or for initial distributions (3.11) holds, using only the definitions, involves the calculation of an unbounded sequence of conditional probabilities. The next section gives a bound on the length of the strings whose probabilities need to be calculated: if $n$ is the number of states, then only strings of length $n - 1$ or less need be considered in establishing indistinguishability.
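Definitions 3.5 to 3.7 and Remarks 3.6 to 3.8 translate directly into matrix code. The sketch below assumes a machine is given as a dict from input symbols to stochastic matrices (lists of rows) plus an output vector `F`. The two five-state Markov machines are our own hypothetical illustration in the spirit of Arnold's example: they pass through the same state distributions at every step, but they are not claimed to be the matrices of the original. The code computes $P_\Pi^A(y/x) = \Pi A(y/x) U$ and shows that the two machines assign different conditional probabilities to the output string 01.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Hypothetical input-independent (Markov) machines with five states.
STEP1 = [[0, half, half, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
STEP2 = [[0, half, half, 0, 0], [0, 0, 0, half, half], [0, 0, 0, half, half],
         [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
A1 = {"0": STEP1, "1": STEP1}
A2 = {"0": STEP2, "1": STEP2}
F = [0, 0, 1, 1, 0]    # output attached to each state
PI = [1, 0, 0, 0, 0]   # initial distribution


def cond_matrix(A_sigma, F, y):
    """A(y/sigma) = A(sigma) Q^y: zero every column j with F_j != y (Definition 3.7)."""
    return [[a if F[j] == y else 0 for j, a in enumerate(row)] for row in A_sigma]


def prob(Pi, A, F, y, x):
    """P_Pi^A(y/x) = Pi A(y_1/x_1) ... A(y_m/x_m) U (Remarks 3.6 and 3.7)."""
    v = list(Pi)
    for y_i, sigma in zip(y, x):
        M = cond_matrix(A[sigma], F, y_i)
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(v))]
    return sum(v)  # multiplying by U = (1, ..., 1)^T sums the components


print(prob(PI, A1, F, (0, 1), "00"), prob(PI, A2, F, (0, 1), "00"))  # 1/2 1/4
```

Representing $A(y_i/\sigma)$ as a column-masked copy of $A(\sigma)$ keeps the code a literal transcription of Definition 3.7; summing the masked matrices over $y_i \in Y$ recovers $A(\sigma)$, which is the step used in Remark 3.8.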
C. THE RELATIONSHIP BETWEEN THE INTUITIVE AND TECHNICAL CONCEPTS OF INDISTINGUISHABILITY

We have yet to relate the intuitive notion of indistinguishability to the technical Definition 3.9. The next theorem shows that two machines indistinguishable in the technical sense are indeed indistinguishable when plugged into C, any finite state probabilistic or deterministic machine. Since C has a finite number of states, it is assumed that finite strings of $Z = C(Y^*)$, the random variable taking on values of strings of outputs of C given strings of inputs from the random variable $Y$, depend only on finite strings of $Y^*$.
THEOREM 3.4. Let $\mathcal{C}^*$ be the class of finite state probabilistic and deterministic sequential machines. For any $C \in \mathcal{C}^*$, if
$$A \equiv_I A'$$
then
$$A \to C \equiv_I A' \to C.$$
Proof: For any fixed value $y$ of the output string random variable $Y_A$ of A,
$$P_\Pi^{A \to C}(z = C(y)/x) = P_\Pi^A(y/x)\, P^C(z = C(y)/y).$$
Since the occurrences of different $y$ are disjoint events, for all $y \in Y^*$ with $\lg(y) = \lg(x)$,
$$P_\Pi^{A \to C}(z/x) = \sum_{y \in Y^{\lg(x)}} P_\Pi^A(y/x)\, P^C(z = C(y)/y).$$
Since $Z$ and $Z'$ range over the same set, and by the indistinguishability of A and A', $P_\Pi^A(y/x) = P_{\Pi'}^{A'}(y/x)$. So for all $x \in \Sigma^*$ and all $z \in Y_C^*$,
$$P_\Pi^{A \to C}(z/x) = P_{\Pi'}^{A' \to C}(z/x),$$
which means $A \to C$ and $A' \to C$ are indistinguishable. Q.E.D.
The criterion of interchangeability as a submachine has led us to $\equiv_I$ as a behavioral equivalence for probabilistic sequential machines. The equivalence $\equiv_I$ is well known as an equivalence between communication channels. The other kinds of equivalences discussed are equally valid for channels with numerically coded outputs. The relationship between the equivalences $\equiv_E$, $\equiv_T$, $\equiv_N$, $\equiv_D$, and $\equiv_I$ can be summarized in the following schematic way:
$$A \equiv_D A' \Rightarrow A \equiv_N A' \Rightarrow A \equiv_E A' \Rightarrow A \equiv_T A'$$
$$A \equiv_I A' \Rightarrow A \equiv_N A'$$
As we have seen in previous chapters, the concepts of behavioral equivalence for probabilistic machines analogous to those of deterministic machine theory depend on the device being modeled. Consequently, applications of probabilistic sequential machines to new domains are likely to suggest new kinds of behavioral equivalences.

IV. FINITE COMPLETE SETS OF INVARIANTS FOR THE BEHAVIORAL EQUIVALENCES $\equiv_E$, $\equiv_N$, AND $\equiv_I$ AND THE REDUCTION CONGRUENCE RELATIONS $R_E$ AND $R_N$

The results of the previous sections involve relations defined over all finite strings of the input alphabet. In this section bounds are found for the length of the strings that must be considered in order to decide whether two elements of the domain of a relation are in the same class.
DEFINITION 4.1. A set of functions $\mathcal{F} = \{f_1, \dots, f_m\}$ is a set of invariants for the relation $R$ if for all $x$ and $y$ in the domain of $R$
$$x R y \Rightarrow f_i(x) = f_i(y), \qquad i = 1, \dots, m.$$
The set of functions $\mathcal{F}$ is a complete set of invariants if $x R y \iff f_i(x) = f_i(y)$, $i = 1, \dots, m$.
We exhibit sets of functions which are invariants for the above relations. A set of functions which are invariant over $R_E$ and $R_N$ is:
$$f_{(I,z)}(x) = E_A(xz) \quad \text{for all } z : \lg(z) \le i, \text{ for all } I \in S,$$
$$f_{(I,r,z)}(x) = \mu_r^A(xz), \qquad r = 2, \dots, N,$$
while for the relation $\equiv_I$, the set of functions below is a set of invariants:
$$g_{(x,y)}(A) = P_\Pi^A(y/x) \quad \text{for all } x \text{ and } y,\; \lg(x) = \lg(y) \le i.$$
Likewise the set
$$h_{(x)}(A) = E_A(x) \quad \text{for all } x : \lg(x) \le i, \qquad h_{(x,r)}(A) = \mu_r^A(x) \quad \text{for } r = 2, \dots, N,\; \lg(x) \le i,$$
is a set of invariants for the relations $\equiv_E$ and $\equiv_N$.
It is clear that for an unbounded $i$, the above are complete sets of invariants. However, in what follows a finite value of $i$ will be found for each of these cases. In the case of $\equiv_I$ the bound will be the same as the well known Moore bound for deterministic automata, but in the case of $\equiv_E$ it will be lower for most machines. The main tool used in finding the various values of $i$ is the following simple lemma.

A. THE FUNDAMENTAL LEMMA

LEMMA 4.1. Given an $n$-dimensional vector space $V$, a finite set $T = \{T_i\}$ where each $T_i : V \to V$ is a linear transformation on $V$, and some finite set of vectors $V_0 \subset V$ such that $\dim \langle V_0 \rangle = r \ge 1$. Define
$$M_0 = V_0, \qquad M_1 = \{v_0 T_i : T_i \in T,\; v_0 \in V_0\}, \qquad M_k = \{v_0 T_{i_1} \cdots T_{i_k} : T_{i_1}, \dots, T_{i_k} \in T,\; v_0 \in V_0\},$$
and let $L_k = \langle M_0 \cup M_1 \cup \dots \cup M_k \rangle$. Then there exists an integer $J(T)$ such that
(i) $L_{J(T)} = L_{J(T)+1}$;
(ii) $J(T) \le n - r$.
Proof: $L_0 \subset L_1 \subset \dots \subset L_k \subset \cdots$ as a consequence of the definition. The sequence $\{\dim L_i\}_{i \ge 0}$ is bounded above by $n$, the dimension of $V$. Hence call $J(T)$ the smallest index such that
$$\dim(L_{J(T)}) = \dim(L_{J(T)+1});$$
since $L_{J(T)} \subset L_{J(T)+1}$, this gives part (i). We show that the sequence $\{\dim L_j\}_{j=0}^{J(T)}$ is strictly increasing. From the definition of $L_i$ we obtain
$$\dim(L_0) \le \dim(L_1) \le \dim(L_2) \le \dots \le \dim(L_{J(T)}) \le n.$$
Suppose $\dim(L_K) = \dim(L_{K+1})$ for some $K < J(T)$; then $J(T) \le K$ by the minimality of $J(T)$, which is a contradiction, i.e., we get
$$\dim(L_0) < \dim(L_1) < \dots < \dim(L_{J(T)}) \le n.$$
Noting that $\dim(L_0) = r$, we get $r + J(T) \le n$, which shows part (ii). Q.E.D.
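The index $J(T)$ of Lemma 4.1 can be computed by growing the spans $L_k$ until the dimension stops increasing. The sketch below is our own, assuming row vectors acted on from the right ($v T$) and exact rational Gaussian elimination. It uses the identity $L_{k+1} = L_k + L_k T$, which follows from the definitions: $M_{k+1} = M_k T$, so $L_k T$ spans $M_1, \dots, M_{k+1}$, and $L_0 \subset L_k$.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def basis_of(vectors):
    """Greedily keep the vectors that increase the rank (a basis of their span)."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis


def vec_mat(v, T):
    """Row vector v times matrix T."""
    return [sum(v[i] * T[i][j] for i in range(len(v))) for j in range(len(T[0]))]


def lemma_41_index(V0, Ts):
    """J(T): the least k with dim L_k == dim L_{k+1}."""
    L = basis_of(V0)  # basis of L_0
    k = 0
    while True:
        nxt = basis_of(L + [vec_mat(b, T) for b in L for T in Ts])  # basis of L_{k+1}
        if len(nxt) == len(L):
            return k
        L, k = nxt, k + 1


# Shift matrix on R^4 (e_i T = e_{i+1}): starting from e_1 the span grows by one per step.
SHIFT = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(lemma_41_index([[1, 0, 0, 0]], [SHIFT]))  # 3, i.e. n - r = 4 - 1
```

The shift example attains the bound of part (ii) exactly; with a larger starting set $V_0$ (larger $r$) the loop terminates sooner, which is the effect exploited in the proof of Theorem 4.2.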
B. A BOUND FOR TESTING FOR MEMBERSHIP IN $\equiv_I$

THEOREM 4.1. If A is a probabilistic sequential machine with $n$ states, then $(n-1)$-indistinguishability of initial distributions $\Pi$ and $\Pi'$ is sufficient to guarantee indistinguishability of initial distributions $\Pi$ and $\Pi'$.
Proof: Using Lemma 4.1, let
$$V_0 = \{U\}, \qquad T = \{A(y_i/\sigma) : y_i \in Y,\; \sigma \in \Sigma\}, \qquad V_0 \cdot T_i = A(y_i/\sigma)\, U,$$
so that $r = 1$ and $J(T) \le n - 1$ by the lemma. For any string $x = \sigma_1 \cdots \sigma_{r'}$, $r'$ finite, $A(y/x)U$ can be expressed as
$$A(y/x)\, U = \sum_{i \in I} c_i\, A(y_{\beta_1} \cdots y_{\beta_{r_i}} / \sigma_{j_1} \cdots \sigma_{j_{r_i}})\, U \qquad (*)$$
with $r_i \le n - 1$ for $i \in I$, $y_{\beta_k} \in Y$, $\sigma_{j_k} \in \Sigma$ (for $k = 1, \dots, r_i$). Let
$$y^i = y_{\beta_1} \cdots y_{\beta_{r_i}} \quad \text{and} \quad x^i = \sigma_{j_1} \cdots \sigma_{j_{r_i}}, \qquad \lg(y^i) = \lg(x^i) \le n - 1.$$
Hence, multiplying $(*)$ by the initial distributions $\Pi$ and $\Pi'$ gives
$$P_\Pi^A(y/x) = \Pi A(y/x)\, U = \sum_{i \in I} c_i\, \Pi A(y^i/x^i)\, U = \sum_{i \in I} c_i\, P_\Pi^A(y^i/x^i),$$
$$P_{\Pi'}^A(y/x) = \sum_{i \in I} c_i\, P_{\Pi'}^A(y^i/x^i).$$
By the assumption of $(n-1)$-indistinguishability for $\Pi$ and $\Pi'$,
$$P_\Pi^A(y^i/x^i) = P_{\Pi'}^A(y^i/x^i), \qquad \lg(x^i) = \lg(y^i) \le n - 1.$$
Hence $P_\Pi^A(y/x) = P_{\Pi'}^A(y/x)$.
Q.E.D.
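Theorem 4.1 turns indistinguishability of initial distributions into a finite check. The sketch below assumes the same dict-of-matrices representation as in Section III and a hypothetical five-state Markov machine of our own. It compares $P_\Pi^A(y/x)$ and $P_{\Pi'}^A(y/x)$ for every pair with $\lg(x) = \lg(y) \le n - 1$, which by the theorem decides indistinguishability. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
STEP = [[0, half, half, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
A = {"0": STEP, "1": STEP}   # hypothetical input-independent machine
F = [0, 0, 1, 1, 0]          # output attached to each state


def prob(Pi, A, F, y, x):
    """P_Pi^A(y/x) = Pi A(y_1/x_1) ... A(y_m/x_m) U, masking columns with F_j != y_i."""
    v = list(Pi)
    for y_i, sigma in zip(y, x):
        M = A[sigma]
        v = [sum(v[i] * M[i][j] for i in range(len(v))) if F[j] == y_i else 0
             for j in range(len(v))]
    return sum(v)


def indistinguishable(A, F, Pi, Pi2):
    """Theorem 4.1: it suffices to compare all (x, y) with lg(x) = lg(y) <= n - 1."""
    n = len(F)
    symbols, outputs = sorted(A), sorted(set(F))
    for m in range(n):  # string lengths 0, 1, ..., n - 1
        for x in product(symbols, repeat=m):
            for y in product(outputs, repeat=m):
                if prob(Pi, A, F, y, x) != prob(Pi2, A, F, y, x):
                    return False
    return True


def unit(i, n=5):
    """Distribution concentrated on state i."""
    return [1 if j == i else 0 for j in range(n)]


print(indistinguishable(A, F, unit(3), unit(4)))  # True: both emit only 0 from then on
print(indistinguishable(A, F, unit(1), unit(2)))  # False: the first outputs differ
```

Because the output from the initial state is ignored (the $\Lambda$ convention of Section III), states 4 and 5 of this machine are indistinguishable even though their own outputs differ.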
512
P~Gn
C. EQUIVALENCE OF DISTRIBUTIONS IN 0N'E ~LkCHINE
Using Lemma 4.1, we can make effective the definition of the relations R~ and R~ of Section II. A hound will be obtained on the lengths of strings needed for deciding whether x and y are in the same congruence class. DEFINITION 4.2. Distributions II and II' are expeclation equivalent for a machine A, written " II,~' a II' , if I I A ( x ) F = II'A(x)F, V x E ~*. DEFINITION 4.3. Distributions II and II J are If-expectation equivalent for a machine A, written 11 ~ II ~, if
HA(x)F = II'A(x)F
x E 2~*:0 _-< lg. (x) _-< K
T H n o n m l 4.2. (Generalization of the result of Paz (1964)). I f A is a probabilislic sequential machine with n slates and if II and H' are n -- 2 equivalent dislributions of A then
II ~ II' Proof: We use the elementary fact that for any constant c H A ( x ) F = H A ( x ) F ¢=~I I A ( x ) F + c = I I ' A ( x ) F + c
V x E Z*
Since HA (x) and II'A(x) are stochastic
If all the entries of F are equal, then all distributions are expectation equivalent. If at least two of the entries of F are not equal, then (F) ~
F A- c
F ' = F -t- c
(i))
. Let A' be a machine differing from A only in that
. Suppose we experiment with A and A ~ simultaneously.
No new information is obtained, i.e., distributions are expectation equivalent for A iff they are expectation equivalent for A ~. However, if we compute a bound for the two machine experiment using Lemma 4.1, the bound will be lower than would have been obtained from an experiment on A alone. Since the results of the two experiments are identical, the lower bound applies to A also.
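Let
$$T = \{A(\sigma) : \sigma \in \Sigma\}, \qquad v_0 T_\sigma = A(\sigma) v_0,$$
and take $V_0 = \{F, F'\}$. By Lemma 4.1, there is a finite set $J$ of indices of vectors $A(x_j)V^j$ with $V^j \in V_0$ and $\lg(x_j) \le n - 2$ such that for an arbitrary $x \in \Sigma^*$ there are constants $c_j$ with
$$A(x) F = \sum_{j \in J} c_j A(x_j) V^j,$$
which, since $A(x_j) F' = A(x_j) F + c(1)$, reduces to
$$A(x) F = \sum_{j \in J} c_j A(x_j) F + c'(1).$$
Multiplying by the initial distributions gives
$$\Pi A(x) F = \sum_{j \in J} c_j\, \Pi A(x_j) F + c', \qquad \Pi' A(x) F = \sum_{j \in J} c_j\, \Pi' A(x_j) F + c'.$$
$(n-2)$-expectation equivalence gives $\Pi A(x_j) F = \Pi' A(x_j) F$ for $j \in J$, so
$$\Pi A(x) F = \Pi' A(x) F. \qquad \text{Q.E.D.}$$

D. BOUNDS FOR TESTING FOR MEMBERSHIP IN $\equiv_E$ AND $R_E$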
DEFINITION 4.4. The abstract join of probabilistic sequential machines
$$A = (n, \Pi, A(0), \ldots, A(k-1), F) \quad \text{and} \quad A' = (n', \Pi', A'(0), \ldots, A'(k-1), F')$$
is the abstract $(n+n')$-state machine $A^e$, written
$$A^e = A \oplus A' = (n + n', \cdot\,, A^e(0), \ldots, A^e(k-1), F^e),$$
where
$$A^e(i) = \begin{pmatrix} A(i) & 0 \\ 0 & A'(i) \end{pmatrix}$$
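and
$$F^e = \begin{pmatrix} F \\ F' \end{pmatrix}.$$
$\Pi$ and $\Pi'$ can be embedded in the $(n+n')$-dimensional space as
$$\Pi^e = (\Pi, \underbrace{0, \ldots, 0}_{n' \text{ zeros}}), \qquad \Pi'^e = (\underbrace{0, \ldots, 0}_{n \text{ zeros}}, \Pi').$$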
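Definition 4.4 is a block-diagonal (direct-sum) construction. The minimal sketch below uses the same hypothetical dictionary-of-matrices representation as before; it builds $A^e(i)$, $F^e$ and the embedded distributions $\Pi^e$, $\Pi'^e$, and checks on one string that $\Pi^e A^e(x) F^e$ reproduces $\Pi A(x) F$ and $\Pi' A'(x) F'$. The function names `join`, `embed` and `expectation` are illustrative.

```python
from fractions import Fraction


def join(A, F, A2, F2):
    """Abstract join A (+) A': block-diagonal symbol matrices and stacked output vector."""
    n, n2 = len(F), len(F2)
    Ae = {}
    for sym in A:
        top = [list(row) + [0] * n2 for row in A[sym]]      # [A(i) | 0]
        bottom = [[0] * n + list(row) for row in A2[sym]]   # [0 | A'(i)]
        Ae[sym] = top + bottom
    return Ae, list(F) + list(F2)


def embed(pi, n, n2, first):
    """Pi^e = (Pi, 0, ..., 0) for a distribution of A, or (0, ..., 0, Pi') for one of A'."""
    return list(pi) + [0] * n2 if first else [0] * n + list(pi)


def expectation(A, F, pi, x):
    """E(x) = pi A(x) F, propagating the row vector pi through the symbols of x."""
    row = [Fraction(p) for p in pi]
    for sym in x:
        M = A[sym]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(M[0]))]
    return sum(r * f for r, f in zip(row, F))


A = {0: [[0, 1], [0, 1]], 1: [[1, 0], [0, 1]]}
F = [0, 1]
A2 = {0: [[Fraction(1, 2), Fraction(1, 2)], [0, 1]], 1: [[1, 0], [0, 1]]}
F2 = [0, 1]
Ae, Fe = join(A, F, A2, F2)
x = (0, 1, 0)
print(expectation(Ae, Fe, embed([1, 0], 2, 2, True), x) == expectation(A, F, [1, 0], x))
print(expectation(Ae, Fe, embed([1, 0], 2, 2, False), x) == expectation(A2, F2, [1, 0], x))
```

Because the blocks never interact, a distribution embedded in one block stays there, which is why the join lets two machines be compared as two distributions of a single machine.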
The problem of deciding whether two machines $A$ and $A'$ are expectation equivalent,
$$\Pi A(x) F = \Pi' A'(x) F' \qquad \forall x \in \Sigma^*,$$
is logically equivalent to deciding when $\Pi$ and $\Pi'$ are equivalent in $A \oplus A'$, i.e., whether $\Pi^e \sim^{A^e}_{E} \Pi'^e$. Hence, following Carlyle (1961), we use Theorem 4.2 to state

Remark 4.1. $\Pi^e \sim^{A^e}_{E} \Pi'^e \iff \Pi^e \sim^{A^e}_{KE} \Pi'^e$, where $K = n + n' - 2$,

which gives the following theorem.

THEOREM 4.3. Let $A$ and $A'$ be probabilistic sequential machines having $n$ and $n'$ states, respectively. Then
$$\big[\, A \equiv_E A' \iff E_A(x) = E_{A'}(x) \quad \forall x : \lg(x) \le n + n' - 2 \,\big].$$
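Theorem 4.3 turns $A \equiv_E A'$ into a finite check. The brute-force sketch below assumes both machines share the input alphabet and are given as hypothetical dictionaries of exact (integer or `Fraction`) matrices; it enumerates all strings of length at most $n + n' - 2$ and returns a distinguishing string if one exists. It is exponential in the bound and is meant only to illustrate the statement, not as an efficient decision procedure.

```python
from fractions import Fraction
from itertools import product


def expectation(A, F, pi, x):
    """E(x) = pi A(x) F for a machine given as {symbol: stochastic matrix}."""
    row = [Fraction(p) for p in pi]
    for sym in x:
        M = A[sym]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(M[0]))]
    return sum(r * f for r, f in zip(row, F))


def expectation_equivalent(A, F, pi, A2, F2, pi2):
    """Check E_A(x) == E_A'(x) for all x with lg(x) <= n + n' - 2 (Theorem 4.3)."""
    bound = len(F) + len(F2) - 2
    for length in range(bound + 1):
        for x in product(sorted(A), repeat=length):
            if expectation(A, F, pi, x) != expectation(A2, F2, pi2, x):
                return False, x          # a distinguishing string
    return True, None


# A: input 0 advances 1 -> 2 -> 3 -> 3, input 1 does nothing; output 1 only in state 3.
A = {0: [[0, 1, 0], [0, 0, 1], [0, 0, 1]],
     1: [[1, 0, 0], [0, 1, 0], [0, 0, 1]]}
F = [0, 0, 1]
# A': a 2-state machine whose output becomes 1 once any 0 has been read.
A2 = {0: [[0, 1], [0, 1]], 1: [[1, 0], [0, 1]]}
F2 = [0, 1]
print(expectation_equivalent(A, F, [0, 1, 0], A2, F2, [1, 0]))  # (True, None)
print(expectation_equivalent(A, F, [1, 0, 0], A2, F2, [1, 0]))  # (False, (0,))
```

Started in its second state, the 3-state machine behaves exactly like the 2-state one, so no string of length at most $3 + 2 - 2 = 3$ separates them, and by Theorem 4.3 no longer string does either.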
Theorem 4.3 makes the experimental determination of expectation equivalence possible provided the number of states of each machine is known. Furthermore, it gives a bound on the process of finding whether two strings are in the same equivalence class under the reduction relation $R_E$ of Section I. This result is summarized in the following theorem.

THEOREM 4.4. Strings $x$ and $y$ are in the same equivalence class under the reduction relation $R_E$ of an $n$-state probabilistic sequential machine $A$ if and only if $E_A(xz) = E_A(yz)$ for all strings $z$ with $\lg(z) \le n - 2$ and all $I \in S$.
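Theorem 4.4 gives a similar finite test for $R_E$. The sketch below assumes that the set $S$ of initial distributions can be supplied as a finite list; when $S$ is the set of all distributions on the states, the $n$ unit vectors suffice, because $E_A(xz) = I A(x) A(z) F$ is linear in $I$. The name `same_re_class` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expectation(A, F, pi, x):
    """E(x) = pi A(x) F for a machine given as {symbol: stochastic matrix}."""
    row = [Fraction(p) for p in pi]
    for sym in x:
        M = A[sym]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(M[0]))]
    return sum(r * f for r, f in zip(row, F))


def same_re_class(A, F, S, x, y):
    """Theorem 4.4: x R_E y iff E_A(xz) == E_A(yz) for all I in S and all z, lg(z) <= n - 2."""
    n = len(F)
    for I in S:
        for length in range(max(n - 2, 0) + 1):
            for z in product(sorted(A), repeat=length):
                if expectation(A, F, I, tuple(x) + z) != expectation(A, F, I, tuple(y) + z):
                    return False
    return True


A = {0: [[0, 1, 0], [0, 0, 1], [0, 0, 1]],
     1: [[1, 0, 0], [0, 1, 0], [0, 0, 1]]}
F = [0, 0, 1]
S = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]           # unit vectors stand in for all of S
print(same_re_class(A, F, S, (0, 0), (0, 0, 0)))  # True: both strings lead to state 3
print(same_re_class(A, F, S, (0,), (1,)))         # False
```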
Proof:
$$x R_E y \iff E_A(xz) = E_A(yz) \qquad \text{for all } z \in \Sigma^*,\ \text{for all } I \in S$$
$$\iff I A(x) A(z) F = I A(y) A(z) F.$$
Let $\Pi = I A(x)$ and $\Pi' = I A(y)$. Then
$$\iff \Pi A(z) F = \Pi' A(z) F \qquad \forall z \in \Sigma^*.$$
By Theorem 4.2 and its obvious converse, we get
$$I A(x) \sim^{A}_{E} I A(y) \iff I A(x) \sim^{A}_{(n-2)E} I A(y),$$
which gives the theorem.

E. BOUNDS FOR TESTING FOR MEMBERSHIP IN $\equiv_N$ AND $R_N$
DEFINITION 4.5. $n_F$, the independence number of an $n$-state machine $A$ with output vector $F$, is
$$n_F = \dim\big(\langle\{(F^i) : i = 1, 2, \ldots, n\}\rangle\big).$$
It follows from vector space arguments that
$$n_F = \#\{F_k : F_k \ne 0\},$$
where $\#$ is the cardinality operator on sets (so $n_F$ is the number of distinct nonzero entries of $F$).

The independence number is just the dimension of the space generated by powers of the components of the output vector $F$. For a Rabin automaton $n_F = 1$, and all central moments reduce to polynomials in what we may consider the first "central moment" $E_A(x)$. In general, if the independence number is $n_F$, then for all $x$ in $\Sigma^*$ the $(n_F+1)$st central moment $\mu^A_{n_F+1}(x)$ reduces to a polynomial in the lower central moments, since
$$\mu^A_{n_F+1}(x) = I A(x)(F^{n_F+1}) + Q(x),$$
where $Q(x)$ is a polynomial in which the $I A(x)(F^i)$, $i = 1, \ldots, n_F$, occur. Hence, since $n_F$ is the dimension of the space $\langle (F^i) : i = 1, 2, \ldots, n \rangle$,
$$\mu^A_{n_F+1}(x) = I A(x) \sum_{i=1}^{n_F} c_i (F^i) + Q(x) = \sum_{i=1}^{n_F} c_i\, I A(x)(F^i) + Q(x).$$
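The identity $n_F = \#\{F_k : F_k \ne 0\}$ and the existence of the constants $c_i$ can be checked numerically. The sketch below computes the rank of $(F), (F^2), \ldots, (F^n)$ over the rationals, compares it with the number of distinct nonzero entries, and recovers $c_1, \ldots, c_{n_F}$ with $(F^{n_F+1}) = \sum_i c_i (F^i)$. The coefficient formula is a standard Vandermonde observation rather than something stated in the paper: each distinct nonzero value $v$ satisfies $v^{m} = \sum_i c_i v^{i-1}$, so the $c_i$ are read off from $\prod_v (t - v)$. All helper names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over the rationals, by Gaussian elimination on a copy of the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def independence_number(F):
    """n_F = dim span{(F^i) : i = 1..n}, with (F^i) the componentwise i-th power of F."""
    n = len(F)
    return rank([[Fraction(f) ** i for f in F] for i in range(1, n + 1)])


def distinct_nonzero(F):
    """#{F_k : F_k != 0}: the number of distinct nonzero output values."""
    return len({Fraction(f) for f in F if f != 0})


def next_power_coefficients(F):
    """Constants c_1..c_m (m = n_F) with (F^(m+1)) = sum_i c_i (F^i).

    prod over distinct nonzero v of (t - v) equals t^m - sum_i c_i t^(i-1).
    """
    poly = [Fraction(1)]                       # coefficients, lowest degree first
    for v in sorted({Fraction(f) for f in F if f != 0}):
        new = [Fraction(0)] * (len(poly) + 1)
        for k, a in enumerate(poly):           # multiply the polynomial by (t - v)
            new[k] -= v * a
            new[k + 1] += a
        poly = new
    m = len(poly) - 1
    return [-poly[k] for k in range(m)]


F = [0, 1, 1, 2]
print(independence_number(F), distinct_nonzero(F))   # 2 2
c = next_power_coefficients(F)                        # F^3 = c[0] F + c[1] F^2
print(all(f ** 3 == c[0] * f + c[1] * f ** 2 for f in F))  # True
```

A zero entry of $F$ contributes nothing to any power, which is why only the nonzero output values are counted.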
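THEOREM 4.5. Let $A$ be a probabilistic sequential machine with output vector $F$ and $n$ states. Then for any $r \le n_F$ and strings $x$ and $y$ in $\Sigma^*$:
$$\left\{\begin{aligned} E_A(xz) &= E_A(yz) \\ \mu^A_2(xz) &= \mu^A_2(yz) \\ &\;\;\vdots \\ \mu^A_r(xz) &= \mu^A_r(yz) \end{aligned}\right\} \ \forall z \in \Sigma^* \iff \left\{\begin{aligned} E_A(xz') &= E_A(yz') \\ \mu^A_2(xz') &= \mu^A_2(yz') \\ &\;\;\vdots \\ \mu^A_r(xz') &= \mu^A_r(yz') \end{aligned}\right\} \ \forall z' : \lg(z') \le n - r - p,$$
where $p = 0$ if $(1) \in \langle\{F, (F^2), \ldots, (F^r)\}\rangle$, and $p = 1$ otherwise.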
Proof: Let
$$V_0 = \{(1), F, (F^2), \ldots, (F^r)\},$$
where $(1)$ is the column vector all of whose entries are 1, as in the proof of Theorem 4.2. The vectors $(F^i)$, $i = 1, 2, \ldots, r$, are linearly independent since $r \le n_F$. Hence the dimension of $\langle V_0 \rangle$ is either $r$ or $r + 1$. We will call it $r + p$, where $p$ is defined in the statement of the theorem:
$$\dim\langle V_0 \rangle = r + p, \qquad r \le n_F.$$
Let $\{T_i\} = \{A(i) : i \in \Sigma\}$. For any $v_0 \in \langle V_0 \rangle$ there exist constants $c_k$ such that (defining $(F^0) = (1)$)
$$v_0 T_i = A(i) v_0 = \sum_{k=0}^{r} c_k A(i)(F^k).$$
Consider any string $z$ with $\lg(z) = m'$ finite. By Lemma 4.1 there exist a finite spanning set $\{A(x_j)(F^{n_j}) : j \in J\}$, with $0 \le n_j \le r$ and $\lg(x_j) \le n - r - p$, and constants $c_j(v_0)$ so that
$$A(z) v_0 = \sum_{j \in J} c_j(v_0)\, A(x_j)(F^{n_j}).$$
Let $v_0$ range over the $(F^i)$, $i = p, \ldots, r$, and multiply by $\Pi$ and $\Pi'$:
$$\Pi A(z)(F^i) = \sum_{j \in J} c_j\big((F^i)\big)\, \Pi A(x_j)(F^{n_j}), \qquad \Pi' A(z)(F^i) = \sum_{j \in J} c_j\big((F^i)\big)\, \Pi' A(x_j)(F^{n_j}), \qquad n_j \le r.$$
Since $\Pi A(x_j)(F^{n_j}) = \Pi' A(x_j)(F^{n_j})$ by assumption (for $n_j = 0$ both sides equal 1, because $\Pi A(x_j)$ and $\Pi' A(x_j)$ are stochastic),
$$\Pi A(z)(F^i) = \Pi' A(z)(F^i).$$
That is, the moments about zero from $\Pi$ and $\Pi'$ are equal if they are equal for all strings of length $\le n - r - p$. Let $\Pi = I A(x)$ and $\Pi' = I A(y)$. Then, for any $z$ and any initial distribution $I$,
$$I A(xz)(F^i) = I A(yz)(F^i), \qquad i = p, \ldots, r,$$
holds if and only if, for $i = p, \ldots, r$,
$$I A(xz')(F^i) = I A(yz')(F^i)$$
for all strings $z'$ of length less than or equal to $n - r - p$. Noting by Theorem 2.4 (Eq. (2)) that any central moment $\mu^A_i(x)$ is a recursive function of $I A(x)(F), \ldots, I A(x)(F^i)$, the result is established. Q.E.D.

COROLLARY 4.5 (Bound for testing the relation $R_N$). Let $A$ be a probabilistic sequential machine with $n$ states and with $N \le n_F$. Then $x R_N y$ if and only if, for all strings $z'$ with $\lg(z') \le n - N - p$ and for all $I \in S$,
$$\left\{\begin{aligned} E_A(xz') &= E_A(yz') \\ \mu^A_2(xz') &= \mu^A_2(yz') \\ &\;\;\vdots \\ \mu^A_N(xz') &= \mu^A_N(yz'), \end{aligned}\right.$$
where $p = 0$ if $(1) \in \langle\{F, (F^2), \ldots, (F^N)\}\rangle$, and $p = 1$ otherwise.
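The bound of Corollary 4.5 depends on $F$ only through $N$ and the indicator $p$. The sketch below computes $p$ by a rank test (is $(1)$ in $\langle F, (F^2), \ldots, (F^N) \rangle$?) and the resulting length bound $n - N - p$. Since Theorem 2.4 is not reproduced in this section, the conversion from moments about zero $I A(x)(F^i)$ to central moments uses the standard binomial expansion $\mu_k = \sum_{j} \binom{k}{j} m_j (-m_1)^{k-j}$; treat that as an assumption about the form of Eq. (2). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rank(vectors):
    """Rank over the rationals, by Gaussian elimination on a copy of the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def p_value(F, N):
    """p = 0 if the all-ones vector lies in span{F, (F^2), ..., (F^N)}, else p = 1."""
    powers = [[Fraction(f) ** i for f in F] for i in range(1, N + 1)]
    ones = [Fraction(1)] * len(F)
    return 0 if rank(powers + [ones]) == rank(powers) else 1


def rn_length_bound(F, N):
    """Corollary 4.5: strings z' with lg(z') <= n - N - p suffice to test x R_N y."""
    return len(F) - N - p_value(F, N)


def central_moments(raw, N):
    """mu_2..mu_N from moments about zero raw[i] = I A(x)(F^i), with raw[0] = 1."""
    m = raw[1]
    return [sum(comb(k, j) * raw[j] * (-m) ** (k - j) for j in range(k + 1))
            for k in range(2, N + 1)]


print(p_value([1, 2, 2], 2), rn_length_bound([1, 2, 2], 2))  # 0 1
print(p_value([0, 1, 1], 1), rn_length_bound([0, 1, 1], 1))  # 1 1

# Moments about zero for a state distribution I A(x) = (1/2, 1/2, 0) and F = (1, 2, 2).
dist = [Fraction(1, 2), Fraction(1, 2), 0]
F = [1, 2, 2]
raw = [sum(d * Fraction(f) ** i for d, f in zip(dist, F)) for i in range(3)]
print(central_moments(raw, 2) == [Fraction(1, 4)])  # True: variance 5/2 - (3/2)^2
```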
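THEOREM 4.6 (Finite set of invariants for $\equiv_N$). Suppose $A$ and $A'$ are probabilistic sequential machines having $n$ and $n'$ states, respectively, where
$$N \le n_F + n_{F'} - \#\{y : y \in Y \cap Y' \text{ and } y \ne 0\},$$
and let $p = 0$ if $(1) \in \langle\{F^e, ((F^e)^2), \ldots, ((F^e)^N)\}\rangle$ for the output vector $F^e$ of $A \oplus A'$, and $p = 1$ otherwise. For any initial distribution $\Pi$ of $A$ and any initial distribution $\Pi'$ of $A'$:
$$\left\{\begin{aligned} E_A(x) &= E_{A'}(x) \\ \mu^A_2(x) &= \mu^{A'}_2(x) \\ &\;\;\vdots \\ \mu^A_N(x) &= \mu^{A'}_N(x) \end{aligned}\right\} \ \forall x \in \Sigma^* \iff \left\{\begin{aligned} E_A(x') &= E_{A'}(x') \\ \mu^A_2(x') &= \mu^{A'}_2(x') \\ &\;\;\vdots \\ \mu^A_N(x') &= \mu^{A'}_N(x') \end{aligned}\right\} \ \forall x' : \lg(x') \le n + n' - N - p.$$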
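Proof: Construct $A^e = A \oplus A'$ and let $V_0$ in Lemma 4.1 be
$$V_0' = \{(1), F^e, ((F^e)^2), \ldots, ((F^e)^N)\},$$
so that $\dim\langle V_0' \rangle = N + p$, with $p = 0$ if $(1) \in \langle\{F^e, \ldots, ((F^e)^N)\}\rangle$ and $p = 1$ otherwise. The independence number of $A^e$ is
$$n_{F^e} = \#\{\xi : (\xi \in Y \text{ or } \xi \in Y') \text{ and } \xi \ne 0\} = n_F + n_{F'} - \#\{\xi : \xi \in Y \cap Y' \text{ and } \xi \ne 0\}.$$
Using Lemma 4.1 and an argument like the one in Theorem 4.4 establishes the theorem. Q.E.D.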
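The quantities in Theorem 4.6 can be computed from the two output vectors alone. The sketch below assumes, following the list of symbols, that $Y$ and $Y'$ are the sets of entries of $F$ and $F'$; it computes $n_{F^e}$ by inclusion-exclusion, $p$ for $F^e = (F, F')$, and the bound $n + n' - N - p$. The two calls reproduce the arithmetic for two-valued output sets $\{1, 2\}$ and $\{0, 1\}$ on hypothetical 3-state machines. Function names are illustrative.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over the rationals, by Gaussian elimination on a copy of the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def p_value(F, N):
    """p = 0 if the all-ones vector lies in span{F, (F^2), ..., (F^N)}, else p = 1."""
    powers = [[Fraction(f) ** i for f in F] for i in range(1, N + 1)]
    ones = [Fraction(1)] * len(F)
    return 0 if rank(powers + [ones]) == rank(powers) else 1


def invariant_bound(F, F2, N):
    """Return (n + n' - N - p, n_{F^e}, p) for Theorem 4.6, with F^e = (F, F')."""
    Y = {Fraction(f) for f in F}
    Y2 = {Fraction(f) for f in F2}
    n_fe = len(Y - {0}) + len(Y2 - {0}) - len((Y & Y2) - {0})
    if N > n_fe:
        raise ValueError("Theorem 4.6 requires N <= n_F + n_F' - #(nonzero common outputs)")
    Fe = list(F) + list(F2)
    p = p_value(Fe, N)
    return len(Fe) - N - p, n_fe, p


# Output vectors of two 3-state machines with outputs coded as {1, 2}.
print(invariant_bound([1, 2, 2], [2, 1, 1], 2))   # (4, 2, 0)
# The same machines with outputs coded as {0, 1}.
print(invariant_bound([0, 1, 1], [1, 0, 0], 1))   # (4, 1, 1)
```

In both codings the bound comes out as $2n - 2 = 4$: recoding trades one unit of $N$ for one unit of $p$.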
F. DISCUSSION OF THE GENERALIZATION OF THE MOORE BOUND

COROLLARY 4.6. Let $A$ and $A'$ be $n$-state deterministic machines with two-valued output alphabet $Y = Y' = \{1, 2\}$. Then $A$ and $A'$ are indistinguishable for all strings if they are indistinguishable for all strings of length at most $2n - 2$.

Proof: In Theorem 4.6 we have $n_{F^e} = 2 + 2 - 2 = 2$ and $p = 0$, so that $N \le 2$; taking $N = 2$ gives the bound $n + n - 2 - 0 = 2n - 2$. For deterministic machines, indistinguishability reduces to $E_A(x) = E_{A'}(x)$ for all $x \in \Sigma^*$, and also
$$E_A(x) = E_{A'}(x) \implies \mu^A_2(x) = \mu^{A'}_2(x).$$
Hence the right side of Theorem 4.6 gives the result. Q.E.D.

Theorem 4.6 can be regarded as a generalization of the Moore result (1956) to probabilistic machines with arbitrary rather than binary output alphabets. Note that Moore's bound is $2n - 1$ since he considers the initial output as part of the experiment. We consider the initial outputs when considering strings of length 1 since the empty string $\Lambda$ has the identity as its symbol matrix.

The role of the zero output symbol in Theorem 4.6 is a significant departure from Moore's deterministic results. In order to get $p = 0$ in Corollary 4.6 we used a two-valued output set $\{1, 2\}$ rather than $\{0, 1\}$, with the implicit assumption that such recoding of output symbols cannot affect indistinguishability between deterministic machines. Without the recoding, $n_{F^e} = 1 + 1 - 1 = 1$, so $N = 1$ and $p = 1$, and the bound $n + n - 1 - 1 = 2n - 2$ is still the Moore bound.

RECEIVED: July 29, 1965

REFERENCES
JACOBSON, N. (1952), "Lectures in Abstract Algebra, Vol. II: Linear Algebra." Van Nostrand, Princeton, New Jersey.

MOORE, E. F. (1956), Gedanken-experiments on sequential machines. In "Automata Studies," C. E. Shannon and J. McCarthy, eds., pp. 123-153. Princeton Univ. Press, Princeton, New Jersey.

MOSTOW, G. D., SAMPSON, J. H., AND MEYER, J. P. (1963), "Fundamental Structures of Algebra," pp. 258-260. McGraw-Hill, New York.

PAZ, A. (1964), Probabilistic automata, Parts I and II. Inform. Control.

RABIN, M. O. (1964), Probabilistic automata. In "Sequential Machines, Selected Papers," E. F. Moore, ed., pp. 98-114. Addison-Wesley, Reading, Mass.

RABIN, M. O., AND SCOTT, D. (1959), Finite automata and their decision problems. IBM J. Res. Develop. 3, 114-125.

SHANNON, C. E., AND WEAVER, W. (1948), "The Mathematical Theory of Communication," p. 34. Univ. of Illinois Press, Urbana, Illinois.

THRALL, R. M., AND TORNHEIM, L. (1957), "Vector Spaces and Matrices," pp. 298-300. Wiley, New York.