Appendix II
Some Notions of Probability Theory
This appendix briefly presents some notions of probability theory which are used in the text. For a more complete exposition, the reader should consult standard texts on probability theory, such as Feller (1971), Papoulis (1965), Ash (1972) and Loeve (1963). Two sections of the book (viz. sections 5.3 and 6.7) use some material on stochastic processes and stochastic differential equations which is not covered in this appendix; for this, the reader should consult Wong (1971), Gikhman and Skorohod (1972), Fleming and Rishel (1975) and the references cited in section 6.7.
II.1 INGREDIENTS OF PROBABILITY THEORY
Let $\Omega$ denote a set whose elements are the outcomes of a random experiment. This random experiment might be the toss of a coin (in which case $\Omega$ has only two elements), or selection of an integer from the set $[0, \infty)$ (in which case $\Omega$ is countably infinite), or the continuous roulette wheel, which corresponds to a nondenumerable $\Omega$. Any subset of $\Omega$ on which a probability measure can be defined is known as an event. Specifically, if $F$ denotes the class of all such events (i.e. subsets of $\Omega$), then it has the following properties:
(1) $\Omega \in F$.
(2) If $A \in F$, then its complement $A^c = \{\omega \in \Omega : \omega \notin A\}$ also belongs to $F$. (The empty set, $\phi$, being the complement of $\Omega$, also belongs to $F$.)
(3) If $A_1, A_2 \in F$, then $A_1 \cap A_2$ and $A_1 \cup A_2$ also belong to $F$.
(4) If $A_1, A_2, \ldots, A_i, \ldots$ denote a countable number of events, the countable intersection $\bigcap_{i=1}^{\infty} A_i$ and the countable union $\bigcup_{i=1}^{\infty} A_i$ are also events (i.e. belong to $F$).
The class $F$, thus defined, is called a sigma algebra ($\sigma$-algebra), and a probability measure $P$ is a nonnegative functional defined on the elements of this $\sigma$-algebra. $P$ also satisfies the following axioms.
(1) For every event $A \in F$, $0 \le P(A) \le 1$, and $P(\Omega) = 1$.
DYNAMIC NONCOOPERATIVE GAME THEORY
(2) If $A_1, A_2 \in F$ and $A_1 \cap A_2 = \phi$ (i.e. $A_1$ and $A_2$ are disjoint events), then $P(A_1 \cup A_2) = P(A_1) + P(A_2)$.
(3) Let $\{A_i\}$ denote a (countably) infinite sequence in $F$, with the properties $A_{i+1} \subset A_i$ and $\bigcap_{i=1}^{\infty} A_i = \phi$. Then the limit of the sequence of real numbers $\{P(A_i)\}$ is zero (i.e. $\lim_{i \to \infty} P(A_i) = 0$).
The triple $(\Omega, F, P)$ defined above is known as a probability space, while the pair $(\Omega, F)$ is called a measurable space. If $\Omega = R^n$, then its subsets of interest are the $n$-dimensional rectangles, and the smallest $\sigma$-algebra generated by these rectangles is called the $n$-dimensional Borel $\sigma$-algebra and is denoted $B^n$. Elements of $B^n$ are Borel sets, and the pair $(R^n, B^n)$ is a Borel (measurable) space. A probability measure defined on this space is known as a Borel probability measure.
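As a minimal sketch of these definitions on a finite set, the Python fragment below checks properties (1)-(3) of a $\sigma$-algebra and the first two axioms for $P$. The four-outcome set `omega`, the uniform measure and all function names are assumptions made for illustration only; on a finite set, property (4) and axiom (3) add nothing new, and the Borel case is not covered.

```python
from fractions import Fraction
from itertools import combinations


def power_set(omega):
    """Return all subsets of omega as a set of frozensets."""
    items = list(omega)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_sigma_algebra(omega, family):
    """Check properties (1)-(3); for finite omega, (4) follows from (3)."""
    omega = frozenset(omega)
    if omega not in family:                      # property (1)
        return False
    for a in family:
        if omega - a not in family:              # property (2): complements
            return False
        for b in family:
            if a & b not in family or a | b not in family:  # property (3)
                return False
    return True


def uniform_measure(omega):
    """P(A) = |A| / |omega|: an assumed, illustrative choice of P."""
    n = len(omega)
    return lambda a: Fraction(len(a), n)


omega = frozenset({1, 2, 3, 4})   # hypothetical four-outcome experiment
F = power_set(omega)              # the largest sigma-algebra on omega
P = uniform_measure(omega)

# A coarser family that is still a sigma-algebra, and one that is not
# (the complement of {1} is missing).
coarse = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}
not_closed = {frozenset(), frozenset({1}), omega}
```

Under these assumptions, `is_sigma_algebra(omega, coarse)` holds while `is_sigma_algebra(omega, not_closed)` fails, which shows that the class of events need not contain every subset of $\Omega$ but must be closed under the listed operations.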
Finite and countable probability spaces
If $\Omega$ is a finite set (say, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$), we can assign probability weights on individual elements of $\Omega$, instead of on subsets of $\Omega$, in which case we write $p_i$ to denote the probability of the single event $\omega_i$. We call the $n$-tuple $(p_1, p_2, \ldots, p_n)$ a probability distribution over $\Omega$. Clearly, we have the restriction $0 \le p_i \le 1$ for all $i = 1, \ldots, n$, and furthermore, if the elements of $\Omega$ are all disjoint (independent) events, we have the property $\sum_{i=1}^{n} p_i = 1$. The same convention applies when $\Omega$ is a countable set (i.e. $\Omega = \{\omega_1, \omega_2, \ldots, \omega_i, \ldots\}$), in which case we simply replace $n$ by $\infty$.
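The two restrictions on a probability distribution can be illustrated with a short Python sketch. The fair die and the weights $p_i = 2^{-i}$ on a countable set are examples chosen here, not taken from the text, and the function names are hypothetical.

```python
from fractions import Fraction


def is_distribution(p):
    """True if every weight lies in [0, 1] and the weights sum to one."""
    return all(0 <= pi <= 1 for pi in p) and sum(p) == 1


# Finite case: a fair die, with p_i = 1/6 for each of the six outcomes.
die = [Fraction(1, 6)] * 6


def geometric_partial_sum(n):
    """Sum of the weights p_i = 2**(-i) over i = 1, ..., n.

    For a countable set the sum runs to infinity; the partial sums
    equal 1 - 2**(-n) and therefore approach one as n grows.
    """
    return sum(Fraction(1, 2 ** i) for i in range(1, n + 1))
```

Exact rational arithmetic (`Fraction`) is used so that the sum-to-one check is not disturbed by floating-point rounding.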
II.2 RANDOM VECTORS
Let $(\Omega_1, F_1)$ and $(\Omega_2, F_2)$ be two measurable spaces and $f$ be a function defined from the domain set $\Omega_1$ into the range set $\Omega_2$. If for every $A \in F_2$ we have $f^{-1}(A) \triangleq \{\omega \in \Omega_1 : f(\omega) \in A\} \in F_1$, then $f$ is said to be a measurable function, or a measurable transformation of $(\Omega_1, F_1)$ into $(\Omega_2, F_2)$. If the latter measurable space is a Borel space, then $f$ is said to be a Borel function, in which case we denote it by $x$. For the special case when the Borel space is $(\Omega_2, F_2) = (R, B)$, the Borel function $x$ is called a random variable. For the case when $(\Omega_2, F_2) = (R^n, B^n)$, $x$ is known as an $n$-dimensional random vector. If there is a probability measure $P$ defined on $(\Omega_1, F_1)$ (which we henceforth write simply as $(\Omega, F)$), then the random vector $x$ will induce a probability measure $P_x$ on the Borel space $(R^n, B^n)$, so that for every $B \in B^n$ we have $P_x(B) = P(x^{-1}(B))$. Since the elements of $B^n$ are subsets of $R^n$ (generated from $n$-dimensional rectangles), the arguments of $P_x$ are in general infinite sets; however, considering the collection of sets $\{\xi \in R^n : \xi_i < a_i,\ i = 1, \ldots, n\}$ in $B^n$, where the $a_i$ ($i = 1, \ldots, n$) are real numbers, the restriction of $P_x$ to this class is also a probability measure, whose argument is now a finite set of real numbers. We denote this probability measure by
$P_x = P_x(a_1, a_2, \ldots, a_n)$ and call it a probability distribution function of the random vector $x$. Note that
$$P_x(a_1, a_2, \ldots, a_n) = P(\{\omega \in \Omega : x_1(\omega) < a_1,\ x_2(\omega) < a_2,\ \ldots,\ x_n(\omega) < a_n\}),$$
where $x_i$ is a random variable denoting the $i$th component of $x$. Whenever $n > 1$, $P_x$ is sometimes also called the cumulative (joint) probability distribution function. It is a well-established fact that there is a one-to-one correspondence between the induced measure $P_x$ and the distribution function $P_x(a_1, \ldots, a_n)$, and the subspace on which the latter is defined can generate the whole $B^n$ (cf. Loeve, 1963).

Independence
Given the probability distribution function of a random vector $x = (x_1, \ldots, x_n)'$, the (marginal) distribution function of each random variable $x_i$ can be obtained from
$$P_{x_i}(a_i) = \lim_{a_j \to \infty,\ j \neq i} P_x(a_1, \ldots, a_n).$$
The random variables $x_1, \ldots, x_n$ are said to be (statistically) independent if
$$P_x(a_1, \ldots, a_n) = P_{x_1}(a_1)\, P_{x_2}(a_2) \cdots P_{x_n}(a_n)$$
for all scalars $a_1, \ldots, a_n$.

Probability density function
A measure defined on subintervals of the real line which equals the length of the corresponding subinterval(s) is called a Lebesgue measure. It assigns zero weight to countable subsets of the real line, and its definition can readily be extended to $n$-dimensional rectangles in $R^n$. Let $P$ be a Borel probability measure on $(R^n, B^n)$ such that any element of $B^n$ which has Lebesgue measure zero also has $P$-measure zero; then we say that $P$ is absolutely continuous with respect to the Lebesgue measure. Now, a well-established result of probability theory says that (cf. Loeve, 1963), if $x: (\Omega, F, P) \to (R^n, B^n, P_x)$ is a random vector and if $P_x$ is absolutely continuous with respect to the Lebesgue measure, there exists a nonnegative Borel function $p_x(\cdot)$ such that, for every $A \in B^n$,
$$P_x(A) = \int_A p_x(\xi)\, d\xi.$$
Such a function $p_x(\cdot)$ is called the probability density function of the random vector $x$. In terms of the distribution function $P_x$, the preceding relation can be written as
$$P_x(a_1, \ldots, a_n) = \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} p_x(\xi_1, \ldots, \xi_n)\, d\xi_1 \cdots d\xi_n$$
for every scalar $a_1, \ldots, a_n$.
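To make the distribution function, the marginals, the independence condition and the density of this section concrete, the following Python sketch works through one assumed example: a two-dimensional random vector distributed uniformly on the unit square. The example, the constant `BIG` (standing in for the limit $a_j \to \infty$) and the midpoint-rule integrator are illustrative choices, not part of the text.

```python
def clip01(a):
    """Clamp a real number to the interval [0, 1]."""
    return min(max(a, 0.0), 1.0)


def dist(a1, a2):
    """Distribution function P_x(a1, a2) for x uniform on the unit square."""
    return clip01(a1) * clip01(a2)


BIG = 1e9  # stands in for the limit a_j -> infinity


def marginal(i, a):
    """Marginal distribution function of x_i, read off the joint one."""
    return dist(a, BIG) if i == 1 else dist(BIG, a)


def density(xi1, xi2):
    """Density p_x: one inside the unit square, zero outside."""
    return 1.0 if 0.0 <= xi1 <= 1.0 and 0.0 <= xi2 <= 1.0 else 0.0


def integrate_density(a1, a2, lo=-1.0, steps=400):
    """Midpoint-rule approximation of the double integral of the density
    over (lo, a1) x (lo, a2). The density vanishes below zero, so
    lo = -1 is enough to stand in for minus infinity here."""
    h1 = (a1 - lo) / steps
    h2 = (a2 - lo) / steps
    total = 0.0
    for i in range(steps):
        xi1 = lo + (i + 0.5) * h1
        for j in range(steps):
            xi2 = lo + (j + 0.5) * h2
            total += density(xi1, xi2)
    return total * h1 * h2
```

In this example the joint distribution function factors into the product of its marginals, so the two components are independent, and `integrate_density(a1, a2)` approximates `dist(a1, a2)` up to the discretization error of the grid.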
II.3 INTEGRALS AND EXPECTATION
Let $x: (\Omega, F, P) \to (R^n, B^n, P_x)$ be a random vector and $f: (R^n, B^n) \to (R^n, B^n)$ be a nonnegative Borel function. Then $f$ can also be considered as a random vector from $(\Omega, F)$ into $(R^n, B^n)$, and its average value (expected value) is defined either by $\int_{\Omega} f(x(\omega))\, P(d\omega)$ or by $\int_{R^n} f(\xi)\, P_x(d\xi)$, depending on which interpretation we adopt. Both of these integrals are well defined and are equal in value. If $f$ changes sign, then we take $f = f^+ - f^-$, where both $f^+$ and $f^-$ are nonnegative, and write the expected value of $f$ as
$$E[f(x)] = E[f^+(x)] - E[f^-(x)],$$
provided that at least one of the pair $E[f^+(x)]$ and $E[f^-(x)]$ is finite. Since, by definition, $P_x(d\xi) = P_x(\xi + d\xi) - P_x(\xi) \triangleq dP_x(\xi)$, this integral can further be written as
$$E[f(x)] = \int_{R^n} f(\xi)\, dP_x(\xi),$$
which is a Lebesgue-Stieltjes integral and which is the convention that we shall adopt. For the special case when $f(x) = x$ we have
$$E[x] \triangleq \int_{R^n} \xi\, dP_x(\xi) \triangleq \bar{x}$$
which is known as the mean (expected) value of $x$. The covariance of the $n$-dimensional random vector $x$ is defined as
$$E[(x - \bar{x})(x - \bar{x})'] = \int_{R^n} (\xi - \bar{x})(\xi - \bar{x})'\, dP_x(\xi) \triangleq \mathrm{cov}(x)$$
which is a nonnegative definite matrix of dimension $(n \times n)$. Now, if $P_x$ is absolutely continuous with respect to the Lebesgue measure, $E[f]$ can equivalently be written, in terms of the corresponding density function $p_x$, as
$$E[f(x)] = \int_{R^n} f(\xi)\, p_x(\xi)\, d\xi.$$
If $\Omega$ consists only of a finite number of independent events $\omega_1, \omega_2, \ldots, \omega_n$, then the integrals are all replaced by the single summation
$$E[f(x(\omega))] = \sum_{i=1}^{n} f(x(\omega_i))\, p_i$$
where $p_i$ denotes the probability of occurrence of event $\omega_i$. For a countable set $\Omega$, we have the counterpart
$$E[f(x(\omega))] = \lim_{n \to \infty} \sum_{i=1}^{n} f(x(\omega_i))\, p_i.$$
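For a finite $\Omega$, the summation formula can be evaluated directly. The sketch below, in Python, computes the mean and the covariance of a two-dimensional random vector from the values $x(\omega_i)$ and the weights $p_i$; the four-outcome experiment and all function names are assumptions introduced for illustration.

```python
from fractions import Fraction


def expectation(f, x, p):
    """E[f(x(w))] = sum over i of f(x(w_i)) * p_i, for scalar-valued f."""
    return sum(f(xi) * pi for xi, pi in zip(x, p))


def mean(x, p):
    """Mean vector of x, computed one component at a time."""
    n = len(x[0])
    return [expectation(lambda v, k=k: v[k], x, p) for k in range(n)]


def covariance(x, p):
    """Covariance matrix E[(x - mean)(x - mean)'] as a list of rows."""
    m = mean(x, p)
    n = len(m)
    return [[expectation(lambda v, r=r, c=c: (v[r] - m[r]) * (v[c] - m[c]),
                         x, p)
             for c in range(n)]
            for r in range(n)]


# Hypothetical experiment: four equally likely outcomes w_1, ..., w_4,
# with x(w_i) listed in the same order as the weights p_i.
x = [(0, 0), (0, 1), (1, 0), (1, 1)]
p = [Fraction(1, 4)] * 4
```

With these assumed values the mean is $(1/2, 1/2)'$ and the covariance matrix is diagonal with entries $1/4$, which is nonnegative definite as the text requires.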