INFORMATION SCIENCES 6, 301-311 (1973)

Detection and Prediction of a Stochastic Process Having Multiple Hypotheses*

D. W. KELSEY and A. H. HADDAD†
Department of Electrical Engineering and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Communicated by John M. Richardson

ABSTRACT A system is proposed which combines hypothesis testing and prediction to estimate the future value of a stochastic process whose statistics are known only to belong to some finite set of possible hypotheses. Bayes optimization of the individual components is performed, and system performance is discussed for a modified version of the usual mean-squared error predictor cost functions. An example is given illustrating various features of the system’s performance for a specific choice of input hypotheses.

1. INTRODUCTION

The problem of optimally predicting future values of a stochastic process is an important one, with applications in statistical communications, control theory, and other fields. Briefly, a set of observations R on a signal x(τ), t_0 ≤ τ ≤ t, is used to estimate the value of the signal at some future time t + λ, λ > 0. The estimate, x̂(t + λ), must satisfy some preassigned criterion of optimality, the most common being minimum mean-squared error (MMSE). Prediction of processes whose statistical properties are completely known has been thoroughly investigated [1-4]. In addition, extensive work has been done on adaptive predictors, which are employed when a priori knowledge of these statistics is incomplete [5-8]. The problem treated in this paper is, in a sense, intermediate between classical and adaptive prediction. It is assumed that the statistics of x(τ) and R are not known uniquely, but instead belong to some finite set of possibilities {H_i}, i = 1, 2, ..., M.

*This work was supported by the Joint Services Electronics Program under contract DAAB-07-67-C-0199.
†Please send all correspondence to A. H. Haddad at the above address after August 1, 1973; prior to Aug. 1, 1973, send correspondence to: A. H. Haddad, c/o School of Engineering, Tel-Aviv University, Ramat-Aviv, Tel-Aviv, Israel.
© American Elsevier Publishing Company, Inc., 1973


This assumption provides a good model for a class of practical prediction problems. If it is also assumed that an a priori probability P_i is known for each H_i, i = 1, 2, ..., M, then the problem may be viewed as a combination of the classical problems of hypothesis testing and optimal prediction. That is, it is necessary both to decide which H_i is appropriate and, using this decision, to estimate x(t + λ). Interestingly enough, systems which combine hypothesis testing or detection with other operations, such as prediction, estimation, or smoothing, have not been extensively treated in the literature. Several authors, including Middleton and Esposito [9], Kailath [10], and Park and Lainiotis [11], have considered estimation of a signal present with probability P < 1. Lainiotis has studied similar systems in the context of adaptive estimation [12-15]. The purpose of this paper is to investigate a related problem.

2. PROBLEM STATEMENT

Let x(t) be the signal whose predicted value x(t + λ) is desired, and let R be the observation vector containing x(τ) for t_0 ≤ τ ≤ t. It is assumed that the statistics of the signal and observations can follow one of M possible models, such that if hypothesis H_i is true then the density of R and the conditional density of x(t + λ) may be obtained as f(R|H_i) and f(x(t + λ)|R, H_i), respectively, for i = 1, 2, ..., M. It is also assumed that the a priori probability of H_i is given by P_i. It is desired, therefore, to perform simultaneous prediction and hypothesis testing. Furthermore, it may be desirable under certain conditions to refrain from performing prediction if the expected error is too large. Therefore, the scheme

Fig. 1. The detection-prediction system.


suggested for the combined prediction-detection is as shown in Fig. 1. The scheme has a detector and M predictors, so that when the detector decides that H_i is present, the output of the ith predictor is chosen: x̂(t + λ) = x̂_λi. The problem is to optimize the system in the Bayesian sense.

3. OPTIMIZATION PROCEDURE

The optimization procedure to be used is an extension of that employed by Middleton and Esposito [9] and requires first of all the assignment of Bayes cost constants to the detector and cost functions to each predictor. The detector is then chosen, under the assumption that the predictor structures are known, in such a way that the total expected cost is minimized. Finally, predictors are found which further minimize this cost. To simplify the following discussion, only the binary case (M = 2) will be considered, but extensions to M > 2 can be easily derived. The cost assignments are as follows:
(1) For the detector:
C_ij = cost of deciding H_i when H_j is present, i, j = 1, 2;
C_0i = cost of making no decision when H_i is present, i = 1, 2.
(2) For the predictors:
g_ij[x(t + λ), x̂_λi] = cost of using the ith predictor when H_j is present, i, j = 1, 2.
The standard assumptions that the detector costs are non-negative constants and the predictor costs are non-negative functions will be made. More general assignments might be written; in particular, the detector costs might be made functions of the observations R. The total expected cost of prediction and detection can be written as

ℬ = Σ_{i=1}^{2} Σ_{j=1}^{2} ∫_{Z_i} P_j f(R|H_j) (C_ij + E{g_ij[x(t + λ), x̂_λi] | R, H_j}) dR + ∫_{Z_0} [C_01 P_1 f(R|H_1) + C_02 P_2 f(R|H_2)] dR,   (1)

where Z_i, i = 1, 2, denotes that region of the observation space Z in which the detector decides H_i, and Z_0 that region in which no decision is made. If the following notation is used,

A_i(x̂_λi, R) = Σ_{j=1}^{2} P_j f(R|H_j) [C_ij + E{g_ij[x(t + λ), x̂_λi] | R, H_j}],   i = 1, 2,
B(R) = C_01 P_1 f(R|H_1) + C_02 P_2 f(R|H_2),   (2)


then, for given predictors x̂_λi, the cost equation (1) may be written as

ℬ = ∫_Z B(R) dR + Σ_{i=1}^{2} ∫_{Z_i} [A_i(x̂_λi, R) − B(R)] dR.   (3)

Since, from (2), the A_i and B are all non-negative, it is clear that (3) is minimized by assigning a given observation vector R to Z_0 if and only if B(R) < min_i [A_i(x̂_λi, R)]. Similarly, it can be shown that the optimal detector must assign R to Z_i whenever A_i(x̂_λi, R) < min[A_j(x̂_λj, R), B(R)], where i, j = 1, 2 and i ≠ j. Note that this method may be used to find the optimal detector to accompany any arbitrarily assigned set of predictors; we might, for example, assume that x̂_λi = E{x(t + λ) | R, H_i}. It is preferable, however, to derive predictors x̂_λi which further minimize the cost ℬ under the assumption that the detector is as described above. Under this assumption, a careful inspection of (3) reveals that the x̂_λi which minimize ℬ are simply those that minimize A_i(x̂_λi, R). Therefore, the necessary conditions for the optimal predictors x̂_λi are

∂A_i(x̂_λi, R)/∂x̂_λi = 0,   i = 1, 2.   (4)
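The region-assignment rule above lends itself to a compact implementation. The following is a minimal Python sketch (the function name and the placeholder numbers are ours, not the paper's) that assigns an observation to Z_0, Z_1, or Z_2 once the quantities A_1, A_2, and B of (2) have been evaluated for that observation.

```python
def assign_region(A, B):
    """Bayes region assignment of Section 3.

    A : sequence with the two values A_i(x_hat_i, R), i = 1, 2
    B : the value B(R)
    Returns 0 for "no decision" (Z_0), or 1 / 2 for deciding H_1 / H_2.
    """
    if B < min(A):                   # B(R) < min_i A_i  ->  make no decision
        return 0
    return 1 if A[0] <= A[1] else 2  # otherwise pick the hypothesis with smaller A_i

# Placeholder numbers for illustration only:
print(assign_region([2.0, 3.5], 1.2))   # -> 0 (no decision)
print(assign_region([2.0, 3.5], 2.7))   # -> 1 (decide H_1)
```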

Unfortunately, the solutions of (4) for specific choices of the cost functions g_ij can be quite complicated. This does not hold true, however, for the following generalization of the usual mean-squared error cost functions,

g_ij[x(t + λ), x̂_λi] = C_α [x(t + λ) − x̂_λi]²,   i = j,
g_ij[x(t + λ), x̂_λi] = C_β [x(t + λ) − x̂_λi]²,   i ≠ j,   (5)

where C_α and C_β are non-negative constants. Equation (4) may be solved to yield the following optimal predictors,

x̂_λ1 = (C_α m_1 + C_β Λ m_2)/(C_α + C_β Λ),   (6a)
x̂_λ2 = (C_β m_1 + C_α Λ m_2)/(C_β + C_α Λ),   (6b)

where Λ = [P_2 f(R|H_2)]/[P_1 f(R|H_1)] is the generalized likelihood ratio and m_i = E{x(t + λ) | R, H_i} is the optimal MMSE predictor when H_i is true. The optimal predictors are therefore weighted sums of m_1 and m_2, with the weightings dependent on the normalized likelihood ratio and on the assigned constants C_α and C_β. These constants, in turn, depend on the relative importance attached to prediction errors after a correct or an incorrect decision has been made.
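As a concrete, purely illustrative sketch of (6a)-(6b), the short Python function below computes the two predictor outputs from the conditional MMSE predictions, the likelihood ratio, and the cost constants; the variable names are ours.

```python
def optimal_predictors(m1, m2, lam, c_alpha, c_beta):
    """Weighted-sum predictors of Eqs. (6a)-(6b).

    m1, m2  : conditional MMSE predictions E{x(t+lambda) | R, H_i}
    lam     : generalized likelihood ratio P_2 f(R|H_2) / (P_1 f(R|H_1))
    c_alpha : weight on errors made after a correct decision
    c_beta  : weight on errors made after an incorrect decision
    """
    x_hat_1 = (c_alpha * m1 + c_beta * lam * m2) / (c_alpha + c_beta * lam)
    x_hat_2 = (c_beta * m1 + c_alpha * lam * m2) / (c_beta + c_alpha * lam)
    return x_hat_1, x_hat_2

# Limiting cases noted in the text:
#   c_beta = 0        -> x_hat_i = m_i (classical MMSE predictor per hypothesis)
#   c_beta = c_alpha  -> x_hat_1 = x_hat_2 = (m1 + lam*m2)/(1 + lam), Eq. (7)
print(optimal_predictors(1.0, 2.0, 0.5, 1.0, 0.0))   # -> (1.0, 2.0)
print(optimal_predictors(1.0, 2.0, 0.5, 1.0, 1.0))   # -> (1.333..., 1.333...)
```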

When C_α = C_β, then

x̂_λ1 = x̂_λ2 = (m_1 + Λ m_2)/(1 + Λ),   (7)

and the prediction x̂(t + λ) is independent of the detector's decision and is given by the usual MMSE predictor.
The detector structure may then be described by evaluating the various conditional expectations in (2). First let σ_i² = E{[x(t + λ) − m_i]² | R, H_i}, i = 1, 2, denote the conditional error variance of predictor m_i when H_i is present. Then, expressions for the conditional MSE of the individual predictors may be derived and substituted into (2) to express the detector structure as follows. Let

h_2(Λ) = C_α q_α2 Λ² + Λ [C_β q_α2 − C_α q_β1 − C_α C_β (m_2 − m_1)²] − C_β q_β1,
h_1(Λ) = −C_β q_β2 Λ² + Λ [C_β q_α1 − C_α q_β2 − C_α C_β (m_2 − m_1)²] + C_α q_α1,   (8)

where

q_α1 = C_01 − C_11 − C_α σ_1²,   q_β1 = C_21 + C_β σ_1² − C_01,
q_α2 = C_02 − C_22 − C_α σ_2²,   q_β2 = C_12 + C_β σ_2² − C_02.   (9)

Then,
(a) No decision is made if and only if h_i(Λ) < 0, i = 1, 2.
(b) When a decision is made, H_2 is chosen if

(C_α + C_β Λ) h_2(Λ) > (C_β + C_α Λ) h_1(Λ),   (10)

and H_1 is chosen otherwise.
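To make the decision logic of (8)-(10) explicit, here is a minimal Python sketch; it assumes the q constants of (9), the cost constants, the likelihood ratio, and (m_2 − m_1)² have already been computed for the current observation, and it follows the reconstruction of (8)-(10) given above.

```python
def detect(lam, c_a, c_b, q_a1, q_b1, q_a2, q_b2, dm_sq):
    """Detector of Eqs. (8)-(10).

    lam    : generalized likelihood ratio Lambda
    dm_sq  : (m_2 - m_1)**2 for the current observation
    q_*    : constants defined in Eq. (9)
    Returns 0 (no decision), 1 (decide H_1), or 2 (decide H_2).
    """
    cross = c_a * c_b * dm_sq
    h2 = c_a * q_a2 * lam**2 + lam * (c_b * q_a2 - c_a * q_b1 - cross) - c_b * q_b1
    h1 = -c_b * q_b2 * lam**2 + lam * (c_b * q_a1 - c_a * q_b2 - cross) + c_a * q_a1
    if h1 < 0.0 and h2 < 0.0:          # rule (a): both negative -> no decision
        return 0
    # rule (b): weighted comparison of h_2 and h_1
    if (c_a + c_b * lam) * h2 > (c_b + c_a * lam) * h1:
        return 2
    return 1
```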

4. SYSTEM PERFORMANCE

The behavior of the system described above depends to a large extent on the relative magnitudes of the assigned detector costs. These, in turn, will depend on the goals of the system user. More specifically, they will depend on whether the user is interested in determining which H_i is actually present as well as in predicting x(t + λ), or whether only the predicted value is of interest. In the former case, any reasonable assignment of detector costs should satisfy

C_12 > C_02 > C_22,   C_21 > C_01 > C_11.   (11)

That is, under either hypothesis the cost of making no decision should be greater than the cost of a correct decision but less than the cost of an incorrect one. These inequalities alone, however, are not sufficient, since they do not take into account the effect of predictor errors. From (9), if the MSE terms in (2) take their lower bound, then q_αi, i = 1, 2, represents the difference in total cost between making no decision and making a correct decision when H_i is present,


while q_βi, i = 1, 2, represents the cost difference between an incorrect decision and no decision when H_i is present. It is reasonable to require, therefore, that all costs be chosen such that

q_αi > 0,   q_βi > 0,   i = 1, 2.   (12)

A detailed graphical analysis of (8) under the constraints in (11) and (12) reveals that only two basic detector types are possible. Let η_i denote the positive root of h_i(Λ), and η the positive root of (C_α + C_β Λ)h_2(Λ) − (C_β + C_α Λ)h_1(Λ). In what will be called Type 1 detection, H_1 is decided whenever Λ < η_1, H_2 whenever Λ > η_2, and no decision is made for η_1 < Λ < η_2. (Special cases of Type 1 detection, in which one hypothesis is never chosen, occur when η_1 = 0 or η_2 = ∞.) In Type 2 detection, a decision is always made; it is H_1 when Λ < η and H_2 when Λ > η. The detector thresholds η, η_1, and η_2 differ from conventional detector thresholds in that they are not constant, but instead depend on σ_i² and (m_2 − m_1)², which in general are functions of the observations R. For Gaussian inputs the σ_i² are constants, but the dependence of (m_2 − m_1)² on R remains for all but a few special cases. As a result, it is difficult to make general remarks about the detector's performance.

Some useful observations can be made, however, on the effect of changes in C_α and C_β on x̂(t + λ). It has already been noted that when C_α = C_β then x̂(t + λ) = x̂_λ1 = x̂_λ2. Further, (6) reveals that when C_β = 0 then x̂_λi = m_i, i = 1, 2. Thus, as might be expected, when errors of the ith predictor are penalized only when H_i is present, it becomes the classical MMSE predictor associated with H_i. However, for 0 < C_β < C_α, x̂(t + λ) is intermediate between these two extremes. This may be clearly seen if the predicted value for Type 2 detection is expressed as

x̂(t + λ) = θ(Λ) m_1 + [1 − θ(Λ)] m_2,   (13)

where

θ(Λ) = [C_α/(C_α + C_β Λ)] u[η − Λ] + [C_β/(C_β + C_α Λ)] u[Λ − η],   (14)

and u[·] denotes the unit step function. A similar relation for Type 1 detection can be written by incorporating the fact that no prediction is made for η_1 < Λ < η_2. As C_β ranges from 0 to C_α, the effect on θ(Λ) is shown in Fig. 2. This graph was made under the assumption that σ_2² > σ_1². Note that, as C_β is decreased, the reduced emphasis on "incorrect" prediction in the expressions for x̂_λ1 and x̂_λ2 is compensated for by an increase in the detection threshold η. This results in H_2, the hypothesis associated with the greater conditional expected error, being decided less often.

Fig. 2. θ(Λ) vs. Λ for 0 ≤ C_β ≤ C_α.
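For Type 2 detection, the combined prediction (13)-(14) is easy to express in code. The sketch below is illustrative only; eta stands for the single threshold η, assumed to have been found as the positive root described above.

```python
def theta(lam, eta, c_a, c_b):
    """Weighting theta(Lambda) of Eq. (14) for Type 2 detection."""
    if lam < eta:                        # detector decides H_1
        return c_a / (c_a + c_b * lam)
    return c_b / (c_b + c_a * lam)       # detector decides H_2

def predict_type2(m1, m2, lam, eta, c_a, c_b):
    """Combined prediction of Eq. (13)."""
    th = theta(lam, eta, c_a, c_b)
    return th * m1 + (1.0 - th) * m2
```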



When only the predicted value x̂(t + λ) is of interest, the restrictions in (11) are no longer appropriate. A more suitable cost assignment may be C_ij = 0, i, j = 1, 2. The predictor formulations (6) are unaffected by this change, but the detector differs from that discussed above. It may best be described in terms of b_i, the cost of using predictor i, i = 1, 2,

b_i = C_α [x(t + λ) − x̂_λi]²  when H_i is present,
b_i = C_β [x(t + λ) − x̂_λi]²  when H_i is not present.   (15)

The expected cost may then be obtained as

ℬ = Σ_{i=1}^{2} ∫_{Z_i} f(R) E{b_i | R} dR + ∫_{Z_0} [C_01 P_1 f(R|H_1) + C_02 P_2 f(R|H_2)] dR.   (16)

It can be shown from (16) and (8) that the detector structure becomes as follows.
(a) No decision is made when

E{b_i | R} > (C_01 + Λ C_02)/(1 + Λ),   i = 1, 2.   (17)

(b) When a decision is made, it is H_1 if E{b_1 | R} < E{b_2 | R}, and H_2 otherwise.
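The prediction-only detector is again easy to state in code. In this minimal sketch the posterior expected costs E{b_i | R} are assumed to have been evaluated elsewhere (they depend on the conditional error variances and on the terms (m_j − x̂_λi)²); the function name and argument names are ours.

```python
def detect_prediction_only(exp_b1, exp_b2, lam, c01, c02):
    """Detector of Eq. (17) for the cost assignment C_ij = 0.

    exp_b1, exp_b2 : posterior expected predictor costs E{b_i | R}
    lam            : generalized likelihood ratio Lambda
    c01, c02       : costs of making no decision under H_1, H_2
    Returns 0 (no decision), 1 (decide H_1), or 2 (decide H_2).
    """
    no_decision_cost = (c01 + lam * c02) / (1.0 + lam)
    if exp_b1 > no_decision_cost and exp_b2 > no_decision_cost:
        return 0
    return 1 if exp_b1 < exp_b2 else 2
```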


5. EXAMPLE

To illustrate the performance of the prediction-detection system described above, the following example was considered. The signal models are

H_1: x(τ) = v τ,   t_0 ≤ τ ≤ t + λ,

with H_2 defined analogously in terms of the random variable a over the same interval. The observations are given by

r(τ) = x(τ) + n(τ),   t_0 ≤ τ ≤ t,

where n(τ) is white Gaussian noise with power spectrum N_0. The random variables v and a are independent of the noise and are assumed to be independent Gaussian with means v̄ and 0 and variances σ_v² and σ_a², respectively. It is desired to predict x(t + λ) given r(τ), t_0 ≤ τ ≤ t. The detector costs are assigned as

C_11 = C_22 = 0,   C_01 = γ K_1 C_α σ_1²,   C_02 = γ K_2 C_α σ_2²,   C_12 = K_3 C_02,   C_21 = K_4 C_01,   (18)

where, to satisfy (11) and (12), it is sufficient to require that

K_i > 1,   i = 1, ..., 4,   γ > min(1/K_1, 1/K_2).   (19)
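Under H_1 the conditional MMSE prediction m_1 and its error variance σ_1² follow from standard Gaussian linear-model results. The following Python sketch illustrates this for a time-sampled version of the observations, with a sample noise variance standing in for the continuous-time spectral level N_0; the sampling scheme, the numbers, and the variable names are our own illustrative assumptions, not the paper's computation.

```python
import numpy as np

def mmse_linear_model(times, r, v_mean, v_var, noise_var, t_pred):
    """Posterior mean/variance of x(t_pred) = v * t_pred under H_1: x(tau) = v*tau,
    given discrete noisy samples r_k = v*t_k + n_k with n_k ~ N(0, noise_var).

    Standard Gaussian conjugate update for the scalar slope v.
    """
    precision = 1.0 / v_var + np.sum(times**2) / noise_var
    v_post_mean = (v_mean / v_var + np.dot(times, r) / noise_var) / precision
    m1 = v_post_mean * t_pred                 # E{x(t + lambda) | R, H_1}
    var1 = (1.0 / precision) * t_pred**2      # conditional error variance sigma_1^2
    return m1, var1

# Illustrative use with placeholder sampling of the interval [t_0, t] = [0, 2]:
rng = np.random.default_rng(0)
times = np.linspace(0.1, 2.0, 20)
r = 0.7 * times + rng.normal(0.0, 0.5, times.size)   # simulated observations
print(mmse_linear_model(times, r, v_mean=0.0, v_var=1.0, noise_var=0.25, t_pred=2.2))
```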


Fig. 3. Predictor cost ℬ_P, probability of error P_e, and probability of no prediction P_n vs. γ for C_β = 0.


Fig. 4. Predictor cost ℬ_P, probability of error P_e, and probability of no prediction P_n vs. γ for C_β = C_α/2.

The numerical values assigned to the various parameters are: σ_v = 1, σ_a = 0.5, N_0 = 0.5, t_0 = 0, t = 2, λ = 0.2, C_α = 1, P_1 = 0.5, K_1 = 4/3, K_2 = 1, K_3 = 1 + (1/3b), and K_4 = 1 + 1.5b, where b = σ_11 σ_22/(σ_1² σ_2²) and σ_ii is the conditional variance of (m_2 − m_1) given H_i. The parameter γ was then increased from its initial value of 1, and the following quantities were calculated:
(a) P_n = probability that no prediction is made.
(b) P_e = probability that the detector makes an incorrect decision.
(c) ℬ_P = expected prediction cost, i.e., the cost resulting when all detector costs in (1) are set to zero.
Figs. 3-5 show the behavior of these quantities for three values of C_β. In Fig. 6, ℬ_P is plotted as a function of P_n. For small γ, H_1 is never chosen and the choice is between H_2 and no decision. As γ is increased, the relative importance of detector errors with respect to predictor errors increases. This causes a decision to be made more often, and deciding H_1 becomes possible. Eventually, the "no decision" region disappears entirely and the detector becomes Type 2.


Fig. 5. Predictor cost ℬ_P, probability of error P_e, and probability of no prediction P_n vs. γ for C_β = C_α.


Fig. 6. Predictor cost ℬ_P vs. probability of no prediction P_n for C_β = 0, C_α/2, C_α.

Further increases in γ result only in a very slight reduction of η, with ℬ_P and P_e therefore remaining essentially constant. Graphs such as those in Figs. 3 through 5 give a good illustration of the tradeoffs inherent in this system. For example, when C_β = C_α/2, γ can be chosen so that ℬ_P is reduced to 0.5 of its maximum value, but only at the cost of a 0.47 probability of no prediction, and P_e will be 0.2. Figure 6 gives the relationship between ℬ_P and P_n in a slightly more useful form. By employing these graphs, the user can, it is hoped, obtain a balance between low prediction cost and high probability of making a prediction.

6. SUMMARY AND CONCLUSIONS

A system has been proposed which incorporates a detector and M predictors to estimate x(t + λ) for an input signal x(τ), t_0 ≤ τ ≤ t, whose statistics are known only to belong to some finite set of possible hypotheses.

REFERENCES

1. N. Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley, New York (1949).
2. P. Masani and N. Wiener, Nonlinear prediction, in Probability and Statistics (U. Grenander, Ed.), Wiley, New York (1957), p. 190.
3. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory, J. Basic Engineering, Trans. ASME, Series D, 83, p. 95 (1961).
4. E. Parzen, A new approach to the synthesis of optimal smoothing and prediction systems, in Mathematical Optimization Techniques (R. Bellman, Ed.), University of California Press, Berkeley, Calif. (1963), p. 75.
5. D. Gabor, W. P. L. Wilby and R. Woodcock, A universal nonlinear filter, predictor and simulator which optimizes itself by a learning process, Proc. IEE, Part B, 108, p. 422 (1961).
6. A. V. Balakrishnan, An adaptive nonlinear data predictor, Proc. Natl. Telemetering Conf., 2, paper 6-5 (1962).
7. L. D. Davisson, The theoretical analysis of data compression systems, Proc. IEEE, 56, p. 176 (1968).
8. L. D. Davisson, Steady state error in adaptive mean-square minimization, IEEE Trans. Inform. Theory, IT-16, p. 382 (1970).
9. D. Middleton and R. Esposito, Simultaneous optimum detection and estimation of signals in noise, IEEE Trans. Inform. Theory, IT-14, p. 434 (1968).
10. T. Kailath, A note on least-squares estimates from likelihood ratios, Information and Control, 13, p. 541 (1968).
11. S. K. Park and D. G. Lainiotis, Joint detection estimation of Gaussian signals in white Gaussian noise, Proc. 1970 IEEE Symp. Adaptive Processes (1970).
12. D. G. Lainiotis, Optimal adaptive estimation: structure and parameter adaptation, IEEE Trans. Automatic Control, AC-16 (1971).
13. D. G. Lainiotis, Sequential structure and parameter-adaptive pattern recognition, Part I: Supervised learning, IEEE Trans. Inform. Theory, IT-16, p. 548 (1970).
14. D. G. Lainiotis, Supervised learning sequential structure and parameter adaptive pattern recognition: discrete data case, IEEE Trans. Inform. Theory, IT-17, p. 106 (1971).
15. D. G. Lainiotis, Joint detection, estimation and system identification, Information and Control, 19, p. 75 (1971).

Received December 6, 1971; revised version received July 3, 1972