“Explosive” processes in recidivism and other reinforcement models of learning


Mathl Comput. Modelling Vol. 15, No. 6, pp. 51-54, 1991. Printed in Great Britain. All rights reserved.

0895-7177/91 $3.00 + 0.00. Copyright © 1991 Pergamon Press plc

“EXPLOSIVE” PROCESSES IN RECIDIVISM AND OTHER REINFORCEMENT MODELS OF LEARNING

IRWIN GREENBERG

Department of Operations Research and Applied Statistics

George Mason University, 4400 University Drive, Fairfax, Virginia 22030 USA

(Received July 1990)

Abstract-An ‘explosive’ stochastic process features an intensity function that decays exponentially in the absence of occurrences and increases in jumps following each occurrence. Two such processes are examined and some transient and steady-state probabilities are obtained. The possible application of these models to criminal recidivism, habit and addiction, and other models of reinforcement are described and the difficulties of statistical inference are discussed.

INTRODUCTION

Vere-Jones and Ogata [1] examined a “self-correcting” process, a process with a history $H_t$ over $[0, t)$ and conditional intensity function

$$\theta(t \mid H_t) = \exp\{\alpha + \beta[t - rN(t)]\}, \tag{1}$$

where $\alpha$, $\beta$ and $r$ are real parameters with $\beta > 0$, $r > 0$, and $N(t)$ is the number of occurrences in $[0, t)$. The initial intensity, at $t = 0$, is $e^{\alpha}$. In a later paper [2] they applied this model to the occurrence of earthquakes and derived maximum likelihood estimates of the parameters. The process is self-correcting in the sense that each occurrence reduces the intensity while the absence of occurrences over a period of time increases the intensity.

This author [3] introduced a system with the reverse pattern of behavior. In attempting to model criminal recidivism it was postulated that the more crimes an individual has committed in the past, the more likely he or she is to commit crimes in the future and the longer that the individual has abstained from committing crimes the less likely he or she is to commit crimes in the future. There are other learning and reinforcement phenomena to which this description might also be applied: for example, use of habit-forming or addictive substances such as tobacco, alcohol or narcotics; physiological trauma such as heart attack or muscle strain; and perhaps consumer purchase behavior. This type of phenomenon could be modeled by equation (1) with $\beta < 0$, $r > 0$; an “explosive” process, explosive in the sense that there exists a non-zero probability that the intensity grows without bound due to the positive feedback in the system. This results in the incorrigible criminal, the chain smoker or addict, or physiological failure (or death) in the application examples cited.

Certain electronic or mechanical reliability “shock” models [4] might also be treated as explosive processes if it appears reasonable that each shock makes it more likely for future shocks to occur and the absence of shocks allows the system to recover somewhat. As in the physical trauma model the infinite intensity could be interpreted as a system failure.
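As a small illustration of equation (1), the following Python sketch evaluates the conditional intensity for a given list of occurrence times. The function name, argument names and numerical values are hypothetical, chosen only to show that an occurrence lowers the intensity when $\beta > 0$ and raises it when $\beta < 0$.

```python
import math

def vjo_intensity(t, event_times, alpha, beta, r):
    """Conditional intensity of equation (1): exp{alpha + beta*[t - r*N(t)]}.

    N(t) counts the occurrences in [0, t). All names are illustrative
    and are not taken from the paper.
    """
    n_t = sum(1 for s in event_times if s < t)  # occurrences in [0, t)
    return math.exp(alpha + beta * (t - r * n_t))

# Hypothetical history with a single occurrence at time 1.0.
events = [1.0]
# Intensity at the occurrence time and just after it, for both signs of beta.
self_correcting = [vjo_intensity(t, events, 0.0, 0.5, 2.0) for t in (1.0, 1.001)]
explosive = [vjo_intensity(t, events, 0.0, -0.5, 2.0) for t in (1.0, 1.001)]
```

With these assumed values the self-correcting intensity drops across the occurrence while the explosive one rises.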
The recidivism model featured an intensity function that decayed exponentially in the absence of occurrences; if there were no occurrence in $[t, t + T)$ then

$$\theta(t + T) = \theta(t)\exp\{\beta T\}, \quad \beta < 0, \tag{2a}$$

which also holds for the Vere-Jones and Ogata intensity function of equation (1). An occurrence at $t$ increased the intensity by an amount $r$, either a constant or a random variable:

$$\theta(t+) = \theta(t-) + r. \tag{2b}$$
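The two rules (2a) and (2b) are enough to compute the intensity from any recorded history. The sketch below is one way to do so in Python; the function name and the representation of the history as (time, jump) pairs are assumptions made for illustration only.

```python
import math

def recidivism_intensity(t, theta0, beta, jumps):
    """Intensity just after time t under rules (2a) and (2b).

    theta0 : intensity at time 0
    beta   : decay parameter of (2a), assumed negative
    jumps  : list of (occurrence_time, r) pairs, sorted by time
    """
    theta, last = theta0, 0.0
    for s, r in jumps:
        if s > t:
            break
        # Decay over (last, s) by rule (2a), then add the jump by rule (2b).
        theta = theta * math.exp(beta * (s - last)) + r
        last = s
    # Decay from the last occurrence (or from time 0) up to t.
    return theta * math.exp(beta * (t - last))
```

Because the decay is exponential and the jumps are additive, the result equals the initial intensity and every jump each decayed separately to time t, which the checks on this sketch rely on.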

Note that if $r$ always takes on the same value, $\theta(t)$ is increased by that fixed amount following each occurrence. In the Vere-Jones and Ogata model (in which $r$ is a constant) the corresponding relationship is $\theta(t+) = \theta(t-)\exp\{-\beta r\}$, $\beta < 0$; $\theta(t)$ is increased to a fixed multiple of its value at the instant prior to the occurrence. In the sections that follow, the models based on intensity function (1) and intensity function (2a), (2b) will be examined. In both cases the parameter $b = -\beta$ will be used. The resulting notation with $r$ and $b$ both positive should be more intuitive.

THE RECIDIVISM MODEL

For the process described by the intensity function (2a) and (2b) with $\beta$ replaced by $-b$ define $p_n(t \mid w) \equiv \text{Prob}\{N(t) = n \mid \theta(0) = w\}$. The pure birth argument for zero occurrences leads to

$$\frac{d}{dt}p_0(t \mid w) = -w e^{-bt} p_0(t \mid w),$$

with boundary condition $p_0(0 \mid w) = 1$. Hence,

$$p_0(t \mid w) = \exp\{-(w/b)(1 - e^{-bt})\}. \tag{3}$$

The first occurrence at or prior to $t$ requires more than zero occurrences in $[0, t)$; hence the CDF of the time until the first occurrence, conditional on the initial intensity $w$, is $F(t \mid w) = 1 - p_0(t \mid w)$. The conditional density function is thus

$$f(t \mid w) = w\exp\{-bt - (w/b)(1 - e^{-bt})\}. \tag{4}$$
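Equation (3) can be inverted in closed form, which gives a simple way to draw the time of the first occurrence. The sketch below assumes that approach (inverse-transform sampling, which the paper itself does not describe); the function names are hypothetical. Since $p_0(t \mid w)$ never falls below $\exp\{-w/b\}$, a uniform draw at or below that level is mapped to "no occurrence ever".

```python
import math
import random

def survival(t, w, b):
    """p_0(t|w) of equation (3): probability of no occurrence in [0, t)."""
    return math.exp(-(w / b) * (1.0 - math.exp(-b * t)))

def first_occurrence_time(u, w, b):
    """Solve p_0(t|w) = u for t, with u a uniform draw on [0, 1).

    Returns math.inf when u <= exp(-w/b), the case of no occurrence ever.
    """
    if u <= math.exp(-w / b):
        return math.inf
    return -math.log(1.0 + (b / w) * math.log(u)) / b

# Illustrative draws with assumed parameter values.
rng = random.Random(0)
w, b = 2.0, 1.0
draws = [first_occurrence_time(rng.random(), w, b) for _ in range(5)]
```

Feeding the returned time back into `survival` recovers the uniform level, which is a convenient check on the inversion.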

The density (4) is not proper; $F(\infty \mid w) = 1 - \exp\{-w/b\}$, reflecting the possibility that there are no occurrences ever. Let $t_n$ denote the time of the $n$th occurrence; if there are fewer than $n$ occurrences then $t_n = \infty$. It is clear that the density function for $t_n - t_{n-1} = t$ conditional on $\theta(t_{n-1}+) = w$ is given by (4). If the amount added to the intensity function following an occurrence has CDF $\Phi(r)$, independent and identically distributed for all occurrences, then the transition CDF for the intensity function is the convolution of (4) with $\Phi(r)$. This can be written

$$G(z \mid w) = \text{Prob}\{\theta(t_n+) \le z \mid \theta(t_{n-1}+) = w\} = \int_0^z (1/b)\exp\{-(w - r)/b\}\,\Phi(z - r)\,dr. \tag{5}$$

Equation (5) has a relationship to the theory of the M/G/1 queue with mean interarrival time $b$ and service time CDF $\Phi(r)$. If $s_n$ is the total (waiting plus service) time spent in the system by the $n$th arrival after the start of a busy period then equation (5) represents $\text{Prob}\{s_n \le z \mid s_{n-1} = w\}$. Since $\theta(t_n)$ and $s_n$ have the same stochastic behavior, a number of results from queueing theory can be utilized to yield analogous results for the $\theta(t)$ process.

One such result deals with the total number of events. Let $N = \lim N(t)$, the limit taken as $t$ goes to infinity. The probability that $N = n$ is the same as the probability $h(n \mid w)$ that $n$ are served in a busy period begun with initial work $w$ (see equation (19) of [5]):

$$h(n \mid w) = (w/(b\,n!))\int_0^{\infty} e^{-(r + w)/b}\,[(r + w)/b]^{n-1}\,d\Phi_n(r),$$

where $\Phi_n(r)$ refers to the $n$-fold convolution of $\Phi(r)$ with itself. If the expected value of $r$ is greater than or equal to $b$ then there is a non-zero probability that the busy period will be infinite in length, corresponding to $N = \infty$. In fact,

$$\sum_{n=0}^{\infty} h(n \mid w) = e^{-wZ(0)},$$

where $Z(s)$ is the unique non-negative root of $f(z) + (z - s)b - 1 = 0$,

with $f(z)$ being the Laplace-Stieltjes transform of $\Phi(r)$; $1 - \exp\{-wZ(0)\}$ is the probability of an infinite number of occurrences.

It might be questioned whether “$N$ going to infinity” has any relevance to a model based on human lifetimes. In fact, $N < \infty$ iff the limiting value of $\theta(t)$ is 0; an individual who stops smoking, committing crimes or taking drugs does so at some time usually well before the end of his or her life. In that case, $h(n \mid w)$ could be a reasonable estimate of the probability of $n$ occurrences in a lifetime. In any event, the sum of $h(n \mid w)$, $n$ going from 0 to $N$, is an upper bound on the probability of $N$ or fewer in a lifetime.

The maximum of $\theta(t)$ over the lifetime is analogous to the maximum virtual delay in an M/G/1 busy period. Let $\Omega_1(K)$ be the function whose Laplace transform is $e^{-zw}/[f(z) + bz - 1]$ and $\Omega_2(K)$ be the function whose Laplace transform is $1/[f(z) + bz - 1]$. Then, from equation (6) of Greenberg [5],

$$\text{Prob}\{\max_t \theta(t) \le K \mid \theta(0) = w\} = \Omega_1(K)/\Omega_2(K). \tag{7}$$

When $r$ is a constant,

$$h(n \mid w) = (w/(b\,n!))\,[(w + nr)/b]^{n-1}\exp\{-(w + nr)/b\}, \quad n = 0, 1, 2, \ldots.$$

If $r > b$ then there is a unique, real, positive root (call it $Z$) of $e^{-Zr} + bZ - 1 = 0$. The probability of finite $N$ is

$$e^{-wZ} = (e^{-rZ})^{w/r} = (1 - bZ)^{w/r}.$$

Also, the Laplace transform of $\Omega_1(K)$ is

$$e^{-zw}/(e^{-rz} + bz - 1) = e^{-zw}\sum_{j=0}^{\infty}(1 - e^{-zr})^j/(bz)^{j+1}.$$
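For a constant jump $r > b$, the root $Z$ and the probabilities $h(n \mid w)$ are easy to evaluate numerically. The following Python sketch finds $Z$ by bisection and checks that the $h(n \mid w)$ sum to $e^{-wZ}$. The function names, the choice of bisection, the truncation point of the sum and the parameter values are all assumptions made for illustration.

```python
import math

def extinction_root(r, b, tol=1e-13):
    """Positive root Z of exp(-Z*r) + b*Z - 1 = 0, assuming r > b."""
    if r <= b:
        raise ValueError("this sketch assumes r > b")
    g = lambda z: math.exp(-z * r) + b * z - 1.0
    lo = math.log(r / b) / r   # minimiser of g, where g is negative
    hi = 1.0 / b               # g(1/b) = exp(-r/b) is positive
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h(n, w, r, b):
    """h(n|w) for a constant jump r, evaluated in logs to avoid overflow."""
    x = (w + n * r) / b
    log_h = math.log(w / b) - math.lgamma(n + 1) + (n - 1) * math.log(x) - x
    return math.exp(log_h)

# Assumed parameter values, with r > b so that explosion is possible.
w, r, b = 1.0, 1.5, 1.0
Z = extinction_root(r, b)
prob_finite = math.exp(-w * Z)                       # probability of finite N
partial_sum = sum(h(n, w, r, b) for n in range(2000))
```

The truncation at 2000 terms is arbitrary; when $r$ is only slightly larger than $b$ the terms decay slowly and more would be needed.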

Expanding $(1 - e^{-zr})^j$ by means of the binomial theorem, inverting the transform term-by-term and interchanging the order of summation yields

$$\Omega_1(K) = \sum_j (1/j!)\,[-(K - w - jr)/b]^j \exp\{(K - w - jr)/b\},$$

with the summation extending from $j = 0$ to the largest integer less than or equal to $(K - w)/r$. Similarly,

$$\Omega_2(K) = \sum_j (1/j!)\,[-(K - jr)/b]^j \exp\{(K - jr)/b\},$$

with the summation extending from $j = 0$ to the largest integer less than or equal to $K/r$. Their ratio determines the CDF of the maximum of $\theta(t)$, as given by equation (7).

The derivation of results for finite $t$ values is, as one would suspect, much more difficult than the limiting results discussed above. Conceptually, one could utilize the recursive relationship

$$p_n(t \mid w) = \int_0^t\int_0^{\infty} f(x \mid w)\,p_{n-1}(t - x \mid r + we^{-bx})\,d\Phi(r)\,dx,$$

beginning with the known expressions for $p_0(t \mid w)$ and $f(x \mid w)$, equations (3) and (4) respectively. This leads to

$$p_1(t \mid w) = w\exp\{-(w/b)(1 - e^{-bt})\}\int_0^t\int_0^{\infty}\exp\{-bx - (r/b)[1 - e^{-b(t-x)}]\}\,d\Phi(r)\,dx.$$

For the deterministic case in which $r$ takes a fixed value with certainty this becomes

$$p_1(t \mid w) = (w/b)\exp\{-(1/b)[r + w(1 - e^{-bt})]\}\,[E_2((r/b)e^{-bt}) - e^{-bt}E_2(r/b)],$$

where $E_2(x)$ is the exponential integral, equation (5.1.4) of Abramowitz and Stegun [6] with $n = 2$. Further iterations do not appear to lead to any general analytical (as opposed to numerical) solution for $p_n(t \mid w)$.
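Returning to the limiting results, the CDF of the maximum of $\theta(t)$ in equation (7) can be evaluated directly from the finite series for $\Omega_1(K)$ and $\Omega_2(K)$ when the jump $r$ is constant. The Python sketch below takes the ratio as $\Omega_1(K)/\Omega_2(K)$; the function names and parameter values are hypothetical. A useful sanity check is that for $w \le K < r$ any occurrence pushes the intensity above $K$, so the result must equal the no-occurrence probability $\exp\{-w/b\}$.

```python
import math

def omega(x, r, b):
    """Finite series sum_j (1/j!) [-(x - j*r)/b]^j exp{(x - j*r)/b},
    with j running from 0 to floor(x/r). Returns 0 for x < 0 (empty sum)."""
    if x < 0:
        return 0.0
    total = 0.0
    for j in range(int(math.floor(x / r)) + 1):
        y = (x - j * r) / b
        total += (-y) ** j * math.exp(y) / math.factorial(j)
    return total

def max_cdf(K, w, r, b):
    """Prob{max theta(t) <= K | theta(0) = w} for a constant jump r.

    Omega_1(K) is the series evaluated at K - w, Omega_2(K) at K.
    """
    return omega(K - w, r, b) / omega(K, r, b)

# Assumed values: initial intensity 1.0, jump 1.5, decay rate 1.0.
cdf_low = max_cdf(1.2, 1.0, 1.5, 1.0)   # K < r: only "no occurrence" qualifies
cdf_mid = max_cdf(2.0, 1.0, 1.5, 1.0)   # r <= K < 2r: one extra series term
```

The series alternates in sign, so for large $K/r$ the terms cancel heavily and double precision may not be adequate; this sketch is intended for moderate values only.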

THE VERE-JONES AND OGATA MODEL

From the discussion surrounding (2a) and (2b) it is clear that the Vere-Jones and Ogata model follows the same behavior as the recidivism model until the first occurrence. Hence, the same derivation can be used to conclude that the probability of no occurrence in $[0, t)$ is given by (3) and the conditional pdf of the time of the first occurrence by (4).

The $\theta(t)$ process of equation (1) has a lattice structure in that it can only take on values $\exp\{\alpha - b(t - rn)\} = w\exp\{-b(t - rn)\}$, $n = 0, 1, 2, \ldots$ at time $t$. This allows one to use the pure birth argument to obtain

$$\frac{d}{dt}p_n(t \mid w) = we^{-bt}\,[e^{(n-1)br}p_{n-1}(t \mid w) - e^{nbr}p_n(t \mid w)].$$

It can be easily verified that this is satisfied by

$$p_n(t \mid w) = \sum_{j=0}^{n}\frac{(-1)^j R^{\,n + j(j-1)/2}\exp\{-(w/(bR^j))(1 - e^{-bt})\}}{\left[\prod_{i=1}^{j}(1 - R^i)\right]\left[\prod_{i=1}^{n-j}(1 - R^i)\right]},$$

with $R \equiv e^{-br}$. The lattice property indicates that the conditional CDF of the intensity function at time $t$, $\text{Prob}\{\theta(t) \le \theta \mid \theta(0) = w\}$, can be obtained by summing $p_n(t \mid w)$ over all integers $n$ less than or equal to $(t/r) + (1/(br))\log(\theta/w)$.

The limiting value of $p_n(t \mid w)$ as $t$ goes to infinity is this same finite series with $e^{-bt} = 0$. As is the case in the previous section, these limiting values will sum to a number less than unity, reflecting the possibility of $n$ growing without bound. No closed form expression is immediately apparent and the explosion probability would have to be obtained numerically.

STATISTICAL CONSIDERATIONS

The statistical inference for the Vere-Jones and Ogata model was based on writing the log-likelihood function of the intensity function for fixed, finite time $T$ in which $N(T) = N$ occurrences were observed:

$$\log L = \sum_{n=1}^{N}\log\theta(t_n) - \int_0^T\theta(t)\,dt.$$
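For the Vere-Jones and Ogata intensity written with $b = -\beta$, both terms of this log-likelihood have closed forms, because $N(t)$ is constant between occurrences. The Python sketch below evaluates them; the function name is hypothetical, and taking $\theta(t_n)$ as the left limit at each occurrence is an assumption of the sketch rather than something stated in the text.

```python
import math

def vjo_log_likelihood(event_times, T, alpha, b, r):
    """log L = sum_n log theta(t_n) - integral_0^T theta(t) dt for
    theta(t) = exp{alpha - b*[t - r*N(t)]}.

    theta(t_n) is taken as the left limit at each occurrence
    (an assumption of this sketch).
    """
    times = sorted(event_times)
    # Just before the (k+1)th occurrence, k occurrences have been counted.
    log_sum = sum(alpha - b * (t - r * k) for k, t in enumerate(times))
    edges = [0.0] + times + [T]
    integral = 0.0
    for k in range(len(edges) - 1):
        # N(t) = k on [edges[k], edges[k+1]), so theta(t) = exp(alpha + b*r*k - b*t).
        integral += math.exp(alpha + b * r * k) * (
            math.exp(-b * edges[k]) - math.exp(-b * edges[k + 1])) / b
    return log_sum - integral
```

A function of this kind could be handed to a numerical optimizer over the three parameters, subject to the distributional caveats that apply to explosive processes.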

Conceptually, a single individual could be monitored over the interval and the parameters $\alpha$, $b$ and $r$ estimated, at least numerically. Whether usual distributional assumptions can be made is problematical: the derivatives of the log-likelihood function that would be equated to zero to obtain the estimates must satisfy a law of large numbers and the central limit theorem. The explosive nature of the process may preclude this. Further, the usual problem of the choice between alternative models and the design of an experiment that would lead to an estimate of the joint distribution of the parameters throughout the population present serious obstacles to the utilization of these models in practice.

REFERENCES

1. D. Vere-Jones and Y. Ogata, On the moments of a self-correcting process, J. Appl. Probab. 21, 335-342 (1984).
2. Y. Ogata and D. Vere-Jones, Inference for earthquake models: A self-correcting model, Stochastic Proc. Appl. 17, 337-347 (1984).
3. I. Greenberg, A state dependent Poisson process with an application to crime rates, ORSA/TIMS Bulletin 10, 70 (1980).
4. R.E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, Holt, Rinehart and Winston, New York, pp. 91-97, (1975).
5. I. Greenberg, Some duality results in the theory of queues, J. Appl. Probab. 6, 99-121 (1969).
6. M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, Washington, D.C., p. 228, (1964).
