Back-to-back mixtures of discrete distributions


Journal of Statistical Planning and Inference 11 (1985) 267-276
North-Holland

Paul S. HORN

Department of Mathematical Sciences, Mail Location #25, University of Cincinnati, Cincinnati, OH 45221, USA

Received 9 December 1983; revised manuscript received 4 September 1984
Recommended by V.P. Godambe

Abstract: A method for expanding the choices for fits of discrete data is given. The method is very simple: a breakpoint is chosen for the data set on either side of which two separate discrete distributions are fit. Thus, the method is a mixture of two discrete distributions. The method is appealing in light of the ease with which the likelihood equations simplify. For illustrative purposes, the method is used on the data set that motivated its conception.

AMS Subject Classification: Primary 62P99; Secondary 62E10.

Key words and phrases: Discrete random variable; Mixture model.

1. Introduction

The motivation for the material presented here came from an analysis of residuals from a counting process. Since the process had occasional large jumps, the residuals tended to be skewed toward large values. The analysis consisted of finding an appropriate discrete model of the residuals. This was difficult since the residuals consisted of both positive and negative values and were long-tailed and skewed.

For long-tailed skewed discrete data, finding a suitable distribution to describe the behavior may be difficult. It may be the case, however, that one distribution fits one side of the data well, and another fits the other side well, but no single distribution provides a good fit for the entire data set. We propose that this fitting of parts of the data separately, or back-to-back, can yield a good fit. The price that is paid is in terms of the degrees of freedom used for the fit: the total number of parameters for both discrete distributions, plus one for the mixture probability, plus one because the fits are taken separately. Degrees of freedom can be saved by allowing the two distributions to share some, or all, of the parameters. This will, of course, not provide as good a fit as in the case described above.

0378-3758/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)

2. Back-to-back mixtures

We wish to define a discrete probability function, $f(\cdot)$, with associated random variable $X$, on the integers, say, such that one side exhibits the behavior of a discrete probability function $f_1(\cdot)$, and the other side exhibits the behavior of another discrete probability function $f_2(\cdot)$. The two sides of the domain will be separated at an integer $k$, the minimum value for $f_2(\cdot)$. This number $k$ will be referred to as the breakpoint. Specifically, let

$$P(X = x) = f(x) = p\,f_1(x) + (1 - p)\,f_2(x), \tag{1}$$

where

$$\sum_{i=-\infty}^{k-1} f_1(i) = 1, \quad f_1(i) = 0 \ \text{for } i \ge k, \qquad \sum_{i=k}^{\infty} f_2(i) = 1, \quad f_2(i) = 0 \ \text{for } i < k,$$

and $p = P(X < k)$. For example, with breakpoint $k = 0$, a Poisson distribution (parameter $\lambda$) reflected onto the negative integers and a geometric distribution (parameter $r$) on the non-negative integers give

$$f(x) = p\,\frac{e^{-\lambda}\lambda^{-(x+1)}}{(-(x+1))!}\,I_{(x<0)}(x) + (1 - p)(1 - r)\,r^{x}\,I_{[x \ge 0]}(x), \qquad x = \ldots, -1, 0, 1, \ldots,$$

where $I_{[\cdot]}(x)$ is the indicator function. Thus, back-to-back mixing greatly increases our choices of discrete models.

Note that in the above back-to-back mixture model the breakpoint was chosen a priori. This will be the case when there is a natural point at which to separate the data. For example, with signed data a breakpoint of zero (or one) would make sense. However, an a priori choice of a breakpoint, though natural, need not provide the best fit. One could let the breakpoint $k$ vary over the possible data points and choose that $k$ that gives, say, the maximum value for the likelihood. This will yield likelihood equations much more difficult to solve than those derived from an a priori breakpoint.

The breakpoint, $k$, can be estimated using criteria other than maximum likelihood. For instance, it can be chosen to minimize the chi-squared goodness-of-fit statistic, and the other parameters (functions of $k$) can be derived by maximum likelihood. Alternatively, a simpler a priori choice of the breakpoint is the sample mode. Of course, when the data are used to estimate the breakpoint, an extra degree of freedom is used.

The rest of this study will be concerned with situations in which there is a natural breakpoint in the data. An attractive feature of this situation is the simplicity of the resulting likelihood equations.
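As a minimal sketch of Equation (1), the following Python fragment builds a back-to-back probability function from a Poisson component reflected below the breakpoint and a geometric component from the breakpoint upward. The function names and the parameter values (`p = 0.4`, `lam = 2.0`, `r = 0.7`) are illustrative assumptions, not quantities taken from the paper. The final lines only confirm that the mixture sums to one and that the mass below $k$ equals $p$.

```python
import math

def poisson_pmf(j, lam):
    """Poisson probability of j = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def geometric_pmf(j, r):
    """Geometric probability (1 - r) * r**j of j = 0, 1, 2, ..."""
    return (1.0 - r) * r ** j

def back_to_back_pmf(x, p, lam, r, k=0):
    """P(X = x): reflected Poisson below the breakpoint k, geometric from k upward."""
    if x < k:
        # x = k-1, k-2, ... is mapped to j = 0, 1, ... for the lower component
        return p * poisson_pmf(k - 1 - x, lam)
    # x = k, k+1, ... is mapped to j = 0, 1, ... for the upper component
    return (1.0 - p) * geometric_pmf(x - k, r)

# Hypothetical parameter values, chosen only for illustration.
p, lam, r = 0.4, 2.0, 0.7
total = sum(back_to_back_pmf(x, p, lam, r) for x in range(-60, 200))
below = sum(back_to_back_pmf(x, p, lam, r) for x in range(-60, 0))
print(total, below)
```

Because each component is a proper distribution on its own side, the weights $p$ and $1 - p$ are the only link between the two halves.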

3. Estimation of parameters

We wish to fit the discrete distribution $f(\cdot)$ of Equation (1) to the data points $x_1, \ldots, x_N$, where $f_1(x) = f_1(x; \alpha)$ and $f_2(x) = f_2(x; \beta)$, and where $\alpha = (\alpha_1, \ldots, \alpha_u)$ and $\beta = (\beta_1, \ldots, \beta_v)$ are parameter vectors. Without loss of generality, we may assume that $x_1 \le x_2 \le \cdots \le x_N$. Then the logarithm of the likelihood is as follows:

$$L(p, \alpha, \beta) = n \log p + \sum_{i=1}^{n} \log f_1(x_i; \alpha) + (N - n) \log(1 - p) + \sum_{i=n+1}^{N} \log f_2(x_i; \beta),$$

where $n$ is the number of $x_i$'s less than $k$. This is the case since $f_1(x) = 0$ for $x \ge k$ and $f_2(x) = 0$ for $x < k$. Note that $\hat{p}$, the maximum likelihood estimate of $p$, is the solution to the equation

$$\frac{\partial L(p, \alpha, \beta)}{\partial p} = \frac{n}{p} - \frac{N - n}{1 - p} = 0,$$

that is, $\hat{p} = n/N$. Thus, $\hat{p}$ is simply the sample proportion of observations less than $k$. This is reassuring in light of Equation (1), where it was stated that $p = P(X < k)$.

An attractive feature of the back-to-back mixture is the fact that the likelihood equations for the remaining parameters are no more difficult to solve than the equations for a non-mixture model. This is because the resulting likelihood equations involving the $\alpha_j$'s do not involve any $\beta_m$'s. Thus we simply fit the points less than $k$ separately, as we would if these were the only data points. The same is true of the data points greater than or equal to $k$. The two probability functions that describe the data set are related only through the parameter $p$. This implies that we need not fit using maximum likelihood; we could fit using, for example, the method of moments. The point is that we can fit the two sides of the data separately.


The back-to-back mixture is thus not as difficult to use in practice as it might first appear. For example, if $f_1(\cdot)$ is a binomial and $f_2(\cdot)$ is a negative binomial then there are a total of 5 parameters, including the mixture probability, $p$. As previously noted, the maximum likelihood estimate of $p$ is trivial to compute since it does not involve any of the other parameters. The remaining four parameters are estimated through two sets of two equations, not one set of four equations. The former situation is, of course, much simpler.
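The separate fitting of the two sides can be sketched in Python as follows. For illustration only, both sides are given geometric distributions with their own parameters (an assumption made here for brevity; any pair of families could be substituted), so that each maximum likelihood estimate has the closed form mean/(1 + mean). The function name `fit_back_to_back` and the toy data are hypothetical.

```python
def geometric_mle(counts):
    """MLE of r for the pmf (1 - r) * r**j, j = 0, 1, ...: r = mean / (1 + mean)."""
    mean = sum(counts) / len(counts)
    return mean / (1.0 + mean)

def fit_back_to_back(data, k=0):
    """Fit p and one geometric parameter per side, each side on its own.

    Assumes (for this sketch) that both sides of the breakpoint k contain data.
    """
    lower = [k - 1 - x for x in data if x < k]   # points below k, mapped to 0, 1, ...
    upper = [x - k for x in data if x >= k]      # points at or above k, mapped to 0, 1, ...
    p_hat = len(lower) / len(data)               # sample proportion below the breakpoint
    return p_hat, geometric_mle(lower), geometric_mle(upper)

# Toy data, invented for illustration only.
data = [-3, -1, -1, -2, 0, 0, 1, 4, 2, 0]
p_hat, r_lower, r_upper = fit_back_to_back(data, k=0)
print(p_hat, r_lower, r_upper)
```

Neither call to `geometric_mle` sees the other side's data, which mirrors the observation that the two sets of likelihood equations decouple.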

4. An example

The data given in Table 1 are the residuals from a method of forecasting demand for special types of telecommunication equipment. Knowledge of the distribution of these residuals would be of help to those making future predictions. Note that negative residuals indicate that the forecast was too large, while positive residuals are indicative of under-forecasting. To fit the data from Table 1 we look to a heavy-tailed discrete distribution, say the negative binomial. To fit a negative binomial to these data we will shift it so that

Table 1
Residuals from forecast ($f_i$ = frequency of $i$)

f,

i

i

fi

-28

1

758

25

9

51

1

91

1

-22

2

495

12 11

53

2

54

2 1

1

h

i

fi

-21

3

351

26 27

2

93 96

-20

4

252

28

10

55

1

102

- 19

5

216

29

10

56

3

106

1

- 18

150

30 31

4 5

57

1

107

1

58

1

32

4

59

4

109 112

1 1

- 17

8

6 7

- 16

10

8

- 15 - 14

12

9

75

33

7

60

2

118

1

16 14

10

51

34

5

64

2

11

56

35

8

65

3

120 127

1

- 12

20

12

49

36

5

-11 - 10 -9 -8 -7

29 31 61 63

13 14 15

45 41 30

3 1

68 69 73

1 1 2

100

19 25

2 4 5 3

66 67

16 17

37 39 40 41

-6

142

18

22

42 43

5 4

76 77

1 1

-5 -4

206 304 587 695

19 20 21 22

20 14 14 13

44 46 47

5 3 4

80

1

81 82

1 1

915 1300

23 24

16

48 49

1 3

83 84

1 1

9

50

3

86

I

-13

-3 -2 -1 0

Ill 95

1

137

1

139 154 167 168

1 1 1

188

1

1


the sample minimum, $-28$, corresponds to the point 0. It is important to note that this shift and subsequent fit will force $P(X < -28) = 0$. This will cause a logistical problem if there is no natural minimum for the data being examined. Since there is no reason, in this example, why an observation could not be $-29$, there is some problem with the single distribution fit. However, since it could be argued that such observations are infrequent, this problem is not worth troubling over. Using the formulas found in Johnson and Kotz (1969), p. 132, we compute the maximum likelihood estimates of $N$ and $P$ to be 22.65 and 1.308 respectively. Thus, our fitted model is

$$P(X = x) = \frac{\Gamma(50.65 + x)}{\Gamma(22.65)\,(28 + x)!} \left(\frac{1.308}{2.308}\right)^{x+28} \left(\frac{1}{2.308}\right)^{22.65}, \qquad x = -28, -27, -26, \ldots.$$
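The shifted negative binomial with $N = 22.65$ and $P = 1.308$ can be evaluated numerically as in the sketch below. Working on the log scale with `math.lgamma` is a choice made here to avoid overflow in the gamma function; the function name and the truncation of the sums at 600 are assumptions of this sketch.

```python
import math

N_HAT, P_HAT, SHIFT = 22.65, 1.308, 28   # estimates and shift quoted in the text

def shifted_negbin_pmf(x, n=N_HAT, p=P_HAT, shift=SHIFT):
    """P(X = x) for the negative binomial shifted so that x = -shift maps to 0."""
    j = x + shift
    if j < 0:
        return 0.0                        # the shift forces P(X < -28) = 0
    q = 1.0 + p
    log_pmf = (math.lgamma(n + j) - math.lgamma(n) - math.lgamma(j + 1)
               + j * math.log(p / q) - n * math.log(q))
    return math.exp(log_pmf)

xs = range(-SHIFT, 600)                   # upper limit chosen so the tail is negligible
total = sum(shifted_negbin_pmf(x) for x in xs)
mean = sum(x * shifted_negbin_pmf(x) for x in xs)
print(total, mean)
```

The fitted mean is $NP - 28$, which gives a quick consistency check against the sample mean of the residuals.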

In Figure 1, histograms of the empirical data and the fitted negative binomial are given (though not all 22 cells are shown). The fit is not very good, as shown by the diagram and the chi-squared statistic, which has a value over 6000.

Fig. 1. Negative binomial fit. (Horizontal axis: residual.)

We will now fit a back-to-back mixture. We must first decide on the breakpoint, $k$. Since the data are residuals, and thus signed, $k = 0$ is an appropriate point at which to split the data set. This has the interpretation of treating residuals from strictly over-forecasting differently from the other residuals. Thus, the negative ($< 0$) residuals will be used to give one fitted distribution, and the non-negative ($\ge 0$) residuals will be used to give another fitted distribution.

From Table 1, we compute the sample mean and variance of the negative data to be $-3.402$ and 9.163 respectively. We further compute the sample mean and variance of the non-negative data to be 5.317 and 142.375 respectively. Since for both parts of the data set the variances are much greater than the absolute values of the means, and we wish to use only a few parameters, we turn to suitable one-parameter distributions to describe each part of the data set.

We will fit a geometric distribution, with parameter $r$, to the negative part of the data, and a (shifted) zeta distribution, with parameter $\rho$, to the non-negative part of the data (Johnson and Kotz, p. 240). This yields the fitted model

$$P(X = x) = 0.4224\,(1 - 0.7061)(0.7061)^{-x-1}\,I_{(x<0)}(x) + 0.5776\,\frac{(x + 1)^{-1.613}}{2.251}\,I_{[x \ge 0]}(x), \qquad x = \ldots, -1, 0, 1, \ldots.$$
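The constants in the geometric/zeta fit can be checked with a short Python sketch. Here the zeta exponent is taken to be 1.613, which is an assumption about the printed value, and the zeta function is approximated by a partial sum with an Euler-Maclaurin tail correction (a standard numerical device, not something described in the paper). The function names are hypothetical.

```python
import math

def zeta(s, terms=1000):
    """Riemann zeta for s > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(n ** -s for n in range(1, terms))
    tail = (terms ** (1.0 - s) / (s - 1.0)
            + 0.5 * terms ** -s
            + s * terms ** (-s - 1.0) / 12.0)
    return head + tail

P_NEG, R_HAT = 0.4224, 0.7061   # mixture probability and geometric parameter from the text
S_HAT = 1.613                   # assumed value of the zeta exponent
ZETA_NORM = zeta(S_HAT)         # normalising constant of the shifted zeta component

def geometric_zeta_pmf(x):
    """Fitted back-to-back geometric/zeta probability of the integer x."""
    if x < 0:
        return P_NEG * (1.0 - R_HAT) * R_HAT ** (-x - 1)
    return (1.0 - P_NEG) * (x + 1) ** -S_HAT / ZETA_NORM

# Mean of the fitted geometric side; maximum likelihood matches the sample mean.
negative_mean = -1.0 - R_HAT / (1.0 - R_HAT)
mass_below_zero = sum(geometric_zeta_pmf(x) for x in range(-400, 0))
print(ZETA_NORM, negative_mean, mass_below_zero)
```

Under that assumption the normalising constant agrees with the 2.251 in the fitted model to about three decimals, and the geometric side reproduces the sample mean of the negative residuals.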

In Figure 2, the empirical and fitted histograms are given along with the chi-squared statistic. Note the improvement of the back-to-back model over the previous model. With the single distribution fit the value of the chi-squared statistic was 6267 with 19 degrees of freedom (22 (cells) - 2 (parameters) - 1), while the above back-to-back fit has a chi-squared value of 966 with 17 degrees of freedom. (The negative data have 6 (8 - 1 - 1) and the non-negative data have 12 (14 - 1 - 1) degrees of freedom respectively. An extra degree of freedom is used to estimate the mixture probability.)

Fig. 2. Back-to-back geometric/zeta fit. (Horizontal axis: residual; legend: empirical, model; chi-squared statistic = 966, degrees of freedom = 17.)

Although both chi-squared statistics are highly significant, as will often be the case with a large data set, the back-to-back fit is an improvement over the single fit by a factor of almost 6.5, while only using two more degrees of freedom. The improvement in the fit, as shown by the diagrams and the chi-squared statistics, is worth the price of two degrees of freedom.

Furthermore, back-to-back mixtures do not suffer from the logistical restrictions of the single discrete distribution. With the back-to-back mixture there is no restriction on the minimum value of a residual. Even if there were a natural minimum in the above example, a back-to-back mixture would still have been appropriate: a truncated geometric distribution could have been used to model the negative residuals.

As a further example, let us fit a back-to-back model to these data with the negative values again described by a geometric (parameter $r$), but now with a negative binomial distribution (parameters $N$ and $P$) to describe the non-negative values. This yields the fitted model

$$P(X = x) = 0.4224\,(1 - 0.7061)(0.7061)^{-x-1}\,I_{(x<0)}(x) + 0.5776\,\frac{\Gamma(0.4243 + x)}{\Gamma(0.4243)\,x!}\,(0.9261)^{x}(1 - 0.9261)^{0.4243}\,I_{[x \ge 0]}(x), \qquad x = \ldots, -1, 0, 1, \ldots.$$
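Since maximum likelihood for both the geometric and the negative binomial reproduces the sample mean of the data being fitted, the fitted parameters can be checked against the sample means quoted earlier ($-3.402$ and 5.317). The sketch below does this in closed form; the variable names are illustrative, and the value 0.9261 is used for the negative binomial ratio.

```python
P_NEG = 0.4224                      # fitted mixture probability P(X < 0)
R_HAT = 0.7061                      # geometric parameter for the negative side
N_HAT, THETA_HAT = 0.4243, 0.9261   # negative binomial exponent and ratio, non-negative side

# Mean of x on the negative side: x = -1 - j with j geometric, E[j] = r / (1 - r).
geometric_mean = -1.0 - R_HAT / (1.0 - R_HAT)

# Mean of x on the non-negative side: E[x] = N * theta / (1 - theta).
negbin_mean = N_HAT * THETA_HAT / (1.0 - THETA_HAT)

# Overall mean of the back-to-back mixture.
overall_mean = P_NEG * geometric_mean + (1.0 - P_NEG) * negbin_mean
print(geometric_mean, negbin_mean, overall_mean)
```

Agreement to within rounding of the published four-digit estimates is all that should be expected from such a check.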

Figure 3 gives the empirical and the fitted histograms along with the chi-squared statistic. Again we see a great improvement in the fit. Here the value of the chi-squared statistic is 274 with 16 degrees of freedom. Thus, for the price of three degrees of freedom we see an improvement over the single distribution fit by a factor of almost 23. (For one degree of freedom, this back-to-back mixture model is an improvement over the previous back-to-back mixture model by a factor of 3.5.)

Fig. 3. Back-to-back geometric/negative binomial fit. (Horizontal axis: residual; chi-squared statistic = 274, degrees of freedom = 16.)

As a final example let us fit the following back-to-back model:

$$P(X = x) = p_1\,f_1(x)\,I_{(x<0)}(x) + p_2\,I_{(x=0)}(x) + p_3\,f_2(x)\,I_{(x>0)}(x), \qquad x = \ldots, -1, 0, 1, \ldots,$$

where $p_1 + p_2 + p_3 = 1$. This model is an improvement over the previous back-to-back models in that by fitting the breakpoint, 0, separately, we do not have to decide whether to include it with either the strictly positive or strictly negative data points. Of course, by fitting the breakpoint separately the fitted frequency at the breakpoint will necessarily equal the observed frequency, at a cost of one degree of freedom. If we fit the above model to the data with the negative data described by a geometric distribution, and the positive data described by a negative binomial distribution, we get the fitted model

$$P(X = x) = 0.4224\,(1 - 0.7061)(0.7061)^{-x-1}\,I_{(x<0)}(x) + 0.1701\,I_{(x=0)}(x) + 0.4075\,\frac{\Gamma(x - 0.5340)}{\Gamma(0.4660)\,(x - 1)!}\,(0.9334)^{x-1}(1 - 0.9334)^{0.4660}\,I_{(x>0)}(x), \qquad x = \ldots, -1, 0, 1, \ldots.$$

Figure 4 gives the empirical and fitted histograms along with the chi-squared statistic, which is equal to 213 with 15 degrees of freedom. Thus, for the price of four degrees of freedom the above back-to-back mixture model is an improvement over the single distribution fit by a factor of almost 30. (For one degree of freedom this fit is better than the previous back-to-back fit by about 20%.)
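The comparisons quoted for the four fits are simple ratios of the chi-squared statistics, which the following Python sketch reproduces from the values reported in the text. The labels and variable names are illustrative only.

```python
# (label, chi-squared statistic, degrees of freedom) as reported in the text
fits = [
    ("single negative binomial", 6267, 19),
    ("geometric/zeta", 966, 17),
    ("geometric/negative binomial", 274, 16),
    ("geometric/breakpoint/negative binomial", 213, 15),
]

base_label, base_chi2, base_df = fits[0]
summary = {}
for label, chi2, df in fits[1:]:
    factor = base_chi2 / chi2        # improvement factor over the single fit
    df_spent = base_df - df          # extra degrees of freedom used
    summary[label] = (factor, df_spent)
    print(f"{label}: factor {factor:.2f} for {df_spent} degrees of freedom")

# Successive comparisons between the back-to-back fits themselves.
zeta_to_negbin = 966 / 274                 # factor between the first two mixtures
negbin_to_breakpoint = (274 - 213) / 274   # relative reduction from fitting zero separately
print(zeta_to_negbin, negbin_to_breakpoint)
```

Laying the figures side by side makes the diminishing return per degree of freedom easy to see.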

5. Iso-back-to-back mixtures

In the first example of a back-to-back mixture model, we used two different distributions, Poisson and geometric, with different parameters, $\lambda$ and $r$. Similarly, in the first back-to-back fit we used two different distributions, geometric and zeta, with different parameters, $r$ and $\rho$. The reason for this was that we believed that the negative residuals behaved differently from the non-negative ones. This was evident from Table 1, where the non-negative residuals are much more skewed.

Fig. 4. Back-to-back geometric/breakpoint/negative binomial fit.

We need not adhere to such generality if we believe that the data on both sides of the breakpoint, $k$, have the same shape. Let $f(\cdot\,; \alpha)$ be a discrete probability function defined on the non-negative integers. Now, let us define the iso-back-to-back mixture model of $f(\cdot)$ with breakpoint $k$ as

$$P(X = x) = p\,f(k - 1 - x; \alpha)\,I_{(x<k)}(x) + (1 - p)\,f(x - k; \alpha)\,I_{[x \ge k]}(x), \qquad x = \ldots, -1, 0, 1, \ldots.$$

Note that, unlike the previous back-to-back model, the parameter vectors in this model are the same. The iso-back-to-back mixture could even be used with different types of discrete distributions. For example, one could fit non-negative data using a binomial distribution with parameters $N$ and $p$, and fit the negative data with a Poisson distribution with parameter $\lambda = Np$. This would save one degree of freedom. In general, one would use the iso-back-to-back mixture model if one believes the data exhibit similar shapes on either side of the breakpoint and/or there are not many degrees of freedom to give up.
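An iso-back-to-back mixture can be sketched in Python by reusing one base probability function on both sides of the breakpoint. The geometric base, the function names and the parameter values below are assumptions chosen for illustration; the reflection `k - 1 - x` maps the points below the breakpoint onto 0, 1, 2, ....

```python
def geometric_pmf(j, r):
    """Base pmf on the non-negative integers: (1 - r) * r**j."""
    return (1.0 - r) * r ** j

def iso_back_to_back_pmf(x, p, r, k=0):
    """Same base pmf and same parameter r on both sides of the breakpoint k."""
    if x < k:
        return p * geometric_pmf(k - 1 - x, r)    # reflected copy below k
    return (1.0 - p) * geometric_pmf(x - k, r)    # shifted copy from k upward

# Hypothetical values, for illustration only.
p, r, k = 0.4, 0.7, 2
total = sum(iso_back_to_back_pmf(x, p, r, k) for x in range(-300, 300))

# The two sides have the same shape once the mixture weights are divided out.
shape_gap = max(
    abs(iso_back_to_back_pmf(k - 1 - j, p, r, k) / p
        - iso_back_to_back_pmf(k + j, p, r, k) / (1.0 - p))
    for j in range(20)
)
print(total, shape_gap)
```

Only `p` and the shared parameter `r` are estimated here, one parameter fewer than a back-to-back mixture with a separate geometric on each side.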

6. Conclusion

There is much to be gained by back-to-back mixing, especially with long-tailed, skewed, discrete data. For the price of a few degrees of freedom, substantial gains can be made in fitting a back-to-back mixture over a single fit of the whole data. Back-to-back mixtures may be especially appropriate if there is a natural 'break' in the data. For example, with signed data, like residuals, the point zero will often be appropriate. That is, treat the strictly negative residuals separately from those that are greater than or equal to zero.

If one believes that the data have similar shapes on either side of the breakpoint, and/or there are not many degrees of freedom in the first place, then iso-back-to-back mixtures would be appropriate. These models are similar to the ordinary back-to-back mixtures, except that the two distributions use some or all of the same parameters. Thus, not as many degrees of freedom are used for the fit and both sides of the data may be modeled with similarly shaped distributions.

Though not examined in this study, back-to-back mixtures may be appropriate with bimodal data, the obvious breakpoint lying between the two modes. Back-to-back mixtures that use more than one breakpoint are also possible; the likelihood equations, though more numerous, would be no more difficult to solve than in the single breakpoint mixture. In the residual setting, for example, we could fit a separate parameter to $P(X = 0)$ at the cost of one degree of freedom. This would preclude our having to group the zeroes with either the strictly positive or strictly negative residuals.

Finally, we note that all that has been presented for discrete models holds for continuous models as well. (There will be a discontinuity at the breakpoint(s), however.) This would be an interesting area for further study.

Acknowledgement

The author wishes to thank the referees for their helpful suggestions.

References

Blischke, W.R. (1962). Moment estimators for the parameters of a mixture of two binomial distributions. Ann. Math. Statist. 33, 444-454.
Blischke, W.R. (1964). Estimating the parameters of mixtures of binomial distributions. J. Amer. Statist. Assoc. 59, 510-528.
Johnson, N.L. and S. Kotz (1969). Discrete Distributions. John Wiley and Sons, New York.