International Journal of Forecasting 11 (1995) 539-541
Evaluation of probability forecasts of events
Yosihiko Ogata
Institute of Statistical Mathematics, Tokyo, Japan
Abstract
Methods for evaluation of probability forecasts of events are described and an application to earthquake prediction is introduced.
Keywords: AIC; Evaluating score of forecasts; Contingency tables
Professor Vere-Jones thoroughly reviewed the state of the art of predicting earthquakes and forecasting earthquake risk, including the important idea of the conditional intensity of point processes, various models based on this idea, appropriate estimation procedures and related interesting topics. Here, as a supplement to Professor Vere-Jones' excellent paper, I would like to contribute a commentary on the evaluation of probability forecasting procedures, whether they are model-based or not. As evidence from forecasting experiments accumulates, their evaluation becomes increasingly important if forecasting methods are to improve.
Let $p(F_t)$ be the forecast probability that an event occurs in a prescribed future time interval, based on the information $F_t$ available at and before time $t$. Usually, a forecasting model is estimated by fitting it to data on past events and any other information useful for prediction. If a predictive model is given in terms of the conditional intensity $\lambda(t \mid F_t)$, which depends on the past history and other information $F_t$, as in the variety of models described in Professor Vere-Jones' paper, then we have
$$p_t = p(F_t) = \int_t^{t+x} \lambda(s \mid F_s)\, ds$$
for some interval length $x$. On the other hand, the logistic transformation
$$\log \frac{p_t}{1 - p_t} = f(F_t)$$
of the probability provides another type of model frequently used for such forecasting problems.
For instance, consider the case when a sequence of earthquake activity starts somewhere. Besides being foreshocks, it can be a swarm or simply an aftershock sequence. Therefore, it is very desirable to know whether or not the activity is a precursor to a forthcoming, significantly larger earthquake. Using a logistic model, Ogata et al. (1994, 1995) discussed how to estimate the conditional probability that a cluster of earthquakes will be foreshocks of a forthcoming larger earthquake, depending on the history $F_t$ of the occurrence pattern in space, time and magnitude of some of the earliest events in the cluster.
0169-2070/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved SSDI 0169-2070(95)00622-2
In any case, when likelihood-based estimation methods are used to construct the forecasting models, the "Akaike Information Criterion" (Akaike, 1974),
$$\mathrm{AIC} = (-2) \max \log \text{likelihood} + 2\,(\text{number of adjusted parameters}),$$
is useful for such purposes, as Professor Vere-Jones described in his paper, where the maximum is attained with respect to the parameters of the model. A model with a smaller AIC is considered to be a better fit to the data. The AIC is an estimate of the expected entropy (Akaike, 1977), which measures how close the (maximum likelihood based) predictive model is to the true underlying probability law of the current data.
At the same time, there are many models that are not likelihood-based, but are based on complex trial-and-error experiments, such as the M8 pattern recognition algorithm (see Section 6.4 of Professor Vere-Jones' paper), or on a compromise procedure incorporating the experience of experts, as in obtaining the rainfall probability in daily weather forecasts. The performance of probability forecasts of rainfall events is usually measured by meteorologists using Brier's P-score (Brier, 1950),
$$P = \frac{1}{n} \sum_n (p_n - \nu_n)^2,$$
where $p_n$ is the forecast probability of rainfall and $\nu_n$ takes the value 1 or 0 according to whether the rainfall event actually occurred or not. However, this score is not sensitive from the statistical viewpoint.
As an alternative score, in Ogata et al. (1995) the performance of probability forecasts for foreshock identification is measured by using the AIC in two ways, as follows. First, the performance of the forecasts $\{p_n = p(F_{t_n});\ n = 1, 2, \ldots\}$, conditional on the seismic patterns of clustering earthquakes, is compared with that of the unconditional probability forecast $p_0$, which is the ordinary ratio of foreshock cases to all cases in the database of past events. Here the AIC can also be applied to the evaluation, as an estimate of the negative entropy (Boltzmann, 1878; Akaike, 1977). The corresponding AICs are
$$\mathrm{AIC}_0 = (-2) \sum_n \{\nu_n \log p_0 + (1 - \nu_n) \log(1 - p_0)\}$$
and
$$\mathrm{AIC}_1 = (-2) \sum_n \{\nu_n \log p_n + (1 - \nu_n) \log(1 - p_n)\},$$
where $\nu_n$ takes the value 1 or 0 according to whether the corresponding event actually occurred or not, respectively. Note that, in the present application of the AIC, we have not estimated any parameter (i.e. the parameter values in the log likelihood are fixed by a prescribed predictive model), so that the number of adjusted parameters in the AIC is zero in both cases above. The difference $\Delta\mathrm{AIC} = \mathrm{AIC}_1 - \mathrm{AIC}_0$ measures the quality of the forecasting performance, and consists of the components
$$Q_n = \nu_n \log \frac{p_n}{p_0} + (1 - \nu_n) \log \frac{1 - p_n}{1 - p_0}$$
such that $\Delta\mathrm{AIC} = (-2) \sum_n Q_n$. Here, $Q_n$ indicates the size of the gain or loss of the forecast $p_n$ against the unconditional probability forecast $p_0$, according to whether its value is positive or negative, respectively. Thus $Q_n$ at each stage $n$ is useful for real-time diagnosis of a probability forecast, while $\Delta\mathrm{AIC}$ evaluates the overall performance of a series of forecasts. If any other forecasting procedure is proposed, then the corresponding AIC is calculated and compared with $\mathrm{AIC}_1$.
Another evaluation of the performance of the forecasts is carried out by examining cross-classified tables. Divide a set of forecasting experiments into several classes at suitable forecast probabilities, and further subdivide each class into two according to whether the event actually took place or not. The cross-classified categories are then summarized by a contingency table $\{N_{i,j}\}$ with $i = 1, 2$ and $j = 1, \ldots, m$. For each contingency table, we examine whether or not the actual frequency of the outcomes $\{\nu_n\}$ depends on the forecast probabilities. Following Sakamoto and Akaike (1978), the strength of the dependence is evaluated by calculating the AIC of multinomial distribution models, such that
$$\mathrm{AIC}_0 = (-2) \sum_{i=1}^{2} \sum_{j=1}^{m} N_{i,j} \log\left\{\left(\frac{N_{i\cdot}}{N_{\cdot\cdot}}\right)\left(\frac{N_{\cdot j}}{N_{\cdot\cdot}}\right)\right\} + 2(m - 1)$$
for the model suggesting that actual outcomes are independent of the forecasts, and
$$\mathrm{AIC}_1 = (-2) \sum_{i=1}^{2} \sum_{j=1}^{m} N_{i,j} \log \frac{N_{i,j}}{N_{\cdot\cdot}} + 2(2m - 1)$$
for the model suggesting dependency of the outcomes on the forecasts, where $N_{i\cdot} = \sum_j N_{i,j}$, $N_{\cdot j} = \sum_i N_{i,j}$ and $N_{\cdot\cdot} = \sum_i \sum_j N_{i,j}$. Thus, the comparison of the AICs in the present case leads to a conclusion as to whether or not the forecasting is statistically valid to some extent.
In Ogata et al. (1995), the earthquake data of the Japan Meteorological Agency are divided into two sets, covering the periods 1926-75 and 1976-93, respectively. The data of the first period are used for the estimation of the predictive model and those of the latter are used to evaluate the performance of the corresponding model-based forecasts. In both evaluations the conditional probability forecasts were shown to be better than the unconditional forecasts.
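The first of the two evaluations above, together with Brier's P-score and the cross-classification step of the second, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in Ogata et al. (1995): the function names, the forecast values, the outcomes, the unconditional probability 0.15 and the class boundaries are all made up for the example, and placing the occurrences in the first row of the table is an arbitrary choice. All forecast probabilities are assumed to lie strictly between 0 and 1, so that the logarithms are finite.

```python
import math


def brier_score(p, v):
    """Brier's P-score: mean squared difference between forecast and outcome."""
    return sum((pn - vn) ** 2 for pn, vn in zip(p, v)) / len(p)


def forecast_aic(p, v):
    """AIC of a fixed forecast series: (-2) times the Bernoulli log likelihood.

    No parameter is adjusted to the evaluation data, so the penalty term is zero.
    Assumes 0 < p_n < 1 for every forecast.
    """
    return -2.0 * sum(
        vn * math.log(pn) + (1 - vn) * math.log(1.0 - pn)
        for pn, vn in zip(p, v)
    )


def gains(p, v, p0):
    """Components Q_n: gain (> 0) or loss (< 0) of forecast p_n against p0."""
    return [
        vn * math.log(pn / p0) + (1 - vn) * math.log((1.0 - pn) / (1.0 - p0))
        for pn, vn in zip(p, v)
    ]


def contingency_table(p, v, edges):
    """Cross-classify the experiments into a 2 x m table of counts.

    Columns are forecast-probability classes split at `edges` (ascending).
    Row 0 holds events that occurred, row 1 those that did not (an assumption
    of this sketch; the text does not fix the row order).
    """
    m = len(edges) + 1
    table = [[0] * m for _ in range(2)]
    for pn, vn in zip(p, v):
        j = sum(pn >= e for e in edges)  # index of the forecast class
        i = 0 if vn == 1 else 1
        table[i][j] += 1
    return table


# Made-up numbers, for illustration only (not data from the paper).
p = [0.10, 0.40, 0.05, 0.30, 0.02]  # conditional forecasts p_n
v = [0, 1, 0, 1, 0]                 # outcomes: 1 = event occurred, 0 = it did not
p0 = 0.15                           # hypothetical unconditional forecast

aic0 = forecast_aic([p0] * len(v), v)
aic1 = forecast_aic(p, v)
q = gains(p, v, p0)
delta_aic = aic1 - aic0             # equals (-2) * sum(q)
table = contingency_table(p, v, edges=[0.08, 0.25])

print("Brier P-score:", brier_score(p, v))
print("Q_n:", q)
print("Delta AIC:", delta_aic)
print("Contingency table:", table)
```

On these illustrative numbers every Q_n is positive and the AIC difference is negative, which is the pattern that favours the conditional forecasts over the unconditional one.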
References
Akaike, H., 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19, 716-723.
Akaike, H., 1977, On entropy maximization principle, in: P.R. Krishnaiah (ed.), Applications of Statistics (North-Holland, Amsterdam), 27-41.
Boltzmann, L., 1878, Weitere Bemerkungen über einige Probleme der mechanischen Wärmetheorie, Wiener Berichte, 78, 7-46.
Brier, G.W., 1950, Verification of forecasts expressed in terms of probability, Monthly Weather Review, 78, 1-3.
Ogata, Y., Utsu, T. and Katsura, K., 1994, Statistical features of foreshocks in comparison with other earthquake clusters, Geophysical Journal International, 121, 233-244.
Ogata, Y., Utsu, T. and Katsura, K., 1995, Statistical discrimination of foreshocks from other earthquake clusters, Geophysical Journal International, submitted.
Sakamoto, Y. and Akaike, H., 1978, Analysis of cross-classified data by AIC, Annals of the Institute of Statistical Mathematics, 30, 185-197.