Accid. Anal. & Prev. Vol. 12, pp. 81-93. Pergamon Press Ltd., 1980. Printed in Great Britain
THE DEFINITION OF RESTRAINT EFFECTIVENESS
T. P. HUTCHINSON†
Transport Studies Group, University College London, London, England
(Received 9 October 1978; in revised form 23 July 1979)
Abstract--When calculating the effectiveness of an injury-reducing device from data on injuries with and without the device in operation, one meets the difficulty that injury is merely graded into a few categories from none to fatal, not measured on a continuous scale. This paper discusses the derivation of formulae that overcome this, and of statistical tests for whether the device has any effect. It is shown that there are two important issues to be considered when deciding on a formula: (i) The distinction between effectiveness relative to (a) the total variability in injury severity in all accidents, or (b) the variability within the particular circumstances of an individual accident. (ii) The shape of the probability distributions of injury severity in the two circumstances, with and without the device. As to the first issue, effectiveness of type (b) requires each case with the device to be matched with a case without the device, whereas type (a) does not; type (b) leads to a more powerful statistical test, whereas type (a) is more suitable for descriptive purposes. As to the second issue, it is shown that the logistic distribution has suitable theoretical properties and leads to formulae and statistical tests that are conveniently simple. However, it is demonstrated that formulae that are linear in the probabilities are unlikely to be suitable.
1. INTRODUCTION
The usual means of expressing the effectiveness of an injury-reducing device, such as a seat belt restraint, is as the percentage reduction in the proportion of injuries that are greater than some threshold. That is, the following formula is used:
$$E_1 = 100\,\frac{x - y}{x} \qquad (1)$$
where E1 is the effectiveness, x is the proportion of unrestrained persons who sustain injury greater than some threshold (often, in AIS grades 2 or above), and y is the corresponding proportion for restrained persons. Among those who have used this definition are Rininger and Boak (1976), Reinfurt et al. (1976), and Scott et al. (1976). This formula has the following disadvantages: (i) There is no indication how to modify the analysis to utilise the extra information present when severity is graded into more than two categories. (ii) Effectiveness as defined by (1) is found empirically to vary with the AIS level used as the threshold. (iii) The shape of the probability distributions of injury severity which it implies is rather implausible. (The meaning of points (ii) and (iii) will become clear shortly.) Section 2 is concerned with how formula (1) and certain alternative formulae are connected with assumed distributions of injury severity, how data in which severity has several categories may be dealt with, and how statistical tests for the effectiveness of the injury-reducing device may be carried out.
Dissatisfaction with formula (1) is thus the first motivation for this paper. The second is the arguments of Schneider (1975) for case-by-case matching of accidents (according to variables such as velocity change) in order to compare injuries in two situations. As will be seen, this leads to quite a different definition of restraint effectiveness from that applicable to the usual form of data, in which one simply has the numbers of cases injured to various degrees in the two situations, without the cases being paired off. A method of statistical analysis suitable for such matched-pair data is the concern of Section 3. A section of discussion and conclusions ends the paper.
†Present address: Motor Industry Research Association, Nuneaton, England.
2. UNMATCHED DATA
In order to fix ideas, Table 1 shows the general form of the data which will be the concern of this section. It is assumed that a random sample of crashes has been investigated, and that the drivers have been classified according to whether or not they were restrained and the degree of severity of their injury. The following two expressions for measures of restraint effectiveness are directly analogous to eqn (1):
$$E_2 = \frac{\log y}{\log x} \qquad (2)$$
$$E_3 = \frac{(1/y) - 1}{(1/x) - 1} \qquad (3)$$
Section 2.1 will describe the theories underlying E1, E2, and E3, and in Section 2.2 the corresponding statistical methods for data analysis are given.
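As a small numerical illustration of eqns (1)-(3), the following sketch evaluates the three measures for one pair of proportions. The function name and the example values x = 0.20 and y = 0.10 are assumptions made here for illustration only; they are not data from any study.

```python
import math


def effectiveness_measures(x, y):
    """Return (E1, E2, E3) as defined by eqns (1)-(3).

    x: proportion of unrestrained persons injured above the threshold
    y: corresponding proportion for restrained persons
    Both must lie strictly between 0 and 1 so that the logs and
    reciprocals are defined.
    """
    if not (0.0 < x < 1.0 and 0.0 < y < 1.0):
        raise ValueError("x and y must lie strictly between 0 and 1")
    e1 = 100.0 * (x - y) / x                # eqn (1): percentage reduction
    e2 = math.log(y) / math.log(x)          # eqn (2): ratio of logs
    e3 = (1.0 / y - 1.0) / (1.0 / x - 1.0)  # eqn (3): odds ratio
    return e1, e2, e3


# Hypothetical proportions, chosen only to show the arithmetic.
e1, e2, e3 = effectiveness_measures(0.20, 0.10)
print(e1, e2, e3)
```

With no effect of the device (x = y) the function returns E1 = 0, E2 = 1 and E3 = 1, which are the null values referred to later in Section 2.2.5.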
2.1 The relative injury frequency (RIF) curve
Although injury is classified (in Britain) into only four categories (fatal, serious, slight, none), it will be readily appreciated that there is really a continuum of severity (s), which is divided by three thresholds into four regions. Further, in any set of circumstances there will be a probability distribution of injury severity. In a second set of circumstances there will be a different distribution of severity. Because we have no quantitative definition of severity (though we must assume that all injuries can be ordered as to their severity), we can allow the first of these distributions to have any convenient shape we like--Gaussian, logistic, or exponential, for instance--without loss of generality, since the severity axis can always be stretched or compressed until the distribution becomes the required shape. When this has been done, however, the second distribution does not necessarily have any particularly convenient form. But it is likely that it will have approximately the same shape as the first distribution, but with a different location or scale parameter.
The function that contains the information about how the injury severity distributions compare in the two sets of circumstances is y = y(x), where x(s) is the probability of the severity exceeding s in condition 1, and y(s) is the probability of severity exceeding s in condition 2, and y(x) is obtained by elimination of s. This is called a relative injury frequency (RIF) curve by Hutchinson (1976). The probabilities x and y are both, of course, bounded by 0 and 1, and the function y(x) must be monotonically increasing (because probability density cannot be negative) and should not touch the line y = x except at (0, 0) and (1, 1). If it did cross the y = x line, it would mean that changing from condition 1 to condition 2 either increased or decreased the proportion of injuries in both tails of the distribution. While this might be the case for some choices of conditions 1 and 2, this would be exceptional, and certainly would not be expected when conditions 1 and 2 correspond to whether or not a restraint system was used. What we are saying is that if we use the proportion of injuries that are more severe than some threshold s as our measure of severity, then this proportion is less if a restraint is in use than if one is not, at whatever point we take s. Among the functions satisfying these conditions are
$$y = \frac{x}{a} \qquad (4)$$
Table 1. Notation for comparing the samples of restrained and unrestrained drivers when the accidents are not matched

               Fatal   Serious   Slight   None
Unrestrained   N11     N12       N13      N14
Restrained     N21     N22       N23      N24
$$y = x^{a} \qquad (5)$$
$$y = \frac{x}{a + (1 - a)x} \qquad (6)$$
In each case, the parameter a describes the effectiveness.
Equation (4) gives rise to E1 as an appropriate measure of effectiveness. Equation (4) arises if the injury severity distributions in both circumstances are triangular (Fig. 1). Proof: $x \propto t^2$, $y \propto t^2$, therefore $x \propto y$.
Equation (5) gives rise to E2 as an appropriate measure of effectiveness. To prove that eqn (5) arises if the injury severity distributions in both circumstances are exponential (Fig. 2), we write x(s) = exp(-bs) and y(s) = exp(-cs). Then $y = x^{c/b}$, and -dx/ds and -dy/ds (which are the probability density functions) are b exp(-bs) and c exp(-cs).
Equation (6) gives rise to E3 as an appropriate measure of effectiveness. Equation (6) arises if the injury severity distributions in both circumstances are logistic (Fig. 3). Proof: if
$$x(s) = [1 + \exp(s + b)]^{-1}$$
and
$$y(s) = [1 + \exp(s + c)]^{-1}$$
then
$$y = \left[1 + \frac{1 - x}{x}\exp(c - b)\right]^{-1} = \frac{x}{a + (1 - a)x}$$
with a = exp(c - b).
It is important to appreciate that these shapes of the injury severity distributions are sufficient but not necessary conditions for their respective RIF functions--by stretching or compressing the s axis we can make either of the probability density functions into any shape we like.
What happens if we make the wrong choice from E1, E2, E3? If hypothesis (5) is true, but instead of computing E2 we compute E1, we will find
$$E_1 = 100\,\frac{x - x^{a}}{x} = 100\,(1 - x^{a-1}) \qquad (7)$$
If the restraint system is helpful rather than harmful, a > 1. Therefore, eqn (7) shows a negative relation between x and E1. Since a low x corresponds to the use of a high threshold for comparison, and vice versa, there will be a positive relation between the position of the threshold and E1: thus expression (7) says that the higher the threshold, the higher will be the effectiveness as measured by E1. B. J. Campbell of the Highway Safety Research Center, University of North Carolina, has apparently concluded that this is empirically true. Referring to oral comments made by him at an SAE meeting, Rininger and Boak (1976) say:
"In a presentation given to the Society of Automotive Engineers in October 1975, Dr. B. J. Campbell of HSRC offered a hypothesis to explain the seemingly contradictory findings of various restraint studies. Campbell's review of the literature led him to this conclusion: the higher the injury level used for comparison, the higher the calculated effectiveness appears."

Fig. 1. Sketch of an example of a pair of probability densities of injury severity which gives rise to eqn (4).

Fig. 2. Sketch of an example of a pair of probability densities of injury severity which gives rise to eqn (5). For key, see Fig. 1.

Four methods of inferring the shape of the RIF curve may be suggested:
(i) Direct, by plotting the RIF curve. This has the disadvantage that there is frequently not much data for the most severe categories of injury.
(ii) Calculate E1, E2, and E3 using several thresholds and determine which is the most constant.
(iii) If the theory is assumed to apply to the comparison of a number of different circumstances, rather than just two, plot the proportion of cases in the middle of three categories against the proportion in the most extreme category. For instance plot the proportion of non-fatal injuries against the proportion of fatalities, if the three groups are uninjured, non-fatal injury, and fatal injury. This method is not relevant to the case of restraint effectiveness, since we usually have two or three conditions (unrestrained, lap/shoulder belt, lap belt) only.
(iv) If a more precise test is required, a statistical analysis may be performed to obtain predictions of the numbers falling into each injury category for both restrained and unrestrained occupants, under each of the three hypotheses (4)-(6); these predictions can be compared with observations and the resulting chi-squared values used to determine which hypothesis most closely fits the data. It will be shown in Section 2.2 how the GENCAT program (Landis et al. 1976) can be used for this purpose.
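Method (ii) can be sketched in a few lines. The code below takes counts laid out as in Table 1, forms the exceedance proportions at the three thresholds, evaluates E1, E2 and E3 at each, and summarises the constancy of each measure by its coefficient of variation. The counts are hypothetical, and the coefficient of variation is only one convenient summary chosen for this sketch; the paper does not prescribe a particular criterion of constancy.

```python
import math
from statistics import mean, pstdev


def exceedance(counts):
    """Cumulative proportions at each threshold.

    counts are ordered from most to least severe (fatal, serious,
    slight, none); the result has one entry per threshold.
    """
    total = sum(counts)
    running, out = 0, []
    for c in counts[:-1]:
        running += c
        out.append(running / total)
    return out


def measures(x, y):
    """E1, E2, E3 of eqns (1)-(3) for one threshold."""
    return {
        "E1": 100.0 * (x - y) / x,
        "E2": math.log(y) / math.log(x),
        "E3": (1.0 / y - 1.0) / (1.0 / x - 1.0),
    }


def constancy(unrestrained, restrained):
    """Per-threshold measures and their coefficients of variation."""
    xs, ys = exceedance(unrestrained), exceedance(restrained)
    per_threshold = [measures(x, y) for x, y in zip(xs, ys)]
    cv = {}
    for name in ("E1", "E2", "E3"):
        values = [m[name] for m in per_threshold]
        cv[name] = pstdev(values) / abs(mean(values))
    return per_threshold, cv


# Hypothetical counts (fatal, serious, slight, none), not real data.
per_threshold, cv = constancy([20, 80, 300, 600], [10, 43, 197, 750])
most_constant = min(cv, key=cv.get)
print(per_threshold, cv, most_constant)
```

For these invented counts E3 varies least across thresholds, which simply reflects how the numbers were chosen; with real data the comparison is an empirical matter, and method (iv) gives a formal test.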
Fig. 3. Sketch of an example of a pair of probability densities of injury severity which gives rise to eqn (6). For key, see Fig. 1.
2.2 Statistical analysis
It has already been said that the usual procedure in analysing whether an injury-reducing device has any effect is to calculate the effectiveness at one threshold only. Now, the following is put forward as a fundamental property that we want to be empirically true of the measure of effectiveness that we choose: that it should be the same whatever threshold is used. This property will be used to write down equations for E1, E2, and E3 in turn. There is a computer program called GENCAT for the analysis of categorical data which is applicable to all three, and the problems will be regarded as solved once the equations have been formulated in a way suitable for GENCAT. GENCAT is described in Landis et al. (1976); the application of an earlier version of this program (called CATLIN) to traffic and accident data is discussed by Hutchinson (1974). It is suggested for use here because of the flexibility it offers in the specification of the hypotheses to be tested.
We will need the following notation, similar to Table 1: let $\pi_{i1}$ be the proportion of fatal injuries among people in the ith restraint class (1 = unrestrained, 2 = restrained), and $\pi_{i2}$, $\pi_{i3}$, and $\pi_{i4}$ be the proportions seriously, slightly, and un-injured.
2.2.1 E1. If we assume the hypothesis described by eqn (4), we will wish to test:
$$\frac{\pi_{21}}{\pi_{11}} = \frac{\pi_{21} + \pi_{22}}{\pi_{11} + \pi_{12}} = \frac{\pi_{21} + \pi_{22} + \pi_{23}}{\pi_{11} + \pi_{12} + \pi_{13}} \qquad (8)$$
Defining the vector $\pi$ as $(\pi_{11}, \pi_{12}, \pi_{13}, \pi_{14}, \pi_{21}, \pi_{22}, \pi_{23}, \pi_{24})$, eqn (8) may be rewritten in matrix notation as
$$\begin{pmatrix} 1&0&0&-1&0&0\\ 0&1&0&0&-1&0\\ 0&0&1&0&0&-1 \end{pmatrix} \ln\left\{ \begin{pmatrix} 1&0&0&0&0&0&0&0\\ 1&1&0&0&0&0&0&0\\ 1&1&1&0&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&1&1&0 \end{pmatrix} \pi^{T} \right\} = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} \beta \qquad (9)$$
where $\beta$ is an unknown parameter. (The superscript T on $\pi$ denotes transpose, that is, $\pi$ is arranged vertically.) With the rows ordered as above, each row of eqn (9) states that the log of the unrestrained proportion minus the log of the restrained proportion equals $\beta$, so $\beta$ is related to E1 as follows:
$$E_1 = 100\,[1 - \exp(-\beta)]$$
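Equation (9) is of a form that can be handled by the GENCAT program.
2.2.2 E2. If hypothesis (5) is assumed instead, so that E2 is independent of threshold, then
$$\frac{\log \pi_{21}}{\log \pi_{11}} = \frac{\log(\pi_{21} + \pi_{22})}{\log(\pi_{11} + \pi_{12})} = \frac{\log(\pi_{21} + \pi_{22} + \pi_{23})}{\log(\pi_{11} + \pi_{12} + \pi_{13})} = E_2$$
which becomes
$$\begin{pmatrix} 1&0&0&-1&0&0\\ 0&1&0&0&-1&0\\ 0&0&1&0&0&-1 \end{pmatrix} \ln\left[ -\ln\left\{ \begin{pmatrix} 1&0&0&0&0&0&0&0\\ 1&1&0&0&0&0&0&0\\ 1&1&1&0&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&1&1&0 \end{pmatrix} \pi^{T} \right\} \right] = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} \beta \qquad (10)$$
where $\beta = -\ln(E_2)$.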
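2.2.3 E3. Hypothesis (6) implies that
$$\frac{(1 - \pi_{21})\,\pi_{11}}{\pi_{21}\,(1 - \pi_{11})} = \frac{(1 - \pi_{21} - \pi_{22})(\pi_{11} + \pi_{12})}{(\pi_{21} + \pi_{22})(1 - \pi_{11} - \pi_{12})} = \frac{(1 - \pi_{21} - \pi_{22} - \pi_{23})(\pi_{11} + \pi_{12} + \pi_{13})}{(\pi_{21} + \pi_{22} + \pi_{23})(1 - \pi_{11} - \pi_{12} - \pi_{13})} = E_3$$
which becomes
$$\begin{pmatrix} 1&0&0&-1&0&0&-1&0&0&1&0&0\\ 0&1&0&0&-1&0&0&-1&0&0&1&0\\ 0&0&1&0&0&-1&0&0&-1&0&0&1 \end{pmatrix} \ln\left\{ \begin{pmatrix} 1&0&0&0&0&0&0&0\\ 1&1&0&0&0&0&0&0\\ 1&1&1&0&0&0&0&0\\ 0&1&1&1&0&0&0&0\\ 0&0&1&1&0&0&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&1&1&0\\ 0&0&0&0&0&1&1&1\\ 0&0&0&0&0&0&1&1\\ 0&0&0&0&0&0&0&1 \end{pmatrix} \pi^{T} \right\} = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} \beta \qquad (11)$$
where $\beta = \ln(E_3)$.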
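To make the structure of eqn (11) concrete, the sketch below builds one possible pair of 0/±1 matrices for the logistic hypothesis and evaluates the left-hand side on sample proportions. The row ordering of A (three exceedance sums followed by their three complements, for each restraint class) is an assumption of this sketch, and the counts are hypothetical. The code only evaluates the three per-threshold functions; it does not reproduce GENCAT's fitting or its chi-squared tests.

```python
import math


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def cumulative_rows(offset):
    """Six rows of A for one restraint class.

    First the sums over the 1, 2 and 3 most severe categories, then
    the complementary sums over the remaining categories.
    """
    rows = []
    for k in (1, 2, 3):
        row = [0] * 8
        for j in range(k):
            row[offset + j] = 1
        rows.append(row)
    for k in (1, 2, 3):
        row = [0] * 8
        for j in range(k, 4):
            row[offset + j] = 1
        rows.append(row)
    return rows


# A is 12 x 8: unrestrained rows first, then restrained rows.
A = cumulative_rows(0) + cumulative_rows(4)

# K is 3 x 12: row k forms ln x_k - ln(1 - x_k) - ln y_k + ln(1 - y_k).
K = []
for k in range(3):
    row = [0] * 12
    row[k], row[3 + k], row[6 + k], row[9 + k] = 1, -1, -1, 1
    K.append(row)


def logit_contrasts(unrestrained, restrained):
    """Sample value of K ln(A p) for counts laid out as in Table 1."""
    p = [c / sum(unrestrained) for c in unrestrained]
    p += [c / sum(restrained) for c in restrained]
    logs = [math.log(v) for v in matvec(A, p)]
    return matvec(K, logs)


# Hypothetical counts (fatal, serious, slight, none), not real data.
contrasts = logit_contrasts([20, 80, 300, 600], [10, 43, 197, 750])
print(contrasts)  # three per-threshold estimates of beta = ln(E3)
```

Under hypothesis (6) the three printed values estimate the same quantity, so large differences between them are informal evidence against the logistic form.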
2.2.4 Remarks. Equations (10) and (11) are also ones which may be fitted to data using GENCAT. Also, to clarify notation, it should be pointed out that eqns (9)-(11) are hypotheses about the true population values of the probabilities, so the $\pi$'s are the population analogues of the sample values (refer to Table 1) N11/(N11 + N12 + N13 + N14), N21/(N21 + N22 + N23 + N24), etc.
2.2.5 Output from GENCAT. This includes estimates of the parameter $\beta$ and its standard error, and a chi-squared statistic for testing whether the hypothesis fits the data or not. Also, the question of whether there is any statistically significant effect of the injury-reducing device may be tested. No effect will correspond to E1 = 0 or E2 = 1 or E3 = 1; in any case it will be $\beta = 0$ and, on request, GENCAT will provide a chi-squared statistic for testing whether $\beta = 0$.

3. MATCHED DATA
The general form of the data is as in Table 2. This may have arisen from the case-by-case matching technique as described in Kihlberg (1963) and Schneider (1975). From a databank of accidents investigated in detail, pairs of cases that are as similar as possible (in terms of important variables such as velocity change, vehicle damage, impact configuration, casualty age) except for the variable of interest (restraint usage) are extracted, and the injuries can then be presented in the form of Table 2. Data in this form could also be obtained from a mass accident datafile, by extracting cases where (i) there were two vehicles involved, (ii) the impact was head-on, (iii) the vehicles were of equal mass, and (iv) exactly one driver was using a restraint.
Section 3.1 will consider data as described above, that is, in which the two casualties are equivalent apart from restraint usage. It will emphasise how the analysis differs in principle from that for unmatched data and describe an appropriate statistical method. Sections 3.2 and 3.3 will outline two separate extensions: to non-equivalent casualties (for instance, one is a driver and the other a passenger), where the effect of their being non-equivalent has to be determined along with that of restraint usage; and to a many-to-many matching, as opposed to a one-to-one (it may happen that, instead of finding that one case where a restraint system was used matches with one case where it was not used, in the kth set of circumstances there were kM1 cases in which the person was unrestrained and kM2 in which he was wearing a seat belt).

Table 2. Notation for comparing the injuries to the two drivers who receive their injury under similar conditions except that one of them was wearing a seat belt

Injury to              Injury to restrained driver
unrestrained driver    Fatal   Serious   Slight   None
Fatal                  N11     N12       N13      N14
Serious                N21     N22       N23      N24
Slight                 N31     N32       N33      N34
None                   N41     N42       N43      N44

3.1 Equivalent casualties
If two identical vehicles driven by unrestrained persons of equal susceptibility to injury crash head-on, the drivers' injuries will not always be the same. There will be some variability. This will be even greater if the two drivers have different susceptibilities to injury. The same is true if both drivers are wearing seat belts. The average level of severity will vary between accidents, and will be higher for the unrestrained than for the restrained drivers. This is illustrated in Fig. 4. The probability density of injury severity to six drivers is shown there, unrestrained and restrained at each of three velocity changes. If both drivers in accident A were unrestrained, their injury severities would be drawn from distribution Au. If both wore seat belts, their severities would be drawn from distribution Ar. If one wore a seat belt and the other did not, the injuries would be sampled from Ar and Au respectively. The same goes for accidents B and C. We will refer to any one of the distributions shown in Fig. 4 as describing the within-circumstances variability of injury severity. (The variability between people in their impact tolerance is incorporated in these distributions.)
Now we assume that we can define severity in such a way that all the distributions are logistic with the same standard deviation $\sigma$, and that seat belt usage has a constant effect in the sense that the differences between the means of Ar and Au, Br and Bu, Cr and Cu, and so on, are equal, and are given by $\Delta\sigma$. Let us choose some threshold, above which the injury is called "severe", and below which it is called "slight". Let P be the number of accidents in which the restrained driver was severely injured and the unrestrained driver was slightly injured, and Q be the number of accidents in which the restrained driver was slightly injured and the unrestrained driver was severely injured. Now, the logistic distribution has the following property (Cox, 1970, Section 5.2):
$$\Delta = \frac{\sqrt{3}}{\pi}\,\ln\left(\frac{Q}{P}\right)$$
(The factor $\sqrt{3}/\pi$ appears because the variance of the standard logistic distribution is not 1, but $\pi^2/3$.) Note that the numbers of cases in which both drivers are severely injured, or both slightly injured, do not enter into this equation.

Fig. 4. Notional probability distributions of injury severity to restrained and unrestrained drivers in accidents of three levels of violence. (Crashes at three different speeds, A, B and C; curves Ar, Au, Br, Bu, Cr, Cu; horizontal axis: severity.)
In the context of the progression of lung disease (graded radiologically at the start and end of a period of time) in coal miners, McCullagh (1977) has proposed that when analysing a k × k table obtained by using k - 1 thresholds, each threshold may be used individually to obtain an estimate of $\Delta$ and an overall estimate may be found by averaging these in some way. Thus three estimates of $\Delta$ may be obtained from Tables 3-5, derived from Table 2. McCullagh suggests two methods of combining the three estimates of $\Delta$, but we will find it convenient to use a third method--GENCAT. Writing $\pi_{ij} = N_{ij}/N_{..}$ where the $N_{ij}$ are as in Table 2, and $N_{..}$ is the total sum of entries in this Table, and $\pi = (\pi_{11}, \pi_{12}, \ldots, \pi_{44})$, we have
$$\ln\left(\frac{\pi_{12} + \pi_{13} + \pi_{14}}{\pi_{21} + \pi_{31} + \pi_{41}}\right) = \ln\left(\frac{\pi_{13} + \pi_{14} + \pi_{23} + \pi_{24}}{\pi_{31} + \pi_{32} + \pi_{41} + \pi_{42}}\right) = \ln\left(\frac{\pi_{14} + \pi_{24} + \pi_{34}}{\pi_{41} + \pi_{42} + \pi_{43}}\right) = \beta$$
that is,
$$\begin{pmatrix} 1&0&0&-1&0&0\\ 0&1&0&0&-1&0\\ 0&0&1&0&0&-1 \end{pmatrix} \ln\left\{ \begin{pmatrix} 0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\ 0&0&1&1&0&0&1&1&0&0&0&0&0&0&0&0\\ 0&0&0&1&0&0&0&1&0&0&0&1&0&0&0&0\\ 0&0&0&0&1&0&0&0&1&0&0&0&1&0&0&0\\ 0&0&0&0&0&0&0&0&1&1&0&0&1&1&0&0\\ 0&0&0&0&0&0&0&0&0&0&0&0&1&1&1&0 \end{pmatrix} \pi^{T} \right\} = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} \beta$$
where, by the property of the logistic distribution quoted in Section 3.1, $\Delta = (\sqrt{3}/\pi)\,\beta$.
Since the numbers on the leading diagonal of Table 2 do not enter into the equation, $\Delta$ is unaffected by N44, so it does not matter if this number is unknown (as is often the case).
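The per-threshold estimates can be illustrated directly. The sketch below collapses a 4 × 4 table laid out as in Table 2 (rows: unrestrained driver, columns: restrained driver) at each of the three thresholds, extracts the discordant counts P and Q, and converts ln(Q/P) to an estimate of the standardised difference using the factor √3/π quoted in Section 3.1. The table entries are hypothetical, and no attempt is made to reproduce the weighted combination of the three estimates that McCullagh's methods or GENCAT would provide.

```python
import math


def discordant_counts(table, k):
    """Return (P, Q) when the threshold is placed after category k.

    table[i][j]: number of pairs with the unrestrained driver in
    category i and the restrained driver in category j (0 = fatal,
    ..., 3 = none). Categories 0..k-1 count as "severe".
    P: restrained severe, unrestrained slight.
    Q: restrained slight, unrestrained severe.
    """
    q = sum(table[i][j] for i in range(k) for j in range(k, 4))
    p = sum(table[i][j] for i in range(k, 4) for j in range(k))
    return p, q


def delta_estimates(table):
    """One estimate of the standardised difference per threshold."""
    estimates = []
    for k in (1, 2, 3):
        p, q = discordant_counts(table, k)
        estimates.append(math.sqrt(3.0) / math.pi * math.log(q / p))
    return estimates


# Hypothetical matched-pair counts; the bottom-right cell (both
# uninjured) is set to 0 to stand for "unknown".
table = [
    [2, 3, 4, 1],
    [1, 10, 12, 8],
    [1, 5, 40, 30],
    [0, 2, 15, 0],
]
print([discordant_counts(table, k) for k in (1, 2, 3)])
print(delta_estimates(table))
```

Changing the bottom-right entry of the table leaves every estimate unchanged, which is the point made in the text about N44.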
Table 3. Binary table formed from collapsing Table 2 using the fatal/serious division as the threshold

Unrestrained          Restrained driver
driver                Fatal            Ser, Sli, or None
Fatal                 N11              N12+N13+N14
Ser, Sli, or None     N21+N31+N41      N22+N23+N24+N32+N33+N34+N42+N43+N44

Table 4. Binary table formed from collapsing Table 2 using the serious/slight division as the threshold

Unrestrained          Restrained driver
driver                Fat or Ser          Sli or None
Fat or Ser            N11+N12+N21+N22     N13+N14+N23+N24
Sli or None           N31+N32+N41+N42     N33+N34+N43+N44

Table 5. Binary table formed from collapsing Table 2 using the slight/none division as the threshold

Unrestrained          Restrained driver
driver                Fat, Ser, or Sli                                None
Fat, Ser, or Sli      N11+N12+N13+N21+N22+N23+N31+N32+N33             N14+N24+N34
None                  N41+N42+N43                                     N44
It is important to appreciate that $\Delta$ is quite different from E3, despite both being based on an underlying logistic model. E3 may be estimated for Table 2 by means of the following. Defining $\theta_i$'s to be row totals of the $\pi$'s, that is $\theta_i = \sum_{j=1}^{4} \pi_{ij}$, and $\phi_j$'s to be column totals, $\phi_j = \sum_{i=1}^{4} \pi_{ij}$,
$$\frac{(1 - \phi_1)\,\theta_1}{\phi_1\,(1 - \theta_1)} = \frac{(1 - \phi_1 - \phi_2)(\theta_1 + \theta_2)}{(\phi_1 + \phi_2)(1 - \theta_1 - \theta_2)} = \frac{(1 - \phi_1 - \phi_2 - \phi_3)(\theta_1 + \theta_2 + \theta_3)}{(\phi_1 + \phi_2 + \phi_3)(1 - \theta_1 - \theta_2 - \theta_3)} = E_3.$$
In matrix notation, this becomes:
$$\begin{pmatrix} 1&0&0&-1&0&0&-1&0&0&1&0&0\\ 0&1&0&0&-1&0&0&-1&0&0&1&0\\ 0&0&1&0&0&-1&0&0&-1&0&0&1 \end{pmatrix} \ln\{A\,\pi^{T}\} = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} \beta$$
where the twelve rows of the 12 × 16 matrix A of zeros and ones form, in order, $\theta_1$, $\theta_1 + \theta_2$, $\theta_1 + \theta_2 + \theta_3$, the complements $1 - \theta_1$, $1 - \theta_1 - \theta_2$, $1 - \theta_1 - \theta_2 - \theta_3$, and the corresponding six quantities in the $\phi$'s, and where $\beta = \ln(E_3)$. N44 does enter into this equation, and it also does into the corresponding equation based on hypothesis (5), but not into that based on hypothesis (4).
Referring to Fig. 4, $\Delta$ is the difference between the means of Au and Ar (and Bu and Br, etc) expressed in units of their standard deviation. When considering the whole population of accidents, there will be a distribution of severity to all restrained persons. This will be formed by combining Ar, Br, Cr, etc and will thus have a much larger standard deviation than these distributions individually. Similarly there will be a distribution of severity to all unrestrained persons, formed by combining Au, Bu, Cu, etc. This will also have a relatively large standard deviation. These two distributions will be separated by $\Delta\sigma$ ($\Delta$ and $\sigma$ have been defined earlier). The separation between these two distributions expressed in units of their own (common) standard deviation $\zeta$ is $(\Delta\sigma/\zeta)$. It is also $(\sqrt{3}/\pi)\ln(E_3)$. Consequently, the ratio of the within-circumstances variability to the total variability, $(\sigma/\zeta)$, is given by
$$\frac{\sigma}{\zeta} = \frac{\sqrt{3}\,\ln(E_3)}{\pi\,\Delta} = \frac{\ln(E_3)}{\ln(Q/P)} = \frac{\ln(Q^*/P^*)}{\ln(Q/P)} \qquad (12)$$
where P and Q have been defined earlier, and P* and Q* are their expected values on the assumption of no effect of restraint usage on injury severity, that is (P + R)(P + S)/(P + Q + R + S) and (Q + R)(Q + S)/(P + Q + R + S) in the notation of Table 6.
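Equation (12) is easy to evaluate once the four cells of Table 6 are known. The following sketch computes P*, Q*, E3 = Q*/P* and the ratio of within-circumstances to total variability from counts P, Q, R and S. The function name and the counts are hypothetical and serve only to show the arithmetic.

```python
import math


def variability_ratio(p, q, r, s):
    """Evaluate eqn (12) in the notation of Table 6.

    s: both drivers severely injured
    p: restrained severe, unrestrained minor
    q: restrained minor, unrestrained severe
    r: both drivers with minor injury
    Returns (E3, ratio), where ratio is ln(Q*/P*) / ln(Q/P).
    """
    n = p + q + r + s
    p_star = (p + r) * (p + s) / n  # expected P if restraint has no effect
    q_star = (q + r) * (q + s) / n  # expected Q if restraint has no effect
    e3 = q_star / p_star            # equals the odds ratio of eqn (3)
    ratio = math.log(e3) / math.log(q / p)
    return e3, ratio


# Hypothetical counts, not taken from any study.
e3, ratio = variability_ratio(p=5, q=20, r=65, s=10)
print(e3, ratio)
```

In this invented example the ratio is below 1, as expected when matching removes between-circumstances variation so that the within-circumstances standard deviation is smaller than the total one.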
Table 6. Binary table illustrating the derivation of eqn (12)

Restrained     Unrestrained driver
driver         Severe    Minor    Total
Severe         S         P        P+S
Minor          Q         R        Q+R
Total          Q+S       P+R      P+Q+R+S

$$\ln(E_3) = \ln\left(\frac{(Q + S)/(P + R)}{(P + S)/(Q + R)}\right) = \ln\left(\frac{(Q + R)(Q + S)}{(P + R)(P + S)}\right) = \ln\left(\frac{Q^*}{P^*}\right)$$

3.2 Non-equivalent casualties
In Table 2, the two casualties are considered to be in identical conditions apart from their restraint usage. Instead we might have data on drivers' and front seat passengers' injuries in single-vehicle accidents. In this case there are four comparisons that can be made: drivers with or without seat belts vs front seat passengers with or without seat belts. If the effectiveness of the seat belt for the driver is $\Delta_D$ and for the passenger is $\Delta_P$, and the effect of being in the passenger seat rather than in the driver's when both are belted is $\Delta_S$, then the result of comparing
drivers wearing seat belts with passengers wearing seat belts is $\Delta_S$;
drivers wearing seat belts with passengers not wearing seat belts is $\Delta_S + \Delta_P$;
drivers not wearing seat belts with passengers wearing seat belts is $\Delta_S - \Delta_D$;
drivers not wearing seat belts with passengers not wearing seat belts is $\Delta_P + \Delta_S - \Delta_D$.
Such equations would also result if the data referred to the injuries to casualties who differed in some other way, such as in age.

3.3 Many-to-many matching
If the case-by-case matching technique finds that the kth set of circumstances occurs in kM1 accidents in which the person was unrestrained and kM2 accidents in which the person was restrained, and among the kM1 in the first group there were kM1i in the ith injury category, and similarly among the kM2 there were kM2j in the jth injury category, we may assume this set of circumstances contributes an increment
kM1+ kM2
1
(13)
to the (i, j)th combination of injury categories, so that the total number in the (i, j)th cell is $\sum_k {}_k\delta_{ij}$.
Further study is needed to establish whether a better method of analysis of the many-to-many case may be found, but this suggestion does pass the following three tests:
(i) It contributes 1 to the (i, j)th cell and 0 to all other cells if kM1 = kM2 = 1.
(ii) If kM1 and kM2 are large for a particular value of k, then restraint effectiveness in this single set of circumstances may be analysed by either of two methods, (a) as unmatched data, or (b) as matched data. The two methods are equivalent because, since we are considering a single set of circumstances, there is no between-circumstances variability. Therefore the within-circumstances and total variabilities are the same. In more detail--(a) Unmatched data: the logistic version of the analysis, as described in Section 2, is used to compare the vectors (kM11, kM12, kM13, kM14) and (kM21, kM22, kM23, kM24). (b) Matched data: we create a table of the form of Table 2 by means of formula (13) and analyse this by the method of Section 3.1. On following the algebra through, we find that the two methods give the same results.
(iii) If all the kM1 fall into the ith severity category, and all the kM2 fall into the jth severity category, and kM1 = kM2, then an amount kM1 is contributed to the (i, j)th cell and 0 to all other cells.
In practice, we expect that some of the kM1 and kM2 may be greater than 1, but also that not all of them are so large that separate analyses for each set of circumstances are worthwhile.
4. DISCUSSION AND CONCLUSIONS
It is often said that seat belts are more effective in reducing fatalities than in reducing serious injuries, and more effective in reducing serious injuries than in reducing slight injuries.
Doctor Campbell's conclusion to this effect has already been mentioned, and it is supported by many of the studies reviewed in Grime (1978). The view taken in the present paper is that this is likely to be an artefact of the use of eqn (1) to define effectiveness, and that it is more useful to search for a definition of effectiveness that is empirically found to be the same whatever level of injury is considered, rather than to accept expression (1) along with a correlation between severity and effectiveness. The justification for this view is, firstly, that it is simpler to describe effectiveness in terms of one parameter rather than two (the extra one being some measure of how much effect injury level has on effectiveness), and, secondly, that on theoretical grounds one does not expect expression (1) to be independent of injury level. These theoretical grounds for disliking formula (1) amount to saying that it is unlikely that the relative injury frequency curve really consists of two straight line segments, from (0, 0) to (1, 1/a) and from (1, 1/a) to (1, 1), rather than of some smooth curve (such as the possibilities given in eqns (5) and (6)). There are, of course, many difficulties which one may encounter with the data, such as actually discovering whether a restraint was in use, and underreporting of slight injuries and the uninjured, and it has been assumed that these have been overcome. If they have not been, however, the principles and methods put forward in this paper are likely to be of use as components of a data analysis that also includes a model of the types of error present. It has already been said in Section 3.1 that a property of expression (1) is that it is unaffected by the number of accidents in which both the restrained driver and the unrestrained driver were uninjured.
This might at first sight appear to be an advantage, since this is always a difficult number to discover, but, on reflection, many people will probably regard it as an undesirable property, for the following reason: consider the case where we are basing our calculations on the proportions of fatalities. Then our answer will be unaffected by the number of cases in which both drivers escaped death. Therefore our answer will be unaffected by the number of cases in which both drivers were seriously injured. It may be suggested that a descriptive measure of effectiveness ought to be affected by this number--if there are a large number of cases in which the same thing happens to both the restrained and the unrestrained driver, then one probably wants to say that the restraint is having less effect than if there are few such cases. Although we have not attempted to say, on observed data, which of the effectiveness measures E1, E2, E3 is most nearly constant, it should be emphasised that this is an empirical question, to which it will be possible to provide an answer as evidence accumulates. Furthermore, it is possible to test statistically whether any particular set of data supports or rejects the proposition that E1 (or E2, or E3) is constant. The imposition of the property that effectiveness be independent of the injury level used as the threshold has permitted us to develop methods that take account of all the information in the data, not forcing us to condense it into binary form. The specific statistical methods that have been suggested in this paper are based upon the use of the program GENCAT. The development of computer packages for categorical data has lagged behind that of those for measurement data, so at present there are few enough candidates for consideration, and probably none that offers the flexibility of hypothesis specification that GENCAT does.
Furthermore, it is written in Fortran, which means that it is more portable than a facility in the form of a specialised language, or with Assembler components. However, there are certain disadvantages with GENCAT--the wide range of options makes input clumsy, for instance--and no doubt alternative statistical procedures could be developed. It is worth saying that whether seat belts are as effective at high velocity changes as at low, and whether they are as effective for elderly casualties as for the young, are also empirical questions, but the answers will not affect our choice of effectiveness measure--that will be decided by finding which is independent of injury level considered. In any accident sample, there will be many sources of variation in injury severity, of which one will be the issue being investigated. The process of case-by-case matching which we have postulated in Section 3 removes as many of the others as possible from the comparison, thus leading to a more sensitive statistical test. That this is so may be seen from Table 6: the matched test will be based on a comparison of the numbers P and Q, whereas the unmatched
test would involve a comparison of P + S and Q + S. Since the variance of categorical data increases with increasing numbers of observations, it will be greater in the latter case and the statistical test less powerful.

For the very reason that cases in which the same degree of injury was sustained by both persons are excluded from the matched calculation, a statistic such as E3 from the unmatched methodology may, however, be preferred for descriptive purposes. Whereas A is the effectiveness of the restraint system measured relative to the variability of severity within the particular circumstances of an individual accident (or, more precisely, that small class of accidents satisfying certain conditions with regard to the variables used for matching), ln(E3) is the effectiveness measured relative to the total variability of severity in the type of accident under consideration.

The extension to the case of non-equivalent casualties, in Section 3.2, is straightforward and needs little discussion. Suffice it to say that whether there is any interaction between the variables (for instance, whether seat belts have the same effectiveness for passengers as for drivers) is an empirical question for which statistical tests can be devised. With GENCAT, it would be easy to test whether Ap = Ao.

In any study, one has to choose how exact the matching has to be. As one insists on stricter and stricter matching, more and more extraneous variation is removed from the comparison, but the sample size gets smaller and smaller. This issue is beyond the scope of the present paper, but if one takes a common-sense attitude to matching, it is quite likely that one will find pairs of groups which correspond to each other quite closely but the members of which cannot be paired off any more exactly than the groups resemble each other.
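
The comparison of the matched and unmatched tests made above (P against Q, as opposed to P + S against Q + S) can be illustrated numerically. The following is only a sketch under stated assumptions, not the GENCAT analysis proposed in this paper: it takes P and Q to be the numbers of pairs in which only the unrestrained, or only the restrained, person exceeds the injury threshold, S the number in which both do, and introduces a hypothetical fourth count t for pairs in which neither does. The matched statistic is the standard McNemar form and the unmatched one is an ordinary two-sample Pearson chi-square; the function names and the counts are invented for illustration.

```python
def matched_chi_square(p, q):
    """McNemar-type statistic (1 d.f.) using only the discordant pairs."""
    return (p - q) ** 2 / (p + q)


def unmatched_chi_square(p, q, s, t):
    """Pearson statistic (1 d.f.) for the same cases with the pairing ignored.

    The unrestrained group has p + s persons above the threshold and the
    restrained group has q + s, each out of n = p + q + s + t persons.
    """
    n = p + q + s + t
    above_unrestrained = p + s
    above_restrained = q + s
    pooled = (above_unrestrained + above_restrained) / (2 * n)
    # Variance of the difference between two independent proportions.
    variance = pooled * (1 - pooled) * 2 / n
    difference = (above_unrestrained - above_restrained) / n
    return difference ** 2 / variance


# Hypothetical counts, chosen only to illustrate the behaviour.
p, q, t = 30, 15, 115
for s in (40, 80):
    print(s, matched_chi_square(p, q), round(unmatched_chi_square(p, q, s, t), 3))
```

With these invented counts the matched statistic is 5.0 whatever the value of s, while the unmatched statistic is smaller and falls further as s rises. That ordering is not guaranteed for every table: in this sketch it holds when the concordant pairs are numerous enough that 4st exceeds (p + q) squared.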
In Section 3.3 a modification of the statistical method is put forward that, it is hoped, may overcome the difficulty of groups whose members cannot be paired off exactly, but no proof has been given that it is valid, and the problem needs further study. If the proposed modification is supported by more rigorous investigation, it will mean that great effort need not be expended on boiling down approximate matches into exact pairs, and that cases need not be discarded because one of a pair of approximately matched groups contains more cases than the other.

Since A is the difference between restrained and non-restrained conditions relative to the variability within these conditions, A is determined in part by the exactness of the matching. Consequently, there is no reason to expect values of A determined from different studies to be the same.

Finally, although much of this paper has been written with seat belts in mind, the principles and methods are exactly the same for any other injury-reducing device, whether it be steering column, crash helmet, or vehicle frontal design for pedestrian protection. There are still the same distinctions to be made, between matched and unmatched data, and between the various assumed shapes of RIF curves, and the same desire to use the data as efficiently as possible. Consequently, it is hoped that the ideas in this paper will be widely applied to data, as a result of which their validity or otherwise will be determined.

Acknowledgements--The author acknowledges helpful discussions with P. M. McCullagh and the constructive
comments of the referees.

REFERENCES

Cox D. R., The Analysis of Binary Data. Methuen, London (1970).
Grime G., A review of research on the protection afforded to occupants of cars by lap and diagonal seat belts. Presented at 1st Course on Crash-worthiness in Transpn Systems, held at the Ettore Maiorana Centre, Erice, Sicily (1978).
Hutchinson T. P., Use of the program CATLIN for analysing tables of frequencies, with examples from traffic and accident studies. Rep. from the Traffic Studies Group, University College London (1974).
Hutchinson T. P., Statistical aspects of injury severity. Transpn. Sci. 10, 269-299 (1976).
Kihlberg J. K., Seat belt effectiveness in the non-ejection situation. In D. M. Severy (Editor), Proc. 7th Stapp Car Crash Conf. Thomas, Springfield, Illinois (1963 Conference, published 1965).
Landis J. R., Stanish W. M. and Koch G. G., A computer program for the generalized chi-square analysis of categorical data using weighted least squares to compute Wald statistics (GENCAT). Unpublished Biostatistics Technical Rep. No. 8, Department of Biostatistics, University of Michigan (1976).
McCullagh P., A logistic model for paired comparisons with ordered categorical data. Biometrika 64, 449-453 (1977).
Reinfurt D. W., Silva C. Z. and Seila A. F., A statistical analysis of seat belt effectiveness in 1973-1975 model cars involved in towaway crashes. Volume 1. Rep. No. PB-258 542, National Technical Information Service, Washington DC (1976).
Rininger A. R. and Boak R. W., Lap/shoulder belt effectiveness. Proc. of the 20th Conf. of the Am. Assoc. for Auto. Med. 262-279 (1976).
Schneider R. W., The analysis of accident data using the "case-by-case matching" technique. Proc. of the 19th Conf. of the Am. Assoc. for Auto. Med. 292-300 (1975).
Scott R. E., Flora J. D. and Marsh J. C., An evaluation of the 1974 and 1975 restraint systems. Special Rep. UM-HSRI-76-13, Highway Safety Research Institute, University of Michigan (1976).