Computational North-Holland
Statistics & Data Analysis 11 (1991) 275-295
275
Gerhard Lutz Universitiit Regensburg, 8400 Regensburg, Germany Received December 1989 Revised May 1990 Abstract: Threshold models for ordered categorical data as considered by McCullagh (1980) have bw:crne a S+ . I t::rid tool in categorical regressiolr. Alternatively, Lhe class of sequential mod& is coirs:tier 09 .Ail.Fre response categories are reached successively step by step. The sequential models are d~rivcd irom assumptions about an underlying stepwise response mechanism. Connections between the McCullagh type models and the sequential models are investigated. Several examples are considered where the computing has been done by a new program package which includes ordinal regression mod&s. Thereby it is shown that sequential models are appropriate tools which widen the scope of ordered categorical regression. Keywords: Ordered model, Proportional
categorical regression, hazards model.
Threshold
models,
Sequential
models,
Grouped
Cox
McCullagh’s (1980) enlightening treatment of regression models for ordinal data has become the most influential paper is this field of research. The handiing of regression type problems with an ordered categorical response variable is most often based on McCullagh’s general class of models or extensions of it (e.g.? Terza (1985), Anderson and Phihips (1981), Quedneau (1988), Farewell (1985) Tutz (1989), Randall (1989), Jansen (1990)). The models may be based on the existence of an underlying continuous but unobservable random variable of which only a coarsely measured version is observable. The ordered categories correspond to intervals of the real line. This approach which is shortly sketched in section 2 is applicable whenever ahe response vari Ye is categorical with ordered categories. However, often the categories of the response variable are such that they can be reached only successively step by step. A simple exa le is the response zijiiams variable considered in Table 1 which is based on data from (1954) (see also McCullagh (1980)). Children have been classified accc;dipg co their relative tonsil size and w ey are carriers of 0167-9473/91/$03.50
0 1991 - Elsevier Science
blisbers
G. Tutz / Sequential models in categorical regression
276
Table 1 Tonsil size (Holmes and Williams (1954)).
Carriers Non-carriers
Present but not enlarged
Enlarged
Greatly enlarged
19 497
29 560
24 269
pyogenes.
It may be assumed that tonsil size always starts in the normal state ‘present but not enlarged’ (category 1). If, determined by exogeneous or endogenexus influence, the tonsil grow abnormally, they may become ‘enlarged’ (category 2). If the process does not stop, tonsils may become greatly enlarged (category 3). But in order to get greatly enlarged tonsils, they first have to be enlarged whatever the duration of the intermediate state ‘enlarged’ is. For data of this ‘step-wise’ type it may be assumed that the underlying process starts in category 1 and successively a transition to higher categories takes place until the process stops. Sequential models as considered in section 2 explicitely make use of this underlying process which may be determined by latent variables. Similar models have been considered in the modelling of sequential decisions (Amemiya (1975)). The purpose of this paper is to derive and outline the sequential z - ’ roach which often is more appropriate to the underlying response mechai:.L I and consequently often yields a better fit of the data. Moreover, the focus is on uniqueness results and the relation to the threshold approach. After a short treatment of McCullagh’s approach in section 2, in section 3 sequential models are derived from assumptions on underlyirrg latent variables. For the simple versions of the models where the effect of explanatory variables does not depend on the category it is shown that the models also yield strict stochastic ordering of subpopulations. The general case where the effect of variables may also depend on the category may be alternatively derived from latent mechanisms. In section 4 the connection between cumulative and sequential models is investigated. For the grouped Cox model and the exponential model specific versions of both modelling approaches are equivalent. It is shown that equivalence holds only under specific conditions which are not fulfilled for symmetric distributions. In section 5 collapsibility and indistinguishability of categories are considered shortly. In section 6 the embedding of the models into the framework of generalized linear models is sketched.
ative Let yE (l,... , k > denote the ordered categorical response variable and x’ = (x ]Y...Y x,,) be a vector of explantatory or predictor variables. Let the unobservable continuous random variable u take the linear form u = x’p + E where x’j3 is
G. Tutz / Sequential models in categorical regression
217
the systematic part determined by the vector of parameters /3’ = (&, . . . , P,). The random component is determined by the random variable e which has the fully specified distribution function F assuming E(E) = 0. The assumption y=r wheree,
if B,_,
se2s
(r=l,...,k),
. . . 5ek_l,
(2 . 1)
e,= -00, ek= 00 yields the general cumulative model
(2.2)
pr(ysrIx)=F(8,-x’p).
Response mechanism (2.1) is equivalent to the ‘category boundaries’ approach due to Edwards and Thurstone (1952). Here it is assumed that the observable variable y is merely a categorized version of the latent variable U. McCullagh (1980) extensively discusses models of the type (2.2) (see also McCullagh and Nelder (1983, chapter 6)). In patricular he considers the proportional odds model or cumulative logistic model log[pr(yIrlx)/(l-pr(yIrlx))]
=8,-x3
which follows from the logistic distribution ‘ proportional hazards’ model log[-log(pr(y>rlx))]
F(x) = l/( 1 + exp( -x)}
=8,-X3
and the
(2 .3)
which follows from the assumption F(x) = 1 - exp( - exp( x)}. In the context of survival analysis the latter model is also called the grouped proportional hazards or Cox model since it may be derived from the continuous proportional hazards model (Cox (1972), Kalbfieisch and Prentice (1980)). For the cumulative model (2.2) it is not necessary to postulate the existence of latent variable u and response mechanisms (2.1). However, interpretation is easier if one refers to the underlying response mechanism. Extensions of the model which let the parameter depend on the category have been considered (Williams and Grizzle (1972), Terza (1985)). If one assumes that the thresholds e,, r= l,..., k - 1, are determined by explanatory variables z ’ = ZJ by the linear form (+..,
where s,’ = ( Srr, . . . , S,,), we get the general cumulative pr( y I r 1x, z) = F(S*, + ~‘8~- x’p),
model (2 .4)
r= 1 Y”“Yk. Model (2.4) still assumes the category boundaries mechanism, but now only for explanatory variables x. The variables z only determine the thresholds which lie on the latent score. As is immediately seen the assumption parameter. Thus one has to zi = Xi, i = 1,. . . ‘) q, q = p, makes B an unidentifiable distinguish strictly between threshold variables zi and &tent scere variables Xi. The former will be called ‘category specific’, the latter are refered to as ‘global’ variables.
G. Tutz / Sequential models in categorical regression
278
3.
3. I. Simple sequential models Let now the categories of the response variable be such that they can be reached only in consecutive order, step by step. Let latent variables u,, r = 1, _. . , k - 1, have the linear form u, = x’b + cr where e:r is a random variable with distribution function D. The response mechanisms starts in category 1 and the first step is determined by
where 8! is a threshold parameter. If u1 s 8, the process stops. For the data u1 may represent the latent tendency of growth in the initial state tonsil size. If u1 is below threshold 8, the tonsil size remains normal not, at least enlarged tonsils result ( y 2 2). That means, if u1 > 8, the continuing in the form
tonsil size of normal ( y = l), if process is
y = 2 given y 2 2 if u2 s e2 and so on. The latent variable u2 may represent the unobservable tendency of growth when the tonsils are already enlarged. Generally, the complete mechanism is given by y=r
given y2r
if
u,ler,
or equivalently y>rgiven
y2r
if u,>e,,
(3 .l>
r= 17”‘) k - 1. The sequential mechanism (3.1) models the transition from category r to category r + 1 given category r is reached. A transition takes place only if the latent variable which determines the transition is above a threshold which is characteristic for the category under consideration. The main difference to the category boundaries approach (2.1) is the conditional modelling of transitions. The sequential mechanism assumes a binary decision in each step. Given category r is reached it is determined if the process stops in a final category or if it continues with resulting higher categories. Only the finally resulting category is observable. The binary decision in each stsp is modelled by a usual regression model assuming a cumulative distribution function. The category boundaries approach is used locally for modelling the consecutive binary decisions. The two assumptions, existence of latent variables u, and response mechanism (3.1), immediately yield the simple sequential model pr(y=rlykr, The probabilities
(3.2)
x)==D@?~-x’B)~ of model (3.2) are given by
pr(y=rix)=D(B,--x3)
(1 i=l
a( e, - x’p)).
Model (3.2) sometimes is refered to as a continuation ratio mo&i (Agresti (1984)). Here, it is assumed that the step-wise transition is determined by the parameter p which is the same for each step. But now instead of the cumulative probabilities pr( y s r 1x) the discrete hazards pr( y = r 1y 2 r, X) are determined by the linear term 19,- x’p and transformation D. McCullagh (1980) considers a property of cumulative models called stnct stochastic ordering. Let two groups of subpopulations be represented by COvariates x1, x2. Then for the cumulative model (2.2) the difference 4(x*9
x2)
=
F-‘(pr(
y 5 r 1.x1)> - F(pr(
y 5 r 1x2)>
is given by p’(x, - x1). That means that e.g. for the logistic cumulative model where F-r{ pr( y I r 1x)} = log{ pr( y s r I x)/pr( y > r I x)) = Z,(x) the difference between ‘cumulative log-odds 4(x1,
x2)
=Uxd
-4x2)
=pI(xz-x1)
does not depend on the category. Thus if one of the cumulative log-odds I, is for population x1 larger than for population x2 thts holds for any cumulative log-odd. For the grouped Cox model one gets F-‘(pr(y5rIx))
=log[-log(pr(y>rlx))].
Thus the difference of log-log transformed cumulative probabilities does not depend on r. It.should be noted that the difference of F-*(pr( y s r I x)) between two sets of populations does not depend on the category whereas the proportion of discrete hazards ~(xr,
x2) = pr(y = r 1y 2 r, x,)/l+
= i-1 y 2 6 x2)
does depend on category r. Although the model may be derived as a discretized version of the proportional hazards model (Cox (1972)) for the discrete hazards the proportion depends on r. It is not a proportional hazards model for discrete hazards. Thus, model (2.3) is refered to as grouped Cox model instead of the misleading name proportional hazards model. Of course one should keep in mind that Cox’s (1972) model for discrete hazards is different. Actually the latter refers to a logit continuation ratio model. Therefore the adjective ‘grouped’ should not be omitted. The useful property of strict stochastic ordering of subpopulations is not restricted to cumulative models. A very similar condition hlolds for the simple the diffcrcnce of sequential model (3.2). It follows that for two subpn@ations transformed conditional probabilities 4(x1,
x2)
=
=
D-‘(pr(
y =
r Iy > r, x1)>
-
D-‘(pr(
y = r I y 2 6 4)
x2 -X1) does not depend on the category. For the logistic version of the sequential model ,ke discrete hazards pr( y = r I y 2 5 xl> are where D(x) = I/( 1 + exp( -x)} transformed to the ‘sequential’ logits D-‘(pr(y=rly>r,
P(I
x))=log(pr(y=rlx)/
G. Tutz / Sequential models ir! categorical regression
280
Table 2 Sample logits for sequential
and cumulative
modelhng
of the tonsil size data. y=2
y=l
X
Sequential logits based on frequencies
-1 A, (1~ 1)
- 1.009 -0.511 - 0.498
Cumulative logits based on frequencies
1 -1 A, (I.,- 1)
- 1.009 -0.511 - 0.498
1
_-
U.iii5
0.732 - 0.547 0.683 1.367 - 0.684
. -Ads for y = r and y ) r. The difference 4,(x,, x2) which are the iogariihutrc 1-m 1~g-0~ between these sequential logits does not depend on the category. Consequently 2 type of strict stochastic ordering holds for the sequential model too. If one of the sequential logits is for population xi large1* than ca.rr__for population x2 this holds for any sequential logit. The only difference between sequential modelling and cumulative modelling - as far as strict stochastic ordering is concerned -- is that in the former conditional probabilities are transformed -whereas in the iatter cumulative probabilities are trp-nsfori~ned. The consideration of sequential logits . *3?s’*e~~ * oi cumulative logits corresponds to the sequential modelling of the response which is the underlying principle for sequential models. For the extreme value distribution D(x) = 1 - exp( - exp( x>> in model (3.2), we get the log-log odds
D-‘(pr(
y = ii’1y I r, x)) = log[ -log(pr(
y > r 1x)/pr(
y 2 r [ x)}].
Therefore the difference of log-log transformed conditional probabilities does not depend on r. In Table 2 the sequential and the cumulative logits are given for the tonsil size data. The sample logits based on frequencies are computed directly from Table 1. For example the first sequential logit is given by log (19.5/53.5) where 0.5 has . been addGu lAnA +e Lu bnth vvcII nu,merator __ and denominator. It is seen that the differences -0 .-/u AQQand -0.547 for the sequential sample logits are almost equal. For the cumulative sample logits the differences - 0.498 and -0.684 are more differing. Therefore a better fit of the sequential model is to be expected. Table 3 shows the deviances and parameter estimates for both models. Moreover, in Table 3 the difference of logits based on ML estimation are given. Siilce carriers are coded by
Table 3 Estimates for tonsi! size data with standard 8,
82
Sequentiai iogi t model
- 0.775 (0.106)
0.467 (0.111 j
Cumulative ‘:ogit model __
- 0.809 (0.116)
1.061 (0.118)
deviations
given in brackets.
P
(1 -1j *?z \‘,
deviance
--- 0.264 (0.098)
- 0.528 (0.196)
0.006
0.301 (0.112)
- 0.602 (0.224)
0.302
G. Tutz / Sequential models in categorical regression
x = I and non-carriers
by x = - 1 the difference
251
of logits is given by - (+ -
x,>‘p = -2p.
Although the fit of the sequential model is better the main difference between models is the interpretation. Within the cumulative model the parameter p determines the strength with which the latent variable u is shifted on the latent continuttm. The estimated parameter /3 = 0.301 means that the odds of having normal-sized tonsils as well as the odds of having normal-sized or enlarged tonsils is 1.82 times as large for non-carriers as for carriers. Within the sequential model the parameter p gives the strength with which the transition from category 1 to 2, and from 2 to 3 is determined. The estimated parameter value p = 0.264 means that the odds of having normal sized tonsils is 1.69 times as large for non-carriers as for carriers. The same proportion holds for the odds of having merely enlarged tonsils given the tonsils are not normal. The sequential model as a step-wise model assumes that the process of enlargement as far as the comparison of carriers and non-carriers is concerned does not stop in the ‘enlarged’ category but 02ds to greatly enlarged tonsils where (xi - x,)‘b = 2p de1s going on and lbtermines the comparison between populations. 3.2. General sequential models In the same way as for the cumulative modei the thresholds may be influenced by explanatory variables by parameterization $ = S,, + Z’S where z’ = ( zi5 . . . , z& 6,’ = (&*, . . . , l&). In the general sequential model
pr(y=r) y2r,
X, Z) = D(S,,+Z'S,-x'p),
(3.3)
the variables x are modelled as global variables determined by parameter vector p whereas the variables z determine the thresholds and consequently the effect depends on the category. Their influence is determined by parameter Sr which is specific for the category under consideration. In contrast to the cumulative model the general sequential model (3.3) may alternatively be based solely on latent variables. Let U, have the linear form % = x’/!! - Z’S, + E, and the response mechanism be determined by y > p give y 2 r if u, > S+
(3.4
Then, we immediately get model (3.3). Based on this derivation the latent variables determine the response variable completely. The thresholds S,, are assumed to be fixed, but the latent variables comprise a global term x’b which is the same in each step and a term Z’S, which is specific for the step under consideration. Let us consider shortly the consequential differing interpretations of latent variabies in cumulative and sequential models. Within the category boundaries approach the term Z’S, which depends on the category may be interpreted only as a shifting of thresholds. If only global variables are assumed as for the example of tonsil size the underlying variable may be considered as the metric analogon of y, e.g. the proportion of tonsil size and body weight. f the proporticfi is bewfd-3
G. Tutu / Sequential models in categorical regression
282
Table 4 Breathing test results of Houston
Industrial
Workers (Forthofer
and Lehnen (1981)).
Breathing Test Resulis
Age
Smoking Status
Normal
Borderline
Abnormal
< 40
Never Smoked Former Smoker Current Smoker
577 192 682
27 20 46
7 3 11
Never Smoked Former Smoker Current Smoker
164 14s 245
4 15 47
0 7 27
40-59
threshold 8, the examiner of tonsil size chooses category 1. However, if we assume explanatory variables which determine the thresholds this interpretation does not longer make sense since the thresholds are determined by the examiner. Then the latent variable becomes more fictious. It might be considered as a growth stimulus which, if being beneath threshold 8,, leads to normal tonsil size. Within the sequential approach latent variables determine the transition from category to category. Here, the latent variable ‘growth stimulus’ itself may be considered as depending on the category or not. For the general sequential model (3.3) as well as for the general cumulative model (2.3) strict stochastic ordering of subpopulations does no longer hold. For model (3.3) one gets for subpopulations with covariates xi, zi and x2, z2
4(-q, q;
x2,
~2)= (zl -z2)i&- (xl
-x,)‘P
which depends on Y. Only the second term yields a partial ordering with respect to variables x which determine the latent variables u, in the same way. Modelling with the g ral sequential model is demonstrated by an example which has been consider reviously by Forthofer and Eehnen (1981, p. 21) and Agresti (1984, p. 97). It ems the influence of smoking status and age upon the results of a breathing test. For details see Forthofer and Lehnen (1981) from which Table 4 is taken If not born abnormal the category abnormal can be reached only through the intermediate borderline category. Hence the analysis may be based on the s wise approach of sequential models. The data set is interesting because the i raction of age and smoking habit here is so strong that it cannot be omitte(d. Figure 1 shows the hierarchy of models which may be derived in the general sequential model framework. Models are abbreviated by the explanatory variables which are included. The index 12 denotes that the explanatory variable may determine the first and second transition by possible differing weights, i.t. the variable is a threshold variable. If a variable name has no index, the variable is global, i.e. not specific for the category. Interaction terms of age and smoking habit are abbreviated by (A* S),, if they are allowed to depend on the category, by (A*S) if they are global. If a variable or interaction term has only one index, e.g. S,, it is assumed t at the variable determines only the first transition or
G. Tutz / Sequential models in categorical regression
/\ A,S&*S)
A12S,(A*S)
\I
AS,(A*S) ~
A,S&*S)l
283
/4\/1\ A12S@“S)1
\/
A,$( A*S)l \ ,,y
A12S12 A,S12@*S)2 A12Sn(A*S)2
,(A%;
AS
I\ \/
A
S
0
Fig. 1. Models for two explanatory variables and a trichotomous dependent variable.
threshold (between categories 1 and 2), the second transition or threshold is not determined by this variable (i.e. 6, = 0). The hierarchy given in Figure 1 shows only part of the models. Only models are included where the interaction term is allowed to be global or to determine only one of the thresholds. The main effects are considered as global or affecting both thresholds. Thus, for example the model A, S,, (A*S), is not given in the hierarchy. The restriction to models where only for the interaction effect each possible way of influencing the dependent variable is considered is made for sake of simplicity. Including all possible models would yield a hierarchy which is very difficult to survey. In principle the Lerarchy is the same whenever two explanatory variables and a trichotomous response variable are considered. It should be noted that the hierarchy is also the same if a general cumulative model is assumed to hold. IMoreover, it is not determined by the fact that categorical variables have been considered. For continuous explanatory variables the hierarchy is unchanged. Table 5 gives the deviances for some models, cumulative and sequential ones. It is seen from Table 5 that the interaction term cannot be omitted but may be considered as a global term for the cumulative model as well as for the sequential model. Since the fit of the sequential model A, S, (A*S) is distinctly superiour to the fit of the cumulative model A, S, (A*S) only the sequential model has been considered further.
G. Tutz / Sequential models in categorical regressis,n
284
Table 5 Deviances for logit mc?dels applied to breathing data.
Cumulative
Model
Deviance
df
4412, S,,, (A*S)
3.244 27.395 8.109 2.645 27.531 4.310 3.523 4.187 10.979 0.444 6.756 27.534 29.655
2 4 5 2 4 5 2 4 5 1 5 5 6
A!29
s12
A, S, (A*S) Alp S12, (A*S)
Sequential
Al29
s12
A, S, (A*S) A,27 S12r(A* Sh A,,, S,, (A*% Al, Sp (A*%, A, S129 (A*%, A, S,, (A*S), A, s12 Al29
s
Table 6 shows the estimaters for the nearly saturated model A, S,,, (A*S),, and for two simpler models. The model A, S,,, (A* S),, is used as a substitute for the saturated model for which ML estimates do not exist. For convenience, the
Table 6
Estimates
for sequential
logit moaels
applied
to breathing
dtita (standard
given in
deviations
brackets). Model s 10
F
to A S,(l) Sz(U S,(2) S,(2)
(A*
SW),
(A*
SW),
(A*
SW),
(A*
S(2)),
A,
S,,,
(A*%,
2.379 (0.109) 1.510 (0.276) 0.093 (0.107) 0.915 (0.190) 0.675 (0.421) - 0.375 (0.142) - 0.163 (0.319) - 0.561 (0.190) - 0.894 (0.452) 0.015 (0.143) 0.532 (0.380)
A,S,A*S 2.379 (0.108) 1.516 (0.186) 0.094 (0.105) 0.882 (0.186) 0.882 (0.186) - 0.356 (0.135) - 0.356 (0.319) - 0.601 (0.186) - 0.601 (OS 86) 0.092 (0.135) 0.092 (0.135)
A, S,, (A*S), 2.327 (0.097) 1.058 (0.156) 0.198 (0.084) 0.811 (0.162) - 0.324 (0.133) - 0.461 (0.162)
- 0.008 (0.135)
-
G. Tut; / Sequentiulmodels in categorical regression
285
parameters in Table 6 are given for the linear term 6,, + z ‘Sr + x’p where - X’ is replaced by x. The advantage is that 6, and p may be compared more easily if a variable is considered as category specific in one model and as global in another model. As is seen from Table 6 the variable age becomes influential primarily in the interaction term. Although the smoking history is strongly influential the advantage of never having smoked becomes really impressive for higher age and is of less importance for lower age. As is seen from Figure 1 another line of looking for a simple model is by omitting terlms for specific thresholds. Starting _ from the saturated model a reduction of the saturated model may be reached by omitting the influence of the interaction term upon the second transition from borderline to abnormal. From the estimates of model A, S,:, (A*S),, it is seen that the effects for the transition from category 1 to category 2 are weak compared to the effects for the transition from category 2 to category 3. Consequently, the model A, S,, (A x S), with the global variable age, and an influence of smoking habit and intetaction term only upon the first transition from category 1 to 2 yield: a&o a satisfying fit. Based on this model age determines both transitions in the sane way whereas smoking habit and the interaction of age and smoking habit determine only the transition from normal to borderline, not the transition from borderline to abnormal. Agresti (1984) investigates the breathing data within the framework of association models with assigned scores. Sequential models as used here have the advantage that assigned scores which always are a bit arbitrary are not necessary. Moreover, the distinction between global variables and variables which determine the thresholds also yields a comparatively simple model. Since interaction is very strong in this example an analysis e.g. within the usual framework of log-linear models would stop at the investigation of three factor interactions.
. Connections
between cumulative
and sequential
elling
4. I. Equivulence of models for special cases Cumulative er-id sequential models are quite different approaches to the modelling of the response variable. Apart from the modelling of thresholds the cumulative model is a one process model where the response is determined by the latent variable u from which a categorized version is observed. The sequer;tial model is a multi process model where the response is cl~termined step by step. Though the models are based on different principles it is not clear whether they may be equivalent under specific conditions. This is especially questionable if the thresholds are allowed to be determined by explanatory variables, since then the concept of a categorized latent variable for the cumulative model is dissolved. However, if the problem under consideration allows for both types of models it should be clear if the models actually are different. In the following two models are considered where the approaches yield partly equivalent mod%
G. Tut: / Sequential models in categorrcal regression
286
Grouped Cox modei The simple cumulative
model
pr(_v
we
have only global variabies may be easily transformed sequential model pr(y=rlykr,
into the simple
x)=1-exp(-exp(&-x3))
by the reparameterization &=log(exp(OJ-exp(&_,)}.
r=l,...,k-1,
and &, = - 00 (see La&a and Matthews (1985)). Thus, for F(x) exp( - exp( x)) and assuming simple models the cumulative model the sequential model. However, the equivalence holds only for the of models where the thresholds are not determined by explanatory general cumulative model
= D(x) = 1 is equivalent to skmple versions variables. The
= 1 - exp( -exp( & + ~‘6,))
pr(y
z)=,l
- exp[ - (exp(&
+ Z’S,) - exp(6,_,,0 + z’&_,>}].
The latter model may be reparameterized into a sequential model only for k = 2. That means if explanatory variables which determine the thresholds are incorporated the models are no longer equivalent. Exponential response model
Let us consider the asymmetric exponential response with F(u) = 1 - exp( - u). The cumulative exponential model is given by pr(yIrIx,
z)=l-exp(-(6,,+2’6,-~‘/3)},
where it is assumed that the linear term S,, + zS, - x’p is positive. may be reparameterized into the model pr(y=rly>r,
x, z)=l
-exp(
-(&.,+z$--x’&)},
(4 . 1) Model (4.1)
(4 . 2)
where 8,, = 610, 8, = $1, pi = /3, are = ?S&- Sr_*$, & = & - &__1, &Y= (0, 0,. . . , O), r=2 ,..., k-l. That means the cumulative model (4.1) is a special case of the sequential model (4.2) if for the sequential model x and z are threshold variables. The paramelers in (4.2) have a special structure since &. is a vector with zeros for r > 2. Here the global variables x of model (4.1) turn into threshold variables in model (4.2). If one starts with the general sequential model pr(y=rly>r, the cumulative pr(y=rly>r,
x, z)=l-exp(-(&.,+z’&.-x’/?)),
1 ( ,3)
x, z)=l-exp(-(6,,+z’S,-x’&))
(4 .4)
model
f:3llows from reparameterization
6,0 = C:= JrO, S, = Ci= i$, P, = rP:
G. Tut: / Sequential models in categorical regression
287
Again global parameters in model (4.3) have turned into threshold parameters in model (4.4). As is seen easily, if only threshold variables are considered the model . (45)
is equivalent
to the model
pr(y=rlyrr.,
z)=l
-exp{
[email protected]+z’8&
(4 .6)
where 8, = S,, & = S, -- $._ r, Y> 1. Thus the exponential model is the counterpart of the grouped Cox ml ‘el. Cumulative or sequential modelling lead to the same model if the thresholc ace determined by explanatory variables but lead to different models if the s nple versions with only global variables are considered. The parameters Sr in (L 6) refer to the transitions from category to category whereas the parameters Sr in (4.5) refer to the thresholds in the cumulative approach. Although the models are equivalent parameters have a differing meaning. It should be noted that for the reparameterizations considered in the pretext it is assumed that 6,, + Z’S < S,+ ,.O+ Z’S+ 1 holds for any vector z. If this condition is not fulfilled e.g. model (4.1) would make no sense. The two models show that in some cases cumulative and sequential approach lead to equivalent model-, For a more thorough investigation of the equivalence between models some preliminaries are necessary. Since a cumulative model with distribution function F might be equivalent to a sequential model with distribution function D it is first necessary to investigate the uniquences of cumulative and sequential models. 4.2. Uniqueness of cumulative and sequential models Let in the following CM( F, M,, Mg, ( S,,, S, }, /3>denote the set of distribu,. tions of y 1x, z given by p(_Ylrpx,
z) = F( s,, + z’i$ -- x’b),
where the parameters are admissible i.e. for all (z’, x’) E M, X Mg holds S,, + Z’S, < S,+ I.0 + ~‘6, and S,, + Z’S, + x’p is within the open subset of iR where F is strictly monotone increasing. Here, M, is the set of threshold variables and MK the set of global variables for which the model is assumed to hold. A cumulative model CM( F, M,, M,) for explanatory variables from M, x Mg is determined by the family of distribution sets CM( F, M,, Mg, (S,,, S,). /3) wherf the parameters are admissible. Depending on the range of the explanatory variables Ml X Mg the variety of differing models may reduce strongly. A simple exaruple is the case of one dichotomous variable with Mg = (0, l}, M, = +. Then cumulative models with stricly monotone increasing distribution functions will be equivalent i.e. CM( F,, M,, Mg) = CM( F2, M,, M,) will hold for any distribution functions I~,, F-.
G. Tutu / Sequentiul models in categorical regression
288
Consequently, models with differing F,, F, will be different only if the range iki( x: MK where the model is assumed to hold has some more structure. Let e, = (0.. . . ,l,. . . , 0) denote the i th unit vector and let U,,,( wO)= ( w0 + re; 1Y E ( -c, E)> denote the neighbourhood of w0 with respect to the ith variable. The condition U,,J wO>c MS X Mp whk:~ will be used in the following is fulfilled for example if the i th variable is continuous. Similar assumptions concerning the richness of the support of the explanatory variables have been ,,;d by Manski (1988) in his elaborate investigation of identification probte ns 3Lnbinary response models .I). Let F,, F, denote &stri&tion functions. (a) If there exist constants a > 0, b such that I;;(u) = F,(au -I-b) the cumulative models with distribution functions F!, F2 are equivalent, i.e. CM( F,, MS, M,) = CM(F,, M,, M,). ( b) Let there exist a neighbourhood Uc,i( wo) c MS X Mp. Then, from CM{ Zl, MS, Mp) = CM( E;, MS, Al,) follows the existence of constants a, b E R such that Fl (u) = F2(au -I-B), u E IF& Thus, distribution functions generate the same model if they are connected by a linear transformation. If MS X Mp is sufficiently rich each distribution function F generates a cumulative model which is identical only to models generated by linear transformations. For sequential models the basic form is given by pr(y=
rly>r,
x, z)=D(S,,+z’i$-x’P),
where parameters are considered admissible if for all ( z ‘, x ’ ) E M, X MS & + z ‘& - x’p is within the open range where D is strictly monotone increasing. In the same way as for cumulative models a sequential model SM( D, M,, M,J is introduced as a family of distribution sets following (3.3). It can be shown that sequential models have the same uniqueness properties as cumulative models. Proportion 4.1 holds if the cumulative model is substituted for the sequential model. 4.3. Equivalence of cumulative and sequential mode?iing It is immediately seen that for the ‘non-ordinal’ case k = 2 the CM( F, M,, M,) is equivalent to the sequential model SM( F, the following k > 2 is assumed. Moreover, it is again assumed neighbourhood U,,i ( W) c M, X MR. Consider the case where CM( F, M,, MR) is equivalent to Then for the first response category follows pr(y=llx,
z)=F(&-,+z’8,-x’P)=D(&,+z’&-x’@,
cumulative model M,, IQ. Thus, in that there exists a SM( D, M,, M,).
G. Tutz / Sequemial models in categorical regression
289
where S,, p, &, B are parameters for the cumulative and the sequential model respectively. Since F and D are unique only up to linear transformations we get the existence of constants a, b such that F(u)=D(au+b), holds for u E IR. Let us look at the probability termined by pr(y>21x,
(4 . 1) pr( y > 2 1x, z) which is de-
z)-l-F(S,,+z’S,-x’p) - D(S,,+z’S,-x’j3)).
= (1 - D( 8,, + z’6r -x’/I)>(l Under the assumption that tr,,i( w) c IWlX ik$ constants Z, b exist such that (l-
F(u))~=
holds
it may be b
(1 -F&+6))
(4.2)
holds for u E R. Thus equivalence yields the functional equation
of the cumulative
and the sequential
model
(4.3) After some derivation one can show that functional symmetric solution. Thus we get the following result
equation
(4.3) has no
.2). Let F be symmetric, i.e. F( u) = 1 - F( - u) holds everywhere. Then for the cumulative model CM( F, M,, M,) there is no equivaient sequential model and for the sequential model SM( F, 1M,, M,) there is no equivalent cumujative model. From Proposition (4.2) it is clear why the two models considered in section 4.1 have asymmetric distribution functions. Moreover, they clre solutions of the functional equation for the two interesting cases 5 = 1 and 5 f 1. The case 6 # I For asymmetric solutions of functional C > 1, solutions always have the form F(u) =0
for
O
else,
us
equation
(4.2) it may be shown that, if
-b/(a-1)
and if 6 < 1 solutions have the form F(u) = 1
for u 2 -b/(a
O< F(u) < 1
else.
- 1)
This result suggests an exponential form of the distribution. A more general solution than the exponential distribution is given by F(x) = 1 - exp( -x”), x > owever, for a # 1 the equivalence of the cumulative and the sequential approach for this more general solution holds only for the case of a unidimensional explanatory variable. Wence at = 1 is the more interesting case.
290
G. Tutu / Sequential models in categorical regression
The case 2 = I For 5 = 1 the extreme value distribution F(u) = 1 - exp( (Yexp( u)), (x < 0, seems to be the only distribution function in common use which is a solution of functional equation (4.3). The equivalence of the simple versions for (x = - 1 has been considered in section 4.1 and can be extended to the slightly more general function with (Yf - 1. 4.4. A further example Apar from the few cases where sequential and cumulative models are equivalent one has to decide which type of model to use. Williams and Grizzle (1972) consider an example where the influence of years lived in group quarters (0, 1 - 4, 2 5) and of the location (bowery, camp, park slope) upon the extent of current drinking (abstention, moderate, heavy) is investigated. Instead of the scoring methods used by Williams and Grizzle (1972) cumulative and sequential models are fitted to the data. Table 7 gives the results. From Table 7 it may be seen that the cumulative logit and exponential model yield a comparable fit if location (L) and years in quarters (Y) are considered as determing the thresholds. However, the simple versions of the models where the variables are global differ strongly. The exponential model yields a bad fit of the data. Thus McCullagh’s (1980) assumption that simple cumulative models are qualitatively similar does not hold for this data set. However. as Table 7 shows the sequential model is very robust against the choice of link function. The goodness of fit is about the same for all the models of differing complexity of explanatory variables. Sequential models seem to be less sensible to the choice of link. The reason behind might be that the sequentiai model actually models only transitions, i.e. dichotomous responses.
Table 7 Fitted models tar alcohol data.
Cumulative
Logit Exponential
Model
Deviance
df
LIP L, y
5.091 8.671 5.308 23.881
8 12 8 12
5.371 6.402 10.616 5.310 6.624 11.436
8 12 14 8 12 14
L,27
x2
Y2
L. v Sequential
Logit
Ll2,
Y2
L, y L Exponential
LIZ9
L, y L
y12
G. Tutz / Sequential m&h
in categorical regression
291
5.4. Collapsibility Since the categories of an ordinal response variable are sometimes slightly arbitrary it is preferable that grouping of response categories does not change the conclusions drawn from fitted models. Let the response categories ( 1,. . . , k ) be partitioned into subsets S, = ( k;._ 1 + 1,. . . , k, >, i = 1,. . . , t, where k, = 0, k, = k. Consider the simpler response variable jj=i
if yE$.
A model will be called collapsible with reference to a parameter if the parameter is unchanged by the transition from y to the coarser version jY For the general cumulative model one gets immediately pr(j%iIz,
X) = F( S,,, + z’i& - x’b).
Thus the cumulative model CM( F, M,, M,) is collapsible with reference to the latent variable parameter /3. Moreover it is collapsible for the threshold parameters S where of course some rearranging has to be kept in mind. In general the sequential model is not collapsible. Of course, exceptions are the simple grouped Cox model and the general exponential model considered in section 4. The simple grouped Cox model is sequential and cumulative. Thus /3 is unchanged by grouping of categories. For the sequential exponential model pr(y=rlz,
x)=1-exp(&,i-z’&.-x’p),
one gets after grouping pr( 7 = i 1z, x) = 1 -
exp(&, +
Z’~i- X’pi), where
The model is net collapsible -with reference to the pararmeters. But the parameters 8; may be interpreted as the sum of weights & over the categories of S,. The parameter j? becomes a threshold parameter which by the weight 1S, 1 accumulates the transitions within S,. The cumulative model clearly is more stable if categories of the response variable are grouped. For the sequential model only in special cases parameters of the model after grouping have a strong connection to the parameters of the original model. IIowever the importance of collapsibility shtiuld not be overrated. It is merely a secondary criterion for ordinal variables. If a response partem is changed by co;hapsing categories the influence of explanatory variables may also change (see also Anderson (1984, p. 2))
G. Tut: / Sequential models in categorical regression
292
5.2. Stereotype model and indistinguishability
of categories
Anderson (1984) introduced the so-called stereotype the simple one-dimensional form is given by
regression
model which in
k-l
pr(y=+)=exp(&,
I + C exP(&j-$JiP’x)
(5.1)
i=l
1, . . . , q = k - 1. In order to get an ordered regression @r,..., +k must fullfil the constraints r=
1 = $31> . . . ) C/Q= 0.
model the parameters (5 . 2)
However, the constraints
(5.2) are not imposed a priori. Consequently, for most of Anderson’s (1984) examples the estimates & art= not monotone. The need for different algorithms when (5.2) is used was pointed out by Fienbfrg and Bancroft (see Discussion of Anderson (1984)). But when the estimation +i are allowed to yield the ordering of categories the order is a result of the model. This approach is quite different from approaches to make use of the ordinal scale level. For the models considered here the order of categories is given and is to be exploited in order to get simple models. By contrast the stereotype approach establishes on order of categories a posteriori. Anderson (1984) considers an important concept called indistinguishability. Response categories are indistinguishable if x is not predictive between these categories For the one-dimensional model (5.1) categories r and s are indistinguishable if c#+= c#+.That means pr( y = r 1x)/pr( y = s 1x) does not depend on x. A similar concept of indistinguishability may be considered within the cumulative or sequential framework. Let us assume the sequential model with category specific variables pr(y=rlykr,
z)=D@~,+z’6,)
holds. Then z is not predictive for consecutive categories if & = 0. That means the transition from category r to category r + 1, given category r is reached, is not determined by the explanatory variables z. The proportion pr(y=rly2r,
z)/pr(y>rIy2r,
z)
does not depend on z. The hypothesis S, = 0 may be easily tested within the framework of generalized linear models. For the breathing data the hypothesis that borderline and abnormal are indistinguishable with respect to the explanatory variables corresponds to mod&l A,, S,, (A*S), which fits badly with devia,nce 10.97 on 5 degrees of freedom.
The models considered here may be estimated and tested within the framework of generalized linear models. owever, the models are not simple (or univariate)
G. Tutz / Sequential models in cate,aorical regression
293
generalized linear models, but require multivariate specifications. Since most authors consider generalized linear models as univariate response models, the concept is shortly sketched. Let the observations be given by ( yi, xi), i = 1, . . . , n where yl’ = ( yI1,. . . , y,,) is the response vector and xi = (xii, . . . , xi,) are the explanatory variables. The stochastic part of a multivariate generalized linear model is given by the assumption of a natural exponential family
f (ri
141 = c(Yij exP(eilv, - b(ei))~
where 3 E W is the natural parameter. The structural dependent and independent variables is given by
part which brings together
where g: W --) R4 is the link function, r_l(0;) = E( y, 1x,) is the expectation of y, given Xi, Xi is a design matrix composed from the explanatory vector xi and y is a parameter vector. If yi = (yii,. . . , y;,> is multinomially distributed, which is the case considered here, we have E(yilxi)=(7r1,..., Q’ where 7~, is the probability of category r and ?rk= I -7r.i + . . . +w~ is the probability for the reference category. As an example let us consider the general sequential logit model
It is easily seen that the link function g,(7’r*,..., 7$j =log(n,/l-7ri
g = ( g,, . . . , g,) is given by
- . . . -1z,).
The design ;natrix which is built from the vector (xl, z,!) of explanatory is given by
variables
Thus sequential models as well as cumulative models fit into the framework of (multivariate) generalized linear models. The concept is quite natural for polychotomous response variables since the multinomial distribution is a multivariate distribution. Consistency and asymptotic normality for ML estimates as well as asymptotic results for the deviance and Pearson statistic have been given by Fahrmeir and Kaufmann (1985), Fahrmeir (1987). Existence and uniqueness of ML estimates has been investigated by the late . Kaufmann (1988). For thz cumulative model alternative app ches to parameter estimation have been based on the concept of composite li t of the thresholds as nuisance parameters (1981)) or the tre e specification of ver, the multiv Nelder (1989)). models allows for more general models.
294
G. Tutz / Seqtiential models in categorical regression
ble which are based on this concept. The computations for this paper have been done by GLAMOUR (Fahrmeir et al. (1990)), a statistical package for generalized linear models which explicitely allows to choose link functions for ordinal data and to distinguish between category specific and global explanatory variables.
7. Concluding remarks The sequenLtii 91 models have a strong connection to discrete hazard models as considered by Kalbfleisch and Prentice (1980), Aranda-Ordaz (1983), Mantel and Hankey (1978), Thompson (1977), Tibshirani and Ciampi (1983) and Hamerle and Tutz (1989). However, in the context of survival analysis the derivation and consequently the models are somewhat different from the models considered here. If sequential models are derived from continuous hazard models (Kalbfleisch and Prentice (1980)) only global variables are included since variables which determine the continuous hazard rate can not depend on categories. Another strong connection of the models considered here is to latent trait models as used in psychometrics. In psychometrics explanatory variables are substituted for latent traits of persons and items. Apart from difficulties in estimation and testing arising from the problem that no repeated observations are available the models are often very similar. Samejima’s (1972) model is almost identical to the cumulative logit model. Sequential models have been considered by Molenaar (1983) and Tut.z (1990). It is always preferable to have a simple model which fits the data well. Hence a model including only global explanatory variables is preferable in particular for the cumulative type model. But especially if the number of response categories is large it is not to be assumed that for each sequential logit the effect of explanatory variables is homogeneous. In those cases category specific variables often are more appropriate.
eferences Agresti, A. (1984). Analysis of ordinal categorical data. New York: Wiley. Amemiya, T. (1375). Qualitative response models. Annals of Economic and Social Measurement, 413, 363- 372. Anderson, J.A. (1984). Regression anj ordered categorical variables. J. R. Statist. Sot. B, Anderson, J.A., Phillips, R.R. (1981). Regression, discrimination and measurement models for ordered categorical variables. Applied Statistics, 30. Aranda-Ordaz, F.J. (1983). An extension of proportional hazards model for grouped data. Biometrics, 39. 110-118. Cox. D.R. (1972). Regression models and life tables (with discussion). J. R. Statist. Sot., B, 34, 187-220. Edwards, A.L., Thurstone, L.L. (1952). An internal consistency check for scale values determined by the method of successive intervals. Psychometrika, 17, 169-180. Fahrmeir, L. (1987). Asymptotic testing theory for generalized linear models. Statistics, 18, 65-76.
G.
Tut,- / Sequential models in categorical regression
295
Fahrmeir, L., Frost, H., Hennevogl, W., Kaufmann, H., ‘Kranert, T. Tutz, G. (1990). GLAMOURuser and example guide. University of Regensburg. Fahrmeir. L., Kaufmann, H. (1985). Consistency and asymptotic normality of the maGmum likelihood estimator in generalized linear models. The Annals of Statistics, 13, 342-368. Farewell, V.T. (1982). A note on regression analysis of ordinal data with variability of classification_ Biometrika, 69, 533-538. Forthofer, R.N., Lehnen, R.G. (1981). Public Program Analysis, A New Categorical Data Approach. Belmont, Calif.: Lifetime Learning Publications. Greenland, S. (1985). An application of logistic models to the analysis of ordinal responses. Biom. Journal, 27, 189- 197. Hamerle, A., Tutz, G. (1989). Diskrete Modelle zur Analyse von Verweildauem und Lebenszeiten. Frankfurt: Campus Verlag. Holmes, M.C., Williams, K. (1954). The distribution of carriers of Streptococcus pyogenes among 2413 healthy children. J. Hyg. Camb., 52, 165-179. Jansen, J. (1990). On the statistical analysis of ordinal data when extravariation is present. Applied Statistics, 39, 75-84. Kalbfleisch, J., Prentice, R. (1980). The statistical analysis of failure time data. John Wiley & Sons, New York. Kaufmann, H. (1988). On existence and uniqueness of maximum likelihood estimates in quanta1 and ordinal response models. Metrika, 35, 291-313. LZar& E., Matthews, J.N. (1985). The equivalence of two models for ordinal data. Biometrika, 72, 206-207. Manski, C. (1988). Identification of binary response models. J. Amer. Statist. Assoc., $3, 729-738. Mantel, N., Hanlcey, B.F. (1978). A logistic regression analysis of response-time data where the hazard function is time-dependent. Comm. Statist.. A7, 333-347. McCullagh, P. (1980). Regression models for ordinal data. J. R. Statist. Sot. B, 42, 109-142. McCullagh. P., Nelder, J. (1989). Generalized linear models (second edition). London: Chapman and Hall. Nelder, J.A., Wedderbum, R. (1972). Generalized linear mode!s. _r. RF;. S!un!&. See, A, 135. 370-384. Quednatt, H.D. (1988). An extended threshold model for analyzing ordered categorical data. Biom. Journal, 30, 147-155. Randall. J.H. (1989). The analysis of sensory data by generalized linear models. Biometrical Journal, 31. 781-793. Terza, J.V. (1985). Ordinal probit. a generalization. Commun. statist. -theor. meth., 14, l-l 1. Thompson, W.A. (1977). On the treatment of grouped observations in life studies. Biometrics, 3% 463-470. Thompson, R., Baker, R.J. (1981). Composite link functions in generalized linear models. Applied Statistics, 30, 125-131. and additive-hazards models for Tibshirani, R., Ciampi, A. (1983). A family of proportionalsurvival data. Biometrics, 39, 141-147. Tutz, G. (1989). Compound regression models for categorica! ordinal data Biometrical Jownd 3% 259-272. Tutz, G. (1990). Sequential item response models with an ordered response. British Jourmd OfStat& Math. Psychology (in print). Williams, Q.D., Grizzle, J.E. (1972). Analysis of contingency tables having ordered response categories. JASA, 67, 55-63.