Ordinal probit model with random bounds


Economics Letters 33 (1990) 239-244 North-Holland

ORDINAL PROBIT MODEL WITH RANDOM BOUNDS *

Denis BOLDUC and Erik POOLE

Université Laval, Quebec, Que. G1K 7P4, Canada

Received 10 February 1989
Accepted 11 November 1989

We derive a general ordinal probit model by specifying the break points or bounds as random functions of explanatory variables. The proposed technique makes it possible to reduce classification errors caused by imperfections in the data collection process. The standard model with constant break points or bounds is expressed as a special case of the proposed model.

1. Introduction

McKelvey and Zavoina (1975) make a strong case for using an ordinal probit model to study ordinal data assumed to indicate a latent, interval-level dependent variable. Maddala (1983) recapitulates the model and suggests a number of significance tests and pseudo-R² statistics. The procedure has, however, a major weakness: it constrains all observations of the dependent variable to respect the same break points or bounds. Ideally, these bounds should vary from observation to observation.

Looked at another way, imperfections in the data collection process may result in what Johnson and Creech (1983) call misclassification error. Random classification error should give the same results as random measurement error. Non-random misclassification may, however, reflect efforts of the respondents in a study to adhere to recognized norms, or may be rooted in some other socially conditioned response bias.

In an attempt to account explicitly for classification error, we propose to individualize the bounds by replacing them with a linear functional form with a random error term. In the limiting case, the bounds are constants. We explore two variants of the same model. First, we postulate a traditional normal distribution for the bounds, under the assumption that overlapping distributions of the bounds should not pose any difficulties. Second, we assume truncated normal distributions, to prevent the bounds from crossing each other.

* The authors thank Danny B&urger and Bernard Fortin for helpful discussions.

0165-1765/90/$3.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)

2. Standard ordinal probit model

The unobserved or latent model is essentially

$$y_n^* = \beta' x_n + u_n, \qquad u_n \sim \mathrm{NIID}(0, 1), \tag{1}$$

where n (n = 1, 2, ..., N) indexes a single observation. A standardized error term is postulated as an identifying restriction. Suppose the continuous random variable $y_n^*$ manifests itself in a limited number of discrete ordinal observations. Working from the underlying theoretical model to the empirical model, we conceptualize these manifested discrete responses as indicating a defined range of underlying real responses. The standard approach is to assume that the response ranges are limited by fixed separating parameters or bounds $\alpha_j$:

$$y_n = \begin{cases} 1 & \text{if } \alpha_0 \le y_n^* < \alpha_1 \\ 2 & \text{if } \alpha_1 \le y_n^* < \alpha_2 \\ \vdots & \\ J & \text{if } \alpha_{J-1} \le y_n^* < \alpha_J \end{cases} \tag{2}$$

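Note that by convention $\alpha_0 = -\infty$ and $\alpha_J = \infty$. The preceding is the original model as described by McKelvey and Zavoina (1975). We emphasize that every case under study shares the same set of bounds. The probability of the realization $y_n = j$ is

$$\Pr(y_n = j) = \Pr(\alpha_{j-1} \le \beta' x_n + u_n < \alpha_j) = \Phi(\alpha_j - \beta' x_n) - \Phi(\alpha_{j-1} - \beta' x_n), \tag{3}$$

where

$$\Phi(t) = \int_{-\infty}^{t} \phi(s)\, ds \quad \text{and} \quad \phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}.$$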

The likelihood equation and its natural logarithm, as well as the first and second derivatives of the log-likelihood function are given in Maddala (1983, pp. 48-49). The parameters are estimated using Newton-Raphson or other similar non-linear methods.
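As an illustration of eq. (3) and the resulting log-likelihood, the following is a minimal sketch in Python using only the standard library. The function names `category_probs` and `log_likelihood`, and the choice to pass the index $\beta' x_n$ as a precomputed number, are assumptions made for this example and do not come from the paper.

```python
import math


def Phi(t):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def category_probs(xb, alphas):
    """Return [Pr(y_n = 1), ..., Pr(y_n = J)] for one observation.

    xb     : the index beta'x_n (a float)
    alphas : interior bounds alpha_1 < ... < alpha_{J-1}
    """
    # Convention from the text: alpha_0 = -inf and alpha_J = +inf.
    cuts = [-math.inf, *alphas, math.inf]
    cdf = [Phi(c - xb) for c in cuts]
    # Eq. (3): difference of the CDF at adjacent bounds.
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cuts))]


def log_likelihood(ys, xbs, alphas):
    """Sum of ln Pr(y_n = observed category); ys are coded 1..J."""
    return sum(math.log(category_probs(xb, alphas)[y - 1])
               for y, xb in zip(ys, xbs))
```

In practice this function would be handed to a numerical optimizer over $\beta$ and the bounds; the sketch only evaluates it at given parameter values.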

3. Random bounds

We propose to specify individual-specific random bounds which slide on the real continuum of the underlying or latent dependent variable so as to accommodate the imperfections contained in the ordinal discrete response. The individualized bounds are random functions of variables containing information that explains the cognitive or other processes which affect the position of the discrete responses. Choosing a simple linear functional form for the bounds, we have

$$\alpha_{j,n} = \theta_j' z_n + \varepsilon_{j,n}, \qquad \varepsilon_{j,n} \sim N(0, 1), \qquad j = 1, 2, \ldots, J - 1. \tag{4}$$

The response $y_n = j$ occurs when $\alpha_{j-1,n} \le \beta' x_n + u_n < \alpha_{j,n}$. As an identifying restriction, we assume standardized errors for the bounds. We postulate the underlying probability model of the random bounds to be exogenous and linear in the explanatory variables. $\theta_j$ ($j = 1, \ldots, J - 1$) is an ($L \times 1$) vector of parameters specific to the jth break point. $z_n$ is an ($L \times 1$) vector of socio-economic variables used to model the bound's position on the real line. To allow for the standard model, where the bounds are constant across individuals, we include a constant one as the first element of the $z_n$ vector. Testing the exclusion of variables $z_{2,n}, z_{3,n}, \ldots, z_{L,n}$ amounts to a test of the standard ordinal probit model specification. As we shall see, the bounds are estimated directly, and their specification should either have undergone previous empirical verification or depend on the explicit intuitive judgement of the researcher.

Fig. 1. [Real line of the latent response $y^*$, from $-\infty$ to $+\infty$, with the constant bounds $\alpha_1$, $\alpha_2$ and $\alpha_3$ marked.]

Figure 1 illustrates the problem in the space of the standard ordinal probit model with four discrete, ordered responses. Imagine we have observations for two individuals. The first individual's true or underlying response $y_n^*$ is situated on the real number line between the constant bounds $\alpha_2$ and $\alpha_3$. This individual obliges those employing the standard ordinal probit model by manifesting the 'correct' discrete response, i.e., y = 3. A second individual shares the same latent response $y_n^*$ as the first individual but manifests a discrete response of y = 4. Such an apparent discrepancy is possible in the context of individual-specific bounds, whereas it is inconsistent with the fixed bounds of the standard model. In section 4, we look at the model with overlapping tails in the distribution of the bounds.

4. Random bounds with traditional normal distributions

Using eqs. (1) and (4), the probability that the indicator $y_n = j$ manifests itself is written as follows:

$$\begin{aligned} \Pr(y_n = j) &= \Pr(\theta_{j-1}' z_n + \varepsilon_{j-1,n} \le \beta' x_n + u_n < \theta_j' z_n + \varepsilon_{j,n}) \\ &= \Pr(u_n - \varepsilon_{j,n} \le \theta_j' z_n - \beta' x_n) - \Pr(u_n - \varepsilon_{j-1,n} \le \theta_{j-1}' z_n - \beta' x_n). \end{aligned} \tag{5}$$

Since $u_n$ and $\varepsilon_{j,n} \sim \mathrm{NIID}(0, 1)$ $\forall j, n$, postulating $\mathrm{Cov}(u_n, \varepsilon_{j,n}) = 0$ implies that $\mathrm{Var}(u_n - \varepsilon_{j,n}) = 2$. Therefore

$$\Pr(y_n = j) = \Phi\!\left(\frac{\theta_j' z_n - \beta' x_n}{\sqrt{2}}\right) - \Phi\!\left(\frac{\theta_{j-1}' z_n - \beta' x_n}{\sqrt{2}}\right).$$

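The likelihood equation and its natural logarithm are thus:

$$L = \prod_{n=1}^{N} \prod_{j=1}^{J} \left[\Phi\!\left(\frac{\theta_j' z_n - \beta' x_n}{\sqrt{2}}\right) - \Phi\!\left(\frac{\theta_{j-1}' z_n - \beta' x_n}{\sqrt{2}}\right)\right]^{d_{j,n}},$$

$$\ln L = \sum_{n=1}^{N} \sum_{j=1}^{J} d_{j,n} \ln\left[\Phi\!\left(\frac{\theta_j' z_n - \beta' x_n}{\sqrt{2}}\right) - \Phi\!\left(\frac{\theta_{j-1}' z_n - \beta' x_n}{\sqrt{2}}\right)\right], \tag{6}$$

with j = 1, 2, ..., J categories and n = 1, 2, ..., N observations. $d_{j,n}$ is an indicator function such that

$$d_{j,n} = \begin{cases} 1 & \text{if } y_n \text{ falls in the jth category,} \\ 0 & \text{otherwise.} \end{cases}$$

Fig. 2. Random bounds with traditional normal distributions. [Axis from -5 to 5 showing the bound means $E(\alpha_{1,n})$, $E(\alpha_{2,n})$, $E(\alpha_{3,n})$ and the response regions $y_n = 1, \ldots, 4$.]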
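The log-likelihood in eq. (6) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' program: the function names, the plain-list representation of $x_n$, $z_n$, $\beta$ and $\theta_j$, and the guard that returns minus infinity for a non-positive probability are all choices made for this example.

```python
import math


def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def random_bounds_probs(x, z, beta, thetas):
    """Pr(y_n = j) for j = 1..J; thetas = [theta_1, ..., theta_{J-1}]."""
    xb = dot(beta, x)
    # Var(u_n - eps_{j,n}) = 2, so every argument is divided by sqrt(2).
    # The outer terms are 0 and 1 (bounds at -inf and +inf).
    cdf = [0.0]
    cdf += [Phi((dot(th, z) - xb) / math.sqrt(2.0)) for th in thetas]
    cdf += [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


def log_likelihood(ys, xs, zs, beta, thetas):
    """ln L of eq. (6); the indicator d_{j,n} selects the observed category."""
    total = 0.0
    for y, x, z in zip(ys, xs, zs):
        p = random_bounds_probs(x, z, beta, thetas)[y - 1]
        if p <= 0.0:  # bound means out of order for this observation
            return -math.inf
        total += math.log(p)
    return total
```

When $z_n$ contains only the constant one, the probabilities reduce to $\Phi((\theta_j - \beta' x_n)/\sqrt{2})$ differences, which is the standard model of eq. (3) with all parameters rescaled by $1/\sqrt{2}$.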

The first and second derivatives may be calculated following the guidelines in Maddala (1983, pp. 48-49), and the likelihood equations are then solved iteratively using non-linear methods. That the model permits overlapping tails of the random bound distributions does not necessarily conflict with the reality that the model attempts to represent. If the error components of adjacent bounds for one individual act to reverse the order of the bounds, however, the result would be incoherent. As seen in fig. 2, $\alpha_{j,n}$ is postulated to be drawn from a normal distribution with mean $E(\alpha_{j,n}) = \theta_j' z_n$ and variance 1. Overlapping tails could conceivably result in $\alpha_{j,n}$ falling to the right of $\alpha_{j+l,n}$, $l = 1$ or $2$.
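To give a sense of how often such a reversal could occur, the following worked step computes the probability that two adjacent bounds cross. It assumes, beyond what the text states explicitly, that the errors $\varepsilon_{j,n}$ and $\varepsilon_{j+1,n}$ of adjacent bounds are independent of each other; the numerical values are illustrative only.

```latex
% Assumption: eps_{j,n} and eps_{j+1,n} are independent N(0,1).
\alpha_{j,n} - \alpha_{j+1,n}
  = (\theta_j - \theta_{j+1})' z_n + (\varepsilon_{j,n} - \varepsilon_{j+1,n})
  \sim N\!\left((\theta_j - \theta_{j+1})' z_n,\; 2\right),
\qquad
\Pr(\alpha_{j,n} > \alpha_{j+1,n})
  = \Phi\!\left(-\frac{(\theta_{j+1} - \theta_j)' z_n}{\sqrt{2}}\right).
```

Under this assumption, a gap of 1 between the means of adjacent bounds gives a crossing probability of roughly 0.24, while a gap of 3 gives roughly 0.017, so reversals matter mainly when the bound means are close together.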

5. Random bounds with truncated distributions

To prevent overlapping tails, we truncate the distribution of the random bounds. Such a model is illustrated in fig. 3. $y^*$ points to the real ordinal line of underlying responses. Here we have four categories of discrete responses, which are separated by three bounds. The $L_{j,n}$'s represent the truncation points that separate the distributions of each of the random bounds $\alpha_{j,n}$. We assume that the $L_{j,n}$'s are fixed constants located halfway between each pair of means of adjacent bounds, i.e., $L_{j,n} = (\theta_j' z_n + \theta_{j+1}' z_n)/2$. The extreme truncation points are determined under the assumption that the two extreme bounds adjacent to positive and negative infinity have symmetrical distributions.

The model based on random bounds with truncated distributions starts with eqs. (1) and (4), and the probability of the realization $y_n = j$ can be expressed as in eq. (5). However, we send the error terms of the random bounds to the right-hand side of the inequality sign, for reasons which will soon become apparent:

$$\Pr(y_n = j) = \Pr(u_n \le \theta_j' z_n - \beta' x_n + \varepsilon_{j,n}) - \Pr(u_n \le \theta_{j-1}' z_n - \beta' x_n + \varepsilon_{j-1,n}). \tag{7}$$

Fig. 3. [Real line of the latent response $y^*$ showing the truncation points $L_{j,n}$, the bound means $E(\alpha_{j,n})$ and the response regions $y_n = 1, \ldots, 4$.]

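The distribution of the bound $\alpha_{j,n}$ is truncated at $L_{j-1,n}$ and $L_{j,n}$, and we express it as

$$L_{j-1,n} - \theta_j' z_n < \varepsilon_{j,n} < L_{j,n} - \theta_j' z_n, \quad \text{i.e.,} \quad w^1_{j,n} < \varepsilon_{j,n} < w^2_{j,n}. \tag{8a}$$

Similarly,

$$L_{j-2,n} - \theta_{j-1}' z_n < \varepsilon_{j-1,n} < L_{j-1,n} - \theta_{j-1}' z_n, \quad \text{i.e.,} \quad w^1_{j-1,n} < \varepsilon_{j-1,n} < w^2_{j-1,n}. \tag{8b}$$

By conditioning on $\varepsilon_{j,n}$ and $\varepsilon_{j-1,n}$ in the first and second terms of the right-hand side of eq. (7), the probability that $y_n = j$ manifests itself can be calculated as

$$\Pr(y_n = j) = \int_{w^1_{j,n}}^{w^2_{j,n}} \Phi(\theta_j' z_n - \beta' x_n + v)\, \frac{\phi(v)}{\Phi(w^2_{j,n}) - \Phi(w^1_{j,n})}\, dv - \int_{w^1_{j-1,n}}^{w^2_{j-1,n}} \Phi(\theta_{j-1}' z_n - \beta' x_n + v)\, \frac{\phi(v)}{\Phi(w^2_{j-1,n}) - \Phi(w^1_{j-1,n})}\, dv.$$

The advantage of this formulation is that the probability of the realization of $y_n = j$ can be viewed as a difference of expected probabilities, each expectation being taken over the truncated distribution of the corresponding bound error.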
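The difference of expected probabilities can be evaluated numerically. The sketch below is one possible way to do so, assuming Simpson's rule for the integrals, bound means that are strictly increasing, at least two bounds, and extreme truncation points obtained by reflecting the nearest interior truncation point around the first and last bound means (our reading of the symmetry assumption). The function names and the argument layout are hypothetical.

```python
import math


def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def expected_cdf(m, lo, hi, steps=200):
    """E[Phi(m + eps)] for eps ~ N(0,1) truncated to (lo, hi).

    Uses composite Simpson's rule; steps must be even.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * Phi(m + v) * phi(v)
    return (total * h / 3.0) / (Phi(hi) - Phi(lo))


def truncated_bounds_probs(xb, means):
    """Pr(y_n = j), j = 1..J, with xb = beta'x_n and means[j-1] = theta_j'z_n."""
    k = len(means)  # number of bounds, J - 1 (needs k >= 2)
    # Interior truncation points: halfway between adjacent bound means.
    L = [0.5 * (means[i] + means[i + 1]) for i in range(k - 1)]
    # Extreme truncation points by symmetry (an assumption of this sketch).
    L = [2.0 * means[0] - L[0]] + L + [2.0 * means[-1] - L[-1]]
    G = [0.0]  # term for the bound at -inf
    for i in range(k):
        lo = L[i] - means[i]      # lower truncation point of the error
        hi = L[i + 1] - means[i]  # upper truncation point of the error
        G.append(expected_cdf(means[i] - xb, lo, hi))
    G.append(1.0)  # term for the bound at +inf
    return [G[j] - G[j - 1] for j in range(1, len(G))]
```

Because each bound is confined to its own interval, the expected probabilities are increasing in j, so every category probability is positive and the probabilities sum to one.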

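The likelihood equations are

$$L = \prod_{n=1}^{N} \prod_{j=1}^{J} \left[\Pr(y_n = j)\right]^{d_{j,n}},$$

$$\ln L = \sum_{n=1}^{N} \sum_{j=1}^{J} d_{j,n} \ln \Pr(y_n = j),$$

where $\Pr(y_n = j)$ is the difference of integrals derived from eq. (7) above, with j = 1, 2, ..., J categories and n = 1, 2, ..., N observations. $d_{j,n}$ is the indicator function such that

$$d_{j,n} = \begin{cases} 1 & \text{if } y_n \text{ falls in the jth category,} \\ 0 & \text{otherwise.} \end{cases}$$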

The derivatives and computer programs for both models are available on request.

6. Conclusion

This paper proposes a modification to the ordinal probit model that permits the explicit modelling of intermediate variables that may affect the way agents reveal preferences. We postulate a general ordinal probit model which contains the standard ordinal probit as a special case. In many cases, random individual bounds that differ from the mean should cancel each other out. However, the econometric analyst is often confronted with data that have been collected by another researcher or agency, a collection process over which she has no control. Or she may suspect that cultural variables such as education influence the manner in which preferences are expressed, resulting in classification error. In these cases, the techniques we have presented here may be worth considering.

References

Bolduc, D. and P. Fortin, 1988, L'opinion des québécois en matière d'immigration: Une analyse polytomique ordinale, Cahier de recherche 88-02 (Université Laval, Quebec).

Bolduc, D. and E. Poole, 1989, Ordinal probit model with random bounds, Cahier de recherche 89-11 (Université Laval, Quebec).

Johnson, D.R. and J.C. Creech, 1983, Ordinal measures in multiple indicator models: A simulation study of categorization error, American Sociological Review 48, June, 398-407.

Maddala, G.S., 1983, Limited-dependent and qualitative variables in econometrics (Cambridge University Press, Cambridge).

McKelvey, R.D. and W. Zavoina, 1975, A statistical model for the analysis of ordinal level dependent variables, Journal of Mathematical Sociology 4, 103-120.

Pratt, J.W., 1981, Concavity of the log likelihood, Journal of the American Statistical Association 76 (373), 103-106.

Winship, C. and R.D. Mare, 1984, Regression models with ordinal variables, American Sociological Review 49, 512-525.