Journal of Statistical Planning and Inference 64 (1997) 83-92
ELSEVIER
journal of statistical planning and inference
Likelihood-based confidence limits for the common odds ratio S.R. Paul *, S. Thedchanamoorthy Department of Mathematics and Statistics, University q[ Windsor, Windsor, Ont., Canada N9B 3P4 Received 4 June 1996; revised 28 October 1996
Abstract In this paper we derive two likelihood-based procedures for the construction of confidence limits for the common odds ratio in K 2 × 2 contingency tables. We then conduct a simulation study to compare these procedures with a recently proposed procedure by Sato (Biometrics 46 (1990) 71-79), based on the asymptotic distribution of the Mantel Haenszel estimate of the common odds ratio. We consider the situation in which the number of strata remains fixed (finite), but the sample sizes within each stratum are large. Bartlett's score procedure based on the conditional likelihood is found to be almost identical, in terms of coverage probabilities and average coverage lengths, to the procedure recommended by Sato, although the score procedure has some edge, in some instances, in terms of average coverage lengths. So, for 'fixed strata and large sample' situation Bartlett's score procedure can be considered as an alternative to the procedure proposed by Sato, based on the asymptotic distribution of the Mantel Haenszel estimator of the common odds ratio. © 1997 Elsevier Science B.V. A M S class!lication: 62F25 Keywords: Bartlett's score procedure; Conditional likelihood; Sato's procedure
1. Introduction The importance of the common odds ratio in the epidemiological case-control and follow-up studies is well known (Mantel and Haenszel, 1959; Breslow, 1981; Hauck et al. 1982; Robins et al., 1986; Sato, 1990. etc.). Much work has been done on testing the assumption of common odds ratio (Mantel et al. 1977; Tarone, 1985; Liang and Self, 1985; Paul and Donner, 1989, 1992, etc.) and on testing if the common odds ratio is one (Mantel and Haenszel, 1959; Breslow and Liang, 1982; Jewell, 1984, etc.). A number of procedures for the construction of confidence limits of the common odds ratio has been proposed based on various large sample variance estimators of the Mantel-Haenszel estimator of the common odds ratio (Hauck and Wallmark, 1983; * Corresponding author, email:
[email protected]. 0378-3758/97/$17.00 (~ 1997 Elsevier Science B.V. All rights reserved PHS0378-3758(96)00214-5
84
S.R. Paul, S. Thedchanamoorthy / Journal of Statistical Planning and Inference 64 (1997) 83-92
Robins et al., 1986; Sato, 1990, etc.). Very little work has been done to construct and compare likelihood-based confidence interval procedures for the common odds ratio. The purpose of this paper is to derive and compare two likelihood-based procedures for the construction of confidence limits for the common odds ratio. In particular, we derive a procedure based on the likelihood score (Bartlett, 1953) and a procedure based on the likelihood ratio, each using both the unconditional and the conditional likelihood. These four procedures are then compared, through simulation, with a recently recommended procedure by Sato (1990), which is based on the asymptotic distribution of the Mantel-Haenzel estimator of the common odds ratio. The notations, assumptions and the likelihoods are given in Section 2. Section 3 provides the derivation of the likelihood-based procedures and a description of Sato's procedure and results of a simulation study are reported in Section 4.
2. Notation, assumptions and the fikefihoods Consider a series of K independent 2 x 2 contingency tables, with the data in the kth table denoted as Xlk N l k - Xlk Nlk
Y2k N 2 k - X2k Nzk Tk N ~ - T k N k We assume that each X/k is binomial with parameters Pik and Nik. The odds ratio in the kth table is
~tlc = (PlkQ2k )/(QlkP2k ), where Qik = 1 - Pik, i = 1,2 and k = 1,2 ..... K. Further, we assume that ffl = ~2 = . . . . f f k = O , where 0 < i f < oo. Now, denote ~ = log~ as the common log-odds ratio and Pk = log(Plk/Qlk). Then, the unconditional log-likelihood, denoted by ll is given by K
Ii = C1 + ~ ( T k p k +
( N 2 k -- X 2 k )7 - -
Nlk log( 1 + epk) -- N2k log(e ~ + epk)
k=l
and the conditional log-likelihood is K
12 = C2 + ~
(X/k7 - log fk(7)),
k=l
where C1 and C~ are constants independent of the parameters and fk(v)
bk ( N l k ) {
Nzk
"~e,X,k'
Ylk~ak
where ak = max(0, Tk -- N2k) and bk = min(Tk,Nlk).
S.R. Paul. S. Thedchanamoorthy/ Journal of Statistical Planning and Inference 64 (1997) 83 {)2 ~5 3. The confidence limits
In this section, we derive the four likelihood-based procedures and give a brief description of the procedure, based on the Mantel-Haenszel estimator of the common odds ratio and an estimator of its asymptotic variance proposed by Sato (1990). First, we discuss Bartlett's procedure and the likelihood ratio procedure in general notations and then we will derive the specific results using the unconditional and the conditional likelihoods. Let 1(0,4)) be the log-likelihood corresponding to the parameter of interest 0, which is a scalar and the vector of nuisance parameters 4) = (4)t, 4)2..... 4)~ )T. Further, define (c321) Ioo = e \ - ~ j ,
1o¢ = E
(
02l ~ =I~'0 and E( ~2l '~ -~Oe4) J 14~e~=- \ C34)~4)TJ "
Bartlett (1953) claims that the adjusted likelihood score 9l
c31
V(O) =- fro - 1o¢I¢~ ~
can be considered as an alternative to the maximum likelihood estimator of 0 with var(T(0)) = Ioo.+ = Ioo -Io~I¢~1~o and the standardized variable S = T ( O ) / v / ~ ~ to be completely insensitive to the estimate of 4) for given value of 0. He indicates that in small samples, this substitution will affect the distribution of S and if the maximum likelihood estimate ~(0) of 4) for given 0 is used, then this will affect the distribution of T to the order of approximation (~(1/n), where n is the sample size. Thus, an approximate 1 0 0 ( 1 - ~)% confidence interval for 0, when maximum likelihood estimate @0) of 4) for given 0 is used, is obtained by solving
where Z~/2 is the 100(1 - e / 2 ) % point of the standard normal distribution. Let 0 and ~ be the maximum likelihood estimate of 0 and 4). Then, the approximate 100(1 - ~ ) % confidence interval for 0 by the likelihood ratio procedure is obtained by solving 2[/(0, q~) - l{O, ~(0)}] ~
86
S.R. Paul S. Thedchanamoorthy/Journal of Stat&tical Planning and Inference 64 (1997) 83-92
and (Pl,P2,..., PK) T by ~b, obtaining appropriate derivatives and simplifying, the 100(1- ~)% confidence interval for the common log-odds ratio y is obtained by solving
kZ=lDt
/i
F_,,(At - Bt ) = :~z~/2,
(1)
k=l
where Dk = {Neke~k -- X2k(ee + e ~k)}/eT(e~ + ePk),
Ak = N2kePk/eY(e7 + e~k)2 Bk = NZkeP~(1 + epk)2/(e~ + ePk)Z{Nlk(e7 + ePk)2 + N2ke~+Pk(1 + ePk)z} • Eq. (1) can be solved by using the 1MSL subroutine ZBREN. Note that ¢Sk=~t(7) is the maximum likelihood estimate of Pt for given value of ~ and detailed derivations show that
/~t = log
2At
'
where At = (Nk - &), Bk = --Tk(1 + e ~) + Nlker + Net and Ck = - T t e L Denote the lower and upper limits of the 100(1 - :¢)% confidence interval for 7 obtained by solving Eq. (1) by 7BUL and ?Buu. Then, the corresponding lower and upper limits of the 100(1 - c¢)% confidence interval for the common odds ratio using Bartlett's procedure based on the unconditional likelihood are ~BUL = e fBu, and ~Buu = e iBuu.
3.2. Bartlett's procedure based on the conditional likelihood (BC) Here we consider the conditional log-likelihood 12. We have only one parameter of interest, namely ? and no nuisance parameter. So, the approximate 100(1 - ~)% confidence interval for ? is obtained by solving
~7
\ ~72 ] = + z ~ / 2 ,
which, after appropriate derivation can be written as E kK = ~ ( X lk -
E(X~,; ~))
= ±z~/2,
(3)
V/~kK1 var(Xlk; 7) E(Xlk;~) and var(Xlk;7) are the mean and variance of the extended hypergeometric distribution with parameters Nlk, N2k and 7 (see Plackett, 1981). Eq. (3) also can be solved by using IMSL subroutine ZBREN. Denote the lower and upper limits of the approximate 100(1 - ~)% confidence interval for ~ obtained by solving Eq. (3) by
S.R. Paul, S. Thedchanamoorthy / Journal of Statistical Planning and Inference 64 (1997) 83 92 I~7
7BCL and '}'Bcu. Then, the approximate lower and upper limits of the 100(1 - : 0 % confidence interval obtained by using the Bartlett's procedure based on the conditional likelihood a r e IfiBCL = e %~ and ~BCU = e %ct,.
3.3. The likelihood ratio procedure based on the unconditional likelihood ( L U )
Again, we consider the unconditional likelihood Ii, which involves the parameter of interest 0 = 7 and the vector of nuisance parameters ~b = (Pl,p2 . . . . . px) v. Let ~, and ilk, k = 1. . . . . K, be the maximum likelihood estimate of 7 and pk obtained by maximizing the unconditional likelihood l (this can be obtained by using the IMSL subroutine DUMINF). As in Section 3.1 fik = fi~-(7) is the maximum likelihood estimate of &, k = 1. . . . . K, for given 7, which is given by Eq. (2). Then, the approximate 100(1 - ~)% confidence interval for 7 using the likelihood ratio procedure based on the unconditional likelihood is obtained by solving K
2 Z [ T k ( f i k - ilk) + (N2k -- X2k )( 9 -- 7) - N~k log{(1 + e £ )/(1 + fla-)} k=l
-Nzk log{(e 9 + e A )/(e ~' + e ~k )}] ~
(4)
Denote the confidence limits obtained by solving Eq. (4) by ~LUL and ~'LCU" The corresponding confidence limits for ~ are ~LUL = e ~:u,L and qJLUU = e ;LLt'
3.4. The likelihood ratio procedure based on the conditional likelihood ( L C )
Using the conditional likelihood 12, the 100(1 - ~ ) % obtained by solving
confidence interval for 7 is
K ~2 2 Z [ X l k ( 7c - 7) + log{fk(7)/j'k( 9c)}] --:)~l-~(1 ),
(5)
k=l
where )'c is the maximum likelihood estimate of 7 by maximizing 12, which is obtained by solving K
K
Z x'* = Z k=l
(6)
k=l
Eqs. (5) and (6) can be solved by using the IMSL subroutine ZREAL. Denote the lower and upper confidence limits with confidence coefficient 1 -:~ using the likelihood ratio procedure based on the conditional likelihood by ;)LCL and ~'LCU. The corresponding limits for qJ a r e ~LCL = e~ .... a n d I~LCL = ek~,,.
88 S.R. Paul, S. Thedehanamoorthy / Journal of Statistical Planning and Inference 64 (1997) 83-92
3.5. Sato's procedure based on M a n t e l - H a e n s z e l estimator ( S M H )
Let Rk = Xlk(N2k - X2k )/Nk, Sk = X2k(NI~ - Xlk )/Nk, Pk = (X~k + N2k -- X2k )/Nk, K K Qk = 1 - P k , R+ = ~k=~Rk and S+ = ~ k = t Sk. The Mantel-Haenszel estimator of the common odds ratio, ~Mrt is defined by the solution of the following estimating equation (Davis, 1985): R+ - ~S+ = 0. Sato (1990) showed that, under both sparse data (k large and Ark fixed) and largestrata (k fixed and Ark large) settings, the asymptotic distribution of ( R + - ~bS+)2/~, W K is Z2(1), where W = ~-~k=l[(Qk + N Z I ) R k + (Pk + Nk-1)Sk]. Then an approximate 100(1 - ~ ) % confidence interval for ~ can be constructed by solving (R+ -
¢s)
2 _ z~(1).
¢,w The solutions are (~kL, ~9u) =
2R+S+ + Z~2 W -4- V/(4R+S+ + Z2 W)X~2 W 2S 2 ,
(7)
where Z2 is the upper ~ percentage point of the Z2 distribution with 1 degree of freedom. Sato compared, through simulation, a number of procedures based on the Mantel-Haenszel estimator and various estimators of the variance of (R+ - ~ S + ) and recommend the confidence limits obtained from (7).
4. Simulation In this section we report on a simulation study conducted to compare the four likelihood based procedures BU, BC, LU and LC and the procedure SMH, based on the Mantel-Haenszel estimator of the common odds ratio, proposed and recommened by Sato (1990). IMSL random number generator RNBIN was used to simulate binomial variables. The range of values for the parameters K, Nlk, N2k and Plk used in the simulation studies were chosen to be representative of situations which arise in epidemiologic practice. In the simulation study, for each of the K = 5 and K = 10 strata, the sample sizes chosen were (N~k,N2k) = (20,20),(10, 10),(5,5),(5,20). The values of probabilities P~k chosen were Plk = 0.05 + O.04k(20/k) (see, Robins, et al., 1986) and the values of ~ chosen were ~ = 1,3.5,6.5. For all the procedures and for each combination of K, (Nlk,N2k), P~k and ~ we produced the coverage probabilities and the average coverage lengths based on 1000 samples. Using conventional rule, we added 0.5 to each observed frequency in any simulated table where a zero observed frequency occured. The coverage probabilities and the average coverage lengths for the designs considered are given in Tables 1 and 2.
S.R. Paul, S. ThedchanamoorthylJournal
of Statistical Planning and InJerence 64 (1997) 83 92 89
Table 1 100x coverage probabilities and average coverage lengths of confidence intervals by the methods BU, BC, LU, LC and SMH; Plk = 0.05 + O.04k(20/K), K - 5, :~ - 0.05; based on 1000 replications Ntk
20
10
5
N2k
20
10
5
~
Method
Coverage probability
BU BC LU LC SMH
94.9 95.0 95.0 95.1 94.7
1.4 1.0 1.5 1.4 1.4
3.5
BU BC LU LC SMH
95.3 96.2 95.4 95.6 94.7
5.8 5.5 6.1 5.7 5.5
6.5
BU BC LU LC SMH
96.5 95.8 97.0 96.0 94.3
12.3 11.4 13.6 12.0 11.4
1.0
BU BC LU LC SMH
94.4 95.7 94.7 96.4 95.7
2.3 2.2 2.4 2.2 2.3
3.5
BU BC LU LC SMH
96.3 96.4 96.8 97.0 96.0
9.4 8.3 10.5 8.9 8.3
6.5
BU BC LU LC SMH
96.0 95.5 97.0 96,4 95.3
18.6 15.8 21.7 17.1 15.9
1.0
BU BC LU LC SMH
95.8 96.1 96.3 97.9 96.5
3,8 3.4 4.3 3.9 3.5
BU BC LU LC SMH
96.1 957 96.8 99.5 96.0
13.4 10.4 16.1 12.7 11.5
BU BC LU LC SMH
95.9 94.0 97.4 99.6 93.6
22.9 16.9 23.2 22.5 18.7
1.0
3.5
6.5
Average length
90 S.R. Paul S. Thedchanamoorthy/Journal of Statistical Planning and Inference 64 (1997) 83-92 Table 1 (continued) Nlk
5
N2k
20
~
Method
Coverage probability
Average length
1.0
BU BC LU LC SMH
95.3 95.7 95.9 97.2 95.6
2.7 2.7 2.8 2.8 2.4
3.5
BU BC LU LC SMH
95.2 96.0 96.3 97.3 95.6
10.4 9.3 11.6 10.0 9.2
6.5
BU BC LU LC SMH
95.9 95.0 97.6 97.3 95.9
20.9 17.6 23.8 19.3 13.2
Table 2 100x coverage probabilities and coverage lengths of confidence intervals by the methods BU, BC, LU, LC and SMH; elk = 0.05 + 0.04k(20/K), K -- 10, ct = 0.05; based on 1000 replications Nlk
20
10
N2k
20
10
Ip
Method
Coverage probability
Average length
1.0
BU BC LU LC SMH
95.7 95.9 95.8 96.0 95.1
1.0 0.9 0.9 0.9 0.9
3.5
BU BC LU LC SMH
94.9 95.5 95.8 95.8 96.4
3.8 3.6 3.9 3.7 3.6
6.5
BU BC LU LC SMH
96.2 96.3 96.9 96.6 95.5
7.8 7.4 8.4 7.5 7.3
1.0
BU BC LU LC SMH
94.6 95.1 94.4 95.0 95.1
1.4 1.4 1.5 1.4 0.9
3.5
BU BC LU LC SMH
96.1 96.1 96.5 96.1 96.7
5.6 5.0 5.8 5.2 5.1
S.R. Paul, S. Thedchanamoorthy / Journal o f Statistical Planning and Inference 64 (1997) 83 92 9 I Table 2 (continued) Nlk
Nzk
t)
Method
Coverage probability
Average length
BU BC LU LC SMH
97.2 94.9 97.5 95.5 94.6
11.0 9.6 I 1.0 10.0 9.4
BU BC LU LC SMH
96.0 97.3 95.9 97.3 95.3
2.2 1.9 2.3 2.1 2.1
BU BC LU LC SMH
97.1 95.3 97.3 96.6 94.6
7.4 6.0 7.9 6.5 6.4
6.5
BU BC LU LC SMH
94.8 95.3 96.1 92.1 96.2
12.3 9.6 12.3 10.4 10.4
1.0
BU BC LU L(' SMH
96.1 96.3 95.6 96.4 96.9
1.6 1.6 1.6 1.6 1.5
3.5
BU BC LU LC SMH
95.6 96.4 95.6 96.7 96.9
6.4 5.6 6.4 5.7 5.6
6.5
BU BC LU LC SMH
96.5 96.3 96.8 96.8 96.4
12.0 10.4 13.3 10.8 1(1.9
6.5
5
1.0
3.5
20
In t e r m s o f b o t h c o v e r a g e p r o b a b i l i t i e s and c o v e r a g e l e n g t h s the p r o c e d u r e s BC and S M H are a l m o s t identical, a l t h o u g h in s o m e i n s t a n c e s , for e x a m p l e , for Niz. = 5,
Nzk = 20, ~ ~ 6.5 a n d K =
10, the B C p r o c e d u r e has s o m e e d g e in t e r m s o f av-
e r a g e c o v e r a g e lengths. T h e B a r t l e t t ' s score p r o c e d u r e BC, b a s e d on the c o n d i t i o n a l likelihood, w h i c h r e q u i r e s s o l v i n g o n l y t w o s i m p l e e q u a t i o n s , can b e c o n s i d e r e d as an alternative to the S M H p r o c e d u r e b a s e d on the M a n t e l - H a e n s z e l e s t i m a t o r o f the c o m m o m o d d s ratio.
92 S.K Paul S. Thedchanamoorthy /Journal of Statistical Plannind and Inference 64 (1997) 83-92
Acknowledgements This research was partially supported by the Natural Sciences and Engineering Research Council of Canada. References Bartlett, M.S. (1953). Approximate confidence intervals II. More than one unknown parameters. Biometrika 40, 306-317. Breslow, N.E. (1981). Odds ratio estimator when the data are sparse. Biometrika 68, 73-84. Breslow, N.E. and K.Y. Liang (1982). The variance of the Mantel-Haenszel estimator. Biometrics 38, 943-952. Davis, L.J. (1985). Weighted averages of the observed odds ratios when the number of tables is large. Biometrika 72, 203-205. Hauck, W.W., S. Anderson and F.J. Leahy (1982). Finite sample properties of some old and some new estimators of a common odds ratio from multiple 2x2 tables. J. Amer. Statist. Assoc. 77, 145-152. Hauck, W.W. and C.B. Wallmark (1983). A comparison of confidence interval methods for a common odds ratio. Presented at the August 1983 Joint Statistical Meetings in Toronto. International Mathematical and Statistical Libraries (1991). 1MSL Manual, Houston, TX. Jewell, N.P. (1984). Small sample bias of point estimators of the odds ratio from matched sets. Biometrics 40, 421-435. Liang, K.Y. and S.G. Self (1985). Tests for homogeneity of odds ratio when the data are sparse. Biometrika 72, 353-358. Mantel, N., C. Brown, and D.P. Byar (1977). Tests for homogeneity of effect in an epidemiologic investigation. Amer. J. Epidemeol. 106, 125-129. Mantel, N. and W.M. Haenszel (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J. National Cancer lnst. 22, 719-748. Paul, S.R. and A. Donner (1989). A comparison of test of homogeneity of odds ratios in K 2x2 tables. Statist. Med 8, 1455-1468. Paul, S.R. and A. Donner (1992). Small sample performance of tests of homogeneity of odds ratios in K 2 x 2 tables. Statist. Med. 11, 159-165. Plackett, R.L. (1981). Analysis of Categorical Data. MacMillan, New York. Robins, J., N.E. Breslow and S. Greenland (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse-data and large-strata limiting models. Biometrics 42, 311-323. Sato, T. (1990). Confidence limits for the common odds ratio-based on the asymptotic distribution of the Mantel-Haenszel estimator. Biometrics 46, 71-79. Tarone, R.E. (1985). On heterogeneity tests based on efficient scores. Biometrika 72, 91-95.