Applied Mathematical Modelling 30 (2006) 477–488 www.elsevier.com/locate/apm
A higher order Markov model for analyzing covariate dependence

M. Ataharul Islam a,*, Rafiqul Islam Chowdhury b

a Department of Statistics and Operations Research, College of Science, King Saud University, PO Box 2455 Riyadh 11451, Saudi Arabia
b Department of Health Information Administration, Kuwait University, PO Box 31470 Sulaibekhat, 90805, Kuwait

Received 1 July 2004; received in revised form 1 April 2005; accepted 25 May 2005
Available online 12 July 2005
Abstract

In recent years there has been renewed interest in Markov chains because of their attractive properties for analyzing real-life time series and longitudinal data in various fields. Models have been proposed for fitting first or higher order Markov chains. However, there is a serious lack of realistic methods for linking covariates to transition probabilities in order to analyze the factors associated with such transitions, especially for higher order Markov chains. L.R. Muenz and L.V. Rubinstein [Markov models for covariate dependence of binary sequences, Biometrics 41 (1985) 91–101] employed logistic regression models to analyze the transition probabilities of a first order Markov model. That methodology has not yet been generalized to higher order Markov chains. This study aims to provide a comprehensive covariate-dependent Markov model of higher order. The proposed model generalizes the estimation procedure to Markov models of any order. The proposed models and inference procedures are simple, and the covariate dependence of the transition probabilities of any order can be examined without making the underlying model complex. An example based on rainfall data illustrates the utility of the proposed model for analyzing complex real-life problems. The application indicates that higher order covariate-dependent Markov models can be employed conveniently, and that the results can give both researchers and policymakers insight into the factors underlying different types of transitions, reverse transitions and repeated transitions. The estimation and test procedures can be employed for any order of Markov model without making the theory and interpretation difficult for the common user.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Markov model; Higher order Markov chain; Logistic regression; Repeated measures; Binary outcome

* Corresponding author. E-mail address: [email protected] (M. Ataharul Islam).
0307-904X/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.apm.2005.05.006
1. Introduction

The theory and structure of Markov chains have been studied extensively. For detailed treatments, readers are referred to Cox and Miller [1], Kemeny and Snell [2], Chiang [3], and Karlin and Taylor [4]. In recent years there has been renewed interest in Markov chains in different areas of research, concerning both modeling in time series analysis and the analysis of data from longitudinal studies. The use of Markov chain models for discrete-valued time series appears to be restricted by over-parameterization, and several attempts have been made to simplify their application. Raftery [5], Raftery and Tavare [6] and Berchtold and Raftery [7] addressed one such class of problems in estimating transition probabilities. These developments were motivated by the work of Pegram [8]. This area of research, known as the mixture transition distribution (MTD) model, deals with the modeling of high-order Markov chains on a finite state space. These models do not take covariate dependence into account in estimating transition probabilities. Albert [9] developed a finite Markov chain model for analyzing sequences of ordinal data from a relapsing-remitting disease, and Albert and Waclawiw [10] developed a class of quasi-likelihood models for a two state Markov chain with stationary transition probabilities for heterogeneous transitional data.

Several discrete time Markov chain models have been proposed over the decades for analyzing repeated categorical data. Regier [11] proposed a model for estimating the odds ratio from a two state transition matrix. Prentice and Gloeckler [12] proposed a grouped data version of the proportional hazards regression model that yields computationally feasible estimators of the relative risk function. Korn and Whittemore [13] proposed a model that incorporates the previous state as a covariate in analyzing the probability of occupying the current state. Wu and Ware [14] proposed a model that accumulates covariate information as time passes before the event and treats the occurrence or non-occurrence of the event under study during each follow-up interval as the dependent variable. The method can be used with any regression function, such as the multiple logistic regression model or the one suggested by Prentice and Gloeckler [12]. Kalbfleisch and Lawless [15] proposed models for the analysis of panel data under a continuous time Markov process. They presented procedures for obtaining estimates of the transition intensity parameters in homogeneous models. For a first order Markov model, they introduced a model for covariate dependence of log-linear type.

Another class of models has emerged for analyzing transition models with serial dependence of first or higher order on the basis of marginal mean regression structures. Azzalini [16] introduced a stochastic model, more specifically a first order Markov model, to examine the influence of time-dependent covariates on the marginal distribution of the binary outcome variables in serially correlated binary data. Markov chains are expressed in transitional form rather than marginally, and the solutions are obtained such that the covariates relate only to the mean value of the process, independent of the association parameters. Following Azzalini, Heagerty and Zeger [17] presented a class of marginalized transition models (MTM) and Heagerty [18] proposed a class of generalized MTMs to allow serial dependence of first or higher order. These models are computationally tedious and the form of serial dependence is quite restricted [16]. If the regression parameters are strongly influenced by inaccurate modeling of the serial correlation, then the MTMs can lead to misleading conclusions. Heagerty [18] provided derivatives for score and information computations.

In recent years there has been a great deal of interest in the development of multivariate models based on Markov chains. These models have a wide range of applications in reliability, economics, survival analysis, engineering, social sciences, environmental studies, biological sciences, etc. Muenz and Rubinstein [19] employed logistic regression models to analyze the transition probabilities from one state to another, but there is still a serious lack of a general methodology for analyzing the transition probabilities of higher order Markov models. In a higher order Markov model, we can examine characteristics that are revealed by the analysis of transitions, reverse transitions and repeated transitions. It is noteworthy that covariate-dependent higher order Markov models can be used to identify the underlying factors associated with such transitions.

This study aims to provide a comprehensive covariate-dependent Markov model of higher order. A general estimation procedure is developed for Markov models of any order. The proposed model and inference procedures are simple, and the covariate dependence of the transition probabilities of any order can be examined without making the underlying model complex. Another advantage of the model is that the estimation and test procedures, both for a specific parameter of interest and for the overall model, remain easy to apply in practice to any longitudinal data.
2. Covariate dependent first order model

Let us consider a two state Markov chain for a discrete time binary sequence as follows:

\[ \pi = \begin{bmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{bmatrix}. \tag{1} \]

Here, 0 and 1 are the two possible outcomes of a dependent variable, $Y$. The probability of a transition from 0 at time $t_{j-1}$ to 1 at time $t_j$ is $\pi_{01} = P(Y_j = 1 \mid Y_{j-1} = 0)$ and, similarly, the probability of a transition from 1 at time $t_{j-1}$ to 1 at time $t_j$ is $\pi_{11} = P(Y_j = 1 \mid Y_{j-1} = 1)$. For covariate dependence, let us define the following notation:

$X_i' = [1, X_{i1}, \ldots, X_{ip}]$ = vector of covariates for the $i$th person;
$\beta_0' = [\beta_{00}, \beta_{01}, \ldots, \beta_{0p}]$ = vector of parameters for the transition from 0;
$\beta_1' = [\beta_{10}, \beta_{11}, \ldots, \beta_{1p}]$ = vector of parameters for the transition from 1.

Then the transition probabilities can be defined as functions of the covariates as follows:

\[ \pi_{s1}(Y_j = 1 \mid Y_{j-1} = s, X) = \pi_{s1}(X) = \frac{e^{\beta_s' X}}{1 + e^{\beta_s' X}}, \tag{2} \]

where $s = 0, 1$.
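As a minimal sketch of Eq. (2), the following Python snippet evaluates the two logistic transition probabilities and assembles the matrix of Eq. (1) for one covariate vector. The function name `transition_prob` and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math

def transition_prob(beta, x):
    """P(Y_j = 1 | Y_{j-1} = s, X) under the logistic form of Eq. (2).

    beta: coefficients [b_s0, b_s1, ..., b_sp] for origin state s
    x:    covariates  [1, x_1, ..., x_p] (leading 1 for the intercept)
    """
    eta = sum(b * v for b, v in zip(beta, x))
    # Numerically stable evaluation of e^eta / (1 + e^eta)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical coefficients, for illustration only
beta_0 = [-1.0, 0.5]   # parameters for transitions out of state 0
beta_1 = [0.2, 0.5]    # parameters for transitions out of state 1
x = [1.0, 2.0]         # intercept term and one covariate

p01 = transition_prob(beta_0, x)   # P(0 -> 1)
p11 = transition_prob(beta_1, x)   # P(1 -> 1)

# Transition matrix of Eq. (1); each row sums to one
matrix = [[1.0 - p01, p01],
          [1.0 - p11, p11]]
```

Because each row is determined by a single probability, only $\pi_{01}$ and $\pi_{11}$ need to be modeled; the complements follow from the row sums.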
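The likelihood function can be defined as

\[ L = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \prod_{s=0}^{1} \prod_{m=0}^{1} \{\pi_{sm}(X_i)\}^{\delta_{smij}}, \tag{3} \]

where $n_i$ = total number of follow-up observations since entry into the study for the $i$th individual, and $\delta_{smij} = 1$ if a transition of type $s$–$m$ is observed during the $j$th follow-up for the $i$th individual ($\delta_{smij} = 0$ otherwise). The log likelihood function, after substituting (2) in (3), can be expressed as

\[ \ln L = \ln L_0 + \ln L_1, \]

where $L_0$ and $L_1$ correspond to $s = 0$ and $s = 1$, respectively, from (2). Here $\beta_{01}$ and $\beta_{11}$ denote the parameter vectors written as $\beta_0$ and $\beta_1$ in (2). Hence,

\[ \ln L_0 = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left[ \delta_{01ij} \{\beta_{01}' X_i\} - (\delta_{00ij} + \delta_{01ij}) \ln\left(1 + e^{\beta_{01}' X_i}\right) \right] \]

and

\[ \ln L_1 = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left[ \delta_{11ij} \{\beta_{11}' X_i\} - (\delta_{10ij} + \delta_{11ij}) \ln\left(1 + e^{\beta_{11}' X_i}\right) \right]. \]

Differentiating with respect to the parameters and solving the following equations, we obtain the maximum likelihood estimates of the $2(p + 1)$ parameters:

\[ \frac{\partial \ln L_0}{\partial \beta_{01q}} = 0, \quad q = 0, 1, 2, \ldots, p, \]

and

\[ \frac{\partial \ln L_1}{\partial \beta_{11q}} = 0, \quad q = 0, 1, 2, \ldots, p. \]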
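Since $\ln L_0$ and $\ln L_1$ share no parameters, each can be maximized separately as an ordinary logistic regression on the follow-ups that start in state 0 or state 1. The sketch below does this with Newton-Raphson in pure Python on simulated data. The choice of Newton-Raphson, the helper names (`sigmoid`, `solve`, `score_info`, `fit_logistic`) and the "true" coefficients are assumptions made for illustration; the text does not prescribe a particular algorithm.

```python
import math
import random

def sigmoid(eta):
    """Numerically stable logistic function e^eta / (1 + e^eta)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def score_info(beta, rows, ys):
    """First derivatives and negative second derivatives of ln L_s."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, y in zip(rows, ys):
        pi = sigmoid(sum(b * v for b, v in zip(beta, x)))
        for q in range(p):
            score[q] += x[q] * (y - pi)
            for l in range(p):
                info[q][l] += x[q] * x[l] * pi * (1.0 - pi)
    return score, info

def fit_logistic(rows, ys, iters=25):
    """Newton-Raphson iterations started from beta = 0."""
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        score, info = score_info(beta, rows, ys)
        beta = [b + d for b, d in zip(beta, solve(info, score))]
    return beta

# Simulate 500 individuals with 10 follow-ups each (hypothetical parameters)
random.seed(1)
true_beta = {0: [-0.5, 1.0], 1: [0.5, -1.0]}
data = {0: ([], []), 1: ([], [])}   # origin state -> (covariate rows, outcomes)
for i in range(500):
    x = [1.0, random.uniform(-1.0, 1.0)]
    y_prev = random.randint(0, 1)
    for j in range(10):
        pi = sigmoid(sum(b * v for b, v in zip(true_beta[y_prev], x)))
        y = 1 if random.random() < pi else 0
        data[y_prev][0].append(x)
        data[y_prev][1].append(y)
        y_prev = y

# One fit per origin state gives the 2(p + 1) estimates
est = {s: fit_logistic(*data[s]) for s in (0, 1)}
```

Splitting the follow-ups by origin state plays the role of the indicators $\delta_{smij}$: a follow-up contributes to $\ln L_s$ only if the previous outcome was $s$.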
3. Covariate dependent higher order model: extension of Muenz–Rubinstein

Covariate dependent higher order models can be obtained by extending the model for the first order Markov chain. To illustrate the extension, let us consider a second order Markov model. The second order Markov model for time points $t_{j-2}$, $t_{j-1}$ and $t_j$ with corresponding outcomes $Y_{j-2}$, $Y_{j-1}$ and $Y_j$, respectively, is shown below:

\[
\begin{array}{cc|cc}
Y_{j-2} & Y_{j-1} & Y_j = 0 & Y_j = 1 \\ \hline
0 & 0 & \pi_{000} & \pi_{001} \\
0 & 1 & \pi_{010} & \pi_{011} \\
1 & 0 & \pi_{100} & \pi_{101} \\
1 & 1 & \pi_{110} & \pi_{111}
\end{array}
\]
Using the definitions of the vectors of covariates and parameters given in the previous section, we can define the transition probabilities:

\[ \pi_{001}(Y_j = 1 \mid Y_{j-2} = 0, Y_{j-1} = 0, X) = \frac{e^{\beta_{001}' X}}{1 + e^{\beta_{001}' X}}, \tag{4} \]

\[ \pi_{011}(Y_j = 1 \mid Y_{j-2} = 0, Y_{j-1} = 1, X) = \frac{e^{\beta_{011}' X}}{1 + e^{\beta_{011}' X}}, \tag{5} \]

\[ \pi_{101}(Y_j = 1 \mid Y_{j-2} = 1, Y_{j-1} = 0, X) = \frac{e^{\beta_{101}' X}}{1 + e^{\beta_{101}' X}}, \tag{6} \]

\[ \pi_{111}(Y_j = 1 \mid Y_{j-2} = 1, Y_{j-1} = 1, X) = \frac{e^{\beta_{111}' X}}{1 + e^{\beta_{111}' X}}. \tag{7} \]
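Equations (4)–(7) amount to one logistic model per two-step history. A small sketch, assuming a dictionary keyed by the history $(Y_{j-2}, Y_{j-1})$, shows how the four rows of second order transition probabilities could be evaluated for a given covariate vector. The function names and the coefficient values are hypothetical and serve only as an illustration.

```python
import math

def logistic(eta):
    """e^eta / (1 + e^eta); adequate for the moderate values used here."""
    return 1.0 / (1.0 + math.exp(-eta))

def second_order_rows(betas, x):
    """Return {(y_{j-2}, y_{j-1}): (P(Y_j = 0 | history), P(Y_j = 1 | history))}."""
    rows = {}
    for history, beta in betas.items():
        p1 = logistic(sum(b * v for b, v in zip(beta, x)))
        rows[history] = (1.0 - p1, p1)
    return rows

# Hypothetical coefficient vectors [intercept, slope], one per history
betas = {
    (0, 0): [-1.0, 0.5],   # plays the role of beta_001 in Eq. (4)
    (0, 1): [0.0, 0.0],    # beta_011 in Eq. (5)
    (1, 0): [-0.5, 0.5],   # beta_101 in Eq. (6)
    (1, 1): [1.0, 0.5],    # beta_111 in Eq. (7)
}
x = [1.0, 2.0]             # intercept term and one covariate

matrix = second_order_rows(betas, x)
for history in sorted(matrix):
    print(history, matrix[history])
```

Keying the parameters by history makes the extension to higher orders mechanical: a $k$th order model simply uses tuples of length $k$ as keys.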
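It may be noted here that $\pi_{000} + \pi_{001} = 1$, $\pi_{010} + \pi_{011} = 1$, $\pi_{100} + \pi_{101} = 1$, and $\pi_{110} + \pi_{111} = 1$. Then the likelihood function, as a generalization of (3), can be expressed as follows:

\[
\begin{aligned}
L = \prod_{i=1}^{n} \prod_{j=1}^{n_i}
& \left[ \{\pi_{000}(X_i)\}^{\delta_{000ij}} \{\pi_{001}(X_i)\}^{\delta_{001ij}} \right]
  \left[ \{\pi_{010}(X_i)\}^{\delta_{010ij}} \{\pi_{011}(X_i)\}^{\delta_{011ij}} \right] \\
\times & \left[ \{\pi_{100}(X_i)\}^{\delta_{100ij}} \{\pi_{101}(X_i)\}^{\delta_{101ij}} \right]
  \left[ \{\pi_{110}(X_i)\}^{\delta_{110ij}} \{\pi_{111}(X_i)\}^{\delta_{111ij}} \right].
\end{aligned}
\tag{8}
\]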
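The log likelihood function can be written as

\[ \ln L = \ln L_1 + \ln L_2 + \ln L_3 + \ln L_4. \tag{9} \]

Here

\[ L_1 = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \left[ \{\pi_{000}(X_i)\}^{\delta_{000ij}} \{\pi_{001}(X_i)\}^{\delta_{001ij}} \right], \tag{10} \]

and using (4) we obtain

\[ L_1 = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \left[ \left( \frac{1}{1 + e^{\beta_{001}' X_i}} \right)^{\delta_{000ij}} \left( \frac{e^{\beta_{001}' X_i}}{1 + e^{\beta_{001}' X_i}} \right)^{\delta_{001ij}} \right]. \tag{11} \]

The log likelihood function is

\[ \ln L_1 = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left[ \delta_{001ij}\, \beta_{001}' X_i - (\delta_{000ij} + \delta_{001ij}) \ln\left(1 + e^{\beta_{001}' X_i}\right) \right]. \tag{12} \]

The first derivatives with respect to the parameters are

\[ \frac{\partial \ln L_1}{\partial \beta_{001q}} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} \left[ \delta_{001ij} - (\delta_{000ij} + \delta_{001ij}) \frac{e^{\beta_{001}' X_i}}{1 + e^{\beta_{001}' X_i}} \right], \quad q = 0, 1, 2, \ldots, p. \tag{13} \]

Similarly, from $\ln L_2$, $\ln L_3$ and $\ln L_4$, we can show the following:

\[ \frac{\partial \ln L_2}{\partial \beta_{011q}} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} \left[ \delta_{011ij} - (\delta_{010ij} + \delta_{011ij}) \frac{e^{\beta_{011}' X_i}}{1 + e^{\beta_{011}' X_i}} \right], \tag{14} \]

\[ \frac{\partial \ln L_3}{\partial \beta_{101q}} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} \left[ \delta_{101ij} - (\delta_{100ij} + \delta_{101ij}) \frac{e^{\beta_{101}' X_i}}{1 + e^{\beta_{101}' X_i}} \right], \tag{15} \]

\[ \frac{\partial \ln L_4}{\partial \beta_{111q}} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} \left[ \delta_{111ij} - (\delta_{110ij} + \delta_{111ij}) \frac{e^{\beta_{111}' X_i}}{1 + e^{\beta_{111}' X_i}} \right]. \tag{16} \]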
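One way to gain confidence in score expressions such as (13) is to compare them with a numerical derivative of the log likelihood (12). The sketch below does this for a single component model. It assumes that `rows` and `ys` contain only the follow-ups whose history matches the model (so that $\delta_{000ij} + \delta_{001ij} = 1$ for every row) and that `ys` holds the indicator $\delta_{001ij}$. The data and parameter values are made up for illustration.

```python
import math

def log_lik(beta, rows, ys):
    """Eq. (12) restricted to follow-ups with the matching history."""
    total = 0.0
    for x, y in zip(rows, ys):
        eta = sum(b * v for b, v in zip(beta, x))
        total += y * eta - math.log1p(math.exp(eta))
    return total

def score(beta, rows, ys):
    """Eq. (13): sum over follow-ups of x_q * (y - pi)."""
    g = [0.0] * len(beta)
    for x, y in zip(rows, ys):
        eta = sum(b * v for b, v in zip(beta, x))
        pi = math.exp(eta) / (1.0 + math.exp(eta))
        for q in range(len(beta)):
            g[q] += x[q] * (y - pi)
    return g

# Made-up follow-ups: [1, covariate] and the observed outcome Y_j
rows = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7], [1.0, 1.5], [1.0, -0.3]]
ys = [0, 0, 1, 1, 0]
beta = [0.1, -0.4]

analytic = score(beta, rows, ys)

# Central finite differences of the log likelihood
h = 1e-6
numeric = []
for q in range(len(beta)):
    up = beta[:]
    dn = beta[:]
    up[q] += h
    dn[q] -= h
    numeric.append((log_lik(up, rows, ys) - log_lik(dn, rows, ys)) / (2.0 * h))
```

The two vectors should agree to several decimal places; a disagreement would point to a sign or indexing error in the analytic expression.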
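We can solve for the sets of parameters by equating the first derivatives to zero. Each set consists of $(p + 1)$ parameters and hence the total number of parameters to be estimated here is $4(p + 1)$. The second derivatives are:

\[ \frac{\partial^2 \ln L_1}{\partial \beta_{001q}\, \partial \beta_{001l}} = -\sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} X_{il} \left\{ (\delta_{000ij} + \delta_{001ij})\, \pi_{000}(X_i)\, \pi_{001}(X_i) \right\}, \tag{17} \]

\[ \frac{\partial^2 \ln L_2}{\partial \beta_{011q}\, \partial \beta_{011l}} = -\sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} X_{il} \left\{ (\delta_{010ij} + \delta_{011ij})\, \pi_{010}(X_i)\, \pi_{011}(X_i) \right\}, \tag{18} \]

\[ \frac{\partial^2 \ln L_3}{\partial \beta_{101q}\, \partial \beta_{101l}} = -\sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} X_{il} \left\{ (\delta_{100ij} + \delta_{101ij})\, \pi_{100}(X_i)\, \pi_{101}(X_i) \right\}, \tag{19} \]

\[ \frac{\partial^2 \ln L_4}{\partial \beta_{111q}\, \partial \beta_{111l}} = -\sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} X_{il} \left\{ (\delta_{110ij} + \delta_{111ij})\, \pi_{110}(X_i)\, \pi_{111}(X_i) \right\}. \tag{20} \]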
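The second derivatives in (17)–(20) have the familiar logistic regression form, so the negative of each matrix of second derivatives can be inverted to obtain standard errors. The sketch below illustrates this for one component model with an intercept and a single covariate; the tiny data set is constructed so that the result can be verified by hand. The function names are hypothetical, and the restriction to a 2 × 2 matrix is only to keep the example short.

```python
import math

def information(beta, rows):
    """Negative second derivatives, as in Eq. (17): sum of x_q x_l pi (1 - pi).

    `rows` holds only the follow-ups whose history matches the model, which is
    the role of the factor (delta_000ij + delta_001ij) in Eq. (17).
    """
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for x in rows:
        eta = sum(b * v for b, v in zip(beta, x))
        pi = 1.0 / (1.0 + math.exp(-eta))
        w = pi * (1.0 - pi)
        for q in range(p):
            for l in range(p):
                info[q][l] += x[q] * x[l] * w
    return info

def invert_2x2(a):
    """Closed-form inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Four made-up follow-ups [1, covariate]. With outcomes [0, 1, 1, 0] the score
# vanishes at beta = (0, 0), so that point is the maximum likelihood estimate.
rows = [[1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
beta_hat = [0.0, 0.0]

info = information(beta_hat, rows)   # equals the 2 x 2 identity here
cov = invert_2x2(info)               # estimated variance-covariance matrix
se = [math.sqrt(cov[q][q]) for q in range(2)]
```

At $\hat\beta = 0$ every fitted probability is 0.5, so each follow-up carries weight 0.25 and the information matrix reduces to the identity, giving standard errors of 1 for both parameters.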
The inverse of $(-1) \times$ (matrix of second derivatives) provides estimates of the variance–covariance matrix of the respective parameter estimates.

To generalize this to the $k$th order, we need to consider $2^k$ sets of models. For the $k$th order Markov model with outcomes 0 or 1, the outcomes at time points $t_{j-k}, t_{j-(k-1)}, \ldots, t_{j-1}, t_j$ are represented by $Y_{j-k}, Y_{j-(k-1)}, \ldots, Y_{j-1}, Y_j$, respectively. The outcomes are represented in the following matrix:

[Matrix: row $m$ lists the outcomes $s_{m,j-k}, s_{m,j-(k-1)}, \ldots, s_{m,j-1}$ at the $k$ previous time points together with the current outcome $s_{m,j}$.]

In the above matrix, $m = 1, 2, \ldots, 2^k$. The transition probability for the transition type $Y_{j-k} = s_{m,j-k}$, $Y_{j-(k-1)} = s_{m,j-(k-1)}, \ldots, Y_{j-1} = s_{m,j-1}$, $Y_j = s_{m,j} = 1$ is

\[ \pi_{s_{m,j-k},\, s_{m,j-(k-1)},\, \ldots,\, s_{m,j-1},\, s_{m,j}=1}(X) = \frac{e^{g(X)}}{1 + e^{g(X)}}, \tag{21} \]

where

\[ g(X) = \beta'_{s_{m,j-k},\, s_{m,j-(k-1)},\, \ldots,\, s_{m,j-1},\, s_{m,j}=1}\, X. \]

For the sake of brevity, let us denote $\beta'_{s_{m,j-k},\, s_{m,j-(k-1)},\, \ldots,\, s_{m,j-1},\, s_{m,j}=1} = \beta_m'$ and $\pi_{s_{m,j-k},\, s_{m,j-(k-1)},\, \ldots,\, s_{m,j-1},\, s_{m,j}=1} = \pi_m$, $m = 1, 2, \ldots, 2^k$. Then the likelihood function, as a generalization of (3), can be expressed as follows:

\[ L = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \prod_{m=1}^{2^k} \left[ \{\pi_m(X_i)\}^{\delta_{mij}} \{1 - \pi_m(X_i)\}^{1 - \delta_{mij}} \right]. \tag{22} \]
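In practice, fitting the $2^k$ component models starts with sorting each follow-up observation according to its $k$ preceding outcomes. The sketch below enumerates the $2^k$ histories and groups the outcomes of one binary sequence by history. The function names and the example sequence are made up for illustration.

```python
from itertools import product

def histories(k):
    """All 2**k possible histories (y_{j-k}, ..., y_{j-1}) of a binary chain."""
    return list(product((0, 1), repeat=k))

def split_by_history(seq, k):
    """Group the outcomes y_j of one binary sequence by their k preceding states.

    Returns {history: [y_j, ...]}; each list is the response vector for the
    logistic model attached to that history.
    """
    groups = {h: [] for h in histories(k)}
    for j in range(k, len(seq)):
        groups[tuple(seq[j - k:j])].append(seq[j])
    return groups

seq = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]   # made-up sequence, for illustration
groups = split_by_history(seq, 2)
for history, outcomes in groups.items():
    print(history, outcomes)
```

The first $k$ observations of a sequence serve only as history, so a sequence of length $T$ contributes $T - k$ follow-ups to a $k$th order model.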
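The log likelihood function can be written as

\[ \ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_m + \cdots + \ln L_{2^k}. \tag{23} \]

Here

\[ L_m = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \left[ \{\pi_m(X_i)\}^{\delta_{mij}} \{1 - \pi_m(X_i)\}^{1 - \delta_{mij}} \right], \tag{24} \]

and using (21) we obtain

\[ L_m = \prod_{i=1}^{n} \prod_{j=1}^{n_i} \left[ \left( \frac{e^{\beta_m' X_i}}{1 + e^{\beta_m' X_i}} \right)^{\delta_{mij}} \left( \frac{1}{1 + e^{\beta_m' X_i}} \right)^{1 - \delta_{mij}} \right]. \tag{25} \]

The log likelihood function is

\[ \ln L_m = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left[ \delta_{mij}\, \beta_m' X_i - \ln\left(1 + e^{\beta_m' X_i}\right) \right]. \tag{26} \]

The first derivatives with respect to the parameters are

\[ \frac{\partial \ln L_m}{\partial \beta_{mq}} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} \left[ \delta_{mij} - \frac{e^{\beta_m' X_i}}{1 + e^{\beta_m' X_i}} \right], \quad m = 1, 2, \ldots, 2^k \text{ and } q = 0, 1, 2, \ldots, p. \tag{27} \]

The second derivatives for $\ln L_m$ are

\[ \frac{\partial^2 \ln L_m}{\partial \beta_{mq}\, \partial \beta_{ml}} = -\sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{iq} X_{il} \left\{ \pi_m(X_i) \left(1 - \pi_m(X_i)\right) \right\}. \tag{28} \]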
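As a check on the algebra, the step from (25) to (26) and the derivatives (27) and (28) can be spelled out for a single factor of the product. The shorthand $\eta_i = \beta_m' X_i$ is introduced only for this sketch and is not part of the original notation.

```latex
\begin{aligned}
\ln\left[\left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right)^{\delta_{mij}}
         \left(\frac{1}{1+e^{\eta_i}}\right)^{1-\delta_{mij}}\right]
  &= \delta_{mij}\left\{\eta_i - \ln\left(1+e^{\eta_i}\right)\right\}
     - (1-\delta_{mij})\ln\left(1+e^{\eta_i}\right) \\
  &= \delta_{mij}\,\eta_i - \ln\left(1+e^{\eta_i}\right), \\[4pt]
\frac{\partial}{\partial\beta_{mq}}
  \left\{\delta_{mij}\,\eta_i - \ln\left(1+e^{\eta_i}\right)\right\}
  &= X_{iq}\left\{\delta_{mij} - \frac{e^{\eta_i}}{1+e^{\eta_i}}\right\}
   = X_{iq}\left\{\delta_{mij} - \pi_m(X_i)\right\}, \\[4pt]
\frac{\partial^2}{\partial\beta_{mq}\,\partial\beta_{ml}}
  \left\{\delta_{mij}\,\eta_i - \ln\left(1+e^{\eta_i}\right)\right\}
  &= -X_{iq}X_{il}\,\pi_m(X_i)\left\{1-\pi_m(X_i)\right\}.
\end{aligned}
```

Summing each line over $i$ and $j$ gives (26), (27) and (28), respectively. The last line uses $\partial \pi_m / \partial \eta_i = \pi_m (1 - \pi_m)$, which is why the second derivatives do not depend on the observed outcomes.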
4. Testing for the significance of parameters

The vector of $2^k$ sets of parameters for the $k$th order Markov model can be represented as follows:

\[ \beta' = [\beta_1, \ldots, \beta_m, \ldots, \beta_{2^k}], \]

where $\beta_m' = [\beta_{m0}, \ldots, \beta_{mp}]$, $m = 1, 2, \ldots, 2^k$. To test the null hypothesis $H_0: \beta = 0$, we can employ the usual likelihood ratio test

\[ -2[\ln L(\beta_0) - \ln L(\beta)] \sim \chi^2_{2^k p}, \]

where $\beta_0' = [\beta_{10}, \ldots, \beta_{m0}, \ldots, \beta_{2^k,0}]$.

To test the significance of the $q$th parameter of the $m$th set of parameters, the null hypothesis is $H_0: \beta_{mq} = 0$ and the corresponding Wald test is

\[ W = \frac{\hat{\beta}_{mq}}{\mathrm{se}(\hat{\beta}_{mq})}. \]
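Both tests are straightforward to compute once the log likelihoods, estimates and standard errors are available. The sketch below is a pure Python illustration under two stated assumptions: the chi-square tail probability uses a closed form that is valid only for even degrees of freedom (which always holds for $2^k p$ with $k \geq 1$), and the Wald statistic is referred to the standard normal distribution, which the text does not state explicitly. The function names and the two log likelihood values are hypothetical; the Wald example uses the printed magnitudes of one estimate and standard error from Table 2.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def lr_test(loglik_null, loglik_full, k, p):
    """Likelihood ratio statistic, its degrees of freedom 2^k * p, and p-value."""
    stat = -2.0 * (loglik_null - loglik_full)
    df = (2 ** k) * p
    return stat, df, chi2_sf_even_df(stat, df)

def wald_test(beta_hat, se):
    """Wald statistic and two-sided p-value (standard normal reference assumed)."""
    w = beta_hat / se
    return w, math.erfc(abs(w) / math.sqrt(2.0))

# Hypothetical log likelihoods for a first order model with p = 3 covariates
stat, df, p_lr = lr_test(loglik_null=-120.0, loglik_full=-110.0, k=1, p=3)

# Magnitudes as printed in Table 2 (Chittagong, 0 -> 1, maximum temperature)
w, p_wald = wald_test(0.09, 0.04)
```

With these inputs the Wald p-value falls between 0.01 and 0.05, which agrees with the single asterisk attached to that estimate in Table 2.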
5. Example

The covariate dependent Markov model proposed in this paper is applied to rainfall data from three districts in Bangladesh, namely Rajshahi, Dhaka and Chittagong. The period considered in this study is 1964 to 1990. These secondary data were collected from the Department of Meteorology, Government of Bangladesh. We consider the months June to October for finding the appropriate order of the covariate dependent Markov models. This period is typically regarded as the monsoon season, and major agricultural crops are produced in Bangladesh during this period. Rainfall is measured in mm and, in addition, three covariates are considered in the model, namely wind speed (nautical miles/hour), humidity (relative humidity in percent), and maximum temperature (in degrees Celsius). An S-Plus function developed by Chowdhury et al. [20] was used to estimate the parameters of our model. Table 1 displays the transition counts of different types for first, second and third order transitions for the three selected stations, Dhaka, Chittagong and Rajshahi. Table 2 shows the
Table 1
Transition counts of Markov chain for different orders of rainfall from three stations of Bangladesh

Transition        Dhaka           Chittagong      Rajshahi
                  0      1        0      1        0      1
First order
0                 990    620      1046   590      1512   611
1                 627    1759     593    1767     616    1257
Second order
0 0               639    314      696    312      1099   355
1 0               321    283      324    249      361    240
0 1               194    407      192    381      228    357
1 1               408    1295     378    1329     364    857
Third order
0 0 0             446    162      490    178      851    204
1 0 0             176    139      191    123      214    133
0 1 0             103    79       98     85       141    83
0 0 1             104    199      115    185      135    207
1 1 0             207    190      213    154      205    148
1 0 1             85     190      69     175      86     141
0 1 1             101    292      96     272      110    234
1 1 1             295    958      266    1016     238    596

Rows give the previous state(s) in time order; columns 0 and 1 under each station give the current state.
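The first order counts in Table 1 can be turned directly into empirical transition probabilities by dividing each count by its row total. The short sketch below does this for the three stations; the dictionary layout and function name are illustrative choices, while the counts are those printed in Table 1.

```python
# First order transition counts from Table 1:
# station -> previous state -> (count to state 0, count to state 1)
table1_first_order = {
    "Dhaka":      {0: (990, 620),  1: (627, 1759)},
    "Chittagong": {0: (1046, 590), 1: (593, 1767)},
    "Rajshahi":   {0: (1512, 611), 1: (616, 1257)},
}

def row_proportions(counts):
    """Convert each row of counts into empirical transition probabilities."""
    out = {}
    for prev, (n0, n1) in counts.items():
        total = n0 + n1
        out[prev] = (n0 / total, n1 / total)
    return out

empirical = {station: row_proportions(counts)
             for station, counts in table1_first_order.items()}

for station, probs in empirical.items():
    rounded = {s: tuple(round(v, 3) for v in pr) for s, pr in probs.items()}
    print(station, rounded)
```

For Dhaka, for example, the estimated probability of rain following a dry day is 620/1610, about 0.385, while the probability of rain following a rainy day is 1759/2386, about 0.737. These unconditional proportions ignore the covariates; the logistic models of Section 2 let the same probabilities vary with wind speed, humidity and maximum temperature.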
Table 2
Estimates of parameters of covariate-dependent Markov chain models for different orders of rainfall data from three stations of Bangladesh
Model rows (each model with Constant, Wind speed, Humidity and Maximum temperature): First order: 0→1, 1→0, Likelihood ratio. Second order: 0→0→1, 1→0→1, 0→1→0, 1→1→0, Likelihood ratio. Third order: 0→0→0→1, 1→0→0→1.
Columns: Dhaka ($\hat{\beta}$, s.e.), Chittagong ($\hat{\beta}$, s.e.), Rajshahi ($\hat{\beta}$, s.e.)
30.01** 0.15** 0.28** 0.19**
2.43 0.03 0.02 0.04
24.79** 0.13** 0.25** 0.09*
2.21 0.02 0.02 0.04
25.39** 0.18** 0.22** 0.18**
1.79 0.03 0.01 0.03
2.49 20.64** 0.03 0.18** 0.02 0.24** 0.03 0.04 1618.58 (p < 0.01)
3.23 2.37 0.09** 0.01 0.12** 0.01 0.04 0.21** 1467.98 (p < 0.01)
7.04** 1.77 0.06* 0.03 0.14** 0.01 0.14** 0.03 1429.21 (p < 0.01)
25.23** 0.09* 0.18** 0.31**
2.78 0.05 0.02 0.06
10.72** 0.13** 0.07** 0.10
2.75 0.02 0.02 0.05
19.59** 0.10** 0.11** 0.30**
1.90 0.03 0.01 0.04
4.87 0.13** 0.04 0.03
3.59 0.05 0.03 0.06
13.75** 0.07** 0.09** 0.17*
4.25 0.02 0.03 0.08
8.95** 0.09* 0.07** 0.06
2.74 0.04 0.02 0.04
2.42 0.08 0.02 0.03
3.43 0.04 0.02 0.06
6.36 0.11** 0.07** 0.02
3.93 0.03 0.03 0.08
4.54 0.08* 0.01 0.16**
2.80 0.04 0.02 0.05
2.71 2.77 0.04 0.03 0.02 0.02 0.01 0.04 828.40 (p < 0.01)
1.31 1.57 0.01 0.06** 0.00 0.01 0.03 0.03 908.53 (p < 0.01)
3.57 1.90 0.05* 0.03 0.02 0.01 0.04 0.03 843.38 (p < 0.01)
25.76** 0.05 0.15** 0.39**
3.52 0.06 0.02 0.07
13.52** 0.12** 0.10** 0.11
3.51 0.03 0.03 0.07
16.11** 0.02 0.05** 0.32**
15.32** 0.08 0.08* 0.26**
5.31 0.07 0.04 0.10
1.31 0.05 0.00 0.02
5.18 0.03 0.03 0.10
2.12 0.04 0.01 0.04
9.23* 3.65 0.06 0.06 0.06* 0.03 0.10 0.06 (continued on next page)
Table 2 (continued)
Model rows (each model with Constant, Wind speed, Humidity and Maximum temperature): Third order: 0→1→0→1, 0→0→1→1, 1→1→0→1, 1→0→1→0, 0→1→1→0, 1→1→1→0, Likelihood ratio.
Columns: Dhaka ($\hat{\beta}$, s.e.), Chittagong ($\hat{\beta}$, s.e.), Rajshahi ($\hat{\beta}$, s.e.)
1.41 0.10 0.03 0.13
5.59 0.08 0.04 0.11
18.03* 0.14** 0.09 0.31*
7.60 0.05 0.05 0.14
3.76 0.03 0.03 0.02
4.38 0.07 0.03 0.08
0.25 0.01 0.02 0.03
4.84 0.07 0.03 0.09
1.36 0.05 0.01 0.07
5.43 0.03 0.04 0.10
10.01** 0.00 0.06** 0.16*
3.56 0.05 0.02 0.07
1.71 0.02 0.00 0.07
4.64 0.05 0.03 0.07
16.78** 0.01 0.11** 0.22**
4.88 0.02 0.03 0.08
12.96** 0.07 0.07* 0.20**
4.21 0.05 0.03 0.06
3.65 0.02 0.04 0.02
5.50 0.07 0.04 0.09
7.76 0.04 0.03 0.18
7.46 0.04 0.05 0.13
2.04 0.04 0.01 0.06
4.90 0.07 0.04 0.08
3.04 0.04 0.01 0.11
4.96 0.06 0.03 0.09
0.76 0.07* 0.01 0.01
4.94 0.03 0.03 0.10
1.20 0.03 0.02 0.02
3.72 0.05 0.03 0.07
3.36 0.06* 0.01 0.05 812.66 (p < 0.01)
2.18 0.03 0.01 0.04
3.29 3.35 0.03 0.03 0.04 0.02 0.04 0.05 792.23 (p < 0.01)
0.44 1.63 0.03 0.02 0.00 0.01 0.05 0.04 875.268 (p < 0.01)
* Significant at 5% level. ** Significant at 1% level.
estimates for parameters of the covariate dependent Markov models for different types of transitions. The results are displayed for first, second and third order Markov models. From the results, we observe that the Dhaka, Chittagong and Rajshahi districts satisfy the first order Markov model. The estimates of the parameters for the first order transitions indicate a consistent pattern of positive association between the transition from no rain to rain and wind speed, humidity and maximum temperature for the three selected stations, whereas continued rain seems to be associated negatively with wind speed and humidity and positively with maximum temperature (non-significant for Dhaka).

The second order Markov model comprises four models, for transition types 0–0–1, 1–0–1, 0–1–0, and 1–1–0. The model for transition type 0–0–1 represents rainfall after two consecutive days of no rain. It is observed that wind speed, humidity and maximum temperature are positively associated with the occurrence of rainfall after two consecutive days of no rain. These results are similar for all the selected stations, except that maximum temperature is non-significant in Chittagong district. The model for transition type 1–0–1 shows that wind speed is positively associated for all the selected stations. In addition, humidity is positively associated with the occurrence of rain in Chittagong and Rajshahi after rain followed by no rain on the two previous days, and maximum temperature is positively associated in Chittagong only. It is noteworthy that transition type 0–1–0 is negatively associated with wind speed for Chittagong and Rajshahi, and with humidity for Chittagong only. Only wind speed appears to have a negative association with transition type 1–1–0, for the stations Chittagong and Rajshahi.

Among the eight third order transition type models, only 0–0–0–1 shows significant associations with wind speed, humidity and maximum temperature for all the rainfall stations, with some minor exceptions. In addition, transition types 1–0–0–1, 0–1–0–1, 0–0–1–1 and 1–1–0–1 display associations for only one or two stations, indicating that third order models do not strongly explain the occurrence or non-occurrence of rainfall on the basis of the outcomes of the three previous days. Transition types 1–0–1–0, 0–1–1–0 and 1–1–1–0 do not show substantial associations with the selected variables, with two minor exceptions for Chittagong and Rajshahi.
6. Conclusion

Markov models can be employed for analyzing time series and longitudinal data arising in various fields of research and application. In the past, transition probabilities were estimated to explain the pattern of underlying relationships among different states in both the short and the long term. Recently, there has been renewed interest in analyzing longitudinal data using Markov models in the presence of covariates. A model for covariate dependence in Markov models was proposed by Muenz and Rubinstein [19]. In this paper, a general procedure is presented for higher order Markov models with covariate dependence. This extended procedure for covariate dependent Markov chain models of higher order can be applied to a wide range of problems in various fields, with important implications.

The usefulness of the proposed model is illustrated with rainfall data from three stations in Bangladesh. Models up to the third order are fitted in order to examine the relationship between the selected covariates (wind speed, humidity, and maximum temperature) and the different types of transitions, depending on the order of the Markov chain. The estimates of the parameters for the first order transitions indicate a consistent pattern of positive association between the transition from no rain to rain and wind speed, humidity and maximum temperature for the three selected stations, whereas continued rain seems to be associated negatively with wind speed and humidity and positively with maximum temperature. Similarly, second order models reveal some additional features, together with clear evidence of variation between stations for some of the transition types under consideration. Third order models are also fitted for the same stations, but they do not support any strong relationship between third order transitions and the selected variables for most of the transition types.

The application of the proposed model indicates that higher order covariate dependent Markov models can be employed conveniently, and that the results can give both researchers and policymakers insight into the factors underlying different types of transitions, reverse transitions and repeated transitions. The estimation and test procedures can be employed for any order of Markov model without making the theory and interpretation difficult for the common user.
References

[1] D.R. Cox, H.D. Miller, The Theory of Stochastic Processes, Methuen, London, 1965.
[2] J.G. Kemeny, J.L. Snell, Finite Markov Chains, Springer, New York, 1976.
[3] C.L. Chiang, An Introduction to Stochastic Processes and Their Applications, Wiley, New York, 1980.
[4] S. Karlin, H.M. Taylor, A Second Course in Stochastic Processes, Academic Press, New York, 1981.
[5] A.E. Raftery, A model for high-order Markov chains, Journal of the Royal Statistical Society, Series B 47 (1985) 528–539.
[6] A. Raftery, S. Tavare, Estimating and modeling repeated patterns in higher order Markov chains with the mixture transition distribution model, Applied Statistics 43 (1) (1994) 179–199.
[7] A. Berchtold, A.E. Raftery, The mixture transition distribution model for high-order Markov chains and non-Gaussian time series, Statistical Science 17 (2002) 328–356.
[8] G.G.S. Pegram, An autoregressive model for multilag Markov chains, Journal of Applied Probability 17 (1980) 350–362.
[9] P.S. Albert, A Markov model for sequence of ordinal data from a relapsing-remitting disease, Biometrics 50 (1994) 51–60.
[10] P.S. Albert, M.A. Waclawiw, A two state Markov chain for heterogeneous transitional data: a quasilikelihood approach, Statistics in Medicine 17 (1998) 1481–1493.
[11] M.H. Regier, A two state Markov model for behavior change, Journal of the American Statistical Association 63 (1968) 993–999.
[12] R. Prentice, L. Gloeckler, Regression analysis of grouped survival data with application to breast cancer data, Biometrics 34 (1978) 57–67.
[13] E.L. Korn, A.S. Whittemore, Methods of analyzing panel studies of acute health effects of air pollution, Biometrics 35 (1979) 795–802.
[14] M. Wu, J.H. Ware, On the use of repeated measurements in regression analysis with dichotomous responses, Biometrics 35 (1979) 513–522.
[15] J.D. Kalbfleisch, J.F. Lawless, The analysis of panel data under a Markov assumption, Journal of the American Statistical Association 88 (1985) 863–871.
[16] A. Azzalini, Logistic regression for autocorrelated data with application to repeated measures, Biometrika 81 (1994) 767–775.
[17] P.J. Heagerty, S.L. Zeger, Marginalized multi-level models and likelihood inference (with Discussion), Statistical Science 15 (2000) 1–26.
[18] P.J. Heagerty, Marginalized transition models and likelihood inference for longitudinal categorical data, Biometrics 58 (2002) 342–351.
[19] L.R. Muenz, L.V. Rubinstein, Markov models for covariate dependence of binary sequences, Biometrics 41 (1985) 91–101.
[20] R.I. Chowdhury, M.A. Islam, M.A. Shah, N. Al-Enezi, A computer program to estimate the parameters of covariate dependence higher order Markov model, Computer Methods and Programs in Biomedicine 77 (2005) 175–181.