
Communications in Statistics - Simulation and Computation
ISSN: 0361-0918 (Print) 1532-4141 (Online)
Journal homepage: http://www.tandfonline.com/loi/lssp20

To cite this article: Yasin Asar (2015): Some New Methods to Solve Multicollinearity in Logistic Regression, Communications in Statistics - Simulation and Computation, DOI: 10.1080/03610918.2015.1053925
To link to this article: http://dx.doi.org/10.1080/03610918.2015.1053925

Accepted online: 31 Aug 2015.
Downloaded by [University of Tasmania] on 01 October 2015, at 02:19.

ACCEPTED MANUSCRIPT

Some New Methods to Solve Multicollinearity in Logistic Regression

Yasin Asar
Department of Mathematics & Computer Science, Necmettin Erbakan University, Konya 42090, Turkey, [email protected]

Abstract


Binary logistic regression is a widely used statistical method when the dependent variable is binary or dichotomous. In some logistic regression settings the independent variables are collinear, which leads to the problem of multicollinearity. It is known that multicollinearity inflates the variance of the maximum likelihood estimator (MLE). Thus, this paper introduces new methods to estimate the shrinkage parameters of the Liu-type logistic estimator proposed by Inan and Erdogan (2013), which is a generalization of the Liu-type estimator defined by Liu (2003) for the linear model. A Monte Carlo study is used to show the effectiveness of the proposed methods over MLE using the mean squared error (MSE) and mean absolute error (MAE). A real data application is presented to illustrate the benefits of the new methods. According to the results of the simulation and the application, the proposed methods perform better than MLE.

Keywords: Logistic regression, Liu-type estimators, Multicollinearity, MSE, MLE


1. Introduction

Binary logistic regression is one of the most widely used models when the outcome variable is dichotomous or binary, i.e., it has two different categories. It is a very popular model for binary data in applied sciences such as criminology, business and finance, engineering, biology, health policy, biomedical research, ecology, and linguistics (Hosmer et al. 2013). The explanatory variables may be intercorrelated, which is the problem of multicollinearity, a common issue in applied work. For example, it is well known that financial ratios are highly correlated. Therefore, this paper introduces new methods to overcome the multicollinearity problem.

Now, consider the binary logistic regression model in which the dependent variable is distributed as Bernoulli, $\mathrm{Be}(P_i)$, where

$$P_i = \frac{e^{x_i\beta}}{1+e^{x_i\beta}}$$

is the $i$th element of the vector $P$, $i = 1, 2, \dots, n$, $X$ is an $n \times (p+1)$ data matrix having $p$ independent variables, and $\beta$ is the coefficient vector of dimension $(p+1) \times 1$. The most common method of estimating the coefficients is the maximum likelihood estimator (MLE), which can be obtained by the iteratively weighted least squares (IWLS) algorithm as follows:

$$\hat\beta_{MLE} = (X'\hat WX)^{-1}X'\hat W\hat z, \qquad (1.1)$$

where $\hat W = \mathrm{diag}\big(\hat P_i(1-\hat P_i)\big)$ and

$$\hat z_i = \log\frac{\hat P_i}{1-\hat P_i} + \frac{y_i-\hat P_i}{\hat P_i(1-\hat P_i)}$$

is the $i$th element of the vector $\hat z$.
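The IWLS update in (1.1) can be sketched as follows in Python/NumPy. This is a minimal illustration with our own function and variable names, not the author's original Matlab code:

```python
import numpy as np

def mle_logistic_iwls(X, y, tol=1e-7, max_iter=100):
    """Fit binary logistic regression by iteratively weighted least squares.

    X is the n x (p+1) design matrix (intercept column included) and
    y the 0/1 response vector.  Returns beta_hat, the diagonal of W_hat,
    and the working response z_hat from the final iteration.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        P = 1.0 / (1.0 + np.exp(-eta))      # P_i = e^{x_i beta} / (1 + e^{x_i beta})
        w = P * (1.0 - P)                   # diagonal of W_hat
        z = eta + (y - P) / w               # working response z_hat_i
        XtW = X.T * w                       # X' W without forming diag(w) explicitly
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # (1.1)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, w, z
```

At convergence the score equations X'(y - P) = 0 hold, which gives a quick correctness check.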


The MSE and matrix mean squared error (MMSE) of an estimator $\hat\beta$ of $\beta$ can be obtained respectively by

$$\mathrm{MSE}(\hat\beta) = E\big[(\hat\beta-\beta)'(\hat\beta-\beta)\big] = \mathrm{tr}\big(\mathrm{Var}(\hat\beta)\big) + \mathrm{Bias}(\hat\beta)'\,\mathrm{Bias}(\hat\beta), \qquad (1.2)$$

$$\mathrm{MMSE}(\hat\beta) = E\big[(\hat\beta-\beta)(\hat\beta-\beta)'\big] = \mathrm{Var}(\hat\beta) + \mathrm{Bias}(\hat\beta)\,\mathrm{Bias}(\hat\beta)', \qquad (1.3)$$

where $\mathrm{tr}$ is the trace of a matrix, $\mathrm{Var}(\hat\beta)$ is the variance matrix, and $\mathrm{Bias}(\hat\beta) = E(\hat\beta) - \beta$ is the bias of the estimator $\hat\beta$.

We can compute the MSE and MMSE of $\hat\beta_{MLE}$ by using (1.2) and (1.3) respectively and obtain

$$\mathrm{MSE}(\hat\beta_{MLE}) = E\big[(\hat\beta_{MLE}-\beta)'(\hat\beta_{MLE}-\beta)\big] = \mathrm{tr}\big[(X'\hat WX)^{-1}\big] = \sum_{j=1}^{p+1}\frac{1}{\lambda_j}, \qquad (1.4)$$

$$\mathrm{MMSE}(\hat\beta_{MLE}) = E\big[(\hat\beta_{MLE}-\beta)(\hat\beta_{MLE}-\beta)'\big] = (X'\hat WX)^{-1}, \qquad (1.5)$$

where $\lambda_j$ is the $j$th eigenvalue of the matrix $X'\hat WX$, $j = 1, 2, \dots, p+1$.
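Equation (1.4) shows that the scalar MSE of the MLE is driven entirely by the eigenvalues of $X'\hat WX$; a small numerical illustration (the function name is ours):

```python
import numpy as np

def mle_mse_from_eigenvalues(lam):
    """Scalar MSE of the MLE per (1.4): the sum of reciprocal eigenvalues of X'WX."""
    return float(np.sum(1.0 / np.asarray(lam, dtype=float)))

# A near-zero eigenvalue (severe multicollinearity) inflates the MSE:
well_conditioned = mle_mse_from_eigenvalues([2.0, 1.0, 0.5])    # 0.5 + 1 + 2 = 3.5
ill_conditioned = mle_mse_from_eigenvalues([2.0, 1.0, 0.001])   # 0.5 + 1 + 1000 = 1001.5
```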


When there is multicollinearity, the columns of the matrix $X'\hat WX$ become nearly linearly dependent, which implies that some of the eigenvalues of $X'\hat WX$ become close to zero. Thus, the MSE of MLE is inflated and one cannot obtain stable estimates. To overcome this problem, one can use biased estimators. The well-known ridge regression, first defined by Hoerl and Kennard (1970), was successfully generalized to the logistic regression model by Schaefer et al. (1984). In Månsson and Shukur (2011) and Kibria et al. (2012), the authors applied to the binary logistic regression model, via the logistic ridge estimator (LRE) of Schaefer et al. (1984), several ridge estimators first defined for the linear model by Kibria (2003), Muniz and Kibria (2009), Muniz et al. (2012) and Mansson et al. (2010), and investigated their performance. Liu-type estimators were adapted to the binary logistic model by Inan and Erdogan (2013) to solve the multicollinearity problem and decrease the variance so that the estimates become stable.

This paper proposes some new methods of estimating the shrinkage parameter to be used in the logistic Liu-type estimator (LLT) in order to combat multicollinearity in the binary logistic regression model. LLT with the new estimators is expected to perform better than MLE when the explanatory variables are correlated. Moreover, we give a matrix mean squared error comparison between the estimators and conduct a Monte Carlo simulation to evaluate their performance using the MSE and mean absolute error (MAE).

This paper is organized as follows: the theory and proposed methods are developed in section 2; details and results of the Monte Carlo experiment are given in section 3; a real data example is presented in section 4; and section 5 provides a brief summary and conclusion.

2. New Estimators and MSE Properties

To overcome the problem of multicollinearity, the logistic Liu-type estimator (LLT) was defined by Inan and Erdogan (2013). LLT is a logistic generalization of the Liu-type estimator defined by Liu (2003) for the linear model. LLT can be written as follows:

$$\hat\beta_{LLT} = (S+kI)^{-1}(S-dI)\hat\beta_{MLE}, \qquad (2.1)$$

where $S = X'\hat WX$, $k > 0$, $-\infty < d < \infty$, $\hat W$ is the iteratively weighted covariance matrix, and $I$ is the identity matrix of order $p+1$.

We make a transformation in order to present the explicit forms of $\mathrm{MSE}(\hat\beta_{LLT})$ and $\mathrm{MMSE}(\hat\beta_{LLT})$. Let $\alpha = Q'\beta$ and $Q'X'\hat WXQ = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_{p+1})$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{p+1} > 0$ and $Q$ is the matrix whose columns are the eigenvectors of $X'\hat WX$. The bias and variance of LLT are obtained as follows:

$$b = \mathrm{Bias}(\hat\beta_{LLT}) = -(d+k)\,Q\Lambda_k^{-1}\alpha, \qquad (2.2)$$

$$\mathrm{Var}(\hat\beta_{LLT}) = Q\Lambda_k^{-1}\Lambda_d^{*}\Lambda^{-1}\Lambda_d^{*}\Lambda_k^{-1}Q', \qquad (2.3)$$

where $\Lambda_k = \Lambda + kI$ and $\Lambda_d^{*} = \Lambda - dI$.


The MMSE and MSE of LLT can be obtained respectively as follows:

$$\mathrm{MMSE}(\hat\beta_{LLT}) = Q\Lambda_k^{-1}\Lambda_d^{*}\Lambda^{-1}\Lambda_d^{*}\Lambda_k^{-1}Q' + bb', \qquad (2.4)$$

$$\mathrm{MSE}(\hat\beta_{LLT}) = \sum_{j=1}^{p+1}\frac{(\lambda_j-d)^2}{\lambda_j(\lambda_j+k)^2} + (d+k)^2\sum_{j=1}^{p+1}\frac{\alpha_j^2}{(\lambda_j+k)^2} = f_1(k,d) + f_2(k,d), \qquad (2.5)$$

where the first term in Equation (2.5) is the asymptotic variance and the second term is the squared bias of the estimator. Thus, one should choose values of $k$ and $d$ such that the decrease in the variance is greater than the increase in the squared bias.

2.1. MMSE and MSE comparisons of MLE and LLT

In this subsection, some theorems regarding MMSE and MSE comparisons of the estimators MLE and LLT are given. Theorem 2.2 gives the necessary and sufficient condition for the MMSE difference of MLE and LLT to be positive definite (p.d.). If $\hat\beta_1$ and $\hat\beta_2$ are two estimators of the coefficient vector, then $\hat\beta_2$ is superior to $\hat\beta_1$ if and only if (iff) $\mathrm{MMSE}(\hat\beta_1) - \mathrm{MMSE}(\hat\beta_2) \ge 0$. It is proved that if $\mathrm{MMSE}(\hat\beta_1) - \mathrm{MMSE}(\hat\beta_2)$ is a non-negative definite (n.n.d.) matrix, then $\mathrm{MSE}(\hat\beta_1) - \mathrm{MSE}(\hat\beta_2) \ge 0$; see Theobald (1974). However, the converse is not always true. Now, the following lemma is used to compare the estimators in the sense of MMSE theoretically.


Lemma 2.1 (Trenkler and Toutenburg, 1990): Let $\hat\beta_1$ and $\hat\beta_2$ be two estimators of the coefficient vector $\beta$. Moreover, let $D = \mathrm{Var}(\hat\beta_1) - \mathrm{Var}(\hat\beta_2)$ be a p.d. matrix, $a_1 = \mathrm{Bias}(\hat\beta_1)$ and $a_2 = \mathrm{Bias}(\hat\beta_2)$. Then $\mathrm{MMSE}(\hat\beta_1) - \mathrm{MMSE}(\hat\beta_2) \ge 0$ iff $a_2'(D + a_1a_1')^{-1}a_2 \le 1$.

Theorem 2.2: Let $(d+k)(2\lambda_j + k - d) > 0$ for $j = 1, 2, \dots, p+1$, and let $b = \mathrm{Bias}(\hat\beta_{LLT})$. Then $\mathrm{MMSE}(\hat\beta_{MLE}) - \mathrm{MMSE}(\hat\beta_{LLT}) \ge 0$ iff

$$b'\big(\Lambda^{-1} - \Lambda_k^{-1}\Lambda_d^{*}\Lambda^{-1}\Lambda_d^{*}\Lambda_k^{-1}\big)^{-1}b \le 1.$$

Proof: Let

$$D = \mathrm{Var}(\hat\beta_{MLE}) - \mathrm{Var}(\hat\beta_{LLT}) = Q\Lambda^{-1}Q' - Q\Lambda_k^{-1}\Lambda_d^{*}\Lambda^{-1}\Lambda_d^{*}\Lambda_k^{-1}Q' = Q\,\mathrm{diag}\!\left(\frac{1}{\lambda_j} - \frac{(\lambda_j-d)^2}{\lambda_j(\lambda_j+k)^2}\right)Q'.$$

The matrix $\Lambda^{-1} - \Lambda_k^{-1}\Lambda_d^{*}\Lambda^{-1}\Lambda_d^{*}\Lambda_k^{-1}$ is p.d. if $(\lambda_j+k)^2 - (\lambda_j-d)^2 > 0$, which is equivalent to $\big[(\lambda_j+k) + (\lambda_j-d)\big]\big[(\lambda_j+k) - (\lambda_j-d)\big] > 0$. Simplifying the last inequality, one gets $(d+k)(2\lambda_j + k - d) > 0$. Thus $D$ is p.d. and the proof is finished by Lemma 2.1.

2.2. New methods to choose k and d

A new iterative method is proposed in this subsection to choose the values of the parameters $k$ and $d$. The method is expected to yield a much smaller MSE value for the estimator LLT. First, following Hoerl and Kennard (1970), differentiating Equation (2.5) with respect to the parameter $k$ yields


$$\frac{\partial\,\mathrm{MSE}(\hat\beta_{LLT})}{\partial k} = \sum_{j=1}^{p+1}\frac{2(\lambda_j-d)\big[(d+k)\lambda_j\hat\alpha_j^2 - (\lambda_j-d)\big]}{\lambda_j(\lambda_j+k)^3}. \qquad (2.6)$$

Equating the numerator of Equation (2.6) to zero gives

$$(\lambda_j-d)\big[\lambda_j\hat\alpha_j^2(k+d) - (\lambda_j-d)\big] = 0, \quad j = 1, 2, \dots, p+1,$$

which can be simplified further, and the individual parameters $\hat k_{LT1,j}$ can be obtained as

$$\hat k_{LT1,j} = \frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}, \quad j = 1, 2, \dots, p+1. \qquad (2.7)$$

Since each term in Equation (2.7) should be positive, the numerator must satisfy $\lambda_j - d(1+\lambda_j\hat\alpha_j^2) > 0$. Hence

$$d < \frac{\lambda_j}{1+\lambda_j\hat\alpha_j^2} \qquad (2.8)$$

needs to be satisfied for each $j$. After obtaining an initial value of the parameter $d$, we estimate $k$ using the following new estimators. Following Kibria (2003), we propose to estimate $k$ by the arithmetic mean, geometric mean and median of the $\hat k_{LT1,j}$:

$$\hat k_{AM} = \frac{1}{p+1}\sum_{j=1}^{p+1}\frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}, \qquad (2.9)$$

$$\hat k_{GM} = \left(\prod_{j=1}^{p+1}\frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}\right)^{1/(p+1)}, \qquad (2.10)$$

$$\hat k_{MED} = \mathrm{median}\left(\frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}\right). \qquad (2.11)$$

In addition, following Alkhamisi et al. (2006), we propose to use the maximum and minimum functions to obtain two further estimators:

$$\hat k_{MAX} = \max_j\left(\frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}\right), \qquad (2.12)$$

$$\hat k_{MIN} = \min_j\left(\frac{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}{\lambda_j\hat\alpha_j^2}\right). \qquad (2.13)$$

Following Hoerl et al. (1975), we suggest using the harmonic mean of the $\hat k_{LT1,j}$, which gives

$$\hat k_{HM} = \frac{p+1}{\displaystyle\sum_{j=1}^{p+1}\frac{\lambda_j\hat\alpha_j^2}{\lambda_j - d(1+\lambda_j\hat\alpha_j^2)}}. \qquad (2.14)$$

Finally, we suggest two new estimators using the extreme eigenvalues and canonical coefficients:

$$\hat k_{min} = \frac{\lambda_{min} - d(1+\lambda_{min}\hat\alpha_{min}^2)}{\lambda_{min}\hat\alpha_{min}^2}, \qquad (2.15)$$

$$\hat k_{max} = \frac{\lambda_{max} - d(1+\lambda_{max}\hat\alpha_{max}^2)}{\lambda_{max}\hat\alpha_{max}^2}, \qquad (2.16)$$

where $\lambda_{max}$ and $\lambda_{min}$ are the maximum and minimum eigenvalues of $S$, and $\hat\alpha_{max}$ and $\hat\alpha_{min}$ are the maximum and minimum elements of $\hat\alpha$.

3. Monte Carlo Simulation Study

3.1. Design of the simulation

This section describes the design of the Monte Carlo simulation conducted to compare the performance of LLT and MLE. The two criteria used to judge performance are the MSE and MAE, computed using Equations (3.1) and (3.2) respectively:

$$\mathrm{MSE}(\tilde\beta) = \frac{\sum_{r=1}^{5000}(\tilde\beta_r-\beta)'(\tilde\beta_r-\beta)}{5000}, \qquad (3.1)$$

$$\mathrm{MAE}(\tilde\beta) = \frac{\sum_{r=1}^{5000}\lVert\tilde\beta_r-\beta\rVert}{5000}, \qquad (3.2)$$

where $\tilde\beta$ is either $\hat\beta_{LLT}$ or $\hat\beta_{MLE}$ and $\lVert\cdot\rVert$ is the usual Euclidean distance. Let $\hat\alpha = Q'\hat\beta_{MLE}$ and let $\hat\alpha_j$ be the $j$th element of $\hat\alpha$. The effective factors in the simulation are chosen to be the degree of correlation $\rho^2$ among the explanatory variables, the sample size $n$, and the number of explanatory variables $p$. Following McDonald and Galarneau (1975), Kibria (2003), Muniz and Kibria (2009) and Asar et al. (2014), the explanatory variables are generated using the equation

$$x_{ij} = (1-\rho^2)^{1/2}z_{ij} + \rho z_{ip}, \qquad (3.3)$$

where $i = 1, 2, \dots, n$, $j = 1, 2, \dots, p+1$, $\rho^2$ represents the correlation between the explanatory variables, and the $z_{ij}$ are independent random numbers obtained from the standard normal distribution. The observations of the dependent variable are obtained from the Bernoulli distribution $\mathrm{Be}(P_i)$, where

$$P_i = \frac{e^{x_i\beta}}{1+e^{x_i\beta}}, \qquad (3.4)$$

such that $x_i$ is the $i$th row of the matrix $X$. The parameter values of the coefficients $\beta$ are chosen so that $\beta'\beta = 1$, following Newhouse and Oman (1971).
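The generation scheme (3.3)-(3.4) can be sketched as follows. This is a minimal reconstruction, not the author's Matlab code; in particular, realizing the shared component of (3.3) as one extra standard-normal column is our reading of the $z_{ip}$ term:

```python
import numpy as np

def generate_data(n, p, rho2, beta, rng):
    """Collinear regressors via (3.3) and Bernoulli responses via (3.4).

    rho2 is rho^2; beta has length p+1 and should be scaled so that
    beta'beta = 1 (Newhouse and Oman, 1971).
    """
    rho = np.sqrt(rho2)
    z = rng.standard_normal((n, p + 2))          # p+1 regressor columns + one shared
    X = np.sqrt(1.0 - rho2) * z[:, : p + 1] + rho * z[:, [p + 1]]   # (3.3)
    P = 1.0 / (1.0 + np.exp(-(X @ beta)))        # (3.4)
    y = (rng.uniform(size=n) < P).astype(float)
    return X, y
```

By construction each pair of columns has correlation rho^2, so the empirical pairwise correlations should sit near the chosen degree of correlation.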


The sample size $n$ is chosen to be 50, 100 and 200; the number of explanatory variables $p$ is 4 or 8; and the investigated degrees of correlation are 0.90, 0.95 and 0.99. Using the shrinkage parameter $d$ computed via (2.8) and the newly proposed biasing estimators of $k$ defined in Subsection 2.2, one can determine which of the estimators performs best for different combinations of $(n, p, \rho^2)$. Matlab R2013a is used to write the codes, and the convergence tolerance in the IWLS algorithm is set to $10^{-7}$.

3.2. Results and Discussion

In this subsection, the results of the Monte Carlo simulation are presented. The estimated MSE and MAE values of LLT and MLE are reported in Tables 3.1-3.4. According to the tables, all proposed methods perform better than MLE: LLT with each of the new methods has a smaller MSE value than MLE. If the sample size is increased, the MSE values of all estimators decrease; in other words, increasing the sample size has a positive effect on the performance of every estimator, including MLE. Moreover, if the degree of correlation is increased, the MSE values of the estimators increase, except for LLT with the estimators $\hat k_{AM}$ and $\hat k_{MAX}$; for these two estimators the correlation has a positive effect. This result can be observed in Figure 3.1.

According to the tables, the new estimators can be categorized into two groups. $\hat k_{AM}$, $\hat k_{GM}$, $\hat k_{MED}$ and $\hat k_{MAX}$ form one group, for which an increase in the degree of correlation does not substantially affect performance. The other estimators form the second group and are affected negatively by the increase in the degree of correlation. This result can be seen by comparing Figures 3.1 and 3.2. Moreover, the estimators considered in Figure 3.1 perform better than those considered in Figure 3.2. One can also conclude from Figure 3.2 that the MSE of MLE is inflated, especially when the sample size is low and the degree of correlation is high. Similarly, the MSE values of the estimators $\hat k_{MIN}$, $\hat k_{HM}$, $\hat k_{min}$ and $\hat k_{max}$ are inflated in the same situation; however, increasing the sample size has a great positive effect on these estimators. Furthermore, if the number of explanatory variables is increased, the MSE values of all estimators increase. Although the estimators shown in Figure 3.1 are only slightly affected by the increase in the number of independent variables, the MSE values of the others are inflated by it.

According to the tables, it may be concluded that $\hat k_{GM}$ has the best performance when $\rho^2 = 0.90$ and $\rho^2 = 0.95$, whereas $\hat k_{AM}$ is the best estimator when $\rho^2 = 0.99$. Similar conclusions are obtained when MAE is used as the performance criterion; the only difference is that the MAE values are considerably smaller than the MSE values for the same situation.

4. Application

In this section, a data set taken from the Banks Association of Turkey is used regarding the Asian financial crisis and its effects on the collapse of commercial banks operating in Turkey. This data set was also used by Inan and Erdogan (2013). In the period of the collapse, the Savings Deposit Insurance Fund (SDIF) took over some of the banks. In the application, a binary logistic regression is used to model the dependent variable, namely whether or not a bank was taken over by SDIF. For the year 1999, a bank is coded as zero if it was taken over, and the other banks that remained successful during that period are coded as one. The following financial ratios are used as independent variables: X1: (Shareholders' Equity + Total Income) / (Deposits + Non-deposit Funds), X2: Net Working Capital / Total Assets, X3: Non-performing Loans / Total Loans, X4: Liquid Assets / Total Assets, X5: Liquid Assets / (Deposits + Non-deposit Funds), X6: Interest Income / Interest Expenses, and X7: Non-interest Income / Non-interest Expenses.

The correlation matrix of the data is given in Table 4.1. According to Table 4.1, some of the bivariate correlations are high, namely 0.92 and 0.89. The eigenvalues of the matrix $X'X$ are 3.5658, 1.3654, 1.0109, 0.8126, 0.2022, 0.0364 and 0.0066. The condition number, a measure of the degree of collinearity, is computed as $\kappa = \sqrt{\lambda_{max}/\lambda_{min}} = 23.1578$, which shows that there is a moderate multicollinearity problem in this data set. The estimated theoretical MSE values of the estimators are reported in Table 4.2. One can see from Table 4.2 that LLT used with the new methods has smaller MSE values than MLE, whose MSE is inflated; $\hat k_{MED}$ has the lowest MSE value. Moreover, the coefficients, standard errors and corresponding p-values of the model are given in Table 4.3. According to Table 4.3, the standard errors of the coefficients for LLT with the new methods are clearly smaller than those of MLE; in particular, the standard errors for $\hat k_{MAX}$ are the lowest. Therefore, it can be concluded that LLT with the new methods is more stable than MLE.
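The condition-number formula $\kappa = (\lambda_{max}/\lambda_{min})^{1/2}$ can be checked numerically against the listed eigenvalues (a quick sketch, not the original code):

```python
import numpy as np

# Eigenvalues of X'X reported for the bank data set
eigs = np.array([3.5658, 1.3654, 1.0109, 0.8126, 0.2022, 0.0364, 0.0066])
kappa = float(np.sqrt(eigs.max() / eigs.min()))
# With the rounded eigenvalues this gives roughly 23.24, matching the
# reported 23.1578 up to rounding of the printed eigenvalues.
```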


Finally, Figure 4.1 plots the MSE values against the parameter $k$ for $0 < k < 1$. According to Figure 4.1, the MSE of LLT decreases for $k < 0.064$ and increases otherwise, and it is always less than that of MLE.

5. Conclusion


In this paper, the logistic version of the Liu-type estimator is considered to overcome the multicollinearity problem in binary logistic regression. Some new methods to obtain the shrinkage parameters are proposed using the arithmetic mean, geometric mean, harmonic mean, median, maximum and minimum functions. A Monte Carlo simulation is designed to evaluate the performance of the estimators using the MSE and MAE. According to the simulation study, the newly proposed methods perform better than MLE. $\hat k_{MED}$, $\hat k_{AM}$ and $\hat k_{GM}$ have the smallest MSE values among the estimators, and increasing the degree of correlation does not seriously affect this first group; thus they are advisable for researchers when the sample size is low and the degree of correlation is high. Moreover, a real data application is presented to give a better understanding of the new methods. Its results are consistent with those of the simulation, showing that the new methods are effective in the presence of multicollinearity.

Acknowledgments: The author wishes to thank the referees and the editor for their helpful suggestions and comments which helped to improve the quality of the paper.


6. References

Alkhamisi, M., Khalaf, G., and Shukur, G. (2006). Some modifications for choosing ridge parameters. Communications in Statistics - Theory and Methods, 35(11), 2005-2020.

Asar, Y., Karaibrahimoğlu, A., and Genç, A. (2014). Modified ridge regression parameters: A comparative Monte Carlo study. Hacettepe Journal of Mathematics and Statistics, 43(5), 827-841.

Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: Some simulations. Communications in Statistics - Theory and Methods, 4(2), 105-123.

Hosmer, D. W., Lemeshow, S., and Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.

Inan, D., and Erdogan, B. E. (2013). Liu-type logistic estimator. Communications in Statistics - Simulation and Computation, 42(7), 1578-1586.

Kibria, B. M. G. (2003). Performance of some new ridge regression estimators. Communications in Statistics - Simulation and Computation, 32(2), 419-435.

Kibria, B. M. G., Mansson, K., and Shukur, G. (2012). Performance of some logistic ridge regression estimators. Computational Economics, 40(4), 401-414.

Liu, K. (2003). Using Liu-type estimator to combat collinearity. Communications in Statistics - Theory and Methods, 32(5), 1009-1020.

Månsson, K., and Shukur, G. (2011). On ridge parameters in logistic regression. Communications in Statistics - Theory and Methods, 40(18), 3366-3381.

Mansson, K., Shukur, G., and Kibria, B. M. G. (2010). On some ridge regression estimators: A Monte Carlo simulation study under different error variances. Journal of Statistics, 17(1), 1-22.

McDonald, G. C., and Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association, 70(350), 407-416.

Muniz, G., and Kibria, B. M. G. (2009). On some ridge regression estimators: An empirical comparisons. Communications in Statistics - Simulation and Computation, 38(3), 621-630.

Muniz, G., Kibria, B. M. G., Mansson, K., and Shukur, G. (2012). On developing ridge regression parameters: A graphical investigation. Sort - Statistics and Operations Research Transactions, 36(2), 115-138.

Newhouse, J. P., and Oman, S. D. (1971). An evaluation of ridge estimators. Rand Corporation (P-716-PR), 1-16.

Schaefer, R. L., Roi, L. D., and Wolfe, R. A. (1984). A ridge logistic estimator. Communications in Statistics - Theory and Methods, 13(1), 99-113.

Theobald, C. (1974). Generalizations of mean square error applied to ridge regression. Journal of the Royal Statistical Society, Series B (Methodological), 103-106.

Trenkler, G., and Toutenburg, H. (1990). Mean squared error matrix comparisons between biased estimators: an overview of recent results. Statistical Papers, 31(1), 165-179.


Table 6.1. MSE values of estimators when p=4

ρ²             0.90                    0.95                    0.99
n          50      100     200     50      100     200     50       100      200
k̂AM     0.4450  0.3711  0.3014  0.4528  0.3512  0.2529   0.3633   0.2728  0.2587
k̂GM     0.3719  0.1958  0.1403  0.4826  0.2847  0.1516   0.5066   0.3342  0.2687
k̂MED    0.4525  0.2370  0.1394  0.5430  0.2859  0.1433   0.6656   0.3342  0.2294
k̂MAX    0.6214  0.5576  0.4471  0.5784  0.4517  0.3594   0.4865   0.3595  0.3262
k̂MIN    2.7500  0.4930  0.2220  3.8414  1.0565  0.3903  14.6935   4.1562  2.2357
k̂HM     1.4720  0.3012  0.1566  2.0150  0.5892  0.2304   6.7202   1.8670  1.0406
k̂max    1.5709  0.4201  0.2004  2.0865  0.6519  0.3703   8.0360   1.8541  2.1851
k̂min    1.8603  0.4168  0.2232  2.4602  0.8543  0.3559   8.0166   3.0219  0.4590
MLE     5.1524  1.0366  0.4344  7.3625  2.3202  0.9001  31.8218  10.1260  5.2321

Table 6.2. MSE values of estimators when p=8

ρ²               0.90                        0.95                         0.99
n           50       100     200       50       100     200        50       100      200
k̂AM      0.6657   0.5403  0.4697    0.6200   0.4565  0.4381     0.5362   0.3702   0.3396
k̂GM      0.4432   0.2861  0.2111    0.4950   0.2990  0.2847     0.9000   0.6993   0.4989
k̂MED     0.4600   0.3004  0.2409    0.5165   0.3233  0.3555     1.0053   0.7764   0.5465
k̂MAX     0.8291   0.7583  0.7077    0.7867   0.6718  0.6482     0.7042   0.5463   0.4756
k̂MIN   121.1414   1.5931  0.8016  247.6495   4.6483  1.4755   631.5407  36.2478   8.0369
k̂HM     30.8679   0.6941  0.4149   61.7599   1.7583  0.7045   194.1220  13.7543   2.8939
k̂max    62.8914   1.2029  0.6667  129.2836   2.9743  1.2956   323.6374  22.8857   5.9353
k̂min    56.4873   0.9990  0.5979  115.2998   2.7501  0.9803   324.4524  21.1242   4.2694
MLE    234.3063   3.1156  1.4587  491.0497   8.7549  2.9104  1081.9348  63.7842  17.4108

Table 6.3. MAE values of estimators when p=4

ρ²             0.90                    0.95                    0.99
n          50      100     200     50      100     200     50      100     200
k̂AM     0.6401  0.5700  0.4962  0.6555  0.5584  0.4561  0.5760  0.4905  0.4706
k̂GM     0.5806  0.4221  0.3561  0.6596  0.5029  0.3656  0.6490  0.5267  0.4745
k̂MED    0.6308  0.4622  0.3560  0.6828  0.4994  0.3555  0.6881  0.5095  0.4415
k̂MAX    0.7755  0.7247  0.6212  0.7471  0.6355  0.5464  0.6769  0.5729  0.5262
k̂MIN    1.2763  0.6143  0.4277  1.5067  0.8649  0.5371  2.5165  1.6344  1.2183
k̂HM     0.9543  0.4890  0.3655  1.1033  0.6621  0.4246  1.6331  1.0653  0.8218
k̂max    1.0061  0.5850  0.4093  1.1036  0.7124  0.5376  1.7716  1.0901  1.2047
k̂min    1.0453  0.5752  0.4300  1.1834  0.7787  0.5380  1.6971  1.2645  0.5826
MLE     1.8801  0.9256  0.6032  2.2780  1.3644  0.8651  4.4159  2.8564  2.0609

Table 6.4. MAE values of estimators when p=8

ρ²              0.90                     0.95                     0.99
n          50       100     200      50      100     200      50      100     200
k̂AM     0.8023   0.7104  0.6518   0.7719  0.6477  0.6307   0.7080  0.5777  0.5465
k̂GM     0.6549   0.5234  0.4496   0.6842  0.5266  0.5122   0.8584  0.7449  0.6347
k̂MED    0.6643   0.5327  0.4755   0.6904  0.5446  0.5709   0.8550  0.7453  0.6613
k̂MAX    0.9039   0.8583  0.8228   0.8773  0.7999  0.7820   0.8209  0.7096  0.6520
k̂MIN    7.2684   1.1006  0.8282  10.5718  1.5676  1.1162  15.9401  4.8263  2.4800
k̂HM     3.6646   0.7394  0.6022   5.2347  0.9661  0.7714   8.6724  2.8663  1.4605
k̂max    4.3433   0.9712  0.7643   6.2724  1.3301  1.0515   9.8672  3.8417  2.0498
k̂min    4.0479   0.8624  0.7092   5.8160  1.1377  0.8927   9.5576  3.2132  1.7054
MLE    10.3230   1.6231  1.1490  15.2626  2.3946  1.6221  22.1051  6.8895  3.8649

Table 6.5. The correlation matrix of the data

        X1       X2       X3       X4       X5       X6       X7
X1    1.0000   0.9275  -0.6491   0.2531   0.5794   0.6134   0.3662
X2    0.9275   1.0000  -0.4592   0.2078   0.5611   0.5605   0.4568
X3   -0.6491  -0.4592   1.0000  -0.3151  -0.3453  -0.2107   0.0174
X4    0.2531   0.2078  -0.3151   1.0000   0.8997   0.1916  -0.0216
X5    0.5794   0.5611  -0.3453   0.8997   1.0000   0.4696   0.1926
X6    0.6134   0.5605  -0.2107   0.1916   0.4696   1.0000  -0.0308
X7    0.3662   0.4568   0.0174  -0.0216   0.1926  -0.0308   1.0000

Table 6.6. The estimated theoretical MSE values of LLT used with new methods and MLE

k̂AM       k̂GM       k̂MED      k̂MAX      k̂MIN      k̂HM       k̂max      k̂min      MLE
126.2055  106.4806  105.4388  137.3958  132.4168  107.4378  132.4168  105.6105  1905.236

Table 6.7. The coefficients, standard errors and corresponding p-values of the model

Coefficients
          k̂AM      k̂GM      k̂MED     k̂MAX     k̂MIN     k̂HM      k̂max     k̂min     MLE
beta1    0.6762   2.3001   2.5540   0.1315   3.1676   3.0838   3.1676   2.4949  -1.6266
beta2    0.6394   2.2933   2.5895   0.1232   4.2304   3.3420   4.2304   2.5186   6.5176
beta3   -0.3125  -1.2300  -1.4214  -0.0586  -3.0354  -1.9930  -3.0354  -1.3743  -6.2600
beta4    0.2557   0.1933   0.1325   0.0694  -0.2162  -0.0361  -0.2162   0.1476  -3.9726
beta5    0.4872   1.0523   1.0959   0.1122   1.1962   1.1632   1.1962   1.0865   5.3006
beta6    0.3668   0.9529   1.0148   0.0786   1.2461   1.1244   1.2461   1.0010   1.9241
beta7    0.3437   1.2875   1.4438   0.0632   2.1603   1.7988   2.1603   1.4071   2.9381

Standard Errors
          k̂AM      k̂GM      k̂MED     k̂MAX     k̂MIN     k̂HM      k̂max     k̂min     MLE
beta1    0.0420   0.1568   0.1795   0.0080   0.4568   0.2503   0.4568   0.1739   2.5793
beta2    0.0408   0.1719   0.2025   0.0076   0.5292   0.3025   0.5292   0.1948   1.9947
beta3    0.0244   0.1262   0.1570   0.0043   0.6404   0.2801   0.6404   0.1490   1.9720
beta4    0.0623   0.1637   0.1774   0.0137   0.2718   0.2108   0.2718   0.1742   3.5039
beta5    0.0572   0.1372   0.1464   0.0128   0.2361   0.1693   0.2361   0.1443   4.0708
beta6    0.0619   0.2147   0.2438   0.0122   0.4580   0.3280   0.4580   0.2367   1.0607
beta7    0.0580   0.2209   0.2535   0.0109   0.4825   0.3467   0.4825   0.2456   0.9318

p-values
          k̂AM      k̂GM      k̂MED     k̂MAX     k̂MIN     k̂HM      k̂max     k̂min     MLE
beta1    0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.2662
beta2    0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0012
beta3    0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0016
beta4    0.0001   0.1229   0.2301   0.0000   0.2158   0.4325   0.2158   0.2013   0.1323
beta5    0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.1007
beta6    0.0000   0.0000   0.0001   0.0000   0.0050   0.0008   0.0050   0.0001   0.0391
beta7    0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0017

Figure 6.1. The estimated MSE values of estimators when p=4

Figure 6.2. The estimated MSE values of estimators when p=4

Figure 6.3. MSE values of LLT and MLE versus the parameter k