A new class of score generating functions for regression models


Statistics & Probability Letters 57 (2002) 205–214

Young Hun Choi (a), Omer Öztürk (b,*)

(a) Department of Statistics, Hanshin University, South Korea 447-791
(b) Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210, USA

Received June 2001; received in revised form December 2001

Abstract

In this paper we introduce a new score generating function for the rank dispersion function in a multiple linear regression model. The score function compares the $r$th and $s$th powers of the tail probabilities of the underlying probability model. We show that the rank estimator of the regression parameter based on the proposed score function converges asymptotically to a multivariate normal distribution. Further, we discuss the selection of appropriate $r$ and $s$ to improve the efficiency of the rank estimate of the regression parameter. It is shown that for right- (left-) skewed distributions the values $r < s$ ($s < r$) provide higher efficiency than the Wilcoxon scores. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Wilcoxon score; Rank estimate; Asymptotic normality; Pitman efficiency; Score selection

1. Introduction

In the last three decades, considerable work on rank-based estimators has been pursued for the linear regression model (see, for example, Jurečkov
* Corresponding author. Tel.: +1-614-292-3346; fax: +1-614-292-2096. E-mail address: [email protected] (O. Öztürk).

0167-7152/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-7152(02)00061-5


$r$th and $s$th power of the left tail probabilities of the underlying distribution. On the other hand, Öztürk and Hettmansperger (1997) state that if there is no knowledge about the outlier pattern, or if the sample has both small and large outliers, the weight function that controls the robustness and efficiency of the estimator must reflect both the right and left tail probabilities. Therefore, Öztürk (2001) considered another class of Mann–Whitney–Wilcoxon test statistics by incorporating both the right and left tail behavior of the underlying distributions.

The main purpose of this paper is to introduce the weight function presented in Öztürk and Hettmansperger (1997) and Öztürk (2001) into the rank estimate of the regression parameters in linear models. The weight function of Öztürk and Hettmansperger (1997) is embedded into the score generating function of the rank dispersion function, which produces results similar to those of the minimum distance estimators.

In Section 2, we propose our new score generating function. We define the dispersion function $D_{r,s}(\beta)$, see for example Eq. (2), and show that its minimizer $\hat\beta_{r,s}$ converges to a multivariate normal distribution. In Section 3, we compare the efficiency of the rank estimator based on our proposed score generating function with the efficiency of the rank estimators based on the Wilcoxon scores and the McKean and Sievers (1989) scores, respectively. In Section 4, we provide guidance for the selection of $r$ and $s$ that provides improvement over the Wilcoxon scores.

2. Assumption and score function

Consider the linear regression model $y_i = \alpha + x_i'\beta + e_i$, $i = 1, \ldots, n$, where $x_i$ and $\beta$ are $p \times 1$ vectors of explanatory variables and unknown regression parameters, respectively, and $e_i$ is a random variable with density $f$ and distribution function $F$ with $F(0) = 1/2$. In this model, we consider the rank regression estimate of the regression parameter $\beta$. In its general form, Jaeckel's (1972) rank dispersion function can be stated as

\[ D(\beta) = \sum_{i=1}^{n} (y_i - x_i'\beta)\, a[R(y_i - x_i'\beta)], \]

where $a(1) \le a(2) \le \cdots \le a(n)$ is a set of scores generated by $a(i) = \phi(i/(n+1))$ and $\sum_{i=1}^{n} a(i) = 0$. We assume that the score generating function $\phi(u)$ is a nondecreasing, square-integrable and bounded function on $(0, 1)$. Under fairly general conditions, the minimizer of $D(\beta)$ produces a robust estimator in $y$-space. The properties of such an estimator are studied in detail for a general score function $\phi(\cdot)$ in Jurečkov

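We now introduce a class of score functions to improve the efficiency of the rank regression estimator. Let

\[ \phi_{r,s}(u) = \frac{1}{\sqrt{\omega_{r,s}}} \left\{ u^{r} - \frac{1}{r+1} - (1-u)^{s} + \frac{1}{s+1} \right\}, \]

\[ a_{r,s}(i) = \frac{1}{\sqrt{\omega_{r,s}}} \left\{ \left(\frac{i}{n+1}\right)^{r} - \frac{1}{r+1} - \left(1 - \frac{i}{n+1}\right)^{s} + \frac{1}{s+1} \right\}, \]

where $r$, $s$ are nonnegative and at least one of them is positive,

\[ \omega_{r,s} = \frac{r^{2}}{(2r+1)(r+1)^{2}} + \frac{s^{2}}{(2s+1)(s+1)^{2}} + \frac{2}{(r+1)(s+1)} - 2\,\frac{\Gamma(r+1)\Gamma(s+1)}{\Gamma(r+s+2)} \qquad (1) \]

and $\Gamma(\cdot)$ is the gamma function. Define the dispersion function

\[ D_{r,s}(\beta) = \sum_{i=1}^{n} e_i\, a_{r,s}[R(e_i)], \qquad (2) \]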

where $R(e_i)$ denotes the rank of $e_i = y_i - x_i'\beta$. Then $\beta$ can be estimated by the rank estimator $\hat\beta_{r,s}$ which minimizes $D_{r,s}(\beta)$. We first look at the existence of such an estimator.

Lemma 1. The function $e_{r,s(1)} = (1/\sqrt{\omega_{r,s}})\,(n+1)^{-r} \sum_{i=1}^{n} e_i [R^{r}(e_i) - \bar{R}(r)]$ is nonnegative and satisfies the triangle inequality, where $\bar{R}(r) = \sum_{i=1}^{n} i^{r}/n$, $e_i = y_i - x_i'\beta$ and $\omega_{r,s}$ is defined in Eq. (1).

By using Lemma 1, we state the following theorem.

Theorem 1. The dispersion function $D_{r,s}(\beta)$ is a nonnegative and convex function of $\beta$, where $\bar{R}(r) = \sum_{i=1}^{n} i^{r}/n$, $\bar{R}(s) = \sum_{i=1}^{n} i^{s}/n$, $e_i = y_i - x_i'\beta$, $\omega_{r,s}$ is defined in Eq. (1) and

\[ D_{r,s}(\beta) = \frac{1}{\sqrt{\omega_{r,s}}} \left[ \frac{1}{(n+1)^{r}} \sum_{i=1}^{n} e_i \{R^{r}(e_i) - \bar{R}(r)\} - \frac{1}{(n+1)^{s}} \sum_{i=1}^{n} e_i \{(n+1-R(e_i))^{s} - \bar{R}(s)\} \right]. \]
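The definitions above translate fairly directly into code. The following is a minimal sketch, not the authors' implementation: it evaluates $\omega_{r,s}$ of Eq. (1), the score generating function, the scores $a_{r,s}(i)$ and the dispersion of Eq. (2). The function names (`omega`, `phi`, `scores`, `dispersion`) are hypothetical, and the residuals are assumed to have no ties.

```python
import math


def omega(r, s):
    """Normalizing constant omega_{r,s} of Eq. (1); needs r, s >= 0, not both 0."""
    beta_fn = math.gamma(r + 1) * math.gamma(s + 1) / math.gamma(r + s + 2)
    return (r**2 / ((2 * r + 1) * (r + 1)**2)
            + s**2 / ((2 * s + 1) * (s + 1)**2)
            + 2 / ((r + 1) * (s + 1))
            - 2 * beta_fn)


def phi(u, r, s):
    """Score generating function phi_{r,s}(u) for 0 < u < 1."""
    return (u**r - 1 / (r + 1) - (1 - u)**s + 1 / (s + 1)) / math.sqrt(omega(r, s))


def scores(n, r, s):
    """Scores a_{r,s}(i) = phi_{r,s}(i / (n + 1)) for i = 1, ..., n."""
    return [phi(i / (n + 1), r, s) for i in range(1, n + 1)]


def dispersion(residuals, r, s):
    """Eq. (2): sum of e_i * a_{r,s}[R(e_i)], assuming no tied residuals."""
    n = len(residuals)
    a = scores(n, r, s)
    # order[k] is the index of the residual with rank k + 1
    order = sorted(range(n), key=lambda i: residuals[i])
    return sum(residuals[i] * a[k] for k, i in enumerate(order))
```

With $r = s = 1$ the constant is $\omega_{1,1} = 1/3$ and `phi` reduces to the Wilcoxon score $\sqrt{12}(u - 1/2)$, which is a quick sanity check on the reconstruction of Eq. (1).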

We note that our score generating function $a_{r,s}(\cdot)$ satisfies condition (A2). Conditions (A1), (A3) and (A4) are the standard conditions for rank regression estimates. Therefore, following the developments in Hettmansperger and McKean (1998) or Heiler and Willers (1988), we state the following theorem.

Theorem 2. Let $\hat\beta_{r,s}$ be a rank estimator which minimizes the dispersion function $D_{r,s}(\beta)$ defined in Theorem 1 and let $\beta_0$ be the true regression parameter value. Then under assumptions (A1)–(A4),

\[ \sqrt{n}\,(\hat\beta_{r,s} - \beta_0) \xrightarrow{d} Z \sim \mathrm{MVN}\!\left(0,\ \frac{\omega_{r,s}}{\tau_{r,s}}\,\Sigma^{-1}\right), \]

where

\[ \omega_{r,s} = \frac{r^{2}}{(2r+1)(r+1)^{2}} + \frac{s^{2}}{(2s+1)(s+1)^{2}} + \frac{2}{(r+1)(s+1)} - 2\,\frac{\Gamma(r+1)\Gamma(s+1)}{\Gamma(r+s+2)}, \]

\[ \tau_{r,s} = \left[ \int \left[ r F^{r-1}(t) + s(1-F(t))^{s-1} \right] f^{2}(t)\, dt \right]^{2} \quad \text{and} \quad \Sigma = \lim_{n\to\infty} n^{-1} X'X. \]
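To make the estimator of Theorem 2 concrete, here is a minimal sketch for the special case of a single explanatory variable. It is an illustration under stated assumptions, not the authors' procedure: the scores are taken in the sample-centered form of Theorem 1, so the intercept drops out of the dispersion, and because the dispersion is convex in the slope it is minimized by a simple ternary search. All names (`centered_scores`, `rank_slope`, `demo_slope`), the search bracket and the simulated data are hypothetical.

```python
import math
import random


def omega(r, s):
    """Normalizing constant omega_{r,s} of Eq. (1)."""
    beta_fn = math.gamma(r + 1) * math.gamma(s + 1) / math.gamma(r + s + 2)
    return (r**2 / ((2 * r + 1) * (r + 1)**2)
            + s**2 / ((2 * s + 1) * (s + 1)**2)
            + 2 / ((r + 1) * (s + 1)) - 2 * beta_fn)


def centered_scores(n, r, s):
    """Scores in the form of Theorem 1; sample centering makes them sum to zero."""
    w = math.sqrt(omega(r, s))
    mean_r = sum(i**r for i in range(1, n + 1)) / n
    mean_s = sum(i**s for i in range(1, n + 1)) / n
    return [((i**r - mean_r) / (n + 1)**r
             - ((n + 1 - i)**s - mean_s) / (n + 1)**s) / w
            for i in range(1, n + 1)]


def dispersion(beta, x, y, a):
    """D_{r,s}(beta): sorted residuals paired with the nondecreasing scores."""
    e = sorted(yi - beta * xi for xi, yi in zip(x, y))
    return sum(ei * ai for ei, ai in zip(e, a))


def rank_slope(x, y, r, s, lo, hi, iters=200):
    """Minimize the convex dispersion over [lo, hi] by ternary search."""
    a = centered_scores(len(x), r, s)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dispersion(m1, x, y, a) <= dispersion(m2, x, y, a):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


def demo_slope(seed=0, n=200):
    """Simulated example: true slope 2, right-skewed (exponential) errors."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.expovariate(1.0) for xi in x]
    return rank_slope(x, y, r=0.05, s=2.9, lo=-10.0, hi=10.0)


if __name__ == "__main__":
    print(demo_slope())
```

For $p > 1$ a general convex optimizer would be needed in place of the one-dimensional search; that choice is outside what the paper specifies.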

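3. Asymptotic efficiencies

In this section, we compare the efficiency of the proposed score function with respect to the Wilcoxon scores. The asymptotic variance of the rank estimate of $\beta$ based on the Wilcoxon scores, denoted here as $v(\hat\beta_{1,1})$, is given by

\[ v(\hat\beta_{1,1}) = \frac{1}{12\left[\int f^{2}(x)\, dx\right]^{2}}\, \Sigma^{-1}. \]

Then the asymptotic relative efficiency of our estimator $\hat\beta_{r,s}$ with respect to $\hat\beta_{1,1}$ is expressed as

\[ \mathrm{ARE}(11; rs) = \left( \frac{|v(\hat\beta_{r,s})|}{|v(\hat\beta_{1,1})|} \right)^{1/p} = \left( \frac{(\omega_{r,s}/\tau_{r,s})^{p}\, |\Sigma^{-1}|}{\left(12\left[\int f^{2}(x)\, dx\right]^{2}\right)^{-p} |\Sigma^{-1}|} \right)^{1/p} = 12 \left[ \int f^{2}(x)\, dx \right]^{2} \frac{\omega_{r,s}}{\tau_{r,s}}. \]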

The asymptotic efficiencies ARE(11; rs), where ARE(11; rs) < 1 implies that the efficiency of our score function is superior to that of the Wilcoxon scores, are discussed below for several distributions: uniform, exponential, double exponential, normal, Cauchy, lognormal, contaminated normal and generalized F-distributions.

3.1. Uniform distribution

Let $f(t) = 1$ for $0 < t < 1$. Then $F(t) = t$. So we have

\[ \tau_{r,s} = 4 \quad \text{and} \quad \mathrm{ARE}(11; rs) = 3\,\omega_{r,s}. \]

We evaluated ARE(11; rs) for several values of $r, s = 0.1(5)0.1$, that is, from 0.1 to 5 in steps of 0.1. These values showed that our estimator performs better than the Wilcoxon score rank estimator for $r, s < 1$. For example, our numerical computations showed that ARE(11; r*s*) = 0.045, where $r^* = s^* = 0.01$. Thus, we should choose $r, s$ as small as possible as long as $0 < r, s < 1$.

3.2. Exponential distribution

Let $f(t) = \exp(-t)$ for $t > 0$. Then $F(t) = 1 - \exp(-t)$. So we have

\[ \tau_{r,s} = \begin{cases} \left( \dfrac{1}{r+1} + \dfrac{s}{s+1} \right)^{2} & \text{if } r > 0, \\[2ex] \left( \dfrac{s}{s+1} \right)^{2} & \text{if } r = 0 \end{cases} \]

and

\[ \mathrm{ARE}(11; rs) = 3\,\frac{\omega_{r,s}}{\tau_{r,s}}. \]

[Figure: surface plot of ARE against r and s.]

Fig. 1. Asymptotic relative efficiencies [$\mathrm{ARE}(\hat\beta_{1,1}; \hat\beta_{r,s}) = (|v(\hat\beta_{r,s})|/|v(\hat\beta_{1,1})|)^{1/p}$] of $\hat\beta_{r,s}$ with respect to $\hat\beta_{1,1}$ for the exponential distribution.

Table 1
Asymptotic relative efficiencies [$\mathrm{ARE}(\hat\beta_{1,1}; \hat\beta_{r,s}) = (|v(\hat\beta_{r,s})|/|v(\hat\beta_{1,1})|)^{1/p}$] of $\hat\beta_{r,s}$ with respect to $\hat\beta_{1,1}$ for some selected right-skewed distributions, where $N_\varepsilon(k, 1) = (1-\varepsilon)N(0, 1) + \varepsilon N(k, 1)$, $r^* = 1 - 0.5|S_3|$, $s^* = 1 + 0.95|S_3|$ and $S_3$ is the skewness coefficient

Model           r = 0.1               r = 0.3               r = 0.5               r = 0.7               (r*, s*)
                s=1    s=2    s=3     s=1    s=2    s=3     s=1    s=2    s=3     s=1    s=2    s=3
Expon.          0.201  0.173  0.145   0.404  0.334  0.280   0.600  0.479  0.397   0.776  0.602  0.493   0.113
Lognor.         0.672  0.496  0.405   0.651  0.513  0.433   0.735  0.580  0.491   0.839  0.653  0.550   0.280

N_0.10(3, 1)    0.956  0.923  0.945   0.949  0.924  0.936   0.958  0.928  0.931   0.973  0.935  0.930   0.924
N_0.15(3, 1)    0.942  0.878  0.879   0.931  0.882  0.877   0.943  0.891  0.880   0.964  0.904  0.886   0.873
N_0.20(3, 1)    0.930  0.849  0.832   0.916  0.853  0.835   0.932  0.867  0.843   0.956  0.884  0.855   0.829
N_0.25(3, 1)    0.921  0.835  0.803   0.904  0.839  0.809   0.923  0.855  0.821   0.952  0.875  0.836   0.802
N_0.30(3, 1)    0.915  0.836  0.795   0.898  0.838  0.801   0.919  0.856  0.815   0.949  0.878  0.832   0.787

N_0.10(5, 1)    0.950  0.895  0.911   0.941  0.898  0.905   0.951  0.905  0.904   0.968  0.915  0.907   0.888
N_0.15(5, 1)    0.933  0.841  0.831   0.918  0.847  0.835   0.933  0.860  0.842   0.957  0.877  0.853   0.808
N_0.20(5, 1)    0.918  0.805  0.774   0.899  0.812  0.783   0.918  0.829  0.796   0.947  0.851  0.813   0.738
N_0.25(5, 1)    0.906  0.785  0.737   0.884  0.792  0.749   0.906  0.813  0.767   0.940  0.839  0.788   0.685
N_0.30(5, 1)    0.897  0.784  0.723   0.875  0.789  0.736   0.900  0.812  0.756   0.937  0.840  0.779   0.653

Fig. 1 displays a surface plot of ARE(11; rs) as a function of $r$ and $s$. Table 1 shows that our procedure has higher efficiency than the Wilcoxon scores if $r$ is decreased and $s$ is increased. Thus, we should choose $r$ as small as possible and $s$ as large as possible, as long as they are positive. We note that $\tau_{r,s}$ has a discontinuity at $r = 0$, and $\tau_{0,s} < \tau_{r,s}$ for $0 < r < 1$ and $s > 1$. Therefore, it is important that $r$ is strictly positive ($r > 0$).
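The closed forms of Sections 3.1 and 3.2 are easy to evaluate directly. The sketch below is a small illustration with hypothetical function names; it computes ARE(11; rs) for the uniform and exponential models from $\omega_{r,s}$ and $\tau_{r,s}$ as given above, and its values for the exponential model agree with the "Expon." row of Table 1 to roughly three decimals.

```python
import math


def omega(r, s):
    """Normalizing constant omega_{r,s} of Eq. (1)."""
    beta_fn = math.gamma(r + 1) * math.gamma(s + 1) / math.gamma(r + s + 2)
    return (r**2 / ((2 * r + 1) * (r + 1)**2)
            + s**2 / ((2 * s + 1) * (s + 1)**2)
            + 2 / ((r + 1) * (s + 1)) - 2 * beta_fn)


def are_uniform(r, s):
    """ARE(11; rs) = 3 * omega_{r,s} for the uniform(0, 1) model (tau_{r,s} = 4)."""
    return 3 * omega(r, s)


def tau_exponential(r, s):
    """tau_{r,s} for the exponential model, including the r = 0 case."""
    if r > 0:
        return (1 / (r + 1) + s / (s + 1))**2
    return (s / (s + 1))**2


def are_exponential(r, s):
    """ARE(11; rs) = 3 * omega_{r,s} / tau_{r,s} for the exponential model."""
    return 3 * omega(r, s) / tau_exponential(r, s)


if __name__ == "__main__":
    # A few (r, s) pairs from the first row of Table 1, plus (r*, s*) = (0.05, 2.9)
    for r, s in [(0.1, 1), (0.1, 2), (0.3, 1), (0.05, 2.9)]:
        print(r, s, round(are_exponential(r, s), 3))
```

Values below 1 indicate that the proposed scores have smaller asymptotic variance than the Wilcoxon scores; at $r = s = 1$ both functions return 1, as they should.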


3.3. Double exponential distribution

Let $f(t) = \exp(-|t|)/2$ for $-\infty < t < \infty$. Then $F(t) = \exp(t)/2$ for $t < 0$ and $F(t) = 1 - \exp(-t)/2$ for $t > 0$. So we have

\[ \tau_{r,s} = \left( \frac{2^{r} - 1}{(r+1)\,2^{r}} + \frac{2^{s} - 1}{(s+1)\,2^{s}} \right)^{2} \quad \text{and} \quad \mathrm{ARE}(11; rs) = \frac{3}{4}\,\frac{\omega_{r,s}}{\tau_{r,s}}. \]

The numerical computations show that there is not much difference between the proposed score function and the Wilcoxon scores. On the other hand, our procedure has slightly higher efficiency than the Wilcoxon scores for $1 < r, s < 2$, especially if both $r$ and $s$ are close to 1.5.

3.4. Normal distribution

Let $f(t) = (1/\sqrt{2\pi}) \exp(-t^{2}/2)$ for $-\infty < t < \infty$. Then

\[ \tau_{r,s} = \left[ \int \left[ r F^{r-1}(t) + s F^{s-1}(t) \right] f^{2}(t)\, dt \right]^{2} \quad \text{and} \quad \mathrm{ARE}(11; rs) = \frac{3}{\pi}\,\frac{\omega_{r,s}}{\tau_{r,s}}. \]

For the normal distribution, we computed ARE(11; rs) for several values of $r, s = 0.1(5)0.1$. The numerical values of ARE(11; rs) show that there is not much difference between the proposed score function and the Wilcoxon scores. On the other hand, for $r, s < 1$ or $r, s > 2$ our estimator is slightly better than the rank estimator based on the Wilcoxon scores.

3.5. Cauchy distribution

Let $f(t) = 1/[\pi(1 + t^{2})]$ for $-\infty < t < \infty$. Then

\[ \tau_{r,s} = \left[ \int \left[ r F^{r-1}(t) + s F^{s-1}(t) \right] f^{2}(t)\, dt \right]^{2} \quad \text{and} \quad \mathrm{ARE}(11; rs) = \frac{3}{\pi^{2}}\,\frac{\omega_{r,s}}{\tau_{r,s}}. \]

Again, as in the double exponential distribution, our estimator outperforms the rank estimator based on the Wilcoxon scores for $1 < r, s < 2$, especially when both $r$ and $s$ are close to 1.5, although the improvement is not very great.
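For models such as the normal, where $\tau_{r,s}$ has no closed form, the integrals can be approximated numerically. The sketch below is one possible approach and an assumption of this illustration, not the computation used in the paper: substituting $u = F(t)$ turns both $\int f^{2}(t)\,dt$ and the integral inside $\tau_{r,s}$ into integrals over $(0, 1)$ of the density-quantile function $g(u) = f(F^{-1}(u))$, which are then evaluated by the midpoint rule. The names `are_numeric`, `g_normal`, `g_laplace` and `are_laplace_closed` are hypothetical.

```python
import math
from statistics import NormalDist


def omega(r, s):
    """Normalizing constant omega_{r,s} of Eq. (1)."""
    beta_fn = math.gamma(r + 1) * math.gamma(s + 1) / math.gamma(r + s + 2)
    return (r**2 / ((2 * r + 1) * (r + 1)**2)
            + s**2 / ((2 * s + 1) * (s + 1)**2)
            + 2 / ((r + 1) * (s + 1)) - 2 * beta_fn)


def are_numeric(g, r, s, m=20000):
    """ARE(11; rs) = 12 (int f^2)^2 omega / tau via the midpoint rule in u = F(t).

    g(u) = f(F^{-1}(u)) is the density-quantile function of the error model.
    """
    h = 1.0 / m
    int_f2 = 0.0     # approximates int f^2(t) dt = int_0^1 g(u) du
    root_tau = 0.0   # approximates the integral inside tau_{r,s}
    for j in range(m):
        u = (j + 0.5) * h
        gu = g(u)
        int_f2 += gu * h
        root_tau += (r * u**(r - 1) + s * (1 - u)**(s - 1)) * gu * h
    return 12 * int_f2**2 * omega(r, s) / root_tau**2


_STD_NORMAL = NormalDist()


def g_normal(u):
    """Density-quantile function of the standard normal."""
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(u))


def g_laplace(u):
    """Density-quantile function of the double exponential: min(u, 1 - u)."""
    return min(u, 1.0 - u)


def are_laplace_closed(r, s):
    """Closed form of Section 3.3 for the double exponential model."""
    tau = ((2**r - 1) / ((r + 1) * 2**r) + (2**s - 1) / ((s + 1) * 2**s))**2
    return 0.75 * omega(r, s) / tau


if __name__ == "__main__":
    print(are_numeric(g_laplace, 1.5, 1.5), are_laplace_closed(1.5, 1.5))
    print(are_numeric(g_normal, 0.5, 0.5))
```

Comparing the numerical value with the closed form of Section 3.3 gives a simple check of the quadrature before applying it to models without a closed form.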


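3.6. Lognormal distribution

Let $f(t) = \exp[-\{\log(t)\}^{2}/2]/(\sqrt{2\pi}\, t)$ for $t > 0$. Then $F(t) = \Phi\{\log(t)\}$, where $\Phi(t)$ is the cdf of the standard normal distribution. So we have

\[ \tau_{r,s} = \left[ \int \left[ r F^{r-1}(t) + s(1-F(t))^{s-1} \right] f^{2}(t)\, dt \right]^{2} \quad \text{and} \quad \mathrm{ARE}(11; rs) = 12 \left[ \int f^{2}(t)\, dt \right]^{2} \frac{\omega_{r,s}}{\tau_{r,s}}. \]

As in the exponential distribution, our estimator outperforms the rank estimator based on the Wilcoxon scores for small $r < 1$ and large $s > 1$; see, for example, Table 1.

3.7. Contaminated normal distribution

Let $N_\varepsilon(y - k) = (1-\varepsilon)\Phi(y) + \varepsilon\Phi(y - k)$ and $N_\varepsilon(y/k) = (1-\varepsilon)\Phi(y) + \varepsilon\Phi(y/k)$. Then we have

\[ \tau_{r,s}(\varepsilon, k) = \left[ \int \left[ r N_\varepsilon^{r-1}(t-k) + s(1 - N_\varepsilon(t-k))^{s-1} \right] n_\varepsilon^{2}(t-k)\, dt \right]^{2} \]

and

\[ \tau^{*}_{r,s}(\varepsilon, k) = \left[ \int \left[ r N_\varepsilon^{r-1}(t/k) + s(1 - N_\varepsilon(t/k))^{s-1} \right] n_\varepsilon^{2}(t/k)\, dt \right]^{2}. \]

The asymptotic relative efficiencies for the location and scale shifts are, respectively,

\[ \mathrm{ARE}_{\varepsilon,k}(11; rs) = 12 \left[ \int n_\varepsilon^{2}(t-k)\, dt \right]^{2} \frac{\omega_{r,s}}{\tau_{r,s}(\varepsilon, k)} \]

and

\[ \mathrm{ARE}^{*}_{\varepsilon,k}(11; rs) = 12 \left[ \int n_\varepsilon^{2}(t/k)\, dt \right]^{2} \frac{\omega_{r,s}}{\tau^{*}_{r,s}(\varepsilon, k)}. \]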

The evaluations of $\mathrm{ARE}_{\varepsilon,k}(11; rs)$ and $\mathrm{ARE}^{*}_{\varepsilon,k}(11; rs)$, as in the normal distribution, require numerical computation, and they are presented in Tables 1 and 2 for selected values of $\varepsilon = 0.1(0.3)0.05$, $k = -5, -3, 3, 5$ and $r, s$.

3.8. Generalized F-distribution

In this section, we evaluate the efficiency of our score function with respect to the optimal scores of the generalized F-distribution (McKean and Sievers, 1989). Let $F$ be a random variable having an F distribution with degrees of freedom $2m_1$ and $2m_2$. Then $T = \log(F)$ is said to have a generalized F-distribution (GF($2m_1$, $2m_2$)) with degrees of freedom $2m_1$ and $2m_2$. The generalized F-distribution is a very flexible distribution that covers a variety of shape and tail behaviors. It produces symmetric distributions if $m_1 = m_2$, positively- (negatively-) skewed distributions if $m_1 > m_2$ ($m_1 < m_2$) and heavy- (light-) tailed distributions if $m_1, m_2 < 1$ ($m_1, m_2 > 1$). McKean and Sievers (1989) adaptively estimated $m_1$ and $m_2$ to construct a score function designed to reflect the shape of the underlying probability models. While their score function is optimal for GF($2m_1$, $2m_2$) for given values of $m_1$ and $m_2$, it may not be optimal for other distributions. On the other hand, the adaptive nature of the score function provides improvement over a variety of probability models. The asymptotic relative efficiency of our estimator $\hat\beta_{r,s}$ with respect to McKean and Sievers's estimator, denoted as $\hat\beta_{opt}$, simplifies to

\[ \mathrm{ARE}(opt; rs) = \left( \frac{|v(\hat\beta_{r,s})|}{|v(\hat\beta_{opt})|} \right)^{1/p} = \frac{\omega_{r,s}}{\tau_{r,s}} \bigg/ \frac{m_1 + m_2 + 1}{m_1 m_2}. \]

We note that ARE(opt; rs) > 1 at GF($2m_1$, $2m_2$) since $\hat\beta_{opt}$ is constructed based on the optimal scores of GF($2m_1$, $2m_2$). On the other hand, our numerical computations showed that for some careful selections of $r$ and $s$, ARE(opt; rs) is practically equal to one. We discuss the selection of $r$ and $s$ in Section 4.

Table 2
Asymptotic relative efficiencies [$\mathrm{ARE}(\hat\beta_{1,1}; \hat\beta_{r,s}) = (|v(\hat\beta_{r,s})|/|v(\hat\beta_{1,1})|)^{1/p}$] of $\hat\beta_{r,s}$ with respect to $\hat\beta_{1,1}$ for some selected left-skewed distributions, where $N_\varepsilon(k, 1) = (1-\varepsilon)N(0, 1) + \varepsilon N(k, 1)$, $r^* = 1 + 0.95|S_3|$, $s^* = 1 - 0.5|S_3|$ and $S_3$ is the skewness coefficient

Model           s = 0.1               s = 0.3               s = 0.5               s = 0.7               (r*, s*)
                r=1    r=2    r=3     r=1    r=2    r=3     r=1    r=2    r=3     r=1    r=2    r=3
N_0.10(-3, 1)   0.956  0.923  0.945   0.949  0.924  0.936   0.958  0.928  0.931   0.973  0.935  0.930   0.924
N_0.15(-3, 1)   0.942  0.878  0.879   0.931  0.882  0.877   0.943  0.891  0.880   0.964  0.904  0.886   0.873
N_0.20(-3, 1)   0.930  0.849  0.832   0.916  0.853  0.835   0.932  0.867  0.843   0.956  0.884  0.855   0.829
N_0.25(-3, 1)   0.921  0.835  0.803   0.904  0.839  0.809   0.923  0.855  0.821   0.952  0.875  0.836   0.802
N_0.30(-3, 1)   0.915  0.836  0.795   0.898  0.838  0.801   0.919  0.856  0.815   0.949  0.878  0.832   0.787

N_0.10(-5, 1)   0.950  0.895  0.911   0.941  0.898  0.905   0.951  0.905  0.904   0.968  0.915  0.907   0.888
N_0.15(-5, 1)   0.933  0.841  0.831   0.918  0.847  0.835   0.933  0.860  0.842   0.957  0.877  0.853   0.808
N_0.20(-5, 1)   0.918  0.805  0.774   0.899  0.812  0.783   0.918  0.829  0.796   0.947  0.851  0.813   0.738
N_0.25(-5, 1)   0.906  0.785  0.737   0.884  0.792  0.749   0.906  0.813  0.767   0.940  0.839  0.788   0.685
N_0.30(-5, 1)   0.897  0.784  0.723   0.875  0.789  0.736   0.900  0.812  0.756   0.937  0.840  0.779   0.653

4. Selection of r and s

In this section, we explore the selection of $r$ and $s$ that provides improvement over the Wilcoxon scores. The numerical computations of ARE(11; rs) in Section 3 clearly indicate that there is an association between the selection of $r$, $s$ and the tail behavior of the underlying distribution. For example, we select $r < 1$, $s > 1$ for right-skewed distributions and $r > 1$, $s < 1$ for left-skewed distributions. We also note that the generalized F-distribution can generate a wide variety of distributions having different degrees of skewness. In order to find a defining association between the degree of skewness and the selection of $r, s$, we select different $m_1$ and $m_2$ to produce selected skewness values $S_3 = 0(2)0.25$ (from 0 to 2 in steps of 0.25), where $S_3 = E(x - \mu)^{3}/\sigma^{3}$. For each $S_3$, we find the optimal $r, s$ that produce the smallest


variance for our estimator. Finally, we fit a regression model to predict the optimal values of $r, s$ for a given value of $S_3$. Our regression fit provides the following prediction formula:

\[ r^* = 1 - 0.5|S_3|, \qquad s^* = 1 + 0.95|S_3|, \]

where we set $r^* = 0.05$ if $r^* = 0$, which is the discontinuity point of $\tau_{r,s}$ in the exponential distribution. Similar computations were performed for left-skewed distributions. In this case, we found that

\[ r^* = 1 + 0.95|S_3|, \qquad s^* = 1 - 0.5|S_3|, \]

and again we set $s^* = 0.05$ if $s^* = 0$.

Table 1 shows the asymptotic Pitman efficiency with respect to Wilcoxon scores for right-skewed distributions such as the exponential, lognormal and contaminated normal, $N_\varepsilon(k, 1) = (1-\varepsilon)N(0, 1) + \varepsilon N(k, 1)$, $\varepsilon = 0.1(0.3)0.05$, $k = 3, 5$. The computations were made for all $r, s = 0.1(5)0.1$ and for $r^*$, $s^*$, but we only report selected values. The results of Table 1 can be summarized as follows. For right-skewed distributions, such as the exponential, lognormal and contaminated normal, Table 1 indicates that if $r < 1$ and $s > 1$, $v(\hat\beta_{r,s})$ is much smaller than $v(\hat\beta_{1,1})$. In particular, for a strongly right-skewed distribution, the proposed score generating function provides improved efficiency over the Wilcoxon scores for small values of $r$ and large values of $s$. In this regard, our prediction formulas for $r^*$ and $s^*$ work very well to determine the values of $r$ and $s$. For example, the values of $r^*$ and $s^*$ provide the smallest $\mathrm{ARE}(\hat\beta_{1,1}; \hat\beta_{r^*,s^*})$ for all distributions in Tables 1 and 2.

Table 2 shows the asymptotic Pitman efficiencies with respect to Wilcoxon scores for the left-skewed distributions. It indicates that results similar to those in Table 1 hold for the left-skewed distributions as well. Again the prediction formulas for $r^*$ and $s^*$ work very well to identify the values of $r$ and $s$.
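The prediction formulas can be wrapped in a small helper. The sketch below is illustrative only: `sample_skewness` uses the plain moment estimator of $S_3$ (the paper does not say which estimator to use), and `select_r_s` applies the 0.05 replacement whenever the smaller exponent would be zero or negative, where the "or negative" part is an assumption added here for skewness values beyond 2 in absolute value. Both function names are hypothetical.

```python
def sample_skewness(values):
    """Moment estimator of S3 = E(x - mu)^3 / sigma^3 (one possible choice)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean)**2 for v in values) / n
    m3 = sum((v - mean)**3 for v in values) / n
    return m3 / m2**1.5


def select_r_s(s3, floor=0.05):
    """Return (r*, s*) from the prediction formulas of Section 4.

    Right-skewed (s3 >= 0): r* = 1 - 0.5|S3|, s* = 1 + 0.95|S3|.
    Left-skewed  (s3 <  0): r* = 1 + 0.95|S3|, s* = 1 - 0.5|S3|.
    """
    small = 1 - 0.5 * abs(s3)
    if small <= 0:
        # The paper sets 0.05 when the value is exactly 0; treating negative
        # values the same way is an assumption of this sketch.
        small = floor
    large = 1 + 0.95 * abs(s3)
    return (small, large) if s3 >= 0 else (large, small)


if __name__ == "__main__":
    print(select_r_s(2.0))    # exponential-like skewness
    print(select_r_s(-1.0))   # a left-skewed case
    print(select_r_s(0.0))    # symmetric case: Wilcoxon scores, r = s = 1
```

In practice the skewness would be estimated from the residuals of an initial Wilcoxon fit, as suggested in the Conclusions.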
The discussion in Section 3 indicates that the proposed score function does not provide great improvement for symmetric distributions, with the exception of the uniform distribution, in which case we suggest $r = s = 0.01$. Therefore, for symmetric distributions we recommend using Wilcoxon scores.

5. Conclusions

In this paper, we introduce a new class of score-generating functions for rank regression estimates in a multiple linear regression model. The new class of score functions compares the $r$th and $s$th powers of the tail probabilities of the underlying distribution. For the implementation of the procedure, $r$ and $s$ must be estimated. Therefore, in practice we suggest first fitting a regression model with Wilcoxon scores and estimating the skewness, $S_3$, of the error distribution. Then, we recommend using $r^* = 1 - 0.5|S_3|$ and $s^* = 1 + 0.95|S_3|$ for right-skewed distributions and $r^* = 1 + 0.95|S_3|$ and $s^* = 1 - 0.5|S_3|$ for left-skewed distributions to construct the score function. We show that these selections provide very good efficiency with respect to Wilcoxon scores.

Acknowledgements

Young Hun Choi's work was completed during his sabbatical leave in the Department of Statistics, The Ohio State University.


References

Ahmad, I.A., 1996. A class of Mann–Whitney–Wilcoxon type statistics. Amer. Statist. 50, 324–327.
Heiler, S., Willers, R., 1988. Asymptotic normality of R-estimation in the linear model. Statistics 19, 173–184.
Hettmansperger, T.P., McKean, J.W., 1983. A geometric interpretation of inferences based on ranks in the linear model. J. Amer. Statist. Assoc. 78, 885–893.
Hettmansperger, T.P., McKean, J.W., 1998. Robust Nonparametric Statistical Methods. Wiley, New York.
Jaeckel, L.A., 1972. Estimating regression coefficients by minimizing the dispersion of the residuals. Ann. Math. Statist. 43, 1449–1458.
Jurečkov