Empirical likelihood for partially linear single-index models with missing observations



Journal Pre-proof Empirical likelihood for partially linear single-index models with missing observations Liugen Xue, Jinghua Zhang

PII: S0167-9473(19)30232-4
DOI: https://doi.org/10.1016/j.csda.2019.106877
Reference: COMSTA 106877

To appear in: Computational Statistics and Data Analysis

Received date: 15 January 2019; Revised date: 11 September 2019; Accepted date: 16 October 2019

Please cite this article as: L. Xue and J. Zhang, Empirical likelihood for partially linear single-index models with missing observations. Computational Statistics and Data Analysis (2019), doi: https://doi.org/10.1016/j.csda.2019.106877.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier B.V. All rights reserved.


Liugen Xue and Jinghua Zhang

College of Applied Sciences, Beijing University of Technology, Beijing 100124, China

Abstract

In this paper, we study the empirical likelihood for a partially linear single-index model in which the response and/or a subset of the covariates are missing at random. Using bias correction and imputation, we propose two empirical log-likelihood ratios and show that each is asymptotically chi-squared. We construct two maximum empirical likelihood estimators of the index coefficients and an estimator of the link function, and derive their asymptotic distributions and the optimal convergence rate. We prove that our methods yield asymptotically equivalent estimators of the index coefficients. An important feature of our methods is their ability to handle a missing response and/or partially missing covariates. In addition, we study estimation and empirical likelihood for two special cases, the single-index model and the partially linear model with observations missing at random. A simulation study indicates that the proposed methods are comparable in terms of bias and standard deviation, as well as coverage probabilities and average areas (lengths) of confidence regions (intervals). The proposed methods are illustrated with a real data example.

Key words and phrases: Missing at random; Imputation method; Confidence region; Bias-correction; Empirical likelihood.


AMS subject classification: Primary 62J05; secondary 62G15.

Corresponding author: Liugen Xue


E-mail address: [email protected]


1 Introduction

Consider the partially linear single-index model (PLSIM)

$$Y = g(\beta^T X) + \theta^T Z + \varepsilon, \qquad (1.1)$$

where $Y$ is a scalar response variable, $X$ and $Z$ are $p \times 1$ and $q \times 1$ covariate vectors, respectively, $\beta$ and $\theta$ are unknown parameter vectors, $g(\cdot)$ is an unknown smooth link function, and $\varepsilon$ is a random error with $E(\varepsilon \mid X, Z) = 0$ almost surely. The restriction $\|\beta\| = 1$ ensures identifiability, where $\|\cdot\|$ denotes the Euclidean norm.
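To make model (1.1) concrete, the following sketch simulates data from a PLSIM and applies a response-missing-at-random mechanism of the kind formalized in the next paragraph (scenario (ii)). The link $g(u) = \sin(\pi u/2)$, the covariate and error distributions, the logistic selection probability and the name `simulate_plsim` are hypothetical choices for illustration only; they are not the design used in the paper's simulation study.

```python
import numpy as np

def simulate_plsim(n, beta, theta, rng):
    """Draw (X, Z, Y, delta) from model (1.1) with responses missing at random.

    Hypothetical design, for illustration only: g(u) = sin(pi u / 2),
    X ~ Uniform(-1, 1)^p, Z ~ N(0, I_q), eps ~ N(0, 0.2^2), and
    P(delta = 1 | X, Z, Y) = 1 / (1 + exp(-(1 + 0.5 Z_1))), which depends
    only on the always-observed V = (X, Z), as MAR requires.
    """
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)                 # enforce ||beta|| = 1
    theta = np.asarray(theta, dtype=float)
    X = rng.uniform(-1.0, 1.0, size=(n, beta.size))
    Z = rng.standard_normal(size=(n, theta.size))
    eps = 0.2 * rng.standard_normal(n)
    Y = np.sin(np.pi * (X @ beta) / 2.0) + Z @ theta + eps
    pi_V = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * Z[:, 0])))
    delta = (rng.uniform(size=n) < pi_V).astype(int)
    Y_obs = np.where(delta == 1, Y, np.nan)            # scenario (ii): only Y missing
    return X, Z, Y_obs, delta, pi_V
```

Missing responses are stored as `NaN`, so any later step must give them zero weight, as the factor $\delta_i/\pi(V_i)$ does in the methods below.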

Suppose that $\{X_i, Z_i, Y_i\}_{i=1}^n$ is an independent and identically distributed sample from $(X, Z, Y)$. Consider the vector $(V_i^T, V_i^{cT})^T$ formed by rearranging the elements of $(X_i^T, Z_i^T, Y_i)^T$ so that $V_i$, a non-null $d$-dimensional vector with $0 < d \le p + q$, is observed for all $i$, while $V_i^c$ contains the elements that may be unobserved for some $i$. Let $\delta_i = 1$ if all values in $V_i^c$ are observed, and $\delta_i = 0$ otherwise. Throughout this paper, we assume that $\{V_i, \delta_i\}$ and $\{V, \delta\}$ are identically distributed and that the data are missing at random (MAR), that is,

$$P(\delta = 1 \mid X, Z, Y) = P(\delta = 1 \mid V) \equiv \pi(V),$$

where $\pi(v)$ is an unknown function, called the selection probability function. We consider the following four missing-data scenarios: (i) data missing from all of the covariates while the response is fully observed; (ii) data missing from the response only; (iii) data missing from the response and a subset of the covariates; (iv) data missing from a subset of the covariates only. We do not consider the case in which all of the covariates and the response are missing, because a non-null $V_i$ is required to estimate $\pi(v)$ by kernel methods.

Missing-data problems have received considerable attention. Robins et al. (1994) proposed a class of semiparametric efficient estimators, based on inverse probability weighted estimating equations, that are consistent for the parameter vector of the conditional mean model when the data are missing at random in the sense of Rubin (1976) and the missingness probabilities are either known or can be parametrically modeled. Wang and Rao (2002a) developed an adjusted empirical likelihood approach to inference for the mean of the response variable, and proved a nonparametric version of Wilks' theorem by showing that the adjusted empirical log-likelihood ratio has an asymptotic standard chi-squared distribution. Liang et al. (2004) studied estimation in the partially linear model with missing covariates and developed new methods for estimating the regression coefficients. Wang et al. (2004) and Wang (2009) developed estimators and empirical likelihood ratios for the response mean and the regression coefficients in the partially linear model with, respectively, the response and the covariates missing at random, and established the asymptotic properties of the proposed estimators and ratios. Qin and Zhang (2006) used the empirical likelihood method to obtain a constrained empirical likelihood estimator of the mean response under the assumption that responses are missing at random; the empirical-likelihood-based estimators enjoy the double-robustness property, being asymptotically unbiased and efficient even if the true regression function is not completely known. Qin et al. (2009) proposed a unified empirical likelihood approach to missing data problems and explored the use of empirical likelihood to combine unbiased estimating equations effectively when the number of estimating equations exceeds the number of unknown parameters; their method achieves semiparametric efficiency if the probability of missingness is correctly specified. Related works include Wang and Rao (2001, 2002b), Wang (2004), Liang (2008) and Xue (2009a, b). It is worth mentioning that all of the above references handle either missing covariates or a missing response, but not both.

For single-index models without missing data, many authors have carried out research. Xie et al. (2006) derived asymptotic distributions for two estimators of the single-index model. Li et al. (2017) proposed a novel functional varying-coefficient single-index model for the regression analysis of functional response data on a set of covariates of interest, and developed an efficient procedure that iteratively estimates the varying coefficient functions, link functions, index parameter vectors and the covariance function of individual functions. Related works include Carroll et al. (1997), Yu and Ruppert (2002), Zhu and Xue (2006), Xia and Härdle (2006) and Wang et al. (2010). For single-index models with missing data, Xue (2013) and Xue and Lian (2016) studied estimation and empirical likelihood for single-index models with missing covariates and missing responses, respectively: they proved asymptotic normality of the estimators of the index coefficients and the link function, gave the optimal convergence rate of the link function estimator, and proposed a class of empirical likelihood ratios each of which is asymptotically chi-squared. Guo et al. (2015) investigated empirical-likelihood-based inference for constructing confidence intervals and regions of the parameters of interest in single-index models with covariates missing at random, and showed that their inverse probability weighted empirical likelihood ratio is asymptotically standard chi-squared. However, the methods in the above references are not designed to handle a missing response together with partially missing covariates.

In this paper, we are interested in estimating $\beta$, $\theta$ and the unknown function $g(u)$, and in constructing confidence regions of $(\beta, \theta)$ for model (1.1) under the missing-data scenarios (i)–(iv) above. Bias correction and imputation are used to construct confidence regions for $(\beta, \theta)$. We construct two empirical log-likelihood ratios for $(\beta, \theta)$ and show that each is asymptotically chi-squared. We also construct a class of estimators of $\beta$, $\theta$ and $g(u)$. We prove that our methods yield asymptotically equivalent estimators of $\beta$ and $\theta$ that achieve the desirable asymptotic properties of unbiasedness, normality and $\sqrt{n}$-consistency, and that the estimator of $g(u)$ is asymptotically normal and attains a uniform convergence rate. These results can be used directly to construct confidence regions of $(\beta, \theta)$. The proposed methods have the following features. (a) By using centered covariates and imputing missing values, the resulting empirical likelihood ratio has an asymptotically chi-squared distribution; this achieves the bias correction of the empirical likelihood ratio function. (b) Because of the bias correction, existing data-driven algorithms remain valid for selecting an optimal bandwidth to estimate $g(\cdot)$ and its derivative, and undersmoothing in estimating $g(\cdot)$ is avoided. (c) The proposed methods can handle missing responses and missing covariates simultaneously, which extends their applicability in practice. We also note that the empirical likelihood method introduced by Owen (1988) does not involve a plug-in estimator of the asymptotic variance, and the resulting confidence regions do not require prior constraints on their shape.

The rest of this paper is organized as follows. Section 2 describes the methodology for constructing the empirical likelihood ratios and the estimators of $\beta$, $\theta$ and $g(\cdot)$. Section 3 establishes the asymptotic properties of the proposed methods. Section 4 reports the results of simulation studies and a real data example. Section 5 gives concluding remarks. The proofs of the main results are relegated to the Appendix.

2 Methodology

2.1 Weighted empirical likelihood

Since $\|\beta\| = 1$, to use this information we adopt the popular delete-one-component method. Let $\beta = (\beta_1, \ldots, \beta_p)^T$ and let $\beta^{(r)} = (\beta_1, \ldots, \beta_{r-1}, \beta_{r+1}, \ldots, \beta_p)^T$ be the $(p-1)$-dimensional vector obtained by removing the $r$th component $\beta_r$ from $\beta$. Without loss of generality, assume that the $r$th component is positive (the negative case is analogous), so that $\beta_r = (1 - \|\beta^{(r)}\|^2)^{1/2}$. Then $\|\beta^{(r)}\| < 1$, so $\beta$ is an infinitely differentiable function of $\beta^{(r)}$ in a neighbourhood of the true parameter $\beta^{(r)}$, with Jacobian matrix

$$J_{\beta^{(r)}} = \frac{\partial \beta}{\partial \beta^{(r)}} = (\gamma_1, \ldots, \gamma_p)^T,$$

where $\gamma_s$ ($1 \le s \le p$, $s \ne r$) is a $(p-1)$-dimensional unit vector with $s$th component 1, and $\gamma_r = -(1 - \|\beta^{(r)}\|^2)^{-1/2}\beta^{(r)}$. We then define the auxiliary random vector

$$\zeta_i(\beta^{(r)}, \theta) = \frac{\delta_i}{\pi(V_i)}\{Y_i - \theta^T Z_i - g(\beta^T X_i)\}\begin{pmatrix} g'(\beta^T X_i) J_{\beta^{(r)}}^T X_i \\ Z_i \end{pmatrix}, \qquad (2.1)$$

where $g'(\cdot)$ is the derivative of $g(\cdot)$. If $\beta$ and $\theta$ are the true parameters, then $E\{\zeta_i(\beta^{(r)}, \theta)\} = 0$, because $E\{\delta/\pi(V) \mid X, Z, Y\} = 1$ under MAR and $E(\varepsilon \mid X, Z) = 0$; this can be used to construct an empirical likelihood ratio for $(\beta^{(r)}, \theta)$. However, this ratio cannot be used directly for inference on $(\beta^{(r)}, \theta)$ because $\zeta_i(\beta^{(r)}, \theta)$ contains the unknowns $\pi(V_i)$, $g(\beta^T X_i)$ and $g'(\beta^T X_i)$; a natural approach is to replace them with estimators. For a given kernel function $K^*(v)$ on $\mathbb{R}^d$ and a bandwidth $h_1 = h_1(n)$ with $0 < h_1 \to 0$, the estimator of $\pi(v)$ is defined as

$$\hat\pi(v) = \sum_{i=1}^n W_{ni}^*(v)\,\delta_i, \qquad (2.2)$$

where

$$W_{ni}^*(v) = K^*\Big(\frac{V_i - v}{h_1}\Big) \Big/ \sum_{j=1}^n K^*\Big(\frac{V_j - v}{h_1}\Big). \qquad (2.3)$$
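A minimal sketch of the kernel estimator (2.2)–(2.3), assuming a Gaussian product kernel for $K^*$; the paper only requires a kernel satisfying (C4), and the name `pi_hat` is ours. Because the weights $W_{ni}^*(v)$ are normalized, the kernel's normalizing constant cancels.

```python
import numpy as np

def pi_hat(v, V, delta, h1):
    """Kernel estimate of pi(v) = P(delta = 1 | V = v) as in (2.2)-(2.3).

    v: (m, d) evaluation points; V: (n, d) always-observed vectors V_i;
    delta: (n,) 0/1 indicators; h1: bandwidth. K* is assumed Gaussian.
    """
    V = np.asarray(V, dtype=float).reshape(len(delta), -1)
    v = np.asarray(v, dtype=float).reshape(-1, V.shape[1])
    diff = (V[None, :, :] - v[:, None, :]) / h1      # (V_i - v) / h1, shape (m, n, d)
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))      # K*((V_i - v)/h1) up to a constant
    W = K / K.sum(axis=1, keepdims=True)              # W*_{ni}(v); each row sums to 1
    return W @ np.asarray(delta, dtype=float)         # sum_i W*_{ni}(v) delta_i
```

The estimate is a locally weighted proportion of complete cases, so it always lies in $[0, 1]$; condition (C8) additionally asks that the true $\pi(v)$ be bounded away from zero.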

For given $\beta$ and $\theta$, we estimate $g(u)$ and $g'(u)$ by the local linear fitting method of Fan and Gijbels (1996). Let $K(\cdot)$ be a kernel function on $\mathbb{R}$ and $h_2 = h_2(n)$ a bandwidth sequence tending to 0. The estimators of $g(u)$ and $g'(u)$ are defined as

$$\hat g(u; \beta, \theta) = \sum_{i=1}^n W_{ni}(u; \beta)(Y_i - \theta^T Z_i) \qquad (2.4)$$

and

$$\hat g'(u; \beta, \theta) = \sum_{i=1}^n \widetilde W_{ni}(u; \beta)(Y_i - \theta^T Z_i), \qquad (2.5)$$

where

$$W_{ni}(u; \beta) = \frac{n^{-1}\{\delta_i/\hat\pi(V_i)\} K_{h_2}(\beta^T X_i - u)\{S_{n,2}(u; \beta) - (\beta^T X_i - u) S_{n,1}(u; \beta)\}}{S_{n,0}(u; \beta) S_{n,2}(u; \beta) - S_{n,1}^2(u; \beta)},$$

$$\widetilde W_{ni}(u; \beta) = \frac{n^{-1}\{\delta_i/\hat\pi(V_i)\} K_{h_2}(\beta^T X_i - u)\{(\beta^T X_i - u) S_{n,0}(u; \beta) - S_{n,1}(u; \beta)\}}{S_{n,0}(u; \beta) S_{n,2}(u; \beta) - S_{n,1}^2(u; \beta)},$$

$$S_{n,l}(u; \beta) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i}{\hat\pi(V_i)}(\beta^T X_i - u)^l K_{h_2}(\beta^T X_i - u), \qquad l = 0, 1, 2,$$

and $K_{h_2}(\cdot) = h_2^{-1} K(\cdot/h_2)$. Clearly, $\hat g(u; \beta, \theta)$ can be written as $\hat g(u; \beta, \theta) = \hat g_1(u; \beta) - \theta^T \hat g_2(u; \beta)$, where

$$\hat g_1(u; \beta) = \sum_{i=1}^n W_{ni}(u; \beta) Y_i \quad\text{and}\quad \hat g_2(u; \beta) = \sum_{i=1}^n W_{ni}(u; \beta) Z_i \qquad (2.6)$$

are the estimators of $g_1(u) = E(Y \mid \beta^T X = u)$ and $g_2(u) = E(Z \mid \beta^T X = u)$, respectively.
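The weights in (2.4)–(2.5) can be computed directly. The sketch below assumes the Epanechnikov kernel $K(t) = 0.75(1 - t^2)$ on $(-1, 1)$, which satisfies (C3); `local_linear_weights` and `g_hat` are hypothetical helper names. Missing responses stored as `NaN` receive zero weight through $\delta_i/\hat\pi(V_i)$, and empty smoothing windows (a zero denominator) are not handled.

```python
import numpy as np

def local_linear_weights(u, index, w_ipw, h2):
    """Weights W_ni(u; beta) and W~_ni(u; beta) of (2.4)-(2.5).

    u: evaluation points (m,); index: beta^T X_i (n,); w_ipw: delta_i / pi_hat(V_i) (n,).
    K is assumed to be the Epanechnikov kernel 0.75 (1 - t^2) on (-1, 1).
    Returns two (m, n) arrays; each row of W sums to 1.
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = index[None, :] - u[:, None]                             # beta^T X_i - u
    Kh = 0.75 * np.clip(1.0 - (t / h2) ** 2, 0.0, None) / h2    # K_{h2}(beta^T X_i - u)
    base = w_ipw[None, :] * Kh / index.size                     # n^{-1} {delta_i / pi_hat} K_{h2}
    S0 = base.sum(axis=1, keepdims=True)                        # S_{n,0}
    S1 = (base * t).sum(axis=1, keepdims=True)                  # S_{n,1}
    S2 = (base * t ** 2).sum(axis=1, keepdims=True)             # S_{n,2}
    den = S0 * S2 - S1 ** 2
    W = base * (S2 - t * S1) / den                              # weights for g(u)
    W_tilde = base * (t * S0 - S1) / den                        # weights for g'(u)
    return W, W_tilde

def g_hat(u, index, Y, Z, theta, w_ipw, h2):
    """Return (g_hat(u; beta, theta), g_hat'(u; beta, theta)) from (2.4)-(2.5)."""
    W, W_tilde = local_linear_weights(u, index, w_ipw, h2)
    resp = np.nan_to_num(Y - Z @ theta)      # Y_i - theta^T Z_i; NaN rows carry weight 0
    return W @ resp, W_tilde @ resp
```

Since $\sum_i W_{ni}(u; \beta) = 1$, $\sum_i W_{ni}(u; \beta)(\beta^T X_i - u) = 0$, $\sum_i \widetilde W_{ni}(u; \beta) = 0$ and $\sum_i \widetilde W_{ni}(u; \beta)(\beta^T X_i - u) = 1$, the estimator reproduces a linear function of $\beta^T X$ and its slope exactly, which is a convenient check of an implementation.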

Replacing $\pi(V_i)$, $g(\beta^T X_i)$ and $g'(\beta^T X_i)$ in $\zeta_i(\beta^{(r)}, \theta)$ by $\hat\pi(V_i)$, $\hat g(\beta^T X_i; \beta, \theta)$ and $\hat g'(\beta^T X_i; \beta, \theta)$ yields an auxiliary random vector from which an empirical likelihood ratio for $(\beta^{(r)}, \theta)$ can be constructed. However, this approach requires undersmoothing the estimators of the nonparametric component $g(\cdot)$ and its derivative $g'(\cdot)$ to obtain the asymptotic distribution of the ratio, which makes bandwidth selection difficult. We therefore propose a bias correction that does not require undersmoothing. Using conditional centering of $X_i$ and $Z_i$ given $\beta^T X_i$, we introduce the auxiliary random vector

$$\hat\eta_{i,W}(\beta^{(r)}, \theta) = \frac{\delta_i}{\hat\pi(V_i)}\,\xi_i^*(\beta^{(r)}, \theta), \qquad (2.7)$$

where $\hat\pi(\cdot)$ is defined in (2.2) and

$$\xi_i^*(\beta^{(r)}, \theta) = w(\beta^T X_i)\{Y_i - \theta^T Z_i - \hat g(\beta^T X_i; \beta, \theta)\}\begin{pmatrix} \hat g'(\beta^T X_i; \beta, \theta) J_{\beta^{(r)}}^T \{X_i - \hat g_3(\beta^T X_i; \beta)\} \\ Z_i - \hat g_2(\beta^T X_i; \beta) \end{pmatrix}. \qquad (2.8)$$

Here $\hat g(\cdot)$, $\hat g'(\cdot)$ and $\hat g_2(\cdot)$ are defined in (2.4)–(2.6), respectively, and $\hat g_3(u; \beta)$ is the estimator of $g_3(u) = E(X \mid \beta^T X = u)$, that is,

$$\hat g_3(u; \beta) = \sum_{i=1}^n W_{ni}(u; \beta) X_i, \qquad (2.9)$$

where $W_{ni}(u; \beta)$ is defined in (2.4). The function $w(\cdot)$ is a bounded weight function with bounded support $\mathcal U_w$, introduced to control the boundary effect in estimating $g(u)$. In the simulation study, $w(\cdot)$ can be taken as the indicator function of the bounded support of the distribution of $\beta^T X$. The simulation results are not sensitive to the choice of this weight function; in practice one may often take $w(\cdot) = 1$, and the computation is stable.
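Once the nonparametric pieces have been evaluated at $u = \beta^T X_i$, assembling (2.7)–(2.8) is a matter of stacking. In the sketch below the residuals, $\hat g'$, the centered covariates $X_i - \hat g_3$ and $Z_i - \hat g_2$, the weights $w(\cdot)$, the values $\hat\pi(V_i)$ and the Jacobian $J_{\beta^{(r)}}$ are assumed to be precomputed, for example by the local linear smoother (2.4)–(2.6) and (2.9); the function name `eta_weighted` is hypothetical.

```python
import numpy as np

def eta_weighted(delta, pi_v, w_u, resid, dg, Xc, Zc, J):
    """Rows are eta_hat_{i,W}(beta^(r), theta) from (2.7)-(2.8).

    Inputs, all evaluated at u = beta^T X_i:
      delta (n,), pi_v = pi_hat(V_i) (n,), w_u = w(u) (n,),
      resid = Y_i - theta^T Z_i - g_hat(u) (n,), dg = g_hat'(u) (n,),
      Xc = X_i - g3_hat(u) (n, p), Zc = Z_i - g2_hat(u) (n, q),
      J = J_{beta^(r)} (p, p - 1).
    NaNs caused by missingness only occur in rows with delta_i = 0,
    and those rows are set to zero by the factor delta_i / pi_hat(V_i).
    """
    centred = np.nan_to_num(np.hstack([dg[:, None] * (Xc @ J), Zc]))   # (n, p + q - 1)
    scale = np.where(delta == 1, w_u * np.nan_to_num(resid) / pi_v, 0.0)
    return scale[:, None] * centred
```

Each row has length $p + q - 1$, matching the number of free parameters in $(\beta^{(r)}, \theta)$.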

A weighted empirical log-likelihood ratio function for $(\beta^{(r)}, \theta)$ with missing observations is defined as

$$R_W(\beta^{(r)}, \theta) = -2\max_{p_1, \ldots, p_n}\Big\{\sum_{i=1}^n \log(np_i) \;\Big|\; p_i \ge 0,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\,\hat\eta_{i,W}(\beta^{(r)}, \theta) = 0\Big\}.$$
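Numerically, a ratio of this form is usually computed through its Lagrange-multiplier representation (Owen's standard dual): the optimal weights are $p_i = 1/\{n(1 + \lambda^T \hat\eta_i)\}$, so that $R_W = 2\sum_i \log(1 + \lambda^T \hat\eta_i)$, where $\lambda$ maximizes the concave function $\sum_i \log(1 + \lambda^T \hat\eta_i)$. The sketch below uses damped Newton steps; `el_log_ratio` is a hypothetical name, and the same routine applies to any set of auxiliary vectors, including $\hat\eta_{i,I}$.

```python
import numpy as np

def el_log_ratio(eta, max_iter=100, tol=1e-10):
    """-2 log empirical likelihood ratio for the constraint sum_i p_i eta_i = 0.

    eta: (n, k) array whose rows are the auxiliary vectors. Uses
    R = 2 sum_i log(1 + lam^T eta_i), where lam maximizes the concave dual
    sum_i log(1 + lam^T eta_i). Step halving keeps every 1 + lam^T eta_i > 1/n
    (so p_i = 1 / {n(1 + lam^T eta_i)} < 1) and the dual non-decreasing.
    If 0 lies outside the convex hull of the eta_i, the true ratio is
    infinite; this sketch does not detect that case.
    """
    eta = np.asarray(eta, dtype=float)
    n, k = eta.shape
    lam = np.zeros(k)
    for _ in range(max_iter):
        z = 1.0 + eta @ lam
        grad = eta.T @ (1.0 / z)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(eta / z[:, None] ** 2).T @ eta        # negative definite
        step = np.linalg.solve(hess, -grad)             # Newton ascent direction
        f0, t = np.sum(np.log(z)), 1.0
        while t > 1e-12:
            new = lam + t * step
            z_new = 1.0 + eta @ new
            if np.all(z_new > 1.0 / n) and np.sum(np.log(z_new)) >= f0:
                break
            t /= 2.0
        else:
            break                                       # no admissible step left
        lam = new
    return 2.0 * np.sum(np.log1p(eta @ lam))
```

For example, with the two scalar vectors $-1$ and $2$ the constraint forces $p = (2/3, 1/3)$, so the ratio equals $2\log(9/8)$.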

Remark 1. We use the kernel method to estimate the selection probability function $\pi(v)$. This avoids the misspecification of the selection probability commonly encountered with parametric methods. However, the nonparametric method also has a drawback: the convergence rate of the kernel estimator deteriorates quickly as the dimension of the model grows, which is the well-known curse of dimensionality. When the dimension of $V$ is high, we may use a parametric or semiparametric model for $\pi(v)$. Suppose that the model is $\pi_\tau(v)$, where $\tau$ may contain finite-dimensional or infinite-dimensional parameters, and let $\pi_{\hat\tau}(v)$ be an estimator of $\pi_\tau(v)$. Then $\hat\pi(v)$ in (2.2) can be replaced by $\pi_{\hat\tau}(v)$. Parametric models are a leading case, because $\pi_{\hat\tau}(v)$ is easy to compute and its distribution is likely to be well approximated by its limit. Semiparametric models, such as the partially linear model and the single-index model, are also a good choice. In either case, the method does not require high-dimensional smoothing.
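As one instance of the parametric route in Remark 1, the sketch below fits a logistic working model $\pi_\tau(v) = 1/\{1 + \exp(-\tau_0 - \tau_1^T v)\}$ by Newton–Raphson on the Bernoulli log-likelihood of $\delta$ given $V$. The logistic form and the name `fit_logistic_pi` are our assumptions for illustration; the returned function could be used in place of $\hat\pi(v)$.

```python
import numpy as np

def fit_logistic_pi(V, delta, n_iter=25):
    """Fit the working model pi_tau(v) = 1 / (1 + exp(-(tau_0 + tau_1^T v))).

    Newton-Raphson on the Bernoulli log-likelihood of delta given V; returns a
    function v -> pi_tau_hat(v) that can stand in for pi_hat(v).
    """
    V = np.asarray(V, dtype=float).reshape(len(delta), -1)
    D = np.hstack([np.ones((V.shape[0], 1)), V])          # design with intercept
    tau = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ tau))
        score = D.T @ (delta - p)
        info = (D * (p * (1.0 - p))[:, None]).T @ D      # observed = expected information (logit)
        tau = tau + np.linalg.solve(info, score)

    def pi_tau_hat(v):
        v = np.asarray(v, dtype=float).reshape(-1, V.shape[1])
        return 1.0 / (1.0 + np.exp(-(tau[0] + v @ tau[1:])))

    return pi_tau_hat
```

For the logit link, Newton–Raphson and Fisher scoring coincide, and the fit involves no bandwidth, which is the point of the remark when $d$ is large.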

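We maximize $\{-R_W(\beta^{(r)}, \theta)\}$ to obtain the estimators of $\beta^{(r)}$ and $\theta$, say $\hat\beta_W^{(r)}$ and $\hat\theta_W$, called the maximum weighted empirical likelihood estimates. Because the number of estimating equations equals the number of unknown parameters, they are the solution of the estimating equations $\sum_{i=1}^n \hat\eta_{i,W}(\beta^{(r)}, \theta) = 0$ (Qin and Lawless, 1994). Using $\|\beta\| = 1$, an estimator $\hat\beta_r$ of $\beta_r$ can be obtained, and hence an estimator $\hat\beta_W$ of $\beta$. With $\hat\beta_W$ and $\hat\theta_W$, the final estimator of $g(u)$ is defined by $\hat g_W(u) = \hat g(u; \hat\beta_W, \hat\theta_W)$.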
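The delete-one-component parametrization is straightforward to code. The sketch below rebuilds $\beta$ from $\beta^{(r)}$ under the convention $\beta_r > 0$ and forms the Jacobian $J_{\beta^{(r)}} = (\gamma_1, \ldots, \gamma_p)^T$. Indices are 0-based, unlike the 1-based $r$ in the text, and the helper names are ours. Because $\|\beta\| = 1$ identically, differentiating gives $J_{\beta^{(r)}}^T \beta = 0$, a convenient sanity check.

```python
import numpy as np

def beta_from_beta_r(beta_r, r):
    """Rebuild beta from beta^(r), taking the (0-based) r-th component positive."""
    beta_r = np.asarray(beta_r, dtype=float)
    s = 1.0 - beta_r @ beta_r
    if s <= 0:
        raise ValueError("need ||beta^(r)|| < 1")
    return np.insert(beta_r, r, np.sqrt(s))          # beta_r = (1 - ||beta^(r)||^2)^{1/2}

def jacobian(beta_r, r):
    """J_{beta^(r)} = d beta / d beta^(r), a p x (p - 1) matrix with rows gamma_s^T."""
    beta_r = np.asarray(beta_r, dtype=float)
    gamma_r = -beta_r / np.sqrt(1.0 - beta_r @ beta_r)   # row for the removed component
    return np.insert(np.eye(beta_r.size), r, gamma_r, axis=0)
```

For instance, $\beta^{(r)} = (0.6, 0)^T$ with the middle component removed gives $\beta = (0.6, 0.8, 0)^T$.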

2.2 Imputed empirical likelihood

In Section 2.1, incomplete observations contribute nothing to the empirical likelihood ratio for $(\beta^{(r)}, \theta)$, since $\hat\eta_{i,W}(\beta^{(r)}, \theta) = 0$ whenever $\delta_i = 0$; consequently the coverage accuracy of the confidence regions deteriorates when there are many missing values. A common remedy is to impute the missing values by predicted values. Define $m^*(v) = m^*(v; \beta^{(r)}, \theta) = E\{\xi_i^*(\beta^{(r)}, \theta) \mid V_i = v\}$, where $\xi_i^*(\beta^{(r)}, \theta)$ is defined in (2.8). Following the ideas of Robins et al. (1994) and Qin et al. (2009), we use $m^*(V_i)$ to impute $\xi_i^*(\beta^{(r)}, \theta)$. Since $m^*(v)$ is unknown, it must be estimated. For given $\beta$ and $\theta$, a kernel estimator of $m^*(v)$ is defined as

$$\hat m(v; \beta^{(r)}, \theta) = \sum_{i=1}^n W_{ni}(v)\,\xi_i^*(\beta^{(r)}, \theta), \qquad (2.10)$$

where

$$W_{ni}(v) = \delta_i K\Big(\frac{V_i - v}{h_3}\Big) \Big/ \sum_{j=1}^n \delta_j K\Big(\frac{V_j - v}{h_3}\Big), \qquad (2.11)$$

the kernel $K(\cdot)$ is a probability density function, and the bandwidth $h_3 = h_3(n)$ satisfies $0 < h_3 \to 0$. We then introduce the auxiliary random vector

$$\hat\eta_{i,I}(\beta^{(r)}, \theta) = \frac{\delta_i}{\hat\pi(V_i)}\,\xi_i^*(\beta^{(r)}, \theta) + \Big\{1 - \frac{\delta_i}{\hat\pi(V_i)}\Big\}\hat m(V_i; \beta^{(r)}, \theta). \qquad (2.12)$$
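A sketch of the imputation step (2.10)–(2.12), assuming a Gaussian product kernel for $K$ (the text only requires a probability density satisfying (C4)) and a hypothetical function name `eta_imputed`. Only the complete cases enter $\hat m$, exactly as the factor $\delta_i$ in (2.11) prescribes.

```python
import numpy as np

def eta_imputed(V, delta, pi_v, xi_star, h3):
    """Rows are eta_hat_{i,I}(beta^(r), theta) from (2.10)-(2.12).

    V: (n, d) always-observed vectors; delta: (n,) 0/1; pi_v = pi_hat(V_i) (n,);
    xi_star: (n, k) values of xi*_i(beta^(r), theta), only needed where delta_i = 1
    (other rows may be NaN); h3: bandwidth. K is assumed Gaussian here.
    """
    V = np.asarray(V, dtype=float).reshape(len(delta), -1)
    obs = np.asarray(delta) == 1
    xi_obs = xi_star[obs]                                  # complete cases only
    diff = (V[obs][None, :, :] - V[:, None, :]) / h3       # (n, n_obs, d)
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))           # K((V_j - V_i)/h3), delta_j = 1
    m_hat = (K @ xi_obs) / K.sum(axis=1, keepdims=True)    # m_hat(V_i), eq. (2.10)-(2.11)
    ratio = (delta / pi_v)[:, None]
    return ratio * np.nan_to_num(xi_star) + (1.0 - ratio) * m_hat
```

A quick consistency check: if $\xi_i^*$ is the same constant vector for every complete case, then $\hat m$ equals that constant and so does every $\hat\eta_{i,I}$, whatever the values of $\hat\pi(V_i)$.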

An empirical log-likelihood ratio for $(\beta^{(r)}, \theta)$ based on imputed values is defined as

$$R_I(\beta^{(r)}, \theta) = -2\max_{p_1, \ldots, p_n}\Big\{\sum_{i=1}^n \log(np_i) \;\Big|\; p_i \ge 0,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\,\hat\eta_{i,I}(\beta^{(r)}, \theta) = 0\Big\}.$$

This ratio is more appropriate than the weighted empirical likelihood ratio because it makes fuller use of the information contained in the data.

Remark 2. Because the nonparametric estimators $\hat\pi(\cdot)$ and $\hat g(\cdot)$ are plugged into the auxiliary random vector $\hat\eta_{i,I}(\beta^{(r)}, \theta)$, they introduce bias into the empirical log-likelihood ratio $R_I(\beta^{(r)}, \theta)$. Therefore, the conditional centering of $X$ and $Z$ given $\beta^T X$ and the imputation of missing values are required; we call this technique bias correction. With the bias correction, the resulting empirical log-likelihood ratio has an asymptotically chi-squared distribution; this result is given in Theorem 1 of Section 3. Moreover, because centered covariates are used in the auxiliary random vectors $\hat\eta_{i,W}(\beta^{(r)}, \theta)$ and $\hat\eta_{i,I}(\beta^{(r)}, \theta)$, undersmoothing in estimating the nonparametric functions is avoided, and existing data-driven algorithms can be used to select an optimal bandwidth. For details, see the proof of Lemma 1 in the Appendix.

We can maximize $\{-R_I(\beta^{(r)}, \theta)\}$ to obtain the estimators of $\beta^{(r)}$ and $\theta$, say $\hat\beta_I^{(r)}$ and $\hat\theta_I$, called the maximum empirical likelihood estimates with imputed values. They are the solution of the estimating equations $\sum_{i=1}^n \hat\eta_{i,I}(\beta^{(r)}, \theta) = 0$ (Qin and Lawless, 1994). Using $\|\beta\| = 1$, we obtain an estimator of $\beta$, say $\hat\beta_I$. With $\hat\beta_I$ and $\hat\theta_I$, the final estimator of $g(u)$ is $\hat g_I(u) = \hat g(u; \hat\beta_I, \hat\theta_I)$.

3 Main results

The following conditions are needed for our main results.

(C1) The density function $p(u)$ of $\beta^T X$ is bounded away from zero and infinity on $\mathcal U_w$, and $p(u)$ and $q(u)$ satisfy the Lipschitz condition of order 1, where $\mathcal U_w$ is the support of $w(u)$ and $q(u) = E\{\varepsilon^2/\pi(V) \mid \beta^T X = u\}$.

(C2) The functions $g(u)$, $g_{2s}(u)$ and $g_{3s}(u)$ have two bounded and continuous derivatives on $\mathcal U_w$, where $g_{2s}(u)$ and $g_{3s}(u)$ are the $s$th components of $g_2(u) = E(Z \mid \beta^T X = u)$ and $g_3(u) = E(X \mid \beta^T X = u)$, respectively.

(C3) The kernel $K(u)$ is a symmetric probability density function with a bounded derivative and support $(-1, 1)$.

(C4) The kernels $K^*(v)$ and $K(v)$ are functions of order $b$ with $b > d$, and there exist positive constants $c_1$, $c_2$ and $\rho$ such that, for $L = K^*$ or $K$, $c_1 I(\|v\| \le \rho) \le L(v) \le c_2 I(\|v\| \le \rho)$.

(C5) The bandwidths satisfy $nh_l^{2d} \to \infty$ and $nh_l^{2b} \to 0$ for $b > d$ and $l = 1, 3$, and $h_2 = c_0 n^{-1/5}$ for some constant $c_0 > 0$, where $b$ is defined in (C4).

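(C6) $\sup_u E(\varepsilon^4 \mid \beta^T X = u) < \infty$, $\sup_u E(\|X\|^4 \mid \beta^T X = u) < \infty$, $\sup_u E(\|Z\|^4 \mid \beta^T X = u) < \infty$, $\sup_v E(\varepsilon^4 \mid V = v) < \infty$, $\sup_v E(\|X\|^4 \mid V = v) < \infty$ and $\sup_v E(\|Z\|^4 \mid V = v) < \infty$.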

(C7) The density function $f(v)$ of $V$ has bounded partial derivatives up to order $b$ ($b > d$), and there exist constants $a, r_1 > 0$ such that

$$\int_{v \in S(v_0, r_0) \cap \mathcal V} f(v)\,dv \ge a r_0 \qquad (3.1)$$

for all $r_0 \in [0, r_1]$ and $v_0 \in \mathcal V$, where $b$ is defined in (C4), $\mathcal V$ is the support of $V$, and $S(v_0, r_0)$ is the closed sphere with center $v_0$ and radius $r_0$.

(C8) The selection probability function $\pi(v)$ and the function $m(v)$ have bounded partial derivatives up to order $b$ ($b > d$), and there exists a positive constant $c_0$ such that $\inf_v \pi(v) \ge c_0 > 0$, where $m(v)$ is defined in Theorem 1 and $b$ is defined in (C4).

(C9) The matrices $\Gamma$ and $\Sigma$ defined in Theorems 1 and 3 are positive definite.

Remark 3. Condition (C1) ensures that the denominators of the estimators of $g(u)$ and $g'(u)$ are bounded away from 0 with high probability. Condition (C2) is needed because we use a second-order kernel $K(u)$. Conditions (C3) and (C4) are standard conditions on kernel functions. Condition (C5) specifies the convergence rates of the bandwidths $h_1$ and $h_3$ and the optimal bandwidth $h_2$. Conditions (C6) and (C9) ensure that the asymptotic variances of the estimators $\hat\beta_W$ and $\hat\beta_I$ exist. Condition (C7) is weak; (3.1) holds if $\inf_{v \in \mathcal V} f(v) > 0$ and $\int_{v \in S(v_0, r_0) \cap \mathcal V} dv \ge a r_0$ for all $r_0 \in [0, b]$ and $v_0 \in \mathcal V$. Condition (C8) is needed to derive the convergence rates of $\hat\pi(v)$ and $\hat m(v; \beta^{(r)}, \theta)$.

Theorem 1. Suppose that Conditions (C1)–(C9) hold. If $\beta$ and $\theta$ are the true parameters, then

(a) $R_W(\beta^{(r)}, \theta) \xrightarrow{D} w_1\chi^2_{1,1} + \cdots + w_{p+q-1}\chi^2_{1,p+q-1}$,

(b) $R_I(\beta^{(r)}, \theta) \xrightarrow{D} \chi^2_{p+q-1}$,

where $\chi^2_{p+q-1}$ denotes a chi-squared variable with $p + q - 1$ degrees of freedom, $\chi^2_{1,1}, \ldots, \chi^2_{1,p+q-1}$ are independent $\chi^2_1$ variables, and $w_1, \ldots, w_{p+q-1}$ are the eigenvalues of the matrix $\Gamma^{-1}\Omega$, with

$$\Gamma = E[\{1/\pi(V)\}\mathrm{cov}(\xi \mid V)], \qquad \Omega = E[\{1/\pi(V)\}\mathrm{cov}(\xi \mid V) + \{m(V)\}^{\otimes 2}], \qquad (3.2)$$

$m(V) = E(\xi \mid V)$, $\xi = w(\beta^T X)\varepsilon\{\Lambda - E(\Lambda \mid \beta^T X)\}$ and $\Lambda = (g'(\beta^T X) X^T J_{\beta^{(r)}}, Z^T)^T$. Here $A^{\otimes 2} = AA^T$ for any matrix $A$, and $\xrightarrow{D}$ denotes convergence in distribution.
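The limit in Theorem 1(a) is a weighted sum of independent $\chi^2_1$ variables, whose quantiles are easy to approximate by simulation once $\Gamma$ and $\Omega$ (or consistent estimates of them) are available. A minimal Monte Carlo sketch; the function name, the number of draws and the seed are arbitrary choices of ours.

```python
import numpy as np

def weighted_chi2_quantile(Gamma, Omega, alpha=0.05, n_sim=200_000, seed=0):
    """Monte Carlo 1 - alpha quantile of w_1 chi2_1 + ... + w_k chi2_1 (Theorem 1(a)).

    The weights are the eigenvalues of Gamma^{-1} Omega; in practice Gamma and
    Omega are replaced by consistent estimates.
    """
    w = np.linalg.eigvals(np.linalg.solve(Gamma, Omega)).real
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1.0, size=(n_sim, w.size)) @ w   # independent chi2_1 draws
    return np.quantile(draws, 1.0 - alpha)
```

When $\Gamma = \Omega$ all weights equal 1 and the quantile reduces to the $\chi^2_{p+q-1}$ quantile, for example about 7.81 for three degrees of freedom at $\alpha = 0.05$.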

Using result (a) of Theorem 1 and noting that $m(v) = 0$ when only the response $Y$ is missing, or when the response $Y$ and some components of the covariates $X$ and $Z$ are missing (in both cases $V$ is a function of $(X, Z)$ only, so $E(\xi \mid V) = 0$ follows from $E(\varepsilon \mid X, Z) = 0$), we obtain the following corollary.

Corollary 1. Suppose that the conditions of Theorem 1 hold, and that either only the response $Y$ is missing, or the response $Y$ and some components of the covariates $X$ and $Z$ are missing. If $\beta$ and $\theta$ are the true parameters, then $R_W(\beta^{(r)}, \theta) \xrightarrow{D} \chi^2_{p+q-1}$.

Remark 4. Theorem 1 shows that the imputed empirical log-likelihood ratio $R_I(\beta^{(r)}, \theta)$ is asymptotically standard chi-squared, whereas the weighted empirical log-likelihood ratio $R_W(\beta^{(r)}, \theta)$ is not. Hence imputation plays an important role in constructing an empirical likelihood ratio with an asymptotically standard chi-squared distribution. In Corollary 1, because centered covariates are used in the auxiliary random vector $\hat\eta_{i,W}(\beta^{(r)}, \theta)$, the weighted empirical log-likelihood ratio $R_W(\beta^{(r)}, \theta)$ is also asymptotically standard chi-squared when only the response $Y$ is missing, or when the response $Y$ and some components of $X$ and $Z$ are missing.

Using result (b) of Theorem 1, an approximate $1 - \alpha$ confidence region for $(\beta^{(r)}, \theta)$ is

$$\big\{(\tilde\beta^{(r)}, \tilde\theta) : R_I(\tilde\beta^{(r)}, \tilde\theta) \le \chi^2_{p+q-1}(1 - \alpha),\ \|\tilde\beta^{(r)}\| < 1\big\}, \qquad (3.3)$$

where $\chi^2_{p+q-1}(1 - \alpha)$ is the $1 - \alpha$ quantile of the $\chi^2_{p+q-1}$ distribution for $0 < \alpha < 1$.

To apply result (a) of Theorem 1 to construct a confidence region for $(\beta^{(r)}, \theta)$, we need to estimate the unknown weights $w_i$ consistently. We estimate $\Gamma$ and $\Omega$ by the plug-in method:

$$\hat\Gamma = \hat\Gamma(\hat\beta_W^{(r)}, \hat\theta_W) = \frac{1}{n}\sum_{i=1}^n \hat\eta_{i,W}(\hat\beta_W^{(r)}, \hat\theta_W)\,\hat\eta_{i,W}^T(\hat\beta_W^{(r)}, \hat\theta_W), \qquad (3.4)$$

$$\hat\Omega_J = \hat\Omega(\hat\beta_J^{(r)}, \hat\theta_J) = \frac{1}{n}\sum_{i=1}^n \hat\eta_{i,I}(\hat\beta_J^{(r)}, \hat\theta_J)\,\hat\eta_{i,I}^T(\hat\beta_J^{(r)}, \hat\theta_J) \qquad (3.5)$$

for $J = W$ or $I$, where $\hat\eta_{i,W}(\cdot, \cdot)$ and $\hat\eta_{i,I}(\cdot, \cdot)$ are defined in (2.7) and (2.12), respectively. It can be proved that $\hat\Gamma$ and $\hat\Omega_J$ are consistent estimators of $\Gamma$ and $\Omega$, respectively.

This implies that the eigenvalues of $\hat\Gamma^{-1}\hat\Omega_W$, say $\hat w_i$, estimate $w_i$ consistently for $1 \le i \le p + q - 1$. Let $H(\cdot)$ be the conditional distribution, given the data, of the weighted sum $\hat w_1\chi^2_{1,1} + \cdots + \hat w_{p+q-1}\chi^2_{1,p+q-1}$, and let $\hat c_\alpha$ be the $1 - \alpha$ quantile of $H(\cdot)$. Then a confidence region for $(\beta^{(r)}, \theta)$ with asymptotically correct coverage probability $1 - \alpha$ is $\{(\tilde\beta^{(r)}, \tilde\theta) : R_W(\tilde\beta^{(r)}, \tilde\theta) \le \hat c_\alpha\}$. However, this method is computationally intensive. To avoid Monte Carlo simulation of the limit distribution, we recommend an adjustment and define the adjusted empirical log-likelihood ratio function as

$$R_A(\beta^{(r)}, \theta) = \hat r(\beta^{(r)}, \theta)\,R_W(\beta^{(r)}, \theta), \qquad (3.6)$$

where

$$\hat r(\beta^{(r)}, \theta) = \{\hat Q(\beta^{(r)}, \theta)\hat\Omega^{-1}(\beta^{(r)}, \theta)\hat Q^T(\beta^{(r)}, \theta)\}\{\hat Q(\beta^{(r)}, \theta)\hat\Gamma^{-1}(\beta^{(r)}, \theta)\hat Q^T(\beta^{(r)}, \theta)\}^{-1},$$

$\hat Q(\beta^{(r)}, \theta) = n^{-1/2}\sum_{i=1}^n \hat\eta_{i,W}^T(\beta^{(r)}, \theta)$ is a row vector, and $\hat\Gamma(\beta^{(r)}, \theta)$, $\hat\Omega(\beta^{(r)}, \theta)$ and $\hat\eta_{i,W}(\beta^{(r)}, \theta)$ are defined in (3.4), (3.5) and (2.7), respectively. The following theorem gives the asymptotic distribution of $R_A(\beta^{(r)}, \theta)$.
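The adjustment factor in (3.6) only needs the two sets of auxiliary vectors evaluated at the same $(\beta^{(r)}, \theta)$ together with $R_W$ there. A sketch with the hypothetical name `adjusted_ratio`; $\hat Q$ is handled as a vector so that both quadratic forms are scalars.

```python
import numpy as np

def adjusted_ratio(eta_W, eta_I, R_W):
    """R_A = r_hat * R_W with r_hat from (3.6).

    eta_W, eta_I: (n, k) arrays of eta_hat_{i,W} and eta_hat_{i,I} at the
    same (beta^(r), theta); R_W: the weighted ratio at that point.
    """
    n = eta_W.shape[0]
    Gamma = eta_W.T @ eta_W / n                     # Gamma_hat, as in (3.4)
    Omega = eta_I.T @ eta_I / n                     # Omega_hat, as in (3.5)
    Q = eta_W.sum(axis=0) / np.sqrt(n)              # Q_hat
    r_hat = (Q @ np.linalg.solve(Omega, Q)) / (Q @ np.linalg.solve(Gamma, Q))
    return r_hat * R_W
```

If the imputed vectors coincided with the weighted ones, the factor would equal 1 and $R_A = R_W$; otherwise it rescales $R_W$ toward the $\Omega$-based quadratic form.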

Theorem 2. Suppose that Conditions (C1)–(C9) hold. If $\beta$ and $\theta$ are the true parameters, then $R_A(\beta^{(r)}, \theta) \xrightarrow{D} \chi^2_{p+q-1}$.

Using Theorem 2, we obtain an approximate $1 - \alpha$ confidence region for $(\beta^{(r)}, \theta)$:

$$\big\{(\tilde\beta^{(r)}, \tilde\theta) : R_A(\tilde\beta^{(r)}, \tilde\theta) \le \chi^2_{p+q-1}(1 - \alpha),\ \|\tilde\beta^{(r)}\| < 1\big\}. \qquad (3.7)$$

The following theorem gives the asymptotic normality of the estimators $(\hat\beta_J^{(r)T}, \hat\theta_J^T)^T$ for $J = W$ or $I$.

Theorem 3. Suppose that Conditions (C1)–(C9) hold. Then

$$\sqrt{n}\begin{pmatrix} \hat\beta_J^{(r)} - \beta^{(r)} \\ \hat\theta_J - \theta \end{pmatrix} \xrightarrow{D} N(0_{p+q-1}, \Sigma^{-1}\Omega\Sigma^{-1})$$

for $J = W$ or $I$, where $0_{p+q-1}$ is the $(p+q-1)$-dimensional zero vector, $\Omega$ is defined in (3.2), $\Sigma = E[w(\beta^T X)\{\Lambda - E(\Lambda \mid \beta^T X)\}^{\otimes 2}]$ and $\Lambda = (g'(\beta^T X) X^T J_{\beta^{(r)}}, Z^T)^T$.

Remark 5. Theorem 3 shows that $(\hat\beta_W^{(r)T}, \hat\theta_W^T)^T$ and $(\hat\beta_I^{(r)T}, \hat\theta_I^T)^T$ are asymptotically equivalent estimators that achieve the desirable asymptotic properties of unbiasedness, normality and $\sqrt{n}$-consistency. In addition, Condition (C5) shows that the optimal bandwidth $h_2$ can be used; that is, Theorem 3 shows that undersmoothing $\hat g(\cdot)$ is unnecessary and that root-$n$ consistency can still be achieved. Therefore, the proposed estimators $\hat\beta_J$ and $\hat\theta_J$ differ from the estimators of Wang et al. (2010), which require undersmoothing in estimating $g(\cdot)$.

By Theorem 3 and $\|\hat\beta_J\| = 1$, we can obtain the asymptotic distributions of $\hat\beta_J$ and $\hat\theta_J$, given in the following corollary.

Corollary 2. Under the conditions of Theorem 3, we have

$$\sqrt{n}(\hat\beta_J - \beta) \xrightarrow{D} N(0_p, J_{\beta^{(r)}} A_1^{-1} B_1 A_1^{-1} J_{\beta^{(r)}}^T)$$

and

$$\sqrt{n}(\hat\theta_J - \theta) \xrightarrow{D} N(0_q, A_2^{-1} B_2 A_2^{-1})$$

for $J = W$ or $I$, where

$$A_1 = E[w(\beta^T X) g'^2(\beta^T X) J_{\beta^{(r)}}^T \{X - E(X \mid \beta^T X)\}^{\otimes 2} J_{\beta^{(r)}}], \qquad B_1 = E[\{1/\pi(V)\}\mathrm{cov}(\zeta_1 \mid V) + \{E(\zeta_1 \mid V)\}^{\otimes 2}],$$

$$A_2 = E[w(\beta^T X)\{Z - E(Z \mid \beta^T X)\}^{\otimes 2}], \qquad B_2 = E[\{1/\pi(V)\}\mathrm{cov}(\zeta_2 \mid V) + \{E(\zeta_2 \mid V)\}^{\otimes 2}],$$

$\zeta_1 = w(\beta^T X)\varepsilon g'(\beta^T X) J_{\beta^{(r)}}^T\{X - E(X \mid \beta^T X)\}$, $\zeta_2 = w(\beta^T X)\varepsilon\{Z - E(Z \mid \beta^T X)\}$, and $J_{\beta^{(r)}}$ and $w(\cdot)$ are defined in (2.1) and (2.8), respectively.

To apply Theorem 3 to construct confidence regions for $(\beta^{(r)}, \theta)$, we need estimators of $\Omega$ and $\Sigma$, say $\hat\Omega_J$ and $\hat\Sigma_J$ for $J = W$ or $I$. The estimator $\hat\Omega_J$ is defined in (3.5), and $\hat\Sigma_J$ is defined as

$$\hat\Sigma_J = \hat\Sigma(\hat\beta_J^{(r)}, \hat\theta_J) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i}{\hat\pi(V_i)} w(\hat\beta_J^T X_i)\begin{pmatrix} \hat g'(\hat\beta_J^T X_i; \hat\beta_J, \hat\theta_J) J_{\hat\beta_J^{(r)}}^T\{X_i - \hat g_3(\hat\beta_J^T X_i; \hat\beta_J)\} \\ Z_i - \hat g_2(\hat\beta_J^T X_i; \hat\beta_J)\end{pmatrix}^{\otimes 2}, \qquad (3.8)$$

where $\hat\pi(\cdot)$, $\hat g'(\cdot)$, $\hat g_2(\cdot)$ and $\hat g_3(\cdot)$ are defined in (2.2), (2.5), (2.6) and (2.9), respectively. It can be proved that $\hat\Omega_J$ and $\hat\Sigma_J$ are consistent estimators of $\Omega$ and $\Sigma$, respectively. Thus, by Theorem 3,

$$(\hat\Sigma_J^{-1}\hat\Omega_J\hat\Sigma_J^{-1})^{-1/2}\sqrt{n}\begin{pmatrix} \hat\beta_J^{(r)} - \beta^{(r)} \\ \hat\theta_J - \theta \end{pmatrix} \xrightarrow{D} N(0_{p+q-1}, I_{p+q-1}),$$

where $I_{p+q-1}$ is the identity matrix of order $p + q - 1$. Using Theorem 10.2d in Arnold (1981), we obtain

$$n\begin{pmatrix} \hat\beta_J^{(r)} - \beta^{(r)} \\ \hat\theta_J - \theta \end{pmatrix}^T (\hat\Sigma_J^{-1}\hat\Omega_J\hat\Sigma_J^{-1})^{-1}\begin{pmatrix} \hat\beta_J^{(r)} - \beta^{(r)} \\ \hat\theta_J - \theta \end{pmatrix} \xrightarrow{D} \chi^2_{p+q-1}.$$

These two results can be used to construct confidence regions for $(\beta^{(r)}, \theta)$. Similarly, using Corollary 2 and estimates of $B_1$, $A_1$, $B_2$ and $A_2$, we can construct confidence regions (or intervals) for $\beta$ and $\theta$, respectively.
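The normal-approximation region based on Theorem 3 uses the sandwich matrix $\hat\Sigma_J^{-1}\hat\Omega_J\hat\Sigma_J^{-1}$. The following is a sketch of the quadratic form and of componentwise standard errors. It assumes that $\hat\Sigma_J$ and $\hat\Omega_J$ have already been computed from (3.8) and (3.5), and the function names are ours.

```python
import numpy as np

def wald_statistic(est, null, Sigma_hat, Omega_hat, n):
    """n (est - null)^T (Sigma^{-1} Omega Sigma^{-1})^{-1} (est - null).

    est, null: stacked vectors (beta^(r), theta) of length p + q - 1.
    Under Theorem 3 this is approximately chi-squared with p + q - 1 d.f.
    """
    Si = np.linalg.inv(Sigma_hat)
    V = Si @ Omega_hat @ Si                          # sandwich covariance
    d = np.asarray(est, dtype=float) - np.asarray(null, dtype=float)
    return n * d @ np.linalg.solve(V, d)

def wald_se(Sigma_hat, Omega_hat, n):
    """Standard errors of the components of (beta^(r), theta) from the sandwich."""
    Si = np.linalg.inv(Sigma_hat)
    return np.sqrt(np.diag(Si @ Omega_hat @ Si) / n)
```

A candidate $(\tilde\beta^{(r)}, \tilde\theta)$ lies in the approximate $1 - \alpha$ region when `wald_statistic` evaluated at it is at most $\chi^2_{p+q-1}(1 - \alpha)$. Unlike the empirical likelihood regions, this region is always an ellipsoid.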

The following two theorems give the uniform convergence rate of $\hat g(u; \beta, \theta)$ and the asymptotic distribution of $\hat g_J(u)$ for $J = W$ or $I$.

Theorem 4. Suppose that Conditions (C1)–(C8) hold. Then

$$\sup_{u \in \mathcal U_w,\, \tilde\beta \in B_n,\, \tilde\theta \in \Theta_n} |\hat g(u; \tilde\beta, \tilde\theta) - g(u)| = O_P\big((nh_2/\log n)^{-1/2} + h_2^2\big),$$

where $B_n = \{\tilde\beta : \|\tilde\beta - \beta\| \le d_1 n^{-1/2}\}$ and $\Theta_n = \{\tilde\theta : \|\tilde\theta - \theta\| \le d_2 n^{-1/2}\}$ for some positive constants $d_1$ and $d_2$.

Theorem 5. Suppose that Conditions (C1)–(C8) hold. Then

$$\sqrt{nh_2}\,[\hat g_J(u) - g(u) - b(u)] \xrightarrow{D} N(0, \gamma^2(u))$$

for $J = W$ or $I$, where $b(u) = \frac{1}{2}h_2^2 g''(u)\int t^2 K(t)\,dt$, $\gamma^2(u) = \{q(u)/p(u)\}\int K^2(t)\,dt$, $p(u)$ is the density function of $\beta^T X$, $q(u) = E\{\varepsilon^2/\pi(V) \mid \beta^T X = u\}$, $\hat g_J(u) = \hat g(u; \hat\beta_J, \hat\theta_J)$, and $\hat g(\cdot)$ is defined in (2.4). If, in addition, $nh_2^5 \to 0$, then the bias term $b(u)$ vanishes asymptotically, that is,

$$\sqrt{nh_2}\,[\hat g_J(u) - g(u)] \xrightarrow{D} N(0, \gamma^2(u)).$$

To use Theorem 5 to construct confidence intervals, we need a consistent estimator of $\gamma^2(u)$, that is, estimators of $p(u)$ and $q(u)$. These are defined, respectively, as

$$\hat p_J(u) = \frac{1}{nh_4}\sum_{i=1}^n K\Big(\frac{\hat\beta_J^T X_i - u}{h_4}\Big)$$

and

$$\hat q_J(u) = \sum_{i=1}^n W_{ni}(u; \hat\beta_J)\{Y_i - \hat\theta_J^T Z_i - \hat g(\hat\beta_J^T X_i; \hat\beta_J, \hat\theta_J)\}^2/\hat\pi(V_i),$$

where $K(\cdot)$ is a kernel function on $\mathbb{R}$, $h_4$ is a bandwidth sequence tending to 0, and $\hat\pi(\cdot)$, $\hat g(\cdot)$ and $W_{ni}(\cdot)$ are defined in (2.2) and (2.4). Using $\hat p_J(u)$ and $\hat q_J(u)$, a consistent estimator of $\gamma^2(u)$ is

$$\hat\gamma_J^2(u) = \{\hat q_J(u)/\hat p_J(u)\}\int K^2(t)\,dt.$$

From Theorem 5 (with $nh_2^5 \to 0$), we have

$$\sqrt{nh_2}\,[\hat g_J(u) - g(u)]/\hat\gamma_J(u) \xrightarrow{D} N(0, 1).$$
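The pointwise interval for $g(u)$ is a direct plug-in of $\hat p_J(u)$ and $\hat q_J(u)$. The sketch assumes the Epanechnikov kernel, for which $\int K^2(t)\,dt = 3/5$, and $\alpha = 0.05$; `p_hat_density` and `g_interval` are hypothetical names. As in the $nh_2^5 \to 0$ case of Theorem 5, the bias term is ignored.

```python
import numpy as np

def epanechnikov(t):
    return 0.75 * np.clip(1.0 - t ** 2, 0.0, None)

def p_hat_density(u, index, h4):
    """Kernel density estimate p_hat_J(u) of beta_hat_J^T X at the points u."""
    t = (index[None, :] - np.atleast_1d(u)[:, None]) / h4
    return epanechnikov(t).sum(axis=1) / (index.size * h4)

def g_interval(g_u, p_u, q_u, n, h2, z=1.959963984540054):
    """g_hat_J(u) +/- z * gamma_hat_J(u) / sqrt(n h2), gamma_hat^2 = (q_hat/p_hat) * int K^2."""
    int_K2 = 0.6                                   # int (0.75 (1 - t^2))^2 dt = 3/5
    gamma = np.sqrt(q_u / p_u * int_K2)
    half = z * gamma / np.sqrt(n * h2)
    return g_u - half, g_u + half
```

For example, with $\hat q_J(u)/\hat p_J(u) = 0.6$ we get $\hat\gamma_J(u) = 0.6$, so for $nh_2 = 100$ the half-width is $1.96 \times 0.06 \approx 0.118$.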

These results can be used to construct a pointwise confidence interval for $g(u)$:

$$\hat g_J(u) \pm z_{1-\alpha/2}\,\hat\gamma_J(u)/\sqrt{nh_2}$$

for $J = W$ or $I$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution.

Remark 6. Consider two special cases: the single-index model and the partially linear model. Using the methods proposed above, we can construct empirical likelihood ratios for $\beta^{(r)}$ and $\theta$, respectively. Note that model (1.1) reduces to a single-index model when $\theta = 0$ and to a partially linear model when $\beta = 1$. We therefore introduce two auxiliary random vectors, $\tilde\eta_{i,I}(\beta^{(r)}) = (1_{p-1}^T, 0_q^T)\hat\eta_{i,I}(\beta^{(r)}, 0)$ and $\eta_{i,I}^*(\theta) = (0_{p-1}^T, 1_q^T)\hat\eta_{i,I}(1, \theta)$, where $1_s$ denotes the $s$-dimensional vector whose elements are all 1 for any integer $s$, and $\hat\eta_{i,I}(\beta^{(r)}, \theta)$ is defined in (2.12). Let $\tilde R_I(\beta^{(r)})$ and $R_I^*(\theta)$ denote $R_I(\beta^{(r)}, \theta)$ with $\hat\eta_{i,I}(\beta^{(r)}, \theta)$ replaced by $\tilde\eta_{i,I}(\beta^{(r)})$ and $\eta_{i,I}^*(\theta)$, respectively. Then $\tilde R_I(\beta^{(r)})$ and $R_I^*(\theta)$ are referred to as the empirical log-likelihood ratios of $\beta^{(r)}$ and $\theta$, respectively, based on imputed values. We have the following result.

Theorem 6. Suppose that Conditions (C1)–(C9) hold. If $\beta$ and $\theta$ are the true parameters, then $\tilde R_I(\beta^{(r)}) \xrightarrow{D} \chi^2_{p-1}$ and $R^*_I(\theta) \xrightarrow{D} \chi^2_q$.

The results of Theorem 6 can be used to construct confidence regions for $\beta^{(r)}$ and $\theta$, namely $\{\tilde\beta^{(r)} : \tilde R_I(\tilde\beta^{(r)}) \le \chi^2_{p-1}(1-\alpha),\ \|\tilde\beta^{(r)}\| < 1\}$ and $\{\tilde\theta : R^*_I(\tilde\theta) \le \chi^2_q(1-\alpha)\}$. We can maximize $-\tilde R_I(\beta^{(r)})$ and $-R^*_I(\theta)$ to obtain estimators of $\beta^{(r)}$ and $\theta$, say $\tilde\beta_I^{(r)}$ and $\hat\theta^*_I$, and hence an estimator $\tilde\beta_I$ of $\beta$. It can be proved that $\tilde\beta_I$ and $\hat\beta_I$ have the same asymptotic distribution, as do $\hat\theta^*_I$ and $\hat\theta_I$. Therefore, these results can also be used to construct confidence regions (or intervals) for $\beta$ and $\theta$.

Remark 7. It is also of interest to construct confidence intervals for a single component of $\beta$. For this purpose, we construct a partial profile empirical likelihood ratio. Let $\gamma = (\gamma_1,\ldots,\gamma_{p+q-1})^T = (\beta_1,\ldots,\beta_{r-1},\beta_{r+1},\ldots,\beta_p,\theta_1,\ldots,\theta_q)^T$. Using the estimators

$\hat\beta_I^{(r)}$ and $\hat\theta_I$ of $\beta^{(r)}$ and $\theta$, we obtain the estimator $\hat\gamma = (\hat\beta_I^{(r)T}, \hat\theta_I^T)^T$ of $\gamma$, and hence the estimator of the $s$th component $\gamma_s$, namely $\hat\gamma_s = e_s^T\hat\gamma$, where $e_s$ denotes the unit vector of length $p+q-1$ with 1 at position $s$, for $s = 1,\ldots,p+q-1$. Let $\hat\eta_{is,I}(\gamma_s) = e_s^T\hat\eta_{i,I}(\hat\gamma_1,\ldots,\hat\gamma_{s-1},\gamma_s,\hat\gamma_{s+1},\ldots,\hat\gamma_{p+q-1})$, where $\hat\eta_{i,I}(\cdot)$ is defined in (2.12). Then the partial profile empirical log-likelihood ratio for $\gamma_s$, based on imputed values, is defined as

$$R_{s,I}(\gamma_s) = -2\max\Big\{\sum_{i=1}^n\log(np_i)\ \Big|\ p_i \ge 0,\ \sum_{i=1}^np_i = 1,\ \sum_{i=1}^np_i\hat\eta_{is,I}(\gamma_s) = 0\Big\}.$$

Under the assumptions of Theorem 1, it can be proved that $R_{s,I}(\gamma_s)$ is asymptotically chi-squared with 1 degree of freedom, as stated in the following result.

Corollary 3. Suppose that the conditions of Theorem 1 hold. If $\beta$ and $\theta$ are the true parameters, then $R_{s,I}(\gamma_s) \xrightarrow{D} \chi^2_1$.

Applying Corollary 3, we can construct an approximate $1-\alpha$ confidence interval for $\gamma_s$, that is,

$$\{\tilde\gamma_s : R_{s,I}(\tilde\gamma_s) \le \chi^2_1(1-\alpha)\},$$

for $s = 1,\ldots,p+q-1$, and hence confidence intervals for every component of $\beta$ and $\theta$.
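The interval of Corollary 3 is a level set of a one-dimensional empirical likelihood ratio, which can be computed by solving for the scalar Lagrange multiplier and scanning a grid of candidate values. The sketch below assumes a hypothetical stand-in $\eta_i(\gamma) = x_i - \gamma$ (the empirical likelihood for a mean) in place of $\hat\eta_{is,I}(\gamma_s)$, whose definition (2.12) is not reproduced here; the function names are illustrative, and $\chi^2_1(0.95)$ is obtained as $z_{0.975}^2$.

```python
import numpy as np
from statistics import NormalDist


def el_ratio_1d(eta):
    """-2 log empirical likelihood ratio for the scalar constraint sum p_i eta_i = 0."""
    eta = np.asarray(eta, dtype=float)
    if eta.min() >= 0.0 or eta.max() <= 0.0:
        return np.inf                      # 0 lies outside the convex hull of eta_i
    # The multiplier must keep 1 + lam * eta_i > 0, so lam lies in (a, b).
    a = -1.0 / eta.max() + 1e-10
    b = -1.0 / eta.min() - 1e-10
    for _ in range(200):                   # bisection: the score decreases in lam
        lam = 0.5 * (a + b)
        if np.sum(eta / (1.0 + lam * eta)) > 0.0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return 2.0 * np.sum(np.log1p(lam * eta))


def el_interval(x, grid, alpha=0.05):
    """Grid scan of {g : R(g) <= chi^2_1(1 - alpha)} with eta_i(g) = x_i - g."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2   # chi^2_1 quantile = z^2
    inside = [g for g in grid if el_ratio_1d(x - g) <= crit]
    return min(inside), max(inside)


rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=100)   # hypothetical data
lower, upper = el_interval(x, np.linspace(-1.0, 1.5, 2501))
```

Bisection is used because, for scalar $\eta_i$, the left-hand side of the multiplier equation is strictly decreasing in $\lambda$ on the feasible interval, so the root is unique and bracketed.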

Similarly, by adapting the adjusted empirical log-likelihood ratio defined in (3.6), we can also construct confidence intervals for every component of $\beta$ and $\theta$. In addition, the above method can be used to construct a partial profile empirical likelihood ratio for any subset of the components of $(\beta^T,\theta^T)^T$.

4 Simulations and Application

In this section, we present simulation studies to evaluate the finite-sample performance of the proposed methods. We compare the empirical likelihood with the normal approximation method in terms of coverage accuracies and average areas (lengths) of confidence regions (intervals). In Example 1, all of the covariates are subject to missingness while the response is fully observed. In Example 2, the response and a subset of the covariates are subject to missingness. In the real data example, only a subset of the covariates is subject to missingness. The case in which only the response is missing can be analyzed similarly to Example 2.

4.1 Simulation studies

Example 1. Consider PLSIM (1.1) with $p = 2$ and $q = 1$, where $\beta = (0.8, 0.6)^T$, $\theta = 1$ and $g(u) = 15\exp(-u)$. The covariate $X$ follows the bivariate standard normal distribution, and $Z$ and $\varepsilon$ follow the normal distributions $N(0,1)$ and $N(0,0.4)$, respectively.

The covariates $X$ and $Z$ are subject to missingness, while the response $Y$ is fully observed; therefore, $V = Y$. We generated 500 Monte Carlo random samples of sizes $n = 60$, 100 and 150 under each of the following three selection probability functions $\pi_i(v)$:

Case 1. $\pi_1(v) = 1/\{1+\exp(-0.23v)\}$.
Case 2. $\pi_2(v) = 1/\{1+\exp(-0.07v)\}$.
Case 3. $\pi_3(v) = 1/\{1+\exp(-0.02v)\}$.


The average missing rates corresponding to the preceding three cases are approximately 0.1, 0.24 and 0.39, respectively.
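A minimal sketch of the data-generating mechanism of Example 1 follows. It assumes that $N(0,0.4)$ denotes variance 0.4 and that $\pi_i(v)$ is the probability that $X$ and $Z$ are observed given $V = Y$; it uses a large hypothetical sample size so that the empirical missing rates are stable and can be compared with the approximate rates quoted above. The function name `selection_prob` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2019)
n = 20000                                  # large n: stable empirical missing rates
beta = np.array([0.8, 0.6])
theta = 1.0

X = rng.standard_normal((n, 2))            # bivariate standard normal X
Z = rng.standard_normal(n)                 # Z ~ N(0, 1)
eps = rng.normal(0.0, np.sqrt(0.4), n)     # N(0, 0.4) read as variance 0.4 (assumption)
Y = 15.0 * np.exp(-(X @ beta)) + theta * Z + eps   # model (1.1) with g(u) = 15 exp(-u)


def selection_prob(v, c):
    """pi(v) = 1 / {1 + exp(-c v)}: probability that X and Z are observed given V = v."""
    return 1.0 / (1.0 + np.exp(-c * v))


rates = {}
for case, c in [(1, 0.23), (2, 0.07), (3, 0.02)]:
    delta = rng.binomial(1, selection_prob(Y, c))  # delta_i = 1: covariates observed
    rates[case] = 1.0 - delta.mean()               # empirical missing rate
```

Because $Y$ is mostly positive here, a smaller coefficient $c$ pushes $\pi(Y)$ towards 0.5, which is why the missing rate grows from Case 1 to Case 3.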

The kernel functions $K^*(\cdot)$, $K(\cdot)$ and $K(\cdot)$ were taken to be the Epanechnikov kernel $0.75(1-v^2)$ for $|v|\le1$ and 0 otherwise. The cross-validation (CV) method was used to select the bandwidths $h_\nu$ ($\nu = 1,2,3,4$), and $w(u)$ was taken to be 1. The initial values of $\theta$ and $\beta$ were obtained from the linear model and the generalized linear model, respectively. The simulations were carried out in the following situations.
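The text does not spell out the cross-validation criterion. As one common choice, offered here only as an assumption, the sketch below selects a bandwidth by minimizing the leave-one-out prediction error of a Nadaraya–Watson smoother with the Epanechnikov kernel on hypothetical one-dimensional data. The estimators in (2.4) are not reproduced, so this illustrates the CV idea rather than the paper's exact procedure; all names are illustrative.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2) on |t| <= 1, zero elsewhere."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)


def loo_cv_score(u, y, h):
    """Leave-one-out mean squared prediction error of a Nadaraya-Watson smoother."""
    W = epanechnikov((u[:, None] - u[None, :]) / h)
    np.fill_diagonal(W, 0.0)               # drop the i-th point from its own fit
    denom = W.sum(axis=1)
    ok = denom > 0                         # skip points with no neighbours in the window
    fit = (W[ok] @ y) / denom[ok]
    return np.mean((y[ok] - fit) ** 2)


def cv_bandwidth(u, y, grid):
    """Return the grid bandwidth with the smallest leave-one-out score."""
    scores = [loo_cv_score(u, y, h) for h in grid]
    return grid[int(np.argmin(scores))]


rng = np.random.default_rng(3)
u = rng.standard_normal(150)                                  # hypothetical index values
y = 15.0 * np.exp(-u) + rng.normal(0.0, np.sqrt(0.4), 150)    # hypothetical responses
grid = np.linspace(0.1, 1.5, 29)
h_cv = cv_bandwidth(u, y, grid)
```

Leaving out the $i$th observation prevents the criterion from rewarding arbitrarily small bandwidths, which would otherwise interpolate the data.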

(I) The weighted empirical likelihood (WEL), the imputed empirical likelihood (IEL) and the estimating equation method (EEM) were used to compute the estimates of $\beta$ and $\theta$. The bias and standard deviation (SD) were computed from 500 runs with sample sizes $n = 60$ and 100. The simulation results are presented in Table 1.

Table 1 is about here

From Table 1, the estimates of $\beta_1$, $\beta_2$ and $\theta$ have small bias and SD, and both generally decrease as the sample size $n$ increases. IEL and WEL have similar biases, and IEL has a slightly smaller SD than WEL for $\beta_1$ and $\beta_2$. Both IEL and WEL generally have smaller SD than EEM.

(II) Consider the confidence intervals for $\beta_1$, $\beta_2$ and $\theta$. The average lengths and coverage probabilities of the confidence intervals were computed by three methods at the nominal level 0.95: the adjusted empirical likelihood (AEL), IEL and the normal approximation (NA). The estimator $\hat\beta_I$ was used in the normal approximation method; the estimator $\hat\beta_W$ gives similar results. The simulation was based on 500 runs with sample sizes $n = 60$, 100 and 150, and the results are reported in Table 2.

Table 2 is about here

From Table 2, the following conclusions can be drawn. First, the coverage probabilities of IEL and AEL are close to 95%, but IEL yields shorter confidence intervals than AEL. Second, AEL and IEL give slightly longer intervals than NA, but higher coverage probabilities. Third, for every fixed missing rate, all the interval lengths decrease and the empirical coverage probabilities increase as $n$ increases; when the sample size is large, all the coverage probabilities are in agreement with the nominal level 0.95, and the average lengths of the confidence intervals become shorter.

The missing rate also affects the interval length and the coverage probability: for every fixed sample size, the interval length generally increases and the coverage probability decreases as the missing rate increases.

(III) The confidence regions for $(\beta_1,\theta)$ and $(\beta_2,\theta)$ and their coverage probabilities were also computed from 200 simulation runs, based on AEL, IEL and NA in Case 3 with sample size 100. The simulation results are presented in Figure 1.

Figure 1 is about here

Figure 1 shows that IEL gives smaller confidence regions than AEL and NA. For (a), the empirical coverage probabilities of AEL, IEL and NA are 0.94, 0.94 and 0.97, respectively. For (b), they are 0.945, 0.945 and 0.97, respectively.

(IV) The performance of the estimator of $g(u)$ was examined over 200 runs in Case 3 with $n = 60$. The estimator $\hat g_W(u) = \hat g(u;\hat\beta_W,\hat\theta_W)$ is assessed by the root mean squared error (RMSE), given by

$$\mathrm{RMSE} = \Big[n_{grid}^{-1}\sum_{k=1}^{n_{grid}}\{\hat g_W(u_k) - g(u_k)\}^2\Big]^{1/2},$$

where the number of grid points is $n_{grid} = 20$ and $\{u_k, k = 1,\ldots,n_{grid}\}$ are equidistant grid points. The simulation results, including the boxplot of the 200 RMSEs for $n = 60$, are presented in Figure 2.
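The RMSE criterion can be computed directly on an equidistant grid. In the sketch below, the grid range $[-2, 2]$ and the shifted curve `g_shifted` are hypothetical choices for illustration; a curve that is off by a constant 0.1 everywhere has RMSE 0.1 up to rounding.

```python
import numpy as np


def rmse_on_grid(g_hat, g_true, lower, upper, n_grid=20):
    """RMSE = [n_grid^{-1} sum_k {g_hat(u_k) - g(u_k)}^2]^{1/2} on equidistant u_k."""
    u = np.linspace(lower, upper, n_grid)
    return float(np.sqrt(np.mean((g_hat(u) - g_true(u)) ** 2)))


def g_true(u):
    """Link function of Example 1."""
    return 15.0 * np.exp(-u)


def g_shifted(u):
    """Hypothetical estimate that is off by a constant 0.1."""
    return g_true(u) + 0.1


rmse = rmse_on_grid(g_shifted, g_true, -2.0, 2.0)
```

In a simulation, `rmse_on_grid` would be called once per Monte Carlo run, and the resulting 200 values summarized by a boxplot as in Figure 2(b).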


Figure 2 is about here


Figure 2(a) shows the true function curve and the estimated curve; the estimated curve is very close to the true link function curve. Figure 2(b) shows that the RMSE of the estimator of $g(u)$ is very small.

Example 2. Consider PLSIM (1.1) with $p = 4$ and $q = 2$, where $\theta = (0.18, 0.12)^T$, $\beta = (0.5\sqrt{1.2}, 0.5, 0.5, -0.5\sqrt{0.5})^T$ and $g(u) = 1.5\sin(u)$. The two components of $Z$ are independent and both follow the uniform distribution $U(0,1)$, $X$ follows the four-variate standard normal distribution, and the error $\varepsilon$ follows the normal distribution $N(0,0.16)$. The covariate $X$ and the response $Y$ are subject to missingness, while the covariate $Z$ is fully observed; therefore, $V = Z$. The selection probability function is taken as

$$\pi(z_1,z_2) = \frac{\exp(1.8z_1 + 0.9z_2)}{1 + \exp(1.8z_1 + 0.9z_2)}.$$

We generated 500 Monte Carlo random samples of size $n = 100$ under this selection probability, with an average missing rate of 0.25.

The kernel function $K(u)$ was taken to be $0.75(1-u^2)$ for $|u|\le1$ and 0 otherwise. The kernel functions $K^*(v_1,v_2)$ and $K(v_1,v_2)$ were taken to be the product kernel $K_0(v_1)K_0(v_2)$, where $K_0(u) = (15/16)(1-u^2)^2$ for $|u|\le1$ and 0 otherwise. The cross-validation (CV) method was used to select the bandwidths $h_\nu$ ($\nu = 1,2,3,4$), and $w(u)$ was taken to be 1.
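The kernels used in the two examples can be written down and checked numerically. The sketch below implements the Epanechnikov kernel $K$ and the biweight product kernel $K_0(v_1)K_0(v_2)$ and verifies, by a Riemann sum on an assumed grid of 1001 points per axis, that each integrates to one; the function names are illustrative.

```python
import numpy as np


def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) for |u| <= 1, 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def biweight(u):
    """K0(u) = (15/16)(1 - u^2)^2 for |u| <= 1, 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)


def product_kernel(v1, v2):
    """Bivariate product kernel K0(v1) K0(v2) for V = (Z1, Z2)."""
    return biweight(v1) * biweight(v2)


# Riemann sums over [-1, 1] (and [-1, 1]^2); the kernels vanish at the endpoints
t = np.linspace(-1.0, 1.0, 1001)
dt = t[1] - t[0]
mass_K = epanechnikov(t).sum() * dt
mass_K0 = biweight(t).sum() * dt
T1, T2 = np.meshgrid(t, t)
mass_prod = product_kernel(T1, T2).sum() * dt * dt
```

A product of univariate kernels is the simplest way to build a bivariate kernel for the two-dimensional $V = (Z_1, Z_2)$; it inherits unit mass and symmetry from $K_0$.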

Note that $m(v) = 0$ when the response $Y$ and the covariate $X$ are missing, so WEL and IEL are identical; hence only WEL was used in this example. Consider the confidence intervals for every component of $\beta = (\beta_1,\beta_2,\beta_3,\beta_4)^T$ and $\theta = (\theta_1,\theta_2)^T$. The initial values of $\theta$ and $\beta$ were obtained from the linear model and the generalized linear model. The average lengths and coverage probabilities of the confidence intervals were computed by the WEL and NA methods at the nominal level 0.95, with the normal approximation based on the estimator $\hat\beta_W$. The simulation results, based on 500 runs, are reported in Table 3.

Table 3 is about here

From Table 3, we have the following results. For $\beta$, WEL gives shorter intervals than NA and generally higher coverage probabilities, which are roughly in agreement with the nominal level 0.95. For $\theta$, however, both WEL and NA give long intervals and low coverage probabilities.

The performance of the estimator of $g(u)$ was examined over 200 runs with $n = 100$. The simulation results, including the boxplot of the 200 RMSEs, are presented in Figure 3.

Figure 3 is about here

Figure 3(a) shows that the estimated curve is very close to the true link function curve, and Figure 3(b) shows that the RMSE of the estimator of $g(u)$ is also very small.

4.2 A real data example

In this subsection, we analyze a data set from an AIDS Clinical Trials Group (ACTG) study. The data set contains viral load, base ribonucleic acid (RNA) virus, and CD4 and CD8 cell counts from 48 evaluable patients enrolled in ACTG protocol 315. In this study, every patient was scheduled to be measured after initiation of antiviral therapy, with the number of observations per patient ranging from 2 to 91. Thus, there are a total of 317 observations, with 20% of the CD4 and CD8 cell counts missing. This is a longitudinal data set; however, model (1.1) with missing observations can be used to study it if there is no within-patient correlation. Here we ignored the correlation structure when computing the estimates, using the so-called working independence assumption, which has some model-robustness advantages over other estimation methods (Lin and Carroll, 2001).

One of the clinical investigators' interests is the effectiveness of antiviral medicines; the purpose of this study was to investigate the relationship between virologic and immunologic responses in AIDS clinical trials. The data set was analyzed by Wu and Wu (2001, 2002). In general, it is believed that the virologic response (measured by the viral load RNA) and the immunologic response (measured by CD4 and CD8 cell counts) are negatively correlated during treatment. Liang et al. (2004) suggested that the viral load depends linearly on the CD4 cell count but nonlinearly on treatment time. We model the relationship among viral load, base RNA virus, and CD4 and CD8 cell counts by using model (1.1), the pure single-index model and the partially linear model. Let $Y$ be the viral load, $Z$ the base RNA virus, $T$ the treatment time (in days), and $X_1$ and $X_2$ the CD4 and CD8 cell counts, respectively. We use $\theta$ to denote the coefficient of $Z$, and $\beta_1$ and $\beta_2$ the coefficients of $X_1$ and $X_2$, respectively. To stabilize the variance and the computational algorithms, we used the $\log_{10}$ scale for viral load, as is common in AIDS clinical trials. Most of the missing values of the CD4 and CD8 cell counts occurred because these covariates and the viral load were measured at different times; the missingness does not depend on the missing values themselves, and in this sense is MAR (Wu, 2002; Liang et al., 2004). Since only the covariates $X_1$ and $X_2$ are missing, while the covariate $Z$ and the response $Y$ are fully observed, $V = (Z, Y)^T$. We used the same kernel functions as in Example 2 of the simulation study. The CV method was used to select the bandwidths $h_\nu$ ($\nu = 1,2,3,4$), and $w(u)$ was taken as 1. The coefficient of determination $R^2_{new}$ was used to evaluate the goodness of fit of the nonlinear regression models.

That is,

$$R^2_{new} = 1 - \Big\{\sum_{i=1}^n(Y_i - \hat Y_i)^2 \Big/ \sum_{j=1}^nY_j^2\Big\}^{1/2},$$

where $\hat Y_i$ $(1\le i\le n)$ are the predicted values of $Y_i$. The coefficient $R^2_{new}$ plays the same role as the $R^2$ of a linear model in measuring goodness of fit. We used the following three semiparametric models to analyze the AIDS data:

Model 1. $Y = g(\beta_1X_1 + \beta_2X_2) + \theta Z + \varepsilon$,
Model 2. $Y = g(\beta_1X_1 + \beta_2X_2) + \varepsilon$,
Model 3. $Y = \beta_1X_1 + \beta_2X_2 + g(T) + \varepsilon$.

WEL, IEL, NA and EEM were used to obtain the estimates and confidence intervals for the parameters $\beta_1$, $\beta_2$ and $\theta$, where NA is based on the EEM estimate. The results are reported in Table 4.
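The goodness-of-fit measure $R^2_{new}$ defined above can be computed from observed and predicted responses as follows. Unlike the usual $R^2$, it uses the uncentred sum $\sum_jY_j^2$ and a square root. The function name `r2_new` is illustrative, and the small inputs in the usage lines are hypothetical.

```python
import numpy as np


def r2_new(y, y_hat):
    """R^2_new = 1 - { sum_i (Y_i - Yhat_i)^2 / sum_j Y_j^2 }^{1/2}."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 1.0 - np.sqrt(np.sum((y - y_hat) ** 2) / np.sum(y ** 2))


perfect = r2_new([3.0, 4.0], [3.0, 4.0])        # exact predictions
null_fit = r2_new([3.0, 4.0], [0.0, 0.0])       # predicting zero everywhere
partial = r2_new([1.0, 2.0, 2.0], [1.0, 2.0, 0.0])
```

Perfect predictions give $R^2_{new} = 1$, while predicting zero everywhere gives $R^2_{new} = 0$; in the last example the residual sum of squares is 4 and $\sum_jY_j^2 = 9$, so $R^2_{new} = 1 - 2/3 = 1/3$.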

Table 4 is about here

From Table 4, the estimates and confidence intervals of $\beta_1$, $\beta_2$ and $\theta$ computed by WEL and IEL are similar, because the sample size is 317 and only 20% of the CD4 and CD8 cell counts are missing. In the great majority of cases, the intervals based on WEL and IEL are slightly shorter than those based on NA. In addition, for Model 1 the estimate of $\theta$ is 0.182, which implies that the base RNA level has a positive effect on the viral load RNA level. The pointwise confidence intervals of $g(\cdot)$ were computed by the EEM and NA methods, and the results are given in Figure 4. The coefficients of determination $R^2_{new}$ and the optimal bandwidths $h_\nu$ ($\nu = 1,2,3,4$) were also obtained; they are presented in Table 5.

Figure 4 and Table 5 are about here

In Figure 4, the abscissa $u$ is the linear combination $\hat\beta_1x_1 + \hat\beta_2x_2$ of the CD4 and CD8 cell counts, $t$ is time in days, and the ordinate shows the estimates and the pointwise confidence intervals of $g(\cdot)$ based on NA. Figure 4(a) indicates that the viral load RNA level increases with $u$ on $(-400, 0)$ and then decreases rapidly with $u$ on $(0, 100)$. Figure 4(b) indicates that the viral load RNA level increases with $u$ on $(-100, 0)$ and then decreases with $u$ on $(0, 400)$. This implies that the CD4 and CD8 cell counts influence the viral load RNA level, and that the antiviral treatment has an obvious effect in decreasing the viral load RNA level. Figure 4(c) indicates that the viral load RNA level decreases rapidly after initiation of the antiviral treatment and then rebounds slightly.

In Table 5, the coefficients of determination $R^2_{new}$ of Models 1–3 are 0.8934, 0.9181 and 0.8166, respectively. Comparing Models 1 and 2, even though the base RNA variable is significant, its inclusion leads to a minor decrease in $R^2_{new}$. This also shows that CD4 and CD8 cells are the most important factors in immune defence against HIV. In addition, the coefficient of determination of Model 3 is smaller than those of Models 1 and 2, which suggests that the single-index models are more suitable than the partially linear model for analyzing the AIDS data. Finally, since the coefficient of determination of the linear model is very small ($R^2 = 0.0218$), the linear model is not suitable for these data.

5 Concluding remarks

We have developed the empirical likelihood method for the partially linear single-index model with missing data in both the response and the covariates. We proposed two empirical log-likelihood ratios, each of which is asymptotically chi-squared. We also constructed the maximum empirical likelihood estimators of the index coefficients and an estimator of the link function, and obtained their asymptotic distributions and optimal convergence rate. An important feature of our methods is their ability to handle a missing response and/or partially missing covariates. The proposed methods were illustrated by simulation studies and a real data example. Our methods can also be generalized to other semiparametric regression models, such as the partially linear varying coefficient model and the single-index varying coefficient model.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (11971001) and the Beijing Natural Science Foundation (1182002). The data set of the real example is from an AIDS Clinical Trials Group (ACTG) study.

Appendix

In this appendix, we prove only Theorems 1, 2, 5 and 6. The proofs of Theorems 3 and 4 are similar to those of Theorems 1, 2 and 4 in Wang et al. (2010), and are hence omitted. The following Lemma 1 is useful for proving these theorems; its proof is given in the supplementary material.

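Lemma 1. Suppose that Conditions (C1)–(C8) hold. If $\beta$ and $\theta$ are the true parameters, then

$$\frac{1}{\sqrt n}\sum_{i=1}^n\hat\eta_{i,J}(\beta^{(r)},\theta) \xrightarrow{D} N(0,\Omega), \qquad (A.2)$$

$$\frac1n\sum_{i=1}^n\hat\eta_{i,W}(\beta^{(r)},\theta)\hat\eta_{i,W}^T(\beta^{(r)},\theta) \xrightarrow{P} \Gamma, \qquad (A.3)$$

$$\frac1n\sum_{i=1}^n\hat\eta_{i,I}(\beta^{(r)},\theta)\hat\eta_{i,I}^T(\beta^{(r)},\theta) \xrightarrow{P} \Omega, \qquad (A.4)$$

and

$$\max_{1\le i\le n}\|\hat\eta_{i,J}(\beta^{(r)},\theta)\| = o_P(n^{1/2}) \qquad (A.5)$$

for $J = W$ or $I$, where $\Gamma$ and $\Omega$ are defined in Theorem 1.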

Proof of Theorem 1. For $J = W$ or $I$, by the Lagrange multiplier method, $R_J(\beta^{(r)},\theta)$ can be represented as

$$R_J(\beta^{(r)},\theta) = 2\sum_{i=1}^n\log\{1 + \lambda^T\hat\eta_{i,J}(\beta^{(r)},\theta)\}, \qquad (A.6)$$

where $\lambda = \lambda(\beta^{(r)},\theta)$ is a $(p+q-1)\times1$ vector given as the solution to

$$\sum_{i=1}^n\frac{\hat\eta_{i,J}(\beta^{(r)},\theta)}{1 + \lambda^T\hat\eta_{i,J}(\beta^{(r)},\theta)} = 0. \qquad (A.7)$$

By Lemma 1, and using the same arguments as in the proof of (2.15) in Owen (1990), we can show that

$$\lambda = O_P(n^{-1/2}). \qquad (A.8)$$

For brevity, write $\hat\eta_{i,J} = \hat\eta_{i,J}(\beta^{(r)},\theta)$ below. Applying the Taylor expansion to (A.6), and invoking Lemma 1 and (A.8), we get

$$R_J(\beta^{(r)},\theta) = 2\sum_{i=1}^n\big[\lambda^T\hat\eta_{i,J} - \{\lambda^T\hat\eta_{i,J}\}^2/2\big] + o_P(1). \qquad (A.9)$$

By (A.7), it follows that

$$0 = \sum_{i=1}^n\frac{\hat\eta_{i,J}}{1 + \lambda^T\hat\eta_{i,J}} = \sum_{i=1}^n\hat\eta_{i,J} - \sum_{i=1}^n\hat\eta_{i,J}\hat\eta_{i,J}^T\lambda + \sum_{i=1}^n\frac{\hat\eta_{i,J}\{\lambda^T\hat\eta_{i,J}\}^2}{1 + \lambda^T\hat\eta_{i,J}}.$$

This, together with Lemma 1 and (A.8), proves that

$$\sum_{i=1}^n\{\lambda^T\hat\eta_{i,J}\}^2 = \sum_{i=1}^n\lambda^T\hat\eta_{i,J} + o_P(1)$$

and

$$\lambda = \Big\{\sum_{i=1}^n\hat\eta_{i,J}\hat\eta_{i,J}^T\Big\}^{-1}\sum_{i=1}^n\hat\eta_{i,J} + o_P(n^{-1/2}).$$

Therefore, from (A.9) we have

$$R_J(\beta^{(r)},\theta) = \Big\{\frac{1}{\sqrt n}\sum_{i=1}^n\hat\eta_{i,J}\Big\}^T\Big\{\frac1n\sum_{i=1}^n\hat\eta_{i,J}\hat\eta_{i,J}^T\Big\}^{-1}\Big\{\frac{1}{\sqrt n}\sum_{i=1}^n\hat\eta_{i,J}\Big\} + o_P(1). \qquad (A.10)$$

This, together with Lemma 1, proves Theorem 1.
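The representation (A.6)–(A.7) suggests the usual computational route for empirical likelihood ratios: solve (A.7) for $\lambda$ by Newton's method and plug the solution into (A.6). The sketch below does this for a generic $n\times d$ array of estimating functions and compares the result with the quadratic form in (A.10). It assumes that 0 lies inside the convex hull of the rows, and it uses simulated standard normal vectors as a hypothetical stand-in for $\hat\eta_{i,J}(\beta^{(r)},\theta)$; the function names are illustrative.

```python
import numpy as np


def el_log_ratio(eta, tol=1e-9, max_iter=100):
    """Return (R, lam): R = 2 sum_i log(1 + lam^T eta_i) as in (A.6),
    where lam solves sum_i eta_i / (1 + lam^T eta_i) = 0 as in (A.7).
    Assumes 0 lies inside the convex hull of the rows of eta.
    """
    eta = np.asarray(eta, dtype=float)

    def dual(lam):
        # Concave objective sum_i log(1 + lam^T eta_i); -inf outside its domain.
        w = 1.0 + eta @ lam
        return -np.inf if np.any(w <= 0.0) else float(np.sum(np.log(w)))

    lam = np.zeros(eta.shape[1])
    for _ in range(max_iter):
        w = 1.0 + eta @ lam
        grad = eta.T @ (1.0 / w)                   # left-hand side of (A.7)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(eta / w[:, None] ** 2).T @ eta    # Jacobian of grad in lam
        step = np.linalg.solve(hess, -grad)        # Newton ascent direction
        t, f0 = 1.0, dual(lam)
        while dual(lam + t * step) < f0 and t > 1e-12:
            t *= 0.5                               # backtrack: stay feasible, never decrease
        lam = lam + t * step
    return 2.0 * dual(lam), lam


def quadratic_form(eta):
    """Leading term of (A.10): s^T {n^{-1} sum eta_i eta_i^T}^{-1} s, s = n^{-1/2} sum eta_i."""
    eta = np.asarray(eta, dtype=float)
    n = eta.shape[0]
    s = eta.sum(axis=0) / np.sqrt(n)
    cov = eta.T @ eta / n
    return float(s @ np.linalg.solve(cov, s))


rng = np.random.default_rng(7)
eta = rng.standard_normal((500, 3))    # stand-in for eta_{i,J}(beta^(r), theta) at the truth
R, lam = el_log_ratio(eta)
Q = quadratic_form(eta)
```

Under the null, `R` and `Q` differ only by a term that vanishes as $n$ grows, which is exactly what (A.10) states; the fitted multiplier is also small, in line with (A.8).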

Proof of Theorem 2. From (A.10), we obtain

$$\hat R_{W,ad}(\beta^{(r)},\theta) = \hat Q^T(\beta^{(r)},\theta)\hat\Omega^{-1}(\beta^{(r)},\theta)\hat Q(\beta^{(r)},\theta) + o_P(1).$$

This, together with Lemma 1, proves Theorem 2.

Proof of Theorem 5. Using standard arguments, we can obtain

$$S_{n,l}(u;\hat\beta_J) = h_2^lp(u)\int t^lK(t)\,dt + O_P(h_2^{l+2}), \qquad l = 0,1,2.$$

Therefore, from (2.4) it follows that

$$\hat g(u;\hat\beta_J,\hat\theta_J) - g(u) = \frac12\mu_2h_2^2g''(u) + \frac1n\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(V_i)p(u)}K_{h_2}(\beta^TX_i - u) + o_P(c_n),$$

where $\mu_2 = \int t^2K(t)\,dt$ and $c_n = (nh_2)^{-1/2} + h_2^2$. By Theorem 4.4 of Masry and Tjøstheim (1995), Theorem 5 is proved.

Proof of Theorem 6. Similarly to the proof of Lemma 1, it can be proved that

$$\frac{1}{\sqrt n}\sum_{i=1}^n\tilde\eta_{i,I}(\beta^{(r)}) \xrightarrow{D} N(0,B_1), \qquad (A.10)$$

$$\frac1n\sum_{i=1}^n\tilde\eta_{i,I}(\beta^{(r)})\tilde\eta_{i,I}^T(\beta^{(r)}) \xrightarrow{P} B_1 \qquad (A.11)$$

and

$$\max_{1\le i\le n}\|\tilde\eta_{i,I}(\beta^{(r)})\| = o_P(n^{1/2}), \qquad (A.12)$$

where $B_1$ is defined in Corollary 1. From (A.10)–(A.12), and similarly to the proof of Theorem 1, we can prove that $\tilde R_I(\beta^{(r)}) \xrightarrow{D} \chi^2_{p-1}$. Similarly, we can also prove that $R^*_I(\theta) \xrightarrow{D} \chi^2_q$. This completes the proof of Theorem 6.

References

Arnold, S. F. (1981). The Theory of Linear Models and Multivariate Analysis. John Wiley & Sons, New York.

Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92, 477–489.

Chen, C. H. and Li, K. C. (1998). Can SIR be as popular as multiple linear regression? Statistica Sinica 8, 289–316.

Engle, R. F., Granger, C. W. J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81, 310–320.

Fan, J. Q. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

Guo, X., Niu, C. Z., Yang, Y. P. and Xu, W. L. (2015). Empirical likelihood for single index model with missing covariates at random. Statistics 49, 588–601.

Li, J. L., Huang, C. and Zhu, H. T. (2017). A functional varying-coefficient single-index model for functional response data. J. Amer. Statist. Assoc. 112, 1169–1181.

Harrison, D. and Rubinfeld, D. (1978). Hedonic housing prices and the demand for clean air. J. Environmental Economics and Management 5, 81–102.

Liang, H., Wang, S. J., Robins, J. M. and Carroll, R. J. (2004). Estimation in partially linear models with missing covariates. J. Amer. Statist. Assoc. 99, 357–367.

Liang, H. (2008). Generalized partially linear models with missing covariates. J. Mult. Anal. 99, 880–895.

Lin, X. H. and Carroll, R. J. (2001). Semiparametric regression for clustered data using generalized estimating equations. J. Amer. Statist. Assoc. 96, 1045–1056.

Masry, E. and Tjøstheim, D. (1995). Nonparametric estimation and identification of nonlinear ARCH time series: Strong convergence and asymptotic normality. Econometric Theory 11, 258–289.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90–120.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300–325.

Qin, J. and Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. J. Roy. Statist. Soc. Ser. B 69, 101–122.

Qin, J., Zhang, B. and Leung, D. H. Y. (2009). Empirical likelihood in missing data problems. J. Amer. Statist. Assoc. 104, 1492–1503.

Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89, 846–866.

Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.

Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. John Wiley & Sons, New York, 103–110.

Wang, J. L., Xue, L. G., Zhu, L. X. and Chong, Y. S. (2010). Estimation for a partial-linear single-index model. Ann. Statist. 38, 246–274.

Wang, Q. H. (2004). Likelihood-based imputation inference for mean functionals in the presence of missing responses. Ann. Inst. Statist. Math. 56, 403–414.

Wang, Q. H. (2009). Statistical estimation in partial linear models with covariate data missing at random. Ann. Inst. Statist. Math. 61, 47–84.

Wang, Q. H., Linton, O. and Härdle, W. (2004). Semiparametric regression analysis with missing response at random. J. Amer. Statist. Assoc. 99, 334–345.

Wang, Q. H. and Rao, J. N. K. (2001). Empirical likelihood for linear regression models under imputation for missing responses. Canad. J. Statist. 29, 597–608.

Wang, Q. H. and Rao, J. N. K. (2002a). Empirical likelihood-based inference under imputation for missing response data. Ann. Statist. 30, 896–924.

Wang, Q. H. and Rao, J. N. K. (2002b). Empirical likelihood-based inference in linear models with missing data. Scand. J. Statist. 29, 563–576.

Wu, L. (2002). A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. J. Amer. Statist. Assoc. 97, 955–964.

Wu, H. and Wu, L. (2001). A multiple imputation method for missing covariates in nonlinear mixed-effect models, with application to HIV dynamics. Statist. Med. 20, 1755–1769.

Wu, L. and Wu, H. (2002). Nonlinear mixed-effect models with missing time-dependent covariates, with application to HIV viral dynamics. J. Roy. Statist. Soc. Ser. C 51, 297–318.

Xia, Y. C. (2006). Asymptotic distributions for two estimators of the single-index model. Econometric Theory 22, 1112–1137.

Xia, Y. C. and Härdle, W. (2006). Semi-parametric estimation of partially linear single-index models. J. Mult. Anal. 97, 1162–1184.

Xue, L. G. (2009a). Empirical likelihood for linear models with missing responses. J. Mult. Anal. 100, 1353–1366.

Xue, L. G. (2009b). Empirical likelihood confidence intervals for response mean with data missing at random. Scand. J. Statist. 36, 671–685.

Xue, L. G. (2013). Estimation and empirical likelihood for single-index models with missing data in the covariates. Comput. Statist. Data Anal. 60, 82–97.

Xue, L. G. and Lian, H. (2016). Empirical likelihood for single-index models with responses missing at random. Sci. China Math. 59, 1187–1207.

Yu, Y. and Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. J. Amer. Statist. Assoc. 97, 1042–1054.

Zhu, L. X. and Xue, L. G. (2006). Empirical likelihood confidence regions in a partially linear single-index model. J. Roy. Statist. Soc. Ser. B 68, 549–570.

Table 1. Simulation results of Example 1. The bias and SD (in parentheses) for $\beta_1$, $\beta_2$ and $\theta$ based on 500 simulations under different selection probability functions $\pi_i(v)$ and sample sizes $n$ (all entries are multiplied by 100).

β1 = 0.8
πi   n     WEL               IEL               EEM
π1   60     0.0234 (0.349)   −0.0235 (0.349)    0.0042 (0.386)
     100   −0.0061 (0.244)   −0.0066 (0.244)   −0.0041 (0.249)
π2   60    −0.0097 (0.651)    0.0084 (0.651)    0.0382 (0.664)
     100    0.0092 (0.237)   −0.0083 (0.236)   −0.0090 (0.241)
π3   60    −0.0228 (0.578)   −0.0129 (0.565)    0.0092 (0.557)
     100    0.0105 (0.507)    0.0132 (0.505)   −0.0090 (0.529)

β2 = 0.6
πi   n     WEL               IEL               EEM
π1   60     0.0187 (0.444)    0.0193 (0.443)   −0.0163 (0.514)
     100   −0.0083 (0.309)    0.0085 (0.309)    0.0041 (0.331)
π2   60    −0.0165 (0.771)   −0.0156 (0.770)    0.0421 (0.793)
     100    0.0150 (0.305)    0.0140 (0.303)    0.0107 (0.319)
π3   60     0.0084 (0.708)    0.0091 (0.707)   −0.0101 (0.736)
     100    0.0080 (0.642)   −0.0033 (0.640)   −0.0091 (0.666)

θ = 1
πi   n     WEL               IEL               EEM
π1   60     1.9130 (1.827)    1.9175 (1.820)    1.6431 (1.970)
     100    1.4205 (1.076)    1.4242 (1.090)    1.6600 (1.091)
π2   60     2.1792 (1.567)    2.1857 (1.566)    1.6942 (1.740)
     100    1.7047 (1.528)    1.7208 (1.556)    1.5130 (1.713)
π3   60     2.3434 (1.415)    2.3538 (1.444)    1.7953 (1.472)
     100    2.0170 (1.371)    2.0290 (1.386)    1.7019 (0.842)

Table 2. Simulation results of Example 1. The average lengths and empirical coverage probabilities (in parentheses) of the confidence intervals for $\beta_1$, $\beta_2$ and $\theta$ under different selection probability functions $\pi_i(v)$ and sample sizes $n$ when the nominal level is 0.95.

β1
πi   n     AEL              IEL              NA
π1   60    0.0144 (0.942)   0.0140 (0.940)   0.0120 (0.920)
     100   0.0097 (0.956)   0.0095 (0.956)   0.0076 (0.926)
     150   0.0069 (0.972)   0.0068 (0.972)   0.0055 (0.938)
π2   60    0.0169 (0.936)   0.0162 (0.938)   0.0135 (0.918)
     100   0.0103 (0.952)   0.0100 (0.946)   0.0033 (0.926)
     150   0.0078 (0.956)   0.0075 (0.952)   0.0064 (0.934)
π3   60    0.0230 (0.932)   0.0227 (0.930)   0.0187 (0.912)
     100   0.0125 (0.948)   0.0122 (0.942)   0.0105 (0.924)
     150   0.0096 (0.954)   0.0093 (0.950)   0.0075 (0.930)

β2
πi   n     AEL              IEL              NA
π1   60    0.0189 (0.944)   0.0185 (0.944)   0.0159 (0.928)
     100   0.0124 (0.966)   0.0122 (0.966)   0.0101 (0.934)
     150   0.0089 (0.972)   0.0086 (0.972)   0.0073 (0.944)
π2   60    0.0214 (0.940)   0.0210 (0.938)   0.0178 (0.926)
     100   0.0129 (0.954)   0.0125 (0.950)   0.0110 (0.932)
     150   0.0097 (0.958)   0.0092 (0.952)   0.0086 (0.940)
π3   60    0.0279 (0.936)   0.0273 (0.932)   0.0247 (0.924)
     100   0.0153 (0.952)   0.0150 (0.950)   0.0140 (0.926)
     150   0.0120 (0.954)   0.0116 (0.952)   0.0099 (0.934)

θ
πi   n     AEL              IEL              NA
π1   60    0.4169 (0.952)   0.4165 (0.950)   0.3509 (0.956)
     100   0.2797 (0.964)   0.2784 (0.964)   0.2456 (0.964)
     150   0.2186 (0.970)   0.2181 (0.970)   0.1802 (0.976)
π2   60    0.4522 (0.948)   0.4488 (0.946)   0.4162 (0.950)
     100   0.2896 (0.954)   0.2889 (0.952)   0.2775 (0.956)
     150   0.2473 (0.960)   0.2470 (0.956)   0.2111 (0.962)
π3   60    0.5482 (0.942)   0.5480 (0.942)   0.5331 (0.946)
     100   0.3318 (0.950)   0.3306 (0.948)   0.3332 (0.950)
     150   0.2720 (0.952)   0.2717 (0.952)   0.2462 (0.954)

Table 3. Simulation results of Example 2. The average lengths (AI) and empirical coverage probabilities (CP) of the confidence intervals for $\beta_1,\ldots,\beta_4$, $\theta_1$ and $\theta_2$ for selection probability function $\pi(z_1,z_2)$ and sample size $n = 100$ when the nominal level is 0.95.

Method   AI/CP   β1       β2       β3       β4       θ1       θ2
WEL      AI      0.0706   0.0965   0.0972   0.4787   0.5000   0.5000
         CP      0.9540   0.9800   0.9860   0.9160   0.9200   0.9540
NA       AI      0.1019   0.0998   0.1003   0.5261   0.4232   0.3601
         CP      0.9640   0.9500   0.9620   0.9060   0.9140   0.9460

Table 4. Calculation results of a real data example. Estimates and 95% confidence intervals for $\beta_1$, $\beta_2$ and $\theta$ based on WEL, IEL and EEM/NA (all entries for Model 3 are multiplied by 1000).

Models    Ps   WEL                       IEL                       EEM/NA
Model 1   β1   −1.001 (−1.004, −0.996)   −1.001 (−1.004, −0.996)   −0.999 (−0.002, −0.997)
          β2    0.030 (0.015, 0.045)      0.030 (0.015, 0.045)      0.040 (−0.063, 0.143)
          θ     0.182 (0.120, 0.242)      0.182 (0.123, 0.238)      0.182 (0.122, 0.243)
Model 2   β1    1.000 (0.985, 1.015)      1.000 (0.985, 1.015)      0.999 (0.992, 1.005)
          β2   −0.040 (−0.055, −0.025)   −0.040 (−0.055, −0.025)   −0.049 (−0.105, 0.007)
Model 3   β1    0.005 (−0.242, 0.250)     0.041 (−0.189, 0.217)    −0.007 (−0.245, 0.230)
          β2    0.004 (−0.057, 0.004)     0.002 (0.002, 0.002)     −0.018 (−0.053, 0.017)

Table 5. Calculation results of a real data example. The coefficients of determination $R^2_{new}$ and the bandwidths $h_\nu$ ($\nu = 1,2,3,4$), based on Models 1–3 and the linear model.

R²new/hν   Model 1    Model 2    Model 3
R²new      0.8934     0.9181     0.8166
h1         0.4123     0.1071     0.6627
h2         277.4938   336.7335   28.0757
h3         5.2618     0.3241     14.9822
h4         3.1328     3.1418     0.5615

Figure 1: Simulation results of Example 1. Approximate 95% confidence regions for $(\beta_1,\theta)$ and $(\beta_2,\theta)$, based on AEL (solid curve), IEL (dashed curve) and NA (dotted-dashed curve) in Case 3 when $n = 100$. [Plot not reproduced: panel (a) is drawn in the $(\beta_1,\theta)$ plane and panel (b) in the $(\beta_2,\theta)$ plane.]

Figure 2: Simulation results of Example 1. (a) The true function curve (solid curve), the estimated curve (dotted-dashed curve) and approximate 95% pointwise confidence intervals (dashed curves) for $g(u)$; (b) the boxplot of the 200 RMSE values for the estimate of $g(u)$; based on Case 3 when $n = 60$. [Plot not reproduced: panel (a) plots $g(u)$ against $u$; panel (b) shows the RMSE values.]

Figure 3: Simulation results of Example 2. (a) The true function curve (solid curve), the estimated curve (dotted-dashed curve) and approximate 95% pointwise confidence intervals (dashed curves) for $g(u)$; (b) the boxplot of the 200 RMSE values for the estimate of $g(u)$; based on Case 3 when $n = 100$. [Plot not reproduced: panel (a) plots $g(u)$ against $u$; panel (b) shows the RMSE values.]

Figure 4: Calculation results of a real data example. The estimated curve (solid curve) and approximate 95% pointwise confidence intervals (dashed curves) for $g(\cdot)$. [Plot not reproduced: panels (a) and (b) plot $\hat g(u)$ against $u$; panel (c) plots $\hat g(t)$ against $t$ (days).]