Spline confidence bands for functional derivatives


Guanqun Cao (a), Jing Wang (b), Li Wang (c,*), David Todem (a)

(a) Michigan State University, United States
(b) University of Illinois at Chicago, United States
(c) University of Georgia, United States

Article history: Received 17 April 2011; received in revised form 31 October 2011; accepted 4 January 2012; available online 20 January 2012.

Abstract

We develop in this paper a new procedure to construct simultaneous confidence bands for derivatives of mean curves in functional data analysis. The technique involves polynomial splines that provide an approximation to the derivatives of the mean functions, the covariance functions and the associated eigenfunctions. We show that the proposed procedure has desirable statistical properties. In particular, we first show that the proposed estimators of derivatives of the mean curves are semiparametrically efficient. Second, we establish consistency results for derivatives of covariance functions and their eigenfunctions. Most importantly, we show that the proposed spline confidence bands are asymptotically efficient as if all random trajectories were observed with no error. Finally, the confidence band procedure is illustrated through numerical simulation studies and a real life example. © 2012 Elsevier B.V. All rights reserved.

Keywords: B-spline; Eigenfunctions; Functional data; Karhunen–Loève representation; Semiparametric efficiency

1. Introduction

Functional data analysis (FDA) is a relatively new field in statistics, where the data are typically curves or high-dimensional surfaces. In exploratory FDA, it is often of interest to estimate the mean functions; see, for example, Ramsay and Silverman (2005), Yao et al. (2005a,b), Ferraty and Vieu (1996), Li and Hsing (2010) and Cao et al. (forthcoming). In some settings, however, estimation and inference for derivatives of the mean functions in FDA are of equal importance. For example, in economics, consistent and direct estimation of derivatives is essential for estimating elasticities, returns to scale, substitution rates and average derivatives. Often, these index (derivative) functions are as interesting as the mean functions themselves. Another example arises in engineering and the biomedical sciences, where the estimation of velocity and acceleration is of great importance in addition to obtaining a smooth curve of the measurements.

The problem of estimation and inference for derivatives of functional data is very challenging; see Ramsay and Silverman (2005), Liu and Müller (2009), and Hall et al. (2009) for some discussions. Existing methodologies for derivatives of the regression function in FDA often rely on a pointwise analysis. For example, in Liu and Müller (2009) the theoretical focus was primarily on obtaining consistency and asymptotic normality of the proposed estimators, thereby providing the necessary ingredients to construct pointwise confidence intervals. This approach is important, but its usefulness for global inference is limited. To our knowledge, no existing methodology provides simultaneous confidence bands for functional derivatives in FDA. In this paper, we develop such a methodology, with the primary aim of better understanding the variability and shape of the mean curve.


* Corresponding author. E-mail addresses: [email protected] (G. Cao), [email protected] (J. Wang), [email protected] (L. Wang), [email protected] (D. Todem).



Nonparametric simultaneous confidence bands are powerful tools for global inference on functions. Some work has been done on simultaneous confidence bands for the mean curves in FDA; see Degras (2011), Ma et al. (2012) and Cao et al. (forthcoming). Research on confidence bands for functional derivatives, however, is sparse, partly because of the technical difficulty of formulating such bands for FDA and establishing the associated theoretical properties.

Some smoothing tool is necessary to construct the confidence bands. Popular smoothing methods include kernels (Gasser and Müller, 1984; Härdle, 1989; Xia, 1998; Claeskens and Van Keilegom, 2003), local polynomials (Fan and Gijbels, 1996), splines (Wahba, 1990; Stone, 1994) and series expansion methods (Morris and Carroll, 2006). In this paper, we use B-splines, which can be readily implemented thanks to their explicit expression, to construct the bands. B-spline approximation has also been employed to estimate functional mixed-effect models in Shi et al. (1996) and Rice and Wu (2001), and to study functional data via principal components in Yao and Lee (2006) and Zhou et al. (2008). Other related works include Zhou et al. (1998) and Wang and Yang (2009a), who proposed B-spline confidence bands for regression functions.

The proposed confidence bands are asymptotically the same as if all the random trajectories were correctly recorded over the entire interval. As discussed in Section 2.3, the estimators are semiparametrically efficient, thereby providing partial theoretical justification for treating functional data as perfectly recorded random curves over the entire data range, as in Cao et al. (forthcoming).

The rest of the paper is organized as follows. In Section 2, we introduce the model and the spline estimators of the mean curves and their derivatives. Section 3 presents the simultaneous confidence bands for the derivatives of the mean curves. Specifically, in Section 3.1 we show that the bands have asymptotically correct coverage probabilities, and in Section 3.2 we discuss how to estimate the unknown components involved in the band construction and other implementation issues. Section 4 reports findings from a simulation study and a real data set. Proofs of technical results are relegated to the Appendix.

2. Models and spline estimators

2.1. Models

We consider a collection of trajectories $\{X_i(t)\}_{i=1}^n$ which are i.i.d. realizations of a smooth random function $X(t)$ defined on a continuous interval $\mathcal{T}$. Assume that $\{X(t),\,t\in\mathcal{T}\}$ is an $L^2(\mathcal{T})$ process, i.e. $E\int_{\mathcal{T}}X^2(t)\,dt<+\infty$, and define the mean and covariance functions as $\mu(t)=E\{X(t)\}$ and $G(t,s)=\mathrm{cov}\{X(t),X(s)\}$, $t,s\in\mathcal{T}$. The covariance function is a symmetric nonnegative-definite function with a spectral decomposition $G(t,s)=\sum_{k=1}^{\infty}\lambda_k\phi_k(t)\phi_k(s)$, where $\lambda_1\ge\lambda_2\ge\cdots\ge0$ are the eigenvalues satisfying $\sum_{k=1}^{\infty}\lambda_k<\infty$, and $\{\phi_k(t)\}_{k=1}^{\infty}$ are the corresponding eigenfunctions, which form an orthonormal basis. By the standard Karhunen–Loève representation (Hall and Hosseini-Nasab, 2006), $X_i(t)=\mu(t)+\sum_{k=1}^{\infty}\xi_{ik}\phi_k(t)$, where the random coefficients $\xi_{ik}$ are uncorrelated with mean 0 and variance $\lambda_k$. In what follows, we assume that $\lambda_k=0$ for $k>\kappa$, where $\kappa$ is a positive integer or $\infty$.

We consider a typical functional data setting where $X_i(\cdot)$ is recorded on a regular grid in $\mathcal{T}$ and is assumed to be contaminated with measurement errors. Without loss of generality, we take $\mathcal{T}=[0,1]$.
Then the observed data are $Y_{ij}=X_i(T_{ij})+\sigma(T_{ij})\varepsilon_{ij}$, for $1\le i\le n$, $1\le j\le N$, where $T_{ij}=j/N$, the $\varepsilon_{ij}$ are independent random errors with $E(\varepsilon_{ij})=0$ and $E(\varepsilon_{ij}^2)=1$, and $\sigma(\cdot)$ is the standard deviation of the measurement errors. By the Karhunen–Loève representation, the observed data can be written as
$$Y_{ij}=\mu(j/N)+\sum_{k=1}^{\kappa}\xi_{ik}\phi_k(j/N)+\sigma(j/N)\varepsilon_{ij}, \qquad (1)$$
where $\mu(\cdot)$, $\sigma(\cdot)$ and $\{\phi_k(\cdot)\}_{k=1}^{\kappa}$ are smooth but unknown functions of $t$. In addition, $\{\phi_k(\cdot)\}_{k=1}^{\kappa}$ are further subject to the constraints $\int_0^1\phi_k^2(t)\,dt=1$ and $\int_0^1\phi_k(t)\phi_{k'}(t)\,dt=0$ for $k'\ne k$.

2.2. Spline estimators

We first introduce some notation for the B-spline space. Divide the interval $\mathcal{T}=[0,1]$ into $(N_m+1)$ subintervals $I_J=[\omega_J,\omega_{J+1})$, $J=0,\ldots,N_m-1$, $I_{N_m}=[\omega_{N_m},1]$, where $\varpi_m:=\{\omega_J\}_{J=1}^{N_m}$ is a sequence of equally spaced points, called interior knots. Let $H^{(p-2)}$ be the polynomial spline space of order $p$ on $[0,1]$. This space consists of all $p-2$ times continuously differentiable functions on $[0,1]$ that are polynomials of degree $p-1$ on each interval $I_J$. The $J$-th B-spline of order $p$ is denoted by $B_{J,p}$. We augment the boundary and interior knots as $\omega_{1-p}=\cdots=\omega_{-1}=\omega_0=0<\omega_1<\cdots<\omega_{N_m}<1=\omega_{N_m+1}=\cdots=\omega_{N_m+p}$, in which $\omega_J=Jh_m$, $J=0,1,\ldots,N_m+1$, and $h_m=1/(N_m+1)$ is the distance between neighboring knots. Following Cao et al. (forthcoming), we estimate the mean function $\mu(\cdot)$ in (1) by
$$\hat\mu(\cdot)=\arg\min_{g(\cdot)\in H^{(p-2)}}\sum_{i=1}^n\sum_{j=1}^N\{Y_{ij}-g(j/N)\}^2=\sum_{J=1-p}^{N_m}\hat\beta_{J,p}B_{J,p}(\cdot),$$


where the coefficients
$$\hat\beta_p=\{\hat\beta_{1-p,p},\ldots,\hat\beta_{N_m,p}\}^T=\arg\min_{\beta\in\mathbb{R}^{N_m+p}}\sum_{i=1}^n\sum_{j=1}^N\Big\{Y_{ij}-\sum_{J=1-p}^{N_m}\beta_{J,p}B_{J,p}(j/N)\Big\}^2.$$
Let $\bar{\mathbf{Y}}=(\bar Y_1,\ldots,\bar Y_N)^T$ with $\bar Y_j=n^{-1}\sum_{i=1}^n Y_{ij}$, $1\le j\le N$. Applying elementary algebra, one obtains
$$\hat\mu(t)=B_p(t)(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\bar{\mathbf{Y}}, \qquad (2)$$
in which $B_p(t)=(B_{1-p,p}(t),\ldots,B_{N_m,p}(t))$ and $\mathbf{B}=(B_p^T(1/N),\ldots,B_p^T(N/N))^T$ is the design matrix.

We denote by $\mu^{(\nu)}(t)$ the $\nu$-th order derivative of $\mu(t)$ with respect to $t$. Since $\hat\mu(t)$ is an estimator of $\mu(t)$, it is natural to consider $\hat\mu^{(\nu)}(t)$ as the estimator of $\mu^{(\nu)}(t)$, for any $\nu=1,\ldots,p-2$, i.e.
$$\hat\mu^{(\nu)}(t)=B_p^{(\nu)}(t)(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\bar{\mathbf{Y}}, \qquad (3)$$
where $B_p^{(\nu)}(t)=(B_{1-p,p}^{(\nu)}(t),\ldots,B_{N_m,p}^{(\nu)}(t))$. According to the B-spline properties in de Boor (2001), for $p>2$ and $2-p\le J\le N_m-1$,
$$\frac{d}{dt}B_{J,p}(t)=(p-1)\left\{\frac{B_{J,p-1}(t)}{\omega_{J+p-1}-\omega_J}-\frac{B_{J+1,p-1}(t)}{\omega_{J+p}-\omega_{J+1}}\right\}.$$
Therefore $B_p^{(\nu)}(t)=B_{p-\nu}(t)D_{(\nu)}^T$, in which $D_{(\nu)}=D_1\cdots D_{\nu-1}D_\nu$, with matrix
$$D_l=(p-l)\begin{pmatrix}
\dfrac{1}{\omega_1-\omega_{1-p+l}} & 0 & 0 & \cdots & 0 & 0\\[4pt]
\dfrac{-1}{\omega_1-\omega_{1-p+l}} & \dfrac{1}{\omega_2-\omega_{2-p+l}} & 0 & \cdots & 0 & 0\\[4pt]
0 & \dfrac{-1}{\omega_2-\omega_{2-p+l}} & \dfrac{1}{\omega_3-\omega_{3-p+l}} & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 0 & \dfrac{-1}{\omega_{N_m+p-l}-\omega_{N_m}}
\end{pmatrix}$$
for $1\le l\le\nu\le p-2$, which is the same as Eq. (6) in Zhou and Wolfe (2000).
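For readers who want to reproduce the estimators numerically, the following is a minimal sketch of (2) and (3), assuming cubic splines ($p=4$) and equally spaced knots; the helper names and the use of scipy.interpolate.BSpline are our illustrative choices, not code from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

# Minimal sketch of estimators (2) and (3); helper names are ours, not the authors'.

def bspline_design(t, p, n_knots):
    """Design matrix with columns B_{J,p}(t) for n_knots equally spaced interior knots."""
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.concatenate([np.zeros(p), interior, np.ones(p)])   # repeated boundary knots
    cols = [BSpline.basis_element(knots[J:J + p + 1], extrapolate=False)(t)
            for J in range(n_knots + p)]
    return np.nan_to_num(np.column_stack(cols)), knots

def mean_and_derivative(Y, p=4, Nm=10, nu=1):
    """Y: (n, N) noisy curves on the grid j/N; returns the fitted mean and its nu-th derivative."""
    n, N = Y.shape
    grid = np.arange(1, N + 1) / N
    B, knots = bspline_design(grid, p, Nm)
    # least-squares fit to the pointwise means, i.e. (B'B)^{-1} B' Ybar as in (2)
    beta, *_ = np.linalg.lstsq(B, Y.mean(axis=0), rcond=None)
    mu_hat = BSpline(knots, beta, p - 1)          # spline of order p, degree p - 1
    return mu_hat, mu_hat.derivative(nu)          # differentiating the fit gives (3)
```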

2.3. Asymptotic convergence rate

Define the following "infeasible estimator" of the function $\mu^{(\nu)}$:
$$\bar\mu^{(\nu)}(t)=\bar X^{(\nu)}(t)=n^{-1}\sum_{i=1}^n X_i^{(\nu)}(t), \qquad t\in[0,1]. \qquad (4)$$
The term "infeasible", borrowed from Cao et al. (forthcoming), refers to the fact that $\bar\mu^{(\nu)}(\cdot)$ would be a natural estimator of $\mu^{(\nu)}(\cdot)$ if all random curves $X_i^{(\nu)}(\cdot)$ were observed. In the following, we show that the spline estimator $\hat\mu^{(\nu)}(\cdot)$ in (3) is asymptotically equivalent to $\bar\mu^{(\nu)}(\cdot)$.

We break the error $\hat\mu^{(\nu)}(\cdot)-\mu^{(\nu)}(\cdot)$ into three terms. Let $\bar\varepsilon_j=n^{-1}\sum_{i=1}^n\varepsilon_{ij}$, $1\le j\le N$. Denote the signal vector, the noise vector and the eigenfunction vectors by $\boldsymbol{\mu}=(\mu(1/N),\ldots,\mu(N/N))^T$, $\bar{\mathbf{e}}=(\sigma(1/N)\bar\varepsilon_1,\ldots,\sigma(N/N)\bar\varepsilon_N)^T$ and $\boldsymbol{\phi}_k=(\phi_k(1/N),\ldots,\phi_k(N/N))^T$. Projecting the relationship in model (2) onto the linear subspace of $\mathbb{R}^{N_m+p}$ spanned by $\{B_{J,p}(j/N)\}_{1\le j\le N,\,1-p\le J\le N_m}$, we obtain the following crucial decomposition:
$$\hat\mu^{(\nu)}(t)=\tilde\mu^{(\nu)}(t)+\tilde e^{(\nu)}(t)+\tilde\xi^{(\nu)}(t), \qquad (5)$$
where $\tilde\mu^{(\nu)}(t)=G^{(\nu)}(t)\boldsymbol{\mu}$, $\tilde e^{(\nu)}(t)=G^{(\nu)}(t)\bar{\mathbf{e}}$ and $\tilde\xi^{(\nu)}(t)=\sum_{k=1}^{\kappa}\bar\xi_k G^{(\nu)}(t)\boldsymbol{\phi}_k$, with $G^{(\nu)}(t)=B_p^{(\nu)}(t)(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T$ and $\bar\xi_k=n^{-1}\sum_{i=1}^n\xi_{ik}$, $1\le k\le\kappa$. The following proposition provides asymptotic properties of the three terms.

Proposition 1. Under Assumptions (A1)–(A6) in the Appendix, one has
$$\sup_{t\in[0,1]}|\tilde\mu^{(\nu)}(t)-\mu^{(\nu)}(t)|=o(n^{-1/2}), \qquad (6)$$
$$\sup_{t\in[0,1]}|\tilde\xi^{(\nu)}(t)-(\bar\mu^{(\nu)}(t)-\mu^{(\nu)}(t))|=o_P(n^{-1/2}), \qquad (7)$$
$$\sup_{t\in[0,1]}|\tilde e^{(\nu)}(t)|=o_P(n^{-1/2}). \qquad (8)$$

Appendix A.2 contains proofs for the above proposition, which, together with (5), leads to the following semiparametric efficiency result.


Theorem 1. Under Assumptions (A1)–(A6) in the Appendix, the B-spline estimator $\hat\mu^{(\nu)}$ is asymptotically equivalent to $\bar\mu^{(\nu)}$ with $\sqrt{n}$ approximation power, i.e.
$$\sup_{t\in[0,1]}|\hat\mu^{(\nu)}(t)-\bar\mu^{(\nu)}(t)|=o_P(n^{-1/2}).$$

Remark 1. Since the "infeasible estimator" $\bar\mu^{(\nu)}(t)$ is the sample average of the i.i.d. trajectories $\{X_i(t)\}_{i=1}^n$, an application of the central limit theorem gives $\sup_{t\in[0,1]}|\bar\mu^{(\nu)}(t)-\mu^{(\nu)}(t)|=O_P(n^{-1/2})$. Thus, combining with the result in Theorem 1, one has
$$\sup_{t\in[0,1]}|\hat\mu^{(\nu)}(t)-\mu^{(\nu)}(t)|=O_P(n^{-1/2}).$$

3. Confidence bands

In this section, we develop the simultaneous confidence bands for the derivative function $\mu^{(\nu)}(\cdot)$.

3.1. Asymptotic confidence bands

Let $\Sigma(\cdot,\cdot)$ be the positive definite function defined as $\Sigma(t,s)=\sum_{k=1}^{\kappa}\lambda_k\phi_k^{(\nu)}(t)\phi_k^{(\nu)}(s)$, $t,s\in[0,1]$. Denote by $\zeta(t)$, $t\in[0,1]$, a standardized Gaussian process such that $E\zeta(t)\equiv0$, $E\zeta^2(t)\equiv1$, with covariance function $E\zeta(t)\zeta(s)=\Sigma(t,s)\{\Sigma(t,t)\Sigma(s,s)\}^{-1/2}$, $t,s\in[0,1]$. Denote by $q_{1-\alpha}$ the $100(1-\alpha)$th percentile of the absolute maximum distribution of $\zeta(t)$, $t\in[0,1]$, i.e. $P\{\sup_{t\in[0,1]}|\zeta(t)|\le q_{1-\alpha}\}=1-\alpha$, $\forall\alpha\in(0,1)$.

Theorem 2. Under Assumptions (A1)–(A6) in the Appendix, $\forall\alpha\in(0,1)$, as $n\to\infty$,
$$P\left\{\sup_{t\in[0,1]}n^{1/2}|\bar\mu^{(\nu)}(t)-\mu^{(\nu)}(t)|\,\Sigma(t,t)^{-1/2}\le q_{1-\alpha}\right\}\to1-\alpha.$$

Applying Theorems 1 and 2 gives asymptotic confidence bands for $\mu^{(\nu)}(t)$, $t\in[0,1]$.

Corollary 1. Under Assumptions (A1)–(A6) in the Appendix, $\forall\alpha\in(0,1)$, as $n\to\infty$, an asymptotic $100(1-\alpha)\%$ exact confidence band for $\mu^{(\nu)}(t)$ is given by
$$P\{\mu^{(\nu)}(t)\in\hat\mu^{(\nu)}(t)\pm n^{-1/2}q_{1-\alpha}\Sigma(t,t)^{1/2},\ t\in[0,1]\}\to1-\alpha.$$

3.2. Implementation

When constructing the confidence bands, one needs to estimate the unknown function $\Sigma(t,s)$. Note that $\Sigma(t,s)=G(t,s)$ when $\nu=0$. Following Liu and Müller (2009), we estimate $\phi_k^{(\nu)}$ through the derivatives of $G(t,s)$. According to Cao et al. (2011), $G(t,s)$ is estimated by
$$\hat G(t,s)=\sum_{J,J'=1-p}^{N_G}\hat\beta_{JJ'}B_{J,p}(t)B_{J',p}(s), \qquad (9)$$
where $\hat R_{jj'}=n^{-1}\sum_{i=1}^n\{Y_{ij}-\hat\mu(j/N)\}\{Y_{ij'}-\hat\mu(j'/N)\}$, $1\le j\ne j'\le N$, $N_G$ is the number of interior knots for the B-spline, and the coefficients
$$\{\hat\beta_{JJ'}\}_{J,J'=1-p}^{N_G}=\arg\min_{\{\beta_{JJ'}\}\in\mathbb{R}^{(N_G+p)\times(N_G+p)}}\sum_{j\ne j'}\Big\{\hat R_{jj'}-\sum_{1-p\le J,J'\le N_G}\beta_{JJ'}B_{J,p}(j/N)B_{J',p}(j'/N)\Big\}^2,$$
with "$\otimes$" denoting the tensor product of two spaces. They showed that $\hat G$ converges to $G$ as $n$ goes to $\infty$. In this section, we further show that $\hat G$ and $G$ are asymptotically equivalent up to the $\nu$-th partial derivative. We define the $\nu$-th derivative with respect to $s$ of $G(t,s)$ and $\hat G(t,s)$ as
$$G^{(0,\nu)}(t,s)=\frac{\partial^\nu}{\partial s^\nu}G(t,s), \qquad \hat G^{(0,\nu)}(t,s)=\frac{\partial^\nu}{\partial s^\nu}\hat G(t,s)=\sum_{J,J'}\hat\beta_{JJ'}B_{J,p}(t)B_{J',p}^{(\nu)}(s). \qquad (10)$$

Theorem 3. Under Assumptions (A1)–(A6), one has
$$\sup_{(t,s)\in[0,1]^2}|\hat G^{(0,\nu)}(t,s)-G^{(0,\nu)}(t,s)|=o_P(1), \qquad 1\le\nu\le p-2.$$

The proof of Theorem 3 is given in Appendix A.3.
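A rough implementation sketch of the covariance estimator (9) and its partial derivative (10) follows; it reuses the hypothetical bspline_design helper from the sketch in Section 2.2 and only illustrates the tensor-product least-squares step, not the authors' code.

```python
import numpy as np
from scipy.interpolate import BSpline

# Rough sketch of (9) and (10) on the observation grid j/N; names are illustrative.

def covariance_and_derivative(Y, mu_hat_grid, p=4, NG=5, nu=1):
    n, N = Y.shape
    grid = np.arange(1, N + 1) / N
    R = Y - mu_hat_grid                         # residuals Y_ij - mu_hat(j/N)
    R_hat = R.T @ R / n                         # raw covariances R_hat_{jj'}
    off = ~np.eye(N, dtype=bool)                # keep only j != j' pairs, as in (9)
    B, knots = bspline_design(grid, p, NG)      # (N, NG + p)
    X = np.kron(B, B)[off.ravel()]              # tensor-product design over (j, j') pairs
    beta, *_ = np.linalg.lstsq(X, R_hat[off], rcond=None)
    beta = beta.reshape(NG + p, NG + p)
    # nu-th derivative of each basis function in the second argument, for (10)
    B_nu = np.nan_to_num(np.column_stack(
        [BSpline.basis_element(knots[J:J + p + 1], extrapolate=False).derivative(nu)(grid)
         for J in range(NG + p)]))
    G_hat = B @ beta @ B.T                      # G_hat(t, s) on the grid, estimator (9)
    G_hat_0nu = B @ beta @ B_nu.T               # d^nu/ds^nu G_hat(t, s), estimator (10)
    return G_hat, G_hat_0nu
```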


According to Liu and Müller (2009), we estimate the $\nu$-th derivative of the eigenfunctions, $\hat\phi_k^{(\nu)}$, using the following eigenequations:
$$\frac{d^\nu}{ds^\nu}\int_0^1\hat G(t,s)\hat\phi_k(t)\,dt=\int_0^1\frac{\partial^\nu}{\partial s^\nu}\hat G(t,s)\,\hat\phi_k(t)\,dt=\hat\lambda_k\hat\phi_k^{(\nu)}(s), \qquad (11)$$
where the $\hat\phi_k$ are subject to $\int_0^1\hat\phi_k^2(t)\,dt=1$ and $\int_0^1\hat\phi_k(t)\hat\phi_{k'}(t)\,dt=0$ for $k'<k$. If $N$ is sufficiently large, the left hand side of (11) can be approximated by $(1/N)\sum_{j=1}^N\hat G^{(0,\nu)}(j/N,s)\hat\phi_k(j/N)$. Then we estimate $\Sigma(t,s)$ by $\hat\Sigma(t,s)=\sum_{k=1}^{\kappa}\hat\lambda_k\hat\phi_k^{(\nu)}(t)\hat\phi_k^{(\nu)}(s)$. The following theorem shows that $\hat\Sigma(\cdot,\cdot)$ and $\Sigma(\cdot,\cdot)$ are asymptotically equivalent.

Theorem 4. Under Assumptions (A1)–(A6), one has
$$\sup_{(t,s)\in[0,1]^2}|\hat\Sigma(t,s)-\Sigma(t,s)|=o_P(1).$$

The proof of Theorem 4 is given in Appendix A.4.

In practice, we choose the first $L$ positive eigenvalues $\hat\lambda_1\ge\cdots\ge\hat\lambda_L>0$ by the eigenvalue decomposition of $\hat G(t,s)$. We then apply a standard criterion in Müller (2009) to choose the number of eigenfunctions, i.e. $\kappa=\arg\min_{1\le l\le L}\{\sum_{k=1}^{l}\hat\lambda_k/\sum_{k=1}^{L}\hat\lambda_k>0.95\}$. Müller (2009) suggests the "pseudo-AIC", and this simple method of counting the percentage of variation explained can also be used to choose the number of principal components. The simple method performed well in our simulations and is used for our numerical studies.

To construct the confidence bands, we use cubic splines to estimate the mean and covariance functions and their first order derivatives. Generalized cross-validation is used to choose the number of knots $N_m$ (from 2 to 20) to smooth the mean function. According to Assumption (A3), the number of knots for smoothing the covariance function is taken to be $N_G=[n^{1/(2p)}\log(n)]$, where $[a]$ denotes the integer part of $a$. Finally, in order to estimate $q_{1-\alpha}$, we generate i.i.d. standard normal variables $Z_{k,b}$, $1\le k\le\kappa$, $b=1,\ldots,5000$. Letting $\hat\zeta_b(t)=\hat\Sigma(t,t)^{-1/2}\sum_{k=1}^{\kappa}\sqrt{\hat\lambda_k}\,Z_{k,b}\,\hat\phi_k^{(\nu)}(t)$, $t\in[0,1]$, $q_{1-\alpha}$ can be estimated by the $100(1-\alpha)$-th percentile of $\{\sup_{t\in[0,1]}|\hat\zeta_b(t)|\}_{b=1}^{5000}$. Therefore, in application we recommend the following band:
$$\hat\mu^{(\nu)}(t)\pm n^{-1/2}\hat\Sigma(t,t)^{1/2}\hat q_{1-\alpha}, \qquad t\in[0,1]. \qquad (12)$$
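The band construction of this section can be summarized in a short sketch: eigen-decompose $\hat G$ on the grid, keep components explaining 95% of the variation, obtain $\hat\phi_k^{(\nu)}$ from the discretized eigenequation (11), simulate $\hat q_{1-\alpha}$, and form the band (12). The sketch below makes illustrative assumptions about the input layout (documented in the comments) and is not the authors' implementation.

```python
import numpy as np

# Sketch of the band (12).  Assumed inputs: mu_hat_nu is the estimated derivative on
# the grid j/N; G_hat and G_hat_0nu are the outputs of the covariance sketch above
# (both N x N, with G_hat_0nu[j, s] = d^nu/ds^nu G_hat(j/N, s)).  Names are illustrative.

def derivative_band(mu_hat_nu, G_hat, G_hat_0nu, n, alpha=0.01, var_cut=0.95, B=5000,
                    rng=np.random.default_rng(0)):
    N = G_hat.shape[0]
    G_sym = (G_hat + G_hat.T) / 2                      # symmetrize before eigendecomposition
    evals, evecs = np.linalg.eigh(G_sym)
    order = np.argsort(evals)[::-1]
    lam = np.clip(evals[order] / N, 0, None)           # operator eigenvalues lambda_hat_k
    phi = evecs[:, order] * np.sqrt(N)                 # phi_hat_k on the grid, int phi^2 ~ 1
    # kappa: smallest number of components explaining var_cut of the variation
    kappa = int(np.searchsorted(np.cumsum(lam) / lam.sum(), var_cut) + 1)
    lam, phi = lam[:kappa], phi[:, :kappa]
    # derivatives of eigenfunctions via the discretized eigenequation (11)
    phi_deriv = (G_hat_0nu.T @ phi) / (N * lam)        # column k: phi_hat_k^(nu) on the grid
    Sigma_tt = (phi_deriv ** 2) @ lam                  # Sigma_hat(t, t)
    # simulate the sup of the Gaussian process to estimate q_{1-alpha}
    Z = rng.standard_normal((B, kappa))
    zeta = (Z * np.sqrt(lam)) @ phi_deriv.T / np.sqrt(Sigma_tt)
    q_hat = np.quantile(np.abs(zeta).max(axis=1), 1 - alpha)
    half = q_hat * np.sqrt(Sigma_tt / n)
    return mu_hat_nu - half, mu_hat_nu + half          # the band (12)
```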

4. Numerical studies

4.1. Simulated examples

To illustrate the finite-sample performance of the confidence band in (12), we generate data from
$$Y_{ij}=\mu(j/N)+\sum_{k=1}^{\kappa}\xi_{ik}\phi_k(j/N)+\varepsilon_{ij}, \qquad \varepsilon_{ij}\overset{\mathrm{i.i.d.}}{\sim}N(0,0.1^2),$$
for $1\le i\le n$, $1\le j\le N$. We consider two scenarios.

Model I: $\mu(t)=5t+4\sin(2\pi(t-0.5))$, $\phi_1(t)=-\sqrt{2}\cos(2\pi(t-0.5))$, $\phi_2(t)=\sqrt{2}\sin(4\pi(t-0.5))$, $\xi_{ik}\sim N(0,\lambda_k)$, $\lambda_1=2$, $\lambda_2=1$, $\kappa=2$;

Model II: $\mu(t)=4t+\{0.1\sqrt{2\pi}\}^{-1}\exp\{-(t-0.5)^2/(2(0.1)^2)\}$, $\phi_k(t)=\sqrt{2}\sin(\pi kt)$, $\xi_{ik}\sim N(0,\lambda_k)$, $\lambda_k=2^{-(k-1)}$, $k=1,2,\ldots$, $\kappa=8$.

The second case has a design similar to Simulation C of Liu and Müller (2009). We use the proposed method in (12) and its "oracle" version with the true $\Sigma(t,t)$ to construct the confidence bands for $\mu^{(1)}(\cdot)$, respectively, in both studies.
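As an illustration, data from Model I could be generated as in the following sketch; the function name is our own shorthand for the display above, not code from the paper.

```python
import numpy as np

# Sketch of data generation under Model I (lambda_1 = 2, lambda_2 = 1, N(0, 0.1^2) errors).

def simulate_model_1(n, N, rng=np.random.default_rng(0)):
    t = np.arange(1, N + 1) / N
    mu = 5 * t + 4 * np.sin(2 * np.pi * (t - 0.5))
    phi1 = -np.sqrt(2) * np.cos(2 * np.pi * (t - 0.5))
    phi2 = np.sqrt(2) * np.sin(4 * np.pi * (t - 0.5))
    xi1 = rng.normal(0.0, np.sqrt(2.0), size=(n, 1))   # xi_{i1} ~ N(0, 2)
    xi2 = rng.normal(0.0, 1.0, size=(n, 1))            # xi_{i2} ~ N(0, 1)
    eps = rng.normal(0.0, 0.1, size=(n, N))            # measurement error, sd 0.1
    Y = mu + xi1 * phi1 + xi2 * phi2 + eps             # model (1) with kappa = 2
    return t, Y

# e.g. t, Y = simulate_model_1(n=50, N=100)
```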

Table 1
Coverage rates of the spline confidence bands in Model I.

  n     N     95% Est.   95% Oracle   99% Est.   99% Oracle
  30    30    0.756      0.831        0.901      0.936
  30    60    0.833      0.900        0.926      0.969
  50    50    0.824      0.863        0.932      0.933
  50    100   0.856      0.904        0.947      0.979
  100   100   0.851      0.897        0.943      0.971
  100   200   0.856      0.907        0.949      0.971
  200   200   0.866      0.910        0.950      0.972
  200   400   0.869      0.944        0.959      0.987


We consider two confidence levels, $1-\alpha=0.95$ and $0.99$. The number of trajectories $n$ is taken to be 30, 50, 100 and 200, and for each $n$ we try different numbers of observations per trajectory. Each simulation consists of 1000 Monte Carlo samples. We evaluate the coverage of the bands over 200 equally spaced points on $[0,1]$ and check whether the true functions are covered by the confidence bands at these points. Tables 1 and 2 report the empirical coverage probabilities out of 1000 replications for Models I and II, respectively. From Tables 1 and 2, we observe that the coverage probabilities of both the estimated bands and the "oracle" bands approach the nominal levels, which provides positive confirmation of Theorem 2. In most scenarios the "oracle" confidence bands outperform the estimated bands, and the "oracle" bands arrive at about the nominal coverage for large $n$ and $N$. The convergence rates of the estimated bands are slower than those of the "oracle" bands, but the trend of convergence to the nominal level is clear.

Table 2
Coverage rates of the spline confidence bands in Model II.

  n     N     95% Est.   95% Oracle   99% Est.   99% Oracle
  30    30    0.650      0.730        0.830      0.854
  30    60    0.695      0.839        0.860      0.941
  50    50    0.817      0.911        0.930      0.982
  50    100   0.830      0.929        0.933      0.986
  100   100   0.858      0.940        0.948      0.986
  100   200   0.876      0.939        0.960      0.986
  200   200   0.874      0.939        0.949      0.980
  200   400   0.889      0.946        0.963      0.991

[Figure 1 about here: four panels, (n=30, N=30), (n=30, N=60), (n=50, N=50), (n=50, N=100); vertical axis $\mu^{(1)}$, horizontal axis $t\in[0,1]$.]

Fig. 1. Plots of the cubic spline estimators (dotted-dashed line) and 99% confidence bands (upper and lower dashed lines) of $\mu^{(1)}(t)$ (solid line) in Model I.

[Figure 2 about here: four panels, (n=30, N=30), (n=30, N=60), (n=50, N=50), (n=50, N=100); vertical axis $\mu^{(1)}$, horizontal axis $t\in[0,1]$.]

Fig. 2. Plots of the cubic spline estimators (dotted-dashed line) and 99% confidence bands (upper and lower dashed lines) of $\mu^{(1)}(t)$ (solid line) in Model II.

Figs. 1 and 2 show the estimated functions and their 99% confidence bands for the first order derivative curve $\mu^{(1)}(\cdot)$ for Models I and II, respectively. As expected, when $n$ and $N$ increase, the confidence band becomes narrower and the cubic spline estimator is closer to the true derivative curve. For Model I, the boundary effects in all four panels are almost unnoticeable. For Model II, there seem to be some boundary effects for small $n$ and $N$, which are attenuated as $n$ and $N$ increase.

4.2. Tecator data

Here we apply the proposed method to the Tecator dataset, which can be downloaded from http://lib.stat.cmu.edu/datasets/tecator. This data set contains measurements on $n=240$ meat samples; for each sample, $N=100$ channel near-infrared spectra of absorbance measurements were obtained. The Feed Analyzer worked in the wavelength range 850–1050 nm. Fig. 3 shows the estimated mean absorbance measurement $\mu(\cdot)$ in the upper panel and its estimated first order derivative $\mu^{(1)}(\cdot)$ in the lower panel. Their 99% confidence bands (dashed lines) are also included in the figure; both bands have a similar width of about 0.1–0.2, even though the band for $\mu^{(1)}(\cdot)$ looks much narrower in the figure. As shown in Fig. 3, in the region of 850–950 nm the derivative estimate of mean absorbance increases gradually above 0, which corroborates the convex behavior of the corresponding estimated mean function. For wavelengths between 950 and 970 nm, the big bump in the derivative graph is consistent with the changing pattern of the mean estimate before it reaches the turning point at around 970 nm. When the wavelength is larger than 970 nm, the estimated derivative turns negative and is relatively flat after 1000 nm, in accordance with the quick dip of the mean absorbance and a linearly decreasing trend for wavelengths larger than 1000 nm.


[Figure 3 about here: upper panel, absorbance vs. wavelength (850–1050 nm); lower panel, first derivative of absorbance vs. wavelength.]

Fig. 3. Plots of the cubic spline estimators (dotted-dashed line) and 99% confidence bands (upper and lower dashed lines) of the mean function and its first order derivative.
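For completeness, a hypothetical end-to-end sketch for the Tecator analysis is given below; it assumes the 240 x 100 absorbance matrix Y has already been read from the file at the URL above (no parsing is attempted here) and reuses the illustrative helpers from the earlier sketches.

```python
import numpy as np

# Hypothetical end-to-end sketch for the Tecator spectra, reusing the illustrative
# helpers mean_and_derivative, covariance_and_derivative and derivative_band.

n, N = Y.shape
grid = np.arange(1, N + 1) / N
wavelength = np.linspace(850, 1050, N)                 # map [0, 1] back to nanometres

mu_hat, mu_hat_1 = mean_and_derivative(Y, p=4, Nm=10, nu=1)          # cubic splines
G_hat, G_hat_01 = covariance_and_derivative(Y, mu_hat(grid), p=4, nu=1)
lower, upper = derivative_band(mu_hat_1(grid), G_hat, G_hat_01, n, alpha=0.01)
# lower/upper trace out the 99% simultaneous band (12) for the first derivative.
```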

Acknowledgment

The first and fourth authors' work was supported in part by an NCI/NIH K-award, 1K01 CA131259, and its supplement from the 2009 American Recovery and Reinvestment Act funding mechanism; in addition, the first author received support from NSF awards DMS 0706518 and DMS 1007594. The second author's research was supported in part by NSF award DMS 1107017 and the NSF ADVANCE IT WISEST award at the University of Illinois at Chicago. The third author's research was supported in part by NSF award DMS 0905730 and the ASA/NSF/BLS Research Fellow program. The views expressed here are those of the authors and do not necessarily reflect the views of the BLS. The authors appreciate the helpful comments from two referees and the Associate Editor, which have led to significant improvement of the paper.

Appendix A

Throughout this section, we use $C$ to denote a positive constant. For any vector $\mathbf{a}=(a_1,\ldots,a_n)\in\mathbb{R}^n$, denote $\|\mathbf{a}\|_r=(|a_1|^r+\cdots+|a_n|^r)^{1/r}$, $1\le r<+\infty$, and $\|\mathbf{a}\|_\infty=\max(|a_1|,\ldots,|a_n|)$. For any $n\times n$ symmetric matrix $\mathbf{A}$, denote its $L^\infty$ norm as $\|\mathbf{A}\|_\infty=\max_{1\le i\le n}\sum_{j=1}^n|a_{ij}|$. In this paper, $\mathcal{T}=[0,1]$ or $[0,1]^2$. For any function $\phi$ on a domain $\mathcal{T}$, denote $\|\phi\|_\infty=\sup_{t\in\mathcal{T}}|\phi(t)|$, and define $\|\phi\|_{q,r}=\sup_{t\ne s,\,t,s\in[0,1]}|\phi^{(q)}(t)-\phi^{(q)}(s)|/|t-s|^r$. For any Lebesgue measurable $L^2$-integrable functions $\phi$ and $\varphi$ on $[0,1]$, define their theoretical and empirical inner products as $\langle\phi,\varphi\rangle=\int_{\mathcal{T}}\phi(t)\varphi(t)\,dt$ and $\langle\phi,\varphi\rangle_{2,N}=N^{-1}\sum_{j=1}^N\phi(j/N)\varphi(j/N)$, with the corresponding theoretical and empirical norms $\|\phi\|_2^2=\int_0^1\phi^2(t)\,dt$ and $\|\phi\|_{2,N}^2=N^{-1}\sum_{j=1}^N\phi^2(j/N)$, respectively. For any $r\in(0,1]$, we denote by $C^{q,r}[0,1]$ the space of Hölder continuous functions on $[0,1]$, $C^{q,r}[0,1]=\{\phi:\|\phi\|_{q,r}<+\infty\}$.

The technical assumptions we need are as follows:

(A1) The regression function $\mu\in C^{p-1,1}[0,1]$;
(A2) The standard deviation function $\sigma(t)\in C^{0,\delta}[0,1]$ for some $\delta\in(0,1]$;
(A3) The number of observations for each trajectory satisfies $N\gg n^{\theta}$ for some $\theta>(1+2\nu)/\{2(p-\nu)\}$; the numbers of interior knots satisfy $n^{1/\{2(p-\nu)\}}\ll N_m\ll (N/\log(n))^{1/(1+2\nu)}$ and $n^{1/(2p)}\ll N_G\ll n^{1/(2+2\nu)}$;


(A4) There exists a constant $C>0$ such that $\Sigma(t,t)>C$ for any $t\in[0,1]$;
(A5) For $k\in\{1,\ldots,\kappa\}$ and $i=0,1,\ldots,p-2$, $\phi_k^{(i)}(t)\in C^{0,\delta}[0,1]$ for some $\delta\in(0,1]$, and $\sum_{k=1}^{\kappa}\sqrt{\lambda_k}\,\|\phi_k^{(i)}\|_\infty<\infty$; moreover, for a sequence $\{k_n\}_{n=1}^{\infty}$ of increasing integers with $\lim_{n\to\infty}k_n=\kappa$, $N_m^{-\delta}\sum_{k=1}^{k_n}\sqrt{\lambda_k}\,\|\phi_k^{(i)}\|_{0,\delta}=o(1)$;
(A6) There exist $\delta_1>0$, $\delta_2>0$ such that $E|\xi_{ik}|^{4+\delta_1}+E|\varepsilon_{ij}|^{4+\delta_2}<+\infty$, for $1\le i<\infty$, $1\le k\le\kappa$, $1\le j<\infty$. The number $\kappa$ of nonzero eigenvalues is finite, or $\kappa$ is infinite while the variables $\{\xi_{ik}\}_{1\le i<\infty,\,1\le k<\infty}$ are i.i.d.

Assumptions (A1)–(A6) are standard in the spline smoothing literature; see Huang (2003) and Cao et al. (forthcoming), for instance. In particular, (A1) and (A2) guarantee the convergence rates of $\hat\mu(t)$ and its derivatives. Assumption (A3) states the requirement on the number of observations within each curve relative to the sample size, and the order of the numbers of knots of the splines. Assumptions (A4) and (A5) require that the derivatives of the principal components have collectively bounded smoothness. When $i=0$, Assumption (A5) is the same as (A4) in Cao et al. (forthcoming). Assumption (A6) is necessary when using the strong approximation result in Lemma A4.

If $\kappa$ is finite and all $\phi_k^{(\nu)}(t)\in C^{0,\delta}[0,1]$, then Assumption (A5) on the $\phi_k^{(\nu)}$'s holds trivially. For $\kappa=\infty$, Assumption (A5) is fulfilled as long as $\lambda_k$ decreases to zero sufficiently fast. For example, consider the canonical orthonormal Fourier basis of $L^2([0,1])$:
$$\phi_1(t)\equiv1, \qquad \phi_{2k+1}(t)\equiv\sqrt{2}\cos(k\pi t), \qquad \phi_{2k}(t)\equiv\sqrt{2}\sin(k\pi t), \qquad k=1,2,\ldots,\ t\in[0,1].$$
We can take $\lambda_1=1$ and $\lambda_k=([k/2]\pi)^{-2\nu}\rho^{2[k/2]}$, $k=2,\ldots$, for any $\rho\in(0,1)$; then
$$\sum_{k=1}^{\infty}\sqrt{\lambda_k}\,\|\phi_k^{(\nu)}\|_\infty=1+\sum_{k=1}^{\infty}\rho^{k}(\sqrt{2}+\sqrt{2})=1+2\sqrt{2}\,\rho(1-\rho)^{-1}<\infty,$$
while for $\delta=1$ and any sequence $\{k_n\}_{n=1}^{\infty}$ with $k_n$ increasing, odd and $k_n\to\infty$, one has
$$N_m^{-1}\sum_{k=1}^{k_n}\sqrt{\lambda_k}\,\|\phi_k^{(\nu)}\|_{0,1}=N_m^{-1}\sum_{k=1}^{(k_n-1)/2}\rho^{k}(\sqrt{2}k\pi+\sqrt{2}k\pi)\le2\sqrt{2}\pi N_m^{-1}\sum_{k=1}^{\infty}k\rho^{k}=2\sqrt{2}\pi N_m^{-1}\rho(1-\rho)^{-2}=O(N_m^{-1})=o(1).$$

In the following, define the theoretical and empirical inner product matrices of $\{B_{J,p}(t)\}_{J=1-p}^{N_m}$ as
$$\mathbf{V}_p=(\langle B_{J,p},B_{J',p}\rangle)_{J,J'=1-p}^{N_m}=(v_{JJ',p})_{J,J'=1-p}^{N_m}, \qquad \hat{\mathbf{V}}_p=(\langle B_{J,p},B_{J',p}\rangle_{2,N})_{J,J'=1-p}^{N_m}=(\hat v_{JJ',p})_{J,J'=1-p}^{N_m}. \qquad (A.1)$$
We establish next that $\hat{\mathbf{V}}_p$ has an inverse with bounded $L^\infty$ norm.

Lemma A1 (Cao et al., forthcoming). Under Assumption (A3), for $\mathbf{V}_p$ and $\hat{\mathbf{V}}_p$ defined in (A.1), $\|\mathbf{V}_p-\hat{\mathbf{V}}_p\|_\infty=O(N^{-1})$ and $\|\hat{\mathbf{V}}_p^{-1}\|_\infty=O(N_m)$.

A.1. Proof of Proposition 1

Following Wang and Yang (2009b), we introduce the $p$-th order quasi-interpolant of $\mu$ corresponding to the knots $\varpi$, denoted by $Q_\varpi(\mu)$; see DeVore and Lorentz (1993, Eq. (4.12) on p. 146) for details. According to DeVore and Lorentz (1993, Theorem 7.7.4), the following lemma holds.

Lemma A2. There exists a constant $C>0$ such that for $0\le\nu\le p-2$ and $\mu\in C^{p-1,1}[0,1]$,
$$\|(\mu-Q_\varpi(\mu))^{(\nu)}\|_\infty\le C\|\mu^{(p)}\|_\infty h_m^{p-\nu}.$$

Lemma A3. Under Assumptions (A2), (A3) and (A6), one has
$$\frac{1}{N}\sum_{j=1}^N B_{J,p}(j/N)\sigma(j/N)\bar\varepsilon_j=O_P\left(\sqrt{\frac{\log(n)}{nNN_m}}\right).$$

Proof. We first truncate the random errors $\varepsilon_{ij}$ at $u_n=(nN)^{\gamma}$ ($2/9<\gamma<1/3$) and write $\varepsilon_{ij}=\varepsilon_{ij,1}+\varepsilon_{ij,2}+a_{ij}$, where $\varepsilon_{ij,1}=\varepsilon_{ij}I\{|\varepsilon_{ij}|>u_n\}$, $\varepsilon_{ij,2}=\varepsilon_{ij}I\{|\varepsilon_{ij}|\le u_n\}-a_{ij}$ and $a_{ij}=E[\varepsilon_{ij}I\{|\varepsilon_{ij}|\le u_n\}]$. It is easy to see that $|a_{ij}|=|E[\varepsilon_{ij}I\{|\varepsilon_{ij}|>u_n\}]|\le E(|\varepsilon_{ij}|^{4+\delta_2})u_n^{-(3+\delta_2)}$. It is straightforward from the boundedness of the spline basis that
$$\left|\frac{1}{nN}\sum_{j=1}^N\sum_{i=1}^n B_{J,p}(j/N)\sigma(j/N)a_{ij}\right|=O(u_n^{-(3+\delta_2)}).$$
Next we show that the tail part vanishes almost surely. Note that
$$\sum_{n=1}^{\infty}P\{|\varepsilon_{ij}|>u_n\}\le\sum_{n=1}^{\infty}\frac{E|\varepsilon_{ij}|^{4+\delta_2}}{u_n^{4+\delta_2}}\le M_\delta\sum_{n=1}^{\infty}u_n^{-(4+\delta_2)}<\infty.$$


By the Borel–Cantelli lemma, $P\{\omega\,|\,\exists N(\omega),\ |\varepsilon_{ij}(\omega)|\le u_n\ \text{for}\ n>N(\omega)\}=1$. Let $v_\varepsilon=\max\{|\varepsilon_{ij}|,\ 1\le i,j\le N(\omega)\}$; then there exists $N_1(\omega)>N(\omega)$ with $u_{N_1(\omega)}>v_\varepsilon$. Since $u_n=(nN)^{\gamma}$ is increasing, $u_n>u_{N_1(\omega)}>v_\varepsilon$ for $n>N_1(\omega)$. Therefore $P\{\omega\,|\,\exists N(\omega),\ |\varepsilon_{ij}(\omega)|\le u_n,\ 1\le i\le n,\ 1\le j\le N,\ \text{for}\ \min(n,N)>N(\omega)\}=1$, which implies $P\{\omega\,|\,\exists N(\omega),\ |\varepsilon_{ij,1}|=0,\ 1\le i\le n,\ 1\le j\le N,\ \text{for}\ \min(n,N)>N(\omega)\}=1$. Thus
$$\left|\frac{1}{nN}\sum_{j=1}^N\sum_{i=1}^n B_{J,p}(j/N)\sigma(j/N)\varepsilon_{ij,1}\right|=O_{a.s.}((nN)^{-k}),\quad\text{for any }k>0.$$
Next denote $D_j=(nN)^{-1}B_{J,p}(j/N)\sigma(j/N)\sum_{i=1}^n\varepsilon_{ij,2}$. Since
$$\mathrm{Var}(\varepsilon_{ij,2})=1-E[\varepsilon_{ij}^2I\{|\varepsilon_{ij}|>u_n\}]-a_{ij}^2=1+O\{u_n^{-\delta_2}+u_n^{-2(1+\delta_2)}\},$$
one has $V_n^2=\mathrm{Var}(\sum_{j=1}^N D_j)=c(nNN_m)^{-1}$ for a constant $c>0$. Now Minkowski's inequality implies that
$$E|\varepsilon_{ij,2}|^k\le2^{k-1}E\{\varepsilon_{ij}^kI\{|\varepsilon_{ij}|\le u_n\}+a_{ij}^k\}\le2^{k-2}u_n^{k-2}E|\varepsilon_{ij,2}|^2k!, \qquad k\ge2.$$
Thus $E|D_j|^k\le(2n^{-1}N^{-1}u_n)^{k-2}k!\,E(D_j^2)<\infty$, with Cramér constant $c^*=2u_n/nN$. For any $\delta$, let $\delta_n=\delta\sqrt{\log(n)/(nNN_m)}$. By the Bernstein inequality, for any large enough $\delta>0$,
$$P\left\{\Big|\sum_{j=1}^N D_j\Big|\ge\delta_n\right\}\le2\exp\left\{-\frac{\delta_n^2}{4V_n^2+2c^*\delta_n}\right\}=2\exp\left\{-\frac{\delta_n^2}{\frac{4c}{nNN_m}+\frac{4u_n}{nN}\delta_n}\right\}\le2n^{-3}.$$
Hence $\sum_{n=1}^{\infty}P(|(1/nN)\sum_{j=1}^N B_{J,p}(j/N)\sigma(j/N)\sum_{i=1}^n\varepsilon_{ij,2}|\ge\delta_n)\le2\sum_{n=1}^{\infty}n^{-3}<\infty$ for such $\delta>0$. Thus the Borel–Cantelli lemma implies the desired result. $\square$

Lemma A4 (Csörgő and Révész, 1981, Theorem 2.6.7). Suppose that $\xi_i$, $1\le i\le n$, are i.i.d. with $E(\xi_1)=0$, $E(\xi_1^2)=1$, and $H(x)>0$ $(x\ge0)$ is an increasing continuous function such that $x^{-2-\gamma}H(x)$ is increasing for some $\gamma>0$ and $x^{-1}\log H(x)$ is decreasing, with $EH(|\xi_1|)<\infty$. Then there exist constants $C_1$, $C_2$, $a>0$, which depend only on the distribution of $\xi_1$, and a sequence of Brownian motions $\{W_n(m)\}_{n=1}^{\infty}$, such that for any $\{x_n\}_{n=1}^{\infty}$ satisfying $H^{-1}(n)<x_n<C_1(n\log(n))^{1/2}$ and $S_m=\sum_{i=1}^m\xi_i$, one has $P\{\max_{1\le m\le n}|S_m-W_n(m)|>x_n\}\le C_2n\{H(ax_n)\}^{-1}$.

Jm~ mJ1 rC inf JgmJ1 r CJmðpÞ J1 hm :

ðA:2Þ

g2C p,1

Applying Lemma A2, for 0 r n rp2, pn

JfQ $ ðmÞmgðnÞ J1 r CJmðpÞ J1 hm :

ðA:3Þ p

As a consequence of (A.2) and (A.3) if n ¼ 0, one has JQ $ ðmÞm~ J1 r CJm J1 hm , which, according to the differentiation of B-spline given in de Boor (2001), entails that ðpÞ

pn

JfQ $ ðmÞm~ gðnÞ J1 r CJmðpÞ J1 hm ,

ðA:4Þ

for 0 r n r p2. Combining (A.3) and (A.4) proves (6) for n ¼ 1, . . . ,p2. ðnÞ ~ ðnÞ ðtÞ ¼ GðnÞ ðtÞ/ , for any Next we prove (7). Similar to the definition of m~ ðnÞ ðtÞ and x~ k ðtÞ, in the following we denote f k k k Z 1. Using the similar arguments as in the proof of (6), we can show that, for any k Z 1, ðnÞ

ðnÞ

pn

~ J1 r C h : Jfk f f m k

ðA:5Þ

~ ðnÞ J1 r c JfðnÞ J1 ¼ Oð1Þ. Also, according to triangle inequality one has that Jf f k k 4 þ d1 According to Assumption (A6), E9xik 9 o þ 1, d1 4 0, so there exists some b 2 ð0; 1=2Þ, such that 4þ d1 4 2=b. Let HðxÞ ¼ x4 þ d1 , xn ¼ nb , then n=Hðaxn Þ ¼ a4d1 n1ð4 þ d1 Þb ¼ Oðng1 Þ for some g1 4 1. Applying Lemma A4 and Borel–Cantelli Lemma, one finds i.i.d. variables Z ik, x  Nð0; 1Þ such that pffiffiffiffiffi ðA:6Þ max9x k Z k, x lk 9 ¼ Oa:s: ðnb1 Þ, kZ1

P where Z k, x ¼ n1 ni¼ 1 Z ik, x , kZ 1. pffiffiffiffiffi pffiffiffiffiffi If k is finite, according to (A.6) note that 9x k 9r 9Z k, x 9 lk þ 9x k Z k, x lk 9, 1 rk r k, so max1 r k r k 9x k 9 ¼ OP ðn1=2 þ nb1 Þ. Then the definition of m ðtÞ in (4) and (A.5) entail that ðnÞ ðnÞ ~ ðnÞ J1 ¼ o ðn1=2 Þ: Jm ðnÞ mðnÞ x~ J1 r k max 9x k 9 max Jfk f P k 1rkrk

1rkrk

Thus (8) holds according to Assumption (A3).

G. Cao et al. / Journal of Statistical Planning and Inference 142 (2012) 1557–1570

1567 1=2

If k ¼ 1, using similar arguments in Cao et al. (forthcoming), by (A.6) one can show that 9x k 9lk 1=2 1=2 9Z k, x 9 þ9x k lk Z k, x 9, for any kZ 1, so 9x k 9lk ¼ OP ðn1=2 þnb1 Þ. Also following Assumption (A5) one has

r

ðnÞ E sup 9m ðnÞ ðtÞmðnÞ ðtÞx~ ðtÞ9 t2½0;1

r

k X

t2½0;1

k¼1

(

rC

ðnÞ

ðnÞ

~ ðtÞ9 rC E9x k 9 sup 9fk ðtÞf k

k X

ðnÞ

E9x k 9 sup 9fk ðtÞ9 t2½0;1

k¼1

kn X

k X

1=2 1=2 1=2 1=2 ðnÞ d ðnÞ ðE9x k 9lk Þlk Jfk J0, d hm þ ðE9x k 9lk Þlk Jfk J1 k¼1 k ¼ kn þ 1

(

r Cn1=2

kn X

k X

d ðnÞ l1=2 Jfk J0, d hm þ k

ðnÞ l1=2 Jfk J1 k

)

)

¼ oðn1=2 Þ:

k ¼ kn þ 1

k¼1

ðnÞ Hence, Jm ðnÞ mðnÞ x~ J1 ¼ oP ðn1=2 Þ. According to Lemmas A.1 and A.3, finally we have 1

n

1

Je~ ðnÞ J1 ¼ JBðpnÞ ðtÞV^ p N1 BT eJ1 rChm JV^ p J1 JN1 BT eJ1 pffiffiffiffiffiffiffiffiffiffiffiffiffi 1=2n logðnÞÞ: ¼ Oa:s: ðn1=2 N 1=2 hm Thus (8) holds according to Assumption (A3).

&

A.2. Proof of Theorem 2 pffiffiffiffiffi ðnÞ We denote z~ k ðtÞ ¼ lk Z k, x fk ðtÞ, k ¼ 1, . . . , k and define " #1=2 k k k X X X ðnÞ 2 ~z ðtÞ ¼ n1=2 lk ffk ðtÞg z~ k ðtÞ ¼ n1=2 Sðt,tÞ1=2 z~ k ðtÞ: k¼1

k¼1

k¼1

It is clear that, for any t 2 ½0; 1, z~ ðtÞ is Gaussian with mean 0 and variance 1, and the covariance Ez~ ðtÞz~ ðsÞ ¼ Sðt,tÞ1=2 Sðs,sÞ1=2 Sðt,sÞ, for any t,s 2 ½0; 1. That is, the distribution of z~ ðtÞ, t 2 ½0; 1 and the distribution of zðtÞ, t 2 ½0; 1 in Section 3.1 are identical. Similar to the proof of (7), Note that ðnÞ E sup 9z~ ðtÞn1=2 Sðt,tÞ1=2 x~ ðtÞ9 t2½0;1

  X  k ðnÞ   ¼ n1=2 E sup Sðt,tÞ1=2  z~ k ðtÞx~ ðtÞ   t2½0;1 k¼1

1=2

rn

E sup Sðt,tÞ

k X

1=2

t2½0;1

ð9Z k, x

pffiffiffiffiffi ðnÞ lk x k 99fðknÞ ðtÞ9 þ9x k 99fðknÞ ðtÞf~ k ðtÞ9Þ:

k¼1

ðnÞ pn If k is finite, then Esupt2½0;1 9z~ ðtÞn1=2 Sðt,tÞ1=2 x~ ðtÞ9 ¼ Oðnb1=2 þhm Þ ¼ oð1Þ. If k ¼ 1, by Assumption (A5) one has

k X pffiffiffiffiffi ðnÞ ðnÞ ~ ðnÞ ðtÞ9Þ ð9 lk Z k, x x k 99fk ðtÞ9 þ 9x k 99fk ðtÞf k

n1=2 E sup Sðt,tÞ1=2 t2½0;1

k¼1

1=2

1=2

rn

(

sup Sðt,tÞ

n

t2½0;1

b1

) k pffiffiffiffiffi k pffiffiffiffiffi X X ðnÞ ðnÞ ðnÞ 1=2 ~ lk 9fk ðtÞ9 þ n lk 9fk ðtÞf k ðtÞ9 k¼1

k¼1

¼ oð1Þ: Theorem 2 follows directly. A.3. Proof of Theorem 3 Following Cao et al. (2011), we define the tensor product spline space as 8 9 NG < X = ðp2Þ,2 2 ðp2Þ,2 ðp2Þ ðp2Þ 0 0 ½0; 1  H ¼H H ¼ bJJ BJ,p ðtÞBJ ,p ðsÞ,t,s 2 ½0; 1 : H : 0 ; J,J ¼ 1p

Let $R_{ij}\equiv Y_{ij}-\mu(j/N)$, $1\le i\le n$, $1\le j\le N$, and $\bar R_{jj'}=n^{-1}\sum_{i=1}^n R_{ij}R_{ij'}$, $1\le j,j'\le N$. Note that the spline estimator in (9) is
$$\hat G(\cdot,\cdot)=\arg\min_{g(\cdot,\cdot)\in H^{(p-2),2}}\sum_{j\ne j'}^N\{\hat R_{jj'}-g(j/N,j'/N)\}^2,$$
so we define the "infeasible estimator" of the covariance function as
$$\tilde G(\cdot,\cdot)=\arg\min_{g(\cdot,\cdot)\in H^{(p-2),2}}\sum_{1\le j\ne j'\le N}\{\bar R_{jj'}-g(j/N,j'/N)\}^2. \qquad (A.7)$$


Denote $\mathbf{X}=\mathbf{B}\otimes\mathbf{B}$, $B_p(t,s)=B_p(t)\otimes B_p(s)$ and $B_p^{(0,\nu)}(t,s)=B_p(t)\otimes B_p^{(\nu)}(s)$. Let the vector $\bar{\mathbf{R}}=\{\bar R_{jj'}\}_{1\le j,j'\le N}$; then we have
$$\tilde G^{(0,\nu)}(t,s)\equiv\frac{\partial^\nu}{\partial s^\nu}\tilde G(t,s)=B_p^{(0,\nu)}(t,s)(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bar{\mathbf{R}}. \qquad (A.8)$$
In the following we write $\phi_{kk'}(t,s)=\phi_k(t)\phi_{k'}(s)$ and $\phi_{kk'}^{(0,\nu)}(t,s)=\phi_k(t)\phi_{k'}^{(\nu)}(s)$. Let $\boldsymbol{\phi}_{kk'}=\boldsymbol{\phi}_k\otimes\boldsymbol{\phi}_{k'}$, where $\boldsymbol{\phi}_k$ is the $N$-dimensional vector defined in Section 2.3, and let $\tilde\phi_{kk'}(t,s)=B_p(t,s)(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\phi}_{kk'}$. Further we denote $\tilde\phi_{kk'}^{(0,\nu)}(t,s)=B_p^{(0,\nu)}(t,s)(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\phi}_{kk'}$.

Lemma A5. Under Assumption (A5), for any $1\le\nu\le p-2$ and $\kappa=\infty$, one has
$$\sum_{k,k'\ge1}\sqrt{\lambda_k\lambda_{k'}}\,\|\phi_{kk'}^{(0,\nu)}-\tilde\phi_{kk'}^{(0,\nu)}\|_\infty=o(1).$$

Proof. When $\kappa=\infty$, one has $\lambda_k>0$ for any $k\ge1$. Note that
$$\sum_{k,k'\ge1}\sqrt{\lambda_k\lambda_{k'}}\,\|\phi_{kk'}^{(0,\nu)}\|_\infty\le\sum_{k\ge1}\sqrt{\lambda_k}\|\phi_k\|_\infty\sum_{k'\ge1}\sqrt{\lambda_{k'}}\|\phi_{k'}^{(\nu)}\|_\infty=o(1).$$
Also, similarly,
$$\sum_{k,k'\ge1}\sqrt{\lambda_k\lambda_{k'}}\,\|\tilde\phi_{kk'}^{(0,\nu)}\|_\infty\le\sum_{k\ge1}\sqrt{\lambda_k}\|\tilde\phi_k\|_\infty\sum_{k'\ge1}\sqrt{\lambda_{k'}}\|\tilde\phi_{k'}^{(\nu)}\|_\infty\le C\sum_{k\ge1}\sqrt{\lambda_k}\|\phi_k\|_\infty\sum_{k'\ge1}\sqrt{\lambda_{k'}}\|\phi_{k'}^{(\nu)}\|_\infty=o(1). \qquad\square$$

Lemma A6. Under Assumption (A5), for any $0\le\nu\le p-2$ and $k,k'\ge1$, one has $\|\phi_{kk'}^{(0,\nu)}-\tilde\phi_{kk'}^{(0,\nu)}\|_\infty=O(h_G^{p-\nu})$.

Proof. According to Theorem 12.8 of Schumaker (2007), there exists an absolute constant $C>0$ such that
$$\|\tilde\phi_{kk'}-\phi_{kk'}\|_\infty\le C\|\phi_k^{(p)}\phi_{k'}+\phi_k\phi_{k'}^{(p)}\|_\infty h_G^{p},$$
which proves the lemma for the case $\nu=0$. Let $Q(f)$ be the $p$-th order quasi-interpolant of a function $f$; see the definition in (12.29) of Schumaker (2007). For $0\le\nu\le p-2$,
$$\|\{Q(\phi_{kk'})-\phi_{kk'}\}^{(0,\nu)}\|_\infty\le C\|\phi_k^{(p)}\phi_{k'}+\phi_k\phi_{k'}^{(p)}\|_\infty h_G^{p-\nu}.$$
For the case $\nu=0$, one has
$$\|Q(\phi_{kk'})-\tilde\phi_{kk'}\|_\infty=\|Q(\phi_{kk'})-Q(\tilde\phi_{kk'})\|_\infty\le C\|\tilde\phi_{kk'}-\phi_{kk'}\|_\infty\le C\|\phi_k^{(p)}\phi_{k'}+\phi_k\phi_{k'}^{(p)}\|_\infty h_G^{p},$$
which, according to the differentiation of B-splines given in de Boor (2001), entails that $\|\{Q(\phi_{kk'})-\tilde\phi_{kk'}\}^{(0,\nu)}\|_\infty\le C\|\phi_k^{(p)}\phi_{k'}+\phi_k\phi_{k'}^{(p)}\|_\infty h_G^{p-\nu}$, for $0\le\nu\le p-2$. $\square$

Lemma A7. Under Assumptions (A1)–(A6), for any $1\le\nu\le p-2$,
$$\|\tilde G^{(0,\nu)}-G^{(0,\nu)}\|_\infty=O_P(n^{-1/2}+h_G^{p-\nu})+o_P(N^{-1}n^{-1/2}h_G^{-1-\nu}\log^{1/2}n), \qquad (A.9)$$
$$\|\hat G^{(0,\nu)}-\tilde G^{(0,\nu)}\|_\infty=O_P(n^{-1}h_G^{-3/2-\nu}\log^{1/2}n+n^{-1/2}h_G^{-1-\nu}h_m^{p}), \qquad (A.10)$$
where $\hat G^{(0,\nu)}$ and $\tilde G^{(0,\nu)}$ are given in (10) and (A.8).

Proof. Let $\bar\xi_{kk'}=n^{-1}\sum_{i=1}^n\xi_{ik}\xi_{ik'}$ and $\bar e_{jj'}=n^{-1}\sum_{i=1}^n\varepsilon_{ij}\varepsilon_{ij'}$. To show (A.9), we decompose $\bar R_{jj'}$ in (A.7) into
$$\bar R_{1jj'}=\sum_{k\ne k'}^{\kappa}\bar\xi_{kk'}\phi_{kk'}\Big(\frac{j}{N},\frac{j'}{N}\Big), \quad \bar R_{2jj'}=\sum_{k=1}^{\kappa}\bar\xi_{kk}\phi_{kk}\Big(\frac{j}{N},\frac{j'}{N}\Big), \quad \bar R_{3jj'}=\sigma\Big(\frac{j}{N}\Big)\sigma\Big(\frac{j'}{N}\Big)\bar e_{jj'},$$
$$\bar R_{4jj'}=n^{-1}\sum_{i=1}^n\left\{\sum_{k=1}^{\kappa}\xi_{ik}\phi_k\Big(\frac{j}{N}\Big)\sigma\Big(\frac{j'}{N}\Big)\varepsilon_{ij'}+\sum_{k=1}^{\kappa}\xi_{ik}\phi_k\Big(\frac{j'}{N}\Big)\sigma\Big(\frac{j}{N}\Big)\varepsilon_{ij}\right\}.$$
Denote $\bar{\mathbf{R}}_i=\{\bar R_{ijj'}\}_{1\le j,j'\le N}$ and $\tilde R_i^{(0,\nu)}(t,s)=B_p^{(0,\nu)}(t,s)(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\bar{\mathbf{R}}_i$, $i=1,2,3,4$. Then $\tilde G^{(0,\nu)}(t,s)=\tilde R_1^{(0,\nu)}(t,s)+\tilde R_2^{(0,\nu)}(t,s)+\tilde R_3^{(0,\nu)}(t,s)+\tilde R_4^{(0,\nu)}(t,s)$. Next we define
$$R_1^{(0,\nu)}(t,s)=\sum_{k\ne k'}^{\kappa}\bar\xi_{kk'}\phi_{kk'}^{(0,\nu)}(t,s), \qquad R_2^{(0,\nu)}(t,s)=G^{(0,\nu)}(t,s)+\sum_{k=1}^{\kappa}\{\phi_{kk}^{(0,\nu)}(t,s)(\bar\xi_{kk}-\lambda_k)\}.$$
Note that $\tilde R_1^{(0,\nu)}(t,s)=\sum_{k\ne k'}^{\kappa}\bar\xi_{kk'}\tilde\phi_{kk'}^{(0,\nu)}(t,s)$; then Lemma A5 and Assumption (A5) imply that if $\kappa$ is $\infty$, $\lambda_k\lambda_{k'}>0$ and one has
$$\sup_{(t,s)\in[0,1]^2}E|\tilde R_1^{(0,\nu)}(t,s)-R_1^{(0,\nu)}(t,s)|\le\sum_{k\ne k'}^{\kappa}E|\bar\xi_{kk'}(\lambda_k\lambda_{k'})^{-1/2}|\sqrt{\lambda_k\lambda_{k'}}\,\|\phi_{kk'}^{(0,\nu)}-\tilde\phi_{kk'}^{(0,\nu)}\|_\infty=o(1).$$
Similarly, one has
$$\sup_{(t,s)\in[0,1]^2}E|\tilde R_2^{(0,\nu)}(t,s)-R_2^{(0,\nu)}(t,s)|\le\sum_{k=1}^{\kappa}E|\bar\xi_{kk}\lambda_k^{-1}-1|\,\lambda_k\|\phi_{kk}^{(0,\nu)}-\tilde\phi_{kk}^{(0,\nu)}\|_\infty=o(1).$$
If $\kappa$ is finite, then Lemma A6 and Assumption (A3) imply that
$$\|\tilde R_1^{(0,\nu)}-R_1^{(0,\nu)}\|_\infty\le\kappa^2\max_{1\le k\ne k'\le\kappa}|\bar\xi_{kk'}|\,\|\phi_{kk'}^{(0,\nu)}-\tilde\phi_{kk'}^{(0,\nu)}\|_\infty=O_P(h_G^{p-\nu}n^{-1/2}).$$
Similarly,
$$\|\tilde R_2^{(0,\nu)}-R_2^{(0,\nu)}\|_\infty\le\kappa\max_{1\le k\le\kappa}|\bar\xi_{kk}-\lambda_k|\,\|\phi_{kk}^{(0,\nu)}-\tilde\phi_{kk}^{(0,\nu)}\|_\infty=O_P(h_G^{p-\nu}).$$
Hence $\|\tilde R_2^{(0,\nu)}-R_2^{(0,\nu)}\|_\infty+\|\tilde R_1^{(0,\nu)}-R_1^{(0,\nu)}\|_\infty=o(1)$. By Proposition 3.1 in Cao et al. (2011), one has
$$\|N^{-2}\mathbf{X}^T\bar{\mathbf{R}}_3\|_\infty=\|N^{-2}\mathbf{X}^T\bar{\mathbf{R}}_4\|_\infty=o_P(N^{-1}n^{-1/2}h_G\log^{1/2}(n)).$$
Hence $\|\tilde R_3^{(0,\nu)}\|_\infty=\|\tilde R_4^{(0,\nu)}\|_\infty=o_P(N^{-1}n^{-1/2}h_G^{-1-\nu}\log^{1/2}n)$. Therefore,
$$\|\tilde G^{(0,\nu)}-G^{(0,\nu)}\|_\infty\le\sum_{k\ne k'}^{\kappa}\bar\xi_{kk'}\|\phi_{kk'}^{(0,\nu)}\|_\infty+\sum_{k=1}^{\kappa}(\bar\xi_{kk}-\lambda_k)\|\phi_{kk}^{(0,\nu)}\|_\infty+\|\tilde R_1^{(0,\nu)}-R_1^{(0,\nu)}\|_\infty+\|\tilde R_2^{(0,\nu)}-R_2^{(0,\nu)}\|_\infty+\|\tilde R_3^{(0,\nu)}\|_\infty+\|\tilde R_4^{(0,\nu)}\|_\infty$$
$$=O_P(n^{-1/2}+h_G^{p-\nu})+o_P(N^{-1}n^{-1/2}h_G^{-1-\nu}\log^{1/2}n).$$
The proof of (A.10) is similar to Proposition 2.1 in Cao et al. (2011), and is thus omitted. $\square$

Proof of Theorem 3. According to Lemma A7, one has
$$\|\hat G^{(0,\nu)}-G^{(0,\nu)}\|_\infty\le\|\tilde G^{(0,\nu)}-G^{(0,\nu)}\|_\infty+\|\hat G^{(0,\nu)}-\tilde G^{(0,\nu)}\|_\infty=o_P(1). \qquad\square$$

A.4. Proof of Theorem 4

We first show the asymptotic consistency of $\hat\lambda_k$ and $\hat\phi_k(\cdot)$, for $k\ge1$, in the following lemma.

Lemma A8. Under Assumptions (A1)–(A6), one has
$$|\hat\lambda_k-\lambda_k|=o_P(1), \qquad \|\hat\phi_k-\phi_k\|_\infty=o_P(1), \qquad k\ge1.$$

Proof. We first show that for any $k\ge1$, $\|\Delta\phi_k\|_\infty=o_P(1)$, in which $\Delta$ is the integral operator with kernel $\hat G-G$. Note that $(\Delta\phi_k)(t)=\int(\hat G-G)(s,t)\phi_k(s)\,ds$. By Theorem 3, when $\nu=0$, $\|\hat G-G\|_\infty=o_P(1)$. Thus, for any $k\ge1$, $\|\Delta\phi_k\|_\infty=o_P(1)$. Hall and Hosseini-Nasab (2006) give the $L^2$ expansion
$$\hat\phi_k-\phi_k=\sum_{k'\ne k}(\lambda_k-\lambda_{k'})^{-1}\langle\Delta\phi_k,\phi_{k'}\rangle\phi_{k'}+O(\|\Delta\|_2^2),$$
where $\|\Delta\|_2=[\int\!\!\int\{\hat G(s,t)-G(s,t)\}^2\,ds\,dt]^{1/2}$. By Bessel's inequality, one has $\|\hat\phi_k-\phi_k\|_2\le C(\|\Delta\phi_k\|_2+\|\Delta\|_2^2)=o_P(1)$. By (4.9) in Hall et al. (2006) and Theorem 3,
$$\hat\lambda_k-\lambda_k=\int\!\!\int(\hat G-G)(s,t)\phi_k(s)\phi_k(t)\,ds\,dt+O(\|\Delta\phi_k\|_2^2)=o_P(1).$$
Thus $|\hat\lambda_k-\lambda_k|=o_P(1)$ for any $k\ge1$. Next note that
$$\hat\lambda_k\hat\phi_k(t)-\lambda_k\phi_k(t)=\int\hat G(s,t)\hat\phi_k(s)\,ds-\int G(s,t)\phi_k(s)\,ds$$
$$=\int(\hat G-G)(s,t)(\hat\phi_k(s)-\phi_k(s))\,ds+\int(\hat G-G)(s,t)\phi_k(s)\,ds+\int G(s,t)\{\hat\phi_k(s)-\phi_k(s)\}\,ds.$$
By the Cauchy–Schwarz inequality, uniformly for all $t\in[0,1]$,
$$\int G(s,t)(\hat\phi_k(s)-\phi_k(s))\,ds\le\left(\int G^2(s,t)\,ds\right)^{1/2}\|\hat\phi_k-\phi_k\|_2=o_P(1).$$
Similar arguments and Theorem 3 imply that $\int(\hat G-G)(s,t)(\hat\phi_k(s)-\phi_k(s))\,ds=o_P(1)$ and $\int(\hat G-G)(s,t)\phi_k(s)\,ds=o_P(1)$. All the above together yield $\|\hat\lambda_k\hat\phi_k-\lambda_k\phi_k\|_\infty=o_P(1)$. By the triangle inequality and
$$\lambda_k\|\hat\phi_k-\phi_k\|_\infty\le\|\hat\lambda_k\hat\phi_k-\lambda_k\phi_k\|_\infty+|\hat\lambda_k-\lambda_k|\,\|\hat\phi_k\|_\infty=o_P(1),$$
the second result in Lemma A8 follows directly. $\square$

Proof of Theorem 4. According to Lemma A8 and Theorem 3, one has
$$|\hat\phi_k^{(\nu)}(s)-\phi_k^{(\nu)}(s)|=\left|\hat\lambda_k^{-1}\int_0^1\hat G^{(0,\nu)}(t,s)\hat\phi_k(t)\,dt-\lambda_k^{-1}\int_0^1 G^{(0,\nu)}(t,s)\phi_k(t)\,dt\right|$$
$$\le|\hat\lambda_k^{-1}-\lambda_k^{-1}|\int_0^1|\hat G^{(0,\nu)}(t,s)\hat\phi_k(t)|\,dt+\lambda_k^{-1}\int_0^1|\hat G^{(0,\nu)}(t,s)\hat\phi_k(t)-G^{(0,\nu)}(t,s)\phi_k(t)|\,dt.$$
Hence $\sup_{s\in[0,1]}\{\int_0^1|(\hat G^{(0,\nu)}-G^{(0,\nu)})(t,s)\hat\phi_k(t)+G^{(0,\nu)}(t,s)(\hat\phi_k-\phi_k)(t)|\,dt\}=o_P(1)$ and $\|\hat\phi_k^{(\nu)}-\phi_k^{(\nu)}\|_\infty=o_P(1)$. Theorem 4 follows from the definition of $\Sigma(t,s)$. $\square$

References

Cao, G., Wang, L., Li, Y., Yang, L., 2011. Spline confidence envelopes for covariance function in dense functional/longitudinal data. Manuscript.
Cao, G., Yang, L., Todem, D. Simultaneous inference for the mean function of dense functional data. Journal of Nonparametric Statistics, forthcoming.
Claeskens, G., Van Keilegom, I., 2003. Bootstrap confidence bands for regression curves and their derivatives. Annals of Statistics 31, 1852–1884.
Csörgő, M., Révész, P., 1981. Strong Approximations in Probability and Statistics. Academic Press, New York, London.
de Boor, C., 2001. A Practical Guide to Splines. Springer-Verlag, New York.
Degras, D.A., 2011. Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica 21, 1735–1765.
DeVore, R., Lorentz, G., 1993. Constructive Approximation: Polynomials and Splines Approximation. Springer-Verlag, Berlin.
Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications. Chapman and Hall, London.
Ferraty, F., Vieu, P., 1996. Nonparametric Functional Data Analysis: Theory and Practice. Springer, New York.
Gasser, T., Müller, H.G., 1984. Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics 11, 171–185.
Hall, P., Hosseini-Nasab, M., 2006. On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B 68, 109–126.
Hall, P., Müller, H.G., Wang, J.L., 2006. Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics 34, 1493–1517.
Hall, P., Müller, H.G., Yao, F., 2009. Estimation of functional derivatives. Annals of Statistics 37, 3307–3329.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis 29, 163–179.
Huang, J., 2003. Local asymptotics for polynomial spline regression. Annals of Statistics 31, 1600–1635.
Li, Y., Hsing, T., 2010. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Annals of Statistics 38, 3321–3351.
Liu, B., Müller, H.G., 2009. Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics. Journal of the American Statistical Association 104, 704–717.
Ma, S., Yang, L., Carroll, R.J., 2012. A simultaneous confidence band for sparse longitudinal regression. Statistica Sinica 22, 95–122.
Morris, J.S., Carroll, R.J., 2006. Wavelet-based functional mixed models. Journal of the Royal Statistical Society, Series B 68, 179–199.
Müller, H.G., 2009. Functional modeling of longitudinal data. In: Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G. (Eds.), Longitudinal Data Analysis (Handbooks of Modern Statistical Methods). Wiley, New York, pp. 223–252.
Ramsay, J.O., Silverman, B.W., 2005. Functional Data Analysis. Springer, New York.
Rice, J.A., Wu, C.O., 2001. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253–259.
Schumaker, L., 2007. Spline Functions: Basic Theory, third ed. Cambridge University Press, Cambridge.
Shi, M., Weiss, R.E., Taylor, J.M.G., 1996. An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. Journal of the Royal Statistical Society, Series B 45, 151–163.
Stone, C.J., 1994. The use of polynomial splines and their tensor products in multivariate function estimation. Annals of Statistics 22, 118–184.
Wahba, G., 1990. Spline Models for Observational Data, vol. 59. SIAM, Philadelphia, PA.
Wang, J., Yang, L., 2009a. Polynomial spline confidence bands for regression curves. Statistica Sinica 19, 325–342.
Wang, L., Yang, L., 2009b. Spline estimation of single index model. Statistica Sinica 19, 765–783.
Xia, Y., 1998. Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society, Series B 60, 797–811.
Yao, F., Lee, T.C.M., 2006. Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society, Series B 68, 3–25.
Yao, F., Müller, H.G., Wang, J.L., 2005a. Functional linear regression analysis for longitudinal data. Annals of Statistics 33, 2873–2903.
Yao, F., Müller, H.G., Wang, J.L., 2005b. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577–590.
Zhou, L., Huang, J., Carroll, R.J., 2008. Joint modelling of paired sparse functional data using principal components. Biometrika 95, 601–619.
Zhou, S., Shen, X., Wolfe, D.A., 1998. Local asymptotics of regression splines and confidence regions. Annals of Statistics 26, 1760–1782.
Zhou, S., Wolfe, D.A., 2000. On derivative estimation in spline regression. Statistica Sinica 10, 93–108.