Multivariate functional linear regression and prediction

Multivariate functional linear regression and prediction

Accepted Manuscript Multivariate functional linear regression and prediction Jeng-Min Chiou, Ya-Fang Yang, Yu-Ting Chen PII: DOI: Reference: S0047-25...

294KB Sizes 1 Downloads 143 Views

Accepted Manuscript Multivariate functional linear regression and prediction Jeng-Min Chiou, Ya-Fang Yang, Yu-Ting Chen PII: DOI: Reference:

S0047-259X(15)00253-5 http://dx.doi.org/10.1016/j.jmva.2015.10.003 YJMVA 4025

To appear in:

Journal of Multivariate Analysis

Received date: 1 January 2015 Please cite this article as: J.-M. Chiou, Y.-F. Yang, Y.-T. Chen, Multivariate functional linear regression and prediction, Journal of Multivariate Analysis (2015), http://dx.doi.org/10.1016/j.jmva.2015.10.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

*Manuscript Click here to download Manuscript: JMVA-15-8R2-mvFReg-main1.pdf

Click here to view linked Referenc

Multivariate Functional Linear Regression and Prediction Jeng-Min Chiou1 , Ya-Fang Yang, Yu-Ting Chen Institute of Statistical Science, Academia Sinica, Nankang, Taipei 11529, Taiwan

Abstract We propose a multivariate functional linear regression (mFLR) approach to analysis and prediction of multivariate functional data in cases in which both the response and predictor variables contain multivariate random functions. The mFLR model, coupled with the multivariate functional principal component analysis approach, takes the advantage of cross-correlation between component functions within the multivariate response and predictor variables, respectively. The estimate of the matrix of bivariate regression functions is consistent in the sense of the multi-dimensional Gram-Schmidt norm and is asymptotically normally distributed. The prediction intervals of the multivariate random trajectories are available for predictive inference. We show the finite sample performance of mFLR by a simulation study and illustrate the method through predicting multivariate traffic flow trajectories for up-to-date and partially observed traffic streams. Keywords: Functional prediction, Functional principal component analysis, Functional regression, Multivariate functional data, Stochastic processes 1. Introduction Functional regression analysis is widely used to describe the relationship between response and predictor variables when at least one of the variables contains a random function (see Cuevas [11], Ferraty and Vieu [16], Horv´ath and Kokoszka [23], M¨uller [29], and Ramsay and Silverman [34] for excellent overviews). There has been intensive literature in the functional linear regression (FLR) models of the type of a scalar response and a functional predictor, the simple FLR. Two methods are typically used to address the simple FLR model. The most popular one is based on functional principal component analysis (FPCA) (e.g., Cardot et al. [5] and Hall and Horowitz [21]). The other is based on penalized regularization such ∗

Corresponding author. Email address: [email protected] (Jeng-Min Chiou)

Preprint submitted to Elsevier

October 6, 2015

as the penalized B-splines (Li and Hsing [26]) and the reproducing kernel Hilbert space approaches (Yuan and Cai [37]). This simple FLR model was extended to nonlinear and semiparametric functional regression models to accommodate the case of multiple functional predictors or the situation when the relationship between response and predictor variables is not linear. These include the generalized single and multiple index FLR models with known or unknown link (Amato et al. [1], Chen et al. [6], Ferraty et al. [15], James [24], M¨uller and Stadtm¨uller [30]), the additive functional regression model (Febrero-Bande, Gonzalez-Manteiga [14], Ferraty and Vieu [18], Goia and Vieu [20]), the semi-functional partial linear regression (Aneiros-Perez and Vieu [3]), FLR with derivatives as supplementary covariates (Goia [19]), and time series prediction (Mas and Pumo [27]). FLR models with a single random function as the response were introduced by Ramsay and Delzell [33]. These include functional response models with scalar predictors (Chiou et al. [9, 10], Faraway [12]) and the functional linear regression model with a functional response and a functional predictor (Ferraty et al. [17], Yao et al. [36]). Generalizations of functional response models include functional additive models (M¨uller and Yao [31]) and the functional mixture prediction model (Chiou [7]). Functional response models with more than one functional predictors were discussed in Matsui et al. [28]. More recently, the functional response additive model estimation (Fan et al. [13]) and the functional errors-in-variable (Radchenko et al. [32]) approaches consider univariate functional responses and multivariate functional predictor variables, which are interesting methods for functional response models. Albeit the extensive development of functional regression models, functional response models with multivariate random functions as the response variable have not been discussed in the literature. We develop a multivariate FLR (mFLR) model in which both the response and predictor variables contain multivariate random trajectories and are contaminated with measurement errors. The mFLR model takes the advantage of component dependency of multivariate random functions and accommodates incomparable magnitudes of variation among the component functions of the response and predictor variables, respectively. We discuss the existence and uniqueness of the estimates of bivariate regression functions, obtain the asymptotic properties of the estimators, and construct relevant pointwise and simultaneous prediction intervals for the predictive inference. We illustrate the finite sample performance of the proposed mFLR approach through a simulation study and apply the method to predict future multivariate traffic flow trajectories for an up-to-date and partially observed traffic streams in intelligent transportation systems. The remainder of this article is organized as follows. In Section 2, we present the proposed multivariate FLR (mFLR) model. In Section 3, we discuss estimation 2

of the regression model and prediction of future multivariate trajectories with prediction intervals. In Section 4, we derive the asymptotic properties of the mFLR model. In Section 5, we present the numerical results of a simulation study and a real-life application to multivariate traffic-flow data. Technical details and information on the estimation process are compiled in Appendix A1–A5. More technical details and numerical results are provided in the online Supplement. 2. Multivariate functional linear regression model 2.1. Preliminaries Let {Xl }1≤l≤p and {Yk }1≤k≤d be the sets of random functions, corresponding to the predictor and response variables, with each Xl in L2 (S) and Yk in L2 (T ), where L2 (·) is a Hilbert space of square-integrable functions with respect to Lebesgue measures ds and dt on closed intervals S and  > T . Further, let X(s) = X1 (s), . . . , X p (s) be a vector in a Hilbert space of pp dimensional vectors of functions in L2 (S), denoted by H1 = L2 (S). Assume X(s) has a smooth mean function µX (s) = (µ1X (s), . . . , µXp (s))> , µlX (s) = EXl (s), and con o   , G Xjl (s1 , s2 ) = cov X j (s1 ), Xl (s2 ) . variance function GX (s1 , s2 ) = G Xjl (s1 , s2 ) 1≤ j,l≤p

Similarly, Y(t) = (Y1 (t), . . . , Yd (t))> in H2 = L2d (T ) has a smooth mean function µY (t) = (µY1 (t), . . . , µYd (t))> , µYk (t) = EYk (t), and covariance function GY (t1 , t2 ) = n o GYkm (t1 , t2 ) , GYkm (t1 , t2 ) = cov (Yk (t1 ), Ym (t2 )). In addition, let the diag1≤k,m≤d   onal matrices be DX (s) = diag v1X (s)1/2 , . . . , vXp (s)1/2 , where vlX (s) = GllX (s, s)   and DY (s) = diag vY1 (t)1/2 , . . . , vYd (t)1/2 , where vYk (t) = GYkk (t, t). The inner prodR uct of any functions f and g in L2 (S) is h f, gi = S f (s)g(s)ds, with the norm k · k = h·, ·i1/2 . The inner product of any functions f = ( f1 , f2 , . . . , f p )> and Pp g = (g1 , g2 , . . . , g p )> in H1 is h f , giH1 = l=1 h fl , gl i, and the norm k · kH1 = h·, ·i1/2 H1 . The inner product of two functions in H2 , h·, ·iH2 , is defined in the same way. To accommodate incomparable magnitudes of variation between the component functions {Xl (s)} (resp. {Yk (t)}), we take the transformation approach of Chiou et al. [8]. Let XZ (s) = (X1Z (s), . . . , X Zp (s))> = DX (s)−1 {X(s) − µX (s)} (resp. Y Z (t) = (Y1Z (t), . . . , YdZ (t))> = DY (t)−1 {Y(t) − µY (t)}). It follows that XZ (s) (resp. Y Z (t)) has a mean of 0 and covariance function CX (s1 , s2 ) = {C Xjl (s1 , s2 )} (resp. CY (t1 , t2 ) = Y Y (t1 , t2 ) = E{YkZ (t1 )YmZ (t2 )}). {Ckm (t1 , t2 )}), where C Xjl (s1 , s2 ) = E{X Zj (s1 )XlZ (s2 )} (resp. Ckm X Then, there exists an orthonormal eigenbasis, {φZ,r (s)}r≥1 , where φZX,r (s) = (φZX,1r (s), . . . , φZX,pr (s))> and hφZX,q , φZX,r iH1 = δrq , the Kronecker delta, with the corresponding non-negative P X X X > eigenvalues {λZX,r }r≥1 in descending order, such that CX (s1 , s2 ) = ∞ r=1 λZ ,r φZ ,r (s1 ) φZ ,r (s2 ) , P ∞ whose ( j, l) element is CXjl (s1 , s2 ) = r=1 λZX,r φZX, jr (s1 ) φZX,lr (s2 )> . Similarly, we de3

fine {λYZ,r }r≥1 , and {φYZ,r (t)}r≥1 and φYZ,r (t) = (φYZ,1r (t), . . . , φYZ,dr (t))> in H2 such that P > Y Y Y CY (t1 , t2 ) = ∞ q=1 λZ ,q φZ ,q (t1 ) φZ ,q (t2 ) .

2.2. Multivariate functional linear regression model The multivariate functional linear regression (mFLR) model is based on the transformed variables XZ and Y Z , which can be represented as follows: XZ (s) =

∞ X r=1

ξZX,r φZX,r (s), s ∈ S (resp. Y Z (t) =

∞ X q=1

ξZY,q φYZ,q (t), t ∈ T ),

(1)

where ξZX,r = hXZ , φZX,r iH1 (resp. ξZY,q = hY Z , φYZ,q iH2 ) is a random coefficient that satisfies EξZX,r = 0, E(ξZX,r ξZX,q ) = λZX,r δrq (resp. EξZY,q = 0, and E(ξZY,q ξZY,r ) = λYZ,q δqr ). The components {XlZ } in XZ are correlated, so are the components {YlZ } in Y Z . Further, they are cross-correlated in the mFLR model such that p Z X Z Yk (t) = β0lk (s, t) XlZ (s) ds + ζk (t) , t ∈ T , for k = 1, . . . , d, (2) l=1

S

where β0lk (s, t) is the bivariate regression coefficient function in L2 (S × T ) and ζk (t) is the random error process with Eζk (t) = 0 and var(ζk (t)) = κk2 (t), which is uncorrelated with {Xl (t)} for all k. A matrix form of (2) can be expressed as !> Z  Y Z (t) = XZ (s) > β0 (s, t)ds + ζ(t), t ∈ T , (3) S

p×d

where β0 (s, t) is in L2 (S × T ), whose (l, k) element is β0lk (s, t), and ζ(t) = (ζ1 (t), . . . , ζd (t))> . Here, ζ(t) and XZ (s) are uncorrelated × T. nR for any (s, t) ∈ So> The conditional expectation of (3) is E {Y Z (t) | XZ } = S (XZ (s))> β0 (s, t)ds for any t ∈ T . The case of p = 1 and d = 1, which does not require transformation in the random functions, was discussed in Yao et al. [36]. The matrix of the bivariate coefficient functions β0 needs to be estimated. For p×d any bivariate function β(s, t) in L2 (S × T ), we define an integral operator B : R > H1 → H2 with a kernel β such that, for any f in H1 , (B f )(t) = S ( f (s))> β(s, t)ds P 2 1/2 , where with the Gram-Schmidt norm kBkGS = ( ∞ {ϕr } is an arbir=1 kBϕr kH2 ) trary orthonormal basis in H1 (see, e.g., Kato [25]). To ensure the existence of the p×d true β0 , we further define H3 = {β ∈ L2 (S × T ) : kBkGS < ∞}, a subspace of p×d L2 (S × T ) in which β is restricted to finite Gram-Schmidt norms. We define the P inner product of any β1 and β2 in H3 as hβ1 , β2 iH3 = ∞ r=1 hB1 ϕr , B2 ϕr iH2 , where R > > B` ϕr (t) = S (ϕr (s)) β` (s, t)ds , ` = 1, 2, equipped with the norm k · kH3 = h·, ·i1/2 H3 .

4

> R Given X in H1 and for any β in H3 , let LX : H3 → H2 , (LX β)(t) = S (XZ (s))> β(s, t)ds , the integrable regression operator. The true β0 is defined by minimizing the expectation of the squared H2 -normed distance between Y Z and the regression operator LX on β: β0 =arg min EkY Z − (LX β)k2H2 . (4) β∈H3

The minimizer of (4) is expressed as follows: β0 (s, t) =

∞ X Y X ξZ,q ) E(ξZ,r

r,q=1

λ

X Z,r

 > X φZ,r (s) φYZ,q (t) .

(5)

The derivation of (5) is provided in Supplement W1. The result (5) for p = 1 and d = 1 was derived in He et al. [22]. It is worth noting that β0 is unique if and only if the condition (C1) in Section 4 holds. (cf. Theorem 2.3 in [22].) Besides the optimality criterion (4), β0 (s, t) can also be derived as follows. 2 = P∞ kBφX k2 = P∞ hBφX , φY i2 , where Here, kBkGS Z,r H2 Z,r Z,q H2 r=1 r,q=1 X , φYZ,q iH2 = hBφZ,r

Z

S×T

X X (φZ,r (w))> β(w, t) φYZ,q (t) dwdt = hβ, φZ,r (φYZ,q )> iH3 .

 > P X Y > X Y X hβ, φ (φ ) i φ (s) φ (t) , and {φZ,r (s)(φYZ,q (t))> } is Hence, β(s, t) = ∞ H Z,r Z,q Z,r Z,q 3 r,q=1 P∞ X (s)(φYZ,q (t))> , an orthonormal basis in H3 . We can express β0 (s, t) = r,q=1 brq φZ,r X where brq = hβ0 , φZ,r (φYZ,q )> iH3 . Plugging it into E {Y Z (t) | XZ } derived in (3) and by X the orthonormality of {φZ,r } yields ∞



 XX E Y Z (t) | XZ = brq ξZX,r φYZ,q (t).

(6)

q=1 r=1

P X Y X By the orthonormality of {φYZ,q }, E[ξZY,q | XZ ] = ∞ r=1 brq ξZ ,r . Hence, E(ξZ ,q | ξZ ,r ) = E{E(ξZY,q | XZ ) | ξZX,r } = brq ξZX,r . Multiplying ξZX,r and taking the expectation on both sides, we obtain E(ξZY,q ξZX,r ) = brq E{(ξZX,r )2 } and the result of β0 (s, t) in (5) follows. 3. Model estimation and statistical inference 3.1. Estimation of the multivariate functional linear model To estimate β0 (s, t) in (5), we consider a truncated version of (1). Let XLZ (s) =

L X r=1

 > X X ξZX,r φZX,r (s) = ϕZ,L (s) ξZ,L , s ∈ S, 5

(7)

Z X > X X X X ) (resp. Y M (t) = , . . . , ξZ,L = (ξZ,1 (s))> and ξZ,L (s), . . . , φZ,L where ϕX (s) = (φZ,1 P M Y Z,LY > Y Y Y > Y Y q=1 ξZ ,q φZ ,q (t) = (ϕZ,M (t)) ξZ,M , t ∈ T , where ϕZ,M (t) = (φZ,1 (s), . . . , φZ,M (t)) and Y Y Y )> ) with L (resp. M) unknown and to be determined. Here, we , . . . , ξZ,M = (ξZ,1 ξZ,M choose L (resp. M) to denote the minimum number of components necessary to ensure that the leading L (resp. M) components explain at least 100δ% of the total variation. We use δ = 0.95 in this study. Other selection criteria, such as AIC- and BIC-like methods, could also be used. PM PL,M X q φYZ,q (t) with q = (s)(φYZ,q (t))> and ζ M (t) = q=1 Let βLM (s, t) = r,q=1 brq φZ,r hζ, φYZ,q iH2 . Then, we have a truncated version of (3),

Y M (t) = Z

Z

S

>

(XL (s)) βLM (s, t)ds Z

!>

+ ζ M (t),

t ∈ T.

(8)

Z By substituting the expressions of Y M (t), XLZ (t), βLM (s, t), and ζ M (t), (8) can be reduced to a multivariate regression model,

(ξZY,M )> = (ξZX,L )> PLM + ( M )> ,

(9)

where PLM = {brq }, 1 ≤ r ≤ L, 1 ≤ q ≤ M, and  M = (1 , . . . ,  M )> . Besides, P M PL Z X Y we have a truncated version of (6), E{Y M (t) | XLZ } = q=1 r=1 brq ξZ ,r φZ ,q (t) = Y > > X Y X (ϕZ,M (t)) (PLM ) ξZ,L . Based on (9) and given realizations {ξZ,iM } and {ξZ,iL } of ξZY,M and ξZX,L , the least square estimator of PLM is > −1 > PLS LM = (ΘX ΘX ) ΘX ΘY ,

(10)

with ΘX = (ξZX,1L , . . . , ξZX,nL )> and ΘY = (ξZY,1M , . . . , ξZY,nM )> , where (ξZX,iL , ξZY,iM ) are the vectors of scores of (XiZ , YiZ ) that are sampled from (XZ , Y Z ). In practice, realizations of (X, Y) may be contaminated with measurement errors and observed sparsely. We can use either the weighted least squares (WLS) or the conditional expectation methods to obtain the estimates of ξZX,iL and ξZY,iM i = 1, . . . , n (see Chiou et al. [8]). Here, we use the WLS estimate, which is summarized in Appendix A1. To estimate βLM , we obtain the estimates of µˆ X (resp. µˆ Y ) for µX (resp. µY ) and Cˆ X (resp. Cˆ Y , Cˆ XY ) for CX (resp. CY , CXY ), using one- and two-dimensional X X X smoothing techniques. The estimates of φˆ ZX,r , λˆ Z,r (resp. φˆ YZ,q , and λˆ YZ,q ) for φZ,r , λZ,r (resp. φYZ,q , and λYZ,q ) are obtained from the spectral decomposition of Cˆ X (resp. Cˆ Y ), coupled with discrete approximations. (see Supplement W2 for more details). By   ˆ ˆ > ˆ −1 Θ ˆ> plugging in the estimates of ΘX and ΘY in (10), we obtain Pˆ LS X ΘY . LM = ΘX ΘX X Y Given the estimates ϕˆ Z,L and ϕˆ Z,M , we have the least squares estimate of β0 ,  > X ˆ YZ,M (t). βˆ 0LS (s, t) = ϕˆ Z,L (s) Pˆ LS LM ϕ 6

(11)

X , where ρrq = E(ξZY,q ξZX,r ), we may estimate β0 Alternatively, since brq = ρrq /λZ,r through the cross-covariance estimate of ρrq . Since the cross-covariance CXY (s, t) =  > P∞ P∞ > in H and ρ XY X Y Y X (t)} = hC , φ iH3 , we obtain the (s){φ ρ φ 3 rq rq Z,r φZ,q Z,q r=1 q=1 RZ,r > XY Y X ˆ ˆ ˆ estimate ρˆ rq = S×T {φZ,r (s)} C (s, t) φZ,q (t)dsdt and, thus, the estimate of β0 ,

 > X ˆ Z,L ˆ YZ,M , βˆ CC Pˆ CC LM ϕ 0 (s, t) = ϕ

(12)

n o X ˆ CC ˆ CC ˆ rq /λˆ Z,r where Pˆ CC = b . rq 1≤r≤L and brq = ρ LM 1≤q≤M

3.2. Prediction and inference of the multivariate functional linear model Given a new predictor X∗ , X∗ (s) = (X1∗ (s), . . . , X ∗p (s))> , we predict Y ∗ (t) by the conditional expectation E{Y ∗ (t) | X∗ } = µY (t) + DY (t) E{Y ∗Z (t) | X∗Z }, where E{Y ∗Z (t) | X∗Z } =

∞ X ∞ X



brq ξZX,r φYZ,q (t),

(13)

r=1 q=1



and X∗Z (s) = DX (s)−1 {X∗ (s) − µX (s)}. The ξZX,r = hX∗Z , φZX,r iH1 is a latent coef∗Z (t) = PL P M b ˆ ˆ X∗ ˆ Y ˆ ficient to be estimated. Let Yˆ LM r=1 q=1 rq ξZ ,r φZ ,q (t), where brq denotes ˆ LS ˆ Y the estimate bˆ CC rq or brq , φZ ,q (t) is obtained based on the estimates of the observed ∗ trajectories, and ξˆZX,r is estimated via WLS as is summarized in Appendix A1. We obtain the predicted response function, ∗ Yˆ LM (t) = µˆ Y (t) +

L X M X r=1 q=1

 ∗  ˆ Y φˆ YZ,q (t), bˆ rq ξˆZX,r D

(14)

ˆ Y and φˆ YZ,q are the estimates of DY and φYZ,q . In matrix form, we write where D >  > ∗  ∗Z Yˆ LM (t) = ϕˆ YZ,M (t) Pˆ LM ξˆZX,L ,

(15)

∗ ∗ ∗ ˆ CC where ξˆZX,L = (ξˆZX,1 , . . . , ξˆZX,L )> , and Pˆ LM can be Pˆ LS LM or PLM . Statistical inference ∗Z (t) (15) depends on the convergence rates of the components, including of Yˆ LM ∗ the estimates ϕˆ YZ,M (t), ξˆZX,L and Pˆ LM = {bˆ rq }. Please refer to Supplement W3 for ˆ ˆ ZX,L have the same rates of convergence, more details. When bˆ rq = bˆ CC rq , brq and ϕ ∗ ∗ while ξˆZX,L has a slower rate of convergence, as ξˆZX,L depends on ϕˆ ZX,L . Consequently, ∗ the inference of (15) depends on the asymptotic normality of ξˆZX,L . Alternatively, LS , the inference of (15) is based on the asymptotic normality of b LS ˆ rq when bˆ rq = bˆ rq as it has a slower rate of convergence comparing with the estimates of the other

7

∗ (t) (14) is presented in Theorem 4.2. It components. Asymptotic properties of Yˆ LM ∗Z (t) | X, X ∗ ) ≡ ω LS requires the estimate of cov(Yˆ LM LM (t, t), such that when brq = brq ,  >  o −1 ∗  n LS X∗ > X ωLM (t, t) = ξZ,L ΘX ΘX ξZ,L (ϕYZ,M (t))> Σ M ϕYZ,M (t) , (16)

where Σ M = cov ( M ,  M ) = {σq1 q2 } with  M and ΘX defined in (9) and (10), and when brq = bCC rq ,  > Y ωCC (t, t) = ϕ (t) (PLM )> ΩL PLM ϕYZ,M (t), (17) Z ,M LM

X ∗ −1 X∗ X∗ where ΩL = (ΨZ,L ) , the inverse of the covariance matrix of (ξˆz,iL − ξz,iL ), with LS X∗ ˆ LM (t, t) ΨZ,L given in (A.1) in Appendix and PLM is defined in (9). The estimates ω X∗ ˆ CC and ω (t, t) are obtained by substituting the individual unknown terms ξZ,L , ΘX , LM Y ϕZ,M (t), Σ M , and PLM in (16) and (17), with their estimates. In particular, the −1 ˆ ˆ ˆ LS ˆ ˆ LS > ˆ ˆ estimate of Σ M in ωLS LM (t, t) is Σ M = n {(ΘY − ΘX PLM ) (ΘY − ΘX PLM )}. > Given any scalar d-vector a = (a1 , . . . , ad ) , a 100(1 − α)% pointwise confidence interval (CIL (t), CIU (t)) for E{a> Y ∗Z (t) | X∗ }, for any t ∈ T , can be approximated by  i1/2 α h > ∗Z ˆ LM (t, t)} a a> Yˆ LM (t) ± Φ−1 1 − a {ω , 2 where Φ is the standard normal cumulative distribution function, and the simultaneous confidence band (CBL (t), CBU (t)) for E{a> Y ∗Z (t) | X∗ }, for all t ∈ T , is approximated by h i1/2 ∗Z ˆ LM (t, t)} a a> Yˆ LM (t) ± χ2L,1−α a> {ω ,

where χ2L,1−α is the 100(1 − α)th percentile of the chi-squared distribution with L degrees of freedom. In particular, when a = ek , k = 1, . . . , d, the unit vector whose kth component is 1, the 100(1 − α)% pointwise confidence interval and the simultaneous confidence band for E{Yk∗ (t) | X∗ } are approximately as follows:   µˆ Yk (t) + vˆ Y (t)1/2 CIL (t), µˆ Yk (t) + vˆ Y (t)1/2 CIU (t) ,   µˆ Yk (t) + vˆ Y (t)1/2 CBL (t), µˆ Yk (t) + vˆ Y (t)1/2 CBU (t) .

For the prediction interval of Y Z∗ , we assume the error process ζ(t) in (3) is Gaussian. Note that the estimate of the covariance, n o n o Z∗ Z∗ cov Yˆ LM (t) − Y Z∗ (t) = cov Yˆ LM (t) − E Y Z∗ (t) | XZ∗ + cov {ζ(t)} , (18)   ˆ ˆ = diag κˆ 2 (t), . . . , κˆ 2 (t) , where κˆ 2 (t) ˆ LM (t, t) + E(t), can be obtained by ω and E(t) 1 d k denotes the estimates of var{ζk (t)}, 1 ≤ k ≤ d. Given any scalar d-vector a = 8

(a1 , . . . , ad )> , a 100(1 − α)% pointwise prediction interval for a> Y ∗Z (t), for any t ∈ T , is approximated by  o i1/2 α h > n ∗Z ˆ ˆ LM (t, t) + E(t) a> Yˆ LM (t) ± Φ−1 1 − a ω a , 2

and the simultaneous prediction band for a> Y ∗Z (t), for all t ∈ T , is approximated by n o i1/2 h ˆ ˆ LM (t, t) + E(t) a> Yˆ ∗Z (t) ± χ2 a> ω a . LM

L,1−α

The 100(1 − α)% pointwise prediction intervals and the simultaneous prediction band for each Yk∗ (t) given X∗ can be obtained analogously by transforming back to the original scales. 4. Asymptotic properties

We define the conditions for the asymptotic properties of the estimators. P P∞ 2 Y X X (C1) ∞ q=1 r=1 brq < ∞, where brq = E(ξZ ,q ξZ ,r )/λZ,r . P P∞ X Y (C2) For all 1 ≤ l ≤ p and 1 ≤ k ≤ d, γlk (s, t) = ∞ r=1 q=1 |brq φZ,lr (s)φZ,kq (t)| is continuous in s and t. P P∞ 2 X Y (C3) ∞ r=1 q=1 brq λZ,r /λZ,q < ∞.

(C4) There exists a continuous positive-definite matrix of functions ωLS (t1 , t2 ) (or CC LS resp. ωCC (t1 , t2 )) such that ωLS LM (t1 , t2 ) → ω (t1 , t2 ) (or resp. ωLM (t1 , t2 ) → ωCC (t1 , t2 )) as L = L(n), M = M(n) → ∞ for all t1 , t2 ∈ T .

Condition (C1) is required for the convergence of β0 (5) in H3 sense, whereas (C2) is needed for the uniform convergence of β0 (s, t) for all s ∈ S, t ∈ T . Conditions (C3) and (C4) are used to establish the properties of prediction and inference. ∗Z (t) | X, X ∗ ). In (C3), the term Condition (C4) assures the convergence of cov(Yˆ LM 2 X Y Y X 2 Y X X brq λZ,r /λZ,q = {corr(ξZ,q , ξZ,r )} , where corr(ξZ,q , ξZ,r ) = E(ξZY,q ξZX,r )/(λZ,r λYZ,q )1/2 represents the correlation coefficient between ξZY,q and ξZX,r . Thus, (C3) implies that the correlation between X and Y need to decay fast enough as r, q → ∞. These conditions are required to handle the infinite-dimensional random functions X and Y, which can be fulfilled when brq converges to 0 rapidly. We can X write b2rq = {corr(ξZY,q , ξZX,r )}2 (λYZ,q /λZ,r ). Given that −1 ≤ corr(ξZY,q , ξZX,r ) ≤ 1 and, Y X for a fixed q, (λZ,q /λZ,r ) → ∞ as r → ∞, (C1) implies it holds for each q that n o n o X corr(ξZY,q , ξZX,r ) → 0 as r → ∞. Moreover, when φZ,r and φYZ,q are uniformly bounded, (C2) implies (C1), and (C3) can be dropped. These conditions are parallel to (A1)–(A3) and (A6) in [36] to establish the asymptotic properties of the univariate functional regression model. 9

Lemma 4.1. Given fixed L and M, under conditions of Theorem A3 in Appendix A2, a.s. ˆ LS we have Pˆ LS LM → PLM as n → ∞, and PLM is asymptotic normal such that for all 1 ≤ q, q1 , q2 ≤ M,   D   −1 √  LS n bˆ ·q − b·q → N 0, ΛXL σqq and

 −1 LS ˆ LS P n cov( bˆ ·q , b·q2 ) → ΛXL σq1 q2 , 1

LS (resp. b ) is the q-th column of P ˆ LS (resp. PLM ), ΛX = diag(λX , . . . , λX ) where bˆ ·q ·q L L LM 1 and σq1 q2 is as in (16).

Lemma 4.1 lays the basis for the following theorem. Theorem 4.1. (Consistency and inference of βˆ 0LS ) Assume that the conditions in Theorems A1 and A3 in Appendix A2 hold. (a) If (C1) holds, then

lim kβˆ 0LS − β0 kH3 = 0 a.s.

n→∞

a.s. (b) If (C2) holds, then given fixed L and M, for all s ∈ S and t ∈ T , βˆ 0LS (s, t) → β0 and βˆ 0LS is asymptotic normal such that for all 1 ≤ k ≤ d,

and

n o  D  √  LS n βˆ ·k (s, t) − β·k (s, t) → N 0, BX (s) e> k BY (t)ek n o   P n cov βˆ ·kLS1 (s, t), βˆ ·kLS2 (s, t) → BX (s) e> B (t)e Y k 2 k1

where βˆ ·kLS (resp. β·k ) is the k-th column vector of βˆ 0LS (resp. βLM ), BX (s) =  −1 X X ϕZ,L (s)> ΛXL ϕZ,L (s) and BY (t) = ϕYZ,M (t)> Σ M ϕYZ,M (t).

X LS . The proofs Here the asymptotic properties of βˆ 0LS rely on φˆ Z,r , φˆ YZ,q and bˆ rq of Lemma 4.1 and Theorem 4.1 are provided in Appendices A3 and A4. The following result is derived from the above theorems.

Lemma 4.2. Under the conditions of Theorems A1 and A2 in Appendix A2, it holds that, for 1 ≤ r ≤ L and 1 ≤ q ≤ M, CC bˆ rq − brq → 0 a.s.

By Lemma 4.2, it also holds that lim kβˆ CC 0 − β0 kH3 = 0 a.s.. The rate of n→∞ convergence depends on the regularity conditions in Supplement W3. The results in Lemmas 4.1 and 4.2 form the bases for the following theorem. 10

∗Z ) If (C3) and (C4) hold for all Theorem 4.2. (Consistency and inference of Yˆ LM t ∈ T and under the conditions in Theorems A1– A3 in Appendix A2, then

 ∗Z lim Yˆ LM (t) = E Y ∗Z (t) | X∗Z in probability,

n→∞

for each k = 1, . . . , d, and for any t ∈ T ,   ∗Z ˆ LM (t, t))−1/2 Yˆ LM (t) − E Y ∗Z (t) | X∗Z = D ∼ N(0, Iq ). lim (ω n→∞

In addition, for a fixed L and M, n o   a> Yˆ ∗Z (t) − Y ∗Z (t) q   LM LM   2 χ lim P sup ≤  ≥ 1 − α, L,1−α > n→∞  t∈T ˆ LM (t, t) a a ω

ˆ LM (t, t) represent either ω ˆ LS where a = (a1 , . . . , ad )> is a scalar d-vector, and ω LM (t, t) CC LS CC ˆ ˆ ˆ LM (t, t) when brq is used as the estimate of brq . when brq is used or ω The proof of Theorem 4.2 is provided in Appendix A5. While the consis∗Z (t), it also holds that Y ˆ ∗ (t) in (14) converges in tency result is derived for Yˆ LM LM ∗ ∗ probability to E (Y (t) | X ) along with the asymptotic distributions under suitable transformation. The result of Theorem 4.2 can be used to construct the pointwise prediction intervals and the simultaneous prediction band for Y ∗ (t) as discussed in Section 3.2. It is worth noting that the prediction band is constructed under fixed L and M, which does not take account of the uncertainty due to truncation. As shown in the proof of Theorem 4.2, this effect may be ignored if the remainder term of truncation in (C3) is vanishes rapidly. 5. Simulation study and application to traffic prediction 5.1. Simulation We generate the cross-correlated observations for the random functions X and Y via (7) and (8) with p = d = 3. We outline the data generating process as follows, while more details are provided in Supplement W4. We set the eigenfunctions {φZX,r } and {φYZ,q } obtained through the pre-specified n o covariance functions as in Chiou et al. [8]. Then, we set the values of brq , 1 ≤ PL P M X Y > r ≤ L, 1 ≤ q ≤ M, and obtain βLM (s, t) = r=1 q=1 brq φZ,r (s)(φZ,q (t)) , where L = M = 10. Next, to obtain the realizations of XLZ (s) in (7) and ζ M (t) in (8), >  we generate the variates of ((ξZX,L )> , ( M )> )> , where ξZX,L = ξZX,1 , . . . , ξZX,L and  M = (1 , . . . ,  M )> , by the multivariate normal distribution with the mean of 0 11

and the pre-specified covariance cov(ξZX,L ) = ΛZX and cov( M ,  M ) = Σ M , where Z ξZX,L and  M are uncorrelated. We then obtain the realizations of Y M (t) according X Y to (8). Further, we set the mean function µ (s) and µ (t) and the variance functions vX (s) and vY (t)to obtain realizations   of X and Y. Finally, we add random errors εX,i ∼ N 0, σ2X and εY ,i ∼ N 0, σ2Y with the pre-specified variances σ2X and σ2Y to the realizations of X and Y to form measurement-error contaminated observations. We define the mean prediction error (MPE) as the performance measure for the kth variable, nk Z 1 X MPEk = (Yˆ ∗ (t) − Yki∗ (t))2 dt/|T |, (19) nk i=1 T ki,LM

where nk is the number of sample trajectories for each test data set, |T | is the length ∗ of time T , Yˆ ki,LM denotes the predicted trajectories, and Yki∗ is the ith observed trajectory of the test dataset. For comparisons, let mFLRn represents the proposed mFLR approach; mFLRu the mFLR model similar to the proposed method but without normalization in the mFPCA procedure; and uFLR the univariate FLR with a univariate response function and multivariate predictor functions as in mFLRu. We compare the performance of these methods based on 200 simulation replicates. Table 1 summarizes the MPE performance of the three methods. The numbers of components L and M are chosen by the criterion of the fraction of variance explained, FVE = 95%. The results using 90% and 99% are close to but slightly bigger than 95% for mFLRn. The results show that mFLRn has the smallest MPEk in both the sample averages and the standard errors for each of the response variables. In addition, the method of mFLRn using βˆ 0LS performs slightly better than βˆ CC 0 . The proposed mFLRn outperforms mFLRu and uFLR in the simulation, indicating that mFLRn effectively deals with incomparable heteroscedasticity among the components of a multivariate random function and takes the advantage of the Table 1: Averages of the mean prediction errors (MPEk ) (with the standard errors in parenthesis) based on 200 simulation replicates.

Meth. βˆ LS βˆ CC

Var. Y1 Y2 (×10−2 ) Y3 Y1 Y2 (×10−2 ) Y3

mFLRn mFLRu uFLR 38.92 (0.30) 43.71 (0.37) 43.70 (0.04) 8.04 (0.07) 9.34 (0.07) 9.31 (0.07) 7.67 (0.04) 8.40 (0.05) 8.40 (0.05) 39.18 (0.30) 44.16 (0.37) 44.21 (0.37) 8.06 (0.07) 9.60 (0.07) 9.41 (0.07) 7.68 (0.04) 8.51 (0.06) 8.55 (0.06)

12

Table 2: Average coverage probabilities of βlkLS (s, t), 1 ≤ k, l ≤ 3 under different FVE settings.

FVE (0.90) k=1 k=2 k=3 l = 1 0.83 0.82 0.83 l = 2 0.86 0.87 0.85 l = 3 0.77 0.76 0.78

FVE (0.95) k=1 k=2 k=3 0.93 0.93 0.89 0.93 0.93 0.90 0.92 0.93 0.89

FVE (0.99) k=1 k=2 k=3 0.95 0.95 0.95 0.96 0.95 0.96 0.95 0.95 0.95

correlation information among the random functions. To investigate the truncation effect of the selected number of components on the derived pointwise confidence bands, we choose different values of FVE. Table 2 shows the simulation results, where the probabilities are taken as the sample averages over the recording time points. When FVE is 0.90, the probabilities are low indicating that the standard normal quantile values need to be scaled upward. They are much improved when FVE is 0.95, but are slightly under estimated. The coverage probabilities are on the right target when FVE is set as 0.99. The phenomena are due to the truncation lags L and M while the confidence bands do not consider the uncertainty. Therefore, one has to be cautious when using the confidence bands. Figure 1 illustrates a sample of prediction curves along with the 95% pointwise prediction intervals and the simultaneous prediction bands, in comparison with LS and b ˆ CC the trajectories from the test data set. Since the estimates bˆ rq rq perform similarly well, we display the former only. The simultaneous prediction bands are a little wider than the pointwise prediction intervals as expected, and they generally cover the “observed” trajectories of the test data. 5.2. Application to functional prediction of traffic streams Short-term traffic prediction is central to the development of intelligent transportation systems. The daily trajectories based on the three macroscopic traffic characteristics: vehicle speed (average speed in kilometers per hour), flow rate (vehicle count per unit time), and occupancy (percentage of time in which a unit length of roadway is occupied by vehicles), together form a speed-flow-occupancy functional variable. Whereas existing research on traffic prediction emphasizes either flow rate or vehicle speed, we used the proposed mFLR model to predict multivariate traffic trajectories of the three variables simultaneously, based on historical data and up-to-date partially observed speed-flow-occupancy trajectories. The data were recorded over 24 hours in 5-minute intervals during 2011 by a vehicle detector near the Shea-San Tunnel on National Highway 5 in Taiwan. Since traffic 13

(a) Y1

(b) Y2

(c) Y3

10

150

60

100

40 5

50

20

0

0

0

−20

−50

−40

−5

−100

−60

−150 0

1

2

3

4

5 −10 0

1

2

3

4

5

0

1

2

3

4

5

Figure 1: A sample of the multivariate predicted trajectories (curves in red) using the mFLRn, coupled with βˆ 0LS , with the 95% pointwise prediction interval (areas in orange) and simultaneous prediction band (areas in light blue), superimposed on the observed trajectories (curves in grey).

patterns in weekdays may differ from those on holidays and at weekends, here we consider weekday traffic trajectories for our data example, leading to 116 days for training dataset and 103 days for test dataset. Let X1 , X2 , and X3 denote the variables of speed, flow, and occupancy. We take the notion that each traffic profile is a realization of daily traffic trajectories that are sampled from multivariate random functions. To illustrate the prediction performance, let τ be the time point up to which the partial observation is available. Let the time intervals S = [0, τ] and T = [τ, T ] with 0 ≤ τ < T and T = 24 hours. Table 3 compares the prediction performance in terms of MPE, when τ = 8 and 14 hours. The mFLRn approach has the smaller MPE than mFLRu and uFLR, and there are significant reductions in percentage of MPE from mFLRn to mFLRu and uFLR, indicating that mFLRn has better prediction performance. Figure 2 illustrates a sample of the multivariate predicted trajectories for speed, flow rate, and occupancy (curves in red) when τ = 8 and 14 hours using the mFLRn Table 3: Average mean prediction errors MPEk (19) of the test dataset of the traffic flow data.

τ=8 Variable mFLRn mFLRu uFLR Speed(Y1 ) 0.62 0.63 0.67 Flow(Y2 ) 39.66 40.20 49.36 Occup(Y3 ) 0.30 0.30 0.35

14

τ = 14 mFLRn mFLRu uFLR 0.75 0.77 0.84 30.32 36.33 38.43 0.26 0.30 0.33

(a) Speed (τ = 8) 100

(b) Flow Rate (τ = 8) (c) Occupancy (τ = 8) 350

35

300

30

250

25

80

200

20

75

150

15

100

10

95 90 85

70 65 50

60 55 0

5

10

15

20

0 0

5

5

10

Hour

15

20

0 0

5

10

Hour

(d) Speed (τ = 14) 100

20

(e) Flow Rate (τ = 14) (f) Occupancy (τ = 14) 350

35

300

30

250

25

80

200

20

75

150

15

100

10

95

15

Hour

90 85

70 65 50

60 55 0

5

10

15

Hour

20

0 0

5

5

10

15

Hour

20

0 0

5

10

15

20

Hour

Figure 2: Samples of the multivariate predicted trajectories (curves in red) when τ = 8 and 14 hours using the mFLRn and βˆ 0LS approach along with 95% pointwise prediction intervals (areas in orange) and simultaneous prediction bands (in light blue), superimposed on the observed trajectories (curves in grey).

LS . The 95% pointwise prediction intervals and the approach with the estimate bˆ rq simultaneous prediction band are superimposed on the observed trajectories. The widths of the prediction intervals become a little shorter as the value of τ move toward the end of the day since uncertainty lessens when more observed information is collected.

6. Discussion Functional data are intrinsically infinite dimensional and are closely related to high-dimensional statistics. Connections between functional data analysis and high-dimensional statistics were discussed in Bongiorno et al. [4]. Besides, Aneiros and Vieu [2] investigated the variable selection of FLR with multiple functional predictors in the infinite dimensional variable problems. In this study, we consider the case in which the dimensions of the response and the predictor variables are of moderate sizes. When predictor variables are ultrahigh-dimensional, pre-screening may be required to select relevant predictor variables before applying the proposed mFLR modeling procedure, and this preliminary process may need further investigation.

15

Acknowledgement This study was supported in part by a grant from Academia Sinica and Taiwan Ministry of Science and Technology (NSC101-2118-M-001-MY3). Appendix A. Technical details A1. Estimation of the model components Let (Xi , Yi ) be a pair of unobservable trajectories sampled by smooth random processes (X, Y), i = 1, . . . , n. Assume that the realizations of Xi (resp. Yi ) are observed at times {S ig ∈ S; i = 1, . . . , n, g = 1, . . . , miX } (resp. {T ih ∈ T ; i = 1, . . . , n, h = 1, . . . , mYi }), denoted by Uig = (U1ig , . . . , U pig )> (resp. Vih = (V1ih , . . . , Vdih )> ), which are contaminated with measurement errors. We then use one- and two-dimensional local linear regression to estimate the fixed model component such as µX (s), GX (s1 , s2 ) and CXY (s, t). Detailed procedures are given in Supplement W2. Analogously, let (XiZ , YiZ ) be pairs of random trajectories with predictor trajectories XiZ in H1 and response trajectories YiZ in H2 , i = 1, . . . , n. For Z Z each j = 1, . . . , p (resp. k = 1, . . . , d), define Ulig (resp. Vkih ) as a normalized trajectory associated with Ulig (resp. Vkih ). To estimate the functional principal component (FPC) scores ξZX,iL , where ξZX,iL = X (ξZ,i1 , . . . , ξZX,iL )> , we use the weighted least-squares approach, denoted by ξ˜ZX,iL . Let eZ = {(U eZ )> , . . . , (U eZ )> }> , where U eZ = (U Z , . . . , U Z X )> , and the mathe matrix U i ip i1 il li1 lim i

trix e φZX,ir = {(e φZX,1ir )> , . . . , (e φZX,pir )> }> , where e φZX,lir = {φZX,lr (S i1 ), . . . , φZX,lr (S imX )}> . i Using the WLS method, the predicted FPC scores are as follows: X eZ , ξ˜ZX,iL = [ΨZ,iL ]−1 e ϕZX,iL [ΥiX ]−1 U i

(A.1)

where ΥiX = diag((ςX2,i1 )> , . . . , (ςX2,ip )> ), with the measurement error variances ςX2,il = X (ς2 , . . . , ς2 X )> and ΨZ,iL =e ϕX (ΥX )−1 (e ϕX )> , with e ϕX = (e φX )> , . . . , (e φX )> )> . X ,li1

X ,limi

Z ,iL

i

Z ,iL

Z ,iL

Z ,i1

Z ,iL

In addition, the estimates ξˆZX,ir are obtained by plugging in their estimates of the unknown quantities. The FPC scores for Yi and new predictor X∗ are estimated in the same way. Details in obtaining multivariate FPC scores were derived in Chiou et al. [8].

A2. Technical assumptions for the consistency of the estimated model components We list the assumptions that confirm the consistency properties of the estimates in Section A1. Theorems A1 and A3 along with the conditions (S1)–(S4) were derived in Chiou et al. [8]. The new Theorem A2 provides the consistency property for the estimate of the cross-variance CXY (s, t), which is required to establish the consistency of the estimate of bCC rq in Lemma 4.2. The theorems listed in Appendix A2 are required to derive the proofs of Theorems 4.1 and 4.2. 16

(S1) The density functions fS , of S ig and fT , of T ih are bounded and differentiable, with a bounded derivative. (S2) Kernel K(·) used in local linear regression is a symmetric probability density function with support [-1,1], and is differentiable, with a bounded derivative. (S3) The mean functions µlX (s) and µYk (t) are twice continuously differentiable, with their second derivative bounded on S and T for 1 ≤ l ≤ p, 1 ≤ k ≤ d. (S4) The covariance function GllX (s, s) (resp. GYkk (t, t)) is bounded above by RvXl (resp. RYvk ) and bounded below by rvXl (resp. rvYk ) for some RvXl ≥ rvXl > 0 (resp. RYvk ≥ rvYk > 0). In addition, the second-order derivative of G Xjl (s1 , s2 ) (resp. GYkm (t1 , t2 ), ClkXY (s, t)) exists and is bounded for all 1 ≤ l, j ≤ p and 1 ≤ k, m ≤ d. Here, some regularity conditions concerning with convergent rate are omitted. Detailed conditions can be found in Supplement W3. For a vector u =  P p 2 1/2 and u (u1 , . . . , u p )> and a matrix A = {Akl }1≤k,l≤p , we take kuk2 = k=1 k 1/2 P p 2 . kAk2 = k,l=1 Akl

>  X Theorem A1. Let λˆ rX and φˆ rX (s) = φˆ 1r (s), . . . , φˆ Xpr (s) be the estimates of λrX and φrX (t). Under assumptions (S1)–(S4) and the corresponding regularity conditions, if λr is simple, then (a) |λˆ rX − λrX | = o(1) a.s. (b) kφˆ rX − φrX kH1 = o(1) a.s. (c) sup s∈S kφˆ rX (s) − φrX (s)k2 = o(1) a.s. Furthermore, similar results associated with λˆ Yq and φˆ Yq also hold. n o Theorem A2. Let Cˆ XY (s, t) = Cˆ lkXY (s, t) be the estimate of CXY (s, t).Under 1≤l≤p,1≤k≤d assumptions (S1)–(S4) and the corresponding regularity conditions, it holds that sup kCˆ XY (s, t) − CXY (s, t)k2 = o(1) a.s.

s∈S,t∈T

Theorem A3. Given a fixed 0 < δ0 < 1, set L = L(δ0 ). Under assumptions (S1)– (S4) and the corresponding regularity conditions, assume that miX → ∞ as n →  ∞. 2 1+δ 1 <η In addition, assume (i) 0 < δ1 < ∞ and 0 < η < ∞ exist such that E |X,lig | X X −1 X for all 1 ≤ l ≤ p and 1 ≤ g ≤ mi ; and (ii) 0 < δ2 < ∞ exists such that (pmi ) Ψz,iL n o X > δ2 > 0. is nonsingular when miX is sufficiently large, with det (pmiX )−1 Ψz,iL a.s. X X Under (i) and (ii), ξˆz,iL → ξz,iL as n → ∞, and   D ˆ X }−1/2 ξˆ X − ξ X → N(0, IL ), as n → ∞. {Ψ z,iL z,iL z,iL 17

A3. Proof of Lemma 4.1 X Since the consistency of ξˆz,iL is established in Theorem A3, it suffices to show the consistency and asymptotic normality of PLS LM as in (10). Note that (10) is the least squares estimate of the multivariate linear regression model (9). Based  = diag(λ1X , . . . , λXL ), the following conditions on the fact that limn→∞ 1/n Θ> X ΘX  exists and being positive definite; and (ii) are satisfied (i) limn→∞ 1/n Θ> X ΘX  > −1 X > X max1≤i≤n ξZ,iL ΘX ΘX ξZ,iL → 0 as n → ∞, which confirm the consistency and asymptotic normality (cf. Theorem 7.2.2 in Sen and Singer, 1994). A4. Proof of Theorem 4.1 o> n PL P M Y X (t) . We have Proof. (a) Let βLM (s, t) = r=1 (s) φ b φ rq Z,q Z,r q=1 kβˆ 0LS − β0 kH3 ≤ kβˆ 0LS − βLM kH3 + kβLM − β0 kH3 .

With the consistency results given in Lemma 4.1 and Appendix A2, it follows P P∞ a.s. 2 kβˆ 0LS − βLM kH3 → 0. And kβLM − β0 k2H3 = ∞ r=L+1 q=M+1 brq → 0 under (C1), which completes the proof. PL P M ˆ LS X X Y ˆ ˆY (b) Let ∆1 = sup(s,t)∈S×T r=1 q=1 brq φZ,lr (s)φZ,kq (t) − brq φ Z,lr (s)φZ,kq (t) for the P∞ P∞ X (s)φYZ,kq (t) for first (L, M) terms, and let ∆2 = sup(s,t)∈S×T r=L+1 q=M+1 brq φZ,lr the remaining terms. It follows that sup(s,t)∈S×T βˆ 0lk (s, t) − β0lk (s, t) ≤ ∆1 + ∆2 a.s.

with ∆1 → 0 as in (a) and ∆2 →0 under (C2). And the asymptotic normality follows directly from that of Pˆ LS  LM in Lemma 4.1. A5. Proof of Theorem 4.2 ∗Z (t) = PL P M b ξ X ∗ φY (t). Then, Proof. (Proof of consistency) Let Yk,LM r=1 q=1 rq Z ,r Z,kq     Yˆ ∗Z (t) − E Y ∗Z (t) | X∗Z ≤ Yˆ ∗Z (t) − Y ∗Z (t) + Y ∗Z (t) − E Y ∗Z (t) | X∗Z k,LM k k,LM k,LM k,LM k = ∆ 1 + ∆2 . a.s.

Here, ∆1 → 0 is acquired by the consistency properties of previous result. Further, ∞ ∞ X  X X   2 h i2  b2rq λZ,r ∗Z ∗Z ∗Z Y Y E Yk,LM (t) − E Yk (t) | X = λ φ (t) , Z,q Z,kq λYZ,q r=L+1 q=M+1

where [λYZ,q φYZ,kq (t)]2 is uniformly bounded by Mercer’s theorem, which implies p

∆2 → 0 under (C3). (Proof of asymptotic normality) As discussed in Section 3.2, the asymptotic norˆ X∗ mality follows from that of Pˆ LS LM as is provided in Lemma 4.1 or that of ξZ,L as shown ∗Z (t) as in (15), and analogously for Y ∗Z (t), the in Theorem A3. By expressing Yˆ LM LM asymptotic normality results are obtained similarly, following the procedure in the proof of Corollary 5.1 in Chiou et al. [8].  18

Appendix B. Supplementary data Supplementary material related to this article can be found online at hURLi. References [1] U. Amato, A. Antoniadis, I. De Feis, Dimension reduction in functional regression with applications. Computational Statistics & Data Analysis 50 (2006) 2422–2446. [2] G. Aneiros, P. Vieu, Variable selection in infinite-dimensional problems. Statistics & Probability Letters 94 (2014) 12–20. [3] G. Aneiros-Perez, P. Vieu, Semi-functional partial linear regression. Statistics & Probability Letters, 76 (2006) 1102–1110. [4] E. Bongiorno, E. Salinelli, A. Goia, P. Vieu, Contributions in infinite-dimensional statistics and related topics. Esculapio, 2014. [5] H. Cardot, F. Ferraty, P. Sarda, Spline estimators for the functional linear model. Statistica Sinica 13 (2003) 571–591. [6] D. Chen, P. Hall, H.G. M¨uller, Single and multiple index functional regression models with nonparametric link. Annals of Statistics 39 (2011) 1720–1747. [7] J.M. Chiou, Dynamical functional prediction and classification, with application to traffic flow prediction. The Annals of Applied Statistics 6 (2012) 1588–1614. [8] J.M. Chiou, Y.T. Chen, Y.F. Yang, Multivariate functional principal component analysis: A normalization approach. Statistica Sinica 65 (2014) 1571–1596. [9] J.M. Chiou, H.G. M¨uller, J.L. Wang, Functional quasi-likelihood regression models with smooth random effects. Journal of the Royal Statistical Society, Ser. B 65 (2003) 405–423. [10] J.M. Chiou, H.G. M¨uller, J.L. Wang, Functional response models. Statistica Sinica 14 (2004) 675–693. [11] A. Cuevas, A partial overview of the theory of statistics with functional data. Journal of Statistical Planning and Inference 147 (2014) 1–23. [12] J. Faraway, Regression analysis for a functional response. Technometrics 39 (1997) 254–261. [13] Y.Y. Fan, N. Foutz, G.M. James, W. Jank, Functional response additive model estimation with online virtual stock markets. Annals of Applied Statistics 8 (2014) 2435–2460. [14] M. Febrero-Bande, W. Gonzalez-Manteiga, Generalized additive models for functional data. Test 22 (2013) 278–292. [15] F. Ferraty, A. Peuch, P. Vieu, Single functional index model. C. R. Academic Science Paris, Ser. I 336 (2003) 1025-1028. [16] F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice. New York: Springer, 2006. [17] F. Ferraty, I. Van Keilegom, P. Vieu, Nonparametric regression when both response and predictor are random functions. Journal of Multivariate Analysis 109 (2012) 10–28. [18] F. Ferraty, P. Vieu, Additive prediction and boosting for functional data. Computational Statistics & Data Analysis 53 (2009) 1400–1413.

19

[19] A. Goia, A functional linear model for time series prediction with exogenous variables. Statistics & Probability Letters 82 (2012) 1005–1011. [20] A. Goia, P. Vieu, A partitioned single functional index model. Computational Statistics, 30 (2015) 673–692. [21] P. Hall, J.L. Horowitz, Methodology and convergence rates for functional linear regression. The Annals of Statistics 35 (2007) 70–91. [22] G. He, H.G. M¨uller, J.L. Wang, W.J. Yang, Functional linear regression via canonical analysis. Bernoulli 16 (2010) 705–729. [23] L. Horv´ath, P. Kokoszka, Inference for functional data with applications, volume 200. Springer Science & Business Media, 2012. [24] G.M. James, Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Ser. B64 (2002) 411–432. [25] T. Kato, Perturbation Theory for Linear Operators. New York : Springer-Verlag, 1984, pp. 262. [26] Y. Li, T. Hsing, On rates of convergence in functional linear regression. Journal of Multivariate Analysis98 (2007) 1782–1804. [27] A, Mas, B. Pumo, Functional linear regression with derivatives. Journal of Nonparametric Statistics 21 (2009) 19–40. [28] H. Matsui, S. Kawano, S. Konishi, Regularized functional regression modeling for functional response and predictors. Journal of Math-for-Industry 1 (2009) 17–25. [29] H.G. M¨uller, Functional modelling and classification of functional and longitudinal data. Scandinavian Journal of Statistics 32 (2005) 223–240. [30] H.G. M¨uller, U. Stadtm¨uler, Generalized functional linear regression. The Annals of Statistics 33 (2005) 774–805. [31] H.G. M¨uller, Yao, F., Functional additive model. Journal of the American Statistical Association 103 (2008) 1534–1544. [32] P. Radchenko, X. Qiao, G.M. James, Index models for sparsely sampled functional data. Journal of the American Statistical Association (2014) Accepted. [33] J.O. Ramsay, C.J. Dalzell, Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society, Ser. B 53 (1991) 539–572. [34] J.O. Ramsay, B. Silverman, Functional Data Analysis (2nd ed.). New York: Springer, 2005. [35] P.K. Sen, J.M. Singer, Large Sample Methods in Statistics: An Introduction with Applications. New York: Chapman & Hall/CRC, 1994. [36] F. Yao, H.G. M¨uller, J.L. Wang, Functional linear regression analysis for longitudinal data. The Annals of Statistics 33 (2005) 2873–2903. [37] M. Yuan, T. Cai, A reproducing kernel Hilbert space approach to functional linear regression. The Annals of Statistics 38 (2010) 3412–3444.

20