Function-on-function quadratic regression models


Yifan Sun, Qihua Wang*



Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100190, China

*Corresponding author. E-mail address: [email protected] (Q. Wang).

Article history: Received 20 February 2019; Received in revised form 25 June 2019; Accepted 16 July 2019.

Keywords: Functional data analysis; Quadratic regression; Functional principal component analysis; Test of significance; Asymptotics

Abstract: A quadratic regression model in which both the covariate and the response are functional is considered; it is a natural extension of common function-on-function linear regression models. Methods to estimate the coefficient functions, predict an unknown response, and test the significance of the quadratic term are developed within the functional principal component regression paradigm, and asymptotic theory for these approaches is established. A simulation study demonstrates the finite sample performance of the proposed methods, and an application to real data illustrates the improvement that can be gained over function-on-function linear models.

1. Introduction


With the recent development of data collection and storage technology, datasets represented as curves or images have been encountered widely in areas such as spectrochemical analysis, environmental science and biomedical science. Functional data analysis (FDA) has therefore received increasing attention, and a considerable number of methods, theoretical results and applications have been proposed in the last three decades; readers can refer to Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012) and Hsing and Eubank (2015) for general introductions. There are abundant real examples with multiple underlying functional objects. To list a few, Yang et al. (2011) measured functional correlation between systolic blood pressure and body mass index, and between pairs of repeated bids in on-line auctions, while Sun et al. (2018) studied regression models for curves of temperature and precipitation, and for gene expression levels and histone variant levels. A series of articles have made excellent contributions to regression models with a functional response and functional predictors, most of which concentrate on the standard function-on-function linear model

$$Y(t) = \alpha(t) + \int_S X(s)\beta(s,t)\,ds + \epsilon(t). \qquad (1)$$

Both the response Y(t) and the predictor X(s) are functional, and ϵ(t) is a mean-zero random error function independent of the predictor. Analogous to nonparametric statistics, a commonly used tactic for handling the infinite-dimensional problem is to approximate functional variables by basis expansions. The basis functions can be either pre-determined or data-driven. The former choice includes spline functions and the Fourier basis; see, e.g., Ramsay and Dalzell (1991), Chapter 16 of Ramsay and Silverman (2005) and Ivanescu et al. (2015). The most popular approach of the latter kind is functional principal component analysis (FPCA). He et al. (2000) expressed the functional parameter in an eigenbasis expansion and studied the normal equation, see also Chiou et al. (2004), and Yao et al. (2005) extended this field to sparsely observed cases.


Sophisticated theoretical results on FPCA in the functional linear model are provided in Hall and Horowitz (2007) and Crambes and Mas (2013). Recently, Luo and Qi (2017) refined the traditional double expansion and proposed a signal compression approach to choose an optimal basis for prediction. Other regularized or penalized methods include the Tikhonov regularization adopted by Benatia et al. (2017) and the reproducing kernel Hilbert space approaches employed by Lian (2015) and Sun et al. (2018). However, structural linearity may be too restrictive in real applications. Ferraty et al. (2012) extended the Nadaraya–Watson estimator to function-on-function regression. Inspired by Yao and Müller's work on scalar-on-function quadratic models (Yao and Müller, 2010), Matsui (2017) put forward the function-on-function quadratic regression model

$$Y(t) = \alpha(t) + \int_S X(s)\beta(s,t)\,ds + \int_S\!\int_S X(r)X(s)\gamma(r,s,t)\,dr\,ds + \epsilon(t). \qquad (2)$$


Here α(t), β(s,t) and γ(r,s,t) are unknown coefficient functions to be estimated. The quadratic term can be regarded as an interaction effect of X on Y. When γ is zero a.e., model (2) degenerates to (1). The quadratic model (2) is more flexible than linear models and more interpretable than a general nonparametric model.

Our first contribution is to construct estimates of the unknown functional parameters in model (2) and to develop asymptotic theory. Matsui (2017) assumed that X(s) can be expressed by a finite basis expansion and that the error process is Gaussian-distributed, and then estimated the model by maximum likelihood, with no theoretical results provided. In this paper, we borrow the idea of FPCA to convert the functional quadratic model into a series of scalar quadratic models, and propose methods for estimation and prediction.

On the other hand, it is also of interest to test whether the quadratic term is significant, since model (2) is more complex than the simple linear model (1). There are relatively few works on testing problems in functional models. For the case where the response Y is scalar, Cardot et al. (2003) considered testing the linear effect of the first finitely many basis projections, while Lei (2014) developed a scan test procedure to check a global linear effect. Kong et al. (2016) extended several classical testing techniques to functional linear models. Delsol et al. (2011) studied structural tests of the general regression function E(Y|X), from which no-effect tests can be constructed; see Delsol (2013). When Y is also functional, a concise method for a lack-of-effect test in model (1) is stated in Chapter 9 of Horváth and Kokoszka (2012). In this paper, a model checking method is suggested to check whether the function-on-function linear model is adequate, based on the prior work of Horváth and Reeder (2013). Asymptotic results are given under mild conditions.

The rest of this paper is organized as follows. In Section 2 we develop methods for model estimation, response prediction and hypothesis testing. The corresponding theoretical properties are stated in Section 3. Simulation studies and a real data analysis are conducted in Sections 4 and 5 respectively. All technical proofs are postponed to Appendix A.


2. Methodology


2.1. Model and notations


Suppose we have a set of i.i.d. samples {Xi(s), Yi(t)}, i = 1, 2, ..., n, s ∈ S, t ∈ T, from the population {X ∈ L²(S), Y ∈ L²(T)}, where S and T are both bounded intervals of ℝ, and suppose the regression model (2) holds. The conditions on the variables involved are collected in (C1) below.

(C1) E∥X∥⁴ < ∞ and E∥Y∥⁴ < ∞. Without loss of generality, we assume E(Xi(s)) = 0 for all s ∈ S. The i.i.d. random error processes ϵi are independent of the Xi, with ϵi ∈ L²(T) and E(ϵi(t)) = 0 for all t ∈ T.

In this paper we consider the case where the whole trajectories of {Xi(s), Yi(t)} are fully observed; the proposed methods and the corresponding theory remain the same in the densely observed case (see, for example, Theorem 4 of Zhang and Chen, 2007). As in the literature, we transform all functions in model (2) into basis expansions. Denote by CX and CY the covariance functions of X and Y. By Mercer's theorem,

$$C_X(s_1, s_2) = \sum_{k \ge 1} \lambda_k \psi_k(s_1)\psi_k(s_2), \qquad C_Y(t_1, t_2) = \sum_{k \ge 1} \sigma_k \varphi_k(t_1)\varphi_k(t_2),$$

for all s1, s2 ∈ S and t1, t2 ∈ T, where {λk, k = 1, 2, ...} and {σk, k = 1, 2, ...} are the nonnegative eigenvalues of the corresponding covariance functions, and {ψk, k ≥ 1} and {φk, k ≥ 1} form orthonormal bases of L²(S) and L²(T) respectively. We further assume that all eigenvalues are distinct, to ensure the uniqueness of the eigenfunctions.

(C2) Define δX,j = min_{1≤k≤j}(λk − λk+1) and δY,j = min_{1≤k≤j}(σk − σk+1). Assume δX,j > 0 and δY,j > 0 for j ≥ 1.

The coefficient functions can then be represented as

$$\beta(s,t) = \sum_{k\ge1}\sum_{m\ge1} b_{mk}\,\psi_m(s)\varphi_k(t), \qquad \gamma(r,s,t) = \sum_{k\ge1}\sum_{l\ge1}\sum_{m\ge1} \tilde r_{lmk}\,\psi_l(r)\psi_m(s)\varphi_k(t), \qquad (3)$$


for all r, s ∈ S and t ∈ T, where $b_{mk} = \int_T\int_S \beta(s,t)\psi_m(s)\varphi_k(t)\,ds\,dt$ and $\tilde r_{lmk} = \int_T\int_S\int_S \gamma(r,s,t)\psi_l(r)\psi_m(s)\varphi_k(t)\,dr\,ds\,dt$ are the projection coefficients. For the sample curves we also have the Karhunen–Loève expansions

$$X_i(s) = \sum_{k\ge1} \xi_{ik}\psi_k(s), \qquad Y_i(t) = \mu_Y(t) + \sum_{k\ge1} \eta_{ik}\varphi_k(t), \qquad (4)$$

where $\mu_Y(t) = E(Y_i(t))$, $\xi_{ik} = \int_S X_i(s)\psi_k(s)\,ds$ and $\eta_{ik} = \int_T (Y_i(t) - \mu_Y(t))\varphi_k(t)\,dt$; the random variables $\xi_{ik}$ and $\eta_{ik}$ are the projections of $X_i$ and of the centered $Y_i$ onto their $k$th eigenfunctions. Inserting (3) and (4) into (2) and using orthonormality, we obtain

$$\eta_{ik} = \alpha_k + \sum_{m\ge1} b_{mk}\xi_{im} + \sum_{l\ge1}\sum_{m\ge1} \tilde r_{lmk}\,\xi_{il}\xi_{im} + \epsilon_{ik}, \qquad i = 1,\dots,n,\ k = 1,2,\dots, \qquad (5)$$

where $\alpha_k = \int_T (\alpha(t) - \mu_Y(t))\varphi_k(t)\,dt$ and $\epsilon_{ik} = \int_T \epsilon_i(t)\varphi_k(t)\,dt$ for each $k = 1, 2, \dots$. We then merge the terms $\tilde r_{lmk}\xi_{il}\xi_{im}$ and $\tilde r_{mlk}\xi_{im}\xi_{il}$ for all $l \ne m$, and impose condition (C3) so that Eq. (5) is well defined.

(C3) $\alpha \in L^2(T)$, $\beta \in L^2(S \times T)$ and $\gamma \in L^2(S \times S \times T)$, with $\gamma(r,s,t) = \gamma(s,r,t)$.

Due to the symmetry of $\gamma(r,s,t)$ in $r$ and $s$, we have $\tilde r_{lmk} = \tilde r_{mlk}$ for all $k$. Eq. (5) can then be rewritten as

$$\eta_{ik} = \alpha_k + \sum_{m\ge1} b_{mk}\xi_{im} + \sum_{l\ge1}\sum_{m=1}^{l} r_{lmk}\,\xi_{il}\xi_{im} + \epsilon_{ik}, \qquad i = 1,\dots,n,\ k = 1,2,\dots, \qquad (6)$$

where $r_{lmk} = 2\tilde r_{lmk}$ for $l \ne m$ and $r_{mmk} = \tilde r_{mmk}$.

2.2. Estimation and prediction


Estimating β and γ is equivalent to estimating all the coefficients in (6). Note that the Karhunen–Loève expansion yields $E(\xi_{ik}) = 0$, $E(\xi_{ik}^2) = \lambda_k$ and $E(\xi_{ik}\xi_{ij}) = 0$ for $j \ne k$. Similar to condition (A1) of Yao and Müller (2010), we strengthen these properties through (C4).

(C4) $E(\xi_{ij}^u \xi_{ik}^v) = E(\xi_{ij}^u)E(\xi_{ik}^v)$ for $1 \le j < k < \infty$ when $u + v = 3$ or $4$, and $E(\xi_{ij}^3) = 0$ for $1 \le j < \infty$.

Note that when the predictor Xi is Gaussian, its projection scores are independent normal random variables, so condition (C4) is satisfied. Under (C4), model (6) implies that the unknown coefficients satisfy

$$\alpha_k = -\sum_{l\ge1} r_{llk}\lambda_l, \qquad b_{mk} = E(\xi_{im}\eta_{ik})/\lambda_m,$$
$$r_{lmk} = E(\xi_{im}\xi_{il}\eta_{ik})/(\lambda_m\lambda_l) \ \text{for } m < l, \qquad r_{llk} = E(\xi_{il}^2\eta_{ik})\big/\bigl(E(\xi_{il}^4) - \lambda_l^2\bigr). \qquad (7)$$

By replacing all expectations in (7) with their empirical counterparts and truncating, the coefficient estimates are given by

$$\hat\alpha_k = -\sum_{l=1}^{K} \hat r_{llk}\hat\lambda_l, \qquad \hat b_{mk} = \frac{1}{n}\sum_{i=1}^{n} \hat\xi_{im}\hat\eta_{ik}\big/\hat\lambda_m,$$
$$\hat r_{lmk} = \frac{1}{n}\sum_{i=1}^{n} \hat\xi_{im}\hat\xi_{il}\hat\eta_{ik}\big/(\hat\lambda_m\hat\lambda_l) \ \text{for } m < l, \qquad \hat r_{llk} = \frac{1}{n}\sum_{i=1}^{n} \hat\xi_{il}^2\hat\eta_{ik}\bigg/\Bigl(\frac{1}{n}\sum_{i=1}^{n}\hat\xi_{il}^4 - \hat\lambda_l^2\Bigr), \qquad (8)$$

for $1 \le k \le D$ and $1 \le m \le l \le K$. Here K and D are pre-determined truncation constants, and the eigenvalues $\hat\lambda_m$ and projection scores $\hat\xi_{im}$, $\hat\eta_{ik}$ for $1 \le m \le K$, $1 \le k \le D$ can be obtained by standard functional principal component analysis. Briefly, we first estimate the covariance functions of X and Y by their empirical counterparts

$$\hat C_X(s_1, s_2) = \frac{1}{n}\sum_{i=1}^{n} X_i(s_1)X_i(s_2), \qquad \hat C_Y(t_1, t_2) = \frac{1}{n}\sum_{i=1}^{n} \bigl(Y_i(t_1) - \hat\mu_Y(t_1)\bigr)\bigl(Y_i(t_2) - \hat\mu_Y(t_2)\bigr),$$

where $\hat\mu_Y(t) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)$. The eigendecompositions of $\hat C_X$ and $\hat C_Y$ can then be written as

$$\hat C_X(s_1, s_2) = \sum_{k\ge1} \hat\lambda_k\hat\psi_k(s_1)\hat\psi_k(s_2), \qquad \hat C_Y(t_1, t_2) = \sum_{k\ge1} \hat\sigma_k\hat\varphi_k(t_1)\hat\varphi_k(t_2).$$

Therefore the estimates of the projection scores are

$$\hat\xi_{im} = \int_S X_i(s)\hat\psi_m(s)\,ds, \qquad \hat\eta_{ik} = \int_T \bigl(Y_i(t) - \hat\mu_Y(t)\bigr)\hat\varphi_k(t)\,dt.$$
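As a concrete illustration of this FPCA step, a minimal numpy sketch for fully observed curves on a common grid might look as follows; the helper name fpca, the quadrature weights and all discretization details are our own assumptions rather than part of the paper.

```python
import numpy as np

def fpca(curves, grid, n_comp):
    """Empirical FPCA for fully observed curves (a sketch, not the paper's code).

    curves : (n, p) array, row i is a curve observed on `grid` (length p).
    Returns the sample mean, the leading eigenvalues, the eigenfunctions
    evaluated on `grid` (shape (p, n_comp)), and the scores (n, n_comp).
    """
    n, p = curves.shape
    w = np.gradient(grid)                  # simple quadrature weights
    mu = curves.mean(axis=0)               # estimated mean function
    cen = curves - mu
    cov = cen.T @ cen / n                  # empirical covariance on the grid
    sw = np.sqrt(w)
    # symmetrize the weighted covariance operator before the eigendecomposition
    vals, vecs = np.linalg.eigh(sw[:, None] * cov * sw[None, :])
    order = np.argsort(vals)[::-1][:n_comp]
    lam = vals[order]                      # eigenvalue estimates
    ef = vecs[:, order] / sw[:, None]      # eigenfunctions, L2-normalized
    scores = cen @ (ef * w[:, None])       # projection scores: ∫ (X_i - mu) ψ_m
    return mu, lam, ef, scores
```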

Based on (8), the estimates of β(s,t) and γ(r,s,t) are given by

$$\hat\beta(s,t) = \sum_{k=1}^{D}\sum_{m=1}^{K} \hat b_{mk}\hat\psi_m(s)\hat\varphi_k(t), \qquad \hat\gamma(r,s,t) = \sum_{k=1}^{D}\sum_{l=1}^{K}\sum_{m=1}^{K} \hat{\tilde r}_{lmk}\hat\psi_l(r)\hat\psi_m(s)\hat\varphi_k(t), \qquad (9)$$

where $\hat{\tilde r}_{lmk} = \hat{\tilde r}_{mlk} = \hat r_{lmk}/2$ for $m < l$ and $\hat{\tilde r}_{mmk} = \hat r_{mmk}$. Suppose there is a new observation X*(s) from the same population. A straightforward way to predict Y*(t) is

$$\hat Y^*(t) = \hat\mu_Y(t) + \sum_{k=1}^{D} \hat\eta_k^*\hat\varphi_k(t), \qquad \hat\eta_k^* = \hat\alpha_k + \sum_{m=1}^{K} \hat b_{mk}\hat\xi_m^* + \sum_{l=1}^{K}\sum_{m=1}^{l} \hat r_{lmk}\hat\xi_l^*\hat\xi_m^*, \qquad (10)$$

where $\hat\mu_Y(t) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)$ and $\hat\xi_m^* = \int_S X^*(s)\hat\psi_m(s)\,ds$.
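Continuing the sketch above, the moment-based estimates (8) and the plug-in prediction (10) might be coded as follows; this again builds on the hypothetical fpca helper (with xi and eta the score matrices of X and of the centered Y) and is a sketch under our own naming assumptions.

```python
import numpy as np

def fit_quadratic(xi, eta, lam_x, K, D):
    """Moment-based estimates (8): alpha_k, b_mk and r_lmk for k <= D, m <= l <= K."""
    b = np.empty((K, D))
    r = np.zeros((K, K, D))               # lower triangle r[l, m, k], m <= l
    alpha = np.empty(D)
    for k in range(D):
        for m in range(K):
            b[m, k] = np.mean(xi[:, m] * eta[:, k]) / lam_x[m]
        for l in range(K):
            for m in range(l):
                r[l, m, k] = (np.mean(xi[:, m] * xi[:, l] * eta[:, k])
                              / (lam_x[m] * lam_x[l]))
            r[l, l, k] = (np.mean(xi[:, l] ** 2 * eta[:, k])
                          / (np.mean(xi[:, l] ** 4) - lam_x[l] ** 2))
        # alpha_k = -sum_l r_llk * lambda_l, truncated at K as in (8)
        alpha[k] = -np.sum(r[np.arange(K), np.arange(K), k] * lam_x[:K])
    return alpha, b, r

def predict_curve(scores_new, mu_y, phi_y, alpha, b, r):
    """Prediction (10) for one new predictor curve, given its scores."""
    K, D = b.shape
    eta_star = alpha.copy()
    for k in range(D):
        eta_star[k] += b[:, k] @ scores_new[:K]
        for l in range(K):
            for m in range(l + 1):
                eta_star[k] += r[l, m, k] * scores_new[l] * scores_new[m]
    return mu_y + phi_y[:, :D] @ eta_star
```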

Remark 1. To implement the estimation and prediction, the truncation parameters K and D need to be specified. In practice we use V-fold cross-validation. Specifically, denote the full dataset by F and randomly split F into V equal-size sets F1, ..., FV. The criterion is

$$\mathrm{CV}(K, D) = \sum_{k=1}^{V}\sum_{i\in F_k}\int \bigl(Y_i(t) - \hat Y_i^{-k}(t)\bigr)^2\,dt,$$

where $\hat Y_i^{-k}(t)$ is computed as in (10), with $F - F_k$ as training data and $F_k$ as validation data. $(\hat K, \hat D)$ is then obtained by minimizing CV(K, D) over the set $\{K, D : K \le K_{\max}, D \le D_{\max}\}$. We define $K_{\max}$ and $D_{\max}$ through the "percentage of variance explained" (PVE): if the PVE of X is set to ρ, then $K_{\max} := \min\{k : \sum_{j\le k}\hat\lambda_j \big/ \sum_j\hat\lambda_j \ge \rho\}$, and $D_{\max}$ is determined similarly.
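A brute-force version of this selection rule could look like the following sketch; the fold assignment and the reuse of the hypothetical fpca, fit_quadratic and predict_curve helpers from the previous sketches are our own assumptions.

```python
import numpy as np
from itertools import product

def cv_select(X, Y, grid_s, grid_t, K_max, D_max, V=5, seed=0):
    """V-fold CV(K, D) of Remark 1; returns the minimizing pair (K, D)."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % V
    w_s, w_t = np.gradient(grid_s), np.gradient(grid_t)
    best_cv, best_kd = np.inf, (1, 1)
    for K, D in product(range(1, K_max + 1), range(1, D_max + 1)):
        cv = 0.0
        for v in range(V):
            tr, te = folds != v, folds == v
            mu_x, lam_x, psi, xi = fpca(X[tr], grid_s, K)
            mu_y, _, phi, eta = fpca(Y[tr], grid_t, D)
            alpha, b, r = fit_quadratic(xi, eta, lam_x, K, D)
            xi_te = (X[te] - mu_x) @ (psi * w_s[:, None])   # held-out scores
            for i in range(xi_te.shape[0]):
                y_hat = predict_curve(xi_te[i], mu_y, phi, alpha, b, r)
                cv += np.sum((Y[te][i] - y_hat) ** 2 * w_t)  # ∫ (Y - Yhat)^2 dt
        if cv < best_cv:
            best_cv, best_kd = cv, (K, D)
    return best_kd
```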


2.3. Testing procedure

In this subsection we construct a statistic to test the null hypothesis H0 : γ(r,s,t) = 0, based on slightly modified coefficient estimates. Let

$$B_k = (b_{1k}, b_{2k}, \dots, b_{Kk})^T, \qquad R_k = (r_{11k}, r_{21k}, r_{22k}, r_{31k}, r_{32k}, r_{33k}, \dots, r_{KKk})^T,$$
$$H_k = (\eta_{1k}, \eta_{2k}, \dots, \eta_{nk})^T, \qquad L_i = (\xi_{i1}, \xi_{i2}, \dots, \xi_{iK})^T,$$

and $Q_i = (\xi_{i1}\xi_{i1}, \xi_{i2}\xi_{i1}, \xi_{i2}\xi_{i2}, \xi_{i3}\xi_{i1}, \xi_{i3}\xi_{i2}, \xi_{i3}\xi_{i3}, \dots, \xi_{iK}\xi_{iK})^T$. We also write $\hat H_k$, $\hat L_i$ and $\hat Q_i$ for the estimates of $H_k$, $L_i$ and $Q_i$. Then the solution of the problem

$$\min_{\alpha_k, B_k, R_k}\ \sum_{i=1}^{n}\Bigl\{\hat\eta_{ik} - \alpha_k - \sum_{m=1}^{K} b_{mk}\hat\xi_{im} - \sum_{l=1}^{K}\sum_{m=1}^{l} r_{lmk}\hat\xi_{il}\hat\xi_{im}\Bigr\}^2, \qquad k = 1,\dots,D, \qquad (11)$$

is given by

$$\begin{pmatrix}\check R_k\\ \check B_k\\ \check\alpha_k\end{pmatrix} = (\hat Z^T\hat Z)^{-1}\hat Z^T\hat H_k, \qquad k = 1, 2, \dots, D, \qquad (12)$$

where the matrix $\hat Z$ is defined by

$$\hat Z = \begin{pmatrix}\hat Q_1^T & \hat L_1^T & 1\\ \hat Q_2^T & \hat L_2^T & 1\\ \vdots & \vdots & \vdots\\ \hat Q_n^T & \hat L_n^T & 1\end{pmatrix}. \qquad (13)$$

Denote by $(\hat\alpha_k, \hat B_k^T, \hat R_k^T)$ the vector form of (8). For fixed K and finite k, it can be verified that the difference between $\hat R_k$ and $\check R_k$ is asymptotically negligible in probability.

Let $\check R = (\check R_1^T, \dots, \check R_D^T)^T$. Under H0, the magnitude of $\check R$ should be small. We therefore construct the test statistic

$$T_n = n\,\check R^{T}\bigl(\hat\Sigma^{-1}\otimes\hat M\bigr)\check R, \qquad (14)$$

where $\hat M$ and $\hat\Sigma$ are given by

$$\hat M = \frac{1}{n}\sum_{i=1}^{n}\hat Q_i\hat Q_i^T - \Bigl(\frac{1}{n}\sum_{i=1}^{n}\hat Q_i\Bigr)\Bigl(\frac{1}{n}\sum_{i=1}^{n}\hat Q_i\Bigr)^T,$$
$$\hat\Sigma = (\hat\sigma_{k,k'})_{1\le k,k'\le D}\ \text{with}\ \hat\sigma_{k,k'} = \frac{1}{n}\hat\epsilon_k^T\hat\epsilon_{k'}, \qquad \hat\epsilon_k = \hat H_k - \hat Z\,(\check R_k^T, \check B_k^T, \check\alpha_k)^T,$$

and ⊗ denotes the Kronecker product of matrices. Note that Tn reduces to the test statistic derived by Horváth and Reeder (2013) when D = 1. H0 is rejected when Tn is large: according to the limiting distribution of Tn derived in Theorem 3.3 in the next section, we set the threshold to be the upper-α quantile of the $\chi^2_{DK(K+1)/2}$ distribution, where α is a pre-specified significance level. The truncation parameters K and D are chosen by the V-fold cross-validation of Remark 1.
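For concreteness, the statistic (14) could be computed from the estimated scores roughly as in the sketch below; all names are our own assumptions, and scipy's chi2 quantile supplies the threshold.

```python
import numpy as np
from scipy.stats import chi2

def quad_test(xi, eta, alpha_level=0.05):
    """Test H0: gamma = 0 via the statistic (14) -- a sketch of Section 2.3."""
    n, K = xi.shape
    D = eta.shape[1]
    # Q_i: lower-triangular products xi_il * xi_im for m <= l, length K(K+1)/2
    pairs = [(l, m) for l in range(K) for m in range(l + 1)]
    Q = np.stack([xi[:, l] * xi[:, m] for l, m in pairs], axis=1)
    Z = np.hstack([Q, xi, np.ones((n, 1))])          # design matrix (13)
    coef, *_ = np.linalg.lstsq(Z, eta, rcond=None)   # least squares (12)
    R_check = coef[: Q.shape[1], :]                  # (K(K+1)/2, D) block
    resid = eta - Z @ coef
    Sigma = resid.T @ resid / n                      # \hat\Sigma, (D, D)
    M = np.cov(Q, rowvar=False, bias=True)           # \hat M, centered Q moments
    r_vec = R_check.T.reshape(-1)                    # stack R_1, ..., R_D
    Tn = n * r_vec @ np.kron(np.linalg.inv(Sigma), M) @ r_vec
    df = D * K * (K + 1) // 2
    return Tn, Tn > chi2.ppf(1 - alpha_level, df)
```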


3. Asymptotic properties


Before stating the asymptotics formally, we first specify some notation. From now on, ∥·∥₂ denotes the Euclidean norm of real vectors. With a slight abuse of notation, we use ∥·∥ for the ordinary L²-norms on function spaces over different domains; for example, $\|f\|^2 = \int_S f(s)^2\,ds$ for $f \in L^2(S)$ and $\|g\|^2 = \int_S\int_T g(s,t)^2\,dt\,ds$ for $g \in L^2(S \times T)$. The notation $f \otimes h$, for functions f and h, represents the multivariate function satisfying $(f \otimes h)(x, y) = f(x)h(y)$ for all $x \in \mathcal X$ and $y \in \mathcal Y$, where $\mathcal X$ and $\mathcal Y$ are the domains of f and h. Besides conditions (C1)–(C4), we need a moment condition to help verify the consistency of $\hat r_{llk}$.

(C5) $E\xi_{ij}^4 \ge C\lambda_j^2$, $j \ge 1$, for some positive constant C > 1.

When the predictor Xi is Gaussian, all the ξij are normal, so (C5) is satisfied for any C ≤ 3. Conditions (C1)–(C5) are all common in functional data analysis. Let $d_n = \|\hat C_X - C_X\|$ and denote $\tilde K_n = \min\{j \ge 1 : \lambda_j \le 2d_n\} - 1$. It is clear that $d_n \to 0$ and thus $\tilde K_n \to \infty$ as $n \to \infty$. The following theorem shows the consistency of the proposed estimators.

Theorem 3.1. Let the truncations K = K(n) → ∞ and D = D(n) → ∞ satisfy K ≤ K̃n and

$$\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{m=1}^{K}\Bigl(\frac{1}{\lambda_m\delta_{X,m}} + \frac{1}{\delta_{Y,k}}\Bigr) \to 0. \qquad (15)$$

Then under conditions (C1)–(C5), we have ∥β̂ − β∥ = op(1). If we further assume that

$$\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{l=1}^{K}\Biggl(\sqrt{\frac{\sigma_k}{\lambda_l^3\delta_{X,l}}} + \sum_{m=1}^{l}\frac{1}{\lambda_m\lambda_l\delta_{X,l}}\Bigl(\frac{1}{\delta_{X,m}} + \frac{1}{\delta_{Y,k}}\Bigr)\Biggr) \to 0, \qquad (16)$$

then ∥γ̂ − γ∥ = op(1).

To illustrate the availability of assumption (15), we consider a common setting in FPCA.

Remark 2. Similar to formula (3.2) in Hall and Horowitz (2007), suppose that

$$\lambda_j - \lambda_{j+1} \ge C_1 j^{-\nu_1-1}, \qquad \sigma_j - \sigma_{j+1} \ge C_2 j^{-\nu_2-1}$$

for j ≥ 1 and ν₁, ν₂ > 1. This implies that $\lambda_j \ge C_3 j^{-\nu_1}$, $\delta_{X,j} \ge C_1 j^{-\nu_1-1}$ and $\delta_{Y,j} \ge C_2 j^{-\nu_2-1}$, which yield

$$\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{m=1}^{K}\Bigl(\frac{1}{\lambda_m\delta_{X,m}} + \frac{1}{\delta_{Y,k}}\Bigr) \le \frac{C_4}{\sqrt n}\sum_{k=1}^{D}\sum_{m=1}^{K} m^{\nu_1}\bigl(m^{\nu_1+1} + k^{\nu_2+1}\bigr).$$

Hence we can always find suitable K = K(n) → ∞ and D = D(n) → ∞ such that (15) holds; (16) can be verified similarly.

Our next theorem ensures the prediction consistency.

Theorem 3.2. Suppose that all conditions of Theorem 3.1 hold. As n → ∞, we have

$$\int_T \bigl(\hat Y^*(t) - E(Y^*(t)\,|\,X^*)\bigr)^2\,dt \xrightarrow{P} 0.$$

Finally, we investigate the asymptotic behavior of the test statistic Tn.


Theorem 3.3. Suppose conditions (C1)–(C5) are satisfied. For any pre-determined constants K and D we have

$$T_n \xrightarrow{d} \chi^2\bigl(DK(K+1)/2\bigr)$$

under H0, and $T_n \xrightarrow{P} \infty$ if $R \ne 0$, where $R = (R_1^T, \dots, R_D^T)^T$.

Remark 3. In Theorem 3.3, the asymptotic distribution of Tn depends on the pre-specified finite K and D. The proposed test is consistent against alternatives with R ≠ 0, that is, whenever the projection of γ onto the subspace spanned by $\{\psi_l \otimes \psi_m \otimes \varphi_k : 1 \le l, m \le K,\ 1 \le k \le D\}$ is not zero.

4. Simulations


In this section, we conduct simulations to evaluate the finite sample performance of the proposed methods. We set both S and T to [0, 1], and each random curve is "observed" on 100 equally spaced points in its domain. The functional predictor is generated as $X(s) = \sum_{k=1}^{35} \xi_k\psi_k(s)$, where the $\xi_k \sim N\bigl(0,\, 10/(2k-1)^2\bigr)$ independently, $\psi_{2m}(s) = \sqrt2\sin(2m\pi s)$ and $\psi_{2m+1}(s) = \sqrt2\cos(2m\pi s)$ for $m = 1, \dots, 17$, and $\psi_1 = 1$. The error process ϵ is generated from a mean-zero Gaussian process with covariance function $\Sigma_\epsilon(t, t') = \sigma^2\rho^{20|t-t'|}$, where σ² is the variance of ϵ and ρ reflects the auto-correlation. The response Y(t) is generated from model (2) with $\alpha(t) = t(1-t)$, $\beta(s,t) = \sqrt2\sin(4\pi t)$ and $\gamma(r,s,t) = \frac{5}{\sqrt2}\,t\bigl(\tfrac32 - t\bigr)\{\sin(2\pi r) + \sin(2\pi s)\}$.
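Under this setting the simulated data could be generated as in the following numpy sketch; the constants follow our reading of the (partly garbled) display above, and the closed-form reduction of the double integral uses the specific additive form of γ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 250, 100
s = np.linspace(0, 1, p)
t = np.linspace(0, 1, p)
sigma2, rho = 1.0, 0.1
w_s = np.gradient(s)                                # quadrature weights on S

# eigenbasis: psi_1 = 1, psi_{2m} = sqrt(2) sin(2m pi s), psi_{2m+1} = sqrt(2) cos(2m pi s)
psi = np.ones((35, p))
for m in range(1, 18):
    psi[2 * m - 1] = np.sqrt(2) * np.sin(2 * m * np.pi * s)
    psi[2 * m] = np.sqrt(2) * np.cos(2 * m * np.pi * s)

sd = np.sqrt(10) / (2 * np.arange(1, 36) - 1)       # sd_k = sqrt(10)/(2k-1)
xi = rng.normal(size=(n, 35)) * sd
X = xi @ psi                                        # predictor curves, (n, p)

alpha_t = t * (1 - t)
beta_t = np.sqrt(2) * np.sin(4 * np.pi * t)         # beta(s,t) is constant in s
g_t = 5 / np.sqrt(2) * t * (1.5 - t)                # t-part of gamma

I0 = X @ w_s                                        # ∫ X(s) ds per curve
I1 = (X * np.sin(2 * np.pi * s)) @ w_s              # ∫ X(s) sin(2 pi s) ds
# ∫∫ X(r)X(s){sin(2 pi r) + sin(2 pi s)} dr ds = 2 * I0 * I1

cov_eps = sigma2 * rho ** (20 * np.abs(t[:, None] - t[None, :]))
eps = rng.multivariate_normal(np.zeros(p), cov_eps, size=n)

Y = alpha_t + I0[:, None] * beta_t + (2 * I0 * I1)[:, None] * g_t + eps
```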

In each Monte Carlo trial, we choose K and D through V-fold cross-validation (CV). Typical choices of V are 5 or 10 (see Friedman et al., 2001), and the case V = n is the well-known leave-one-out CV. Further simulations show analogous performance of these criteria for estimation, prediction and testing; among them, 5-fold CV provides the smallest standard deviation of prediction errors and has the lowest computational cost, so we use 5-fold CV throughout the simulation studies.


4.1. Simulations for model fit

14 15 16 17

20

We investigate the performance of the estimates (9) by calculating the relative sums of estimation errors (RSEE)

$$\frac{\int\!\!\int\bigl(\hat\beta(s,t)-\beta(s,t)\bigr)^2\,ds\,dt}{\int\!\!\int\beta(s,t)^2\,ds\,dt}, \qquad \frac{\int\!\!\int\!\!\int\bigl(\hat\gamma(r,s,t)-\gamma(r,s,t)\bigr)^2\,dr\,ds\,dt}{\int\!\!\int\!\!\int\gamma(r,s,t)^2\,dr\,ds\,dt}.$$

To assess prediction accuracy, we use a relative median square prediction error similar to Yao et al. (2005). Suppose that we have a sequence of testing samples $\{Y_j^*(t), X_j^*(s),\ j = 1, 2, \dots, J\}$; the relative median square prediction error (RMSPE) is defined by

$$\mathrm{Median}_{1\le j\le J}\ \frac{\int\bigl\{\hat Y_j^*(t)-E(Y_j^*(t)\,|\,X_j^*)\bigr\}^2\,dt}{\int\bigl\{E(Y_j^*(t)\,|\,X_j^*)\bigr\}^2\,dt},$$

where $\hat Y_j^*$ is obtained from Eq. (10) and

$$E(Y_j^*(t)\,|\,X_j^*) = \alpha(t) + \int_S X_j^{*c}(s)\beta(s,t)\,ds + \int_S\!\int_S X_j^{*c}(r)X_j^{*c}(s)\gamma(r,s,t)\,dr\,ds.$$
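For reference, the RMSPE criterion could be computed with a small helper such as the following sketch (the names and the gradient-based quadrature are our own assumptions):

```python
import numpy as np

def rmspe(Y_hat, Y_cond_mean, grid_t):
    """Relative median square prediction error over J test curves (rows)."""
    w_t = np.gradient(grid_t)
    num = ((Y_hat - Y_cond_mean) ** 2) @ w_t   # ∫ (Yhat - E[Y|X])^2 dt per curve
    den = (Y_cond_mean ** 2) @ w_t             # ∫ E[Y|X]^2 dt per curve
    return np.median(num / den)
```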


In Table 1, quantiles of the RSEEs over 2000 replications with ρ = 0.1 and 0.5 support the consistency of the proposed estimates; a relatively large sample size n is needed to achieve the same estimation accuracy when the noise level σ² is high. To study prediction performance, in each replicate we generate J = 500 new samples from the same distribution and calculate RMSPEs. The proposed method is compared with two alternatives: the nonparametric method of Ferraty et al. (2012), and the classical linear principal component regression method for fitting the linear model (1) (see, for example, He et al., 2000). The medians and standard deviations of the RMSPEs over 2000 replications are reported in Table 2. As expected, our method always performs better than the general nonparametric method and the classical linear method under the quadratic model setting. Prediction accuracy improves and the fluctuation of prediction errors decreases as the sample size n grows; RMSPEs become larger when the noise level σ² increases, and a large correlation level ρ tends to reduce prediction accuracy when the noise is strong. We additionally study the performance of the three methods when the true model is the linear one (1): letting γ = 0 and keeping the other settings as above, the results in Table 3 reveal that there is little loss even if the proposed quadratic method is employed for a simple linear regression model.


4.2. Empirical size and power

27 28 29 30 31 32 33 34 35 36 37 38 39

42 43 44 45 46 47

In this subsection we investigate empirical size and power of test method proposed in Section 2.3. The model settings are all identical to the previous section except that γ (r , s, t) = √C t( 32 − t){sin(2π r) + sin(2π s)}. C is a constant representing 2 signal strength. In particular, H0 is true when C = 0. Results of 2000 repetitions are reported in Table 4. As shown in Table 4, the empirical sizes get close to the nominal sizes when sample size increases and the empirical power increases to 1 as C grows. Large sample size can alleviate over-rejection when C = 0 and improves power simultaneously. Please cite this article as: Y. Sun and Q. Wang, Function-on-function quadratic regression models. Computational Statistics and Data Analysis (2019) 106814, https://doi.org/10.1016/j.csda.2019.106814.

COMSTA: 106814

Y. Sun and Q. Wang / Computational Statistics and Data Analysis xxx (xxxx) xxx

7

Table 1 25%, 50% and 75% percentiles of RSEEs.

ρ

σ

0.1

0.1

1

10

0.5

β

n

0.1

1

10

γ (×10−2 )

25%

50%

75%

25%

50%

75%

100 250 500 100 250 500 100 250 500

0.108 0.041 0.023 0.108 0.052 0.028 0.230 0.087 0.043

0.310 0.116 0.065 0.273 0.126 0.067 0.491 0.187 0.092

0.761 0.291 0.149 0.714 0.283 0.153 1.043 0.363 0.190

0.051 0.019 0.010 0.303 0.127 0.043 0.922 0.346 0.153

0.213 0.064 0.029 1.235 0.458 0.188 5.201 2.159 1.007

0.569 0.158 0.073 3.842 1.255 0.595 21.028 7.816 4.116

100 250 500 100 250 500 100 250 500

0.108 0.044 0.022 0.139 0.052 0.027 0.240 0.086 0.044

0.305 0.132 0.060 0.326 0.133 0.062 0.527 0.194 0.104

0.697 0.297 0.152 0.770 0.296 0.148 1.053 0.394 0.203

0.079 0.019 0.010 0.263 0.099 0.038 1.046 0.424 0.204

0.253 0.068 0.033 1.160 0.442 0.198 6.001 2.571 1.204

0.666 0.194 0.088 4.117 1.510 0.677 29.868 10.662 5.277

Table 2. Medians of RMSPEs (with standard deviations in parentheses) under the quadratic model.

| ρ | σ² | n | Quadratic fit | Nonparametric | Linear fit |
|---|----|-----|---------------|---------------|------------|
| 0.1 | 0.1 | 100 | 0.033 (0.048) | 0.119 (0.021) | 0.771 (0.159) |
| | | 250 | 0.013 (0.019) | 0.066 (0.008) | 0.725 (0.083) |
| | | 500 | 0.007 (0.009) | 0.045 (0.005) | 0.704 (0.055) |
| | 1 | 100 | 0.035 (0.045) | 0.162 (0.024) | 0.777 (0.154) |
| | | 250 | 0.014 (0.019) | 0.102 (0.012) | 0.724 (0.079) |
| | | 500 | 0.007 (0.010) | 0.075 (0.009) | 0.701 (0.056) |
| | 10 | 100 | 0.057 (0.051) | 0.348 (0.053) | 0.793 (0.164) |
| | | 250 | 0.022 (0.020) | 0.253 (0.050) | 0.728 (0.084) |
| | | 500 | 0.011 (0.010) | 0.169 (0.050) | 0.713 (0.061) |
| 0.5 | 0.1 | 100 | 0.031 (0.048) | 0.118 (0.019) | 0.773 (0.166) |
| | | 250 | 0.013 (0.017) | 0.065 (0.008) | 0.721 (0.079) |
| | | 500 | 0.007 (0.010) | 0.043 (0.005) | 0.706 (0.056) |
| | 1 | 100 | 0.034 (0.046) | 0.156 (0.025) | 0.768 (0.167) |
| | | 250 | 0.014 (0.018) | 0.097 (0.012) | 0.723 (0.083) |
| | | 500 | 0.007 (0.010) | 0.073 (0.008) | 0.709 (0.059) |
| | 10 | 100 | 0.059 (0.050) | 0.329 (0.055) | 0.788 (0.165) |
| | | 250 | 0.024 (0.019) | 0.234 (0.046) | 0.730 (0.092) |
| | | 500 | 0.013 (0.009) | 0.158 (0.049) | 0.708 (0.056) |

Table 3. Medians of RMSPEs (with standard deviations in parentheses) under the linear model.

| ρ | σ² | n | Quadratic fit | Nonparametric | Linear fit |
|---|----|-----|---------------|---------------|------------|
| 0.1 | 0.1 | 100 | 0.011 (0.030) | 0.017 (0.003) | 0.010 (0.030) |
| | | 250 | 0.005 (0.013) | 0.010 (0.002) | 0.004 (0.013) |
| | | 500 | 0.002 (0.007) | 0.006 (0.001) | 0.002 (0.006) |
| | 1 | 100 | 0.016 (0.037) | 0.043 (0.011) | 0.013 (0.037) |
| | | 250 | 0.007 (0.014) | 0.024 (0.005) | 0.006 (0.014) |
| | | 500 | 0.004 (0.008) | 0.017 (0.003) | 0.004 (0.008) |
| | 10 | 100 | 0.063 (0.066) | 0.138 (0.037) | 0.050 (0.035) |
| | | 250 | 0.033 (0.026) | 0.081 (0.017) | 0.024 (0.016) |
| | | 500 | 0.016 (0.018) | 0.053 (0.013) | 0.012 (0.010) |
| 0.5 | 0.1 | 100 | 0.012 (0.031) | 0.017 (0.003) | 0.012 (0.030) |
| | | 250 | 0.004 (0.013) | 0.010 (0.002) | 0.004 (0.012) |
| | | 500 | 0.002 (0.007) | 0.006 (0.001) | 0.002 (0.007) |
| | 1 | 100 | 0.017 (0.031) | 0.042 (0.011) | 0.014 (0.030) |
| | | 250 | 0.008 (0.013) | 0.024 (0.005) | 0.006 (0.013) |
| | | 500 | 0.004 (0.007) | 0.016 (0.003) | 0.003 (0.007) |
| | 10 | 100 | 0.076 (0.061) | 0.130 (0.048) | 0.057 (0.039) |
| | | 250 | 0.043 (0.040) | 0.078 (0.022) | 0.029 (0.022) |
| | | 500 | 0.030 (0.020) | 0.051 (0.015) | 0.015 (0.010) |

Table 4. Empirical sizes and powers.

| Nominal size | n | C = 0 | C = 0.04 | C = 0.06 | C = 0.08 | C = 0.1 | C = 0.2 |
|--------------|-----|-------|----------|----------|----------|---------|---------|
| 0.05 | 100 | 0.077 | 0.141 | 0.232 | 0.346 | 0.492 | 0.967 |
| | 250 | 0.080 | 0.225 | 0.469 | 0.729 | 0.920 | 1 |
| | 500 | 0.058 | 0.319 | 0.680 | 0.926 | 0.989 | 1 |
| 0.01 | 100 | 0.021 | 0.051 | 0.094 | 0.183 | 0.309 | 0.933 |
| | 250 | 0.020 | 0.089 | 0.255 | 0.523 | 0.802 | 1 |
| | 500 | 0.009 | 0.147 | 0.459 | 0.828 | 0.969 | 1 |

Fig. 1. Smoothed data of 376 pairs of CCA curves and RCST curves.


5. Application


The proposed method is applied to white matter tract profiles from a brain tractography study. The data, available in the R package "refund", were originally collected at Johns Hopkins University and the Kennedy Krieger Institute. The investigators used diffusion tensor imaging (DTI) tractography, a magnetic resonance imaging (MRI) technique, to measure the diffusivity of water along white matter tracts. The diffusivity, quantified by fractional anisotropy (FA), is meaningful for the study of brain diseases such as multiple sclerosis; see Goldsmith et al. (2011) and the references therein. The data include FA values at different locations of two white matter tracts, the corpus callosum (CCA) and the right corticospinal tract (RCST). There are 382 pairs of FA curves of CCA and RCST, observed at 93 and 43 design points respectively. We are interested in the spatial relationship between the two functional data objects, and construct a function-on-function quadratic regression model with CCA as the predictor and RCST as the response. After removing 6 pairs of samples with missing values, the remaining 376 pairs of curves on the normalized domain [0, 10] are displayed in Fig. 1. Initial functional principal component analyses and the estimated mean functions of CCA and RCST are visualized in Fig. 2.

We randomly split the data into 300 training samples and 76 test samples and calculate the RMSPE defined in Section 4.1. Replicating this procedure 100 times, we compare the prediction performance of the quadratic model and the linear model (1) in Table 5. The prediction errors of the quadratic model are smaller than those of the linear model for almost all choices of the truncation parameters. The RMSPE of the nonparametric method is 8.16 × 10⁻³, which is comparable to our quadratic fit. We also note that the P-values $P(\chi^2_{K(K+1)D/2} > T_{376})$ are always sufficiently small under varying settings of K and D, which again indicates the significance of the quadratic term. The estimated α̂, β̂ and γ̂, with truncation parameters chosen by 5-fold CV, are exhibited in Fig. 3. The intercept α̂ on the top left keeps the main trend of the RCST curves; the image of β̂ on the top right reveals that large (>7) or small (<2) values of s have a relatively greater impact on the response; and the two cross-section images of γ̂ show that CCA might have a strong positive quadratic effect on RCST at t = 3.3 and a relatively weaker effect at t = 6.9.


Acknowledgments

Wang's research was supported by the National Natural Science Foundation of China (General Program 11871460, Key Program 11331011 and the Program for Creative Research Groups in China, 61621003) and by a grant from the Key Lab of Random Complex Structures and Data Science, CAS.


Fig. 2. Functional principal component analysis of both CCA and RCST. Top row: cumulative percentage of variance explained for the first 8 eigenvalues of CCA and the first 13 of RCST. Bottom row: estimated mean functions of CCA and RCST.

Table 5. RMSPEs (×10⁻³) of quadratic models (and linear models, in parentheses) under varying truncation parameters.

| D | K = 4 | K = 5 | K = 6 | K = 7 | K = 8 |
|----|-------|-------|-------|-------|-------|
| 6 | 8.88 (9.16) | 9.76 (10.23) | 9.25 (9.55) | 8.68 (8.46) | 8.99 (9.43) |
| 7 | 8.96 (8.62) | 8.06 (8.23) | 9.91 (10.01) | 9.17 (9.27) | 8.22 (8.76) |
| 8 | 9.10 (9.48) | 8.29 (8.17) | 9.02 (9.26) | 9.01 (9.89) | 9.74 (9.39) |
| 9 | 9.27 (9.43) | 8.40 (8.26) | 9.17 (9.23) | 9.10 (10.27) | 9.16 (10.18) |
| 10 | 9.02 (11.4) | 8.62 (9.36) | 8.88 (9.33) | 9.38 (9.79) | 8.67 (9.01) |
| 11 | 8.12 (8.91) | 8.60 (8.63) | 7.19 (8.15) | 8.36 (9.71) | 9.08 (9.03) |
| 12 | 8.49 (8.68) | 8.35 (8.61) | 7.88 (8.48) | 8.12 (9.53) | 9.02 (9.44) |
| 13 | 9.06 (9.21) | 8.19 (8.19) | 9.15 (9.82) | 8.41 (9.08) | 9.24 (8.84) |


Fig. 3. Parameter estimates. Top left: α̂(·). Top right: β̂(·,·). Bottom left: cross-section function γ̂(·,·, t = 3.3). Bottom right: cross-section function γ̂(·,·, t = 6.9).


Appendix A

Appendix A contains the proofs of the theoretical results stated in Section 3. Note that the constant C may vary from line to line and never depends on K or D.

Proof of Theorem 3.1. We only show the result for ∥β − β̂∥, since the derivation for ∥γ − γ̂∥ is analogous. Definitions (3) and (9) yield

$$\|\beta-\hat\beta\|^2 = \Bigl\|\sum_{k=1}^{\infty}\sum_{m=1}^{\infty} b_{mk}\,\psi_m\otimes\varphi_k - \sum_{k=1}^{D}\sum_{m=1}^{K} \hat b_{mk}\,\hat\psi_m\otimes\hat\varphi_k\Bigr\|^2$$
$$= \Bigl\|\sum_{k=1}^{D}\sum_{m=1}^{K} b_{mk}(\psi_m\otimes\varphi_k - \hat\psi_m\otimes\hat\varphi_k) + \sum_{k=1}^{D}\sum_{m=1}^{K} (b_{mk}-\hat b_{mk})\,\hat\psi_m\otimes\hat\varphi_k + \sum_{k=1}^{D}\sum_{m>K} b_{mk}\,\psi_m\otimes\varphi_k + \sum_{k>D}\sum_{m=1}^{\infty} b_{mk}\,\psi_m\otimes\varphi_k\Bigr\|^2$$
$$\le 4\Bigl\|\sum_{k=1}^{D}\sum_{m=1}^{K} b_{mk}(\psi_m\otimes\varphi_k - \hat\psi_m\otimes\hat\varphi_k)\Bigr\|^2 + 4\sum_{k=1}^{D}\sum_{m=1}^{K}(b_{mk}-\hat b_{mk})^2 + 4\sum_{k=1}^{D}\sum_{m>K} b_{mk}^2 + 4\sum_{k>D}\sum_{m=1}^{\infty} b_{mk}^2$$
$$:= 4I_1 + 4I_2 + 4R_\beta(K,D)^2,$$

where $R_\beta(K,D) = \bigl(\sum_{k=1}^{D}\sum_{m>K} b_{mk}^2 + \sum_{k>D}\sum_{m=1}^{\infty} b_{mk}^2\bigr)^{1/2} \to 0$ as $K, D \to \infty$, since $\|\beta\|^2 < \infty$. By the Cauchy–Schwarz inequality and the results in Lemma 1, we have

$$I_1 = \int_T\!\int_S\Bigl\{\sum_{k=1}^{D}\sum_{m=1}^{K} b_{mk}\bigl(\psi_m(s)\varphi_k(t)-\hat\psi_m(s)\hat\varphi_k(t)\bigr)\Bigr\}^2 ds\,dt$$
$$\le \Bigl(\sum_{k=1}^{D}\sum_{m=1}^{K} b_{mk}^2\Bigr)\sum_{k=1}^{D}\sum_{m=1}^{K}\int_T\!\int_S\bigl(\psi_m(s)\varphi_k(t)-\hat\psi_m(s)\hat\varphi_k(t)\bigr)^2 ds\,dt \le \|\beta\|^2\sum_{k=1}^{D}\sum_{m=1}^{K}\|\psi_m\otimes\varphi_k-\hat\psi_m\otimes\hat\varphi_k\|^2.$$

Lemma 1 and (C3) then lead to $I_1 = O_p\bigl(\frac{1}{n}\sum_{k=1}^{D}\sum_{m=1}^{K}(\delta_{X,m}^{-2}+\delta_{Y,k}^{-2})\bigr)$. According to Lemma 2, for any $1 \le m < l \le K$ and $1 \le k \le D$,

$$|\hat b_{mk}-b_{mk}| \le \frac{C}{\sqrt n}\Bigl(\frac{1}{\lambda_m\delta_{X,m}}+\frac{1}{\delta_{Y,k}}\Bigr),$$

which yields $I_2 = O_p\bigl(\sum_{k=1}^{D}\sum_{m=1}^{K}\frac{1}{n\lambda_m^2}(\frac{1}{\delta_{X,m}^2}+\frac{1}{\delta_{Y,k}^2})\bigr)$. Hence we arrive at

$$\|\hat\beta-\beta\| = O_p\Bigl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{m=1}^{K}\Bigl(\frac{1}{\lambda_m\delta_{X,m}}+\frac{1}{\delta_{Y,k}}\Bigr) + R_\beta(K,D)\Bigr).$$

The proof is then completed by assumption (15) in Theorem 3.1. □

Proof of Theorem 3.2. According to the original model (2) and the prediction (10) of Section 2.2, we have

$$\|\hat Y^*-E(Y^*|X^*)\| \le \Bigl\|\hat\mu_Y+\sum_{k=1}^{D}\hat\alpha_k\hat\varphi_k-\alpha\Bigr\| + \Bigl\|\sum_{k=1}^{D}\Bigl(\sum_{m=1}^{K}\hat b_{mk}\hat\xi_m^*\Bigr)\hat\varphi_k-\sum_{k=1}^{\infty}\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\varphi_k\Bigr\|$$
$$\quad+\Bigl\|\sum_{k=1}^{D}\Bigl(\sum_{l=1}^{K}\sum_{m=1}^{l}\hat r_{lmk}\hat\xi_l^*\hat\xi_m^*\Bigr)\hat\varphi_k-\sum_{k=1}^{\infty}\Bigl(\sum_{l=1}^{\infty}\sum_{m=1}^{\infty}\tilde r_{lmk}\xi_l^*\xi_m^*\Bigr)\varphi_k\Bigr\| := B_1+B_2+B_3,$$

where $\hat\mu_Y=\frac1n\sum_{i=1}^n Y_i$ and $\hat\xi_m^*=\int_S X^*(s)\hat\psi_m(s)\,ds$. Recalling the definition of $\alpha_k$, we know that $\alpha(t)=\mu_Y(t)+\sum_{k\ge1}\alpha_k\varphi_k(t)$. Then

$$B_1=\Bigl\|\hat\mu_Y+\sum_{k=1}^{D}\hat\alpha_k\hat\varphi_k-\mu_Y-\sum_{k\ge1}\alpha_k\varphi_k\Bigr\| \le \|\hat\mu_Y-\mu_Y\|+\Bigl\|\sum_{k=1}^{D}\hat\varphi_k(\hat\alpha_k-\alpha_k)\Bigr\|+\Bigl\|\sum_{k=1}^{D}(\hat\varphi_k-\varphi_k)\alpha_k\Bigr\|+\Bigl\|\sum_{k>D}\varphi_k\alpha_k\Bigr\|$$
$$:=B_{1,1}+B_{1,2}+B_{1,3}+B_{1,4}.$$

It is obvious that $B_{1,1}=O_p(n^{-1/2})$ and $B_{1,4}^2=\sum_{k>D}\alpha_k^2\to0$ as $D\to\infty$. For $B_{1,3}$, by a procedure similar to the proof of Theorem 3.1, we obtain

$$B_{1,3}^2=\int_T\Bigl(\sum_{k=1}^{D}\bigl(\hat\varphi_k(t)-\varphi_k(t)\bigr)\alpha_k\Bigr)^2 dt \le \Bigl(\sum_{k=1}^{D}\alpha_k^2\Bigr)\sum_{k=1}^{D}\|\hat\varphi_k-\varphi_k\|^2,$$

which indicates that $|B_{1,3}|=O_p\bigl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\frac{1}{\delta_{Y,k}}\bigr)$. Furthermore, Lemma 3 in the supplementary material yields that

$$|B_{1,2}|^2=\sum_{k=1}^{D}(\hat\alpha_k-\alpha_k)^2 = O_p\Biggl(\Bigl\{\frac1n\sum_{k=1}^{D}\sum_{l=1}^{K}\frac{1}{\lambda_l\delta_{X,l}}\Bigl(\frac{1}{\delta_{X,l}}+\frac{\sigma_k}{\lambda_l}+\frac{1}{\delta_{Y,k}}\Bigr)\Bigr\}^2+\Bigl(\sum_{l>K}\lambda_l\Bigr)\Bigl(\sum_{k=1}^{D}\sum_{l>K}r_{llk}^2\Bigr)\Biggr).$$

Therefore we have

$$B_1=O_p\Biggl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{l=1}^{K}\frac{1}{\lambda_l\delta_{X,l}}\Bigl(\frac{1}{\delta_{X,l}}+\frac{\sigma_k}{\lambda_l}+\frac{1}{\delta_{Y,k}}\Bigr)+R_{\alpha,\gamma}(K,D)\Biggr), \qquad (17)$$

where $R_{\alpha,\gamma}(K,D)=\{(\sum_{l>K}\lambda_l)(\sum_{k=1}^{D}\sum_{l>K}r_{llk}^2)\}^{1/2}+(\sum_{k>D}\alpha_k^2)^{1/2}\to0$. By some algebra, it can be shown that

$$B_2^2 \le 3\Bigl\|\sum_{k=1}^{D}\hat\varphi_k\Bigl(\sum_{m=1}^{K}\hat b_{mk}\hat\xi_m^*-\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\Bigr\|^2+3\Bigl\|\sum_{k=1}^{D}(\hat\varphi_k-\varphi_k)\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\Bigr\|^2+3\Bigl\|\sum_{k>D}\varphi_k\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\Bigr\|^2$$
$$\le 3\sum_{k=1}^{D}\Bigl(\sum_{m=1}^{K}\hat b_{mk}\hat\xi_m^*-\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)^2+3\Bigl\|\sum_{k=1}^{D}(\hat\varphi_k-\varphi_k)\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\Bigr\|^2+3\sum_{k>D}\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)^2$$
$$\le 9\sum_{k=1}^{D}\Bigl(\sum_{m=1}^{K}(\hat b_{mk}-b_{mk})\hat\xi_m^*\Bigr)^2+9\sum_{k=1}^{D}\Bigl(\sum_{m=1}^{K}b_{mk}(\hat\xi_m^*-\xi_m^*)\Bigr)^2+9\sum_{k=1}^{D}\Bigl(\sum_{m>K}b_{mk}\xi_m^*\Bigr)^2$$
$$\quad+3\Bigl\|\sum_{k=1}^{D}(\hat\varphi_k-\varphi_k)\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)\Bigr\|^2+3\sum_{k>D}\Bigl(\sum_{m=1}^{\infty}b_{mk}\xi_m^*\Bigr)^2 := 9B_{2,1}+9B_{2,2}+9B_{2,3}+3B_{2,4}+3B_{2,5}.$$

We verify in Lemma 4 of the supplementary material that

$$B_{2,1}=O_p\Bigl(\sum_{k=1}^{D}\sum_{m=1}^{K}\frac{1}{n\lambda_m^2}\Bigl(\frac{1}{\delta_{X,m}^2}+\frac{1}{\delta_{Y,k}^2}\Bigr)\Bigr),\qquad B_{2,2}=O_p\Bigl(\sum_{m=1}^{K}\frac{1}{n\delta_{X,m}^2}\Bigr),\qquad B_{2,4}=O_p\Bigl(\sum_{k=1}^{D}\frac{1}{n\delta_{Y,k}^2}\Bigr),$$
$$B_{2,3}=O_p\Bigl(\Bigl(\sum_{m>K}\lambda_m\Bigr)\Bigl(\sum_{k=1}^{D}\sum_{m>K}b_{mk}^2\Bigr)\Bigr),\qquad B_{2,5}=O_p\Bigl(\sum_{k>D}\sum_{m\ge1}b_{mk}^2\Bigr).$$

In summary, we obtain

$$B_2=O_p\Bigl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{m=1}^{K}\Bigl(\frac{1}{\lambda_m\delta_{X,m}}+\frac{1}{\delta_{Y,k}}\Bigr)+R_\beta(K,D)\Bigr). \qquad (18)$$

Similarly, we have

$$B_3=O_p\Biggl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{l=1}^{K}\Biggl(\sqrt{\frac{\sigma_k}{\lambda_l^3\delta_{X,l}}}+\sum_{m=1}^{l}\frac{1}{\lambda_m\lambda_l\delta_{X,l}}\Bigl(\frac{1}{\delta_{X,m}}+\frac{1}{\delta_{Y,k}}\Bigr)\Biggr)+R_\gamma(K,D)\Biggr), \qquad (19)$$

where

$$R_\gamma(K,D)=\Bigl(\sum_{k=1}^{D}\sum_{l>K}\sum_{m=1}^{\infty}\tilde r_{lmk}^2+\sum_{k=1}^{D}\sum_{l=1}^{K}\sum_{m>K}\tilde r_{lmk}^2+\sum_{k>D}\sum_{l=1}^{\infty}\sum_{m=1}^{\infty}\tilde r_{lmk}^2\Bigr)^{1/2}\to0.$$

Combining (17), (18) and (19) yields

$$\|\hat Y^*-E(Y^*|X^*)\|=O_p\Biggl(\frac{1}{\sqrt n}\sum_{k=1}^{D}\sum_{l=1}^{K}\Biggl(\sqrt{\frac{\sigma_k}{\lambda_l^3\delta_{X,l}}}+\sum_{m=1}^{l}\frac{1}{\lambda_m\lambda_l\delta_{X,l}}\Bigl(\frac{1}{\delta_{X,m}}+\frac{1}{\delta_{Y,k}}\Bigr)\Biggr)+R_\alpha(K,D)+R_\beta(K,D)+R_\gamma(K,D)\Biggr),$$

where $R_\alpha(K,D)=(\sum_{k>D}\alpha_k^2)^{1/2}$. Therefore, given condition (16) of Theorem 3.1, we obtain $\|\hat Y^*-E(Y^*|X^*)\|=o_p(1)$. □

Proof of Theorem 3.3. We now investigate the asymptotic property of $\check R$ with fixed K and D. Based on (6), $\hat H_k$ can be written as

$$\hat H_k = \hat Z\,(R_k^T, B_k^T, \alpha_k)^T + \epsilon_k^*, \qquad (20)$$

for $1 \le k \le D$, where $\epsilon_k^* = (\epsilon_{1k}^*, \epsilon_{2k}^*, \dots, \epsilon_{nk}^*)^T$ with

$$\epsilon_{ik}^* = \epsilon_{ik} + \sum_{m>K} b_{mk}\xi_{im} + \sum_{l>K}\sum_{m=1}^{l} r_{lmk}\xi_{il}\xi_{im} + \hat\eta_{ik} - \eta_{ik} + \sum_{m=1}^{K} b_{mk}(\xi_{im}-\hat\xi_{im}) + \sum_{l=1}^{K}\sum_{m=1}^{l} r_{lmk}(\xi_{il}\xi_{im}-\hat\xi_{il}\hat\xi_{im}). \qquad (21)$$

Combined with (12) we have

$$\begin{pmatrix}\check R_k\\ \check B_k\\ \check\alpha_k\end{pmatrix} = (\hat Z^T\hat Z)^{-1}\hat Z^T\Bigl(\hat Z\begin{pmatrix}R_k\\ B_k\\ \alpha_k\end{pmatrix}+\epsilon_k^*\Bigr) = \begin{pmatrix}R_k\\ B_k\\ \alpha_k\end{pmatrix} + (\hat Z^T\hat Z)^{-1}\hat Z^T\epsilon_k^*. \qquad (22)$$

Under the null hypothesis, $R_k = 0$. According to equations (4.2) and (4.3) in Horváth and Reeder (2013), we have

$$\sqrt n\,\check R_k = \bigl(E(Q_iQ_i^T)-E(Q_i)E(Q_i^T)\bigr)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^{n}\epsilon_{ik}^*\bigl(\hat Q_i-E(Q_i)\bigr) + o_p(1),$$

where $Q_i = (\xi_{i1}\xi_{i1}, \xi_{i2}\xi_{i1}, \dots, \xi_{iK}\xi_{iK})^T$ as defined in Section 2.3. For simplicity, we omit the random signs of the estimated eigenfunctions considered in Horváth and Reeder (2013), which turn out to have no effect on the test statistic. Based on Lemma 5, we further obtain

$$\sqrt n\,\check R_k = \bigl(E(Q_iQ_i^T)-E(Q_i)E(Q_i^T)\bigr)^{-1}\frac{1}{\sqrt n}\sum_{i=1}^{n}\tilde\epsilon_{ik}\bigl(Q_i-E(Q_i)\bigr) + o_p(1),$$

where $\tilde\epsilon_{ik} = \epsilon_{ik} + \sum_{m>K} b_{mk}\xi_{im}$ under H0. Therefore the Central Limit Theorem yields that

$$\sqrt n\,\check R = \frac{1}{\sqrt n}\sum_{i=1}^{n}\begin{pmatrix}\tilde\epsilon_{i1}\\ \vdots\\ \tilde\epsilon_{iD}\end{pmatrix}\otimes\Bigl[\bigl(E(Q_iQ_i^T)-E(Q_i)E(Q_i^T)\bigr)^{-1}\bigl(Q_i-E(Q_i)\bigr)\Bigr] + o_p(1)$$
$$\xrightarrow{d} N\Bigl(0,\ \Sigma\otimes\bigl(E(Q_iQ_i^T)-E(Q_i)E(Q_i^T)\bigr)^{-1}\Bigr), \qquad (23)$$

where $\Sigma = \mathrm{Var}(\tilde\epsilon_{i1}, \dots, \tilde\epsilon_{iD})^T$. By Lemma 6, the asymptotic variance can be consistently estimated by $\hat\Sigma\otimes\bigl(\frac1n\sum_{i=1}^n\hat Q_i\hat Q_i^T-(\frac1n\sum_{i=1}^n\hat Q_i)(\frac1n\sum_{i=1}^n\hat Q_i)^T\bigr)$, where $\hat\Sigma = (\hat\sigma_{k,k'})_{1\le k,k'\le D}$ with $\hat\sigma_{k,k'} = \frac1n\hat\epsilon_k^T\hat\epsilon_{k'}$ and $\hat\epsilon_k = \hat H_k - \hat Z(\check R_k^T, \check B_k^T, \check\alpha_k)^T$. Thus we conclude that $T_n \xrightarrow{d} \chi^2(DK(K+1)/2)$.

When H0 is not true, (23) still holds with $\sqrt n\,\check R$ replaced by $\sqrt n(\check R - R)$, where now $\tilde\epsilon_{ik} = \epsilon_{ik} + \sum_{m>K} b_{mk}\xi_{im} + \sum_{l>K}\sum_{m=1}^{l} r_{lmk}\xi_{il}\xi_{im}$. It is consequently obvious that $T_n \xrightarrow{P} \infty$ if $R \ne 0$. □

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.csda.2019.106814. All fundamental lemmas mentioned in Appendix A are provided in Section 1 of the supplementary material, and Section 2 includes a brief discussion of the identifiability of model (2).

References


Benatia, David, Carrasco, Marine, Florens, Jean-Pierre, 2017. Functional linear regression with functional response. J. Econometrics 201 (2), 269–291.

Cardot, Hervé, Ferraty, Frédéric, Sarda, Pascal, 2003. Spline estimators for the functional linear model. Statist. Sinica 571–591.

Chiou, Jeng-Min, Müller, Hans-Georg, Wang, Jane-Ling, 2004. Functional response models. Statist. Sinica 675–693.

Crambes, Christophe, Mas, André, 2013. Asymptotics of prediction in functional linear regression with functional outputs. Bernoulli 19 (5B), 2627–2651.

Delsol, Laurent, 2013. No effect tests in regression on functional variable and some applications to spectrometric studies. Comput. Statist. 28 (4), 1775–1811.

Delsol, Laurent, Ferraty, Frédéric, Vieu, Philippe, 2011. Structural test in regression on functional variables. J. Multivariate Anal. 102 (3), 422–447.

Ferraty, Frédéric, Van Keilegom, Ingrid, Vieu, Philippe, 2012. Regression when both response and predictor are functions. J. Multivariate Anal. 109, 10–28.

Ferraty, Frédéric, Vieu, Philippe, 2006. Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media.

Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert, 2001. The Elements of Statistical Learning, Vol. 1. Springer Series in Statistics, New York.

Goldsmith, Jeff, Bobb, Jennifer, Crainiceanu, Ciprian M., Caffo, Brian, Reich, Daniel, 2011. Penalized functional regression. J. Comput. Graph. Statist. 20 (4), 830–851.

Hall, Peter, Horowitz, Joel L., 2007. Methodology and convergence rates for functional linear regression. Ann. Statist. 35 (1), 70–91.

He, Guozhong, Müller, Hans-Georg, Wang, Jane-Ling, 2000. Extending correlation and regression from multivariate to functional data. Asymptot. Statist. Probab. 197–210.

Horváth, Lajos, Kokoszka, Piotr, 2012. Inference for Functional Data with Applications, Vol. 200. Springer Science & Business Media.

Horváth, Lajos, Reeder, Ron, 2013. A test of significance in functional quadratic regression. Bernoulli 19 (5A), 2120–2151.

Hsing, Tailen, Eubank, Randall, 2015. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. John Wiley & Sons.

Ivanescu, Andrada E., Staicu, Ana-Maria, Scheipl, Fabian, Greven, Sonja, 2015. Penalized function-on-function regression. Comput. Statist. 30 (2), 539–568.

Kong, Dehan, Staicu, Ana-Maria, Maity, Arnab, 2016. Classical testing in functional linear models. J. Nonparametr. Stat. 28 (4), 813–838.

Lei, Jing, 2014. Adaptive global testing for functional linear models. J. Amer. Statist. Assoc. 109 (506), 624–634.

Lian, Heng, 2015. Minimax prediction for functional linear regression with functional responses in reproducing kernel Hilbert spaces. J. Multivariate Anal. 140, 395–402.

Luo, Ruiyan, Qi, Xin, 2017. Function-on-function linear regression by signal compression. J. Amer. Statist. Assoc. 112 (518), 690–705.

Matsui, Hidetoshi, 2017. Quadratic regression for functional response models. arXiv preprint arXiv:1702.02009.

Ramsay, James O., Dalzell, C.J., 1991. Some tools for functional data analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 539–572.

Ramsay, James O., Silverman, Bernard W., 2005. Functional Data Analysis, second ed. Springer.

Sun, Xiaoxiao, Du, Pang, Wang, Xiao, Ma, Ping, 2018. Optimal penalized function-on-function regression under a reproducing kernel Hilbert space framework. J. Amer. Statist. Assoc. 1–11.

Yang, Wenjing, Müller, Hans-Georg, Stadtmüller, Ulrich, 2011. Functional singular component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 (3), 303–324.

Yao, Fang, Müller, Hans-Georg, 2010. Functional quadratic regression. Biometrika 97 (1), 49–64.

Yao, Fang, Müller, Hans-Georg, Wang, Jane-Ling, 2005. Functional linear regression analysis for longitudinal data. Ann. Statist. 33 (6), 2873–2903.

Zhang, Jin-Ting, Chen, Jianwei, 2007. Statistical inferences for functional data. Ann. Statist. 35 (3), 1052–1079.