Journal of Statistical Planning and Inference 204 (2020) 116–127
Skew-normal partial functional linear model and homogeneity test

Yuping Hu a,b, Liugen Xue b, Jing Zhao c,∗, Liying Zhang a

a School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
b College of Applied Sciences, Beijing University of Technology, Beijing 100124, China
c China National Institute of Standardization, Beijing 100191, China
Article history: Received 9 August 2018; Received in revised form 1 May 2019; Accepted 6 May 2019; Available online 15 May 2019.
MSC: 62G08; 62G20
Abstract: This paper is concerned with statistical inference for skew-normal partial functional linear models, which give a useful extension of the normal functional regression model. Maximum likelihood estimation based on functional principal component analysis is proposed. Furthermore, a score test for homogeneity of the variance is discussed. Asymptotic properties of the proposed estimators and of the test are established, and finite-sample behavior is studied through a small simulation experiment. © 2019 Elsevier B.V. All rights reserved.
Keywords: Skew-normal; Partial functional linear models; Principal components; Score test; Homogeneity
1. Introduction

In the recent literature, there has been increasing interest in functional data analysis, with extensive applications in biometrics, chemometrics, econometrics, medical research and other fields. Functional data are intrinsically infinite-dimensional, and thus classical methods for multivariate observations are no longer applicable. There has been intensive methodological and theoretical development in functional data analysis; see Besse and Ramsay (1986), Rice and Silverman (1991), Ramsay and Silverman (2002), Horváth et al. (2013), Lian and Li (2014), Feng and Xue (2016), Huang et al. (2016) and Yu et al. (2017). It is also frequently the case that a response is related both to a vector of finite length and to a function-valued random variable as predictor variables; this is referred to as the partial functional linear model. Some research on partial functional linear models has been provided. For example, Zhang et al. (2007) discussed a partial functional linear model obtained by incorporating a parametric linear regression into functional linear regression models. Shin (2009) proposed new estimators and derived their asymptotic properties for the parameters of a partial functional linear model. Furthermore, Lu et al. (2014) extended the results of Shin (2009) to the quantile regression setting. Yu et al. (2016) investigated hypothesis testing of the parametric component in partial functional linear regression. However, the aforementioned articles have two significant limitations. First, they assume a single functional predictor, so they cannot be applied in situations involving p different functional predictors. Second, these articles mainly focus on estimation of functional linear regression models with errors having a symmetric distribution. However, in many fields such
∗ Corresponding author. E-mail address:
[email protected] (J. Zhao). https://doi.org/10.1016/j.jspi.2019.05.001 0378-3758/© 2019 Elsevier B.V. All rights reserved.
as economics, finance, biomedicine, environment, demography, and pharmacokinetics, error structures in regression models sometimes no longer satisfy the symmetry property; often there is high skewness and there are fat tails. Therefore it is of practical interest to develop more flexible approaches using a broader family of distributions. The first of these limitations can, in principle, be easily solved by using multiple functional predictors. The second limitation, that of the symmetry assumption, can be addressed using a skew-normal distribution, which generalizes the normal distribution through an additional shape parameter. There has been considerable work on non-normal distributions. Cancho et al. (2010) relaxed the normality assumption of the error in nonlinear regression models by using the skew-normal distribution introduced by Sahu et al. (2003). A generalized flexible skew-elliptical distribution for the random-effects density was adopted by Ma and Genton (2004). Xie et al. (2009) considered diagnostics for skew-normal nonlinear regression models with AR(1) errors. Wu et al. (2017) discussed linear mixed effects models with skew-normal errors and distribution-free random effects. Thus, in this paper, we consider partial functional linear regression models in which the assumption of a normal distribution is replaced by the weaker assumption of a skew-normal distribution. Various approaches are available for fitting functional data, such as functional principal component methods, spline methods, roughness penalties, and so on. The more traditional approach, masterfully documented in the monograph by Ramsay and Silverman (2005), typically starts by representing functional data by an expansion with respect to a certain basis, and subsequent inferences are carried out on the coefficients. The commonly used bases include the B-spline basis for non-periodic data and the Fourier basis for periodic data.
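To make the basis-expansion idea concrete, here is a minimal sketch (our illustration, not taken from the paper): a noisy periodic curve observed on an equally spaced grid is represented by least-squares coefficients in a Fourier basis. The grid size, basis dimension and test curve are arbitrary choices.

```python
import numpy as np

def fourier_basis(t, K):
    """First 2K+1 Fourier basis functions on [0, 1], evaluated at the grid t."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t))
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t))
    return np.column_stack(cols)  # shape (len(t), 2K+1)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
# A periodic curve observed with small measurement noise.
x = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t) + rng.normal(0.0, 0.05, t.size)

B = fourier_basis(t, K=5)
coef, *_ = np.linalg.lstsq(B, x, rcond=None)  # least-squares basis coefficients
x_smooth = B @ coef                           # reconstruction from 11 coefficients
```

For non-periodic data a B-spline design matrix would take the place of `fourier_basis`; the least-squares step is unchanged.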
Recently, methods based on functional principal component analysis (FPCA) have become popular and are widely used by researchers. FPCA allows a finite-dimensional analysis of a problem that is intrinsically infinite-dimensional. Dauxois et al. (1982) obtained the basic results on the asymptotic theory of FPCA when the functions are directly observable. Cardot et al. (1999) applied FPCA to estimate the slope function and derived convergence of the estimator in functional linear regression models with scalar response. Hall and Hosseini-nasab (2009) and Hall and Horowitz (2007) showed that this approach provides an orthogonal data-driven basis attaining optimal convergence rates in the functional linear regression model. In another direction of statistical research, overdispersion is a common concern when regression models are used to analyze data. Variance heterogeneity in a regression model makes statistical inference more difficult, so it is of great importance to develop a test for detecting a varying scale in the skew-normal functional linear regression model. This paper is organized as follows. In Section 2, we describe the FPCA approach to skew-normal partial functional linear models and investigate the asymptotic properties. Section 3 discusses the score test for homogeneity of variance in skew-normal partial functional linear models. Some simulation studies are given in Section 4, which include the asymptotic properties and powers of the score test. Section 5 demonstrates the effectiveness of the proposed method by an application to the Tecator data set. In Section 6, we conclude with a discussion. The technical proofs are collected in the Appendix.

2. Model and estimation

2.1. Model and FPCA

Definition 2.1.
A random variable Z is said to have a skew-normal distribution with location parameter µ, scale parameter σ² and skewness parameter λ, denoted Z ∼ SN(µ, σ², λ), if the density function of Z is given by

f_Z(z) = (2/σ) ϕ((z − µ)/σ) Φ(λ(z − µ)/σ),

where ϕ(·) denotes the standard normal density function and Φ(·) the standard normal distribution function. Note that SN(µ, σ², 0) is the normal distribution N(µ, σ²). Some important properties of the random variable Z, related to moment estimators, skewness and kurtosis indexes, among others, can be found in Bolfarine et al. (2007).

Let Yi be a real-valued random variable defined on a probability space (Ω, B, P) corresponding to the ith observation, Zi be a p-dimensional vector of random variables with finite second moments, and {Xij(t) : t ∈ T} be a zero-mean, second-order stochastic process defined on (Ω, B, P) with sample paths in L²(T), the set of all square-integrable functions on T with inner product ⟨x, y⟩ = ∫_T x(t)y(t)dt for all x, y ∈ L²(T). For simplicity, we suppose T = [0, 1]. The partial functional linear model with skew-normal errors is given by
Yi = ∑_{j=1}^d ∫_T βj(t)Xij(t)dt + Zi^T γ + εi,  i = 1, …, n,  (1)
where βj(t) is a square-integrable function on [0, 1] and εi ∼ SN(0, σ², λ) is independent of Zi and Xij. Model (1) generalizes both the classical linear regression model and the functional linear regression model, which correspond to the cases βj(t) = 0, j = 1, …, d, and γ = 0, respectively. Moreover, this model includes the analysis-of-covariance model for designed experiments where the covariate is a function-valued random element. Also, note that model (1) is a natural extension of the partial functional linear regression model given by Shin (2009), that is,
Yi = ∫_T β(t)Xi(t)dt + Zi^T γ + εi.

Without loss of generality, we assume that the response Yi, the functional predictors {Xij : j = 1, …, d} and the scalar covariates Zi have been centered to have mean zero. Denote the covariance function of the process Xj(t) by K_Xj. If K_Xj is continuous on T × T, Mercer's theorem implies that there exist a complete orthonormal basis {φjk} in L²(T) and a non-increasing sequence of non-negative eigenvalues ωjk such that K_Xj(s, t) = ∑_{k=1}^∞ ωjk φjk(s)φjk(t) with ∑_{k=1}^∞ ωjk < ∞ for each j. We further assume that there are no ties among the ordered eigenvalues, so that ωj1 > ωj2 > ⋯ > 0. The Karhunen–Loève expansion Xij(t) = ∑_{k=1}^∞ ξijk φjk(t) forms the foundation of functional principal component analysis, where the coefficients ξijk = ∫ Xij(t)φjk(t)dt are uncorrelated random variables with mean zero and variances E(ξijk²) = ωjk, also called the functional principal component scores. Expanded in the orthonormal eigenbasis {φjk}, the regression function can be written as βj(t) = ∑_{k=1}^∞ bjk φjk(t). Note that we expand the regression function in the basis determined by the covariance function. This is natural because {φjk} is the canonical basis leading to a generalized Fourier series that gives the most rapidly convergent representation of Xj(t) in L². Based on the above FPCA, model (1) can be written as

Yi = ∑_{j=1}^d ∑_{k=1}^∞ ξijk bjk + Zi^T γ + εi,  i = 1, …, n,  (2)
where ξijk = ⟨Xij(t), φjk(t)⟩ and bjk = ⟨βj(t), φjk(t)⟩. Due to the infinite dimensionality of the functional predictors, smoothing and regularization are necessary in estimation. We adopt the simple yet effective truncation approach, in the spirit of controlling smoothness as in classical nonparametric regression. The regression model can then be well approximated by

Yi ≈ ∑_{j=1}^d ∑_{k=1}^{snj} ξijk bjk + Zi^T γ + εi,  i = 1, …, n,  (3)
for sufficiently large snj. The approximate model (3) naturally suggests the idea of principal components regression (PCR). However, in practice the φjk are unknown and must be replaced by estimates in order to estimate γ and bjk, j = 1, …, d; k = 1, …, snj. For this purpose, we consider the empirical version of K_Xj(s, t), given by

K̂_Xj(s, t) = (1/n) ∑_{i=1}^n Xij(s)Xij(t) = ∑_{k=1}^∞ ω̂jk φ̂jk(s)φ̂jk(t),  j = 1, …, d,

where the (ω̂jk, φ̂jk) are the eigenvalue–eigenfunction pairs of the covariance operator associated with K̂_Xj and ω̂j1 ≥ ω̂j2 ≥ ⋯ ≥ 0. We take (ω̂jk, φ̂jk) as the estimator of (ωjk, φjk), j = 1, …, d; k = 1, …, snj. Replacing φjk by φ̂jk, model (3) can be written as

Yi ≈ ∑_{j=1}^d ∑_{k=1}^{snj} ξ̂ijk bjk + Zi^T γ + εi = N̂i^T b + Zi^T γ + εi,  i = 1, …, n,  (4)
where ξ̂ijk = ⟨Xij(t), φ̂jk(t)⟩, N̂i = (ξ̂i11, …, ξ̂i1sn1, …, ξ̂id1, …, ξ̂idsnd)^T and b = (b11, …, b1sn1, …, bd1, …, bdsnd)^T = (b1^T, …, bd^T)^T.

2.2. Maximum likelihood estimation

Based on Eq. (4) and the skew-normal errors, we have the log-likelihood (up to an additive constant)

l(η) = −(n/2) log(2πσ²) − (1/2) ∑_{i=1}^n (Yi − N̂i^T b − Zi^T γ)²/σ² + ∑_{i=1}^n log Φ( λ(Yi − N̂i^T b − Zi^T γ)/σ ),  (5)
where η = (γ^T, b^T, σ², λ)^T. From Eq. (5) it is easy to obtain the first two derivatives of l(η) with respect to η. The first-order derivatives, i.e., the score functions, are listed below. Denote the score function for η by S(η) = (Sγ^T, Sb^T, Sσ², Sλ)^T; then

Sγ = ∑_{i=1}^n (Yi − N̂i^T b − Zi^T γ)Zi/σ² − ∑_{i=1}^n [φ(mi)/Φ(mi)] λZi/σ,

Sb = ∑_{i=1}^n (Yi − N̂i^T b − Zi^T γ)N̂i/σ² − ∑_{i=1}^n [φ(mi)/Φ(mi)] λN̂i/σ,

Sσ² = −n/(2σ²) + ∑_{i=1}^n (Yi − N̂i^T b − Zi^T γ)²/(2σ⁴) − ∑_{i=1}^n [φ(mi)/Φ(mi)] mi/(2σ²),

Sλ = ∑_{i=1}^n [φ(mi)/Φ(mi)] mi/λ,

where mi = λ(Yi − N̂i^T b − Zi^T γ)/σ. The maximum likelihood estimator of the parameter η, denoted η̂ = (γ̂^T, b̂^T, σ̂², λ̂)^T, can then be obtained by the Newton–Raphson method. Consequently, the estimator of the coefficient function βj(t) can be defined by
β̂j(t) = ∑_{k=1}^{snj} b̂jk φ̂jk(t).
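The paper obtains η̂ by Newton–Raphson on the score equations above. As a hedged sketch under simplifying assumptions (the FPC score matrix N̂ is taken as given, σ² is log-parameterised to stay positive, and a generic quasi-Newton routine stands in for the Newton–Raphson iteration), the maximizer of the log-likelihood (5) can be found as follows; all variable names and the simulated check are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, skewnorm

def neg_loglik(params, N, Z, y):
    """Negative skew-normal log-likelihood (5), eta = (gamma, b, log sigma^2, lambda)."""
    p, s = Z.shape[1], N.shape[1]
    gamma, b = params[:p], params[p:p + s]
    sigma2 = np.exp(params[p + s])        # log-parameterisation keeps sigma^2 > 0
    lam = params[p + s + 1]
    r = y - N @ b - Z @ gamma
    sigma = np.sqrt(sigma2)
    ll = (np.log(2.0) - 0.5 * np.log(2.0 * np.pi * sigma2)
          - r ** 2 / (2.0 * sigma2) + norm.logcdf(lam * r / sigma))
    return -np.sum(ll)

# Simulated sanity check with a known truth.
rng = np.random.default_rng(1)
n = 2000
Z = rng.standard_normal((n, 2))
N = rng.standard_normal((n, 3))           # stands in for estimated FPC scores
gamma0, b0 = np.array([1.0, -0.5]), np.array([0.8, 0.3, -0.2])
eps = skewnorm.rvs(a=2.0, scale=1.0, size=n, random_state=rng)  # SN(0, 1, 2) errors
y = N @ b0 + Z @ gamma0 + eps

# Start from least squares; log sigma^2 = 0 and lambda = 1 as crude initial values.
X = np.hstack([Z, N])
start_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
x0 = np.concatenate([start_coef, [0.0, 1.0]])
fit = minimize(neg_loglik, x0, args=(N, Z, y), method="BFGS")
```

In practice N would contain the estimated scores ξ̂ijk from Section 2.1, and the Newton–Raphson iteration of the paper could equally be driven by the score functions Sγ, Sb, Sσ², Sλ listed above.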
To implement our estimation method, we need to choose snj. Here the different truncation parameters snj are tuned simultaneously by AIC. Let D = {1, …, d}; we minimize

AIC(snj : j ∈ D) = log RSS(snj : j ∈ D) + 2n⁻¹ ∑_{j∈D} snj  (6)

with respect to combinations of {snj : j ∈ D}, where

RSS(snj : j ∈ D) = ∑_{i=1}^n { Yi − ∑_{j∈D} ∑_{k=1}^{snj} ξ̂ijk b̂jk(snj) − ∑_{l=1}^p Zil γ̂l(snj) }²,
with b̂jk(snj) and γ̂l(snj) being the estimated values.

2.3. Main results

Denote the true values of (γ, b, σ², λ) by (γ0, b0, σ0², λ0). In order to establish asymptotic properties of the estimators, the following assumptions are required. In the remainder of the paper, c denotes a generic positive constant, which may take different values at different places.

Assumption 1. For j = 1, …, d and any c > 0 there exists an ϵ > 0 such that sup_{t∈T} E{|Xj(t)|^c} < ∞ and sup_{s,t∈T} E{|s − t|^{−ϵ} |Xj(s) − Xj(t)|^c} < ∞. For each integer r ≥ 1, ωjk^{−r} E(ξjk^{2r}) is bounded uniformly in k.

Assumption 2. For j = 1, …, d, Xj(t) is twice continuously differentiable on [0, 1] with probability 1, and ∫ E[Xj^{(2)}(t)]⁴ dt < ∞, where Xj^{(2)}(t) denotes the second derivative of Xj(t).

Assumption 3. For j = 1, …, d and k ≥ 1, there exists an a > 1 such that ωjk − ωj(k+1) ≥ ck^{−a−1}.

Assumption 4. For j = 1, …, d and k ≥ 1, there exists a g > a/2 + 1 such that |bjk0| ≤ ck^{−g}.

These decay conditions are needed only to control the tail behavior for large k, and so are not as restrictive as they appear. Without loss of generality, we use a common truncation parameter sn in the theoretical analysis. It is important to control sn appropriately.

Assumption 5. (sn^{2a+2} + sn^{a+4})/n = o(1).

Assumption 6. sn^{2g−1}/n → ∞ as n → ∞.
Assumption 7. The random vector Z has bounded fourth moments.

Assumption 8. lim_{n→∞} E(−(1/n) ∂²l(η0)/∂ρ∂ρ^T) = Iρ, where Iρ, with ρ = (γ^T, σ², λ)^T, is a positive definite matrix.

Assumption 1 consists of regularity conditions for functional data; for example, a Gaussian process with Hölder-continuous sample paths satisfies Assumption 1, see Hall and Hosseini-nasab (2009). Assumption 2 is standard for local linear smoothers. Assumption 3 implies that ωjk ≥ ck^{−a}. Assumption 5 prevents sn from being too large, due to
increasingly unstable functional principal component estimates. Assumption 6 keeps sn from being too small, so that the covariances between Z and {ξjk : k ≥ sn + 1} are asymptotically negligible. Assumptions 5 and 6 also entail g > max(a + 3/2, a/2 + 5/2) as a sufficient condition for such an sn to exist.

Theorem 1. Under Assumptions 1–8,

√n (ρ̂ − ρ0) →_D N(0, Iρ⁻¹),
where ρ = (γ^T, σ², λ)^T, ρ̂ = (γ̂^T, σ̂², λ̂)^T and "→_D" denotes convergence in distribution. This theorem shows that asymptotic normality of the estimated parameters is achieved when the functional predictors enter the response model parametrically (partial functional linear model).

3. Score test for homogeneity of variance

In this section we construct a score test for homogeneity of variance following the idea of Xu et al. (2015). In the common skew-normal partial functional linear model, the variance of Yi is [1 − 2λ²/((1 + λ²)π)]σ² and the scale parameter σ² is a constant. However, in many real cases the scale parameter may depend on the ith observation, that is, Var(Yi) = [1 − 2λ²/((1 + λ²)π)]σi². Detecting heteroscedasticity is important for statistical inference. The presence of heteroscedastic
residuals suggests potential efficiency improvements upon standard estimators and/or power gains for tests based on them. Therefore, it is necessary to test the homogeneity of the scale parameters. We assume that
σi² = σ² g(hi^T θ),  (7)

where θ is an unknown q-dimensional vector, hi is a vector of explanatory variables, g(·) is a strictly positive, twice differentiable function with g(0) = 1 and g′(0) ≠ 0, and σ² is a positive constant. Obviously, if θ = 0 then σi² = σ² and the Yi have a common variance. The choice of the explanatory variables hi is made according to domain knowledge or modeling convenience. The test of homogeneity of the variance can then be expressed as

H0: θ = 0  vs.  H1: θ ≠ 0.  (8)
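The variance expression above can be checked numerically, and the exponential g(·) used in the simulations of Section 4 illustrates specification (7). This is a hedged sketch using scipy's skew-normal (whose shape parameter plays the role of λ); the h values are made up for illustration, and the chi-square rule anticipates Theorem 2 below.

```python
import numpy as np
from scipy.stats import chi2, skewnorm

def sn_variance(sigma2, lam):
    """Var(Z) for Z ~ SN(mu, sigma^2, lambda): [1 - 2*lam^2/((1 + lam^2)*pi)] * sigma^2."""
    return (1.0 - 2.0 * lam ** 2 / ((1.0 + lam ** 2) * np.pi)) * sigma2

# The formula agrees with scipy's skew-normal parameterisation (a = lambda, scale = sigma).
for lam in (-1.0, 0.0, 0.5, 2.0):
    assert np.isclose(sn_variance(1.0, lam), skewnorm(a=lam, scale=1.0).var())

# Specification (7) with g = exp: sigma_i^2 = sigma^2 * exp(h_i * theta);
# theta = 0 recovers a common scale.
sigma2 = 0.5
h = np.array([0.3, -1.2, 0.8])            # illustrative scalar h_i values
sigma_i2_null = sigma2 * np.exp(h * 0.0)
assert np.allclose(sigma_i2_null, sigma2)

# By Theorem 2 the score statistic is referred to a chi-square law with
# q = dim(theta) degrees of freedom: reject H0 when T_sc > chi2.ppf(1 - alpha, q).
crit = chi2.ppf(0.95, df=1)               # critical value for scalar theta, alpha = 0.05
```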
Denote ζ = (θ^T, η^T)^T; then the log-likelihood of ζ, ignoring additive constant terms, can be written as

l(θ, η) = −(1/2) ∑_{i=1}^n log σi² − (1/2) ∑_{i=1}^n (Yi − N̂i^T b − Zi^T γ)²/σi² + ∑_{i=1}^n log Φ( λ(Yi − N̂i^T b − Zi^T γ)/σi ).

According to the above log-likelihood function, the score function for θ is

∂l(θ, η)/∂θ = −(1/2) ∑_{i=1}^n Vi/gi + (1/2) ∑_{i=1}^n [(Yi − N̂i^T b − Zi^T γ)²/(σ² gi)] Vi/gi − (1/2) ∑_{i=1}^n u(mi*) mi* Vi/gi,

where Vi = ∂g(hi^T θ)/∂θ, gi = g(hi^T θ), mi* = λ(Yi − N̂i^T b − Zi^T γ)/σi, and u(mi*) = φ(mi*)/Φ(mi*). Let ζ̂0 = (0^T, η̂^T)^T denote the maximum likelihood estimate of ζ under H0. Then the score function of l(θ, η) evaluated at ζ̂0 is

∂l(θ, η)/∂θ |_{ζ̂0} = {Vh}|_{ζ̂0},

where V = (V1, …, Vn), h = (h1, …, hn)^T, hi = −1/2 + mi²/2 − (1/2)u(mi)mi and mi = (Yi − N̂i^T b − Zi^T γ)/σ. The second-order derivatives of l(θ, η) with respect to θ and η can also be derived easily. Then the information matrix of ζ under H0 is

I(ζ) = ( −∂²l(θ, η)/∂θ∂θ^T  −∂²l(θ, η)/∂θ∂η^T ; −∂²l(θ, η)/∂η∂θ^T  −∂²l(θ, η)/∂η∂η^T ) = ( −l̈θθ(θ, η)  −l̈θη(θ, η) ; −l̈ηθ(θ, η)  −l̈ηη(θ, η) ),

where ∂²l(θ, η)/∂θ∂η^T = [∂²l(θ, η)/∂θ∂γ^T, ∂²l(θ, η)/∂θ∂b^T, ∂²l(θ, η)/∂θ∂σ², ∂²l(θ, η)/∂θ∂λ]. Then the score statistic for the testing problem (8) is

Tsc = {h^T V^T I_ζ^{θθ} Vh}|_{ζ̂0},
where I_ζ^{θθ} is the upper-left block of I_ζ⁻¹ corresponding to θ. The score statistic Tsc has the following asymptotic property.

Theorem 2. Under Assumptions 1–8 and H0, we have

Tsc = {h^T V^T I_ζ^{θθ} Vh}|_{ζ̂0} →_d χq²,

where χq² denotes the χ² distribution with q degrees of freedom.

4. Simulation study

In this section, simulation studies are presented to evaluate the performance of the proposed procedures. The simulated data {Yi} are generated from the model
Yi = ∑_{j=1}^2 ∫_0^1 βj(t)Xij(t)dt + Zi^T γ + εi,  i = 1, …, n,  (9)
where β1(t) = √2 sin(πt/2) + 3√2 sin(3πt/2) and β2(t) = 2β1(t). Zi = (Zi1, Zi2, Zi3)^T follows the multivariate normal distribution N3(0, I), γ = (1, 0.5, 2)^T and εi ∼ SN(0, σi², λ) with σi² = σ0² g(hiθ), σ0² = 0.5, g(hiθ) = exp(hiθ) and hi ∼ N(0, 0.5). We suppose the functional predictors can be expressed as Xi1(t) = ∑_{j=1}^{50} ξj vj(t), where the ξj are independently normally distributed with mean 0 and variance ωj = ((j − 0.5)π)⁻², vj(t) = √2 sin((j − 0.5)πt), and Xi2(t) = 2Xi1(t) + 3. Note that the two predictors are correlated. For the actual observations, we assume that they are realizations of {Xij(·)}, j = 1, 2, at an equally spaced grid of 100 points in [0, 1]. As stated in Section 2.2, the different truncation parameters snj, j = 1, 2, are tuned simultaneously by AIC in our simulation. We first examine the performance of the maximum likelihood estimator for skew-normal partial functional linear models. We use 1000 Monte Carlo runs for estimation assessment. For different skewness parameters λ, we consider sample sizes n = 100, 200, 300, 500 at θ = 0. We use the average bias, standard deviation (SD) and mean squared error (MSE) as measures of parametric estimation accuracy. We also use the square root of the average squared errors (RASE) as a standard measure of functional coefficient estimation. The RASE of an estimator of βj(t) is defined as
RASE_{β̂j} = { (1/N) ∑_{l=1}^N [β̂j(tl) − βj(tl)]² }^{1/2},  j = 1, 2,
where {tl, l = 1, …, N} are grid points at which the functional values of β̂j(t) are calculated. The simulation results for point estimates of γ, σ² and λ with different values of n and λ are reported in Tables 1–4. For every setting we repeat 1000 times to obtain the results. Table 5 gives the RASE of β̂1(t). From the results we can see the following facts. Firstly, the performance of all these parameter estimates improves as the sample size increases. Secondly, the values of RASE_{β̂1(t)} decrease as the sample size increases, indicating that the estimated curves fit the corresponding true curve better as the sample size increases. Thirdly, under different values of the skewness parameter λ, the estimates of the parametric and nonparametric components perform similarly when θ = 0. Fig. 1 gives the average estimated curve of the regression function for different λ when n = 100. There is no significant difference for the other settings, so we omit them here. From Fig. 1 we can see that the average estimated curve fits the true regression function well. If β2(t) = 0, then model (9) has a single functional predictor. Shin (2009) proposed estimators for the parameters of a partial functional linear model with a single functional predictor; in that paper the random error ε is assumed to have zero mean and variance σ². In order to assess the performance of the estimators under skew-normal errors, we conduct an additional simulation. Tables 6–8 give the performance of Shin's estimators and our estimators for different values of λ. From the results we can see that when the error is skew-normal our Newton–Raphson estimators outperform Shin's estimators, especially for β1(t). Table 9 gives the power of the score statistic. We set n = 100, 200, 300, 500, 700 and θ = 0, 0.2, 0.4, 0.6 under different values of the skewness parameter λ. For every setting we repeat 1000 times to obtain the results.
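The data-generating design of this section can be sketched as follows (a hedged reconstruction, not the authors' code; the grid-based integration, eigen-scaling and sample size are our simplifications). The final lines check that the empirical FPCA of Section 2 approximately recovers the leading eigenvalue ω1 and eigenfunction v1.

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(2)
n, J, m = 200, 50, 100
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

# X_i1(t) = sum_j xi_ij v_j(t): xi_ij ~ N(0, omega_j), omega_j = ((j-0.5)pi)^-2,
# v_j(t) = sqrt(2) sin((j-0.5)pi t); the second predictor is X_i2 = 2 X_i1 + 3.
j = np.arange(1, J + 1)
omega = ((j - 0.5) * np.pi) ** -2.0
V = np.sqrt(2.0) * np.sin(np.outer(t, (j - 0.5) * np.pi))   # (m, J) basis values
xi = rng.normal(0.0, np.sqrt(omega), size=(n, J))
X1 = xi @ V.T
X2 = 2.0 * X1 + 3.0

beta1 = (np.sqrt(2.0) * np.sin(np.pi * t / 2.0)
         + 3.0 * np.sqrt(2.0) * np.sin(3.0 * np.pi * t / 2.0))
beta2 = 2.0 * beta1
Z = rng.standard_normal((n, 3))
gamma = np.array([1.0, 0.5, 2.0])

# Heteroscedastic skew-normal errors; theta = 0 is the null of Section 3.
lam, sigma02, theta = 1.0, 0.5, 0.0
h = rng.normal(0.0, np.sqrt(0.5), n)
eps = skewnorm.rvs(a=lam, scale=np.sqrt(sigma02 * np.exp(h * theta)),
                   size=n, random_state=rng)

# Riemann approximation of the integrals in model (9).
y = (X1 * beta1).sum(axis=1) * dt + (X2 * beta2).sum(axis=1) * dt + Z @ gamma + eps

# Empirical FPCA of X1 on the grid: eigendecompose the sample covariance.
K_hat = X1.T @ X1 / n
w, phi = np.linalg.eigh(K_hat)
w = w[::-1] * dt                      # operator eigenvalues (grid quadrature scaling)
phi = phi[:, ::-1] / np.sqrt(dt)      # eigenfunctions normalised in L2[0, 1]
scores = X1 @ phi[:, :5] * dt         # leading estimated FPC scores
```

Setting theta to a nonzero value generates data under the alternative of Section 3, which is how power entries like those in Table 9 could be traced out once the score statistic is computed.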
The values of the score test statistic are calculated by the formula given in Section 3. Note that the frequency of rejections of the null hypothesis is the simulated value of the power. The test level adopted here is α = 0.05. The results show that although the actual sizes of the test are somewhat inflated when n is not very large, they approach 0.05 as n increases. The results also indicate that the power of the test increases quickly as n and θ increase, which is as expected in hypothesis testing.

5. Real data analysis

We demonstrate the effectiveness of the proposed method by an application to the Tecator data set, which is available at http://lib.stat.cmu.edu/datasets/tecator and has been widely used in the context of functional data (see, e.g., Aneiros-Pérez and Vieu, 2006). The Tecator data set contains 215 samples; each sample contains finely chopped pure meat
Fig. 1. The average estimated curves of the regression function with different λs when n = 100 (from left to right corresponding to λ = 1, 0).
Table 1. Parameter estimation in the simulation when θ = 0 and λ = −1: bias and SD of γ̂1, γ̂2, γ̂3, σ̂² and λ̂ for n = 100, 200, 300, 500.
Table 2. Parameter estimation in the simulation when θ = 0 and λ = 0: bias and SD of γ̂1, γ̂2, γ̂3, σ̂² and λ̂ for n = 100, 200, 300, 500.
Table 3. Parameter estimation in the simulation when θ = 0 and λ = 1: bias and SD of γ̂1, γ̂2, γ̂3, σ̂² and λ̂ for n = 100, 200, 300, 500.
Table 4. Parameter estimation in the simulation when θ = 0 and λ = 2: bias and SD of γ̂1, γ̂2, γ̂3, σ̂² and λ̂ for n = 100, 200, 300, 500.
Table 5. RASE of β̂1(t) in the simulation when θ = 0.

  Sample size   λ = −1    λ = 0     λ = 1     λ = 2
  n = 100       0.1787    0.1203    0.1621    0.0954
  n = 200       0.0598    0.0565    0.0895    0.0885
  n = 300       0.0375    0.0454    0.0443    0.0490
  n = 500       0.0235    0.0320    0.0235    0.0395
Table 6. Bias and MSE of the estimates of γ and RASE of the estimates of β1(t) when θ = 0 and λ = 3, comparing our estimator with Shin's estimator for N = 100, 200, 300.
Table 7. Bias and MSE of the estimates of γ and RASE of the estimates of β1(t) when θ = 0 and λ = 1, comparing our estimator with Shin's estimator for N = 100, 200, 300.
Table 8. Bias and MSE of the estimates of γ and RASE of the estimates of β1(t) when θ = 0 and λ = 2, comparing our estimator with Shin's estimator for N = 100, 200, 300.
Table 9. Simulated size and power of the score test.

  λ = −1
  n       θ = 0     θ = 0.2   θ = 0.4   θ = 0.6
  100     0.0865    0.0985    0.2635    0.4335
  200     0.0725    0.1065    0.4235    0.6525
  300     0.0645    0.1975    0.6685    0.8990
  500     0.0585    0.2980    0.8975    0.9985
  700     0.0515    0.3575    0.9954    1

  λ = 0
  n       θ = 0     θ = 0.2   θ = 0.4   θ = 0.6
  100     0.0775    0.1095    0.2635    0.4420
  200     0.0655    0.1855    0.4235    0.6865
  300     0.06115   0.1835    0.7105    0.9515
  500     0.0545    0.2820    0.8635    0.9945
  700     0.0515    0.3825    0.9755    1

  λ = 1
  n       θ = 0     θ = 0.2   θ = 0.4   θ = 0.6
  100     0.0825    0.0945    0.2510    0.4855
  200     0.0645    0.1535    0.4325    0.7565
  300     0.0615    0.2015    0.6565    0.8865
  500     0.0435    0.3025    0.9215    1
  700     0.0515    0.4545    0.9980    1
Fig. 2. The histogram and fitted distribution of Y.

Table 10. The PMSEs for different models over 1000 random splits.

  Model                                         PMSE
  Skew-normal partial functional linear model   0.7073
  Partial functional linear model               0.7757
with different moisture, fat, and protein contents, which are measured in percentages and determined by analytical chemistry. The functional covariate Xi(t) for each food sample consists of a 100-channel spectrum of absorbances recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850–1050 nm by the near-infrared transmission (NIT) principle. In our analysis, we denote the fat content as Yi, the protein content as Z1i, the moisture content as Z2i, and the absorbances as Xi(t). In order to predict the fat content of a meat sample, many models and algorithms have been proposed to fit these data; see, for example, Aneiros-Pérez and Vieu (2006). We first draw the distribution of Y; see Fig. 2. We can clearly see that the data do not follow a normal or spherically symmetric distribution. In this paper, to fit the data, we consider the skew-normal partial functional linear model. To compare the performance of different models, the sample is divided into two data sets: the training sample I1 = {1, …, 165} is used to obtain the estimators of the parameters and of the nonparametric function, and the testing sample I2 = {166, …, 215} is used to verify the quality of prediction by the following mean squared error of prediction (PMSE),
2 1 ∑ (Yi − Yˆi )
50
i∈I2
VarI2 (Y )
.
(10)
where Ŷi is the predicted value based on the training sample and Var_{I2}(Y) is the variance of the response variables in the test sample. To assess the performance of the proposed model, we compare it with Shin's (2009) partial functional linear model. The PMSE results are reported in Table 10. From Table 10 we can see that the PMSE of our model is smaller, which shows that it is effective to account for skew-normal errors.

6. Discussion and conclusion

In practice we do not observe the entire trajectories Xij(t) but only intermittent noisy measurements Wijl = Xij(tijl) + eijl, where the {eijl, i = 1, …, n} are independent and identically distributed measurement errors, independent of Xij(t), satisfying E(eijl) = 0 and Var(eijl) = σij² for i = 1, …, n and l = 1, …, mij. If the repeated observations are sufficiently dense for each subject, a common approach is to apply a smoother to {(tijl, Wijl), l = 1, …, mij} and then use the estimates {X̂ij(t), i = 1, …, n, j = 1, …, d} to construct the covariance, eigenvalues/eigenfunctions and functional principal component scores; for details see the Supplementary Material of Kong et al. (2016), where it is proved that the estimators obtained from X̂ij(t) and those obtained from the true Xij(t) are asymptotically equivalent.

Acknowledgment

This research has been supported by the National Social Science Foundation of China (18BTJ021).
Appendix
Lemma A.1. Suppose Assumptions 1–7 hold. Then, one has
(1/√n) ∂l(η)/∂γ |_(γ0, b0, σ0², λ0)
  = (1/√n) ∑_{i=1}^n (Yi − ∑_{j=1}^d ∑_{k=1}^∞ ξijk bjk0 − Zi^T γ0) Zi/σ0²
    − (1/√n) ∑_{i=1}^n [φ(m0i)/Φ(m0i)] λ0 Zi/σ0 + Op(sn^{2−g}),

where m0i = λ0 (Yi − ∑_{j=1}^d ∑_{k=1}^∞ ξijk bjk0 − Zi^T γ0)/σ0.
Proof. Denote $A_i\triangleq Y_i-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0$ and $m_{i0}\triangleq\lambda_0\big(\big(Y_i-\hat N_i^{T}b_0-Z_i^{T}\gamma_0\big)/\sigma_0\big)$. By direct calculation, we have
$$\left.\frac{\partial l(\eta)}{\partial \gamma}\right|_{(\gamma_0,b_0,\sigma_0^{2},\lambda_0)}
=\sum_{i=1}^{n}\frac{(Y_i-\hat N_i^{T}b_0-Z_i^{T}\gamma_0)Z_i}{\sigma_0^{2}}
-\sum_{i=1}^{n}\frac{\phi(m_{i0})}{\Phi(m_{i0})}\frac{\lambda_0 Z_i}{\sigma_0}
=\sum_{i=1}^{n}\frac{A_iZ_i}{\sigma_0^{2}}
-\sum_{i=1}^{n}\frac{\phi(m_{0i})}{\Phi(m_{0i})}\frac{\lambda_0 Z_i}{\sigma_0}
+\sum_{i=1}^{n}\Big[\frac{(Y_i-\hat N_i^{T}b_0-Z_i^{T}\gamma_0)Z_i}{\sigma_0^{2}}-\frac{A_iZ_i}{\sigma_0^{2}}\Big]
+\sum_{i=1}^{n}\Big[\frac{\phi(m_{0i})}{\Phi(m_{0i})}-\frac{\phi(m_{i0})}{\Phi(m_{i0})}\Big]\frac{\lambda_0 Z_i}{\sigma_0},$$
and the squared norm of the remainder terms is bounded by
$$\Big\|\sum_{i=1}^{n}\Big(\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-\sum_{j=1}^{d}\sum_{k=1}^{s_n}\hat\xi_{ijk}b_{jk0}\Big)Z_i/\sigma_0^{2}\Big\|^{2}
+\Big\|\frac{\lambda_0}{\sigma_0}\sum_{i=1}^{n}\big(u(m_{0i})-u(m_{i0})\big)Z_i\Big\|^{2},$$
where $u(x)=\phi(x)/\Phi(x)$.
Note that
$$\Big\|\sum_{i=1}^{n}\Big(\sum_{j=1}^{d}\sum_{k=1}^{s_n}\hat\xi_{ijk}b_{jk0}-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}\Big)Z_i\Big\|^{2}
=\Big\|\sum_{i=1}^{n}\Big(\sum_{j=1}^{d}\sum_{k=1}^{s_n}(\hat\xi_{ijk}-\xi_{ijk})b_{jk0}-\sum_{j=1}^{d}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}\Big)Z_i\Big\|^{2}
\le O_p\Big(\Big\|\sum_{i=1}^{n}\sum_{k=1}^{s_n}(\xi_{ijk}-\hat\xi_{ijk})b_{jk0}Z_i\Big\|^{2}
+\Big\|\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}Z_i\Big\|^{2}\Big).\tag{A.1}$$
According to Lemma 1(b) of Kong et al. (2016), with the help of Assumptions 1–3 and Assumption 5, we have $\|\hat\phi_{jk}-\phi_{jk}\|=O_p\big(kn^{-1/2}\big)$. Combined with Assumptions 1, 4 and 7, we have
$$\Big\|\sum_{i=1}^{n}\sum_{k=1}^{s_n}(\hat\xi_{ijk}-\xi_{ijk})b_{jk0}Z_i\Big\|^{2}
\le\Big(\sum_{i=1}^{n}\Big|\int X_{ij}(t)Z_{il}\,dt\Big|\Big)^{2}\Big\|\sum_{k=1}^{s_n}(\hat\phi_{jk}-\phi_{jk})b_{jk0}\Big\|^{2}
=O_p(n^{2})\,O_p\Big(\Big(\sum_{k=1}^{s_n}k^{-b}\,kn^{-1/2}\Big)^{2}\Big)
=O_p\big(ns_n^{4-2b}\big).$$
For $\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}Z_i$, given Assumption 6, we have
$$E\Big|\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}Z_{il}\Big|
\le\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}|b_{jk0}|\,E|\xi_{ijk}Z_{il}|
\le\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}|b_{jk0}|\,\big\{E\xi_{ijk}^{2}\,EZ_{il}^{2}\big\}^{1/2}
\le\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}k^{-g}k^{-1/2}
=O\big(ns_n^{-g+1/2}\big)=o\big(n^{1/2}\big).\tag{A.2}$$
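The tail-sum rate behind (A.2), namely $\sum_{k>s_n}k^{-g-1/2}=O\big(s_n^{-g+1/2}\big)$, is easy to verify numerically. The sketch below is a standalone illustration of ours; the decay exponent $g=2$ and the large truncation point standing in for infinity are arbitrary choices.

```python
import numpy as np

g = 2.0  # assumed coefficient-decay exponent, g > 1/2


def tail_sum(s, K=2_000_000):
    """Sum of k^(-g - 1/2) for k = s+1, ..., K (K large enough to stand in for infinity)."""
    k = np.arange(s + 1, K + 1, dtype=np.float64)
    return float(np.sum(k ** (-g - 0.5)))


# Rescaling by s^(g - 1/2) should stabilize to a constant, confirming the s^(-g + 1/2) rate
for s in (10, 100, 1000):
    print(s, round(tail_sum(s) * s ** (g - 0.5), 3))
```

The rescaled values settle near the constant $1/(g-1/2)$ predicted by the integral comparison $\sum_{k>s}k^{-g-1/2}\approx\int_{s}^{\infty}x^{-g-1/2}\,dx$.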
So
$$\Big\|\sum_{i=1}^{n}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}Z_i\Big\|^{2}=o_p(n).\tag{A.3}$$
Note that
$$m_{0i}-m_{i0}
=\frac{\lambda_0}{\sigma_0}\Big(Y_i-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0\Big)
-\frac{\lambda_0}{\sigma_0}\Big(Y_i-\sum_{j=1}^{d}\sum_{k=1}^{s_n}\hat\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0\Big)
=\frac{\lambda_0}{\sigma_0}\Big(\sum_{j=1}^{d}\sum_{k=1}^{s_n}(\hat\xi_{ijk}-\xi_{ijk})b_{jk0}-\sum_{j=1}^{d}\sum_{k=s_n+1}^{\infty}\xi_{ijk}b_{jk0}\Big).\tag{A.4}$$
Combining (A.1) with (A.2), (A.3) and (A.4), we obtain the conclusion of Lemma A.1.

Lemma A.2. Suppose Assumptions 1–7 hold. Then we have
$$\frac{1}{\sqrt n}\left.\frac{\partial l(\eta)}{\partial\rho}\right|_{(\gamma_0,b_0,\sigma_0^{2},\lambda_0)}\xrightarrow{\ d\ }N(0,I_\rho),$$
where $\partial l(\eta)/\partial\rho=\big((\partial l(\eta)/\partial\gamma)^{T},\,\partial l(\eta)/\partial\sigma^{2},\,\partial l(\eta)/\partial\lambda\big)^{T}$.

Proof. By Lemma A.1 and Assumptions 5 and 6, we have
$$\frac{1}{\sqrt n}\left.\frac{\partial l(\eta)}{\partial\gamma}\right|_{(\gamma_0,b_0,\sigma_0^{2},\lambda_0)}
=\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\big(Y_i-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0\big)Z_i}{\sigma_0^{2}}
-\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\phi(m_{0i})}{\Phi(m_{0i})}\frac{\lambda_0 Z_i}{\sigma_0}+o_p(1).
Similarly, we also have
$$\frac{1}{\sqrt n}\left.\frac{\partial l(\eta)}{\partial\sigma^{2}}\right|_{(\gamma_0,b_0,\sigma_0^{2},\lambda_0)}
=-\frac{\sqrt n}{2\sigma_0^{2}}
+\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\big(Y_i-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0\big)^{2}}{2\sigma_0^{4}}
-\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{\phi(m_{0i})}{\Phi(m_{0i})}\frac{\lambda_0\big(Y_i-\sum_{j=1}^{d}\sum_{k=1}^{\infty}\xi_{ijk}b_{jk0}-Z_i^{T}\gamma_0\big)}{2\sigma_0^{3}}
+o_p(1),$$
⏐ n 1 ∑ φ (m0i ) m0i ⏐ = + op (1). √ n ∂λ ⏐(γ0 ,b0 ,σ 2 ,λ0 ) n Φ (m0i ) λ0 i=1
1 ∂ l(η) ⏐
√
0
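The $\lambda$-component of the score can be checked against a finite-difference derivative of the skew-normal log-likelihood, using $\partial\log\Phi(\lambda z_i)/\partial\lambda=\big(\phi(m_{0i})/\Phi(m_{0i})\big)\,m_{0i}/\lambda$ with $m_{0i}=\lambda z_i$. The sketch below is our own illustration with simulated standardized residuals and arbitrary values of $\sigma$ and $\lambda$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, sigma, lam = 50, 1.3, 0.8
z = rng.normal(size=n)  # standardized residuals z_i = (Y_i - mu_i) / sigma


def loglik(l):
    # Skew-normal log-likelihood of the residuals, as a function of lambda only
    return np.sum(np.log(2.0) - np.log(sigma) + norm.logpdf(z) + norm.logcdf(l * z))


m = lam * z  # m_{0i} = lambda * z_i
score = np.sum(norm.pdf(m) / norm.cdf(m) * m / lam)  # closed-form d l / d lambda

h = 1e-6
fd = (loglik(lam + h) - loglik(lam - h)) / (2.0 * h)  # central finite difference
print(abs(score - fd))  # small: the two derivatives agree
```

The agreement confirms that the $\phi/\Phi$ factor is exactly the derivative of the $\log\Phi$ term in the likelihood, which is what the score components above rest on.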
The rest of the proof is the same as that of Lemma A.3 of Xu et al. (2015), so we omit it here.

Proof of Theorem 1. By a Taylor expansion of the score, we have
$$\left.\frac{\partial l(\eta)}{\partial\rho}\right|_{(\rho=\hat\rho,\,b_0)}
=\left.\frac{\partial l(\eta)}{\partial\rho}\right|_{\eta=\eta_0}
+\left.\frac{\partial^{2}l(\eta)}{\partial\rho\,\partial\rho^{T}}\right|_{\eta=\eta_0}(\hat\rho-\rho_0)
+o_p(\hat\rho-\rho_0).$$
Then we have
$$\sqrt n(\hat\rho-\rho_0)
=\Big(-\frac{1}{n}\Big(\frac{\partial^{2}l(\eta)}{\partial\rho\,\partial\rho^{T}}+o_p(1)\Big)\Big)^{-1}
\frac{1}{\sqrt n}\left.\frac{\partial l(\eta)}{\partial\rho}\right|_{\eta=\eta_0}.$$
By Assumption 8, we have
$$-\frac{1}{n}\frac{\partial^{2}l(\eta)}{\partial\rho\,\partial\rho^{T}}\xrightarrow{\ P\ }I_\rho.$$
Combining Lemma A.2 with Slutsky's theorem, we have
$$\sqrt n(\hat\rho-\rho_0)\xrightarrow{\ d\ }N\big(0,I_\rho^{-1}\big).$$
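A limiting law of this $\sqrt n(\hat\rho-\rho_0)\to N(0,I_\rho^{-1})$ type is easy to reproduce by Monte Carlo in a toy one-parameter model. The sketch below is our own illustration, not the model of the paper: for an $\mathrm{Exp}(\rho_0)$ sample the MLE of the rate is $1/\bar X$ and the Fisher information is $1/\rho_0^{2}$, so the standardized estimator should have standard deviation close to $\rho_0$.

```python
import numpy as np

rng = np.random.default_rng(2)
rho0, n, reps = 2.0, 400, 4000

# Exp(rate rho0) samples; the MLE of the rate is 1 / sample mean
X = rng.exponential(scale=1.0 / rho0, size=(reps, n))
rho_hat = 1.0 / X.mean(axis=1)

stat = np.sqrt(n) * (rho_hat - rho0)
# Asymptotic standard deviation is sqrt(I^{-1}) = rho0
print(round(float(stat.std()), 2))
```

The empirical standard deviation of `stat` approaches $\rho_0=2$ as $n$ grows, mirroring the sandwich form $\big(-n^{-1}\partial^{2}l/\partial\rho\,\partial\rho^{T}\big)^{-1}n^{-1/2}\,\partial l/\partial\rho$ used in the proof.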
This completes the proof of Theorem 1.

Proof of Theorem 2. The proof of Theorem 2 follows that of Theorem 3 of Xu et al. (2015), so we omit it here.

References

Aneiros-Pérez, G., Vieu, P., 2006. Semi-functional partial linear regression. Statist. Probab. Lett. 76 (11), 1102–1110.
Besse, P., Ramsay, J.O., 1986. Principal component analysis of sampled functions. Psychometrika 51, 285–311.
Bolfarine, H., Montenegro, L.C., Lachos, V.H., 2007. Influence diagnostics for skew-normal linear mixed models. Sankhyā: Indian J. Stat. 69, 648–670.
Cancho, V.G., Lachos, V.H., Ortega, E.M.M., 2010. A nonlinear regression model with skew-normal errors. Statist. Papers 51, 547–558.
Cardot, H., Ferraty, F., Sarda, P., 1999. Functional linear model. Statist. Probab. Lett. 45 (1), 11–22.
Dauxois, J., Pousse, A., Romain, Y., 1982. Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivariate Anal. 12 (1), 136–154.
Feng, S.Y., Xue, L.G., 2016. Partially functional linear varying coefficient model. Statistics 50 (4), 717–732.
Hall, P., Horowitz, J.L., 2007. Methodology and convergence rates for functional linear regression. Ann. Statist. 35 (1), 70–91.
Hall, P., Hosseini-Nasab, M., 2009. Theory for high-order bounds in functional principal components analysis. Math. Proc. Cambridge Philos. Soc. 146 (1), 225–256.
Horváth, L., Kokoszka, P., Reeder, R., 2013. Estimation of the mean of functional time series and a two-sample problem. J. R. Statist. Soc. Ser. B 75 (1), 103–122.
Huang, L.L., Zhao, J.L., Wang, H.W., Wang, S.Y., 2016. Robust shrinkage estimation and selection for functional multiple linear model through LAD loss. Comput. Statist. Data Anal. 103, 384–400.
Kong, D., Xue, K., Yao, F., Zhang, H.H., 2016. Partially functional linear regression in high dimensions. Biometrika 103 (1), 147–159.
Lian, H., Li, G., 2014. Series expansion for functional sufficient dimension reduction. J. Multivariate Anal. 124, 150–165.
Lu, Y., Du, J., Sun, Z., 2014. Functional partially linear quantile regression model. Metrika 77, 317–332.
Ma, Y., Genton, M.G., 2004. Flexible class of skew-symmetric distributions. Scand. J. Statist. 31 (3), 459–468.
Ramsay, J.O., Silverman, B.W., 2002. Applied Functional Data Analysis: Methods and Case Studies. Springer, New York.
Ramsay, J.O., Silverman, B.W., 2005. Functional Data Analysis, second ed. Springer, New York.
Rice, J., Silverman, B.W., 1991. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. Ser. B 53, 233–243.
Sahu, S.K., Dey, D.K., Branco, M., 2003. New class of multivariate distributions with applications to Bayesian regression models. Can. J. Stat. 29, 129–150.
Shin, H., 2009. Partial functional linear regression. J. Statist. Plann. Inference 139 (10), 3405–3418.
Wu, M.X., Zhao, J., Wang, T.H., Zhao, Y., 2017. The ANOVA-type inference in linear mixed model with skew-normal error. J. Syst. Sci. Complex. 30, 710–720.
Xie, F.C., Lin, J.G., Wei, B.C., 2009. Diagnostics for skew-normal nonlinear regression models with AR(1) errors. Comput. Statist. Data Anal. 53, 4403–4416.
Xu, D.K., Zhang, Z.Z., Du, J., 2015. Skew-normal semiparametric varying coefficient model and score test. J. Stat. Comput. Simul. 85 (2), 216–234.
Yu, P., Du, J., Zhang, Z.Z., 2017. Varying-coefficient partially functional linear quantile regression models. J. Korean Statist. Soc. 46, 462–475.
Yu, P., Zhang, Z.Z., Du, J., 2016. A test of linearity in partial functional linear regression. Metrika 79 (8), 953–969.
Zhang, D., Lin, X., Sowers, M.F., 2007. Assessing the effects of reproductive hormone profiles on bone mineral density using functional two-stage mixed models. Biometrics 63, 351–362.