ARTICLE IN PRESS Journal of Statistical Planning and Inference 140 (2010) 837–850
Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Asymptotic properties of nonparametric regression for long memory random fields

Lihong Wang^a,1, Haiyan Cai^b

^a Department of Mathematics, Institute of Mathematical Science, Nanjing University, China
^b Department of Mathematics and Computer Science, University of Missouri—St. Louis, USA

Article info

Abstract
Article history: Received 13 June 2008; Received in revised form 24 June 2009; Accepted 14 September 2009; Available online 20 September 2009.
The kernel estimator of the spatial regression function is investigated for stationary long memory (long range dependent) random fields observed over a finite set of spatial points. A general result on the strong consistency of the kernel density estimator is first obtained for long memory random fields, and then, under some mild regularity assumptions, the asymptotic behavior of the regression estimator is established. For linear long memory random fields, a weak convergence theorem is also obtained for the kernel density estimator. Finally, some related issues on the inference of long memory random fields are discussed through a simulation example. © 2009 Elsevier B.V. All rights reserved.
Keywords: Spatial models; Long memory random fields; Kernel estimation; Asymptotic normality; Strong consistency
1. Introduction

The concept of long memory has received considerable attention in recent years, due to the fact that more empirical data with long range temporal or spatial dependence are seen in a wide range of applications. In this paper, we focus on long memory random fields, which extend the concept of the classical long memory time series to the spatial domain. The study of long memory random fields presents interesting and challenging probabilistic and statistical problems, and it is important to many scientific investigations. In fact, the long memory phenomenon is more commonly seen in higher dimensional random fields than in one-dimensional time series. For example, it is remarkable that, unlike one-dimensional cases, many common Markovian random fields in two or higher dimensions can exhibit long memory behavior characterized by various phase transitions. The significance of these phase transitions in understanding a physical system is well documented in the physics literature, and it underlines the importance of developing statistical methods for analyzing data generated from such systems. For more details about analysis and applications of long memory time series and long memory random fields in general, we refer the readers to Dobrushin and Major (1979), Ivanov and Leonenko (1989), Beran (1994), Leonenko (1999), Doukhan et al. (2003), Guégan (2005), Lavancier (2006), Palma (2007) and the references therein.

To describe the problem more specifically, let S be a countable point set in $\mathbb{R}^d$. We consider a random field defined as a collection of random vectors $(Y_i, X_i) \in \mathbb{R} \times \mathbb{R}$ defined on some probability space $(\Omega, \mathcal{F}, P)$ and indexed in S. Note that here
Corresponding author.
E-mail address: [email protected] (H. Cai).
1 Research partly supported by NSFC Grants and NSFJS Grant (no. BK2009222), and by the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (no. 708044) of Lihong Wang at Nanjing University.
0378-3758/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2009.09.012
we do not require S to be a regular lattice point set. We assume that the random field $\{(Y_i, X_i), i \in S\}$ is stationary in the sense that (a) there is a common marginal distribution for all $X_i$, $i \in S$, (b) the joint distribution of any two random variables $X_i$ and $X_j$ depends only on their distance in space but not on their actual locations, and (c) the conditional expectation of $Y_i$ given $X_i = x$ exists and is independent of i. We model spatial data as a realization of such a random field. In the regression problem of this paper, $Y_i$ is the response variable and $X_i$ is the explanatory variable. In the case $d = 1$, a random field reduces to a time series. The case $d > 1$ is necessary for modeling stochastic phenomena that evolve in higher dimensional spaces and exhibit spatial interactions. Spatial data arise in many applied areas and the statistical treatment of such data is the subject of an abundant literature; see, for example, Ripley (1981), Cressie (1991), Guyon (1995), Haining (2003), and Oliver and Carol (2005).

Suppose that the random field $\{Y_i, X_i\}$ is observed over a finite subset $I_n$ of S, where $I_n$ extends increasingly to the whole S as $n \to \infty$. Let $f(x)$ be the common marginal density function of $X_i$, $i \in S$, and $g(x)$ be the conditional expectation of $Y_i$ given $X_i = x$. Nonparametric estimation of the density function $f(x)$ and the regression function $g(x)$ are fundamental problems in statistics. The aim of this paper is to study the asymptotic properties of the kernel estimates of these functions for long memory random fields without imposing any parametric structure a priori. To be more specific, we consider the following regression model. Let $X = \{X_i, i \in S\}$ be a long memory random field and

$$Y_i = g(X_i) + \varepsilon_i, \quad i \in I_n, \qquad (1)$$

where the random error $\varepsilon = \{\varepsilon_i, i \in S\}$ is itself a long memory random field with zero mean. We assume that $\varepsilon$ is independent of X. The kernel density estimator of $f(x)$ takes the usual form

$$f_n(x) = \frac{1}{|I_n| h_n} \sum_{i \in I_n} K\left(\frac{x - X_i}{h_n}\right), \qquad (2)$$

where $h_n$ is a sequence of bandwidths tending to zero at an appropriate rate as n tends to infinity, K is a kernel satisfying the conditions given in the next section, and $|I_n|$ denotes the number of points in $I_n$. The regression function $g(x)$ can then be estimated with

$$g_n(x) = \frac{1}{|I_n| h_n} \sum_{i \in I_n} Y_i K\left(\frac{x - X_i}{h_n}\right) \Big/ f_n(x). \qquad (3)$$
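For concreteness, the estimators (2) and (3) can be sketched in a few lines of code. This is a hypothetical illustration with a Gaussian kernel and i.i.d. placeholder data; in the paper, X is observed over a spatial index set $I_n$, not sampled independently:

```python
import numpy as np

def K(u):
    """Gaussian kernel; it satisfies the kernel conditions assumed in Section 2."""
    return np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

def f_n(x, X, h):
    """Kernel density estimator (2)."""
    return np.mean(K((x - X) / h)) / h

def g_n(x, X, Y, h):
    """Kernel regression estimator (3); the common 1/(|I_n| h_n) factors cancel."""
    w = K((x - X) / h)
    return np.sum(w * Y) / np.sum(w)

# Placeholder i.i.d. data (an assumption of this sketch only).
rng = np.random.default_rng(0)
X = rng.normal(size=4096)
Y = 3 * np.sin(X) + 0.3 * rng.normal(size=4096)
print(f_n(0.0, X, h=0.2))     # approximately the N(0,1) density at 0
print(g_n(1.0, X, Y, h=0.2))  # approximately 3*sin(1)
```

Note that (3) is the familiar Nadaraya–Watson form: a locally weighted average of the responses.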
These estimators have been studied extensively in the literature for $d = 1$. Asymptotic properties of the estimators under conditions of weak dependence as well as long range dependence have been investigated (see, for example, Masry and Györfi, 1987; Boente and Fraiman, 1988; Truong and Stone, 1992; Csörgő and Mielniczuk, 1995, 1999; Ho and Hsing, 1996; Lu and Cheng, 1997; Wu and Mielniczuk, 2002). However, it is not always a trivial extension to obtain similar results in higher ($d > 1$) dimensional spaces. Several recent papers, e.g. Carbon et al. (1997, 2007), Hallin et al. (2001, 2004a, b) and Lu and Chen (2004), examine the density and regression estimation problem for stationary random fields. Their results, however, are based on mixing conditions and therefore apply only to weakly dependent data. To the best of our knowledge, the spatial version ($d > 1$) of density and regression estimation for long memory random fields remains essentially unexplored. This is the issue we intend to address in the current paper.

In the following, we first prove a pointwise strong consistency theorem for the kernel density estimator $f_n(x)$ for long memory random fields (Theorem 2.2). Then, applying this asymptotic property, we show a weak convergence theorem (Theorem 2.3) for the regression estimator $g_n(x)$. For the kernel density estimate, we also consider the case of a linear random field with long memory, for which a weak convergence result is shown (Theorem 2.6). In Theorem 2.3 we see, among other things, that the limiting distribution of $g_n(x)$ does not depend on the point x if the error field $\varepsilon$ has long memory. This property is in clear contrast to the well-known results for weakly dependent spatial data. The main results are stated and proved in Section 2. Some related issues in the regression analysis are discussed in Section 3 in connection with a simulation study on the XY model from statistical physics. The simulation results support our claims by showing satisfactory performance of the kernel estimates for long memory data. They also justify the need for developing special methods for analyzing long memory spatial data.
2. Asymptotic properties of the kernel estimates

The long memory property of a stationary random field can be characterized in various ways (see, for example, Guégan, 2005; Lavancier, 2006). For the main results of this paper to hold we assume that the explanatory random field X has a long memory property which is characterized by its pairwise joint distributions. This long memory property is defined in the following. For every point $i \in S$, let $s_i \in \mathbb{R}^d$ be its location in $\mathbb{R}^d$. Let $|s_i - s_j|$ be the Euclidean distance between points i and j. Let $f_{i,j}$ be the joint density function of the random variables $X_i$ and $X_j$ and let

$$D(s_i, s_j) = \sup_{x_i, x_j \in \mathbb{R}} \frac{|f_{i,j}(x_i, x_j) - f(x_i) f(x_j)|}{f(x_i) f(x_j)}.$$
We note that $D(s_i, s_j)$ can also be written as $\sup_{x_i, x_j \in \mathbb{R}} |c(F(x_i), F(x_j)) - 1|$, where F is the distribution function of $X_i$ and $c(\cdot, \cdot)$ is the density copula associated with the joint density function $f_{i,j}$ (see Granger, 2003; Guégan, 2005; Ibragimov and Lentzas, 2008, for the concept of the density copula). For a stationary random field as defined earlier in this paper, D depends only on the distance between the locations of the random variables $X_i$ and $X_j$. In the following we write $a_n \sim b_n$ for any two real sequences $\{a_n\}$ and $\{b_n\}$ to mean that there are constants $c > 0$ and $C < \infty$ such that $c \le a_n/b_n \le C$ for all sufficiently large n.

Definition 2.1. A stationary random field $\{X_i, i \in S\}$ is long memory in distribution if for some $\beta \in (d/2, d)$ and any $i \ne j$,

$$D(s_i, s_j) \sim R((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{d - 2\beta} \qquad (4)$$

as $|s_i - s_j| \to \infty$, with $R(\cdot)$ being some positive, bounded and continuous function on $S^{d-1} = \{y \in \mathbb{R}^d : |y| = 1\}$.

To see some examples of such random fields, we note that the stationary Gaussian random fields with covariances satisfying the condition

$$\mathrm{Cov}(X_i, X_j) \sim R((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{d - 2\beta} \qquad (5)$$

are long memory in distribution. The Ising model (or autologistic model) on a regular square lattice with its temperature parameter at the critical point is another example of a random field with long memory in distribution.

To state and show the main results of this paper, we first note that, while in the literature the analysis of random fields is usually done under the assumption that S is a regular lattice point set, many application data are collected at non-regular spatial points. For example, data might be collected according to zip codes, county borderlines, etc. In order to establish the asymptotic properties of the estimates for both regular and non-regular lattice data, we need to formulate our results for a more general S. On the other hand, it is clear that S cannot be completely arbitrary. We start with some basic assumptions on S. Let S be the set of points in $\mathbb{R}^d$ at which data $(Y_i, X_i)$, $i \in S$, are collected. For each point $i \in S$ and integer $k \ge 1$, we define the k-neighborhood set of i as $B_{i,k} = \{j \in S : k - 1 < |s_i - s_j| \le k\}$. We assume S satisfies the condition

$$\sup_{i \in S} |B_{i,k}| = O(k^{d-1}). \qquad (6)$$
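As an illustration of the Gaussian example (5), one can simulate such a field on a small grid by a direct Cholesky factorization of the covariance matrix. The Cauchy-type covariance used below is an assumption of this sketch (chosen because it is positive definite in every dimension and decays like $r^{-(2\beta - d)}$), not a model taken from the paper:

```python
import numpy as np

d, beta, n = 2, 1.5, 16          # beta in (d/2, d); decay exponent 2*beta - d = 1
alpha = 2 * beta - d

# Grid locations s_i and a Cauchy-type covariance (1 + r^2)^(-alpha/2) ~ r^(-alpha).
s = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
r = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
C = (1.0 + r ** 2) ** (-alpha / 2)

# One draw of the stationary Gaussian field via Cholesky (small jitter for stability).
rng = np.random.default_rng(1)
L = np.linalg.cholesky(C + 1e-8 * np.eye(n * n))
X = (L @ rng.normal(size=n * n)).reshape(n, n)
print(X.shape)
```

Because the covariance is non-summable over the lattice when $2\beta - d < d$, the sample mean of such a field fluctuates on a larger scale than $|I_n|^{-1/2}$, which is the phenomenon the later sections exploit.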
Let $I_n$, $n = 1, 2, \ldots$, be a sequence of subsets of S such that $I_1 \subset I_2 \subset \cdots \subset I_n \subset \cdots$. Let $d(I_n) = \max\{|s_i - s_j| : i, j \in I_n\}$ be the diameter of $I_n$. We assume that the number of points in each $I_n$ satisfies the conditions

$$|I_n| \sim n^d \qquad (7)$$

and

$$d(I_n) = O(n). \qquad (8)$$

It is not difficult to check that the usual regular lattice point sets and many non-regular lattice point sets without accumulation points satisfy these conditions. Next, let $K(x) \ge 0$ be a kernel function. Throughout this paper, we assume the following regularity conditions for K:

Assumption (A).
(A1) K is integrable with bounded second derivative;
(A2) K satisfies the conditions $\int_{\mathbb{R}} t K(t)\,dt = 0$ and $\int_{\mathbb{R}} t^2 K(t)\,dt < \infty$.
2.1. Strong consistency of the density estimator

Let $f_n(x)$ be the kernel estimator of $f(x)$ as defined in (2) with $I_n$ given above. In this subsection, we prove a strong consistency theorem for $f_n(x)$ for general long memory random fields. This result will allow us to establish a weak convergence property in the next subsection for the kernel regression estimator. We first give an estimate for the variance of the kernel density estimator $f_n(x)$.

Lemma 2.1. Let $X = \{X_i, i \in S\}$ be a long memory random field as defined in Definition 2.1. Then there is an $\alpha > 0$ such that for every $x \in \mathbb{R}$,

$$\mathrm{Var}(f_n(x)) = o(n^{-\alpha}). \qquad (9)$$

Proof. Let

$$U_i = \frac{1}{h_n}\left[K\left(\frac{x - X_i}{h_n}\right) - E K\left(\frac{x - X_i}{h_n}\right)\right].$$
Since the kernel K is integrable, we have, for some constant $C < \infty$ and all n and all $x \in \mathbb{R}$, $E|U_i| \le C$. Expressing $E(U_i U_j)$ as an integral against $f_{i,j}(x_i, x_j) - f(x_i) f(x_j)$ and bounding this difference by $D(s_i, s_j) f(x_i) f(x_j)$, we obtain

$$\mathrm{Var}(f_n(x)) = \frac{1}{|I_n|^2} \sum_{i,j \in I_n} E(U_i U_j) \le \frac{C_1}{|I_n|^2} \sum_{i,j \in I_n} D(s_i, s_j) \le \frac{C_2}{|I_n|^2} \sum_{i,j \in I_n} |s_i - s_j|^{d - 2\beta}$$

for some positive constants $C_1, C_2 < \infty$, where the last estimate is due to condition (4). Applying conditions (6)–(8), we have

$$\mathrm{Var}(f_n(x)) \le \frac{C_2}{|I_n|^2} \sum_{i \in I_n} \sum_{k=1}^{d(I_n)} |B_{i,k}|\, k^{d - 2\beta} \le \frac{C_3}{|I_n|^2} \sum_{i \in I_n} \sum_{k=1}^{d(I_n)} k^{d-1}\, k^{d - 2\beta} = O(n^{d - 2\beta}) + O(n^{-1}),$$

where we use the inequality $\sum_{i=1}^{n} i^c \le [(n+1)^{c+1} - 1]/(c+1)$. The lemma is proved by taking $\alpha \in (0, \min\{1, 2\beta - d\})$. □
The estimate on the variance of the kernel density estimator in the previous lemma leads us to the following strong consistency theorem.

Theorem 2.2. Let $X = \{X_i, i \in S\}$ be a long memory random field as defined in Definition 2.1. Then for every x,

$$f_n(x) \to f(x) \quad a.s. \qquad (10)$$

Proof. For some large $\gamma \ge d - 1$ (we will determine its value later), we can choose from the sequence $\{I_n\}$ a subsequence $\{I_{n_k}\}$ such that

$$|I_{n_k} \setminus I_{n_{k-1}}| \sim k^{\gamma}. \qquad (11)$$

Then the $I_{n_k}$'s satisfy the properties

$$|I_{n_k}| \sim \sum_{i=1}^{k} i^{\gamma} \sim k^{\gamma + 1} \qquad (12)$$

and, by (7),

$$n_k \sim k^{(\gamma + 1)/d}. \qquad (13)$$
Now we show that $f_n - E(f_n) \to 0$ a.s. as $n \to \infty$. Fix an x and let

$$M_k = \max_j\{|f_j(x) - E(f_j(x))| : n_k \le j < n_{k+1}\}.$$

Let $U_i$ be as defined in the proof of the previous lemma. Then corresponding to each sample point $\omega$ there is a $j_0 = j_0(\omega)$, $n_k \le j_0 < n_{k+1}$, such that $M_k = |f_{j_0} - E(f_{j_0})|$ and

$$M_k = \left|\frac{1}{|I_{j_0}|}\sum_{i \in I_{j_0}} U_i\right| \le |f_{n_k} - E(f_{n_k})| + \frac{1}{|I_{n_k}|}\sum_{i \in I_{n_{k+1}} \setminus I_{n_k}} |U_i|.$$

Therefore, using $(a + b)^2 \le 2a^2 + 2b^2$ and Lemma 2.1, we have

$$E(M_k^2) \le 2\,\mathrm{Var}(f_{n_k}) + 2E\left(\frac{1}{|I_{n_k}|}\sum_{i \in I_{n_{k+1}} \setminus I_{n_k}} |U_i|\right)^2 \le 2\, o(n_k^{-\alpha}) + \frac{2|I_{n_{k+1}} \setminus I_{n_k}|^2}{|I_{n_k}|^2}\, E(U_1^2).$$

Let $\epsilon > 0$ be any small number and $\delta \in (0, 1)$. By the Markov inequality,

$$P(M_k \ge k^{-\delta/2}\epsilon) \le 2k^{\delta}\,\frac{o(n_k^{-\alpha})}{\epsilon^2} + 2k^{\delta}\,\frac{|I_{n_{k+1}} \setminus I_{n_k}|^2}{\epsilon^2 |I_{n_k}|^2}\, E(U_1^2).$$

Now set $\gamma > \max\{d - 1,\ d(1 + \delta)/\alpha - 1\}$; then $\delta - \alpha(\gamma + 1)/d < -1$. We have, from (13),

$$\sum_{k=1}^{\infty} k^{\delta} n_k^{-\alpha} \sim \sum_{k=1}^{\infty} k^{\delta - \alpha(\gamma + 1)/d} < \infty, \qquad (14)$$

and, from (11) and (12),

$$\sum_{k=1}^{\infty} k^{\delta}\,\frac{|I_{n_{k+1}} \setminus I_{n_k}|^2}{|I_{n_k}|^2} \sim \sum_{k=1}^{\infty} k^{\delta - 2} < \infty. \qquad (15)$$
The Borel–Cantelli lemma implies that $M_k \to 0$ a.s., which in turn implies that

$$f_n(x) - E(f_n(x)) \to 0 \quad a.s.$$

Finally, the strong consistency of the kernel estimator follows from this and the property (from (A1))

$$\lim_{n \to \infty} E(f_n(x)) = f(x). \qquad (16)$$

This completes the proof. □
Remark 2.1. It is clear from the proof that the strong consistency (10) holds for any random field satisfying (9). If, in addition, $f(x)$ is continuous, then

$$P(f_n(x) \to f(x),\ \forall x \in \mathbb{R}) = 1.$$

At this point, we do not have any result on the convergence in distribution of $f_n$. However, if X is a linear random field satisfying some additional conditions, then we do have a weak convergence result. Such a result is nice but will not be needed for the weak convergence of $g_n$ discussed in the next subsection. We will come back to this issue in Section 2.3.

2.2. Weak convergence of the kernel regression estimator

Let $g_n$ be the kernel regression estimator as defined in (3) and S a general point set satisfying the conditions given at the beginning of this section. With the consistency property of the kernel density estimator, we now establish a weak convergence theorem for $g_n$. The sample mean of the error random field $\{\varepsilon_i, i \in I_n\}$ will play an important role here. Let $\bar{\varepsilon}_n = |I_n|^{-1}\sum_{i \in I_n} \varepsilon_i$. As we will see, the existence of the limiting distribution of an appropriately scaled $g_n - g$ depends on the existence of that of an appropriately scaled $\bar{\varepsilon}_n$. In fact, the former, if it exists, equals the latter. Since it is known that linear or Gaussian random fields with long memory (in the covariance sense, as defined in (5)) have such a convergence property, we allow $\varepsilon$ to be any one of these random fields in our theorem. That is, we assume that the error field $\varepsilon$ satisfies

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) \sim R_{\varepsilon}((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{-2\tau d} \qquad (17)$$

as $|s_i - s_j| \to \infty$ for some $0 < \tau < 1/2$, with $R_{\varepsilon}(\cdot)$ being some bounded continuous function on $S^{d-1}$. Note that if the covariance of a random field satisfies (17), then this random field exhibits long memory in distribution. In the following, we first prove our theorem for a general error field $\varepsilon$. The special cases of $\varepsilon$ will be considered briefly after that. Only weak consistency of the density estimator $f_n(x)$ is required for the results of this subsection to hold.

2.2.1. A general case

In what follows we assume, in addition to the conditions imposed in Lemma 2.1 and Theorem 2.2, the following conditions.

Assumption (B).
(B1) The regression function $g(x)$ is twice differentiable with bounded, continuous and integrable derivatives.
(B2) The joint density function $f_{i,j}$ of $(X_i, X_j)$ and the marginal density function f of $X_i$ are twice differentiable with bounded, continuous and integrable derivatives.
(B3) $\varepsilon$ is a zero-mean random field with autocovariance function satisfying condition (17).
(B4) There is a $0 < \tau < 1/2$ such that $|I_n|^{\tau}\bar{\varepsilon}_n$ converges in distribution to a random variable $\xi$.
(B5) For the $\tau$ given above, $|I_n|^{\tau} h_n^2 \to 0$ and $|I_n|^{\tau - 1/2} h_n^{-1/2} \to 0$.

The two most commonly used random fields satisfying conditions (B3) and (B4) are the Gaussian random fields and the linear random fields. They will be discussed in the later subsections.

Theorem 2.3. Under Assumptions (A), (B) and (10) we have, for any $x \in \mathbb{R}$,

$$|I_n|^{\tau}(g_n(x) - g(x)) \xrightarrow{D} \xi.$$

Proof. From (1) and (3) we note that

$$g_n(x) - g(x) = \frac{1}{f_n(x)}\left\{\frac{1}{|I_n| h_n}\sum_{i \in I_n} \varepsilon_i K_i(x) + \frac{1}{|I_n| h_n}\sum_{i \in I_n} (g(X_i) - g(x)) K_i(x)\right\},$$

where we write $K_i(x)$ for $K((x - X_i)/h_n)$. By standard kernel manipulations we have

$$h_n^{-1} E(K_i(x)) = f(x) + O(h_n^2).$$
Thus, by (10),

$$g_n(x) - g(x) = \frac{1 + o_P(1)}{f(x)}\left\{\bar{\varepsilon}_n f(x) + \frac{1}{|I_n| h_n}\sum_{i \in I_n} \varepsilon_i [K_i(x) - E(K_i(x))] + O_P(|I_n|^{-\tau} h_n^2) + \frac{1}{|I_n| h_n}\sum_{i \in I_n} (g(X_i) - g(x)) K_i(x)\right\}. \qquad (18)$$

Note that

$$E\left\{\frac{1}{|I_n| h_n}\sum_{i \in I_n} (g(X_i) - g(x)) K_i(x)\right\}^2 = \frac{1}{|I_n|^2 h_n^2}\left\{\sum_{i \in I_n} E[(g(X_i) - g(x))^2 K_i^2(x)] + \sum_{i \ne j} E[(g(X_i) - g(x))(g(X_j) - g(x)) K_i(x) K_j(x)]\right\}. \qquad (19)$$

By Assumptions (A2) and (B2), we have

$$E[(g(X_i) - g(x))^2 K_i^2(x)] = \int_{\mathbb{R}} [g(y) - g(x)]^2 K^2\left(\frac{x - y}{h_n}\right) f(y)\,dy = h_n\int_{\mathbb{R}} [g(x - h_n u) - g(x)]^2 K^2(u) f(x - h_n u)\,du = h_n^3\int_{\mathbb{R}} (g'(x^*))^2 u^2 K^2(u) f(x - h_n u)\,du = O(h_n^3), \qquad (20)$$

where $x^* = x^*(u)$ is a point between x and $x - h_n u$. Again by Assumptions (A2) and (B2) and standard kernel manipulations,

$$E[(g(X_i) - g(x))(g(X_j) - g(x)) K_i(x) K_j(x)] = \int_{\mathbb{R}^2} (g(y) - g(x))(g(z) - g(x)) K\left(\frac{x - y}{h_n}\right) K\left(\frac{x - z}{h_n}\right) f_{i,j}(y, z)\,dy\,dz$$
$$= h_n^2\int_{\mathbb{R}^2} (g(x - h_n u) - g(x))(g(x - h_n v) - g(x)) K(u) K(v)\, f_{i,j}(x - h_n u, x - h_n v)\,du\,dv = O(h_n^6). \qquad (21)$$

Combining (19)–(21), we obtain

$$E\left\{\frac{1}{|I_n| h_n}\sum_{i \in I_n} (g(X_i) - g(x)) K_i(x)\right\}^2 = O(|I_n|^{-1} h_n) + O(h_n^4). \qquad (22)$$

This, together with Assumption (B5), implies that

$$\frac{1}{|I_n| h_n}\sum_{i \in I_n} (g(X_i) - g(x)) K_i(x) = o_P(|I_n|^{-\tau}). \qquad (23)$$

Next we show that

$$\frac{1}{|I_n| h_n}\sum_{i \in I_n} \varepsilon_i [K_i(x) - E(K_i(x))] = O_P(\max\{(|I_n| h_n)^{-1/2},\ n^{-\tau d + d/2 - \beta}\}) = o_P(|I_n|^{-\tau}). \qquad (24)$$

In fact, by the independence of $\{\varepsilon_i\}$ and $\{X_i\}$, we have

$$E\left(\sum_{i \in I_n} \varepsilon_i [K_i(x) - E(K_i(x))]\right)^2 = \sum_{i \in I_n} E(\varepsilon_i^2)\, E[(K_i(x) - E(K_i(x)))^2] + \sum_{i \ne j} E(\varepsilon_i \varepsilon_j)\,\mathrm{Cov}(K_i(x), K_j(x))$$
$$\le |I_n|\, E(\varepsilon_1^2)\, E(K_1^2(x)) + \sum_{i \ne j} \mathrm{Cov}(\varepsilon_i, \varepsilon_j)\int_{\mathbb{R}^2} K_i(x) K_j(x)\, |f_{i,j}(x_i, x_j) - f(x_i) f(x_j)|\,dx_i\,dx_j$$
$$\le O(|I_n| h_n) + \sum_{i \ne j} \mathrm{Cov}(\varepsilon_i, \varepsilon_j)\, D(s_i, s_j)\int_{\mathbb{R}^2} K_i(x) K_j(x)\, f(x_i) f(x_j)\,dx_i\,dx_j = O(|I_n| h_n) + O(|I_n|^2 h_n^2\, n^{-2\tau d + d - 2\beta}),$$

where the last equality is obtained from estimates like those used in the proof of Lemma 2.1. This implies (24). Combining (18), (23), (24) and (B4), we obtain Theorem 2.3. □

Remark 2.2. From Theorem 2.3 we note that the limiting distribution of the kernel regression estimator does not depend on the point at which it is estimated. Actually, from the proof we see that the behavior of the estimator is asymptotically governed by the sample mean of the error term, $\bar{\varepsilon}_n$. Hence the asymptotic property of the regression estimator does not depend on whether or not the explanatory random field X is strongly correlated. This coincides with the $d = 1$ case under long memory, but is in contrast with the short memory case; see, e.g., Hidalgo (1997) and Hallin et al. (2004a).
Next, we consider some special forms of the error field $\varepsilon = \{\varepsilon_i, i \in S\}$ in our regression problem.

2.2.2. When $\varepsilon$ is a Gaussian random field

Assume $\varepsilon$ is a zero-mean Gaussian random field which is stationary in the sense described in the Introduction. In other words, we assume that all $\varepsilon_i$, $i \in S$, have the same mean and variance, and that the covariance of any two random variables $\varepsilon_i$ and $\varepsilon_j$ depends only on their Euclidean distance but not on their actual locations. We further assume that

$$\gamma_{\varepsilon}(s_i - s_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) \sim R_{\varepsilon}((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{-a}, \qquad (25)$$

where $a > 0$ and $R_{\varepsilon}(\cdot)$ is a continuous function on $S^{d-1}$. Then it is known that $|I_n|^{a/2d}\bar{\varepsilon}_n$ converges weakly to $\xi = c_1 Z$, where Z is the standard normal random variable and $c_1$ is the standard deviation of $\varepsilon_i$ (Dobrushin and Major, 1979). Theorem 2.3 has the following special form with $\tau = a/2d$.

Corollary 2.4. Under the conditions of Theorem 2.3 we have, for any $x \in \mathbb{R}$,

$$|I_n|^{a/2d}(g_n(x) - g(x)) \xrightarrow{D} c_1 Z.$$

2.2.3. When $\varepsilon$ is a linear random field

Assume $\varepsilon$ is a linear random field, which means

$$\varepsilon_i = \sum_{j \in S} b_{i,j}\,\zeta_j, \quad i \in S, \qquad (26)$$

where $\zeta_j$, $j \in S$, are independent and identically distributed random variables with zero mean and variance one, and $b_{i,j}$ are nonrandom weights of the form

$$b_{i,j} = B_{\varepsilon}((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{-\beta_{\varepsilon}}, \quad i, j \in S,$$

where $\beta_{\varepsilon} \in (d/2, d)$ and $B_{\varepsilon}(y)$ is a bounded piecewise continuous function on $S^{d-1}$. It is not difficult to see that

$$\gamma_{\varepsilon}(s_i - s_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) \sim R_{\varepsilon}((s_i - s_j)/|s_i - s_j|)\, |s_i - s_j|^{d - 2\beta_{\varepsilon}}$$

as $|s_i - s_j| \to \infty$, where $R_{\varepsilon}(\cdot)$ is a continuous function on $S^{d-1}$. Therefore, in this case, the random field $\{\varepsilon_i, i \in S\}$ is long memory in the covariance sense. Moreover, $E\bar{\varepsilon}_n^2 = (c_2^2 + o(1))\, n^{d - 2\beta_{\varepsilon}}$ as $n \to \infty$, where $c_2$ is a constant whose value can be calculated explicitly in some special cases. For example, if $\{\varepsilon_i, i \in S\}$ is a linear fractional ARIMA(0, D, 0) field, then we have $c_2^2 = \Gamma(D)\Gamma(1 - 2D)/\Gamma(1 - D)$ with $D = 1 - \beta_{\varepsilon}/2$. For the linear field (26), it is known that $|I_n|^{\beta_{\varepsilon}/d - 1/2}\bar{\varepsilon}_n$ converges weakly to the random variable $\xi = c_2 Z$ (Doukhan et al., 2002). Therefore, for $\tau = \beta_{\varepsilon}/d - 1/2$, Theorem 2.3 implies

Corollary 2.5. Under the conditions of Theorem 2.3 we have, for any $x \in \mathbb{R}$,

$$|I_n|^{\beta_{\varepsilon}/d - 1/2}(g_n(x) - g(x)) \xrightarrow{D} c_2 Z.$$
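A truncated version of the linear field (26) can be simulated by convolving i.i.d. innovations with the slowly decaying weights. Truncating the infinite sum to a finite window, and setting the weight at the origin to 1, are conventions of this sketch only:

```python
import numpy as np

d, beta_e, n, m = 2, 1.5, 64, 16   # beta_e in (d/2, d); m = truncation radius

# Weight kernel b(u) = |u|^(-beta_e) on a (2m+1) x (2m+1) window.
u = np.arange(-m, m + 1)
U, V = np.meshgrid(u, u, indexing="ij")
r = np.hypot(U, V)
r_safe = np.where(r > 0, r, 1.0)
b = r_safe ** (-beta_e)            # equals 1 at the origin by the r_safe convention

rng = np.random.default_rng(2)
zeta = rng.normal(size=(n + 2 * m, n + 2 * m))   # i.i.d. innovations zeta_j
eps = np.zeros((n, n))
for i in range(2 * m + 1):                       # direct truncated convolution
    for j in range(2 * m + 1):
        eps += b[i, j] * zeta[i:i + n, j:j + n]
print(eps.shape)
```

Because the weights are non-summable-squared only slowly, neighboring values of the resulting field are strongly positively correlated, which is the long memory effect in the covariance sense.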
2.3. Weak convergence of the density estimator for linear fields over $\mathbb{Z}^d$

In this last subsection, we return to the asymptotic behavior of the kernel density estimator $f_n$. We are going to show a weak convergence result for the estimator. Here we consider the case when X is a linear random field defined over the lattice set $S = \mathbb{Z}^d$. Let $I_n = \{1, \ldots, n\}^d$ be a rectangular subset of $\mathbb{Z}^d$. Suppose

$$X_i = \sum_{j \in \mathbb{Z}^d} a_j Z_{i+j}, \quad i \in \mathbb{Z}^d, \qquad (27)$$

where $Z_j$, $j \in \mathbb{Z}^d$, are independent and identically distributed random variables with zero mean and variance one, and $a_j$ are nonrandom weights of the form

$$a_j = B_X(j/|j|)\, |j|^{-\beta}, \quad j \in \mathbb{Z}^d,$$

where $\beta \in (d/2, d)$ and $B_X(y)$ is a bounded piecewise continuous function on $S^{d-1}$. We note that

$$\gamma_X(j) = \mathrm{Cov}(X_i, X_{i+j}) \sim R_X(j/|j|)\, |j|^{d - 2\beta}$$
as $|j| \to \infty$, where $R_X(\cdot)$ is some continuous function on $S^{d-1}$. Moreover, if $\bar{X}_n = |I_n|^{-1}\sum_{i \in I_n} X_i$ is the sample mean (here $|I_n| = n^d$), then

$$E\bar{X}_n^2 = (c^2 + o(1))\, n^{d - 2\beta}, \qquad (28)$$

where c is a finite constant. Our result depends on an asymptotic property of the empirical distribution function $F_n$ of $X_i$. Let

$$F_n(x) = |I_n|^{-1}\sum_{i \in I_n} I(X_i \le x).$$

Corollary 1.2 of Doukhan et al. (2002) asserts that, under the conditions stated in (A3) below, for any $\epsilon > 0$ there exists $C(\epsilon) < \infty$ such that

$$P\left(\sup_{x \in \mathbb{R}} |I_n|^{\beta/d - 1/2}\, |F_n(x) - F(x) + f(x)\bar{X}_n| > \epsilon\right) \le C(\epsilon)\, n^{-\kappa/2}, \qquad (29)$$

where $\kappa = \min(\beta - d/2,\ d - \beta)$. This estimate is the key to our result here. In addition to what we have assumed before, let us introduce some more assumptions.

Assumption (A) (Continued).
(A3) $E|Z_1|^{2 + \delta} < \infty$ and $|E e^{i a Z_1}| \le C(1 + |a|)^{-\delta}$ for some constants $C, \delta > 0$ and all $a \in \mathbb{R}$.
(A4) The bandwidth sequence satisfies $|I_n|^{\beta/d - 1/2} h_n^2 \to 0$ and $|I_n|^{\lambda} h_n \to \infty$, where $0 < \lambda < \min\{\beta/d - 1/2,\ 1 - \beta/d\}$.

As stated in Giraitis and Surgailis (1999), Assumption (A3) is a rather weak restriction on the smoothness of the distribution of $Z_1$, and it guarantees that the density function f is twice differentiable with nonzero, bounded, continuous and integrable derivatives. The theorem below asserts a weak convergence property of $f_n$.

Theorem 2.6. If Assumption (A) holds and $\int_{\mathbb{R}} K(t)\,dt = 1$, then

$$|I_n|^{\beta/d - 1/2}(f_n(x) - f(x)) \xrightarrow{D} c f'(x) Z \qquad (30)$$

in the space $D[-\infty, \infty]$ equipped with the sup-norm, where $f'$ is the first derivative of the density function f, Z is the standard normal random variable and c is as in (28).

Proof. Denote the normalized empirical density process by

$$D_n(x) = |I_n|^{\beta/d - 1/2}(f_n(x) - f(x)), \quad x \in \mathbb{R}.$$

We first prove the following estimate: as $n \to \infty$,

$$\sup_{x \in \mathbb{R}} |D_n(x) + |I_n|^{\beta/d - 1/2} f'(x)\bar{X}_n| = o(1) \quad a.s. \qquad (31)$$
By (29), standard arguments as in the proof of Theorem 2.2 of Ho and Hsing (1996) yield

$$\sup_{x \in \mathbb{R}} |F_n(x) - F(x) + f(x)\bar{X}_n| = o(|I_n|^{1/2 - \beta/d - \lambda}) \quad a.s., \qquad (32)$$

where $0 < \lambda < \min\{\beta/d - 1/2,\ 1 - \beta/d\}$. From the definitions of $F_n(x)$ and $f_n(x)$ and the estimate (32), integration by parts gives

$$\sup_{x \in \mathbb{R}}\left||I_n|^{1/2 - \beta/d} D_n(x) + f'(x)\bar{X}_n\right| \le \sup_{x \in \mathbb{R}}\left|\bar{X}_n\int_{\mathbb{R}} (f'(x - h_n t) - f'(x)) K(t)\,dt\right| + \sup_{x \in \mathbb{R}}\left|h_n^{-1}\int_{\mathbb{R}} S_n(x - h_n t)\,dK(t)\right| + \sup_{x \in \mathbb{R}}\frac{h_n^2}{2}\left|\int_{\mathbb{R}} f''(x^*) t^2 K(t)\,dt\right|$$
$$= \sup_{x \in \mathbb{R}} h_n |\bar{X}_n|\left|\int_{\mathbb{R}} f''(x^{**})\, t K(t)\,dt\right| + \sup_{x \in \mathbb{R}}\frac{h_n^2}{2}\left|\int_{\mathbb{R}} f''(x^*)\, t^2 K(t)\,dt\right| + O(h_n^{-1}|I_n|^{1/2 - \beta/d - \lambda}), \qquad (33)$$

where $|x^* - x| \le |h_n t|$, $|x^{**} - x| \le |h_n t|$ and $S_n(x) = F_n(x) - F(x) + f(x)\bar{X}_n$. Moreover, along lines similar to the proof of Theorem 2.1 in Wang et al. (2003), we obtain

$$|\bar{X}_n| = O(|I_n|^{1/2 - \beta/d}(\log\log|I_n|)^{1/2}) \quad a.s. \qquad (34)$$
Thus, by (33) and (34), we have

$$\sup_{x \in \mathbb{R}} |D_n(x) + |I_n|^{\beta/d - 1/2} f'(x)\bar{X}_n| = O(h_n(\log\log|I_n|)^{1/2}) + O(h_n^2 |I_n|^{\beta/d - 1/2}) + O(h_n^{-1}|I_n|^{-\lambda}) \quad a.s. \qquad (35)$$

Hence (35) implies (31). As shown in Corollary 1.3 of Doukhan et al. (2002), $|I_n|^{\beta/d - 1/2}\bar{X}_n$ converges weakly to the random variable cZ. Hence, in virtue of (31), we have

$$D_n(x) \xrightarrow{D} c f'(x) Z \qquad (36)$$

in the space $D[-\infty, \infty]$ equipped with the sup-norm. The weak convergence (30) follows from (36). □
Remark 2.3. From Theorem 2.6 and its proof we see that, quite unlike the independent or weakly dependent cases, when appropriately normalized, the empirical density process $D_n(x)$ with long memory data converges weakly to a Dehling–Taqqu type limiting process (see Dehling and Taqqu, 1989), i.e., the product of a non-random function and a single random variable. This phenomenon has also been seen in studying density estimation for long memory time series; see, e.g., Ho and Hsing (1996). The results obtained here for long memory random fields are the analogs of those of Ho and Hsing (1996).

3. A simulation study and some related issues

We now report the results of a simulation study to check the performance of the kernel regression estimates. We choose the two-dimensional ferromagnetic XY model from statistical physics as our testing random field for $X = \{X_i, i \in \mathbb{Z}^2\}$. The joint probability density function of the random field is given by

$$p(x) = \frac{1}{C}\exp\left(\rho\sum_{|s_i - s_j| = 1}\cos(x_i - x_j)\right), \quad x_i, x_j \in [0, 2\pi),$$

where C is the normalizing constant and $\rho$ is the inverse temperature. The model can be viewed as a generalization of the Ising model (also known as the autologistic model, Besag, 1974) and a special case of the O(n) model. These models describe the spin–spin interactions of large physical systems. The XY model is famous for its lack of Ising type "symmetry breaking" and for its possession of a Kosterlitz–Thouless type phase transition (Kosterlitz and Thouless, 1978). The latter is equivalent to the long memory property in the covariance sense (as defined in (5)) and is experimentally observed in liquid crystal displays (LCD) and other physical environments. The long memory characteristic of the XY model emerges when its inverse temperature $\rho$ exceeds a critical point believed to be somewhere in a neighborhood of 0.89. Although it is an open problem whether the XY model satisfies condition (4), it is easy to see from condition (5) that the XY model has a memory at least as long as a random field that satisfies Definition 2.1 (the condition $\mathrm{Cov}(X_i, X_j) \ge C'|s_i - s_j|^{d - 2\beta}$ implies $D(s_i, s_j) \ge C''|s_i - s_j|^{d - 2\beta}$ for some $C', C'' > 0$). Therefore, while the key results of this paper are proved under the condition that X is long memory in distribution, we do have an example showing that the asymptotic properties of the kernel regression estimate hold at least for some random fields with longer memories.

To generate data for the study, we first draw samples from the XY model in a rectangular region $A \subset \mathbb{Z}^2$ of size $256 \times 256$. This can be done using a standard Gibbs sampler with a sufficiently large number of iterations. Since some of our simulations are done in a region of inverse temperature relatively close to the critical region, a random cluster algorithm is used (Cai, 2009). Next, we generate samples of the error field $\varepsilon$ over A from a linear random field as defined in (26), where we set $b_{i,j} = |i - j|^{-1.5}$, which results in the autocovariance function $\gamma_{\varepsilon}(r)$ having decay rate $\gamma_{\varepsilon}(r) = O(r^{-1})$. We then set

$$Y_i = 3\sin(X_i) + \varepsilon_i, \quad i \in A.$$
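The overall design just described can be sketched end to end. Here the Gibbs/random-cluster draw of the XY field and the linear error field are both replaced by simple placeholders (uniform angles and i.i.d. noise), which is an assumption of this sketch, not the paper's simulation:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 256
X = rng.uniform(0.0, 2 * np.pi, size=(N, N))   # placeholder for the XY-model draw
eps = 0.3 * rng.normal(size=(N, N))            # placeholder errors
Y = 3 * np.sin(X) + eps                        # the regression model of this section

# Non-regular design I_n: randomly chosen lattice sites.
idx = rng.choice(N * N, size=128 * 128, replace=False)
Xs, Ys = X.ravel()[idx], Y.ravel()[idx]

def g_n(x, h=0.5):
    """Gaussian-kernel regression estimate (3)."""
    w = np.exp(-((x - Xs) / h) ** 2 / 2)
    return float(np.sum(w * Ys) / np.sum(w))

for x in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(x, round(g_n(x), 4), round(3 * np.sin(x), 4))
```

With 128 × 128 observations the estimate tracks $3\sin(x)$ up to the usual smoothing bias of order $h^2$.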
To obtain non-regular lattice data after the regular lattice data are generated, we randomly choose points from A to form $I_n$ and then draw data $(X_i, Y_i)$ from $I_n$. The sample sizes of the final data are $|I_n| = 128 \times 128$ and $32 \times 32$ (this latter data set is drawn from a $64 \times 64$ sub-lattice of A), respectively. All the computations are carried out with SPLUS and C. Data are obtained at inverse temperatures $\rho = 0.75$, 0.95, 1.1 and 5.0, respectively. It is known that the XY model with inverse temperature $\rho < 0.88$ is a short memory process (Aizenman and Simon, 1980), while that with $\rho > 0.90$ is believed to be a long memory process.

3.1. Detecting long memory property

The first question in studying a long memory random field is whether we can detect the long memory property of the field from data. For this purpose we borrow some methods developed for one-dimensional time series. Giraitis et al. (2003) proposed a rescaled variance test based on the V/S statistic, which can be defined as follows. Suppose $S = \mathbb{Z}^d$ and $I_n = \{s_i : s_i \in S, |s_i| \le n\}$. For any random field $\{X_i, i \in S\}$, let $X_k^* = \sum_{i \in I_k}(X_i - \bar{X}_n)$, where $k \le n$ and $\bar{X}_n = |I_n|^{-1}\sum_{i \in I_n} X_i$. Define

$$S_n^2 = |I_n|^{-1}\sum_{i \in I_n}(X_i - \bar{X}_n)^2$$

and

$$\hat{S}_n^2 = n^{-1}\sum_{k=1}^{n}\left(X_k^* - n^{-1}\sum_{l=1}^{n} X_l^*\right)^2,$$

the sample variance of the partial sums $X_k^*$. The V/S statistic can then be defined as

$$V_n = |I_n|^{-1}\hat{S}_n^2 / S_n^2.$$

For $d = 1$, Giraitis et al. (2003) showed that if the given time series is short memory then $V_n$ has the same asymptotic distribution as that of the standard Kolmogorov statistic. On the other hand, if the time series has long memory with long memory parameter $\beta$, then $|I_n|^{2\beta/d - 2} V_n$ converges to a non-degenerate random variable different from the Kolmogorov statistic. Thus the V/S test using $V_n$ can easily be carried out using a Kolmogorov statistic table. To estimate the long memory parameter $\beta$, one simple method is to use the V/S statistic in the following way: under long memory, a plot of $\log(V_k)$ against $\log(|I_k|)$ scatters the points around a straight line with slope $2 - 2\beta/d$, so a rough estimate of $\beta$ can be obtained from the slope of this line. It seems that, with some slight modifications, the methods described above can be applied to random fields.

To the data we generated, we first apply the techniques described above to check for the long memory property and to estimate the long memory parameter. The results are shown in Table 1. From Table 1 we see that, with the V/S test statistic $V_n = 0.442$, we cannot reject the hypothesis that the XY model is a short memory process when the true inverse temperature is $\rho = 0.75$, since the critical limit for the V/S test under significance level 0.05 is 2.285. On the other hand, the hypothesis of short memory is rejected for the cases $\rho = 0.95$, 1.1 and 5.0. We also see that the long range dependence of the random field gets stronger ($\hat{\beta}$ gets closer to 1) as the inverse temperature $\rho$ gets larger, in agreement with the known qualitative behavior of the XY model.
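The V/S computation for $d = 1$ can be sketched as below (one common form of the rescaled-variance statistic: the sample variance of the centered partial sums, rescaled by the sample size and the sample variance of the data; the spatial version replaces $\{1, \ldots, n\}$ by the nested sets $I_k$):

```python
import numpy as np

def vs_statistic(x):
    """V/S statistic of a 1-d series: Var(partial sums) / (n * Var(data))."""
    n = len(x)
    partial = np.cumsum(x - x.mean())                 # partial sums X*_k
    s2 = np.mean((x - x.mean()) ** 2)                 # S_n^2
    s2_hat = np.mean((partial - partial.mean()) ** 2) # variance of partial sums
    return s2_hat / (n * s2)

rng = np.random.default_rng(4)
short = rng.normal(size=4096)                    # short memory (i.i.d.) example
long_ = np.cumsum(rng.normal(size=4096)) / 64.0  # strongly dependent example
print(vs_statistic(short), vs_statistic(long_))  # the second is far larger
```

For weakly dependent data the statistic stays bounded, while for strongly dependent data it grows with the sample size, which is exactly the separation the test exploits.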
Table 1
Test and estimation for long memory of the XY model.

ρ      0.75     0.95     1.1      5.0
V_n    0.442    10.491   11.449   80.147
b      –        1.179    1.121    1.096

The values of the V/S test statistics and the estimators of the long memory parameter for the XY model with inverse temperatures ρ = 0.75, 0.95, 1.1 and 5.0.
Table 2
Kernel regression estimates and confidence intervals with |I_n| = 128 × 128 and ρ = 0.75.

x      True value   Estimate   CI                    Square-root-n CI
2       2.7279       2.8139    ( 2.4171,  3.2106)    ( 2.7745,  2.8532)
2.5     1.7954       1.8783    ( 1.4815,  2.2750)    ( 1.8389,  1.9176)
3       0.4234       0.5438    ( 0.1470,  0.9405)    ( 0.5044,  0.5831)
3.5    −1.0524      −1.0059    (−1.4026, −0.6091)    (−1.0452, −0.9665)
4      −2.2704      −2.1341    (−2.5308, −1.7373)    (−2.1734, −2.0947)

Kernel regression estimators and confidence intervals with the square-root-n CI and the true value of the regression function g(x) = 3 sin(x), for sample size |I_n| = 128 × 128, with {X_i} being the XY model with inverse temperature ρ = 0.75, using non-lattice data.
Table 3
Kernel regression estimates and confidence intervals with |I_n| = 128 × 128 and ρ = 1.1.

x      True value   Estimate   CI                    Square-root-n CI
2       2.7279       2.8907    ( 2.4939,  3.2874)    ( 2.8513,  2.9300)
2.5     1.7954       1.5742    ( 1.1774,  1.9709)    ( 1.5348,  1.6135)
3       0.4234       0.2789    (−0.1178,  0.6756)    ( 0.2395,  0.3182)
3.5    −1.0524      −1.1574    (−1.5541, −0.7606)    (−1.1967, −1.1180)
4      −2.2704      −2.3104    (−2.7071, −1.9136)    (−2.3497, −2.2710)

Kernel regression estimators and confidence intervals with the square-root-n CI and the true value of the regression function g(x) = 3 sin(x), for sample size |I_n| = 128 × 128, with {X_i} being the XY model with inverse temperature ρ = 1.1, using non-lattice data.
3.2. Confidence interval

Using the same data, we also estimate the regression function. The kernel estimate g_n given in (3) is used with a Gaussian kernel. A data-driven choice of the bandwidth would be desirable in this context; however, in view of the lack of theoretical results on this issue, we choose a fixed bandwidth h_n = 0.5 in all of our estimations. Theorem 2.3 suggests a (1 − α)100% asymptotic confidence interval (CI) for g(x):

$$g_n(x) \pm |I_n|^{-t}\, \xi_{1-\alpha/2},$$
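Since formula (3) itself is not restated in this section, the following Python sketch assumes it is the usual Nadaraya–Watson ratio form; the Gaussian kernel and the fixed bandwidth h_n = 0.5 match the choices above, but the function name and interface are hypothetical:

```python
import numpy as np

def kernel_regression(x0, X, Y, h=0.5):
    """Nadaraya-Watson style kernel regression estimate at x0:
    g_n(x0) = sum_i Y_i K((x0 - X_i)/h) / sum_i K((x0 - X_i)/h),
    with a Gaussian kernel K. The kernel's normalizing constant
    cancels in the ratio, so it is omitted."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)  # Gaussian kernel weights
    return float(np.sum(w * Y) / np.sum(w))
```

For data from g(x) = 3 sin(x) with a roughly uniform design, the estimate is smoothed slightly toward zero (by roughly the factor e^{−h²/2} for this kernel), the familiar bandwidth bias.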
Fig. 1. Kernel regression estimate and the confidence band with the actual regression curve (dotted line) g(x) = 3 sin(x), for sample size |I_n| = 32 × 32 and the inverse temperature ρ = 0.75. The scatter plot shows the observations (X_i, Y_i).
Fig. 2. Kernel regression estimate and the confidence band with the actual regression curve (dotted line) g(x) = 3 sin(x), for sample size |I_n| = 128 × 128 and the inverse temperature ρ = 0.75. The scatter plot shows the observations (X_i, Y_i).
where ξ_{1−α/2} is the 1 − α/2 quantile of the random variable ξ. However, the values of t and ξ_{1−α/2} depend on the limiting distribution of |I_n|^t ē_n and are generally unknown. The existing methods for estimating the parameters of a long memory random field are mostly restricted to the d = 1 case. For example, if ε is a Gaussian time series with long memory, then ξ_{1−α/2} = c_1 z_{1−α/2} and t = a/2, where a and c_1 are parameters one needs to estimate. It is known that in this case the usual sample standard deviation is not an efficient estimator of c_1; instead, the maximum likelihood method should be used to estimate both c_1 and a simultaneously. Approximation methods have been proposed in the literature to solve this MLE problem; for more details on this approach we refer the readers to Beran (1994). The general question of how to apply time series methods to higher dimensional random fields deserves further exploration. For linear random fields we have the following observations.
Fig. 3. Kernel regression estimate and the confidence band with the actual regression curve (dotted line) g(x) = 3 sin(x), for sample size |I_n| = 32 × 32 and the inverse temperature ρ = 1.1. The scatter plot shows the observations (X_i, Y_i).
Fig. 4. Kernel regression estimate and the confidence band with the actual regression curve (dotted line) g(x) = 3 sin(x), for sample size |I_n| = 128 × 128 and the inverse temperature ρ = 1.1. The scatter plot shows the observations (X_i, Y_i).
Suppose the error terms {e_i} are from a linear long memory random field with long memory parameter b_e; then an approximate 100(1 − α)% confidence interval for g(x) is given by

$$g_n(x) \pm c_2\, |I_n|^{1/2 - b_e/d}\, z_{1-\alpha/2}. \qquad (37)$$

Here the parameters b_e and c_2 are unknown. To use this CI we have to insert consistent estimates of b_e and c_2. We can estimate b_e in a way similar to the way we estimate b; a semiparametric estimation method studied in Frías et al. (2008) can also be used. For c_2, we can derive a formula based on a method used in some similar computations for the linear fractional ARIMA(0, D, 0) process (Hosking, 1981). For the error field generated above we obtained c_2² = Γ(D)Γ(1 − 2D)/Γ(1 − D) with b_e = 1.5 and D = 1 − b_e/2. The simulation results are displayed in Tables 2–3 and Figs. 1–4. We use Tables 2–3 to compare the confidence intervals obtained from the current analysis of long memory data with those from the typical method for short memory data. The square-root-n confidence intervals, g_n(x) ± s_e |I_n|^{−1/2} z_{0.975}, are calculated and shown in the tables along with the intervals given in (37); here s_e is the sample standard deviation based on the regression residuals. The long memory effect of the error field ε on the interval estimation is clear: at almost all of the displayed points, the true values of the regression function fall outside the square-root-n intervals. This shows that with long memory noise the usual square-root-n confidence interval is too narrow to capture the true values, and it demonstrates the need for statistical methods that are specific to long memory random fields. It also shows that the confidence interval proposed in this simulation works reasonably well for long memory data. Figs. 1–4 display the kernel estimates of the regression function and the confidence bands for the estimates. These figures show generally good quality of the estimates, even for sample sizes as small as 32 × 32. Note that, although we use short memory (ρ = 0.75) as well as long memory (ρ = 1.10) explanatory fields to generate our data, the error random field ε is long memory (in the covariance sense) in both cases.
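As a numerical check on (37), the constant c_2 and the interval half-width can be computed directly from the formulas above. The following Python sketch uses only the standard library; the helper name `long_memory_ci` is hypothetical:

```python
from math import gamma, sqrt
from statistics import NormalDist

def long_memory_ci(g_hat, n_points, b_e, d=2, alpha=0.05):
    """Approximate CI (37): g_n(x) +/- c2 * |I_n|^(1/2 - b_e/d) * z_{1-alpha/2},
    with c2^2 = Gamma(D) Gamma(1 - 2D) / Gamma(1 - D) and D = 1 - b_e/2,
    as derived in the text for the simulated error field."""
    D = 1.0 - b_e / 2.0
    c2 = sqrt(gamma(D) * gamma(1.0 - 2.0 * D) / gamma(1.0 - D))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = c2 * n_points ** (0.5 - b_e / d) * z
    return g_hat - half_width, g_hat + half_width
```

With b_e = 1.5, d = 2 and |I_n| = 128 × 128 this gives a half-width of about 0.397, which matches the widths of the long memory intervals reported in Tables 2–3 (e.g. the interval at x = 3 in Table 3).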
From the simulations we also observe that, if the explanatory field X is long memory, the data points (X_i, Y_i) tend to be very dense in one region of the space and quite sparse in other regions. A consequence of this uneven distribution of data points is that the estimates become less accurate in the regions with fewer data points. Note that for long memory data, regions with sparse data points are not necessarily the ones with lower probability densities.

References

Aizenman, M., Simon, B., 1980. A comparison of plane rotor and Ising models. Phys. Lett. A 76, 281–282.
Beran, J., 1994. Statistics for Long-Memory Processes. Chapman & Hall, New York.
Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. B 36, 192–236.
Boente, G., Fraiman, R., 1988. Consistency of a nonparametric estimate of a density function for dependent random variables. J. Multivariate Anal. 25, 90–99.
Cai, H., 2009. A random cluster algorithm for some continuous Markov random fields. J. Statist. Comput. Simul., in press.
Carbon, M., Tran, L.T., Wu, B., 1997. Kernel density estimation for random fields. Statist. Probab. Lett. 36, 115–125.
Carbon, M., Francq, C., Tran, L.T., 2007. Kernel regression estimation for random fields. J. Statist. Plann. Inference 137, 778–798.
Cressie, N.A.C., 1991. Statistics for Spatial Data. Wiley, New York.
Csörgő, S., Mielniczuk, J., 1995. Density estimation under long-range dependence. Ann. Statist. 23, 990–999.
Csörgő, S., Mielniczuk, J., 1999. Random-design regression under long-range dependent errors. Bernoulli 5, 209–224.
Dehling, H., Taqqu, M.S., 1989. The empirical process of some long-range dependent sequences with an application to U-statistics. Ann. Statist. 17, 1767–1783.
Dobrushin, R.L., Major, P., 1979. Non-central limit theorems for non-linear functions of Gaussian fields. Z. Wahrsch. Verw. Geb. 50, 27–52.
Doukhan, P., Lang, G., Surgailis, D., 2002. Asymptotics of weighted empirical processes of linear fields with long-range dependence. Ann. I. H. Poincaré-PR 38, 879–896.
Doukhan, P., Oppenheim, G., Taqqu, M.S., 2003. Theory and Applications of Long-range Dependence. Birkhäuser, Boston.
Frías, M.P., Alonso, F.J., Ruiz-Medina, M.D., Angulo, J.M., 2008. Semiparametric estimation of spatial long-range dependence. J. Statist. Plann. Inference 138, 1479–1495.
Giraitis, L., Kokoszka, P., Leipus, R., Teyssière, G., 2003. Rescaled variance and related tests for long memory in volatility and levels. J. Econometrics 112, 265–294.
Giraitis, L., Surgailis, D., 1999. Central limit theorem for the empirical process of a linear sequence with long memory. J. Statist. Plann. Inference 80, 81–93.
Granger, C.W., 2003. An introduction to long-range time series models and fractional differencing. J. Time Ser. Anal. 1, 15–30.
Guégan, D., 2005. How can we define the concept of long memory? An econometric survey. Econometric Rev. 24, 113–149.
Guyon, X., 1995. Random Fields on a Network. Springer, New York.
Haining, R., 2003. Spatial Data Analysis: Theory and Practice. Cambridge University Press, Cambridge.
Hallin, M., Lu, Z., Tran, L.T., 2001. Density estimation for spatial linear processes. Bernoulli 7, 657–668.
Hallin, M., Lu, Z., Tran, L.T., 2004a. Local linear spatial regression. Ann. Statist. 32, 2469–2500.
Hallin, M., Lu, Z., Tran, L.T., 2004b. Kernel density estimation for spatial processes: the L1 theory. J. Multivariate Anal. 88, 61–75.
Hidalgo, J., 1997. Non-parametric estimation with strongly dependent multivariate time series. J. Time Ser. Anal. 18, 95–122.
Ho, H.-C., Hsing, T., 1996. On the asymptotic expansion of the empirical process of long-memory moving averages. Ann. Statist. 24, 992–1024.
Hosking, J.R.M., 1981. Fractional differencing. Biometrika 68, 165–176.
Ibragimov, R., Lentzas, G., 2008. Copulas and long memory. Harvard Institute of Economic Research Discussion Paper No. 2160. Available at SSRN: http://ssrn.com/abstract=1185822.
Ivanov, A.V., Leonenko, N.N., 1989. Statistical Analysis of Random Fields. Kluwer, Dordrecht.
Kosterlitz, J.M., Thouless, D.J., 1978. Two-dimensional physics. In: Progress in Low Temperature Physics, vol. VIIB. North-Holland, Amsterdam, p. 371.
Lavancier, F., 2006. Long memory random fields. In: Dependence in Probability and Statistics. Lecture Notes in Statistics, vol. 187. Springer, New York, pp. 195–220.
Leonenko, N.N., 1999. Random Fields with Singular Spectrum. Kluwer, Dordrecht.
Lu, Z., Chen, X., 2004. Spatial kernel regression estimation: weak consistency. Statist. Probab. Lett. 68, 125–136.
Lu, Z., Cheng, P., 1997. Distribution-free strong consistency for nonparametric kernel regression involving nonlinear time series. J. Statist. Plann. Inference 65, 67–86.
Masry, E., Györfi, L., 1987. Strong consistency and rates for recursive density estimators for stationary mixing processes. J. Multivariate Anal. 22, 79–93.
Schabenberger, O., Gotway, C.A., 2005. Statistical Methods for Spatial Data Analysis. Chapman & Hall/CRC Press, Boca Raton, FL.
Palma, W., 2007. Long-Memory Time Series: Theory and Methods. Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken.
Ripley, B., 1981. Spatial Statistics. Wiley, New York.
Truong, Y.K., Stone, C.J., 1992. Nonparametric function estimation involving time series. Ann. Statist. 20, 77–97.
Wang, Q., Lin, Y., Gulati, M., 2003. Strong approximation for long-memory processes with applications. J. Theoret. Probab. 16, 377–389.
Wu, W.B., Mielniczuk, J., 2002. Kernel density estimation for linear processes. Ann. Statist. 30, 1441–1459.