Statistical analysis for rounded data


Journal of Statistical Planning and Inference 139 (2009) 2526 -- 2542


Zhidong Bai, Shurong Zheng, Baoxue Zhang (corresponding author, [email protected]), Guorong Hu

Key Laboratory for Applied Statistics of MOE, Department of Statistics, Northeast Normal University, Changchun, 130024 Jilin Province, PR China
DSAP & RNI, National University of Singapore, Singapore

Article history: Received 17 May 2006; received in revised form 9 July 2008; accepted 20 November 2008; available online 20 January 2009.

Keywords: AR(p) model; MA(q) model; AMLE; asymptotic normality; rounded data

Abstract

When random variables do not take discrete values, observed data are often the rounded values of continuous random variables. The errors caused by rounding are usually neglected in classical statistical theory. While some pioneers have identified the problem and made suggestions to rectify it, few suitable approaches have been proposed. In this paper, we propose an approximate MLE (AMLE) procedure to estimate the parameters and discuss the consistency and asymptotic normality of the estimates. As an illustration, we consider estimation of the parameters in AR(p) and MA(q) models from rounded data.

1. Introduction

When random variables do not take discrete values, observed data are often the rounded values of continuous random variables. The rounding of data may be due to the precision of experimental instruments or of the recording/storage mechanism. For example, modern computers have limited precision, and blood pressures of patients are reported by doctors only to a multiple of 5 mmHg. While some rounding errors may be small, e.g. measurements accurate to 10^{-3} mm in high-technology fields, other rounding errors may be very large, e.g. the GDP of a country may be rounded at tens of millions of dollars. Rounded data thus occur in day-to-day measurements, yet classical statistical theory usually ignores the rounding errors.

In earlier days, rounding errors did not seem to pose a serious problem: sample sizes were small, and approximate results were acceptable to practitioners. Nowadays, however, statisticians face huge data sets, since the rapid development of computer technology makes it possible to collect, store and process them, e.g. in the biological sciences, economics, finance and wireless communications. Because such large data sets are available, more precise results are desired, and it is natural to ask how seriously rounding errors affect the accuracy of statistical inference. Tricker (1990a, b) showed that traditional statistical methods run into problems when applied directly to rounded data, and that statisticians need to restate the original model in terms of the values actually observed. Although data mining has become very popular in recent decades, there has been little discussion of rounding errors in the literature, even though they strongly influence inference. The aim of this paper is to explore new methodologies that achieve higher accuracy of statistical inference when rounding errors are involved.


Researchers have noticed that rounding errors may seriously affect statistical inference. The earliest discussion of rounded data dates back at least to Sheppard (1898). More recently, Hall (1982) examined the influence of rounding errors on nonparametric density estimators. Dempster and Rubin (1983) compared three simple approaches to rounding errors in a linear regression model, namely the usual least squares estimate and the estimates with BRB and Sheppard corrections. The latter two methods were developed on the assumption that the rounding errors are uniformly distributed over a symmetric interval whose length is the rounding precision and are independent of the rounded data and of the original sample, respectively; it is not difficult, however, to show that these assumptions fail. Heitjan (1989) gave a detailed review of the most popular approaches to rounded data. Heitjan and Rubin (1991) presented a general model for coarsened data that includes rounded, heaped, censored and missing data. Few of the earlier works, apart from some exceptions that used the full MLE on grouped data, provided consistent estimators. It was not until the recent work of Lee and Vardeman (2001, 2002, 2003) that confidence interval estimation of the parameters $\mu$ and $\sigma^2$ from a rounded $N(\mu, \sigma^2)$ sample, and its extension to the balanced one-way ANOVA with random effects, was investigated. Wright and Bray (2003) showed the danger of ignoring heaping before presenting a case study of ultrasound measurements; they analyzed a mixture model for rounded data by a Bayesian approach using Gibbs sampling. The general issue of rounding continuous data has long been important to practitioners. Some recent references are Tricker et al. (1998), Vardeman (2005) and Vardeman and Lee (2005). For early discussions and bibliographies, see Fisher (1936) and Haitovsky (1982).

In this paper, we propose to use the MLE or an approximate MLE (AMLE) to estimate the unknown parameters from rounded data, and we study statistical inference when the rounded data come from time series models such as AR(p) and MA(p). The paper is organized as follows. In Section 2, we propose the AMLE for rounded data from an AR(p) or MA(p) model and prove the consistency and asymptotic normality of the new estimators. Two examples illustrating the proposed procedure are presented in Section 3. Simulation results are presented in Section 4. Comments and conclusions are given in Section 5, and technical proofs are collected in the Appendix.

2. Rounded data from the MA(p) model and the AR(p) model

2.1. MA(p) model

We consider an MA(p) model

$$X_t = c + \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_p\epsilon_{t-p},$$

where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = 1, \ldots, n$. Let $\theta = (\theta_1, \ldots, \theta_p)$. Here $(X_1, \ldots, X_n)$ is normally distributed as $N(c\mathbf 1_n, R_{n\times n})$, where $\mathbf 1_n$ is the column vector of $n$ ones, $R_{n\times n} = (\gamma_{ij})_{n\times n}$ with $\gamma_{ij} = \gamma_{|i-j|}$,

$$R_{n\times n} = \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_p & \cdots & \gamma_{n-1}\\
\gamma_1 & \gamma_0 & \cdots & \gamma_{p-1} & \cdots & \gamma_{n-2}\\
\vdots & \vdots & \ddots & \vdots & & \vdots\\
\gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_{n-p-1} & \cdots & \gamma_0
\end{pmatrix}$$

and

$$\gamma_i = \begin{cases}
\sigma^2(\theta_i + \theta_{i+1}\theta_1 + \theta_{i+2}\theta_2 + \cdots + \theta_p\theta_{p-i}), & i = 0, 1, \ldots, p,\\
0, & i > p,
\end{cases}$$

with the convention $\theta_0 = 1$.
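As a quick numerical check of the autocovariance formula above, here is a minimal sketch (NumPy and the function name are our own choices, not part of the paper):

```python
import numpy as np

def ma_autocov(theta, sigma2):
    """Autocovariances gamma_0, ..., gamma_p of an MA(p) process
    X_t = c + eps_t + theta_1 eps_{t-1} + ... + theta_p eps_{t-p},
    from gamma_i = sigma^2 * sum_j theta_j theta_{j+i}, theta_0 = 1."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))  # theta_0 = 1
    p = len(th) - 1
    return np.array([sigma2 * np.dot(th[: p + 1 - i], th[i:]) for i in range(p + 1)])

# Example: MA(1) with theta = 0.5, sigma^2 = 1 gives gamma_0 = 1.25, gamma_1 = 0.5.
print(ma_autocov([0.5], 1.0))
```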

Suppose only the rounded values $\tilde X_1, \ldots, \tilde X_n$ of $X_1, \ldots, X_n$ are observed. Without loss of generality, we assume that the rounding unit is 1, i.e. the data are rounded to integer values; otherwise we divide the data by the rounding unit. For simplicity, let

$$p_{\mathbf i} \triangleq P(i_j - 0.5 \le X_j < i_j + 0.5,\ j = 1, \ldots, p+1),$$

where $\mathbf i = (i_1, \ldots, i_{p+1})$ and the $i_j$ take integer values, and let $A_{\mathbf i}$ denote the rectangle $A_{\mathbf i} = \prod_{j=1}^{p+1}[i_j - 0.5, i_j + 0.5)$. It is obvious that the sequence $\{X_n, n = 1, 2, \ldots\}$, and hence $\{\tilde X_n, n = 1, 2, \ldots\}$, is $p$-dependent. We can therefore split the data sequence into $p+1$ subsequences of i.i.d. random variables and estimate the parameters from each subsequence; a code sketch of the splitting follows the table. We propose the following estimation approach. Define the $p+1$ sub-samples of the rounded data as follows:

(1) $\tilde X_1 \cdots \tilde X_{p+1}$; $\tilde X_{2p+2} \cdots \tilde X_{3p+2}$; $\cdots$; $\tilde X_{(m-1)(p+1)+1} \cdots \tilde X_{m(p+1)}$;
(2) $\tilde X_2 \cdots \tilde X_{p+2}$; $\tilde X_{2p+3} \cdots \tilde X_{3p+3}$; $\cdots$; $\tilde X_{(m-1)(p+1)+2} \cdots \tilde X_{m(p+1)+1}$;
$\cdots$
(p+1) $\tilde X_{p+1} \cdots \tilde X_{2p+1}$; $\tilde X_{3p+2} \cdots \tilde X_{4p+2}$; $\cdots$; $\tilde X_{m(p+1)} \cdots \tilde X_{m(p+1)+p}$;

where $m = [(n-p)/(p+1)]$.
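In code, the splitting reads as follows. This is a minimal sketch under one reading of the block indexing, namely the indexing used in the proofs of Theorems 2.1 and 2.2 below (blocks of length p + 1 starting at (t − 1)(p + 1) + j); the helper name is ours:

```python
import numpy as np

def split_subsamples(x_rounded, p):
    """Split a rounded series into p+1 sub-samples of (p+1)-dimensional blocks.

    Sub-sample j (j = 1, ..., p+1) collects the blocks
    (X~_{(t-1)(p+1)+j}, ..., X~_{(t-1)(p+1)+j+p}), t = 1, ..., m."""
    n = len(x_rounded)
    m = (n - p) // (p + 1)
    subs = []
    for j in range(1, p + 2):                        # j = 1, ..., p+1
        starts = [(t - 1) * (p + 1) + j for t in range(1, m + 1)]
        blocks = np.array([x_rounded[s - 1 : s + p] for s in starts])  # 1-based -> 0-based
        subs.append(blocks)                          # shape (m, p+1)
    return subs

# Example: n = 12, p = 1 gives m = 5 blocks of length 2 in each of 2 sub-samples.
subs = split_subsamples(np.arange(1, 13), p=1)
print(subs[0][:3])   # blocks (1,2), (3,4), (5,6) of sub-sample 1
```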


Estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$:

Step 1: Note that $(\tilde X_1 \cdots \tilde X_{p+1})$, $(\tilde X_{2p+2} \cdots \tilde X_{3p+2})$, $\ldots$, $(\tilde X_{(m-1)(p+1)+1} \cdots \tilde X_{m(p+1)})$ form a sample of $m$ i.i.d. $(p+1)$-dimensional random vectors. Denote the frequency of $\mathbf i$ in this sub-sample by $n_{\mathbf i}$. Based on this sub-sample, the MLE of the parameters $(c, \theta, \sigma^2)$ can be obtained by maximizing $\sum_{\mathbf i} n_{\mathbf i}\log p_{\mathbf i}$. We shall denote this MLE by $(\hat c_1, \hat\theta_1, \hat\sigma_1^2)$.

Step 2: Similarly, from the $j$-th sub-sample we can construct an MLE $(\hat c_j, \hat\theta_j, \hat\sigma_j^2)$, $j = 2, \ldots, p+1$, of the parameters $(c, \theta, \sigma^2)$.

Step 3: Take the averages of the MLEs as our estimators of the parameters, i.e.

$$\hat c = \sum_{i=1}^{p+1}\hat c_i/(p+1), \qquad \hat\theta = \sum_{i=1}^{p+1}\hat\theta_i/(p+1), \qquad \hat\sigma^2 = \sum_{i=1}^{p+1}\hat\sigma_i^2/(p+1).$$

We denote the AMLE by $(\hat c, \hat\theta, \hat\sigma^2)$.

Theorem 2.1. The estimates $(\hat c, \hat\theta, \hat\sigma^2)$ obtained by the proposed estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$ are consistent.

Proof. Since a Gaussian MA(p) time series is $p$-dependent and strictly stationary, the rounded blocks $(\tilde X_1 \cdots \tilde X_{p+1})$, $(\tilde X_{2p+2} \cdots \tilde X_{3p+2})$, $\ldots$, $(\tilde X_{(m-1)(p+1)+1} \cdots \tilde X_{m(p+1)})$ are independent and identically distributed. Therefore the estimate $(\hat c_1, \hat\theta_1, \hat\sigma_1^2)$ is strongly consistent. Strong consistency also holds for the other estimates $(\hat c_j, \hat\theta_j, \hat\sigma_j^2)$, $j = 2, \ldots, p+1$. Hence, as $n \to \infty$, the estimates $(\hat c, \hat\theta, \hat\sigma^2)$ obtained by the proposed estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$ are strongly consistent. □

Theorem 2.2. The estimates $(\hat c, \hat\theta, \hat\sigma^2)$ are asymptotically multivariate normally distributed, that is,

$$\sqrt n\begin{pmatrix}\hat c - c\\ \hat\theta - \theta\\ \hat\sigma^2 - \sigma^2\end{pmatrix} \sim N(0, G(h)),$$

where $h = (c, \theta, \sigma^2)^T$, $G(h) = I^{-1}(h)V_p(h)I^{-1}(h)$,

$$I(h) = \sum_{\mathbf i} p_{\mathbf i}(h)^{-1}\,\frac{\partial p_{\mathbf i}(h)}{\partial h}\frac{\partial p_{\mathbf i}(h)}{\partial h^T}$$

and

$$V_p(h) = I(h) + 2\sum_{t=2}^{p}\sum_{\mathbf i_1, \mathbf i_2} P(Y_1\in A_{\mathbf i_1}, Y_t\in A_{\mathbf i_2})\,p_{\mathbf i_1}^{-1}(h)p_{\mathbf i_2}^{-1}(h)\,\frac{\partial p_{\mathbf i_1}(h)}{\partial h}\frac{\partial p_{\mathbf i_2}(h)}{\partial h^T}.$$

Proof. Let $L_j(c, \theta, \sigma^2)$ be the log-likelihood based on the $j$-th sub-sample $(\tilde X_j \cdots \tilde X_{p+j})$, $(\tilde X_{(p+1)+j} \cdots \tilde X_{(p+1)+p+j})$, $\ldots$, $(\tilde X_{(m-1)(p+1)+j} \cdots \tilde X_{(m-1)(p+1)+p+j})$, $j = 1, \ldots, p+1$. The Taylor expansion of $\partial L_j(h)/\partial h$ at $h$ gives

$$0 = \left.\frac{\partial L_j(h)}{\partial h}\right|_{\hat h_j} = \frac{\partial L_j(h)}{\partial h} + \left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}(\hat h_j - h),$$

where $\hat h_j^*$ is a point on the segment connecting $h$ and $\hat h_j$. Then we obtain

$$\frac{1}{p+1}\sum_{j=1}^{p+1}(\hat h_j - h) = -\left\{\frac{1}{p+1}\sum_{j=1}^{p+1}\left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}\right\}^{-1}\frac{1}{p+1}\sum_{j=1}^{p+1}\frac{\partial L_j(h)}{\partial h}.$$

That is,

$$\sqrt n(\hat h - h) \approx \frac{1}{p+1}\sum_{j=1}^{p+1}\left\{-\frac 1m\left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}\right\}^{-1}\cdot\frac{1}{\sqrt m}\,\frac{\partial L_j(h)}{\partial h}.$$

Since $\hat h_j$ is consistent, we can prove that $-(1/m)\,\partial^2 L_j(h)/\partial h\,\partial h^T|_{\hat h_j^*} \to I(h)$ a.s., where $I(h)$ is the Fisher information matrix of $h$ based on the joint pdf of $(\tilde X_1 \cdots \tilde X_{p+1})$. Let $Y_j = (\tilde X_j, \ldots, \tilde X_{p+j})$ for $j = 1, \ldots, n-p$. Using this notation, we have

$$\sqrt n(\hat h - h) \approx \frac{I^{-1}(h)}{\sqrt n}\sum_{j=1}^{n-p}\sum_{\mathbf i} I_{(Y_j\in A_{\mathbf i})}\,p_{\mathbf i}(h)^{-1}\frac{\partial p_{\mathbf i}(h)}{\partial h}.$$


Note that the above is a normalized sum of a $p$-dependent sequence, and hence the CLT holds. Thus, to complete the proof of the theorem, one needs only to compute the asymptotic variance. By a standard central limit theorem, one can show that

$$\sqrt n(\hat h - h) \xrightarrow{L} N(0, I^{-1}(h)V_p(h)I^{-1}(h)). \qquad \square$$

Remark 2.1. When $p = 0$ or 1, our method works well. When $p \ge 2$, the computation of the AMLE becomes time consuming, even though the theoretical results still hold. The computational problem is solved in Zhang et al. (2009).

2.2. AR(1) model

Suppose $\{X_t\}$ is a causal AR(p) process,

$$X_{t+1} = c + \phi_1 X_t + \cdots + \phi_p X_{t-p+1} + \epsilon_{t+1},$$

where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = 1, \ldots, n$. The causality assumption implies that $1 - \phi_1 z - \cdots - \phi_p z^p \ne 0$ for $|z| \le 1$. Then $(X_1, \ldots, X_n)$ is normally distributed as $N(\mu\mathbf 1_n, R_{n\times n})$, where $\mu = c/(1 - \phi_1 - \cdots - \phi_p)$.
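The causality condition can be checked numerically from the roots of the AR polynomial; a minimal sketch (the helper is ours, not from the paper):

```python
import numpy as np

def is_causal(phi):
    """Check the causality condition for an AR(p) model:
    1 - phi_1 z - ... - phi_p z^p must have no roots with |z| <= 1,
    i.e. all roots of the AR polynomial lie outside the unit circle."""
    # np.roots expects coefficients from highest degree to the constant term:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.concatenate((-np.asarray(phi, dtype=float)[::-1], [1.0]))
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(is_causal([0.5]))   # True: the root z = 2 lies outside the unit circle
print(is_causal([1.2]))   # False: the root z = 1/1.2 lies inside
```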

Now assume that only data rounded to integer values are available. That is, $\tilde X_1, \ldots, \tilde X_n$ are observed, where $\tilde X_i$ is the value of $X_i$ rounded to its nearest integer, $i = 1, \ldots, n$. If the rounding errors are ignored, the usual estimates, such as those obtained from the Yule–Walker equations, are not consistent; simulation evidence is given in Section 4. In this section we look for better estimates that are consistent and asymptotically normal. Our problem is to find new estimates of the parameter vector $(c, \phi_1, \ldots, \phi_p)$ and the innovation variance $\sigma^2$. We propose a new procedure, called the AMLE, to estimate the parameters $(c, \phi_1, \ldots, \phi_p, \sigma^2)$ from the observed rounded data. For simplicity, we only treat the case $p = 1$, that is, the AR(1) model; estimation procedures for the general case $p \ge 2$ may be derived similarly. Consider a causal AR(1) model

$$X_{t+1} = c + \phi X_t + \epsilon_{t+1},$$

where $|\phi| < 1$ and $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = 1, \ldots, n-1$. With this, we have $(X_1, \ldots, X_n) \sim N(\mu\mathbf 1_n, R_{n\times n})$, where $\mu = c/(1-\phi)$ and

$$R_{n\times n} = \frac{\sigma^2}{1-\phi^2}\begin{pmatrix}
1 & \phi & \phi^2 & \cdots & \phi^{n-1}\\
\phi & 1 & \phi & \cdots & \phi^{n-2}\\
\phi^2 & \phi & 1 & \cdots & \phi^{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1
\end{pmatrix}. \tag 1$$

More specifically, we have

$$\begin{pmatrix}X_t\\ X_{t+1}\end{pmatrix} \sim N\left(\begin{pmatrix}\mu\\ \mu\end{pmatrix},\ \frac{\sigma^2}{1-\phi^2}\begin{pmatrix}1 & \phi\\ \phi & 1\end{pmatrix}\right),$$

where $\phi$ is the correlation coefficient of $X_t$ and $X_{t+1}$. Then for any integers $i$ and $j$, the probability of the event $\{\tilde x_1 = i, \tilde x_2 = j\}$ is

$$\begin{aligned}
p_{ij} &\triangleq P(\tilde x_1 = i, \tilde x_2 = j) = P(i - 0.5 \le X_1 < i + 0.5,\ j - 0.5 \le X_2 < j + 0.5)\\
&= \int_{i-0.5}^{i+0.5}\int_{j-0.5}^{j+0.5}\frac{1}{2\pi|R_{2\times 2}|^{0.5}}\exp\left\{-0.5\begin{pmatrix}x_1-\mu\\ x_2-\mu\end{pmatrix}^T R_{2\times 2}^{-1}\begin{pmatrix}x_1-\mu\\ x_2-\mu\end{pmatrix}\right\}dx_2\,dx_1\\
&= \int_{i-0.5-\mu}^{i+0.5-\mu}\int_{j-0.5-\mu}^{j+0.5-\mu}\frac{\sqrt{1-\phi^2}}{2\pi\sigma^2}\exp\left\{-\frac{x_1^2 + x_2^2 - 2\phi x_1 x_2}{2\sigma^2}\right\}dx_2\,dx_1. \tag 2
\end{aligned}$$
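The double integral in (2) is a bivariate normal rectangle probability, so it can be computed from the joint CDF by inclusion and exclusion. A minimal sketch (the use of SciPy's multivariate_normal.cdf and the helper name p_ij are our choices; the paper does not prescribe a method):

```python
import numpy as np
from scipy.stats import multivariate_normal

def p_ij(i, j, mu, phi, sigma2):
    """Cell probability p_ij = P(i-0.5 <= X_1 < i+0.5, j-0.5 <= X_2 < j+0.5)
    for consecutive observations of a causal Gaussian AR(1) process,
    via inclusion-exclusion on the bivariate normal CDF."""
    cov = sigma2 / (1.0 - phi**2) * np.array([[1.0, phi], [phi, 1.0]])
    F = lambda a, b: multivariate_normal.cdf([a, b], mean=[mu, mu], cov=cov)
    return (F(i + 0.5, j + 0.5) - F(i + 0.5, j - 0.5)
            - F(i - 0.5, j + 0.5) + F(i - 0.5, j - 0.5))

# Sanity check: at phi = 0.5, mu = 0, sigma^2 = 1 the cell probabilities sum to 1.
total = sum(p_ij(i, j, 0.0, 0.5, 1.0) for i in range(-6, 7) for j in range(-6, 7))
print(total)  # approximately 1
```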

As $|\phi| < 1$, $|\phi|^k$ can be made as small as desired by choosing $k$ large. That is, when $k$ is large enough, $(X_i, X_{i+1})$ and $(X_{i+k}, X_{i+k+1})$ are approximately independent, so the rounded pairs $(\tilde X_i, \tilde X_{i+1})$ and $(\tilde X_{i+k}, \tilde X_{i+k+1})$ are also approximately independent. For example, when $\phi = 0.3$, $\phi^{10} = 5.90\times 10^{-6}$; hence, when $\phi = \pm 0.3$, $(\tilde X_i, \tilde X_{i+1})$ and $(\tilde X_{i+11}, \tilde X_{i+12})$ may be considered mutually independent. Similarly, $0.5^{20} = 9.54\times 10^{-7}$, $0.75^{40} = 1.0\times 10^{-5}$ and $0.9^{100} = 2.66\times 10^{-5}$.

Based on the above argument, we propose the following estimation procedure to obtain the AMLE of the parameters $c$, $\phi$ and $\sigma^2$. Assume that $k$ is a large integer. Split the rounded data into the following $k$ subgroups of size $m = [(n-1)/k]$ (overlapping of groups is allowed):

(1) $(\tilde X_1, \tilde X_2), (\tilde X_{k+1}, \tilde X_{k+2}), \ldots, (\tilde X_{(m-1)k+1}, \tilde X_{(m-1)k+2})$;
(2) $(\tilde X_2, \tilde X_3), (\tilde X_{k+2}, \tilde X_{k+3}), \ldots, (\tilde X_{(m-1)k+2}, \tilde X_{(m-1)k+3})$;
$\cdots$
(k) $(\tilde X_k, \tilde X_{k+1}), (\tilde X_{2k}, \tilde X_{2k+1}), \ldots, (\tilde X_{mk}, \tilde X_{mk+1})$.

Estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$:

Step 1: We treat $(\tilde X_1, \tilde X_2), (\tilde X_{k+1}, \tilde X_{k+2}), \ldots, (\tilde X_{(m-1)k+1}, \tilde X_{(m-1)k+2})$ as if it were a sample of i.i.d. two-dimensional random vectors and maximize the approximate log-likelihood $\sum_{ij} n_{ij}^{(1)}\log p_{ij}$. We then obtain an AMLE of the parameters $(c, \phi, \sigma^2)$, denoted by $(\hat c_1, \hat\phi_1, \hat\sigma_1^2)$, where $n_{ij}^{(1)}$ is the frequency of $(i, j)$ in the sub-sample.

Step 2: Similarly, from the $j$-th subgroup of the data we obtain an AMLE of $(c, \phi, \sigma^2)$, denoted by $(\hat c_j, \hat\phi_j, \hat\sigma_j^2)$, $j = 2, \ldots, k$.

Step 3: Set the estimators of $(c, \phi, \sigma^2)$ to the averages of these AMLEs, i.e.

$$\hat c = \sum_{i=1}^{k}\hat c_i/k, \qquad \hat\phi = \sum_{i=1}^{k}\hat\phi_i/k, \qquad \hat\sigma^2 = \sum_{i=1}^{k}\hat\sigma_i^2/k.$$

A computational sketch of Steps 1 through 3 is given below.
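The steps can be carried out by direct numerical maximization of the approximate log-likelihood rather than by solving the estimating equations displayed next in closed form. A minimal sketch: the optimizer, starting values and parameter-space clipping are our own choices, and the internal cell probability mirrors the p_ij sketch above; the paper prescribes only the objective.

```python
import numpy as np
from collections import Counter
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def amle_ar1_subsample(pairs):
    """Step 1: AMLE of (c, phi, sigma^2) from one subgroup of rounded pairs,
    found by numerically maximizing sum_ij n_ij log p_ij."""
    counts = Counter(map(tuple, pairs))   # n_ij: frequency of each cell (i, j)

    def cell_prob(i, j, mu, phi, sigma2):
        # Rectangle probability, as in the p_ij sketch above.
        cov = sigma2 / (1.0 - phi**2) * np.array([[1.0, phi], [phi, 1.0]])
        F = lambda a, b: multivariate_normal.cdf([a, b], mean=[mu, mu], cov=cov)
        return (F(i + .5, j + .5) - F(i + .5, j - .5)
                - F(i - .5, j + .5) + F(i - .5, j - .5))

    def neg_loglik(par):
        c, phi, sigma2 = par
        if abs(phi) >= 0.999 or sigma2 <= 1e-6:      # stay in the parameter space
            return np.inf
        mu = c / (1.0 - phi)
        return -sum(n * np.log(max(cell_prob(i, j, mu, phi, sigma2), 1e-300))
                    for (i, j), n in counts.items())

    return minimize(neg_loglik, x0=[0.0, 0.2, 1.0], method="Nelder-Mead").x

def amle_ar1(x_rounded, k):
    """Steps 1-3: subgroup AMLEs averaged over the k subgroups."""
    m = (len(x_rounded) - 1) // k
    ests = [amle_ar1_subsample([(x_rounded[t * k + j], x_rounded[t * k + j + 1])
                                for t in range(m)])
            for j in range(k)]
    return np.mean(ests, axis=0)   # (c_hat, phi_hat, sigma2_hat)
```

The estimating equations that follow characterize the same maximizer.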

We denote the AMLE by $(\hat c, \hat\phi, \hat\sigma^2)$. To compute the AMLE, consider Step 1. The estimating equations are

$$\sum_{ij}\frac{n_{ij}^{(1)}}{p_{ij}}\frac{\partial p_{ij}}{\partial c} = 0, \qquad \sum_{ij}\frac{n_{ij}^{(1)}}{p_{ij}}\frac{\partial p_{ij}}{\partial \phi} = 0 \qquad\text{and}\qquad \sum_{ij}\frac{n_{ij}^{(1)}}{p_{ij}}\frac{\partial p_{ij}}{\partial \sigma^2} = 0,$$

where

$$\frac{\partial p_{ij}}{\partial c} = \int_{i-0.5-\mu}^{i+0.5-\mu}\int_{j-0.5-\mu}^{j+0.5-\mu}\frac{\sqrt{1-\phi^2}\,(x_1+x_2)}{2\pi\sigma^4}\exp\left\{-\frac{x_1^2+x_2^2-2\phi x_1x_2}{2\sigma^2}\right\}dx_2\,dx_1, \tag 3$$

$$\frac{\partial p_{ij}}{\partial \phi} = \int_{i-0.5-\mu}^{i+0.5-\mu}\int_{j-0.5-\mu}^{j+0.5-\mu}\left(\frac{\sqrt{1-\phi^2}\,x_1x_2}{\sigma^4} - \frac{\phi}{\sigma^2\sqrt{1-\phi^2}}\right)\frac{1}{2\pi}\exp\left\{-\frac{x_1^2+x_2^2-2\phi x_1x_2}{2\sigma^2}\right\}dx_2\,dx_1$$

and

$$\frac{\partial p_{ij}}{\partial \sigma^2} = \int_{i-0.5-\mu}^{i+0.5-\mu}\int_{j-0.5-\mu}^{j+0.5-\mu}\left(\frac{\sqrt{1-\phi^2}\,(x_1^2+x_2^2-2\phi x_1x_2)}{2\sigma^6} - \frac{\sqrt{1-\phi^2}}{\sigma^4}\right)\frac{1}{2\pi}\exp\left\{-\frac{x_1^2+x_2^2-2\phi x_1x_2}{2\sigma^2}\right\}dx_2\,dx_1.$$

Theorem 2.3. If $k = k_n \to \infty$ with $k_n = o(n^{2/3})$, then the AMLE $(\hat c, \hat\phi, \hat\sigma^2)$ obtained by the proposed estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$ is weakly consistent. The AMLE is also strongly consistent if $-(1+\delta)\log n/\log|\phi| \le k \le n^{(1-\delta)/3}$ for some positive constant $\delta$.

Proof. It is easy to verify that $h = h_0$ is the only maximizer of $1/m$ times the expected approximate likelihood $\sum_{ij} p_{ij}(h_0)\log p_{ij}(h)$, where $h_0$ is the true parameter value. Thus, to show the consistency of $\hat h$, we only need to show that

$$\max_{l\le k}\ \sup_{h\in H}\Big|\sum_{ij}\big(\hat p_{ij}^{(l)} - p_{ij}(h_0)\big)\log p_{ij}(h)\Big| \xrightarrow{P} 0,$$

where $\hat p_{ij}^{(l)} = n_{ij}^{(l)}/m$ and $H$ is a compact set containing $h_0$ as an inner point. By expression (2), we have

$$\sup_{h\in H}|\log p_{ij}(h)| \le K(i^2+j^2+1) \tag 4$$

for some constant $K$ which may depend on the choice of $H$. Denote by $I_{ij}^{(l)}(t)$ the indicator function that the $t$-th pair in the $l$-th sub-sample equals $(i, j)$, and write $I^{(l)}(t_1, \ldots, t_s) = \prod_{u=1}^{s} I^{(l)}_{i_u, j_u}(t_u)$, where $t_1, \ldots, t_s$ need not be distinct. By elementary calculus, for any fixed


$h \in H$, we have

$$\begin{aligned}
E\Big(\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h)\Big)^4
&= \frac{1}{m^4}\sum_{i_1,j_1}\cdots\sum_{i_4,j_4}\ \prod_{u=1}^{4}\log(p_{i_uj_u}(h))\sum_{t_1,t_2,t_3,t_4}EI^{(l)}(t_1,t_2,t_3,t_4)\\
&\quad-\frac{4}{m^3}\sum_{i_1,j_1}\cdots\sum_{i_4,j_4}\ \prod_{u=1}^{4}\log(p_{i_uj_u}(h))\,p_{i_4j_4}(h_0)\sum_{t_1,t_2,t_3}EI^{(l)}(t_1,t_2,t_3)\\
&\quad+\frac{6}{m^2}\sum_{i_1,j_1}\cdots\sum_{i_4,j_4}\ \prod_{u=1}^{4}\log(p_{i_uj_u}(h))\,p_{i_1j_1}(h_0)p_{i_2j_2}(h_0)\sum_{t_3,t_4}EI^{(l)}(t_3,t_4)\\
&\quad-\frac{4}{m}\sum_{i_1,j_1}\cdots\sum_{i_4,j_4}\ \prod_{u=1}^{4}\log(p_{i_uj_u}(h))\,p_{i_1j_1}(h_0)p_{i_2j_2}(h_0)p_{i_3j_3}(h_0)\sum_{t_4}EI^{(l)}(t_4)\\
&\quad+\Big(\sum_{i,j}\log(p_{ij}(h))\,p_{ij}(h_0)\Big)^4, \tag 5
\end{aligned}$$

where $(1/m)\sum_{t_4}EI^{(l)}(t_4) = p_{i_4,j_4}(h_0)$. Note that

$$\frac{1}{m^2}\sum_{t_3,t_4}EI^{(l)}(t_3,t_4) = \frac{1}{m^2}\Big(mp_{i_3,j_3}(h_0)I_{(i_3=i_4,j_3=j_4)} + m(m-1)p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)\Big) + R(3,4),$$

where $R(3,4)$ collects, over $t_3 \ne t_4$, the integrals $\int_{i_1-0.5}^{i_1+0.5}\int_{j_1-0.5}^{j_1+0.5}\int_{i_2-0.5}^{i_2+0.5}\int_{j_2-0.5}^{j_2+0.5} g(x_1,x_2,x_3,x_4;t_1,t_2)\,dx_1\,dx_2\,dx_3\,dx_4$ and $g(x_1,x_2,x_3,x_4;t_1,t_2)$ is the difference between the joint density of the $t_3$-th and $t_4$-th pairs in the $l$-th sub-sample and the product of their marginal densities. Using Lemma 1 in the Appendix, we have $|g(x_1,x_2,x_3,x_4;t_1,t_2)| \le Kg(x_1,x_2)g(x_3,x_4)|\phi|^k$, where $g(x_1,x_2)$ is the joint density of $(X_1, X_2)$. Hence $|R(3,4)| \le K|\phi|^k p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)$. Similarly, we have

$$\frac{1}{m^3}\sum_{t_2,t_3,t_4}EI^{(l)}(t_2,t_3,t_4) = \frac{1}{m^3}\big[mp_{i_3,j_4}(h_0)I_{(i_2=i_3=i_4,j_2=j_3=j_4)} + 3m(m-1)p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)I_{(i_3=i_4,j_3=j_4)} + m(m-1)(m-2)p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)\big] + R(2,3,4),$$

$$\begin{aligned}
\frac{1}{m^4}\sum_{t_1,t_2,t_3,t_4}EI^{(l)}(t_1,t_2,t_3,t_4) = \frac{1}{m^4}\big[&mp_{i_1,j_1}(h_0)I_{(i_1=i_2=i_3=i_4,\,j_1=j_2=j_3=j_4)} + 4m(m-1)p_{i_1,j_1}(h_0)p_{i_2,j_2}(h_0)I_{(i_2=i_3=i_4,\,j_2=j_3=j_4)}\\
&+ 3m(m-1)p_{i_1,j_1}(h_0)p_{i_2,j_2}(h_0)I_{(i_1=i_3,i_2=i_4,\,j_1=j_3,j_2=j_4)}\\
&+ 6m(m-1)(m-2)p_{i_1,j_1}(h_0)p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)I_{(i_3=i_4,\,j_3=j_4)}\\
&+ m(m-1)(m-2)(m-3)p_{i_1,j_1}(h_0)p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)\big] + R(1,2,3,4),
\end{aligned}$$

where $|R(2,3,4)| \le |\phi|^k p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)$ and $|R(1,2,3,4)| \le |\phi|^k p_{i_1,j_1}(h_0)p_{i_2,j_2}(h_0)p_{i_3,j_3}(h_0)p_{i_4,j_4}(h_0)$. Substituting these estimates into (5), we obtain

$$E\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h)\Big|^4 \le K(m^{-2}+|\phi|^k)\Bigg[\Big(\sum_{ij}p_{ij}(h_0)|\log(p_{ij}(h))|\Big)^4 + \sum_{ij}p_{ij}(h_0)|\log(p_{ij}(h))|^4\Bigg]$$

for some constant $K$. Applying (4), we obtain

$$E\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h)\Big|^4 = O(m^{-2}+|\phi|^k).$$


By the same arguments, one can prove that for any $h_1, h_2 \in H$,

$$E\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\big(\log p_{ij}(h_1)-\log p_{ij}(h_2)\big)\Big|^4 \le K(m^{-2}+|\phi|^k)\Bigg[\Big(\sum_{ij}p_{ij}(h_0)\Big|\log\frac{p_{ij}(h_1)}{p_{ij}(h_2)}\Big|\Big)^4 + \sum_{ij}p_{ij}(h_0)\Big|\log\frac{p_{ij}(h_1)}{p_{ij}(h_2)}\Big|^4\Bigg].$$

Using the partial derivatives of $\log p_{ij}(h)$ given in (3), it can be shown that

$$E\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\big(\log p_{ij}(h_1)-\log p_{ij}(h_2)\big)\Big|^4 \le K(m^{-2}+|\phi|^k)\,\|h_1-h_2\|^4 \tag 6$$

for the same constant $K$. Now let us select a dense net of $H$ as follows:

(1) Split $H$ into cubes of edge length 1; denote the cubes by $T^{(1)}(j)$ and their centers by $h^{(1)}_j$, $j = 1, \ldots, N$.
(2) For each $\nu \ge 1$, split each cube $T^{(\nu)}_{j_1,\ldots,j_\nu}$ into $a^3$ smaller cubes $T^{(\nu+1)}_{j_1,\ldots,j_\nu,j_{\nu+1}}$ of edge length $a^{-\nu}$, where $a$ is an integer not less than 4.
(3) Here $j_1 = 1, \ldots, N$ and $j_\nu = 1, \ldots, a^3$ for all $\nu \ge 2$.

From this, with $\varepsilon_\nu > 0$ chosen so that $\sum_{\nu\ge 1}\varepsilon_\nu \le \varepsilon$, we obtain

$$\begin{aligned}
&P\Bigg(\sup_{h\in H}\max_{l\le k}\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h)\Big| \ge \varepsilon\Bigg)\\
&\quad\le \sum_{l=1}^{k}\sum_{j=1}^{N}P\Bigg(\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h^{(1)}_j)\Big| \ge \varepsilon_1/2\Bigg)\\
&\qquad+\sum_{l=1}^{k}\sum_{\nu=2}^{\Lambda}\sum_{j_1=1}^{N}\ \sum_{j_2,\ldots,j_\nu=1}^{a^3}P\Bigg(\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log\frac{p_{ij}(h^{(\nu)}_{j_1,\ldots,j_\nu})}{p_{ij}(h^{(\nu-1)}_{j_1,\ldots,j_{\nu-1}})}\Big| \ge \varepsilon_\nu/2\Bigg)\\
&\qquad+\sum_{l=1}^{k}\sum_{j_1=1}^{N}\ \sum_{j_2,\ldots,j_\Lambda=1}^{a^3}P\Bigg(\sup_{h\in T^{(\Lambda)}_{j_1,\ldots,j_\Lambda}}\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log\frac{p_{ij}(h)}{p_{ij}(h^{(\Lambda)}_{j_1,\ldots,j_\Lambda})}\Big| \ge \varepsilon_\Lambda/2\Bigg)\\
&\quad\le Nk(m^{-2}+|\phi|^k)\sum_{\nu=1}^{\Lambda}\varepsilon_\nu^{-4}a^{-(\nu-1)} + \sum_{l=1}^{k}\sum_{j_1=1}^{N}\ \sum_{j_2,\ldots,j_\Lambda=1}^{a^3}P\Bigg(\sup_{h\in T^{(\Lambda)}_{j_1,\ldots,j_\Lambda}}\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log\frac{p_{ij}(h)}{p_{ij}(h^{(\Lambda)}_{j_1,\ldots,j_\Lambda})}\Big| \ge \varepsilon_\Lambda/2\Bigg).
\end{aligned}$$

When $\Lambda$ is large enough (possibly depending on $n$), the last term above is zero. Thus,

$$P\Bigg(\sup_{h\in H}\max_{l\le k}\Big|\sum_{ij}\big(\hat p_{ij}^{(l)}-p_{ij}(h_0)\big)\log p_{ij}(h)\Big| \ge \varepsilon\Bigg) \le Kk(m^{-2}+|\phi|^k).$$

From this we conclude that the required uniform convergence holds in probability or almost surely if

$$k \to \infty \quad\text{and}\quad k = o(n^{2/3}) \tag 7$$

or, for some $\delta \in (0, 1)$,

$$(1+\delta)\log n/|\log|\phi||\ \le\ k\ \le\ n^{(1-\delta)/3}, \tag 8$$

respectively. Consequently, the AMLE is weakly or strongly consistent under the respective conditions. □

Theorem 2.4. If $k = k_n$ is selected such that (8) holds, then the AMLE $(\hat c, \hat\phi, \hat\sigma^2)$ is asymptotically multivariate normally distributed, that is,

$$\sqrt{\frac nm}\begin{pmatrix}\hat c - c\\ \hat\phi - \phi\\ \hat\sigma^2 - \sigma^2\end{pmatrix} \sim N\big(0,\ I^{-1}(h)V_p(h)I^{-1}(h)\big),$$

where $I(h)$ is as in Theorem 2.2 and

$$V_p(h) = I(h) + 2\sum_{\mathbf i_1,\mathbf i_2}\sum_{j=1}^{\infty}P(Y_1\in A_{\mathbf i_1}, Y_{j+1}\in A_{\mathbf i_2})\,p^{-1}_{\mathbf i_1}(h)p^{-1}_{\mathbf i_2}(h)\,\frac{\partial p_{\mathbf i_1}(h)}{\partial h}\frac{\partial p_{\mathbf i_2}(h)}{\partial h^T}.$$

2.3. General AR(p) model

Consider the general AR(p) model

$$X_t = c + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t,$$

where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = p+1, \ldots, n$. Let $h = (c, \phi_1, \ldots, \phi_p, \sigma^2)^T = (c, \phi, \sigma^2)^T$. Here only the rounded data $\tilde X_1, \ldots, \tilde X_n$ are available. We propose the following estimator $(\hat c, \hat\phi, \hat\sigma^2)$; more simulation studies of its properties would be worthwhile. The sample is split into the following $k$ groups:

(1) $\tilde X_1 \cdots \tilde X_{p+1}$; $\tilde X_{k+1} \cdots \tilde X_{k+p+1}$; $\cdots$; $\tilde X_{(m-1)k+1} \cdots \tilde X_{(m-1)k+p+1}$;
(2) $\tilde X_2 \cdots \tilde X_{p+2}$; $\tilde X_{k+2} \cdots \tilde X_{k+p+2}$; $\cdots$; $\tilde X_{(m-1)k+2} \cdots \tilde X_{(m-1)k+p+2}$;
$\cdots$
(k) $\tilde X_k \cdots \tilde X_{k+p}$; $\tilde X_{2k} \cdots \tilde X_{2k+p}$; $\cdots$; $\tilde X_{mk} \cdots \tilde X_{mk+p}$;

where $m = [(n-p)/k]$.

Estimation procedure based on $\tilde X_1, \ldots, \tilde X_n$:

Step 1: We treat $(\tilde X_1 \cdots \tilde X_{p+1}), (\tilde X_{k+1} \cdots \tilde X_{k+p+1}), \ldots, (\tilde X_{(m-1)k+1} \cdots \tilde X_{(m-1)k+p+1})$ as a sample of i.i.d. $(p+1)$-dimensional random vectors and maximize the approximate log-likelihood $\sum_{\mathbf i} n^{(1)}_{\mathbf i}\log p_{\mathbf i}$. We then obtain an AMLE of the parameters $(c, \phi, \sigma^2)$, denoted by $(\hat c_1, \hat\phi_1, \hat\sigma_1^2)$, where $n^{(1)}_{\mathbf i}$ is the frequency of $\mathbf i$ in the sub-sample.

Step 2: Similarly, from the $j$-th subgroup of the data we obtain an AMLE of $(c, \phi, \sigma^2)$, denoted by $(\hat c_j, \hat\phi_j, \hat\sigma_j^2)$, $j = 2, \ldots, k$.

Step 3: Set the estimators of $(c, \phi, \sigma^2)$ to the averages of these approximate MLEs, i.e.

$$\hat c = \sum_{i=1}^{k}\hat c_i/k, \qquad \hat\phi = \sum_{i=1}^{k}\hat\phi_i/k, \qquad \hat\sigma^2 = \sum_{i=1}^{k}\hat\sigma_i^2/k.$$

Remark 2.2. The consistency and asymptotic normality of the resulting estimates can be established in the same way as in Theorems 2.1 and 2.2. However, when p ≥ 2, the computation of the AMLE of the parameters is very time consuming. A less time-consuming but slightly less efficient AMLE (the snake-chopping method) is proposed in Zhang et al. (2009).

3. Examples

Example 3.1. We reanalyze the data of Series C from the book Time Series Analysis: Forecasting and Control (Box et al., 1994, third ed., p. 544) to illustrate our estimation procedure. The 226 data points are chemical process temperature readings, recorded once per minute. From Series C we can see that the 226 temperature readings take values to at most one digit after the decimal point. But temperature is continuous, so the 226 data points are in fact rounded data. Directly applying conventional methods to the rounded data may cause serious errors, as discussed in the Introduction. To avoid these problems, we apply our estimation procedure for rounded data. As pointed out in the book (p. 189), Series C is well described, after taking a first difference, by an AR(1) process with φ ≈ 0.8; that is, the data satisfy the model

$$\nabla x_t = \phi\,\nabla x_{t-1} + \epsilon_t,$$

where ∇ denotes the first difference, ∇x_t = x_t − x_{t−1}. Using our estimation procedure for rounded data, the estimate of φ is 0.716, whereas the MLE reported in the book, which assumes no rounding, is 0.8; the two estimates differ by 0.084. The difference arises from whether conventional methods are applied directly to the rounded data or not. As noted in the Introduction, directly applying conventional methods to rounded data often leads to inconsistent estimates, whereas our procedure yields estimators that are consistent and asymptotically normal. A code sketch of this analysis is given after Example 3.2.

Remark 3.1. The data given in Box et al. (1994) are rounded values of the original time series before differencing. For simplicity, we have assumed in Example 3.1 that the rounding was performed on the differenced sequence, since the density of the time series is not easy to express for an ARI(1, 1) series. We have not figured out how to deal with rounding before differencing; this is left for further investigation.

Example 3.2. In finance, each observation in the CRSP monthly value-weighted return series is the return on a portfolio of NYSE stocks, with weights based on the relative market values of those stocks at the beginning of each month. At the beginning of the next month, a new set of weights is chosen to reflect the new composition of the stock market. The market value of a stock is the price of a share times the number of shares outstanding; its weight in the portfolio is its market value divided by the total value of the stock market. Portfolio weights based on market values change for two reasons: (1) stock prices change and (2) the numbers of shares change. Weight changes due to price movements occur automatically, without any need to adjust the shares held. Changes in the number of shares outstanding require rebalancing the portfolio (an adjustment of the number of shares held) to maintain the weights; rebalancing is required whenever there is entry or exit, or a stock issue or repurchase. The finance module of the S-plus software lists several examples of AR(1) models with rounded data; for example, the monthly return rate of the CRSP value-weighted index is an autoregressive process of order one, AR(1), while the recorded monthly return rate is a rounded datum.
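A minimal sketch of the Example 3.1 analysis as we read it: the file name seriesC.txt and the choice k = 5 are our own assumptions, and amle_ar1 is the helper sketched in Section 2.2; the paper reports only the resulting estimate.

```python
import numpy as np

# Hypothetical file of Series C readings, one value per line (name assumed).
x = np.loadtxt("seriesC.txt")

# First-difference, then express on the 0.1-degree rounding grid so that the
# data become integers, as assumed in Section 2.1.
d = np.rint(np.diff(x) / 0.1).astype(int)

# AMLE with k = 5 subgroups (our choice), reusing amle_ar1 from Section 2.2.
c_hat, phi_hat, sigma2_hat = amle_ar1(d, k=5)
print(phi_hat)   # the paper reports 0.716 for Series C
```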

4. Simulation results

In this section, for simplicity, the AR(1) model

$$X_t = \phi X_{t-1} + \epsilon_t,$$

where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = 1, \ldots, n$, is used for the simulations. $X_1, \ldots, X_n$ is simulated from the AR(1) model, and $\tilde X_1, \ldots, \tilde X_n$ denotes the corresponding rounded data. Different configurations of the parameters are used to generate the samples. The estimates obtained by the AMLE procedure above, which accounts for rounding, and the corresponding square roots of the mean square errors (√MSE) are listed in Table 1. The simulation is based on 200 repetitions. Q–Q plots based on the 200 estimates of φ are given in Figs. 1–8, and Fig. 9 plots the MSEs of φ̂ and σ̂² against k, with the solid and dotted curves denoting the MSEs of φ̂ and σ̂², respectively. For comparison, Yule–Walker estimators of φ and σ that ignore rounding are also given in Table 1, based on 5000 simulations.

Table 1. Simulation results for AR(1).

(φ, σ)       Yule–Walker estimators                  AMLE
             n       φ̂ (√MSE)       σ̂ (√MSE)        n (k)        φ̂ (√MSE)       σ̂ (√MSE)
(0.3, 1.0)   500     0.277 (0.049)   1.043 (0.054)   500 (5)      0.300 (0.051)   0.999 (0.035)
                                                     500 (10)     0.298 (0.053)   0.989 (0.039)
(0.5, 1.0)   500     0.468 (0.052)   1.049 (0.059)   500 (5)      0.499 (0.042)   0.990 (0.035)
             1000    0.469 (0.042)   1.050 (0.055)   1000 (10)    0.492 (0.032)   0.991 (0.028)
             2000    0.470 (0.036)   1.050 (0.052)   2000 (20)    0.493 (0.022)   0.993 (0.017)
             10 000  0.470 (0.030)   1.050 (0.051)   10 000 (20)  0.499 (0.010)   0.998 (0.007)
(0.75, 1.0)  1000    0.722 (0.036)   1.063 (0.067)   1000 (5)     0.738 (0.026)   0.988 (0.028)
             2000    0.723 (0.031)   1.062 (0.064)   1000 (10)    0.732 (0.030)   0.987 (0.030)
                                                     2000 (5)     0.744 (0.017)   0.990 (0.022)
(0.9, 1.0)   1000    0.883 (0.022)   1.075 (0.079)   1000 (5)     0.842 (0.080)   0.972 (0.062)
                                                     1000 (10)    0.804 (0.114)   0.989 (0.067)

[Figs. 1–8: Q–Q plots of √(n/m)(φ̂ − φ) against the quantiles of the standard normal, based on the 200 estimates. Fig. 1: n = 500, k = 5, φ = 0.3. Fig. 2: n = 500, k = 5, φ = 0.5. Fig. 3: n = 1000, k = 20, φ = 0.5. Fig. 4: n = 10 000, k = 20, φ = 0.5. Fig. 5: n = 1000, k = 5, φ = 0.75. Fig. 6: n = 1000, k = 10, φ = 0.75. Fig. 7: n = 2000, k = 5, φ = 0.75. Fig. 8: n = 1000, k = 10, φ = 0.9. Fig. 9: MSE against k for n = 1000 (solid: φ̂; dotted: σ̂²).]

Table 2. Simulation results for MA(1).

(θ, σ)       Moment estimators                     AMLE
             n    θ̂ (√MSE)       σ̂ (√MSE)         n    θ̂ (√MSE)       σ̂ (√MSE)
(0.3, 1.0)   50   0.275 (0.085)   1.040 (0.066)    50   0.316 (0.231)   0.934 (0.119)
             200  0.287 (0.154)   1.026 (0.108)    200  0.318 (0.103)   0.991 (0.062)
(0.5, 1.0)   50   0.431 (0.185)   1.036 (0.115)    50   0.489 (0.220)   0.942 (0.096)
             200  0.455 (0.117)   1.046 (0.074)    200  0.501 (0.123)   0.993 (0.062)
(0.75, 1.0)  50   0.586 (0.246)   1.070 (0.137)    50   0.703 (0.299)   0.976 (0.079)
             200  0.635 (0.172)   1.069 (0.100)    200  0.716 (0.139)   1.006 (0.048)
(0.9, 1.0)   50   0.666 (0.321)   1.106 (0.171)    50   0.814 (0.251)   0.986 (0.060)
             200  0.718 (0.239)   1.100 (0.135)    200  0.870 (0.143)   1.006 (0.033)

Similarly, the MA(1) model

$$X_t = \epsilon_t + \theta\epsilon_{t-1},$$

where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for $t = 1, \ldots, n$, is used for simulation. The simulation study includes the AMLE, which accounts for rounding, and moment estimators that ignore rounding. The corresponding simulation results are listed in Table 2. From the tables and graphs, we observe the following:

(1) As the sample size increases, the MSEs of the AMLEs become smaller: the larger the sample size, the more precise the estimates.
(2) For a fixed sample size n, even though the data in each sub-sample are more nearly i.i.d. for larger k, each sub-sample then contains fewer data points, which may worsen the AMLEs through larger MSEs. Therefore k should be moderate. The relationship between k and the MSEs of φ̂ and σ̂² can be seen in Fig. 9.
(3) The Q–Q plots show that the estimates are asymptotically normal, corroborating the asymptotic normality established in Theorems 2.2 and 2.4. A simulation sketch follows this list.
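One cell of the simulation study can be sketched as follows (our framing: yule_walker_phi is the standard lag-1 sample autocorrelation, and amle_ar1 is the helper sketched in Section 2.2; repetition counts are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(n, phi, sigma=1.0):
    """Simulate a stationary Gaussian AR(1) series X_t = phi X_{t-1} + eps_t."""
    x = np.empty(n)
    x[0] = rng.normal(scale=sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=sigma)
    return x

def yule_walker_phi(x):
    """Yule-Walker estimate of phi: lag-1 autocovariance over the variance."""
    xc = x - x.mean()
    return float(np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc))

phi, n, k = 0.5, 500, 5
x_tilde = np.rint(simulate_ar1(n, phi))   # rounded sample
print(yule_walker_phi(x_tilde))           # ignores rounding; Table 1 reports about 0.47
print(amle_ar1(x_tilde, k))               # AMLE of (c, phi, sigma^2)
```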

5. Comments and conclusions

Rounded data are encountered in many situations, especially when the underlying quantities take continuous values. In fact, for irrational numbers we can never record the true value, so at best we obtain rounded data. This is especially so now that many instruments can only record, and computers can only store, rounded values. For these reasons, this paper deals with rounded data from the normal distribution and from AR(p) and MA(p) time series models, gives estimation procedures, and proves the consistency and asymptotic normality of the resulting estimates. Although this paper appears to deal with data rounded to one digit after the decimal point, the proposed estimation procedure extends easily to data rounded at any digit. How to deal with rounded data in other statistical models requires further research.

Appendix

Lemma 1. If $X_1, X_2, \ldots, X_n$ is a sample from an AR(1) model with autoregressive coefficient $\phi$ ($|\phi| < 1$) and normally distributed innovations, then

$$|g(\mathbf x_1, \mathbf x_2) - g(\mathbf x_1)g(\mathbf x_2)| \le Kg(\mathbf x_1)g(\mathbf x_2)|\phi|^k\exp\left\{\frac{(1-\phi^2)|\phi|^k}{2\sigma^2(1+\phi^{2k})}\big((x_{i_2}-\mu)^2 + (x_{i_3}-\mu)^2\big)\right\}\big(1 + |x_{i_2}-\mu|^2 + |x_{i_3}-\mu|^2\big), \tag 9$$

where $g(\mathbf x_1, \mathbf x_2)$ is the joint density of $(X_{i_1}, X_{i_1+1}, \ldots, X_{i_2}, X_{i_3}, X_{i_3+1}, \ldots, X_{i_4})$ and $g(\mathbf x_1)$, $g(\mathbf x_2)$ are the joint densities of $(X_{i_1}, X_{i_1+1}, \ldots, X_{i_2})$ and $(X_{i_3}, X_{i_3+1}, \ldots, X_{i_4})$, respectively; $k = i_3 - i_2$, and $K$ is a constant depending on $k$ and $\phi$ only. Furthermore, for $\ell > 2$,

$$\bigg|g(\mathbf x_1, \ldots, \mathbf x_\ell) - \prod_{t=1}^{\ell}g(\mathbf x_t)\bigg| \le K\sum_{t=1}^{\ell-1}|\phi|^{k_t}\,g_t(\mathbf x_1, \ldots, \mathbf x_\ell)\exp\left\{\frac{(1-\phi^2)|\phi|^{k_t}}{2\sigma^2(1+\phi^{2k_t})}\big((x_{i_{2t}}-\mu)^2 + (x_{i_{2t+1}}-\mu)^2\big)\right\}\big[1 + |x_{i_{2t}}-\mu|^2 + |x_{i_{2t+1}}-\mu|^2\big],$$

where $\mathbf x_t = (x_{i_{2t-1}}, \ldots, x_{i_{2t}})$, $k_t = i_{2t+1} - i_{2t}$, $g_t(\mathbf x_1, \ldots, \mathbf x_\ell) = g(\mathbf x_1, \ldots, \mathbf x_t)g(\mathbf x_{t+1})\cdots g(\mathbf x_\ell)$ and $K$ is the same as in (9).


Proof. We need only prove (9). Let $X_1, \ldots, X_n$ be a causal Gaussian AR(1) sequence with $|\phi| < 1$. Let $\mathbf x_1 = (x_{i_1}, \ldots, x_{i_2})^T$, $\mathbf x_2 = (x_{i_3}, \ldots, x_{i_4})^T$ and $\mathbf x = (\mathbf x_1^T, \mathbf x_2^T)^T$, and let $g(\mathbf x)$, $g(\mathbf x_1)$ and $g(\mathbf x_2)$ be the joint densities of $(X_{i_1}, \ldots, X_{i_2}, X_{i_3}, \ldots, X_{i_4})$, $(X_{i_1}, \ldots, X_{i_2})$ and $(X_{i_3}, \ldots, X_{i_4})$, respectively, where $i_1 < i_2 < i_3 < i_4$.

Write the matrix in (1) as $R_n$ and let $q_n = (1, \phi, \ldots, \phi^{n-1})^T$. Then the covariance matrix of $(X_{i_1}, \ldots, X_{i_2}, X_{i_3}, \ldots, X_{i_4})^T$ is

$$\begin{pmatrix}R_{i_2-i_1+1} & \dfrac{\sigma^2\phi^k}{1-\phi^2}\,q_{i_2-i_1+1}q^T_{i_4-i_3+1}\\[2mm] \dfrac{\sigma^2\phi^k}{1-\phi^2}\,q_{i_4-i_3+1}q^T_{i_2-i_1+1} & R_{i_4-i_3+1}\end{pmatrix}.$$

Write its inverse in blocks as

$$\begin{pmatrix}R^{(11)} & R^{(12)}\\ R^{(21)} & R^{(22)}\end{pmatrix}.$$

Notice that $(\sigma^2/(1-\phi^2))\,q_{i_4-i_3+1}$ is the first column of $R_{i_4-i_3+1}$, which implies

$$\frac{\sigma^2}{1-\phi^2}\,q^T_{i_4-i_3+1}R^{-1}_{i_4-i_3+1} = f^T_{i_4-i_3+1} = (1, 0, \ldots, 0)_{1\times(i_4-i_3+1)}.$$

Then, by the inverse matrix formula, we have

$$\begin{aligned}
R^{(11)} &= \left(R_{i_2-i_1+1} - \frac{\sigma^4\phi^{2k}}{(1-\phi^2)^2}\,q_{i_2-i_1+1}q^T_{i_4-i_3+1}R^{-1}_{i_4-i_3+1}q_{i_4-i_3+1}q^T_{i_2-i_1+1}\right)^{-1}\\
&= \left(R_{i_2-i_1+1} - \frac{\sigma^2\phi^{2k}}{1-\phi^2}\,q_{i_2-i_1+1}q^T_{i_2-i_1+1}\right)^{-1}\\
&= R^{-1}_{i_2-i_1+1} + \frac{\sigma^2\phi^{2k}}{1-\phi^2}\cdot\frac{R^{-1}_{i_2-i_1+1}q_{i_2-i_1+1}q^T_{i_2-i_1+1}R^{-1}_{i_2-i_1+1}}{1 + \frac{\sigma^2\phi^{2k}}{1-\phi^2}\,q^T_{i_2-i_1+1}R^{-1}_{i_2-i_1+1}q_{i_2-i_1+1}}\\
&= R^{-1}_{i_2-i_1+1} + \frac{1-\phi^2}{\sigma^2}\cdot\frac{\phi^{2k}}{1+\phi^{2k}}\,f_{i_2-i_1+1}f^T_{i_2-i_1+1}.
\end{aligned}$$

Similarly,

$$R^{(22)} = R^{-1}_{i_4-i_3+1} + \frac{1-\phi^2}{\sigma^2}\cdot\frac{\phi^{2k}}{1+\phi^{2k}}\,f_{i_4-i_3+1}f^T_{i_4-i_3+1}$$

and

$$R^{(12)} = -\frac{(1-\phi^2)(\phi^k+\phi^{3k})}{\sigma^2(1+\phi^{2k})}\,f_{i_2-i_1+1}f^T_{i_4-i_3+1}.$$

By the above expressions, we can also calculate

$$\left|\begin{pmatrix}R_{i_2-i_1+1} & \frac{\sigma^2\phi^k}{1-\phi^2}q_{i_2-i_1+1}q^T_{i_4-i_3+1}\\ \frac{\sigma^2\phi^k}{1-\phi^2}q_{i_4-i_3+1}q^T_{i_2-i_1+1} & R_{i_4-i_3+1}\end{pmatrix}\right| = |R_{i_2-i_1+1}|\,|R_{i_4-i_3+1}|\,(1-\phi^{2k}),$$

where $|A|$ denotes the determinant of the matrix $A$. From these calculations, we obtain

$$g(\mathbf x_1, \mathbf x_2) = g(\mathbf x_1)g(\mathbf x_2)\,\frac{1}{\sqrt{1-\phi^{2k}}}\exp\left\{-\frac{1-\phi^2}{2\sigma^2(1+\phi^{2k})}\left((x_{i_2}-\mu)^2\phi^{2k} + (x_{i_3}-\mu)^2\phi^{2k} - 2(x_{i_2}-\mu)(x_{i_3}-\mu)\frac{\phi^k(1+\phi^{2k})}{1-\phi^{2k}}\right)\right\}.$$

Then inequality (9) follows from the above and the mean value theorem, and the second conclusion of Lemma 1 follows from (9) trivially. □

Lemma 2. Suppose the assumptions of Lemma 1 hold with $|\phi| < 1$. Assume that $f$ is a $k$-dimensional measurable function such that $Ef(X_i) = 0$ and $\Gamma_0 = Ef(X_i)f(X_i)^T$ exists. If $V = \Gamma_0 + \sum_{j=1}^{\infty}[\Gamma_j + \Gamma_j^T]$ exists and is positive definite, then, with $Z_i = f(X_i)$,

$$\frac{1}{\sqrt n}(Z_1 + \cdots + Z_n) \xrightarrow{L} N(0, V),$$

where $\Gamma_j = Ef(X_1)f(X_{1+j})^T = \Gamma_{-j}^T$.

Proof. Without loss of generality, we may assume that $f$ is one-dimensional; otherwise we consider $Z_i = a^Tf(X_i)$ for an arbitrary fixed vector $a$. Note that

$$\mathrm{Var}\left(\frac{1}{\sqrt n}(Z_1 + \cdots + Z_n)\right) = \Gamma_0 + 2\sum_{j=1}^{n-1}\left(1 - \frac jn\right)\Gamma_j \to V.$$

We see that the variance of $Z_1 + \cdots + Z_n$ is of order $n$. We choose integers $m$ and $k$ such that $m/\log n \to \infty$, $\sqrt n\,m/k \to 0$ and $km/n \to 0$; for example, we may choose $m = [\log^2 n]$ and $k = [\sqrt n\,\log^3 n]$. Let

$$Y_{nk} = \frac{1}{\sqrt n}\big[(Z_1 + \cdots + Z_{k-m}) + \cdots + (Z_{(r-1)k+1} + \cdots + Z_{rk-m})\big],$$
$$W_{nk} = \frac{1}{\sqrt n}\big[(Z_{k-m+1} + \cdots + Z_k) + \cdots + (Z_{rk-m+1} + \cdots + Z_{rk})\big],$$

where $r = [n/k]$, the integer part of $n/k$. It is easy to verify that

$$\mathrm{Var}(W_{nk}) \le \frac{r^2m^2\Gamma_0}{n} \to 0$$

by the choice of $k$ and $m$, which implies $rm = o(\sqrt n)$. Consequently, $Y_{nk}$ and $(1/\sqrt n)(Z_1 + \cdots + Z_n)$ have the same limiting distribution.

Consider the characteristic function of $Y_{nk}$. By the second conclusion of Lemma 1, we have

$$\begin{aligned}
&\left|E\exp(itY_{nk}) - \left(E\exp\Big(\frac{it}{\sqrt n}\sum_{j=1}^{k-m}Z_j\Big)\right)^r\right|\\
&\quad= \left|\int\exp\Big(\frac{it}{\sqrt n}\sum_{i=1}^{r}\sum_{j=1}^{k-m}f(x_{(i-1)k+j})\Big)\Big[g\big((x_{(i-1)k+j}: j\le k-m,\ i\le r)\big) - \prod_{i=1}^{r}g\big((x_{(i-1)k+j}: j\le k-m)\big)\Big]\,d\mathbf x\right|\\
&\quad\le K(r-1)|\phi|^{m+1}\,E\exp\left\{\frac{(1-\phi^2)|\phi|^{m+1}}{2\sigma^2(1-\phi^{2m+2})}X_1^2\right\}(1+X_1^2)^2 \to 0,
\end{aligned}$$

since $m/\log n \to \infty$.

Now $(E\exp((it/\sqrt n)\sum_{j=1}^{k-m}Z_j))^r$ can be regarded as the characteristic function of a sum of $r$ i.i.d. random variables, each distributed as $(1/\sqrt n)\sum_{j=1}^{k-m}Z_j$. We denote these random variables by $(1/\sqrt n)\sum_{j=1}^{k-m}Z_j^{(i)}$, $i = 1, 2, \ldots, r$. First we truncate the variables at $\varepsilon\sqrt n$, that is, let

$$\tilde Z_j^{(i)} = Z_j^{(i)}I_{(|Z_j^{(i)}|\le \varepsilon\sqrt n)}.$$

Then

$$P\left(\frac{1}{\sqrt n}\sum_{j=1}^{k-m}Z_j^{(i)} \ne \frac{1}{\sqrt n}\sum_{j=1}^{k-m}\tilde Z_j^{(i)}\right) \le nP(|Z_j^{(i)}| > \varepsilon\sqrt n) \le \varepsilon^{-2}EZ_j^2I_{(|Z_j|>\varepsilon\sqrt n)} \to 0 \tag{10}$$

and

$$\frac{1}{\sqrt n}\left|\sum_{j=1}^{k-m}E\tilde Z_j^{(i)}\right| \le \varepsilon^{-1}EZ_j^2I_{(|Z_j|>\varepsilon\sqrt n)} \to 0. \tag{11}$$

Since (10) and (11) hold for every $\varepsilon > 0$, we can choose $\varepsilon = \varepsilon_n \downarrow 0$ such that (10) and (11) remain true with $\varepsilon$ replaced by $\varepsilon_n$. Thus, we may truncate the variables at $\varepsilon_n\sqrt n$ and then recenter them; therefore we may further assume that $|Z_j^{(i)}| \le \varepsilon_n\sqrt n$. We notice that the variance of the sum is

$$V_n = r\,\mathrm{Var}\left(\frac{1}{\sqrt n}\sum_{j=1}^{k-m}Z_j\right) = \frac rn\left((k-m)\Gamma_0 + 2\sum_{j=1}^{k-m-1}(k-m-j)\Gamma_j\right) \to V = \Gamma_0 + 2\sum_{j=1}^{\infty}\Gamma_j.$$

Therefore, to complete the proof of the lemma, one needs only to verify the Lyapounov condition, that is,

$$\frac{r}{n^2}E\left(\sum_{j=1}^{k-m}Z_j\right)^4 \to 0. \tag{12}$$

Expanding $(\sum_{j=1}^{k-m}Z_j)^4$, we obtain

$$\frac{r}{n^2}E\left(\sum_{j=1}^{k-m}Z_j\right)^4 = \frac{r}{n^2}\Bigg[\sum_{i=1}^{k-m}EZ_i^4 + 4\sum_{i<j}(EZ_i^3Z_j + EZ_j^3Z_i) + 6\sum_{i<j}EZ_i^2Z_j^2 + 4\sum_{i<j<l}(EZ_i^2Z_jZ_l + EZ_iZ_j^2Z_l + EZ_iZ_jZ_l^2) + 24\sum_{i<j<l<t}EZ_iZ_jZ_lZ_t\Bigg] := I_1 + \cdots + I_8.$$

From the truncation it is obvious that

$$I_1 \le \frac{r\varepsilon_n^2nk\Gamma_0}{n^2} \le \varepsilon_n^2\Gamma_0 \to 0.$$

By the truncation, we know that for some constant $K$ and any exponent $\alpha$,

$$E|Z_i|^\alpha\exp\left\{\frac{(1-\phi^2)|\phi|^t}{2\sigma^2(1+\phi^{2t})}X_i^2\right\}(1+X_i^2) \le \left(E|Z_i|^{2\alpha}\,E\exp\left\{\frac{(1-\phi^2)|\phi|^t}{\sigma^2(1+\phi^{2t})}X_i^2\right\}(1+X_i^2)^4\right)^{1/2} \le Kn^{(\alpha-1)/2}\varepsilon_n^{\alpha-1}.$$

Thus, by the first conclusion of Lemma 1, we have

$$I_2 \le \frac{4r}{n^2}\sum_{i<j}K|\phi|^{j-i}\,E|Z_i|^3\exp\left\{\frac{(1-\phi^2)|\phi|^{j-i}}{2\sigma^2(1+\phi^{2(j-i)})}X_i^2\right\}(1+X_i^2)\;E|Z_j|\exp\left\{\frac{(1-\phi^2)|\phi|^{j-i}}{2\sigma^2(1+\phi^{2(j-i)})}X_j^2\right\}(1+X_j^2) \le \frac{4r}{n^2}\sum_{i<j}K|\phi|^{j-i}n\varepsilon_n^2 \le \frac{4K\varepsilon_n^2|\phi|}{1-|\phi|} \to 0.$$

Its dual part $I_3$ also tends to 0. Furthermore, by Lemma 1, writing $\exp\{\cdot\}$ for the same exponential factors as above,

$$I_4 \le \frac{6r}{n^2}\sum_{i<j}\Big[EZ_i^2\,EZ_j^2 + K|\phi|^{j-i}\,E|Z_i|^2\exp\{\cdot\}(1+X_i^2)\;E|Z_j|^2\exp\{\cdot\}(1+X_j^2)\Big] \le \frac{6r}{n^2}\sum_{i<j}\big[\Gamma_0^2 + K|\phi|^{j-i}n\varepsilon_n^2\big] \le \frac{Kk}{n} + \frac{K\varepsilon_n^2|\phi|}{1-|\phi|} \to 0.$$

For $I_5$, by Lemma 1 we have

$$I_5 \le \frac{4r}{n^2}\sum_{i<j<l}K|\phi|^{l-j}\,E|Z_i|^2|Z_j|\exp\left\{\frac{(1-\phi^2)|\phi|^{l-j}}{2\sigma^2(1+\phi^{2(l-j)})}X_j^2\right\}(1+X_j^2)\;E|Z_l|\exp\left\{\frac{(1-\phi^2)|\phi|^{l-j}}{2\sigma^2(1+\phi^{2(l-j)})}X_l^2\right\}(1+X_l^2) \to 0.$$

By the same arguments, one can prove that $I_6, I_7 \to 0$. Finally, we show that $I_8 \to 0$. To this end, we split $I_8$ into $I_{81} + I_{82} + I_{83}$, where $I_{81}$ contains all terms with $t - l \ge m$, $I_{82}$ contains all terms with $j - i \ge m$ but $t - l < m$, and $I_{83}$ contains all remaining terms. Then,

$$I_{81} \le \frac{Kr}{n^2}\sum_{i<j<l<t}|\phi|^{t-l}\,E|Z_i||Z_j||Z_l|\exp\{\cdot\}(1+X_l^2)\;E|Z_t|\exp\{\cdot\}(1+X_t^2) \le Kk^3\varepsilon_n^2|\phi|^m \to 0,$$

$$I_{82} \le \frac{Kr}{n^2}\sum_{i<j<l<t}|\phi|^{j-i}\,E|Z_i|\exp\{\cdot\}(1+X_i^2)\;E|Z_j||Z_l||Z_t|\exp\{\cdot\}(1+X_j^2) \le Kk^3\varepsilon_n^2|\phi|^m \to 0,$$

$$I_{83} \le \frac{Kr}{n^2}\sum\Big[EZ_iZ_j\,EZ_lZ_t + |\phi|^{l-j}\,E|Z_i||Z_j|\exp\{\cdot\}(1+X_j^2)\;E|Z_l||Z_t|\exp\{\cdot\}(1+X_l^2)\Big] \le \frac{Kr\Gamma_0^2k^2m^2}{n^2} + \frac{Krkm\sqrt n\,\varepsilon_n|\phi|}{n^2(1-|\phi|)} \le \frac{Kkm^2}{n} + \frac{Km\varepsilon_n|\phi|}{\sqrt n(1-|\phi|)} \to 0.$$

Here, in the second inequality, we have used the facts that $|EZ_iZ_j| \le \Gamma_0$ and

$$E|Z_i||Z_j|\exp\left\{\frac{(1-\phi^2)|\phi|^{l-j}}{2\sigma^2(1+\phi^{2(l-j)})}X_j^2\right\}(1+X_j^2) \le E^{1/2}|Z_i|^2\,E^{1/4}|Z_j|^4\,E^{1/4}\exp\left\{\frac{2(1-\phi^2)|\phi|^{l-j}}{\sigma^2(1+\phi^{2(l-j)})}X_j^2\right\}(1+X_j^2)^4 \le K\sqrt n\,\varepsilon_n.$$

The proof of the lemma is complete. □

Proof of Theorem 2.4. Let $L_j(c, \phi, \sigma^2)$ be the log-likelihood based on the $j$-th sub-sample $(\tilde X_j \cdots \tilde X_{p+j})$, $(\tilde X_{(p+1)+j} \cdots \tilde X_{(p+1)+p+j})$, $\ldots$, $(\tilde X_{(m-1)(p+1)+j} \cdots \tilde X_{(m-1)(p+1)+p+j})$, $j = 1, \ldots, k$, and let $h = (c, \phi, \sigma^2)$. The Taylor expansion of $\partial L_j(h)/\partial h$ at $h$ yields

$$0 = \left.\frac{\partial L_j(h)}{\partial h}\right|_{\hat h_j} = \frac{\partial L_j(h)}{\partial h} + \left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}(\hat h_j - h),$$

where $\hat h_j^*$ is a point on the segment connecting $h$ and $\hat h_j$. Then we obtain

$$\sum_{j=1}^{k}(\hat h_j - h) = -\left\{\sum_{j=1}^{k}\left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}\right\}^{-1}\sum_{j=1}^{k}\frac{\partial L_j(h)}{\partial h}.$$

That is,

$$\sqrt n(\hat h - h) = \left\{-\frac 1m\sum_{j=1}^{k}\left.\frac{\partial^2 L_j(h)}{\partial h\,\partial h^T}\right|_{\hat h_j^*}\right\}^{-1}\cdot\frac{1}{\sqrt n}\sum_{j=1}^{k}\frac{\partial L_j(h)}{\partial h}.$$

We first consider the convergence of $-(1/m)\,\partial^2L_j(h)/\partial h\,\partial h^T|_{\hat h_j^*}$. Let $Y_j = (\tilde X_j, \ldots, \tilde X_{p+j})$ and

$$W_j = -\sum_{\mathbf i}I_{(Y_j\in A_{\mathbf i})}\,\frac{\partial^2\log(p_{\mathbf i}(h))}{\partial h\,\partial h^T}, \qquad j = 1, \ldots, n-p.$$

Then we obtain

$$-\frac 1m\frac{\partial^2L_j(h)}{\partial h\,\partial h^T} = \frac 1m\sum_{t=1}^{m}W_{(t-1)k+j}.$$

First, we notice that

$$E\left(-\frac 1m\frac{\partial^2L_j(h)}{\partial h\,\partial h^T}\right) = EW_1 = -\sum_{\mathbf i}p_{\mathbf i}(h)\,\frac{\partial^2\log(p_{\mathbf i}(h))}{\partial h\,\partial h^T} = \sum_{\mathbf i}p_{\mathbf i}(h)^{-1}\,\frac{\partial p_{\mathbf i}(h)}{\partial h}\frac{\partial p_{\mathbf i}(h)}{\partial h^T} = I(h).$$

Using Lemma 1 and the fact that each $\hat h_j$ is consistent for $h$, and arguing as in the proof of Theorem 2.2, we can show that $-(1/m)\,\partial^2L_j(h)/\partial h\,\partial h^T|_{\hat h_j^*} \xrightarrow{a.s.} I(h)$. Now we discuss the asymptotic behavior of $\sum_{j=1}^{k}(1/\sqrt n)\,\partial L_j(h)/\partial h$, which can be expressed as

$$\frac{1}{\sqrt n}\sum_{j=1}^{n-p}\Bigg[\sum_{\mathbf i}I_{(Y_j\in A_{\mathbf i})}\,p^{-1}_{\mathbf i}(h)\frac{\partial p_{\mathbf i}(h)}{\partial h}\Bigg].$$

Let $Z_i = \sum_{\mathbf i}I_{(Y_i\in A_{\mathbf i})}\,p^{-1}_{\mathbf i}(h)\,\partial p_{\mathbf i}(h)/\partial h$ for $i = 1, \ldots, n-p$. Then $\sum_{j=1}^{k}(1/\sqrt n)\,\partial L_j(h)/\partial h = (1/\sqrt n)\sum_{i=1}^{n-p}Z_i$, where $EZ_i = 0$ for $i = 1, \ldots, n-p$. First, we see that

$$\mathrm{Var}\left(\frac{1}{\sqrt n}\sum_{i=1}^{n-p}Z_i\right) = I(h) + 2\sum_{j=1}^{n-p-1}\left(1-\frac jn\right)EZ_1Z_{j+1}^T \Rightarrow V_p(h) = I(h) + 2\sum_{\mathbf i_1,\mathbf i_2}\sum_{j=1}^{\infty}P(Y_1\in A_{\mathbf i_1}, Y_{j+1}\in A_{\mathbf i_2})\,p^{-1}_{\mathbf i_1}(h)p^{-1}_{\mathbf i_2}(h)\,\frac{\partial p_{\mathbf i_1}(h)}{\partial h}\frac{\partial p_{\mathbf i_2}(h)}{\partial h^T}.$$

(The reader should note the difference between the covariance matrices for the MA(p) and AR(1) cases, although we use the same notation; in fact, the MA(p) case can also be written in the above form.) Applying Lemma 2, it follows that $(1/\sqrt{n-p})\sum_{i=1}^{n-p}Z_i \xrightarrow{L} N(0, V_p(h))$. Hence, we obtain

$$\sqrt n(\hat h - h) \xrightarrow{L} N\big(0,\ I^{-1}(h)V_p(h)I^{-1}(h)\big). \qquad \square$$

References

Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 1994. Time Series Analysis: Forecasting and Control, third ed. Prentice-Hall, Englewood Cliffs, NJ.
Dempster, A.P., Rubin, D.B., 1983. Rounding error in regression: the appropriateness of Sheppard's corrections. J. Roy. Statist. Soc. Ser. B 45, 51–59.
Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenics 8, 179–188.
Haitovsky, Y., 1982. Grouped data. In: Encyclopedia of Statistical Science. Wiley, New York.
Hall, P., 1982. The influence of rounding errors on some nonparametric estimators of a density and its derivatives. SIAM J. Appl. Math. 42, 390–399.
Heitjan, D.F., 1989. Inference from grouped continuous data: a review. Statist. Sci. 4, 164–183.
Heitjan, D.F., Rubin, D.B., 1991. Ignorability and coarse data. Ann. Statist. 19, 2244–2253.
Lee, C.S., Vardeman, S.B., 2001. Interval estimation of a normal process mean from rounded data. J. Qual. Technol. 33, 335–348.
Lee, C.S., Vardeman, S.B., 2002. Interval estimation of a normal process standard deviation from rounded data. Comm. Statist. Simulation Comput. 31, 13–34.
Lee, C.S., Vardeman, S.B., 2003. Confidence interval based on rounded data from the balanced one-way normal random effects model. Comm. Statist. Simulation Comput. 32, 835–856.
Sheppard, W.F., 1898. On the calculation of the most probable values of frequency constants for data arranged according to equidistant divisions of a scale. Proc. London Math. Soc. 29, 353–380.
Tricker, A., 1990a. The effect of rounding on the significance level of certain normal test statistics. J. Appl. Statist. 17, 31–38.
Tricker, A., 1990b. The effect of rounding on the power level of certain normal test statistics. J. Appl. Statist. 17, 219–228.
Tricker, A., Coates, E., Okell, E., 1998. The effect on the R chart of precision of measurement. J. Qual. Technol. 30, 232–239.
Vardeman, S.B., 2005. Sheppard's correction for variances and the quantization noise model. IEEE Trans. Instrum. Meas. 54, 2117–2119.
Vardeman, S.B., Lee, C.S., 2005. Likelihood-based statistical estimation from quantized data. IEEE Trans. Instrum. Meas. 54, 409–414.
Wright, D.E., Bray, I., 2003. A mixture model for rounded data. Amer. Statist. 52, 3–13.
Zhang, B.X., Liu, T.Q., Bai, Z.D., 2009. Analysis of rounded data from dependent sequences. Ann. Inst. Statist. Math., to appear.