On the supremum of a Brownian bridge standardized by its maximizing point with applications to statistics

Statistics and Probability Letters xx (xxxx) xxx–xxx

Dietmar Ferger, Technische Universität Dresden, Department of Mathematics, Zellescher Weg 12-14, D-01069 Dresden, Germany

Article history: Received 24 July 2017; Received in revised form 11 October 2017; Accepted 17 October 2017.

Abstract

Let T be the point where a reflected Brownian bridge attains its maximal value M. We determine the density and the distribution function of R := M/√(T(1 − T)). Its counterpart Rn for tied-down sums pertaining to i.i.d. random variables converges in distribution to R. This enables the construction of a change-point test. Our new test performs significantly better than the well-known maximum-type test statistics. © 2017 Elsevier B.V. All rights reserved.

Keywords: Maximum and maximizer of Brownian bridge; Tied-down sums; Change-point analysis; Random weights

1. Introduction and main results


Let B be a Brownian bridge. It is well known that with probability one |B| and B attain their maximal values

M := sup_{0≤t≤1} |B(t)|   and   M+ := sup_{0≤t≤1} B(t)

at exactly one point T and T+, respectively. In other words, the maximizers

T = argmax_{t∈[0,1]} |B(t)|   and   T+ = argmax_{t∈[0,1]} B(t)

each are a.s. unique and actually take their values in the open unit interval. In fact, while T+ is known to be uniformly distributed on (0, 1), confer Ferger (1995), the density fT of T is given by, confer Ferger (2001):

fT(u) = 2 ∑_{i≥0} ∑_{j≥0} (−1)^{i+j} αi αj {αi²(1 − u) + αj² u}^{−3/2},   0 < u < 1,

with αi := 2i + 1. Similarly, the distributions of M and M+ are well known. However, in this paper we are interested in the following functionals of the Brownian bridge:

R := M/√(T(1 − T))   and   R+ := M+/√(T+(1 − T+)).

Indeed, these quantities occur as limit variables of certain test statistics, e.g., in change-point analysis or goodness of fit, as will be explained in the next section. The determination of critical values or p-values requires the distribution of R and R+.

E-mail address: [email protected]. https://doi.org/10.1016/j.spl.2017.10.008 0167-7152/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: Ferger D., On the supremum of a Brownian bridge standardized by its maximizing point with applications to statistics. Statistics and Probability Letters (2017), https://doi.org/10.1016/j.spl.2017.10.008.


In the sequel we make use of the following convention. If Z is a random real vector let us denote its density by fZ and its distribution function by FZ . Furthermore, notice that R and R+ are positive and finite with probability one.


Theorem 1.1. Put αj := 2j + 1, j ∈ N0. Then

fR(u) = (16/√(2π)) ∑_{0≤j<l<∞} (−1)^{j+l} (αj αl)/(αl² − αj²) [exp(−αj²u²/2) − exp(−αl²u²/2)] + (4/√(2π)) ∑_{j≥0} u² αj² exp(−αj²u²/2)   (1.1)

for all u > 0, and

FR(x) = 16 ∑_{0≤j<l<∞} (−1)^{j+l} (αj αl)/(αl² − αj²) [(Φ(αj x) − 1/2)/αj − (Φ(αl x) − 1/2)/αl] + 4 ∑_{0≤j<∞} [(Φ(αj x) − 1/2)/αj − x ϕ(αj x)],   x ≥ 0,   (1.2)

where Φ and ϕ denote the distribution function and the density, respectively, of the standard normal law. The random variable R+ follows the Maxwell–Boltzmann distribution, that is,

fR+(u) = √(2/π) u² exp(−u²/2),   u > 0,   (1.3)

and

FR+(x) = 2Φ(x) − √(2/π) x exp(−x²/2) − 1,   x ≥ 0.   (1.4)

Moreover, R+ and T+ are independent.
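The distribution functions (1.2) and (1.4) are straightforward to evaluate numerically. The following sketch implements the closed form (1.4) and a truncated version of the double series (1.2); the truncation level `terms=60` is a working choice of ours, not prescribed by the paper:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    # standard normal density
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def F_R_plus(x):
    # Maxwell-Boltzmann distribution function, Eq. (1.4)
    return 2.0 * Phi(x) - sqrt(2.0 / pi) * x * exp(-0.5 * x * x) - 1.0

def F_R(x, terms=60):
    # truncated double series for Eq. (1.2), with alpha_j = 2j + 1
    a = [2 * j + 1 for j in range(terms)]
    s = 0.0
    for j in range(terms):
        for l in range(j + 1, terms):
            s += ((-1) ** (j + l) * a[j] * a[l] / (a[l] ** 2 - a[j] ** 2)
                  * ((Phi(a[j] * x) - 0.5) / a[j] - (Phi(a[l] * x) - 0.5) / a[l]))
    s *= 16.0
    s += 4.0 * sum((Phi(aj * x) - 0.5) / aj - x * phi(aj * x) for aj in a)
    return s
```

The series in (1.2) converges only conditionally in the tails, so the symmetric truncation above carries an error of roughly O(1/terms); for quantile computations a larger truncation level may be advisable.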




Proof. First observe that (M, T) ∈ G := (0, ∞) × (0, 1) a.s. Let g : G → G be defined by g(x, y) := (x/√(y(1 − y)), y). Then (R, T) = g(M, T), and g is a diffeomorphism with inverse g⁻¹(u, v) = (u√(v(1 − v)), v) and pertaining Jacobian J(u, v) = √(v(1 − v)), (u, v) ∈ G. According to Theorem 2.6 of Ferger (2001),

f(M,T)(x, y) = ψ(x, y) ψ(x, 1 − y),   x > 0, y ∈ (0, 1),

where

ψ(x, y) = (8/π)^{1/4} x y^{−3/2} ∑_{j≥0} (−1)^j αj exp(−αj²x²/(2y)).

By the Transformation Theorem, confer, e.g., Billingsley (1986, p. 266), we obtain that for every u > 0 and v ∈ (0, 1),

f(R,T)(u, v) = √(v(1 − v)) f(M,T)(√(v(1 − v)) u, v) = √(8/π) u² s(u, 1 − v) s(u, v),   (1.5)

where

s(u, v) = ∑_{j≥0} (−1)^j αj exp(−αj²u²v/2).

This series converges absolutely for every u > 0 and v ∈ (0, 1). Conclude that

fR(u) = ∫₀¹ f(R,T)(u, v) dv = √(8/π) u² ∑_{0≤j,l<∞} (−1)^{j+l} αj αl ∫₀¹ exp(−u²(αj²(1 − v) + αl²v)/2) dv = √(8/π) u² ∑_{0≤j,l<∞} (−1)^{j+l} αj αl exp(−u²αj²/2) ∫₀¹ exp(u²(αj² − αl²)v/2) dv.   (1.6)

If j = l the integral in (1.6) is equal to one, and otherwise it is equal to

(2/(αj² − αl²)) u⁻² (exp(u²(αj² − αl²)/2) − 1).

This yields (1.1) upon noticing the symmetry of the summands with index (j, l), j ≠ l. Integration of (1.1) gives (1.2).


In Ferger (1995) we derive the distribution function of (M+, T+). It is given by

F(M+,T+)(x, y) = 2Φ(x/√(y(1 − y))) − 1 − exp(−2x²) Φ(x(2y − 1)/√(y(1 − y))) − (1 − y){2Φ(x/√(y(1 − y))) − 1}

for x > 0 and 0 < y < 1. Differentiation yields the density

f(M+,T+)(u, v) = √(2/π) u² (v(1 − v))^{−3/2} exp(−u²/(2v(1 − v)))   (1.7)

for 0 < u < ∞ and 0 < v < 1, and zero elsewhere. Since (R+, T+) = g(M+, T+), another application of the Transformation Theorem gives

f(R+,T+)(u, v) = √(v(1 − v)) f(M+,T+)(√(v(1 − v)) u, v) = 1_{(0,∞)}(u) √(2/π) u² exp(−u²/2) 1_{(0,1)}(v),   (1.8)

which in turn immediately results in (1.3) and (1.4). Moreover, we observe that f(R+,T+)(u, v) = fR+(u) fT+(v), whence R+ and T+ ∼ U(0, 1) are independent. □


2. Applications in statistics

Let X1, . . . , Xn be n ∈ N independent real random variables such that for some m ∈ {1, 2, . . . , n − 1} (moment of change) the Xi have distribution function F or G according as i ≤ m or i > m. Put µ := ∫ x F(dx) (pre-change mean) and ν := ∫ x G(dx) (post-change mean). We want to test the hypothesis H0 : F = G (no change) against the alternative H1 : µ ≠ ν (change of the mean after time m). The following test statistic is well known in the change-point literature, confer Csörgő and Horváth (1997, Section 1.4):

Tn := (√n/σ̂n) max_{1≤k<n} |∑_{i=1}^k (Xi − X̄n)| / √(k(n − k)),

where X̄n := (1/n) ∑_{i=1}^n Xi is the sample mean and σ̂n² is an appropriate estimator for the variance of the Xi under H0. The hypothesis H0 is rejected
for large values of Tn . Apart from the motivation one can find in Csörgő and Horváth (1997), we like to give another one: Some simple algebra shows that Tn = max |Tn (k)|,

17

18 19 20 21

1≤k
where

22

1

Tn (k) =

1 k

∑k

i=1

Xi −

σˆ n



1 k

∑n

1

i=k+1

n−k

+

1

Xi

.

23

n−k

This is the well-known two-sample t-test statistic for the hypothesis of equality of the means of the first k and the last n − k observations. Insofar the Tn -statistic is quite reasonable for detecting a change. If one wants to test against the one-sided alternative H1+ : µ > ν then Tn has to be replaced by

Tn+ := (√n/σ̂n) max_{1≤k<n} ∑_{i=1}^k (Xi − X̄n) / √(k(n − k)).
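The identity Tn = max_{1≤k<n} |Tn(k)| can be checked numerically. In the following sketch, σ̂n is taken to be the overall sample standard deviation (an assumption made for illustration; the text only requires an appropriate variance estimator):

```python
import math
import random

def tn_cusum(x):
    # Tn as the maximal weighted absolute CUSUM
    n = len(x)
    xbar = sum(x) / n
    sigma = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    s, best = 0.0, 0.0
    for k in range(1, n):
        s += x[k - 1] - xbar
        best = max(best, math.sqrt(n) * abs(s) / (sigma * math.sqrt(k * (n - k))))
    return best

def tn_two_sample(x):
    # Tn as the maximal absolute two-sample t-type statistic |Tn(k)|
    n = len(x)
    xbar = sum(x) / n
    sigma = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    best = 0.0
    for k in range(1, n):
        m1 = sum(x[:k]) / k
        m2 = sum(x[k:]) / (n - k)
        best = max(best, abs(m1 - m2) / (sigma * math.sqrt(1.0 / k + 1.0 / (n - k))))
    return best

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
```

Both functions agree up to floating-point rounding, which is exactly the "simple algebra" referred to above.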

To describe the asymptotic behavior of these test statistics under H0 we introduce


An := √(2 log log n)   and   Dn := 2 log log n + (1/2) log log log n − (1/2) log π.

It follows from Theorem A.4.2 of Csörgő and Horváth (1997) that if H0 holds, then

lim_{n→∞} P(Tn ≤ (t + Dn)/An) = exp(−2e^{−t})   for all t ∈ R   (2.1)

and

lim_{n→∞} P(Tn+ ≤ (t + Dn)/An) = exp(−e^{−t})   for all t ∈ R.   (2.2)

For a given level α ∈ (0, 1) of significance let

dn,α := (Dn − log(−(1/2) log(1 − α)))/An   and   d+n,α := (Dn − log(−log(1 − α)))/An

be the pertaining critical values, i.e., H0 is rejected in favor of H1 if Tn > dn,α and in favor of H1+ if Tn+ > d+n,α. Then (2.1) and (2.2) ensure that

lim_{n→∞} P(Tn > dn,α) = lim_{n→∞} P(Tn+ > d+n,α) = α   under H0.   (2.3)
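The critical values are obtained by inverting the Gumbel limits in (2.1) and (2.2). A minimal sketch (the function name `critical_values` is ours); for n = 1000 and α = 0.05 resp. α = 0.01 it reproduces the d+n,α column of Example 2.2 below:

```python
from math import log, pi, sqrt

def critical_values(n, alpha):
    # Darling-Erdos type critical values, inverted from the limits
    # exp(-2*exp(-t)) = 1 - alpha  (two-sided, Tn)  and
    # exp(-exp(-t))   = 1 - alpha  (one-sided, Tn+)
    A = sqrt(2.0 * log(log(n)))
    D = 2.0 * log(log(n)) + 0.5 * log(log(log(n))) - 0.5 * log(pi)
    d_two_sided = (D - log(-0.5 * log(1.0 - alpha))) / A   # for Tn
    d_one_sided = (D - log(-log(1.0 - alpha))) / A         # for Tn+
    return d_two_sided, d_one_sided
```

Note that the two-sided critical value is always the larger of the two, since −(1/2) log(1 − α) < −log(1 − α).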

Thus, both decision rules are asymptotic level-α tests, as desired. However, there is a serious problem. The limits in (2.1) and (2.2) are extreme value distributions (Gumbel distributions), and the rate of convergence to such distributions "is believed to be usually very slow", as Csörgő and Horváth (1997, p. 23) point out. As a consequence the approximations of the exact critical values are poor even for large sample sizes n. In particular, the probability of the type I error is much smaller than the given level α of significance. These drawbacks can be observed in the first table of Example 2.2. On the other hand, the weighting factors √(k(n − k)), 1 ≤ k < n, occurring in the definitions of Tn and Tn+ make the tests more sensitive under the alternative, especially if the proportion m/n is close to zero or to one. Therefore the statistician should certainly not do without weighting the cumulative sums

Sk := ∑_{i=1}^k (Xi − X̄n),   1 ≤ k < n.

Our basic idea is to replace the deterministic weighting factors √(k(n − k)), 1 ≤ k < n, by the single random constant √(k̂n(n − k̂n)) or √(k̂n+(n − k̂n+)), respectively, where

k̂n := min{1 ≤ k < n : |Sk| = max_{1≤i<n} |Si|}   and   k̂n+ := min{1 ≤ k < n : Sk = max_{1≤i<n} Si}

are the respective smallest maximizing indices. This leads to the new statistics

Rn := √n max_{1≤k<n} |∑_{i=1}^k (Xi − X̄n)| / (σ̂n √(k̂n(n − k̂n)))

and

Rn+ := √n max_{1≤k<n} ∑_{i=1}^k (Xi − X̄n) / (σ̂n √(k̂n+(n − k̂n+))).

The next result yields distributional convergence under H0, which means that all observations are i.i.d.

Theorem 2.1. Let X1, . . . , Xn, n ∈ N, be independent and identically distributed random variables with positive, finite variance σ². Then we have

Rn →L R   and   Rn+ →L R+.   (2.4)
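A sketch of how Rn and Rn+ can be computed from data; the overall sample standard deviation again serves as σ̂n, which is an assumption made for illustration:

```python
import math

def r_statistics(x):
    # Rn and Rn+ with the random weights sqrt(k_hat * (n - k_hat));
    # sigma is estimated by the overall sample standard deviation
    # (an assumption: any estimator consistent under H0 works)
    n = len(x)
    xbar = sum(x) / n
    sigma = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    partial, s = [], 0.0
    for v in x[:-1]:
        s += v - xbar
        partial.append(s)                      # S_k for k = 1, ..., n-1
    # Python's max returns the first maximizer, i.e. the smallest index
    k_abs = max(range(1, n), key=lambda k: abs(partial[k - 1]))
    k_pos = max(range(1, n), key=lambda k: partial[k - 1])
    rn = (math.sqrt(n) * max(abs(v) for v in partial)
          / (sigma * math.sqrt(k_abs * (n - k_abs))))
    rn_plus = (math.sqrt(n) * max(partial)
               / (sigma * math.sqrt(k_pos * (n - k_pos))))
    return rn, rn_plus
```

Note that Rn+ may well be negative when all partial sums lie below zero; under H1+ large positive values are expected.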

Proof. To show the first statement in (2.4) we consider

Qn := max_{1≤k<n} |∑_{i=1}^k (Xi − X̄n)| / (√n √(θ̂n(1 − θ̂n)))   with   θ̂n := k̂n/n.   (2.5)

Introduce the tied-down partial sum process

Γn(t) := (1/√n) ∑_{i=1}^{[nt]} (Xi − X̄n),   t ∈ [0, 1].

For each n ∈ N the process Γn is a random element in the Skorokhod space (D[0, 1], s), where s denotes the Skorokhod metric. Let Mn : D[0, 1] → R with

Mn(f) := sup_{1/n≤t≤1−1/n} f(t),   f ∈ D[0, 1].

If the maximum in the numerator of the second fraction in (2.5) is denoted by Nn , then Nn = Mn (|Γn |). In view of θˆn occurring in the denominator we define the functional Ψn : D[0, 1] → R by

Ψn(f) := min{t ∈ Gn : f(t) = max_{s∈Gn} f(s)},

where Gn := {k/n : 1 ≤ k < n}. In other words, Ψn(f) is the smallest maximizing point of f on the grid Gn. Herewith, we obtain that θ̂n = Ψn(|Γn|). Thus, if

An := (Mn, Ψn) : D[0, 1] → R²,

then it follows that (Nn , θˆn ) = An (|Γn |). We want to apply the Extended Continuous Mapping Theorem (ECMT), confer Billingsley (1968, Theorem 5.5). For that purpose, we first observe that

Γn(t) = σ (1/√n) ∑_{i=1}^{[nt]} (Zi − Z̄n),   t ∈ [0, 1],

where Zi = (Xi − µ)/σ, µ = E(Xi) and Z̄n = (1/n) ∑_{i=1}^n Zi. Then from Donsker's invariance principle one can easily infer that

Γn →L σB in (D[0, 1], s).   (2.6)

Thus, by continuity,

|Γn| →L σ|B| in (D[0, 1], s).   (2.7)

Next, we will prove that An = (Mn, Ψn) converges in some sense (specified in (2.8)) to A = (M, Ψ), where M(f) = sup_{0≤t≤1} f(t). As to the definition of Ψ(f), let

S(f) := {t ∈ [0, 1] : f(t) ∨ f(t−) = sup_{0≤s≤1} f(s)}

with the convention f(0−) := f(0). By Lemma 6.1(i) of Ferger (2001), S(f) is a non-empty, closed set, whence

Ψ(f) := min S(f),   f ∈ D[0, 1],

is well defined (and measurable, confer Lemma 12.12 in Kallenberg (1997)). Obviously, Mn, Ψn and M are measurable and hence An and A are measurable as well. In order to use the ECMT it suffices to show:

fn →s f ∈ Ĉ   ⟹   An(fn) → A(f),   (2.8)

where Ĉ := {f ∈ C[0, 1] : f has a unique maximizing point}. To see this, notice that continuity of f ensures that actually ∥fn − f∥ → 0, where ∥·∥ denotes the supremum norm on D[0, 1]. Observe that

|Mn(fn) − M(f)| ≤ |Mn(fn) − Mn(f)| + |Mn(f) − M(f)| ≤ ∥fn − f∥ + |Mn(f) − M(f)|.   (2.9)


Since the sequence (Mn(f))n is monotone increasing and bounded by ∥f∥ < ∞, its limit exists and is equal to

lim_{n→∞} Mn(f) = sup_{n≥1} sup_{1/n≤t≤1−1/n} f(t) = sup_{0<t<1} f(t) = sup_{0≤t≤1} f(t) = M(f).

Conclude from (2.9) that

fn →s f ∈ Ĉ   ⟹   Mn(fn) → M(f).   (2.10)

Furthermore, Lemma 6.1(ii) of Ferger (2001) says that

fn →s f ∈ Ĉ   ⟹   Ψn(fn) → Ψ(f),   (2.11)

so that (2.8) immediately follows from (2.10) and (2.11). Thanks to P(B ∈ Ĉ) = 1 we can apply the ECMT, which with (2.7) results in

(Nn, θ̂n) = An(|Γn|) →L A(σ|B|) = (σM, T).   (2.12)

Let h : R² → R with h(x, y) := x/√(y(1 − y)) for (x, y) ∈ H := R × (0, 1) and zero elsewhere. Then h is continuous on H and (σM, T) ∈ H P-a.s. Recall (2.5) and notice that by definition θ̂n ∈ [1/n, 1 − 1/n] ⊂ (0, 1). Therefore, (2.12) and the (simple) CMT yield that

Qn = Nn/√(θ̂n(1 − θ̂n)) = h(Nn, θ̂n) →L h(σM, T) = σM/√(T(1 − T)).


Finally, observe that Rn = (1/σ̂n) Qn and that σ̂n →P σ, whence Rn →L M/√(T(1 − T)) = R by Slutsky's Lemma. For the proof of Rn+ →L R+ simply replace |Γn| by Γn and follow the above scheme using (2.6). □

Let cα and cα+ be the (1 − α)-quantiles of FR and FR+, which can be determined via Theorem 1.1. Then by (2.4) the tests with rejection regions {Rn > cα} and {Rn+ > cα+} are asymptotic level-α tests:

lim_{n→∞} P(Rn > cα) = lim_{n→∞} P(Rn+ > cα+) = α   under H0.   (2.13)

The rate of convergence in (2.13) is much faster than in (2.3). To see this consider the following example.

Example 2.2. X1, . . . , Xn i.i.d. ∼ N(0, 1) with n = 1000.


α      d+n,α      Exact crit. val.   P(Tn+ > d+n,α)   Percentage of α
0.10   2.98710    2.76275            0.056877          56.9%
0.05   3.35323    3.03500            0.019792          39.6%
0.01   4.18229    3.56796            0.001032          10.3%


α      cα+        Exact crit. val.   P(Rn+ > cα+)   Percentage of α
0.10   2.50028    2.44501            0.088403        88.4%
0.05   2.79548    2.74166            0.043534        87.1%
0.01   3.36821    3.31454            0.00845         84.5%


The values are based on a Monte Carlo simulation with 10⁶ copies of Tn+ and Rn+. For instance, let us look at the level of significance α = 0.01. We observe that the Tn+-test has a true probability for a false rejection which is only 10.3% of α, whereas our test reaches 84.5%. Of course this has an impact on the power of the tests. The next example demonstrates that the new test performs significantly better than the traditional one.
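A size comparison of this kind can be reproduced on a small scale by simulation; the sample size, number of replications and seed below are illustrative choices of ours, not those of the paper (which uses 10⁶ copies):

```python
import math
import random

def rn_plus(x):
    # one-sided randomly weighted CUSUM statistic Rn+
    n = len(x)
    xbar = sum(x) / n
    sigma = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    partial, s = [], 0.0
    for v in x[:-1]:
        s += v - xbar
        partial.append(s)
    k = max(range(1, n), key=lambda j: partial[j - 1])   # smallest maximizer
    return math.sqrt(n) * partial[k - 1] / (sigma * math.sqrt(k * (n - k)))

random.seed(2017)
c_plus = 2.79548                    # 0.95-quantile of F_{R+} from Example 2.2
reps, n = 2000, 200
hits = sum(1 for _ in range(reps)
           if rn_plus([random.gauss(0.0, 1.0) for _ in range(n)]) > c_plus)
rate = hits / reps                  # empirical size; should be close to 0.05
```

With only 2000 replications the empirical rate is a rough estimate, but it already illustrates that the true size of the Rn+-test stays near the nominal level α.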


Example 2.3. n = 500, m = 100 and α = 0.01.


Xi ∼ N(0, 1) for 1 ≤ i ≤ 100,   Xi ∼ N(−γ, 1) for 101 ≤ i ≤ 500.

γ                 0.00      0.05      0.10      0.15      0.20      0.25
P(Tn+ > d+n,α)    0.00090   0.00187   0.00456   0.01240   0.02965   0.06903
P(Rn+ > cα+)      0.00788   0.01569   0.03218   0.06655   0.12849   0.22897


γ                 0.30      0.35      0.40      0.45      0.50
P(Tn+ > d+n,α)    0.13955   0.25034   0.40344   0.57193   0.72666
P(Rn+ > cα+)      0.36165   0.51945   0.68072   0.81211   0.90104

It is very important to notice that our approach can be used also for other test problems. To set a first simple example, consider the Kolmogorov–Smirnov test for the uniform distribution. So, here X1, . . . , Xn are i.i.d. with uniform distribution on [0, 1]. Let Fn denote the pertaining empirical distribution function and let αn := √n(Fn − Id) be the uniform empirical process. Then instead of using the Kolmogorov–Smirnov statistic Kn := sup_{0≤t≤1} |αn(t)|, it is reasonable from a statistical point of view to replace it by a weighted version built from |αn(t)|/√(t(1 − t)); the standardized supremum, however, again obeys an extreme value limit theorem with slow convergence, confer Jaeschke (1979). Instead, let τn denote the smallest maximizing point of |αn| and put

Ln := Kn/√(τn(1 − τn)).

Since (Kn, τn) = A(|αn|), it easily follows that (Kn, τn) →L (M, T). This is so because αn →L B and A is continuous on Ĉ, whence the CMT is applicable. Finally, Ln = h(Kn, τn), and another application of the CMT gives

Ln = Kn/√(τn(1 − τn)) →L R.
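For a uniform sample, Kn, τn and Ln can be computed exactly from the order statistics, since |αn| attains its supremum at an order statistic, either as a value or as a left limit. A sketch (the function name `weighted_ks` and the sample size are ours):

```python
import math
import random

def weighted_ks(u):
    # Kn, its smallest maximizing point tau_n, and Ln = Kn / sqrt(tau_n(1 - tau_n))
    # for an i.i.d. Uniform(0,1) sample u
    n = len(u)
    order = sorted(u)
    best, tau = -1.0, 0.5
    for i, t in enumerate(order, start=1):
        # |alpha_n| at the i-th order statistic: value i/n - t, left limit (i-1)/n - t
        for d in (abs(i / n - t), abs((i - 1) / n - t)):
            if d > best + 1e-15:      # strict improvement keeps the smallest maximizer
                best, tau = d, t
    kn = math.sqrt(n) * best
    return kn, tau, kn / math.sqrt(tau * (1.0 - tau))

random.seed(42)
kn, tau, ln = weighted_ks([random.random() for _ in range(500)])
```

Since τn(1 − τn) ≤ 1/4, the weighted statistic always satisfies Ln ≥ 2Kn.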


Observe that our arguments remain valid for any other empirical process βn in place of αn provided βn →L B. This opens the door to many further applications.


References

Billingsley, P., 1968. Convergence of Probability Measures. John Wiley & Sons, New York.
Billingsley, P., 1986. Probability and Measure, second ed. John Wiley & Sons, New York.
Csörgő, M., Horváth, L., 1997. Limit Theorems in Change-Point Analysis. Wiley, New York.
Ferger, D., 1995. The joint distribution of the running maximum and its location of D-valued Markov processes. J. Appl. Probab. 32, 842–845.
Ferger, D., 2001. Analysis of change-point estimators under the null hypothesis. Bernoulli 7, 487–506.
Jaeschke, D., 1979. The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Statist. 7, 108–115.
Kallenberg, O., 1997. Foundations of Modern Probability. Springer-Verlag, New York.
