Local Gaussian correlation: A new measure of dependence




Journal of Econometrics 172 (2013) 33–48



Dag Tjøstheim ∗, Karl Ove Hufthammer
University of Bergen, Department of Mathematics, PO Box 7803, 5020 Bergen, Norway

Article history: Received 6 June 2011; Received in revised form 16 July 2012; Accepted 1 August 2012; Available online 11 August 2012.

Keywords: Local likelihood; Dependence measures; Local dependence; Local correlation

Abstract: It is a common view among finance analysts and econometricians that the correlation between financial objects becomes stronger as the market is going down, and that it approaches one when the market crashes, having the effect of destroying the benefit of diversification. The purpose of this paper is to introduce a local dependence measure that gives a precise mathematical description and interpretation of such phenomena. We propose a new local dependence measure, a local correlation function, based on approximating a bivariate density locally by a family of bivariate Gaussian densities using local likelihood. At each point the correlation coefficient of the approximating Gaussian distribution is taken as the local correlation. Existence, uniqueness and limit results are established. A number of properties of the local Gaussian correlation and its estimate are given, along with examples from both simulated and real data. This new method of modelling carries with it the prospect of being able to do locally for a general density what can be done globally for the Gaussian density. In a sense it extends Gaussian analysis from a linear to a non-linear environment. © 2012 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (D. Tjøstheim).

0304-4076/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2012.08.001

1. Introduction

It is a common statement among finance analysts and econometricians that the correlation between financial objects becomes stronger as the market is going down, and that it approaches one when the market crashes, having the effect of destroying the benefit of diversification. Such a statement is imprecise, since it is not clear what 'correlation' means here. The purpose of this paper is to introduce a local dependence measure that gives a precise mathematical description and interpretation of such phenomena. In finance and econometrics there have been several attempts to handle variation in local dependence in, say, portfolio modelling (Silvapulle and Granger, 2001), measuring contagion (Rodrigues, 2007; Inci et al., 2011) and description of tail dependence (Campbell et al., 2008). Perhaps the most used solution to these problems is to employ the concept of conditional correlation; see, for example, Forbes and Rigobon (2002), Longin and Solnik (2001) and Hong et al. (2007). This simply amounts to computing the ordinary product-moment correlation for certain regions of values of log difference returns. Given two random variables X1 and X2 with observed values (X1i, X2i), i = 1, . . . , n, the correlation between X1 and X2 conditionally on being in a region A is

$$\hat\rho_c(A) = \frac{\displaystyle\sum_{(X_{1i},X_{2i})\in A}(X_{1i}-\hat\mu_{X_1,c})(X_{2i}-\hat\mu_{X_2,c})}{\Bigl[\displaystyle\sum_{(X_{1i},X_{2i})\in A}(X_{1i}-\hat\mu_{X_1,c})^2\Bigr]^{1/2}\Bigl[\displaystyle\sum_{(X_{1i},X_{2i})\in A}(X_{2i}-\hat\mu_{X_2,c})^2\Bigr]^{1/2}},$$

where

$$\hat\mu_{X_1,c} = \frac{1}{n_A}\sum_{(X_{1i},X_{2i})\in A} X_{1i}
\qquad\text{and}\qquad
\hat\mu_{X_2,c} = \frac{1}{n_A}\sum_{(X_{1i},X_{2i})\in A} X_{2i},$$

with nA being the number of pairs with (X1i, X2i) ∈ A. For an ergodic series (X1i, X2i), as nA tends to infinity, ρ̂c(A) converges to ρc(A) = corr(X1, X2 | (X1, X2) ∈ A) almost surely.

There are several disadvantages of the conditional correlation concept:

(i) The introduction of an indicator function to define the local region means that ρc(A) does not equal the global correlation ρ for a pair of jointly Gaussian variables (X1, X2). This is unfortunate, since in a Gaussian distribution dependence is completely characterised by the correlation coefficient. This difficulty is sometimes termed the bias problem for the conditional correlation, and attempts to correct it have been sought in several ways; see, for example, Forbes and Rigobon (2002).
(ii) The conditional correlation is defined for a region A and not for a point (x1, x2). How should A be chosen?
(iii) The conditional correlation applies a linear dependence measure to local regions. Is this sensible in a non-linear and non-Gaussian situation?

We believe the local Gaussian correlation that we advocate in this paper does not have these disadvantages.

Fig. 1. Local Gaussian correlation for a bivariate distribution with standard Gaussian marginals and a Gumbel copula.

The idea behind it is very simple, though its computation and estimation are far more difficult than the corresponding computation and estimation of the conditional correlation. We start by recalling that the ordinary correlation coefficient is fully satisfactory for a Gaussian distribution. Now, for a general bivariate density f for the variables (X1, X2), locally in a neighbourhood of each point x = (x1, x2) we fit a Gaussian bivariate density

$$\psi(v,\mu(x),\Sigma(x)) = \frac{1}{2\pi|\Sigma(x)|^{1/2}}\exp\left\{-\frac{1}{2}\,(v-\mu(x))^T\,\Sigma^{-1}(x)\,(v-\mu(x))\right\},$$

using local likelihood. Here v = (v1, v2)ᵀ is the running variable in this Gaussian distribution, µ(x) = (µ1(x), µ2(x))ᵀ is the local mean vector, Σ(x) = (σij(x)) is the local covariance matrix, and we desire that ψ(v, µ(x), Σ(x)) equals f(x) at v = x and is close to f in a neighbourhood of x. With σi²(x) = σii(x), we define the local correlation at the point x by

$$\rho(x) = \frac{\sigma_{12}(x)}{\sigma_1(x)\,\sigma_2(x)},$$

and in terms of the local correlation, ψ may be written in the usual fashion,

$$\psi(v,\mu_1(x),\mu_2(x),\sigma_1^2(x),\sigma_2^2(x),\rho(x)) = \frac{1}{2\pi\sigma_1(x)\sigma_2(x)\sqrt{1-\rho^2(x)}}\,\exp\left\{-\frac{1}{2(1-\rho^2(x))}\left[\frac{(v_1-\mu_1(x))^2}{\sigma_1^2(x)} - 2\rho(x)\,\frac{(v_1-\mu_1(x))(v_2-\mu_2(x))}{\sigma_1(x)\sigma_2(x)} + \frac{(v_2-\mu_2(x))^2}{\sigma_2^2(x)}\right]\right\}. \quad (1)$$
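For concreteness, both the conditional correlation ρ̂c(A) of the introduction and the local Gaussian family ψ of Eq. (1) are straightforward to evaluate numerically. The sketch below is our own illustration in plain NumPy (the function names are ours, not from the paper); it evaluates ψ for a given local parameter vector and computes the conditional correlation over a rectangular region A.

```python
import numpy as np

def psi(v1, v2, mu1, mu2, sig1, sig2, rho):
    """Bivariate Gaussian density of Eq. (1), parametrised by the
    (local) means, standard deviations and correlation."""
    z1 = (v1 - mu1) / sig1
    z2 = (v2 - mu2) / sig2
    norm = 2.0 * np.pi * sig1 * sig2 * np.sqrt(1.0 - rho**2)
    expo = -(z1**2 - 2.0 * rho * z1 * z2 + z2**2) / (2.0 * (1.0 - rho**2))
    return np.exp(expo) / norm

def conditional_correlation(x1, x2, region):
    """Product-moment correlation of the pairs (x1[i], x2[i]) that fall
    in a rectangular region A = [a1, b1] x [a2, b2]."""
    a1, b1, a2, b2 = region
    mask = (x1 >= a1) & (x1 <= b1) & (x2 >= a2) & (x2 <= b2)
    u, v = x1[mask], x2[mask]
    u = u - u.mean()
    v = v - v.mean()
    return np.sum(u * v) / np.sqrt(np.sum(u**2) * np.sum(v**2))
```

With rho = 0 the density factorises into the product of the two marginal normal densities, and at the centre of a standard Gaussian psi returns 1/(2π).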

In the sequel we will use θ(x) to denote the 5-dimensional vector [µ1(x), µ2(x), σ1²(x), σ2²(x), ρ(x)], or alternatively with ρ(x) replaced by σ12(x) as the 5th element. As we move to another point x′, we obtain a new Gaussian approximating density ψ(v, µ(x′), Σ(x′)), which approximates f in a neighbourhood of x′. In this way f is represented by a family of Gaussian bivariate densities as x varies, the point being that in each specific neighbourhood of a point x, the local dependence properties are described by the local covariance/correlation matrix, and since ψ is Gaussian, we get a complete characterisation of the dependence relationship in that neighbourhood (modulo the precision of the approximation). All of this depends on whether a local Gaussian representation

exists and to what degree it is unique. This problem will be discussed throughout Section 2, along with an appropriate asymptotic estimation theory in Sections 3–4. The 'bias' problem of the conditional correlation disappears trivially for the local Gaussian correlation, since if f is Gaussian, the same Gaussian will approximate f at every point. Note that in general the local Gaussian is not centered at the point x. Indeed, in the case of f Gaussian, all of the local Gaussians will be identical to f and hence located at the mean µ of f.

A picture of the estimated local correlation of a distribution with standard Gaussian marginals and a Gumbel copula is shown in Fig. 1. The local Gaussian correlation increases as (x1, x2) increases. We will return to the pattern of Fig. 1 in Sections 6.2 and 8.3. The corresponding local Gaussian correlation for the Clayton copula, sometimes used to model financial objects, has the opposite pattern. We refer to Berentsen et al. (2012) for more details and other copula structures.

It may be useful to compare our approach to measuring local dependence with other attempts in the statistical literature. Bjerve and Doksum (1993) and Doksum et al. (1994) base their approach on the relationship between correlation and linear regression. There are well-defined procedures for defining non-linear and non-parametric regression models, and the authors define a non-linear local correlation by extending linear relationships to non-linear ones. A disadvantage is that the response variable Y plays a fundamentally different role from the input variable X. This leads to asymmetry in their local correlation (which they can heuristically correct for). An application to finance is given in Inci et al. (2011). In our approach we treat (X, Y), or rather (X1, X2), on exactly the same basis. This in turn leads to non-traditional nonparametric estimation problems.
In the approach by Holland and Wang (1987) and Jones (1996), the two variables are treated on the same basis. The authors start from a local version of the odds ratio in contingency tables. Using limit arguments, they arrive at the local dependence function

$$\gamma(x_1,x_2) = \frac{\partial^2}{\partial x_1\,\partial x_2}\log f(x_1,x_2).$$

This is a function of the ordinary correlation coefficient when the variables are jointly Gaussian, but its range is not between −1 and 1. Jones (1996) demonstrated that γ may be obtained by a localization argument of the product moment correlation appropriately scaled; see also Jones (1998) and Jones and Koch (2003). In a recent contribution Delicado and Smrekar (2009) propose a new dependence measure using principal curves.
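As a quick check on the statement that γ is a function of the ordinary correlation in the Gaussian case but is not confined to [−1, 1], one can compute γ directly: taking the logarithm of the density (1) with constant parameters and differentiating gives

```latex
\gamma(x_1,x_2)
  = \frac{\partial^2}{\partial x_1\,\partial x_2}
    \left\{-\frac{1}{2(1-\rho^2)}
    \left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}
        - \frac{2\rho\,(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}
        + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right\}
  = \frac{\rho}{(1-\rho^2)\,\sigma_1\sigma_2},
```

which is constant in (x1, x2) and increasing in ρ, but clearly ranges over all of (−∞, ∞) as ρ approaches ±1.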


A formal definition of local Gaussian population parameters µ(x), Σ(x), and hence ρ(x), is given in Section 2.1. After that, in Section 2.2, a discussion of local parameters and their computation in the setting of (possibly nonlinear) transformations of Gaussian variables is given. A connection to copula theory is pointed out. See also Section 5. The local quantities µ(x) and Σ(x) must be estimated as a function of x. We use local likelihood for that purpose. The relevant theory is outlined in Sections 3–4. A first stage of an asymptotic theory with a fixed bandwidth is given in Section 3. The general case of a decreasing bandwidth is covered in Section 4. A number of properties of the local parameters, such as range, transformation, symmetry and tail dependence, are established in Sections 5–6. Matters of practical implementation, such as computing standard errors and choosing bandwidth, are looked at in Section 7. Finally, there are some illustrations on simulated and real data in Section 8.

The present paper contains the basic theory and properties of the local Gaussian correlation. There are many potential applications. Applications to independence testing, including the introduction of positive and negative nonlinear dependence, are in Berentsen and Tjøstheim (2012). A connection to copula theory and a characterization and visualization of their dependence structure can be found in Berentsen et al. (2012). Applications to financial contagion are given in Støve et al. (2012).

2. Existence of local parameters

In Section 2.1 we define population parameters θb(x) for a fixed bandwidth b, completely analogous to those defined in Hjort and Jones (1996). We then let b tend to zero to obtain population values θ(x). In Section 2.2 we look at the special case of nonlinear transformations of Gaussian variables and copulas.

2.1. A local penalty function

First note that the representation in (1) is not well-defined unless extra conditions are imposed.
Actually, as is easy to check, if f(x) is a global Gaussian N(µ, Σ), infinitely many Gaussian densities can be chosen that pass through f(x) at the point x. However, these Gaussian densities are not really the objects we want. We need to construct a Gaussian approximation that approximates f in a neighborhood of x and such that (1) holds at x. In the case of X ∼ N(µ, Σ) this is trivially obtained by taking one Gaussian; i.e., µ(x) = µ and Σ(x) = Σ for all x. In fact, these relationships may be taken as definitions of the local parameters for a Gaussian distribution, and it will be shown in Sections 3 and 4 that the local likelihood estimates µ̂(x) and Σ̂(x) converge towards µ and Σ. Since our focus is on representation (and estimation) in a neighbourhood, in order to define population parameters, we start with a fixed neighbourhood of x defined by a 2-dimensional bandwidth b and a local penalty function. Subsequently we let b be small. Perhaps the most natural penalty function is the one given by



$$q' = \int K_b(v-x)\,\bigl(\psi(v,\theta(x)) - f(v)\bigr)^2\,dv,$$

where Kb(v − x) = (b1b2)⁻¹K(b1⁻¹(v1 − x1))K(b2⁻¹(v2 − x2)) is a product kernel with bandwidth b = (b1, b2). Then θ(x) = θb(x) could be chosen to minimize q′, so that it would satisfy

$$\int K_b(v-x)\,\frac{\partial\psi}{\partial\theta_j}(v,\theta(x))\,\bigl[f(v)-\psi(v,\theta(x))\bigr]\,dv = 0,\qquad j = 1,\dots,5. \quad (2)$$

In a sense this would correspond to a local least squares approximation. But since we are dealing with densities, a criterion related to local log likelihood may be better suited. Such a penalty function is obtained by replacing q′ with

$$q = \int K_b(v-x)\,\bigl[\psi(v,\theta(x)) - \log\psi(v,\theta(x))\,f(v)\bigr]\,dv. \quad (3)$$

As is seen in Hjort and Jones (1996), this can be interpreted as a locally weighted Kullback–Leibler distance from f to ψ(·, θ(x)). Corresponding to (2), with the local score function as a weight function, we then obtain that the minimizer θ(x) of (3) should satisfy

$$\int K_b(v-x)\,\frac{\partial}{\partial\theta_j}\log\psi(v,\theta(x))\,\bigl[f(v)-\psi(v,\theta(x))\bigr]\,dv = 0,\qquad j = 1,\dots,5. \quad (4)$$
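The defining property of the penalty (3) can be checked numerically. The sketch below is our own illustration (not code from the paper): f is a bivariate Gaussian with correlation 0.5, the integral in (3) is approximated on a grid, and q turns out to be smallest at the true parameter vector. This is guaranteed here because for each fixed v the integrand ψ − f log ψ is convex in ψ with its pointwise minimum at ψ = f, so any θ whose density differs from f on the support of the kernel gives a strictly larger q.

```python
import numpy as np

def bvn(v1, v2, mu1, mu2, s1, s2, rho):
    """Bivariate Gaussian density with means, standard deviations, correlation."""
    z1, z2 = (v1 - mu1) / s1, (v2 - mu2) / s2
    c = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)
    return np.exp(-(z1**2 - 2 * rho * z1 * z2 + z2**2) / (2 * (1 - rho**2))) / c

def q_penalty(theta, x=(0.0, 0.0), b=(1.0, 1.0), rho_true=0.5, grid=4.0, step=0.1):
    """Grid approximation of q in (3) for f = bivariate Gaussian with
    correlation rho_true and a Gaussian product kernel K_b centred at x."""
    g = np.arange(-grid, grid + step / 2, step)
    v1, v2 = np.meshgrid(g, g)
    Kb = bvn(v1, v2, x[0], x[1], b[0], b[1], 0.0)       # product kernel
    f = bvn(v1, v2, 0.0, 0.0, 1.0, 1.0, rho_true)       # true density
    psi_v = bvn(v1, v2, *theta)                          # candidate Gaussian
    return np.sum(Kb * (psi_v - np.log(psi_v) * f)) * step**2

theta_true = (0.0, 0.0, 1.0, 1.0, 0.5)
```

For example, q_penalty(theta_true) is smaller than q_penalty((0.0, 0.0, 1.0, 1.0, 0.0)) and than any version with a shifted mean or inflated variance, consistent with θb(x) being defined as the minimizer of q.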

We define the population value θb(x) as the minimizer of the penalty function q, assuming that there is a bandwidth b0 such that there exists a unique minimizer of (3) satisfying (4) for any b with 0 < b < b0. Here, and in the following, for 2-dimensional bandwidths, b′ ≤ b ≤ b″ is taken to mean b′ᵢ ≤ bᵢ ≤ b″ᵢ for i = 1, 2. This definition of θb(x) is essentially identical to the one used in Hjort and Jones (1996) for more general parametric families of densities. Note that both (2) and (4) yield local solutions θb(x) such that if Kb has a compact support Sb and f1(v) and f2(v) coincide on {v : (v − x) ∈ Sb}, then θb(f1, x) = θb(f2, x). It is easy to find examples where (4) is satisfied with a unique θb(x). A trivial example is when X ∼ N(µ, Σ) is Gaussian, where θb(x) = θ(x) = (µ, Σ).

The next stage consists in defining a step function of a Gaussian variable Z ∼ N(µ, Σ), where we will take µ = 0 and Σ = I2, the identity matrix of dimension 2. Let Ri, i = 1, . . . , k, be a set of non-overlapping regions of R² such that R² = ∪ᵏᵢ₌₁Ri. Further, let ai and Ai be a corresponding set of vectors and matrices in R² such that Ai is non-singular, and define the piecewise linear function

$$X = g_s(Z) = \sum_{i=1}^{k}(a_i + A_i Z)\,\mathbf{1}(Z\in R_i), \quad (5)$$

where 1(·) is the indicator function. Let Si be the region defined by Si = {x : x = ai + Aiz, z ∈ Ri}. It is assumed that (5) is one-to-one in the sense that Si ∩ Sj = ∅ for i ≠ j and ∪ᵏᵢ₌₁Si = R².

To see that the linear step function (5) can be used to obtain a solution of (4), let x be a point in the interior of Si and let the kernel function K have a compact support. By choosing b small enough, all v for which v − x is in the support of Kb will be in Si. Under this restriction on b, θb(x) = θ(x) ≡ θi = (µi, Σi), where µi = ai and Σi = AiAiᵀ as defined in (5). Thus, in this sense, for a fixed but small b, there exists a local Gaussian approximation ψ(x, θb) of f, with corresponding local means µi,b(x), variances σ²i,b(x), i = 1, 2, and correlation ρb(x). It will be seen in Sections 3 and 4 that for these points the local likelihood estimates θ̂b(x) = [µ̂b(x), Σ̂b(x)] converge towards θb(x) = [µb(x), Σb(x)] as the number of observations tends to infinity and the bandwidth b is held fixed, and towards a population parameter θ(x) as n → ∞ and b → 0. (The local parameters at the boundary of the regions can be defined by some convention; these boundaries have zero measure in R².) Note that (5) can be used to simulate observations from a pair (X1, X2) having a simple given local stepwise structure (µi, Σi) in the x1–x2 plane, with Ri = {z : z = Ai⁻¹(x − ai), x ∈ Si} and with ai = µi, Ai = Σi^{1/2}.

Before we proceed any further and let b → 0, let us verify that the population values satisfying (4) are consistent with the local likelihood approach of Hjort and Jones (1996). Their primary purpose was not the estimation of a population parameter θ(x) but rather the estimation of f(x) via ψ(x, θ(x)), where ψ may not be Gaussian and where θ(x) may not be a location-scale parameter. To



introduce the local likelihood, let (X1, . . . , Xn) with Xi = (X1i, X2i) be i.i.d. observations from the density f, and let ψ(x, θ(x)) be the parametric family we are interested in (Gaussian in our case). One might suppose that an appropriate local log likelihood, which should be maximised to obtain an estimate of θ(x) for a fixed x, is

$$L' = n^{-1}\sum_i K_b(X_i - x)\log\psi(X_i,\theta(x)),$$

where Kb is a kernel function as defined above. However, this gives nonsensical results when used in an approximation context, as has been realised by Hjort and Jones (1996) and others. There are several modifications of the local likelihood to handle this problem; see the discussion in Hjort and Jones (1996) and Eguchi and Copas (1998). We follow Hjort and Jones (1996) in using the modified local log likelihood, which in our notation, with a fixed bandwidth b, is

$$L = n^{-1}\sum_i K_b(X_i-x)\log\psi(X_i,\theta_b(x)) - \int K_b(v-x)\,\psi(v,\theta_b(x))\,dv. \quad (6)$$

The last term implies that ψ(x, θb(x)) is not allowed to stray far away from f(x) as b → 0. Indeed, using the notation uj(·, θ) = ∂/∂θj log ψ(·, θ), by the law of large numbers, assuming E{Kb(Xi − x) log ψ(Xi, θb(x))} < ∞, we have almost surely

$$\frac{\partial L}{\partial\theta_j} = n^{-1}\sum_i K_b(X_i-x)\,u_j(X_i,\theta_b(x)) - \int K_b(v-x)\,u_j(v,\theta_b(x))\,\psi(v,\theta_b(x))\,dv \to \int K_b(v-x)\,u_j(v,\theta_b(x))\,\bigl[f(v)-\psi(v,\theta_b(x))\bigr]\,dv. \quad (7)$$

Putting the expression in the first two lines of (7) equal to zero yields the local maximum likelihood estimates θ̂b(x) of the population value θb(x), which satisfy the third line of (7) put equal to zero, which in turn is identical to our requirement for the population value in (4). Letting b → 0 and requiring

$$\partial L/\partial\theta_j = 0 \quad (8)$$

leads to

$$u_j(x,\theta_b(x))\,\bigl[f(x)-\psi(x,\theta_b(x))\bigr] + O(b^T b) = 0, \quad (9)$$

so that, ignoring solutions that yield uj(x, θb(x)) = 0, (9) requires ψ(x, θb(x)) to be close to f(x), in fact with a difference of the order O(bᵀb) as b → 0. (This is exploited in the bandwidth algorithm of Section 7.2.)

Hjort and Jones (1996) did consider local Gaussians in Sections 5.3, 5.4 and 5.7 of their paper as one of several examples of parametric families that could be used for estimating f. Gourieroux and Jasiak (2010) put the main emphasis on the Gaussian family, but again the main purpose was to estimate the density, in particular in the tails. Our goal is different, as we are interested in the parameter vector θ itself as a means of describing the local dependence.

One of the most important motivations for introducing a local Gaussian correlation is to be able to handle time series, such as financial returns. One example would be {Xt} = {Yt, Yt−j} for a univariate time series {Yt}, for the purpose of modelling autodependence at lag j. The local log likelihood function of Eq. (6) can be used for dependent data in that it can be maximised using the same algorithm as in the i.i.d. case, resulting in the same expression for θ̂b(x) in terms of the variables Xt. When the Xt variables are dependent, however, it cannot really be thought of as a local Gaussian log likelihood. A local likelihood can be constructed by approximating the dependence between the Xt variables by a multivariate Gaussian, but in general this will lead to a drastic increase in the number of local parameters, and we prefer instead to use the marginal local Gaussian penalty function

$$Q_n(X_1,\dots,X_n,\theta_b(x)) = -nL(X_1,\dots,X_n,\theta_b(x)), \quad (10)$$

where

$$L(X_1,\dots,X_n,\theta_b(x)) = n^{-1}\sum_t K_b(X_t-x)\log\psi(X_t,\theta(x)) - \int K_b(v-x)\,\psi(v,\theta_b(x))\,dv, \quad (11)$$

which is proportional to the local Gaussian likelihood in the independent case and could be interpreted as an M-estimation penalty function in the more general case. Assuming stationarity, we can still approximate the marginal distribution of {X1t, X2t} by a local Gaussian distribution, and we can model the local Gaussian auto-correlation structure for {Yt, Yt−j} for each j. A natural next step would be to use this as a first step in a local autocorrelation analysis of time series, including a local spectral density function that could describe periodicities on several scales. Using (11), the expected value of the function n⁻¹Qn(x) for a fixed b is given by the penalty function q defined in (3). The time series version of Eq. (4) is of course identical in form to the one obtained from (7) and is again identical to the equation used by Hjort and Jones (1996) to define the population value θb(x). In Section 3 it will be shown that the corresponding local likelihood estimate converges towards θb(x) for a stationary ergodic series under some additional regularity conditions.

We now return to the question as to what happens when the neighbourhood defined by b gets smaller. This is in fact considered indirectly by Hjort and Jones (1996) on pp. 1627–1630, for a possibly more general parametric family than the Gaussian. Assuming differentiability and Taylor expanding in the integral equation (4),

$$\int K_b(v-x)\,\alpha(v)\,dv = \alpha(x) + \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\sigma^2_{K,i,j}\,\frac{\partial^2\alpha(x)}{\partial x_i\,\partial x_j}\,b_i b_j + o(b^T b), \quad (12)$$

where α(v) = u(v, θ(x))[f(v) − ψ(v, θb(x))], σ²K,i,j = ∫ zi zj K(z1, z2) dz1 dz2 and uᵀ = [u1, . . . , u5]. This leads to f(x) − ψ(x, θb(x)) → 0, but also equality in derivatives as b → 0. The details are somewhat tedious (cf. Hjort and Jones, 1996) and are not needed by us here. It will be used in a separate paper where we look at the bias of the local likelihood estimates. The point for us is that these consistency conditions guarantee that f(x) − ψ(x, θb(x)) → 0 in a smooth manner as b → 0. If, with little loss of generality, we assume that b1 = kb2 for some constant k, then as b → 0 in this manner, the sequence ψ(x, θb(x)) is a Cauchy sequence converging to f(x). For two bandwidths bⁱ = (bi1, bi2), i = 1, 2, by the mean value theorem and the continuous differentiability of ψ with respect to θb(x) = [θ1,b(x), . . . , θ5,b(x)] = [µb(x), Σb(x)],

$$\psi(x,\theta_{b^1}(x)) - \psi(x,\theta_{b^2}(x)) = \sum_{i=1}^{5}\frac{\partial}{\partial\theta_i}\psi(x,\xi)\,\bigl(\theta_{i,b^1}(x) - \theta_{i,b^2}(x)\bigr),$$

and θb(x) is a Cauchy sequence and has a unique limit θ(x) if the vector of partial derivatives ∂ψ(x, ξ)/∂θi has linearly independent components. Here ξ is determined by the mean value theorem. Recalling that ψ is Gaussian, the linear independence condition is not a strong condition. In the sequel the above regularity conditions will be assumed to hold, guaranteeing the existence of a unique θ(x) = (µ(x), Σ(x)),


and it will be shown in Section 4 that the local likelihood estimates θ̂_{bn,n}(x) converge towards θ(x). Again, for the step function model of (5) the existence of a unique θ(x) is trivial, since θb(x) does not depend on b once b is small enough. Moreover, if there exists a neighborhood of x such that f1(x) = f2(x) on this neighborhood, and if the kernel function K has a compact support, then, in the notation defined after Eq. (4), there exists a bandwidth b0 such that for b ≤ b0, θb(f1, x) = θb(f2, x), and hence θ(f1, x) = θ(f2, x).

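The step-function construction (5) is easy to simulate from, which makes it useful for generating test data with a known local correlation structure. The sketch below is our own illustration, not code from the paper, and the particular regions and matrices are one arbitrary admissible choice: k = 2 with R1 = {z : z1 < 0}, R2 = {z : z1 ≥ 0}, ai = 0, and Ai the lower-triangular Cholesky factors of two covariance matrices, so that the images Si are the disjoint half-planes {x1 < 0} and {x1 ≥ 0} and the map is one-to-one.

```python
import numpy as np

def step_function_sample(n, sigma_left, sigma_right, rng=None):
    """Draw Z ~ N(0, I2) and map it through the piecewise linear
    function (5): X = A1 Z on {z1 < 0}, X = A2 Z on {z1 >= 0}.
    With a_i = 0 and A_i lower triangular (Cholesky factors), the first
    coordinate keeps its sign, so the two image regions are the
    disjoint half-planes x1 < 0 and x1 >= 0."""
    rng = np.random.default_rng(rng)
    A1 = np.linalg.cholesky(np.asarray(sigma_left))
    A2 = np.linalg.cholesky(np.asarray(sigma_right))
    z = rng.standard_normal((n, 2))
    left = z[:, 0] < 0
    return np.where(left[:, None], z @ A1.T, z @ A2.T)

# Local Gaussian correlation -0.8 on {x1 < 0} and +0.8 on {x1 >= 0}.
S1 = [[1.0, -0.8], [-0.8, 1.0]]
S2 = [[1.0, 0.8], [0.8, 1.0]]
x = step_function_sample(10000, S1, S2, rng=42)
```

Note that the sample correlation computed within each half-plane is a truncated (conditional) correlation and is therefore biased away from ±0.8, which is exactly the bias problem of the conditional correlation that the local Gaussian correlation avoids.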
2.2. Non-linear transformations and explicit computation of population parameters

As seen above, the local parameters can be explicitly calculated for the linear step function case in (5). A continuous one-to-one function g : z → g(z) = x mapping R² onto R² with an inverse h = g⁻¹ can be approximated by a sequence of one-to-one piecewise linear functions such as in (5) by letting k increase and by letting the regions Ri be smaller. If g is continuously differentiable at z, then a representation such as (1) can easily be found by Taylor expansion, with µ′(x) and Σ′(x) defined explicitly in terms of g (or h), but unfortunately it is not unique. The existence of an open set in Si within which X is identical to a Gaussian is the crucial argument in the derivation of uniqueness for the local parameters of the piecewise linear function. This property is lost for a general continuous function g.

However, as seen in Berentsen et al. (2012), using an extension of the Rosenblatt (1952) representation, there is a set of points for which we can prove uniqueness and actually compute the local Gaussian parameters (µ(x), Σ(x)), and in particular the local Gaussian correlation ρ(x). This result can be applied to an exchangeable copula C, where we have been able to compute ρ(x) along the curve defined by {x = (x1, x2) : F1(x1) = F2(x2)}, where F1 and F2 are the two marginal cumulative distribution functions of X. If they are the same, then the curve reduces to the diagonal x = (s, s). In such a case, with FX(x) = C(F1(x1), F2(x2)), it is shown in Berentsen et al. (2012) that

$$\rho(s,s) = \frac{C_{11}(F_1(s),F_2(s))\,\phi(\Phi^{-1}(F_1(s)))}{\sqrt{\bigl[\phi(\Phi^{-1}(C_1(F_1(s),F_2(s))))\bigr]^2 + \bigl[C_{11}(F_1(s),F_2(s))\,\phi(\Phi^{-1}(F_1(s)))\bigr]^2}}, \quad (13)$$

where φ is the standard normal density, C1(u1, u2) = ∂C(u1, u2)/∂u1 and C11(u1, u2) = ∂²C(u1, u2)/∂u1². This formula can be simplified for various copula classes and gives an explicit formula for the local parameters; i.e., the local correlation along the diagonal for the density f(x) determined by the copula in question. In Berentsen et al. (2012) this has been checked against the estimated local Gaussian correlation (it can be estimated everywhere, not only along the diagonal, for these copulas; an example is given in Fig. 1). Subsequently these local correlations can be used in a test of fit of copula models using the theoretical formula for ρ(x) along the diagonal. So far we have not been able to compute the population value ρ(x) for a general x. We refer to Berentsen et al. (2012) for details.

A much simpler example is given by the parabola

$$X_2 = X_1^2 + \varepsilon;\qquad \varepsilon\sim N(0,\sigma_\varepsilon^2),\quad X_1\sim N(0,\sigma_X^2),$$

where X1 and ε are independent. This is a textbook example of a situation where there is (strong) dependence between X2 and X1, but where the global correlation ρ(X1, X2) = 0 under very general conditions. We define the function g(Z) by

$$X_1 = \sigma_X Z_1;\qquad X_2 = \sigma_X^2 Z_1^2 + \sigma_\varepsilon Z_2,$$

where Z1 and Z2 are independent standard normal variables. It follows that a local linear approximation is given by (∂g/∂z)(z)Z with Z = [Z1, Z2]ᵀ and

$$\frac{\partial g}{\partial z}(z) = \begin{pmatrix}\sigma_X & 0\\ 2\sigma_X^2 z_1 & \sigma_\varepsilon\end{pmatrix}.$$

Along the curve x2 = x1² the local covariance is given by

$$\Sigma(x_1,x_1^2) = \frac{\partial g}{\partial z}(z)\,\frac{\partial g}{\partial z}(z)^T = \begin{pmatrix}\sigma_X^2 & 2\sigma_X^2 x_1\\ 2\sigma_X^2 x_1 & 4\sigma_X^2 x_1^2 + \sigma_\varepsilon^2\end{pmatrix},$$



with constant local variance in the x1 direction, which is reasonable since X1 is Gaussian. The local correlation is given by

$$\rho(x_1,x_1^2) = \frac{2x_1}{\sqrt{4x_1^2 + \sigma_\varepsilon^2/\sigma_X^2}}.$$

As σX → ∞, or σε → 0, ρ(x1, x1²) tends to +1 for x1 > 0 and to −1 for x1 < 0, whereas for x1 = 0, ρ(x1, x1²) = 0. All of these results are intuitively reasonable, and are consistent with simulation experiments. There is some bias that can be predicted and that decreases as the bandwidth in the local likelihood algorithm tends to zero, but such bias problems will be studied generally in a separate publication.

Finally, in the one-dimensional case the Rosenblatt (1952) transformation is unique everywhere, and in our setting we have X = g(Z) = F_X⁻¹(Φ(Z)), where Φ is the cumulative distribution function for a standard normal variable Z. The local mean and variance at the point z can then be obtained by Taylor expansion of g(Z), such that with z = g⁻¹(x), µ(x) = g(z) − g′(z)z and σ²(x) = (g′(z))².

The case of the uniform distribution is somewhat pathological in this context. If X is one-dimensional uniform, the support of X is bounded and the bound depends on the parameter defining the distribution. On the support of f(x), ψ(x, θ(x)) ≡ constant, but then the whole representation in terms of ψ and θ(x) degenerates, with a solution µ(x) = x, σ(x) ≡ constant that is in fact picked up by the likelihood estimates. In some sense this solution is picked up by (4), since the parametrization with µ(x) = x, σ(x) ≡ constant has uj(x) = 0, whereas the Rosenblatt representation is a unique solution of (4) only if uj(x) ≠ 0.

3. Asymptotic theory for b fixed

After having defined population values, we are in a position to discuss local likelihood estimates of these and prove consistency and asymptotic normality. We define the local likelihood estimate θ̂b(x) as a solution of (8). With a sufficient number of observations we can estimate the population values for different bandwidths, thereby getting several measures of correlation, each measuring the dependence at the specified degree of locality.
For the extreme case of b = ∞, the local Gaussian likelihood reduces to the global Gaussian quasi-likelihood, and maximising it yields an estimate θ̂ of the vector θ composed of the global means, the global variances and the global correlation of X1, X2. A numerical algorithm has been developed and is available from the authors on request.

In the asymptotic theory we will first look at the case of a fixed b; i.e., estimating θb(x) of (4). In the next section we let b → 0, estimating θ(x) as defined in Section 2. Asymptotic theory for b fixed in the i.i.d. case is covered in Hjort and Jones (1996). Our contribution in this section is in allowing dependence in time, as formulated in Theorem 1. A basic assumption in Hjort and Jones (1996) is the existence of a unique solution θb of (4) for every b > 0. Under this assumption they state the following result, which is used as an instrument in density estimation:

Proposition 1 (Hjort and Jones, 1996). Under the existence and uniqueness assumption on θb(x), and letting θ̂n,b(x) be the local likelihood estimate of θb(x),

$$(nb_1b_2)^{1/2}\,\bigl[\hat\theta_{n,b}(x) - \theta_b(x)\bigr] \xrightarrow{d} N\bigl(0,\,J_b^{-1}M_b(J_b^{-1})^T\bigr),$$

38

D. Tjøstheim, K.O. Hufthammer / Journal of Econometrics 172 (2013) 33–48

1

where

.



Jb =

+ (θ − θb )T Tn (θ ∗ , θb )(θ − θb ), 2

Kb (v − x)u v, θb (x) uT v, θb (x) ψ v, θb (x) dv

 



 −

 



Kb (v − x)∇ u v, θb (x) f (v) − ψ v, θb (x) dv,









(14)

with u(x, θ ) = ∇ log ψ(x, θ ) being the column vector of score functions, and where

.

a.s.

∂ Qn (θb ) → 0. (i) n−1 ∂θ

Mb = lim Var (nb1 b2 )1/2 Vn (θb )





n→∞

 = b1 b2

a.s.

(ii) n−1 Wn (θb ) → Wb , where Wb is a 5 × 5 positive definite matrix that can be identified with Jb of (14). (iii) For j, k = 1, . . . , 5, limn→∞ supδ→0 (nδ)−1 |Tn (θ ∗ , θb )jk | < ∞ a.s., where Tn (θ ∗ , θb )jk is the (j, k)th element of Tn (θ ∗ , θb ).

Kb2 (v − x)u v, θb (x) uT v, θb (x) f (v)dv



 





  − b1 b2 Kb (v − x)u v, θb (x) f (v)dv    × Kb (v − x)uT v, θb (x) f (v)dv

(15)

The use of the scaling factor (nb1 b2 )1/2 is perhaps slightly deceptive, as the limit theorem is for b fixed. The question of existence and uniqueness of θb has been discussed in Section 2.1. The local likelihood algorithm works by fitting a Gaussian surface to the surface of f at any neighbourhood of finite size b, and if f is Gaussian or stepwise Gaussian, its parameter vector θ is uniquely determined by any finite portion f (x), x ∈ N (x) of its surface, where N (x) is a neighbourhood of x. As mentioned in Section 2.1 an important motivation for introducing a local Gaussian correlation is to be able to handle time series, such as financial returns. In the following theorem we extend the above result to the time series case using (11) as a local penalty function. Theorem 1. Let {Xt } be a stationary bivariate time series with Xt having bivariate density f . Assume that (i) {Xt } is geometrically ergodic, (ii) f has support on all of R2 , and is such that E|Kb (Xt − x)∂/∂θj log ψ(Xt , θ )|γ < ∞, j = 1, . . . , 5 for some γ > 2, and (iii) there exists a bandwidth b0 such that there is a unique minimizer θb (x) of (3) for every 0 < b < b0 . Then for every ε > 0 there is a possibly x-dependent event A with P (Ac ) < ε such that there exists a sequence  θn,b (x) that minimises Q in (11), and such that  θn,b (x) converges almost surely to θb (x) on A. Moreover, the central limit result in Proposition 1 continues to hold, with the integral expression in the first part of (15) as the leading term for Mb . Proof. We will use the approach formalised by Klimko and Nelson (1978) and Tjøstheim (1986) (see also Taniguchi and Kakizawa, 2000, pp. 96–100). This is an approach for obtaining asymptotic properties of estimates using Taylor expansion of a general penalty function Q . As before, we denote the minimiser of Qn by  θ = θn,b . We omit the x-dependence in the sequel, as x is fixed. 
Taylor expanding, we have in a neighbourhood N = {θ : |θ − θb| < δ} of θb (noting that Qn(θ) is twice continuously differentiable with respect to θ)

Qn(θ) = Qn(θb) + (θ − θb)^T (∂/∂θ)Qn(θb) + (1/2)(θ − θb)^T (∂²/∂θ∂θ^T)Qn(θb)(θ − θb) + (1/2)(θ − θb)^T [(∂²/∂θ∂θ^T)Qn(θ*) − (∂²/∂θ∂θ^T)Qn(θb)](θ − θb)
≐ Qn(θb) + (θ − θb)^T (∂/∂θ)Qn(θb) + (1/2)(θ − θb)^T Wn(θb)(θ − θb),

where θ* is an intermediate point between θ and θb. It should be carefully noted that we do our analysis for b fixed, so the true value of θ is identified with θb satisfying Eq. (4). To obtain the consistency statement (via Egorov's theorem), we have to show (see Klimko and Nelson, 1978, Theorem 2.1, or Taniguchi and Kakizawa, 2000, Theorem 3.2.23), with Vn(θ) = ∂L(θ)/∂θ,

(i) n^−1 (∂/∂θ)Qn(θb) → 0 almost surely,
(ii) n^−1 Wn(θb) → Jb almost surely, with Jb positive definite,
(iii) lim_{δ→0} lim sup_{n→∞} (nδ)^−1 |(∂²Qn/∂θ∂θ^T)(θ) − (∂²Qn/∂θ∂θ^T)(θb)| < ∞ element-wise for θ ∈ N.

In our case, because of the ergodic assumption on Xt,

n^−1 (∂/∂θ)Qn(θb) = −n^−1 (∂L/∂θ)(X1, ..., Xn, θb) →a.s. −[∫ Kb(v − x)(∂/∂θ log ψ(v, θb)) f(v) dv − ∫ Kb(v − x)(∂/∂θ log ψ(v, θb)) ψ(v, θb) dv] = 0,

due to the requirement on θb in (4). Similarly, using ergodicity, it can be shown that

n^−1 Wn(θb) = −n^−1 (∂²L/∂θ∂θ^T)(X1, ..., Xn, θb) →a.s. Jb.

Since θb is the global minimiser of the function ∫ Kb(v − x)[ψ(v, θ) − log ψ(v, θ) f(v)] dv, the second-order derivative matrix of this function must be positive definite at θb; but this derivative matrix is Jb = Wb, so that (ii) above is fulfilled. Finally, the third-order derivative of ψ(x, θ) is bounded on N, and (iii) is implied by the mean-value theorem. Theorem 2.2 of Klimko and Nelson (1978), or the last part of Theorem 3.2.23 of Taniguchi and Kakizawa (2000), can now be used to prove asymptotic normality of θ̂n,b. In addition to (i)–(iii) above, we need to verify the condition



(iv) n^−1/2 (∂/∂θ)Qn(θb) →d N(0, Sb),

where Sb is a positive definite matrix that can be identified with (b1b2)^−1 Mb in (15). The expression for Mb is positive definite since it can be expressed as the covariance matrix of the stochastic vector variable √(b1b2) u(Xt, θb(x)) (u cannot be degenerate due to the uniqueness condition). To prove (iv) in our case, we can use a mixing CLT; see, for instance, Theorems 2.20(i) and 2.21(i) of Fan and Yao (2003). Those theorems are formulated for a scalar process, whereas n^−1/2 (∂/∂θ)Qn(θb) is a five-dimensional quantity. However, Theorem 2.20(i) of Fan and Yao (2003) can be applied component-wise, and by using a Cramér–Wold argument, Theorem 2.21(i) can be used. These theorems assume an α-mixing process {Xt} with a certain mixing rate, but this is fulfilled by our assumption of geometric ergodicity (in fact a slower rate would suffice). In addition, it is required that E|Kb(Xi − x)∂/∂θj log ψ(Xi, θ)|^γ < ∞ for some γ > 2, which is our condition (ii) in the statement of Theorem 1. □
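To make the estimator θ̂n,b(x) concrete, the sketch below (ours; function names hypothetical, scipy assumed) maximises a Hjort and Jones (1996)-type local likelihood numerically at a single point x, using a Gaussian product kernel so that the integral term ∫ Kb(v − x)ψ(v, θ)dv becomes a Gaussian convolution available in closed form:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def neg_local_lik(par, X, x, b):
    # par = (mu1, mu2, log s1, log s2, arctanh rho): unconstrained parametrization
    mu = par[:2]
    s1, s2 = np.exp(par[2]), np.exp(par[3])
    rho = np.tanh(par[4])
    Sigma = np.array([[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]])
    # Gaussian product kernel K_b(X_i - x)
    K = (np.exp(-0.5*((X[:, 0] - x[0])/b[0])**2) / (b[0]*np.sqrt(2*np.pi))
         * np.exp(-0.5*((X[:, 1] - x[1])/b[1])**2) / (b[1]*np.sqrt(2*np.pi)))
    term1 = np.mean(K * multivariate_normal.logpdf(X, mu, Sigma))
    # closed form: int K_b(v - x) psi(v, theta) dv = N(x; mu, Sigma + diag(b1^2, b2^2))
    term2 = multivariate_normal.pdf(x, mu, Sigma + np.diag([b[0]**2, b[1]**2]))
    return -(term1 - term2)

def local_gaussian_fit(X, x, b):
    start = np.array([X[:, 0].mean(), X[:, 1].mean(), 0.0, 0.0, 0.0])
    res = minimize(neg_local_lik, start, args=(X, x, b), method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-9, "maxiter": 5000})
    mu1, mu2, ls1, ls2, zr = res.x
    return mu1, mu2, np.exp(ls1), np.exp(ls2), np.tanh(zr)

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=2000)
*_, rho_hat = local_gaussian_fit(X, x=np.array([0.0, 0.0]), b=(1.5, 1.5))
print(round(rho_hat, 2))
```

Since the simulated f is globally Gaussian, θb(x) = θ for any b (cf. the discussion above), so ρ̂ should be close to the true value 0.5 at any x.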

Remark 1: The condition that f has support on all of R² can be weakened, but with this assumption we do not need to discuss what happens with θb at the boundary of the support. A pathological situation emerges for a uniform distribution, as was seen at the end of Section 2.2.

Remark 2: The condition of geometric ergodicity is quite standard in time series analysis. The condition (ii) is fulfilled under


weak conditions on f because of the presence of the kernel function. In fact, the finiteness condition in (ii) can be expressed as

∫ Kb^γ(v − x) |(∂/∂θj)(−(1/2) log |Σ(x)| − (1/2)(v − µ(x))^T Σ^−1(x)(v − µ(x)))|^γ f(v) dv < ∞,

where θj, j = 1, ..., 5, corresponds to the components of µ(x) and Σ(x). Clearly, for a fixed x, this integral exists by choosing the tail behaviour of the kernel K to decrease sufficiently fast compared to the tail behaviour of f. In particular, K can be chosen to have compact support. The condition (iii) (in a slightly stronger version) is left unverified in Hjort and Jones (1996), but has been discussed in Section 2.1. It seems quite reasonable from a practical point of view, and it holds for the step function model (5), which in turn can be taken as an approximation for more general models.

Remark 3: The question remains how Mb and Jb should be evaluated for a given data set. We will return to this question in Section 7.

4. Asymptotic theory as b → 0 in the global Gaussian case

Looking at the case where f is globally Gaussian may seem to be of little use, but it turns out that this case and the existence and uniqueness arguments of Section 2 are key elements in proving a limit theorem in the more general case. In fact, once existence and uniqueness are established as in Section 2, the proof of the asymptotic properties is virtually identical to the proof presented in the present section; in particular it holds trivially for the step function model (5). The global Gaussian case (and the stepwise Gaussian case) has the advantage that we need not worry about uniqueness and existence, and we have θb(x) = θ(x) = θ (or θi for x ∈ Si). It will be shown below that the local likelihood estimator θ̂b,n(x) converges towards θ(x) as n and b tend to infinity and 0, respectively, and that an asymptotic limit theory akin to that in Theorem 1 can be established under an appropriate scaling.

The main difficulty in extending Theorem 1 to the present situation is that the matrices Jb and Mb are no longer positive definite as b → 0. In fact, with Jb and Mb defined by the integral expressions in (14) and (15), but with the parameter θb(x) replaced by θ(x) = θ, we have Jb → J ≐ u(x, θ)u^T(x, θ)ψ(x, θ) and Mb → M ≐ u(x, θ)u^T(x, θ)f(x) ∫ K²(v)dv (the last term in (15) is of smaller order as b → 0). Both of these quantities are non-negative definite but not positive definite, the matrix u(x, θ)u^T(x, θ) being of rank 1. The matrices J and M are obtained by only retaining the first-order term in a Taylor expansion in terms of b. To obtain positive definiteness, higher-order terms must be included, and these are picked up using a scaling that is changed accordingly. We illustrate this by looking at the Taylor expansion of the term

IM = b1 b2 ∫ Kb²(v − x) u(v, θ(x)) u^T(v, θ(x)) f(v) dv

corresponding to Mb, and looking at the more general (possibly non-Gaussian) case of a parameter θ(x), whose existence was established in Section 2.1. The other terms are treated in an identical manner. Substituting wi = bi^−1(vi − xi) and letting w = (w1, w2), we have, by retaining two more terms of the Taylor expansion,

IM ∼ ∫ K²(w1, w2) A bw bw^T A^T f(x1 + b1w1, x2 + b2w2) dw1 dw2,

where typically K is a product kernel and where bw is the 6-dimensional vector defined by

bw^T = [1  b1w1  b2w2  b1²w1²  b1w1b2w2  b2²w2²]

and A is the 5 × 6 matrix

A = [u  ux1  ux2  (1/2)ux1²  (1/2)ux2²  ux1x2],

with u^T = [u1, ..., u5], ui = ∂ log ψ/∂θi, ux1 = ∂u/∂x1, ux2 = ∂u/∂x2, ux1² = ∂²u/∂x1², ux2² = ∂²u/∂x2², ux1x2 = ∂²u/∂x1∂x2.

Computing ∫ K²(w1, w2) bw bw^T dw1 dw2 we obtain a 6 × 6 matrix H, where (only retaining the powers of b, omitting all constant factors like ∫ w²K(w)dw, 1/2, ∫ K^i(w)dw, and, for simplicity, taking b1 = b2 = b),

H =
[ 1    0    0    b²   b²   0
  0    b²   0    0    0    0
  0    0    b²   0    0    0
  b²   0    0    b⁴   b⁴   0
  b²   0    0    b⁴   b⁴   0
  0    0    0    0    0    b⁴ ].
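The rank and order claims that follow can be checked numerically. In the sketch below (ours), H is computed exactly (constant factors retained) by grid quadrature for a Gaussian product kernel, and a generic rank-5 matrix stands in for A; the smallest eigenvalue of AHA^T is then positive and of order b⁴ = (b1b2)² when b1 = b2 = b:

```python
import numpy as np

# grid quadrature for the squared Gaussian product kernel K^2(w1, w2)
t = np.linspace(-6.0, 6.0, 401)
w1, w2 = np.meshgrid(t, t, indexing="ij")
phi1 = np.exp(-0.5*w1**2) / np.sqrt(2*np.pi)
phi2 = np.exp(-0.5*w2**2) / np.sqrt(2*np.pi)
K2 = (phi1*phi2)**2
dw = (t[1] - t[0])**2

def H(b):
    # H = int K^2(w) b_w b_w^T dw with b_w = (1, b w1, b w2, b^2 w1^2, b^2 w1 w2, b^2 w2^2)
    bw = np.stack([np.ones_like(w1), b*w1, b*w2, b**2*w1**2, b**2*w1*w2, b**2*w2**2])
    return np.einsum('iab,jab,ab->ij', bw, bw, K2) * dw

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 6))   # generic rank-5 stand-in for the 5 x 6 derivative matrix A

min_eigs = {}
for b in (0.1, 0.05):
    eig = np.linalg.eigvalsh(A @ H(b) @ A.T)
    min_eigs[b] = eig.min()
    print(b, eig.min() > 0.0)     # AHA^T is positive definite for b > 0

ratio = min_eigs[0.1] / min_eigs[0.05]
print(round(ratio, 1))            # smallest eigenvalue scales like b^4
```

Halving b scales the smallest eigenvalue by approximately 2⁴ = 16, which is the positive definiteness "of order (b1b2)²" used in the argument that follows.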



Assuming that the local Gaussian at the point x is non-degenerate, the matrix A is of rank 5, and consequently, as H is of full rank, AHA^T is of rank 5. It follows that the second-order Taylor expansion Mb,2 of Mb is positive definite of order (b1b2)² in the sense that (b1b2)^−2 Mb,2 exists and is positive definite in the limit. Similarly, whereas the matrix J is singular, (b1b2)^−2 Jb,2 is non-singular and positive definite as b1, b2 → 0. Hence, (b1b2)² Jb^−1 Mb Jb^−1 exists and is positive definite as b1, b2 → 0. One can get an explicit expression for the leading term of (b1b2)² Jb^−1 Mb Jb^−1 by symbol manipulation software, and we have used the software package Maple to do so. The coefficient associated with the leading term is very complicated and not useful, as it runs over several pages of computer output. It will be seen that the above matrix terms enter into the asymptotic expressions in the following theorem. To evaluate the covariances in practice, one must use other means, as shown in Section 7.1. Note that because of the assumed Gaussianity in Theorem 2, the condition (ii) of Theorem 1 is automatically fulfilled.

Theorem 2. Let {Xt} be a non-degenerate stationary bivariate Gaussian time series. Assume that (i) {Xt} is geometrically ergodic, (ii) n → ∞ and b1, b2 → 0 such that log n/(nb1b2)^5 → 0, and (iii) the kernel function K is Lipschitz.

Then for every ε > 0, there is a possibly x-dependent event A with P(A^c) < ε such that there is a sequence θ̂n,b(x) that minimises Q in (11) and such that θ̂n,b(x), for a fixed x, converges almost surely to the true value θ = θ0 on A. Moreover,

(n(b1b2)³)^1/2 (θ̂n,b(x) − θ0) →d N(0, (b1b2)² Jb^−1 Mb Jb^−1),

this meaning that the left-hand side is asymptotically normal with limiting covariance (b1b2)² Jb^−1 Mb Jb^−1 as b1, b2 go to zero at the given rate, or alternatively

(nb1b2)^1/2 Mb^−1/2 Jb (θ̂n,b(x) − θ0) →d N(0, I),

where I is the identity matrix of dimension 5.

Proof. The proof follows the same pattern as the proof of Theorem 1, but we introduce a new penalty function Qn′ = (b1b2)^−2 Qn. Instead of relying on the ergodic theorem and CLT for a fixed b in that proof, we now have to depend on an ergodic theorem and a CLT where there is smoothing involved. It turns out that we can use standard theorems of this type. For example, we can use the non-parametric ergodic Theorem 4.1 of Masry and Tjøstheim (1995). In the proof of that theorem, ĥn(y) is replaced by n^−1 Σi Kb(Xi − x)u(Xi, θ), where u(Xi, θ) = ∂ log ψ(Xi, θ)/∂θ, and E(ĥn(y)) by the corresponding expectation ∫ Kb(v − x)u(v, θ)ψ(v, θ)dv. We use the truncation argument of that proof, with s̄ > 2 replaced by γ > 2 of condition (ii) of Theorem 1 (the latter condition being trivially fulfilled by the Gaussianity of {Xt}, and by condition (iv) of Theorem 3 in Section 4.1 in the non-Gaussian case). Moreover, we use geometric ergodicity instead of the weaker mixing condition in the proof of Masry and Tjøstheim. The conclusion of that proof is that for a compact subset D of R²,

sup_{x∈D} | n^−1 Σi Kb(Xi − x)u(Xi, θ) − ∫ Kb(v − x)u(v, θ)ψ(v, θ)dv | = O((log n/(nb1b2))^1/2)

almost surely. It follows that for a fixed x and with the bandwidth condition (ii),

n^−1 (∂Qn′/∂θ)|θ=θ0 = (b1b2)^−2 [∫ Kb(v − x)u(v, θ0)ψ(v, θ0)dv − n^−1 Σi Kb(Xi − x)(∂/∂θ) log ψ(Xi, θ0)] →a.s. 0.

Moreover,

n^−1 (∂²Qn′/∂θ∂θ^T)|θ=θ0 = (b1b2)^−2 [∫ Kb(v − x)u(v, θ0)u^T(v, θ0)ψ(v, θ0)dv + ∫ Kb(v − x)∇u(v, θ0)ψ(v, θ0)dv − n^−1 Σi Kb(Xi − x)∇u(Xi, θ0)].

The last two terms cancel almost surely, again using the non-parametric ergodic theorem and the bandwidth condition. The first term has the positive definite matrix J2 = lim_{b→0}(b1b2)^−2 Jb,2 as a limit, which means that J2 has positive eigenvalues, and that an inverse exists. Finally, we look at

(δn)^−1 [(∂²Qn′/∂θ∂θ^T)(θ0) − (∂²Qn′/∂θ∂θ^T)(θ)] = (b1b2)^−2 (δn)^−1 [(∂²Qn/∂θ∂θ^T)(θ0) − (∂²Qn/∂θ∂θ^T)(θ)],

where |θ − θ0| < δ. From the existence and continuity of the third-order derivative, element-wise |(∂²Qn/∂θ∂θ^T)(θ) − (∂²Qn/∂θ∂θ^T)(θ0)| ≤ cδ for some constant c, and using the bandwidth condition, it follows that element-wise

lim_{δ→0} lim sup_{n→∞} (δn)^−1 |(∂²Qn′/∂θ∂θ^T)(θ) − (∂²Qn′/∂θ∂θ^T)(θ0)| < ∞,

and as in the proof of Theorem 1, the almost sure part of Theorem 2 follows. We then look at

((b1b2)³n)^1/2 n^−1 (∂Qn′/∂θ)(θ0) = (nb1b2)^1/2 [n^−1 (b1b2)^−1 (∂Qn/∂θ)(θ0)].

The quantity in brackets has covariance matrix Mb/(b1b2)², which tends to a positive definite matrix M2. Asymptotic normality can then be shown using a standard non-parametric CLT; see, e.g., Masry and Tjøstheim (1995), Theorem 4.4, with Zn,t* in (4.47) and (4.48) of the proof of that theorem replaced by

Zn,t* = (b1b2)^1/2 [Kb(Xi − x)(∂/∂θ) log ψ(Xi, θ) − E(Kb(Xi − x)(∂/∂θ) log ψ(Xi, θ))],

and then using the 'big block–small block' technique of that proof. The assumptions (4.6a) and (4.7) of Masry and Tjøstheim (1995) follow from the Gaussianity and condition (i), whereas (4.6b) and (4.6c) are trivially fulfilled from the definition of Zn,t*. The claimed result then follows from that theorem and Theorem 2.2 of Klimko and Nelson (1978). □
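The equivalence of the two limit statements in Theorem 2 rests on the identity (Mb^−1/2 Jb)(Jb^−1 Mb Jb^−1)(Mb^−1/2 Jb)^T = I, which is easy to check with synthetic matrices (a sketch; the matrices below are arbitrary positive definite stand-ins for Jb and Mb):

```python
import numpy as np

rng = np.random.default_rng(1)
# arbitrary symmetric positive definite stand-ins for J_b and M_b (dimension 5)
B = rng.standard_normal((5, 5)); J = B @ B.T + 5*np.eye(5)
C = rng.standard_normal((5, 5)); M = C @ C.T + 5*np.eye(5)

sandwich = np.linalg.inv(J) @ M @ np.linalg.inv(J)   # limiting covariance J^-1 M J^-1

# symmetric inverse square root of M via its eigendecomposition
lam, V = np.linalg.eigh(M)
M_inv_sqrt = V @ np.diag(lam**-0.5) @ V.T

W = M_inv_sqrt @ J                                   # the normalising matrix M^-1/2 J
Ident = W @ sandwich @ W.T
print(np.allclose(Ident, np.eye(5)))                 # the sandwich covariance is whitened to I
```

The same whitening is what turns the N(0, (b1b2)² Jb^−1 Mb Jb^−1) limit into the N(0, I) limit.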

Remark 1: The convergence rate can be compared to the results of Jones (1996), where there is convergence of the same order. The convergence is much slower than in the standard bivariate non-parametric case, which means that for a given number of observations, one must use a much larger bandwidth, which again means that if one is interested in details of a non-smooth θ(x) in the non-Gaussian case (Theorem 3), one must have rather large samples. For a reasonably smooth density, θ(x) will typically be smooth.

Remark 2: The almost sure non-parametric convergence actually yields uniform convergence on a compact set. We refer to Hansen (2008) for a possibly expanding set. Such results are of interest in the construction of test functionals for independence and goodness-of-fit tests. As seen in Berentsen et al. (2012), Berentsen and Tjøstheim (2012) and Støve et al. (2012), we get good results for n = 500, and in some cases for n considerably smaller.

4.1. A more general limit theorem

In Section 2.1 existence of a unique θ(x) consistent with the local penalty function was established, in particular for the step function model of (5), but also more generally under the regularity conditions stated towards the end of that sub-section.

Theorem 3. Let {Xt} be a bivariate time series with bivariate density f having support on all of R². Assume that (i) {Xt} is geometrically ergodic, (ii) n → ∞ and b1, b2 → 0 such that log n/(nb1b2)^5 → 0, (iii) the kernel function K is Lipschitz, and (iv) f is such that at each fixed point x there exists a non-degenerate local Gaussian approximation ψ(v, θ(x)) as given in Section 2 such that E|Kb(Xt − x)∂/∂θj log ψ(Xt, θ(x))|^γ < ∞, j = 1, ..., 5, for some γ > 2.

Then for every ε > 0, there is a possibly x-dependent event A with P(A^c) < ε such that there exists a sequence θ̂n,b(x) that minimises Q in (11) and such that θ̂n,b(x), for this fixed x, converges almost surely to the true value θ(x) = θ0(x) on A.
Moreover,

(n(b1b2)³)^1/2 (θ̂n,b(x) − θ0(x)) →d N(0, (b1b2)² Jb^−1 Mb Jb^−1),

this meaning that the left-hand side is asymptotically normal with limiting covariance obtained from (b1b2)² Jb^−1 Mb Jb^−1 as b1, b2 → 0 at the appropriate rate, or alternatively

(nb1b2)^1/2 Mb^−1/2 Jb (θ̂n,b(x) − θ0(x)) →d N(0, I),

where I is the identity matrix of dimension 5.

Proof. By inspecting the proof of Theorem 2, it is seen that this can be repeated step by step for a fixed x in the present situation. Only obvious notational changes and the additional use of condition (iv) are required. □

5. Linear transformations and the local Gaussian distribution

To analyse symmetry properties we need to study the effect of linear transformations. We assume that X has a local Gaussian


representation ψ(x, θ(x)) with θ(x) = (µ(x), Σ(x)), and we look at the transformation Y = a + AX, where a is a 2-dimensional vector and A is a non-singular matrix. We know that under mild regularity conditions θ̂(x) converges towards θ(x), and we will use local likelihood to establish that for θ(y) = (µ(y), Σ(y)), µ(y) = a + Aµ(x) and Σ(y) = AΣ(x)A^T. It will be seen from the arguments at the end of this section that these transformation results are also true for a population parameter θb(x) for a fixed bandwidth.

Let (X1, ..., Xn) with Xi = (X1i, X2i) be observations from the density f(x) = ψ(x, θ(x)), and let Yi = a + AXi, i = 1, ..., n. We introduce bandwidth matrices Bx and By, see Fan and Gijbels (1996), and we have

LX(X1, ..., Xn, µ(x), Σ(x)) = n^−1 Σ_{i=1}^n KBx(Xi − x) log ψ(Xi, θ) − ∫ KBx(v − x)ψ(v, θ(x))dv,

where KBx(Xi − x) = |Bx^−1| K(Bx^−1(Xi − x)). Letting Bx be diagonal with bx = (bx1, bx2) along the diagonal and K(x) = K1(x1)K2(x2) yields the familiar product kernel representation. Analogously,

LY(Y1, ..., Yn, µ(y), Σ(y)) = n^−1 Σ_{i=1}^n KBy(Yi − y) log ψ(Yi, θ(y)) − ∫ KBy(w − y)ψ(w, θ(y))dw

with θ(y) = (µ(y), Σ(y)). We will now assume µ(y) = a + Aµ(x) and Σ(y) = AΣ(x)A^T and show that this parametrization is consistent with the local likelihood procedure. With this parametrization it follows at once that

(Yi − µ(y))^T Σ^−1(y)(Yi − µ(y)) = (Xi − µ(x))^T Σ^−1(x)(Xi − µ(x)),    (16)

and hence,

log ψ(Yi, θ(y)) = −log 2π − log |A| − (1/2) log |Σ(x)| − (1/2)(Yi − µ(y))^T Σ^−1(y)(Yi − µ(y)) = −log |A| + log ψ(Xi, θ(x)).    (17)

We need to let Bx and By tend to zero at different rates, which is not unnatural given the possible different metric structure of two neighbourhoods N(x) and N(y) around x and y, respectively. We let By = ABx, such that By^−1 = Bx^−1 A^−1 and |By^−1| = |Bx^−1||A^−1|. Hence

KBy(Yi − y) = |By^−1| K(By^−1(Yi − y)) = |A^−1||Bx^−1| K(Bx^−1 A^−1(Yi − y)),

and consequently

KBy(Yi − y) log ψ(Yi, θ(y)) = |A^−1| KBx(Xi − x)[−log |A| + log ψ(Xi, θ(x))].

In the same manner, with w = a + Av, y = a + Ax,

∫ KBy(w − y)ψ(w, θ(y))dw = |A^−1| ∫ KBx(v − x)ψ(v, θ(x))dv,

and hence

LY(Y1, ..., Yn, µ(y), Σ(y)) = |A^−1| LX(X1, ..., Xn, µ(x), Σ(x)) − |A^−1| log |A| n^−1 Σ_{i=1}^n KBx(Xi − x).

The last term and the multiplicative factor |A^−1| do not depend on local parameters. Hence, as Bx and By tend to zero at their different rates, there will be a one-to-one correspondence between local likelihood estimates. The estimates θ̂Bx,n(x) are associated with θ(x), and the estimates θ̂By,n(y) are associated with θ(y) = (µ(y), Σ(y)) = (a + Aµ(x), AΣ(x)A^T). Here, θ̂b,n(x) converges to θ(x) and θ̂b,n(y) to θ(y) as shown in Theorem 3. Similarly, for the population values,

∫ KBy(w − y)(log ψ(w, θ(y))fY(w) − ψ(w, θ(y)))dw = |A^−1| ∫ KBx(v − x)(log ψ(v, θ(x))fX(v) − ψ(v, θ(x)))dv − |A^−1| log |A| ∫ KBx(v − x)fX(v)dv,

and hence it is not difficult to see that there is a one-to-one correspondence between the maxima of

∫ KBx(v − x)(log ψ(v, θ(x))fX(v) − ψ(v, θ(x)))dv and ∫ KBy(w − y)(log ψ(w, θ(y))fY(w) − ψ(w, θ(y)))dw.

An alternative route to this result goes via (4) for the parameter θb(x) = θBx(x). It is shown below that if θBx(x) satisfies this equation, then θBy(y) satisfies this equation with µBy(y) = a + AµBx(x) and ΣBy(y) = AΣBx(x)A^T. Then if θBx(x) → θ(x), we have θBy(y) → θ(y) with µ(y) = a + Aµ(x) and Σ(y) = AΣ(x)A^T. Indeed, for y = a + Ax and w = a + Av, corresponding to (17),

log ψ(w, θBy(y)) = −log 2π − log |A| − (1/2) log |ΣBx(x)| − (1/2)(w − µBy(y))^T ΣBy^−1(y)(w − µBy(y)) = −log |A| + log ψ(v, θBx(x)).

Moreover, KBy(w − y) = |A|^−1 KBx(v − x), and reasoning as above for the likelihood, it is seen that corresponding to (4)

∫ KBy(w − y)uj(w, θBy)[fY(w) − ψ(w, θBy)]dw = |A|^−1 ∫ KBx(v − x)uj(v, θBx)[fX(v) − ψ(v, θBx)]dv,

from which the desired conclusion follows.

6. Properties of local parameters

We will focus on ρ(x) = σ12(x)/√(σ11(x)σ22(x)), as we are particularly interested in ρ(x) as a measure of local dependence. It follows from the expression in (1) that −1 ≤ ρ(x) ≤ 1, and from the corresponding local likelihood algorithm that −1 ≤ ρ̂(x) ≤ 1. Further, if X is Gaussian, then ρ(x) ≡ constant, but note that Berentsen et al. (2012) show that ρ(x) ≡ constant also for a Gaussian copula with arbitrary marginals. In this section we will look more closely at independence and symmetry properties of ρ(x), and at limit behaviour as |x| gets large. For financial data the motivation is the often observed asymmetric patterns in financial returns, and the fact that dependence becomes stronger as xi → −∞, i = 1, 2. We wish to make this precise using ρ(x). We will carry through the analysis for θ(x), but with appropriate approximations it can be carried out for θb(x).

6.1. Independence and functional dependence

For the global correlation ρ , independence implies ρ = 0, but not vice versa. For Gaussian variables, ρ = 0 implies independence. Moreover, if X1 and X2 are linearly related (not necessarily Gaussian) with X2 = a + bX1 , then ρ = 1 if b > 0 and ρ = −1 if b < 0, but for X2 = α(X1 ) for some function α , the value of ρ will depend on α . For X2 = X12 , ρ = 0 under weak assumptions. If X1 and X2 are independent, then f (x1 , x2 ) = f1 (x1 )f2 (x2 ). As b → 0 in (7)–(9) this forces ψ(x, θ (x)) to factor and ρ(x) ≡ 0. Alternatively, this can be proved by using arguments related to


the Rosenblatt (1952) transformation, which in this case is unique and leads to the representation X1 = F_X1^−1(Φ(Z1)) and X2 = F_X2^−1(Φ(Z2)), where Z1 and Z2 are independent standard normal variables, again resulting in f(x1, x2) = ψ(x1, µ1(x1), σ1²(x1)) ψ(x2, µ2(x2), σ2²(x2)), so that ρ(x) ≡ 0.

Fig. 2. Local Gaussian correlation map based on 5000 observations from a bivariate t-distribution with ρ = 0 and ν = 4 degrees of freedom.

A necessary and sufficient condition for independence is that ρ(x) ≡ 0, µi(x) ≡ µi(xi), σi²(x) ≡ σi²(xi), i = 1, 2. We have not found an example with ρ(x) ≡ 0 where there is dependence. Functionals based on ρ(x) for measuring dependence and for testing of independence are given in Berentsen and Tjøstheim (2012). An advantage of this approach is that one can distinguish between positive and negative dependence in a non-linear situation, which does not seem to be the case for other commonly used non-linear dependence measures; see e.g. Skaug and Tjøstheim (1993) and Székely and Rizzo (2009).

The other extreme is where |ρ(x)| = 1. This is connected to functional dependence in much the same way as linear dependence is connected to |ρ| = 1. Assume that X2 = α(X1), with α differentiable, and that X1 can be represented by a local Gaussian representation, so that X1 = g(Z1). Then X2 = α(g(Z1)). There is only one variable involved, and any Gaussian approximation must also collapse to one variable. This is only possible if, at the point x, the Gaussian variables (U1(x), U2(x)) defining the bivariate density ψ(v, µ(x), Σ(x)) have U1(x) = a + bU2(x) for some scalars a and b, which implies ρ(x) = +1 or ρ(x) = −1, depending on whether the sign of α′(x1) is positive or negative. The same reasoning can be carried out if (x1, x2) defines a closed curve, such as α(x1) + β(x2) = c. Then ρ(x) is equal to 1 or −1 along the curve, depending on whether the tangent at the point x has positive or negative slope.

6.2. Symmetry and asymmetry

We will look at bivariate densities f having the following types of symmetries (we may assume that µ = E(X) = 0, because otherwise we may just center the density at µ and discuss symmetry around µ): (i) radial symmetry, where f(−x) = f(x); (ii) reflection symmetry, where f(−x1, x2) = f(x1, x2) and/or f(x1, −x2) = f(x1, x2); (iii) exchange symmetry, where f(x1, x2) = f(x2, x1); and (iv) rotation symmetry, where f(x) = γ(|x|) for some function γ. We will examine how the corresponding local Gaussian correlation ρ(x) behaves. All of the symmetries can be described by linear transformations, and we use the facts already established in Section 5 that if y = g(x) = Ax, then ΣY(y) = AΣX(x)A^T and µY(y) = AµX(x). Symmetry properties of Σ(x) and µ(x) can conceivably be used to obtain more precise estimates, and in Berentsen and Tjøstheim (2012) they are used to increase the power of independence tests.

(i) Radial symmetry: If X has radial symmetry, then X and −X = [−1 0; 0 −1]X have the same density. It follows that

Σ(−x) = [−1 0; 0 −1] Σ(x) [−1 0; 0 −1] = Σ(x),

so that Σ(x), and hence ρ(x), has radial symmetry. For the local mean, however, µ(−x) = −µ(x). Examples of densities having radial symmetry are all elliptic distributions. If we let X1 and X2 represent Yt and Yt−1 in a GARCH model, then (−Yt, −Yt−1) has the same GARCH structure, and GARCH densities have radial symmetry; consequently the local covariance and local correlation have radial symmetry. Figs. 2 and 3 show estimates of ρ(x) for a t-distribution with 4 degrees of freedom and global ρ = 0, and for a GARCH model, respectively. The radial symmetry of ρ(x) emerges clearly from the estimated values in the figures. In Berentsen et al. (2012) the radial symmetry along the diagonal x = (s, s) of the population quantity ρ(x) is verified using (13).

Fig. 3. Local Gaussian correlation map based on 2000 observations from the GARCH(1,1) model defined in Section 7.2. The bandwidths are b1 = b2 = 1. The bulk of the observations are within (−2, +2), and we have restricted the scales to (−3, +3) for display purposes.

(ii) Reflection symmetry: Reflection symmetry around the x1 and x2 axes is described by the matrices

A1 = [1 0; 0 −1] and A2 = [−1 0; 0 1],

so that

(x1, −x2)^T = A1(x1, x2)^T and (−x1, x2)^T = A2(x1, x2)^T.

We have

Σ(x1, −x2) = A1 Σ(x) A1^T = [σ11(x) −σ12(x); −σ12(x) σ22(x)].

We see that σii(x1, −x2) = σii(x1, x2), i = 1, 2, and ρ(x1, −x2) = −ρ(x1, x2). Similarly, if there is reflection symmetry around the x2 axis, σii(−x1, x2) = σii(x1, x2), i = 1, 2, and ρ(−x1, x2) = −ρ(x1, x2). Moreover, µ1(x1, −x2) = µ1(x1, x2), µ2(x1, −x2) = −µ2(x1, x2), and µ1(−x1, x2) = −µ1(x1, x2), µ2(−x1, x2) = µ2(x1, x2). Examples of densities having reflection symmetry are the spherical densities, for example, the t-distribution with 4 degrees of freedom with global ρ = 0. The odd reflection symmetry of the local correlation in this case is brought out by the estimated ρ̂(x) in Fig. 2. A general consequence of the odd reflection symmetry derived above is that ρ(x) is zero along the coordinate axes; that is, ρ(x1, 0) ≡ ρ(0, x2) ≡ 0. Again this is roughly confirmed by the estimates in Fig. 2. In the GARCH model, (−Yt, Yt−1) has the same distribution as (Yt, Yt−1), so that again the local correlation has odd reflection symmetry (cf. Fig. 3 for estimated values).

(iii) Exchange symmetry: Let A = [0 1; 1 0]. If we have exchange symmetry, Σ(x2, x1) = AΣ(x1, x2)A^T, and exchange symmetry of f(x) implies exchange symmetry of Σ(x), and hence of ρ(x). Many copula models have exchange symmetry. An example is given in Fig. 1. Elliptic distributions with a principal axis along x1 = x2 have exchange symmetry. On the other hand, for a GARCH model, we cannot in general exchange Yt and Yt−1 and still obtain the same bivariate distribution. This implies that the local correlation for GARCH models does not in general have exchange symmetry (cf. again Fig. 3). For an exchange-symmetric density the local mean is exchange symmetric. We refer to Berentsen et al. (2012) for plots of theoretical values of ρ(x) = ρ(s, s) along the diagonal.

(iv) Rotations: The rotation matrix is given by A = [cos α −sin α; sin α cos α]. For a given vector x, Ax is rotated counter-clockwise through an angle α. We consider an arbitrary spherical density f. Then f is rotation symmetric in addition to being radial, reflection and exchange symmetric. The corresponding local correlation is radial and exchange symmetric, but it is odd reflection symmetric. We start rotation from a point x = (x1, 0) on the positive x1 axis. Then ρ(x1, 0) = 0, so that

y = (y1, y2)^T = A(x1, 0)^T = (x1 cos α, x1 sin α)^T.

Further, noting that Σ(x) is diagonal, since ρ(x) = 0,

Σ(y) = AΣ(x)A^T = [σ11(x) cos²α + σ22(x) sin²α   (σ11(x) − σ22(x)) sin α cos α; (σ11(x) − σ22(x)) sin α cos α   σ11(x) sin²α + σ22(x) cos²α].

It follows that

ρ² = ρ²(α) = (σ11(x) − σ22(x))² / [σ11²(x) + σ22²(x) + σ11(x)σ22(x)(tan²α + 1/tan²α)],

which has its maximum for tan²α = 1, i.e., α = ±π/4. This is consistent with estimated values in Fig. 2, where it is seen that ρ(α) has maxima/minima along α = π/4 and α = −π/4, and similarly for the other quadrants. Moreover, it can be shown that ρ(α) is positive in the first and third quadrant and negative in the second and fourth quadrant. For an elliptic distribution (again assuming µ = 0), let Σ be the global covariance of X. The elliptic density can then be written f(x) = |Σ^−1/2| γ(x^T Σ^−1 x) for a function γ. If the global standard deviations σ1 and σ2 are equal, then [x1, x2]Σ^−1[x1, x2]^T = [x2, x1]Σ^−1[x2, x1]^T, and f(x) and ρ(x) are exchange symmetric. This in turn implies that f(x) and ρ(x) are reflection symmetric along the line x1 = x2. A similar argument leads to reflection symmetry around the line x1 = −x2. This means that as we approach the diagonals along the density contours, which are ellipses, these are axes of local maxima and minima (for ρ > 0 we have local maxima along the diagonals in the first and third quadrant, and minima along the axes in the second and fourth quadrant; for ρ < 0 the roles of the axes are reversed). If σ1 ≠ σ2, the same reasoning can be carried through for the scaled variables Xi/σi, the axes (x1, x1) and (x1, −x1) now corresponding to (x1, (σ1/σ2)x1) and (x1, −(σ1/σ2)x1).

6.3. Extremes for local variance

We consider the local variance and local mean in the univariate case. Let f and F be the density and cumulative distribution function of X. Then X = g(Z) = F^−1(Φ(Z)), with Z ∼ N(0, 1), and with Φ being the cumulative distribution function of the standard normal. Let φ be the standard normal density, and let σ(x) be the local standard deviation of f. Then, from Section 2.2 and with z = Φ^−1(F(x)),

σ(x) = g′(z) = φ(z)/f(F^−1(Φ(z))) = φ(Φ^−1(F(x)))/f(x).
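The formula is easy to evaluate numerically; a sketch (ours, scipy assumed) comparing the local standard deviation of a Student t density with 4 degrees of freedom to that of the standard normal:

```python
import numpy as np
from scipy.stats import norm, t

def local_sd(x, dist):
    # sigma(x) = phi(Phi^-1(F(x))) / f(x)
    z = norm.ppf(dist.cdf(x))
    return norm.pdf(z) / dist.pdf(x)

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sd_t = local_sd(xs, t(df=4))      # heavy tail: local standard deviation increases with x
sd_norm = local_sd(xs, norm())    # Gaussian: identically 1
print(np.round(sd_t, 2))
print(np.round(sd_norm, 2))
```

For the heavy-tailed t density σ(x) increases with x, while for the Gaussian it is identically 1, in line with the discussion of thick tails that follows.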

Let xα and zα be the α -quantiles defined by P (X ≥ xα ) = α and P (Z ≥ zα ) = α . Note that σ (xα ) = φ(zα )/f (xα ). We only consider the upper tail. As a definition of ‘thick’ tail we propose two requirements: As α becomes small, xα > zα and f (xα )/φ(zα ) → 0. A moment of reflection will convince us that this may be a reasonable definition, albeit a slightly unconventional one. It essentially means that any density having a heavier tail than the Gaussian will be characterized as thick tailed, e.g. the exponential density, which is not traditionally called thick tailed. If f is Gaussian, f (xα )/φ(zα ) ≡ 1. It follows immediately that for a


thick-tailed density, as xα → ∞ when α → 0, then σ(xα) → ∞. In the limit, the local Gaussian density must have infinite variance to compensate for the thickness of the tail of f. (Note that it is not sufficient to require φ(x)/f(x) → 0 to obtain thick-tailedness. Indeed, if f is Gaussian with σ > 1, then φ(x)/f(x) → 0, whereas f(xα)/φ(xα) ≡ 1.) In the same manner, f(xα)/φ(zα) → ∞ (thin tail) has the local standard deviation tending to zero.

6.4. Local correlation and the tail index

A much used measure of dependence in the tail is the so-called tail index, described in McNeil et al. (2005). The upper and lower tail index can be defined as

λU = lim_{q→1} P(F1(X1) > q | F2(X2) > q)  and  λL = lim_{q→0} P(F1(X1) ≤ q | F2(X2) ≤ q),

where F1 and F2 are the marginal cumulative distributions of X1 and X2. A problem is determining λU and λL in practice (see, for example, Frahm et al., 2005). For a Gaussian distribution (cf. McNeil et al., 2005, p. 211),

λ_{L,G} = 2 lim_{u→−∞} P((X1 − µ1)/σ1 ≤ u | (X2 − µ2)/σ2 = u) = 2 lim_{u→−∞} Φ(u √((1 − ρ)/(1 + ρ))),

so that for ρ < 1, λL = 0, whereas for ρ = 1, λL = 1. This absence of non-trivial asymptotic tail dependence is used as one argument against employing the Gaussian for financial data. Applying this to the Gaussian U(x) = (U1(x), U2(x)) ∼ N(µ(x), Σ(x)) with x = (s, s),

2 lim_{u→−∞} P((U1(x) − µ1(x))/σ1(x) ≤ u | (U2(x) − µ2(x))/σ2(x) = u) = 2 lim_{u→−∞} Φ(u √((1 − ρ(s, s))/(1 + ρ(s, s)))).

If U(x) gives a satisfactory approximation to F(x) in the tail as x → −∞, then the above formula suggests that if ρ(s, s) ≤ M < 1 as s → −∞, then F has no lower tail dependence (the Gaussian F coming out as a special case). In other words, if there is positive tail dependence, then necessarily ρ(s, s) → 1. We have not been able to find exact (monotonicity) conditions under which this holds in general, but (13) can be used to compute the limiting value of ρ(x) = ρ(s, s) as s → −∞ or s → ∞ for copulas with exchange symmetry. Applied to the Clayton copula, it is shown in Berentsen et al. (2012) that ρ(s, s) → 1 as s → −∞ (and ρ(s, s) → 0 as s → +∞), and plots of the population value of ρ(s, s) for the copulas examined in that publication indicate that this behaviour has far greater validity. We believe that this is a theoretical characterization of dependence relationships in a market crash situation. Another form of correlation based on extreme values is given by Davis and Mikosch (2009).

7. Some practical considerations

7.1. Estimating the standard error of estimates

As seen in Section 4, it is difficult to obtain explicit and useful expressions for the standard errors of θ̂_{b,n}(x) by Taylor expansion in the general formulas (14)–(15). We have used two alternative methods. The first one, only valid in case {Xi} consists of i.i.d. variables, is the bootstrap, which was used in Berentsen et al. (2012) and Berentsen and Tjøstheim (2012) to evaluate significance in goodness-of-fit tests for copulas and in tests of independence. However, it is quite time-consuming, since for each bootstrap realisation the local likelihood has to be optimised numerically on a grid of points. Moreover, it is not extendable to the dependent case, although the block bootstrap might be tried. The second alternative is to first note that Mb in (15) can be approximated by

M̂b = (1/n) Σ_{i=1}^n K_b²(Xi − x) u(Xi, θb(x)) u^T(Xi, θb(x)) − b1b2 [(1/n) Σ_{i=1}^n K_b(Xi − x) u(Xi, θb(x))] [(1/n) Σ_{i=1}^n K_b(Xi − x) u(Xi, θb(x))]^T,

where M̂b → Mb almost surely (for fixed b) by the strong law of large numbers. In practice, we must replace θb(x) by the estimate θ̂_{b,n}(x). Similarly, in (14) the term containing f(v) can be approximated using the law of large numbers, whereas the other integral terms can be computed analytically if K is a Gaussian kernel. If K is not Gaussian, these latter integrals can be evaluated by Monte Carlo sampling from ψ(v, θ̂_{b,n}(x)). In case a Gaussian kernel is used, this method is much faster than the bootstrap, and it gives very similar results. Another advantage of this alternative is that it can be extended to the dependent case, the ergodic theorem playing the role of the law of large numbers. Confidence intervals can then be constructed in a standard fashion.

7.2. Choosing the bandwidth

In his book on local likelihood, Loader (1999) discusses the issue of bandwidth choice. His results are not directly applicable in the present situation, however. In our version of a bandwidth algorithm, we have chosen b as a compromise between having variance of order (nb1³b2³)⁻¹ and ‘bias’ squared of order (b1² + b2²)², where the role of the bias is played by the distance between ψ(x, θ̂(x)) and f(x). In practice, the true f is unknown, and in the bandwidth algorithm we replace it by the kernel estimate f̂ of f, for which there are a number of algorithms for choosing ‘optimal’ bandwidths. If b = (bi) is chosen by balancing Var(θ̂i(x)) and |ψ(x, θ̂(x)) − f(x)|², then the latter quantity is again of order b⁴, since |f̂(x) − f(x)|² is of order b_K⁴ = o(b⁴), where b_K is the bandwidth used in the kernel estimation. This results from Var(f̂(x)) being of smaller order than Var(θ̂i(x)).
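The Gaussian tail-index formula of Section 6.4 is easy to explore numerically. The sketch below (our illustration, not from the paper) evaluates 2Φ(u√((1 − ρ)/(1 + ρ))) and shows it collapsing to 0 for every ρ < 1, ever more slowly as ρ → 1, while sticking at 1 when ρ = 1:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cdf Φ

def lower_tail_term(u, rho):
    # 2 Φ(u √((1 − ρ)/(1 + ρ))); its u → −∞ limit is the Gaussian lower tail index λ_L
    return 2.0 * Phi(u * ((1.0 - rho) / (1.0 + rho)) ** 0.5)

for rho in (0.0, 0.5, 0.9, 0.99, 1.0):
    row = [lower_tail_term(u, rho) for u in (-2.0, -10.0, -50.0)]
    print(f"rho = {rho:4.2f}:", " ".join(f"{v:.2e}" for v in row))
```

For ρ = 1 the argument of Φ is identically zero, so every row entry equals 1; for ρ < 1 the entries tend to 0 as u → −∞, consistent with λ_L = 0 and with the requirement ρ(s, s) → 1 for positive tail dependence.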





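The plug-in approximation M̂b of Section 7.1 can be sketched in a few lines. Everything below is illustrative scaffolding: the score vector u(·, θb(x)) of the local Gaussian family is supplied by the caller (in practice evaluated at the local likelihood estimate θ̂_{b,n}(x)), and the toy score used at the end exists only to exercise the code:

```python
import numpy as np

def gauss_kernel(v, b):
    # Product Gaussian kernel K_b(v) = prod_j φ(v_j / b_j) / b_j
    return np.prod(np.exp(-0.5 * (v / b) ** 2) / (np.sqrt(2 * np.pi) * b), axis=-1)

def Mb_hat(X, x, b, u):
    """Plug-in approximation of M_b at the point x:
    (1/n) Σ K_b²(X_i − x) u u^T  −  b1 b2 [(1/n) Σ K_b(X_i − x) u][(1/n) Σ K_b(X_i − x) u]^T.

    X : (n, 2) observations; b = (b1, b2); u(Xi) returns the score vector.
    """
    K = gauss_kernel(X - x, b)               # K_b(X_i − x), shape (n,)
    U = np.array([u(Xi) for Xi in X])        # shape (n, p)
    first = (K[:, None, None] ** 2 * U[:, :, None] * U[:, None, :]).mean(axis=0)
    s = (K[:, None] * U).mean(axis=0)        # (1/n) Σ K_b u, shape (p,)
    return first - b[0] * b[1] * np.outer(s, s)

# Toy check with a stand-in score function (for illustration only):
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
M = Mb_hat(X, x=np.array([0.0, 0.0]), b=np.array([0.8, 0.8]), u=lambda v: v)
print(M.shape, np.allclose(M, M.T))  # (2, 2), symmetric by construction
```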
The distance between f̂ and ψ(θ̂) can be measured in many ways, and can be evaluated locally for a fixed x or globally. In this section we are only concerned with a global bandwidth, and to simplify further, we use the bandwidth b = (b1, b2) obtained by first scaling the observations to have identical global standard deviations equal to one. We also use the following estimate of the mean integrated global variance:

V(θ̂i) = n⁻¹ Σ_{j=1}^n Var(θ̂i(X1j, X2j)).

Since reliable estimates in low-density regions require larger bandwidths, and the variances in these locations could dominate the other terms in the sum, in practice we restrict the sum to observations in locations of moderate to high density, using a cut-off point based on the estimated density. As before, we can estimate the variance by bootstrapping if {X1j, X2j} is a sequence of i.i.d. variables, or by the method of Section 7.1 in the dependent (and in the i.i.d.) case. Because the latter method is much faster, it is the one we have used. We use the squared distance

D(θ̂) = n⁻¹ Σ_{j=1}^n [ψ(X1j, X2j, θ̂(X1j, X2j)) − f̂(X1j, X2j)]²

for computing the distance between f̂ and ψ(θ̂). The bandwidth b = (bi) can now be chosen as the minimiser of the weighted sum pV(θ̂i, b) + (1 − p)D(θ̂, b), where we choose 0 < p < 1 depending on which aspect of our estimator (low variance or low bias) we value most. Note that the density term D is not invariant to linear scaling, which is why we have pre-scaled our observations to unit sample variance. The quantity V is invariant, in that linearly scaling both observations and bandwidths leaves the sum unchanged. As a last step, we must therefore scale the bandwidth up again with the standard deviation. Finally, note that in the special case of f itself being Gaussian, a minimising bandwidth of the sum cannot be expected, since ψ(x, θ̂(x)) → f(x) as b → ∞.

We illustrate our procedure with two examples, both involving the choice of bandwidth for ρ̂b(x) only. Because of the scaling, we take the two components of b to be the same. First we look at a bivariate Gaussian distribution with µ1 = µ2 = 0, σ1 = σ2 = 1 and ρ = 0.5, with n = 100, 200, 500 and 1500 observations, and for bandwidths from 0.5 to 2.5; see Fig. 4(a). As expected, both the variance and the bias sums decrease with n. The unweighted sum V + D does not have a minimum as a function of b, which is consistent with the reasoning above. In our second example, we look at a GARCH(1,1) model {Yt} (see Bollerslev, 1986) with parameters µ = 0, ω = 0.1, α = 0.7 and β = 0.2, and εt ∼ N(0, 1),

Yt = µ + σt εt,   σt² = ω + α Yt−1² + β σt−1²,   (18)

for t = 1, . . . , n. We now let {Xt} = {(Yt, Yt+1)}. We see from Fig. 4(b) that the sum of variance and squared bias yields a minimum for b which gets smaller as n increases. This approach seems useful, but clearly there is a need for evaluating alternative algorithms, including ones for local bandwidths, and for examining the theoretical properties. One alternative method in copula testing is given in Berentsen et al. (2012).

8. Simulated and real examples

Many more examples are displayed in Berentsen et al. (2012), Berentsen and Tjøstheim (2012) and Støve et al. (2012).

8.1. The Gaussian distribution

We have done a number of experiments for bivariate Gaussian variables. The local correlation ρ(x) is constant, and this is confirmed for ρ̂(x) in the simulations, with ρ̂(x) lying within a 90% confidence interval with the expected frequency. We have used a bandwidth comparable to those used in the simulations for non-Gaussian examples. Increasing the bandwidth leads to more precise results. Both the bootstrap and the procedure outlined in Section 7.1 were used to construct confidence intervals, and the results were very similar.

8.2. A bivariate t-distribution

The symmetry pattern of the local correlation for the t-distribution with 4 degrees of freedom has already been explained in Section 6. Qualitatively similar results were obtained by Holland and Wang (1987) (for ν = 1), using a different measure of local dependence.

8.3. A Gumbel copula

The exchange symmetry of the Gumbel copula in Fig. 1 has been explained in Section 6. Such a plot can be used as a tool for checking the validity of a copula model, by comparing the local correlation structure of a chosen copula model to the local correlation structure of the data themselves. We refer to Berentsen et al. (2012) for a formal test.

8.4. A GARCH model

Although Yt and Yt−1 are uncorrelated, Fig. 3 shows strong local dependence, with much the same pattern as for the t-distribution, i.e. positive dependence in the first and third quadrants and negative dependence in the second and fourth, with dependence increasing away from the centre, but more strongly expressed. The absence of strict exchange symmetry was mentioned in Section 6. We use the method of Section 7.1 to estimate standard errors. The standard errors vary from about 0.03 to 0.05 within the (−2, +2) region, up to 0.2 at ±3. The local correlation plot is consistent with volatility plots of a GARCH(1,1) model.

8.5. Stock returns data

We briefly explore the local Gaussian correlation structure of a real trivariate financial data set. The data consist of daily 100 × log returns for three stocks, X1: Boeing, X2: Alcoa and X3: Caterpillar, from 1973-01-01 to 2007-12-31, with zero return values removed and a GARCH(1,1) filter with a conditional t-distribution applied. The GARCH filtering has the effect of reducing the kurtosis and transforming the data to being closer to independent in time. A scatterplot matrix of the data is shown in Fig. 5(a), with superimposed linear regression lines and kernel density estimates on the diagonals. Judging by the plots, the returns appear to be linearly dependent with a positive global correlation for each pair.
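The GARCH(1,1) process (18), used above in Sections 7.2 and 8.4, is straightforward to simulate. The sketch below is our illustration (the initialisation at the unconditional variance ω/(1 − α − β) is our choice) and generates the lagged pairs {Xt} = {(Yt, Yt+1)}:

```python
import numpy as np

def simulate_garch11(n, mu=0.0, omega=0.1, alpha=0.7, beta=0.2, seed=0):
    """Simulate Y_t = mu + sigma_t eps_t, sigma_t^2 = omega + alpha Y_{t-1}^2 + beta sigma_{t-1}^2,
    with eps_t ~ N(0, 1), as in (18)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance (mu = 0)
    for t in range(n):
        if t > 0:
            sigma2 = omega + alpha * y[t - 1] ** 2 + beta * sigma2
        y[t] = mu + np.sqrt(sigma2) * eps[t]
    return y

y = simulate_garch11(1500)
pairs = np.column_stack([y[:-1], y[1:]])   # {X_t} = {(Y_t, Y_{t+1})}
c = np.corrcoef(pairs.T)[0, 1]
print(c)  # close to zero in population, although the series is strongly dependent
```

The near-zero global correlation of the lagged pairs, despite strong volatility dependence, is exactly the situation in which the local Gaussian correlation reveals structure that the ordinary correlation misses.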
But examining the local Gaussian correlation reveals much more structure, and quite dramatic differences between the pairs; see Fig. 5(b), where we have plotted the diagonal values of the estimated local Gaussian correlation, that is, ρ̂(x1, x2) with x1 = x2. At x1 = x2 = ±2, the standard errors are about 0.015. We note that for all three pairs we have a non-constant and asymmetric structure, implying that neither a Gaussian, nor a t-distribution, nor any other distribution displaying symmetry, will provide an adequate fit. (The non-constant and asymmetric structure is supported by hypothesis tests at the points x1 = x2 = ±2 at a level of 0.01 or less, using the asymptotic distribution of ρ̂. A more appropriate tool in this situation would be a non-parametric test of the differing structure of the curves. Such tests will be treated in a separate publication; a beginning, for a copula goodness-of-fit test, is made in Berentsen et al. (2012).) The shape of the local Gaussian correlation pattern for Boeing against the two other stocks also differs very much from the Caterpillar–Alcoa pattern in direction, showing differences in the underlying distributions. None of these results are readily apparent from the scatterplots, or from univariate plots.

Moreover, the detected differences have direct implications for portfolios involving these stocks. An increase of local Gaussian correlation when the market is going down, as shown for the pairs (X1, X2) and (X1, X3), would lead to a deterioration of the diversification protection. On the other hand, a linear combination of X2 and X3 would not experience such problems. These effects can be quantified numerically using the estimates of the local Gaussian correlation and the local Gaussian variances. It is possible to apply the local Gaussian correlation to measure and detect financial contagion (Støve et al., 2012).

Fig. 4. Variance, bias and log sum values for choosing bandwidths. (a) Normal distribution. (b) GARCH(1,1) model.

In principle, a joint local analysis of more than two variables can be done. One has the prospect of doing locally what can be done globally in multivariate analysis for normal variables, for example, portfolio theory, principal component analysis and discrimination theory. One then has to generalise the bivariate

Gaussian approximation to a general multivariate local Gaussian approximation. This necessitates a simplification of the model to avoid the curse of dimensionality. In fact, the simplified model has a faster convergence rate for pairwise local Gaussian correlations.

Acknowledgments

We are very grateful to two anonymous referees for a number of valuable comments and suggestions. Their constructive criticism has led to a much improved paper.


Fig. 5. 100 × log returns for the three stocks Boeing, Caterpillar and Alcoa, based on daily data from 1973-01-01 to 2007-12-31. (a) Scatterplot matrix. (b) Line plot showing diagonal values of the local Gaussian correlation (ordinate) for the three pairs of stocks, calculated at various values of x1 = x2 = x (abscissa).

References

Berentsen, G., Støve, B., Tjøstheim, D., Nordbø, T., 2012. Recognizing and visualizing copulas: an approach using local Gaussian approximation. Manuscript, Department of Mathematics, University of Bergen. See http://folk.uib.no/gbe062/local-gaussian-correlation.
Berentsen, G., Tjøstheim, D., 2012. Recognizing and visualizing departures from independence in bivariate data using local Gaussian correlation. Manuscript, Department of Mathematics, University of Bergen. See http://folk.uib.no/gbe062/local-gaussian-correlation.
Bjerve, S., Doksum, K., 1993. Correlation curves: measures of association as function of covariate values. Annals of Statistics 21, 890–902.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
Campbell, R., Forbes, C.S., Koedijk, K., Kofman, P., 2008. Increasing correlations or just fat tails? Journal of Empirical Finance 15, 287–309.
Davis, R., Mikosch, T., 2009. The extremogram: a correlogram for extreme events. Bernoulli 15, 977–1009.
Delicado, P., Smrekar, M., 2009. Measuring non-linear dependence for two random variables distributed along a curve. Statistics and Computing 19, 255–269.
Doksum, K., Blyth, S., Bradlow, E., Meng, X., Zhao, H., 1994. Correlation curves as local measures of variance explained by regression. Journal of the American Statistical Association 89, 571–582.
Eguchi, S., Copas, J., 1998. A class of local likelihood methods and near-parametric asymptotics. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 60, 709–724.
Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications: Theory and Methodologies. Chapman and Hall, London.
Fan, J., Yao, Q., 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.
Forbes, K.J., Rigobon, R., 2002. No contagion, only interdependence: measuring stock market comovements. The Journal of Finance 57, 2223–2261.
Frahm, G., Junker, M., Schmidt, R., 2005. Estimating the tail-dependence coefficient: properties and pitfalls. Insurance: Mathematics and Economics 37, 80–100.
Gourieroux, C., Jasiak, J., 2010. Local likelihood density estimation and value-at-risk. Journal of Probability and Statistics 2010, 26.
Hansen, B.E., 2008. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory 24, 726–748.
Hjort, N., Jones, M., 1996. Locally parametric nonparametric density estimation. Annals of Statistics 24, 1619–1647.
Holland, P., Wang, Y., 1987. Dependence functions for continuous bivariate densities. Communications in Statistics 16, 863–876.
Hong, Y., Tu, J., Zhou, G., 2007. Asymmetries in stock returns: statistical tests and economic evaluation. Review of Financial Studies 20, 1547–1581.
Inci, A.C., Li, H.C., McCarthy, J., 2011. Financial contagion: a local correlation analysis. Research in International Business and Finance 25, 11–25.
Jones, M., 1996. The local dependence function. Biometrika 83, 899–904.
Jones, M., 1998. Constant local dependence. Journal of Multivariate Analysis 64, 148–155.
Jones, M., Koch, I., 2003. Dependence maps: local dependence in practice. Statistics and Computing 13, 241–255.
Klimko, L.A., Nelson, P.I., 1978. On conditional least squares estimation for stochastic processes. Annals of Statistics 6, 629–642.
Loader, C.R., 1999. Local Regression and Likelihood. Statistics and Computing. Springer, New York.
Longin, F., Solnik, B., 2001. Extreme correlations in international equity markets. The Journal of Finance 56, 649–676.
Masry, E., Tjøstheim, D., 1995. Nonparametric estimation and identification of ARCH non-linear time series: strong convergence and asymptotic normality. Econometric Theory 11, 258–289.
McNeil, A.J., Frey, R., Embrechts, P., 2005. Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press, Princeton.
Rodrigues, J.C., 2007. Measuring financial contagion: a copula approach. Journal of Empirical Finance 14, 401–423.
Rosenblatt, M., 1952. Remarks on a multivariate transformation. Annals of Mathematical Statistics 23, 470–472.
Silvapulle, P., Granger, C.W.J., 2001. Large returns, conditional correlation and portfolio diversification: a value-at-risk approach. Quantitative Finance 1, 542–551.
Skaug, H.J., Tjøstheim, D., 1993. A nonparametric test of serial independence based on the empirical distribution function. Biometrika 80, 591–602.
Støve, B., Tjøstheim, D., Hufthammer, K., 2012. Using local Gaussian correlation in a nonlinear re-examination of financial contagion. Revised manuscript.
Székely, G.J., Rizzo, M.L., 2009. Brownian distance covariance. Annals of Applied Statistics 3, 1236–1265.
Taniguchi, M., Kakizawa, Y., 2000. Asymptotic Theory of Statistical Inference for Time Series. Springer-Verlag, New York.
Tjøstheim, D., 1986. Estimation in nonlinear time series models. Stochastic Processes and their Applications 21, 251–273.