A nonparametric test for slowly-varying nonstationarities

A nonparametric test for slowly-varying nonstationarities

Signal Processing 143 (2018) 241–252 Contents lists available at ScienceDirect Signal Processing journal homepage: www.elsevier.com/locate/sigpro A...

1MB Sizes 23 Downloads 160 Views

Signal Processing 143 (2018) 241–252

Contents lists available at ScienceDirect

Signal Processing journal homepage: www.elsevier.com/locate/sigpro

A nonparametric test for slowly-varying nonstationarities Douglas Baptista de Souza a,∗, Jocelyn Chanussot b, Anne-Catherine Favre c, Pierre Borgnat d a

GE Global Research Center, Rua Trinta e Seis, Ilha do Bom Jesus, 21941-593, Rio de Janeiro, RJ, Brazil GIPSA-Lab, 11 rue des Mathématiques, BP46 - 38402, Saint-Martin d’Hères, France c LTHE, 70 rue de la physique, BP53 - 38400, Saint-Martin d’Hères, France d Lab PHYS, École normale supérieure de Lyon, CNRS, 46, allée d’Italie 69364 Lyon cedex 07 France b

a r t i c l e

i n f o

Article history: Received 16 March 2017 Revised 19 July 2017 Accepted 31 August 2017 Available online 1 September 2017 Keywords: Stationarity test Slowly-varying nonstationarities Time marginals Trend detection Empirical mode decomposition Real-world signals Block bootstrapping

a b s t r a c t This paper develops a new nonparametric method that is suitable for detecting slowly-varying nonstationarities that can be seen as trends in the time marginal of the time-varying spectrum of the signal. The rationale behind the proposed method is to measure the importance of the trend in the time marginal by using a proper test statistic, and further compare this measurement with the ones that are likely to be found in stationary references. It is shown that the distribution of the test statistic under stationarity can be modeled fairly well by a Generalized Extreme Value (GEV) pdf, from which a threshold can be derived for testing stationarity by means of a hypothesis test. Finally, the new method is compared with other ones found in the literature.

1. Introduction Stationarity is a crucial assumption for many statistical models, but many real-world signals turn out to be nonstationary. For instance, rainfall [1] and sunspot [2] signals are real-world processes that commonly exhibit nonstationary behaviors. Assessment of stationarity is thus an important task in signal processing and time series analysis. The goal of this paper is to propose a method suitable for testing slow nonstationary evolutions. As a matter of fact, detecting such nonstationarities is especially challenging and most of traditional tests fail. Several stationarity tests have been proposed in the last decades, some being rooted in time series modeling [3–5], spectral analysis [6], or detection of abrupt changes [7]. The emerging alternatives in the literature can be categorized into parametric and nonparametric approaches. The definition of nonparametric technique adopted in this work is that of a method which does assume any a priori functional form or parametric model for the input signal [8]. Therefore, even if the technique makes use of a particular window function to analyze the signal (which could be considered a priori a kind of model), if the methods requires no



Corresponding author. E-mail addresses: [email protected] (D. Baptista de Souza), [email protected] (J. Chanussot), [email protected] (A.-C. Favre), [email protected] (P. Borgnat). http://dx.doi.org/10.1016/j.sigpro.2017.08.026 0165-1684/© 2017 Published by Elsevier B.V.

© 2017 Published by Elsevier B.V.

assumption regarding the distribution (Gaussian, gamma, etc.) or process model (TVAR, ARMA, etc.) of the input signal, then we consider this method as nonparametric. Nonparametric methods may be preferable in real-world applications, as the performance of a parametric method depends on the accuracy of the chosen model, which is hardly assessed for real-world data. Methods for detecting slow nonstationary evolutions, however, are more common in the class of parametric techniques. Some parametric methods that have been proposed in the past years assume the underlying signal can be modeled by a time-varying autoregressive process (TVAR) [3,4,9,10]. Among the nonparametric methods, the work in [11] has presented a technique for detecting changes in high-order statistics, whereas [12] has proposed a test for second-order stationarity in the TF domain. A common drawback of TF-based methods, however, is the computational load required to estimate full TF representations [13]. Moreover, traditional TF representations often estimate poorly the spectral content at very low frequencies [14]. The latter is an important issue if the goal is to detect slowly-varying nonstationarities. In this regard, the method of [12] has been modified in [15] so as to improve the detection of nonstationary signals with spectral content more concentrated at low frequencies. Some other methods have also been proposed in the literature to improve the resolution of TF transforms at low frequency bands [16,17]. Unfortunately, these modifications end up increasing even more the computational complexity of the TF-based approaches.

242

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

TF representations

Signals

200

40 0

y (n)

20

150

1

1

250

60

5 x (n)

Time marginals

100 0 30

-5 100

200

300

Freq.

50

40

20 20

10 0 0

(a)

0

Time

(b)

200 y2(n)

x2(n)

300

250

40 0

200

(c)

60

5

100

20

150 100

0 30

-5

40

20 100

200

(d)

300

Freq.

20

10 0 0

Time



50 0

y (n ) = x (n ) ∗ h (n ) 100

200

2

.

(1)

300

(f)

(e)

For testing nonstationary behaviors that appear as temporal structures in the time marginal y(n), the tasks of computing y(n) should be performed. A straightforward approach to compute y(n) is to intregrate over the frequency axis the TF representation of x(n) [19]. However, such approach would require the estimation of full TF representations, for the signal x(n) and any possible stationarized reference used by the method [12]. Unfortunately, resorting to TF representations to compute y(n) would lead to the problems mentioned in Section 1 regarding computational complexity [13] and poor estimation of the spectral content at very low frequencies [14]. Thanks to the marginal properties of TF distributions [19], the time marginal y(n) of the TF representation of a given real, discrete-time signal x(n) can be easily computed by squaring the result of the numerical convolution of x(n) and a given shorttime window h(n):

Fig. 1. (a) A stationary Gaussian signal with μ = 0, σ = 1 and length N = 300. (d) A Gaussian signal with time-varying mean and variance of the same length. The TF representations of x1 (n) and x2 (n) are shown in (b) and (e). Their time marginal series y1 (n) and y2 (n) are shown in (c) and (f). 2

In this paper, we show how a nonparametric test for slow nonstationary evolutions can be built in time domain, by developing a hypothesis test for the presence of trends in the marginal of the time-varying spectrum. A crucial point of the proposed technique is that we compute the time marginal directly in time domain, therefore avoiding the problems mentioned above involving TF representations. We remark that this paper is a modified and extended version of the work presented in [18]. Different from the present paper, the work in [18] has been built in TF domain and has not been developed as a proper hypothesis test. To build the new stationarity test, concepts like bootstrapping and GEV modeling are introduced. Furthermore, this paper develops the mathematics to describe the link between slowly-varying nonstationarities and trend-like structures in the time marginals. Also, the experimental study is extended significantly, by testing more nonstationary processes against a longer list of alternative methods. This paper is organized as follows. In Section 2, we show how nonstationarities that vary slow in time can appear as trends in the time marginal. We define how to approximate these trends and assess their significance by means of the trend importance estimator. In Section 3, we describe the framework and the resampling method needed to build the hypothesis test. In Section 4, we analyze the behavior of the trend importance estimator in stationary and nonstationary situations. In Section 5, we propose a model for the distribution of this estimator under stationarity. This model allows for characterizing a hypothesis test for stationarity in Section 6. The experimental study and the conclusions are shown in Sections 7 and 8, respectively.

Doing so, we can compute y(n) without having to deal with the estimation of TF representations. According to the properties of quadratic TF distributions, (1) is an approximation of the instantaneous power (|x(n)|2 ) of the signal [19]. Here, the study of stationarity is reduced to the study of the time marginal computed by (1). Therefore, the proposed method is developed in the time domain, as it uses only information coded in y(n). In following section, we show how the computation of y(n) via (1) is affected if stationarity does not hold. 2.1. The influence of a nonstationary behavior on the time marginal computation Let us assume the short-time window h(n) in (1) contains L samples, or weights, given by h(0 ), h(1 ), . . . , h(L − 1 ), where L  N and N is the number of samples of the signal x(n). Then, (1) can be written as follows:

y (n ) =

 L−1

h ( l )x ( n − l )

2

,

(2)

l=0

The choice of the (deterministic) window function h(n) and its size L will be discussed in Section 2.6. After some algebra, one can rewrite (2) as follows:

y (n ) =

L−1 

h2 ( l )x2 ( n − l )

l=0

+2

L−1 L −1−l 

h ( j )h ( j + l )x ( n − j )x ( n − j − l )

(3)

j=0

l=1

from where an expression for the expected value of y(n) can be obtained

E [ y ( n )] =

L−1 

h 2 ( l ) E [ x 2 ( n − l )]

l=0

2. The rationale behind the framework

+2 This paper proposes a method for detecting slowly-varying nonstationarities that can be seen as a trend in the time marginal y(n) of the time-varying spectrum of the signal. As illustration, Fig. 1 shows the time-varying spectra and the time marginals of a stationary Gaussian signal (x1 (n)) and a nonstationary one (x2 (n)), (Fig. 1a and d, respectively). The mean and variance of the nonstationary process start to increase slowly at n = N/2. Notice the difference between the two TF representations (Fig. 1b and e) and the trend-like behavior that can be seen in the time marginal of the nonstationary process (Fig. 1f).

L−1 L −1−l  l=1

h ( j ) h ( j + l ) E [ x ( n − j ) x ( n − j − l )] .

(4)

j=0

Now, we define the time-varying autocorrelation function Rx (n, l) of x(n) [20]:

Rx (n, l ) = E[x(n )x∗ (n − l )],

(5)

which is function of both time n and lag l. Given (5), one can then express (4) as function of Rx (n, l):

E [ y ( n )] =

L−1  l=0

h2 (l )Rx (n − l, 0 )

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

+2

L−1 L −1−l 

h( j )h( j + l )Rx (n − j, l ).

j=0

l=1

L−1 

h2 ( l ) + 2

L−1 

l=0

rx ( l )

L −1−l

l=1

h ( j )h ( j + l ).

(7)

j=0

Thus, (7) is a special case of (6) obtained when Rx (n, l) is invariant in time. Hence, given the deterministic window h(n) and its 1, . . . , L values (or weights) have been chosen a priori, if x(n) is WSS, E[y(n )] becomes a function of 2L constants only, i.e.,

h2 ( l ) =

E [ y ( n )] = f [ h ( 0 ) , h ( 1 ) , . . . , h ( L − 1 ) , n ] , and the manner f( · ) varies in time will depend on Rx (n, l). 2.1.1. Using multiple windows to compute y(n) In this paper, we also propose to use more than a single window to compute y(n). This can be done by averaging the results of calculating (1) with a set of K windows:

2 K  1 x ( n ) ∗ hk ( n ) , K

(8)

k=1

where individual windows hk (n) in (8) can be distinct in nature, or can be members of a given family of windows, like the Hermite polynomials [21] or the Slepians sequencies [22], but they should contain the same number L of samples. Regardless of the chosen family of windows, it can be shown that a nonstationary behavior affects the expected value of (8) in a similar way as in the case using only a single window. More specifically, by following the same steps from (3) to (6), it can be demonstrated that the expected value of (8) will be given by:

E [ y ( n )] =

L−1   K  l=0

+2

h2k (l )

 E [ x 2 ( n − l )] K

k=1

L−1 L −1−l  K   l=1

hk ( j )hk ( j + l )

j=0 k=1

 E[x(n− j )x(n− j−l )] K

.

(9) One can rewrite (9) as function of the time-varying autocorrelation of x(n):

E [ y ( n )] =

L−1   K  l=0

+2

h2k (l )

 R (n − l, 0 ) x K

k=1

 R (n − j, l ) L−1 L −1−l  K   x hk ( j )hk ( j + l ) . K l=1

(10)

j=0 k=1

E [ y ( n )] =

l=0

k=1

h2k

(l )

K

and

h ( j )h ( j + l ) =

K  hk ( j )hk ( j + l ) . K k=1

Thus, it is easy to see that (4), (6), and (7) are particular cases of (9), (10), and (11), as the latter expressions reduce to the former ones if only K = 1 window is being considered. 2.1.2. A criterion for rejecting stationarity For rejecting stationarity, one could seek for evidences that E[y(n )] is not a constant function as in (7) or (11), but changes with time (i.e. evidences that (6) or (10) holds). Performing this task can be fairly complicated, as E[y(n )] can vary with time in different ways. Fortunately, thanks to the way y(n) is being computed, and to the kind of nonstationarity we want to detect (i.e. slowly-varying nonstationarities), we can determine the approximate temporal behavior of E[y(n )]. 2.2. How should E[y(n )] vary in time if x(n) is a slowly-varying nonstationary signal? We begin this section by defining what we mean by slowlyvarying nonstationarity. In this paper, we consider that a discretetime signal x(n) of length N is first and second-order slowlyvarying nonstationary if the following relations hold:

E [ x ( n )] ≈ E [ x ( n − λ )] , E [ x 2 ( n )] ≈ E [ x 2 ( n − λ )] , E [ x ( n ) x ( n − l )] ≈ E [ x ( n − λ ) x ( n − λ − l )] , for λ = 1, . . . ,  and l = 1, . . . , L,

(12)

where L is the observation interval dictated by h(n) (or the window size), and  is a constant such that  ≥ L and  + L  N. This definition of slowly-varying nonstationarity is similar to the locally stationarity property described in [19]. Note that (12) simply tells that the moments up to the second order of x(n) vary in time as a function following

g(n ) ≈ g(n − λ ), for λ = 1, . . . , lim ,

(13)

where lim is a constant such that  + L ≤ lim and lim  N. The constant lim sets a limit to how far in the past goes the approximated invariant behavior of g(n). Note that (13) is a very broad definition of slowly-varying behavior in time. If x(n) fulfills (12), then it is easy to show that (4) can be approximated as follows:

E [ y ( n )] ≈

L−1 

h 2 ( l ) E [ x 2 ( n − l − λ )]

l=0

Similarly to (6), (10) should vary in time in the general case, as it is written as function of Rx (n, l). However, in case of WSS, (10) can be simplified to L−1   K 

K  h2k (l ) k=1

and since all arguments of f( · ) are constants, E[y(n )] should be itself a constant in time. On the other hand, if x(n) is not WSS, then E[y(n )] will vary with time. In that case, E[y(n )] will be given as function of L constants and the time variable n, i.e.,

(11)

j=0 k=1

In the same fashion as (7), (11) should be a function of constants in time, given the set of h1 (n ), . . . , hK (n ) deterministic windows with 1, . . . , L samples are determined a priori. Note in (9), (10), and (11) that the only change caused by using (8) instead of (1) to compute y(n) is the set of deterministic window values h2 (l) and h( j )h( j + l ) used to weight the signal samples in the expression for E[y(n )]. For the multiple-window formulation, these deterministic weights are obtained by averaging out the ones computed for the K individual windows. More precisely, (4) could be written as (9) if we let

E [ y ( n )] = f [ r x ( 0 ) , . . . , r x ( L − 1 ) , h ( 0 ) , . . . , h ( L − 1 )] .

y (n ) =

 L−1 L−1−l K   rx ( l )   hk ( j )hk ( j + l ) . K l=1

Rx (n, l) varies with time if x(n) is nonstationary, but if x(n) is widesense stationary (WSS), Rx (n, l) reduces to the time-invariant autocorrelation rx (l) [19], which is function of the lag l only. In this case, as Rx (n, l ) = rx (l ) ∀ n, (6) can be simplified to

E [ y ( n )] = r x ( 0 )

+2

(6)

243

 r (0 ) x K

+2

L−1 L −1−l  l=1

h ( j ) h ( j + l ) E [ x ( n − j − λ ) x ( n − j − l − λ )] .

(14)

j=0

since E[x2 (n − l )] ≈ E[x2 (n − l − λ )] and E[x(n − j )x(n − j − l )] ≈ E[x(n − j − λ )x(n − j − l − λ )]. The same idea applies if y(n) is

244

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

computed via multiple-window approach. In this case, (9) can be approximated by

E [ y ( n )] ≈

L−1   K  l=0

+2

h2k (l )

 E [ x 2 ( n − l − λ )]

k=1

K

 L−1 L −1−l  K   hk ( j )hk ( j + l ) l=1

j=0 k=1

E [ x ( n − j − λ ) x ( n − j − l − λ )] K

(15)

By comparing (4) with (14) and (9) with (15), it can be seen that computing the right-hand sides of (14) and (15) are equivalent to computing E[y(n − λ )], which means that E[y(n )] ≈ E[y(n − λ )] for λ = 1, . . . , . In other words, if the moments up to the second order of x(n) vary slowly in time in a sense that they follow (13), then E[y(n )] will also be a slowly-varying function of time, as it will also follow (13), be y(n) computed by (1) or by (8). Recall that such a situation, i.e., the case of a series whose expected value changes as a slowly-varying function in time, is commonly referred to as a trend. Therefore, existence of a trend in the time marginal not only conveys a breakdown in the stationarity of x(n) (owing to the time-dependence E[y(n )], see (4) or (9)), but also indicates x(n) exhibits a slowly-varying nonstationary behavior. Notice that it was the case in Fig. 1, where the slowly-varying nonstationarity of x(n) was reflected on the time marginal as an increasing trend. The key idea behind the proposed technique consists in identifying a significant trend in the time marginal to reject stationarity. There exist different approaches to estimate trends in time series [4,23]. To make the proposed method more applicable to real-world situations, we focus on model-free approaches. Before explaining the chosen method, however, we define properly what we mean by “trend component”. 2.3. Trends in the time marginal: Definition Since “trend” is a context-dependent term, we shall adopt a definition that is appropriate for the proposed framework. The definition of trend adopted here is that of a smooth additive component that carries information about global changes in the time series [23]. Also, we assume the detrended series (representing the fluctuations) is stationary with constant mean given by μr [24]. If we represent the trend by c(n) and the detrended series by r(n), and if we consider the following additive model to hold

y ( n ) = c ( n ) + r ( n ), then we should have E[y(n )] = g(n ) + μr , where g(n ) = E[c (n )] is a function varying slowly with time. Testing for stationarity can be done by first extracting a trend component c(n) from y(n) (see Section 2.4), then by proposing a way to quantify its significance (see Section 2.5). 2.4. Trends in the time marginal: Estimation Among the nonparametric approaches for trend estimation, the Singular Spectrum Analysis (SSA) [25] and the Empirical Mode Decomposition (EMD) [24] are known for decomposing the signal into oscillatory components. Both methods have their advantages, however, the SSA depends on a free parameter, while the EMD is fully data-driven [24]. Moreover, the properties of the EMD basis functions are instrumental to understand the behavior of the trend importance estimator and characterize its equations in Section 4.2 for stationary and nonstationary situations. Owing to these reasons, we have chosen the EMD method for trend estimation.

The EMD is an algorithm introduced in [26] to decompose signals into a superposition of oscillatory terms known as intrinsic mode functions (IMFs). The IMFs are computed iteratively and must satisfy two conditions regarding the number of zero crossing and the mean value of local envelopes. The algorithm for extracting the IMFs is known as sifting process [26]. This algorithm is simple and well documented in the literature [24,26,27], but it will not be reproduced here. The output of the sifting process is a collection of I IMFs {m(i) (n ), i = 1, . . . , I} and a residual ρ I (n). For the time marginal y(n), the EMD gives the following representation:

y (n ) =

I 

m ( i ) ( n ) + ρ I ( n ),

(16)

i=1

where the IMFs ranging from m(1) (n) to m(I) (n) represent local oscillations going from the shortest period to the longest one [24]. The IMFs and ρ I (n) will have the same length of the decomposed series (in this case, the time marginal). The IMFs have the following two important properties that are further recalled in this paper (1) The IMFs are designed to be zero-mean waveforms [28], (2) in practice, consecutive IMFs can be considered to be locally orthogonal to each other [26]. As we have defined trend as a slowly-varying component, we follow here the same idea of [24], which considers the trend component as a superposition of the last few IMFs and ρ I (n). Therefore, estimating the trend will be equivalent to estimating the index i = i∗ in (16) that gives the best approximation of the trend

c (n ) =

I 

m ( i ) ( n ) + ρ I ( n ).

(17)

i=i∗

The criterion to obtain i∗ is to select the smallest common index given by two independent techniques, namely the Ratio and the Energy approaches. The former consists in assessing the expected ratio of zero crossing of successive IMFs. The latter considers that the energy of a given IMF generally increases for i near to i∗ [24]. As illustration, for the same test signals shown in Fig. 1, we present in Fig. 2 a and b the time marginals and the trend components computed by the EMD-based method. 2.5. Estimating the importance of the trend In this paper, the following measure is used to quantify the importance of the trend

θTI =

Var[y(n )] Var[y(n )] = , Var[ydt (n )] Var[y(n ) − c (n )]

(18)

where ydt (n ) = y(n ) − c (n ) is detrended time marginal series. By computing (18), we are assessing the fraction of the original variance of y(n) accounted for by c(n) [29]. The estimation of (18) is carried out by the trend importance estimator M 

θˆTI =

[y ( n ) − μ y ]

2

n=1 M 

,

(19)

[ydt (n ) − μdt ]

2

n=1

where M is the length of y(n) and c(n) series, and μy and μdt are the sample mean of y(n) and ydt (n), respectively. If c(n) is a negligible component of y(n), in a sense that y(n) ≈ ydt (n), then θˆTI ≈ 1. Otherwise, θˆTI should take values greater than one.

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

245

Fig. 3. Overview of the proposed stationarity test. Fig. 2. The time marginals (in gray) and the corresponding EMD trends (in black) for the stationary and nonstationary Gaussian signals are shown in (a) and (b), respectively. The histograms of θTI and mean values θTI computed for 10 0 0 realizations of the nonstationary and stationary Gaussian signals are shown in (c) and (d), respectively.

To illustrate the performance of (19), we have generated 10 0 0 realizations of the processes shown in Fig. 1. Then, for each realization of the processes, we have measured the trend contamination in the time marginal by estimating θˆTI . In Fig. 2 c and d, we present for the nonstationary and stationary signals, respectively, the histograms of θˆTI and its average value θˆTI over all realizations. Notice that, while the distribution of θˆTI is concentrated about one for the stationary processes, it takes much higher values for the nonstationary ones. This difference is also evidenced by the distinct values of θˆTI obtained for the stationary and nonstationary cases. 2.6. Choosing a window function to compute y(n) Thus far we have not yet addressed how to choose the window h(n) (or family of windows). As it is common to nonparametric approaches, it is difficult to define an a priori window to compute y(n), and the window choice can be considered one degree of freedom of the method [8]. Therefore, it is advisable to test different windows to compute y(n), whenever possible. The choice of the length L of the window is associated with the temporal scale of the events we are aiming to capture: larger windows will more likely capture longer temporal events, while short windows will capture local, shorter ones. In practice, one needs a window sufficiently large for visualizing any significant trend in y(n), while keeping the best possible time resolution. Choosing the window in real-world applications, however, may be a difficult task, specially in the absence of information about the underlying process, or if one does not know the time scales of the nonstationarities possibly present in the data. In these cases, it would be of interest to have a window that could be used in general-purpose applications. One could try to find analytically an expression for the window function that would optimize the estimation of θˆTI in a given sense (e.g. by determining the theoretical h(n) that minimizes the bias or the variance of θˆTI ). Unfortunately, such a task can be fairly complicated owing to the empirical algorithmic nature of EMD. Nonetheless, the search for an a priori window function can still be performed empirically, by testing various candidate windows via extensive Monte Carlo simulations. In this paper, we have carried such extensive study on stationary and nonstationary signals of different types and lengths. To evaluate

the performance of the candidate window, we have defined a cost function f (θˆTI ) which should be as close as possible to unity in ideal cases. The details and results of the experimental study are given in Appendix A. 3. Building the stationarity test with the trend c(n) and time marginal y(n) The flowchart of the proposed stationarity test is shown in Fig. 3. More specifically, a one-sided hypothesis test is built, where the null hypothesis of stationarity refers to the situation where the approximated trend c(n) cannot be distinguished from spurious ones that could appear by resampling the original signal x(n) (i.e. by approximating virtual realizations of x(n)). These resamples are bootstrap replicates of x(n), which are supposed to be stationary, so trends in their time marginal are unlikely to be found. We analyze the resamples individually by computing y(n) for each bootstrap replicate and then estimating θˆTI . This will generate a collection of trend importance statistics  = θˆTI (1 ), . . . , θˆTI (P ) characterizing the null hypothesis of stationarity. Further, by developing the expression of the estimator θˆTI and analyzing how it should behave given the properties of the IMFs, we propose to model the distribution of the collection  by a generalized extreme value (GEV) pdf (see Section 5). This parametrization allows for a much easier determination of a threshold for the hypothesis test (see Section 6). 3.1. Generating virtual realizations with block bootstrapping In this work, we use the block bootstrap method to obtain approximated stationary realizations of x(n) while preserving any possible correlation in the data, at least within a short interval (or a block). We have chosen the nonoverlapping block boootstrap approach to obtain resamples containing as many different samples as possible, while approximating (or equaling) the original length of the signal [30,31]. The choice of the block length  used to block-bootstrap x(n) is a crucial point of the algorithm. Roughly speaking, we want blocks long enough to capture possible correlation structures in x(n), while short enough to reduce the variability. We have adopted the criterion presented in [32], which states that if one wants to estimate a one-sided distribution (which is the case here), the asymptotic mean squared error of the block bootstrap estimator is minimized for  = N 1/4 . Therefore, this value of  has been used in this work. The output of the block bootstrap algorithm is a collection x∗1 (n ), . . . , x∗P (n ) of resamples of x(n), where “∗ ” is used to identify

246

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

a resampled quantity. The collection of trend importance statistics  characterizing the null hypothesis of stationarity is obtained from x∗1 (n ), . . . , x∗P (n ). We have observed that the null pdf can be approximated well by a generalized extreme value (GEV) model, which simplifies the selection of a threshold. To understand the choice for a GEV model, we take a closer look at the behavior of θˆTI in Section 4.

terms: M  I  

θTI =

4.1. Trendless time marginal

θTI = Var

 (i )

m (n ) + ρ (n ) I

 Var

i=1

I 

 (i )

m (n ) ,

(20)



θTI =

n=1

2 y m (i ) ( n ) + ρ I ( n ) − μ

i=1 M  n=1

 i∗ −1 

2

i=1

y and μ dt are the sample mean of y(n) and ydt (n), respecwhere μ tively. The length of y(n) and ydt (n) is given by M. If we recall that each IMF is a zero-mean function (see property 1 of the IMFs in i = 0 for i = 1, . . . , i∗ , . . . , I. Thus, in (21), Section 2.4), we have μ dt = μ 1 + · · · + μ i∗ −1 = 0 and μ y = μ 1 + · · · + μ I + μ ρ = we have μ ρ , where μ ρ is the sample mean of ρ I (n). Thus, it can be shown μ that (21) can be written as follows by expanding the squared



+2

M  I  

n=1





. (22)

n=1 i
θTI =

I 

i2 + 2 σ

γi,l + 2

I 

∗ i −1

ρ2 γρ ,i + σ

i=1

i
σ +2 i2

i=1

∗ i −1

.

(23)

γi,l

i
where the terms 1/(M − 1 ) were omitted in (23) since they are anyway canceled out in the division. In (23), the covariance estii,l between ith and lth IMFs should be close to zero due to mate γ the property 2 of the IMFs. Hence, the following could be assumed: I 

i2 2 σ

I 

i=1

γi,l

∗ i −1

and

i2 2 σ

i=1

i
∗ i −1

γi,l .

(24)

i
Then, by using (24), one could approximate (23) as: I 

i2 + σ ρ2 + 2 σ

i=i∗

I 

γρ ,i

i=1 ∗ i −1

,

(25)

i2 σ

i=1

where the terms corresponding to the trended (i = i∗ , . . . , I ) and to the detrended (i = 1, . . . , i∗ − 1 ) counterparts were rearranged. It is simple to verify that (25) could be expressed as function of the individual energies of the IMFs and the residual. To do so, one ρ . could first define the slightly-shifted residual ρ I (n ) = ρ (n ) − μ Then, owing to the property 1 of the IMFs, one can easily verify that M 



2

m (i ) ( n )

ρ2 = = Ei and σ

n=1

M 



2

ρ I ( n )

= Eρ

(26)

n=1

where Ei and Eρ are the energies of the ith IMF and ρ I (n), respectively. Also, one can verify that I 

γρ ,i =

i=1

I  M  

  I

ρ I ( n )m ( i ) ( n ) = Ei Eρ cos αi .

i=1 n=1

(27)

i=1

Note that (27) can be interpreted as a sum of scalar products, where α i is the angle of the ith term. The fraction 1/(M − 1 ) were omitted in (26) and (27) as they are again canceled out in the expression of θTI . According to (26) and (27), we can write (25) as

E∗ + ··· + E + E

1 It should be noted that due to approximation errors in the EMD algorithm, we might obtain values of θ TI that are slightly lower than 1. Anyway, the approximation θ TI ≈ 1 is still reasonable.

m ( i ) ( n )m ( l ) ( n )

 2 M    ρ m (i ) ( n ) + ρ ρ I (n ) − μ ρ I (n ) − μ

i=1

(21)

dt m (i ) ( n ) − μ

n=1 i
By recalling again property 1 of the IMFs, one can verify that i2 , i = (22) can be given as function of the sample variances {σ i,l , i < l, l = 1, . . . , I} between the IMFs, 1, . . . , I} and covariances {γ ρ2 and covariances {γ ρ ,i , i = 1, . . . , I} beand the sample variance σ tween the residual ρ I (n) and the IMFs:

i2 = σ ,

m ( i ) ( n )m ( l ) ( n )

n=1 i
4.2. Trended time marginal

M I  

M  I 

2 M i −1  M i −1   m (i ) ( n ) + 2 m ( i ) ( n )m ( l ) ( n )

θTI = 1 + For the case of a trended time marginal we have θTI ≥ 1. Now, the EMD might not only use the residual ρ I (n), but also different IMFs to approximate c(n). Thus, the algorithm returns an index i∗ ≤ I as the one which gives the best approximation of the trend. In this case, it can be shown that the detrended time marginal is  ∗ −1 (i ) given by ydt (n ) = ii=1 m (n ), and (19) can be rewritten as follows:

2

+2

n=1 i=1

i=1

and by letting ρ I (n) ≈ a, we should have θ TI ≈ 1 in (20), as   Var[ Ii=1 m(i ) (n ) + a] = Var[ Ii=1 m(i ) (n )] for any constant a. The case of θTI ≈ 1 is the most common one in the stationary bootstrapped data1 .

m (i ) ( n )

2

n=1 i=1

+

I 

For the case of a trendless time marginal, we have θTI ≈ 1. This can be easily shown by noting that, in this case, the EMD will return only c (n ) = ρ I (n ) as an approximation of the trend [24].  Then, ydt (n) will be given by ydt (n ) = Ii=1 m(i ) (n ). In this case, the slowly-varying residual should be nearly constant in comparison to the IMFs, so ρ I (n) ≈ a can be assumed, where a is a constant. Then, if we write the quantity we want to estimate, i.e. the importance of the trend in (18), according to the EMD notation I 



n=1 i=1

4. Behavior of the trend importance estimator



n=1 i=1

M i −1  

2

We analyze the behavior of θTI in two situations. The first one stands for the case of a trendless time marginal (indicating stationarity), where a significant trend cannot be detected in y(n). The second one stands for the case of a trended time marginal (indicating nonstationarity), where a significant trend can be detected in y(n).

m (i ) ( n )

I ρ θTI = 1 + i E1 + · · · + Ei∗ −1





E1 Eρ cos α1 + · · · + EI Eρ cos αI

+2



E1 + · · · + Ei∗ −1

,

(28)

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

247

the assumption that the distribution of  can be fairly well modeled by a GEV pdf of type II. Moreover, it is shown that using a reduced number of bootstraps like P = 100 is already sufficient to reproduce the prescribed false alarm rate of the test. In this regard, notice in Fig. 4 that the approximated GEV curves obtained by using P = 100 and P = 10 0 0 bootstraps are already very similar. Therefore, for now on, we shall adopt P = 100 bootstrap replicates to perform the stationarity test. 6. Hypothesis test Fig. 4. Fitting a GEV pdf of type II to the collection { = θTI ( p), p = 1, . . . , P } by using (a) P = 100 and (b) P = 10 0 0 bootstraps. The GEV fit is shown in black and the histograms of θTI are shown in gray.

where the scalar product angles α i are approximately 90 ◦ , and the terms cos α i are close to zero due to property 2 of the IMFs, and also due to the fact that the residual is nearly constant in comparison to other IMFs capturing faster oscillation modes. In this regard, recall that the scalar product terms come from the covariance esi,ρ , and in a limit case, for a given constant a, we should timates γ have Cov[m(i ) (n ), a] = 0. In (28), the energies of the IMFs of the trend component (Ei∗ , . . . , EI ) and residual (Eρ ), are being divided by those of the detrended part (E1 , . . . , Ei∗ −1 ). According to [24], the energies are in general increasing for IMFs with index i ≥ i∗ . Hence, if the energy criterion for choosing i∗ holds, the ratio in (28) should be maximized, increasing the value of θTI , which indicates the presence of a trend. 5. The generalized extreme value distribution The bootstraps are assumed to be stationary. Thus, any resample with a significantly trended time marginal could be interpreted as an extreme event, very unlikely to be found in the stationarized data. Nevertheless, when these rare events occur, we will likely observe large values of θTI , as the ratio of energies in (28) will attain a peak value if a trend is detected. A heuristic argument suggests that the distribution under stationarity of θTI could be modeled by a heavy tailed pdf with a peak and a lower bound around the unity. These features match those from the Fréchet or GEV distribution of type II [33]. This type of pdf is one of the three belonging to the GEV family, which is governed by its shape, scale and location parameters ( , σ and μ, respectively). The type II, or the Fréchet case occurs for > 0, and its cdf is given in (29). This distribution has no upper bound, however, a lower limit is given by its location parameter μ [34], which should be μ ≈ 1 in this application.



 −  if x < μ F ( x; , σ , μ ) = exp − x−σμ if x ≥ μ. 0

(29)

The parameters of the Fréchet pdf are estimated by maximum likelihood from the collection of values of θTI computed for p = 1, . . . , P bootstraps. We represent this collection by { = θTI ( p), p = 1, . . . , P}. The GEV model fitted to  permits to select of a threshold above which the null hypothesis of trendless time marginal is rejected. An example of the GEV modeling is shown in Fig. 4, where θTI was estimated in the time marginals of P = 100 (Fig. 4a) and P = 10 0 0 (Fig. 4b) bootstrap replicates of the nonstationary process shown in Fig. 1d. An empirical study on the adherence of the GEV fit is carried out in Appendix B, where the results of applying a goodness-offit test for GEV pdfs is shown. Furthermore, we verify the test can reproduce the null hypothesis of stationarity. The results support

Having defined the distribution of θTI for the stationary references, it is now simple to derive a threshold above which the null hypothesis of stationarity is rejected under a given level. Putting it in the form of a one-sided test, we use as test statistic the estimated importance of the trend in the original signal. The hypothesis test is given in (30), where the threshold τ is computed given a false alarm rate of 5%.



d (x ) =

1 0

if θTI > τ , “nonstationary”, if θTI < τ , “stationary”.

(30)

7. Experimental study 7.1. Alternative methods considered for evaluation The proposed approach has been compared with other stationarity tests available in the literature, namely the nonstationarity detector proposed by S. Kay [3], the classical KPSS test [4,35], and the TF approach of [12] based on surrogate resampling. The approach of [3] assumes the signal follows a time-varying autoregressive (TVAR) model. The rationale behind the technique is to test the nullity of certain TVAR parameters, which should be equal to zero in case of stationarity. The KPSS method tests the null hypothesis that the signal is trend stationary, against the alternative hypothesis that it is a nonstationary unit-root process. For the KPSS test, the time series is expressed as x(n ) = rw (n ) + cd (n ) + s (n ), where s (n) is a stationary error term, cd (n) is a deterministic trend component, and rw (n) is a random walk process rw (n ) = rw (n − 1 ) + w(n ) with w(n) being an i.i.d. random noise with mean and variance equal to 0 and σ 2 , respectively. The KPSS method tests the null hypothesis that σ 2 = 0 (meaning that rw (n) is constant), against the alternative hypothesis that σ 2 > 0. Finally, the surrogate method makes use of the TF representation of x(n) to test if the dispersion of its local spectra S(n, f) around the average  spectrum S(n, f ) = 1/N N n=1 S (n, f ) can be rejected via hypothesis test, considering the null hypothesis of stationarity is charecterized by the stationary surrogates. 7.2. Tested datasets The proposed stationarity test has been applied to (i) nonstationary Gaussian and gamma processes following the Lombard smooth-change model, (ii) uniformly modulated processes, (iii) first-order Markov processes. These three models are commonly used for representing slowly-varying nonstationary behaviors that appear in practice. Also, one could easily verify that they fulfill the broad definition of slowly-varying nonstationary process adopted in this work (see (12)). We have tested stationary autoregressive (AR) and white Gaussian noise (WGN) processes as well. The signals tested in this section have two lengths, N = 10 0 0 and N = 20 0 0. 7.2.1. First-order markov processes First-order Markov processes are commonly employed to model slowly-varying nonstationary processes, specially in applications of

248

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252 Table 1 Ranges over which the parameters of the synthetic nonstationary processes vary. Markov 1st-order

UMP

Gaussian Mean

Gaussian Var.

Gamma Shape

Gamma Scale

Case 1

α = 0.99

ρ = 0.5

Case 2

α = 0.999

ρ = 0.75

ξ1 = 0 ξ2 = 2 ξ1 = 0 ξ2 = 3

ξ1 = 1 ξ2 = 2.5 ξ1 = 1 ξ2 = 3.25

ξ1 = 0.355 ξ2 = 3.195 ξ1 = 0.355 ξ2 = 3.55

ξ1 = 19.94 ξ2 = 139.58 ξ1 = 19.94 ξ2 = 159.52

Table 2 Results of the experimental study for the proposed stationarity test (“New”), the TF-based surrogate test of [12] (“Surro”), the KPSS test (“KPSS”), and the nonstationarity detector of [3] (“Kay”). The results are given as percentage “nonstationary” outcomes observed over 10 0 0 realizations of the test signals. Nonstationary signals Length

N = 10 0 0

Method

New (%)

Kay (%)

KPSS (%)

Surro (%)

New (%)

Kay (%)

KPSS (%)

Surro (%)

100 100 71.9 67.1 92.6 94.7 17.1 16.9 99.2 100 52.9 83

100 100 16.8 18.9 93 99.9 15.7 17.4 2.7 10.7 3 0.3

4.7 6.2 6.3 5.4 5.3 5.5 5.6 6.2 4.5 5.9 100 100

22.9 50.9 67.8 90 39.5 40.9 99.1 99.2 86 99.8 6.5 9

100 100 50.2 33.9 61.2 73.4 30 30.8 99.8 100 18.9 74

100 100 16.9 23 93.7 100 20.8 18.4 3.1 10.1 6 0.6

6.9 5.2 4.8 7.4 6.1 5.9 6.5 5.2 5.5 6.7 100 100

28.8 66.3 96.3 99.7 68.5 73.5 100 100 100 100 6.6 8

6.4 7.8

7.6 8

6.3 0

7.7 6.9

4.1 8.9

7 7.7

5.1 0

6.4 8.6

Mean Gaussian Var. Gaussian Shape gamma Scale gamma UMP Markov 1st order

Case 1 Case 2 Case 1 Case 2 Case 1 Case 2 Case 1 Case 2 Case 1 Case 2 Case 1 Case 2

N = 20 0 0

Stationary signals WGN AR(1)

adaptive filtering, where such models are used to represent the slowly random variations of the plant [36]. A univariate first-order Markov model can be expressed as

w ( n ) = α w ( n − 1 ) + φ ( n ), where 0 ≤ α ≤ 1 and φ (n) is a zero-mean white noise process. The first-order Markov process is nonstationary, however, it approaches the asymptotic stationarity as n → ∞. The closer the parameter α is to 1 (without ever equaling it), the greater is the number of iterations needed to achieve the asymptotic stationarity, or the slower the nonstationary behavior. We have tested two groups of signals following the first-order Markov model, each one with a different value of α . These are shown in Table 1 in Cases 1 and 2. 7.2.2. Uniformly modulated processes Suppose u(n) is a nonstationary signal obtained by multiplying a zero-mean stationary process v(n), and a deterministic modulation function z(n), i.e., u(n ) = z(n )v(n ). If z(n) varies “slowly” in comparison to v(n), then u(n) is called a Uniformly Modulated Process (UMP)2 . These processes are commonly used to model slowlyvarying nonstationary behaviors [19,37]. In this paper, we have tested the following UMP:

u(n ) = Cne−β n/2 vu (n ),

(31)

where vu (n) is random and uniformly distributed over [−1, 1]. The UMP in (31) has been used to model the accelaration trace of ground motion in earthquakes, where z(n ) = C β e−bn/2 models intensity signal commonly observed in historical seismic events [37]. The parameters C and β are related to the intensity of the ground acceleration, and the effective duration of the ground motion [37], respectively. In this work, we have defined β = 2/ρ N and 2 In other words, if the variation z(n) is small if compared to the statistical memory of v(n) [19].

C = β /2e−1 to guarantee that 0 < z(n) ≤ 1 and ρ N = arg maxn z(n ), where ρ ∈ (0, 1]. Doing so, we force the seismic intensity to reach z(n ) = 1 (its maximum) at nMAX = ρ N. Since 0 < ρ ≤ 1, nMAX will stand for a given fraction of N. The closer ρ is to one, the longer it takes for z(n) to reach its maximum. In Table 1, the two values of ρ considered in this work are shown in Cases 1 and 2. 7.2.3. Nonstationary gaussian and gamma processes The tested nonstationary Gaussian and gamma processes are assumed to follow the Lombard smooth-change model [38]. More specifically, let x(n) be a univariate stochastic process whose pdf exhibts a time-varying parameter θ (n). Then, x(n) is assumed to follow the Lombard smooth-change model if

⎧ n = 1, . . . , n1 , ⎪ ⎨ξ1 , (n − n1 )(ξ2 − ξ1 ) θ ( n ) = ξ1 + , n = n1 + 1, . . . , n2 − 1, ( n2 − n1 ) ⎪ ⎩ ξ2 , n = n2 , . . . , N .

(32)

where ξ 1 and ξ 2 are the parameter values before and after change, respectively. In this work, we have chosen n1 = 1 and n2 = N to obtain a configuration of (32) known as onset-of-trend model [38,39]. This setup represents a gradual, linear increase of the initial parameter value ξ 1 to the final one ξ 2 that starts at n = 1 and ends at n = N. Four groups of signals have been generated using this configuration: (i) Gaussian processes with a fixed mean (μ = 0) and a varying variance (ii) Gaussian processes with a fixed variance (σ 2 = 1) and a varying mean, (iii) gamma processes with a fixed shape parameter (a = 0.355) and a varying scale parameter, and (iv) gamma processes with a fixed scale parameter (b = 19.94) and a varying shape parameter. The range over which the parameters of the Gaussian and gamma signals vary are shown in Table 1 in Cases 1 and 2. The initial values chosen for the gamma shape and scale parameters (i.e. a = 0.355 and b = 19.94) are from gamma models applied to real-world rainfall series [40]. Thus, by

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

testing these signals we can assess the performance of the test in the event of a gradual change in the parameters of gamma models used in practice.

Distribution

0

3

0

y(n) c(n)

TI

is indeed observed in the histograms of Fig. 5h. Note the nonstationarity detector proposed in [3] (“Kay”) performed poorly in classifying the Gaussian process with a varying variance, the UMP and first-order Markov signals. We point out that, although the TVAR modeling used in [3] is very flexible, guaranteeing good model adherence for different types of nonstationary signals can be difficult. The classical KPSS test has only worked well in detecting the nonstationarity of the first-order Markov pro-

3 The Accuracy of a binary classifier can be defined as the sum of the true positives and true negatives accused by the classifier, over the total population of tested signals.

Hist. GEV fit τ = 1.12 θˆT I = 2.1

Distribution

40

2 1.5 1

30 20 10 0

0

200 400 600 800 1000 Time (c)

1 1.5 2 2.5 Trend importance estimates (d)

Gaussian varying variance 40

200 Time marginal

y(n) c(n)

150 100 50 0

3 Time marginal

These measures quantify the overall classification performance of the four methods. As it can be seen, the average performance of the proposed stationarity test is better than the other tests. This is an advantage, especially if we lack a priori knowledge about any possible nonstationary behavior in the test signal, as it is common when testing real-world processes. We have selected some nonstationary signals tested in Table 2 to carry out an extended analysis of the stationarity test. This analysis is shown in Fig. 5 for the following test cases: (i) a Gaussian process with a varying mean (Fig. 5a and b), (ii) a Gaussian process with a varying variance (Fig. 5e and f), (iii) a gamma process with a varying shape parameter (Fig. 5c and d), and (iv) a gamma process with a varying scale parameter (Fig. 5g and h). The test cases (i) to (iii) have been correctly classified as nonstationary, while the test case (iv) has been incorrectly classified as stationary. These signals have N = 10 0 0 samples and their parameters vary according to Case 1 in Table 1. Notice in Fig. 5, that the trends in the computed time marginals are evident, specially for test cases (i) to (iii). Notice the distance between θTI and τ is greater for test cases (i) and (iii), these cases correspond to one of the best performances of the proposed method in Table 2. The test case (iv), on the other hand, corresponds to one of the worst performances reported in Table 1. The probable reason for such bad performance is the resulting very large variance of x(n) as the gamma scale parameter increases. The variance of a gamma variable varies with the square of the scale parameter. These significant fluctuations of x(n) are manifested in y(n), as it can be seen in Fig. 5g. The peak values of x(n) can appear in its bootstraps, which may lead to spurious trended time marginals that can increase the test threshold. The occurence of much higher values in the null distribution of θ

1 1.2 1.4 1.6 Trend importance estimates (b)

Gamma varying shape parameter

×105

0.5

ACCKPSS = 0.323.

30

Hist. GEV fit τ = 1.1 θˆT I = 1.3

20 10 0

200 400 600 800 1000 Time (e) ×10

1 1.2 1.4 1.6 Trend importance estimates (f)

Gamma varying scale parameter 15

5 y(n) c(n)

2

1

0

Hist. GEV fit τ = 1.04 θˆT I = 1.7

20

200 400 600 800 1000 Time (a)

2.5

ACCKay = 0.478,

40

50

Distribution

ACCSurro = 0.689,

100

Distribution

ACCNew = 0.728,

Gaussian varying mean 60

150

Time marginal

We have tested 10 0 0 realizations of the each test signal, considering the cases shown in Table 1. For choosing the window function to be used by the stationarity test, we have considered the empirical rule defined in Appendix A. We have used P = 100 bootstrap resamples to perform the tests. The results are shown in Table 2, where “New”, “KPSS”, “Kay”, and “Surro” correspond to the stationarity proposed in this work, the KPSS test, the nonstationary detector of [3], and the TF-based surrogate method, respectively. All tests have been carried out with a significance level of 5%. It can be seen in Table 2 that, in comparison to the other three approaches, the proposed method (“New”) allows for an overall better classification of the nonstationary signals, while keeping the false positives low. If we consider the four tested methods as binary classifiers and compute their Accuracy (ACC)3 , we obtain

y(n) c(n)

Time marginal

7.3. The test in action

200

249

200 400 600 800 1000 Time (g)

10

Hist. GEV fit τ = 4.24 θˆT I = 1.1

5

0

1 2 3 4 Trend importance estimates (h)

Fig. 5. Analyzing the time marginals, trends and the hypothesis tests for some test cases selected from Table 2. The test threshold and the test statistic are given by τ and θTI , respectively. The time marginal and trends, and the hypothesis test scheme are shown, respectively, for the following test cases: (a) and (b) a Gaussian process with a varying mean, (c) and (d) a gamma process with a varying shape parameter, (e) and (f) a Gaussian process with a varying variance, (g) and (h) a gamma process with a varying scale parameter.

cess, and the stationarity AR(1) process. However, such good performance is expected, since the KPSS test is designed for testing data that can be modeled as autoregressive processes. Finally, it can be seen that, in comparison to other test signals, the performance of the surrogate method is worse for the Gaussian signals with a varying mean, and the signals of Case 1 in general (i.e. the cases with “slower” nonstationary dynamics). These results reflects

250

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

lengths, more specifically, N = 100, 150, . . . , 4950, 50 0 0. For every value of N, we have generated 10 0 0 realizations of each stationary and nonstationary process, from where we have computed θTI , the average value of θTI over all realizations. Let θTI stat and θTI nstat be the averages computed when h,L,N h,L,N using a certain window h(n) with L samples, for stationary and nonstationary signals of length N, respectively. Recall that θ

the problems TF methods usually face when estimating the spectral content of nonstationary signals at low frequencies. 8. Conclusions We have proposed a nonparametric test for slowly-varying nonstationarities that can be seen as a trend in the time marginal series of the signal. Being nonparametric, this method does not require any a priori knowledge about the functional form of the signal, which makes it better suited to test real-world signals. In the proposed framework, we measure the importance of the trend in the time marginal of the original signal and its stationarized counterparts by using a proper test statistic. We have shown that the distribution of the test statistic under stationarity can be fairly well modeled by a GEV pdf of type II. The stationarity test, which is performed via hypothesis test, has shown to outperform other methods in an experimental study.

TI

should take values much larger than one for nonstationary signals, whereas it should be close to unity in case of stationarity. Thus, ideally, we should have θTI stat ≈ 1 and 1/θTI nstat ≈ 0. Let us h,L,N h,L,N stat nstat  define  (N ) and  (N ) as the mean value of θ stat and h,L

TI h,L,N

h,L

1/θTI nstat over all tested stationary and nonstationary processes, h,L,N respectively. Then, we propose to use the following cost function to assess the performance of a given window: nstat fh,L (N ) = stat h,L (N ) + h,L (N ).

Note that (A.1) evaluates the estimation of θTI in both stationary and nonstationary settings. In ideal situations, (A.1) should take values as close as possible to one. For each N, we have computed (A.1) for all options of windows and values of L, allowing us to determine the best case in this experiment (the one for which (A.1) is the closest to unity). Based on the results, we could establish the following empirical rule to determine the best choice of window for a given N:

Appendix A. On the choice of the window function Monte Carlo simulations have been carried out to assess the average performance of (19) when using different windows. We have evaluated the cases of employing or not the multiple-window technique to compute y(n). To calculate y(n) by means of (1), we have considered the following windows: Gaussian [41], Kaiser, Triangular, Hanning and Rectangular [8]. The expression for these windows are well documented in the literature and will not be presented here. For a given window length L, these five windows can be computed in MATLAB by using the functions gausswin(L), kaiser(L), triang(L), hanning(L), and rectwin(L), respectively4 . To compute y(n) via (8) we have chosen the set of windows formed by the K first Hermite functions, which can be

100 ≤ N ≤ 200 : Rectangular window, nh = 5, 250 ≤ N ≤ 400 : Kaiser window, nh = 7, 450 ≤ N ≤ 800 : Kaiser window, nh = 13 or nh = 15, 850 ≤ N ≤ 20 0 0 : Multiple-window with Hermite functions, nh = 7, if 2050 ≤ N ≤ 50 0 0 : Multiple-window with Hermite functions, nh = 7 or nh = 9. if if if if

computed by hk (n ) = e−n /2 Hk (n )/ π 1/2 2k k!, where Hk (n) are the Hermite polynomials following the recursion Hk (n ) = 2nHk−1 (n ) − 2(k − 2 )Hk−2 (n ) for k ≥ 2 and the initialization H0 (n ) = 1 and H1 (n ) = 2n [12]. The Hermite functions are orthonormal and maximally concentrated in TF domain [42]. In this experiment, the number of Hermite functions was set to K = 10. Eight values of window sizes have been tested: L = 3, 5, 7, 9, 11, 13, 15, and17. For assessing each possible combination of window and value of L, we have generated different datasets of nonstationary and stationary processes. The nonstationary signals are Gaussian and gamma processes following various possible configurations of the smooth-change model in (32). The stationary signals are white Gaussian noise and stationary autoregressive processes. We have tested signals of different 2

Appendix B. On the adherence of the GEV model A goodness-of-fit test for GEV distribution has been proposed by A. Zempléni in [43]. The method is a modification of the wellknown Anderson–Darling test. The Zempléni statistic B2 allows for testing the hypothesis that a cdf F belongs to the GEV family, against the null hypothesis of F being an arbitrary continuous cdf.

Table B.3 (a) Critical values of B2 for different values of and sample sizes. (b) For the test signals from Table 2, average value of the computed Zempléni statistic B2 over 10 0 0 realizations. (a) Tabulated critical values of B2 for different values of

/ sample size

25

50

100

200

400

0.2 0.5

0.307 0.297

0.311 0.308

0.317 0.312

0.323 0.316

0.324 0.320

UMP

Gaussian Mean 97.031 96.731 96.728 96.388

Gaussian Var. 97.173 97.008 97.048 97.644

Gamma Shape 96.856 96.788 98.274 97.906

Gamma Scale 104.303 104.629 104.453 104.982

(b) Average value of B2

Case 1 Case 2 Case 1 Case 2

Markov 1st-order 97.118 96.859 99.541 98.588

97.386 97.448 97.337 97.357

(A.2)

The rule in (A.2) could serve as reference to the user for choosing a candidate window when applying the proposed stationarity test. Finally, we remark that testing different windows (or family of windows for the multiple-window approach) is still advisable, as other choices of window can lead to good performances when applying the stationary test. Note that (A.2) is not an absolute rule, it has been defined empirically for the experiment designed in this section.

4 The free parameters of the Gaussian and Kaiser windows were set to α = 2.5 and β = 0.5, respectively.

Process

(A.1)

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

0.1

False Alarm Rate

0.09 0.08 0.07 0.06 0.05 0.04 0.03

100 200 300 400 500 600 700 800 900 1000 Number of bootstraps

Fig. B.6. Verifying the reproduction of the false alarm rate for an increasing number of bootstraps P = 10, 20, . . . , 990, 10 0 0. For each P, the stationarity test has been applied to 10 0 0 realizations of a stationary WGN(0,1) signal. The plot shows the obtained false positive rate as function of P.

The critical values of B2 for rejecting the null hypothesis are shown in [43] for different values of the shape parameter , and they are reproduced in Table B.3 (a) for the case of a significance level of 5% , = 0.2 and = 0.5. Note that the critical values generally increase with the sample size, but there is no significant variation across the different values of . For all cases tested in this work, the obtained values of  remained around 0.5. In fact, the average value of  for all processes tested in Section 7.3 is equal to  = 0.55. Finally, Table B.3 (b) presents the average values of the Zempléni statistic B2 computed over all realizations of the signals tested in Section 7.3. Note that the averages are much larger than the critical values shown in Table B.3 a, which indicates the adherence of the GEV model. To endorse the assumption that the null distribution of θˆTI belongs to the GEV family, we verify if the proposed test can reproduce the prescribed false alarm rate. This analysis should be performed by considering the number P of bootstraps used to approximate the GEV pdf, as the larger the value of P, the better the characterization of the null hypothesis. The study has been conducted on 10 0 0 independent realizations of a stationary WGN(0,1) process containing 10 0 0 samples, with a false alarm rate set to 5%. The results are shown in Fig. B.6, as function of P. Notice that by using P = 100 is already sufficient for achieving the prescribed level of confidence.

References [1] M. Kamruzzaman, S. Beecham, A.V. Metcalfe, Non-stationarity in rainfall and temperature in the murray darling basin, Hydrol. Process. 25 (10) (2011) 1659–1675. [2] P. Verdes, P. Granitto, H. Ceccatto, Secular behavior of solar magnetic activity: nonstationary time-series analysis of the sunspot record, Solar Phys. 221 (1) (2004) 167–177. [3] S. Kay, A new nonstationary detector, IEEE Trans. Signal Process. 56 (4) (2008) 1440–1451. [4] Y.J. Chu, et al., A new regularized TVAR-based algorithm for recursive detection of nonstationarity and its application to speech signals, in: Proceedings of 2012 IEEE Statistical Signal Processing Workshop (SSP), 2012, pp. 361–364. [5] D. Kwiatkowski, P.C.B. Phillips, P. Schmidt, Y. Shin, Testing the null hypothesis of stationarity against the alternative of a unit root, J. Econom. 54 (1–3) (1992) 159–178. [6] Y. Dwivedi, S. Subba Rao, A test for second-order stationarity of a time series based on the discrete fourier transform, J. Time Ser. Anal. 32 (1) (2011) 68–91. [7] M. Basseville, I.V. Nikiforov, Detection of Abrupt Changes, Prentice-Hall, Englewood Cliffs, NJ, 1993.

251

[8] D.G. Manolakis, V.K. Ingle, S.M. Kogon, Statistical and Adaptive Signal Processing: Spectral estimation, Signal Modeling, Adaptative Filtering and Array Processing, Artech House, London, 2005. 796 p. [9] D. Rudoy, T. Georgiou, Regularized parametric models of nonstationary processes, in: Proceedings of 19th International Symposium on Mathematical Theory of Networks and Systems - MTNS, Budapest, Hungary, 2010, pp. 1023–1029. [10] D. Rudoy, T.F. Quatieri, P.J. Wolfe, Time-varying autoregressions in speech: detection theory and applications, IEEE Trans. Audio Speech Lang. Process. 19 (4) (2011) 977–989. [11] I. Rekanos, L. Hadjileontiadis, An iterative kurtosis-based technique for the detection of nonstationary bioacoustic signals, Signal Process. 86 (12) (2006) 3787–3795. [12] P. Borgnat, P. Flandrin, P. Honeine, C. Richard, J. Xiao, Testing stationarity with surrogates: a time-frequency approach, IEEE Trans. Signal Process. 58 (7) (2010) 3459–3470. [13] I. Shafi, et al., Techniques to obtain good resolution and concentrated time-frequency distributions: a review, EURASIP J. Adv. Signal Process. 2009 (1) (2009) 1–43. [14] B. Boashash, Time Frequency Signal Analysis and Processing: A Comprehensive Reference, 1, Elsevier, Oxford, UK, 2003. [15] D. de Souza, J. Chanussot, A. Favre, P. Borgnat, A modified time-frequency method for testing wide-sense stationarity, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012, pp. 3409–3412. [16] K. Dressler, Sinusoidal extraction using an efficient implementation of a multi-resolution fft, in: Proceedings of International Conference on 9th International Conference on Digital Audio Effects (DAFx-06), Montreal, QC, Canada, 2006, pp. 247–252. [17] T. Fillon, J. Prado, A flexible multi-resolution time-frequency analysis framework for audio signals, in: Proceedings of 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), Montreal, QC, Canada, 2012, pp. 1124–1129. [18] D. de Souza, J. Chanussot, A. Favre, P. Borgnat, A new nonparametric method for testing stationarity based on trend analysis in the time marginal distribution, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014 , Florence, Italy, 2014, pp. 320–324. [19] P. Flandrin, Time-Frequency/Time-Scale Anaylsis, Wavelet Analysis and its Applications, Academic Press, San Diego, CA, 1999. [20] C.W. Therrien, Some considerations for statistical characterization of nonstationary random processes, in: Proceedings of Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002., 2, 2002, pp. 1554–1558. [21] F. Olver, et al., NIST Handbook of Mathematical Functions, Cambridge University Press, Cambridge, UK, 2010. [22] D. Thomson, Spectrum estimation and harmonic analysis, Proceedings of the IEEE 70 (9) (1982) 1055–1096. [23] T. Alexandrov, S. Bianconcini, E.B. Dagum, P. Maass, T.S. McElroy, A review of some modern approaches to the problem of trend extraction, Econom. Rev. 31 (6) (2012) 593–624. [24] A. Moghtaderi, P. Flandrin, P. Borgnat, Trend filtering via empirical mode decompositions, Comput. Stat. Data Anal. 58 (2013) 114–126. [25] R. Vautard, M. Ghil, Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series, Phys. D: Nonlinear Phenom. 35 (3) (1989) 395–424. [26] N.E. Huang, et al., The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A 454 (1971) (1998) 903–995. [27] G. Rilling, P. Flandrin, P. Gonçalvès, On empirical mode decomposition and its algorithms, in: Proceedings of the 6th IEEE/EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP ’03), Grado, Italy, 2003. [28] P. Flandrin, G. Rilling, P. Goncalves, Empirical mode decomposition as a filter bank, Signal Process. Lett. IEEE 11 (2) (2004) 112–114. [29] P. Guhathakurta, M. Rajeevan, Trends in the rainfall pattern over india, Int. J. Climatol. 28 (2008) 1453–1469. [30] A.C. Davison, D.V. Hinkley, Bootstrap Methods and their Application (Cambridge Series in Statistical and Probabilistic Mathematics), 1, Cambridge University Press, 1997. [31] S.N. Lahiri, Theoretical comparisons of block bootstrap methods, Ann. Stat. 27 (1999) 386–404. [32] P. Hall, J.L. Horowitz, B.-Y. Jing, On blocking rules for the bootstrap with dependent data, Biometrika 82 (3) (1995) 561–574. [33] S. Coles, An Introduction to Statistical Modeling of Extreme Values, Springer, 2001. [34] S. Kotz, Extreme Value Distributions: Theory and Applications, World Scientific Publishing Company, 2001. [35] B. Hobijn, P.H. Franses, M. Ooms, Generalizations of the KPSS-test for stationarity, Stat. Neerl. 58 (4) (2004) 483–502. [36] A.H. Sayed, Adaptive Filtering, John Wiley & Sons, Hoboken, NJ, 2009. [37] A. Giaralis, P.D. Spanos, Derivation of non-stationary stochastic process compatible with seismic response/design spectra, in: Proceedings of the 6th International Conference on Computational Stochastic Mechanics, Rhodes, Greece, 2010, pp. 1–13. [38] F. Lombard, Rank test for changepoint problems, Biometrika 74 (3) (1987) 615–624.

252

D. Baptista de Souza et al. / Signal Processing 143 (2018) 241–252

[39] J.-F. Quessy, A.-C. Favre, M. Saïd, M. Champagne, Statistical inference in Lombard’s smooth-change model, Environmetrics 22 (7) (2011) 882–893. [40] H. Aksoy, Use of gamma distribution in hydrological analysis, Turkish J. Eng. Env. Sci. 24 (6) (20 0 0) 419–428. [41] F.J. Harris, On the use of windows for harmonic analysis with the discrete fourier transform, Proc. IEEE 66 (1) (1978) 51–83.

[42] P. Flandrin, Maximum signal energy concentration in a time-frequency domain, in: Acoustics, Speech and Signal Processing (ICASSP), 1988 IEEE International Conference on, 4, 1988, pp. 2176–2179. New York, NY [43] A. Zempléni, Goodness-of-fit test in extreme value applications, Discussion paper, 2004.