On the clustering of foF2 time series corresponding to disturbed ionospheric periods

Available online at www.sciencedirect.com Advances in Space Research 45 (2010) 1129–1144 www.elsevier.com/locate/asr On the clustering of foF2 time ...

Download PDF

491KB Sizes 1 Downloads 27 Views

Report

PDF Reader
Full Text

Available online at www.sciencedirect.com

Advances in Space Research 45 (2010) 1129–1144 www.elsevier.com/locate/asr

On the clustering of foF2 time series corresponding to disturbed ionospheric periods K. Koutroumbas, I. Tsagouri *, A. Belehaki National Observatory of Athens, Institute for Space Applications and Remote Sensing, Metaxa and Vas. Pavlou Str., P. Penteli 15236 Greece Received 28 July 2009; received in revised form 11 January 2010; accepted 12 January 2010

Abstract This paper presents an analysis of a set of time series that represent foF2 disturbances during storm conditions, using clustering tools. The time series under study have been drawn from ionospheric observations obtained from eight European middle latitude ionosondes during a signiﬁcant number of storm-time intervals and they are divided into eight groups according to the latitude (middle to low and middle to high) and the local time of the observation point at storm onset (afternoon, evening, morning, prenoon). The time series in each group have been analyzed using clustering-based methods. Speciﬁcally, each time series has been represented using two diﬀerent ways of representation: the ﬁrst is the raw representation while the second is through the parameters of the autoregressive (AR) model that best represents it. For each representation a hierarchy of clusterings is produced via the complete link algorithm. The two produced hierarchies are combined to a single one and the ﬁnal clustering results are extracted from the produced hierarchy. The obtained results are in close agreement with the theoretical formulations concerning ionospheric storm eﬀects at middle latitudes. In addition, they may be proved useful in the development of more accurate ionospheric forecasting methods. Ó 2010 COSPAR. Published by Elsevier Ltd. All rights reserved. Keywords: Ionospheric disturbances; Ionospheric modeling and forecasting; Clustering methods

1. Introduction The accurate ionospheric predictions (especially in the short and the medium term) remain a challenge for the space weather community. In addition, the requirement for such accurate ionospheric predictions becomes more demanding, taking into account that the state of the ionosphere inﬂuences signiﬁcantly the performance of several operational applications aﬀected by space weather, e.g. HF communications, Earth observation systems, and satellite positioning and navigation systems. Consequently, numerous attempts have been recorded in recent years for the development of reliable ionospheric forecasting techniques for operational use. Most of the eﬀorts lie in the emphasis on the F region peak electron density (NmF2) and on the total electron content (TEC). Both parameters, together with the scale height, *

Corresponding author. Tel.: +30 210 8109104; fax: +30 210 6138343. E-mail address: [email protected] (I. Tsagouri).

aﬀect the shape of the electron density proﬁle. This is crucial for the speciﬁcation of the conditions in the topside ionosphere and plasmasphere, where a large number of space assets reside. An overview of ionospheric models available for space weather purposes has been recently published (Belehaki et al., 2009, and references therein). Ionospheric storms represent an extreme form of space weather and therefore, a large number of papers have been devoted to studies of the ionospheric storm eﬀects even from the early days of ionospheric research (e.g. Pro¨lss, 1995; Buonsanto, 1999, and references therein). From both experimental and theoretical studies, a basic picture of global ionospheric storms has now emerged, pointing out the eﬀect of the solar wind energy input, which is captured by the magnetosphere, transformed and dissipated in the polar upper atmosphere, in triggering and driving of the ionospheric storm eﬀects (Pro¨lss, 1995, 2005). Critical for the prediction of foF2 disturbances during periods of enhanced geomagnetic activity (magnetic storms) is the

0273-1177/$36.00 Ó 2010 COSPAR. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.asr.2010.01.016

1130

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

local time (LT) and the latitudinal dependence of the ionospheric response (e.g. Pro¨lss, 1995). Both are clearly evident even in the same regional zone. As a consequence, foF2 experiences a large spatial and temporal variability. Thus, the major ambiguity in the ionospheric forecasting issue occurs during geomagnetically disturbed conditions. Based on recent advances, a storm-time empirical model, called STIM, for the formulation of the ionospheric storm-time response at middle latitudes was recently introduced (Tsagouri and Belehaki, 2008). STIM’s formulation was realized by the systematic investigation of the eﬀect of solar wind disturbances observed in the Earth’s vicinity to the middle latitude ionosphere during intense (200 nT 6 Dst < 100 nT) or big (Dst < 200 nT) storm events (Gonzalez et al., 1999). The model provides a correction factor to the quiet diurnal ionospheric variation during storm conditions. Storm conditions are determined by monitoring IMF conditions at the L1 point with the NASA Advance Composition Explorer (ACE) spacecraft. In particular, the model is triggered by an alert signal for upcoming ionospheric disturbances obtained from the on-line analysis of the IMF’s observations. Quantitative criteria applied to (a) IMF magnitude, B, (b) its rate of change, dB/dt, and (c) IMF-Bz component, determine alert conditions and the storm onset at L1 point. This methodology is operationally implemented within the European Digital upper Atmosphere Server–DIAS system (Belehaki et al., 2006, 2007) to issue ionospheric alerts (http://dias.space. noa.gr). Then, as the ionospheric response evolves and recovers on its own time scales, STIM estimates the time delay in the ionospheric storm onset and formulates the ionospheric storm-time response by utilizing empirical expressions taking into account the latitude and the LT sector of the observation point at the storm onset. For the development of STIM, the IMF conditions at the L1 point and the corresponding ionospheric response as it was recorded by several European ionosondes during selected storm-time intervals were investigated via superimposed epoch analysis. To incorporate the latitudinal and the local time dependence of the ionospheric response in that analysis, the ionospheric observations were divided into eight groups based on the latitude and the LT sector of the observation point at storm onset time. In this paper, we further analyze the foF2 time series in each one of these eight groups, in order to see if they form physical groups (clusters). As STIM, in its current formulation, reﬂects the average ionospheric response in each latitudinal zone and storm onset sector, it would be useful to know if there are more than one pattern followed by foF2 during the evolution of the disturbance (for a given LT onset sector and latitudinal zone). This may provide some additional insight to the way the evolution of the foF2 disturbance unfolds, as well as some additional input regarding the ionospheric LT dependence, which may lead to more accurate foF2 predictions during disturbed periods. Moreover, for the purposes of the current investigation, a new quantitative approach based on clustering

theory aspects is applied. The success of the proposed methodology in capturing ﬁne features of the ionospheric response may provide the ionospheric community with additional tools for future investigations/developments. The paper is organized as follows: in Section 2 the methodology adopted is described (along with some basic concepts in clustering, which have been introduced in order to make the paper as self-contained as possible). Section 3 contains the clustering results along with some comments on them. Finally, Section 4 contains a discussion of the results and future work. 2. Time series clustering methodology In this work xi corresponds to a speciﬁc representation of the foF2 time series, with hourly time resolution, although the methodology given here is applied in a wide range of applications. Let X ¼ ffxi ðtÞgg; t ¼ 1; . . . ;

i ¼ 1; . . . ; N g

ð1Þ

be a set consisting of N time series. The length of the ith time series is denoted by li and it is not necessarily the same for all time series. 2.1. Time series representations Two diﬀerent ways for representing each time series are adopted for the purposes of the current paper. Raw representation: According to this representation, the ith time series is represented by the li vector T ½xi ð1Þ; xi ð2Þ; . . . ; xi ðli Þ , where T denotes the transpose. This can be considered as a “time domain representation”. Autoregressive (AR) representation: In this case each time series is assumed to be a stochastic process that is the outcome of an AR model. An AR model of order M is a linear prediction model, which (for the one-step ahead prediction considered in this work) is described by the equation xðt þ 1Þ ¼

M X

wj xðt j þ 1Þ þ eðt þ 1Þ

j¼1

where e(t + 1) is a white noise process with zero mean and ﬁnite variance (see e.g. Haykin, 2001). In words, the value of the time series at time t + 1, i.e. x(t + 1), is expressed as a linear combination of its M previous values, i.e. xðtÞ; . . . ; xðt M þ 1Þ plus an unknown quantity (noise). It can be shown that the best (in the mean square error sense) estimate of x(t + 1), ^xðt þ 1Þ, is given by the following equation ^xðt þ 1Þ ¼

M X j¼1

wj xðt j þ 1Þ

ð2Þ

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

Given a speciﬁc time series x(t), the parameters wj of the AR model of a given order M that best represents x(t) can be computed, e.g. via the Yule-Walker equations (see e.g. Haykin, 2001). The utilization of the AR time series representation in the present work is discussed next. For the ith time series, the AR models of orders M ¼ 1; 2; . . . ; M max (Mmax is chosen equal to the integer part of l6i ) are computed via the Yule–Walker equations. The performance of each of the Mmax AR models is evaluated via the mean square error (MSE) criterion, i.e. MSEM i ¼

l X 1 2 ðxi ðtÞ ^xi ðtÞÞ ; li M 3 t¼Mþa

M ¼ 1; . . . ; M max

1131

The proximity between two clusters Ci and Cj is expressed in terms of similarity or dissimilarity measures. In the former case, the higher the value of the measurement, the more similar the corresponding clusters are, while the opposite holds in the later case. Focusing on dissimilarity measures, the dissimilarity between two clusters Ci and Cj is deﬁned in terms of the dissimilarities (distances) between their elements. Typical examples, among others (see e.g. Theodoridis and Koutroumbas, 2009) are d min ðC i ; C j Þ ¼ min dðx; yÞ x2C i ;y2C j

and d max ðC i ; C j Þ ¼ max dðx; yÞ x2C i ;y2C j

ð3Þ

The one with the best performance (minimum MSE), whose order is denoted by Mi, is adopted for the representation of the ith time series. That is the ith time series is now represented by an Mi dimensional vector wi, which contains the Mi parameters of the AR model that best represent the time series. Note that diﬀerent time series may be represented by parameter vectors of diﬀerent dimensionality. AR representation has a ﬂavor of “frequency domain representation”, taking into account the close relation of the coeﬃcients of the AR model with the spectrum (Fourier transform of the autocorrelation function) (see e.g. Haykin, 2001). Clearly, this representation of the time series unravels diﬀerent aspects compared to those unraveled by the raw representation. Since in each of the above representations the vectors used to represent two time series may be of diﬀerent length, the Edit distance (see e.g. Chen et al., 2001) is used to measure their dissimilarity. For two vectors x ¼ ½x1 ; . . . ; xm T and y ¼ ½y 1 ; . . . ; y n T , the Edit distance is computed via the following recursion 2

rði; jÞ ¼ ðxi y j Þ þ minðrði 1; jÞ; rði 1; j 1Þ; rði; j 1ÞÞ ð4Þ for i = 1, . . ., m, j = 1, . . ., n, with r(0,0) = 0, r(0,j) = 1, j = 1, . . ., n and r(i,0) = 1, i = 1, . . ., m. The value of the Edit distance between x and y is r(m,n). Note that other measures for the comparison of xi and yi, such as the absolute value of their diﬀerence jxi y i j, can also be used in Eq. (4). 2.2. Clustering 2.2.1. Basic concepts Some basic deﬁnitions of the clustering theory that will be proved useful in our purposes are given next. A clustering Rm of a set X of N entities, is a set of m nonempty disjoint sets (clusters) C1,C2, ..., Cm, whose union is X. In addition, each cluster contains entities that are “less dissimilar” to each other while diﬀerent clusters contain entities that are “more dissimilar” to each other.

The deﬁnition of d(x,y) is application dependent. For example, if the elements of the data set X are d-dimensional vectors, the squared Euclidean distance may be adopted, i.e. dðx; yÞ ¼

d X

ðxi y i Þ2

i¼1

If x and y are vectors of diﬀerent length, the Edit distance could be employed. A clustering Rj is said to be nested in a clustering Ri if (a) Ri contains less clusters than Rj and (b) each cluster of Rj is a subset of a cluster in Ri. For example, if X ¼ fx1 ; . . . ; x5 g; R1 ¼ ffx1 ; x2 g; fx3 g; fx4 ; x5 gg is nested to R1 ¼ ffx1 ; x2 g; fx3 g; fx4 ; x5 gg, while it is not nested to R3 ¼ ffx1 ; x3 g; fx2 g; fx4 ; x5 gg and R4 ¼ ffx1 ; x2 g; fx4 g; fx3 ; x5 gg. In this work agglomerative hierarchical clustering algorithms have been adopted. Such algorithms provide us with a hierarchy of N nested clusterings. That is, the clustering at the ith level of hierarchy is nested to the clusterings at levels i þ 1; . . . ; N 1 (see Fig. 1(b)). Such a hierarchy of clusterings can give us a better feeling about the way the entities under study aggregate, in contrast to a single clustering solution produced by other clustering algorithms, such as the k-means and the fuzzy c-means (see Theodoridis and Koutroumbas, 2009). For example, more than one clusterings may ﬁt well the clustering structure of the data set at hand. This may be seen in the data set of Fig. 1(a), where both ffx1 ; x2 g; fx3 ; x4 g; fx5 ; x6 gg and ffx1 ; x2 ; x3 ; x4 g; fx5 ; x6 gg clusterings ﬁt well to the clustering structure of the data. Example 1. Consider the data set X ¼ fx1 ; . . . ; x6 g, where xi’s are two-dimensional vectors. Speciﬁcally, it is x1 ¼ ½0 10T , x2 ¼ ½0 9T , x3 ¼ ½0 15T , x4 ¼ ½0 0T , x5 ¼ ½0 10T , x6 ¼ ½0 8T , where T denotes the transpose (see Fig. 1(a)). In such, rather clear, case most of the agglomerative hierarchical algorithms would provide us with the hierarchy of clusterings shown in Fig. 1(b). On the contrary, e.g. the k-means algorithm (for 3 clusters), will probably return the single R3 clustering shown in Fig. 1(b) (this example is continued later). In the sequel we give a brief description of the concepts of the proximity dendrogram and the cophenetic matrix. In

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

R5={{x1,x2,x3,x4,x5,x6}}

200

R4={{x1,x2,x3,x4},{x5,x6}}

100

...

...

x6

... R3={{x1,x2},{x3,x4},{x5,x6}}

...

x5

x2

...

x1

...

...

1132

4

R2={{x1,x2},{x3,x4},{x5},{x6}}

2.25 1 R1={{x1,x2},{x3},{x4},{x5},{x6}} R0={{x1},{x2},{x3},{x4},{x5},{x6}} 0

x3 x4

(a)

x1

x2

(b)

x3

x4

x5

x6

(c)

Fig. 1. (a) The data set X of example 1. (b) The clustering hierarchy produced by most of the agglomerative hierarchical clustering algorithms when applied on X. (c) The dissimilarity dendrogram produced by the application of the complete link algorithm on X.

addition, a short description of the so called complete link algorithm is given. Proximity dendrogram: A hierarchy of nested clusterings is usually visualized via a proximity dendrogram. It is a tree like the one shown e.g. in Fig. 1(c), which takes into account the level of dissimilarity where two clusters are merged for the ﬁrst time (see Theodoridis and Koutroumbas, 2009). Level 0 corresponds to the clustering that contains N clusters each one containing a single entity of X. Level i corresponds to the clustering that contains Ni clusters where (a) one of them is the result of the merging of two clusters of the i1 level and (b) the rest are the same with the remaining clusters in the i1 level clustering. The two clusters of the i1 level that are merged in the ith level are joined through a branch that indicates their dissimilarity. For example, clusters fx1 ; x2 g and fx3 ; x4 g in the dendrogram of Fig. 1(c) are merged at dissimilarity level 100. Clearly, the last N–1 level corresponds to the clustering that consists of a single cluster containing all the time series (this will be the root of the tree (dendrogram)). Cophenetic matrix: Each dendrogram can also be uniquely represented by the so called cophenetic matrix. This is an N N matrix DC, whose (i,j) element, dC(i,j), equals the dissimilarity level where the ith and jth time series join at the same cluster for the ﬁrst time. For example, in Fig. 1(c), dC(1,4) = 100 since the entities x1 and x4 are merged at dissimilarity level 100 (which is their dissimilarity). Note that, by construction, the cophenetic matrix has N distinct entries. In addition, the entries of a cophenetic matrix satisfy the so-called ultrametric property, i.e. d C ði; rÞ P maxfd C ði; jÞ; d C ðj; rÞg

ð5Þ

Note that there are also other matrix representations of a dendrogram that satisfy the ultrametric property (see Mirzaei et al., 2008).

Agglomerative hierarchical algorithms – Complete link: The general strategy adopted by the agglomerative hierarchical clustering algorithms is the following: The ﬁrst clustering (level 0) contains N clusters, each one containing a single entity. The clustering of the ith level consists of (a) a cluster which is formed by the merging of the two closest clusters of the (i1) level and (b) all the rest clusters of the (i1) level. Diﬀerent ways of measuring the dissimilarity between two clusters give rise to several agglomerative hierarchical algorithms such as the complete link, the single link, the UPGMA, UPGMC, Ward’s algorithm etc (see e.g. Theodoridis and Koutroumbas, 2009). In the complete link algorithm, the dissimilarity between two clusters Ci and Cj is deﬁned as dðC i ; C j Þ ¼ maxx2Ci ;y2Cj dðx; yÞ

ð6Þ

i.e., the dissimilarity between two clusters is deﬁned as the distance between their two most distant elements. Example 1 (cont.): If the squared Euclidean distance between vectors and the dissimilarity measure between clusters deﬁned by Eq. (6) are adopted, the complete link algorithm gives the hierarchy shown in Fig. 1(b) and the corresponding dissimilarity dendrogram is shown in Fig. 1(c). From the later it follows that R1 is formed at dissimilarity level 1 since the dissimilarity between the closest clusters in R0 ðfx1 g; fx2 gÞ equals to 1. Similarly, R4 is formed at dissimilarity level 100 since the dissimilarity between the closest clusters in R3 ðfx1 ; x2 g; fx3 ; x4 gÞ is dðfx1 ; x2 g; fx3 ; x4 gÞ ¼ maxðdðfx1 ; x3 gÞ; dðfx1 ; x4 gÞ; dðfx2 ; x3 gÞ; dðfx2 ; x4 gÞÞ ¼ dðfx1 ; x4 gÞ ¼ 100: The cophenetic matrix that corresponds to the dissimilarity dendrogram of Fig. 1(c) is

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

2

0

6 6 1 6 6 100 6 6 DC ¼ 6 100 6 6 100 6 6 4 200 200

1

100

100

200

0

100

100

200

100 100

0 0

2:25 200 2:25 200

100 2:25 200 200

0 200

200 0

200

200

4

200

200

3

1133

Dq , are derived. Then all the entries in each Dj are normalized in the [0, 1] interval via the transformation

7 200 7 7 200 7 7 7 200 7 7 200 7 7 7 4 5 0

Dj ðr; sÞ ¼

Dj ðr; sÞ minðDj Þ maxðDj Þ minðDj Þ

ð8Þ

where Dj ðr; sÞ is the (r,s) entry of the cophenetic matrix, maxðDj Þ and mixðDj Þ are the maximum and the minimum elements of matrix Dj , respectively and Dj(r,s) is the (r,s) entry of the jth normalized cophenetic matrix. In the sequel the distance matrix

2.2.2. Combination of clustering hierarchies Let us consider a set X of N entities which is subject to clustering. Suppose that entity i is represented by a set of m1 measurements, which form a m1 dimensional vector T ½xi1 ; . . . ; xim1 , i = 1, . . ., N. The same entity can be represented by another set of m2 measurements (not necessarily the same with those in the previous representation) which T form an m2 dimensional vector ½y i1 ; . . . ; y im2 , i = 1, . . ., N. Clearly, in the above spirit, several representations of the same entities can be produced. Each such representation focuses on diﬀerent characteristics of the entities. Thus, provided that a suitable measure of dissimilarity between two vectors that correspond to a given representation has been adopted, applying a clustering algorithm on each such representation of the N entities it is likely to end up with diﬀerent clusterings or hierarchies of clusterings. However, since all the produced clusterings or clustering hierarchies are based on diﬀerent views of the same data set, they all convey (in principle) useful information for the clustering structure of X. Thus, a reasonable way to utilize all of them is to combine them to a single clustering or a single hierarchy of clusterings. While the subject of a combination of several clusterings of the same data set to a single one has been studied to some extent (e.g. Topchy et al., 2005; Fern and Brodley, 2003; Theodoridis and Koutroumbas, 2009), little work has been done for the combination of hierarchies of clusterings. One such work is discussed in Mirzaei et al. (2008). In the sequel a method for combining hierarchies of clusterings, which is almost identical with the one given in Mirzaei et al. (2008) is described. Assume that q diﬀerent representations are used for the N entities of the data set X. Let h iT j j j X j ¼ xi ¼ xi1 ; . . . ; ximj ; i ¼ 1; . . . ; N ; j ¼ 1; . . . ; q

Dmixt ¼

q 1X Dj q j¼1

ð9Þ

is deﬁned and a hierarchical clustering algorithm is applied on Dmixt. Clearly, the produced clustering has taken into account all q hierarchies that correspond to the q diﬀerent representations. Some comments on the above method for combining hierarchies of clusterings are now in order. Note that, by deﬁnition, all elements of Dmixt are in the [0, 1] interval. Although Dmixt has been produced by the averaging of cophenetic matrices, it is not necessarily cophenetic itself. The normalization step is adopted in the above method in order to avoid problems that may arise due to the diﬀerent scaling in the proximity dendrograms D1 ; . . . ; Dq . Note that entities that join at the same cluster at low (high) dissimilarities in all the “ingredient” dendrograms are also likely to join at the same cluster in the combined dendrogram at low (high) dissimilarities. For example, in Fig. 2, entities 8 and 13 which join at the same cluster at low dissimilarity levels in both dendrograms (a) and (b), they join at the same cluster at low dissimilarity level also in the combined dendrogram (c). The opposite holds for entities 8 and 9. On the other hand, if two entities r and s join at the same cluster at low dissimilarities d jrs ; j ¼ 1; . . . ; q1 in the ﬁrst q1 “ingredient” dendrograms and at high dissimilarities d jrs ; j ¼ q1 þ 1; . . . ; q, in the rest q q1 hierarchies, then in the combined dendrogram they will join at the same cluster at a dissimilarity level which will be a (probably weighted) average of the above dissimilarities.

ð7Þ

be the set that corresponds to the jth representation of X. This set contains N mj dimensional vectors such that the vector xji is the jth representation of the ith entity. A hierarchical clustering algorithm (such as single link, complete link, UPGMA, UPGMC, etc) is applied to each one of the sets Xj, j = 1, . . ., q and q proximity dendrograms denoted by D1 ; . . . ; Dq are produced. For these dendrograms the corresponding cophenetic matrices, denoted by D1 ; . . . ;

2.2.3. Choice of the most representative clustering A clustering hierarchy H for a given data set X of N entities gives a “good feeling” of how the entities of X aggregate. However, sometimes it is useful to know which clustering (or clusterings) of H represents better the clustering structure of X, according to a quantitative method. In other words, one can ask for the best point to cut the dendrogram representing H, since cutting the dendrogram at a speciﬁc level identiﬁes a single clustering.

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

1

1

0.9

0.9

0.8

0.8

0.7

0.7

dissimilarity

dissimilarity

1134

0.6 0.5 0.4

0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1 0 6

12

3

5

4

1

2

9

10

8

13

11

6

7

8

1

2

time series identities

10

13

3

11

12

4

5

9

7

time series identities

(a)

(b) 1.2

1

1.1 0.9

1

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

0.9 0.8 0.7 0.6 0.5 0.4

0.2

0.3

0.1

0.2 1

2

3

12

6

4

5

9

10

8

13

11

7

0

5

10

15

20

25

time series identities

time (hours)

(c)

(d)

30

35

40

Fig. 2. MtHA group: (a and b) The dendrograms produced by the complete link algorithm when (a) the raw representation and (b) the AR representation is adopted. (c) The dendrogram produced by combining the previous two dendrograms. (d) The MtHA time series. Dashed line is used for time series 7.

Two such methods are proposed in Boberg and Salakoski (1993) and also discussed in Theodoridis and Koutroumbas (2009). In this work, an extended version of one of them is used. Speciﬁcally, according to the method adopted here, the most representative clustering of X in H should contain clusters that exhibit “low dissimilarity” among its members. As a measure of the “degree of dissimilarity” in a cluster C the following is used hðCÞ ¼ maxfDði; jÞ; i; j 2 Cg

ð10Þ

where D(i,j) is the dissimilarity between the ith and the jth entity. In addition, a threshold of dissimilarity h is also employed and the criterion of the most representative clustering of X in H may be stated as: “Choose the tth clustering of H if there exists a cluster C in the (t + 1)th clustering with h(C) > h”. In our case, h is deﬁned as

h ¼ l þ kr

ð11Þ

where l and r are the mean and the standard deviation of the dissimilarities between the entities of X and k is a user deﬁned parameter. In this work, k scans a range of values and for each value of k the level of hierarchy t at which the dendrogram is cut is obtained. Then the number of times each clustering has been selected is counted and the clustering that has been selected most of the times (excluding the 0 level, R0, and the N–1 level, RN–1) is considered as the most likely to represent the data set under study. Sometimes it is useful to consider also the second frequently selected clustering, especially if it has been selected a signiﬁcant number of times. 2.3. Clustering methodology for the foF2 time series In this work, the entities under study are the foF2 time series. For each one of the eight groups of time series,

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

the raw and the AR representations discussed in Section 2.1 are adopted and the complete link hierarchical algorithm is applied for each of these two representations. The resulting hierarchies are then combined to a single clustering hierarchy via the method discussed in Section 2.2.2. Finally, the clustering in the produced hierarchy that is most representative of the clustering structure of the data set under study is determined according to the method discussed in Section 2.2.3, where k scans the range [3, 3] with step 0.01. Thus, for each one of the 601 values of kð3; 2:99; . . . ; 2:99; 3Þ, the most representative clustering will be identiﬁed. Then the clustering (or clusterings) that have been selected most of the times are likely to capture the clustering structure for each one of the eight data sets (the related histograms are given in Fig. 10). 3. Experimental results For the purposes of the current investigation, ionospheric observations obtained during 30 intense or big storm events (listed in Table 1) from eight European ionosondes (listed in Table 2) were analyzed. As it is mentioned earlier, the observations were grouped as for the development of STIM (Tsagouri and Belehaki, 2008). More precisely, to include the latitudinal dependence of the ionospheric response in STIM’s methodology, two latitudi-

Table 1 List of storm events used for the purposes of the present analysis. Time interval

Min. Dst (nT)

9–12 March 1998 25–27 June 1998 26–28 August 1998 24–26 September 1998 12–17 November 1998 12–17 January 1999 17–21 February 1999 22–24 September 1999 21–24 October 1999 6–8 April 2000 15–17 July 2000 17–19 September 2000 6–10 November 2000 30 March–1 April 2001 17–19 April 2001 17–18 August 2001 27–30 October 2001 5–8 November 2001 24–27 November 2001 11–13 May 2002 7–9 September 2002 29–31 May 2003 17–20 August 2003 19–22 November 2003 21–23 January 2004 6–9 November 2004 9–11 November 2004 14–18 May 2005 12–14 June 2005 31 August–1 September 2005

116 101 155 207 131 112 123 173 237 288 301 193 159 387 114 105 157 292 221 110 181 144 148 422 149 373 289 263 106 131

1135

Table 2 List of ionospheric stations. Station

Geographic longitude (°E)

Geographic latitude (°N)

Athens Chilton El Arenosillo Juliusruh Rome San Vito Soﬁa Tortosa

23.8 358.7 353.3 13.4 12.5 17.8 23.4 0.3

38.1 51.6 37.1 54.6 41.8 40.6 42.7 40.4

nal zones were distinguished: (i) the middle to high latitudinal zone for latitudes greater than 45°N, and (ii) the middle to low latitudinal zone for latitudes from 30° to 45°N. To include the local time dependence of the ionospheric response, the ionospheric observations were further divided based on the LT sector of the observation point at storm onset time. For this purpose, four LT onset sectors were deﬁned as follows: evening (18:00–00:00 in LT), morning (00:00–06:00 in LT), prenoon (06:00–12:00 in LT) and afternoon (12:00–18:00 in LT). In this respect, all available time series (92 in total) were divided into eight groups according to the latitude and local time of the observation point at storm onset. Here it is useful to note that there may be no available data from all eight stations for a given ionospheric storm event. The resulting groups are described below as follows: (a) middle to high latitude, afternoon (MtHA), (b) middle to high latitude, evening (MtHE), (c) middle to high latitude, morning (MtHM), (d) middle to high latitude, prenoon (MtHP), (e) middle to low latitude, afternoon (MtLA), (f) middle to low latitude, evening (MtLE), (g) middle to low latitude, morning (MtLM), (h) middle to low latitude, prenoon (MtLP). All time series are sampled on an hourly basis. The number of available time series at each group as well as the maximum length of a time series in a group is shown in Table 3. Here it is worth clarifying the following points: (i) each time series corresponds to the ionospheric storm-time response recorded over a single location for a particular storm event from the onset to full recovery. Thus, the length of the time series speciﬁes the duration of the disturbance. Table 3 veriﬁes that this duration depends strongly on the latitude and the LT of the obser-

Table 3 Details on the data sets. Group Middle Middle Middle Middle Middle Middle Middle Middle

to to to to to to to to

high latitude, afternoon (MtHA) high latitude, evening (MtHE) high latitude, morning (MtHM) high latitude, prenoon (MtHP) low latitude, afternoon (MtLA) low latitude, evening (MtLE) low latitude, morning (MtLM) low latitude, prenoon (MtLP)

No. of time series

Max. series length (h)

13 11 14 8 17 9 16 4

39 29 25 25 24 24 18 12

1136

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

The clustering results for each group are discussed next (in the sequel a clustering containing k clusters will also be met under the name “k-cluster clustering”). MtHA: In this case, strong negative eﬀects are recorded at all stations/events (Fig. 2(d)). Visual inspection of all three dendrograms does not expose any speciﬁc clustering structure on the time series apart from the fact that the time series 7 seems to diﬀer systematically from all the rest. However, this may be attributed to missing values (15 out of 39) detected at the beginning and the end of the particular time series. This is critical since the missing values correspond to the onset, the initial evolution and the recovery of the ionospheric disturbance. Indeed, the ionospheric storm intensity and part of the recovery are successfully captured by the available records of time series 7 (see dash-dotted line in Fig. 2(d)). Thus, one can conclude that all MtHA time series (except for time series 7) are aggregate to a single

1

1

0.9

0.9

0.8

0.8

dissimilarity

dissimilarity

vation point, which supports further the validity of the initial classiﬁcation of the time series to eight groups; (ii) as the ionosondes locations may diﬀer by up to 2 h in LT, it is possible for time series obtained for the same event but from diﬀerent locations to belong to diﬀerent groups; (iii) the ionospheric response is expressed with the ratio foF2obs/foF2med, where foF2obs stands for the observed values of the foF2 and foF2med for monthly median estimates. Figs. 2–8 correspond to MtHA, MtHE, MtHM, MtHP, MtLA, MtLE, MtLP, MtLP time series and each one of them shows the dendrograms obtained from (a) the complete link algorithm when it is applied on the raw representation of the time series, (b) the complete link algorithm when it is applied on the AR representation of the time series, (c) the combination of the above two dendrograms, (d) the time series in each one of the data sets. More speciﬁcally, in (d), time series from diﬀerent clusters are drawn with diﬀerent lines.

0.7 0.6 0.5 0.4

0.7 0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1

5

11

1

8

4

9

6

2

3

10

0

7

6

10

time series identities

5

1

8

2

3

4

7

9

11

time series identities

(a)

(b) 1.3

1 1.2

0.9

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3 0.2

1 0.9 0.8 0.7 0.6 0.5

0.1 0

1.1

4

9

1

8

6

2

3

10

7

5

11

0.4

0

5

10

15

time series identities

time (hours)

(c)

(d)

20

25

Fig. 3. MtHE group: (a–c) See caption of Fig. 2. (d) The MtHE time series. Dashed line is used for time series 5 and 11.

30

1

1

0.9

0.9

0.8

0.8

dissimilarity

dissimilarity

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

0.7 0.6 0.5 0.4

0.7 0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

7

8

10

6

1

2

3

5

9

11

14

4

12

0

13

1137

2

3

4

12

time series identities

1

14

6

8

5

7

10

9

13

11

time series identities

(a)

(b) 1.3

1 1.2 0.9 1.1

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

1 0.9 0.8 0.7 0.6

0.2 0.5

0.1 0

0.4 4

12

13

1

2

3

14

6

8

5

7

10

9

11

0

5

10

15

time series identities

time (hours)

(c)

(d)

20

25

Fig. 4. MtHM group: (a), (b), (c) See caption of Fig. 2. (d) The MtHM time series. Dashed line is used for time series 4 and 12, while dotted line is used for time series 13.

cluster. This is also veriﬁed by the method for selecting the most representative clustering. In Fig. 10(a) it is shown that the 2-cluster clustering has been selected 31.33% of the time over the considered values of k note that one cluster in this clustering contains only time series 7). MtHE: Prolonged negative eﬀects characterize the ionospheric response in this case too (Fig. 3(d)). Visual inspection of Fig. 3(c) suggests that all the time series except for 5 and 11 form a single cluster. Time series 11 is far apart from the above cluster while time series 5 lies “somewhere in the middle between the cluster and the time series 11”. This is a consequence of two facts: (i) according to the raw representation time series 5 and 11 are grouped together at very low dissimilarity, indicating similar temporal evolution of the ionospheric response, and (ii) according to the AR representation time series 11 lies far away from all the other time series,

indicating signiﬁcant diﬀerence in the followed disturbance pattern. The method for selecting the most representative clustering also suggests that time series 11 form a single cluster. Note however, that the clustering where each one of the 5 and 11 time series form single clusters (second most frequently selected clustering) has been selected for a signiﬁcant number of times. In Fig. 10(b) it is shown that the 2 and 3-cluster clusterings have been selected 26.67% and 11.83% of the time over the considered values of k, respectively. Visual inspection of all time series veriﬁes that time series 5 and 11 are partially diﬀerentiated from all the others, since they record a second negative phase at the recovery phase of the other. An important diﬀerence between them is due to a sharp spike that appears in time series 5 that (most probably) is attributed to autoscaling errors. Taking into account the above, one can argue that time series 5 and 11 form a second cluster. However, since (a) the

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

1

1

0.9

0.9

0.8

0.8

dissimilarity

dissimilarity

1138

0.7 0.6 0.5 0.4

0.7 0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1 1

3

7

4

5

6

8

0

2

4

5

time series identities

8

2

1

3

6

7

time series identities

(a)

(b) 1

1 0.9

0.9

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3 0.2

0.8

0.7

0.6

0.5

0.1 5

8

6

1

3

4

2

7

0.4

0

5

10

15

time series identities

time (hours)

(c)

(d)

20

25

Fig. 5. MtHP group: (a–c) See caption of Fig. 2. (d) The MtHP time series. Dashed line is used for the time series 7.

two time series follow suﬃciently the pattern of expansion and recovery of the ionospheric disturbance followed by the others, and (b) diﬀerentiated due to the recording of a second event, the 2- and 3-cluster clustering may be considered of very low importance for the purposes of the current investigation. MtHM: Negative eﬀects are the most prominent feature of the ionospheric response in this particular case, although some deviations of the general trend are also observed (Fig. 4(d)). Visual inspection of Fig. 4(c) ends up with the following two suggestions: (a) all the time series except for 4, 12 and 13 are aggregated to a single cluster. In addition, 4 and 12 form another cluster, while 13 may be considered as a third single element cluster. (b) The time series form the following four clusters {{4, 12}, {13}, {1, 2, 3, 5, 6, 8, 14}, {7, 9, 10, 11}}. The quantitative method for selecting the most representative clustering suggests also that time series 4, 12 and

13 form a single cluster while all the rest form another cluster (see the 2-cluster clustering in Fig. 10(c)). Note however that the 4-cluster clustering has also been selected a signiﬁcant number of times. In Fig. 10(c) it is shown that the 2 and 4-cluster clusterings are selected 19.83% and 15.50% of the time over the considered values of k, respectively. Visual inspection of the time series support the 2-cluster conﬁguration, although it seems once again of low importance for the same reasons as in the MtHE case. MtHP: Strong negative eﬀects are also observed in this case (Fig. 5(d)). Visual inspection of Fig. 5(c) provides the following two suggestions: (a) all time series (except perhaps time series 7) aggregate to a single cluster and (b) the time series under study form the following four clusters {{1, 3, 4}, {5, 6, 8}, {2}, {7}}. The method for selecting the most representative clustering is in agreement with the above conclusions. In Fig. 10(d) it

1

1

0.9

0.9

0.8

0.8

0.7

0.7

dissimilarity

dissimilarity

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

0.6 0.5 0.4

0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

5

6

2

3

4

1 10 14 13 17

7 15

8

0

9 12 11 16

1139

10 13

8 17

time series identities

5 11 12 14

1

6

9

7

2

3 16

4 15

time series identities

(a)

(b) 1.6

1 0.9

1.4

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

1.2

1

0.8

0.6

0.2 0.4 0.1 0

13 17 10

8 12 14

1

7

9

5

6

2

3

4 15 11 16

0.2

0

5

10

15

time series identities

time (hours)

(c)

(d)

20

25

Fig. 6. MtLA group: (a–c) See caption of Fig. 2. (d) The MtLA time series. Dashed line is used for time series 11 and 16, while dotted line is used for time series 2, 3 and 4.

is shown that the 2- and 4-cluster clusterings are selected 22.50% and 21.33% of the time over the considered values of k, respectively. Visual inspection of the time series indicates also that while no signiﬁcant diﬀerences are detected in the raw representation of the time series, there are noticeable diﬀerences in their temporal evolution. For this reason the 2- or 4-clusters clustering structure suggested is accepted in this case. MtLA: Negative eﬀects mark the general trend in ionospheric response for this latitudinal zone and LT onset sector although there are noticeable diﬀerences in the full evolution of the ionospheric response (Fig. 6(d)). Visual inspection of Fig. 6(c) ends up with the following two suggestions (a) the 3-cluster solution {{1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 17}, {2, 3, 4, 15}, {11, 16}} and (b) the 6-cluster case solution {{1, 7, 8, 9, 10, 12, 13, 14, 17}, {5, 6},{2, 3, 4}, {15}, {11}, {16}}. The method for selecting the most representative clustering is in agreement with

the above conclusions. In Fig. 10(e) it is shown that the 3- and 6-cluster clusterings are selected 17.17% and 13.83% of the time over the considered values of k, respectively. The inspection of the time series supports the 3-cluster clustering structure, as three general trends are identiﬁed for the ionospheric response: (i) negative eﬀects followed by positive eﬀects of short duration, (ii) negative eﬀects followed by positive eﬀects of long duration, and (iii) prolonged negative storm eﬀects observed during the whole disturbance period. MtLE: Strong, but of shorter duration, negative eﬀects are observed from all stations/events in this case (Fig. 7(d)). Visual inspection of Fig. 7(c) suggests that time series 3 and 9 form a cluster, while the rest of the time series form another. However, in the later cluster time series 1 seems to be a bit distant from the other time series of the cluster. The method for selecting the most representative clustering strongly suggests that 3 and 9

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144 1

1

0.9

0.9

0.8

0.8

0.7

0.7

dissimilarity

dissimilarity

1140

0.6 0.5 0.4

0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1 2

3

4

6

5

7

8

9

0

1

5

6

time series identities

4

1

2

7

8

3

9

time series identities

(a)

(b) 1.3

1 1.2

0.9

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

1 0.9 0.8 0.7 0.6

0.2

0.5

0.1 0

1.1

4

6

5

7

2

8

1

3

9

0.4

0

5

10

15

time series identities

time (hours)

(c)

(d)

20

25

Fig. 7. MtLE group: (a–c) See caption of Fig. 2. (d) The MtLE time series. Dashed line is used for time series 3 and 9, while dotted line is used for time series 1.

form a single cluster and less (but still signiﬁcantly) strongly that 1 forms a single element cluster. In Fig. 10(f) it is shown that the 2- and 3-cluster clusterings are selected 26.83% and 16.50% of the time over the considered values of k, respectively. Visual inspection of the time series shows that time series 1 exhibits a peculiar behavior about 10 h after the disturbance onset that lasts several hours. The observed pattern may be attributed to artiﬁcially generated ionospheric records probably due to autoscaling failures. Apart from this point, time series 1 seems very similar to time series 3 and 9. This analysis supports the 2-cluster clustering. However, as the diﬀerences of time series 1, 3, and 9 from the others are joined by periods of non signiﬁcant (less than 20% over median conditions) ionospheric disturbances the 2-cluster conﬁguration is not of signiﬁcant importance in this case.

MtLM: The ionospheric response at middle latitudes seems more complicated for cases where storm onset occurred in the morning sector, as strong positive eﬀects are observed in some cases while no disturbance eﬀects are detected in others (Fig. 8(d)). Indeed, visual inspection of Fig. 8(c) suggests that two main clusters are formed. The one consists of the time series 4, 5, 14, 11, 12, 15 that reﬂect positive disturbance eﬀects, while the second cluster contains the other time series (that exhibit no disturbance) excluding time series 9. This is probably due to the fact that time series 9 has many missing values (8 out of 18) at crucial periods, indicating that time series 9 may have limited contribution to the analysis. The method for selecting the most representative clustering strongly suggests that time series 9 forms a single element cluster and less (but still signiﬁcantly) strongly that time series 4, 5, 11, 12, 14, 15 form another

1

1

0.9

0.9

0.8

0.8

0.7

0.7

dissimilarity

dissimilarity

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

0.6 0.5 0.4

0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

9

15

5

11 12

4 14

1

6

10 16

8

2

7 13

0

3

1141

4

14

1

2

8

12 10 16

5

11 13 15

time series identities

time series identities

(a)

(b)

7

3

6

9

1.5 1 1.4

0.9

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

1.3 1.2 1.1 1 0.9

0.2 0.8

0.1 0

10 16

1

6

8

2

7

13

3

4 14

5 15 11 12

9

0.7

0

2

4

6

8

10

time series identities

time (hours)

(c)

(d)

12

14

16

18

Fig. 8. MtLM group: (a–c) See caption of Fig. 2. (d) The MtLM time series. Dashed line is used for time series 4, 5, 11, 12, 14 and 15, while dotted line is used for time series 9.

cluster. In Fig. 10(g), it is shown that the 2 and 3-cluster clusterings are selected 28.50% and 21.00% of the time over the considered values of k, respectively. All the above provide signiﬁcant evidence for the existence of 2-cluster clustering structure in this case. MtLP: Although the available time series in this set are too few, Fig. 9(c) suggests that time series 2 and 4 can be grouped together. Although time series 1 seems to form a single cluster, it follows the same general pattern (twospike positive eﬀects) with 2 and 4. Time series 3 seems to diﬀer signiﬁcantly from the rest as the positive ionospheric phase recorded initially is followed by weak negative eﬀects. It is worth noticing that the clustering suggestions produced by the method discussed in Section 2.2.3 are in very close agreement with the results produced by the visual inspection of the corresponding dendrograms. Moreover,

the clustering suggestions are in line with the physical aspects of the phenomenon under study (ionospheric storms). 4. Concluding remarks In this paper, a particular grouping of the foF2 time series that correspond to disturbed ionospheric periods is considered and an identiﬁcation of (possible) clustering structure of the time series in each group is carried out. More speciﬁcally, the time series of the ionospheric response recorded at eight European locations during 30 intense or big storm events are divided into eight groups according to the latitude and the LT of the observation point at storm onset, following basic aspects of STIM methodology (Tsagouri and Belehaki, 2008). For each time series two ways of representation are adopted: the raw representation and the AR representation (coeﬃcients of the

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144 1

1

0.9

0.9

0.8

0.8

0.7

0.7

dissimilarity

dissimilarity

1142

0.6 0.5 0.4

0.6 0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1 1

3

2

0

4

2

4

3

1

time series identities

time series identities

(a)

(b) 1.5

1 1.4

0.9

foF2obs / foF2med

dissimilarity

0.8 0.7 0.6 0.5 0.4 0.3

1.3

1.2

1.1

1

0.9

0.2 0.1 2

4

1

3

0.8 0

2

4

6

time series identities

time (hours)

(c)

(d)

8

10

12

Fig. 9. MtLP group: (a–c) See caption of Fig. 2. (d) The MtLP time series. Dashed line is used for time series 1, while dotted line is used for time series 3.

AR model that best represents the times series under study). For the time series in each of the above groups, two clustering hierarchies are produced using the complete link clustering algorithm. The ﬁrst is based on the raw time series representation, while the second is based on the AR time series representation. Then, the two hierarchies are combined to a single (ﬁnal) one and the clustering that attributes best the clustering structure of the given group is chosen among the clusterings of the ﬁnal clustering hierarchy. Before proceeding, we should say that the number of the available time series was small. Thus, statistical ﬂuctuations were unavoidable. In addition, several time series had missing values. Despite the above facts, generally speaking, the time series in all groups exhibit limited clustering structure. In many cases, all time series aggregate to a single cluster, while in other cases where the degree of clustering structure is higher the clusters contain a small

number of time series (1–3 time series). The limited clustering structure indicates that the latitude and the LT onset sector, as they are deﬁned within STIM methodology, specify to a signiﬁcant degree the middle latitude ionospheric response. However, a more detailed study of the experimental results gives also signiﬁcant information. MtLA and MtLM cases exhibit both a 3- and 2-cluster structure, while in the MtLP case, although only a few time series are available, a general trend for a 2-cluster structure may be identiﬁed. At higher middle latitudes, only in the MtHP case there is evidence for 2- or 4-cluster structure. Although in many cases the clusters contain a small number of time series (1–3 time series), the results are veriﬁed by interpretation of the time series in terms of ionospheric storm aspects. It is also important to point out that the only set where a (non-trivial) clustering structure is met is the MtLM, where the time series form two relatively well separated clusters of approximately equal size. All the above

0.4

0.4

0.35

0.35

Perentage. of trimes a clustering is selected

Perentage. of trimes a clustering is selected

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

0.3 0.25 0.2 0.15 0.1 0.05

0.3 0.25 0.2 0.15 0.1 0.05

0

0 4

6

8

10

12

1

2

3

4

5

6

7

8

9

Clustering identity

Clustering identity

(a)

(b)

0.4

0.4

0.35

0.35

Perentage. of trimes a clustering is selected

Perentage. of trimes a clustering is selected

2

0.3 0.25 0.2 0.15 0.1 0.05

10

11

0.3 0.25 0.2 0.15 0.1 0.05

0

0 4

2

6

8

10

12

14

1

2

Clustering identity

3

4

5

6

7

8

Clustering identity

(d)

(c) 0.4

0.4

0.35

0.35

Perentage. of trimes a clustering is selected

Perentage. of trimes a clustering is selected

1143

0.3 0.25 0.2 0.15 0.1 0.05

0.3 0.25 0.2 0.15 0.1 0.05

0

0 2

4

6

8

10

12

14

16

1

2

3

4

5

6

7

Clustering identity

Clustering identity

(e)

(f)

8

9

Perentage. of trimes a clustering is selected

0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0

2

4

6

8

10

12

14

16

Clustering identity

(g) Fig. 10. Histograms of the number of times a clustering is selected. Speciﬁcally the x-axis contains the identities of the clusterings (e.g. “3” stands for the 3cluster clustering), while the y-axis corresponds to the number of times a clustering has been selected. (a), (b), (c), (d), (e), (f), (g) correspond to MtHA, MtHE, MtHM, MtHP, MtLA, MtLE, MtLM, respectively.

1144

K. Koutroumbas et al. / Advances in Space Research 45 (2010) 1129–1144

support the argument that (a) the ionospheric response initiated in the prenoon sector (MtLP and MtHP) and (b) the ionospheric response at lower middle latitudes could be more reliably predicted if the underlying clustering structure were taken into account. Indeed, validation tests of STIM’s performance (where no clustering structure is taken into account) at several European locations provide less successful results for middle to low latitudes (Tsagouri and Belehaki, 2008). This probably suggests that for ionospheric modeling and forecasting purposes at lower middle latitudes one should consider a more sophisticated approach for the LT dependence or a more concise combination of all possible dependences. An interesting output of the present work is that the clustering approach adopted provides us with results that are in agreement with the physical aspects of the phenomenon under study (ionospheric storms). Furthermore, it was proved able to capture ﬁne aspects of the ionospheric storm-time response and this makes it a potentially useful tool for future developments. As a ﬁnal note, it is pointed out that other representations of the time series, such as wavelets, could also be used. Note that the periodogram representation has also been considered, but preliminary results were inferior compared to those obtained from the AR representation. Acknowledgement K.K. and I.T. acknowledge the Internal Grant of ISARS/NOA “Upgraded of ionospheric forecasting method implemented in DIAS system (IONOS)”. References Belehaki, A., Stanislawska, I., Lilensten, J. An overview of ionosphere – thermosphere models available for space weather purposes. Space Science Reviews 147 (3–4), 271–313, doi:10.1007/s11214-009-9510-0, 2009.

Belehaki, A., Cander, Lj., Zolesi, B., Bremer, J., Juren, C., Stanislawska, I., Dialetis, D., Hatzopoulos, M. Ionospheric speciﬁcation and forecasting based on observations from European ionosondes participating in DIAS project. Acta Geophysica 55 (3), 398–409, doi:10.2478/s11600-007-0010-x, 2007. Belehaki, A., Cander, Lj., Zolesi, B., Bremer, J., Juren, C., Stanislawska, I., Dialetis, D., Hatzopoulos, M. Monitoring and forecasting the ionosphere over Europe: the DIAS project. Space Weather 4, S12002, doi:10.1029/2006SW000270, 2006. Boberg, J., Salakoski, T. General formulation and evaluation of agglomerative clustering methods with metric and non-metric distances. Pattern Recognition 26 (9), 1395–1406, 1993. Buonsanto, M.J. Ionospheric storms – a review. Space Science Reviews 88, 563–601, 1999. Chen, G., Wei, Q., Zhang, H. Discovering similar time series patterns with fuzzy clustering and DTW methods. In: Smith, M.H., Gruver, W.A., Hall, L.O. (Eds.), Proc. of the Joint 9th Ifsa World Congress and 20th Naﬁps International Conference (ISBN: 0-7803-7078-3/01), Vancouver, BC, Canada, pp. 2160– 2164, 2001. Fern, X.Z., Brodley, C.E. Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th International Conference on Machine Learning (ICML-2003), Washington, DC, pp. 186–193, 2003. Gonzalez, W.D., Tsurutani, B.T., de Gonzalez, A.L.C. Interplanetary origin of geomagnetic storms. Space Science Reviews 88, 529–562, 1999. Haykin, S. Adaptive Filter Theory, fourth ed Prentice Hall, New Jersey, 2001. Mirzaei, A., Rahmati, M., Ahmadi, M. A new method for hierarchical clustering combination. Intelligent Data Analysis 12, 549–571, 2008. Pro¨lss, G.W. Ionospheric F-region storms, in: Volland, H. (Ed.), Handbook of Atmospheric Electrodynamics, vol. 2. CRC Press, Boca Raton, pp. 195–248, 1995. Pro¨lss, G.W. Space weather eﬀects in the upper atmosphere: low and middle latitudes. Lecture Notes in Physics 656, 193–214, 2005. Theodoridis, S., Koutroumbas, K. Pattern Recognition, fourth ed Academic Press, Burlington, USA, 2009. Topchy, A., Jain, A.K., Punch, W. Clustering ensembles: models of consensus and weak partitions. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (12), 1866–1881, 2005. Tsagouri, I., Belehaki, A. An upgrade of the solar wind driven empirical model for the middle latitude ionospheric storm time response. Journal of Atmospheric Solar-Terrestrial Physics 70 (16), 2061–2076, doi:10.1016/j.jastp. 2008.09.010, 2008.

On the clustering of foF2 time series corresponding to disturbed ionospheric periods

On the clustering of foF2 time series corresponding to disturbed ionospheric periods

Recommend Documents