Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003

Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003

Journal of Volcanology and Geothermal Research 251 (2013) 65–74 Contents lists available at SciVerse ScienceDirect Journal of Volcanology and Geothe...

2MB Sizes 0 Downloads 27 Views

Journal of Volcanology and Geothermal Research 251 (2013) 65–74

Contents lists available at SciVerse ScienceDirect

Journal of Volcanology and Geothermal Research journal homepage: www.elsevier.com/locate/jvolgeores

Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003 Roberto Di Salvo a, Placido Montalto b,⁎, Giuseppe Nunnari a, Marco Neri b, Giuseppe Puglisi b a b

Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, Università degli Studi di Catania, Facoltà di Ingegneria, Italy Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Catania, Osservatorio Etneo, Italy

a r t i c l e

i n f o

Article history: Received 29 April 2011 Accepted 17 February 2012 Available online 25 February 2012 Keywords: Data mining Features extraction Time series clustering Self-Organizing Maps Etna Summit and flank eruptions

a b s t r a c t Time series clustering is an important task in data analysis issues in order to extract implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas including geophysics environment research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in the field of research where different kinds of data are available. Moreover, the conventional time series clustering techniques do not provide desired results for geophysical datasets due to the huge amount of data whose sampling rate is different according to the nature of signal. In this paper, a novel approach concerning geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method allows finding coupling among trends of different geophysical data recorded from monitoring networks at Mt. Etna spanning from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of volcano and to define potential hazard assessment at Mt. Etna. © 2012 Elsevier B.V. All rights reserved.

1. Introduction Monitoring of active volcano uses a wide range of techniques and instrumentations. Its aim is the interpretation of data in order to discover patterns in geophysical and/or geochemical parameters before, during and after eruptions. Monitoring provides the means to answer questions of vital interest to communities affected by impending eruptions, such as when and where will the volcano erupt? What is the status of the volcano? Which areas are safe or dangerous? When will the eruptions cease? (e.g. McNutt et al., 2000). The answers may come from an optimal interpretation of data and give information for forecasting purposes. Unfortunately, the ability to answer these questions depends on an adequate scientific understanding of complex volcano dynamics, both in general and for each specific volcano. A primary role in volcano monitoring is to establish the level of background or baseline activity of each monitored parameter, against which anomalous activity can be measured (McGuire, 1992). Changes in geophysical and geochemical patterns are indicative of possible eruptive reactivation and include changes in seismicity, ground deformations, physical–chemical changes in fumaroles, emission rates of volcanic gases, anomalies in gravity and magnetic fields (Currenti et al., 2005). When combined with geological

⁎ Corresponding author. Tel.: + 39 0957165800. E-mail address: [email protected] (P. Montalto). 0377-0273/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jvolgeores.2012.02.007

mapping and dating studies to reconstruct comprehensive eruptive histories of high-risk volcanoes, these geo-indicators can help to reduce eruption-related hazards to life and property. Different techniques are used to monitor these parameters such as seismic activity, measures of ground deformation, geochemical data etc. Each type of data provides information related to processes which may be related to movement of molten rock or other precursory phenomena. Seismic activity is considered one of the best indicators of the evolution of volcanic activity and is one of the most common surveillance tools for active volcanoes (e.g. Di Grazia et al., 2009; Telesca et al., 2010). It is possible to model deformation caused by change of volume at a certain depth and modeling magma reservoirs, cracks and conduits (e.g. Linde et al., 1993; Van der Laat, 1996). GPS is the most suitable technique to measure ground deformations. However, it provides spot data, i.e. they refer to network vertices whose number rarely exceeds the order of tens in areas of hundreds, often thousands, of square kilometers (e.g. Palano et al., 2009). Microgravity studies, involving the measurement of small changes with time in the value of gravity at a network of stations with respect to a fixed base, are a valuable tool for mapping out the subsurface mass redistributions that are associated to volcanic activity (e.g. Rymer, 1996). Volcanic gas analysis and temperature measurements are also commonly performed. While geophysical precursors may yield different patterns related to the particular volcanic system, tectonic setting and physicochemical properties of the magma, geochemical precursory mostly depends on changing rates of magma degassing and interactions with shallow aquifers (Martini, 1996). Significant variations have been

66

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

observed prior to some eruptions but often the changes occurred with the onset of the eruption (McNutt et al., 2000). The proposed study is focused on Mount Etna (Italy), one of the most well-known active volcanoes on Earth, with an eruption record dating back to several centuries before Christ (Branca and Del Carlo, 2004; Behncke et al., 2005). The greatest challenge for volcanologists studying this volcano lies in the complexity of its geodynamic setting and in the comprehension of the eruptive dynamics. During recent decades, Mount Etna has become one of the most intensely monitored volcanoes of the world, furnishing significant progress in the eruption forecasts (e.g. Bonaccorso et al., 2004). Further, the abundance of geophysical signals produces a redundancy of data that complicates their interpretation. Moreover, information is often hidden by several noise sources and the search to recognize volcanic patterns has become a very complicated task. In order to analyze data recorded at Mt. Etna a data mining framework is applied. Data mining can be defined as the process of extracting implicit, previously unknown, and potentially useful information from a large collection of data where the size, complexity and the amount of data is very large for traditional analysis. These kinds of methods are designed to deal with “static” databases, where the ordering of records has nothing to do with the patterns of interest. Adding time dimension to a database produces a Time Series Database (TSDB) and introduces new aspects and challenges for the tasks of data mining and knowledge discovery. The new challenges include: 1) finding the most efficient representation of time series data; 2) measuring similarity of time series; 3) detecting change points in time series, time series classification and clustering. The application of these techniques in a natural environment is a central goal of modern processing systems. Intensive monitoring of recent eruptions at Mt. Etna has generated integrated time series of data related to several observations on ground deformation, gas emission, volcanic tremor and so on. Starting from this knowledge base, the interpretation of volcano state is a basic aim of volcano monitoring. Civil authorities and communities need to know when and where eruptions will occur, the kinds of geophysical phenomena that might occur, how long eruptions will last, and whether populations near the volcano will be affected by hazards. The aim of this paper is to create an efficient and effective representation of some of the 1996–2003 time series of Etna's geophysical signals, in order to discover useful time-related features and patterns on available data. 2. Recent eruptive activity of Mount Etna and the sampled period Mt. Etna (3329 m; Neri et al., 2008) is located in eastern Sicily, at the front of the Apennine-Maghrebian Chain, along the Malta escarpment and lies on clayish-sandy Pliocene–Pleistocene foredeep deposits (Lentini, 1982; Lanzafame et al., 1997, and references therein). Its near-continuous activity is characterized by eruptions at four summit craters named Voragine (VOR), North-East Crater (NEC), Bocca Nuova (BN) and South-East Crater (SEC), the last three respectively formed in 1911, 1968 and 1971 (Chester et al., 1985; Fig. 1). Moreover, the volcano produces flank eruptions that are mainly grouped along the NE, S and W rift zones (Fig. 1) (Acocella and Neri, 2003). In these latter cases, the eruptive fissures are typically related to the emplacement of dikes radiating from the central conduit of the volcano, with lateral transport of magma, constituting the central–lateral eruptions (Behncke and Neri, 2003a; Acocella et al., 2009). In minor cases, magma along the flanks of the volcano is also emplaced by the way of vertical dikes propagating from deep (>3 km) sources below, forming eccentric (or peripheral) eruptions (Acocella and Neri, 2003, 2009). During the past 400 years, the volcano has produced over sixty flank eruptions (Behncke et al., 2003, 2005), whereas the summit

craters have almost been continuously active (Branca and Del Carlo, 2004), characterized by persistent degassing and frequent explosive activity. The 1991–1993 flank eruption is the most voluminous eruption in the last three centuries (Calvari et al., 1994), both in terms of duration (472 days) and volume of lava erupted (about 235 ×10 6 m3). After this important eruption, the volcano showed a persistent degassing phase (April 1993–June 1995), then an almost continuous summit activity which increased in intensity from July 1995 to June 2001 (Behncke et al., 2003, 2006). The last phase culminated with the July–August 2001 flank eruption (Behncke and Neri, 2003b) followed, fourteen months later, by the more violent explosive and effusive October 2002–January 2003 flank eruption (Andronico et al., 2005). In the last two eruptions, the dike intrusions followed an eccentric path (i.e. bypassed the central conduit system; Lanzafame et al., 2003; Neri et al., 2005). Beginning with the July–August 2001 eruption, Etna predominantly created flank eruptions, as occurred in 2004–2005 (Burton et al., 2005; Neri and Acocella, 2006) and, partially, in the second half of 2006. Summit activity resumed at SEC in 2006 (Neri et al., 2006; Behncke et al., 2008), followed in 2007 by six short (several hours) episodes of lava fountaining occurring at the same summit crater (Patanè et al., 2008; Andronico et al., 2009). In 2008–2009, a long (~14 months) flank eruption emerged from the high west flank of the Valle del Bove (Aloisi et al., 2009; Cannata et al., 2009; Aiuppa et al., 2010; Bonaccorso et al., 2011a). Finally, up to September 2011, 14 short but violent lava fountainings characterized the SEC activity (Vicari et al., 2011; updates are available at: http://www.ct. ingv.it/). Mount Etna is also characterized by the seaward displacement of its eastern to southern flanks (Neri et al., 2009; Ruch et al., 2010; Solaro et al., 2010, and references therein), affecting an on-shore area of >700 km 2 (Neri et al., 2004). This unstable area is confined to the N by the Pernicana Fault System (PFS; Fig. 1a; Acocella and Neri, 2005) and to the SW by the Ragalna Fault System (RFS; Fig. 1a; Rust et al., 2005; Neri et al., 2007). A close relationship between flank deformation and eruptive activity has been noted/observed, highlighting the link between the major flank eruptions and the strongest deformation related to the flank movements (Acocella et al., 2003; Bonforte and Puglisi, 2003; Burton et al., 2005; Walter et al., 2005; Bonaccorso et al., 2006; Neri and Acocella, 2006; Bonforte et al., 2008; Neri et al., 2009; Solaro et al., 2010). In the last decade, the 2001 and 2002–2003 flank eruptions played a crucial role in the recent dynamics of Mount Etna, due to their eccentric magmatic dikes which caused the acceleration of the flank movements (Puglisi et al., 2008; Bonforte et al., 2009). In this paper, we focus on 1996–2003, that encompasses the moment of transition from the period characterized by summit eruptions and slow flank motions toward the period when important flank eruptions occurred, accompanied by stronger flank movements. During this period, different time series, extracted from the INGV database and related to survey GPS measurements, SO2 flux, clinometric data, seismicity and gravimetric data, were acquired (Fig. 2). Their analysis is difficult due to the great quantity and heterogeneity of data. In the following chapters we will show the method used to simplify the data and to study them in an efficient way.

3. Data analysis framework In order to reduce the size of information (signals, data, etc.), definition of peculiar features or properties by a “features extractor” may be useful. The main reasons to keep the pattern representation as small as possible are measurement cost and clustering accuracy. On the other hand, a reduction in the number of features may lead to a loss in the information content and thereby lower the accuracy of the resulting recognition system (Jain et al., 2000).

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

67

Fig. 1. 1995–2003 volcanic activity at Mt. Etna. Lava flows from the summit (SE) and flank (FE) eruptions are drafted in the main picture. The upper left inset (a) reports the main rifts, the unstable sector (1), its inferred frontal boundary (2), its motion (3), and its lateral boundaries (4), consisting of the Pernicana and the Ragalna Fault Systems (PFS and RFS respectively). (b) Eruptive periods. (c) Red line = monthly eruptive rate; blue dot = mean eruptive rate for the post-2001 flank eruptions. Modified from Neri et al. (2009).

68

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

and Independent Component Analysis (ICA; Guo et al., 2008). Moreover, various algorithms have been developed to cluster different types of time series data. The main approaches are first to convert a raw time series data into a feature vector of lower dimension and then apply a conventional clustering algorithm to the extracted feature vectors, hence called a feature-based approach (Warren Liao, 2005). In general, data recorded on an active volcano are related to different kinds of measurement and are acquired with different sampling rate, depending on the available instrumentation. The same problem occurs when data comes from discrete measurements with respect to continuous acquisition. As an example, GPS surveys are represented by a time series with different sampling rate compared to data as SO2 flux or number of earthquakes per day. In order to overcome this limitation, the proposed method takes into account time series trends during periods of interest and then, as will explained later, performs a clustering analysis in order to highlight relationship among the different data. A descriptive diagram of the proposed data mining approach is shown in Fig. 3. Once the data are collected in a TSDB, time series trends are extracted using a segmentation algorithm (Keogh et al., 2004) and then arranged in a set of features vectors constituting the so-called feature space. In the following section we discuss the choice of segmentation algorithm and vertical exploration of the time series trends. The former is a basic step to develop a reliable method to extract relevant features on the available time series, while the latter is performed in order to obtain a parametric features space on which a clustering algorithm is applied. 3.1. Feature extraction

Fig. 2. Time series collected in TSDB during the period spanning from 1996 to 2003. From the upper panel to the lower there are: cumulative areal dilatation, SO2 flux, tilt recorded station, number of daily earthquakes and gravity data.

Unlike feature selection techniques, that refer to algorithms that select the best subset of the input feature set, feature extraction methods create new features based on transformations or combinations of the original feature set. Then, although the terms feature selection and feature extraction are used interchangeably in the literature (Jain and Zongker, 1997), feature extraction generally precedes feature selection. The choice between feature selection and feature extraction depends on the application domain and the specific training data which is available. Since the extraction of relevant features preserves their original physical interpretation and leads to savings in measurement cost, the development of a features extractor is a crucial point. In addition, the retained features may be important for understanding the physical process that generates the patterns. In recent works, many feature extraction techniques have been developed such as linear predictive coding coefficient (Wilpon and Rabiner, 1985), normalized spectrum (Shaw and King, 1992), crosscorrelation function (Goutte et al., 1999), perceptually important point (Fu et al., 2001), time domain and frequency domain representation (Owsley et al., 1997), wavelet transform (Vlachos et al., 2003)

As with most computer science problems, representation of data is the key to efficient and effective solutions. A suitable representation of single-variable time series may be Piecewise Linear Approximation, where the original points are reduced to a set of straight lines, called segments, having different sampling rates spanning from hundreds of Hz to several days. Intuitively, Piecewise Linear Representation (PLR) refers to the approximation of a time series T, of length n, with K straight lines called segments. K is typically much smaller than n, so this representation makes the storage, transmission and computation of the data more efficient (Keogh et al., 2004). In the context of data mining, PLR may be used to support clustering, classification, indexing and association rule mining of time series data. The algorithms, which input a time series and return a PLR, are known as segmentation techniques. The aim of segmentation is to organize time series into a few intervals having uniform characteristics (flatness, linearity, modality, monotonic and so on). In particular, these algorithms are very useful in multivariate time series analysis where each series is acquired with different sampling rate. There are several techniques to segment a time series and they can be distinguished into off-line and on-line

Fig. 3. Knowledge extraction process from Time Series Data Base (TSDB) to clustering process. Briefly, data collected inside a TSDB are processed using segmentation techniques and then a parametric features space is obtained considering the time series trends. These steps lead to a data collection that will be the input pattern for the clustering algorithm.

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

69

approaches. The former approach works with a fixed prior known error threshold, while the latter uses a dynamic error threshold that changes during the execution of the algorithm following a specific criterion. Due to these characteristics, off-line algorithms are simple to realize but less effective than the on-line ones. The sliding window algorithm is an on-line algorithm and works by anchoring the left point of a potential segment to the first data point of a time series, then attempting to approximate the data to the right with increasingly longer segments. At some point i, the error for the potential segment is greater than the user-specified threshold, so the subsequence from the anchor to i-1 is transformed into a segment. The anchor is moved to location i, and the process repeats until the entire time series has been approximated by its PLR (Keogh et al., 2004). Threshold is calculated by summing all the vertical distance between the best-fit line along time series and the actual data points, and then dividing the result by the number of samples. In Fig. 4 the result of time series segmentation is shown. Once the segmentation task has been accomplished for different kinds of data, it is possible to find relationships among their trends. In order to generate a set of homogeneous features, all time series have been resampled considering the time scale of the segmented time series with the major number of samples. Vertical exploration on multiple segmented time series can be performed as follows. Firstly, for each segment the angular coefficient is calculated. Secondly, for each time interval, a feature vector, whose elements are the angular coefficients of the related segments, is defined. Using this approach we can describe time series in a ndimensional vector space that represents the features space. In Fig. 5 vertical exploration of five different segmented time series related to survey GPS measurements, SO2 flux, clinometric data (DAM Station; Fig. 1a), seismicity and gravimetric data (MMO Station; Fig. 1a) recorded during the period 1996–2003, is shown. After the segmentation processing, GPS data show a nearly-constant inflation of the volcanic edifice since 1996, with an evident acceleration in this trend just during the July 2001 eruption. July 2001 is also marked by numerous (several hundred) earthquakes (started just a month

Fig. 5. Vertical exploration of the analyzed time series. For each time interval a n-tuple of descriptive features is obtained considering the angular coefficient of corresponding segments along vertical direction (red line). This method provides trend variations of the time series in each time interval.

before the 2001 eruption), while a minor number of earthquakes characterized the 2002–2003 eruption. SO2 increased dramatically at the onset of the 2002–2003 eruption, preceded and followed by periods of very low SO2 emissions in September 2001–October 2002 and February 2003–January 2004, respectively. An abrupt change in the tilt signal at the onset of the 2002–2003 eruption is also recorded. Gravimetric data apparently did not show significant trends coupled with eruptive activity.

3.2. Features space clustering using SOM Fig. 4. Segmentation process of a time series. The trend approximation (red line) of the original time series (black line) is obtained using sliding window segmentation algorithm. The lower plot shows a detail of the trend approximation.

The Self-Organizing Map (SOM) (Kohonen, 1995), also known as Kohonen map, is a popular neural network approach based on unsupervised learning, widely applied in Data Mining, and represents a

70

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

general paradigm for knowledge extraction from a large and heterogeneous amount of data. SOM represents a useful tool in data exploration and visualization in the context of knowledge discovery applied on TSDB, in which the dimension, complexity or the amount of data has so far been prohibitively large for human observation alone. Self-organization refers to the ability of a biological or technical system to adapt its internal organization to structures sensed in the input of the system. SOM clustering technique is used to organize objects into groups such that similarity within the same cluster is maximized and similarities among different clusters are minimized (Jain et al., 1999). The SOM consists basically of two layers of so-called units or neurons. The input layer consists of N elementary computational units or neurons corresponding to the real-valued input data vector X of dimension N. These units are connected to a second layer of neurons U. By means of lateral connections the neurons in U form a lattice structure of dimensionality M which is typically much smaller than N (Ultsch, 2000) (Fig. 6a). In other words, by SOM the input data space R n is mapped onto a regular two-dimensional lattice of nodes that forms a two-dimensional grid of prototype vectors that are easier to visualize and explore than the original data. The most common lattice structure grids are the rectangular and hexagonal grid (Fig. 6b, c). Reference weight vectors, also called prototype vectors, mi = [mi1, mi2,…,miN], are associated with each neuron in the map and N is the dimensionality of the input data. The fundamentals of SOM are the competition between nodes in the output layer; the basic algorithm is “winner takes all” where not only one node (the winner) but also its neighbors are updated. When an input data vector is presented to the network, it responds with a unit in the output layer, called best-matching unit (BMU), closest to the presented input. From a mathematical point of view, at each iteration, the distance among prototype vector and the input space X are calculated. The BMU b represents the neuron with prototype vector closest to X satisfying the equation: ‖X−mb ‖ ¼ min f ‖X−mi ‖ g i

ð1Þ

The prototype vector of each neuron i is updated following the rule: mi ðt þ 1Þ ¼ mi ðt Þ þ α ðt Þ hbi ðt Þ ½X−mi ðt Þ

ð2Þ

where t is the time, α(t) is the learning rate, hbi is a neighbor function between neuron i and best matching unit b. Both the learning rate and

the neighbor function decrease monotonically over time. In particular, assuming hbi to be Gaussian, it can be expressed as: −jjyb −yi jj2 =ð 2σ 2 ðt Þ Þ

hbi ðt Þ ¼ e

ð3Þ

where rb and ri are the position of neuron i and b respectively, σ(t) is the neighborhood radius at time t. As a result of learning process, the presentation of all input vectors and the adaptation of weight vectors generate a mapping from input space onto the lattice output layer with the property that topological relationships in input space are preserved (Ritter and Schulten, 1986; Kohonen, 1989). As aforementioned, the aim of visualization is to present a large amount of detailed information in order to give a qualitative idea of data properties (Vesanto and Himberg, 1999; Vesanto, 2000). Typically, the number of properties that need to be visualized is much higher than the number of usable visual dimensions. For this reason the lowdimensional map obtained by SOM algorithm gives qualitative and visible regions that should be analyzed separately. A common way to visualize the presence of cluster after SOM learning process is the so-called Unified Distance Matrix (U-matrix). This method provides a color matrix that represents distances between neighboring map units showing the cluster structure of the map: high values of U-matrix indicate a cluster border while uniform areas of low values indicate clusters themselves (Ultsch, 1993). Clustering of a simple features space together with SOM Umatrix is shown in Fig. 7. More in detail, by SOM the three-dimensional features space (Fig. 7a) is projected in the two-dimensional lattice structure where the clusters are clearly showed as darker color regions in the U-matrix (Fig. 7b). While the use of U-matrix leads to a visual inspection of the cluster, automatic clustering of SOM can be used to quantify the number of discovered clusters (Vesanto and Alhoniemi, 2000). By studying the final U-Matrix map, a number of clusters can be identified by Kmeans algorithm (McQueen, 1967; Dubes and Jain, 1976) applied on the prototype vector obtained by the SOM. This task can be accomplished by using unsupervised clustering techniques such as K-means (McQueen, 1967; Dubes and Jain, 1976). The best clustering structure obtained by the K-means algorithm has been chosen according to the Davies–Bouldin performance index (Davies and Bouldin, 1979). Such an index is a function of the number of clusters, the inter-cluster and within-cluster distances. Formally it is defined as follows:  9 8 n
i

ð4Þ

j

Fig. 6. (a) 2D lattice structure of SOM. During the learning phase, BMU gets the largest weight update while units in the neighborhood of the MBU get a smaller update. The most common lattice structures are the rectangular (b) and hexagonal (c) grid.

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

71

Fig. 7. (a) A simple feature set in a three dimensional space. (b) SOM U-matrix after the learning process: uniform areas of low value indicate clusters while high values indicate cluster border. (c) Davies–Boouldin performance index indicating best clustering structure. (d) Clustered lattice structure of the SOM using K-means and DB index applied on the SOM prototype vectors.

Where n is the number of clusters, Sn is the average distance of all cluster objects to their cluster centers, S(Qi, Qj) is the distance between clusters Qi and Qj. Small values of DB correspond to compact clusters whose centers are far away from each other. In light of this, the number of clusters that minimizes DB is taken as the optimal number of clusters. The best clustering structure of the SOM, shown in Fig. 7b and obtained by K-means and Davies–Bouldin (Fig. 7c) performance index, is reported in Fig. 7d.

4. Time series clustering and data interpretation We have applied the described steps to cluster a set of five different time series recorded on Mt. Etna during 1996–2003 (GPS survey, SO2 flux, clinometric data, seismicity and gravimetric data). Clustering of the segmented time series has been performed using the SOM toolbox (Vesanto and Himberg, 1999).

Fig. 8. (a) U-matrix obtained using the time series trends features space. (b–c) the DB index suggests a best structure of three clusters.

72

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

During the training phase each group of neurons is placed in the Umatrix and generates the first structure of clusters by means of features extracted from time series vertical exploration. Through a visual inspection of U-matrix, it is possible to recognize clusters in the feature space (Fig. 8a): high values indicate a cluster border, while uniform areas with low values indicate clusters themselves. By studying the final U-Matrix map, and the underlying feature planes of the map, a number of clusters can be identified by K-means algorithm applied on the SOM prototype vectors. The best clustering structure provided by K-means algorithm together with Davies–Bouldin index suggests the existence of three clusters (Fig. 8b). The best clustering structure projected onto the SOM provides the clustered lattice structure reported in Fig. 8c. Starting from these early results, a labeling system has been used in order to interpret the obtained clusters. The association of labels to the input data provides an additional input in which the information about time of time series is stored. In order to assign each cluster to time intervals, BMU was evaluated for each input pattern used for SOM training. According to the considered period, we found that one cluster is related to signals recorded between 2001 eruption and 2002–2003 eruption, while the other two clusters are related to data recorded before the 2001 eruption and from the 2002–2003 eruption to the end of the dataset (Fig. 8c). The plots shown in Fig. 9 facilitate the inspection of the connection between the multivariate time series and the trained SOM, representing the data from a different point of view. In particular, a new set of input patterns, not used for SOM training, was considered. The aim of this step is the evaluation of the SOM response to unknown input patterns. The picture shows how different regions of the SOM are related to the input feature vectors, indicating how a region of the map is connected to the time series. In accordance with the best clustering structure of the SOM (Fig. 8c), the multiple time series are represented using different colors. To be more precise, the green points are related to the time window from 1996 to 2001, while the blue points are related to the time windows from the start of the 2001 flank eruption to the period before the 2002–2003 flank eruption. Finally, the red points indicate the period concerning the 2002–2003 flank eruption to the end of the available dataset. During the transition periods between the three time windows, a set of labels “x” indicate input vectors which are not recognized by the SOM. Data shown in Fig. 9 testifies the good results of our approach in which input patterns were properly classified. The time window 1996–2001 represents a period characterized by exclusively summit eruptions (Fig. 1b) and relatively slow displacements of the eastern and southern flanks of Etna. In such a period, the earthquake activity was negligible, while GPS increased slowly and several peaks in SO2 were essentially synchronized with summit eruptions. All these parameters are compatible with a progressive inflation and pressurization of the volcano (Allard et al., 2006; Bonforte et al., 2008; Neri et al., 2009; Bonaccorso et al., 2011b). The 2001 and, partially, the 2002–2003 eruptions represent two key events in the diagram reported in Fig. 9; these eruptions were both picked in the earthquake time series (see especially the onset of the 2001 eruption), and this is a reasonable fact looking the type of eruptive events (they were both eccentric and central–lateral eruptions; Neri et al., 2005), highly fed (Fig. 1b), accompanied by the development of several km-long dry and eruptive fractures, and significant accelerations of the motion of the unstable flanks of Etna (Acocella and Neri, 2003; Neri et al., 2004, 2009; Solaro et al., 2010). The GPS signal highlighted a clear variation close to the onset of the 2001 eruption only (Puglisi et al., 2008), while SO2 showed the highest values during the 2002–2003 eruption (Andronico et al., 2005). Also tilt signal showed an abrupt increase at the onset of the 2002–2003 eruption, while in the other periods the tilt variations permit less clear interpretation. Finally, gravimetric signal exhibited long-term variations, the most significant one pointed before the 2001 eruption (Greco et al., 2010; Bonaccorso et al., 2011b).

Fig. 9. Reconstruction of time series using unknown dataset. The different colors indicate how a region of the map is connected to the time series: the green points are related to the time window from 1996 to 2001, blue points are related to the time windows from the start of the 2001 flank eruption to the period before the 2002–2003 flank eruption, red points indicate the period concerning the 2002–2003 flank eruption to the end of the available dataset. Finally, labels “x” indicate input vectors which are not recognized by the SOM.

All these geophysical signals demonstrate a general deflation started at the onset of the 2001 flank eruption. This event marks an abrupt change in the eruptive style of the volcano, which passed from exclusively summit activity (in the period 1994–2001) to prevalent flank eruptions, clearly linked to the dynamics of the unstable flanks (Solaro et al., 2010, and references therein). In conclusion, the geophysical signals changed significantly not at the same time, but in different periods of the time window here analyzed, and with different intensities. Previously, this factor has made the contemporary dealing of these kinds of data, so different and heterogeneous to each other, very difficult. SOM analysis seems able to overcome this dilemma, as shown in Fig. 9, since it is able to distinguish several data-sets characterized by different volcano-tectonic behavior.

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

5. Conclusions In this work we have proposed a novel approach for multivariate time series analysis by using some data mining techniques starting from a Time Series Database (TSDB) of geophysical data recorded at Mt. Etna in the period 1996–2003. In order to extract knowledge from available information, a set of different methods has been used such as Piecewise Linear Representation (PLR), features extraction, self-organizing maps and K-means clustering. The results have been interpreted taking into account the activity related to Mt. Etna before and after the 2001 flank eruption which is regarded as a key/crucial event for the behavior of the volcano (Puglisi et al., 2008; Bonforte et al., 2009). Indeed, after 2001 most eruptive activity has almost only involved the flanks of the volcano rather than the summit craters area. To be more precise, up until 2000, Etna inflated with a deformation rate that progressively reduced with time, nearly stopping entirely between 1998 and 2000; moreover, low-eruptive rate summit eruptions occurred, punctuated by lava fountains (Fig. 1c). After 2001, Etna deflated, feeding highereruptive rate flank eruptions (Fig. 1c), along with large displacements of the entire eastern flank (Fig. 1a). The results obtained using our method provide further evidence about these two behaviors of the volcano, in which the higher rate of magma stored before July 2001 has triggered the emplacement of the dyke responsible for the 2001 and 2002–2003 eruptions, accelerating the deformations affecting the eastern flank (Neri et al., 2005, 2009; Solaro et al., 2010). The developed work represents a powerful tool to cluster time series of markedly heterogeneous geophysical signals, and hence for volcanic hazard assessment at Etna. The obtained results will be considered as a starting point for further and more accurate analyses such as classification tasks in which, given a set of values, they can be assigned to a specific cluster. Moreover, it would be interesting to have feedback on original data to understand why a set of input data is placed in a specific cluster.

Acknowledgments We would like to thank Dr. T. Caltabiano, Dr. S. Gambino and Dr. C. Del Negro for providing geochemical, clinometric and microgravity data used in the clustering analysis. A special thanks to Dr. A. Cannata for his useful discussion about the preparation of the manuscript. We thank Steve Conway for revising the English text. Finally, we acknowledge the editor and reviewers for their helpful advice. This work was partially funded by INGV and the DPC-INGV project “Flank”.

References Acocella, V., Neri, M., 2003. What makes flank eruptions? The 2001 Etna eruption and the possible triggering mechanisms. Bulletin of Volcanology 65, 517–529. doi:10.1007/ s00445-003-0280-3. Acocella, V., Neri, M., 2005. Structural features of an active strike-slip fault on the sliding flank of Mt. Etna (Italy). Journal of Structural Geology 27/2, 343–355. doi:10.1016/ j.jsg.2004.07.006. Acocella, V., Neri, M., 2009. Dike propagation in volcanic edifices: overview and possible developments. Special Issue: Gudmundsson — Volcanoes. Tectonophysics 471, 67–77. doi:10.1016/j.tecto.2008.10.002. Acocella, V., Behncke, B., Neri, M., D'Amico, S., 2003. Link between major flank slip and eruptions at Mt. Etna (Italy). Geophysical Research Letters 30 (24), 2286. doi:10.1029/2003GL018642. Acocella, V., Neri, M., Sulpizio, R., 2009. Dike propagation within active central volcanic edifices: constraints from Somma-Vesuvius, Etna and analogue models. Bulletin of Volcanology 71, 219–223. doi:10.1007/s00445-008-0258-2. Aiuppa, A., Cannata, A., Cannavò, F., Di Grazia, G., Ferrari, F., Giudice, G., Gurrieri, S., Liuzzo, M., Mattia, M., Montalto, P., Patanè, D., Puglisi, G., 2010. Patterns in the recent 2007–2008 activity of Mount Etna volcano investigated by integrated geophysical and geochemical observations. Geochemistry, Geophysics, Geosystems 11, Q09008. doi:10.1029/2010GC003168. Allard, P., Behncke, B., D'Amico, S., Neri, M., Gambino, S., 2006. Mount Etna 1993–2005: anatomy of an evolving eruptive cycle. Earth-Science Reviews 78, 85–114.

73

Aloisi, M., Bonaccorso, A., Cannavò, F., Gambino, S., Mattia, M., Puglisi, G., Boschi, E., 2009. A new dike intrusion style for the Mount Etna May 2008 eruption modelled through continuous tilt and GPS data. Terra Nova 21 (4), 316–321. Andronico, D., Branca, S., Calvari, S., Burton, M.R., Caltabiano, T., Corsaro, R.A., Del Carlo, P., Garfì, G., Lodato, L., Miraglia, L., Muré, F., Neri, M., Pecora, E., Pompilio, M., Salerno, G., Spampinato, L., 2005. A multi-disciplinary study of the 2002–03 Etna eruption: insights into a complex plumbing system. Bulletin of Volcanology 67, 314–330. doi:10.1007/s00445-004-0372-8. Andronico, D., Scollo, S., Cristaldi, A., Ferrari, F., 2009. Monitoring ash emission episodes at Mt. Etna: the 16 November 2006 case study. Journal of Volcanology and Geothermal Research 180, 123–134. Behncke, B., Neri, M., 2003a. Cycles and trends in the recent eruptive behaviour of Mount Etna (Italy). Canadian Journal of Earth Sciences 40, 1405–1411. doi:10.1139/E03-052. Behncke, B., Neri, M., 2003b. The July–August 2001 eruption of Mt. Etna (Sicily). Bulletin of Volcanology 65, 461–476. doi:10.1007/s00445-003-0274-1. Behncke, B., Neri, M., Carniel, R., 2003. An exceptional case of lava dome growth spawning pyroclastic avalanches at Mt. Etna (Italy): the 1999 Bocca Nuova eruption. Journal of Volcanology and Geothermal Research 124, 115–128. doi:10.1016/S0377-0273(03) 00072-6. Behncke, B., Neri, M., Nagay, A., 2005. Lava flow hazard at Mount Etna (Italy): new data from a GIS-based study. In: Manga, M., Ventura, G. (Eds.), Kinematics and dynamics of lava flows: Geol. Soc. Am. Spec. Pap., 396, pp. 187–205. doi:10.1130/0-8137-23965.189. Behncke, B., Neri, M., Pecora, E., Zanon, V., 2006. The exceptional activity and growth of the Southeast Crater, Mount Etna (Italy), between 1996 and 2001. Bulletin of Volcanology 69, 149–173. doi:10.1007/s00445-006-0061-x. Behncke, B., Calvari, S., Giammanco, S., Neri, M., Pinkerton, H., 2008. Pyroclastic density currents resulting from the interaction of basaltic magma with hydrothermally altered rock: an example from the 2006 summit eruptions of Mount Etna, Italy. Bulletin of Volcanology 70, 1249–1268. doi:10.1007/s00445-008-0200-7. Bonaccorso, A., Calvari, S., Coltelli, M., Del Negro, C., Falsaperla, F., 2004. Etna volcano laboratory. Geophysical Monograph series of American Geophysical Union. Bonaccorso, A., Bonforte, A., Guglielmino, F., Palano, M., Puglisi, G., 2006. Composite ground deformation pattern forerunning the 2004–2005 Mount Etna eruption. Journal of Geophysical Research 111, B12207. doi:10.1029/2005JB004206. Bonaccorso, A., Bonforte, A., Currenti, G., Del Negro, C., Di Stefano, A., Greco, F., 2011a. Magma storage, eruptive activity and flank instability: inferences from ground deformation and gravity changes during the 1993–2000 recharging of Mt. Etna volcano. Journal of Volcanology and Geothermal Research 200, 245–254. doi:10.1016/j.jvolgeores.2011.01.001. Bonaccorso, A., Bonforte, A., Calvari, S., Del Negro, S., Di Grazia, G., Ganci, G., Neri, M., Vicari, A., Boschi, E., 2011b. The initial phases of the 2008–2009 Mt. Etna eruption: a multi-disciplinary approach for hazard assessment. Journal of Geophysical Research 116, B03203. doi:10.1029/2010JB007906. Bonforte, A., Puglisi, G., 2003. Magma uprising and flank dynamics on Mt. Etna volcano studied by GPS data (1994–1995). Journal of Geophysical Research 108 (B3), 2153. doi:10.1029/2002JB001845. Bonforte, A., Bonaccorso, A., Guglielmino, F., Palano, M., Puglisi, G., 2008. Feeding system and magma storage beneath Mt. Etna as revealed by recent infation/deflation cycles. Journal of Geophysical Research 113. doi:10.1029/2007JB005334. Bonforte, A., Gambino, S., Neri, M., 2009. Intrusion of eccentric dikes: the case of the 2001 eruption and its role in the dynamics of Mt. Etna volcano. Tectonophysics 471, 78–86. doi:10.1016/j.tecto.2008.09.028. Branca, S., Del Carlo, P., 2004. Eruptions of Mt Etna during the past 3,200 years: a revised compilation integrating the historical and stratigraphic records. In: Bonaccorso, A., Calvari, S., Coltelli, M., Del Negro, C., Falsaperla, S. (Eds.), Etna: Volcano Laboratory: AGU Geophysical Monograph Series, 143, pp. 1–27. Burton, M., Neri, M., Andronico, D., Branca, S., Caltabiano, T., Calvari, S., Corsaro, R.A., Del Carlo, P., Lanzafame, G., Lodato, L., Miraglia, L., Muré, F., Salerno, G., Spampinato, L., 2005. Etna 2004–05: an archetype for geodynamically-controlled effusive eruptions. Geophysical Research Letters 32. doi:10.1029/2005GL022527. Calvari, S., Coltelli, M., Neri, M., Pompilio, M., Scribano, V., 1994. The 1991–93 Etna eruption: chronology and lava flow field evolution. Acta Vulcanologica 4, 1–14. Cannata, A., Montalto, P., Privitera, E., Russo, G., Gresta, S., 2009. Tracking eruptive phenomena by infrasound: May 13, 2008 eruption at Mt. Etna. Geophysical Research Letters 36. doi:10.1029/2008GL036738. Chester, D.K., Duncan, A.M., Guest, J.E., Kilburn, C.R., 1985. Mount Etna. The anatomy of a volcano. Chapman and Hall, London, pp. 1–404. Currenti, G., Del Negro, C., Lapenna, V., Telesca, L., 2005. Scaling characteristics of local geomagnetic field and seismicity at Etna volcano and their dynamics in relation to the eruptive activity. Earth and Planetary Science Letters 235, 96–106. Davies, D.L., Bouldin, D.W., 1979. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1. Di Grazia, G., Cannata, A., Montalto, P., Patanè, D., Privitera, E., Zuccarello, L., Boschi, E., 2009. A multiparameter approach to volcano monitoring based on 4D analyses of seismo-volcanic and acoustic signals: the 2008 Mt. Etna eruption. Geophysical Research Letters 36. doi:10.1029/2009GL039567. Dubes, R., Jain, A.K., 1976. Clustering techniques: the user's dilemma. Pattern Recognition 8, 247–260. Fu, T.-C., Chung, F.L., Ng, V., Luk, R., 2001. Pattern discovery from stock time series using self-organizing maps. KDD 2001 Workshop on Temporal Data Mining, San Francisco, pp. 27–37. Goutte, C., Toft, P., Rostrup, E., 1999. On clustering fMRI time series. NeuroImage 9, 298–310. Greco, F., Currenti, G., Del Negro, C., Napoli, R., Budetta, G., Fedi, M., Boschi, E., 2010. Spatiotemporal gravity variations to look deep into the southern flank of Etna volcano. Journal of Geophysical Research 115, B11411. doi:10.1029/2009JB006835.

74

R. Di Salvo et al. / Journal of Volcanology and Geothermal Research 251 (2013) 65–74

Guo, C., Jia, H., Zhang, N., 2008. Time series clustering based on ICA for stock data analysis. 4th International Conference on Wireless Communications, Networking and Mobile Computing, 2008. Dalian, China, pp. 1–4. Jain, A.K., Zongker, D., 1997. Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence 19. doi:10.1109/34.574797. Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: a review. ACM Computing Surveys 264–323. Jain, A.K., Duin, R.P.W., Mao, J., 2000. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1. Keogh, E., Chu, S., Hart, D., Pazzani, M., 2004. Segmenting time series: a survey and novel approach. In: Last, M., Kandel, A., Bunke, H. (Eds.), Data mining in time series database. World Scientific Publishing Company, pp. 1–21. Kohonen, T., 1989. Self-Organization and Associative Memory. Springer-Verlag, New York. Kohonen, T., 1995. Self Organizing Maps. Springer, Berlin, Heidelberg. Lanzafame, G., Leonardi, A., Neri, M., Rust, D., 1997. Late overthrust of the AppenineMaghrebian Chain at the NE periphery of Mt. Etna, Italy. Comptes Rendus de l' Academie des Sciences Paris 324, 325–332. Lanzafame, G., Neri, M., Acocella, V., Billi, A., Funiciello, R., Giordano, G., 2003. Structural features of the July–August 2001 Mount Etna eruption: evidence for a complex magma supply system. Journal of the Geological Society of London 160, 531–544. doi:10.1144/0016-764902-151. Lentini, F., 1982. The geology of the Mt. Etna basement. Memorie della Società Geoligica Italiana 23, 7–25. Linde, A.T., Agustsson, K., Sacks, I.S., Stefansson, R., 1993. Mechanism of the Hekla Iceland 1991 eruption from continuous borehole strain monitoring. Nature 365, 737–741. Martini, M., 1996. Chemical characters of the gaseous phase in different stages of volcanism: precursors and volcanic activity. In: Scarpa, Tilling (Ed.), Monitoring and Mitigation of Volcano Hazards, pp. 199–219. McGuire, W.J., 1992. Monitoring active volcanoes: procedures and prospects. Proceedings of the Geologists' Association (ISSN: 0016-7878) 103, 303–320. doi:10.1016/S00167878(08)80128-9. McNutt, S.R., Rymer, H., Stix, J., 2000. Synthesis of volcano monitoring. In: Sigurdsson, et al. (Ed.), Encyclopaedia of Volcanoes, pp. 1165–1183. McQueen, J., 1967. Some methods for classification and analysis of multivariate observations. Fifth Berkeley Symposium on mathematics, Statistics and Probability, 1, pp. 281–298. Neri, M., Acocella, V., 2006. The 2004–05 Etna eruption: implications for flank deformation and structural behaviour of the volcano. Journal of Volcanology and Geothermal Research 158, 195–206. doi:10.1016/j.jvolgeores.2006.04.022. Neri, M., Acocella, V., Behncke, B., 2004. The role of the Pernicana Fault System in the spreading of Mount Etna (Italy) during the 2002–2003 eruption. Bulletin of Volcanology 66, 417–430. doi:10.1007/s00445-003-0322-x. Neri, M., Acocella, V., Behncke, B., Maiolino, V., Ursino, A., Velardita, R., 2005. Contrasting triggering mechanisms of the 2001 and 2002–2003 eruptions of Mount Etna (Italy). Journal of Volcanology and Geothermal Research 144, 235–255. doi:10.1016/ j.jvolgeores.2004.11.025. Neri, M., Behncke, B., Burton, M., Giammanco, S., Pecora, E., Privitera, E., Reitano, D., 2006. Continuous soil radon monitoring during the July 2006 Etna eruption. Geophysical Research Letters 33. doi:10.1029/2006GL028394. Neri, M., Guglielmino, F., Rust, D., 2007. Flank instability on Mount Etna: radon, radar interferometry and geodetic data from the southern boundary of the unstable sector. Journal of Geophysical Research 112, 1–15. doi:10.1029/2006JB004756. Neri, M., Mazzarini, F., Tarquini, S., Bisson, M., Isola, I., Behncke, B., 2008. The changing face of Mount Etna's summit area documented with Lidar technology. Geophysical Research Letters 35. doi:10.1029/2008GL033740. Neri, M., Casu, F., Acocella, V., Solaro, G., Pepe, S., Berardino, P., Sansosti, E., Caltabiano, T., Lundgren, P., Lanari, R., 2009. Deformation and eruptions at Mt. Etna (Italy): a lesson from 15 years of observations. Geophysical Research Letters 36. doi:10.1029/ 2008GL036151.

Owsley, L.M.D., Atlas, L.E., Bernard, G.D., 1997. Self-organizing feature maps and hidden Markov models for machine-tool monitoring. IEEE Transactions on Signal Processing 45, 2787–2798. Palano, M., Gresta, S., Puglisi, G., 2009. Time-dependent deformation of the eastern flank of Mt. Etna: after-slip or viscoelastic relaxation? Tectonophysics 473, 300–311. doi:10.1016/j.tecto.2009.02.047. Patanè, D., Di Grazia, G., Cannata, A., Montalto, P., Boschi, E., 2008. The shallow magma pathway geometry at Mt. Etna volcano. Geochemistry, Geophysics, Geosystems 9, 12. doi:10.1029/2008GC002131. Puglisi, G., Bonforte, A., Ferretti, A., Guglielmino, F., Palano, M., Prati, C., 2008. Dynamics of Mount Etna before, during, and after the July–August 2001 eruption inferred from GPS and differential synthetic aperture radar interferometry data. Journal of Geophysical Research 113. doi:10.1029/2006JB004811. Ritter, H., Schulten, K., 1986. Kohonen's self-organizing maps: exploring their computational capabilities. Biological Cybernetics 54, 99–106. Ruch, J., Acocella, V., Storti, F., Neri, M., Pepe, S., Solaro, G., Sansosti, E., 2010. Detachment depth of an unstable volcano revealed by rollover deformation: an integrated approach at Mt. Etna. Geophysical Research Letters 37. doi:10.1029/2010GL044131. Rust, D., Behncke, B., Neri, M., Ciocanel, A., 2005. Nested zones of instability in the Mount Etna volcanic edifice, Sicily. Journal of Volcanology and Geothermal Research 144, 137–153. doi:10.1016/j.jvolgeores.2004.11.021. Rymer, H., 1996. Microgravity monitoring. In: Scarpa, Tilling (Ed.), Monitoring and mitigation of volcano hazards, pp. 169–197. Shaw, C.T., King, G.P., 1992. Using cluster analysis to classify time series. Physica D 58, 288–298. Solaro, G., Acocella, V., Pepe, S., Ruch, J., Neri, M., Sansosti, E., 2010. Anatomy of an unstable volcano through InSAR data: multiple processes affecting flank instability at Mt. Etna in 1994–2008. Journal of Geophysical Research 115. doi:10.1029/2009JB000820. Telesca, L., Lovallo, M., Carniel, R., 2010. Time-dependent Fisher Information Measure of volcanic tremor before 5 April 2003 paroxysm at Stromboli volcano, Italy. Journal of Volcanology and Geothermal Research 195, 78–82. Ultsch, A., 1993. Self organized features planes for monitoring and knowledge acquisition of a chemical process. International Conference on Artificial Neural Networks. SpringerVerlag, London, pp. 864–867. Ultsch, A., 2000. The neuro-data-mine. Symposia on Neural Computation (NC'2000), Berlin, Germany. Van der Laat, R., 1996. Ground-deformation methods and results. In: Scarpa, Tilling (Ed.), Monitoring and Mitigation of Volcano Hazards, pp. 147–168. Vesanto, J., 2000. Using SOM in Data Mining. Licentiate's thesis, Helsinki University of Technology. Vesanto, J., Alhoniemi, E., 2000. Clustering of the self-organizing map. IEEE Transactions on Neural Networks 11, 586–600. Vesanto, J., Himberg, J., 1999. Self-organizing map in Matlab: the SOM toolbox. MATLAB-DSP 1999, pp. 35–40. Vicari, A., Ganci, G., Behncke, B., Cappello, A., Neri, M., Del Negro, C., 2011. Near-real-time forecasting of lava flow hazards during the 12–13 January 2011 Etna eruption. Geophysical Research Letters 38, L13317. doi:10.1029/2011GL047545. Vlachos, M., Lin, J., Keogh, E., Gunopulos, D., 2003. A wavelet-based anytime algorithm for K-means clustering of time series. Third SIAM International Conference on Data Mining, San Francisco, California. Walter, T.R., Acocella, V., Neri, M., Amelung, F., 2005. Feedback processes between magmatism and E-flank movement at Mt. Etna (Italy) during the 2002–2003 eruption. Journal of Geophysical Research 110. doi:10.1029/2005JB003688. Warren Liao, T., 2005. Clustering of time series data — a survey. Pattern Recognition 38, 1857–1874. Wilpon, J.G., Rabiner, L.R., 1985. Modified K-means clustering algorithm for use in isolated word ecognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 587–594.