Science of the Total Environment 347 (2005) 1 – 20 www.elsevier.com/locate/scitotenv
Integrating spatio-temporal information in environmental monitoring data—a visualization approach applied to moss data
Katrin Grünfeld*
Department of Land- and Water Resources Engineering, Royal Institute of Technology, Teknikringen 72, 100 44 Stockholm, Sweden
Received 24 July 2004; accepted 17 December 2004. Available online 13 February 2005
Abstract Large-scale environmental monitoring data, being sparse and collected on irregular grids that may differ from year to year, are difficult to analyse and present. The traditional techniques from statistics and Geographic Information Systems (GIS) may not be useful given the often relatively small sample size combined with varying sampling density. In this study, the freeware visualization package XmdvTool was used for integration and exploration of monitoring data from three surveys of terrestrial mosses. Data on contents of Cu, Ni, Pb, V and Zn in mosses within an area of 300 × 300 km in southern Sweden, sampled in 1985 (177 samples), 1990 (156 samples) and 1995 (188 samples), were integrated and visualized using parallel coordinate and scatterplot display techniques. Several interesting findings about the multi-element composition of samples, as well as changing temporal trends in the relations of the five metals, were made during interactive visual discovery. Visualization techniques for high-dimensional data may have limitations considering, for example, number of variables, ranges of data values, and spatial scales. Nevertheless, interactive data manipulation tools encourage the process of visual exploration, and the unique way of integrating spatial, temporal and multi-element components of moss data provided visual insights that are not possible to gain with traditional analysis tools. © 2005 Elsevier B.V. All rights reserved. Keywords: Moss survey; Environmental monitoring; Multi-element data; Visualization
* Tel.: +468 7907 030; fax: +468 7906 810. E-mail address: [email protected]. 0048-9697/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.scitotenv.2004.12.054
1. Introduction One method used for large-scale monitoring of long-range transport is the moss technique, which was developed in Sweden in the late 1960s as a means of surveying atmospheric metal deposition (Rühling and Tyler, 1968; Tyler, 1970; Selinus, 1996). At present, moss analysis is used as a monitoring method throughout Europe. With the development of analytical techniques, the number of chemical elements detected, as well as the precision of measurements, has increased over the years. The results of European moss surveys have continuously been published in the literature (for example, Rühling and Tyler, 1973;
Rühling et al., 1987; Rühling and Steinnes, 1998), with a growing number of studies dealing with the interpretation of moss monitoring data and the deposition of metals. It has been noted that metal pollutants are likely to be found close to their emission sources with their concentrations decreasing rapidly with distance (Reimann et al., 1997). To describe the atmospheric trace metal deposition close to the pollution source, Čeburnis et al. (2002) proposed semi-empirical models. Regression equations have been applied to describe both the decrease of element contents with respect to the distance from pollution sources (Sucharova and Suchara, 2004) and the transformation of moss concentration data to absolute deposition rates (de Caritat et al., 1997; Berg and Steinnes, 1998). The role of atmospheric deposition of metals, influence of local geology, climate/vegetation zone, and uptake efficiencies have been discussed by Steinnes (1995), Zechmeister (1995), Äyräs et al. (1997), de Caritat et al. (1997, 2001), Reimann et al. (2001b), and Poikolainen et al. (2004), with the conclusion that for a reliable contamination signal over a sizeable area, a major source and considerable contrast between background and polluted areas is needed, and because of the local variability, big composite samples over a large area should be collected. Geochemical and environmental data may have many potential sources of error, such as detection limit problems and extreme values which can give skewed data distribution(s). Furthermore, regional scale moss monitoring data are often collected on an irregular grid, which may differ from year to year. Sampling depends on the availability of material and the sampling density may vary considerably. The size of a dataset for a given area may not even be sufficient for statistical analysis. Reimann and Filzmoser (1999) suggested that, due to those specific properties of the data, non-robust statistical methods would deliver distorted results.
Exploratory Data Analysis (EDA) tools, which do not make any assumptions about data distribution (Tukey, 1977; Kürzl, 1988; de Caritat et al., 2001), have been found to be most suitable when a distribution should be characterized and studied, and should, in general, be the first step in the analysis of geochemical and environmental data (Reimann and Filzmoser, 1999). Graphical EDA tools include different plots, graphs and charts, like boxplots, probability plots, scatterplots and cumulative frequency diagrams, to mention but a few. The analysis of spatial data needs a spatial reference to be taken into account. Moss data are most often presented in the form of symbol or (contoured) surface maps showing single chemical elements. Symbol maps present the actual concentration of an element at a sampling location using color and/or proportional symbols (e.g. Kürzl, 1988; Björklund and Gustavsson, 1987; Gustavsson et al., 1997; Krauss-Kessler et al., 1999; Ötvös et al., 2003). The total concentration range is divided into classes using a continuous scale (Fernandez et al., 2000), percentiles, or quartiles (Reimann et al., 1997) of the data. This is an objective representation of data, and given the relatively small sample size combined with a large area and varying sampling density, interpolation of point data into a surface may not be appropriate when there is a large variation in concentrations. On the other hand, a continuous surface allows a fast impression of general spatial trends present in the data. Several exact interpolation or best surface fitting techniques have been applied to transform point data into a continuous surface. Interpolation using mean values of point data within a specified radius was applied by Herpin et al. (1996), with the conclusion that single high measurements have influence and the surface is an approximation of the real situation. Real et al. (2003) used surface response techniques (n-degree polynomials) for finding the best-fitted surface, while interpolation using kriging techniques was performed by Ötvös et al. (2003), and Schröder and Pesch (2004). The kriging technique, giving not only surface maps but also an estimate of error, is the best approach when spatial autocorrelation exists and a model can be fitted to the data.
If not, the second widely applied approach is to use inverse distance weighting (IDW) for exact interpolation between known sample values and then add sample locations to visualize the presence of uncertainty between sample locations. The problem of symbol vs. surface maps is discussed in depth in Reimann et al. (1998) or Reimann et al. (2003). Not only are spatial trends of interest but also covariance of chemical elements in multi-element data. The traditional numerical approach is to calculate correlation coefficients, but recent studies
have shown that correlation analysis based on correlation coefficients only is not advisable without a graphical check of the XY diagrams (or scatterplots) of the elements (Reimann et al., 2002). Linking XY plots to a GIS display was suggested because distinct sample populations observed in scatterplots can be located in the map (Äyräs et al., 1998). Considering the large number of chemical elements measured, dimension reduction techniques have often been applied, like Principal Component Analysis (PCA) (Sucharova and Suchara, 2004) and factor analysis, which plots the results on a map (Reimann et al., 2001b). A pollution index for the evaluation of metal contamination was proposed by Grodzinska et al. (1999), to summarize information, by calculating the mean concentrations of given elements and the deviation from these at the given sampling locations. For the definition of background levels or usual conditions, and contamination factors as relation to the background levels, Fernandez et al. (2002) applied modal analysis. Recently, Schröder and Pesch (2004) presented spatially and temporally differentiated indicators of metal accumulation by means of cluster analysis. Spatio-temporal or monitoring data may be used as a complement to other geoscientific or environmental data. Multi-element and multi-medium regional geochemical mapping has been found to be a tool to understand the cycling of elements in the environment (Reimann et al., 2001a). Apart from relative air pollution and deposition rates, a correlation with disease incidence has been established for some chemical elements and specific diseases (Wappelhorst et al., 2000; Wolterbeek and Verbug, 2004). Temporal trends in the environment can be observed and quantified, and geochemical background values can be estimated in case measurements of metals in moss over several years are available. Such databases exist but integrated studies have not been so common.
Consecutive element maps are still the usual way of visualizing spatial and temporal trends in the distribution of an element (Krauss-Kessler et al., 1999; Poikolainen et al., 2004). Quantitative change analysis requires samples to be taken from the same locations but interpolation between sample values introduces estimation errors of unknown magnitude and variation, if sample locations do not coincide in different surveys. There
are no simple approaches to integrate measurements made over a long period of time. Moreover, when combining data from different sources, issues like data quality and errors, the number of measured chemical elements, their lower detection limits, and the analytical techniques and their precision are to be taken into consideration. Visualization can be thought of as making the contents of a large and complex database visible with no data manipulation involved (Tukey, 1977). The purpose of this study is to test interactive visualization—a combination of parallel coordinate and scatterplot techniques—for simultaneous display of multi-element, spatial and temporal information in moss monitoring data. The moss monitoring data are combined from three surveys conducted over 15 years, of the concentrations of the metals Cu, Ni, Pb, V and Zn. The present paper focuses on illustrating the possibilities of simple graphical representations for exploration of spatial and temporal trends, extreme concentrations and multi-element relations in the moss data, and discusses the advantages and limitations of the used visualization techniques, compared to histograms, quartile plots and proportional symbol maps.
2. Data The data used in the present study belong to the moss monitoring program in Sweden and are composed of spatial coordinates and concentrations of the metals Cu (copper), Ni (nickel), Pb (lead), Zn (zinc) and V (vanadium) in the mosses Hylocomium splendens and Pleurozium schreberi, sampled in 1985 (177 samples), 1990 (156 samples) and 1995 (188 samples). Previous studies have shown that data from these moss species can be combined without interspecies calibration and used for regional mapping purposes (Halleraker et al., 1998). The study area measures 300 × 300 km and is located in southern Sweden (Fig. 1), and there are no common sample locations between the three surveys. The smallest concentration step of the data varies from 0.01 ppm in 1985 and 1990 to 0.001 ppm in the 1995 survey. The details of the sampling procedure and the sampled media as well as the analytical techniques can be found in Rühling et al. (1987).
Concentration ranges as well as quartiles of the metal concentrations are presented in Table 1. The database in the present study contains information about two plane spatial coordinates, the concentration of five metals, and the year of the survey for each sample location.
Fig. 1. Location of the study area (marked) in southern Sweden, Swedish National Grid Coordinate System.
Table 1 Metal concentrations (ppm) in mosses in 1985 (177 samples), 1990 (156 samples) and 1995 (188 samples): minimum, quartiles and maximum
Year  Statistic  Cu      Ni     Pb      V       Zn
1985  Min        2.76    0.87   3.78    0.09    19.00
1985  25%        4.66    1.43   8.69    1.49    35.50
1985  Median     5.68    1.78   12.40   2.25    42.90
1985  75%        7.24    2.43   19.00   3.61    50.60
1985  Max        34.10   7.82   59.00   8.88    113.00
1990  Min        2.73    0.67   0.46    1.18    16.67
1990  25%        5.30    1.23   10.76   2.15    39.74
1990  Median     6.10    1.56   13.26   2.70    44.36
1990  75%        7.12    1.81   18.11   3.37    51.44
1990  Max        12.05   3.71   36.10   6.35    95.08
1995  Min        2.420   0.475  2.440   0.883   16.790
1995  25%        4.056   0.876  5.650   2.104   34.800
1995  Median     4.765   1.050  7.547   2.388   40.016
1995  75%        5.671   1.275  9.322   3.044   46.352
1995  Max        8.470   1.775  15.169  16.400  80.400
3. Methods
The overall design of the study was to use simple graphical EDA and GIS techniques to present and summarize the spatial and temporal trends in moss data, followed by visualization of the integrated dataset to attempt exploration of several aspects of the data simultaneously.
3.1. Data characterization
Several EDA techniques may be appropriate for describing the frequency distribution of the data and comparing the three surveys. As the number of sampled locations is relatively low (less than 200 per year), histograms and quartiles (25th, 50th and 75th percentiles) were chosen for the characterization of data distribution for each metal and year. First, histograms of the data were displayed to detect the presence of extreme values in the dataset. The display ranges of histograms were defined for each metal separately according to the lowest and the highest concentration from all three surveys, and the number of intervals in the histograms was set to 50. Quartile values were plotted for each metal in the three surveys to illustrate relative temporal changes in the interquartile range of concentrations of the five metals. Histograms and quartile curves were plotted in Microsoft Excel.
3.2. Spatial distribution of metal concentrations
To display spatial trends in the distribution of the metal concentrations, and visualize the spatial location of relative temporal changes, proportional point symbol maps were created for all metals and years using GIS package Idrisi32. Interpolation of the moss data was not considered due to the high variability of the metal concentrations combined with low sampling density and weak spatial autocorrelation. The display range for each element was defined by the values in the 1995 dataset, which showed the lowest concentrations of the metals, reported with the smallest concentration step (see Data chapter). This approach makes the map representation independent of varying concentration ranges and emphasizes the relative temporal changes over the years. The three quartiles and the observed maximum concentrations in the 1995 data (Table 1) were used to divide the concentration range into five classes. The first class includes all concentrations less than the first quartile value, the second class all concentrations between the first and the second quartile values, and so on. The last class is assigned to all concentrations higher than the maximum value in the 1995 data and is referred to as unusual. For Zn and V, extreme concentrations (separated from all of the other values) were observed also in the histograms of the 1995 data. For these metals, the lower limit of class 5 has been lowered, to
include those extreme values as unusual. The five classes were represented by symbols differing in size, color and shape, enhancing the concentrations over the median value (classes 3–5). To aid the recognition of spatial patterns the unusual concentrations were represented by striped rectangles while the three quartiles were assigned circles of different size and color.
3.3. Visualization
A freeware visualization tool for high-dimensional data XmdvTool (XmdvTool) was chosen to display and explore datasets from three moss surveys. There are other freeware packages available for visualizing multi-dimensional data, for example XGobi/GGobi (GGobi). The reasons for choosing XmdvTool were simple file format, user-friendly manipulation of colors and brushes, and the possibility of four separate brushes, so that three moss surveys could be highlighted using separate colors. Each variable (or dimension) may be independent or interdependent with one or more of the other variables, which may be discrete or continuous in nature or take on symbolic (nominal) values. Each dimension corresponds to an axis, and in a parallel coordinate display, the N axes are organized as uniformly spaced vertical lines. A data element in N-dimensional space manifests itself as a connected set of points (one on each axis) forming a polyline (Fig. 2b). In a scatterplot matrix, which can be opened as an auxiliary display window, two-dimensional scatterplots of all pairs of variables are plotted (Fig. 2a). Brushing is a selection process in which the user can highlight (select) or mask (hide) a subset of data being graphically displayed by pointing at the data elements, and brushing is associated with linking, which means brushing data elements in one view affects the same data in all other views (Fig. 2). The shape of the brush is an N-dimensional hyperbox, and the user specifies N brush dimensions using N slider bars. Brushes are displayed as shaded regions, where data points which fall within the brush are highlighted in a different color.
Fig. 2. Visualization of two-dimensional point data as a scatterplot (a) and in parallel coordinates (b). A brush defined in two dimensions (shaded area) selects the highest four values of variables A and B (black points/polylines).
Fig. 3. Histograms of metal concentrations in mosses in 1985, 1990 and 1995. The Y-axis shows absolute frequency of samples per concentration class. (Panels: Cu, Ni, Pb, V and Zn for each survey year; horizontal axes: concentration class, ppm; vertical axes: frequency.)
Fig. 4. Quartile curves of metal concentrations in mosses in 1985, 1990 and 1995. (Panels: Ni, V, Cu, Zn and Pb; curves for the 25%, 50% and 75% levels; horizontal axes: year; vertical axes: concentration, ppm.)
Note that for the visualization, the different sampling locations of the moss surveys do not pose any problems and, even if the software package was not developed for use with spatial data, in case spatial information is included the scatterplot of the plane coordinates displays the spatial location of samples within the study area. The datasets from three surveys were compiled into one plain text file and one variable identifying the year was added. The minimum and maximum values were set to the lowest and the highest observed concentrations of each metal, respectively. The first two dimensions were assigned to X (easting) and Y (northing) coordinates of samples and the last dimension to the temporal variable (year). In a first step, the parallel coordinate display was used for studying the multi-element composition of extreme values detected in histograms. The order of variables (axes) was changed to facilitate the recognition of multi-element signatures. Next, the temporal changes in the correlations between pairs of metals were studied on scatterplot display. Finally, quartile values were compiled into a separate data file, in order to summarize the temporal trends in metal relations, and studied in scatterplots. The scales were defined by the lowest and highest values of quartiles of each metal over three surveys.
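The last two steps above (compiling quartile values into a separate data file and defining the scales from the lowest and highest quartile of each metal) can be sketched as follows. This is only an illustration, not the procedure actually used in the study: the quartile values are copied from Table 1, while the record layout (five metal values followed by the year) and the names `quartile_records` and `axis_scales` are assumptions made for this example.

```python
# Quartile values (ppm) copied from Table 1: (25%, median, 75%) per survey year.
QUARTILES = {
    "Cu": {1985: (4.66, 5.68, 7.24), 1990: (5.30, 6.10, 7.12), 1995: (4.056, 4.765, 5.671)},
    "Ni": {1985: (1.43, 1.78, 2.43), 1990: (1.23, 1.56, 1.81), 1995: (0.876, 1.050, 1.275)},
    "Pb": {1985: (8.69, 12.40, 19.00), 1990: (10.76, 13.26, 18.11), 1995: (5.650, 7.547, 9.322)},
    "V": {1985: (1.49, 2.25, 3.61), 1990: (2.15, 2.70, 3.37), 1995: (2.104, 2.388, 3.044)},
    "Zn": {1985: (35.50, 42.90, 50.60), 1990: (39.74, 44.36, 51.44), 1995: (34.800, 40.016, 46.352)},
}
METALS = ["Cu", "Ni", "Pb", "V", "Zn"]
YEARS = [1985, 1990, 1995]


def quartile_records():
    """Build one record per (year, quartile level): five metal values plus the year.

    The layout is a hypothetical choice for this sketch.
    """
    records = []
    for year in YEARS:
        for level in range(3):  # 0 = 25%, 1 = median, 2 = 75%
            row = [QUARTILES[metal][year][level] for metal in METALS]
            records.append(row + [year])
    return records


def axis_scales():
    """Scale of each metal axis: lowest and highest quartile over the three surveys."""
    scales = {}
    for metal in METALS:
        values = [q for year in YEARS for q in QUARTILES[metal][year]]
        scales[metal] = (min(values), max(values))
    return scales


records = quartile_records()
scales = axis_scales()
for metal in METALS:
    print(metal, scales[metal])
```

With scales defined this way, every quartile of every survey falls inside the display range of its axis, so the three surveys can be compared in one plot.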
4. Results 4.1. Temporal trends in metal concentrations Frequency distributions for the metals are shown in Fig. 3. The histograms revealed the presence of extreme concentrations in the moss data, with the
highest values related to the 1985 survey for all metals except V. For the latter, a number of very high-valued samples appear in 1995. Apart from those, approximately seven concentrations in the 1985 data may be considered as most extreme: three values of Cu, three of Ni, and one value of Pb. General trends are towards lower concentrations from 1985 to 1995. The location of the mode does not always follow this trend though, and based on a purely visual observation, mode values tend to increase slightly from 1985 to 1990 and then decrease in 1995. Multimodality may be observed in the histograms of all metals for 1985, in Pb and Zn for 1990, and in V for 1995. Quartile curves of the elements over the years are shown in Fig. 4. Ni has a continuous decline of concentrations from 1985 to 1995 while Cu, Zn, V, and Pb show rising 25% and 50% levels in 1990 followed by a decrease in 1995. The 25% and 50% levels decrease considerably from 1990 to 1995 for all metals except for the lowest quartile of V. The 75% values for 1990 are close to 1985 levels and then drop in 1995. Pb shows the most dramatic drop of concentrations between 1990 and 1995. Non-symmetric distributions within the inter-quartile range can easily be recognized. Even in the 1995 data, which have the lowest metal levels in mosses, a tendency towards positive skew can be recognized for most of the metals. 4.2. Spatio–temporal trends in element maps Maps of selected metals (Cu, Pb and V) are presented in Figs. 5–7. The symbol maps of five metals through three surveys exhibit similar spatial trends but the number of samples belonging to class 5 (unusual) varies considerably. Ni and Pb have the largest number of samples in the 1985 data in class 5, distributed mostly over the eastern side of the area. In the 1990 survey, the number of samples in class 5 has decreased, and these are found mainly in the western and southwestern (SW) part of the area.
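The class assignment behind these maps (Section 3.2) can be illustrated with a short sketch. It is not the Idrisi32 workflow used in the study. The thresholds are the 1995 Cu values from Table 1; the function name `concentration_class` is hypothetical, and the treatment of values that fall exactly on a class boundary (assigned to the higher class, except the maximum, which stays in class 4) is an assumption made here.

```python
from bisect import bisect_right

# 1995 values for Cu from Table 1 (ppm): first quartile, median, third quartile, maximum.
CU_1995 = (4.056, 4.765, 5.671, 8.470)


def concentration_class(value, q1, median, q3, maximum):
    """Return the map class (1-5) of a concentration.

    Classes 1-4 are bounded by the 1995 quartiles and maximum; class 5
    ("unusual") holds everything above the 1995 maximum. For metals whose
    class 5 limit was lowered, pass that lowered limit as `maximum`.
    """
    if value > maximum:
        return 5
    # Number of quartile thresholds at or below the value, shifted to 1-based classes.
    return bisect_right([q1, median, q3], value) + 1


# Classify the 1985 Cu summary values from Table 1 (min, quartiles, max).
cu_1985 = [2.76, 4.66, 5.68, 7.24, 34.10]
classes_1985 = [concentration_class(v, *CU_1995) for v in cu_1985]
print(classes_1985)
```

Because the thresholds are fixed to the 1995 survey, the same function applies unchanged to the 1985 and 1990 data, which is what makes the class counts comparable between years.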
In 1995, the concentrations of the metals drop and the highest ones (class 4) tend to be located near the SW corner of the area, where 14 samples containing
Fig. 5. Spatial distribution of Cu concentration in mosses in 1985, 1990 and 1995. The sampled locations differ between the three years, and class division is related to quartile values in the 1995 data (see Table 1 and text).
Map legend classes for Fig. 5 (Cu, ppm): < 4.056; 4.056 - 4.764; 4.765 - 5.670; 5.671 - 8.471; > 8.471. Map legend classes for Fig. 6 (Pb, ppm): < 5.650; 5.650 - 7.546; 7.547 - 9.321; 9.322 - 15.169; > 15.169.
high V concentrations also form a spatial cluster (Fig. 7). 4.3. Visualization of extreme concentrations A parallel coordinate display of the combined dataset is shown in Fig. 8. The concentration of each metal in each sample is plotted on the vertical axes and each sample in the database forms a line (polyline). In Fig. 8, concentrations that are considerably higher than the main body of data are selected (blue and green polylines), to visualize the composition of the samples containing these values. The same values were already detected in histograms in Fig. 3 and are the three highest values of Cu, the three highest of Ni, and one extreme value of Pb (all in the 1985 data). Those extreme concentrations were selected using two separate brushes (blue and green), and revealed varying concentrations of metals as well as varying spatial location. The seven values are contained in six samples, because the maximum concentrations of Pb and Cu belong to the same sample. It can also be seen that samples containing high Cu, shown by green polylines, also include very high concentrations of Pb while the three highest concentrations of Ni, shown by blue polylines, are not associated with the highest concentrations of other metals. The samples with high content of V related to year 1995 (the highest value on the year axis) are part of a distinct multi-element signature revealed by a cluster of similar (yellow) polylines. The metal concentrations of those samples were compared to the quartile values, with the conclusion that Cu, Ni, Pb and Zn values are all higher than the 25th percentile and some of them reach levels over the 75th percentile. 4.4. Temporal changes in relations of metals A scatterplot matrix display of integrated moss data is presented in Fig. 9, showing all samples belonging to the 1995 survey highlighted in black. The correlations of metals are visualized in two-dimensional plots and the different years can be selected using the temporal variable.
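The brushing used here, both for the extreme concentrations and for selecting a survey year, amounts to testing each sample against an N-dimensional box. The sketch below shows that logic in plain Python. It does not reproduce XmdvTool's implementation; the function names are hypothetical, and the sample records are invented for illustration and are not values from the moss surveys.

```python
def in_brush(record, brush):
    """True if the record lies inside the brush on every brushed dimension.

    `brush` maps a dimension name to a (low, high) interval; dimensions that
    are not listed are treated as unrestricted (an assumption of this sketch).
    """
    return all(low <= record[dim] <= high for dim, (low, high) in brush.items())


def apply_brush(records, brush):
    """Return the records selected (highlighted) by the brush."""
    return [r for r in records if in_brush(r, brush)]


# Hypothetical records, NOT data from the surveys: coordinates, two metals (ppm), year.
samples = [
    {"x": 1310000, "y": 6400000, "Cu": 5.1, "Pb": 12.0, "year": 1985},
    {"x": 1350000, "y": 6450000, "Cu": 30.2, "Pb": 55.0, "year": 1985},
    {"x": 1400000, "y": 6300000, "Cu": 6.0, "Pb": 13.5, "year": 1990},
    {"x": 1320000, "y": 6350000, "Cu": 4.7, "Pb": 7.4, "year": 1995},
    {"x": 1450000, "y": 6500000, "Cu": 5.9, "Pb": 9.9, "year": 1995},
]

# A brush on the temporal variable selects one survey.
selected_1995 = apply_brush(samples, {"year": (1995, 1995)})

# A brush on two metal axes selects samples that are extreme in both.
selected_extreme = apply_brush(samples, {"Cu": (20.0, 40.0), "Pb": (40.0, 60.0)})

# Linking: the same selected records can be shown in the coordinate scatterplot.
locations_1995 = [(r["x"], r["y"]) for r in selected_1995]
print(len(selected_1995), len(selected_extreme), locations_1995)
```

Because the selection is carried by the records rather than by a particular plot, any view (parallel coordinates, metal scatterplots, or the plot of the plane coordinates) can highlight the same subset.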
It is interesting to see that the
scatter plot clouds for 1995 differ from the other two years. Decreasing concentrations and lateral shifts of scatterplot clouds can be observed for several pairs of metals, especially those with Ni and/or Pb. At the same time, the extreme concentrations may have a considerable influence on the shape of the scatterplots. For example, the highest three values of Cu are affecting the whole dataset and there is a notable change in the shape of the scatterplot cloud with and without those samples. This is illustrated in Fig. 10 where the Cu–Zn scatterplots before and after the removal of seven extreme values are presented. A scatterplot matrix displaying only the quartile values is shown in Fig. 11. As the data are generalized, only changes between 25th and 75th percentile (or the inter-quartile range) are displayed. The three brushes are visualized by rectangular boxes and represent the three years of the survey, 1985 (the pink color) to 1995 (the darkest red), as seen in the scatterplots of the last column or row in the scatterplot matrix. For each metal, the spread of its inter-quartile range is visualized by the vertical extent of a rectangle, which is defined by the first quartile (in lower left corner) and the last quartile (in upper right corner). The relative temporal changes in interquartile ranges and the quartile values are visualized in the scatterplots along the diagonal of the whole matrix (from upper left to lower right). Taking the metal Ni as a reference, a continuous decline of concentrations over time is illustrated by decreasing rectangles with some overlap (the scatterplot in the first column and first row of the matrix). All other metals deviate from this trend to a smaller or larger degree. Comparing the rectangles of the years 1985 and 1990 (see the year column or row of the matrix), the inter-quartile ranges of all metals have decreased, but the 1990 levels are still relatively high for Cu, Pb, Zn and V. 
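The changes in inter-quartile range that the rectangles display can also be checked numerically from Table 1. The following is a minimal sketch under that assumption; the names `iqr` and `shrinking` are introduced only for this example, and the first and third quartiles are copied from Table 1.

```python
# First and third quartiles (ppm) from Table 1, per metal and survey year.
Q1_Q3 = {
    "Cu": {1985: (4.66, 7.24), 1990: (5.30, 7.12), 1995: (4.056, 5.671)},
    "Ni": {1985: (1.43, 2.43), 1990: (1.23, 1.81), 1995: (0.876, 1.275)},
    "Pb": {1985: (8.69, 19.00), 1990: (10.76, 18.11), 1995: (5.650, 9.322)},
    "V": {1985: (1.49, 3.61), 1990: (2.15, 3.37), 1995: (2.104, 3.044)},
    "Zn": {1985: (35.50, 50.60), 1990: (39.74, 51.44), 1995: (34.800, 46.352)},
}
YEARS = [1985, 1990, 1995]


def iqr(metal, year):
    """Inter-quartile range: the extent of the rectangle along one metal axis."""
    q1, q3 = Q1_Q3[metal][year]
    return q3 - q1


# For each metal, check whether the inter-quartile range narrows survey by survey.
shrinking = {}
for metal in Q1_Q3:
    widths = [iqr(metal, year) for year in YEARS]
    shrinking[metal] = widths[0] > widths[1] > widths[2]
    print(metal, [round(w, 3) for w in widths], shrinking[metal])
```

Note that a narrowing range says nothing about the level of the quartiles themselves, which is why the rectangles in the display carry both position and extent.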
From 1990 to 1995 the inter-quartile ranges decrease further, together with a fall in metal concentrations. Distributions that are not symmetric regarding the inter-quartile range are recognized by rectangles in which the median value is not located in the middle of the colored box. Concentrating on the pairs of metals, if the correlation of two metals is not
Fig. 6. Spatial distribution of Pb concentration in mosses in 1985, 1990 and 1995. The sampled locations differ between the three years, and class division is related to quartile values in the 1995 data (see Table 1 and text).
Map legend classes for Fig. 7 (V, ppm): < 2.104; 2.104 - 2.388; 2.388 - 3.044; 3.044 - 5.500; > 5.500.
Fig. 8. Parallel coordinate display of moss monitoring data from 1985, 1990 and 1995. Each sample is represented by a polyline connecting vertical axes. Brushed or selected (blue, green and yellow-colored polylines) are extreme outliers of the five metals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
changing over the years, the quartiles (points on the rectangles) should all be aligned along the diagonal of the scatterplot. As already mentioned above, this is not the case and instead two trends can be seen. First, the rectangles shrink from 1985 to 1995 though the shapes of the rectangles representing three surveys do not change significantly. That means concentrations of the two metals decrease at different rates that are more or less constant during the two time periods, as for V–Ni. The second trend is related to considerable changes in the shape of rectangles representing different surveys for the same pair of metals. It can be seen that the correlation between the two metals has changed for the following metal pairs: Pb–Cu, Pb–Zn, Cu–Ni, Cu–V, and V–Zn.
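One possible way to put a number on the "shape" of these rectangles is the ratio of the two inter-quartile ranges of a metal pair, compared across surveys. This measure is an assumption introduced here for illustration and is not part of the original analysis, which was purely visual. The quartiles are taken from Table 1, and the names `aspect` and `ratio_spread` are hypothetical.

```python
# First and third quartiles (ppm) from Table 1, per metal and survey year.
Q1_Q3 = {
    "Cu": {1985: (4.66, 7.24), 1990: (5.30, 7.12), 1995: (4.056, 5.671)},
    "Ni": {1985: (1.43, 2.43), 1990: (1.23, 1.81), 1995: (0.876, 1.275)},
    "Pb": {1985: (8.69, 19.00), 1990: (10.76, 18.11), 1995: (5.650, 9.322)},
    "V": {1985: (1.49, 3.61), 1990: (2.15, 3.37), 1995: (2.104, 3.044)},
    "Zn": {1985: (35.50, 50.60), 1990: (39.74, 51.44), 1995: (34.800, 46.352)},
}
YEARS = [1985, 1990, 1995]


def aspect(metal_a, metal_b, year):
    """Aspect ratio of the quartile rectangle: IQR of metal_a over IQR of metal_b."""
    a1, a3 = Q1_Q3[metal_a][year]
    b1, b3 = Q1_Q3[metal_b][year]
    return (a3 - a1) / (b3 - b1)


def ratio_spread(metal_a, metal_b):
    """Largest aspect ratio divided by the smallest over the three surveys.

    A value near 1 means the rectangle keeps its shape. Linear rescaling of
    the plot axes multiplies every aspect ratio by the same factor, so this
    spread does not depend on the display scales.
    """
    ratios = [aspect(metal_a, metal_b, year) for year in YEARS]
    return max(ratios) / min(ratios)


stable_pair = ratio_spread("V", "Ni")
changing_pairs = {
    pair: ratio_spread(*pair)
    for pair in [("Pb", "Cu"), ("Pb", "Zn"), ("Cu", "Ni"), ("Cu", "V"), ("V", "Zn")]
}
print(round(stable_pair, 2), {p: round(s, 2) for p, s in changing_pairs.items()})
```

On the Table 1 quartiles, the V–Ni spread is the smallest of the six pairs, which agrees with the visual reading above; the measure ignores the position of the median inside the box, so it captures only part of what the display shows.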
5. Discussion 5.1. Data The vertical placement of histograms having the same concentration scale (X-axis) and bandwidth (class interval) gives a very fast overview of the temporal changes in absolute concentrations of metals over the three surveys. The number and extremity of the highest concentrations as well as the presence of several modes and slight changes in the location of modes are easily perceived and encourage hypothesis generation and testing. For example, the 1995 levels appear as background concentrations of the metals and suggest a decline in metal input to mosses during
Fig. 7. Spatial distribution of V concentration in mosses in 1985, 1990 and 1995. The sampled locations differ between the three years, and class division is related to quartile values in the 1995 data (see Table 1 and text).
Fig. 9. Scatterplot matrix display of moss monitoring data from 1985, 1990 and 1995. Highlighted (in black color) are all samples belonging to year 1995 survey. For scales, see Fig. 8.
the studied 15 years. Thus the shift of modes towards higher concentrations in 1990 may be due to a larger degree of overlap between the background and the "pollution" concentrations. However, considering the influence of bandwidth on the appearance of the histograms, the judgements made are qualitative rather than quantitative, especially when the dataset is not large and the sample values might be spatially autocorrelated. The presence of spatial trends in the data suggests that a histogram of the whole dataset may not represent the distribution of values within different parts of the area. Also, considering the difference in data precision, the bandwidth used for plotting the histograms might have been too large for the 1995 survey data, resulting in overly smooth (and normal-looking) histograms.
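The dependence of the visible modes on the bandwidth can be illustrated with a small sketch. The code below is an illustration under stated assumptions rather than the histogram routine used in this study: `histogram` and `count_modes` are hypothetical helper names, the values are invented, and a mode is simply taken to be a class that is higher than both of its neighbours.

```python
def histogram(values, width, lo, hi):
    """Count values in equal-width classes covering the range lo to hi."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        # Values equal to hi are placed in the last class.
        idx = min(int((v - lo) // width), n_bins - 1)
        counts[idx] += 1
    return counts


def count_modes(counts):
    """Number of classes that are strictly higher than both neighbours."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical concentrations with two groups of values.
values = [1, 1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

narrow = histogram(values, width=1, lo=0, hi=12)
wide = histogram(values, width=6, lo=0, hi=12)

print(count_modes(narrow))  # 2: both groups are visible
print(count_modes(wide))    # 1: the wide classes merge them into one mode
```

With the narrow class width the two groups appear as separate modes, whereas the wide class width yields a single, smoother-looking distribution, in line with the caution expressed above about a bandwidth that is too large for the data.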
Regarding the quartile curves, as most of the median values are highest in 1990 (see Table 1, Fig. 4), one could think that there was no general decrease in Cu, Pb, V and Zn levels during the first five years, even if the maximum concentrations are considerably reduced in the histograms. Increasing 25th percentile values may be due to the decrease in the number of low-valued samples combined with a decrease in sample size (from 177 samples in 1985 to 156 samples in 1990) and a difference in sampling locations. Quartile curves complement the information presented by histograms through visualizing the changes in inter-quartile range that is robust to the presence of extreme values in the data. The median values of the five metals are varying with time, thus taking the year 1995 quartiles to
Fig. 10. Cu–Zn scatterplot of moss monitoring data from 1985, 1990 and 1995: a) original data, b) cleaned data after removal of extreme outliers. Highlighted (in black color) are all samples belonging to year 1995 survey. For scales, see Fig. 8.
Fig. 11. Scatterplot matrix of quartile values of five metals in moss samples from 1985, 1990 and 1995. Years are indicated by shades of red color from the lightest (1985) to the darkest (1995). Points represent quartile values and rectangles the inter-quartile range. For scales, see Table 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
represent the geochemical background values affects the classification of samples. The interpretation of the symbol maps should therefore differ from metal to metal. For example, Pb (Fig. 6) can be expected to have more than 25% of the samples classed as unusual (pollution) for both 1985 and 1990, because the maximum value of the 1995 data lies between the 50th and 75th percentile value of 1985 and 1990 (see Table 1, Fig. 4). This generalization results in loss of information regarding the spatial distribution of the highest concentrations of Pb (Fig. 6). Due to varying number of samples classed as unusual, the maps of the metals for the same year are not comparable (relative rather than absolute temporal changes are emphasized). Unlike the other techniques applied in this study, parallel coordinate visualization reveals if and how the extreme concentrations are related in multivariate space. Extreme values present in the dataset may
have a large influence on the visual perception of the metal covariance, and the most extreme values could be replaced with a more suitable concentration chosen in accordance with the multi-element signatures they belong to. This avoids both a reduction of the already sparse data and a change of the associations of the elements. The high values forming a distinct signature might be of interest if they are also a part of a spatial cluster. The parallel coordinate display not only indicates the multi-element composition of high-V samples in the 1995 data but also their spatial location (see Fig. 8). However, without a quantitative output of brushed data values, one cannot determine the actual concentrations of the metals in the selected samples, which is important when their significance has to be defined. General temporal trends are not easily recognized in the parallel coordinate display of the integrated moss data, but the simplicity of interactive brushing promotes visual queries, as distinct concentrations and associations of the five metals can be brushed and localized in both space and time. The temporal changes in shape and location of scatterplot clouds in Fig. 9 are mostly related to the 1995 data, but due to extreme concentrations the correlation analysis should not be based on the scatterplots only. In order to analyse the covariance of the metals and compare the datasets, the data from the three years could be studied separately. In scatterplots of quartile values (Fig. 11), the differences between the metals are well illustrated by the extent of the overlap of the rectangles that represent the three surveys, and the temporal changes in correlations of metals are recognized. The same information cannot be obtained from the scatterplots of the whole data, as quartile plots are robust to extreme values that may mask the trends in the display of the original data (see V–Zn scatterplots in Figs. 9 and 10). A visible change in the shape of the rectangles may indicate a changing composition of metal input to mosses. For example, regarding the Pb–Zn inter-quartile box for year 1995, the decrease in Pb concentrations between 1990 and 1995 is much larger than that of Zn. If those two metals had been correlated in the 1995 survey data in the same manner as in 1985 and 1990, the second and the third quartile values of Zn should have been approximately 2 and 4 ppm lower, respectively (based on visual estimation and interactive query of values). Similar conclusions can be made for several pairs of the metals, and to summarize, both the contents and associations of the five metals in mosses have been changing over the 15 years studied.
5.2. Methods
A histogram is a relevant tool for the characterization of the shape of a distribution, especially for detection of extreme values, but also of multimodality, which would not be visible in boxplots (or box-and-whisker plots) of the same data.
As already mentioned, the location of the modes is, however, not reliable considering the smoothing implied by a bandwidth that is much larger than the precision of the data. The question is which class width should be chosen for each element and year, and how much the shape of the distribution depends on the class width. One can argue that cumulative frequency (CDF) plots may be a better alternative to histograms,
but they have limitations, too. The extreme values are well visualized, but CDF-plots are not informative regarding the location of mode(s). In this study the extreme values are visualized using parallel coordinates, and the location of mode(s) is of great interest when the three years are compared. Information about the distribution of values is generalized by comparing quartile values from different years for each metal. The inter-quartile range represents 50% of the data, which are more or less centered around the median value. When looking at the curves in Fig. 4 one has to be aware of the relative scales and the absolute concentrations, in order to avoid under- or overestimation by the visual impression. Quantitative measures are more relevant when differentiating a slight change from a substantial one. Nevertheless a line connecting the respective quartiles of the three surveys visualizes efficiently the direction of change, while the concentration scale allows for absolute comparison. The symbol maps do not always give a good overview of spatial trends in the data as sampling density, the location of sampling points and the type and size of symbols have an influence on the visual impression. The clustering of sampling locations or the presence of unsampled areas may affect the conclusions based on a visual impression towards over- or underestimation of a trend. A common display range defined by one of the studied years has both advantages and disadvantages. When there is no large variation between the median values, the approach used in the present study will probably be most useful. An optimal approach for defining background levels should be data-dependent and based on a comprehensive characterization of the distribution of each chemical element separately. Visual approaches have a good potential to detect changes and their direction. 
However, the dot images may look too patchy and need a good design of colors, symbols, and the appropriate rules for class division. Color illustrations allow much faster recognition of patterns and trends. The use of percentiles or quartiles to design the color scale would, in most cases, be a good choice. A small number of classes reduces the level of detail in recognizing local variations but may also help to refine the search for interesting patterns in data. In the case of many chemical elements to be studied, the use of single-element maps is tedious but the integration of the information is not possible,
because the information content becomes too large for a single map. There are no common locations between the three surveys, and the sampling density varies substantially over the different parts of the area. Interpolation of a surface from the known sampled values may not be advisable, as the uncertainty cannot be easily integrated in the final map. Moreover, when three surveys are combined for a time-series analysis, the uncertainty will increase and may reach levels that make the outcome useless. The time-series analysis tools that are available in GIS can only be applied when surfaces are available, or when the sample locations are the same over the years. An alternative to a quantitative analysis can be the use of interactive techniques, for example, a short movie based on three symbol maps of the concentration of the same metal. From the location of the yellow-colored polylines in Fig. 8 one can assume that those samples contain slightly more Zn than other metals, even though this was not supported by the comparison of the brushed metal levels with the quartile values. This illustrates a disadvantage of parallel coordinate visualization when variables have varying ranges. Estimation of the absolute levels of the metals in the integrated dataset should be based on quantitative output rather than visual impression. One way to overcome this problem may be a transformation or standardization of the geochemical datasets before integration. Alternatively, the concentration levels of the selected samples may be exported and examined by other (numerical) data analysis packages. Another drawback of the parallel coordinate technique is that the documentation of the exploration of the whole dataset may become tedious. Each step in the discovery process would take time to document and would produce many illustrations with an unnecessarily large amount of auxiliary detail.
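One possible form of the standardization mentioned above is sketched below. The choice of a robust scaling (centring on the median and dividing by the inter-quartile range) is an assumption made here for illustration; the text does not prescribe a particular transformation, and the function name and concentration values are hypothetical.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Centre on the median and divide by the inter-quartile range.

    This is one assumed choice of standardization; it is less sensitive
    to extreme values than scaling by the mean and standard deviation.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("inter-quartile range is zero; cannot scale")
    med = median(values)
    return [(v - med) / iqr for v in values]


# Hypothetical concentrations (ppm) of two metals with very different ranges.
zn = [30, 40, 50, 60, 70]
v = [1, 2, 3, 4, 5]

print(robust_scale(zn))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(robust_scale(v))   # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

After scaling, both metals occupy the same numerical range, so their axes in a parallel coordinate display would be directly comparable. The absolute concentrations are no longer visible, however, and would have to be recovered from the original values.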
Compared with clearly visible multi-element signatures, general trends in multivariate regional or large-scale geochemical data are not as easily perceived by using only the parallel coordinate display. Visualization of a large number of samples unavoidably produces a clutter of polylines, making the illustration busy. However, visualizing extreme values and samples is a useful technique to analyse environmental datasets and may prove helpful in detecting data errors in an early stage of analysis. To aid interspecies calibration
and visualize differences, moss species could be indicated by an additional variable. Scatterplots contain a lot of information and are excellent for studying correlations between the metals. However, the scaling is very important and may affect the visual impression considerably. The integration of a spatial reference allows for recognition and exploration of spatial trends in the distribution of the concentrations of metals, but the size of the area together with the sampling density determines the readability of the X–Y scatterplot illustrating the geographic location of the samples. The power of a scatterplot matrix lies in providing insight into multivariate space, but for efficient use with spatial data a link to a GIS map display would be desirable. An example of dynamic linking of an EDA package to a GIS is the integration of Xgobi with ArcView (Symanzik et al., 2003). When comparing the three surveys, the problems of having many variables and limited computer screen size cannot be overcome without extensive zooming or without interactively selecting and deselecting some of the variables. Producing readable illustrations of the exploration results may also become a problem. In the case of more than ten variables, it might be a good idea to use dimension reduction techniques, such as PCA (Principal Component Analysis), to limit the number of variables to be displayed. The information presented by the scatterplots of quartiles in the last (year) column of the matrix in Fig. 11 is similar to that of the quartile curves (Fig. 4), but the colored rectangle illustration is visually more appealing and easier to understand. Even if the data are exactly the same, scatterplots of the quartiles are superior to numerical presentation and graphs by offering both overview and comparison. In addition, the correlations of all pairs of variables are visible.
For quantitative estimation, one may use other percentiles, in order to represent more than 50% of the data and even fit a mathematical function to describe the correlation.
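The suggestion of using other percentiles and fitting a function can be sketched as follows. This is an illustration under stated assumptions only: deciles are used in place of quartiles, a straight line is fitted by ordinary least squares to the paired decile values of two metals, and the function names and data are hypothetical.

```python
from statistics import mean, quantiles


def deciles(values):
    """Return the 10th to 90th percentiles (nine cut points)."""
    return quantiles(values, n=10, method="inclusive")


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical concentrations of two metals in one survey.
metal_a = list(range(0, 11))            # 0, 1, ..., 10
metal_b = [2 * a + 3 for a in metal_a]  # exactly linear in metal_a

slope, intercept = fit_line(deciles(metal_a), deciles(metal_b))
print(slope, intercept)  # 2.0 3.0
```

The deciles cover 80% of the data between the lowest and highest cut points, compared with 50% for the inter-quartile range. Repeating the fit for each survey and comparing the fitted slopes would give a numerical counterpart to the visual comparison of the rectangles, although with real data a straight line may not be an adequate description.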
6. Conclusions
In this paper a visual approach for integrating moss monitoring data from three surveys conducted at 5-year intervals is presented. A dataset containing the concentrations of five metals, their spatial coordinates, and a temporal variable can be visualized simultaneously without loss of information. The results indicate that this insight into high-dimensional space gives valuable knowledge about multi-element outliers, spatial clusters, and temporal changes in correlations of metal concentrations. Compared to commonly used histograms, quartiles, and element maps, visualization can provide more information, is faster, simple to use, and interactive; therefore, the visualization approach can be recommended in the initial stages of data analysis. The disadvantages of using high-dimensional visualization for multivariate spatio-temporal data are related to the limitation in screen space, varying concentration ranges of chemical elements, and the documentation of the visual thinking process. One might argue that visual exploration is subjective, in that important features in the plots may be found or missed. This is, however, not truly subjective given that the data only present the facts that have to be discovered. The results of this study suggest that high-dimensional visualization provides an objective approach for integrating and exploring environmental monitoring data, which helps to maintain the initial data quality during the analysis.
Acknowledgements
This study has been financed by a grant from the Geological Survey of Sweden (SGU). The author would also like to thank H. Henkel for his encouragement and useful comments, and M. Ward for help with visualization software. The editor J.P. Bennett and two anonymous reviewers are thanked for their comments that led to significant improvements to the paper.
References
Äyräs M, Niskavaara H, Bogatyrev I, Chekushin V, Pavlov V, de Caritat P, et al. Regional patterns of heavy metals (Co, Cr, Cu, Fe, Ni, Pb, V and Zn) and sulphur in terrestrial moss samples as indication of airborne pollution in a 188,000 km²-area in Northern Finland, Norway and Russia. J Geochem Explor 1997;58:269–81.
Äyräs M, Pavlov V, Reimann C. Comparison of sulphur and heavy metal contents and their regional distribution in humus and moss samples from the vicinity of Nikel and Zapoljarnij, Kola Peninsula; Russia. Water Air Soil Pollut 1998;98:361–80.
Berg T, Steinnes E. Use of mosses (Hylocomium splendens and Pleurozium schreberi) as biomonitors of heavy metal deposition: from relative to absolute deposition values. Environ Pollut 1998;1:61–71.
Björklund A, Gustavsson N. Visualization of geochemical data on maps: new options. J Geochem Explor 1987;29:89–103.
Čeburnis D, Šakalys J, Armolaitis K, Valiulis D, Kvietkus K. In-stack emissions of heavy metals estimated by moss biomonitoring method and snow-pack analysis. Atmos Environ 2002;36:1465–74.
de Caritat P, Reimann C, Chekushin V, Bogatyrev I, Niskavaara H, Braun J. Mass balance between emission and deposition of trace metals and sulphur. Environ Sci Technol 1997;31:2966–72.
de Caritat P, Reimann C, Bogatyrev I, Chekushin V, Finne TE, Halleraker JH, et al. Regional distribution of Al, B, Ba, Ca, K, La, Mg, Mn, Na, P, Rb, Si, Sr, Th, U and Y in terrestrial moss within a 188,000 km² area of the central Barents region: influence of geology, seaspray and human activity. Appl Geochem 2001;16:137–59.
Fernandez JA, Rey A, Carballeira A. An extended study of heavy metal deposition in Galicia (NW Spain) based on moss analysis. Sci Total Environ 2000;254:31–44.
Fernandez JA, Ederra A, Nunez E, Martinez-Abagair J, Infante M, Heras P, et al. Biomonitoring of metal deposition in northern Spain by moss analysis. Sci Total Environ 2002;300:115–27.
Ggobi Data Visualization System, http://www.ggobi.org.
Grodzinska K, Szarek-Lukaszewska G, Godzik B. Survey of heavy metal deposition in Poland using mosses as indicators. Sci Total Environ 1999;229:41–51.
Gustavsson N, Lampio E, Tarvainen T. Visualization of geochemical data on maps at the Geological Survey of Finland. J Geochem Explor 1997;59:197–207.
Halleraker JH, Reimann C, de Caritat P, Finne TE, Kashulina G, Niskavaara H, et al. Reliability of moss (Hylocomium splendens and Pleurozium schreberi) as bioindicator of atmospheric chemistry in the Barents region: interspecies and field duplicate variability. Sci Total Environ 1998;218:123–39.
Herpin U, Berlekamp J, Markert B, Wolterbeek B, Grodzinska K, Siewers U, et al. The distribution of heavy metals in a transect of the three states the Netherlands, Germany and Poland, determined with the aid of moss monitoring. Sci Total Environ 1996;187:185–98.
Krauss-Kessler T, Dietl C, Tritchler J, Peichl L. Temporal and spatial trends of metal contents of Bavarian mosses Hypnum Cupressiforme. Sci Total Environ 1999;232:13–25.
Kürzl H. Exploratory data analysis: recent advances for interpretation of geochemical data. J Geochem Explor 1988;30:309–22.
Ötvös E, Pazmandi T, Tuba Z. First national survey of atmospheric heavy metal deposition in Hungary by the analysis of mosses. Sci Total Environ 2003;309:151–60.
Poikolainen J, Kubin E, Piispanen J, Karhu J. Atmospheric heavy metal deposition in Finland during 1985–2000 using mosses as bioindicators. Sci Total Environ 2004;318:171–85.
Real C, Aboal JR, Fernandez JA, Carballeira A. The use of native mosses to monitor fluorine levels, and associated temporal variations, in the vicinity of an aluminium smelter. Atmos Environ 2003;37:3091–102.
Reimann C, Filzmoser P. Normal and lognormal distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data. Environ Geol 1999;39:1001–14.
Reimann C, de Caritat P, Halleraker JH, Finne TE, Kashulina G, Bogatyrev I, et al. Regional atmospheric deposition patterns of Ag, As, Bi, Cd, Hg, Mo, Sb, and Tl in a 188,000 km² area in the European Arctic as displayed by terrestrial moss samples: long range atmospheric transport versus local impact. Atmos Environ 1997;31:3887–901.
Reimann C, Äyräs M, Chekushin V, Bogatyrev I, Boyd R, de Caritat P, et al. Environmental geochemical atlas of the central Barents region. NGU-GTK-CKE special publication. Trondheim: Geological Survey of Norway; 1998. 82-7385176-1.
Reimann C, Kashulina G, de Caritat P, Niskavaara H. Multielement, multi-medium regional geochemistry in the European Arctic: element concentration, variation and correlation. Appl Geochem 2001a;16:759–80.
Reimann C, Niskavaara H, Kashulina G, Filzmoser P, Boyd R, Volden T, et al. Critical remarks on the use of terrestrial moss (Hylocomium splendens and Pleurozium schreberi) for monitoring airborne pollution. Environ Pollut 2001b;113:41–57.
Reimann C, Filzmoser P, Garrett RG. Factor analysis applied to regional geochemical data: problems and possibilities. Appl Geochem 2002;17:185–206.
Reimann C, Siewers U, Tarvainen T, Bityukova L, Eriksson J, Gilucis A, et al. Agricultural soils in Northern Europe: a geochemical atlas. Geologisches Jahrbuch, Sonderhefte, Reihe D, Heft SD, vol. 5. Stuttgart: Schweizerbart'sche Verlagsbuchhandlung; 2003.
Rühling Å, Steinnes E. Atmospheric heavy metal deposition in Europe 1995–1996. Nord 1998;15:1–67.
Rühling Å, Tyler G. An ecological approach to the lead problem. Bot Not 1968;121:321.
Rühling Å, Tyler G. Heavy metal deposition in Scandinavia. Water Air Soil Pollut 1973;2:445–55.
Rühling Å, Rasmussen L, Pilegaard K, Mäkinen A, Steinnes E. Survey of atmospheric heavy metal deposition in the Nordic countries in 1985. Nord 1987;21:1–44.
Schröder W, Pesch R. Spatial analysis and indicator building for metal accumulation in mosses. Environ Monit Assess 2004;98:131–55.
Selinus O. Large-scale monitoring in environmental geochemistry. Appl Geochem 1996;11:251–60.
Steinnes E. A critical evaluation of the naturally growing moss to monitor the deposition of atmospheric metals. Sci Total Environ 1995;160/161:243–9.
Sucharova J, Suchara I. Distribution of 36 element deposition rates in a historic mining and smelting area as determined through fine-scale biomonitoring techniques: Part I. Relative and absolute atmospheric deposition levels detected by moss analysis. Water Air Soil Pollut 2004;153:205–28.
Symanzik J, Swayne DF, Temple Lang D, Cook D. Software integration for multivariate exploratory data analysis. http://www.math.usu.edu/~symanzik/, Book Chapter.
Tukey JW. Exploratory data analysis. Reading: Addison Wesley; 1977. 506 pp.
Tyler G. Moss analysis: a method for surveying heavy metal deposition. In: Englund HM, Berry WT, editors. Proceedings of the 2nd international clean air congress. New York: Academic Press; 1970.
Wappelhorst O, Kühn I, Oehlmann J, Markert B. Deposition and disease: a moss monitoring project as an approach to ascertaining potential connections. Sci Total Environ 2000;249:243–56.
Wolterbeek HTh, Verbug TG. Atmospheric metal deposition in a moss data correlation study with mortality and disease in the Netherlands. Sci Total Environ 2004;319:53–64.
XmdvTool, multivariate data visualization tool. http://davis.wpi.edu/~xmdv/.
Zechmeister HG. Correlation between altitude and heavy metal deposition in the Alps. Environ Pollut 1995;89:73–80.