Kriging-based approach to predict missing air temperature data

Computers and Electronics in Agriculture 142 (2017) 440–449 Contents lists available at ScienceDirect Computers and Electronics in Agriculture journ...

Download PDF

1MB Sizes 0 Downloads 23 Views

Report

PDF Reader
Full Text

Computers and Electronics in Agriculture 142 (2017) 440–449

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture journal homepage: www.elsevier.com/locate/compag

Original papers

Kriging-based approach to predict missing air temperature data ⁎

MARK

Anastasiya Shtiliyanova, Gianni Bellocchi , David Borras, Ulrich Eza, Raphaël Martin, Pascal Carrère INRA, UMR Ecosystème Prairial, VetAgroSup, 63000 Clermont-Ferrand, France

A R T I C L E I N F O

A B S T R A C T

Keywords: Air temperature Daily and hourly resolutions Data-driven machine learning Inﬁlling Kriging-based temporal interpolation

The geo-statistical Kriging method is conventionally used in the spatial dimension to predict missing values in a series by utilizing information from neighbouring data, supported by the hypothesis that mathematical expectation is a function of distance between observations. By using a data-driven machine learning-based inferencing and exploration framework, this research applies a Kriging-based interpolation in the temporal dimension to ﬁll in data gaps in time-series of air temperatures. It assesses its performance for artiﬁcial gap scenarios (ranging in length from single one to six consecutive data points) generated using data with both daily and hourly resolutions from ﬁve sites in Europe (Laqueuille, France; Grillenburg, Germany; Monte Bondone, Italy; Oensingen, Switzerland; Rothamsted, United Kingdom) and one in France overseas (Sedael, Réunion Island). Results show that the method is capable of predicting missing temperatures with acceptable accuracy, especially with the hourly resolution and for non-high elevation sites: modeling eﬃciency (EF ≤ 1, optimum) > 0.8, with the exception of Monte Bondone, placed at > 2000 m a.s.l. (EF < 0). With daily data, maximum temperature was correctly predicted at all sites (0.6 ≤ EF < 0.9), while some less accuracy (down to EF < 0.4) was noted when predicting missing daily minimum temperatures. In conclusion, the method appears suitable to be applied to ﬁll in hourly temperature gaps, requiring more stringent hypotheses concerning daily data and mountain sites (but further studies are required to draw concluding recommendations).

1. Introduction Long records of weather data are needed for evaluating scenarios in natural resource studies (e.g. Shenk and Franklin, 2001; Mavi, 2004). In many of these studies, surface weather observations are the fundamental forcing data of simulation models, for which the absence of a particular station’s data at a given time can introduce biases into responses or generate spurious trends (e.g. Hoogenboom, 2000). Missing data are a common problem in meteorological observational datasets. Station history records are often incomplete (e.g. Menne et al., 2010) because instruments may break in malfunction, or data transmission may be interrupted. Meteorological phenomena such as precipitation, snow or ice (but also destruction by animals) may even cause the temporary failure to record observations. Likewise, corrosion is a common problem linked to system failure in weather stations and data loggers. Examples of problems that occur when weather stations are not regularly inspected and maintained can be easily found (e.g. http:// www.surfacestations.org/odd_sites.htm). Ensuring regular maintenance of equipment and continuity in the collection of station metadata is critical to long-term station operation and interpretation of the data (e.g. NRCS, 2009). Although ﬁeld maintenance and calibration of ⁎

instrumentation is expected to be performed by the respective agencies, budgetary and staﬃng limitations may prevent routine inspections of weather stations. Consequently, inattention to maintenance has often been identiﬁed as the greatest source of failure in weather stations and networks (Davey et al., 2006, 2007). The process of replacing missing data with substituted values (imputation) is known in ﬁelds like meteorology, ecology, climatology or geosciences (e.g. Zhang and Schultz, 1990; Simolo et al., 2010; McCandless et al., 2011; Taugourdeau et al., 2014). In climate change impact studies, in particular, a good resolution of weather data in time and space is needed for creating complete datasets of past climate as a basis for projecting future climate realizations (Ruane et al., 2015). Missing data pose the challenge (both conceptual and technical) of ﬁnding a suitable way for replacing the missing values with a prediction (Schneider, 2001; Gelman and Hill, 2006). Indeed, the development of approaches for ﬁlling gaps in weather time series is a key challenge for improving data usability, either for statistical description of the meteorology of a given weather station or in the view of complex climatological and ecological studies (e.g. Linacre, 1992). At the same time, there is a need to preserve the existing dataset by replacing missing data with a “probable value”. The latter depends on other available

Corresponding author. E-mail address: [email protected] (G. Bellocchi).

http://dx.doi.org/10.1016/j.compag.2017.09.033 Received 23 March 2017; Received in revised form 18 September 2017; Accepted 23 September 2017 0168-1699/ © 2017 Elsevier B.V. All rights reserved.

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

the understanding that the Kriging interpolation method (whose parameterization is developed on the available observations) holds great potential to reconstruct missing temperature data (the focus of this study). Moreover, comparatively to other MAR methods, it can also be relatively easily implemented and ﬂexibly applied. There are several versions of the Kriging method (Supplementary material). Here, we use the Ordinary Kriging (OK), which is computationally practical and easier to implement than other Kriging variants, that is, it is not necessary neither external data nor an additional knowledge to determine its parameters. Kriging permits to compute an unsampled value (z), knowing its coordinates (x, y) and neighbours. The Kriging methods are generally applied for data spatialization, where the spatial dimension concerns the position of the data and the distance between location points is taken into account. To give the prediction of z (̂ s0) to the unknown value z (s0 ) , the general equation of the OK is a linear combination of n known sample values at points si around s0, based on the vector of n observations at primary locations and the vector of Kriging weights. The latter reﬂect the spatial structure of the method, taking into account the neighbour observations. To compute the Kriging weights we followed Matheron (1963) and Gandin (1963), who introduced correlation functions between neighbouring values, also called semi-variances (Hengl, 2007, 2009). The experimental semi-variance is computed based on the lag vector representing separation between two spatial locations, the vector of spatial sample coordinates, and the number of sample pairs separated by lag. The formula is applied on the pairs of points. Using this formula, a semivariogram is produced to describe the spatial auto-correlation of the variable. Compared to other methods (e.g. covariance), the key advantages of semi-variograms are: the method is based on raw data and requires no pre-calculated indicators (e.g. means, minima or maxima); both linear and non-linear changes can be detected; the identiﬁed change is in relation to expected dynamics, which represent the whole series considered, not just the start and end of a given series; and the dynamics of the series can be analysed, as changes in semi-variogram parameters relate to changes in diﬀerent aspects of the series. In general, the semi-variogram is ﬁtted using authorized variogram models like linear, spherical, exponential, Gaussian, etc. A semi-variogram is described by its sill (the semi-variogram upper bound), practical range (the distance at which the semi-variogram reaches the sill) and nugget eﬀect (a discontinuity of the semi-variogram that can be present at the origin, typically attributed to micro-scale eﬀects or measurement errors). An important hypothesis is that the Kriging assumes some form of stationarity in the variable under study. This could be a limitation in its use and generalization because time series are often statistically described as a random process in which extremes and seasonalities are more dominant than autocorrelation and stationarity (e.g. Cressie and Wikle, 2011). Diﬀerent types of non-stationarity exist (e.g. Meul and Van Meirvenne, 2003; De Benedetto et al., 2012) and, in practice, stationarity is often not guaranteed for large distances (Rivoirard, 2005). For instance, a phenomenon may appear stationary locally, whereas it may be non-stationary over longer distances, e.g. in the presence of a large-scale trend. Stationarity is an important condition that is relevant to gap-ﬁlling in general (Seitchik, 2012). In this study, interpolations are based on the gradients between points of the nearest neighbours in the time series. Missing values are ﬁlled in based on all the characteristics inherent to the historical set. They are used within a geo-statistical interpolation method to calculate the best linear unbiased predictor in the geo-statistical context (e.g. Journel, 1986) by accommodating records by means of a memory-based process (after concepts by Enzi and Camuﬀo, 1996; Diodato and Bellocchi, 2014). Using the memory-based concept, any temperature value in a time series is aﬀected by its previous neighbours and, in turn, aﬀects its next-neighbour values. It is based on these principles that interpolation techniques have been developed to handle continuous values using data points on either side of the data gap (Koehler, 1977; Pielke, 1984; Miller, 1990).

information, including historical statistics and the climate conditions of the place of interest. This is where some understanding of the process under study is required because the density, distribution and range of the variable used to create an imputation function help determine which technique is most suitable for the gap-ﬁlling purpose. Techniques for ﬁlling missing observations in a time series include temporal interpolations, and rely on the relationship between available and missing data (Henn et al., 2013). There are three general explanations for missingness of the data (Rubin, 1987): Missing At Random (MAR), Missing Completely At Random (MCAR) and Missing Not At Random (MNAR). If values are missing at random (MAR), then the available data may be representative of the population. MCAR is a special case of MAR and occurs when the events leading to data missingness are independent on both observable and unobservable factors. MNAR happens when the missing values depend on other missing values, that is, one or more factors are impossible to quantify and identify (Schafer and Graham, 2002). The prediction of missing weather data can rely on the MAR mechanism, because the exact moment when one device stops functioning is generally unknown (here, we exclude situations in which recurrent problems are identiﬁed, e.g. weather stations suﬀering power supply overload or voltage violation during long periods, e.g. Cheng et al., 2009). In this study, imputation is applied to a weather variable, i.e. air temperature (sampled at both hourly and daily resolutions), an important factor controlling most physical, chemical and biological processes on Earth, which are key to many environmental studies and the management of Earth surface resources. For instance, air temperature is a common indicator of ecological eﬃciency as it is related to fundamental characteristics of natural and agricultural systems, e.g. plant processes such as growth, development, sugar partitioning and stress sensing and response (e.g. White et al., 2005). Most agro-climatic indicators are based on air temperature (alone or in combination with other meteorological variables) in order to predict conditions of drought stress based on water balance calculations (e.g. Matthews et al., 2008). Energy-based equations have also been put forward for climate characterization, where the balance of incoming radiation (playing a major role in ﬁeld ecology) can be predicted by air temperature data (e.g. Bellocchi, 2011; Bojanowski et al., 2013). In urban studies, the diﬀerence in air temperature between urban and rural locations within a given time period is a frequently used metric to describe heat islands (Fabrizi et al., 2010). The above examples reﬂect the importance of air temperature in applications of wide scientiﬁc and engineering interest. Although this primary variable is available in many locations, temperature data series can contain gaps ranging from several hours to several days. Aim of this study is to assess a methodology for gap-ﬁlling of hourly and daily temperature series based on a learning process contingent on historical data series. Geo-statistical techniques, in particular, for which near spatial data values are more related to each other than distant data values (Tobler, 1970), are suﬃciently versatile to be extended to the temporal dimension (e.g. Rossi and Posa, 1991; Guccione et al., 2012; Zehn and Cai, 2015). This study provides an approach for exploring and assessing the skill of a geo-statistically based method (1-dimensional case Kriging interpolation) by randomly removing available temperature data from time series at a variety of sites and then evaluates the ability of the technique to reconstruct the gaps, with interest in the impact of the length of missing periods and the fraction of overall missing data. 2. Methodology 2.1. Kriging-based imputation of missing data In Supplementary material, we provide a brief review of the main approaches for the treatment of MAR data in general (also taking into account developments in data missingness made in other domains, e.g. medical surveys, Blankers et al., 2010). Through this review, we gained 441

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Table 1 List of available sites. Country

Site

Latitude (°N)

Longitude (°E)

Altitude (m a.s.l.)

Time resolution

Years of available data

Germany Italy Switzerland France (metropolitan) France (overseas) United Kingdom

Grillenburg Monte Bondone Oensingen Laqueuille Sedael Rothamsted

50.95 45.98 47.29 45.65 −21.30 51.80

13.50 11.03 7.71 2.73 55.55 -0.36

375 2138 466 1020 1000 132

Hourly Hourly Hourly Hourly Hourly Daily

2004–2008 2003–2007 2002–2009 2002–2011 2006 1981–2011

evaluated by comparing the ﬂuctuations of the observed and the gapﬁlled datasets, while also inspecting extreme values possibly generated by the Kriging. In addition, a set of performance indices was calculated:

To account for the inﬂuence of local factors on the variable of interest, a data-driven machine learning-based inferencing and exploration framework was developed to extract the values of relevant parameters (still, practical range and nugget eﬀect; Supplementary material, Section 3) for a posterior use in Kriging prediction (Hengl, 2007, 2009). The range, by itself, does not provide suﬃcient guidance to determine the Kriging neighbourhood (Cressie, 1993). For that, with data placed on the time scale, a leave-one-out cross-validation approach was used to make the maximal use of the data available, that is, one value was temporary removed from the existing dataset where the remaining values were used by the OK as learning set (Appendix A).

• The mean bias error (BIAS), i.e. the mean diﬀerence between the • •

2.2. Evaluation of the method To illustrate the generation process of gaps, it was calculated how many times a given hour was chosen to be a gap with gaps of given length. This test, in which gaps may overlap, served to check if cases of nested or chained gaps were handled and how the gaps were positioned. The performance of the Kriging-based imputation method was assessed against temperature datasets from ﬁve sites in Europe and one site situated in the Indian Ocean (Réunion Island, France overseas) (Table 1). For four of the European sites and the Réunion, hourly temperature data were available while for Rothamsted (United Kingdom) the multi-year dataset included daily maximum and minimum temperatures. Daily maximum and minimum temperature values were extracted from all hourly datasets, with the exception of Sedael, for which the only year available was insuﬃcient to obtain a convenient set of data. To evaluate the method developed, it was necessary to generate missing values from the complete datasets. For the testing purpose, gaps of one to six hours and of one to six days were generated on hourly and daily datasets, respectively. The hourly data used for the tests were based on the year 2006, extracted from all the sites with hourly resolution. A number of gaps equal to 18% of the total length of the available dataset was applied to both hourly and daily series, with gaps randomly located into the series. In this way, gaps were obtained that were not equally separated and could also have adjacent position. The chosen proportion of 18% corresponds to ∼90% of total gaps when ﬁve-hour length gaps were created, thus leaving ∼10% of data available for learning (with six-hour length gaps, the generation stops at 100% holes and the study gives an empty period, based solely on statistical learning). The method was tested with all the types of gaps, with 200 runs (200 simulations of the same year) at each site with the corresponding type of data (hourly or daily). At the beginning of each simulation, a new gapped set was generated. The method was ﬁrst

•

predicted and observed values, indicates the tendency of the method to over-predict (positive values) or under-predict (negative values) data on average. The mean absolute error (MAE) is a common measure of the average error magnitude, which is more appropriate than the root mean square error (Willmott and Matsuura, 2005) when the error distribution is expected not to be Gaussian (Chai and Draxler, 2014). The mean absolute percent error over the mean (MAPE) compares diﬀerent series of data of diﬀerent magnitude (Armstrong, 1985). For both MAE and MAPE, the further is the value from 0, the larger is the error by the method. The agreement between predictions and observations was also evaluated by using the modeling eﬃciency (EF) by Nash and Sutcliﬀe (1970), ranging from −∞ to 1 (the latter being the optimum value, while negative values indicate that the observed mean is a better predictor than the method).

3. Results and discussion 3.1. Semi-variogram analysis The spherical variogram model was used to calculate Kriging parameters (still, practical range and nugget eﬀect) because the relationship with the distance on the time scale varied too much with other models (data not shown). Figs. 1 and 2 show the semi-variogram plots with the ﬁtted model at each site for both hourly and daily (minimum and maximum) temperatures. Some series exhibit evidence of trend in the data, manifesting itself as the rising portion of the experimental semi-variogram. This behaviour of sample semi-variances, which is especially evident in Fig. 1 (with the exception of Sedael), may indicate that data (and the random process) are non-stationary in time due to the presence of a trend in the mean, and the stationarity as a null hypothesis is rejected (Cressie, 1993; Cressie and Wikle, 2011). However, with the exception of hourly temperature series at Grillenburg (Fig. 1), the monotonic increasing trend of the empirical semi-variograms does not manifest itself before the sill value is reached, which would indeed indicate the presence of diﬀerent phenomena (which may or not have a physical interpretation) and the need of calculating other semi-variograms (e.g. Pardo-Igúzquiza et al., 2005). It is therefore of limited consequence for local prediction of values (i.e. prediction of

Fig. 1. Sample semi-variances (blue continuous line) of the hourly temperature at ﬁve sites with spherical model (orange dotted line) ﬁtted to them. The range/sill point of the model is denoted by a squared dot. (For interpretation of the references to colour in this ﬁgure legend, the reader is referred to the web version of this article.)

442

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Fig. 2. Sample semi-variances (blue continuous line) of the daily minimum (top) and maximum (bottom) temperatures at ﬁve site with spherical model (orange dotted line) ﬁtted to them. The range/sill point of the model is denoted by a squared dot. (For interpretation of the references to colour in this ﬁgure legend, the reader is referred to the web version of this article.)

hour length gaps were created in a single year. The bars represent the mean position of the hours chosen to be a gap during the ﬁrst 200 h in a one-year simulation (24 ∗ 365 = 8760 h). Fig. 3b gives the probability, over 200 simulations (i.e. 200 repetitions of one-year simulation), for a given hour to be missing based on ﬁve-hour length gaps. Fig. 3c, d represents the proportion of gaps for the previous two cases, calculated on the whole year. Long gaps (in number of hours) clearly limit the number of available observations. This is visible in Fig. 3c, which shows the proportion of gaps of ﬁve hours, while Fig. 3d gives the proportion of gaps of ﬁve hours during one year and 200 runs.

values separated by less than the range of correlation from the weighted points). This is satisfying from the viewpoint of prediction, since typical Kriging predications depend primarily on semi-variogram values at short distances (e.g. Nordman and Caragea, 2008). Due to this fact, consistent with previous studies (e.g. Dowdall et al., 2005), no attempt was made to de-trend the data using an appropriate function (e.g. Universal Kriging, Cressie, 1993). The semi-variogram parameters indicate that we do not deal here with the nugget eﬀect, which means there are no variations for little distances. With hourly temperatures (Fig. 1), the practical range varies between 2 (Laqueuille, France) and 5 (Grillenburg, Germany) hours. At Grillenburg, where both range and sill are high, the semi-variogram is well-structured indicating that paired samples have inﬂuence on each other regardless of the distance. With daily resolution (Fig. 2), a set of well-structured semi-variograms also conﬁrmed the presence of a temporal dependence structure in the data. In particular, the French site of Laqueuille showed high values of both sill and range.

3.3. Fluctuations of inﬁlled datasets After obtaining the new values for the missing slots, we compared the original and the inﬁlled datasets. With hourly data series, the assessment was based on the year 2006, the only available for the site of Sedael and common to any other site of Table 1. An average predicted value of temperature was obtained thanks to 200 simulations. All the imputations of hourly missing data were made with the Kriging based on eight neighbours surrounding the missing value. Some illustrative results are presented, which show that the general tendency of distributions was respected at all the sites. At Grillenburg, for instance, the

3.2. Generation of missing data Fig. 3a shows (for a situation of temperatures in January and February) how many times a given hour was chosen to be a gap when ﬁve-

(a)

number of times one hour is used as a gap

3.5 3.0

(b)

probability of an hour to be a gap

1.0

2.5

0.8

2.0

0.6

1.5 1.0

0.4

0.5

0.2 0.0

0.0

hour (1 to 200)

(c)

hour (1 to 200)

(d) 17.69%

16%

82.31% 84%

443

Fig. 3. Representation of the gap creation process (gaps of ﬁve hours length, including overlapping gaps). During the simulation the gaps are represented by temperature data from randomly selected missing hours. In (a) the bars represent the selected missing hours during the ﬁrst 200 h of a given yearly simulation. In (b) the bars represent the probability for an hour to be a gap (probability obtained during 200 repeated yearly simulations). In (c) and (d), the pie charts of the yearly simulations of (a) and (b), respectively are presented, with the proportions of missing data (red) and available hourly observations (blue) during the ﬁrst 200 h of a simulation. (For interpretation of the references to colour in this ﬁgure legend, the reader is referred to the web version of this article.)

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Fig. 4. Predicted and observed temperature values for one hour (a) and four hour (b) length gaps at Grillenburg (Germany) in the months January and February of 2006.

Teŵperature (°C)

Teŵperature (°C)

Grillenburg / January-February / 1h00 gaps

20 15 10 5 0 -5 -10 -15

20 15 10 5 0 -5 -10 -15

(a)

observaƟons

esƟŵates

hours

(b)

Grillenburg / January-February / 4h00 gaps observaƟons

esƟŵates

hours

ﬂuctuations of observed and gap-ﬁlled hourly data for the months January and February compared well with gaps of both one (Fig. 4a) and four hours (Fig. 4b). Some extremes were introduced by the replaced data: when the extreme is situated at the beginning of the dataset this could be considered as a frontier problem, due to the limited number of neighbours for the Kriging. When the extreme is at a different place, problems can arise if the ﬁrst closest neighbour of the missing value is placed far. If the neighbour is an extreme itself, this could be a problem for the generation of the new value because the method does not take into account this situation and the predicted value could also be extreme. A good ﬁtting was obtained with four-hour length gaps at Oensingen in July-October with only small diﬀerences between predicted and actual values (Fig. 5a). Our results indicate that the performance of the method at the mid-mountain sites (∼1000 m a.s.l.) of Laqueuille (Fig. 5b) and Sedael (Fig. 5c) was also correct, with only a few extremes introduced by the method. At the high-altitude site (> 2000 m a.s.l.) of Monte Bondone (Fig. 5d), we observed more extremes and, particularly, a big distortion in August. This indicates that the accuracy of the method may be limited at places located in high-elevation regions. Big ﬂuctuations in temperature series actually occur at high-elevation sites due to a combination of largescale and local inﬂuences (e.g. Barry, 1994; Pepin and Norris, 2005) likely hindering their representativeness (Bhowmik and Costa, 2014), distorting the learning process and, ultimately, altering the ﬁnal predictions. Nonetheless, this argumentation requires to be further investigated. In fact, given the limited number of sites in this study, we cannot identify the speciﬁc impact of elevation as primarily responsible for observed responses. Daily maximum and minimum temperature data were available at Rothamsted. For the other sites, maximum and minimum daily temperatures were extracted from the hourly series. In this case, the appropriate number of neighbours, obtained with the statistical learning based on a leave-one-out cross-validation approach, was diﬀerent across sites. We assessed two datasets of results over multiple years (A and B), each of them corresponding to a diﬀerent neighbourhood (Table 2). For maximum daily temperature, the ﬂuctuations of real and gap-ﬁlled values obtained at Rothamsted (Fig. 6a, years 2001–2011) with the two sets of neighbourhood were similar, though the diﬀerence between the distributions (extremes) is more important compared to the ones obtained with hourly data. The ﬂuctuations of Oensingen (Fig. 6b, years 2005–2009) were more limited but some extremes are still visible, which were not reﬂected by the method. Finally, the tests

performed at Laqueuille (Fig. 6c, years 2004–2009) showed that the two ﬂuctuations follow almost the same tendency, though the gapﬁlling method produced bigger extremes than the real ones. 3.4. Performance metrics The means of observed and predicted values were compared at each site on both hourly and daily series. The comparison of the mean observed value and predicted values is based on the average value of 200 runs for all the types of gap length at each site. For hourly data, Table 3 shows that the means and standard deviations between predictions and observations are comparable, in spite of the extremes observed above. Performance indices are in Table 4. With the exception of Laqueuille and Sedael, negative biases indicated a tendency of the method to under-predict data on average, especially at the high-altitude site of Monte Bondone (BIAS = −0.08). The greatest mean absolute error (MAE ∼ 1.4) was at Grillenburg. According to the mean absolute percent error over the mean, Sedael (MAPE ∼ 5%) and Monte Bondone (MAPE > 18%) are the easiest and the most diﬃcult site to gap-ﬁll, respectively. The diﬃculty to predict missing temperatures at Monte Bondone is conﬁrmed by the negative value of Nash-Sutcliﬀe eﬃciency calculated at this site, while EF was higher than 0.8 at the other sites. The conclusion for the imputation of missing hourly temperature data is that the method gives satisfactory results, although some limitations are apparent at the mountain site of Monte Bondone (2138 m a.s.l.). An indication (to be further explored) is therefore that the validity domain of the method could be limited to stations below 2000 m a.s.l. On the other hand, we detected some extreme values produced by the method and not present in the observational series, or not detected by the method in the observational series when present. Considering that OK cannot fully take into account all the extreme values, the production of (unobserved) extremes could be limited by applying a minimal and maximal distance between the missing value and the neighbour chosen to give support information to the method. Further studies could provide probability predictions of extreme values that have reliable uncertainty estimates (e.g. Coles, 2001). The average values and the performance metrics were calculated on daily temperature data series with two sets of neighbours (Table 2), identiﬁed at each site for maximum and minimum daily temperatures. Table 5 (average values and standard deviations) and Table 6 (performance metrics) show that maximum temperature missing data are better predicted than minimum temperature data. According to BIAS, 444

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Fig. 5. Predicted and observed real temperature values for gaps on hourly data (year 2006) with four hour length gaps at four sites: (a) Oensingen (Switzerland); (b) Laqueuille (France); (c) Sedael (La Réunion); (d) Monte Bondone (Italy).

Oensingen / July-October / 4h00 gaps

Teŵperature (°C)

real value esƟmated value observaƟons esƟŵates

40 30 20 10 0 -10 -20 -30

(a)

hours Laqueuille / July-October / 4h00 gaps

(b)

observaƟons esƟŵates real value esƟmated value

Teŵperature (°C)

40 30 20 10 0 -10 -20

hours Sedael / July-October / 4h00 gaps

Teŵperature (°C)

30

observaƟons esƟŵates real value esƟmated value

(c)

25 20 15 10 5 0

hours MonteBondone / July-October / 4h00 gaps

Teŵperature (°C)

observaƟons

50 40 30 20 10 0 -10 -20

esƟŵates

(d)

hours

minimum daily temperatures, all the values of the modeling eﬃciency are greater than 0.6 with set A and go down to 0.36 with set B at Monte Bondone. Overall, this analysis indicates that the smaller the neighbourhood necessary to parameterize the method the better are the results. This factor requires to be assessed during the learning procedure, which informs of the number of neighbours leading to satisfactory results in the gap-ﬁlling. The application of the learning procedure (Appendix A) is therefore required to assess alternative parameter values by considering the number of neighbours together with other Kriging parameters.

Table 2 Two sets of neighbourhood at each site. Site

Neighbourhood A

Neighbourhood B

Rothamsted Grillenburg Oensingen Laqueuille Monte Bondone

14 14 20 23 22

26 16 22 28 24

some diﬀerences are apparent depending on the set of neighbours. At Oensingen, for instance, we observed a diﬀerence in the sign of bias for minimum temperatures (set A: +0.07, set B: −0.27). The mean absolute error (MAE) showed the same pattern with elevation for both maximum and minimum daily temperatures with no important diﬀerences between sets of neighbours (slightly higher values at Rothamsted and Monte Bondone with set B). Similarly, with MAPE the two sets gave similar results. In spite of some diﬀerences between maximum and

4. Summary and conclusions 4.1. Summary of ﬁndings of the gap-ﬁlling analysis Based on a critical review of current gap-ﬁlling techniques, a Kriging-based method for inﬁlling missing air temperature data was evaluated. The method makes a screen on available air temperature 445

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Fig. 6. Predicted and observed daily maximum temperature values (Tmax) for three-hour length: (a) Rothamsted (United Kingdom), number of neighbours equal to seven; (b) Oensingen (Switzerland), number of neighbours equal to 10; (c) Laqueuille (France), number of neighbours equal to 13.

Rothaŵsted / Tŵax / 7 neighbours / 2001-2011

Teŵperature (°C)

50

(a)

observaƟons

esƟŵates

40 30 20 10 0 -10

day

-20

Oensingen / Tŵax / 10 neighbours / 2005-2009

Teŵperature (°C)

40

(b)

observaƟons

esƟŵates

30 20 10 0 -10

hours

-20

Laqueuille / Tŵax / 13 neighbours / 2004-2009 40

observaƟons

(c)

esƟŵates

Teŵperature (°C)

20 0 -20 -40 -60 -80

hours

-100

data at a given site and extracts the most convenient parameter values for the Kriging technique. A learning process was applied to describe (statistically) the site, and the Kriging method (adapted to the temporal dimension) predicted (artiﬁcially generated) gaps in the original dataset. There is an element of originality in this study since Kriging interpolation and learning process, usually applied in isolation in existing studies, were joined in an integrated approach to impute missing values in observational air temperature time series. Based on this analysis with artiﬁcial gaps superimposed on actual datasets, better results were obtained when the method was applied to ﬁll hourly gaps. Indeed, the hourly resolution facilitates the memorization of the general proﬁle of diurnal temperature variations during the learning period and, ultimately, helps ﬁnding missing values during the Kriging application stage. In contrast, daily data increase the chance to have errors during the learning process because only minimum and maximum temperatures are used. They only represent the lowest and the highest points of the pattern of temperatures in the 24 h of a day, which may diﬀer day by day, according to the location of the sites (elevation and geographic context). For instance, air temperature lapse rate (i.e. the rate of decrease of near-surface air temperature with increase of elevation) is usually greater for maximum than minimum temperatures (Thornton et al., 1997). This is reﬂected in the limited success of the method in predicting minimum daily temperatures. In this study, we have not identiﬁed the best approach to predict missing temperature data, have not derived an approach of general validity, and are not claiming to extend it to other weather variables. Rather, we have pursued the question: “to which extent a Kriging-based

Table 3 Average values of predicted and observed hourly temperature data per site (six types of gaps and 200 runs per type and per site). Site

Grillenburg Oensingen Réunion Laqueuille Monte Bondone

Observations

Predictions

Mean (°C)

Standard error (°C)

Mean (°C)

Standard error (°C)

9.31 10.30 18.14 9.30 6.34

0.06 0.10 ∼0.00 0.01 0.09

9.37 10.40 18.14 9.28 6.43

0.01 0.02 0.01 ∼0.00 0.04

Table 4 Performance metrics (six types of gaps, 200 runs per type and per site) on hourly predictions. Site

Mean bias (°C)

Mean absolute error (°C)

Mean absolute error over the mean of observed values (%)

Mean modeling eﬃciency

Grillenburg Monte Bondone Oensingen Laqueuille Sedael

−0.06 −0.08 −0.07 0.01 0.00

1.38 1.16 0.97 1.01 0.88

14.81 18.33 9.40 11.89 4.84

0.92 −0.10 0.96 0.94 0.86

446

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

4.2. Future work on the Kriging-based method for gap-ﬁlling

Table 5 Average values of predicted and observed maximum and minimum daily temperatures (six types of gaps, 200 runs per type and per site) for two sets of neighbourhood (A and B). Site

Observations Mean (°C)

Kriging interpolation can be attractive to ﬁll in gaps in hourly temperature time series. More caution is instead required with daily data. Considering that further assessment is necessary, an indication is that the approach proposed here can be a valid and eﬃcient method to ﬁll in hourly temperature gaps at sites below 2000 m above sea level. When daily data series are to be gap-ﬁlled, the indication from this study is that the best predictions can instead be obtained at lower elevations (roughly below 500 m). Some caution has therefore to be exercised when applying the methods outside such constraints. These results provide valuable information on the degree of uncertainty that gap-ﬁlled data may introduce into temperature datasets which, when used for modeling or experimental analysis purposes, will be propagated through to the ﬁnal results. Uncertainty logically increases with gap length, which may depend on seasonal patterns. Knowledge of the uncertainty in the predicted values used to ﬁll gaps may help when interpreting results from models and experimental analysis. In climate change studies, for instance, the answer that scientists and institutional entities provide to the questions related to temperature increase (or decrease), frequency of extreme temperatures, relation of temperature with rainfall data, etc. are aﬀected by the way missing data have been ﬁlled in. This has implications on how imputed data are used and interpreted within a decision support system, for strategic planning or policy development (Rivington et al., 2006). The resolution of all these issues will require further studies. Moreover, this study has not been exhaustive because there are other imputation methods that could also be explored. For instance, Bayesian variants of Kriging, which put a prior distribution on parameters and update the distribution using the data (e.g. Truong et al., 2014), or conditional covariance functions (e.g. Strebelle, 2002), may be suited to overcome some of the problems inherent to the Ordinary Kriging. Further assessment includes formal evaluation of the Kriging-based approach in a comparative fashion with a wide range of gap-ﬁlling methods (which would make a companion paper to the present analysis) by including, for instance, simpliﬁed time series models (e.g. Pappas et al., 2014), ARIMA (Autoregressive Integrated Moving Average) and SARIMA (Seasonal Autoregressive Integrated Moving Average) imputation models (Kihoro et al., 2013) or direct modeling of diurnal data patterns (for hourly temperature), while also accounting better for the spatial dimension (e.g. Holmes et al., 2013) or extreme learning machine-based models (e.g. Mohammadi et al., 2015). The results obtained here, limited to a few weather stations, can inform subsequent multi-location assessments and extensions for ﬁlling gaps in time series that result from the measurement of other meteorological variables (for instance, evaporation). However, our tests (data not shown) inform that the use of similar approaches cannot be recommended to ﬁll data gaps in high-resolution wind speed and precipitation time series. Owing to the presence of zero-values in the datasets (which create asymmetrical boundaries), data of this type present indeed particular challenges for the development and validity of imputation approaches (e.g. Gustard and Demuth, 2008).

Predictions Standard deviation (°C)

Mean (°C)

Standard deviation (°C)

Neighbourhood A Maximum daily temperature Rothamsted 13.71 Grillenburg 13.85 Oensingen 14.65 Laqueuille 12.09 Monte Bondone 9.85

0.04 0.06 0.42 0.36 0.12

13.75 13.89 14.81 12.44 9.75

0.02 0.04 0.12 0.09 0.07

Minimum daily temperature Rothamsted 6.04 Grillenburg 4.81 Oensingen 6.17 Laqueuille 5.43 Monte Bondone 2.82

0.07 0.16 0.06 0.17 0.15

6.11 4.97 6.12 5.59 2.96

0.01 0.02 0.20 0.06 0.03

Neighbourhood B Maximum daily temperature Rothamsted 13.71 Grillenburg 13.85 Oensingen 14.65 Laqueuille 12.09 Monte Bondone 9.85

0.04 0.03 0.45 0.11 0.07

13.88 13.87 14.90 12.40 9.88

0.04 0.03 0.45 0.11 0.07

Minimum daily temperature Rothamsted 6.04 Grillenburg 4.81 Oensingen 6.17 Laqueuille 5.42 Monte Bondone 2.82

0.11 0.11 0.26 0.27 0.08

6.15 4.85 6.44 5.68 2.89

0.02 0.10 0.17 0.07 0.04

Table 6 Performance metrics (six types of gaps, 200 runs per type and per site) on daily predictions for two sets of neighbourhood (A and B). Mean bias (°C)

Mean absolute error (°C)

Mean absolute error over the mean of observed values (%)

Mean modeling eﬃciency

A

B

A

B

A

B

A

B

Maximum air temperatures Grillenburg −0.05 Monte Bondone 0.10 Oensingen −0.15 Laqueuille −0.35 Rothamsted −0.04

−0.03 −0.02 −0.25 −0.31 −0.17

0.48 0.52 0.85 1.19 1.87

0.49 0.51 0.85 1.20 2.12

3.47 5.23 5.78 9.84 13.66

3.51 5.17 5.71 9.95 15.44

0.88 0.59 0.86 0.61 0.69

0.88 0.70 0.85 0.75 0.76

Minimum air temperatures Grillenburg −0.16 Monte Bondone −0.15 Oensingen 0.07 Laqueuille −0.16 Rothamsted −0.07

−0.05 −0.07 −0.27 −0.26 −0.11

0.50 0.39 0.75 1.05 2.11

0.52 0.52 0.75 1.09 2.37

10.38 18.21 12.07 0.65 34.93

10.86 18.42 12.09 20.02 39.26

0.75 0.70 0.75 0.65 0.68

0.68 0.36 0.77 0.61 0.51

Site

Acknowledgement interpolation approach (adapted to the temporal dimension) can help ﬁlling in missing values in temperature time series?”. In trying to answer this question, we oﬀered new insights on gap-ﬁlling. A learning process combined to Kriging-based interpolation appears suﬃciently original and, to the best of our knowledge, has not been applied before to predict and analyse missing temperature values. The agricultural and ecological informatics community would thus beneﬁt from both the outcomes and the scripts produced by this research, so others may reproduce this research or extend it. The Java scripts implementing the algorithm for gap-ﬁlling are part of the set of tools of the modeling platform Vul’Clim for vulnerability assessment in agriculture (Bellocchi et al., 2014; Eza et al., 2015) and are available on request to authors.

The study was developed with funding from a grant (Bourse Recherche Filière) of the (former) region Auvergne of France and FEDER (European Regional Development Fund), and within the frame of an international research project named “FACCE MACSUR – Modelling European Agriculture with Climate Change for Food Security, a FACCE JPI knowledge hub”. The authors are thankful to Nazzareno Diodato (Monte Pino Met European Research Observatory, Italy) for his valuable guidance and constructive feedbacks. Appendix A. Imputation method The imputation method of missing temperature data implemented 447

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

in this study consists of two steps. In the ﬁrst step, the learning process is applied and statistical methods are used to extract values of Kriging parameters from the available data. In the second step, the adapted version of the Kriging is applied to ﬁll in the gaps of temperature data series.

After the prediction we kept the best results about the neighbours and we use it during the next section. A.2. The Kriging approach The Kriging approach that we suggest is based on the Kriging method described previously and on the parameter values extracted during the learning process. Our implementation follows the steps:

A.1. The learning process The learning process consists in the use of statistical methods for ﬁnding the most convenient parameters used by the Kriging. A twostage learning process was implemented, based on the creation of a statistical screening of the known dataset, followed by the application of cross-validation, using both the Kriging and the screening description.

i. Creation of a group of neighbours. The group of neighbours is deﬁned in the way of the screening description of the data during the learning process. ii. Prediction of the sill and the range. The prediction of the sill and ranges uses the sills and ranges obtained during the learning process. In addition we deﬁne a list of the closest neighbours Lcn for the missing measure. Thanks to this list, we deﬁne an average value for the sill and the ranges using the predicted sills and ranges by the learning process. iii. Prediction of the value for the missing measurement. Before using the Kriging method, we deﬁne an interval containing the most convenient values about the predicted value. This interval has limits (Ld ;Lu ) deﬁned as: Ld = M −5%·M and Lu = M + 5%·M where M is the average value of the neighbours obtained during the learning process. If the prediction belongs to the interval it is saved if not, it the missing measure is memorized. iv. Check. In this step we check if all missing measures has been predicted, if there are still gaps we iterate the process till predicting all the values and if there are still values that are not predicted we apply the linear interpolation. The linear interpolation is applied approximatively between 25% and 50%, depending on the type of the data, the size of the gap and the number of missing measurements.

(a) Statistical screening. The screening representation of the data consists in the creation of N datasets from the historical dataset which has N values. This step is necessary by the cross validation method (Daly et al., 2008). For the cross-validation purpose, each value is ﬁrst removed and then predicted using the Kriging method. Thus each new dataset has N −1 values. Then, for each missing value created by cross-validation, a number of neighbours is considered for comparison of the results obtained by cross-validation. For predicting missing hourly data, this number is about eight neighbours while for daily data it may vary (the number of neighbours is detailed in the section corresponding to experiences). After memorizing the number of neighbours for each missing measurement, the Kriging is applied. (b) Cross-validation and Kriging. After creating N datasets, the missing measurement is predicted and then compared it its real value. The Kriging approach (adapted to this study) follows the steps: (c) Construction of the theoretical variogram using the deﬁned group of neighbours. The group of neighbours being deﬁned in the screening description, we use it for designing the variogram. After the variogram is obtained we use its values for the next step. (d) Computation of the most convenient sill and range. The variance Var of the group of neighbours gives us a good prediction of the sill of the variogram. Before the prediction of the sill value we proceed to some notations. Let us note ﬁrst Yval the value from the variogram and corresponding to Yval·Var . Then, second notation Xval is the x value corresponding to Yval . In this case, we denote with X = x min−x max the distance between the minimum and the maximum x values of the variogram.

Appendix B. Supplementary material Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compag.2017.09.033. References Armstrong, J.S., 1985. Long-Range Forecasting: From Crystal Ball to Computer. John Wiley Sons Inc, New York. Barry, R.G., 1994. Past and potential future changes in mountain environments. In: Beniston, M. (Ed.), Mountain Environments in Changing Climates. Routledge, New York, pp. 3–33. Bellocchi, G., 2011. A comment on “Using analogies from soil science for estimating solar radiation” by Fodor and Mika. Agric. Forest Meteorol. 151, 1293–1294. http://dx. doi.org/10.1016/j.agrformet.2011.05.006. Bellocchi, G., Martin, R., Shtiliyanova, A., Ben Touhami, H., Carrère, P., 2014. Vul’Clim climate change vulnerability studies in the region Auvergne (France). In: FACCE MACSUR Mid-term Scientiﬁc Conference, ‘‘Achievements, Activities, Advancement’’, April 01–04, Sassari, Italy. < http://ocs.macsur.eu/index.php/Hub/Mid-term/ paper/view/209/15 > (Accessed on the 22.03.2017). Bhowmik, A.K., Costa, A.C., 2014. Representativeness impacts on accuracy and precision of climate spatial interpolation in data-scarce regions. Meteorol. Appl. 22, 368–377. http://dx.doi.org/10.1002/met.1463. Blankers, M., Koerter, M.W.J., Schippers, G., 2010. Missing data approaches in eHealth research: simulation study and a tutorial for non-mathematically inclined researchers. J. Med. Intern. Res. 12, e54. http://dx.doi.org/10.2196/jmir.1448. Bojanowski, J.S., Donatelli, M., Skidmore, A.K., Vrieling, A., 2013. Auto-calibration method of temperature-based solar radiation models. Environ. Model. Software 49, 118–128. http://dx.doi.org/10.1016/j.envsoft.2013.08.002. Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geosci. Model Dev. 7, 1247–1250. http://dx.doi.org/10.5194/gmd-7-1247-2014. Cheng, D., Zhu, D., Broadwater, R.P., Lee, S., 2009. A graph trace based reliability analysis of electric power systems with time-varying loads and dependent failures. Electric Power Syst. Res. 79, 1321–1328. http://dx.doi.org/10.1016/j.epsr.2009.04. 007. Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. SpringerVerlag, London. Cressie, N.A.C., 1993. Statistics for Spatial Data, revised ed. Wiley, New York. Cressie, N.A.C., Wikle, C.K., 2011. Statistics for Spatio-Temporal Data. John Wiley & Sons,

We introduce the same notations concerning the ordinate Y = ymin −ymax The diﬀerence between the minimum and maximum y values of the variogram. Using the notations, we suggest a square plotted on the graphical representation of the variogram and including the most convenient values for the range and the sill. Its opposite points (Xdown ;Ydown) and and (Xup ;Yup) are deﬁned as ﬁrst as Xdown = Xval −5%·X Xup = Xval + 5%·X Ydown = Yval−5%·Y , and second as and Yup = Yval + 5%·Y . We then sort all the points in the square in ascending order of x . In addition, we deﬁne a maximum number of values that we could consider in the square, after sorting, as 20%·N . Taking one of those values gives us a prediction of the range. The sills and ranges predicted for each missing measurement are memorized and used during the Kriging run. In addition, we memorized the average value of the group of neighbours and we used it also for the next section. (e) Prediction of the value for the missing measure. The value of the missing measure is predicted using the screening description and the predicted sill and range.

448

Computers and Electronics in Agriculture 142 (2017) 440–449

A. Shtiliyanova et al.

Meul, M., Van Meirvenne, M., 2003. Kriging soil temperature under diﬀerent types. Geoderma 112, 217–233. http://dx.doi.org/10.1016/S0016-7061(02)00308-7. Miller, M.W.F., 1990. Short-Term Hourly Temperature Interpolation. USAF Environmental Technical Applications Center. < http://www.dtic.mil/dtic/tr/ fulltext/u2/a240489.pdf > (Accessed on the 22.03.2017). Mohammadi, K., Shamshirband, S., Motamedi, S., Petković, D., Hashim, R., Gocic, M., 2015. Extreme learning machine based prediction of daily dew point temperature. Comput. Electron. Agric. 117, 214–225. Nash, J.E., Sutcliﬀe, J.V., 1970. River ﬂow forecasting through conceptual models. Part I A discussion of principles. J. Hydrol. 10, 282–290. http://dx.doi.org/10.1016/00221694(70)90255-6. Nordman, D.J., Caragea, P.C., 2008. Point and interval estimation of variogram models using spatial empirical likelihood. J. Am. Stat. Assoc. 103, 350–361. http://dx.doi. org/10.1198/016214507000001391. NRCS, 2009. SNOTEL Data Collection Network Fact Sheet. Natural Resource Conservation Service. United States Department of Agriculture, Washington, DC. < http://www. wcc.nrcs.usda.gov/factpub/sntlfct1.html > (Accessed on the 22.03.2017). Pappas, C., Papalexiou, S.M., Koutsoyiannis, D., 2014. A quick gap ﬁlling of missing hydrometeorological data. J. Geophys. Res. 119, 9290–9300. http://dx.doi.org/10. 1002/2014JD021633. Pardo-Igúzquiza, E., Dowd, P.A., Grimes, D.I.F., 2005. An automatic moving window for mapping meteorological data. Int. J. Climatol. 25, 665–678. http://dx.doi.org/10. 1002/joc.1128. Pepin, N.C., Norris, J.R., 2005. An examination of the diﬀerences between surface and free-air temperature trend at high-elevation sites: relationships with cloud cover, snow cover, and wind. J. Geophys. Res. 10, D24112. http://dx.doi.org/10.1029/ 2005JD006150. Pielke, R.A., 1984. Mesoscale Meteorological Modeling. Academic Press, New York. Rivington, M., Matthews, K.B., Bellocchi, G., Buchan, K., 2006. Evaluating uncertainty introduced to process-based simulation model estimates by alternative sources of meteorological data. Agric. Syst. 88, 451–471. http://dx.doi.org/10.1016/j.agsy. 2005.07.004. Rivoirard, J., 2005. Concepts and methods of geostatistics. In: Bilodeau, M., Meyer, F., Schmitt, M. (Eds.), Space, Structure and Randomness. Springer-Verlag, New York, pp. 17–37. Rossi, M.E., Posa, D., 1991. Geostatistical modeling of dissolved oxygen distribution in estuarine systems. Environ. Sci. Technol. 25, 474–481. http://dx.doi.org/10.1021/ es00015a015. Ruane, A.C., Goldberg, R., Chryssanthacopoulos, J., 2015. Climate forcing datasets for agricultural modeling: Merged products for gap-ﬁlling and historical climate series estimation. Agric. Forest Meteorol. 200, 233–248. http://dx.doi.org/10.1016/j. agrformet.2014.09.016. Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons Inc, New York. http://dx.doi.org/10.1002/9780470316696.ch1. Schafer, J.L., Graham, J.W., 2002. Missing data: our view of the state of the art. Psychol. Meth. 7, 147–177. http://dx.doi.org/10.1037/1082-989X.7.2.147. Schneider, T., 2001. Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values. J. Clim. 14, 853–871. http:// dx.doi.org/10.1175/1520-0442(2001) 014 0853:AOICDE 2.0.CO;2. Seitchik, E., 2012. Trend and Detail: Gap Filling with the Empirical Mode Decomposition. A Senior Project submitted to The Division of Science, Mathematics and Computing, Bard College, New York. < http://math.bard.edu/student/pdfs/Evan-Seitchik. pdf > (Accessed on the 22.03.2017). Shenk, T.M., Franklin, A.B., 2001. Modeling in Natural Resource Management: Development, Interpretation, and Application. Edited Island Press, Washington, DC. Simolo, C., Brunetti, M., Maugeri, M., Nanni, T., 2010. Improving estimation of missing values in daily precipitation series by a probability density function-preserving approach. Int. J. Climatol. 30, 1564–1576. http://dx.doi.org/10.1002/joc.1992. Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34, 1–22. http://dx.doi.org/10.1023/ A:1014009426274. Taugourdeau, S., Villerd, J., Plantureux, S., Huguenin-Elie, O., Amiaud, B., 2014. Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data. Ecol. Evol. 4, 944–958. http://dx.doi.org/10.1002/ece3.989. Thornton, P.E., Running, S.W., White, M.A., 1997. Generating surface of daily meteorological variables over large regions of complex terrain. J. Hydrol. 190, 214–251. http://dx.doi.org/10.1016/S0022-1694(96)03128-9. Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 46, 234–240. Truong, P.N., Heuvelink, G.B.M., Pebesma, E., 2014. Bayesian area-to-point kriging using expert knowledge as informative priors. Int. J. Appl. Earth Observ. Geoinform. 30, 128–138. http://dx.doi.org/10.1016/j.jag.2014.01.019. White, J.W., Hoogenboom, G., Hunt, L.A., 2005. A structured procedure for assessing how crop models respond to temperature. Agron. J. 97, 426–439. http://dx.doi.org/10. 2134/agronj2005.0426. Willmott, C.J., Matsuura, K., 2005. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82. http://dx.doi.org/10.3354/cr030079. Zehn, Z., Cai, J., 2015. Application of improved Kriging time interpolation in land subsidence monitoring. J. Water Resour. Architect. Eng. 2015, 177–180. Zhang, T., Schultz, A., 1990. EXORCISE - an algorithm for detection of spurious values and prediction of missing data. Comput. Geosci. 16, 1027–1065.

New York. Daly, C., Halbleib, M., Smith, J.I., Gibson, W.P., Doggett, M., Taylor, G.H., Curtis, J., Pasteris, P.P., 2008. Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. Int. J. Climatol. 28, 2031–2064. http://dx.doi.org/10.1002/joc.1688. Davey, C.A., Redmond, K.T., Simeral, D.B., 2006. Weather and Climate Inventory, National Park Service, Greater Yellowstone Network. Natural Resource Technical Report NPS/GRYN/NRTR-2006/001. Davey, C.A., Redmond, K.T., Simeral, D.B., 2007. Weather and Climate Inventory, National Park Service, Rocky Mountain Network. Natural Resource Technical Report NPS/ROMN/NRTR-2007/036. De Benedetto, D., Castrignanò, A., Sollitto, D., Modugno, F., Buttafuoco, G., Lo Papa, G., 2012. Integrating geophysical and geostatistical techniques to map the spatial variation of clay. Geoderma 171–172, 53–63. http://dx.doi.org/10.1016/j.geoderma. 2011.05.005. Diodato, N., Bellocchi, G., 2014. Long-term winter temperatures in central Mediterranean: forecast skill of an ensemble statistical model. Theor. Appl. Climatol. 116, 131–146. http://dx.doi.org/10.1007/s00704-013-0915-z. Dowdall, M., Gerland, S., Karcher, M., Gwynn, J.P., Rudjord, A.L., Kostad, A.K., 2005. Optimisation of sampling for the temporal monitoring of technetium-99 in the Arctic marine environment. J. Environ. Radioact. 84, 111–130. http://dx.doi.org/10.1016/ j.jenvrad.2005.04.010. Enzi, S., Camuﬀo, D., 1996. Eﬀetti del riscaldamento globale del Mezzogiorno: riscostruzione della storia ambientale nell’Optimum climatico medievale. In: Guerrini, A. (Ed.), Atti del 4° Workshop Progetto Strategico Clima Ambiente e Territorio del Mezzogiorno, Ith ed. Italian National Research Council, Lecce, Italy, pp. 7–72 (in Italian). Eza, U., Shtiliyanova, A., Borras, D., Bellocchi, G., Carrère, P., Martin, R., 2015. An open platform to assess vulnerabilities to climate change: an application to agricultural systems. Ecol. Inform. 30, 389–396. http://dx.doi.org/10.1016/j.ecoinf.2015.10. 009. Fabrizi, R., Bonafoni, S., Biondi, R., 2010. Satellite and ground-based sensors for the urban heat island analysis in the city of Rome. Rem. Sens. 2, 1400–1415. http://dx. doi.org/10.3390/rs2051400. Gandin, L.S., 1963. Objective Analysis of Meteorological Fields. Israel Program for Scientiﬁc Translation, Jerusalem. Gelman, A., Hill, J., 2006. Data Analysis using Regression and Multilevel/Hierarchical Models, ﬁrst ed. Cambridge University Press, New York. Guccione, P., Appice, A., Ciampi, A., Malerba, D., 2012. Trend cluster based Kriging interpolation in sensor data networks. In: Atzmueller, M., Chin, A., Helic, D., Hotho, A. (Eds.), Modeling and Mining Ubiquitous Social Media. Springer-Verlag, Heidelberg, pp. 118–137. Gustard, A., Demuth, S., 2008. Manual on Low-Flow Estimation and Prediction. World Meteorological Organisation, Geneva - No. 1029. < http://www.wmo.int/pages/ prog/hwrp/publications/low-ﬂow_estimation_prediction/WMO%201029%20en. pdf > (Accessed on the 22.03.2017). Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables. JRC Scientiﬁc and Technical Reports, Ispra. < http://eusoils.jrc.ec.europa.eu/esdb_ archive/eusoils_docs/other/eur22904en.pdf > (Accessed on the 22.03.2017). Hengl, T., 2009. A Practical Guide to Geostatistical Mapping, second ed. University of Amsterdam, The Netherlands. < http://spatial-analyst.net/book/system/ﬁles/ Hengl_2009_GEOSTATe2c1w.pdf > (Accessed on the 22.03.2017). Henn, B., Raleigh, M.S., Fisher, A., Lundquist, J.D., 2013. A comparison of methods for ﬁlling gaps in hourly near-surface air temperature data. J. Hydrometeorol. 14, 929–945. http://dx.doi.org/10.1175/JHM-D-12-027.1. Holmes, T.R.H., Crow, W.T., Hain, C., 2013. Spatial patterns in timing of the diurnal temperature cycle. Hydrol. Earth Syst. Sci. 17, 3695–3706. http://dx.doi.org/10. 5194/hess-17-3695-2013. Hoogenboom, G., 2000. Contribution of agrometeorology to the simulation of crop production and its applications. Agric. Forest Meteorol. 103, 137–157. http://dx.doi. org/10.1016/S0168-1923(00)00108-8. Journel, A.G., 1986. Geostatistics: models and tools for Earth sciences. Math. Geol. 18, 119–139. http://dx.doi.org/10.1007/BF00897658. Kihoro, Y.W.O., Athiany, J.M., Kibunja, K.H.O., 2013. Imputation of incomplete nonstationary seasonal time series. Math. Theory Model. 3, 142–154. Koehler, T.L., 1977. A Test of Seven Methods Which Perform Grid to Observation Interpolations, in Meteorological Applications of Satellite Indirect Soundings II. Project Report, Department of Meteorology, University of Wisconsin, Madison. Linacre, E., 1992. Climate Data and Resources: A Reference and Guide. Routledge, London. Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266. Matthews, K.B., Rivington, M., Buchan, K., Miller, D., Bellocchi, G., 2008. Characterising the agro-meteorological implications of climate change scenarios for land management stakeholders. Clim. Res. 37, 59–75. http://dx.doi.org/10.3354/cr00751. Mavi, H.S., 2004. Agrometeorology: Principles and Applications of Climate Studies in Agriculture. Haworth Press, New York. McCandless, T.C., Haupt, S.E., Young, G.S., 2011. The eﬀects of imputing missing data on ensemble temperature forecasts. J. Comput. 4, 162–171. http://dx.doi.org/10.4304/ jcp.6.2.162-171. Menne, M.J., William Jr., C.N., Palecki, M.A., 2010. On the reliability of the U.S. surface temperature record. J. Geophys. Res. 115, D11108. http://dx.doi.org/10.1029/ 2009JD013094.

449

Kriging-based approach to predict missing air temperature data

Kriging-based approach to predict missing air temperature data

Recommend Documents