Renewable and Sustainable Energy Reviews 113 (2019) 109260
Contents lists available at ScienceDirect
Renewable and Sustainable Energy Reviews journal homepage: www.elsevier.com/locate/rser
Producing high-quality solar resource maps by integrating high- and lowaccuracy measurements using Gaussian processes
T
Dazhi Yanga,∗, Christian A. Gueymardb a b
Singapore Institute of Manufacturing Technology, Agency for Science, Technology and Research (A*STAR), Singapore Solar Consulting Services, Colebrook, NH, USA
A R T I C LE I N FO
A B S T R A C T
Keywords: Gaussian process Data fusion Solar resource Kriging Mapping
With the objective of producing high-resolution and high-accuracy maps of mean annual irradiance at country scale, this contribution exploits the complementary properties of two distinct sources of solar irradiance data: gridded modeled data derived from satellite observations, and station-specific typical meteorological year (TMY) data. A data fusion procedure based on Gaussian process modeling is used to optimally combine the two sources of data and derive solar resource maps. Gridded physical solar model version 3 (PSM3) satellite-derived data at 4-km resolution and TMY3 data from 67 stations in California are used to produce a map of mean annual global horizontal irradiance at 2-km resolution and exemplify the procedure. It is shown that by integrating the PSM3 data with TMY3 data, the original 5.2% mean error in the PSM3 map is reduced to 1.6%. The demonstrated approach is suitable for a variety of regional-scale applications for which both high-resolution data of low accuracy and low-resolution measurements of high accuracy are available.
1. Introduction Solar energy is one of the most promising sources of renewable energy that could potentially mitigate the pressing problems of increasing energy demand, depleting fossil fuel, and global climate change. Accurate solar energy resource assessments play a central role in the design, simulation, financing, and performance evaluation of solar energy systems. Initially, large-scale assessments and resource mapping have been conducted where research and development of pioneering solar applications were actively being developed, such as over western Europe [1], the Americas [2,3], or Australia [4]. Recently, such need has spawned a large body of literature on country-scale or regional solar resource assessment studies aimed at areas where local resource data were critically needed to help initiate or expand the local solar industry, e.g., in Chile [5], Mexico [6], Libya [7], Mali [8], Nigeria [9], Pakistan [10], Iran [11], Qatar [12], Saudi Arabia [13], or the United Arab Emirates [14,15]. Irradiance resource maps produced from annual or monthly data are typically found in such solar resource studies, because they provide a direct visual summary to all stakeholders who wish to understand the geographical distribution of the solar potential, and because the mean annual irradiation is the most important information needed to characterize the solar resource. Nevertheless, the accuracy of these solar resource data is rarely addressed in these general studies. This in turn ∗
affects the bankability of solar projects [16,17], and limits the credibility of subsequent analysis, such as analyzing the environmental, social, economical, or political impacts of various types of solar installations. This article discusses a method based on Gaussian processes, for producing high-quality solar resource maps that could be used at annual, daily and hourly resolutions. Solar irradiance data used in solar resource assessments or other applications typically come from three complementary sources: (1) modeled data based on remote-sensed observations, (2) modeled reanalysis data, or (3) ground-based instruments. Model predictions obtained through remote sensing (satellite-derived irradiance) or reanalysis are normally less accurate than ground-based observational data, which are always considered the necessary reference to validate predictions from any other source of data [16,17]. The advantage of predictions based on satellite-derived data or reanalysis is their availability on lattices of fixed spatial resolution, i.e., gridded databases, with normally no missing pixel. Reanalysis databases currently have a lower spatio-temporal resolution and higher uncertainty than irradiance data derived from geosynchronous satellite observations, so will not be discussed any longer in what follows. In stark contrast with modeled data, ground-based observations are significantly more accurate if the instruments are properly calibrated, maintained, and qualityassessed per the best practices [17]. However, such sources of highquality ground-based observations are only available at a few locations
Corresponding author. E-mail address:
[email protected] (D. Yang).
https://doi.org/10.1016/j.rser.2019.109260 Received 23 October 2018; Received in revised form 28 January 2019; Accepted 4 July 2019 1364-0321/ © 2019 Elsevier Ltd. All rights reserved.
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
current paper is to demonstrate the flexibility of GP-based data fusion, which is a general form of many existing statistical site-adaptation techniques. Potential applications are reviewed in the next section. The specific solar mapping application is described in Section 3. The typical irradiance data needed for that application are discussed in Section 4. The GP method is described in Section 5. Finally, a practical application to the solar resource mapping of California is detailed in Section 6.
over the world. Integrating satellite-based modeled predictions and ground-based observations using data fusion methods is known, a priori, to be beneficial for many solar engineering problems, such as forecasting or resource assessment. The interested reader is referred to the recent reviews [18,19] on these two topics. In the case of very demanding applications, such as concentrating solar technologies, recourse to “site adaptation” techniques is mandatory to guarantee the bankability of such costly projects [17,19,20]. In solar engineering, data fusion is often used interchangeably with phrases such as “site adaptation”, “dataset merging”, or “measured record extension” [19]. The existing methodologies for data fusion can be categorized into two classes, namely, physically-based methods and statistical methods. The primary approach for the physically-based methods is to adjust the atmospheric input data, most particularly the aerosol optical depth [21], so that the modified satellite-derived data better match the ground-based observations. For statistical methods, the correspondence between the satellite-derived and ground-based data is established via statistical models such as regression or model output statistics. Since the details of these methods have been covered in Ref. [19], they are not reiterated here. Nevertheless, it is important to stipulate that the site adaptation methods just mentioned attempt to remove the bias in long time series of modeled irradiance data relative to a specific location. In contrast, the goal of the present method is to prepare high-resolution maps of mean annual irradiance over a whole country. Even though there is extensive literature on data fusion in solar engineering, the related literature in statistics is even more developed. More specifically, this sub-area of data fusion in statistics—integrating high- and low-accuracy data—is exemplified by a long trail of publications [e.g., 22–25]. All these contributions involve the optimal combination of two sources of data known a priori for having widely different spatial and accuracy characteristics. For clarity, Fig. 1 depicts a flowchart of the general data-fusion procedure. These data sources may consist of the outcome of a design experiment, the output of a computer code, or the measurement of some physical quantity. Additionally, the high- and low-accuracy data are typically integrated via Gaussian process (GP) models, i.e., functions f (⋅) and g (⋅) in Fig. 1. Due to the high flexibility and good predictive performance of the GP modeling framework, this class of methods has received much attention in various engineering fields. However, to the best of the authors’ knowledge, the GP-based data fusion techniques have not been adopted in solar resource assessment. In this context, the goal of the
2. Potential applications of data fusion The general framework of high- and low-accuracy data fusion can be used for a wide spectrum of applications. For instance, Ref. [26] described a correlation-based peer-to-peer photovoltaic (PV) power forecasting method. In that application, the aim is to forecast the power output from a number of distinct PV plants, despite the fact that some are not monitored, and thus do not provide any reliable source of data. In general, a correlation-based method is able to forecast the PV output or irradiance at unobserved locations fairly well [27]. When the interstation distance gets large, however, the correlation among PV systems quickly drops to zero [28,29], and the power output forecast fails. Assuming that satellite-derived irradiance data, or numerical weather prediction (NWP) output, is available over that area, its accuracy can be fine-tuned using data from those monitored PV systems. The forecasts can thus be generated through the adjusted satellite or NWP data. Another application of data fusion is about the optimal siting of weather stations, as part of the design of monitoring networks. Based on the literature, this task typically relies on satellite-derived irradiance data [30,31]. Whereas the design methodology is well developed, there is no way to test the design unless an actual monitoring network is placed. Due to the high cost involved in monitoring network installation, operation and maintenance, a commissioned network is often left “as is” for a long period of time [32]. Nevertheless, the potential inadequacy in design, and the need to periodically redesign the network, have long been expressed in an early paper [33]. In order to redesign the network and to minimize the cost for network positioning adjustment, satellite-derived irradiance can be first adapted to an existing monitoring network. As illustrated in Ref. [32], an overall redesign can be carried out without actually moving the monitoring stations, hence resulting in substantial cost and time savings. Most other applications that involve satellite or NWP data can benefit from data fusion. As much as one would wish to demonstrate such an application, the research is limited by data
Fig. 1. Flowchart of the data-fusion process. The main idea is to construct two mapping functions, f (⋅) and g (⋅) , between: (1) low-accuracy data ( yiL ) and known covariates ( xi ), and (2) high-accuracy data ( yjH ) and a subset of low-accuracy data ( yjL ) that collocates with yjH , respectively. Hence, for any new covariates ( x 0 ), the unknown low-accuracy response can be predicted first; the predicted low-accuracy response ( yˆ0L ) is then used to predict the high-accuracy response ( yˆ0H ). 2
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
freely available from NREL's NSRDB website. The data-generating processes of the two datasets are distinct, and are based on different sources of data. Considering the uncertainty estimates provided in their respective documentation, it is underlined here that the TMY3 is attributed a lower uncertainty than the newer PSM3 data. The main reason for this is that the TMY3 datasets are produced using ground observations of clouds (typically at airports), whereas PSM3 entirely depends on remote-sensed data from space. Because the uncertainty quantification depends on location and timescales, the reader is referred to Refs. [52,53] for details. From a general standpoint, the methodology developed in this contribution can be applied to all other cases where the goal is to use low-uncertainty data at a few dispersed locations to reduce the uncertainty in gridded data covering a large area, such as a country. Ideally, as exemplified in Ref. [49], the low-uncertainty data would come from high-quality observations at radiometric stations that follow the best practices. There are many regions or countries, however, where this approach is simply not possible because of the lack of such highquality observations. Over a country or state the size of California (≈424,000 km2), for instance, it is estimated that the GP method developed here requires about 15 stations to fit, and at least another 5 to validate. No network provides high-quality and freely-accessible data from that many sites in California. Even though there are many “secondary” stations from, e.g., agricultural networks, their quality (presumably low) has not been assessed, and therefore such stations are not used in validation studies. This is corroborated by the recent results of Ref. [54], to the effect that a large fraction of secondary stations in Spain had serious data quality issues. Whenever the quality and low-bias of irradiance observations cannot be guaranteed, location-specific site adaptation techniques may actually end up degrading the quality of modeled data time series, rather than improving them [19]. For the present application dedicated to the mean annual GHI (rather than extended time series), this critical issue can be circumvented. The annual GHI is known to rapidly converge with its long-term average value [3]. After 15 years (the minimum period normally used to construct a TMY file), the latter study indicates that the mean measured GHI is typically within only ±1–2% of the long-term average, even though the period of measurement considered in that study includes volcanic years, during which GHI is atypically low. In a subsequent study [55], Gueymard showed that the GHI interannual variability is extremely low when those atypical volcanic years are excluded. Since NREL's TMY datasets are also constructed by excluding those atypical years, it is postulated that a TMY's annual GHI value is very stable and a very precise determination of its long-term average. Finally, a large part of the uncertainty reported for TMY datasets is caused by random errors in hourly irradiance values. Such random errors decrease sharply with time integration, and essentially disappear when considering an annual average. Hence, even though TMY3 datasets might have some residual bias compared to highquality measurements, they still provide the next best possible determination of the mean annual GHI at geographically dispersed sites. This statement applies to the USA, based on the available information on TMY3 quality. The applicability of the method to other countries is contingent on further studies to evaluate the quality of their measured or TMY datasets [52,56,57]. Before the two datasets are described further (Section 4), the procedure for producing high-quality solar resource maps is outlined. In the present case, the adjective “high-quality” can be used to qualify two important, but distinct, properties: high spatial resolution and high accuracy. To predict the value of a solar variable at an arbitrary location—so that the spatial resolution of the map can be arbitrarily fine—a spatial model is needed. Many such models are available in the spatial statistics literature [58], and various kriging predictors can be used to provide estimations at locations where no observations are available. Instead of kriging, the closely related Gaussian process is used here to model the correlated error process (Sections 5.1 and 5.2). This spatial
availability—identifying collocated and temporally aligned datasets is always difficult [34]. Although some initiatives have kick-started, such as the Data Article by the Solar Energy journal [35], or an international collaboration on improving solar radiation data under the auspices of the International Energy Agency,1 the uptake of such initiatives is still low at the moment. To foster new and useful developments, a practical application is used here to demonstrate the methodology. Once more appropriate datasets are identified, updates can be made.
3. An introduction to the present application For this contribution, the GP-based data fusion approach revolves around a critical resource assessment task—producing solar energy resource maps. In the 1980s, the primary approach for generating solar maps was by interpolating long-term ground-based measurements [e.g., 36–39]. Kriging—the optimal interpolation—has been the dominating method for decades. This interpolation-based approach has continuously been used as the mainstream approach [e.g., 40–42] until the satellite-based approach became popular [e.g., 2, 43]. This paradigm shift followed the development and public availability of several operational satellite-to-irradiance models [e.g., 44–48]. Recently, a kriging-related optimal interpolation technique involving estimates from NWP modeling on a 10-km grid and ground-based observations was introduced to help improve the solar resource mapping on a large-scale. It was shown to provide accurate irradiance maps over a whole country [49]. The method used therein is similar to that developed in Ref. [50], which essentially models the “true” spatial process of NWP output as two parts: (1) the uncorrected measurements (also known as “background”), and (2) an error process. Although that specific method has been used in meteorology and data assimilation, the Gaussian process used in geostatistics is more general because it provides additional flexibility in the formulation of the underlying spatial process. For example, the error process in Ref. [49] is modeled as a function of geographical distance, whereas in GP modeling other meteorological parameters can be included in the formulation. Moreover, the background process is used “as is” in Ref. [49], whereas GP modeling is able to explicitly model this process and provide some degree of smoothing using a trend surface. Based on this discussion, the GP methodology followed here appears as a more general approach than the kriging-based optimal interpolation used in Ref. [49]. For both ground-based and satellite-derived solar irradiance data, it is common to use typical weather data to represent the long-term characteristics of the local climate in a compressed form, from a statistical standpoint [51]. The most well-known format of this kind of dataset is called typical meteorological year (TMY). A TMY file is a “condensed” form of annual dataset that contains hourly values of meteorological variables—such as global horizontal irradiance (GHI), direct normal irradiance (DNI), diffuse horizontal irradiance (DIF), cloud cover, temperature, humidity, pressure, wind speed, or wind direction—that typify the median conditions at a specific location over a longer period of time. For a detailed description of the TMY format, the reader is referred to the National Solar Radiation Database (NSRDB) page of the National Renewable Energy Laboratory (NREL) website.2 To produce resource maps, the values in a TMY file may be either summed or averaged—the choice depends on the desired unit in the plots—over a period of one month (monthly maps) or one year (annual maps). In this regard, two sets of TMY data are considered in this paper: (1) the NSRDB 1991–2005 update, TMY3 station-specific data,3 and (2) physical solar model version 3 (PSM3) gridded data. Both datasets are 1
http://iea-pvps.org/index.php?id=389 https://nsrdb.nrel.gov/ 3 It is noted that TMY3 denotes the particular version of TMY data derived from the NSRDB, i.e., TMY3 is the name of that dataset, whereas “TMY” is the abbreviation of typical meteorological year. 2
3
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
prediction model—referred to as GP 1 hereafter—is built with only lowaccuracy data. Once GP 1 is built, the predicted values need to be adjusted to become high-accuracy estimates, i.e., through correction via data fusion. Since the adjustment is needed at arbitrary locations, another GP model is built (Section 5.3). This spatial adjustment model—built using both high- and low-accuracy data—is referred to as GP 2 . With this outline in place, some exploratory analyses are carried out next to facilitate model building. 4. Data and exploratory analysis 4.1. TMY3 data The first TMY dataset developed by NREL was released in 1977. Subsequently, and still using a similar site-specific methodology, two updates have been made, namely, TMY2 in 1994 [56] and TMY3 in 2007 [52]. The newest version, TMY3, contains ground-based modeled data from 1020 sites in the continental United Sates, Alaska and Hawaii. The entire dataset can be downloaded as a zip file from http:// rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/tmy3/. TMY3 sites contain three classes. The Class I sites produce irradiance estimates with the lowest uncertainty because they are based on reliable cloud observations from airports. Class II sites have somewhat higher uncertainty data, and Class III sites have even higher uncertainty because of an incomplete period of record. For the latter reason, only sites from the first two classes are used in this paper. More specifically, a subset, namely, data from the 67 Class I and II sites in California, is used in accordance with the choice of the PSM3 gridded database described below. Annual mean TMY3-derived GHI values are computed at those 67 specific locations.
Fig. 2. Data locations in California. The randomly-selected 1000 locations with only low-accuracy data (PSM3) are marked with black dots, whereas the 67 locations with both low- and high-accuracy data are marked with colors. Among those 67 locations, 12 are set aside for testing (red crosses). The remaining sites (turquoise diamonds) are used for training. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
subsection. 4.3. Exploratory analysis Since the spatial prediction model, GP 1, requires some input variables to make estimations, it is of interest to observe how GHI is affected by the covariates, so that an appropriate choice of input variables can be made. Fig. 3 plots the PSM3's annual mean GHI against four covariates, namely, longitude, latitude, DNI,5 and temperature. The annual mean DNI and temperature are also computed from the PSM3 data files. It is observed that GHI varies quasi-linearly with longitude, latitude, and DNI, whereas a polynomial fit seems appropriate for its dependence on temperature.6 These results will be used in Section 5.2 to construct GP 1. It should be noted that GP is flexible in terms of handling different covariates. Other covariates, such as station pressure, may also be appropriate. In the present case, adding the latter covariate was not found to significantly improve the results, most likely because of the underlying relationships between station pressure and both DNI and temperature, and because of the general longitudinal and latitudinal gradients in elevation (sea level to the west, medium elevations to the northeast, and higher elevations to the southeast). Next, the relationship between the high-accuracy (TMY3) and lowaccuracy (PSM3) annual mean GHI values is plotted as a scatterplot, as shown in Fig. 4. Since most points in Fig. 4 are below the identity line, the PSM3 GHI values are found to be higher than their TMY3 counterparts, with significant bias. Moreover, the PSM3 data is not only biased, but also contains significant variance. The PSM3 bias could be corrected with a regression-based adjustment (by regressing PSM3 GHI on TMY3 GHI), but this would be insufficient because the variance would remain. The need for an effective variance correction justifies the use of a more powerful tool—namely, GP 2 .
4.2. PSM3 data The most recent update of the NSRDB is based on predictions obtained from version 3 of the Physical Solar Model, hereafter PSM3 [59]. The data is regularly gridded with a spatial resolution of approximately 0.04° in both latitude and longitude. PSM3 uses spaceborne observations from two GOES satellites that sense the American continent, and provides data for all land masses of that continent between latitudes 60∘ N and − 21∘S. In terms of temporal coverage, PSM3 provides halfhourly data from 1998 to 2016. As a result, the entire dataset is over 40 TB in size, and is thus difficult to be manipulated as a whole. For instance, there are over 25,000 grid cells in California alone. To speed up the process, 1000 randomly-selected grid cells are used here to index the low-accuracy PSM3 data. Additionally, those grid cells that are closest to the 67 TMY3 sites are also selected to index both high- and low-accuracy data. The resulting 1067 locations are shown in Fig. 2. Since the spatial adjustment model, GP 2 , requires training, and the final results need to be validated, 55 Class II sites are selected as fitting lattice4 (turquoise diamonds in Fig. 2), whereas the remaining 12 Class I sites are set aside for true out-of-sample validation (red crosses in Fig. 2). Using this procedure, the highest quality data can be used to anchor the “true” accuracy of the prediction models. Unlike the TMY3 data, which can be downloaded directly via the link provided on the NSRDB website, the PSM3 data need to be downloaded through an application programming interface (API). This specific API allows the user to download modeled data for only one location and one year at a time. Since the PSM3 data are also available in TMY format (derived from 18 years of PSM data), a total of 1067 PSM3 files are downloaded. After these raw PSM3 data files are prepared, 1067 low-accuracy annual mean GHI values are computed, similarly to what was done for the 67 TMY3 sites in the previous
5 The reason for considering DNI but not DIF is because DNI was used during the development of TMY, whereas DIF was not [52]. 6 Irradiance during winters is low, due to low elevation of the Sun. In general, locations with low annual-mean temperature experience long winters, hence lower annual-mean GHI. However, due to the intricate climate, one should not expect the temperature–GHI relationship to be linear. A data-driven approach—polynomial fit—is thought appropriate.
4 Data collected at the fitting lattice are used to fit the Gaussian process parameters.
4
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
Fig. 3. Relationship between the PSM3 annual mean GHI in California and the explanatory variables—longitude, latitude, DNI, and temperature. Establishing a linear relationship seems to be sufficient for longitude, latitude, and DNI, whereas a third-degree polynomial is a better fit for temperature.
Fig. 5. Relationship between the TMY3 annual mean GHI and the explanatory variables, showing the various interactions between PSM3 GHI and other parameters.
y (xi ) = f (xi )⊤β + ε (xi ),
i = 1, ⋯, n,
(1)
where f (xi ) = (f1 (xi ), ⋯, fm (xi ))⊤ is a set of predefined functions of the input variables, and β = (β1, ⋯, βm)⊤ are the regression coefficients. The ε (xi ) term is a realization of a stationary Gaussian process with covariance:
cov[ε (xi), ε (xj )] = σ 2R (xi , xj ) = σ 2 exp[−D (xi , xj )].
(2)
From a spatial statistics point of view, Eq. (1) is a model that separates a spatial process into a large-scale variation component (also referred to the mean structure) and a correlated error process that represents the sum of the smooth small-scale variation, microscale variation and measurement errors [58]. To put things into perspective, the d input variables may include latitude, longitude, elevation, pressure, temperature, humidity, or any other variable that relates to irradiance. The function fi (xi ) maps the d space to a line, i.e., fi : d → 1. An example of such functions can be the product of latitude and longitude. Thus, using a collection of m such functions, each taking a different form, f (xi ) maps d to m , or f : d → m . Without loss of generality, it is also assumed that the correlation function R (xi , xj ) in Eq. (2) has an exponential form. Usually in spatial statistics, D (xi , xj ) denotes the Euclidean or great-circle distance between two geolocations. However, when other input variables, such as meteorological parameters, are involved, just assigning equal weights cannot separate the different factor effects. To that end, a weighted distance function:
Fig. 4. Relationship between the TMY3 annual mean GHI and that of PSM3. Most points are below the identity line, indicating PSM3 measurements are higher than TMY3's.
d
Lastly, to facilitate the input variable selection for GP 2 , the relationship between the TMY3 GHI and several interactive terms that involve PSM3 GHI is shown in Fig. 5. The interactions terms are plotted instead of the raw input variables, owing to the linear transformation assumption in GP 2 . The relationship shown in Fig. 5 highly resembles that in Fig. 4. In other words, all four interactive terms will be considered in GP 2 . However, additional hypothesis testing is needed to determine the statistical significance of each input variable. The hypothesis testing procedure is described in more detail in Section 6.
D (xi , xj ) =
∑ θk |xik − xjk |pk
(3)
k=1
is commonly used instead of the geographical distance, where θ = (θ1, ⋯, θd )⊤ and p = (p1 , ⋯, pd )⊤ are scale and power parameters, respectively. Since pk = 2 is commonly considered, the Gaussian correlation function is used in what follows. Therefore, the unknown parameters in Eq. (1) are θ , σ, and β . They need to be estimated, which can be done with the method developed below. To obtain the response at a new point x 0 , the empirical best linear unbiased predictor (BLUP) is given as:
5. Methodology
yˆ (x 0) = f (x 0)⊤βˆ + rR−1 (y − Fβˆ),
In this section, the general form of Gaussian process is presented first. GP 1 and GP 2 are subsequently constructed.
(4)
where
f (x 0) = ( f1 (x 0) ⋯ fm (x 0))⊤,
∈m × 1,
(5)
r = ( R (x 0 , x1) ⋯ R (x 0 , xn )),
∈1 × n,
(6)
5.1. Gaussian process modeling Let xi be a column vector of d input variables, i.e., xi ∈ d × 1, and scalar y (xi ) be its response. The Gaussian process model is expressed by:
y = ( y (x1) ⋯ y (xn ))⊤, 5
∈n × 1,
(7)
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
F = ( f (x1) ⋯ f (xn ))⊤, βˆ = (F⊤R−1F )−1F⊤R−1y,
∈n × m, ∈m × 1.
(8)
5.3. Data fusion
(9)
As described above, GP 1 is able to provide an estimate of y at arbitrary locations. In other words, given the input variables and responses at some sampling locations, maps can be generated at an arbitrarily fine resolution. However, as demonstrated in the exploratory analysis, the y estimates may be subject to errors, so that adjustments are further required to produce high-accuracy maps. Let yL (xi ) and yH (xi ) be the low- and high-accuracy estimates with the same input variable values, where xi is a column vector of 4 input variables, namely, longitude, latitude, DNI, and temperature. Their relationship can then be modeled—in the simplest form—using a linear regression:
Since the β estimate requires knowledge of the hyperparameters θ , the latter can be estimated by maximum likelihood (ML) estimation, i.e., maximizing:
−
1 (n ln(σˆ 2) + ln|R|), 2
(10)
⊤ Fβˆ) R−1 (y
where σˆ 2 = (y − − Fβˆ)/ n [23]. The maximization can be performed using quasi-Newton algorithms, such as the ones implemented in the optim function in the computer language system R. At this stage, it is noted that the computational stability of GP estimation algorithms largely depends on the selected design points, i.e., the input variable values. More specifically, if two paired design points are close to each other in the input space, R may be close to singular. In statistics, there are several remedies to this problem. For instance, the correlation matrix can be modified using a “nugget effect” [e.g. Ref. 60], i.e., R′ = R + νI . Nevertheless, the nugget parameter ν introduces additional smoothing in the predictor. It would no longer be an interpolator, which is not always desired, especially when details of the Gaussian process prediction matter. Incidentally, this opens the possibility of using the technique known as GP estimation with space-filling design [61], which should be considered in future studies.
yH (xi ) = ρyL (xi ) + δ ,
yH (xi ) = ρ (xi ) yL (xi ) + δ (xi ),
Following the Gaussian process modeling framework described above, the first step is to determine xi and f (xi ) . To that end, all four input variables shown in Fig. 3, namely, longitude, latitude, annual mean DNI, and annual mean temperature, are considered. Thus, vector xi has four elements, i.e., d = 4 . It is noted that if DNI and temperature are unavailable, the Gaussian process reduces to a universal kriging model under some constraints. The choice of f (xi ) is also made from Fig. 3. Since linear relationships are observed for the first three variables and a third-degree polynomial is observed for temperature, m in Eq. (5) is set to 6. In addition, because an intercept term is added, m = 7 . More specifically:
i = 1, ⋯, n*.
d
ρ (xi ) = ρ0 +
∑ ρk xik ,
(15)
k=1
σδ2 ,
and δ (xi ) is a Gaussian process with mean μδ , variance and correlation parameters θδ . If yH is the vector for the high-accuracy measurements at all n * locations, its log likelihood up to an additive constant7 is given by:
−
(y − Gα )⊤Rδ−1 (yH − Gα ) ⎤ 1⎡ * 2 n lnσδ + ln|R δ | − H , ⎥ 2⎢ σδ2 ⎣ ⎦
yH = ( yH (x1) ⋯ yH (x n*))⊤, ∈7 × 1.
∈n
(11)
∈ n
*
× 1,
× (d + 2),
α = ( μδ ρ0 ⋯ ρd )⊤,
x ik
∈(d + 2) × 1,
(19)
and R δ are defined in a similar way as in GP 1. Since the ML estimators for α and σδ2 are:
αˆ = (G⊤Rδ−1 G )−1G⊤Rδ−1 yH ,
(12)
|2 ) ,
(17) (18)
σˆδ2 = (yH − Gαˆ )⊤Rδ−1 (yH − Gαˆ )/ n*,
4 in rˆ is Rˆ (x 0 , xi ) = exp(− ∑ k = 1 θˆk |x 0k − 4 Rˆ (xi , xj ) = exp(− ∑ k = 1 θk |x ik − x jk |2 ) .
*
yL (x1) x11 ⋯ yL (x1) x1d ⎞ ⎛1 yL (x1) ⋮ ⋮ ⋱ ⋮ G = ⎜⋮ ⎟, ⎜1 y (x *) y (x *) x * ⋯ y (x *) x * ⎟ L L L n n n1 n n d⎠ ⎝
ˆ ˆ −1 (y − Fβˆ), yˆ (x 0) = f (x 0)⊤βˆ + rR entry i– jth entry in Rˆ is
(16)
where
σδ2
where
(14)
In Eq. (14), ρ (xi ) is assumed to be linear, i.e.,
It is noted that these input variables are in different ranges, which may lead to bias preference to certain variables [62]. Therefore, prior to forming F and f (x 0) , the variable-wise mean and standard deviation are used to perform a z-transformation on each of the input variables. Once the parameters θ , σ, and β are estimated through maximizing (10), the BLUP for y (x 0) is obtained from:
the i th
(13)
where ρ and δ are the slope and intercept of the best-fit line, respectively, and n * is the number of locations where both low- and highaccuracy estimates are available. Assuming some form of nonlinearity exists between yL (xi ) and yH (xi ) , an alternative nonlinear regression with an appropriate function form can be defined, e.g., yH (xi ) = ρ″ [yL (xi )]2 + ρ′yL (xi ) + δ′. In situations where the relationship between yL (xi ) and yH (xi ) is a function of xi , the above regression-based adjustment becomes insufficient. This means that the regression-based adjustment is only able to remove the bias in the low-accuracy measurements, but not the variance. To solve this important issue, a more advanced data-fusion technique is to employ another Gaussian process model, i.e., GP 2 :
5.2. Modeling the low-accuracy PSM3 data
1 ⎛ f1 (xi ) ⎞ ⎞ ⎛ 1 ⎜ f2 (xi ) ⎟ ⎛ x i1 ⎞ ⎜ longitudei ⎟ ⎟ ⎜ ⎜ f (xi ) ⎟ x i2 ⎜ latitudei ⎟ 3 ⎟ ⎜ x i3 ⎟ ⎜ ⎜ DNIi ⎟, x f ( ) f (xi ) = 4 i = ⎜ ⎟ = ⎟ ⎜ x i4 ⎜ temperaturei ⎟ x ( ) f ⎟ ⎜ ⎜ 5 i ⎟ ⎜ 2⎟ x i24 ⎜ f6 (xi ) ⎟ ⎜⎜ 3 ⎟⎟ ⎜ temperaturei ⎟ x ⎜ i 4 ⎜ f (x ) ⎟ ⎝ ⎠ temperaturei3 ⎟ ⎠ ⎝ ⎝ 7 i ⎠
i = 1, ⋯, n *,
the parameters θδ , α , and
and the
1 − (n *ln(σˆδ2) + ln|R δ |), 2
Since all the parameters are estimated together, βˆ , σˆ 2 , and the likelihood are updated at each pass of the optimization routine. Such iterative procedure can be computationally expensive when n is large. Two possible remedies can be used in this situation: (1) using ensemble estimation for the parameter through data subsetting, or (2) using a two-step estimation by first fitting the values of β through regression. In this paper, the former remedy is adopted.
(20)
σδ2
(21)
can be estimated by maximizing: (22)
using the quasi-Newton optimization method. With the estimated value of α , i.e., (μˆ δ , ρˆ0 , ⋯, ρˆd )⊤, the value of δ (xi ) 7 Such constant term does not affect the maximization of likelihood, so it is usually ignored, and one does not need to know its exact value.
6
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
effects βˆ are similar, meaning that the estimator is quite stable. Among all βˆ values, βˆ3 and βˆ7 are negative, implying that latitude and cubed temperature have opposite effects on GHI. Whereas in California, low latitude regions have higher annual mean GHI, see Fig. 8 below, the contribution from cubed temperature is more of a mathematical result. Excluding βˆ1, the largest (absolute) coefficients are βˆ3 and βˆ4 . Thus, the most influential factors on GHI are latitude and DNI. Lastly, to ensure that most βˆ values are statistically significant, two tailed t-tests with 193° of freedom are performed for each β estimate. Since the variance estimate of βˆ is given by Ref. [58]:
in Eq. (14) can be estimated via:
δˆ (xi ) = yH (xi ) − ρˆ (xi ) yL (xi ),
i = 1, ⋯, n*.
(23)
where
d
ρˆ (xi ) = ρˆ0 +
∑ ρˆk xik .
(24)
k=1
Recalling that δ (xi ) is a Gaussian process with mean structure μδ , for an arbitrary new point x 0 , a BLUP—as defined in Eq. (12)—is given by: −1 δˆ (x 0) = μˆ δ + rˆδ Rˆ δ (δˆ − 1n* μˆ δ ),
(25)
ˆ (βˆ) = σˆ 2 (F⊤Rˆ −1F )−1,
where ⊤ δˆ = ( δˆ (x1) ⋯ δˆ (x n*) ) ,
∈n
*
× 1,
ˆ
Rˆ δ =
⎛ Rδ (x1, x1) ⋮ ⎜ˆ * R δ (x n , x1) ⎝
the standard errors, t-values, and thus p-values can be easily calculated. However, the p-values here should not be mixed up with the p-values computed from the regression model: y (xi ) = f (xi )⊤β + ε . The p-values for each βˆ are indicated in parentheses in Table 1. It is observed that most of the parameters are significant at the common α level of 0.05. Unlike the βˆ estimates, the ML estimates (MLE) for the correlation parameters θ differ a lot between folds, as a consequence of the stability issue discussed in Section 5.1. Nevertheless, θˆ1 and θˆ2 are larger than θˆ3 or θˆ4 in general. This indicates that the responses of two points may still have a small correlation, despite their distance in the latitude and/or longitude dimensions being small. This could be expected because most of the variations in the latitude and longitude dimensions have been already explained by the main effects. Due to these observed differences in θ estimates, it is necessary to select one set of parameters as the “final” estimated parameters for GP 1. The selection is based on the predictive performance, as described below. For each of the 55 out-of-sample locations—the turquoise diamonds in Fig. 2—the parameters estimated using each fold are inserted into Eq. (12) to obtain a prediction. It should be noted that all 1000 data points are used for each prediction, even though the parameters are fitted using only 200 data points. The prediction results are plotted in Fig. 6. The normalized root mean square (RMS) errors for each fold are 0.55%, 0.52%, 0.50%, 0.56%, and 0.51%, respectively. These errors are very small, which suggests that the spatial prediction model is appropriate. In addition, it appears that the differences in the θ estimates have little effect on the prediction outcome. Hence, the parameters from the 3rd fold (with the lowest RMS error) are used in subsequent analysis. This selection is analogous to selecting the best component model in an ensemble.
(26) *
rˆδ = ( Rˆ δ (x 0 , x1) ⋯ Rˆ δ (x 0 , x n*) ),
∈1 × n ,
⋯ Rˆ δ (x1, x n*) ⎞ ⋱ ⋮ , ⎟ ⋯ Rˆ δ (x n* , x n*)
∈n
(27)
*
× n *,
⎠
(28)
column vector filled only with ones [63]. The and 1n* is a entries of rˆδ and Rˆ δ are calculated in the same way as in Eq. (12), but using θˆδ . Finally, the BLUP of the adjusted y at x 0 is given by: length-n *
yˆH (x 0) = ρˆ (x 0) yˆL (x 0) + δˆ (x 0),
(29)
where yˆL (x 0) is the prediction made using GP 1 via Eq. (12). 6. Case study: annual mean GHI map for California 6.1. Estimating θ , β , and σ 2 using 1000 low-accuracy PSM3 data points Following the procedure outlined in Sections 5.1 and 5.2, the annual mean GHI in California is first modeled using only the low-accuracy PSM3 data. Since n = 1000 is a fairly large number, the data is randomly split into five folds, each with n = 200 , for the reasons given in Section 5.2. For each fold, a GP 1, i.e., Eq. (1), is fitted. The fitting results are shown in Table 1. It is observed that for different folds, the fitted values for the main Table 1 Fitted Gaussian process parameters for each fold. These GP 1 models are built using the low-accuracy PSM3 data only. The p-value each for βˆ is indicated in the parentheses below. Values smaller than 2 × 10−16 are replaced with 0.000* . Parameter
Fold 1
Fold 2
Fold 3
Fold 4
Fold 5
θˆ1 θˆ2
9.3465
23.0873
69.2648
2.5141
27.6691
46.8493
72.9369
19.3532
45.0112
8.0894
θˆ3 θˆ4
21.5756
2.5329
9.7011
5.6783
16.6342
10.5599
5.9674
1.5847
5.8679
3.7899
βˆ1 : intercept
5.3856
5.3847
5.3950
5.3777
5.3812
(0.000*) 0.0284
(0.000*) 0.0212
(0.000*) 0.0210
(0.000*) 0.0174
(0.000*) 0.0239
(0.0008) −0.1218
(0.0280) −0.1190
(0.0191) −0.1179
(0.0725) −0.1233
(0.0206) −0.1127
(0.000*) 0.2561
(0.000*) 0.2688
(0.000*) 0.2808
(0.000*) 0.2816
(0.000*) 0.2742
(0.000*) 0.0544
(0.000*) 0.0421
(0.000*) 0.0317
(0.000*) 0.0426
(0.000*) 0.0501
(0.000*) 0.0145
(0.0001) 0.0105
(0.0025) 0.0106
(0.0002) 0.0179
(0.000*) 0.0145
βˆ7 : temperature3
(0.0001) −0.0090
(0.0081) −0.0019
(0.0052) −0.0017
(0.0001) −0.0042
(0.0040) −0.0068
σˆ 2
(0.0004) 0.0016
(0.4768) 0.0018
(0.5082) 0.0019
(0.2014) 0.0019
(0.0519) 0.0021
βˆ2 : longitude βˆ3 : latitude βˆ4 : DNI
βˆ5 : temperature βˆ6 : temperature2
(30)
6.2. Estimating θδ , α , and σδ2 using 55 pairs of TMY3 and PSM3 data points In the previous section, 55 out-of-sample predictions were made using GP 1. In this section, these 55 predicted-PSM3 data points, as well as the matching 55 TMY3 data points, are used to construct the spatial adjustment model GP 2 following the procedure described in Section 5.3. In the initial modeling, all four input variables are considered for the construction of G , considering the correlations observed in Fig. 5. In this case, the fitted values for θδ are (181.9993, 80.9817, 141.0911, 24.1447), the fitted ρk , k = 0, ⋯, 4 , values are (0.9268, 0.0225, 0.0182, − 0.0085, 0.0062), and the fitted μδ value is 0.1868. To make sure that the fitted parameters are statistically significant, a series of two-tailed t-tests is again performed. It is found that the μδ , ρ3 , and ρ4 estimates have very high p-values, and thus their corresponding variables should be removed from the model. Since removing a variable from the model may change the fitted coefficients of the remaining variables, a backward stepwise selection is performed. As a result, GP 2 with α = (ρ0 , ρ1 , ρ2 , ρ3)⊤ is found to contain only statistically significant coefficients. The new estimates are (0.9618, 0.0231, 0.0160, − 0.0107 ) with p-values (0.000*, 0.0026, 0.0115, 0.0285), respectively. Therefore, with these new estimates, GP 2 is tested on the 12 Class I sites where both types of data are available—the red crosses in Fig. 2—to evaluate the out-of-sample performance of the spatial 7
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
Fig. 6. PSM3 data versus predicted—using θˆ , βˆ , and σˆ 2 estimated in each fold—annual mean GHI at 55 out-of-sample locations.
uncertainty PSM3 GHI estimates [Fig. 8 (a)], (2) an ordinary kriging map using 67 low-uncertainty TMY3 data points [Fig. 8 (b)], (3) the low-accuracy map produced using GP 1 [Fig. 8 (c)], and (4) the adjusted map using a linear regression model [Fig. 8 (d)]. It is evident that the kriged maps—(a, b)—are overly smooth and lacking details, whereas the maps solely based on PSM3 data—(a, c)—overestimate GHI (as could be expected from Fig. 4). On the other hand, the adjusted maps—(d, e)—are close to the “true” mean, i.e., their color range is close to that shown in (b). Between (d) and (e), it appears that the GPmodeled map (e) has more details, although a large-size map would be needed to observe them more precisely. For example, the two small low-GHI patches near the city of Placer (− 120. 7∘, 39. 1∘) that are indicated by circles in Fig. 8 are observed in both (e) and (b), but not in (d). The above results are conclusive, and imply that when more groundbased stations become (hopefully) available in the future, the advantage of a GP-modeled map over a regression-based map would become more significant. Fig. 7. Out-of-sample validation against TMY3 data using GP 2 at 12 locations. The predictions made by a linear regression model and the raw PSM3 estimates are plotted as reference.
7. Conclusion Producing resource maps constitutes an essential initial step in renewable energy resource assessments. Mapping provides an easily interpretable overview on the available resource over a region. In the case of solar energy applications, high-spatial-resolution GHI maps at annual, monthly, or even hourly temporal resolution, are of vital importance to the design, simulation and evaluation of any solar system—most particularly of the PV type. Currently, many global design tools rely, directly or indirectly, on remote-sensed data to support these engineering tasks. Nevertheless, any source of remote-sensed data is subject to bias and variance, which have to be corrected using groundbased measurements. To that end, high-accuracy versions of these maps are desired, which could improve the bankability of solar energy projects. A data fusion technique based on Gaussian processes (GP) is demonstrated using an application of integrating high- and low-accuracy GHI data. In this study, the data fusion is performed spatially at country or state scale, with the goal of preparing high-accuracy and high-resolution solar resource maps. The procedure has two main steps: (1) producing a high-resolution map with a spatial prediction model built using only low-accuracy data, and (2) adjusting the low-accuracy results just found with a spatial adjustment model built using both highand low-accuracy data. In the case study considered here—the production of an annual mean global horizontal irradiance map for California—the two-step GP procedure leads to a significant accuracy improvement in the produced map, as compared to the benchmarking maps using only regular kriging interpolation. Since the GP-based approach is flexible in its formulation, the model construction is not limited by the type of input variables. In the most
adjustment model. The predictions are shown in Fig. 7. To benchmark GP 2 , the linear regression model in Eq. (13) is used. The normalized RMS errors, nRMSEs, between GP 2 and the linear regression model are 1.62% and 2.06% respectively, whereas the nRMSE for the raw PSM3 predictions at these locations is 5.23%. These results show that both adjustment methods lead to significant accuracy improvements. Since both the Gaussian process and the linear regression produce unbiased predictions, the better GP 2 ’s nRMSE is due to variance reduction.
6.3. Producing high-resolution, high-accuracy GHI resource maps Using the GP models constructed above, as well as the estimated parameters, an annual mean GHI map can be produced for California. To emphasize the high-resolution nature of the produced map, a regular lattice with a very-fine resolution of 0.02° in both longitude and latitude is used, which is four times the spatial resolution of the initial PSM3 areal grid size. As a result, the total number of unobserved locations is more than 0.1 million. For each of these 0.1 million locations, Eq. (12) is first used to generate a low-accuracy estimate. Then, Eq. (29) is used to adjust the low-accuracy estimate. Since GP 1 requires DNI and temperature as inputs, the values of DNI and temperature at the unobserved locations are produced using ordinary kriging. The final resource map is shown in Fig. 8 (e). This high-quality map is compared to maps produced using four other methods: (1) an ordinary kriging map using 1067 high8
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
Fig. 8. High-resolution California annual mean GHI map using different methods. Each map contains values from more than 0.1 million locations. The cream to burgundy color bar describes the graduation of GHI from low to high. Note the GP 2 map (e) contains details inherited from kriged TMY map (b), as apparent in, e.g., the circled regions, whereas such details are missing from (a), (c), and (d). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
basic case, geographical information including only longitude, latitude and elevation, can be used. For this basic setting, the spatial prediction model GP 1 becomes a universal kriging model with Gaussian correlation function if the model hyperparameters are assumed equal. In more complex cases, other spatial predictors, such as median-polish kriging [58] or fixed-rank kriging [64], may very well be involved when the type and quality of the available data permits.
[5]
[6]
Acknowledgement This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors acknowledge the publicly available TMY3 and NSRDB databases, developed by the National Renewable Energy Laboratory, which were central to this study.
[7]
[8]
References [9]
[1] Hammer A, Heinemann D, Hoyer C, Kuhlemann R, Lorenz E, Müller R, Beyer HG. Solar energy assessment using remote sensing technologies. Remote Sens Environ 2003;86(3):423–32. urban Remote Sensing https://doi.org/10.1016/S00344257(03)00083-Xhttp://www.sciencedirect.com/science/article/pii/ S003442570300083X. [2] Martins F, Pereira E, Abreu S. Satellite-derived solar resource maps for Brazil under SWERA project. Sol Energy 2007;81(4):517–28https://doi.org/10.1016/j.solener. 2006.07.009http://www.sciencedirect.com/science/article/pii/ S0038092X0600199X. [3] Gueymard CA, Wilcox SM. Assessment of spatial and temporal variability in the US solar resource from radiometric measurements and predictions from models using ground-based or satellite data. Sol Energy 2011;85(5):1068–84https://doi.org/10. 1016/j.solener.2011.02.030http://www.sciencedirect.com/science/article/pii/ S0038092X11000855. [4] Blanksby C, Bennett D, Langford S. Improvement to an existing satellite data set in
[10]
[11]
[12]
9
support of an Australia solar atlas, Solar Energy 98. 2013. p. 111–24. iCEM Solar Radiation https://doi.org/10.1016/j.solener.2012.10.026http://www. sciencedirect.com/science/article/pii/S0038092X1200391X. Zurita A, Castillejo-Cuberos A, García M, Mata-Torres C, Simsek Y, García R, Antonanzas-Torres F, Escobar RA. State of the art and future prospects for solar PV development in Chile. Renew Sustain Energy Rev 2018;92:701–27https://doi.org/ 10.1016/j.rser.2018.04.096http://www.sciencedirect.com/science/article/pii/ S1364032118303198. Hernández-Escobedo Q, Rodríguez-García E, Saldaña-Flores R, Fernández-García A, Manzano-Agugliaro F. Solar energy resource assessment in Mexican states along the Gulf of Mexico. Renew Sustain Energy Rev 2015;43:216–38https://doi.org/10. 1016/j.rser.2014.10.025http://www.sciencedirect.com/science/article/pii/ S1364032114008429. Belgasim B, Aldali Y, Abdunnabi MJ, Hashem G, Hossin K. The potential of concentrating solar power (CSP) for electricity generation in Libya. Renew Sustain Energy Rev 2018;90:1–15https://doi.org/10.1016/j.rser.2018.03.045http://www. sciencedirect.com/science/article/pii/S1364032118301357. Nygaard I, Rasmussen K, Badger J, Nielsen TT, Hansen LB, Stisen S, Larsen S, Mariko A, Togola I. Using modeling, satellite images and existing global datasets for rapid preliminary assessments of renewable energy resources: The case of Mali. Renew Sustain Energy Rev 2010;14(8):2359–71https://doi.org/10.1016/j.rser. 2010.04.001http://www.sciencedirect.com/science/article/pii/ S136403211000105X. Okoye CO, Bahrami A, Atikol U. Evaluating the solar resource potential on different tracking surfaces in Nigeria. Renew Sustain Energy Rev 2018;81:1569–81https:// doi.org/10.1016/j.rser.2017.05.235http://www.sciencedirect.com/science/ article/pii/S1364032117308833. Tahir Z, Asim M. Surface measured solar radiation data and solar energy resource assessment of Pakistan: A review. Renew Sustain Energy Rev 2018;81:2839–61https://doi.org/10.1016/j.rser.2017.06.090http://www. sciencedirect.com/science/article/pii/S1364032117310304. Moradi I, Mueller R, Alijani B, Kamali GA. Evaluation of the Heliosat-II method using daily irradiation data for four stations in Iran. Sol Energy 2009;83(2):150–6https://doi.org/10.1016/j.solener.2008.07.010http://www. sciencedirect.com/science/article/pii/S0038092X08001680. Martín-Pomares L, Martínez D, Polo J, Perez-Astudillo D, Bachour D, Sanfilippo A. Analysis of the long-term solar potential for electricity generation in Qatar. Renew Sustain Energy Rev 2017;73:1231–46https://doi.org/10.1016/j.rser.2017.01. 125http://www.sciencedirect.com/science/article/pii/S136403211730134X.
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
[35] Yang D, Gueymard CA, Kleissl J. Editorial: Submission of Data Article is now open. Sol Energy 2018;171:A1–2https://doi.org/10.1016/j.solener.2018.07.006http:// www.sciencedirect.com/science/article/pii/S0038092X18306698. [36] Hay JE. Errors associated with the spatial interpolation of mean solar irradiances. Sol Energy 1986;37(2):135–46https://doi.org/10.1016/0038-092X(86)90071Xhttp://www.sciencedirect.com/science/article/pii/0038092X8690071X. [37] Palz W, Kasten F. European solar radiation atlas. Verlag TUV Rheinland; 1984. [38] Maclaine-Cross I. Linear spatial interpolation of solar radiation statistics. Sol Energy 1980;24(4):413–4https://doi.org/10.1016/0038-092X(80)90306-0http://www. sciencedirect.com/science/article/pii/0038092X80903060. [39] Granger OE. Climatology of global solar radiation in California and an interpolation technique based on orthogonal functions. Sol Energy 1980;24(2):153–68https:// doi.org/10.1016/0038-092X(80)90389-8http://www.sciencedirect.com/science/ article/pii/0038092X80903898. [40] Merino GG, Jones D, Stooksbury DE, Hubbard KG. Determination of semivariogram models to krige hourly and daily solar irradiance in western Nebraska. Journal of Applied Meteorology and Climatology 2001;40(6):1085–94. https://doi.org/ 10.1175/1520-0450(2001)040¡1085:DOSMTK¿2.0.CO;2. [41] Righini R, Gallegos HG, Raichijk C. Approach to drawing new global solar irradiation contour maps for Argentina. Renew Energy 2005;30(8):1241–55https://doi. org/10.1016/j.renene.2004.10.010http://www.sciencedirect.com/science/article/ pii/S0960148104004057. [42] Moreno A, Gilabert M, Martinez B. Mapping daily global solar irradiation over Spain: A comparative study of selected approaches. Sol Energy 2011;85(9):2072–84https://doi.org/10.1016/j.solener.2011.05.017http://www. sciencedirect.com/science/article/pii/S0038092X11001976. [43] Janjai S, Masiri I, Laksanaboonsong J. Satellite-derived solar resource maps for Myanmar. Renew Energy 2013;53:132–40https://doi.org/10.1016/j.renene.2012. 11.014http://www.sciencedirect.com/science/article/pii/S0960148112007173. [44] Perez R, Ineichen P, Moore K, Kmiecik M, Chain C, George R, Vignola F. A new operational model for satellite-derived irradiances: Description and validation. Sol Energy 2002;73(5):307–17https://doi.org/10.1016/S0038-092X(02)001226http://www.sciencedirect.com/science/article/pii/S0038092X02001226. [45] Perez R, Ineichen P, Kmiecik M, Moore K, Renne D, George R. Producing satellitederived irradiances in complex arid terrain the American Solar Energy Society’s Solar 2003 Special Issue Sol Energy 2004;77(4):367–71https://doi.org/10.1016/j. solener.2003.12.016http://www.sciencedirect.com/science/article/pii/ S0038092X03004687. [46] Sengupta M, Habte A, Gotseff P, Weekley A, Lopez A, Molling C, Heidinger A. A physics-based GOES satellite product for use in NREL's National Solar Radiation Database Tech. Rep. NREL/CP-5D00-62237 National Renewable Energy Laboratory; 2014https://www.nrel.gov/docs/fy14osti/62237.pdf. [47] Beyer HG, Costanzo C, Heinemann D. Modifications of the Heliosat procedure for irradiance estimates from satellite images. Sol Energy 1996;56(3):207–12https:// doi.org/10.1016/0038-092X(95)00092-6http://www.sciencedirect.com/science/ article/pii/0038092X95000926. [48] Lefèvre M, Wald L, Diabaté L. Using reduced data sets ISCCP-B2 from the Meteosat satellites to assess surface solar irradiance. Sol Energy 2007;81(2):240–53https:// doi.org/10.1016/j.solener.2006.03.008http://www.sciencedirect.com/science/ article/pii/S0038092X06000867. [49] Ruiz-Arias JA, Quesada-Ruiz S, Fernández EF, Gueymard CA. Optimal combination of gridded and ground-observed solar radiation data for regional solar resource assessment. Sol Energy 2015;112:411–24https://doi.org/10.1016/j.solener.2014. 12.011http://www.sciencedirect.com/science/article/pii/S0038092X14006021. [50] Bouttier F, Courtier P. Data assimilation concepts and methods march 1999, Meteorological training course lecture series. ECMWF; 2002. p. 59. [51] Glasbey C, Graham R, Hunter A. Spatio-temporal variability of solar energy across a region: A statistical modelling approach. Sol Energy 2001;70(4):373–81https://doi. org/10.1016/S0038-092X(00)00152-3http://www.sciencedirect.com/science/ article/pii/S0038092X00001523. [52] Wilcox S, Marion W. Users manual for TMY3 data sets Tech. Rep. NREL/TP-58143156 National Renewable Energy Laboratory; 2008https://www.nrel.gov/docs/ fy08osti/43156.pdf. [53] Yang D. A correct validation of the National Solar Radiation Data Base (NSRDB). Renew Sustain Energy Rev 2018;97:152–5https://doi.org/10.1016/j.rser.2018.08. 023http://www.sciencedirect.com/science/article/pii/S1364032118306087. [54] Urraca R, Antonanzas J, Sanz-Garcia A, Aldama A, Martinez-de Pison FJ. An algorithm based on satellite observations to quality control ground solar sensors: Analysis of Spanish meteorological networks. In: de Cos Juez FJ, Villar JR, de la Cal EA, Herrero Á., Quintián H, Sáez JA, Corchado E, editors. Hybrid artificial intelligent systems. Cham: Springer International Publishing; 2018. p. 609–21. [55] Gueymard CA. Temporal variability in direct and global irradiance at various time scales as affected by aerosols. Sol Energy 2012;86(12):3544–53https://doi.org/10. 1016/j.solener.2012.01.013http://www.sciencedirect.com/science/article/pii/ S0038092X12000291. [56] Marion W, Urban K. User's manual for TMY2s Tech. rep. National Renewable Energy Laboratory; 1995http://rredc.nrel.gov/solar/pubs/tmy2/. [57] Nielsen KP, Vignola F, Ramírez L, Blanc P, Meyer R, Blanco M. Excerpts from the report:“Beyond TMY–meteorological data sets for CSP/STE performance simulations”. AIP conference proceedings. AIP Publishing; 2017. p. 140017. [58] Cressie N. Statistics for spatial data. John Wiley & Sons; 2015. [59] Sengupta M, Xie Y, Lopez A, Habte A, Maclaurin G, Shelby J. The National Solar Radiation Data Base (NSRDB). Renew Sustain Energy Rev 2018;89:51–60https:// doi.org/10.1016/j.rser.2018.03.003http://www.sciencedirect.com/science/ article/pii/S136403211830087X. [60] Gramacy RB, Lee HKH. Bayesian treed Gaussian process models with an application
[13] Zell E, Gasim S, Wilcox S, Katamoura S, Stoffel T, Shibli H, Engel-Cox J, Subie MA. Assessment of solar radiation resources in Saudi Arabia. Sol Energy 2015;119:422–38https://doi.org/10.1016/j.solener.2015.06.031http://www. sciencedirect.com/science/article/pii/S0038092X15003394. [14] Eissa Y, Chiesa M, Ghedira H. Assessment and recalibration of the Heliosat-2 method in global horizontal irradiance modeling over the desert environment of the UAE. Sol Energy 2012;86(6):1816–25https://doi.org/10.1016/j.solener.2012.03. 005http://www.sciencedirect.com/science/article/pii/S0038092X12001107. [15] Gherboudj I, Ghedira H. Assessment of solar energy potential over the United Arab Emirates using remote sensing and weather forecast data. Renew Sustain Energy Rev 2016;55:1210–24https://doi.org/10.1016/j.rser.2015.03.099http://www. sciencedirect.com/science/article/pii/S1364032115002592. [16] Vignola F, Grover C, Lemon N, McMahan A. Building a bankable solar radiation dataset. Sol Energy 2012;86(8):2218–29. Progress in Solar Energy 3 https://doi. org/10.1016/j.solener.2012.05.013http://www.sciencedirect.com/science/article/ pii/S0038092X1200182X. [17] Sengupta M, Habte A, Gueymard C, Wilbert S, Renne D. Best practices handbook for the collection and use of solar resource data for solar energy applications Tech. rep. Golden, CO (United States): National Renewable Energy Lab.(NREL); 2017. [18] Yang D, Kleissl J, Gueymard CA, Pedro HT, Coimbra CF. History and trends in solar irradiance and PV power forecasting: A preliminary assessment and review using text mining Solar Energydoi:https://doi.org/10.1016/j.solener.2017.11.023 http:// www.sciencedirect.com/science/article/pii/S0038092X17310022. [19] Polo J, Wilbert S, Ruiz-Arias J, Meyer R, Gueymard C, Súri M, Martín L, Mieslinger T, Blanc P, Grant I, Boland J, Ineichen P, Remund J, Escobar R, Troccoli A, Sengupta M, Nielsen K, Renne D, Geuder N, Cebecauer T. Preliminary survey on site-adaptation techniques for satellite-derived and reanalysis solar radiation datasets. Sol Energy 2016;132:25–37https://doi.org/10.1016/j.solener.2016.03.001http:// www.sciencedirect.com/science/article/pii/S0038092X16001754. [20] Leloux J, Lorenzo E, Garc’ia-Domingo B, Aguilera J, Gueymard CA. A bankable method of assessing the performance of a CPV plant. Appl Energy 2014;118:1–11https://doi.org/10.1016/j.apenergy.2013.12.014http://www. sciencedirect.com/science/article/pii/S0306261913010131. [21] Gueymard CA, Gustafson WT, Bender G, Etringer A, Storck P. Evaluation of procedures to improve solar resource assessments: optimum use of short-term data from a local weather station to correct bias in long-term satellite derived solar radiation time series. World renewable energy forum conference proceedings. 2012. p. 13–7. [22] Zhang Q, Deng X, Qian PZG, Wang X. Spatial modeling for refining and predicting surface potential mapping with enhanced resolution. Nanoscale 2013;5:921–6https://doi.org/10.1039/C2NR33603K. [23] Xiong S, Qian PZG, Wu CFJ. Sequential design and analysis of high-accuracy and low-accuracy computer codes. Technometrics 2013;55(1):37–46https://doi.org/10. 1080/00401706.2012.723572. [24] Qian PZG, Wu CFJ. Bayesian hierarchical modeling for integrating low-accuracy and high-accuracy experiments. Technometrics 2008;50(2):192–204https://doi. org/10.1198/004017008000000082. [25] Qian Z, Seepersad CC, Joseph VR, Allen JK, Jeff Wu CF. Building surrogate models based on detailed and approximate simulations. J Mech Des 2005;128(4):668–77https://doi.org/10.1115/1.2179459. [26] Elsinga B, van Sark WG. Short-term peer-to-peer solar forecasting in a network of photovoltaic systems. Appl Energy 2017;206:1464–83https://doi.org/10.1016/j. apenergy.2017.09.115http://www.sciencedirect.com/science/article/pii/ S0306261917314010. [27] Aryaputera AW, Yang D, Zhao L, Walsh WM. Very short-term irradiance forecasting at unobserved locations using spatio-temporal kriging. Sol Energy 2015;122:1266–78https://doi.org/10.1016/j.solener.2015.10.023http://www. sciencedirect.com/science/article/pii/S0038092X15005745. [28] Yang D, Dong Z, Reindl T, Jirutitijaroen P, Walsh WM. Solar irradiance forecasting using spatio-temporal empirical kriging and vector autoregressive models with parameter shrinkage. Sol Energy 2014;103:550–62https://doi.org/10.1016/j. solener.2014.01.024http://www.sciencedirect.com/science/article/pii/ S0038092X14000425. [29] Perez R, Kivalov S, Schlemmer J, Hemker K, Hoff TE. Short-term irradiance variability: preliminary estimation of station pair correlation as a function of distance. Sol Energy 2012;86(8):2170–6. progress in Solar Energy 3 https://doi.org/ 10.1016/j.solener.2012.02.027http://www.sciencedirect.com/science/article/pii/ S0038092X12000928. [30] Yang D, Reindl T. Solar irradiance monitoring network design using the variance quadtree algorithm. Renewables: Wind, Water, and Solar 2015;2(1):1https://doi. org/10.1186/s40807-014-0001-x. [31] Zagouras A, Kazantzidis A, Nikitidou E, Argiriou A. Determination of measuring sites for solar irradiance, based on cluster analysis of satellite-derived cloud estimations. Sol Energy 2013;97:1–11https://doi.org/10.1016/j.solener.2013.08. 005http://www.sciencedirect.com/science/article/pii/S0038092X13003150. [32] Yang D. On adding and removing sensors in a solar irradiance monitoring network for areal forecasting and PV system performance evaluation. Sol Energy 2017;155:1417–30https://doi.org/10.1016/j.solener.2017.07.061http://www. sciencedirect.com/science/article/pii/S0038092X17306461. [33] Hay JE, Suckling PW. An assessment of the networks for measuring and modelling solar radiation in British Columbia and adjacent areas of Western Canada. Can Geogr/Le Géographe canadien 1979;23(3):222–38https://doi.org/10.1111/j.15410064.1979.tb00659.x. [34] Yang D. SolarData: An R package for easy access of publicly available solar datasets. Sol Energy 2018;171:A3–12https://doi.org/10.1016/j.solener.2018.06.107http:// www.sciencedirect.com/science/article/pii/S0038092X18306583.
10
Renewable and Sustainable Energy Reviews 113 (2019) 109260
D. Yang and C.A. Gueymard
enbuild.2012.06.024http://www.sciencedirect.com/science/article/pii/ S037877881200312X. [63] Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Stat Sci 1989;4(4):409–23https://doi.org/10.2307/2245858http:// www.jstor.org/stable/2245858. [64] Cressie N, Johannesson G. Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B 2008;70(1):209–26https://doi.org/10.1111/j.1467-9868.2007.00633. xhttps://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2007.00633.x.
to computer modeling. J Am Stat Assoc 2008;103(483):1119–30https://doi.org/10. 1198/016214508000000689. [61] Ranjan P, Haynes R, Karsten R. A computationally stable approach to Gaussian process interpolation of deterministic computer simulation data. Technometrics 2011;53(4):366–78https://doi.org/10.2307/41714950http://www.jstor.org/ stable/41714950. [62] Heo Y, Zavala VM. Gaussian process modeling for measurement and verification of building energy savings. Energy Build 2012;53:7–18https://doi.org/10.1016/j.
11