Estimation of hourly global solar radiation using Multivariate Adaptive Regression Spline (MARS) – A case study of Hong Kong


Energy 186 (2019) 115857

Contents lists available at ScienceDirect

Energy journal homepage: www.elsevier.com/locate/energy

Danny H.W. Li a, Wenqiang Chen a,*, Shuyang Li a, Siwei Lou b

a Building Energy Research Group, Department of Architecture and Civil Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong SAR, China
b School of Civil Engineering, Guangzhou University, 230 Guangzhou Higher Education Mega Center West Outer Ring Road, Panyu District, Guangzhou, China

Article info

Article history: Received 3 April 2019; Received in revised form 20 June 2019; Accepted 31 July 2019; Available online 2 August 2019

Keywords: Hourly global solar radiation; MARS; Sensitivity analysis; Hong Kong

Abstract

Solar energy is the most popular resource for power generation among the various available renewable energy alternatives. Solar radiation data are important for solar systems and energy-efficient building designs. Where measurements are unavailable, solar radiation prediction models are required. Recently, machine learning techniques have been used successfully for predicting solar radiation. However, previous work has mainly focused on monthly average daily or daily solar radiation. In this study, models for predicting hourly global solar radiation on a horizontal surface were developed based on the Multivariate Adaptive Regression Spline (MARS) method. Hourly meteorological data measured over 7 years were used for the study. Sensitivity analysis was conducted using the MARS algorithm, and the most important variables were selected as inputs of the proposed models. Sixteen MARS models with different combinations of input variables were proposed. Logistic regression and Artificial Neural Network (ANN) methods were also used to develop models for comparative study. Finally, the proposed models were evaluated against measurements and compared with existing models. The results showed that the proposed MARS models perform well in both prediction accuracy and interpretability. The proposed models can be used to estimate the hourly solar radiation effectively from different combinations of measured variables. © 2019 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (W. Chen). https://doi.org/10.1016/j.energy.2019.115857 0360-5442/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Solar energy is regarded as the most popular renewable energy resource for generating electric power and collecting heat, and it has been applied in various fields around the world. Solar radiation data are important to active solar energy systems [1] and passive energy-efficient building designs [2]. Long-term ground measurement is the most effective and accurate way of setting up databases of the required solar radiation and other climatic parameters. However, many places do not offer measured solar radiation data [3]. For instance, of the 756 meteorological stations in China, only 122 have records of global solar radiation (GSR) [4]. Besides, a solar radiation station is quite expensive to build, and maintaining the devices is time-consuming. Alternative ways of obtaining solar radiation data are needed.

Without direct measurement in rural locations, GSR data can be predicted from empirical or machine learning based models using geographical parameters (e.g. solar altitude) and meteorological variables (e.g. sunshine hour, cloud amount, temperature and humidity). Such meteorological variables are easier to measure than GSR and are commonly recorded in most locations. Solar radiation models developed from the parameters measured at one location may not be applicable to other areas. Thus, common models with coefficients specified from the data measured at each location should be applied, and various models with different combinations of predictors are required, since different meteorological stations may record different variables. Different criteria can be used to classify the existing models for estimating horizontal GSR. By time scale, they fall mainly into two categories: daily (or monthly average daily) GSR models and monthly average hourly GSR models. Khorasanizadeh and Mohammadi [5] evaluated 11 models for predicting the monthly mean GSR over six major cities of Iran. Yao et al. [6] analyzed and


D.H.W. Li et al. / Energy 186 (2019) 115857

compared 89 existing monthly average daily GSR models and 19 existing daily GSR models in China using 42 years of meteorological data. In a recent review [7], 105 models from the literature for estimating horizontal GSR were assessed on the basis of statistical tests, and most of them were monthly average daily models. A number of models for predicting daily or monthly average daily GSR have also been developed in recent years [8–10].

Solar radiation varies considerably during the day because of changeable weather conditions, especially in locations where overcast and partly cloudy skies occur frequently, such as Hong Kong [11]. Thus, daily GSR data are more readily available than data at shorter intervals (e.g. hourly). In general, monthly average or daily values can be used for the preliminary sizing or design of photovoltaic systems. Hourly values, however, are required for solar systems that need more precise sizing, or for evaluating building envelope cooling loads [12]. Because hourly values record the detailed variation of solar radiation within a day, they would be more accurate than daily and monthly average daily radiation data. Accurate estimation, however, becomes more difficult [13]. Ahmad and Tiwari [14] reviewed several parametric models for estimating hourly global radiation. Although the time interval was 1 h, these models were developed only for clear sky conditions, and the constants in them were monthly averaged values. They also reviewed various models for predicting the hourly/daily radiation ratio for the average day of each month. Khatib and Elmenreich [15] proposed a model for estimating hourly GSR from daily solar radiation data using a Generalized Regression Artificial Neural Network. However, the model is only applicable to sites where daily averages of solar radiation are available.
A recent review by Li and Lou [16] also showed that most hourly GSR models estimate the ratio of the hourly horizontal GSR to its daily total using the latitude and solar angles. The clearness index, defined as the ratio of horizontal GSR to horizontal extraterrestrial solar radiation (G0), was commonly used as the output of various GSR models. Some early work assumed equal hourly and daily clearness indices. Later, the ratio of the hourly to the daily clearness index was modified with various constant terms. Wu and Chan [17] proposed a novel hybrid model that combines the Autoregressive and Moving Average (ARMA) model and a Time Delay Neural Network (TDNN) to predict hourly GSR. Nik et al. [18] evaluated six selected models for estimating the monthly mean hourly GSR from the daily GSR. The results showed that the choice of model depended strongly on the climatic characteristics of the site considered. Şen and Tan [19] developed simple models in the form of parabolic equations with three parameters to estimate monthly average hourly GSR. The parameters were estimated using the classical least squares technique. Zhang [20] established hourly GSR models for different locations by multiple regression using meteorological data measured at 3-h intervals. Interpolation methods based on Fourier series were then developed to produce 1-h data from the 3-h data. Most hourly GSR models did not predict the amount of hourly radiation from meteorological data measured simultaneously. Averaging the hourly values for individual months, or estimating the ratio of the hourly to the daily value, is the most common practice for predicting hourly GSR.

According to the techniques used to develop them, GSR models can be classified into traditional empirical models and machine learning (ML) based models.
Empirical prediction models tend to rely on regression techniques; their principal merit is a simple mathematical form [21]. The most commonly used meteorological variables for estimating solar radiation are sunshine hour [22–24], cloud amount [25–27], temperature [28] and relative humidity [29]. Empirical models come in a variety of forms, such as simple or multiple linear models, logical models, natural logarithmic models, and other nonlinear models.

Normally, multi-parameter models are more accurate than single-parameter models, and polynomial models are more accurate than linear models [6,30]. The issue is that these simple empirical models underperform if the systems being modelled are nonlinear. Recently, various ML algorithms, including Artificial Neural Network (ANN) and Support Vector Regression (SVR), have been used for estimating solar radiation [31,32]. Models developed using these techniques tend to be more complex and less interpretable [33], yet more accurate than regression equations [34,35]. More importantly, ML algorithms can be used to identify the contribution of each input variable to the estimation of the output in terms of mean-absolute-error (MAE), mean-bias-error (MBE), root-mean-square-error (RMSE) and coefficient of determination (R²) [36]. Such approaches have been used for many problems with large datasets, large numbers of relevant variables and complex interrelationships between input and output variables [37].

A review of the previous solar radiation models reveals several issues:

• Firstly, most models have been developed for estimating monthly average daily or daily GSR. More work is required on methods for estimating hourly GSR.
• Secondly, there is increasing interest in ML methods for estimating GSR, but most of them are too complex and insufficiently interpretable for practical use. A large number of parameters is needed to present the final model. Besides, a model is less useful if there are not enough data for training.
• Thirdly, sensitivity analysis should be conducted carefully to evaluate the importance of each variable before choosing the inputs for solar radiation models. The interactions between input variables should be considered to indicate whether there are inner correlations.
• Finally, various appropriate prediction models should be developed according to the availability of the measured parameters.
Models should be developed using different combinations of input variables.

This paper proposes an applicable and flexible approach for estimating the hourly GSR on a horizontal surface based on routine measurements of various meteorological parameters. Multivariate Adaptive Regression Spline (MARS) [38] was used to develop different hourly GSR models. Although the MARS method appears in several studies on solar radiation modelling, it has only been used to predict the monthly average GSR [39] or daily energy output [40]. No study has been found that uses the MARS method for estimating the hourly GSR. More importantly, this study develops various hourly GSR models to meet more diverse requirements. The final models were developed based on the results of sensitivity analysis. Logistic regression and ANN models were built for comparison, along with two existing empirical models. The rest of the paper is organized as follows: Section 2 describes the solar radiation and meteorological data used in this study. Section 3 elaborates the methodology of the modelling process, including the principles of the MARS algorithm and the sensitivity analysis procedures. Section 4 presents the various proposed MARS models, together with the statistical indices calculated to evaluate their performance in training and testing. Lastly, the significance and conclusions of this study are summarized.

2. Solar radiation and meteorological data

In this study, long-term measurements taken in Hong Kong (latitude 22°18′ N, longitude 114°10′ E) were used to develop various hourly GSR models. Hong Kong has a typical subtropical monsoon climate, with warm, dry winters and hot, humid summers. Due to the low latitude and the deficiency of fossil fuel resources, Hong Kong has a large potential for solar energy applications [41]. Solar radiation and meteorological data acquired from measurements made by the Hong Kong Observatory (HKO) were used for the study. Seven years of hourly data, from 2010 to 2016, were analyzed for the training and testing of the MARS models. Table 1 summarizes all the measured parameters, their measurement frequency and the range of their values. GSR at the HKO station was measured using Kipp & Zonen thermopile radiometers; the 1-min average solar radiant power (W/m²) received on a horizontal surface was recorded and accumulated into hourly intervals (MJ/m²). The hourly record of sunshine duration refers to the duration in the 60-min interval centered on the hour in local time. For other parameters measured by the automatic weather station, the data were recorded on the hour.

To eliminate spurious data and inaccurate measurements, quality-control tests based on the CIE guidance [42] were adopted. Under the Level 0 test, 2930 hourly irradiance data with a solar altitude of less than 4° or a GHI (global horizontal irradiance) of less than 20 W/m² were rejected. The Level 1 test was then conducted based on the conversion relationship between GHE (extraterrestrial solar irradiance) and GHI. The Level 2 test checked whether there was an effective record of meteorological parameters. As a result, 0 and 293 hourly data were excluded under the Level 1 and Level 2 tests, respectively. After all the tests, 27646 hourly solar radiation and meteorological datasets were retained for the analysis. Table 2 summarizes the quality control tests and the number of accepted and rejected data for each process.
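As a rough illustration of the three screening levels described above, the following Python sketch applies them to a single hourly record. The function name, the argument names and the use of None to mark a missing meteorological value are assumptions made for this example; only the thresholds (4°, 20 W/m² and the factor 1.2) come from the text and Table 2.

```python
import math


def passes_quality_control(altitude_deg, ghi, ghe, met_values):
    """Return True if one hourly record passes the Level 0-2 tests.

    altitude_deg: solar altitude in degrees
    ghi, ghe:     global horizontal and extraterrestrial irradiance (W/m^2)
    met_values:   meteorological readings; None marks a missing value
                  (an assumption of this sketch)
    """
    # Level 0: keep only records with altitude > 4 deg and GHI > 20 W/m^2.
    if altitude_deg <= 4.0 or ghi <= 20.0:
        return False
    # Level 1: 0 < GHI < 1.2 * GHE * sin(altitude).
    upper = 1.2 * ghe * math.sin(math.radians(altitude_deg))
    if not (0.0 < ghi < upper):
        return False
    # Level 2: every meteorological parameter must have a record.
    return all(value is not None for value in met_values)


print(passes_quality_control(30.0, 400.0, 1367.0, [25.0, 70.0]))
print(passes_quality_control(3.0, 400.0, 1367.0, [25.0, 70.0]))
```

Applying such a filter row by row to the raw hourly data would give the kind of accepted and rejected counts that the quality control summary reports.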


3. Methodology

This section first introduces the principles of the MARS model. A MARS model with all input variables was then built to estimate hourly GSR, and sensitivity analysis was conducted using this model. The five most important variables were selected according to the sensitivity analysis, and their different combinations were used as inputs for developing various MARS models.

3.1. Multivariate Adaptive Regression Spline (MARS)

The MARS model takes the form of an expansion in product spline basis functions (BFs), where the number of BFs and the parameters associated with each one (product degree and knot locations) are automatically determined by the data. This procedure is motivated by recursive partitioning (e.g. Classification and Regression Tree (CART)) and shares its ability to capture high-order interactions. The MARS method has several merits over other machine learning approaches. Firstly, the MARS model is easy to understand and is more interpretable than other techniques, such as ANN and CART. Secondly, the mathematical expression of the MARS model has fewer coefficients and is convenient to apply. This can speed up calculations and save computational resources without degrading the accuracy of estimation. Thirdly, the algorithm can automatically select important variables and abandon redundant ones, which helps to avoid underfitting and prevent overfitting.

Mathematically, the MARS model can be represented in a form that separately identifies the additive contributions and those associated with different multivariable interactions [38]. MARS makes no assumptions about the underlying functional relationships between dependent and independent variables [43]. The goal is to model the dependence of a response variable $y$ on one or more predictor variables $x_1, \ldots, x_n$ given observed realizations $\{y_i, x_{1i}, \ldots, x_{ni}\}_{i=1}^{N}$. The system that generates the data can be described by

$$y = f(x_1, \ldots, x_n) + \varepsilon \qquad (1)$$

where $f$ is the built MARS model and $\varepsilon$ is the fitting error of the approximation to $f$ over the domain $(x_1, \ldots, x_n)$ of interest. The model-building strategy resembles forward stepwise linear regression, but instead of using the original inputs, MARS automates the building of flexible models by generating a sequence of piecewise splines and their products [43]. The splines are connected smoothly together, and these piecewise curves (polynomials), also known as BFs, result in a flexible model that can handle both linear and nonlinear behavior. BFs always come in reflected pairs; each function is piecewise linear or cubic, with a knot at a specified value. The knot locations can differ for each input variable. The piecewise linear BFs have the following form:

$$(x - t)_+ = \max(0,\, x - t) = \begin{cases} x - t, & \text{if } x > t \\ 0, & \text{otherwise} \end{cases}
\quad \text{and} \quad
(t - x)_+ = \max(0,\, t - x) = \begin{cases} t - x, & \text{if } x < t \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
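The reflected pair in Eq. (2) is simple to evaluate numerically. The short Python sketch below returns both members of the pair for a given knot; the function name hinge_pair is illustrative and is not part of any MARS toolbox.

```python
def hinge_pair(x, t):
    """Return the reflected pair ((x - t)+, (t - x)+) for knot t."""
    return max(0.0, x - t), max(0.0, t - x)


# Evaluate the pair on both sides of a knot at t = 0.5.
for x in (0.2, 0.5, 0.9):
    right, left = hinge_pair(x, 0.5)
    print(f"x={x:.1f}  (x-t)+={right:.2f}  (t-x)+={left:.2f}")
```

At most one member of the pair is non-zero for any x, which is what lets a MARS model fit a different slope on each side of the knot.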

In Eq. (2), the subscript "+" denotes the positive part and $t$ is the knot. Fig. 1 shows an example of a pair of functions $(x - t)_+$ and $(t - x)_+$ at $t = 0.5$. The general form of the MARS model can be represented by the following expression:

$$f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(X) \qquad (3)$$

In Eq. (3), $X$ is a sample vector of input variables, $B_m$ is the m-th BF with coefficient $\beta_m$, $M$ is the number of BFs, and $\beta_0$ is the constant (intercept) term. Each $B_m$ in Eq. (3) can be a single spline function or a product of two or more such functions, which allows interactions between features to be modelled automatically. The MARS algorithm has the following two phases:

Forward process: A set of BFs is generated over the domain of interest. The algorithm starts with a model consisting of just the intercept term and iteratively adds the reflected pair of BFs that gives the largest reduction in training error. The forward phase is executed until the maximum number of BFs is reached or the R² changes less

Table 1 Summary of all measured parameters.

| Name | Notation | Unit | Frequency | Range |
|---|---|---|---|---|
| Horizontal global solar radiation | G0 | MJ/m² | 1-min | 0.08–3.9 |
| Sunshine duration | Sh | hour | hourly | 0–1 |
| Visibility | Vis | km | hourly | 0.3–40 |
| Cloud cover amount | Cld | oktas | hourly | 0–8 |
| Wind speed | Spd | m/s | hourly | 0.1–10.5 |
| Dry bulb air temperature | Temp | °C | hourly | 3.2–36.1 |
| Wet bulb air temperature | Wet | °C | hourly | 1.3–29.6 |
| Dew-point temperature | Dew | °C | hourly | 8.6–28.4 |
| Relative humidity | Rh | % | hourly | 21–100 |
| Mean sea level air pressure | Pre | kPa | hourly | 98.92–102.99 |


Table 2 Data quality control process.

| CIE test level | Criterion | Rejected (hours) | Accepted (hours) |
|---|---|---|---|
| Total | Solar altitude α > 0° | | 30869 |
| Level 0 | Solar altitude α > 4°; GHI > 20 W/m² | 2930 | 27939 |
| Level 1 | 0 < GHI < 1.2·GHE·sin(α) | 0 | 27939 |
| Level 2 | Effective record of meteorological parameters | 293 | 27646 |

than a threshold [44]. Since a large number of BFs are added one after another, the forward phase produces an overfitted model.

Backward process: To prevent overfitting, the model is pruned by deleting the least important BF one at a time until only the intercept term remains. At the end of the backward phase, from the "best" models of each size, the one with the lowest Generalized Cross-Validation (GCV) value is selected and output as the final model [44]. GCV is the criterion MARS uses to delete the redundant BFs. It is expressed as:

$$\mathrm{GCV} = \frac{\mathrm{MSE}_{train}}{\left(1 - \dfrac{enp}{N}\right)^{2}} \qquad (4)$$

where $\mathrm{MSE}_{train}$ is the mean squared error of the model on the training data, $N$ is the number of observations, and $enp$ is the effective number of parameters. The detailed calculation of GCV can be found in Ref. [38].

Fig. 1. The example of BFs $(x - t)_+$ (orange) and $(t - x)_+$ (blue) used by MARS.

3.2. Sensitivity analysis

To evaluate the contribution of each input parameter to the estimation of hourly GSR, a MARS model was built using all predictors. Several factors were then calculated to indicate the importance of each input predictor. Together with the analysis of variance (ANOVA) decomposition of the original MARS model, the 5 most important predictors were selected. The MARS models were built with the "ARESLab" toolbox [44] in MATLAB. For interpretability and simplicity, this study considers only piecewise linear functions, and the maximum order of interactions between variables was limited to 2. Around 20000 hourly datasets from 2010 to 2014 were used for the training of each MARS model and for the sensitivity analysis. Ten meteorological variables were chosen as inputs to estimate the hourly GSR on a horizontal surface (G), as shown in Table 3. The hourly value of G0 in MJ/m² can be approximated with the following equation:

$$G_0 = \sin\alpha \cdot \frac{3600}{10^{6}}\, E_{sc} \left[ 1 + 0.033 \cos\!\left( \frac{360\,(n - 3)}{365} \right) \right] \qquad (5)$$

where $\alpha$ is the solar altitude, $n$ is the day of the year, and $E_{sc}$ is the solar constant, 1367 W/m². The hourly G0 captures both the seasonal and daily fluctuation of solar radiation, so it could be a good predictor for estimating hourly GSR.

To choose an appropriate MARS model with all the input variables for the subsequent sensitivity analysis, it is important to determine the number of BFs. One way to select the best number of BFs is to evaluate the best candidate models of each size from the backward pruning phase using Cross-Validation. Ten-fold Cross-Validation was used to select the "best" number of BFs. All 10 variables from Table 3 were used as inputs, and the hourly GSR was the target variable. The values of all inputs were normalized to [0, 1] as recommended by Ref. [38]. For each model created by 10-fold Cross-Validation, the R² on the training data and the GCV on the test data were calculated. Fig. 2 shows the R² for the models of each fold. The red solid line is the average R² for each model size. The ten blue dotted lines show the GCV for the models of each fold, and the blue solid line is the average GCV for each model size. When the number of BFs increases from 1 to 10, the R² rapidly increases to around 0.9 and the GCV decreases to 0.1. However, both R² and GCV vary only slightly when the number of BFs increases from 10 to 50, so 20 BFs should be enough to capture all the features of the input data with an acceptable model complexity. Based on this, an original MARS model with all predictors was built, and the maximum number of BFs in the final model was set to 20. Fig. 3 shows the change of R² and MSE with the number of BFs in the backward phase; the model with 20 BFs was selected and output as the final model (Model 1). The equation of Model 1 can be expressed as:

$$\begin{aligned}
G ={} & 0.737 + 1.26\,BF_1 - 2.98\,BF_2 + 2.35\,BF_3 - 1.6\,BF_4 + 2.26\,BF_5 - 1.2\,BF_6 - 5.03\,BF_7 \\
& - 5.28\,BF_8 + 5.09\,BF_9 + 0.404\,BF_{10} - 2\,BF_{11} + 0.27\,BF_{12} - 0.893\,BF_{13} - 0.59\,BF_{14} \\
& - 0.577\,BF_{15} - 1.44\,BF_{16} + 8.4\,BF_{17} + 4.11\,BF_{18} + 0.574\,BF_{19}
\end{aligned} \qquad (6)$$

In Eq. (6), G is the hourly GSR in MJ/m² and BF1, …, BF19 are the BFs used by Model 1. Table 4 lists the BFs of the MARS model and the corresponding equations. The R² and GCV values of Model 1 are 0.9213 and 0.0708, respectively, on the training dataset, showing a strong correlation between the 10 meteorological parameters and G.

Table 3 Predictors of hourly global solar radiation for MARS Model 1.

| Variable | Name | Notation | Unit |
|---|---|---|---|
| x1 | Horizontal extraterrestrial solar radiation | G0 | MJ/m² |
| x2 | Sunshine duration | Sh | hour |
| x3 | Visibility | Vis | km |
| x4 | Cloud cover amount | Cld | oktas |
| x5 | Wind speed | Spd | m/s |
| x6 | Dry bulb air temperature | Temp | °C |
| x7 | Wet bulb air temperature | Wet | °C |
| x8 | Dew-point temperature | Dew | °C |
| x9 | Relative humidity | Rh | % |
| x10 | Mean sea level air pressure | Pre | kPa |

Fig. 2. Results of 10-fold Cross-Validation for MARS model.

Fig. 3. Performance of MARS model of each size and the selected final model.

Table 5 Estimated input variable importance based on Model 1.

| Variable | Notation | delGCV | nSubsets | subsRSS | subsGCV | Ranking |
|---|---|---|---|---|---|---|
| x1 | G0 | 100 | 17 | 45.184 | 45.176 | 2 |
| x2 | Sh | 43.216 | 18 | 100 | 100 | 1 |
| x3 | Vis | 1.148 | 9 | 0.772 | 0.754 | 6 |
| x4 | Cld | 1.502 | 12 | 1.558 | 1.534 | 4 |
| x5 | Spd | 1.174 | 11 | 1.264 | 1.243 | 5 |
| x6 | Temp | 0.323 | 4 | 0.23 | 0.222 | 8 |
| x7 | Wet | 0 | 0 | 0 | 0 | 10 |
| x8 | Dew | 0.116 | 0 | 0 | 0 | 9 |
| x9 | Rh | 4.949 | 13 | 2.782 | 2.758 | 3 |
| x10 | Pre | 0.436 | 5 | 0.284 | 0.275 | 7 |

To evaluate the contribution of each predictor, sensitivity analysis was performed

based on Model 1 (Eq. (6)). Several criteria were considered in this procedure. Table 5 reports the estimated importance of the input variables. The first and second columns give the indices of the input variables and their notations. Columns 3 to 6 report the scores of four indicators; detailed explanations of these indicators can be found in Ref. [44]. The last column reports the importance ranking of each predictor. To check whether the variables enter the model additively or are involved in interactions between variables, analysis of variance (ANOVA) decomposition [38] of MARS Model 1 was conducted. The ANOVA decomposition estimates the proportion of variance explained when all BFs corresponding to a particular ANOVA function are excluded from the model. Comparing it with the GCV estimate of R² for the full model identifies the reduction caused by the exclusion. A larger GCV and a smaller R² indicate greater importance [44]. The result of the ANOVA decomposition is shown in Table 6.

In summary, based on the individual contributions of variables in MARS Model 1 and the joint contributions of variables identified by ANOVA decomposition, the five most important parameters are G0, Sh, Rh, Cld and Spd. They were selected to examine the various correlations with hourly GSR using MARS. Different combinations of these predictors were used as inputs for the proposed MARS models.
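To make the size of the resulting model family concrete, the sketch below enumerates the predictor sets under the assumption, stated in Section 4.1, that G0 enters every model and the other four predictors are optional. The variable names are illustrative, and the enumeration order is that of itertools, not the paper's model numbering.

```python
from itertools import combinations

# Optional predictors; G0 is assumed to be present in every model.
optional = ["Sh", "Cld", "Rh", "Spd"]

predictor_sets = [
    ("G0",) + subset
    for k in range(len(optional) + 1)       # subset sizes 0..4
    for subset in combinations(optional, k)
]

print(len(predictor_sets))  # number of candidate models
for predictors in predictor_sets:
    print(", ".join(predictors))
```

Every subset of four optional predictors gives 2^4 = 16 candidate input sets, which matches the number of models proposed.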

Table 4 BFs and corresponding equations of Model 1 for estimating G.

| BF | Equation |
|---|---|
| BF1 | max(0, x2 − 0.1) |
| BF2 | max(0, 0.1 − x2) |
| BF3 | BF1 · max(0, x1 − 0.852) |
| BF4 | BF1 · max(0, 0.852 − x1) |
| BF5 | max(0, x1 − 0.504) |
| BF6 | max(0, 0.504 − x1) |
| BF7 | BF5 · max(0, x9 − 0.73) |
| BF8 | BF2 · max(0, x1 − 0.42) |
| BF9 | BF2 · max(0, 0.42 − x1) |
| BF10 | max(0, 0.875 − x4) |
| BF11 | BF5 · max(0, 0.346 − x5) |
| BF12 | BF1 · max(0, x3 − 0.3) |
| BF13 | BF1 · max(0, 0.3 − x3) |
| BF14 | max(0, x9 − 0.78) |
| BF15 | BF10 · max(0, 0.769 − x1) |
| BF16 | BF5 · max(0, 0.563 − x10) |
| BF17 | BF5 · max(0, x8 − 0.922) |
| BF18 | BF2 · max(0, x6 − 0.509) |
| BF19 | max(0, 0.78 − x9) · max(0, 0.9 − x2) |
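As a hedged illustration of how Eq. (6) and Table 4 combine, the sketch below evaluates Model 1 for one vector of inputs already normalized to [0, 1]. The function names and the dictionary-based indexing are assumptions of this example; the knots and coefficients are transcribed from Table 4 and Eq. (6), with the coefficient signs as read from Eq. (6), and should be checked against the published equation before any practical use.

```python
def hinge(u):
    """Positive part: max(0, u)."""
    return max(0.0, u)


def model1_gsr(x):
    """Estimate hourly GSR (MJ/m^2) from normalized predictors.

    x maps the index i (1..10) to the normalized value of predictor x_i.
    """
    bf = {}
    bf[1] = hinge(x[2] - 0.1)
    bf[2] = hinge(0.1 - x[2])
    bf[3] = bf[1] * hinge(x[1] - 0.852)
    bf[4] = bf[1] * hinge(0.852 - x[1])
    bf[5] = hinge(x[1] - 0.504)
    bf[6] = hinge(0.504 - x[1])
    bf[7] = bf[5] * hinge(x[9] - 0.73)
    bf[8] = bf[2] * hinge(x[1] - 0.42)
    bf[9] = bf[2] * hinge(0.42 - x[1])
    bf[10] = hinge(0.875 - x[4])
    bf[11] = bf[5] * hinge(0.346 - x[5])
    bf[12] = bf[1] * hinge(x[3] - 0.3)
    bf[13] = bf[1] * hinge(0.3 - x[3])
    bf[14] = hinge(x[9] - 0.78)
    bf[15] = bf[10] * hinge(0.769 - x[1])
    bf[16] = bf[5] * hinge(0.563 - x[10])
    bf[17] = bf[5] * hinge(x[8] - 0.922)
    bf[18] = bf[2] * hinge(x[6] - 0.509)
    bf[19] = hinge(0.78 - x[9]) * hinge(0.9 - x[2])
    # Coefficients of Eq. (6): intercept first, then BF1..BF19.
    coef = [0.737, 1.26, -2.98, 2.35, -1.6, 2.26, -1.2, -5.03, -5.28, 5.09,
            0.404, -2.0, 0.27, -0.893, -0.59, -0.577, -1.44, 8.4, 4.11, 0.574]
    return coef[0] + sum(coef[m] * bf[m] for m in range(1, 20))


# Example input: every predictor at mid-range (illustrative values only).
sample = {i: 0.5 for i in range(1, 11)}
print(model1_gsr(sample))
```

Because each product BF reuses a parent BF, an interaction term is active only where its parent is non-zero, which keeps the second-order terms local to part of the input domain.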


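Table 6 ANOVA decomposition of the MARS model.

| ANOVA function | GCV | R² | BFs | Variable(s) |
|---|---|---|---|---|
| 1 | 0.1022 | 0.8859 | 2 | x1 |
| 2 | 0.1040 | 0.8839 | 2 | x2 |
| 3 | 0.0735 | 0.9180 | 1 | x4 |
| 4 | 0.0712 | 0.9205 | 1 | x9 |
| 5 | 0.0935 | 0.8956 | 4 | x1, x2 |
| 6 | 0.0715 | 0.9201 | 1 | x1, x4 |
| 7 | 0.0730 | 0.9185 | 1 | x1, x5 |
| 8 | 0.0710 | 0.9207 | 1 | x1, x8 |
| 9 | 0.0735 | 0.9179 | 1 | x1, x9 |
| 10 | 0.0716 | 0.9200 | 1 | x1, x10 |
| 11 | 0.0730 | 0.9185 | 2 | x2, x3 |
| 12 | 0.0714 | 0.9203 | 1 | x2, x6 |
| 13 | 0.0711 | 0.9206 | 1 | x2, x9 |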
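The GCV values used throughout this section follow Eq. (4). A minimal Python sketch of that formula is given below; the function name and the input check are assumptions of this example, and the effective number of parameters enp is taken as a given input rather than derived, since its calculation is left to Ref. [38].

```python
def gcv(mse_train, n_obs, enp):
    """Generalized Cross-Validation score, Eq. (4).

    mse_train: mean squared error on the training data
    n_obs:     number of observations N
    enp:       effective number of parameters (supplied by the caller)
    """
    if not 0 <= enp < n_obs:
        raise ValueError("enp must satisfy 0 <= enp < n_obs")
    return mse_train / (1.0 - enp / n_obs) ** 2


# The penalty grows as enp approaches the number of observations.
for enp in (0, 20, 200, 2000):
    print(enp, gcv(0.07, 20000, enp))
```

With about 20000 training records and a few tens of effective parameters, the denominator stays close to 1, so the GCV is only slightly larger than the training MSE.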

4. Results

This section first gives the mathematical expressions and performance evaluations of the proposed MARS models for predicting G. Logistic regression and ANN techniques were then used to build models for comparison, along with two existing empirical models. The models were tested against 2 years of measured data, and statistical indices for evaluating their performance were calculated.
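All models in this section take the extraterrestrial radiation G0 of Eq. (5) as an input. As a minimal sketch of that calculation, the Python function below evaluates Eq. (5) for a given solar altitude and day of year. The function and argument names are illustrative, and the cosine argument is assumed to be in degrees, as the factor 360/365 suggests.

```python
import math

E_SC = 1367.0  # solar constant in W/m^2, as given in the text


def extraterrestrial_hourly(altitude_deg, day_of_year):
    """Hourly horizontal extraterrestrial radiation G0 in MJ/m^2, Eq. (5)."""
    # Eccentricity correction; the angle is converted from degrees.
    angle_deg = 360.0 * (day_of_year - 3) / 365.0
    ecc = 1.0 + 0.033 * math.cos(math.radians(angle_deg))
    # W/m^2 over 3600 s gives J/m^2; dividing by 1e6 converts to MJ/m^2.
    return math.sin(math.radians(altitude_deg)) * 3600.0 / 1e6 * E_SC * ecc


# Example: solar altitude of 60 degrees on day 172 of the year.
print(extraterrestrial_hourly(60.0, 172))
```

Since the altitude is the only hour-dependent term, the same function covers both the daily and the seasonal variation mentioned for G0.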

4.1. MARS models

Since G0 can be calculated at a specific time using Eq. (5), G0 was used as an input variable for all the proposed models. In addition to G0, all combinations of the other four variables (Sh, Rh, Cld and Spd) were considered. In total, 16 MARS models were proposed, with 16 different combinations of these 5 predictors as inputs for estimating hourly GSR. The number of BFs in each model was selected in the same way as for Model 1 (i.e. 10-fold cross-validation). The performance of the developed MARS models (Model 2 to Model 17) was assessed in terms of R² between measured and estimated values. The mathematical formula of each model and the corresponding BFs can be found in Appendix A. The predictors, performance and number of BFs of each model are summarized in Table 7, where the best model for each number of predictors is marked with an asterisk. The R² values vary with the number of parameters and the complexity of the models, from a minimum of 0.402 (Model 2) to a maximum of 0.918 (Model 17). Without any climate parameters, G0 alone can be used to estimate G with an acceptable accuracy. The R² increases significantly when Sh, Cld or Rh is included, while there is only a slight increase when Spd is added. The model with only G0 and Sh reaches an R² of 0.896, the best among models with two predictors. Increasing the number of predictors to 3, 4 and 5 improves the R² further. Model 7 performs the best among the models with 3 predictors, with an R² of 0.912, while Model 12, in which Rh and Spd are added, performs the worst with an R² of 0.721. Models with Sh as a predictor perform better than models without it. Model 13 has the same R² as Model 15, which indicates that Cld and Spd make the same contribution when either is added to Model 7. Model 16 takes Cld, Rh and Spd into consideration rather than Sh; it still offers widespread applicability and relatively good performance, with an R² of 0.831. Model 17 has 5 predictors, but its performance is not significantly better than that of Models 13 and 15.

Table 7 Proposed MARS models and their performance in training datasets.

| Group | Model | Predictors | BFs | R² |
|---|---|---|---|---|
| 1 predictor | Model 2* | G0 | 5 | 0.402 |
| 2 predictors | Model 3* | G0, Sh | 7 | 0.896 |
| | Model 4 | G0, Cld | 7 | 0.768 |
| | Model 5 | G0, Rh | 7 | 0.707 |
| | Model 6 | G0, Spd | 7 | 0.443 |
| 3 predictors | Model 7* | G0, Sh, Rh | 10 | 0.912 |
| | Model 8 | G0, Sh, Cld | 10 | 0.903 |
| | Model 9 | G0, Sh, Spd | 10 | 0.902 |
| | Model 10 | G0, Cld, Rh | 10 | 0.824 |
| | Model 11 | G0, Cld, Spd | 10 | 0.783 |
| | Model 12 | G0, Rh, Spd | 10 | 0.721 |
| 4 predictors | Model 13* | G0, Sh, Cld, Rh | 15 | 0.916 |
| | Model 14 | G0, Sh, Cld, Spd | 15 | 0.908 |
| | Model 15 | G0, Sh, Spd, Rh | 15 | 0.916 |
| | Model 16 | G0, Cld, Rh, Spd | 15 | 0.831 |
| 5 predictors | Model 17* | G0, Sh, Cld, Rh, Spd | 15 | 0.918 |

* Best model in each group.

To further evaluate the reliability of the proposed MARS models, hourly datasets from 2015 to 2016 were used to test them. Models 2 to 17 were used to predict hourly GSR. The performance indices, including MAE, MBE, RMSE and RMSRE, were calculated and are summarized in Table 8. The equations of these indices can be found in Appendix B. The best-performing model in each group is marked with an asterisk. Generally, the performance of the MARS models improves as the number of predictors and the complexity of the expressions increase. The MAEs range from 0.198 to 0.594 MJ/m², which are relatively small. The MBEs are mainly small negative values, except for Model 2, indicating that most models slightly underestimate the hourly GSR. The RMSREs range from 22.4% to 60.7%. When Sh is included, the RMSREs do not exceed 25.3%. Model 17, which uses five predictors, has the highest R² and the lowest MAE and RMSE, but its complexity is also the highest.
Models 13 and 15 provide good performance with relatively low complexity. Sh is a widely available climatic variable measured at many stations, so it can be the most commonly used parameter for estimating G. Cld seems to be the second-best predictor. Spd may contribute the least, but together with Cld and Rh it still yields a model with fairly good performance. Models 3, 7, 13 and 17 are highly recommended for estimating hourly GSR. The findings help users to select the appropriate model according to the availability of measured parameters.

4.2. Model comparisons

Note that the variance of the estimated hourly GSR can be high because of different realizations of the data. Other possible approaches, including ANN and logistic regression techniques, were adopted to build new models for comparative analysis. For the model evaluation, both the prediction accuracy and the interpretability of the models were considered.

4.2.1. ANN models

Recently, applications of ANN in GSR prediction have grown significantly. Although an ANN model is less interpretable, it can be a flexible tool [45]. In this study, 3-layer feedforward ANN models were developed. The input variables were identical to those of the best MARS models of each size (Models 2, 3, 7, 13 and 17), and the target output was G. In total, five ANN models were developed. The accuracy of the ANN models is sensitive to the architecture of the neural networks. To ensure a fair comparison, the number of neurons in the hidden layer of each ANN model was set equal to the number of BFs in the corresponding MARS model, so that similar complexity could be achieved. Logistic sigmoid transfer function was used in the developed models to


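Table 8 Summary of the performance of MARS models in testing datasets.

| Group | Model | MAE (MJ/m2) | MBE (MJ/m2) | RMSE (MJ/m2) | RMSRE (%) | R2 |
|---|---|---|---|---|---|---|
| 1 predictor | Model 2 | 0.594 | +0.022 | 0.741 | 60.7 | 0.374 |
| 2 predictors | Model 3 | 0.229 | -0.022 | 0.308 | 25.3 | 0.892 |
| 2 predictors | Model 4 | 0.319 | -0.021 | 0.446 | 36.5 | 0.774 |
| 2 predictors | Model 5 | 0.399 | -0.002 | 0.525 | 42.3 | 0.686 |
| 2 predictors | Model 6 | 0.564 | -0.003 | 0.708 | 58.0 | 0.429 |
| 3 predictors | Model 7 | 0.210 | -0.021 | 0.282 | 23.1 | 0.909 |
| 3 predictors | Model 8 | 0.218 | -0.026 | 0.296 | 24.2 | 0.901 |
| 3 predictors | Model 9 | 0.220 | -0.020 | 0.299 | 24.5 | 0.898 |
| 3 predictors | Model 10 | 0.283 | -0.012 | 0.400 | 32.7 | 0.818 |
| 3 predictors | Model 11 | 0.312 | -0.029 | 0.435 | 35.6 | 0.785 |
| 3 predictors | Model 12 | 0.388 | -0.011 | 0.513 | 42.0 | 0.700 |
| 4 predictors | Model 13 | 0.201 | -0.021 | 0.276 | 22.6 | 0.913 |
| 4 predictors | Model 14 | 0.210 | -0.026 | 0.289 | 23.7 | 0.905 |
| 4 predictors | Model 15 | 0.202 | -0.024 | 0.276 | 22.6 | 0.913 |
| 4 predictors | Model 16 | 0.280 | -0.021 | 0.396 | 32.4 | 0.822 |
| 5 predictors | Model 17 | 0.198 | -0.021 | 0.274 | 22.4 | 0.915 |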

capture the nonlinear features of the inputs. The gradient descent with momentum (GDM) backpropagation training algorithm, which is the most widely used optimization technique for training neural networks, was used for ANN training. Again, the same datasets were used for the ANN development. All the variables were normalized between 0 and 1 before training. A series of scripts written in MATLAB was used to implement the above processes. Table 9 reports the characteristics of the developed ANN models and their performance.

Table 9 Configurations and performance of the ANN models.

| Model No. | Inputs | No. of hidden neurons | R2 |
|---|---|---|---|
| ANN-1 | G0 | 5 | 0.400 |
| ANN-2 | G0, Sh | 7 | 0.902 |
| ANN-3 | G0, Sh, Rh | 10 | 0.915 |
| ANN-4 | G0, Sh, Cld, Rh | 15 | 0.918 |
| ANN-5 | G0, Sh, Cld, Rh, Spd | 15 | 0.921 |

4.2.2. Logistic models

Logistic regression is a classical statistical method which can be seen as a special case of the generalized linear model. It can capture the non-linear relationship between the variables and outputs by using a sigmoid curve. Logistic regression models have a wide range of applications in sustainability and energy research, such as modelling CO2 emissions [46]. Five logistic models were built to estimate the hourly GSR. The five models share the same form, which is the product of G0 and a sigmoid function. The models can be expressed as below:

$$G = G_0 \cdot \frac{1}{1 + e^{f(X)}} \tag{7}$$

where f(X) is a polynomial function of the other meteorological parameters, and X can be a combination of several variables or a constant term. Function f(X) has the following form:

$$f(X) = \sum_i a_i x_i + a_0 \tag{8}$$

where $a_i$ are the coefficients, $a_0$ is a constant term and $x_i$ are the input variables. Table 10 presents the regression results of the five logistic models. The R2 values range from 0.398 to 0.891 and are lower than those of the corresponding ANN models.

Table 10 Coefficients and R2s of the logistic models.

| Model No. | Inputs | a0 | a1 | a2 | a3 | a4 | R2 |
|---|---|---|---|---|---|---|---|
| Logistic-1 | G0 | 0.314 | | | | | 0.398 |
| Logistic-2 | G0, Sh | 1.346 | 2.052 | | | | 0.876 |
| Logistic-3 | G0, Sh, Rh | 0.089 | 1.780 | 1.508 | | | 0.888 |
| Logistic-4 | G0, Sh, Cld, Rh | 0.121 | 1.808 | 0.009 | 1.553 | | 0.888 |
| Logistic-5 | G0, Sh, Cld, Rh, Spd | 0.265 | 1.784 | 0.008 | 1.541 | 0.052 | 0.891 |

4.2.3. Zhang's model

Zhang [20] developed an empirical model to estimate the hourly GSR for the 57 locations of the Chinese typical meteorological database. The model relates hourly GSR to solar altitude, the temperature change over the previous hours, the amount of cloud cover and relative humidity. The mathematical model is as follows:

$$G = \frac{G_0 \left[ c_0 + c_1 \frac{Cld}{8} + c_2 \left( \frac{Cld}{8} \right)^2 + c_3 \left( T_n - T_{n-3} \right) + c_4\, Rh \right] - c_5}{c_6} \tag{9}$$

where $c_0, \ldots, c_6$ are regression coefficients and $T_n - T_{n-3}$ is the temperature difference from the previous 3 h. New coefficients $c_0, \ldots, c_6$ were obtained and are summarized in Table 11.

Table 11 Coefficients and R2 of Zhang's models.

| Model | Inputs | c0 | c1 | c2 | c3 | c4 | c5 | c6 | R2 |
|---|---|---|---|---|---|---|---|---|---|
| Zhang | G0, Cld, Rh, $T_n - T_{n-3}$ | 0.953 | 0.708 | 1.091 | 0.049 | 0.509 | 0.089 | 1.182 | 0.825 |

4.2.4. Cloud-cover radiation model

The Cloud-Cover Radiation Model (CRM) was first proposed by Kasten and Czeplak [47]. The original coefficients of the model were validated against UK data and the results showed good performance. The CRM was further modified by Gul et al. [48] and Muneer and Gul [49] to improve the performance by replacing the four constant coefficients with four location-dependent coefficients, as shown in Eqs. (10) and (11):

$$G_C = A \sin \alpha - B \tag{10}$$

$$G = G_C \left[ 1 - C \left( \frac{Cld}{8} \right)^{D} \right] \tag{11}$$

where A, B, C and D are four coefficients and $\alpha$ is the solar altitude. In this study, the coefficients in the CRM were recalibrated using the 5 years of hourly data measured in Hong Kong. The values of A, B, C and D are 3.930, 0.411, 0.915 and 3.322, respectively, and the R2 is 0.765.
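As a rough illustration of the two closed-form comparison models, the sketch below evaluates the logistic form of Eqs. (7) and (8) and the CRM of Eqs. (10) and (11). The function names are hypothetical. The CRM defaults are the recalibrated values quoted above, under the assumption that the result is in MJ/m2 and that Cld is given in oktas; the logistic coefficients in the usage lines are made-up numbers chosen only to exercise the formula, not values from Table 10.

```python
import math


def logistic_gsr(g0, x, a, a0):
    """Eqs. (7)-(8): G = G0 / (1 + exp(f(X))), with f(X) = sum(a_i * x_i) + a0."""
    f = sum(a_i * x_i for a_i, x_i in zip(a, x)) + a0
    return g0 / (1.0 + math.exp(f))


def crm_gsr(alpha_deg, cld_oktas, A=3.930, B=0.411, C=0.915, D=3.322):
    """Eqs. (10)-(11): clear-sky term G_C scaled by a cloud-cover factor.

    alpha_deg is the solar altitude in degrees and cld_oktas the cloud
    amount in oktas (0 to 8). No clamping is applied, so very low solar
    altitudes can give a negative G_C.
    """
    g_clear = A * math.sin(math.radians(alpha_deg)) - B
    return g_clear * (1.0 - C * (cld_oktas / 8.0) ** D)


# Hypothetical logistic coefficients: f(X) = -2.0 * 0.5 + 1.0 = 0, so G = G0 / 2.
print(logistic_gsr(2.0, [0.5], [-2.0], 1.0))
# CRM with the sun at 30 degrees altitude under clear and overcast skies.
print(crm_gsr(30.0, 0.0), crm_gsr(30.0, 8.0))
```

Passing the coefficients as arguments keeps the functions usable with any recalibrated set for another location.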

4.2.5. Comparison results

The above ANN, logistic and empirical models were compared with the best five MARS models (Models 2, 3, 7, 13 and 17). Hourly solar radiation and meteorological datasets measured from 2015 to 2016 were used to test the models. The findings are summarized in Table 12. In general, the five ANN models perform the best in the estimation of G, followed by the MARS models. The five logistic models are not as good as the MARS and ANN models. However, in the group with 4 predictors, Logistic-4 is more accurate than Zhang's empirical model. Specifically, for the models with 1 predictor (G0), the three models have similar performance, with R2 values less than 0.38, indicating that it is necessary to add climatic parameters to improve the prediction accuracy. For the group with 2 predictors, scatter plots of measured and predicted hourly GSR by the four models are presented in Fig. 4. It can be seen that the CRM, which uses only solar altitude and cloud amount, is less accurate than Logistic-2 (R2 of 0.769 versus 0.873 in Table 12) and than the MARS and ANN models. For the model groups with 3, 4 and 5 predictors, the logistic models are less accurate than the MARS and ANN models, with RMSREs of about 26% and R2 values below 0.89. Fig. 5 shows the scatter plots of measured and predicted hourly GSR by the four models with 4 predictors. The scatter plots of Model 13 and ANN-4 are very similar, and their points lie closer to the measured data than those of Logistic-4 and Zhang's model. This indicates that MARS Model 13 has good prediction accuracy, almost equal to that of ANN-4. The differences in R2 between the five MARS models and the corresponding ANN models are less than 0.01, and the differences in RMSRE are less than 1%. Taken together, these results suggest that the accuracy of the MARS models is better than that of traditional regression techniques and can be very close to that of the ANN models.

It is important to consider metrics other than accuracy when two models exhibit similar performance. The interpretability of the MARS and ANN models was therefore evaluated. Because of its subjective nature, there is no clear existing definition or evaluation criterion for interpretability [50]. However, several aspects of interpretability can be considered. In our study, a heuristic approach was used. Firstly, the size of the model is one of the most used heuristics. The MARS model is a transparent model built from a combination of simple BFs (piecewise splines); its mathematical equation is simple and needs fewer parameters to describe. The ANN models do not have a clear mathematical expression; the structure contains various neurons in each layer, and pairs of weights and biases are needed to describe the model. The size of an ANN model is therefore larger than that of the corresponding MARS model with the same inputs. Secondly, three generic criteria proposed by Backhaus and Seiffert [51] were considered: “the ability of the model to select features from the input pattern, the ability to provide class-typical data points and information about the decision boundary directly encoded in model parameters”. ANN models are graded 2 out of 3, because they only meet the second and the third criteria, while MARS models satisfy all three criteria. So, ANN models are less interpretable than MARS models. Through the analysis of the mathematical expression of the MARS model, the decision boundaries can be identified from the knot location of each BF.
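The size argument can be made concrete with a rough parameter count. The sketch below is illustrative only and rests on assumptions that are not stated in the text: the ANN is taken to be fully connected with one bias per hidden neuron and per output neuron, and the MARS model is counted as one coefficient per BF plus an intercept, with knot locations left out. The function names are hypothetical; the inputs (4 predictors and 15 hidden neurons for ANN-4, 14 BFs for Model 13) are taken from Table 9 and Appendix A.

```python
def ann_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights and biases of a fully connected 3-layer feedforward network."""
    hidden_layer = n_hidden * (n_inputs + 1)    # input weights plus one bias each
    output_layer = n_outputs * (n_hidden + 1)   # hidden weights plus one bias each
    return hidden_layer + output_layer


def mars_coefficient_count(n_basis_functions):
    """One coefficient per BF plus the intercept; knots are not counted."""
    return n_basis_functions + 1


print(ann_parameter_count(4, 15))   # ANN-4: 91
print(mars_coefficient_count(14))   # Model 13: 15
```

Counting the knots as well would raise the MARS figure, so the comparison is only indicative of the difference in size.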
The calculated statistical indices showed that the accuracy of the proposed MARS models is better than that of traditional regression techniques and is very close to that of the ANN models for all model sizes. In addition, the selected MARS models have good interpretability and are more convenient to apply than the ANN models. The size of the proposed MARS models was smaller than that of the corresponding ANN models. With the same inputs and similar complexity, the MARS models can give a good approximation of the nonlinear mapping without deteriorating the accuracy of

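Table 12 Comparison results of ANN, logistic, empirical models and the five best MARS models.

| Group | Model | Input parameters | MAE (MJ/m2) | MBE (MJ/m2) | RMSE (MJ/m2) | RMSRE (%) | R2 |
|---|---|---|---|---|---|---|---|
| 1 predictor | Model 2 | G0 | 0.594 | +0.022 | 0.741 | 60.7 | 0.374 |
| 1 predictor | ANN-1 | G0 | 0.590 | +0.023 | 0.740 | 60.7 | 0.378 |
| 1 predictor | Logistic-1 | G0 | 0.600 | +0.055 | 0.745 | 61.0 | 0.376 |
| 2 predictors | Model 3 | G0, Sh | 0.229 | -0.022 | 0.308 | 25.3 | 0.892 |
| 2 predictors | ANN-2 | G0, Sh | 0.221 | -0.022 | 0.301 | 24.7 | 0.898 |
| 2 predictors | Logistic-2 | G0, Sh | 0.252 | +0.002 | 0.338 | 27.7 | 0.873 |
| 2 predictors | CRM | α, Cld | 0.319 | -0.021 | 0.451 | 36.9 | 0.769 |
| 3 predictors | Model 7 | G0, Sh, Rh | 0.210 | -0.021 | 0.282 | 23.1 | 0.909 |
| 3 predictors | ANN-3 | G0, Sh, Rh | 0.203 | -0.023 | 0.279 | 22.8 | 0.913 |
| 3 predictors | Logistic-3 | G0, Sh, Rh | 0.236 | +0.002 | 0.317 | 26.0 | 0.889 |
| 4 predictors | Model 13 | G0, Sh, Cld, Rh | 0.201 | -0.021 | 0.276 | 22.6 | 0.913 |
| 4 predictors | ANN-4 | G0, Sh, Cld, Rh | 0.197 | -0.024 | 0.275 | 22.5 | 0.915 |
| 4 predictors | Logistic-4 | G0, Sh, Cld, Rh | 0.236 | -0.001 | 0.317 | 26.0 | 0.889 |
| 4 predictors | Zhang | G0, Cld, Rh, $T_n - T_{n-3}$ | 0.281 | -0.014 | 0.331 | 27.08 | 0.828 |
| 5 predictors | Model 17 | G0, Sh, Cld, Rh, Spd | 0.198 | -0.021 | 0.274 | 22.4 | 0.915 |
| 5 predictors | ANN-5 | G0, Sh, Cld, Rh, Spd | 0.194 | -0.023 | 0.270 | 22.1 | 0.918 |
| 5 predictors | Logistic-5 | G0, Sh, Cld, Rh, Spd | 0.236 | -0.008 | 0.318 | 26.0 | 0.888 |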


Fig. 4. Plots of measured and predicted hourly GSR by: (a) MARS Model 3, (b) ANN-2, (c) Logistic-2 and (d) CRM.

Fig. 5. Plots of measured and predicted hourly GSR by: (a) MARS Model 13, (b) ANN-4, (c) Logistic-4 and (d) Zhang's model.

estimation. The MARS model can automatically select important variables and discard redundant ones in two phases, which avoids underfitting and prevents overfitting. In addition, through the analysis of the mathematical expression of the MARS model, we can identify the additive contributions and those associated with different multivariable interactions. To sum up, the proposed MARS models showed better behavior than the other models presented. The proposed 16 MARS models can be very useful for estimating the hourly GSR at locations with different available measured meteorological parameters.

5. Conclusions

This study proposes an applicable and flexible approach for estimating the hourly GSR on a horizontal surface, especially for locations with limited measured meteorological parameters. Sensitivity analysis of individual parameters was conducted using MARS techniques. The five most important climatic variables were selected and their different combinations were used to develop various MARS models. Hourly meteorological and GSR data of Hong Kong between 2010 and 2016 were used for the training and testing of the proposed models, and their performance was tested by several indices. The performance of the proposed models depends on the number of predictors and the complexity of the mathematical expressions. New models built with ANN and logistic regression techniques, together with two existing empirical models, were compared with the corresponding MARS models. Results showed that the proposed MARS models perform well in terms of accuracy and interpretability. The flexibility and simplicity of the proposed MARS models make them more applicable than the other models. In future work, more long-term measured data will be organized and analyzed. Besides, measured data from various locations will also be evaluated in order to develop common models which can be applied in various locations. Nonetheless, the MARS models developed in this study would be helpful for the estimation of hourly GSR when fewer meteorological data are available. The findings are important for solar energy systems and energy-efficient building designs.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Work described was fully supported by a Strategic Research Grant from the City University of Hong Kong (Project no. 7005036). Wenqiang Chen was supported by a City University of Hong Kong Postgraduate Studentship.

Appendix A

The mathematical expressions of the proposed MARS models (M2 to M17) and the corresponding BFs are given below. All inputs should be normalized between 0 and 1.

M2.

BF1 = max(0, x1 - 0.96)
BF2 = max(0, 0.954 - x1)
BF3 = max(0, x1 - 0.963)
BF4 = max(0, 0.989 - x1)

y = 4.54 - 328*BF1 + 71.7*BF2 + 280*BF3 - 73.8*BF4, where x1 is G0.

M3.

BF1 = max(0, x2 - 0.1)
BF2 = max(0, 0.1 - x2)
BF3 = BF1*max(0, 0.852 - x1)
BF4 = max(0, x1 - 0.504)
BF5 = max(0, 0.504 - x1)
BF6 = BF2*max(0, x1 - 0.682)

y = 0.633 + 1.83*BF1 - 2.76*BF2 - 2.52*BF3 + 1.79*BF4 - 0.624*BF5 - 17.1*BF6, where x1 is G0, x2 is Sh.

M4.

BF1 = max(0, 0.337 - x1)
BF2 = max(0, x1 - 0.337)*max(0, x2 - 0.75)
BF3 = max(0, x1 - 0.337)*max(0, 0.75 - x2)
BF4 = max(0, x2 - 0.625)
BF5 = max(0, x1 - 0.337)*max(0, x2 - 0.375)
BF6 = max(0, x1 - 0.337)*max(0, 0.375 - x2)

y = 0.805 - 1.65*BF1 - 17.7*BF2 + 10.5*BF3 - 1.56*BF4 + 7.42*BF5 - 9.52*BF6, where x1 is G0, x2 is Cld.

M5.

BF1 = max(0, x1 - 0.337)
BF2 = max(0, 0.337 - x1)
BF3 = BF1*max(0, x2 - 0.69)
BF4 = max(0, 0.88 - x2)*max(0, 0.964 - x1)
BF5 = max(0, 0.88 - x2)*max(0, x1 - 0.958)
BF6 = max(0, 0.88 - x2)*max(0, 0.958 - x1)

y = 0.472 + 3.15*BF1 - 1.38*BF2 - 12.2*BF3 + 257*BF4 + 71.7*BF5 - 258*BF6, where x1 is G0, x2 is Rh.


M6.

BF1 = max(0, x1 - 0.337)
BF2 = BF1*max(0, 0.394 - x2)
BF3 = max(0, x2 - 0.115)
BF4 = max(0, 0.648 - x1)
BF5 = BF3*max(0, 0.961 - x1)
BF6 = BF3*max(0, 0.953 - x1)

y = 1.08 + 3.75*BF1 - 10.3*BF2 - 3.26*BF3 - 1.51*BF4 - 53.1*BF5 + 57.7*BF6, where x1 is G0, x2 is Spd.

M7.

BF1 = max(0, x2 - 0.1)*max(0, x1 - 0.852)
BF2 = max(0, x2 - 0.1)*max(0, 0.852 - x1)
BF3 = max(0, x1 - 0.504)
BF4 = max(0, 0.504 - x1)
BF5 = BF3*max(0, x3 - 0.73)
BF6 = max(0, 0.1 - x2)*max(0, x1 - 0.419)
BF7 = max(0, 0.94 - x3)*max(0, x2 - 0.9)
BF8 = max(0, x2 - 0.2)
BF9 = max(0, 0.2 - x2)

y = 0.893 + 2.07*BF1 - 1.9*BF2 + 2.11*BF3 - 0.956*BF4 - 6.95*BF5 - 5.37*BF6 + 4.43*BF7 + 1.35*BF8 - 2.09*BF9, where x1 is G0, x2 is Sh, x3 is Rh.

M8.

BF1 = max(0, x2 - 0.1)*max(0, x1 - 0.852)
BF2 = max(0, x2 - 0.1)*max(0, 0.852 - x1)
BF3 = max(0, x1 - 0.504)
BF4 = max(0, 0.504 - x1)
BF5 = max(0, 0.1 - x2)*max(0, x1 - 0.682)
BF6 = BF3*max(0, x3 - 0.5)
BF7 = max(0, x2 - 0.9)
BF8 = max(0, 0.9 - x2)
BF9 = max(0, 0.1 - x2)*max(0, x3 - 0.75)

y = 1.91 + 2.4*BF1 - 2.05*BF2 + 2.11*BF3 - 0.919*BF4 - 12.8*BF5 - 1.76*BF6 + 2.27*BF7 - 1.48*BF8 - 13.4*BF9, where x1 is G0, x2 is Sh, x3 is Cld.

M9.

BF1 = max(0, x2 - 0.1)
BF2 = max(0, 0.1 - x2)
BF3 = BF1*max(0, 0.852 - x1)
BF4 = BF1*max(0, 0.504 - x1)
BF5 = max(0, x1 - 0.504)
BF6 = max(0, 0.504 - x1)
BF7 = BF2*max(0, x1 - 0.682)
BF8 = BF2*max(0, 0.682 - x1)
BF9 = BF5*max(0, 0.356 - x3)

y = 0.715 + 1.65*BF1 - 4.27*BF2 + 2.52*BF3 - 2.12*BF4 + 1.83*BF5 - 1.13*BF6 - 10.5*BF7 + 5.56*BF8 - 1.97*BF9, where x1 is G0, x2 is Sh, x3 is Spd.

M10.

BF1 = max(0, x1 - 0.337)
BF2 = max(0, 0.337 - x1)
BF3 = BF1*max(0, x3 - 0.71)
BF4 = max(0, x2 - 0.625)
BF5 = BF1*max(0, 0.375 - x2)
BF6 = BF4*max(0, 0.819 - x1)
BF7 = max(0, 0.652 - x2)*max(0, x1 - 0.214)
BF8 = max(0, x2 - 0.75)
BF9 = max(0, x2 - 0.875)

y = 0.816 + 3.38*BF1 - 2.14*BF2 - 9.03*BF3 - 2.67*BF4 - 2.05*BF5 + 3.89*BF6 + 1.99*BF7 - 1.92*BF8 + 2.08*BF9, where x1 is G0, x2 is Cld, x3 is Rh.


M11.

BF1 = max(0, x1 - 0.337)
BF2 = max(0, 0.337 - x1)
BF3 = BF1*max(0, 0.75 - x2)
BF4 = max(0, x2 - 0.5)
BF5 = BF1*max(0, x3 - 0.365)
BF6 = BF1*max(0, 0.369 - x1)
BF7 = BF4*max(0, 0.369 - x1)
BF8 = BF1*max(0, x2 - 0.875)
BF9 = BF1*max(0, 0.875 - x2)

y = 0.953 + 2.02*BF1 - 2.71*BF2 - 9.73*BF3 - 1.63*BF4 - 2.69*BF5 - 2.9*BF6 + 4.54*BF7 - 7.4*BF8 + 10.9*BF9, where x1 is G0, x2 is Cld, x3 is Spd.

M12.

BF1 = max(0, x1 - 0.337)
BF2 = max(0, 0.337 - x1)
BF3 = BF1*max(0, x2 - 0.68)
BF4 = BF1*max(0, 0.365 - x3)
BF5 = max(0, 0.89 - x2)*max(0, x1 - 0.935)
BF6 = max(0, 0.89 - x2)*max(0, 0.935 - x1)
BF7 = max(0, x3 - 0.317)*max(0, x2 - 0.66)
BF8 = max(0, 0.89 - x2)*max(0, x1 - 0.958)
BF9 = max(0, 0.89 - x2)*max(0, 0.965 - x1)

y = 0.506 + 3.26*BF1 - 1.5*BF2 - 10.3*BF3 - 2.88*BF4 + 85*BF5 - 57.1*BF6 - 5.41*BF7 - 74*BF8 + 55.9*BF9, where x1 is G0, x2 is Rh, x3 is Spd.

M13.

BF1 = max(0, x2 - 0.1)
BF2 = max(0, 0.1 - x2)
BF3 = BF1*max(0, x1 - 0.852)
BF4 = max(0, x1 - 0.504)
BF5 = max(0, 0.504 - x1)
BF6 = BF4*max(0, x4 - 0.73)
BF7 = BF2*max(0, x1 - 0.419)
BF8 = BF2*max(0, 0.419 - x1)
BF9 = max(0, 0.91 - x4)*max(0, x2 - 0.9)
BF10 = max(0, 0.91 - x4)*max(0, 0.9 - x2)
BF11 = max(0, x2 - 0.2)*max(0, x3 - 0.5)
BF12 = max(0, x2 - 0.2)*max(0, x1 - 0.861)
BF13 = max(0, x2 - 0.2)*max(0, 0.861 - x1)
BF14 = BF4*max(0, x3 - 0.25)

y = 0.67 + 1.55*BF1 - 2.83*BF2 + 12.2*BF3 + 2.16*BF4 - 1.37*BF5 - 5.34*BF6 - 4.06*BF7 + 7.83*BF8 + 2.38*BF9 + 0.664*BF10 - 0.611*BF11 - 11.9*BF12 - 1.98*BF13 - 0.461*BF14, where x1 is G0, x2 is Sh, x3 is Cld, x4 is Rh.

M14.

BF1 = max(0, x2 - 0.1)*max(0, x1 - 0.852)
BF2 = max(0, x2 - 0.1)*max(0, 0.852 - x1)
BF3 = BF1*max(0, x1 - 0.504)
BF4 = max(0, 0.504 - x1)
BF5 = max(0, 0.1 - x2)*max(0, x1 - 0.682)
BF6 = max(0, 0.1 - x2)*max(0, 0.682 - x1)
BF7 = max(0, 0.875 - x3)
BF8 = BF3*max(0, 0.356 - x4)
BF9 = BF7*max(0, 0.759 - x1)
BF10 = max(0, x4 - 0.221)
BF11 = max(0, x2 - 0.9)
BF12 = max(0, 0.9 - x2)
BF13 = BF10*max(0, 0.4 - x2)
BF14 = max(0, 0.1 - x2)*max(0, x3 - 0.75)

y = 1.79 + 2.98*BF1 - 1.78*BF2 + 1.89*BF3 - 1.04*BF4 - 13.4*BF5 + 3.17*BF6 + 0.443*BF7 - 1.94*BF8 - 0.724*BF9 + 0.241*BF10 + 1.49*BF11 - 1.37*BF12 - 1.22*BF13 - 15.9*BF14, where x1 is G0, x2 is Sh, x3 is Cld, x4 is Spd.

M15.

BF1 = max(0, x2 - 0.1)
BF2 = max(0, 0.1 - x2)
BF3 = BF1*max(0, x1 - 0.852)
BF4 = BF1*max(0, 0.852 - x1)
BF5 = max(0, 0.504 - x1)
BF6 = max(0, x1 - 0.504)*max(0, x3 - 0.73)
BF7 = BF2*max(0, x1 - 0.419)
BF8 = BF2*max(0, 0.419 - x1)
BF9 = max(0, x1 - 0.504)*max(0, x4 - 0.356)
BF10 = max(0, x1 - 0.504)*max(0, 0.356 - x4)
BF11 = max(0, 0.94 - x3)*max(0, x2 - 0.9)
BF12 = max(0, 0.94 - x3)*max(0, 0.9 - x2)
BF13 = max(0, x1 - 0.504)*max(0, x4 - 0.173)
BF14 = max(0, x1 - 0.504)*max(0, 0.173 - x4)

y = 0.667 + 1.47*BF1 - 2.96*BF2 + 2.21*BF3 - 1.8*BF4 - 1.2*BF5 - 5.46*BF6 - 4.58*BF7 + 5.82*BF8 - 13.6*BF9 + 9.57*BF10 + 3.79*BF11 + 0.649*BF12 + 12.4*BF13 - 9.82*BF14, where x1 is G0, x2 is Sh, x3 is Rh, x4 is Spd.


M16.

BF1 = max(0, 0.337 - x1)
BF2 = max(0, x1 - 0.337)*max(0, x2 - 0.75)
BF3 = max(0, x1 - 0.337)*max(0, x3 - 0.365)
BF4 = max(0, x2 - 0.625)*max(0, x1 - 0.374)
BF5 = max(0, x1 - 0.337)*max(0, 0.875 - x2)
BF6 = max(0, x1 - 0.337)*max(0, x2 - 0.375)
BF7 = max(0, x1 - 0.337)*max(0, 0.375 - x2)
BF8 = max(0, x1 - 0.511)*max(0, x4 - 0.72)
BF9 = max(0, x2 - 0.625)*max(0, x1 - 0.915)
BF10 = max(0, x2 - 0.625)*max(0, 0.915 - x1)
BF11 = max(0, x2 - 0.625)*max(0, 0.959 - x1)
BF12 = max(0, x2 - 0.625)*max(0, x1 - 0.951)
BF13 = max(0, x1 - 0.511)*max(0, x3 - 0.192)
BF14 = max(0, x4 - 0.7)

y = 0.972 - 2.65*BF1 - 5.61*BF2 - 5.51*BF3 + 10*BF4 + 6.55*BF5 + 4.31*BF6 - 5.68*BF7 - 11.7*BF8 - 147*BF9 + 146*BF10 - 139*BF11 + 127*BF12 + 5.55*BF13 - 0.733*BF14, where x1 is G0, x2 is Cld, x3 is Spd, x4 is Rh.

M17.

BF1 = max(0, x2 - 0.1)
BF2 = max(0, 0.1 - x2)
BF3 = BF1*max(0, x1 - 0.852)
BF4 = max(0, x1 - 0.504)
BF5 = max(0, 0.504 - x1)
BF6 = BF4*max(0, x5 - 0.73)
BF7 = BF2*max(0, x1 - 0.419)
BF8 = BF2*max(0, 0.419 - x1)
BF9 = BF4*max(0, x4 - 0.346)
BF10 = max(0, x5 - 0.78)
BF11 = max(0, x2 - 0.2)*max(0, x3 - 0.5)
BF12 = BF4*max(0, x4 - 0.173)
BF13 = max(0, x2 - 0.2)*max(0, 0.86 - x1)
BF14 = BF4*max(0, x3 - 0.25)

y = 0.766 + 1.47*BF1 - 2.59*BF2 + 1.89*BF3 + 2.03*BF4 - 1.37*BF5 - 5.04*BF6 - 5.06*BF7 + 7.73*BF8 - 4.32*BF9 - 0.799*BF10 - 0.705*BF11 + 3.06*BF12 - 1.91*BF13 - 0.514*BF14, where x1 is G0, x2 is Sh, x3 is Cld, x4 is Spd, x5 is Rh.
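To show how the expressions in this appendix are evaluated, the sketch below codes Model 3 (M3) as a sum of hinge functions. It is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the knots and coefficients are transcribed from the M3 listing with each basis function read as max(0, x - t) or max(0, t - x), and the inputs are assumed to be already normalized between 0 and 1. Since the normalization bounds are not given in this section, the `min_max` helper takes them as arguments, and the scale of the returned value is not asserted here.

```python
def hinge(value):
    """Hinge (truncated linear) function used by every MARS basis function."""
    return max(0.0, value)


def min_max(value, lower, upper):
    """Min-max normalization to [0, 1]; the bounds are supplied by the user."""
    return (value - lower) / (upper - lower)


def mars_model_3(x1, x2):
    """Evaluate M3, where x1 is normalized G0 and x2 is normalized Sh."""
    bf1 = hinge(x2 - 0.1)
    bf2 = hinge(0.1 - x2)
    bf3 = bf1 * hinge(0.852 - x1)   # interaction between Sh and G0
    bf4 = hinge(x1 - 0.504)
    bf5 = hinge(0.504 - x1)
    bf6 = bf2 * hinge(x1 - 0.682)   # interaction between Sh and G0
    return (0.633 + 1.83 * bf1 - 2.76 * bf2 - 2.52 * bf3
            + 1.79 * bf4 - 0.624 * bf5 - 17.1 * bf6)


# Hypothetical normalized inputs, for illustration only.
print(round(mars_model_3(0.6, 0.5), 3))  # about 1.283
```

The other models follow the same pattern: each BF is a hinge or a product of hinges, and y is a linear combination of the BFs, which is what makes the knot locations directly readable from the expression.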

Appendix B

The statistical indices used in the present study can be expressed as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_{i,m} - y_{i,c} \right)^2}{\sum_{i=1}^{n} \left( y_{i,m} - \bar{y}_m \right)^2}$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,m} - y_{i,c} \right)^2$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i,c} - y_{i,m} \right|$$

$$MBE = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,c} - y_{i,m} \right)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{i,m} - y_{i,c} \right)^2}$$

$$RMSRE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_{i,m} - y_{i,c}}{\bar{y}_m} \right)^2}$$

where $n$ is the number of observations, $y_{i,m}$ is the $i$th measured data, $y_{i,c}$ is the $i$th calculated data and $\bar{y}_m$ is the mean of measured data.

References

[1] Li DHW, Cheung KL, Lam TNT, Chan WWH. A study of grid-connected photovoltaic (PV) system in Hong Kong. Appl Energy 2012;90(1):122-7.
[2] Li DHW, Yang L, Lam JC. Zero energy buildings and sustainable development implications - a review. Energy 2013;54:1-10.
[3] Muneer T. Solar radiation and daylight models. second ed. Oxford: Elsevier; 2004.
[4] Zang H, Xu Q, Bian H. Generation of typical solar radiation data for different climates of China. Energy 2012;38(1):236-48.
[5] Khorasanizadeh H, Mohammadi K. Introducing the best model for predicting the monthly mean global solar radiation over six major cities of Iran. Energy 2013;51:257-66.
[6] Yao W, Li Z, Wang Y, Jiang F, Hu L. Evaluation of global solar radiation models for Shanghai, China. Energy Convers Manag 2014;84:597-612.
[7] Bayrakçı HC, Demircan C, Keçebaş A. The development of empirical models for estimating global solar radiation on horizontal surface: a case study. Renew Sustain Energy Rev 2018;81:2771-82.
[8] El Mghouchi Y, El Bouardi A, Choulli Z, Ajzoul T. Models for obtaining the daily direct, diffuse and global solar radiations. Renew Sustain Energy Rev 2016;56:87-99.
[9] Sabziparvar AA. A simple formula for estimating global solar radiation in central arid deserts of Iran. Renew Energy 2008;33(5):1002-10.
[10] Duzen H, Aydin H. Sunshine-based estimation of global solar radiation on horizontal surface at Lake Van region (Turkey). Energy Convers Manag 2012;58:35-46.
[11] Lou S, Li DHW, Lam JC. CIE Standard Sky classification by accessible climatic indices. Renew Energy 2017;113:347-56.
[12] Notton G, Paoli C, Vasileva S, Nivet ML, Canaletti J-L, Cristofari C. Estimation of hourly global solar irradiation on tilted planes from horizontal one using artificial neural networks. Energy 2012;39(1):166-79.
[13] Zhang J, Zhao L, Deng S, Xu W, Zhang Y. A critical review of the models used to estimate solar radiation. Renew Sustain Energy Rev 2017;70:314-29.
[14] Ahmad MJ, Tiwari GN. Solar radiation models-A review. Int J Energy Res 2011;35(4):271-90.
[15] Khatib T, Elmenreich W. A model for hourly solar radiation data generation from daily solar radiation data using a generalized regression artificial neural network. Int J Photoenergy 2015;2015:1-13.
[16] Li DHW, Lou S. Review of solar irradiance and daylight illuminance modeling and sky classification. Renew Energy 2018;126:445-53.
[17] Wu J, Chan CK. Prediction of hourly solar radiation using a novel hybrid model of ARMA and TDNN. Sol Energy 2011;85(5):808-17.
[18] Wan Nik WB, Ibrahim MZ, Samo KB, Muzathik AM. Monthly mean hourly global solar radiation estimation. Sol Energy 2012;86(1):379-87.
[19] Şen Z, Tan E. Simple models of solar radiation data for northwestern part of Turkey. Energy Convers Manag 2001;42:587-98.
[20] Zhang Q. Development of the typical meteorological database for Chinese locations. Energy Build 2006;38(11):1320-6.
[21] Jiang Y. Prediction of monthly mean daily diffuse solar radiation using artificial neural networks and comparison with other empirical models. Energy Policy 2008;36(10):3833-7.
[22] Li H, Ma W, Lian Y, Wang X, Zhao L. Global solar radiation estimation with sunshine duration in Tibet, China. Renew Energy 2011;36(11):3141-5.
[23] Paulescu E, Blaga R. Regression models for hourly diffuse solar radiation. Sol Energy 2016;125:111-24.
[24] Despotovic M, Nedic V, Despotovic D, Cvetanovic S. Review and statistical analysis of different global solar radiation sunshine models. Renew Sustain Energy Rev 2015;52:1869-80.
[25] Brinsfield R, Yaramanoglu M, Wheaton F. Ground level solar radiation prediction model including cloud cover effects. Sol Energy 1984;33(6):493-9.
[26] Yang D, Jirutitijaroen P, Walsh WM. Hourly solar irradiance time series forecasting using cloud cover index. Sol Energy 2012;86(12):3531-43.
[27] Kostić R, Mikulović J. The empirical models for estimating solar insolation in Serbia by using meteorological data on cloudiness. Renew Energy 2017;114:1281-93.
[28] Gandoman FH, Abdel Aleem SHE, Omar N, Ahmadi A, Alenezi FQ. Short-term solar power forecasting considering cloud coverage and ambient temperature variation effects. Renew Energy 2018;123:793-805.
[29] Halawa E, GhaffarianHoseini A, Li DHW. Empirical correlations as a means for estimating monthly average daily global radiation: a critical overview. Renew Energy 2014;72:149-53.
[30] Al-Mostafa ZA, Maghrabi AH, Al-Shehri SM. Sunshine-based global radiation models: a review and case study. Energy Convers Manag 2014;84:209-16.
[31] Chen J-L, Li G-S. Evaluation of support vector machine for estimation of solar radiation from measured meteorological variables. Theor Appl Climatol 2013;115(3-4):627-38.
[32] Celik AN, Muneer T. Neural network based method for conversion of solar radiation data. Energy Convers Manag 2013;67:117-24.
[33] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. second ed. Springer; 2009.
[34] Kaytez F, Taplamacioglu MC, Cam E, Hardalac F. Forecasting electricity consumption: a comparison of regression analysis, neural networks and least squares support vector machines. Int J Electr Power Energy Syst 2015;67:431-8.
[35] Meenal R, Selvakumar AI. Assessment of SVM, empirical and ANN based solar radiation prediction models with most influencing input parameters. Renew Energy 2018;121:324-43.
[36] Zou L, Wang L, Lin A, Zhu H, Peng Y, Zhao Z. Estimation of global solar radiation using an artificial neural network based on an interpolation technique in southeast China. J Atmos Sol Terr Phys 2016;146:110-22.
[37] Chen J-L, Liu H-B, Wu W, Xie D-T. Estimation of monthly solar radiation from measured temperatures using support vector machines - a case study. Renew Energy 2011;36(1):413-20.
[38] Friedman JH. Multivariate adaptive regression splines (with discussion). Ann Stat 1991;19(1):1-141.
[39] Keshtegar B, Mert C, Kisi O. Comparison of four heuristic regression techniques in solar radiation modeling: kriging method vs RSM, MARS and M5 model tree. Renew Sustain Energy Rev 2018;81:330-41.
[40] Li Y, He Y, Su Y, Shu L. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines. Appl Energy 2016;180:392-401.
[41] Zhang W, Lu L, Peng J. Evaluation of potential benefits of solar photovoltaic shadings in Hong Kong. Energy 2017;137:1152-8.
[42] CIE. Guide to recommended practice of daylight measurement. Vienna, Austria: Central Bureau of the CIE; 1994.
[43] Friedman JH. An introduction to multivariate adaptive regression splines. Stat Methods Med Res 1995;1995(4):197-217.
[44] Jekabsons G. ARESLab: adaptive regression splines toolbox for matlab/octave. available at: http://www.cs.rtu.lv/jekabsons/; 2016.
[45] Rao KDVSK, Premalatha M, Naveen C. Analysis of different combinations of meteorological parameters in predicting the horizontal global solar radiation with ANN approach: a case study. Renew Sustain Energy Rev 2018;91:248-58.
[46] Meng M, Niu D. Modeling CO2 emissions from fossil fuel combustion using the logistic equation. Energy 2011;36(5):3355-9.
[47] Kasten F, Czeplak G. Solar and terrestrial radiation dependent on the amount and type of cloud. Sol Energy 1980;24(2):177-89.
[48] Gul MS, Muneer T, Kambezidis HD. Models for obtaining solar radiation from other meteorological data. Sol Energy 1998;64(1-3):99-108.
[49] Muneer T, Gul MS. Evaluation of sunshine and cloud cover based models for generating solar radiation data. Energy Convers Manag 2000;41(5):461-82.
[50] Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th international conference on data science and advanced analytics (DSAA); 2018. p. 80-9.
[51] Backhaus A, Seiffert U. Classification in high-dimensional spectral data: accuracy vs. interpretability vs. model size. Neurocomputing 2014;131:15-22.