Assessment of linear emulators in lightweight Bayesian calibration of dynamic building energy models for parameter estimation and performance prediction

Energy and Buildings 124 (2016) 194–202
Qi Li*, Godfried Augenbroe, Jason Brown
School of Architecture, Georgia Institute of Technology, Atlanta, GA 30332, USA

Article info

Article history: Received 5 November 2015; Received in revised form 2 February 2016; Accepted 10 April 2016; Available online 27 April 2016.

Keywords: Bayesian calibration; Dynamic model; Multiple responses; Regression; Parameter estimation; Probabilistic prediction; Building energy; Retrofit analysis
Abstract

Calibration of building energy models is widely used in building energy audits and retrofit practice. Li et al. (2015) proposed a lightweight approach to the Bayesian calibration of dynamic building energy models, which alleviates the computational issues through the use of a linear regression emulator. As a further extension, this paper makes the following contributions. First, it provides a brief literature review that motivates the original work. Second, it explains the detailed calibration methodology and its mathematical formulas, as well as prediction using meta-models. Third, it introduces new performance metrics for evaluating predictive distributions under uncertainty. Fourth, it uses the standard Bayesian calibration method as the benchmark, assesses the capability of regression emulators of different complexity, and shows the comparison results in a case study. Compared to the standard Gaussian process emulator, the linear regression emulator including main and interaction effects is much simpler in both interpretation and implementation, calibrations are performed much more quickly, and the calibration performance is similar. This indicates a capability to perform fast, risk-conscious calibration for most current retrofit practice, where only monthly consumption and demand data from utility bills are available. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

Recent decades have seen increasing attention to the pursuit of energy-efficient buildings, which has benefited significantly from the use of building energy modeling in design, operational management, life-cycle assessment, and retrofit analyses. Compared to other types of models, dynamic building energy models are engineering based and can provide the most detailed prediction of building performance. This makes them well suited for evaluating energy conservation measures (ECMs) and for retrofit decision making. However, considerable discrepancies exist between model predictions and field observations of building energy use in actual practice. Main causes include uncertainties in manufacturing, construction, and actual building operation. Assumptions, simplifications, and approximations in modeling and simulation, also known as model inadequacy, are among the other main causes. In addition to improving model quality through detailed building audits, short-term testing, and energy monitoring, calibration of building energy models alleviates these discrepancies by adjusting model parameters through comparison between predictions and observations, such that the model outputs are close enough to reality while the model parameters remain realistic. In retrofit analyses, the calibrated models are then used to predict the potential energy savings of ECMs, providing a solid basis for measure selection so that the expected benefits can be realized. Li et al. [1] proposed a lightweight Bayesian calibration method that performs parameter estimation and performance prediction within a stochastic framework, and systematically handles multiple types of field observations to improve the results. Following a brief literature review, this paper explains the proposed calibration and prediction methods with detailed mathematical formulas. A case study then demonstrates the method, together with a thorough comparison of results from different linear regression emulators against the standard Bayesian calibration method under a variety of accuracy and efficiency metrics. Discussion and conclusions are provided at the end.

2. Literature review

∗ Corresponding author. E-mail address: [email protected] (Q. Li). http://dx.doi.org/10.1016/j.enbuild.2016.04.025 0378-7788/© 2016 Elsevier B.V. All rights reserved.

Current calibration methods and procedures in common use are summarized in Refs. [2,3]. In general, approaches to tuning


dynamic models to field observations can be categorized into two groups. Manual calibration approaches mostly rely on iterative interventions by the modelers. The iteration usually requires special expertise and experience, and can be facilitated by several analytical tools, such as graphical comparisons [4] and signature analysis methods [5]. Some model simplification techniques are also commonly used in manual calibration practice [6–8]. However, manual approaches have drawbacks. First, the special expertise and experience required can only be acquired through real practice. In addition, in order to obtain a close match between model output and observation, these methods become time-consuming and labor-intensive due to the many manual iterations in choosing and adjusting parameters. Automated calibration approaches, on the other hand, can overcome these drawbacks through the use of mathematical and statistical techniques. Their general principle is to find the optimal parameter values such that discrepancies between model predictions and measured data are minimized. This usually includes several elements. First, an objective function needs to be specified to penalize the discrepancy; the most common choice is the coefficient of variation of the root mean square error (CV-RMSE) from ASHRAE Guideline 14 [9]. The objective function can also include terms that penalize unreasonable parameter values despite their capability of reducing the discrepancy [10]. Second, exploration of the parameter space is performed to find the optimal values, either by a parametric study in which the ranges of parameter values are specified based on experience [11], or by uncertainty analyses in which the possible parameter values are obtained through rigorous uncertainty quantification [12,13]. The computational cost of simulation can be reduced by fitting a statistical emulator to replace the physical model.
Commonly used statistical emulators include Gaussian processes [14], support vector regression with Gaussian kernels [15], and artificial neural networks [16,17]. However, most common automated calibration approaches, by providing only one or a small number of parameter values, cannot fully consider the uncertainties of model parameters and their propagation into predictions in a systematic way, which prevents them from considering the potential risks of under-performing ECMs. Bayesian inference has been applied widely in many scientific and engineering domains, and Bayesian calibration, as proposed by Kennedy and O'Hagan [18], combines information from different sources into an estimation of model parameters using Bayesian inference. It begins with the modelers' knowledge and experience as prior beliefs about the probability distributions of model parameters, and then maps them into a probability distribution of model output through building simulation. Given these output distributions, field observations are used via Bayes' rule to update the prior beliefs. This provides posterior estimates of the parameter probability distributions, such that under these posterior estimates the behavior of the building energy model is more closely aligned with reality. In addition, by introducing an additional mathematical term to account for the remaining discrepancies, Bayesian calibration explicitly considers the impact of model inadequacy and thus reduces the risk of over-fitting. All these features make the current Bayesian calibration technique a competitive method in calibration practice. The application of the standard Bayesian calibration method in the building simulation domain was first seen in Ref. [19], which calibrated a reduced-order energy model that uses a quasi-steady-state formulation of heat balance equations and aggregated building parameters. It employed Gaussian process emulators for both the physical model and its model inadequacy.
Prior distributions of model parameters came from uncertainty quantification techniques, and important parameters chosen by sensitivity analysis using the Morris method [20] were calibrated against observations from utility bills. However, challenges remain in applying this standard


Bayesian calibration method to dynamic building energy models. First, dynamic models typically have hundreds of uncertain parameters, which requires a very large sample of simulation results to fully reveal the response surface. Although the Gaussian process emulator, as used in the standard Bayesian calibration method, fits all sample points exactly, the computational effort increases drastically as the sample size increases and becomes prohibitively expensive for dynamic models. This motivates the attempt to use a simpler, less accurate but much faster emulator in the computation to improve overall efficiency while still obtaining satisfactory results. In addition, a full Bayesian analysis, i.e. estimating all of the parameters and coefficients in both the emulator and the model inadequacy term at the same time, adds to the computational demand with limited benefits. This issue can be addressed by a two-step approach in which the emulator is fitted and the model parameters are estimated separately. Finally, common activities like detailed on-site visits and sub-metering provide valuable information for calibrating dynamic building models, and calibrating against multiple types of observations separately is susceptible to inconsistent and inaccurate estimates of parameter values. This necessitates a systematic way to incorporate all the information in an automated calibration approach. All of the above reasons motivate the work in Ref. [1]. The following section provides a detailed explanation of the methodology.

3. Methodology

3.1. Meta-model formulation

From a statistical perspective, the calibration problem can be formulated by the following meta-model, a classical representation from [18]:

y = η(x, t) + δ(x) + ε_m    (1)

where y is the standardized field observation, usually including the total energy consumption and peak demand for each month shown in monthly utility bills. y can also include indoor temperature, supply air flow rate, etc., obtained from sub-metering or a building audit. Each type of output is standardized into [0, 1] by its separate minimum and maximum values obtained from experimental design and simulation; this ensures that all types of observation are of the same magnitude and are considered equally important regardless of units. η(x, t) is the output of dynamic simulation, represented as a function of model inputs x and model parameters t. For building energy models, model inputs mostly refer to the weather conditions. Their values for historical building consumption are commonly assumed to vary over time in a known manner, represented by the use of actual meteorological year (AMY) data files. Model inputs in this meta-model formulation also include output indicators, so that a single meta-model can yield outputs corresponding to different observations. Model parameters are building features that determine the consumption outputs given varying inputs. They include both physical parameters whose values remain relatively constant over the considered time scope, like construction properties, and parameters that describe varying processes with constant patterns, like the average occupancy in a certain space. This meta-model also considers model inadequacy by including the model error δ(x), assumed to depend only on model inputs, and the random observation error ε_m, assumed to follow a Gaussian distribution, i.e. ε_m ~ N(0, σ_m²). This meta-model relates field observation to simulation output through a mathematical formulation, and the calibration of building energy models within the retrofit context becomes twofold: first, assuming the "true" value of the unknown model parameters t is θ* for the given historical observation, calibration aims to obtain a


confident estimate of θ*; second, calibration aims to estimate the coefficients in both δ(x) and σ_m², such that the sum of all three terms forms a prediction that would ideally equal the true outcome, lending support to decision making in retrofit practice.

3.2. Experimental design

While the simulation output η(x, t) is in fact a known mechanism and is deterministic given x and t, using dynamic simulation directly in automated calibration is laborious and inefficient most of the time due to its relatively long computation time and the large number of simulations to be expected. This usually leads to the use of a statistical emulator. Experimental design is used to sample over the probable value variations of the model parameters, and the simulation output from the selected sample serves two purposes: to analyze which model inputs and parameters are important in affecting the output of interest, and to fit an emulator that exhibits similar results to the physical model under probable input and parameter values. This study uses the Monte Carlo simulation technique with Latin hypercube design (LHD), first proposed by McKay et al. [21], as the sampling algorithm. In order to generate n sample points, each dimension of the parameter space is divided into n intervals of equal probability, and LHD proceeds such that the n sample points fill the n intervals on each dimension, with each interval containing exactly one sample point. This feature can dramatically reduce the sample size compared with random sampling, and provides more robust results in terms of the variance of estimation [22]. Another advantage of LHD is that any of its subsets retains the desirable properties, since the remaining sample points still cover the intervals on each dimension nearly uniformly. For these reasons LHD is especially suitable for an input space of high dimension, and hence is used in the proposed method.
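As an illustration of this sampling step, the following is a minimal numpy sketch (not the authors' implementation; all variable names are hypothetical) that draws a Latin hypercube sample on the unit hypercube and maps one coordinate through the inverse CDF of a triangular prior, the distribution family used for the parameter uncertainties in the case study. The dimensions 600 × 136 mirror the case study's sample.

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """n points in [0, 1]^d: each dimension is split into n equal-probability
    intervals and every interval holds exactly one point."""
    rng = np.random.default_rng(rng)
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n  # one point per stratum
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]  # shuffle strata independently per dimension
    return u

def triangular_ppf(u, lo, mode, hi):
    """Inverse CDF of a triangular distribution, mapping uniform LHD
    coordinates onto a parameter's prior."""
    u = np.asarray(u, dtype=float)
    fc = (mode - lo) / (hi - lo)
    left = lo + np.sqrt(u * (hi - lo) * (mode - lo))
    right = hi - np.sqrt((1.0 - u) * (hi - lo) * (hi - mode))
    return np.where(u < fc, left, right)

u = latin_hypercube(600, 136, rng=0)  # same sample size as the case study
# ground-temperature prior from Table 1: mode 18.0 degC, range [12.6, 23.4]
ground_temp = triangular_ppf(u[:, 0], 12.6, 18.0, 23.4)
```

Because each column of `u` is a stratified permutation, any subset of rows still spreads nearly uniformly over each dimension, which is the LHD property the text relies on.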
This probability-based sampling ensures that regions of more probable parameter values contain more sample points, so the fitted emulator has better accuracy in these regions. This requires the modelers to specify the probable variations, i.e. the probability distribution of values for each parameter, which can be based on the modelers' knowledge and experience, information acquired from detailed building audits and intrusive testing, and generic uncertainty information for common building characteristics from uncertainty quantification. One assumption of this approach is that all the model outputs are independent of each other after controlling for model inputs. This assumption is appropriate for calibration using monthly consumption, where the impact of the previous month's consumption on the current month is negligible. Special techniques like time series regression or hidden Markov models may be needed when this assumption does not hold, for example when calibrating consumption at hourly or smaller time intervals.

3.3. Sensitivity analysis

While simulation of dynamic building energy models involves a large number of parameters, only a few of them have an important impact on model outputs like monthly consumption and peak demand, and account for most of the variation in these outputs. In addition, when only relatively few observations are available, tuning the values of parameters that outnumber the observations in a data-driven calibration approach becomes a highly underdetermined problem with an infinite number of solutions, i.e. there is an infinite set of combinations of calibration parameter values that all result in the same model output values. To ensure computational efficiency and avoid these issues, only important parameters are selected and considered in the calibration process. When calibrating building models in practice, most of the time the important parameters are chosen manually based on model-

ers' experience, and the tuning processes are often not discussed in detail in many case studies. This practice is neither transparent nor repeatable, and is vulnerable to the risk of missing important parameters in specific cases where previous experience does not hold. For this reason sensitivity analysis (SA) is often used to rank the parameters in order of their importance to an output, which can serve as the basis for further analysis where experience can be utilized more efficiently and effectively. The proposed method uses a regression-based SA method, originating from the lasso [23] and applied to the building energy domain by [24], to select significant parameters. It is computationally efficient in providing acceptable SA results for most dynamic building energy models. A linear model is used to consider all the inputs and parameters of the model:

y_i = β₀ + β₁x_i1 + β₂x_i2 + ... + β_p x_ip + ε_i = β₀ + Σ_{j=1}^{p} β_j x_ij + ε_i,   i = 1, 2, ..., n    (2)

where y_i is the model outcome, x_ij, j = 1, 2, ..., p, are the covariates including both model inputs and parameters, the β_j are the regression coefficients, and the ε_i are the regression residuals. The lasso is used to select significant covariates by estimating the coefficients β_j in the following equation:

β̂_lasso = argmin_β { Σ_{i=1}^{n} ( y_i − β₀ − Σ_{j=1}^{p} β_j x_ij )² + λ Σ_{j=1}^{p} |β_j| }    (3)
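To make Eq. (3) concrete, here is a self-contained coordinate-descent sketch of the lasso on synthetic data (an illustration under assumed data, not the software used in the paper). Each pass soft-thresholds one coefficient at a time, which is what drives the insignificant β_j exactly to zero.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for min_beta 0.5*||y - X beta||^2 + lam*||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]                        # residual without feature j
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))                        # hypothetical inputs/parameters
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
beta = lasso_cd(X, y, lam=10.0)
significant = np.flatnonzero(np.abs(beta) > 1e-8)         # the "selected" covariates
```

With this penalty level, only the two covariates that truly drive the output survive; the other eight coefficients are shrunk to exactly zero, mirroring the selection behavior described above.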

where λ is the tuning parameter that controls the number of selected parameters. Compared with the ordinary least squares (OLS) estimates, the lasso penalizes large models by adding the last penalty term, which shrinks the estimates of some β_j to exactly zero. As a result, only a few significant parameters remain in the final model and are considered important. This also ensures that most of the output variation can be explained by the selected covariates, and validates the use of a statistical emulator in Bayesian calibration.

3.4. Emulator fitting

Instead of a Gaussian process model as in the current standard Bayesian calibration formulation, this study proposes a multiple linear regression model as an emulator due to its easy interpretation, fast fitting and considerable accuracy. It assumes the physical model output η(x, t) to be represented as:

η(x, t) = f(x, t) + ε_l = f(d) + ε_l = β₀ + Σ_{i=1}^{p+q} β_i d_i + Σ_{1≤i<j≤p+q} β_ij d_i d_j + Σ_{j=1}^{p+q} β_jj d_j² + ε_l    (4)

where d = (x₁, x₂, ..., x_p, t₁, t₂, ..., t_q) are the significant model inputs and parameters chosen by sensitivity analysis; Σ β_i d_i is the linear combination of all main effects, Σ β_ij d_i d_j the linear combination of all interaction effects, and Σ β_jj d_j² the linear combination of all quadratic effects. The emulator is fitted on a subset of the experimental design sample, since any subset of the sampled parameters retains the desirable properties that ensure a good coverage of the parameter space. Stepwise regression with the Akaike information criterion [25] chooses the final linear model, retaining only significant terms. If the model shows a good fit, i.e. if most of the variation of the model output can be explained by the model and the residuals are negligible, it is reasonable to believe that by tuning the parameters t to reduce the discrepancy between linear model output and observation, the discrepancy between the physical model and observation will also be reduced.

3.5. Automated calibration

This study assumes the model error δ(x) to follow a Gaussian process, where all δ in the observations form a joint multivariate Gaussian distribution specified by a mean μ_δ and covariance matrix Σ_δ. This indicates that model outputs are highly correlated when the inputs are close enough under a certain measure of similarity. Generally the model is assumed to be unbiased, namely μ_δ = 0. A squared exponential kernel is employed for the element c_ij of Σ_δ = C(x, x) that represents the covariance between δ_i and δ_j:

c_ij = (1/λ) exp( − Σ_s ω_s (x_is − x_js)² )    (5)
where x_is, s = 1, 2, ..., n, is the s-th dimension of the model inputs, ω_s is the corresponding weight factor, and λ is the precision. By specifying the structure of the covariance function, a Gaussian process model can flexibly represent the model behavior and obtain an exact fit on given samples. In matrix form, all of the observations under t = θ form a joint Gaussian distribution:

y ~ N(μ, Σ),   μ = f(x, θ),   Σ = Σ_δ + (σ_m² + σ_l²) I    (6)

As a result, the calibration problem becomes estimating the true value of the calibration parameters θ and the hyper-parameters φ = (ω, λ, σ_l) such that the probability of y becomes maximal, and this is where Bayesian inference is introduced. Following Bayes' rule:

p(θ, φ | y) ∝ p(y | θ, φ) p(θ) p(φ)    (7)

Therefore the posterior distribution p(θ, φ | y) can be obtained from the prior distributions p(θ), p(φ) and the likelihood of the observations:

p(y | θ, φ) = |Σ|^(−1/2) exp( −(1/2) (y − μ)ᵀ Σ⁻¹ (y − μ) )    (8)

Prior distributions of the calibration parameters come from the associated uncertainties, and priors of φ are usually assigned to maximize the predictive power of the linear model. A full Bayesian calibration approach estimates all the parameters at the same time using Markov chain Monte Carlo (MCMC) simulation [26]. It generates a sequence of sample points from a process, known as a Markov chain, in which the current state depends only on the state at the previous time step. Under certain conditions the Markov chain reaches an equilibrium state after a number of burn-in steps, and the sample points afterwards come from the desired distribution, i.e. the posterior. As one specific type, random walk Monte Carlo generates candidates for the state change at each time step by perturbing the current state, where the perturbation follows a user-specified proposal distribution. By comparing the probabilities of the current state and the proposed candidate, an algorithm determines whether the candidate is accepted as the next state. This study uses a uniform proposal distribution to generate candidates, and uses the Metropolis algorithm [26] to determine the state of the Markov chain at the next time step. The resultant posterior distributions of the calibration inputs can then be used in uncertainty analyses of dynamic models, such as retrofit analyses in which the associated risks of energy performance can be well represented and considered.

As introduced in the previous section, the standard Bayesian calibration method assumes the emulator η(x, t) to be another Gaussian process. Given n field observations and m simulation sample points, all the n + m outputs form a joint multivariate Gaussian distribution with a covariance matrix of size (n + m) × (n + m), and it would be prohibitively expensive to calculate Σ⁻¹ when m becomes large, preventing the use of a large number of simulations. This method, on the other hand, reduces the size to n × n by fitting a linear model and calibrating it in a separate step. If the whole covariance matrix is denoted as M:

M = [ Σ    C
      Cᵀ   Σ_s ]    (9)

where Σ is the n × n covariance matrix of y, Σ_s is the m × m covariance matrix of η, and C is the n × m covariance matrix between y and η. The use of linear regression assumes that the ε_l are independently and identically distributed with constant variance σ_l², so all y and η are independent of each other conditioned on x and t. Therefore Σ_s becomes a constant diagonal matrix σ_l² I and C becomes 0, and during the calculation only Σ needs to be updated and Σ⁻¹ calculated. This makes it possible to consider only y in the calibration process. When calibrating monthly consumption, the number of observations is relatively small compared with the number of sample points, so this method saves a large amount of computational effort when calculating the observation likelihood.
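The random-walk Metropolis sampler described above can be sketched as follows (an illustration with a toy log-posterior standing in for the actual likelihood of Eq. (8); the function names and step size are hypothetical):

```python
import numpy as np

def metropolis(log_post, x0, step, n_steps, rng=None):
    """Random-walk Metropolis with a uniform proposal distribution."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        cand = x + rng.uniform(-step, step, size=x.size)  # uniform perturbation
        lp_cand = log_post(cand)
        if np.log(rng.random()) < lp_cand - lp:           # Metropolis acceptance rule
            x, lp = cand, lp_cand
        chain[i] = x
    return chain

# toy target: a standard normal log-posterior
chain = metropolis(lambda z: -0.5 * float(z @ z), np.zeros(1), 0.8, 20000, rng=1)
posterior = chain[5000:]   # discard burn-in steps
```

After discarding burn-in, the retained states approximate draws from the target distribution; in the calibration setting the target would be the posterior p(θ, φ | y) of Eq. (7).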

3.6. Model prediction

Within the Bayesian calibration framework, the future outcome y* given x* and t* = θ can be calculated by:

y* = f(x*, θ) + δ* + ε_m* + ε_l*    (10)

in which f(x*, θ) comes from the linear model conditioned on the posterior distribution of θ, while ε_m* and ε_l* are random realizations of their separate Gaussian distributions. The combined model errors δ and δ* given x, x* form a joint distribution:

[δ, δ*]ᵀ ~ N( 0, [ C(x, x)    C(x, x*)
                   C(x*, x)   C(x*, x*) ] )    (11)

Conditioning this joint Gaussian prior distribution on the observations yields:

δ* | δ, x, x* ~ N( C(x*, x) C(x, x)⁻¹ δ,   C(x*, x*) − C(x*, x) C(x, x)⁻¹ C(x, x*) )    (12)

where the covariance functions C are conditioned on the posterior distributions of ω and λ.
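The conditioning in Eqs. (11) and (12) can be sketched with the squared exponential kernel of Eq. (5). This is an illustrative numpy implementation with made-up inputs and fixed hyper-parameter values (ω, λ) rather than posterior draws:

```python
import numpy as np

def sq_exp_cov(X1, X2, omega, lam):
    """Squared exponential kernel of Eq. (5):
    c_ij = (1/lam) * exp(-sum_s omega_s * (x1_is - x2_js)**2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * omega).sum(axis=-1)
    return np.exp(-d2) / lam

def predict_delta(X, delta, Xs, omega, lam, jitter=1e-8):
    """Conditional mean and covariance of delta* given observed delta, Eq. (12)."""
    K = sq_exp_cov(X, X, omega, lam) + jitter * np.eye(len(X))  # jitter for stability
    Ks = sq_exp_cov(Xs, X, omega, lam)
    Kss = sq_exp_cov(Xs, Xs, omega, lam)
    Kinv = np.linalg.inv(K)
    return Ks @ Kinv @ delta, Kss - Ks @ Kinv @ Ks.T

# toy example: two observed inputs, predict the model error at the first one
X = np.array([[0.0], [1.0]])
delta_obs = np.array([1.0, 2.0])
mean, cov = predict_delta(X, delta_obs, np.array([[0.0]]),
                          omega=np.array([1.0]), lam=1.0)
```

Predicting at an observed input reproduces the observed model error with (near-)zero conditional variance, which is the exact-interpolation property of the Gaussian process noted in Section 3.5.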

3.7. Performance metrics

Two agreement criteria from ASHRAE Guideline 14 [9] are the most common and widely used to evaluate the single-point predictions of calibrated building energy models. As shown in Eq. (13), these two criteria compare the simulation-predicted data ŷ_i to the utility data y_i used for calibration, with n equal to the number of data points and p usually equal to 1.

NMBE = | Σ_{i=1}^{n} (y_i − ŷ_i) | / ( (n − p) ȳ ),   CV-RMSE = (1/ȳ) √( Σ_{i=1}^{n} (y_i − ŷ_i)² / (n − p) )    (13)
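Eq. (13) translates directly into code; the following is a minimal sketch (function names are the author's own, not from the paper):

```python
import numpy as np

def nmbe(y, yhat, p=1):
    """Normalized mean bias error, Eq. (13)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return abs(np.sum(y - yhat)) / ((len(y) - p) * y.mean())

def cv_rmse(y, yhat, p=1):
    """Coefficient of variation of the RMSE, Eq. (13)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sqrt(np.sum((y - yhat) ** 2) / (len(y) - p)) / y.mean()

# ASHRAE Guideline 14 monthly criteria: NMBE <= 0.05 and CV-RMSE <= 0.15
```

Note that a prediction can have zero NMBE (errors that cancel) while still having a large CV-RMSE, which is why both criteria are checked together.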

Table 1 Uncertainties of calibration parameters.

Fig. 1. Dynamic building model in OpenStudio.

ASHRAE Guideline 14 recommends that an NMBE of no more than 0.05 and a CV-RMSE of no more than 0.15 indicate a well-calibrated model for monthly observations. In addition, the continuous ranked probability score (CRPS) measures the closeness of predictive distributions and corresponding observations. This score is widely used in forecast verification [27] and has also been applied to evaluating building performance predictions [28]. If the predictive distribution is obtained from Monte Carlo simulation, as in this study, it can be calculated as:

CRPS(F, y) = E_F |Y − y| − (1/2) E_F |Y − Y′|    (14)

where F is the predictive distribution of the random variable Y represented by the sample set, y is the single observation, E_F is the expectation over F, and Y′ is an independent random variable with the same distribution as Y; this identical distribution can be obtained by random permutations of the sample set. A larger CRPS value indicates a larger discrepancy between the predictive distribution and the observation. More details about the CRPS can be found in Ref. [27].

4. Case study

4.1. Description of the dynamic building energy model

The same building on the Georgia Tech campus as in Ref. [1] was used in this paper. It is a three-story building that has offices along the perimeter and biological laboratories in the core. The building consumes steam and chilled water from a central (district) plant as well as electricity from a utility. The building uses a variable air volume (VAV) system with reheat, with six air handling units (AHUs) that supply conditioned air to the whole building except the mechanical rooms. Domestic hot water, mostly used by the biological laboratories, also comes from district heating through two heat exchangers. The building has no ventilation or sewage heat recovery system, nor any on-site energy generation system. A dynamic building energy model was built in OpenStudio 1.2 according to the building design specifications (Fig. 1). An on-site building audit was performed to adjust the model. Available observations include the chilled water, steam and electricity consumption recorded at 15-min intervals from January 2011 to November 2013. Due to the poor quality of the steam consumption data, only chilled water and electricity were considered in this case study.
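The sample-based CRPS estimator of Eq. (14) can be sketched as below (illustrative only; the all-pairs average is used for E|Y − Y′| instead of the random permutations mentioned in the text, which estimates the same quantity):

```python
import numpy as np

def crps_sample(sample, y):
    """Monte Carlo CRPS of Eq. (14): E|Y - y| - 0.5 * E|Y - Y'|."""
    s = np.asarray(sample, float)
    term1 = np.abs(s - y).mean()
    term2 = np.abs(s[:, None] - s[None, :]).mean()  # all-pairs estimate of E|Y - Y'|
    return term1 - 0.5 * term2

# example: score a hypothetical predictive sample against one observation
score = crps_sample(np.random.default_rng(0).standard_normal(500), 0.2)
```

For a degenerate (single-point) predictive distribution the CRPS reduces to the absolute error, so it generalizes the point metrics above to full distributions.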
Electricity usage mainly comes from lighting and plug loads with fairly constant schedules during the year, so their usage densities and schedules were adjusted to calibrate electricity manually, and then remained fixed in the later automated calibration process. To rep-

Model parameters | Mode | Min | Max

Ambient environment
Ground temperature (°C) | 18.0 | 12.6 | 23.4

Thermal properties (Min/Max as variation about the nominal value)
Conductivity | – | ±15%
Density | – | ±3%
Specific heat | – | ±36.75%
Thermal absorptance | – | ±12%
Solar absorptance | – | ±12%
Visible absorptance | – | ±12%
Solar transmittance | – | ±3%
Front side solar reflectance | – | ±3%
Back side solar reflectance | – | ±3%
Visible transmittance | – | ±3%
Front side visible reflectance | – | ±3%
Back side visible reflectance | – | ±3%
Infrared transmittance | – | ±3%
Front side infrared hemispherical emissivity | – | ±3%
Back side infrared hemispherical emissivity | – | ±3%
Dirt correction factor | – | ±30%
U-factor | – | ±15%
Solar heat gain coefficient | – | ±15%
Thermal resistance | – | ±15%

Infiltration
Effective leakage area per window area (cm²/m²) | 2.60 | 0.26 | 4.94

Internal load
Occupancy density: lab (person/m²) | 0.30 | 0 | 0.70
Occupancy density: office (person/m²) | 0.30 | 0 | 0.70
Occupancy density: hallway (person/m²) | 0.11 | 0 | 0.26
Occupancy density: restroom (person/m²) | 0.10 | 0 | 0.23

System control
Cooling set-point at occupied hours (°C) | 22.2 | 20.0 | 27.8
Cooling set-point at unoccupied hours (°C) | 27.8 | 25.0 | 30.0
Chilled water supply temperature for each AHU (°C) | 6.7 | 5.0 | 8.0
Supply air temperature of each AHU (°C) | 12.8 | 11.0 | 14.0
Outdoor air flow rate of each AHU (m³/s) | – | −70% | +20%

resent common utility bills, the daily average consumption and peak demand for each month were calculated from the observations:

C = (1/M) Σ_{i=1}^{m} R_i Δt,   D = max_{1≤i≤m} R_i    (15)

where C is the monthly average chilled water energy use per day, D is the monthly peak demand, R_i, i = 1, 2, ..., m, is the consumption rate of chilled water energy in each 15-min interval Δt, and m and M are the total numbers of time intervals and days in each month, so that m = 96M. A few data gaps due to sensor dysfunction or system maintenance were filled by averaging the readings at the same time of day over six neighboring days, three before and three after. Observations from 2011 to 2012 were used to calibrate the model and the remainder were used for validation.

4.2. Calibration process

Uncertainties in the calibration parameters were assumed to follow triangular distributions whose parameters were derived from a generic uncertainty quantification (UQ) repository, which includes uncertainty information for general building envelope and materials, system components, and usage scenarios and operation. In addition, uncertainties in case-specific parameters, including the cooling set-point temperature and the actual outdoor air rate for each air system, were also quantified and sampled. A complete list of the uncertain parameters and their distribution parameters can be found in Table 1; in total there are 136 uncertain model parameters. All of these uncertainties are also used as prior distributions in the Bayesian calibration.
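The monthly aggregation of Eq. (15) can be sketched as follows (an illustration with made-up numbers; Δt = 0.25 h for 15-min intervals):

```python
import numpy as np

def monthly_summary(rates, days):
    """Eq. (15): daily-average consumption C and peak demand D from
    15-min consumption rates R_i (m = 96 * M intervals in an M-day month)."""
    r = np.asarray(rates, float)
    dt = 0.25                  # hours per 15-min interval
    C = r.sum() * dt / days    # energy use per day
    D = r.max()                # peak demand over the month
    return C, D

# hypothetical 2-day "month" at a constant 100 kW with one 150 kW spike
rates = np.full(192, 100.0)
rates[100] = 150.0
C, D = monthly_summary(rates, days=2)
```

The daily average C barely registers the spike, while the peak demand D is set entirely by it, which is why the two quantities carry complementary information for calibration.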

Table 2 Significant model inputs and parameters.

5. Results

5.1. Parameter estimation

Model inputs:
Observation Type
Monthly average weather conditions, including: dry bulb temperature, relative humidity, wind speed, wind direction

Model parameters:
Cooling set-point at occupied hours
Occupancy density: lab
Occupancy density: office
Outdoor air flow rate: AHU 1–1
Outdoor air flow rate: AHU 2
Outdoor air flow rate: AHU 3

Table 3 Information of the three linear regression emulators.

Emulator                             | LM     | LI     | LQ
Number of main effect terms          | 11     | 11     | 11
Number of interaction effect terms   | 0      | 33     | 35
Number of quadratic effect terms     | 0      | 0      | 4
Adjusted R²                          | 0.9387 | 0.9761 | 0.9810
Root mean square error (training)    | 0.0588 | 0.0348 | 0.0311
Root mean square error (testing)     | 0.1411 | 0.0914 | 0.0832

The dynamic building model was exported from OpenStudio and converted into an EnergyPlus 8.2 model. A program developed by Lee et al. [29] was used to sample simulation outputs. An LHD sample of 600 runs was generated, half of which was simulated using AMY data from Hartsfield-Jackson Atlanta International Airport for each of the two years. In each run, the sampling algorithm chose only the consumption and demand data of a single month, resulting in an overall sample of 1200 data points that roughly meets the suggested "10r" rule, where r is the number of parameters. The data were then standardized by their global maximum and minimum values and differentiated by an indicator input named 'Observation Type'. Table 2 shows the significant inputs and parameters chosen by sensitivity analysis.

Three multiple linear regression models were fitted as emulators to optimize prediction performance. The linear-main (LM) emulator includes only the significant main effects. The linear-interaction (LI) emulator includes all significant main and interaction effects. The linear-quadratic (LQ) emulator includes all significant main, interaction, and quadratic effects. Table 3 summarizes these three emulators. Another sample of 1200 data points, drawn from the same set of simulations, was used to assess their out-of-sample prediction performance. Note that the responses of these emulators range from 0 to 1 as a result of standardization, which indicates that the emulators' expected error in fitting the physical model is 8.32–14.11%. The residual plots in Fig. 2 indicate that the LI and LQ emulators adequately depict the behavior of the physical model and that it is reasonable to assume the residuals follow a Gaussian distribution. The residuals of the LM emulator, however, present a clear pattern missed by the emulator, suggesting its inadequacy and potential problems later in the process. Prior distributions of the calibration inputs were inherited from the previous steps.
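The feature construction behind the LM, LI, and LQ emulators can be sketched with ordinary least squares as follows. This is a numpy-only illustration, not the authors' implementation; at the paper's scale a stepwise selection of significant terms would precede the fit, which is omitted here:

```python
import numpy as np
from itertools import combinations

def design_matrix(X, kind="LI"):
    """Build the regression design matrix for an emulator:
    intercept + main effects, plus pairwise interactions for LI/LQ,
    plus squared terms for LQ."""
    cols = [np.ones(len(X))] + [X[:, j] for j in range(X.shape[1])]
    if kind in ("LI", "LQ"):
        cols += [X[:, a] * X[:, b] for a, b in combinations(range(X.shape[1]), 2)]
    if kind == "LQ":
        cols += [X[:, j] ** 2 for j in range(X.shape[1])]
    return np.column_stack(cols)

def fit_emulator(X_tr, y_tr, X_te, y_te, kind="LI"):
    """Ordinary least squares fit; returns the coefficients and the
    training/testing RMSE, as reported in Table 3."""
    A = design_matrix(X_tr, kind)
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    rmse = lambda X, y: float(np.sqrt(np.mean((design_matrix(X, kind) @ beta - y) ** 2)))
    return beta, rmse(X_tr, y_tr), rmse(X_te, y_te)
```

On a response with a genuine interaction effect, the LI form fits it exactly while the LM form leaves a structured residual, mirroring the pattern seen in Fig. 2.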
Prior distributions of ω and m were adopted from Ref. [30] to assume a 20% model inadequacy and a 5% random error. 10,000 MCMC simulations were performed, with the first 1000 sample points removed as burn-in; the total time of the automated calibration process ranged from 5 to 20 min across the emulators. This study also used the standard Bayesian calibration method, which essentially uses a Gaussian process (GP) emulator, to benchmark the calibration performance. Given the same calibration parameters, prior distributions, 1200-point training sample, number of MCMC simulations, and computer for executing the programs, the entire one-step process took more than 10 h.
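A minimal random-walk Metropolis sampler of the kind used for this step can be sketched as follows. This is a generic numpy-only sketch, not the paper's implementation; `emulator`, `y_obs`, `sigma`, and `log_prior` are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta, emulator, y_obs, sigma, log_prior):
    """Gaussian likelihood of the observations around the emulator
    prediction, plus the (e.g. triangular) prior; sigma lumps together
    model inadequacy and random error."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    resid = y_obs - emulator(theta)
    return lp - 0.5 * np.sum((resid / sigma) ** 2)

def metropolis(theta0, logpost, n_iter=10_000, step=0.05, burn_in=1_000):
    """Random-walk Metropolis: propose a Gaussian step, accept with
    probability min(1, posterior ratio); discard burn-in samples,
    as in the study (10,000 iterations, first 1,000 removed)."""
    theta = np.asarray(theta0, float)
    lp = logpost(theta)
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples[burn_in:])
```

Because each evaluation of `logpost` only requires a cheap linear-emulator prediction rather than an EnergyPlus run, the full chain completes in minutes, which is the source of the 5–20 min timings reported above.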

Fig. 3 shows the empirical probability density functions (PDFs) of the prior and posterior distributions for all six calibration parameters, obtained separately with each of the emulators. In general, the results from the LQ and LI emulators are almost identical and agree well with the result from the GP emulator, while the result from LM shows more notable differences. As for the overall posterior distributions, the cooling set point for occupied hours was higher than expected, which indicates either a higher occupant setting or capacity deterioration of the cooling system. Occupant density in labs and offices was slightly lower than expected. The outdoor air flow rates for all three AHUs were close to the 30% minimum requirement of the design specification, which may suggest a malfunction in the control of the outdoor air dampers. Further interpretation of the posteriors requires a deeper investigation of the actual building and is outside the scope of this work.

5.2. Model calibration and validation

The calibrated physical model was evaluated by taking a subset of m = 90 sample points from the remaining MCMC simulations and performing an EnergyPlus simulation at each sample point. Simulation results for the calibration period became the estimates and results for the validation period became the predictions. Predictions of the meta-model for the validation period were generated directly according to Eq. (12). The meta-model fits the training dataset exactly, so its estimate for the calibration period was not considered. Values of the ASHRAE Guideline 14 criteria were calculated as follows:

P = (1/2m) [ Σ_{i=1}^{m} P_i^c + Σ_{j=1}^{m} P_j^d ]   (16)

where P is the overall NMBE or CV-RMSE criterion, and P_i^c and P_i^d are the criterion values for consumption and demand, respectively, calculated by comparing the ith of the m sample points to the observation according to Eq. (13). Table 4 and Fig. 4 show the results and their visualization against the ASHRAE recommended values. Overall, the in-sample estimates are better than the out-of-sample predictions, and the predictions using the meta-model are better than those using the physical model. The standard method with the GP emulator has an overall satisfactory result, with only the NMBE value of the physical model in the validation period exceeding the recommended value. Comparison of the other emulators against this benchmark reveals that the LI emulator performs similarly to the GP emulator for the physical model and slightly worse for the meta-model, while the LQ emulator is similar in the estimates, notably better in the physical model prediction, and worse in the meta-model prediction. The LM emulator performs the worst of all the emulators.

Similarly, the CRPS value was calculated by comparing each empirical distribution, obtained from the 90-point subset sample for either the physical model or the meta-model, with the single observation, and then averaging over all months and observation types. The results can be found in Table 5 and Fig. 5; they show a similar pattern to the ASHRAE agreement criteria, although no recommended value is available for a direct assessment. In the end, using the results from the standard method as the benchmark, the LI emulator is recommended in this case study since it provides the closest results, has a simpler model form for interpretation, and demands less computation.
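For reference, the NMBE and CV-RMSE agreement criteria and the CRPS of an empirical (ensemble) distribution against a single observation (Gneiting and Raftery [27]) can be computed as follows. This is a generic sketch with illustrative function names, not the authors' code; values are returned as fractions rather than percentages:

```python
import numpy as np

def nmbe(pred, obs):
    """Normalized mean bias error (fraction, not %)."""
    return float(np.sum(obs - pred) / (len(obs) * np.mean(obs)))

def cv_rmse(pred, obs):
    """Coefficient of variation of the RMSE (fraction, not %)."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)) / np.mean(obs))

def crps_ensemble(samples, y):
    """CRPS of an empirical distribution (ensemble of samples) against
    one observation y:  E|X - y| - 0.5 E|X - X'|."""
    x = np.asarray(samples, float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)
```

A perfect deterministic prediction gives a CRPS of zero, and for a point forecast the CRPS reduces to the absolute error, which is what makes it a natural generalization of deterministic agreement criteria to predictive distributions.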


Fig. 2. Residual plots of three emulators. Left: LM; Middle: LI; Right: LQ.

Fig. 3. Prior and posterior distribution of calibration parameters using four emulators.

Table 4
Performance metrics of calibration and validation, ASHRAE agreement criteria.

          NMBE                                     CV-RMSE
          Physical model           Meta-model      Physical model           Meta-model
Emulator  Calibration  Validation  Validation      Calibration  Validation  Validation
GP        0.02         0.10        0.03            0.10         0.15        0.13
LQ        0.06         0.06        0.08            0.11         0.12        0.18
LI        0.03         0.09        0.04            0.10         0.14        0.16
LM        0.06         0.19        0.03            0.17         0.27        0.18

6. Discussion

6.1. Validity of linear regression emulators

The general observation from the results is that calibrating dynamic energy models with either the standard Bayesian calibration method or the proposed method with the LI or LQ emulator yields similar and acceptable results in terms of both parameter estimation and performance prediction, with the proposed method being much faster. This shows that, by sacrificing a limited amount of accuracy, computation can be reduced significantly while still obtaining similar performance. While other statistical


emulators, such as support vector machines and artificial neural networks, could also be employed in this framework, linear emulators including interaction and/or quadratic terms are good candidates for certain practices. Review of the overall results indicates that an adequate depiction of the physical model's behavior, represented by the emulator fitting information and the residual plot, is indispensable for a desired calibration result. It is therefore expected that more explicit criteria will be developed to provide guidance in choosing an adequate linear emulator and determining its performance limit.

This case study also proves the capability of the two-step approach, in which emulator fitting and model calibration are performed separately. One notable feature of this methodology is that fitting of the linear emulator reuses the same sample as the sensitivity analysis, so the total number of dynamic model runs is 600 in this case study. In the traditional approach, the sample generated for sensitivity analysis using the Morris method is of similar size but lacks the desired space-coverage properties; an additional sample is therefore required for fitting the Gaussian process emulator, and the total number of runs is thus larger. Hence the newly proposed method has computational benefits in both dynamic simulation and model fitting.

6.2. Physical model vs. meta-model

Fig. 4. Visualization of performance metrics of calibration and validation, ASHRAE agreement criteria.

Table 5
Performance metrics of calibration and validation, CRPS.

          Physical model           Meta-model
Emulator  Calibration  Validation  Validation
GP        0.028        0.042       0.031
LQ        0.034        0.034       0.049
LI        0.030        0.043       0.041
LM        0.052        0.090       0.039

A closer look at the performance metrics seems to reveal a trade-off between the predictions of the physical model and the meta-model for linear emulators: a more accurate linear emulator leads to better physical model prediction but worse meta-model prediction. A possible reason, acknowledged in other disciplines, is that since the discrepancy term is fitted by a Gaussian process model, a more explicitly defined mean vector, generated by the linear emulator, constrains the Gaussian process model's flexibility in fitting the observations. However, in retrofit analyses where physical model parameters are changed to reflect ECM measures, any prediction under changes not considered by the meta-model could yield incorrect results and conclusions. Note that the meta-model prediction using the LM emulator yields acceptable results, even though the calibration parameter values are notably different from those of the other three emulators. Therefore the calibrated physical model is more important in this practice, and the calibration parameters should receive more attention. It is recommended to treat the model discrepancy term as a statistical technique to prevent over-fitting. When the validation of the physical model shows a large discrepancy, the modeler should review the model assumptions and the parameters' prior distributions and consider re-calibration after appropriate adjustment, rather than relying on the meta-model to provide "accurate" predictions. Another possible line of future work is the comparison of the LHD sample simulations against the observations to detect a potentially incorrect model.

6.3. Model performance measures under uncertainty

Compared to the commonly used ASHRAE Guideline 14 criteria, the CRPS and similar metrics fit more appropriately with decision-making processes under uncertainty.
It is recommended that corresponding reference values be developed to provide models with a direct quality assessment; these values should be derived from real practice so that they can be translated into concrete risk measures.

Fig. 5. Visualization of performance metrics of calibration and validation, CRPS.

7. Conclusions

This study has assessed the performance of three linear regression emulators in a newly proposed lightweight Bayesian calibration framework for calibrating dynamic building energy models with multiple types of observations. The method fits a multiple linear regression model to replace the physical model in the meta-model formulation and obtains posterior distributions of the calibration parameters by Bayesian inference. Different performance metrics were introduced and a case study was performed. The results show that a linear emulator including at least the significant main and interaction effects provides results comparable to the current Bayesian calibration method in terms of both parameter values and model performance, while achieving a significant reduction in computation demand. This lightweight, flexible calibration method applies to most building operation and retrofit analyses where only utility bills are available, but additional observations can also be incorporated to exploit more valuable information. The method is also capable of handling the high-dimensional parameter spaces of dynamic models and large quantities of simulation samples. These properties give it great potential in future applications such as real-time optimal building control with integrated sensor systems, and large-scale retrofit and urban energy management with aggregated consumption data. More explicit criteria and guidelines remain to be developed to facilitate the selection and use of an appropriate linear emulator in real practice for risk-conscious decision making.

Acknowledgment

This study was fully funded by the NSF-EFRI SEED grant 1038248: "Risk-conscious Design and Retrofit of Buildings for Low Energy" awarded to the Georgia Institute of Technology.

References

[1] Q. Li, L. Gu, G. Augenbroe, J. Brown, Calibration of dynamic building energy models with multiple responses using Bayesian inference and linear regression models, 6th Int. Build. Phys. Conf. IBPC 2015, 78 (2015) 13–19, http://dx.doi.org/10.1016/j.egypro.2015.11.037.
[2] D. Coakley, P. Raftery, M. Keane, A review of methods to match building energy simulation models to measured data, Renew. Sustain. Energy Rev. 37 (2014) 123–141, http://dx.doi.org/10.1016/j.rser.2014.05.007.
[3] T.A. Reddy, Literature review on calibration of building energy simulation programs: uses, problems, procedures, uncertainty, and tools, ASHRAE Trans. 112 (2006) 226.
[4] J.S. Haberl, T.E. Bou-Saada, Procedures for calibrating hourly simulation models to measured building energy and environmental data, J. Sol. Energy Eng. 120 (1998) 193, http://dx.doi.org/10.1115/1.2888069.
[5] G. Liu, M. Liu, A rapid calibration procedure and case study for simplified simulation models of commonly used HVAC systems, Build. Environ. 46 (2011) 409–420, http://dx.doi.org/10.1016/j.buildenv.2010.08.002.
[6] H. Akbari, S.J. Konopacki, Application of an end-use disaggregation algorithm for obtaining building energy-use data, J. Sol. Energy Eng. 120 (1998) 205–210.
[7] J. Yoon, E.J. Lee, D.E. Claridge, Calibration procedure for energy performance simulation of a commercial building, J. Sol. Energy Eng. 125 (2003) 251, http://dx.doi.org/10.1115/1.1564076.
[8] P. Raftery, M. Keane, J. O'Donnell, Calibrating whole building energy models: an evidence-based methodology, Energy Build. 43 (2011) 2356–2364, http://dx.doi.org/10.1016/j.enbuild.2011.05.020.
[9] ASHRAE, ASHRAE Guideline 14-2002: Measurement of Energy and Demand Savings, Am. Soc. Heat. Vent. Air Cond., 2002.
[10] W.L. Carroll, R.J. Hitchcock, Tuning simulated building descriptions to match actual utility data—methods and implementation, ASHRAE Trans. 99 (1993) 928–934.
[11] T.A. Reddy, I. Maor, C. Panjapornpon, Calibrating detailed building energy simulation programs with measured data—part I: general methodology (RP-1051), HVAC&R Res. 13 (2007) 221–241, http://dx.doi.org/10.1080/10789669.2007.10390952.
[12] S. de Wit, G. Augenbroe, Analysis of uncertainty in building design evaluations and its implications, Energy Build. 34 (2002) 951–958, http://dx.doi.org/10.1016/S0378-7788(02)00070-1.
[13] I. Macdonald, Quantifying the Effects of Uncertainty in Building Simulation, 2002, http://www.strath.ac.uk/media/departments/mechanicalengineering/esru/research/phdmphilprojects/macdonald thesis.pdf.
[14] M. Manfren, N. Aste, R. Moshksar, Calibration and uncertainty analysis for computer models—a meta-model based approach for integrated building energy simulation, Appl. Energy 103 (2013) 627–641, http://dx.doi.org/10.1016/j.apenergy.2012.10.031.
[15] B. Eisenhower, Z. O'Neill, S. Narayanan, V.A. Fonoberov, I. Mezić, A methodology for meta-model based optimization in building energy models, Energy Build. 47 (2012) 292–301, http://dx.doi.org/10.1016/j.enbuild.2011.12.001.
[16] S. Kalogirou, M. Bojic, Artificial neural networks for the prediction of the energy consumption of a passive solar building, Energy 25 (2000) 479–491, http://dx.doi.org/10.1016/S0360-5442(99)00086-9.
[17] A.H. Neto, F.A.S. Fiorelli, Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption, Energy Build. 40 (2008) 2169–2176, http://dx.doi.org/10.1016/j.enbuild.2008.06.013.
[18] M.C. Kennedy, A. O'Hagan, Bayesian calibration of computer models, J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 425–464, http://dx.doi.org/10.1111/1467-9868.00294.
[19] Y. Heo, R. Choudhary, G.A. Augenbroe, Calibration of building energy models for retrofit analysis under uncertainty, Energy Build. 47 (2012) 550–560, http://dx.doi.org/10.1016/j.enbuild.2011.12.029.
[20] M.D. Morris, Factorial sampling plans for preliminary computational experiments, Technometrics 33 (1991) 161–174, http://dx.doi.org/10.2307/1269043.
[21] M.D. McKay, R.J. Beckman, W.J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (1979) 239, http://dx.doi.org/10.2307/1268522.
[22] A. Saltelli, S. Tarantola, F. Campolongo, Sensitivity analysis as an ingredient of modeling, Stat. Sci. 15 (2000) 377–395, http://dx.doi.org/10.1214/ss/1009213004.
[23] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B 58 (1996) 267–288, http://dx.doi.org/10.2307/2346178.
[24] Y. Sun, L. Gu, C.F.J. Wu, G. Augenbroe, Exploring HVAC system sizing under uncertainty, Energy Build. 81 (2014) 243–252, http://dx.doi.org/10.1016/j.enbuild.2014.06.026.
[25] H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control 19 (1974) 716–723, http://dx.doi.org/10.1109/TAC.1974.1100705.
[26] N. Metropolis, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (1953) 1087, http://dx.doi.org/10.1063/1.1699114.
[27] T. Gneiting, A.E. Raftery, Strictly proper scoring rules, prediction, and estimation, J. Am. Stat. Assoc. 102 (2007) 359–378, http://dx.doi.org/10.1198/016214506000001437.
[28] Y. Sun, Closing the Building Energy Performance Gap by Improving Our Predictions, Georgia Institute of Technology, 2014.
[29] B.D. Lee, Y. Sun, G. Augenbroe, C.J.J. Paredis, Towards better prediction of building performance: a workbench to analyze uncertainty in building simulation, BS2013, 13th Conf. Int. Build. Perform. Simul. Assoc., Chambéry, France, August 26–28 (2013) 1231–1238.
[30] S. Guillas, J. Rougier, A. Maute, A.D. Richmond, C.D. Linkletter, Bayesian calibration of the thermosphere–ionosphere electrodynamics general circulation model (TIE-GCM), Geosci. Model Dev. Discuss. 2 (2009) 485–506, http://dx.doi.org/10.5194/gmdd-2-485-2009.