International Journal of Forecasting xxx (xxxx) xxx
Temperature anomaly detection for electric load forecasting

Masoud Sobhani (a), Tao Hong (a,∗), Claude Martin (b)

(a) Systems Engineering Management, University of North Carolina at Charlotte, United States of America
(b) North Carolina Association of Electric Cooperatives, United States of America
Abstract

Since temperature variables are used in many load forecasting models, the quality of historical temperature data is crucial to the forecast accuracy. The raw data collected by local weather stations and archived by government agencies often include many missing values and incorrect readings, and thus cannot be used directly by load forecasters. As a result, many power companies today purchase data from commercial weather service vendors. Such quality-controlled data may still have many defects, but many load forecasters have been using them in full faith. This paper proposes a novel temperature anomaly detection methodology that makes use of the local load information collected by power companies. The effectiveness of the proposed method is demonstrated through two public datasets: one from the Global Energy Forecasting Competition 2014 and the other from ISO New England. The results show that the accuracy of the final load forecasts can be enhanced by removing the detected observations from the original input data.

© 2019 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (T. Hong).

1. Introduction

Electricity load forecasts are used to support many decision-making processes in the energy sector, such as power system planning and operations, ratemaking, financial planning, and energy trading. A 1% improvement in forecast accuracy can help a large power company to save millions of dollars (Hong, 2015). Many factors affect the load forecasting accuracy, such as model efficacy, load characteristics, and data quality. Most papers in the load forecasting literature are on modeling techniques and methodologies for loads at a high voltage level (Hong & Fan, 2016), and range from point load forecasting to probabilistic load forecasting, from short-term to long-term forecasting (Fan & Hyndman, 2012; Hong, Wilson and Xie, 2014; Hyndman & Fan, 2010). The smart grid initiatives of the last decade have led to many investigations being conducted on load forecasting at lower voltage levels (Wang, Chen, Hong, & Kang, 2019). Some representative investigations were conducted using data at the household level or for industrial plants (Alasali, Haben, Becerra, & Holderbaum, 2018; Ben Taieb, Huser, Hyndman, & Genton, 2016; Berk, Hoffmann, & Müller, 2018; Bracale, Carpinelli, De Falco, & Hong, 2019). Some
notable methods were recognized as winning entries in the Global Energy Forecasting Competitions (GEFCom) of 2012, 2014 and 2017 (Hong, Pinson and Fan, 2014; Hong et al., 2016; Hong, Xie, & Black, 2019). A typical load forecasting model requires information on the load history, the temperature history, and the corresponding timestamp. The quality of such input data is of critical importance for the load forecast accuracy on the output side. In other words, garbage in, garbage out. In practice, load forecasters usually spend more time on anomaly detection and data cleansing than on model building. Nevertheless, data quality issues have not attracted much attention from the research community until recently. The Global Energy Forecasting Competitions released real-world data to the contestants, and some winning entries addressed issues with the data quality during the competitions. One winning team in the hierarchical load forecasting track of GEFCom2012 screened outliers by taking the mean hourly load as the normal value and removing any hourly loads that were less than 20% of the mean (Charlton & Singleton, 2014). In the probabilistic load forecasting track of GEFCom2014, a winning team developed a model-based anomaly detection method with a fixed threshold for cleaning the load data (Xie & Hong, 2016). In addition, growing concerns regarding cybersecurity have stimulated research on load forecasting during data attacks (Luo, Hong and Fang, 2018;
https://doi.org/10.1016/j.ijforecast.2019.04.022 0169-2070/© 2019 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Please cite this article as: M. Sobhani, T. Hong and C. Martin, Temperature anomaly detection for electric load forecasting. International Journal of Forecasting (2019), https://doi.org/10.1016/j.ijforecast.2019.04.022.
Luo, Hong and Yue, 2018), where an emerging topic is the detection of anomalies in the historical load data (Luo, Hong, Fang, 2018; Luo, Hong, Yue, 2018; Yue, Hong, & Wang, 2019). Another solution to data quality issues is to apply robust regression models to load forecasting (Luo, Hong, & Fang, 2019). Of the small number of load forecasting papers that have tackled data quality issues, few have addressed data quality issues in the temperature record. Hong, Wang, Pahwa, Gui, and Hsiang (2010) discussed the data quality issues associated with the historical weather data and their implications for short term load forecasting. Following GEFCom2012, Hong, Wang, and White (2015) proposed a method for selecting appropriate weather stations for load forecasting. Sobhani, Campbell, Sangamwar, Li, and Hong (2019) evaluated several weather station combination methods for load forecasting. Finally, Kanda and Veguillas (2019) eliminated several weather stations in the final match of GEFCom2017 due to data quality concerns. The raw data collected by local weather stations and archived by government agencies, such as the National Oceanic and Atmospheric Administration, are often full of missing values and incorrect readings, and therefore cannot be used directly by load forecasters. Many utility companies today purchase data from commercial weather service vendors that have gone through a quality control process conducted by meteorologists based on meteorological science. Such quality-controlled data may look ordinary to most people, but they still have some defects, as was recognized by some GEFCom2017 winners (Hong et al., 2019; Kanda & Veguillas, 2019). Nevertheless, many load forecasters have been using them in full faith. This paper proposes a novel temperature anomaly detection methodology for electric load forecasting by making use of the load history in order to detect anomalies in the temperature history. 
The methodology includes two major components: a load-based temperature prediction model (to be discussed in Section 3) and a model-based anomaly detection procedure (to be discussed in Section 4). Section 5 uses two public datasets to demonstrate the effectiveness of the proposed method: one from GEFCom2014 and the other from ISO New England (ISONE). We also elaborate on the difference between load data cleansing and temperature data cleansing. The results show that the final load forecast accuracy can be enhanced by removing the detected observations from the temperature history data.

2. Data

We have endeavored to make the results reproducible by other researchers by using two public datasets in this paper. We are also publishing part of the code as supplementary files. One dataset is from the load forecasting track of GEFCom2014, while the other is from ISONE. GEFCom2014 released seven years of hourly loads (from 2005 to 2011) and 11 years of hourly temperatures (from January 2001 to 2011). This paper uses three full calendar years, 2008 to 2010. The ISONE data include system load and weather data for the ISO New England Control
Area and its eight wholesale load zones. The weather data come from seven different weather stations in the ISONE territory. While the data date back to 2003, this paper uses three years (2014 to 2016) of hourly load and temperature data. Fig. 1 shows the scatter plot of temperature vs. load using the GEFCom2014 data. The scatter plot displays a strong correlation between the temperature and the load. Although temperature is one of the causal factors of the load variations (left graph), we exploit the correlation between them to predict the temperature using the load and its variants as independent variables (right graph). Fig. 2 shows boxplots of the load and temperature grouped by month during the same three years, which display the seasonal patterns of both profiles.

3. A load-based temperature prediction model

3.1. The basic idea

Temperature is well known to be a driving factor of the load, and many load forecasting models employ temperature variables. A widely adopted benchmark model for load forecasting is known as the vanilla model (Hong, Pinson et al., 2014):

Lt = β0 + β1 Trendt + β2 Mt + β3 Wt + β4 Ht + β5 Wt Ht + β6 Tt + β7 Tt² + β8 Tt³ + β9 Mt Tt + β10 Mt Tt² + β11 Mt Tt³ + β12 Ht Tt + β13 Ht Tt² + β14 Ht Tt³,   (1)
where Lt is the load forecast for time t; βi are the coefficients estimated using the ordinary least squares method; Mt, Wt and Ht are the coincident month-of-the-year, day-of-the-week, and hour-of-the-day for time t, respectively, which are classification variables; and Tt is the coincident temperature. This vanilla model can be improved further by including additional variables, such as lagged temperature and holiday variables (Hong, Wilson et al., 2014; Wang, Liu, & Hong, 2016). The strong correlation between the load and temperature and the autocorrelation between the load and its variants, such as leads and lags, enable us to predict the coincident temperature using the loads of surrounding hours. In other words, we are trying to build a function for fitting the temperature–load scatter plot in Fig. 1. Since the scatter plot has two wings, we can fit them using two separate regression functions: one for the upper wing and the other for the lower one. The cut-off point is set to the comfort temperature, which is typically between 57 °F and 63 °F. A rigorous method for identifying the optimal cut-off temperature would be to look for the temperature that corresponds to the minimal load in the comfort temperature zone. However, since the anomaly detection performance is not sensitive to the temperature cut-off, we implement a simple method in this paper in order to stay focused on the basic idea of using the load to predict the temperature. We group the historical data by month and hour, which leads to 12 × 24 = 288 groups. For each group, we use the simple average temperature of that group as the cut-off.
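The per-group cut-off described above takes only a few lines to compute. The following is an illustrative sketch assuming pandas, with synthetic hourly temperatures standing in for the GEFCom2014 data; the variable names are ours, not from the paper's supplementary code:

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperatures with annual and diurnal cycles (for illustration).
rng = np.random.default_rng(0)
idx = pd.date_range("2008-01-01", periods=3 * 8760, freq="h")
temp = (55 + 25 * np.sin(2 * np.pi * idx.dayofyear / 365)
        + 8 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 4, len(idx)))
df = pd.DataFrame({"temp": temp}, index=idx)

# One cut-off per (month, hour) group -- 12 x 24 = 288 of them --
# set to the simple average temperature of that group.
cutoff = df.groupby([df.index.month, df.index.hour])["temp"].transform("mean")

# Assign each observation to the upper or lower wing of the scatter plot.
df["wing"] = np.where(df["temp"] >= cutoff, "upper", "lower")
```

A separate regression function would then be fitted to each wing.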
Fig. 1. Load–temperature (left) and temperature–load (right) scatter plots.
Fig. 2. Boxplots of the data grouped by month, for temperature (top) and load (bottom).
3.2. Model selection

Due to the earth's rotation and its orbit around the sun, a temperature profile will present two seasonal patterns: time of year and time of day. In addition to these two
patterns, the load series also has a third seasonal pattern, namely day of the week, which is due mainly to human activities on workdays and non-workdays. We capture the seasonal patterns by inheriting the calendar variables from the vanilla model, such as Month, representing the
Table 1. Candidate models.

Predictor candidates                          M1  M2  M3  M4  M5  M6  M7  M8
Trend                                         ×   ×   ×   ×   ×   ×   ×   ×
Load, Load², Load³                            ×   ×   ×   ×   ×   ×   ×   ×
Hour                                              ×   ×   ×   ×   ×   ×   ×
Month                                                 ×   ×   ×   ×   ×   ×
Weekday                                                   ×   ×   ×   ×   ×
Month × Load, Month × Load², Month × Load³                    ×   ×   ×   ×
Hour × Load, Hour × Load², Hour × Load³                           ×   ×   ×
Hour × Weekday                                                        ×   ×
Month × Hour                                                              ×

Table 2. Ex-post temperature forecasting results from the candidate models.

Model   MAE (°F)
M1      4.30
M2      3.14
M3      2.97
M4      2.96
M5      2.93
M6      2.85
M7      2.79
M8      2.74
Table 3. Customizing model M8 by adding lagged and leading values of the load.

Model   Equation                                    MAE (°F)
R1      M8 + Load(t−1)                              2.54
R2      M8 + Load(t−1) + Load(t−2)                  2.51
R3      M8 + Load(t−1) + Load(t−2) + Load(t−3)      2.50
R4      R3 + Load(t+1)                              2.42
R5      R3 + Load(t+1) + Load(t+2)                  2.38
R6      R3 + Load(t+1) + Load(t+2) + Load(t+3)      2.37
12 months of a year; Hour, representing the 24 hours of a day; and Weekday, representing the seven days of a week. In addition, following the structure of the vanilla model, we also consider as candidate variables the interactions between these calendar variables and polynomials of the load. Trend is a quantitative variable for capturing a possible overall increasing or decreasing trend of the temperature. Although the temperature is not affected significantly by human activities on a short time scale, we include this variable in order to make the model more general. Appendix A discusses the pros and cons of the Trend variable, and Table 1 lists all candidate variables and the eight corresponding models, denoted by M1 to M8.

We select the best model based on its ex-post temperature forecasting performance. Two years (namely 2008 and 2009) of hourly load and temperature data from GEFCom2014 are used as the training data for forecasting the hourly temperatures of 2010, given the actual hourly loads of 2010. The mean absolute error (MAE) is used to measure the performance:

MAE = (1/n) ∑_{t=1}^{n} |Actualt − Forecastt|.   (2)

The MAE values of the eight candidate models are listed in Table 2. The most accurate model among these eight candidates is M8:

Tt = β0 + β1 Trendt + β2 Mt + β3 Ht + β4 Wt + β5 Mt Ht + β6 Wt Ht + f(Lt, Mt, Ht),   (3)

where

f(Lt, Mt, Ht) = β7 Lt + β8 Lt² + β9 Lt³ + β10 Mt Lt + β11 Mt Lt² + β12 Mt Lt³ + β13 Ht Lt + β14 Ht Lt² + β15 Ht Lt³.   (4)

3.3. Model improvement
Lagged temperature variables are strong predictors of the electricity demand, a fact which Wang et al. (2016) referred to as the recency effect. Similarly, we predict the temperature by building six recency-effect models that add polynomials of lagged and leading load values, together with their interactions with Month and Hour. Table 3 presents the forecasting results using the customized models. Adding the lagged and leading load values of the surrounding three hours (R6) improved the MAE by 13.5%, from 2.74 °F for M8 down to 2.37 °F for R6. Typically, the load profiles of holidays are not the same as those of regular days; they also differ with the day of the week that a holiday falls on and with the characteristics of the location. Of all federal holidays in the United States, New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving Day and Christmas Day are usually known as the "big six" holidays, because they are observed by many business entities and government agencies. The prediction accuracy of the temperature on holidays can be improved by 5.9% (from 2.56 °F to 2.41 °F) when the holiday effects are considered, and the MAE decreases from 2.37 °F to 2.36 °F over the entire forecasting period including both holidays and regular days. Since the improvement from adding holiday effects is minor, we use model R6 in Table 3 without holiday effects to keep the presentation concise. The code attached to the paper does not include holiday information either. Interested readers may wish to fine-tune the model further by adding holiday effects.
4. Model-based anomaly detection

Model-based anomaly detection compares the observations with the predictions from a model estimated using the same set of data. If the deviation is beyond a certain threshold, the corresponding observation is treated as an anomaly. The procedure can be executed in three steps: (1) estimate the model using the data to be examined; (2) calculate error statistics; (3) set a threshold, and label the observations that exceed the threshold as anomalies. Luo, Hong, Fang (2018) and Luo, Hong, Yue (2018) used this methodology for detecting anomalies in the load data. In this paper, the error statistics in the second step include the mean (µ) and standard deviation (σ) of the absolute errors, while the threshold is set to µ + σ. Fig. 3 shows the flowchart of the proposed methodology for temperature anomaly detection.
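The three-step procedure with the µ + σ threshold can be sketched in a few lines. This is a minimal illustration on synthetic data, not the paper's supplementary code:

```python
import numpy as np

def detect_anomalies(actual, predicted):
    """Flag observations whose absolute error exceeds mu + sigma, where
    mu and sigma are the mean and standard deviation of the absolute
    errors (the threshold used in this paper)."""
    abs_err = np.abs(np.asarray(actual) - np.asarray(predicted))
    mu, sigma = abs_err.mean(), abs_err.std()
    return abs_err > mu + sigma  # boolean mask of flagged observations

# Illustration: predictions track the series closely except for 20
# deliberately corrupted observations at the start.
rng = np.random.default_rng(1)
temps = rng.normal(60.0, 10.0, 1000)
preds = temps + rng.normal(0.0, 2.0, 1000)
temps[:20] += 30.0  # corrupted readings
flags = detect_anomalies(temps, preds)
```

In the proposed methodology, the predictions come from the load-based temperature model of Section 3, estimated on the same data being examined.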
5. Case study

5.1. GEFCom2014 experiment

The hourly load and temperature data for the years 2008 and 2009 are used as the training data for estimating the model, while the year 2010 is the forecast period. The temperature series is the simple average of all 25 weather stations that were provided to the contestants. We simulated anomalies by selecting n% of the temperature observations in the training data at random and multiplying them by 1 + k%. We can then simulate anomalies at different levels by varying the two parameters n and k. In this paper, we set n from 10 to 40 with increments of 10, and k from 5 to 40 with increments of 5, leading to a total of 32 test cases. The anomalies are then used to test the proposed method. Following Luo, Hong, Fang (2018) and Luo, Hong, Yue (2018), two measures are used for evaluating the detection precision:
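The corruption scheme (select n% of the observations at random and multiply them by 1 + k%) can be sketched as follows; the function name and interface are illustrative, not from the paper's code:

```python
import numpy as np

def corrupt(temps, n, k, seed=0):
    """Select n% of the observations at random and scale them by (1 + k%)."""
    rng = np.random.default_rng(seed)
    out = np.asarray(temps, dtype=float).copy()
    idx = rng.choice(out.size, size=round(out.size * n / 100), replace=False)
    out[idx] *= 1 + k / 100
    return out, idx  # corrupted series and the injected anomaly positions

temps = np.full(1000, 50.0)
corrupted, idx = corrupt(temps, n=10, k=20)
```

Returning the injected positions makes it possible to score a detector against the ground truth, as the FNR and FPR measures below require.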
• the false negative rate (FNR), defined as the ratio of the number of undetected anomalies to the total number of anomalies; and
• the false positive rate (FPR), defined as the ratio of the number of normal points that are detected as anomalies to the number of normal points.

Smaller FNR or FPR values indicate more precise detection performances. Fig. 4 depicts the resulting FNR and FPR values, and shows that, for a given n, FNR and FPR decrease as k increases. In other words, the proposed temperature anomaly detection method is more effective on more heavily corrupted datasets. In the most corrupted case (n = 40 and k = 40), the detection method was able to detect 72% of the simulated anomalies. We demonstrate the effectiveness of temperature anomaly detection in load forecasting by further comparing the forecast accuracies with and without the temperature anomaly detection procedure. The vanilla model
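The two measures can be computed directly from boolean masks of the true and flagged anomaly positions; a small sketch (our own helper, not from the paper):

```python
import numpy as np

def fnr_fpr(is_anomaly, flagged):
    """FNR = undetected anomalies / all anomalies;
    FPR = normal points flagged as anomalies / all normal points."""
    is_anomaly, flagged = np.asarray(is_anomaly), np.asarray(flagged)
    fnr = (is_anomaly & ~flagged).sum() / is_anomaly.sum()
    fpr = (~is_anomaly & flagged).sum() / (~is_anomaly).sum()
    return fnr, fpr

# Tiny example: 4 anomalies (1 missed), 4 normal points (1 falsely flagged).
truth = np.array([True, True, True, True, False, False, False, False])
found = np.array([True, True, True, False, True, False, False, False])
```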
is used as the load forecasting model here, but can be replaced by other sophisticated models. In each test case, we cleanse the corrupted data by simply removing the detected anomalies. We then apply the vanilla model to both the raw and cleansed data in order to check whether the forecast error is reduced. This paper uses the mean absolute percentage error (MAPE) to measure the load forecasting performance:
MAPE = (100/n) ∑_{t=1}^{n} |(Actualt − Predictt) / Actualt|.   (5)
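Eq. (5) translates directly into code; a minimal sketch assuming NumPy:

```python
import numpy as np

def mape(actual, predict):
    """Mean absolute percentage error of Eq. (5), in percent."""
    actual = np.asarray(actual, dtype=float)
    predict = np.asarray(predict, dtype=float)
    return 100.0 / actual.size * np.abs((actual - predict) / actual).sum()
```

For example, forecasts of 110 and 180 against actuals of 100 and 200 are each off by 10%, giving a MAPE of 10%.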
Table 4 compares the MAPE values of the corrupted and cleansed data using a heat map, with cooler colors (green) indicating lower MAPEs and warmer colors (red) indicating higher MAPEs. Each MAPE value is the simple average of 20 simulations. While the MAPE increases as the corruption level goes up, the proposed anomaly detection method alleviates the impact of the corrupted data. Moreover, the data cleansed by the proposed method led to more accurate forecasts in all 32 test cases with corrupted data. When applying data cleansing to the original temperature data, the resulting MAPE is 6.16%, which is very similar to using the original temperature data directly (a MAPE of 6.15%). This is not a surprise, because the temperature data used in GEFCom2014 had already gone through two data cleansing processes before being released to the contestants: one by a commercial weather service provider and the other by the analyst working at the utility company.

5.2. ISONE experiment

We further test our proposed method on the ISONE dataset to show that its effectiveness is not limited to one specific dataset. In summary, the findings and conclusions of the GEFCom2014 experiment hold here as well. Since the ISONE dataset covers eight load zones from six states of the U.S. and seven weather stations, we focus on this multi-regional aspect in the ISONE experiment to keep the presentation concise. In this paper, 2014 and 2015 are used as the training data, while the forecast period is 2016. We consider all pairs of load zones and weather stations, resulting in 56 test cases, and implement the proposed temperature prediction model for each pair. The mean absolute error (MAE) is used as the measure for evaluating the temperature predictions. The default pairs published by ISONE are based on the simple assumption that the closest weather station to a load zone will be best at explaining the variation of the load in that zone.
In practice, the closest weather station may or may not be the best one: a weather station is surrounded by load zones at various distances, any of which may exhibit the strongest correlation between load and temperature. Moreover, in reality there may not be a weather station close to a load zone, which is why we show that using different load zones for
Fig. 3. Flowchart of the proposed anomaly detection method.
Fig. 4. FNR and FPR values in different corruption scenarios.

Table 4. Load forecasting performances (in average MAPEs) for corrupted and cleansed data.
a specific weather station could result in different predictions. We did this to demonstrate the practical value of the proposed methodology. Table 5 shows the MAEs of temperature predictions for all pairs of load zones and weather stations on a heat map, with cooler colors (green) indicating lower MAEs and warmer colors (red) indicating higher MAEs. For instance, when the temperature from weather station BOS
is estimated using the NEMASSBOST load, the MAE is 4.15 ◦ F. Of all eight load zones, NEMASSBOST is the best one for modeling the temperature of BOS. Meanwhile, BOS is the best of the seven weather stations for modeling the load of NEMASSBOST. Both the distance between a weather station and a load zone and the location of the load zone may affect the quality of the temperature predictions. For example,
Table 5. The temperature predictions (in MAEs) for all 56 load zone/weather station pairs of ISONE.

Table 6. The anomaly detection ratios (%) for all load zone/weather station pairs of ISONE.

Zone/WS       PWM    CON    BTV    BDL    PVD    ORH    BOS
ME            7.41   7.49   8.11   7.71   8.26   7.21   8.06
NH            5.50   5.91   7.69   6.66   6.72   5.99   3.40
VT            7.63   7.20   6.84   7.19   7.65   7.09   7.56
CT            7.32   6.73   8.19   5.64   6.79   6.27   7.48
RI            7.36   6.06   8.21   6.35   6.07   6.17   6.57
SEMASS        7.13   5.66   8.41   6.58   5.98   5.84   7.12
WCMASS        7.16   6.10   8.14   6.25   6.52   6.03   6.87
NEMASSBOST    7.14   5.89   8.45   6.31   6.32   5.70   6.56
the predictions of the BTV temperature are not good from any load zone. Geographically, this weather station is far away from all load zones except VT, which explains why the VT load is the best for modeling the BTV temperature. We apply the proposed temperature anomaly detection method to each of the 56 test cases. Table 6 shows the percentages of detected anomalies in the historical temperature data. All of these percentages are relatively low compared with the corrupted data in the GEFCom2014 experiment presented earlier in this paper. Again, this is not a surprise, because ISO New England, as the independent system operator controlling the transmission grid of the six northeastern states, is diligent about publishing high-quality data on its website. Its weather data are commercial grade and have already gone through rounds of quality control before being used for load forecasting. The temperature profiles shown in Fig. 5 highlight some of the anomalies detected in the temperature data published by ISO New England. For instance, in the middle of the bottom panel, the temperature profile stays flat for several hours in a row, which is rare in the real world. Kanda and Veguillas (2019) eliminated several weather stations for the same reason in GEFCom2017. The cause of erroneous non-varying temperature data might be either incorrect readings or a deficient procedure for replacing missing values. The graph shows that the proposed method can detect many anomalies that are missed by the state-of-the-art methods used by weather service providers. Although Fig. 5 highlights many observations, it is possible that not all of them are erroneous. As was shown in
Fig. 4, the proposed method does have false positives. The better the data quality, the higher the false positive ratio. One way of reducing the false positive ratio is to raise the threshold discussed in Section 4 and illustrated in Fig. 3. However, one consequence of raising the threshold is an increase in the false negative ratio. Although fine-tuning the threshold could lead to better detection results, it is outside the scope of this paper, the focus of which is the core idea of predicting temperatures using load information. The ultimate goal of this paper is to improve the load forecasting accuracy. Thus, we subsequently forecast the load of each load zone for each test case using the raw and cleansed temperature data, respectively. Overall, removing the detected anomalies from the raw data improved the forecast accuracy in 33 of the 56 cases.

5.3. Load data cleansing vs. temperature data cleansing

To demonstrate the advantages of temperature data cleansing for load forecasting performance, we also conduct load data cleansing, using the same model-based approach as Luo, Hong, Fang (2018) and Luo, Hong, Yue (2018) with the same threshold (µ + σ). The load forecasting performance is tested in four cases: using the raw data, cleansing the temperature data only, cleansing the load data only, and cleansing first the temperature data and then the load data. Table 7 lists the MAPE values for each zone with the best-pairing weather station. Temperature cleansing wins seven of the eight zones compared with using raw data. On average, temperature data cleansing reduces the MAPE from 4.24% to 4.20%. Compared with cleansing the load only, cleansing both the temperature and the load wins seven of the eight zones. On average, adding temperature data cleansing reduces the MAPE from 4.19% to 4.15%. Overall, the initial temperature data cleansing improves the data quality and consequently enhances the accuracy of the load forecasts.
In other words, although the weather data published by ISONE are already of a high quality, the detection method can still work effectively and help to reduce the load forecast errors. We illustrate the difference between load data cleansing and temperature data cleansing further by plotting in Fig. 6 the load anomaly detection results for the same two months as in Fig. 5. While there are overlaps between the
Fig. 5. Temperature profiles for two sample months in the ISONE case study. The detected points are indicated by small red circles.

Table 7. MAPEs (in %) of the load forecasts for the eight matching ''load zone–weather station'' pairs (before and after data cleansing).

Load zone     Raw data   Cleanse temperature   Cleanse load   Cleanse temperature, then load
ME            3.75       3.74                  3.70           3.70
NH            3.85       3.79                  3.82           3.73
VT            4.75       4.71                  4.81           4.76
CT            4.88       4.81                  4.64           4.55
RI            3.88       3.85                  3.87           3.83
SEMASS        4.45       4.42                  4.44           4.40
WCMASS        4.31       4.33                  4.34           4.33
NEMASSBOST    4.04       3.97                  3.92           3.90
Average       4.24       4.20                  4.19           4.15
anomalies detected for load and temperature, there are many observations that are identified by only one of the two anomaly detection methods. In practice, the two methods can be used together. In fact, as Table 7 shows, using the two together achieves the lowest MAPE on average.

6. Conclusion

Inspired by the fact that the temperature is a key driver of the electricity demand, we propose an anomaly detection method for the temperature data that makes use of local load information. The method consists of two key components: a load-based temperature prediction model and a model-based anomaly detection procedure. The temperature prediction model is a regression model that
uses calendar variables, lagged and leading load values, and the interactions among them. The estimated temperature profile is then used as a baseline, or reference, for detecting the anomalies. We ensured the reproducibility of our research, and demonstrated that the proposed method is not specific to a single dataset or location, by using two public datasets covering seven states of the United States to set up the computational experiments. In both experiments, the proposed method leads to improved load forecasts. The significance of this work lies in its interdisciplinary nature, in building a bridge between meteorology and energy. Over the past several decades, meteorologists have been analyzing weather data and developing numerical weather prediction models without using the electricity demand as an input. On the other hand, load forecasters have been taking the published weather history in full faith without devoting much effort to cleansing the weather data. This research demonstrates the benefits of marrying the two domains by using load data to validate the temperature data. It also opens the door to using data cleansing for other weather variables that are used in load forecasting models, such as the humidity and wind speed; similar approaches could be tested on these variables to improve the quality of the predictors and the forecast accuracies.

Acknowledgment

This research work was originally inspired by the two meteorologists and GEFCom2017 winners Kanda and
Fig. 6. Load profiles for two sample months in the ISONE case study. The detected points are indicated by small red circles.

Table A.1. The temperature predictions (in MAE) for all 56 load zone/weather station pairs of ISONE.
Quintana Veguillas, who, during their presentation at the 2017 International Symposium on Energy Analytics, pointed out that temperature profiles should not stay flat for several consecutive hours. This work is supported in part by the National Science Foundation under Grant No. 1839812, and the U.S. Department of Energy, Cybersecurity for Energy Delivery Systems Program under Grant M616000124.

Appendix A. The trend variable
The Trend variable was included in the load-based temperature prediction model to capture possible overall increases or decreases in temperature. Although the weather is not influenced by human activities, we allowed for a possible trend so as to generalize the model. This section shows that the cost of including Trend is negligible when the temperature data do not exhibit an obvious trend. We repeated the ISONE case study with the Trend variable excluded from the temperature prediction model. Tables A.1 and A.2 are analogous to Tables 5 and 6, respectively, but with Trend excluded from the model. The
Table A.2. The anomaly detection ratios (%) for load zone/weather station pairs of ISONE.

Zone/WS       PWM    CON    BTV    BDL    PVD    ORH    BOS
ME            7.26   7.73   8.30   7.77   8.33   7.29   7.90
NH            5.80   6.02   7.96   6.78   7.15   6.17   4.10
VT            7.75   7.34   6.93   7.29   7.91   7.36   7.82
CT            7.27   6.80   8.22   5.66   6.83   6.35   7.50
RI            7.43   6.33   8.65   6.67   6.69   6.49   7.11
SEMASS        7.12   5.73   8.60   6.66   6.09   6.00   7.11
WCMASS        7.31   6.41   8.57   6.46   7.05   6.38   7.39
NEMASSBOST    7.24   5.93   8.44   6.34   6.35   5.84   6.52
temperature prediction accuracy after removing the trend from the model differs by only about 3% (higher or lower) on average. The average detection ratio increases by about 2% relative to the initial model.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ijforecast.2019.04.022.

References

Alasali, F., Haben, S., Becerra, V., & Holderbaum, W. (2018). Day-ahead industrial load forecasting for electric RTG cranes. Journal of Modern Power Systems and Clean Energy, 6(2), 223–234.
Ben Taieb, S., Huser, R., Hyndman, R. J., & Genton, M. G. (2016). Forecasting uncertainty in electricity smart meter data by boosting additive quantile regression. IEEE Transactions on Smart Grid, 7(5), 2448–2455.
Berk, K., Hoffmann, A., & Müller, A. (2018). Probabilistic forecasting of industrial electricity load with regime switching behavior. International Journal of Forecasting, 34(2), 147–162.
Bracale, A., Carpinelli, G., De Falco, P., & Hong, T. (2019). Short-term industrial reactive power forecasting. International Journal of Electrical Power & Energy Systems, 107, 177–185.
Charlton, N., & Singleton, C. (2014). A refined parametric model for short term load forecasting. International Journal of Forecasting, 30(2), 364–368.
Fan, S., & Hyndman, R. J. (2012). Short-term load forecasting based on a semi-parametric additive model. IEEE Transactions on Power Systems, 27(1), 134–141.
Hong, T. (2015). Crystal ball lessons in predictive analytics. EnergyBiz, 12(2), 35–37.
Hong, T., & Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. International Journal of Forecasting, 32(3), 914–938.
Hong, T., Pinson, P., & Fan, S. (2014). Global energy forecasting competition 2012. International Journal of Forecasting, 30(2), 357–363.
Hong, T., Pinson, P., Fan, S., Zareipour, H., Troccoli, A., & Hyndman, R. J. (2016). Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond. International Journal of Forecasting, 32(3), 896–913.
Hong, T., Wang, P., Pahwa, A., Gui, M., & Hsiang, S. M. (2010). Cost of temperature history data uncertainties in short term electric load forecasting. In 2010 IEEE 11th international conference on probabilistic methods applied to power systems (pp. 212–217). IEEE.
Hong, T., Wang, P., & White, L. (2015). Weather station selection for electric load forecasting. International Journal of Forecasting, 31(2), 286–295.
Hong, T., Wilson, J., & Xie, J. (2014). Long term probabilistic load forecasting and normalization with hourly information. IEEE Transactions on Smart Grid, 5(1), 456–462.
Hong, T., Xie, J., & Black, J. (2019). Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. International Journal of Forecasting, 35(4), 1389–1399.
Hyndman, R. J., & Fan, S. (2010). Density forecasting for long-term peak electricity demand. IEEE Transactions on Power Systems, 25(2), 1142–1153.
Kanda, I., & Veguillas, J. M. Q. (2019). Data preprocessing and quantile regression for probabilistic load forecasting in GEFCom2017 final match. International Journal of Forecasting, 35(4), 1460–1468.
Luo, J., Hong, T., & Fang, S.-C. (2018). Benchmarking robustness of load forecasting models under data integrity attacks. International Journal of Forecasting, 34(1), 89–104.
Luo, J., Hong, T., & Fang, S.-C. (2019). Benchmarking robustness of load forecasting models under data integrity attacks. IEEE Transactions on Smart Grid, 10(5), 5397–5404.
Luo, J., Hong, T., & Yue, M. (2018). Real-time anomaly detection for very short-term load forecasting. Journal of Modern Power Systems and Clean Energy, 6(2), 235–243.
Sobhani, M., Campbell, A., Sangamwar, S., Li, C., & Hong, T. (2019). Combining weather stations for electric load forecasting. Energies, 12(8), 1510.
Wang, P., Liu, B., & Hong, T. (2016). Electric load forecasting with recency effect: A big data approach. International Journal of Forecasting, 32(3), 585–597.
Wang, Y., Chen, Q., Hong, T., & Kang, C. (2019). Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Transactions on Smart Grid, 10(3), 3125–3148.
Xie, J., & Hong, T. (2016). GEFCom2014 probabilistic electric load forecasting: An integrated solution with forecast combination and residual simulation. International Journal of Forecasting, 32(3), 1012–1016.
Yue, M., Hong, T., & Wang, J. (2019). Descriptive analytics based anomaly detection for cybersecure load forecasting. IEEE Transactions on Smart Grid, 10(6), 5964–5974.
Masoud Sobhani is a Ph.D. student at the University of North Carolina at Charlotte.

Tao Hong is an Associate Professor of Systems Engineering and Engineering Management at the University of North Carolina at Charlotte.

Claude Martin is a Senior Load Research Analyst at the North Carolina Association of Electric Cooperatives.