# Empirical prediction intervals improve energy forecasting

Edited by B. L. Turner, Arizona State University, Tempe, AZ, and approved July 11, 2017 (received for review December 8, 2016)

## Significance

While many forecasters are moving toward generating probabilistic predictions, energy forecasts typically still consist of point projections and scenarios without associated probabilities. Empirical density forecasting methods provide a probabilistic amendment to existing point forecasts. Here we lay the groundwork for evaluating the performance of these methods in the data-scarce setting of long-term forecasts. Results can give policy analysts and other users confidence in estimating forecast uncertainties with empirical methods.

## Abstract

Hundreds of organizations and analysts use energy projections, such as those contained in the US Energy Information Administration (EIA)’s Annual Energy Outlook (AEO), for investment and policy decisions. Retrospective analyses of past AEO projections have shown that observed values can differ from the projection by several hundred percent, and thus a thorough treatment of uncertainty is essential. We evaluate the out-of-sample forecasting performance of several empirical density forecasting methods, using the continuous ranked probability score (CRPS). The analysis confirms that a Gaussian density, estimated on past forecasting errors, gives comparatively accurate uncertainty estimates over a variety of energy quantities in the AEO, in particular outperforming scenario projections provided in the AEO. We report probabilistic uncertainties for 18 core quantities of the AEO 2016 projections. Our work frames how to produce, evaluate, and rank probabilistic forecasts in this setting. We propose a log transformation of forecast errors for price projections and a modified nonparametric empirical density forecasting method. Our findings give guidance on how to evaluate and communicate uncertainty in future energy outlooks.

Projections of quantities such as electricity and fuel demands, commodity prices, and specific energy consumption and production rates are widely used to inform private and public investment decisions, long-term strategies, and policy analysis (1–3). Policy analysts and decision makers often use modeled projections as forecasts with little or no discussion about the associated uncertainty (2, 4, 5). [Energy outlooks are often referred to as projections because they refrain from incorporating future policy changes into the reference scenario. In contrast, the term forecast denotes a best estimate allowing for all changes of the state of the world (6). While we are aware of this difference, our analysis treats the reference scenario as the best estimate forecast. We use the terms forecast and projection interchangeably.] Here we are concerned with national-scale forecasts in the energy industry that span a range from years to decades. Two of the most influential sets of energy projections are those of the US Energy Information Administration (EIA) and the International Energy Agency (IEA), complemented by those made by private oil and gas companies, such as Shell, ExxonMobil, and Statoil. When assessed retrospectively, such energy projections have sometimes shown very large deviations from the realized values (7–9). Providing information on the likely uncertainty associated with such projections would help individuals and organizations use them in a more informed manner (10–12).

All of the energy outlooks mentioned above provide point projections without a probabilistic treatment of uncertainty. Often, point forecasts are labeled as a “reference scenario” and are accompanied by alternative scenarios. While scenarios may be used to bound a range of possible outcomes, they can easily be misinterpreted (13) and are typically not intended to reflect any treatment of probability. The fact that most projections in the energy space do not report probability distributions around predicted values, or an expected variance, is a problem that has been frequently noted in the literature (13–17). Shlyakhter et al. (14) criticize the EIA for not treating uncertainty in the Annual Energy Outlook (AEO). Density forecasting is increasingly becoming the standard (16, 18) in a variety of disciplines ranging from forecasts of inflation rates (19–21), financial risk management, and trading operations (22, 23) to demographics (24), peak electricity demand (25), and wind power generation (26, 27). There are a number of procedures for probabilistic forecasting (22). Most of these methods take an integrated approach to forecast the whole distribution, including the best estimate. The empirical methods we use here instead allow analysts or forecast users to attach an uncertainty distribution to a preexisting point forecast.

The importance of density forecast evaluation has been discussed by several authors (17, 28–30). When methods are chosen to generate probabilistic energy forecasts, such evaluation is often omitted. Our work is a step toward making energy density forecasting more feasible and robust by framing how to evaluate a probabilistic forecast in this setting.

### Choosing a Density Forecasting Method.

We compare different methods by testing how accurately they estimate the uncertainty of data that were not used to train the methods.

We argue that if a forecaster is choosing between different methods, this should be the central criterion, even though others such as usability and ease of explanation might also be relevant. Adopting a frequentist’s approach, we view a future observation as a random event around the given forecast. A density prediction is best if it equals the probability density function (PDF) from which this future observation is drawn.

Density forecasts are evaluated by their calibration and their sharpness subject to calibration (29). By sharpness we mean that narrower PDFs are preferable. Calibration, as a core concept of forecast evaluation, refers to how well the predictive density represents the true PDF of the observation. Measuring calibration requires observations that were not available when the forecast was made. This can be simulated by using an early portion of the time series to train the density prediction and using later actual values as the test observations. This procedure is referred to as out-of-sample forecast evaluation. Dividing the data into these two sets requires a record of historical data and forecasts long enough to draw statistically significant conclusions. While the AEO sample size is small, we see no viable alternative to this procedure and find that even small-sample results can provide useful insights.

As it is a measure of both calibration and sharpness, we use the continuous ranked probability score (CRPS) (30–32) to compare density forecasts. For point forecast evaluation we work with the average prediction error, here the mean absolute percentage error (MAPE), and the transformed mean absolute logarithmic error (MALE) for prices (*Materials and Methods*).

### Empirical Density Prediction Methods.

We compare four different data-driven parametric and nonparametric estimates of forecast uncertainty in the form of PDFs (Table 1 and *Materials and Methods*). A simple method of empirical prediction intervals (EPIs), first published by Williams and Goodman (33), uses the distribution of past forecast errors to create a probability density forecast around an existing point forecast. It relies on the assumption that past errors are a good estimator of the forecaster’s current ability to predict the future. EPIs are an established approach and have been used in a number of fields such as meteorology (34), including the creation of the classic “cone of uncertainty” now routinely produced for likely hurricane tracks (35), future commodity prices (36), and the values of macroeconomic variables such as inflation (20). There is a continuing interest in the method from researchers in applied mathematics and statistics (18, 37, 38). We introduce a second nonparametric EPI, which is a modification of Williams and Goodman’s EPI, with a centered error distribution. For a third, parametric, prediction method we use the forecasting errors to estimate a Gaussian density forecast. A parametric PDF has the advantage of greater ease of use. We use the volatility of the time series of historical values to inform a fourth probabilistic forecast, which is valuable in cases where the forecasting record is short.
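The EPI construction described above can be sketched in a few lines. This is an illustrative simplification rather than the paper's exact implementation (the analysis itself was performed in R); the error values and interval levels below are made up.

```python
import numpy as np

def empirical_prediction_interval(point_forecast, past_errors,
                                  levels=(0.05, 0.5, 0.95), center=False):
    """Attach empirical quantiles of past relative errors to a point forecast.

    past_errors holds past relative errors, e.g., (observed - forecast) / forecast.
    center=False mimics the Williams-Goodman EPI (NP1), whose median can
    differ from the point forecast; center=True subtracts the median error
    so the density is centered on the point forecast (NP2).
    """
    errors = np.asarray(past_errors, dtype=float)
    if center:
        errors = errors - np.median(errors)  # NP2: force a median error of zero
    quantiles = np.quantile(errors, levels)
    # Translate the error quantiles back into values of the forecasted quantity.
    return {lvl: point_forecast * (1.0 + q) for lvl, q in zip(levels, quantiles)}

# Hypothetical point forecast of 4,000 (say, billion kWh) with five past errors:
interval = empirical_prediction_interval(
    4000.0, [-0.10, -0.02, 0.01, 0.04, 0.08], center=True)
```

With `center=True`, the median of the resulting density coincides with the point forecast, which is the distinction between the NP_{1} and NP_{2} methods discussed below.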

We apply the four different methods to 18 quantities in EIA’s AEO (39), which are chosen based on EIA’s Retrospective Review (40) (*Materials and Methods*). The AEO forecasting record spans more than 30 years but remains, in the context of forecast evaluation, a small sample.

We begin by evaluating the point forecast performance of the AEO reference case over our test range of AEO 2003–2014. Using the same out-of-sample AEOs and historical observations, we then compare the calibration and sharpness of the four different density forecasts. The prediction intervals are also compared with the scenarios published in the AEO. We find that over the test range a normal distribution based on past forecasting errors clearly outperformed uncertainties based on the scenarios in the AEO. This conclusion holds across the diverse set of quantities as a whole, although for some individual quantities other methods showed better results. We conclude the paper with a comparative discussion of the methods and their applicability to energy forecasting.

## Results

We evaluate the predictive performance of four uncertainty estimation methods (Table 1) over the test range of AEO 2003–2014 and observations of 2002–2015, using 1985–2002 as the training range. The test range excludes AEO 2009, which did not provide scenarios for the updated reference case. We determine the number of quantities for which a method performed best. We find that Gaussian densities informed by retrospective errors (G_{1}) or based on the variability of the historical values (G_{2}) performed best for the most quantities. The original nonparametric method, as in ref. 33 (NP_{1}), performed best in very few cases. The centered nonparametric distribution (NP_{2}), which gives the largest weight to the AEO reference case projection instead of the bias, performed better over the test range than NP_{1}. The respective best empirical uncertainty estimation methods had significantly better calibration than methods based on the AEO scenarios with 95% confidence. In fact, G_{1} significantly outperformed the scenarios for all quantities and provided a valid general approach to estimate the uncertainty in the AEO.

While we have performed analysis for 18 quantities forecasted in the AEO, we use 2 of the quantities, natural gas wellhead price in nominal dollars per 1,000 cubic feet (hereafter natural gas price) and total electricity sales in billion kilowatt hours (hereafter electricity sales), for illustration purposes (Figs. 1 and 2). Results for all 18 quantities can be found in *SI Appendix*.

### Error Metric and Transformation for Price Quantities.

All forecast evaluation scores are computed on the basis of the deviations of the forecasts from the observed values, expressed as relative errors and, for price quantities, as logarithmic errors (*SI Appendix*).

The structure of the relative errors as a function of forecast year and forecast horizon is shown in Fig. 3. The horizon is the number of years between the release of an AEO and the year being forecast.

### Retrospective Analysis Can Inform Density Forecasts.

We illustrate examples of the four probabilistic forecasting methods listed in Table 1. Figs. 1 and 2 compare the nonparametric methods to the methods that performed better for the two example quantities, that is, the two Gaussian predictions.

A nonparametric distribution of the errors (NP_{1}) results in the EPI shown in Fig. 1*A*. Here the median of the errors is not exactly zero, which is often referred to as bias. We see that this results in a second point forecast or a best estimate forecast that is not equal to the reference case scenario. If we can assume that the forecasting errors are stationary, then past and future errors follow the same PDF, and this bias should yield a better point forecast than the reference case. However, we found this is not the case for most quantities.

Modifying the nonparametric distribution in such a way that it places the greatest weight on the AEO reference case projection is one approach to combat this problem (NP_{2}). This centered EPI for electricity sales is shown in Fig. 2*A*. In the percentage-error space, we center by subtracting the median error from each past error (*SI Appendix*).

These two nonparametric estimations are compared with two parametric distributions, Gaussians with a mean of zero and the variance of the errors (G_{1}) (Fig. 2*B*) and with the variance of historical values (G_{2}) (Fig. 1*B*). When modeling normality, we implicitly make assumptions about the nature of the errors. Extreme errors, which can have large consequences for decision making, occur frequently in energy forecasting (14). A Gaussian PDF may not do an adequate job of representing heavier tails and might underestimate the probability of extreme events. However, a parametric distribution will generate longer tails than a nonparametric error PDF. Regarding usability, the simplicity of a two-parameter specification prevails over nonparametric distributions. A discussion of normality and correlation in the errors is provided in *SI Appendix*.
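A G_{1}-style parametric density can be sketched as follows. This is an illustrative sketch with made-up numbers, not the paper's implementation: it fits a zero-mean Gaussian to past relative errors and maps its quantiles back onto the point forecast.

```python
import numpy as np
from statistics import NormalDist

def gaussian_error_density(point_forecast, past_errors):
    """G1-style density: a zero-mean Gaussian in relative-error space whose
    SD is estimated from past forecasting errors."""
    sd = float(np.std(past_errors, ddof=1))
    dist = NormalDist(mu=0.0, sigma=sd)
    def quantile(p):
        # Map an error-space quantile back to the forecasted quantity.
        return point_forecast * (1.0 + dist.inv_cdf(p))
    return sd, quantile

# Hypothetical point forecast of 100 with five past relative errors:
sd, q = gaussian_error_density(100.0, [-0.08, -0.01, 0.02, 0.05, 0.10])
low, high = q(0.025), q(0.975)  # a central 95% interval around the point forecast
```

Because the mean is fixed at zero, the density is centered on the reference case, and the whole forecast is summarized by the single parameter `sd`, which is what makes this specification easy to report and reuse.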

### Past Bias in the AEO Does Not Predict Future Bias.

Recently, electricity sales have been flat. Can a forecast be better than a constant prediction using the last observation, i.e., persistence? We can assess the point forecasting skill of the AEO reference case projections by comparing them with benchmark forecasts such as persistence or a simple linear regression. To compare different point forecasts, we evaluate the MAPE and, for prices, the MALE. MAPE and MALE are defined as the average of the absolute values of all observed errors for a given horizon (*Materials and Methods*). A larger MAPE/MALE indicates that the forecast has performed worse over the test range 2003–2014 (Fig. 4).
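The two point forecast metrics can be computed as in this short illustrative sketch; the observed and forecast values are made up, and the sign convention of the error is an assumption.

```python
import numpy as np

def mape(observed, forecast):
    """Mean absolute percentage error for one horizon, in percent."""
    o, f = np.asarray(observed, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((o - f) / o))

def male(observed, forecast):
    """Mean absolute logarithmic error, used here for price quantities."""
    o, f = np.asarray(observed, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(np.log(o / f))))

# Hypothetical observed values and forecasts for one horizon:
err = mape([100.0, 120.0, 90.0], [110.0, 115.0, 95.0])
```

The logarithmic form of the MALE treats over- and underpredictions of prices symmetrically in ratio terms, which is the motivation for the log transformation of price errors proposed in the paper.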

We find that persistence performed surprisingly well over the test range of the last decade, outperforming the AEO for 10 of the 18 quantities. This is because the recent decade has seen trend changes that are conducive to persistence forecasts. If the length of the fitted window is optimized for the test range, a simple linear regression significantly outperforms the reference case for eight quantities with 95% confidence. Point forecast comparison of the AEO reference case with the bias-corrected forecast (the reference case adjusted by the median of the errors) reveals that correcting for the bias is not a good strategy in most cases. The AEO reference case was the better point forecast for most of the quantities over the test range, the exceptions being coal production and residential energy consumption. We therefore anticipate that centering the nonparametric uncertainty (NP_{2}) is advisable for all quantities except those two.

### Gaussian Density Forecasts Often Perform Well.

Scoring rules, or scores, provide a means for comparing the performance of different probabilistic forecasts. We use the CRPS, which is a strictly proper score in this case (31). It assigns value not only to the predicted probability of an observation but also to the distance of a predicted probability mass from an observation. It is therefore relatively robust to specific functional forms of the density forecasts (30) and allows for comparison with point and ensemble forecasts (31, 32) (*Materials and Methods*).
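For a Gaussian density forecast, the CRPS has a well-known closed form (Gneiting and Raftery), which can be sketched as follows. This is an illustrative sketch, not the paper's code, and the example values are made up.

```python
from math import erf, exp, pi, sqrt

def crps_gaussian(mu, sigma, obs):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) for one
    observation. Smaller is better; as sigma -> 0 it reduces to the
    absolute error |mu - obs| of a point forecast."""
    z = (obs - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF at z
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / sqrt(pi))

score = crps_gaussian(mu=0.0, sigma=1.0, obs=0.0)  # about 0.234
```

The reduction to the absolute error for a vanishing spread is what makes the CRPS directly comparable to the MAPE/MALE of a point forecast, as used in the comparison below.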

The results of the average CRPS over the test range for each horizon, in units of relative or log error, are illustrated in Fig. 5. A standalone value of the CRPS is not meaningful; it serves to provide a comparison between different methods. As the CRPS reduces to the MAPE/MALE for a point forecast, it is informative to compare the results to the MAPE/MALE of the AEO reference case. In Fig. 5, we find that the scenarios (S) only marginally improve the prediction with respect to the point forecast. In addition, we see that for the natural gas price, the CRPS of NP_{1} is larger than the MALE, due to poor point forecast performance of the EPI’s median.

To find the best density prediction method, we normalize the CRPS of each method by the CRPS of the scenario ensemble (S) for every horizon (Fig. 6). For every quantity, we then average over a core range of horizons (*SI Appendix*).

The ranking of all quantities shows that the two Gaussian methods perform well for most quantities (Fig. 7). G_{1} is the best method for 9 of the 18 quantities and G_{2} for 3 quantities. The performance of G_{2} is, however, often similar to that of G_{1}, and it is second best for 8 quantities. The fact that these parametric methods performed well over the test range is convenient, because there are standard ways to use a normal distribution as a model input. Besides these parametric methods, NP_{2} also performed well. As expected, in the two cases of coal production and residential energy consumption, including the bias with NP_{1} seemed the best approach over the test range. In the following section, we analyze whether the empirical methods performed significantly better than uncertainty estimates based on the scenarios.

### AEO Scenario Ranges Are Narrower Than Observed Uncertainties.

Every AEO includes a number of scenarios, intended as sensitivity studies on the reference case under a small number of varied input assumptions. No value is assigned to the probability that a future outcome will lie within the scenario range. The CRPS allows for comparison of a density forecast with an ensemble forecast. It assigns every discrete scenario an equal point probability mass (S). Because of the varying number of scenarios in the AEO, we make a simplification and consider only the reference case and the high- and low-envelope scenarios, which do not correspond to a specific scenario in the AEO (*Materials and Methods*). In addition, we discuss a Gaussian distribution (SP_{1}) and a uniform distribution (SP_{2}) based on the envelope scenarios.
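Treating the reference case and the envelope scenarios as an ensemble with equal point masses, the CRPS can be evaluated with the standard energy form. This is an illustrative sketch with made-up values, not the paper's implementation.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a forecast placing equal point masses on the ensemble members
    (e.g., the reference case and the high/low envelope scenarios), via the
    energy form E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

# Reference case 100 with hypothetical low/high envelope scenarios 90 and 120:
score = crps_ensemble([90.0, 100.0, 120.0], obs=105.0)  # 5.0
```

With a single member the second term vanishes and the score is the absolute error, so the ensemble score can be compared directly against both the point forecast and the density forecasts.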

The CRPS scores normalized by the score of S are shown in Fig. 6, which also includes the scores for the sensitivity cases SP_{1} and SP_{2}. A normalized CRPS smaller than 1 indicates that the empirical method was the better probabilistic forecast over the test range. We find that the best-ranked empirical method for a respective quantity was significantly better than both S and SP_{1} with 95% confidence. In fact, NP_{2}, G_{1}, and G_{2} all show significant improvements (Fig. 7). These results are likely due to the fact that over the test range the scenario range of all AEO quantities was, on average, too narrow to cover the observed outcomes (*SI Appendix*). The width between the highest and the lowest scenario, however, changes greatly from one AEO to another and is somewhat correlated with the number of scenarios published.

## Discussion and Conclusion

This analysis showed that empirical density prediction methods, based on forecasting errors or historical deviations, provide valuable approaches for including an estimate of uncertainty with a forecast. There are empirical methods available for estimating the uncertainty around the AEO reference case, which have proved to be significantly more accurate over the past decade than the scenarios of the AEO. We find that a Gaussian distribution based on past errors (G_{1}) offers a method with convincing ease of use and good performance over the different quantities (Fig. 7). We therefore recommend that the EIA and others producing energy forecasts include the SD of forecast errors in their retrospective reports. We supply the values for AEO 2016 in *SI Appendix*. A nonparametric distribution of the observed forecast errors was the better density forecast only in a few cases, confirming that representing the exact error distribution does not necessarily provide the better out-of-sample forecast. Point forecast evaluation showed that EIA’s forecast bias is in most cases not consistent and that using a bias-corrected reference case typically does not lead to a better forecast.

As both the forecasting process and the energy system can be nonstationary, there is no way to be sure that our results will be applicable to future data. However, the way we evaluated and chose a method is a robust procedure. Hence, in the absence of other insights we recommend using one of the Gaussian distributions.

Despite the advantages of probabilistic forecasts, scenarios convey important information about the workings of energy predictions and allow users to better understand and compare the assumptions. We emphasize that the combined use of a density forecast and scenarios would be a fruitful approach to describing the uncertainty of a forecast. Empirical density forecasts are easily reproducible, but other probabilistic methods such as quantile forecasting could also advance energy projections.

## Materials and Methods

See *SI Appendix* for a detailed description of the materials and methods used.

### Data.

The dataset consists of AEOs 1982–2016 and historical values from 1985 to 2015. Historical data were taken from the EIA Retrospective Review (40) and the AEOs (39), and conversions were applied where necessary. All data are publicly available on the EIA website. Refer to *SI Appendix: Data Description* for more detail. The data analysis was performed in R (44).

### List of Methods.

#### Point forecasting methods.

##### AEO reference case.

We treat the AEO reference case as a point forecast. The reference case is a projection under the current state of laws and regulations and does not represent a best estimate forecast. The EIA likewise chooses the reference case as the best estimate when determining projection errors (40).

##### Median errors (NP_{1}).

The median of the EPI with a nonparametric distribution of the errors (NP_{1}) is computed as the reference case adjusted by the median of past forecasting errors.

##### Persistence.

Persistence refers to a constant forecast equal to the last observation. Here, the persistence benchmark for every horizon equals the last historical value available when the respective AEO was released.

##### Simple linear model.

This benchmark is a simple linear regression with time as the predictor. The quantity is regressed over a moving window of the last seven historical observations. This window size is the optimum for the test range.
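The linear benchmark can be sketched as follows; a minimal illustration with a made-up history, assuming the seven-observation window described above.

```python
import numpy as np

def linear_trend_forecast(history, horizon, window=7):
    """Fit a straight line (time as the predictor) to the last `window`
    observations and extrapolate `horizon` steps beyond the last one."""
    y = np.asarray(history[-window:], dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return intercept + slope * (len(y) - 1 + horizon)

# A perfectly linear history is extrapolated exactly:
f = linear_trend_forecast([1, 2, 3, 4, 5, 6, 7], horizon=3)  # 10.0
```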

#### Density forecasting methods.

##### NP_{1}.

This method is an EPI with a nonparametric distribution of the forecasting errors and a median that can differ from the reference case. It was originally published in ref. 33.

##### NP_{2}.

This method is an EPI with a nonparametric error distribution, which is centered such that the median of the density coincides with the reference case projection.

##### G_{1}.

This method is a Gaussian distribution with the SD of the past errors and a mean and median of zero, i.e., centered on the reference case.

##### G_{2}.

This method is a Gaussian distribution with an SD based on a sample of all relative deviations between pairs of historical data points that are separated by the forecast horizon.
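A rough sketch of how such a historical-volatility SD could be computed follows; the exact sampling used in the paper is described in its SI, and the annual history here is made up.

```python
import numpy as np

def g2_sd(history, horizon):
    """SD for a G2-style Gaussian: sample the relative deviations between
    historical values that lie `horizon` steps apart."""
    h = np.asarray(history, dtype=float)
    rel = (h[horizon:] - h[:-horizon]) / h[:-horizon]
    return float(np.std(rel, ddof=1))

# Hypothetical annual history; uncertainty for a 2-year-ahead forecast:
sd = g2_sd([100.0, 102.0, 99.0, 105.0, 103.0, 108.0], horizon=2)
```

Because this estimate needs only the historical series, not a forecasting record, it is usable when the record of past forecasts is short.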

##### S.

This ensemble forecast consists of the reference case and the highest and lowest scenario projections in every year. These correspond to the envelope of all scenarios by using only the highest and lowest projected values.

##### SP.

Two parametric density predictions are based on the envelope scenarios in the AEO. We chose a Gaussian distribution with the distance to the farthest scenario as 1 SD (SP_{1}) and a uniform distribution between the envelope scenarios (SP_{2}).

### MAPE.

The MAPE is a measure of point forecast performance; with log errors in the case of price forecasts, it becomes the MALE. For a given horizon $H$, both are defined as the average of the absolute values of the observed errors,

$$\mathrm{MAPE}_H = \frac{1}{N_H} \sum_{t} \left| \varepsilon_{t,H} \right|,$$

where $\varepsilon_{t,H}$ is the relative (for the MALE, logarithmic) error of the forecast for target year $t$ at horizon $H$ and $N_H$ is the number of errors observed at that horizon.

### CRPS.

The CRPS for every horizon, as we use it in this paper, is the score of the predictive distribution averaged over the test range,

$$\mathrm{CRPS}_H = \frac{1}{N_H} \sum_{t} \int_{-\infty}^{\infty} \left( F_{t,H}(x) - \mathbf{1}\{x \geq o_{t,H}\} \right)^2 \, \mathrm{d}x,$$

where $F_{t,H}$ is the predictive cumulative distribution for target year $t$ at horizon $H$ and $o_{t,H}$ is the corresponding observation. For the ranking, we normalize $\mathrm{CRPS}_H$ by the $\mathrm{CRPS}_{S,H}$ of the scenario ensemble.

### Improvement Testing.

We perform a bootstrap on the single CRPS results in a horizon sample, which is then used to compute the $\mathrm{CRPS}_H$ and the aggregated CRPS average for the ranking. For each of the four methods, we determine the portion of resampled results that indicates that S or SP_{1} is the better forecast. If this portion is smaller than 0.05, we speak of the method as being a significant improvement over the scenarios.
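The bootstrap comparison can be sketched as follows; an illustrative simplification with made-up paired scores, not the paper's exact resampling scheme.

```python
import numpy as np

def bootstrap_improvement(crps_method, crps_scenarios, n_boot=10000, seed=0):
    """Fraction of paired bootstrap resamples in which the scenario ensemble
    has the lower (better) mean CRPS. A fraction below 0.05 is read as a
    significant improvement of the empirical method over the scenarios."""
    m = np.asarray(crps_method, dtype=float)
    s = np.asarray(crps_scenarios, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(m), size=(n_boot, len(m)))  # resample pairs
    return float((s[idx].mean(axis=1) < m[idx].mean(axis=1)).mean())

# Hypothetical paired scores in which the empirical method is clearly better:
p = bootstrap_improvement([0.10, 0.20, 0.15, 0.12, 0.18],
                          [0.40, 0.50, 0.45, 0.42, 0.48])
```

Resampling the same indices for both score series keeps the comparison paired, so that year-to-year variation common to both forecasts does not inflate the apparent difference.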

### Sensitivity Analysis on the Ranking Results.

To test the sensitivity of the ranking, we varied the default assumptions. Instead of first averaging the normalized CRPS and then ranking that result, we alternatively first ranked the $\mathrm{CRPS}_H$ and then averaged over the horizons. We also averaged over the full range of horizons instead of the core range (*SI Appendix*).

## Acknowledgments

We thank Evan D. Sherwin, Inês L. Azevedo, Cosma R. Shalizi, Alexander L. Davis, Stephen E. Fienberg, and Max Henrion for their advice and assistance. Evan D. Sherwin led the data collection and adjustments. We thank the EIA for hosting a presentation and discussion about this work, in particular Faouzi Aloulou, David Daniels, and John Staub. This work was supported by the Electric Power Research Institute and by the Center for Climate and Energy Decision Making through a cooperative agreement between the National Science Foundation and Carnegie Mellon University (SES-0949710).

## Footnotes

- ^{1}To whom correspondence should be addressed. Email: kaack{at}cmu.edu.

Author contributions: L.H.K., J.A., and M.G.M. designed research; L.H.K., J.A., M.G.M., and P.M. performed research; L.H.K. and P.M. contributed new reagents/analytic tools; L.H.K. analyzed data; and L.H.K., J.A., M.G.M., and P.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1619938114/-/DCSupplemental.

## References

1. Winebrake JJ, Sakva D
2. Wara M, Cullenward D, Teitelbaum R
3. Gilbert AQ, Sovacool BK
4. Neuhauser A, *US News World Rep*. Available at https://www.usnews.com/news/articles/2015/05/28/wasted-energy-the-pitfalls-of-the-eias-policy-neutral-approach. Accessed July 23, 2017.
5. Harvey C, *Wash Post*. Available at https://www.washingtonpost.com/news/energy-environment/wp/2016/05/13/how-we-get-energy-is-changing-rapidly-and-its-sparking-a-huge-fight-over-forecasting-the-future/?utm_term=.987c4550ffc8.
6. Intergovernmental Panel on Climate Change
7. Fischer C, Herrnstadt E, Morgenstern R
8. Linderoth H
9. …
10. Schlaifer R, Raiffa H
11. Morgan MG, Henrion M
12. Fischhoff B, Davis AL
13. Morgan MG, Keith DW
14. Shlyakhter AI, Kammen DM, Broido CL, Wilson R
15. Craig PP, Gadgil A, Koomey JG
16. Gneiting T
17. Vahey SP, Wakerly L, *Globalisation and Inflation Dynamics in Asia and the Pacific* (Bank for International Settlements, Basel, Switzerland), BIS Paper 70b. Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2248763. Accessed July 23, 2017.
18. Gneiting T, Katzfuss M
19. Diebold FX, Tay AS, Wallis KF, in Engle RF, White H, eds
20. Britton E, Fisher P, Whitley J
21. Blix M, Sellin P
22. …
23. …
24. Raftery AE, Li N, Ševčíková H, Gerland P, Heilig GK
25. McSharry PE, Bouwman S, Bloemhof G
26. Taylor JW, McSharry PE, Buizza R
27. Pinson P
28. …
29. Gneiting T, Balabdaoui F, Raftery AE
30. Smith LA, Suckling EB, Thompson EL, Maynard T, Du H
31. …
32. Hersbach H
33. Williams WH, Goodman ML
34. Pinson P, Kariniotakis G
35. NOAA National Hurricane Center, *National Hurricane Center Forecast Verification*. Available at www.nhc.noaa.gov/verification/verify6.shtml. Accessed July 23, 2017.
36. Isengildina-Massa O, Irwin S, Good DL, Massa L
37. Knüppel M
38. Lee YS, Scholtes S
39. US Energy Information Administration, *Annual Energy Outlook*. Available at www.eia.gov/forecasts/aeo/. Accessed July 23, 2017.
40. US Energy Information Administration, *Annual Energy Outlook Retrospective Review*. Available at https://www.eia.gov/forecasts/aeo/retrospective/. Accessed July 23, 2017.
41. O’Neill BC, Desai M
42. Auffhammer M
43. Sprenkle CM
44. R Core Team
## Article Classifications

- Social Sciences
- Sustainability Science