Combining Forecasts for Short Term Electricity Load Forecasting
M. Devaine, P. Gaillard, Y. Goude, G. Stoltz
ENS Paris - EDF R&D - CNRS, INRIA
11/02/2012
[Figure: weights (0.0 to 1.0) against step (0 to 5000)]

Motivation of Electricity Load Forecasting
Electricity cannot be stored, so electricity consumption must be forecast in order to:
- avoid blackouts on the grid
- avoid financial penalties
- optimize the management of production units and electricity trading
This means managing a wide variety of production units:
- nuclear plants
- fuel, coal and gas plants
- renewable energy: water dams, wind farms, solar panels...

Motivation of Electricity Load Forecasting
Short-term load forecasting: horizons ranging from a few hours to one day.

Application to Electricity Load Data: Electricity Data
[Figure: Trend. Load from 1/9/2002 to 31/8/2009, vertical axis from 40000 to 90000.]

Application to Electricity Load Data: Electricity Data
[Figure: Yearly Pattern. Load from 1/1/2006 to 31/12/2006, vertical axis from 30000 to 80000.]

Application to Electricity Load Data: Electricity Data
[Figure: Weekly Pattern. Load from 1/6/2006 to 30/6/2006, vertical axis from 35000 to 55000.]

Application to Electricity Load Data: Electricity Data
[Figure: Daily Pattern. Load (40000 to 70000) against instant (0 to 40), one curve per day of the week (Mo, Tu, We, Th, Fr, Sa, Su).]

Application to Electricity Load Data: Electricity Data
[Figure: Special Days. Load (MW) against instant (0 to 40) for normal and special tariff days, and load (MW) from 20/12/2007 to 4/1/2008.]

Application to Electricity Load Data: Electricity Data
[Figure: Load-Temperature.]

Application to Electricity Load Data: Electricity Data
[Figure: Load-Cloud Cover. Load (MW) and cloud cover (oktas, 0 to 8), each against instant (0 to 40).]

Application to Electricity Load Data: Parametric Models
Operational forecasts: a high-dimensional non-linear regression model, see [Bruhns et al. (2005)].
Metehore Model
Separate the weather-dependent load $L^{WD}_t$ and the weather-independent load $L^{WI}_t$:
$L_t = L^{WD}_t + L^{WI}_t + \varepsilon_t$
- $L^{WD}_t$: cooling and heating effect; felt temperature (exponential smoothing of the real temperature...)
- $L^{WI}_t$: trend; daily, weekly and yearly cycles (regression, Fourier basis)
- $\varepsilon_t$: AR(1 week) process

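The slide describes the felt temperature only as an exponential smoothing of the real temperature. A minimal sketch of such a smoothing is given below; the function name, the parameter `theta` and its default value are assumptions made for illustration and are not taken from the Metehore model.

```python
def felt_temperature(temps, theta=0.9):
    """Exponentially smooth a series of real temperatures.

    theta is a hypothetical smoothing parameter in [0, 1]: the closer to 1,
    the more slowly the felt temperature follows the real one.
    """
    if not temps:
        return []
    smoothed = [temps[0]]  # start from the first observed temperature
    for temp in temps[1:]:
        # New felt temperature = memory of the past + share of the new reading.
        smoothed.append(theta * smoothed[-1] + (1.0 - theta) * temp)
    return smoothed
```

For example, with `theta=0.5` a jump of the real temperature from 10 to 20 degrees moves the felt temperature only halfway, to 15.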
Application to Electricity Load Data: Parametric Models
[Figure: Metehore Model. Recoverable labels: f(t), Load (MW), days of the week (Mo to Su), T, Hour; panels "Saturday Shape" and "Monday Shape" against hour.]

Application to Electricity Load Data: Semi-Parametric Models
In use at EDF R&D, see [Pierrot and Goude (2011)], and [Wood (2006)] for an in-depth presentation of the statistical method.
GAM Model
$L_t = \sum_{j=1}^{6} f_j(\mathrm{hour}_t)\, \mathbb{I}_{\{\mathrm{DayType}_t = j\}} + f_7(\mathrm{Toy}_t, I_t) + f_8(t) + g_1(T_t, \mathrm{Time}_t) + g_2(T_{t-48}, \mathrm{Time}_t) + g_3(\mathrm{Cloud}_t) + h(l_{t-24h}) + \varepsilon_t$
- the $f_j$'s: weather-independent load (shapes of days, yearly cycle, trend)
- the $g_j$'s: weather-dependent load
- $h$: lagged effects

Application to Electricity Load Data: Semi-Parametric Models
[Figure: GAM Model. Panels: Temperature Effect, Yearly Cycle, Trend. Recoverable axis labels: week.temp, week.ind, Hour, Posan, Instant, t; load in MW.]

Application to Electricity Load Data: Non-Parametric Models
Similarity models based on wavelet decomposition, proposed in [Antoniadis et al. (2006)] and [Antoniadis et al. (2010)]; results presented in [Cugliari (2011)].
Functional Model
- Partition the load into blocks of load curves $Z_i(t)$
- Classify these curves into clusters according to calendar information
- In each cluster, find the similarity $W_{i,j} \in [0, 1]$ between curves $i$ and $j$ with a wavelet-based distance
- Forecast tomorrow's curve $Z_{n+1}(t)$:
$\hat{Z}_{n+1}(t) = \sum_{m=1}^{n-1} W_{n,m} Z_{m+1}(t)$
"Tomorrow will look like the days that followed past days similar to today" (J. Cugliari at the EDF R&D center, Clamart, 2011)

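The forecasting step of the functional model can be sketched as a similarity-weighted average of the curves that followed each past day. The sketch below is only illustrative: it replaces the wavelet-based distance by a plain Euclidean distance with a Gaussian kernel, skips the clustering step, and normalizes the weights so that they sum to one. The function names and the `bandwidth` parameter are assumptions, not part of the original method.

```python
import math

def similarity_weights(curves, bandwidth=1.0):
    """Weights between today's curve (the last one) and every earlier curve.

    Assumption: Gaussian kernel on the Euclidean distance, used here as a
    stand-in for the wavelet-based distance of the slides.
    """
    today = curves[-1]
    raw = []
    for past in curves[:-1]:
        dist2 = sum((a - b) ** 2 for a, b in zip(today, past))
        raw.append(math.exp(-dist2 / (2.0 * bandwidth ** 2)))
    total = sum(raw)
    return [r / total for r in raw]  # each weight lies in [0, 1]

def forecast_next_curve(curves, bandwidth=1.0):
    """Weighted average of the curves that followed each past day."""
    weights = similarity_weights(curves, bandwidth)
    horizon = len(curves[0])
    forecast = [0.0] * horizon
    for m, weight in enumerate(weights):
        successor = curves[m + 1]  # the day that followed past day m
        for k in range(horizon):
            forecast[k] += weight * successor[k]
    return forecast
```

With three toy days `[[1, 1], [2, 2], [1, 1]]`, the first day is identical to today, so its successor `[2, 2]` receives the larger weight and the forecast lies closer to 2 than to 1.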
Application to Electricity Load Data: Non-Parametric Models
[Figure: Functional Model. Load (MW, 55000 to 85000) against time (0 to 700).]

Sequential Combination of Specialized Experts: Framework and Algorithms
This framework was introduced in [Blum (1997)] and further studied in [Freund et al. (1997)].
On-line Sequential Aggregation
At each time $t \in [1, T]$, we have access to $Y_{t-1} = (y_1, \ldots, y_{t-1})$, with $y_i \in [0, B]$, and to the past forecasts of the experts (e.g. GAM, Metehore or functional models). Then:
- the environment generates $y_t$ and the forecasts of the individual predictors (experts) $(f_{j,t})_{1 \le j \le N}$
- the forecaster builds the combined forecast $\hat{y}_t$
- the environment reveals $y_t$ to the forecaster
- the experts incur the losses $l(f_{j,t}, y_t)$, where $l : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$
Individual sequences:
- worst-case bounds
- no assumption on an underlying stochastic process
- a general framework to embed all kinds of base forecasters
The square loss is used in our experiments: $l(x, y) = (x - y)^2$.

Sequential Combination of Specialized Experts: Framework and Algorithms
- At each time $t$, an expert can be active (it produces a forecast) or inactive (it does not produce any forecast)
- We denote by $E_t \subseteq \{1, \ldots, N\}$ the set of active experts at time index $t$
- Aggregation follows a convex aggregation rule: $p_t = (p_{1,t}, \ldots, p_{N,t}) \in X$ and
$\hat{y}_t = \sum_{j=1}^{N} p_{j,t} f_{j,t}$
- $X = \{p_t \in \mathbb{R}^N : p_{j,t} \ge 0,\ p_{1,t} + \ldots + p_{N,t} = 1\}$ is the set of convex weight vectors over $N$ elements
- the weights are produced sequentially by an algorithm based on the concept of regret

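A convex combination restricted to the active experts can be sketched as follows. The encoding of an inactive expert as `None`, the function name and the fallback to a uniform average when the active experts carry no weight are assumptions made for this illustration.

```python
def combine(weights, forecasts):
    """Convex combination of the forecasts of the active experts.

    forecasts[j] is None when expert j is inactive (illustrative convention).
    The weights of the active experts are renormalized to sum to one.
    """
    active = [j for j, f in enumerate(forecasts) if f is not None]
    if not active:
        raise ValueError("no active expert at this time index")
    mass = sum(weights[j] for j in active)
    if mass <= 0.0:
        # Assumed fallback: uniform average over the active experts.
        return sum(forecasts[j] for j in active) / len(active)
    return sum(weights[j] * forecasts[j] for j in active) / mass
```

For instance, `combine([0.5, 0.25, 0.25], [60.0, None, 80.0])` ignores the second expert and weights the two remaining forecasts by 2/3 and 1/3.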
Sequential Combination of Specialized Experts: Framework and Algorithms
Supposing the weights $p_t$ are produced by the algorithm $A$, the regret with respect to expert $j$ up to time $T$ is
$R_T(A, j) = \sum_{t=1}^{T} \big( l_t(p_t) - l_t(\delta_j) \big)$
where $l_t(p)$ is the loss of the combined forecast based on the weights $p$ and $\delta_j$ is the Dirac mass on expert $j$. Similarly, for $q \in X$,
$R_T(A, q) = \sum_{t=1}^{T} \big( l_t(p_t) - l_t(q) \big)$
Goal: find an algorithm $A$ with a small regret, i.e. a regret that is $o(T)$:
- with respect to every expert $j$: perform as well as or better than the best expert (rule $E_\eta$)
- with respect to every $q \in X$: perform as well as or better than the best convex combination (rule $E^{grad}_\eta$)

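The regret with respect to a fixed expert is simply a difference of cumulative losses. A minimal sketch with the square loss, using hypothetical function names and toy numbers:

```python
def square_loss(x, y):
    """Square loss l(x, y) = (x - y)^2."""
    return (x - y) ** 2

def regret_against_expert(combined, expert, observations):
    """Cumulative loss of the combined forecasts minus that of one expert."""
    return sum(
        square_loss(c, y) - square_loss(f, y)
        for c, f, y in zip(combined, expert, observations)
    )
```

A positive value means the expert did better than the combination over the period; a negative value means the combination did better.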
Sequential Combination of Specialized Experts: Framework and Algorithms
$E_\eta$, Exponential Weight Aggregation
Input: $\eta > 0$
Initialisation: $w_1 = (1/N, \ldots, 1/N)$
For $t$ from 1 to $T$ do:
  - Forecast $\hat{y}_t = \frac{1}{\sum_{i \in E_t} w_{i,t}} \sum_{j \in E_t} w_{j,t} f_{j,t}$
  - Observe $y_t$
  - For expert $i$ from 1 to $N$ update the weights:
    $w_{i,t+1} = \frac{e^{\eta R_t(E_\eta, i)}\, \mathbb{I}_{\{i \in E_{t+1}\}}}{\sum_{k \in E_{t+1}} e^{\eta R_t(E_\eta, k)}}$
    End For
End Do
$E^{grad}_\eta$: same algorithm, replacing the loss $l_t$ by $\tilde{l}_t$ such that
$l_t(p_t) - l_t(q) \le \nabla l_t(p_t) \cdot (p_t - q) = \tilde{l}_t(p_t) - \tilde{l}_t(q)$

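The exponential weighting of specialized experts can be sketched as below, under stated assumptions: inactive experts are encoded as `None`, the regret of an expert is only updated on the rounds where it is active, and the square loss is used. The function name and the subtraction of the largest regret (a numerical stability device that leaves the weights unchanged) are choices made for this sketch, not details from the slides.

```python
import math

def ewa_specialized(expert_forecasts, observations, eta):
    """Exponentially weighted aggregation over the active experts.

    expert_forecasts[t][j] is a float, or None when expert j is inactive.
    Returns the list of combined forecasts.
    """
    n_experts = len(expert_forecasts[0])
    regret = [0.0] * n_experts  # cumulative regret against each expert
    predictions = []
    for forecasts, y in zip(expert_forecasts, observations):
        active = [j for j in range(n_experts) if forecasts[j] is not None]
        if not active:
            raise ValueError("no active expert at this time index")
        # Weights proportional to exp(eta * regret), on active experts only.
        shift = max(regret[j] for j in active)
        raw = {j: math.exp(eta * (regret[j] - shift)) for j in active}
        total = sum(raw.values())
        y_hat = sum(raw[j] * forecasts[j] for j in active) / total
        predictions.append(y_hat)
        # Regret update once y is revealed (square loss).
        loss_hat = (y_hat - y) ** 2
        for j in active:
            regret[j] += loss_hat - (forecasts[j] - y) ** 2
    return predictions
```

On toy data where one expert is always right and the other always wrong, the first combined forecast is the plain average and the following ones move quickly towards the accurate expert.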
Sequential Combination of Specialized Experts: Framework and Algorithms
Compound experts: $j_1^T = (j_1, \ldots, j_T)$, with
$\mathrm{size}(j_1^T) = \sum_{t=2}^{T} \mathbb{I}_{\{j_{t-1} \ne j_t\}}$ and $\mathrm{size}(q_1^T) = \sum_{t=2}^{T} \mathbb{I}_{\{q_{t-1} \ne q_t\}}$
The regrets are
$R_T(A, j_1^T) = \sum_{t=1}^{T} \big( l_t(p_t) - l_t(\delta_{j_t}) \big)$ (rule $F_{\eta,\alpha}$)
$R_T(A, q_1^T) = \sum_{t=1}^{T} \big( l_t(p_t) - l_t(q_t) \big)$ (rule $F^{grad}_{\eta,\alpha}$)

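The size of a compound expert is the number of times it switches from one expert to another. A short sketch, with a hypothetical function name:

```python
def compound_size(sequence):
    """Number of indices t >= 2 at which the chosen expert changes."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
```

For example, the sequence of expert indices `[0, 0, 1, 1, 2]` has size 2, and a sequence that never changes has size 0.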
Sequential Combination of Specialized Experts: Framework and Algorithms
$F_{\eta,\alpha}$, Fixed-Share
Input: $\eta > 0$, $\alpha \in [0, 1]$
Initialisation: $w_1 = (1/N, \ldots, 1/N)$
For $t$ from 1 to $T$ do:
  - Forecast $\hat{y}_t = \frac{1}{\sum_{i} w_{i,t}} \sum_{j} w_{j,t} f_{j,t}$
  - Observe $y_t$
  - For expert $i$ from 1 to $N$ update the weights:
    $v_{i,t} = w_{i,t}\, e^{-\eta l_t(\delta_i)}$
    $w_{i,t+1} = (1 - \alpha) v_{i,t} + \frac{\alpha}{M - 1} \sum_{j \ne i} v_{j,t}$
    End For
End Do

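The fixed-share update can be sketched as follows. Several assumptions are made for illustration: all experts are active at every step, the square loss is used, the constant $M$ of the slide is taken equal to the number of experts, and the weights are renormalized after each step (which does not change the forecasts, since these already divide by the total weight). The function name is hypothetical.

```python
import math

def fixed_share(expert_forecasts, observations, eta, alpha):
    """Fixed-share aggregation with the square loss (all experts active).

    Requires at least two experts. Returns the list of combined forecasts.
    """
    n = len(expert_forecasts[0])
    weights = [1.0 / n] * n
    predictions = []
    for forecasts, y in zip(expert_forecasts, observations):
        total = sum(weights)
        y_hat = sum(w * f for w, f in zip(weights, forecasts)) / total
        predictions.append(y_hat)
        # Loss update: penalize each expert by its own loss.
        v = [w * math.exp(-eta * (f - y) ** 2)
             for w, f in zip(weights, forecasts)]
        v_sum = sum(v)
        # Share update: each expert keeps (1 - alpha) of its weight and
        # receives an equal share of what the other experts give away.
        shared = [(1.0 - alpha) * v_i + alpha / (n - 1) * (v_sum - v_i)
                  for v_i in v]
        weights = [s / v_sum for s in shared]  # renormalize for stability
    return predictions
```

With `alpha = 0` the rule reduces to plain exponential weighting on cumulative losses; a positive `alpha` keeps every expert's weight bounded away from zero, which lets the rule recover quickly when the best expert changes.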
Sequential Combination of Specialized Experts: Application
Context:
- produce one-day-ahead forecasts of the French electricity load every day at noon (the weights are updated according to that constraint, which induces a modification of the algorithms)
- base forecasters are obtained from R&D models:
  - Metehore model: 15 experts
  - GAM model: 8 experts
  - Functional model: 1 expert
- these experts are specialized on winter/summer periods; some are inactive on bank holidays

Time intervals             Every 30 minutes
Number of days D           320
Time indexes T             15 360
Number of experts N        24 (= 15 + 8 + 1)
Median of the y_t          56.33 GW
Bound B on the y_t         92.76 GW
Table: Some characteristics of the observations $y_t$ of the French data set of operational forecasting.

Sequential Combination of Specialized Experts: Application

Name of the benchmark procedure        Formula                                                                   Value
Uniform convex weight vector           $\mathrm{rmse}\big((1/24, \ldots, 1/24)\big)$                             0.748
Best single expert                     $\min_{j = 1, \ldots, 24} \mathrm{rmse}(j)$                               0.782
Best convex weight vector              $\min_{q \in X} \mathrm{rmse}(q)$                                         0.683
Best compound expert
  Size at most m = 50                  $\min_{j_1^T \in C_{50}} \mathrm{rmse}(j_1^T)$                            0.534
  Size at most m = 100                 $\min_{j_1^T \in C_{100}} \mathrm{rmse}(j_1^T)$                           0.474
  Size at most m = T - 1 = 10 359      $\min_{j_1^T \in E_1 \times E_2 \times \ldots \times E_T} \mathrm{rmse}(j_1^T)$   0.223
Table: Definition and performance of several (possibly off-line) benchmarking procedures on the French data set (GW) of operational forecasting.

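The first two benchmarks of the table can be sketched on toy data as follows. The numbers below are made up and all experts are assumed active at every step; the function names are hypothetical. The toy case illustrates how a uniform average can beat the best single expert when individual errors have opposite signs, as also happens in the table (0.748 against 0.782).

```python
import math

def rmse(predictions, observations):
    """Root mean square error between two equally long sequences."""
    squared = [(p - y) ** 2 for p, y in zip(predictions, observations)]
    return math.sqrt(sum(squared) / len(squared))

def uniform_rmse(expert_forecasts, observations):
    """RMSE of the uniform average of the experts (all assumed active)."""
    averaged = [sum(row) / len(row) for row in expert_forecasts]
    return rmse(averaged, observations)

def best_single_expert_rmse(expert_forecasts, observations):
    """Smallest RMSE achieved by any single expert over the whole period."""
    n_experts = len(expert_forecasts[0])
    return min(
        rmse([row[j] for row in expert_forecasts], observations)
        for j in range(n_experts)
    )

# Toy data (made up): two experts whose errors have opposite signs.
toy_forecasts = [[1.0, 3.0], [2.0, 4.0]]
toy_observations = [2.0, 3.0]
```

Both benchmarks are off-line quantities: they can only be computed once all observations are known.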
Sequential Combination of Specialized Experts: Application
Optimisation of the aggregation rule parameters: fixed values

Value of η:                     $10^{-6}$, $10^{-5}$, $10^{-4}$, $2 \cdot 10^{-4}$, $10^{-3}$, $5 \cdot 10^{-3}$, $10^{-2}$
rmse of $E_\eta$ (u):           0.724  0.722  0.718  0.731  0.788
rmse of $E^{grad}_\eta$ (u):    0.724  0.722  0.712  0.683  0.650  0.668
Table: Performance obtained by the sequential aggregation rules for various choices of η.

Value of η                           0.01   0.01   0.01   1      1      1      500    500    500
Value of α                           0.001  0.01   0.05   0.001  0.01   0.05   0.001  0.01   0.05
rmse of $F_{\eta,\alpha}$            0.678  0.683  0.704  0.711  0.659  0.652  0.674  0.633  0.632
rmse of $F^{grad}_{\eta,\alpha}$     0.646  0.669  0.700  0.622  0.598  0.637  0.683  0.675  0.671
Table: Performance obtained by the sequential aggregation rules $F_{\eta,\alpha}$ and $F^{grad}_{\eta,\alpha}$ for various choices of η and α.

Sequential Combination of Specialized Experts: Application
Optimisation of the aggregation rule parameters: online calibration

rmse of                        Best constant pair (η, α)    Grid
$F_{\eta,\alpha}$              0.632                        0.644
$F^{grad}_{\eta,\alpha}$       0.598                        0.599
Table: Performance obtained by the rules $F_{\eta,\alpha}$ and $F^{grad}_{\eta,\alpha}$ for the best constant choices of η and α and with the meta-rule selecting sequentially the values of η and α.
We obtain a significant improvement of 20% in RMSE over the best expert.
The performance of the fixed-share rule is comparable to that of the best compound expert with 50 shifts.

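The slide does not spell out the meta-rule. One plausible reading, given here purely as an assumption, is to run the aggregation rule for every parameter pair of a grid and, at each step, to use the forecast of the pair with the smallest cumulative square loss so far. The function name and the data layout are hypothetical.

```python
def select_on_grid(grid_predictions, observations):
    """Follow, at each step, the grid point with the best past performance.

    grid_predictions maps a parameter tuple, e.g. (eta, alpha), to the list
    of forecasts produced by the rule run with these parameters.
    Returns the chosen parameters and the resulting forecasts.
    """
    params = list(grid_predictions)
    cumulative = {p: 0.0 for p in params}  # past cumulative square loss
    chosen, predictions = [], []
    for t, y in enumerate(observations):
        # Ties are broken in favour of the first grid point listed.
        best = min(params, key=lambda p: cumulative[p])
        chosen.append(best)
        predictions.append(grid_predictions[best][t])
        # Only now is y used, to update the losses of all grid points.
        for p in params:
            cumulative[p] += (grid_predictions[p][t] - y) ** 2
    return chosen, predictions
```

The selection at time t only uses losses from earlier steps, so the resulting procedure remains fully sequential.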
Sequential Combination of Specialized Experts: Application
[Figure: Example of Weights. Two panels: weight (0.0 to 1.0) against half hours (0 to 15000).]

Conclusion and Future Work
- Building the specialized experts: for extreme weather conditions, holidays, etc.
- Intraday forecasts
- Algorithms based on exogenous information: weather, calendar data...
- Density forecasts based on expert advice

References
A. Antoniadis, E. Paparoditis, and T. Sapatinas. A functional wavelet kernel approach for time series prediction. Journal of the Royal Statistical Society: Series B, 68(5):837-857, 2006.
A. Antoniadis, X. Brossat, J. Cugliari, and J.M. Poggi. Clustering functional data using wavelets. In Proceedings of the Nineteenth International Conference on Computational Statistics (COMPSTAT), 2010.
A. Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26:5-23, 1997.
A. Bruhns, G. Deurveilher, and J.S. Roy. A non-linear regression model for mid-term load forecasting and improvements in seasonality. Presented at the 15th Power Systems Computation Conference, August 22-26, 2005, Liege, Belgium.
J. Cugliari. Prévision non paramétrique de processus à valeurs fonctionnelles, Application à la consommation d'électricité. PhD Thesis.
Y. Freund, R. Schapire, Y. Singer, and M. Warmuth. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (STOC), pages 334-343, 1997.
A. Pierrot and Y. Goude. Short-Term Electricity Load Forecasting With Generalized Additive Models. Proceedings of ISAP power 2011, 2011.
S.N. Wood. Generalized Additive Models: An Introduction with R. CRC/Chapman & Hall, 2006.