Big Data Techniques Applied to Very Short-term Wind Power Forecasting


1 Big Data Techniques Applied to Very Short-term Wind Power Forecasting
Ricardo Bessa, Senior Researcher (ricardo.j.bessa@inesctec.pt)
Center for Power and Energy Systems, INESC TEC, Portugal
Joint work with Laura Cavalcante and Marisa Reis
EWEA Technology Workshop: Wind Power Forecasting, October 2015, Leuven, Belgium

2 Introduction
Vector Autoregression (VAR) models can be applied to combine wind power time series distributed in space.
Two important requirements for a practical implementation:
- Reduce the number of non-null coefficients
- Low computational time in large datasets
This work provides the following original contributions:
- Explores a set of sparse structures for the VAR model
- Applies the alternating direction method of multipliers (ADMM) to estimate the VAR coefficients
- Explores parallel computing
2 / 17 Ricardo Bessa Big Data Techniques Applied to Wind Power Forecasting


3 Autoregressive Model
Section outline: Linear Time Series Models; Lasso-VAR Model and variants; Solving Lasso-VAR with ADMM algorithm
Univariate model: uses past observations from the same time series.
AR(p), the Autoregressive Model of order p, forecasts the variable $y_t$ given the past p values:
$y_t = c + b_1 y_{t-1} + b_2 y_{t-2} + \dots + b_p y_{t-p} + \varepsilon_t$
VAR(p), the Vector Autoregressive Model of order p, forecasts the vector of k variables $Y_t = (Y_{1,t}, Y_{2,t}, \dots, Y_{k,t})$:
$Y_t = c + B_1 Y_{t-1} + B_2 Y_{t-2} + \dots + B_p Y_{t-p} + u_t$
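As an illustration of the VAR(p) equation, the following minimal sketch stacks the lagged observations into a matrix Z so that the model reads Y ≈ BZ with B = [c, B_1, ..., B_p], and then fits B by ordinary least squares. The function names and the NumPy-based layout (one row per time step in `series`) are assumptions made for this example, not something taken from the slides.

```python
import numpy as np

def build_var_matrices(series, p):
    """series has shape (T, k). Returns Y (k x n) and Z ((1 + k*p) x n), n = T - p."""
    T, k = series.shape
    n = T - p
    Y = series[p:].T                         # targets Y_t for t = p, ..., T-1
    rows = [np.ones((1, n))]                 # row of ones for the intercept c
    for lag in range(1, p + 1):
        rows.append(series[p - lag:T - lag].T)   # lagged values Y_{t-lag}
    return Y, np.vstack(rows)

def fit_var_ols(series, p):
    """Least-squares estimate of B = [c, B_1, ..., B_p] in Y = B Z + u."""
    Y, Z = build_var_matrices(series, p)
    B_T, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    return B_T.T

def forecast_one_step(B, series, p):
    """One-step-ahead forecast from the last p observations."""
    z = np.concatenate([[1.0]] + [series[-lag] for lag in range(1, p + 1)])
    return B @ z
```

With k wind farms and p lags, B has k(1 + kp) entries, which is why the deck turns to sparse estimation next.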


4 Least Absolute Shrinkage and Selection Operator (LASSO)-VAR Model
The Lasso-VAR estimation minimizes the residual sum of squares subject to an $L_1$ constraint:
$\min_B \; \tfrac{1}{2} \|Y - BZ\|_F^2 \quad \text{s.t.} \quad \|B\|_1 \le t$
Equivalently, it can be defined in the Lagrangian form as
$\min_B \; \tfrac{1}{2} \|Y - BZ\|_F^2 + \lambda \|B\|_1,$
where $\|X\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$, $\|X\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |x_{ij}|^2$ is the squared Frobenius norm, and the regularization parameter $\lambda \ge 0$ is inversely related to $t$.
The method fits the regression model and simultaneously performs variable selection by shrinking regression coefficients to zero.
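The variable-selection effect can be seen in the soft-thresholding operator, which is the proximal operator of the $L_1$ norm: entries whose magnitude is below the threshold become exactly zero. The sketch below also evaluates the Lagrangian objective from this slide. The function names are illustrative and do not come from the slides.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Shrink every entry of A toward zero by kappa; small entries become exactly 0."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def lasso_var_objective(Y, B, Z, lam):
    """Lagrangian Lasso-VAR objective: 0.5 * ||Y - B Z||_F^2 + lam * ||B||_1."""
    residual = Y - B @ Z
    return 0.5 * np.sum(residual ** 2) + lam * np.abs(B).sum()
```

In the special case where the rows of Z are orthonormal (Z Zᵀ = I), the minimizer of the objective is `soft_threshold(Y @ Z.T, lam)`, so a larger λ zeroes out more coefficients.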


5 Lasso-VAR Model: Extensions and Generalizations
Lasso extensions and their penalties (the Illustration column of the slide held images):
Row Lasso: $\lambda \|B_i\|_1$
Matricial Lasso: $\lambda \|B\|_1$
Lag Lasso: $\lambda \sum_{l=1}^{p} \|B_l\|_1$
Group Lasso: $\lambda \sum_{i} \sum_{j} \left\| \left( (B_1)_{ij}, \dots, (B_p)_{ij} \right) \right\|_2$
Sparse Group Lasso: $(1 - \alpha) \lambda \sum_{l=1}^{p} \|B_l\|_F + \alpha \lambda \|B\|_1$
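To make the grouping behind these penalties concrete, here is a small sketch that evaluates four of them for a coefficient matrix B = [B_1 ... B_p] stored without the intercept column. The helper names and this storage layout are assumptions for illustration only. The Lag Lasso is omitted because the extracted text does not make its norm unambiguous.

```python
import numpy as np

def split_lags(B, k, p):
    """Split B = [B_1 ... B_p] (shape k x k*p) into p blocks of shape k x k."""
    return [B[:, l * k:(l + 1) * k] for l in range(p)]

def matricial_penalty(B, lam):
    """lam * ||B||_1 over all entries."""
    return lam * np.abs(B).sum()

def row_penalty(B, i, lam):
    """lam * ||B_i||_1 for the i-th row (one equation of the VAR)."""
    return lam * np.abs(B[i]).sum()

def group_penalty(B, k, p, lam):
    """lam * sum_ij ||((B_1)_ij, ..., (B_p)_ij)||_2: one group per pair (i, j)."""
    stacked = np.stack(split_lags(B, k, p))        # shape (p, k, k)
    return lam * np.sqrt((stacked ** 2).sum(axis=0)).sum()

def sparse_group_penalty(B, k, p, lam, alpha):
    """(1 - alpha) * lam * sum_l ||B_l||_F + alpha * lam * ||B||_1."""
    frob = sum(np.linalg.norm(Bl, "fro") for Bl in split_lags(B, k, p))
    return (1 - alpha) * lam * frob + alpha * lam * np.abs(B).sum()
```

The group penalty ties together all lags of the same pair (i, j), so a wind farm j is either used or dropped as a predictor for farm i across every lag at once.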


6 Parameter Estimation and the ADMM Algorithm
The goal is to estimate the sparse matrix of coefficients with a simple and powerful algorithm.
The ADMM framework has several advantages:
- Combines the problem separability offered by the dual ascent method with the convergence properties of the method of multipliers
- Convex problems with nondifferentiable constraints (such as LASSO) can be easily addressed
- Parallel optimization: break up large datasets into blocks and carry out the optimization over each block


7 ADMM Algorithm
Lasso-VAR:
$\text{minimize} \;\; \tfrac{1}{2} \|Y - BZ\|_F^2 + \lambda \|B\|_1$
ADMM problem form:
$\text{minimize} \;\; \underbrace{\tfrac{1}{2} \|Y - BZ\|_F^2}_{f(B)} + \underbrace{\lambda \|H\|_1}_{g(H)} \quad \text{s.t.} \quad B - H = 0$
Augmented Lagrangian:
$L_\rho(B, H, W) = \tfrac{1}{2} \|Y - BZ\|_F^2 + \lambda \|H\|_1 + W^\top (B - H) + \tfrac{\rho}{2} \|B - H\|_F^2,$
where $W^\top (B - H)$ is understood as the matrix inner product $\operatorname{tr}\big(W^\top (B - H)\big)$.
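A minimal sketch of the resulting iteration, written in the scaled form with U = W/ρ (the standard scaled-dual variant, which the slides do not spell out at this point). Minimizing the augmented Lagrangian over B gives a ridge-like linear system, minimizing over H gives soft-thresholding, and U accumulates the constraint violation. The function names, the default ρ, and the simple absolute stopping rule are assumptions of this example.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Proximal operator of kappa * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def lasso_var_admm(Y, Z, lam, rho=1.0, max_iter=500, tol=1e-3):
    """ADMM for: minimize 0.5 * ||Y - B Z||_F^2 + lam * ||H||_1  s.t.  B - H = 0."""
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = np.zeros((k, m))
    YZt = Y @ Z.T
    A = Z @ Z.T + rho * np.eye(m)            # symmetric positive definite
    for _ in range(max_iter):
        # B-update: solve B A = Y Z^T + rho (H - U); A is symmetric, so transpose.
        B = np.linalg.solve(A, (YZt + rho * (H - U)).T).T
        H_old = H
        # H-update: soft-thresholding with threshold lam / rho.
        H = soft_threshold(B + U, lam / rho)
        # Scaled dual update.
        U = U + B - H
        primal = np.linalg.norm(B - H)
        dual = rho * np.linalg.norm(H - H_old)
        if primal < tol and dual < tol:
            break
    return H                                  # H carries the exact zeros
```

Returning H rather than B is a design choice: at convergence the two agree, but only H is exactly sparse.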


8 Parallel Computing
The goal is to split the data and use ADMM to solve the problem in a distributed manner (with N objective terms):
- Split data across features into blocks $Z_1, Z_2, \dots, Z_N$ and use the ADMM sharing problem
- Split data across examples into blocks $Z_1, Z_2, \dots, Z_N$ and use ADMM consensus optimization
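The two ways of partitioning can be sketched as follows, assuming Z is stored with one row per feature (lagged variable) and one column per example (time step), which is the orientation implied by writing the model as Y − BZ. The function names are hypothetical.

```python
import numpy as np

def split_across_examples(Y, Z, N):
    """Cut Y and Z into N blocks of columns (time steps): pairs (Y_i, Z_i)."""
    return list(zip(np.array_split(Y, N, axis=1), np.array_split(Z, N, axis=1)))

def split_across_features(B, Z, N):
    """Cut Z into N blocks of rows and B into matching blocks of columns,
    so that B @ Z equals the sum of B_i @ Z_i over the blocks."""
    return list(zip(np.array_split(B, N, axis=1), np.array_split(Z, N, axis=0)))
```

In the example split every block sees all coefficients but only part of the history, while in the feature split every block owns a subset of the coefficients and the blocks must share their partial predictions $B_i Z_i$.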


9 ADMM and Parallel Computing

Splitting Across Examples
$\min \; \sum_{i=1}^{N} \underbrace{\tfrac{1}{2} \|Y_i - B_i Z_i\|_F^2}_{f_i(B_i)} + \underbrace{\lambda \|B_i\|_1}_{g(B_i)}$
Consensus form:
$\min \; \sum_{i=1}^{N} f_i(B_i) + g(H) \quad \text{s.t.} \quad B_i - H = 0$
Iterations:
$B_i^{k+1} := \arg\min_{B_i} \left( f_i(B_i) + \tfrac{\rho}{2} \|B_i - H^k + U_i^k\|_F^2 \right)$
$H^{k+1} := \arg\min_{H} \left( g(H) + \tfrac{N\rho}{2} \|H - \bar{B}^{k+1} - \bar{U}^k\|_F^2 \right)$
$U_i^{k+1} := U_i^k + B_i^{k+1} - H^{k+1}$
where $\bar{B}^{k+1}$ and $\bar{U}^k$ denote averages over the N blocks.

Splitting Across Features
$\min \; \underbrace{\tfrac{1}{2} \Big\| Y - \sum_{i=1}^{N} B_i Z_i \Big\|_F^2}_{g\left(\sum_{i=1}^{N} B_i Z_i\right)} + \sum_{i=1}^{N} \underbrace{\lambda \|B_i\|_1}_{f_i(B_i)}$
Sharing form:
$\min \; \sum_{i=1}^{N} f_i(B_i) + g\Big(\sum_{i=1}^{N} H_i\Big) \quad \text{s.t.} \quad B_i Z_i - H_i = 0$
Iterations:
$B_i^{k+1} := \arg\min_{B_i} \left( f_i(B_i) + \tfrac{\rho}{2} \|B_i Z_i - H_i^k + U_i^k\|_F^2 \right)$
$H^{k+1} := \arg\min_{H} \left( g\Big(\sum_{i=1}^{N} H_i\Big) + \tfrac{\rho}{2} \sum_{i=1}^{N} \|H_i - U_i^k - B_i^{k+1} Z_i\|_F^2 \right)$
$U_i^{k+1} := U_i^k + B_i^{k+1} Z_i - H_i^{k+1}$
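The consensus iterations for the split across examples can be sketched as below. Each block solves its own ridge-like system, the global variable H is obtained by soft-thresholding the averaged local estimates with threshold λ/(Nρ), and each block keeps its own scaled dual variable. The loop over blocks is written serially here; the local updates are independent, so they are the part that could be dispatched to separate cores. All names, defaults and the stopping rule are assumptions of this sketch.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Proximal operator of kappa * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def consensus_lasso_var(Y_blocks, Z_blocks, lam, rho=1.0, max_iter=1000, tol=1e-6):
    """Consensus ADMM for Lasso-VAR with the data split across examples."""
    N = len(Z_blocks)
    k, m = Y_blocks[0].shape[0], Z_blocks[0].shape[0]
    H = np.zeros((k, m))
    U = [np.zeros((k, m)) for _ in range(N)]
    # Per-block quantities depend only on that block's data and are computed once.
    A = [Z @ Z.T + rho * np.eye(m) for Z in Z_blocks]
    YZt = [Y @ Z.T for Y, Z in zip(Y_blocks, Z_blocks)]
    for _ in range(max_iter):
        # Local B_i-updates (independent across blocks).
        B = [np.linalg.solve(A[i], (YZt[i] + rho * (H - U[i])).T).T
             for i in range(N)]
        H_old = H
        # Global H-update: threshold the average of B_i + U_i.
        H = soft_threshold(sum(B) / N + sum(U) / N, lam / (N * rho))
        # Local dual updates.
        U = [U[i] + B[i] - H for i in range(N)]
        primal = np.sqrt(sum(np.linalg.norm(Bi - H) ** 2 for Bi in B))
        dual = rho * np.sqrt(N) * np.linalg.norm(H - H_old)
        if primal < tol and dual < tol:
            break
    return H
```

Only the k × m matrices B_i and U_i need to be exchanged at each iteration, not the data blocks themselves.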


10 Case Study description
Section outline: Description; Numerical Results; Conclusions
Apply the ADMM algorithm to several LASSO-VAR(2) variants in order to produce wind power forecasts from 1 to 6 hours ahead.
Dataset:
- 68 wind farms (same control area)
- Training period: 9 months
- Test period: 3 months
- Time resolution: 1 hour
LASSO and ADMM parameters are estimated by 5-fold cross-validation.
Calculate the improvement in terms of Root Mean Squared Error (RMSE) compared to an autoregressive model, AR(2).
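The slides do not state the formula behind the improvement metric. A common definition, assumed here, is the relative RMSE reduction with respect to the reference model, 100 × (RMSE_AR − RMSE_model) / RMSE_AR, so that positive values mean the model beats AR(2). The function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observations and forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def improvement_over_reference(y_true, y_model, y_reference):
    """Relative RMSE reduction (%) of a model with respect to a reference forecast."""
    rmse_ref = rmse(y_true, y_reference)
    return 100.0 * (rmse_ref - rmse(y_true, y_model)) / rmse_ref
```

Computing this per wind farm and per lead time (1 to 6 hours) gives curves of the kind reported in the numerical results.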


11 RMSE Improvement over AR results
Wind farm with best improvement
[Figure: improvement over AR (%) versus time horizon (h) for Row LV, Matricial LV, Lag LV, Group LV, Sparse LV and No Sparsity]


12 RMSE Improvement over AR result
Wind farm with intermediate improvement
[Figure: improvement over AR (%) versus time horizon (h) for Row LV, Matricial LV, Lag LV, Group LV, Sparse LV and No Sparsity]


13 RMSE Improvement over AR result
Wind farm with worst improvement
[Figure: improvement over AR (%) versus time horizon (h) for Row LV, Matricial LV, Lag LV, Group LV, Sparse LV and No Sparsity]
- Number of wind farms with negative improvement (average over the time horizon): 3
- Number of wind farms with negative improvement in at least one lead time: 13
- Group LASSO does not have negative improvement in the first two lead times


14 RMSE Improvement over AR result
Global improvement over AR (%)
[Figure: global improvement over AR (%) versus time horizon (h) for Row LV, Matricial LV, Lag LV, Group LV, Sparse LV and No Sparsity]


15 Running Time
Table: Running time (in seconds) with the data divided across an 8-core i7 processor
Columns: Lasso extension; Not distributed; Distributed over examples
Rows: Row Lasso, Matricial Lasso, Lag Lasso, Group Lasso, Sparse Lasso
[timing values not recovered]
The same tolerance (1e-3) was used for the ADMM.
The error results for each LASSO extension are very similar.


16 Final Remarks and Future Work
The adequate choice of a sparse structure can improve the forecast skill of the VAR model.
The case-study results indicate that:
- Information from selected distributed time series can reduce the forecast error compared to an AR model
- The Group LASSO-VAR model achieves the highest global improvement and the Lag LASSO-VAR model provides the lowest improvement (mainly for the first lead times)
Future work:
- Explore more complex sparse structures
- Extend the statistical model to the probabilistic forecast framework
- Apply this framework to other smart grid related problems


17 Acknowledgements
This work was made in the framework of the SusCity project (MITP-TB/CS/0026/2013), financed by national funds through Fundação para a Ciência e a Tecnologia (FCT), Portugal.

