From Sparse Approximation to Forecast of Intraday Load Curves

Transcription

1 From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43

2 Electrical Consumption Time series 6 x Figure : Two weeks of electrical consumption 2/43

3 Intraday load curves From Sparse Approximation towards Forecast: 6 x Figure : Functional data, Intra day load curves 3/43

4 Intraday load curve 5.5 x Intra day load curve, 3 sampling (48 pts), Y R n=48 (Y t 1 t 28) 4/43

5 Intra-day load curves 7 x 14 Electrical Consumption signals Intraday load curves for some days : dashed dot line, : solid line, : dot line, : dashed line. 5/43

6 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts Scientific collaboration with RTE Réseau Transport électrique who wants to revised its Forecasting model based on time serie 6/43

7 Modeling each signal as a function We investigate the problem in a supervised learning setting. We consider each time unit signal Z i = (U i,y i ), i = 1,...,n The generic consumption signal observed on the time unit: Y i, i = 1,...,n The design (here fixed equi distributed): U i = i n We want to identify f (for each signal) in such a way that the model Y i = f(u i )+ǫ i. makes sense (has small errors ǫ i s). 7/43

8 Using a dictionary Consider a dictionary D of functions D = {g 1,...g p } and Assume that f can be well fitted by this dictionary f = p β l g l +h l=1 where h is a small function (in absolute value). The model is Y i = p β l g l (U i )+h(u i )+ǫ i, i = 1,...,n l=1 which coincides with the linear model : Y= Xβ +ǫ with X(n p) putting ǫ i = h(u i )+ǫ i and G il = g l (U i ). 8/43

9 Classical framework - more observations than variables n > p - and weak collinearity between co variables, X T X invertible y 1 y 2 y n = x x 1p x n1... x np Thin matrix β 1 β 2 β p +ǫ Unique Solution: ˆβ = (X T X) 1 X T Y 9/43

10 High dimensional framework Solution: ˆβ = Argmin Y Xβ 2 - More variables than observations n << p β 1 y 1 x x 1p β 2 y 2 = y n x n1... x np β p +ǫ Fat matrix Infinity of ˆβ solutions. Need more assumptions on β to solve the problem Ex: Lasso (l 1 penalization), Ridge (l 2 )... 1/43

11 Alternative procedure Learning Out of Leaders*: based on 2 Thresholding steps, weak complexity, sparse solution, Algorithm in 3 steps (X column normalized): step compute size 1. SELECTION Find b Leaders X b (n,b) (threshold) b < n << p 2. REGRESSION on Leaders β = (X T b X b ) 1 X T b Y (1,b) 3. THRESHOLD the coefficients ˆβ (1, Ŝ) ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 11/43

12 LOL - step1 1. SELECTION Leaders among p X X b (n,b) based on a (Y,X l ) correlation search and thresholding: K l = 1 n n X il Y i i=1 l, 1 l p Find the set B = {l, K l λ 1 }. Theoretical Threshold, λ 1 = T 1 logp n, T 1 : constant (σ,ν,m,c ) Data driven choice of λ 1 for practical applications (LOLA) 12/43

13 LOL - step 2 Y = Xβ +ǫ, Y (n 1), X (n p), S non zero coefficients β SELECTION Find b Leaders X b 2. REGRESSION on Leaders β = (X t b X b ) 1 X t b Y THRESHOLD the coefficients ˆβ 13/43

14 LOL, step 3: Threshold Threshold (again like step 1) the estimated coefficients to obtain the final predictor ˆβ l = β l I{ β l λ 2 } logp Threshold λ 2 = T 2 n, For some constant T 2 >, T 2 (σ,ν,m,c ) To have: - Estimation: ˆβ l - Selection: ˆβ l, sparse solution - Prediction:X ˆβ Data driven choice of λ 2 for pratical applications (LOLA) 14/43

15 LOL assumptions, theoretical thresholds When: 1. Sparsity: B (S,M) := {β R p, p j=1 I{ β j } S, β l1 (p) M}. 2. Dimension: p exp( n), 3. Coherence: τ n logp n Choose: the thresholds λ 1, λ 2 logp λ 1 = n, λ logp 2 = n Approximation, Concentration results: 1 n - Prediction loss: n i=1 (Ŷi EY i ) 2 = d(ˆβ,β) 2 ( ) sup P d(ˆβ,β) > η β B (S,M) 4e γnη2 for η 2 logp DS[ n τ n ] 2 1 for η 2 DS[ logp n τ n ] 2 ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 15/43

16 Illustration on simulated data, LOL step 1 X i.i.d. N(,1), β N(2,1), S = 1 The leaders are: logp B = {l, K l λ 1 }, with λ 1 = T 1 n Applications: adaptive λ 1 4 n=25, S=1, b=2., λ n = 25, p = 1 ρ = S n =.25, δ = 1 n p =.75 card(b) = 17 >> S 16/43

17 Illustration on simulated data, LOL step2,3 Ex1: X i.i.d. N(,1), β = N(2,1),S = 1, (Leaders b = 17) 5 n=25, S=1, p=1,b=2,snr=5 5 n=25, S=1, p=1,b=2,snr= λ 2 λ n = 25,p = 1 ρ =.4, δ =.75 p Ŝ Ŝ p S 1 S = /43

18 Illustration on simulated data, LOL step2,3 Another example Ex2: X i.i.d. N(,1),, S = 2, ρ =.8, δ =.75 4 n=25, S=2, p=1,b=2,snr= λ 2 1 λ p Ŝ Ŝ p S S = /43

19 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 19/43

20 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 31 th 21 T = 28 intra day load curves (n = 48) 6 x /43

21 Approximation using a Generic Dictionary Each day t, Y t = Xβ t +ǫ t with Dictionary of p functions D = {g 1,...g p } G il = g l (U i ) For daily load curves, a good choice happened finally to be a mixture of the Fourier basis and the Haar basis, p = (1:1) constant function (1) 2. (2:24) cosine functions (with increasing frequencies) (23) 3. (25:47) sine functions (with increasing frequencies)(23) 4. (48:62) Haar functions (with increasing frequencies)(15) Approximation: p = 7, E MAPE = 1.4% 21/43

22 Apx: November 18 th 27 S = 12, MAPE =.57 =.57%. MAPE = 1 n=48 n i Y i Ŷi /Y i 7.2 x 14 Electrical consumption x Dictionnary coefficients (12) 7 6 C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 22/43

23 Apx: June 17 th, 29 S = 5, MAPE = x 14 Electrical consumption x Dictionnary coefficients (6) C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 23/43

24 Performances Comparison between various dictionaries: MAPE and sparsity average Dictionary E (%) MAPE (%) S Haar Fourier DB Mixed /43

25 Support for all coefficients 2 using the mixte dictionary (Fourier, Haar) 1.9 C S H /43

26 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 13 th 21 T = 28 intra day load curves (n = 48) Sparse approximation of the intra day load curves( S = 7, same support) using a clustering algorithm in 2 steps (k-means algorithm) Segmentation of the daily signals in clusters... From Cluster to Groups using calendar interpretation Patterns defined by Group Centroids 26/43

27 Two step Clustering results x 1 4 x 1 4 x 1 4 x Cluster 1 (66) Cluster 2 (173) Cluster 3 (62) Cluster 4 (51) 1 2 Figure : T = 28 intra day load curves of size n = 48 (clustering using S = 7 approximated coefficients) Remarque: stability study for the 4 main clusters 27/43

28 Mining the clusters... over days (1...7) and months (1..12) exhibits specific consumption periods Cluster 1.a Day 1 5 Cluster 1.a Month 2 1 Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day 2 Cluster 1.b Month 6 4 Cluster 3.b Day 3 2 Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month 2 1 Cluster 4.b Day 4 2 Cluster 4.b Month /43

29 to Groups From clusters: Cluster 1.a Day Cluster 1.a Month Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day Cluster 1.b Month Cluster 3.b Day Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month Cluster 4.b Day Cluster 4.b Month To groups: calendar interpretation of the clusters Months Days /43

30 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 3/43

31 Spot of Temperatures, Cloud Cover and Wind information Figure : Temp., Cloud Cover spots ( 39) and wind data ( 293) 31/43

32 Intraday Specific Dictionary: high dimensional model Each day t, Y t = X t β t +ǫ t with X t = [P t M t ] First model: p = 94 2, Pt = [C t B t ] (C t : group centroid, B t = Y t 7 ) 92, Mt, Meteorogical information (Temperature, Cloud Cover, wind Indicators (min, max, med, std) computed over the 39 meteorological spots and 95% PCA (8: T:5/N:25/W:5). Approximation performance: LOL adaptive S = 13, Ē MAPE =.34% To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 32/43

33 Intraday Specific Dictionary: low dimensional model Each day t, Y t = X t β t +ǫ t X t = [P t M t ] Selected model, p = P t = [C t B t ] Patterns: 2 group centroid, B t = Y t 7 2. M t Meteorological data 12 (Temperature, Cloud Cover, Wind) Approximation performance: LOL adaptive S = 2.5 [2;8], Ē MAPE = 1.5% [min.2; max.5] LOL fixed sparsity S = 5 [5;5], Ē MAPE =.8% [min.17; max.5] To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 : Nice MAPE and Sparsity for approximation 33/43

34 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Experts dedicated to strategies - Aggregation of Experts 34/43

35 From Sparse approximation to Forecast: Ŷ t Approximation for day t: Ŷ t = X tˆβt +δ t with X t = [P t ;M t ] P t = [C t ;B t ] with n = 48, p = 14 = x ?? Forecast Expert: Ỹ t = X t βt X t = [P t ;M t ] is supposed to be known βt = ˆβ t Plug in estimated coefficients at time t = S(t) << t, with Strategy S 35/43

36 Specialized Experts focus on Different Strategies 1. Time depending (t-1, t-7) (2) 2. (Meteorological configuration of the day (Temperature) 2) 3. Constrained meteorological configuration of the day (Temperature/Cloud Covering) 4. Group constraint meteorological configuration of the day (Temperature/group) 5. Meteorological configuration of the day constrained by the type of the day (Temperature/day) 6. Meteorological configuration of the day constrained by a calendar group (Temperature/calendar) 7. Meteorological configuration of the day (Cloud cover) 8. group constraint meteorological configuration of the day (Cloud Cover/group) 9. Meteorological configuration of the day constrained by the type of the day (Cloud Cover/day) 1. Meteorological configuration of the day constrained by a calendar group (Cloud Cover/calendar) K = 12 36/43

37 Experts hits Frequencies for the expert to perform at best:.14 Winning Expert frequencies tm1 tm7 T Tm T/N Tm/N T/G T/d T/c Ns/G N/d N/c data: form September 1 st 21 to August 31 th 21 The expert are daily competitive 37/43

38 Experts performances data: form September 1 st 21 to August 31 th 21 Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c /43

39 Aggregation and performances All experts participate to forecast modulated by their performance of approximation given the strategy at S = t. Ŷ t = s M ws t Ỹs t s M ws t with w s t = exp Y t s Ŷs t s 2 2 /θ t s = S s(t) x ?? 39/43

40 Aggregation performance Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c AGG /43

41 Forecasting some intraday load curves Some nice intra day forecasting for different periods: 5.5 x 14 21/6/ x 14 29/1/ Y pred 4.6 Y pred x 14 21/2/ x 14 21/6/ Y 3.4 Y pred pred /43

42 Conclusion and perspectives Universal approach for functional data (with intra day pattern) Sparse approximation using a Generic dictionary for compression and pattern extraction Intra day specific dictionaries for approximation and prediction Forecasting Various experts for prediction Agregation using exponential weights Competitive approach compared to usual time serie analysis with much less parameters. 42/43

43 Thanks for your attention! More information on: sites.google.com/site/mougeotmathilde/research 43/43