From Sparse Approximation to Forecast of Intraday Load Curves

Size: px
Start display at page:

Download "From Sparse Approximation to Forecast of Intraday Load Curves"

Transcription

1 From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43

2 Electrical Consumption Time series 6 x Figure : Two weeks of electrical consumption 2/43

3 Intraday load curves From Sparse Approximation towards Forecast: 6 x Figure : Functional data, Intra day load curves 3/43

4 Intraday load curve 5.5 x Intra day load curve, 3 sampling (48 pts), Y R n=48 (Y t 1 t 28) 4/43

5 Intra-day load curves 7 x 14 Electrical Consumption signals Intraday load curves for some days : dashed dot line, : solid line, : dot line, : dashed line. 5/43

6 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts Scientific collaboration with RTE Réseau Transport électrique who wants to revised its Forecasting model based on time serie 6/43

7 Modeling each signal as a function We investigate the problem in a supervised learning setting. We consider each time unit signal Z i = (U i,y i ), i = 1,...,n The generic consumption signal observed on the time unit: Y i, i = 1,...,n The design (here fixed equi distributed): U i = i n We want to identify f (for each signal) in such a way that the model Y i = f(u i )+ǫ i. makes sense (has small errors ǫ i s). 7/43

8 Using a dictionary Consider a dictionary D of functions D = {g 1,...g p } and Assume that f can be well fitted by this dictionary f = p β l g l +h l=1 where h is a small function (in absolute value). The model is Y i = p β l g l (U i )+h(u i )+ǫ i, i = 1,...,n l=1 which coincides with the linear model : Y= Xβ +ǫ with X(n p) putting ǫ i = h(u i )+ǫ i and G il = g l (U i ). 8/43

9 Classical framework - more observations than variables n > p - and weak collinearity between co variables, X T X invertible y 1 y 2 y n = x x 1p x n1... x np Thin matrix β 1 β 2 β p +ǫ Unique Solution: ˆβ = (X T X) 1 X T Y 9/43

10 High dimensional framework Solution: ˆβ = Argmin Y Xβ 2 - More variables than observations n << p β 1 y 1 x x 1p β 2 y 2 = y n x n1... x np β p +ǫ Fat matrix Infinity of ˆβ solutions. Need more assumptions on β to solve the problem Ex: Lasso (l 1 penalization), Ridge (l 2 )... 1/43

11 Alternative procedure Learning Out of Leaders*: based on 2 Thresholding steps, weak complexity, sparse solution, Algorithm in 3 steps (X column normalized): step compute size 1. SELECTION Find b Leaders X b (n,b) (threshold) b < n << p 2. REGRESSION on Leaders β = (X T b X b ) 1 X T b Y (1,b) 3. THRESHOLD the coefficients ˆβ (1, Ŝ) ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 11/43

12 LOL - step1 1. SELECTION Leaders among p X X b (n,b) based on a (Y,X l ) correlation search and thresholding: K l = 1 n n X il Y i i=1 l, 1 l p Find the set B = {l, K l λ 1 }. Theoretical Threshold, λ 1 = T 1 logp n, T 1 : constant (σ,ν,m,c ) Data driven choice of λ 1 for practical applications (LOLA) 12/43

13 LOL - step 2 Y = Xβ +ǫ, Y (n 1), X (n p), S non zero coefficients β SELECTION Find b Leaders X b 2. REGRESSION on Leaders β = (X t b X b ) 1 X t b Y THRESHOLD the coefficients ˆβ 13/43

14 LOL, step 3: Threshold Threshold (again like step 1) the estimated coefficients to obtain the final predictor ˆβ l = β l I{ β l λ 2 } logp Threshold λ 2 = T 2 n, For some constant T 2 >, T 2 (σ,ν,m,c ) To have: - Estimation: ˆβ l - Selection: ˆβ l, sparse solution - Prediction:X ˆβ Data driven choice of λ 2 for pratical applications (LOLA) 14/43

15 LOL assumptions, theoretical thresholds When: 1. Sparsity: B (S,M) := {β R p, p j=1 I{ β j } S, β l1 (p) M}. 2. Dimension: p exp( n), 3. Coherence: τ n logp n Choose: the thresholds λ 1, λ 2 logp λ 1 = n, λ logp 2 = n Approximation, Concentration results: 1 n - Prediction loss: n i=1 (Ŷi EY i ) 2 = d(ˆβ,β) 2 ( ) sup P d(ˆβ,β) > η β B (S,M) 4e γnη2 for η 2 logp DS[ n τ n ] 2 1 for η 2 DS[ logp n τ n ] 2 ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 15/43

16 Illustration on simulated data, LOL step 1 X i.i.d. N(,1), β N(2,1), S = 1 The leaders are: logp B = {l, K l λ 1 }, with λ 1 = T 1 n Applications: adaptive λ 1 4 n=25, S=1, b=2., λ n = 25, p = 1 ρ = S n =.25, δ = 1 n p =.75 card(b) = 17 >> S 16/43

17 Illustration on simulated data, LOL step2,3 Ex1: X i.i.d. N(,1), β = N(2,1),S = 1, (Leaders b = 17) 5 n=25, S=1, p=1,b=2,snr=5 5 n=25, S=1, p=1,b=2,snr= λ 2 λ n = 25,p = 1 ρ =.4, δ =.75 p Ŝ Ŝ p S 1 S = /43

18 Illustration on simulated data, LOL step2,3 Another example Ex2: X i.i.d. N(,1),, S = 2, ρ =.8, δ =.75 4 n=25, S=2, p=1,b=2,snr= λ 2 1 λ p Ŝ Ŝ p S S = /43

19 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 19/43

20 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 31 th 21 T = 28 intra day load curves (n = 48) 6 x /43

21 Approximation using a Generic Dictionary Each day t, Y t = Xβ t +ǫ t with Dictionary of p functions D = {g 1,...g p } G il = g l (U i ) For daily load curves, a good choice happened finally to be a mixture of the Fourier basis and the Haar basis, p = (1:1) constant function (1) 2. (2:24) cosine functions (with increasing frequencies) (23) 3. (25:47) sine functions (with increasing frequencies)(23) 4. (48:62) Haar functions (with increasing frequencies)(15) Approximation: p = 7, E MAPE = 1.4% 21/43

22 Apx: November 18 th 27 S = 12, MAPE =.57 =.57%. MAPE = 1 n=48 n i Y i Ŷi /Y i 7.2 x 14 Electrical consumption x Dictionnary coefficients (12) 7 6 C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 22/43

23 Apx: June 17 th, 29 S = 5, MAPE = x 14 Electrical consumption x Dictionnary coefficients (6) C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 23/43

24 Performances Comparison between various dictionaries: MAPE and sparsity average Dictionary E (%) MAPE (%) S Haar Fourier DB Mixed /43

25 Support for all coefficients 2 using the mixte dictionary (Fourier, Haar) 1.9 C S H /43

26 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 13 th 21 T = 28 intra day load curves (n = 48) Sparse approximation of the intra day load curves( S = 7, same support) using a clustering algorithm in 2 steps (k-means algorithm) Segmentation of the daily signals in clusters... From Cluster to Groups using calendar interpretation Patterns defined by Group Centroids 26/43

27 Two step Clustering results x 1 4 x 1 4 x 1 4 x Cluster 1 (66) Cluster 2 (173) Cluster 3 (62) Cluster 4 (51) 1 2 Figure : T = 28 intra day load curves of size n = 48 (clustering using S = 7 approximated coefficients) Remarque: stability study for the 4 main clusters 27/43

28 Mining the clusters... over days (1...7) and months (1..12) exhibits specific consumption periods Cluster 1.a Day 1 5 Cluster 1.a Month 2 1 Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day 2 Cluster 1.b Month 6 4 Cluster 3.b Day 3 2 Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month 2 1 Cluster 4.b Day 4 2 Cluster 4.b Month /43

29 to Groups From clusters: Cluster 1.a Day Cluster 1.a Month Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day Cluster 1.b Month Cluster 3.b Day Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month Cluster 4.b Day Cluster 4.b Month To groups: calendar interpretation of the clusters Months Days /43

30 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 3/43

31 Spot of Temperatures, Cloud Cover and Wind information Figure : Temp., Cloud Cover spots ( 39) and wind data ( 293) 31/43

32 Intraday Specific Dictionary: high dimensional model Each day t, Y t = X t β t +ǫ t with X t = [P t M t ] First model: p = 94 2, Pt = [C t B t ] (C t : group centroid, B t = Y t 7 ) 92, Mt, Meteorogical information (Temperature, Cloud Cover, wind Indicators (min, max, med, std) computed over the 39 meteorological spots and 95% PCA (8: T:5/N:25/W:5). Approximation performance: LOL adaptive S = 13, Ē MAPE =.34% To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 32/43

33 Intraday Specific Dictionary: low dimensional model Each day t, Y t = X t β t +ǫ t X t = [P t M t ] Selected model, p = P t = [C t B t ] Patterns: 2 group centroid, B t = Y t 7 2. M t Meteorological data 12 (Temperature, Cloud Cover, Wind) Approximation performance: LOL adaptive S = 2.5 [2;8], Ē MAPE = 1.5% [min.2; max.5] LOL fixed sparsity S = 5 [5;5], Ē MAPE =.8% [min.17; max.5] To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 : Nice MAPE and Sparsity for approximation 33/43

34 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Experts dedicated to strategies - Aggregation of Experts 34/43

35 From Sparse approximation to Forecast: Ŷ t Approximation for day t: Ŷ t = X tˆβt +δ t with X t = [P t ;M t ] P t = [C t ;B t ] with n = 48, p = 14 = x ?? Forecast Expert: Ỹ t = X t βt X t = [P t ;M t ] is supposed to be known βt = ˆβ t Plug in estimated coefficients at time t = S(t) << t, with Strategy S 35/43

36 Specialized Experts focus on Different Strategies 1. Time depending (t-1, t-7) (2) 2. (Meteorological configuration of the day (Temperature) 2) 3. Constrained meteorological configuration of the day (Temperature/Cloud Covering) 4. Group constraint meteorological configuration of the day (Temperature/group) 5. Meteorological configuration of the day constrained by the type of the day (Temperature/day) 6. Meteorological configuration of the day constrained by a calendar group (Temperature/calendar) 7. Meteorological configuration of the day (Cloud cover) 8. group constraint meteorological configuration of the day (Cloud Cover/group) 9. Meteorological configuration of the day constrained by the type of the day (Cloud Cover/day) 1. Meteorological configuration of the day constrained by a calendar group (Cloud Cover/calendar) K = 12 36/43

37 Experts hits Frequencies for the expert to perform at best:.14 Winning Expert frequencies tm1 tm7 T Tm T/N Tm/N T/G T/d T/c Ns/G N/d N/c data: form September 1 st 21 to August 31 th 21 The expert are daily competitive 37/43

38 Experts performances data: form September 1 st 21 to August 31 th 21 Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c /43

39 Aggregation and performances All experts participate to forecast modulated by their performance of approximation given the strategy at S = t. Ŷ t = s M ws t Ỹs t s M ws t with w s t = exp Y t s Ŷs t s 2 2 /θ t s = S s(t) x ?? 39/43

40 Aggregation performance Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c AGG /43

41 Forecasting some intraday load curves Some nice intra day forecasting for different periods: 5.5 x 14 21/6/ x 14 29/1/ Y pred 4.6 Y pred x 14 21/2/ x 14 21/6/ Y 3.4 Y pred pred /43

42 Conclusion and perspectives Universal approach for functional data (with intra day pattern) Sparse approximation using a Generic dictionary for compression and pattern extraction Intra day specific dictionaries for approximation and prediction Forecasting Various experts for prediction Agregation using exponential weights Competitive approach compared to usual time serie analysis with much less parameters. 42/43

43 Thanks for your attention! More information on: sites.google.com/site/mougeotmathilde/research 43/43

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Part II Redundant Dictionaries and Pursuit Algorithms

Part II Redundant Dictionaries and Pursuit Algorithms Aisenstadt Chair Course CRM September 2009 Part II Redundant Dictionaries and Pursuit Algorithms Stéphane Mallat Centre de Mathématiques Appliquées Ecole Polytechnique Sparsity in Redundant Dictionaries

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Lasso on Categorical Data

Lasso on Categorical Data Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

More information

How to assess the risk of a large portfolio? How to estimate a large covariance matrix?

How to assess the risk of a large portfolio? How to estimate a large covariance matrix? Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Extreme Value Modeling for Detection and Attribution of Climate Extremes

Extreme Value Modeling for Detection and Attribution of Climate Extremes Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error

More information

Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems

Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems Giovanni S Alberti (joint with H Ammari) DMA, École Normale Supérieure, Paris June 16, 2015 Giovanni S Alberti

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

More information

Social Media Aided Stock Market Predictions by Sparsity Induced Regression

Social Media Aided Stock Market Predictions by Sparsity Induced Regression Social Media Aided Stock Market Predictions by Sparsity Induced Regression Delft Center for Systems and Control Social Media Aided Stock Market Predictions by Sparsity Induced Regression For the degree

More information

Context selection for volume forecast

Context selection for volume forecast Prediction of time series and non stationary time series, 10 february 2012 Context selection for volume forecast The influence of news on traded volume Nathanaël Mayo Outlook 1 Events Definition Combining

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Several Views of Support Vector Machines

Several Views of Support Vector Machines Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

More information

Predicting daily incoming solar energy from weather data

Predicting daily incoming solar energy from weather data Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

itesla Project Innovative Tools for Electrical System Security within Large Areas

itesla Project Innovative Tools for Electrical System Security within Large Areas itesla Project Innovative Tools for Electrical System Security within Large Areas Samir ISSAD RTE France samir.issad@rte-france.com PSCC 2014 Panel Session 22/08/2014 Advanced data-driven modeling techniques

More information

Causal Leading Indicators Detection for Demand Forecasting

Causal Leading Indicators Detection for Demand Forecasting Causal Leading Indicators Detection for Demand Forecasting Yves R. Sagaert, El-Houssaine Aghezzaf, Nikolaos Kourentzes, Bram Desmet Department of Industrial Management, Ghent University 13/07/2015 EURO

More information

A Study on the Comparison of Electricity Forecasting Models: Korea and China

A Study on the Comparison of Electricity Forecasting Models: Korea and China Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison

More information

HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY. Daniel Harenberg daniel.harenberg@gmx.de. University of Mannheim. Econ 714, 28.11.

HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY. Daniel Harenberg daniel.harenberg@gmx.de. University of Mannheim. Econ 714, 28.11. COMPUTING EQUILIBRIUM WITH HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY (BASED ON KRUEGER AND KUBLER, 2004) Daniel Harenberg daniel.harenberg@gmx.de University of Mannheim Econ 714, 28.11.06 Daniel Harenberg

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Credit Risk Models: An Overview

Credit Risk Models: An Overview Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

More information

Government Debt Management: the Long and the Short of it

Government Debt Management: the Long and the Short of it Government Debt Management: the Long and the Short of it E. Faraglia (U. of Cambridge and CEPR) A. Marcet (IAE, UAB, ICREA, BGSE, MOVE and CEPR), R. Oikonomou (U.C. Louvain) A. Scott (LBS and CEPR) ()

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

International Stock Market Integration: A Dynamic General Equilibrium Approach

International Stock Market Integration: A Dynamic General Equilibrium Approach International Stock Market Integration: A Dynamic General Equilibrium Approach Harjoat S. Bhamra London Business School 2003 Outline of talk 1 Introduction......................... 1 2 Economy...........................

More information

VI. Real Business Cycles Models

VI. Real Business Cycles Models VI. Real Business Cycles Models Introduction Business cycle research studies the causes and consequences of the recurrent expansions and contractions in aggregate economic activity that occur in most industrialized

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Limitations of Equilibrium Or: What if τ LS τ adj?

Limitations of Equilibrium Or: What if τ LS τ adj? Limitations of Equilibrium Or: What if τ LS τ adj? Bob Plant, Laura Davies Department of Meteorology, University of Reading, UK With thanks to: Steve Derbyshire, Alan Grant, Steve Woolnough and Jeff Chagnon

More information

Introduction: Overview of Kernel Methods

Introduction: Overview of Kernel Methods Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University

More information

Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

More information

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010 Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Multiple Choice Models II

Multiple Choice Models II Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

Predictive modelling around the world 28.11.13

Predictive modelling around the world 28.11.13 Predictive modelling around the world 28.11.13 Agenda Why this presentation is really interesting Introduction to predictive modelling Case studies Conclusions Why this presentation is really interesting

More information

Lecture 2: The SVM classifier

Lecture 2: The SVM classifier Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function

More information

Ridge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS

Ridge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS Ridge Regression Patrick Breheny September 1 Patrick Breheny BST 764: Applied Statistical Modeling 1/22 Ridge regression: Definition Definition and solution Properties As mentioned in the previous lecture,

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Vekua approximations and applications to soundeld sampling and source localisation

Vekua approximations and applications to soundeld sampling and source localisation 1/50 Vekua approximations and applications to soundeld sampling and source localisation Gilles Chardon Acoustics Research Institute, Vienna, Austria 2/50 Introduction Context Joint work with Laurent Daudet

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

More information

Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

More information

EHz Network Analysis and phonemic Diagrams

EHz Network Analysis and phonemic Diagrams Contribute 22 Classification of EEG recordings in auditory brain activity via a logistic functional linear regression model Irène Gannaz Abstract We want to analyse EEG recordings in order to investigate

More information

Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

More information

Indian Research Journal of Extension Education Special Issue (Volume I), January, 2012 161

Indian Research Journal of Extension Education Special Issue (Volume I), January, 2012 161 Indian Research Journal of Extension Education Special Issue (Volume I), January, 2012 161 A Simple Weather Forecasting Model Using Mathematical Regression Paras 1 and Sanjay Mathur 2 1. Assistant Professor,

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

ABSTRACT JEL: C35, C63, M15. KEYWORDS: Project Management, Performance, Prediction, Earned Value INTRODUCTION

ABSTRACT JEL: C35, C63, M15. KEYWORDS: Project Management, Performance, Prediction, Earned Value INTRODUCTION GLOBAL JOURNAL OF BUSINESS RESEARCH VOLUME 7 NUMBER 5 013 EXTREME PROGRAMMING PROJECT PERFORMANCE MANAGEMENT BY STATISTICAL EARNED VALUE ANALYSIS Wei Lu, Duke University Li Lu, University of Electronic

More information

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Sovereign Defaults. Iskander Karibzhanov. October 14, 2014

Sovereign Defaults. Iskander Karibzhanov. October 14, 2014 Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Wealth Accumulation, Credit Card Borrowing, and. Consumption-Income Comovement

Wealth Accumulation, Credit Card Borrowing, and. Consumption-Income Comovement Wealth Accumulation, Credit Card Borrowing, and Consumption-Income Comovement David Laibson, Andrea Repetto, and Jeremy Tobacman Current Draft: May 2002 1 Self control problems People are patient in the

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

Using Order Book Data

Using Order Book Data Q3 2007 Using Order Book Data Improve Automated Model Performance by Thom Hartle TradeFlow Charts and Studies - Patent Pending TM Reprinted from the July 2007 issue of Automated Trader Magazine www.automatedtrader.net

More information

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Forecasting in supply chains

Forecasting in supply chains 1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the

More information

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013 Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear

More information

To give it a definition, an implicit function of x and y is simply any relationship that takes the form:

To give it a definition, an implicit function of x and y is simply any relationship that takes the form: 2 Implicit function theorems and applications 21 Implicit functions The implicit function theorem is one of the most useful single tools you ll meet this year After a while, it will be second nature to

More information

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

More information

Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

More information

Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd. COEURE workshop Brussels 3-4 July 2015

Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd. COEURE workshop Brussels 3-4 July 2015 Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd COEURE workshop Brussels 3-4 July 2015 WHAT IS BIG DATA IN ECONOMICS? Frank Diebold claimed to have introduced

More information

Package bigdata. R topics documented: February 19, 2015

Package bigdata. R topics documented: February 19, 2015 Type Package Title Big Data Analytics Version 0.1 Date 2011-02-12 Author Han Liu, Tuo Zhao Maintainer Han Liu Depends glmnet, Matrix, lattice, Package bigdata February 19, 2015 The

More information

Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

More information

Distressed Debt Prices and Recovery Rate Estimation

Distressed Debt Prices and Recovery Rate Estimation Distressed Debt Prices and Recovery Rate Estimation Robert Jarrow Joint Work with Xin Guo and Haizhi Lin May 2008 Introduction Recent market events highlight the importance of understanding credit risk.

More information

SWISS ARMY KNIFE INDICATOR John F. Ehlers

SWISS ARMY KNIFE INDICATOR John F. Ehlers SWISS ARMY KNIFE INDICATOR John F. Ehlers The indicator I describe in this article does all the common functions of the usual indicators, such as smoothing and momentum generation. It also does some unusual

More information

Penalized Splines - A statistical Idea with numerous Applications...

Penalized Splines - A statistical Idea with numerous Applications... Penalized Splines - A statistical Idea with numerous Applications... Göran Kauermann Ludwig-Maximilians-University Munich Graz 7. September 2011 1 Penalized Splines - A statistical Idea with numerous Applications...

More information

Numerical PDE methods for exotic options

Numerical PDE methods for exotic options Lecture 8 Numerical PDE methods for exotic options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 Barrier options For barrier option part of the option contract is triggered if the asset

More information

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

More information

Risk Management and Stress Testing From Theory to Practice

Risk Management and Stress Testing From Theory to Practice Conference for 65th Birthday Professor Wolfgang Runggaldier Risk Management and Stress Testing From Theory to Practice Michele Bonollo michele.bonollo@sgsbpvn.it Bressanone, 18 luglio 2007 What did I learn

More information

The Heat Equation. Lectures INF2320 p. 1/88

The Heat Equation. Lectures INF2320 p. 1/88 The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)

More information

Investigation of VLF Test Parameters. Joshua Perkel Jorge Altamirano Nigel Hampton

Investigation of VLF Test Parameters. Joshua Perkel Jorge Altamirano Nigel Hampton Investigation of VLF Test Parameters Joshua Perkel Jorge Altamirano Nigel Hampton 1 Introduction - why IEEE400.2 is in use with recommendations of test times and test voltages At the start of the CDFI

More information

4.3. David E. Rudack*, Meteorological Development Laboratory Office of Science and Technology National Weather Service, NOAA 1.

4.3. David E. Rudack*, Meteorological Development Laboratory Office of Science and Technology National Weather Service, NOAA 1. 43 RESULTS OF SENSITIVITY TESTING OF MOS WIND SPEED AND DIRECTION GUIDANCE USING VARIOUS SAMPLE SIZES FROM THE GLOBAL ENSEMBLE FORECAST SYSTEM (GEFS) RE- FORECASTS David E Rudack*, Meteorological Development

More information

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails 12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Cyber-Security Analysis of State Estimators in Power Systems

Cyber-Security Analysis of State Estimators in Power Systems Cyber-Security Analysis of State Estimators in Electric Power Systems André Teixeira 1, Saurabh Amin 2, Henrik Sandberg 1, Karl H. Johansson 1, and Shankar Sastry 2 ACCESS Linnaeus Centre, KTH-Royal Institute

More information

Master s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and.

Master s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and. Master s Thesis Title A Study on Active Queue Management Mechanisms for Internet Routers: Design, Performance Analysis, and Parameter Tuning Supervisor Prof. Masayuki Murata Author Tomoya Eguchi February

More information

MAINTAINED SYSTEMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University ENGINEERING RELIABILITY INTRODUCTION

MAINTAINED SYSTEMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University ENGINEERING RELIABILITY INTRODUCTION MAINTAINED SYSTEMS Harry G. Kwatny Department of Mechanical Engineering & Mechanics Drexel University OUTLINE MAINTE MAINTE MAINTAINED UNITS Maintenance can be employed in two different manners: Preventive

More information

Singular Spectrum Analysis with Rssa

Singular Spectrum Analysis with Rssa Singular Spectrum Analysis with Rssa Maurizio Sanarico Chief Data Scientist SDG Consulting Milano R June 4, 2014 Motivation Discover structural components in complex time series Working hypothesis: a signal

More information