From Sparse Approximation to Forecast of Intraday Load Curves
|
|
- Mark Gibbs
- 7 years ago
- Views:
Transcription
1 From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43
2 Electrical Consumption Time series 6 x Figure : Two weeks of electrical consumption 2/43
3 Intraday load curves From Sparse Approximation towards Forecast: 6 x Figure : Functional data, Intra day load curves 3/43
4 Intraday load curve 5.5 x Intra day load curve, 3 sampling (48 pts), Y R n=48 (Y t 1 t 28) 4/43
5 Intra-day load curves 7 x 14 Electrical Consumption signals Intraday load curves for some days : dashed dot line, : solid line, : dot line, : dashed line. 5/43
6 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts Scientific collaboration with RTE Réseau Transport électrique who wants to revised its Forecasting model based on time serie 6/43
7 Modeling each signal as a function We investigate the problem in a supervised learning setting. We consider each time unit signal Z i = (U i,y i ), i = 1,...,n The generic consumption signal observed on the time unit: Y i, i = 1,...,n The design (here fixed equi distributed): U i = i n We want to identify f (for each signal) in such a way that the model Y i = f(u i )+ǫ i. makes sense (has small errors ǫ i s). 7/43
8 Using a dictionary Consider a dictionary D of functions D = {g 1,...g p } and Assume that f can be well fitted by this dictionary f = p β l g l +h l=1 where h is a small function (in absolute value). The model is Y i = p β l g l (U i )+h(u i )+ǫ i, i = 1,...,n l=1 which coincides with the linear model : Y= Xβ +ǫ with X(n p) putting ǫ i = h(u i )+ǫ i and G il = g l (U i ). 8/43
9 Classical framework - more observations than variables n > p - and weak collinearity between co variables, X T X invertible y 1 y 2 y n = x x 1p x n1... x np Thin matrix β 1 β 2 β p +ǫ Unique Solution: ˆβ = (X T X) 1 X T Y 9/43
10 High dimensional framework Solution: ˆβ = Argmin Y Xβ 2 - More variables than observations n << p β 1 y 1 x x 1p β 2 y 2 = y n x n1... x np β p +ǫ Fat matrix Infinity of ˆβ solutions. Need more assumptions on β to solve the problem Ex: Lasso (l 1 penalization), Ridge (l 2 )... 1/43
11 Alternative procedure Learning Out of Leaders*: based on 2 Thresholding steps, weak complexity, sparse solution, Algorithm in 3 steps (X column normalized): step compute size 1. SELECTION Find b Leaders X b (n,b) (threshold) b < n << p 2. REGRESSION on Leaders β = (X T b X b ) 1 X T b Y (1,b) 3. THRESHOLD the coefficients ˆβ (1, Ŝ) ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 11/43
12 LOL - step1 1. SELECTION Leaders among p X X b (n,b) based on a (Y,X l ) correlation search and thresholding: K l = 1 n n X il Y i i=1 l, 1 l p Find the set B = {l, K l λ 1 }. Theoretical Threshold, λ 1 = T 1 logp n, T 1 : constant (σ,ν,m,c ) Data driven choice of λ 1 for practical applications (LOLA) 12/43
13 LOL - step 2 Y = Xβ +ǫ, Y (n 1), X (n p), S non zero coefficients β SELECTION Find b Leaders X b 2. REGRESSION on Leaders β = (X t b X b ) 1 X t b Y THRESHOLD the coefficients ˆβ 13/43
14 LOL, step 3: Threshold Threshold (again like step 1) the estimated coefficients to obtain the final predictor ˆβ l = β l I{ β l λ 2 } logp Threshold λ 2 = T 2 n, For some constant T 2 >, T 2 (σ,ν,m,c ) To have: - Estimation: ˆβ l - Selection: ˆβ l, sparse solution - Prediction:X ˆβ Data driven choice of λ 2 for pratical applications (LOLA) 14/43
15 LOL assumptions, theoretical thresholds When: 1. Sparsity: B (S,M) := {β R p, p j=1 I{ β j } S, β l1 (p) M}. 2. Dimension: p exp( n), 3. Coherence: τ n logp n Choose: the thresholds λ 1, λ 2 logp λ 1 = n, λ logp 2 = n Approximation, Concentration results: 1 n - Prediction loss: n i=1 (Ŷi EY i ) 2 = d(ˆβ,β) 2 ( ) sup P d(ˆβ,β) > η β B (S,M) 4e γnη2 for η 2 logp DS[ n τ n ] 2 1 for η 2 DS[ logp n τ n ] 2 ( ) MM, D. Picard, K. Tribouley, JRSS B 212,B Stat. Methodol. vol 74 15/43
16 Illustration on simulated data, LOL step 1 X i.i.d. N(,1), β N(2,1), S = 1 The leaders are: logp B = {l, K l λ 1 }, with λ 1 = T 1 n Applications: adaptive λ 1 4 n=25, S=1, b=2., λ n = 25, p = 1 ρ = S n =.25, δ = 1 n p =.75 card(b) = 17 >> S 16/43
17 Illustration on simulated data, LOL step2,3 Ex1: X i.i.d. N(,1), β = N(2,1),S = 1, (Leaders b = 17) 5 n=25, S=1, p=1,b=2,snr=5 5 n=25, S=1, p=1,b=2,snr= λ 2 λ n = 25,p = 1 ρ =.4, δ =.75 p Ŝ Ŝ p S 1 S = /43
18 Illustration on simulated data, LOL step2,3 Another example Ex2: X i.i.d. N(,1),, S = 2, ρ =.8, δ =.75 4 n=25, S=2, p=1,b=2,snr= λ 2 1 λ p Ŝ Ŝ p S S = /43
19 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 19/43
20 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 31 th 21 T = 28 intra day load curves (n = 48) 6 x /43
21 Approximation using a Generic Dictionary Each day t, Y t = Xβ t +ǫ t with Dictionary of p functions D = {g 1,...g p } G il = g l (U i ) For daily load curves, a good choice happened finally to be a mixture of the Fourier basis and the Haar basis, p = (1:1) constant function (1) 2. (2:24) cosine functions (with increasing frequencies) (23) 3. (25:47) sine functions (with increasing frequencies)(23) 4. (48:62) Haar functions (with increasing frequencies)(15) Approximation: p = 7, E MAPE = 1.4% 21/43
22 Apx: November 18 th 27 S = 12, MAPE =.57 =.57%. MAPE = 1 n=48 n i Y i Ŷi /Y i 7.2 x 14 Electrical consumption x Dictionnary coefficients (12) 7 6 C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 22/43
23 Apx: June 17 th, 29 S = 5, MAPE = x 14 Electrical consumption x Dictionnary coefficients (6) C S H Figure : left: observed signal - red line, approximated signal -blue line right: S coefficients on the dictionary 23/43
24 Performances Comparison between various dictionaries: MAPE and sparsity average Dictionary E (%) MAPE (%) S Haar Fourier DB Mixed /43
25 Support for all coefficients 2 using the mixte dictionary (Fourier, Haar) 1.9 C S H /43
26 Segmentation of the intra-day load curves using Sparse Approximation on a Generic Dictionary 8 years of data: from January 1 st 23 to August 13 th 21 T = 28 intra day load curves (n = 48) Sparse approximation of the intra day load curves( S = 7, same support) using a clustering algorithm in 2 steps (k-means algorithm) Segmentation of the daily signals in clusters... From Cluster to Groups using calendar interpretation Patterns defined by Group Centroids 26/43
27 Two step Clustering results x 1 4 x 1 4 x 1 4 x Cluster 1 (66) Cluster 2 (173) Cluster 3 (62) Cluster 4 (51) 1 2 Figure : T = 28 intra day load curves of size n = 48 (clustering using S = 7 approximated coefficients) Remarque: stability study for the 4 main clusters 27/43
28 Mining the clusters... over days (1...7) and months (1..12) exhibits specific consumption periods Cluster 1.a Day 1 5 Cluster 1.a Month 2 1 Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day 2 Cluster 1.b Month 6 4 Cluster 3.b Day 3 2 Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month 2 1 Cluster 4.b Day 4 2 Cluster 4.b Month /43
29 to Groups From clusters: Cluster 1.a Day Cluster 1.a Month Cluster 3.a Day Cluster 3.a Month Cluster 1.b Day Cluster 1.b Month Cluster 3.b Day Cluster 3.b Month Cluster 2.a Day Cluster 2.a Month Cluster 4.a Day Cluster 4.a Month Cluster 2.b Day Cluster 2.b Month Cluster 4.b Day Cluster 4.b Month To groups: calendar interpretation of the clusters Months Days /43
30 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Strategies using Expert - Aggregation of Experts 3/43
31 Spot of Temperatures, Cloud Cover and Wind information Figure : Temp., Cloud Cover spots ( 39) and wind data ( 293) 31/43
32 Intraday Specific Dictionary: high dimensional model Each day t, Y t = X t β t +ǫ t with X t = [P t M t ] First model: p = 94 2, Pt = [C t B t ] (C t : group centroid, B t = Y t 7 ) 92, Mt, Meteorogical information (Temperature, Cloud Cover, wind Indicators (min, max, med, std) computed over the 39 meteorological spots and 95% PCA (8: T:5/N:25/W:5). Approximation performance: LOL adaptive S = 13, Ē MAPE =.34% To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 32/43
33 Intraday Specific Dictionary: low dimensional model Each day t, Y t = X t β t +ǫ t X t = [P t M t ] Selected model, p = P t = [C t B t ] Patterns: 2 group centroid, B t = Y t 7 2. M t Meteorological data 12 (Temperature, Cloud Cover, Wind) Approximation performance: LOL adaptive S = 2.5 [2;8], Ē MAPE = 1.5% [min.2; max.5] LOL fixed sparsity S = 5 [5;5], Ē MAPE =.8% [min.17; max.5] To remind: Mixte Generic dictionnary: MAPE=1.34%, S = 7 : Nice MAPE and Sparsity for approximation 33/43
34 Outline. From Sparse Approximation towards Forecast: I. High dimensional Regression - Theoretical framework II. Sparse Approximation. Application to the intra day load curves - Generic Dictionary, knowledge discovery - Specific dictionary composed of Climate functional variables III. Towards Forecast - Experts dedicated to strategies - Aggregation of Experts 34/43
35 From Sparse approximation to Forecast: Ŷ t Approximation for day t: Ŷ t = X tˆβt +δ t with X t = [P t ;M t ] P t = [C t ;B t ] with n = 48, p = 14 = x ?? Forecast Expert: Ỹ t = X t βt X t = [P t ;M t ] is supposed to be known βt = ˆβ t Plug in estimated coefficients at time t = S(t) << t, with Strategy S 35/43
36 Specialized Experts focus on Different Strategies 1. Time depending (t-1, t-7) (2) 2. (Meteorological configuration of the day (Temperature) 2) 3. Constrained meteorological configuration of the day (Temperature/Cloud Covering) 4. Group constraint meteorological configuration of the day (Temperature/group) 5. Meteorological configuration of the day constrained by the type of the day (Temperature/day) 6. Meteorological configuration of the day constrained by a calendar group (Temperature/calendar) 7. Meteorological configuration of the day (Cloud cover) 8. group constraint meteorological configuration of the day (Cloud Cover/group) 9. Meteorological configuration of the day constrained by the type of the day (Cloud Cover/day) 1. Meteorological configuration of the day constrained by a calendar group (Cloud Cover/calendar) K = 12 36/43
37 Experts hits Frequencies for the expert to perform at best:.14 Winning Expert frequencies tm1 tm7 T Tm T/N Tm/N T/G T/d T/c Ns/G N/d N/c data: form September 1 st 21 to August 31 th 21 The expert are daily competitive 37/43
38 Experts performances data: form September 1 st 21 to August 31 th 21 Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c /43
39 Aggregation and performances All experts participate to forecast modulated by their performance of approximation given the strategy at S = t. Ŷ t = s M ws t Ỹs t s M ws t with w s t = exp Y t s Ŷs t s 2 2 /θ t s = S s(t) x ?? 39/43
40 Aggregation performance Names mean median std Naive Oracle ty tw T Tm T/N Tm/N T/G T/d T/c N/G N/d N/c AGG /43
41 Forecasting some intraday load curves Some nice intra day forecasting for different periods: 5.5 x 14 21/6/ x 14 29/1/ Y pred 4.6 Y pred x 14 21/2/ x 14 21/6/ Y 3.4 Y pred pred /43
42 Conclusion and perspectives Universal approach for functional data (with intra day pattern) Sparse approximation using a Generic dictionary for compression and pattern extraction Intra day specific dictionaries for approximation and prediction Forecasting Various experts for prediction Agregation using exponential weights Competitive approach compared to usual time serie analysis with much less parameters. 42/43
43 Thanks for your attention! More information on: sites.google.com/site/mougeotmathilde/research 43/43
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationPart II Redundant Dictionaries and Pursuit Algorithms
Aisenstadt Chair Course CRM September 2009 Part II Redundant Dictionaries and Pursuit Algorithms Stéphane Mallat Centre de Mathématiques Appliquées Ecole Polytechnique Sparsity in Redundant Dictionaries
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationLasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
More informationHow to assess the risk of a large portfolio? How to estimate a large covariance matrix?
Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationExtreme Value Modeling for Detection and Attribution of Climate Extremes
Extreme Value Modeling for Detection and Attribution of Climate Extremes Jun Yan, Yujing Jiang Joint work with Zhuo Wang, Xuebin Zhang Department of Statistics, University of Connecticut February 2, 2016
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationMachine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler
Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error
More informationDisjoint sparsity for signal separation and applications to hybrid imaging inverse problems
Disjoint sparsity for signal separation and applications to hybrid imaging inverse problems Giovanni S Alberti (joint with H Ammari) DMA, École Normale Supérieure, Paris June 16, 2015 Giovanni S Alberti
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationPenalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
More informationSocial Media Aided Stock Market Predictions by Sparsity Induced Regression
Social Media Aided Stock Market Predictions by Sparsity Induced Regression Delft Center for Systems and Control Social Media Aided Stock Market Predictions by Sparsity Induced Regression For the degree
More informationContext selection for volume forecast
Prediction of time series and non stationary time series, 10 february 2012 Context selection for volume forecast The influence of news on traded volume Nathanaël Mayo Outlook 1 Events Definition Combining
More informationFitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,
More informationMedical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More informationWeb-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationPredicting daily incoming solar energy from weather data
Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationitesla Project Innovative Tools for Electrical System Security within Large Areas
itesla Project Innovative Tools for Electrical System Security within Large Areas Samir ISSAD RTE France samir.issad@rte-france.com PSCC 2014 Panel Session 22/08/2014 Advanced data-driven modeling techniques
More informationCausal Leading Indicators Detection for Demand Forecasting
Causal Leading Indicators Detection for Demand Forecasting Yves R. Sagaert, El-Houssaine Aghezzaf, Nikolaos Kourentzes, Bram Desmet Department of Industrial Management, Ghent University 13/07/2015 EURO
More informationA Study on the Comparison of Electricity Forecasting Models: Korea and China
Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison
More informationHETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY. Daniel Harenberg daniel.harenberg@gmx.de. University of Mannheim. Econ 714, 28.11.
COMPUTING EQUILIBRIUM WITH HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY (BASED ON KRUEGER AND KUBLER, 2004) Daniel Harenberg daniel.harenberg@gmx.de University of Mannheim Econ 714, 28.11.06 Daniel Harenberg
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationCredit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
More informationGovernment Debt Management: the Long and the Short of it
Government Debt Management: the Long and the Short of it E. Faraglia (U. of Cambridge and CEPR) A. Marcet (IAE, UAB, ICREA, BGSE, MOVE and CEPR), R. Oikonomou (U.C. Louvain) A. Scott (LBS and CEPR) ()
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationInternational Stock Market Integration: A Dynamic General Equilibrium Approach
International Stock Market Integration: A Dynamic General Equilibrium Approach Harjoat S. Bhamra London Business School 2003 Outline of talk 1 Introduction......................... 1 2 Economy...........................
More informationVI. Real Business Cycles Models
VI. Real Business Cycles Models Introduction Business cycle research studies the causes and consequences of the recurrent expansions and contractions in aggregate economic activity that occur in most industrialized
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationLimitations of Equilibrium Or: What if τ LS τ adj?
Limitations of Equilibrium Or: What if τ LS τ adj? Bob Plant, Laura Davies Department of Meteorology, University of Reading, UK With thanks to: Steve Derbyshire, Alan Grant, Steve Woolnough and Jeff Chagnon
More informationIntroduction: Overview of Kernel Methods
Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University
More informationRegularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
More informationGenerating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationMultiple Choice Models II
Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationPredictive modelling around the world 28.11.13
Predictive modelling around the world 28.11.13 Agenda Why this presentation is really interesting Introduction to predictive modelling Case studies Conclusions Why this presentation is really interesting
More informationLecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
More informationRidge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS
Ridge Regression Patrick Breheny September 1 Patrick Breheny BST 764: Applied Statistical Modeling 1/22 Ridge regression: Definition Definition and solution Properties As mentioned in the previous lecture,
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationVekua approximations and applications to soundeld sampling and source localisation
1/50 Vekua approximations and applications to soundeld sampling and source localisation Gilles Chardon Acoustics Research Institute, Vienna, Austria 2/50 Introduction Context Joint work with Laurent Daudet
More informationQuadratic forms Cochran s theorem, degrees of freedom, and all that
Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationAcknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
More informationStatistical Analysis of Life Insurance Policy Termination and Survivorship
Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial
More informationEHz Network Analysis and phonemic Diagrams
Contribute 22 Classification of EEG recordings in auditory brain activity via a logistic functional linear regression model Irène Gannaz Abstract We want to analyse EEG recordings in order to investigate
More informationLecture Topic: Low-Rank Approximations
Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original
More informationIndian Research Journal of Extension Education Special Issue (Volume I), January, 2012 161
Indian Research Journal of Extension Education Special Issue (Volume I), January, 2012 161 A Simple Weather Forecasting Model Using Mathematical Regression Paras 1 and Sanjay Mathur 2 1. Assistant Professor,
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationABSTRACT JEL: C35, C63, M15. KEYWORDS: Project Management, Performance, Prediction, Earned Value INTRODUCTION
GLOBAL JOURNAL OF BUSINESS RESEARCH VOLUME 7 NUMBER 5 013 EXTREME PROGRAMMING PROJECT PERFORMANCE MANAGEMENT BY STATISTICAL EARNED VALUE ANALYSIS Wei Lu, Duke University Li Lu, University of Electronic
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationTHREE DIMENSIONAL GEOMETRY
Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationSovereign Defaults. Iskander Karibzhanov. October 14, 2014
Sovereign Defaults Iskander Karibzhanov October 14, 214 1 Motivation Two recent papers advance frontiers of sovereign default modeling. First, Aguiar and Gopinath (26) highlight the importance of fluctuations
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationWealth Accumulation, Credit Card Borrowing, and. Consumption-Income Comovement
Wealth Accumulation, Credit Card Borrowing, and Consumption-Income Comovement David Laibson, Andrea Repetto, and Jeremy Tobacman Current Draft: May 2002 1 Self control problems People are patient in the
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationUsing Order Book Data
Q3 2007 Using Order Book Data Improve Automated Model Performance by Thom Hartle TradeFlow Charts and Studies - Patent Pending TM Reprinted from the July 2007 issue of Automated Trader Magazine www.automatedtrader.net
More informationClustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationAnalysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
More informationα α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationForecasting in supply chains
1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the
More informationRegression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
More informationTo give it a definition, an implicit function of x and y is simply any relationship that takes the form:
2 Implicit function theorems and applications 21 Implicit functions The implicit function theorem is one of the most useful single tools you ll meet this year After a while, it will be second nature to
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationSome stability results of parameter identification in a jump diffusion model
Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss
More informationBig data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd. COEURE workshop Brussels 3-4 July 2015
Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd COEURE workshop Brussels 3-4 July 2015 WHAT IS BIG DATA IN ECONOMICS? Frank Diebold claimed to have introduced
More informationPackage bigdata. R topics documented: February 19, 2015
Type Package Title Big Data Analytics Version 0.1 Date 2011-02-12 Author Han Liu, Tuo Zhao Maintainer Han Liu Depends glmnet, Matrix, lattice, Package bigdata February 19, 2015 The
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationDistressed Debt Prices and Recovery Rate Estimation
Distressed Debt Prices and Recovery Rate Estimation Robert Jarrow Joint Work with Xin Guo and Haizhi Lin May 2008 Introduction Recent market events highlight the importance of understanding credit risk.
More informationSWISS ARMY KNIFE INDICATOR John F. Ehlers
SWISS ARMY KNIFE INDICATOR John F. Ehlers The indicator I describe in this article does all the common functions of the usual indicators, such as smoothing and momentum generation. It also does some unusual
More informationPenalized Splines - A statistical Idea with numerous Applications...
Penalized Splines - A statistical Idea with numerous Applications... Göran Kauermann Ludwig-Maximilians-University Munich Graz 7. September 2011 1 Penalized Splines - A statistical Idea with numerous Applications...
More informationNumerical PDE methods for exotic options
Lecture 8 Numerical PDE methods for exotic options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 Barrier options For barrier option part of the option contract is triggered if the asset
More information3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
More informationRisk Management and Stress Testing From Theory to Practice
Conference for 65th Birthday Professor Wolfgang Runggaldier Risk Management and Stress Testing From Theory to Practice Michele Bonollo michele.bonollo@sgsbpvn.it Bressanone, 18 luglio 2007 What did I learn
More informationThe Heat Equation. Lectures INF2320 p. 1/88
The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)
More informationInvestigation of VLF Test Parameters. Joshua Perkel Jorge Altamirano Nigel Hampton
Investigation of VLF Test Parameters Joshua Perkel Jorge Altamirano Nigel Hampton 1 Introduction - why IEEE400.2 is in use with recommendations of test times and test voltages At the start of the CDFI
More information4.3. David E. Rudack*, Meteorological Development Laboratory Office of Science and Technology National Weather Service, NOAA 1.
43 RESULTS OF SENSITIVITY TESTING OF MOS WIND SPEED AND DIRECTION GUIDANCE USING VARIOUS SAMPLE SIZES FROM THE GLOBAL ENSEMBLE FORECAST SYSTEM (GEFS) RE- FORECASTS David E Rudack*, Meteorological Development
More informationA Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails
12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationIntroduction to nonparametric regression: Least squares vs. Nearest neighbors
Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,
More informationMSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
More informationKNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationCyber-Security Analysis of State Estimators in Power Systems
Cyber-Security Analysis of State Estimators in Electric Power Systems André Teixeira 1, Saurabh Amin 2, Henrik Sandberg 1, Karl H. Johansson 1, and Shankar Sastry 2 ACCESS Linnaeus Centre, KTH-Royal Institute
More informationMaster s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and.
Master s Thesis Title A Study on Active Queue Management Mechanisms for Internet Routers: Design, Performance Analysis, and Parameter Tuning Supervisor Prof. Masayuki Murata Author Tomoya Eguchi February
More informationMAINTAINED SYSTEMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University ENGINEERING RELIABILITY INTRODUCTION
MAINTAINED SYSTEMS Harry G. Kwatny Department of Mechanical Engineering & Mechanics Drexel University OUTLINE MAINTE MAINTE MAINTAINED UNITS Maintenance can be employed in two different manners: Preventive
More informationSingular Spectrum Analysis with Rssa
Singular Spectrum Analysis with Rssa Maurizio Sanarico Chief Data Scientist SDG Consulting Milano R June 4, 2014 Motivation Discover structural components in complex time series Working hypothesis: a signal
More information