GAM for large datasets and load forecasting
|
|
|
- Brett Flowers
- 10 years ago
- Views:
Transcription
1 GAM for large datasets and load forecasting Simon Wood, University of Bath, U.K. in collaboration with Yannig Goude, EDF
2 French Grid load GW day GW day Half hourly load data from 1st Sept 22. Pierrot and Goude (211) successfully predicted one day ahead using 48 separate models, one for each half hour period of the day.
3 Why 48 models? The 48 separate models each have a form something like MW t = f (MW t 48 ) + j g j (x jt ) + ɛ t where f and the g j are a smooth functions to be estimated, and the x j are other covariates. 48 Separate models has some disadvantages 1. Continuity of effects across the day is not enforced. 2. Information is wasted as the correlation between neighbouring time points is not utilised. 3. Fitting 48 models, each to 1/48 of the data promotes estimate instability and creates a huge model checking task each time the model is updated (every day?) But existing methods for fitting such smooth additive models could not cope with fitting one model to all the data.
4 Smooth additive models The basic model is y i = j f j (x ji ) + ɛ i where the f j are smooth functions to be estimated. Need to estimate the f j (including how smooth). Represent each f j using a spline basis expansion f j (x j ) = k b jk(x j )β jk where b jk are basis functions and β jk are coefficients (maybe 1-1 of each). So model becomes y i = j b jk (x ji )β jk + ɛ i = X i β + ɛ i k... a linear model. But it is much too flexible.
5 Penalizing overfit To avoid over fitting, we penalize function wiggliness (lack of smoothness) while fitting. In particular let J j (f j ) = β T Sβ measure wiggliness of f j. Estimate β to minimize (y i X i β) 2 + i j λ j β T Sβ where λ j are smoothing parameters. High λ j smooth f j. Low λ j wiggly f j.
6 Example basis-penalty: P-splines Eilers and Marx have popularized the use of B-spline bases with discrete penalties. If b k (x) is a B-spline and β k an unknown coefficient, then K f (x) = β k b k (x). Wiggliness can be penalized by e.g. k K 1 P = (β j 1 2β j + β j+1 ) 2 = β T Sβ. k=2 b k (x) x f(x) x
7 Example varying P-spline penalty y basis functions y (β i 1 2β i + β i+1 ) 2 = 27 i x x y (β i 1 2β i + β i+1 ) 2 = 16.4 i y (β i 1 2β i + β i+1 ) 2 =.4 i
8 Choosing how much to smooth y λ too high λ about right y y λ too low x x x Choose λ j to optimize model s ability to predict data it wasn t fitted too. Generalized cross validation chooses λ j to minimize V(λ) = y Xβ 2 [n tr(f)] 2 where F = (X T X + j λ js j ) 1 X T X. tr(f) = effective degrees of freedom.
9 The main problem We estimate β by the minimizer of y Xβ 2 + j λ j β T Sβ and the λ j to minimise V(λ) = y Xβ 2 [n tr(f)] 2 If X is too large, then we can easily run out of computer memory when trying to fit the model. A simple approach will let us fit the model without forming X whole.
10 A smaller equivalent fitting problem Let X be n p Consider the QR decomposition X = QR, where R is p p and Q has orthogonal columns. Let f = Q T y and γ = y 2 f 2. y Xβ 2 = f Rβ 2 + γ and V(λ) = y Xβ 2 [n tr(f)] 2 = f Rβ 2 + γ [n tr(f)] 2 where F = (R T R + j λ js j ) 1 R T R. So once we have R and f we can estimate β and λ without X...
11 QR updating for R and f [ ] [ ] X y Let X =, y = X 1 y 1 [ ] R Form X = Q R & = Q X 1 R. 1 [ ] Q Then X = QR where Q = Q I 1. [ ] Q T y Also f = Q T y = Q T 1 y 1 Applying these results recursively, R and f can be accumulated from sub-blocks of X without forming X whole.
12 X T X updating and Choleski R is the Choleski factor of X T X. Partitioning X row-wise into sub-matrices X 1, X 2,..., we have X T X = j X T j X j which can be used to accumulate X T X without needing to form X whole. At same time accumulate X T y = j XT j y. Choleski decomposition gives R T R = X T X, and f = R 1 X T y. This is twice as fast as QR approach, but less numerically stable.
13 Generalizations 1. Generalized additive models (GAMs) allow distributions other than normal (and non-identity link functions). 2. GAMs can be estimated by iteratively estimating working linear models, using R and f accumulation tricks on each. 3. REML, AIC etc can also be used for λ j estimation. REML can be made especially efficient in this context. 4. The accumulation methods are easy to parallelize. 5. Updating a model fit with new data is very easy. 6. AR1 residuals can also be handled easily in the non-generalized case.
14 Air pollution example Around 5 daily death rates, for Chicago, along with time, ozone, pm1, tmp (last 3 averaged over preceding 3 days). Peng and Welty (24). Appropriate GAM is: death i Poi, log{e(death i )} = f 1 (time i ) + f 2 (ozone i, tmp i ) + f 3 (pm1 i ). f 1 and f 3 penalized cubic regression splines, f 2 tensor product spline. Results suggest a very strong ozone - temperature interaction.
15 Air pollution Chicago results s(time,136.85) time 1se te(o3,tmp,38.63) +1se tmp s(pm1,3.42) o pm1
16 Air pollution Chicago results To test the truth of ozone-temp interaction it would be good to fit to the equivalent data for around 1 other cities, simultaneously. Model is log{e(death i )} = γ j + α k + f k (o3 i, temp i ) + f 4 (t i ) if observation i is from city j and age group k (there are 3 age groups recorded). Model has 82 coefs and is estimated from 1.2M data. Fitting takes 12.5 minutes using 4 cores of a $6 PC. Same model to just Chicago, takes 11.5 minutes by previous methods.
17 Air pollution all cities results <65 s(time,277.36) tmp time o >74 tmp tmp o o3
18 Back to grid load GW day GW day Now the 48 separate grid load model can be replaced by L i = γ j + f k (I i, L i 48 ) + g 1 (t i ) + g 2 (I i, toy i ) + g 3 (T i, I i ) + g 4 (T.24 i, T.48 i ) + g 5 (cloud i ) + ST i h(i i ) + e i if observation i is from day of the week j, and day class k. e i = ρe i 1 + ɛ i and ɛ i N(, σ 2 ) (AR1).
19 Fitting grid load model Using QR update fitting takes under 3 minutes and we estimate ρ =.98. Update with new data then takes under 2 minutes. We fit up to 31 August 28, and then predicted the next years data, one day at a time, with model update. Predictive RMSE was 1156MW (124MW for fit). Predictive MAPE 1.62% (1.46% for fit). Setting ρ = gives overfit, and worse predictive performance.
20 Residuals MAPE (%) Mo Tu We Th Fr Sa Su Instant 9/1/22 12/19/22 4/8/23 8/12/23 12/4/23 3/22/24 7/26/24 11/17/24 3/7/25 7/6/25 1/23/25 2/18/26 6/2/26 1/8/26 2/3/27 6/4/27 9/21/27 1/17/28 5/6/28 8/31/28 1/9/28 17/9/28 4/1/28 21/1/28 15/11/28 2/12/28 19/12/28 13/1/29 3/1/29 16/2/29 4/3/29 21/3/29 7/4/29 28/4/29 27/5/29 17/6/29 4/7/29 25/7/29 11/8/29 31/8/
21 Temperature effect Temperature Effect 7 65 L[t] T[t] 1 1 I[t]
22 L[t] L[t] L[t] L[t] Lagged load Effects Lagged Load Effect, WW Lagged Load Effect, WH L[t 1] I[t] L[t 1] I[t] 3 4 Lagged Load Effect, HH Lagged Load Effect, HW L[t 1] I[t] L[t 1] I[t] 3 4
How To Calculate Electricity Load Data
Combining Forecasts for Short Term Electricity Load Forecasting Weight 0.0 0.2 0.4 0.6 0.8 1.0 0 1000 2000 3000 4000 5000 Step M. Devaine, P. Gaillard, Y.Goude, G. Stoltz ENS Paris - EDF R&D - CNRS, INRIA
α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
GLAM Array Methods in Statistics
GLAM Array Methods in Statistics Iain Currie Heriot Watt University A Generalized Linear Array Model is a low-storage, high-speed, GLAM method for multidimensional smoothing, when data forms an array,
3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
Estimation of σ 2, the variance of ɛ
Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated
Trend and Seasonal Components
Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Penalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED
1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 2. Introduction to SAS PROC MIXED The MIXED procedure provides you with flexibility
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort [email protected] Session Number: TBR14 Insurance has always been a data business The industry has successfully
Penalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
A Study on the Comparison of Electricity Forecasting Models: Korea and China
Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison
Polynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
Forecasting in supply chains
1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the
Sales forecasting # 2
Sales forecasting # 2 Arthur Charpentier [email protected] 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting
Regression Analysis LECTURE 2. The Multiple Regression Model in Matrices Consider the regression equation. (1) y = β 0 + β 1 x 1 + + β k x k + ε,
LECTURE 2 Regression Analysis The Multiple Regression Model in Matrices Consider the regression equation (1) y = β 0 + β 1 x 1 + + β k x k + ε, and imagine that T observations on the variables y, x 1,,x
Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
Energy Demand Forecasting Industry Practices and Challenges
Industry Practices and Challenges Mathieu Sinn (IBM Research) 12 June 2014 ACM e-energy Cambridge, UK 2010 2014 IBM IBM Corporation Corporation Outline Overview: Smarter Energy Research at IBM Industry
Smoothing and Non-Parametric Regression
Smoothing and Non-Parametric Regression Germán Rodríguez [email protected] Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest
MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal
MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims
Penalized Splines - A statistical Idea with numerous Applications...
Penalized Splines - A statistical Idea with numerous Applications... Göran Kauermann Ludwig-Maximilians-University Munich Graz 7. September 2011 1 Penalized Splines - A statistical Idea with numerous Applications...
A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH [email protected]
A Regime-Switching Model for Electricity Spot Prices Gero Schindlmayr EnBW Trading GmbH [email protected] May 31, 25 A Regime-Switching Model for Electricity Spot Prices Abstract Electricity markets
Regularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
Rob J Hyndman. Forecasting using. 11. Dynamic regression OTexts.com/fpp/9/1/ Forecasting using R 1
Rob J Hyndman Forecasting using 11. Dynamic regression OTexts.com/fpp/9/1/ Forecasting using R 1 Outline 1 Regression with ARIMA errors 2 Example: Japanese cars 3 Using Fourier terms for seasonality 4
Iterative Solvers for Linear Systems
9th SimLab Course on Parallel Numerical Simulation, 4.10 8.10.2010 Iterative Solvers for Linear Systems Bernhard Gatzhammer Chair of Scientific Computing in Computer Science Technische Universität München
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Random effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
Computer Graphics. Geometric Modeling. Page 1. Copyright Gotsman, Elber, Barequet, Karni, Sheffer Computer Science - Technion. An Example.
An Example 2 3 4 Outline Objective: Develop methods and algorithms to mathematically model shape of real world objects Categories: Wire-Frame Representation Object is represented as as a set of points
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
exspline That: Explaining Geographic Variation in Insurance Pricing
Paper 8441-2016 exspline That: Explaining Geographic Variation in Insurance Pricing Carol Frigo and Kelsey Osterloo, State Farm Insurance ABSTRACT Generalized linear models (GLMs) are commonly used to
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
ISSUES IN UNIVARIATE FORECASTING
ISSUES IN UNIVARIATE FORECASTING By Rohaiza Zakaria 1, T. Zalizam T. Muda 2 and Suzilah Ismail 3 UUM College of Arts and Sciences, Universiti Utara Malaysia 1 [email protected], 2 [email protected] and 3
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Long-term Health Spending Persistence among the Privately Insured: Exploring Dynamic Panel Estimation Approaches
Long-term Health Spending Persistence among the Privately Insured: Exploring Dynamic Panel Estimation Approaches Sebastian Calonico, PhD Assistant Professor of Economics University of Miami Co-authors
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
Corporate Defaults and Large Macroeconomic Shocks
Corporate Defaults and Large Macroeconomic Shocks Mathias Drehmann Bank of England Andrew Patton London School of Economics and Bank of England Steffen Sorensen Bank of England The presentation expresses
POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.
Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression
Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems
Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 [email protected] 1 Course G63.2010.001 / G22.2420-001,
Chapter 4: Vector Autoregressive Models
Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...
Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers
Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications
Flexible geoadditive survival analysis of non-hodgkin lymphoma in Peru
Statistics & Operations Research Transactions SORT 36 (2) July-December 2012, 221-230 ISSN: 1696-2281 eissn: 2013-8830 www.idescat.cat/sort/ Statistics & Operations Research c Institut d Estadística de
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Smoothing. Fitting without a parametrization
Smoothing or Fitting without a parametrization Volker Blobel University of Hamburg March 2005 1. Why smoothing 2. Running median smoother 3. Orthogonal polynomials 4. Transformations 5. Spline functions
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
[3] Big Data: Model Selection
[3] Big Data: Model Selection Matt Taddy, University of Chicago Booth School of Business faculty.chicagobooth.edu/matt.taddy/teaching [3] Making Model Decisions Out-of-Sample vs In-Sample performance Regularization
Hedging Options In The Incomplete Market With Stochastic Volatility. Rituparna Sen Sunday, Nov 15
Hedging Options In The Incomplete Market With Stochastic Volatility Rituparna Sen Sunday, Nov 15 1. Motivation This is a pure jump model and hence avoids the theoretical drawbacks of continuous path models.
Estimating an ARMA Process
Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
Multiple Choice: 2 points each
MID TERM MSF 503 Modeling 1 Name: Answers go here! NEATNESS COUNTS!!! Multiple Choice: 2 points each 1. In Excel, the VLOOKUP function does what? Searches the first row of a range of cells, and then returns
Package smoothhr. November 9, 2015
Encoding UTF-8 Type Package Depends R (>= 2.12.0),survival,splines Package smoothhr November 9, 2015 Title Smooth Hazard Ratio Curves Taking a Reference Value Version 1.0.2 Date 2015-10-29 Author Artur
Sales and operations planning (SOP) Demand forecasting
ing, introduction Sales and operations planning (SOP) forecasting To balance supply with demand and synchronize all operational plans Capture demand data forecasting Balancing of supply, demand, and budgets.
Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480
1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500
Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
PREDICTIVE MODELS IN LIFE INSURANCE
PREDICTIVE MODELS IN LIFE INSURANCE Date: 17 June 2010 Philip L. Adams, ASA, MAAA Agenda 1. Predictive Models Defined 2. Predictive models past and present 3. Actuarial perspective 4. Application to Life
Nonlinear Algebraic Equations Example
Nonlinear Algebraic Equations Example Continuous Stirred Tank Reactor (CSTR). Look for steady state concentrations & temperature. s r (in) p,i (in) i In: N spieces with concentrations c, heat capacities
Credit Scoring Modelling for Retail Banking Sector.
Credit Scoring Modelling for Retail Banking Sector. Elena Bartolozzi, Matthew Cornford, Leticia García-Ergüín, Cristina Pascual Deocón, Oscar Iván Vasquez & Fransico Javier Plaza. II Modelling Week, Universidad
Nonnested model comparison of GLM and GAM count regression models for life insurance data
Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Volatility modeling in financial markets
Volatility modeling in financial markets Master Thesis Sergiy Ladokhin Supervisors: Dr. Sandjai Bhulai, VU University Amsterdam Brian Doelkahar, Fortis Bank Nederland VU University Amsterdam Faculty of
7 Gaussian Elimination and LU Factorization
7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method
Factor Analysis. Factor Analysis
Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we
The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
2.2 Elimination of Trend and Seasonality
26 CHAPTER 2. TREND AND SEASONAL COMPONENTS 2.2 Elimination of Trend and Seasonality Here we assume that the TS model is additive and there exist both trend and seasonal components, that is X t = m t +
SAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
Linear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, [email protected] Spring 2007 Lecture 3: QR, least squares, linear regression Linear Algebra Methods for Data Mining, Spring 2007, University
SUPPLEMENT TO USAGE-BASED PRICING AND DEMAND FOR RESIDENTIAL BROADBAND : APPENDIX (Econometrica, Vol. 84, No. 2, March 2016, 411 443)
Econometrica Supplementary Material SUPPLEMENT TO USAGE-BASED PRICING AND DEMAND FOR RESIDENTIAL BROADBAND : APPENDIX (Econometrica, Vol. 84, No. 2, March 2016, 411 443) BY AVIV NEVO, JOHN L. TURNER, AND
ARMA, GARCH and Related Option Pricing Method
ARMA, GARCH and Related Option Pricing Method Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook September
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Hypothesis testing - Steps
Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =
Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin
Final Mathematics 51, Section 1, Fall 24 Instructor: D.A. Levin Name YOU MUST SHOW YOUR WORK TO RECEIVE CREDIT. A CORRECT ANSWER WITHOUT SHOWING YOUR REASONING WILL NOT RECEIVE CREDIT. Problem Points Possible
Optimal Hiring of Cloud Servers A. Stephen McGough, Isi Mitrani. EPEW 2014, Florence
Optimal Hiring of Cloud Servers A. Stephen McGough, Isi Mitrani EPEW 2014, Florence Scenario How many cloud instances should be hired? Requests Host hiring servers The number of active servers is controlled
HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY. Daniel Harenberg [email protected]. University of Mannheim. Econ 714, 28.11.
COMPUTING EQUILIBRIUM WITH HETEROGENEOUS AGENTS AND AGGREGATE UNCERTAINTY (BASED ON KRUEGER AND KUBLER, 2004) Daniel Harenberg [email protected] University of Mannheim Econ 714, 28.11.06 Daniel Harenberg
CITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION
No: CITY UNIVERSITY LONDON BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION ENGINEERING MATHEMATICS 2 (resit) EX2005 Date: August
Ridge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS
Ridge Regression Patrick Breheny September 1 Patrick Breheny BST 764: Applied Statistical Modeling 1/22 Ridge regression: Definition Definition and solution Properties As mentioned in the previous lecture,
7 Time series analysis
7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are
