Statistics in Geophysics: Linear Regression II
|
|
- Tiffany Melanie Stevenson
- 7 years ago
- Views:
Transcription
1 Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28
2 Model definition Suppose we have the following model under consideration: y = Xβ + ɛ, where y = (y 1..., y n ) is an n 1 vector of observations on the response, β = (β 0, β 1,..., β k ) is a (k + 1) 1 vector of parameters, ɛ = (ɛ 1,..., ɛ n ) is an n 1 vector of random errors, and X is the n (k + 1) design matrix with X = 1 x 11 x 1k... 1 x n1 x nk. Winter Term 2013/14 2/28
3 Model assumptions The following assumptions are made: 1 E(ɛ) = 0. 2 Cov(ɛ) = σ 2 I. 3 The design matrix has full column rank, that is, rank(x) = k + 1 = p. 4 The normal regression model is obtained if additionally ɛ N (0, σ 2 I). For stochastic covariates these assumptions are to be understood conditionally on X. It follows that E(y) = Xβ and Cov(y) = σ 2 I. In case of assumption 4, we have y N (Xβ, σ 2 I). Winter Term 2013/14 3/28
4 Modelling the effects of continuous covariates We can fit nonlinear relationships between continuous covariates and the response within the scope of linear models. Two simple methods for dealing with nonlinearity: 1 Variable transformation; 2 Polynomial regression. Sometimes it is customary to transform the continuous response as well. Winter Term 2013/14 4/28
5 Modelling the effects of categorical covariates We might want to include a categorical variable with two or more distinct levels. In such a case we cannot set up a continuous scale for the categorical variable. A remedy is to define new covariates, so-called dummy variables, and estimate a separate effect for each category of the original covariate. We can deal with c categories by the introduction of c 1 dummy variables. Winter Term 2013/14 5/28
6 Example: Turkey data weightp agew origin G G G G V V V V W W W W W Turkey weights in pounds, ages in weeks, and origin (Georgia (G); Virgina (V); Wisconsin (W)), of 13 Thanksgiving turkeys. Winter Term 2013/14 6/28
7 Dummy coding for categorical covariates Given a covariate x {1,..., c} with c categories, we define the c 1 dummy variables x i1 = { 1 xi = 1 0 otherwise,... x i,c 1 = { 1 xi = c 1 0 otherwise, for i = 1..., n, and include them as explanatory variables in the regression model y i = β 0 + β 1 x i β i,c 1 x i,c ɛ i. For reasons of identifiability, we omit the dummy variable for category c, where c is the reference category. Winter Term 2013/14 7/28
8 Design matrix for the turkey data using dummy coding > X (Intercept) agew originv originw Winter Term 2013/14 8/28
9 Interactions between covariates An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate. Consider the following model between a response y and two covariates x 1 and x 2 : y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 1 x 2 + ɛ, where the term β 3 x 1 x 2 is called an interaction between x 1 and x 2. The terms β 1 x 1 and β 2 x 2 depend on only one variable and are called main effects. Winter Term 2013/14 9/28
10 Least squares estimation of regression coefficients The error sum of squares is ɛ ɛ = (y Xβ) (y Xβ) = y y β X y y Xβ + β X Xβ = y y 2β X y + β X Xβ. The least squares estimator of β is the value ˆβ, which minimizes ɛ ɛ. Minimizing ɛ ɛ with respect to β yields ˆβ = (X X) 1 X y, which is obtained irrespective of any distribution properties of the errors. Winter Term 2013/14 10/28
11 Maximum likelihood estimation of regression coefficients Assuming normally distributed errors yields the likelihood ( L(β, σ 2 1 ) = exp 1 ) (2πσ 2 n/2 ) 2σ 2 (y Xβ) (y Xβ). The log-likelihood is given by l(β, σ 2 ) = n 2 log(2π) n 2 log(σ2 ) 1 2σ 2 (y Xβ) (y Xβ). Maximizing l(β, σ 2 ) with respect to β is equivalent to minimizing (y Xβ) (y Xβ), which is the least squares criterion. The MLE, ˆβ ML, therefore is identical to the least squares estimator. Winter Term 2013/14 11/28
12 Fitted values and residuals Based on ˆβ = (X X) 1 X y, we can estimate the (conditional) mean of y by Ê(y) = ŷ = Xˆβ. Substituting the least squares estimator further results in ŷ = X(X X) 1 X y = Hy, where the n n matrix H is called the hat matrix. The residuals are ˆɛ = y ŷ = y Hy = (I H)y. Winter Term 2013/14 12/28
13 Estimation of the error variance Maximization of the log-likelihood with respect to σ 2 yields ˆσ 2 ML = 1 n (y Xˆβ) (y Xˆβ) = 1 n (y ŷ) (y ŷ) = 1 n ˆɛ ˆɛ. Since E(ˆσ ML) 2 = n p n the MLE of σ 2 is biased. σ 2, An unbiased estimator is ˆσ 2 = 1 n p ˆɛ ˆɛ. Winter Term 2013/14 13/28
14 Properties of the least squares estimator For the least squares estimator we have E(ˆβ) = β. Cov(ˆβ) = σ 2 (X X) 1. Gauss-Markov Theorem: Among all linear and unbiased estimators ˆβ L, the least squares estimator has minimal variances, implying Var( ˆβ j ) Var( ˆβ L j ), j = 0,..., k. If ɛ N (0, σ 2 I), then ˆβ N (β, σ 2 (X X) 1 ). Winter Term 2013/14 14/28
15 ANOVA table for multiple linear regression and R 2 Source Degrees of freedom Sum of squares Mean square F -value of variation (df) (SS) (MS) Regression k SSR MSR = SSR/k Residual n p = n (k + 1) SSE ˆσ 2 = SSE n p Total n 1 SST The multiple coefficient of determination is still computed as R 2 = SST SSR, but is no longer the square of the Pearson correlation between the response and any of the predictor variables. MSR ˆσ 2 Winter Term 2013/14 15/28
16 Interval estimation and tests We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression parameters β. A requirement for the construction of tests and confidence intervals is the assumption of normally distributed errors. However, tests and confidence intervals are relatively robust to mild departures from the normality assumption. Moreover, tests and confidence intervals, derived under the assumption of normality, remain valid for large sample size even with non-normal errors. Winter Term 2013/14 16/28
17 Testing linear hypotheses Hypotheses: 1 General linear hypothesis: H 0 : Cβ = d against H 1 : Cβ d, where C is a r p matrix with rank(c) = r p (r linear independent restrictions) and d is a p 1 vector. 2 Test of significance (t-test): H 0 : β j = 0 against H 1 : β j 0. Winter Term 2013/14 17/28
18 Testing linear hypotheses Hypotheses: 3 Composite test of subvector: H 0 : β 1 = 0 against H 1 : β Test for significance of regression: H 0 : β 1 = β 2 = = β k = 0 against H 1 : β j 0 for at least one j {1,..., k}. Winter Term 2013/14 18/28
19 Testing linear hypotheses Test statistics: 1 F = 1 r (Cˆβ d) (ˆσ 2 C(X X) 1 C ) 1 (Cˆβ d) F r,n p. 2 t j = ˆβ j ˆσ ˆβj t n p, where ˆσ ˆβ j denotes the standard error of ˆβ j. 3 F = 1 r (ˆβ 1 ) Ĉov(ˆβ 1 ) 1 (ˆβ 1 ) F r,n p. 4 F = n p k SSR SSE = n p k R 2 1 R 2 F k,n p. Winter Term 2013/14 19/28
20 Testing linear hypotheses Critical values: Reject H 0 in the case of: 1 F > F r,n p (1 α). 2 t > t n p (1 α/2). 3 F > F r,n p (1 α). 4 F > F k,n p (1 α) Winter Term 2013/14 20/28
21 Confidence intervals and regions for regression coefficients Confidence interval for β j : A confidence interval for β j with level 1 α is given by [ ˆβ j t n p (1 α/2)ˆσ ˆβ j, ˆβ j + t n p (1 α/2)ˆσ ˆβj ]. Confidence region for subvector β 1 : A confidence ellipsoid for β 1 = (β 1,..., β r ) with level 1 α is given by { β 1 : 1 r (ˆβ 1 β 1 ) Ĉov(ˆβ } 1 ) 1 (ˆβ 1 β 1 ) F r,n p (1 α). Winter Term 2013/14 21/28
22 Prediction intervals Confidence interval for the (conditional) mean of a future observation: A confidence interval for E(y 0 ) of a future observation y 0 at location x 0 with level 1 α is given by x 0 ˆβ ± t n p (1 α/2)ˆσ(x 0 (X X) 1 x 0 ) 1/2. Prediction interval for a future observation: A prediction interval for a future observation y 0 at location x 0 with level 1 α is given by x 0 ˆβ ± t n p (1 α/2)ˆσ(1 + x 0 (X X) 1 x 0 ) 1/2. Winter Term 2013/14 22/28
23 The corrected coefficient of determination We already defined the coefficient of determination, R 2, as a measure for the goodness-of-fit to the data. The use of R 2 is limited, since it will never decrease with the addition of a new covariate into the model. The corrected coefficient of determination, Radj 2, adjusts for this problem, by including a correction term for the number of parameters. It is defined by R 2 adj = 1 n 1 n p (1 R2 ). Winter Term 2013/14 23/28
24 Akaike information criterion The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference. The AIC is defined by AIC = 2l(ˆβ ML, σ 2 ML) + 2(p + 1), where l(ˆβ ML, σml 2 ) is the maximum value of the log-likelihood. Smaller values of the AIC correspond to a better model fit. Winter Term 2013/14 24/28
25 Bayesian information criterion The Bayesian information criterion (BIC) is defined by BIC = 2l(ˆβ ML, σ 2 ML) + log(n)(p + 1). The BIC multiplied by 1/2 is also known as Schwartz criterion. Smaller values of the BIC correspond to a better model fit. The BIC penalizes complex models much more than the AIC. Winter Term 2013/14 25/28
26 Practical use of model choice criteria To select the most promising models from candidate models, we first obtain a preselection of potential models. All potential models can now be assessed with the aid of one of the various model choice criteria (AIC, BIC). This method is not always practical, since the number of regressor variables and modelling variants can be very large in many applications. In this case, we can use the following partially heuristic methods. Winter Term 2013/14 26/28
27 Practical use of model choice criteria II Complete model selection: In case that the number of predictor variables is not too large, we can determine the best model with the leaps-and-bounds algorithm. Forward selection: 1 Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm. 2 The variable which offers the greatest reduction of a preselected model choice criterion is chosen. 3 The algorithm terminates if no further reduction is possible. Winter Term 2013/14 27/28
28 Practical use of model choice criteria III Backward elimination: 1 Backward elimination starts with the full model containing all predictor variables. 2 Subsequently, in every iteration, the covariate which provides the greatest reduction of the model choice criterion is eliminated from the model. 3 The algorithm terminates if no further reduction is possible. Stepwise selection: Stepwise selection is a combination of forward selection and backward elimination. In every iteration of the algorithm, a predictor variable may be added to the model or removed from the model. Winter Term 2013/14 28/28
Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationNotes on Applied Linear Regression
Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationUniversity of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.
University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationIAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More information5. Multiple regression
5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationOne-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups
One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More informationANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More informationPart II. Multiple Linear Regression
Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a y-variable relates to two or more xvariables (or transformations
More informationWhat s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationWeb-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationMultivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
More informationEstimation of σ 2, the variance of ɛ
Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated
More informationQuadratic forms Cochran s theorem, degrees of freedom, and all that
Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationGeneral Regression Formulae ) (N-2) (1 - r 2 YX
General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard
More informationSYSTEMS OF REGRESSION EQUATIONS
SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More information1 Theory: The General Linear Model
QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the
More informationRegression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More information1 Another method of estimation: least squares
1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i
More informationElements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More information1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationOne-Way Analysis of Variance (ANOVA) Example Problem
One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More information3.1 Least squares in matrix form
118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression
More informationChapter 6: Point Estimation. Fall 2011. - Probability & Statistics
STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationBayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationHow to do hydrological data validation using regression
World Bank & Government of The Netherlands funded Training module # SWDP - 37 How to do hydrological data validation using regression New Delhi, February 00 CSMRS Building, 4th Floor, Olof Palme Marg,
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationEmpirical Model-Building and Response Surfaces
Empirical Model-Building and Response Surfaces GEORGE E. P. BOX NORMAN R. DRAPER Technische Universitat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK Invortar-Nf.-. Sachgsbiete: Standort: New York John Wiley
More informationMULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis
Journal of tourism [No. 8] MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM Assistant Ph.D. Erika KULCSÁR Babeş Bolyai University of Cluj Napoca, Romania Abstract This paper analysis
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationPOLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.
Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationLinear Regression. Guy Lebanon
Linear Regression Guy Lebanon Linear Regression Model and Least Squares Estimation Linear regression is probably the most popular model for predicting a RV Y R based on multiple RVs X 1,..., X d R. It
More informationCollege Readiness LINKING STUDY
College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
More informationFalse. Model 2 is not a special case of Model 1, because Model 2 includes X5, which is not part of Model 1. What she ought to do is estimate
Sociology 59 - Research Statistics I Final Exam Answer Key December 6, 00 Where appropriate, show your work - partial credit may be given. (On the other hand, don't waste a lot of time on excess verbiage.)
More informationRegression step-by-step using Microsoft Excel
Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationTESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS. Abstract
TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS GARY G. VENTER Abstract The use of age-to-age factors applied to cumulative losses has been shown to produce least-squares optimal reserve estimates when certain
More informationAuxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationThe Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationElementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationChapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.
Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationTechnical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationSmoothing and Non-Parametric Regression
Smoothing and Non-Parametric Regression Germán Rodríguez grodri@princeton.edu Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More information