# Prediction and Confidence Intervals in Regression


Fall Semester, 2001. Statistics 621, Lecture 3. Robert Stine.

## Preliminaries

- Teaching assistants: see them in Room 3009 SH-DH. Hours are detailed in the syllabus.
- Assignment #1 is due Friday. Substantial penalty if it is not turned in until Monday.
- Feedback to me: in-class feedback form from the web page; cohort academic reps and quality circle; feedback questions on the class web page.

## Review of Key Points

The regression model adds assumptions to the equation:

0. The equation, stated either in terms of the average of the response,

   ave(Y_i | X_i) = β₀ + β₁X_i,

   or by adding an error term to that average, obtaining for each individual value of the response

   Y_i = ave(Y_i | X_i) + ε_i = β₀ + β₁X_i + ε_i.

1. Independent observations (independent errors).
2. Constant variance observations (equal error variance).
3. Normally distributed around the regression line, written as an assumption about the unobserved errors: ε_i ~ N(0, σ²).

Least squares gives estimates of the intercept, slope, and errors. The importance of the assumptions is evident in the context of prediction.

Checking the assumptions (summary on p. 46):

- Linearity (order IS important): scatterplots of the data and residuals (nonlinearity).
- Independence: plots highlighting trend (autocorrelation).
- Constant variance: residual plots (heteroscedasticity).
- Normality: quantile plots (outliers, skewness).

One outlier can exert much influence on a regression. A JMP-IN script illustrating the effect of an outlier on the fitted least squares regression line is available from my web page for the class. Another version of this script is packaged with the JMP software and installed on your machine as part of the standard installation.
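
The model and its least squares fit can be sketched in a few lines. This is a minimal illustration with simulated data, not the course's JMP-IN script; the intercept 3.0, slope 1.5, and error SD 2.0 are arbitrary choices.

```python
# Sketch: simulate Y_i = b0 + b1*X_i + eps_i with eps ~ N(0, sigma^2),
# then compute the least squares estimates and residuals by hand.
# All numbers here are illustrative assumptions, not course data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
eps = rng.normal(0, 2.0, n)          # errors ~ N(0, sigma^2), sigma = 2
y = 3.0 + 1.5 * x + eps              # Y_i = b0 + b1*X_i + eps_i

# Least squares slope = cov(X, Y) / var(X); intercept from the means
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Residuals are the estimated errors; check them against the assumptions
residuals = y - (b0 + b1 * x)
print(round(b0, 2), round(b1, 2))    # estimates land near the true 3.0 and 1.5
```

Plotting `residuals` against `x` (or a normal quantile plot of them) is the sketch-level analogue of the assumption checks listed above.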

## Review Questions from Feedback

Can I use the RMSE to compare models? Yes, but you need to ensure that the comparison is honest. As long as you have the same response in both models, the comparison of two RMSE values makes sense. If you change the response, such as through a log transformation of the response as in the second question of Assignment #1, you need to be careful.

[Figure: linear fit and log-log fit of MPG City versus Weight(lb).]

If you compare the nominal RMSEs shown for the two models, the linear fit (RMSE = 2.2) looks inferior, but it's not that much worse! The problem is that we have ignored the units; the two reported RMSE values are on different scales. The linear RMSE is 2.2 MPG, but the log-log RMSE is in log MPG, a very different scale. JMP-IN helps out if you ask it. The summary describing the accuracy of the fitted log-log model in the original units gives an RMSE of 1.9 MPG, close to the linear value.
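
The idea behind JMP's fit-measured-on-original-scale summary can be sketched directly: back-transform the log-log predictions before computing an RMSE, so both models are judged in MPG. The data below are simulated stand-ins for the MPG versus weight example; the coefficients 10.2, -0.9, and the 0.08 log-scale noise are assumptions.

```python
# Compare a linear fit and a log-log fit honestly: the log-log model's
# nominal RMSE is in log units, so back-transform its predictions and
# recompute an RMSE in the original units before comparing.
import numpy as np

rng = np.random.default_rng(1)
weight = rng.uniform(2000, 4500, 200)
mpg = np.exp(10.2 - 0.9 * np.log(weight) + rng.normal(0, 0.08, 200))

def ls_fit(x, y):
    """Least squares intercept and slope."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

# Linear fit: its RMSE is already in MPG
b0, b1 = ls_fit(weight, mpg)
rmse_linear = np.sqrt(np.mean((mpg - (b0 + b1 * weight)) ** 2))

# Log-log fit: its nominal RMSE is in log MPG...
a0, a1 = ls_fit(np.log(weight), np.log(mpg))
rmse_log = np.sqrt(np.mean((np.log(mpg) - (a0 + a1 * np.log(weight))) ** 2))

# ...so back-transform the predictions and measure accuracy in MPG
mpg_hat = np.exp(a0 + a1 * np.log(weight))
rmse_original_units = np.sqrt(np.mean((mpg - mpg_hat) ** 2))

print(rmse_linear, rmse_log, rmse_original_units)  # log RMSE is on a different scale
```

The tiny `rmse_log` is not evidence that the log-log model is far better; only `rmse_original_units` is comparable to `rmse_linear`.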

## What Does RMSE Tell You?

[JMP summary, "Fit Measured on Original Scale": Sum of Squared Error, Root Mean Square Error, RSquare, Sum of Residuals.]

The RMSE gives the SD of the residuals. The RMSE thus estimates the concentration of the data around the fitted equation.

[Figure: residuals from the linear equation plotted against Weight(lb).]

Visual inspection of the normal quantile plot of the residuals suggests the RMSE is around 2-3. If the data are roughly normal, then most of the residuals lie within about ±2 RMSE of their mean (at zero).

[Figure: normal quantile plot of the residuals.]
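
The empirical rule for residuals is easy to check numerically. This sketch uses simulated data (intercept 5, slope 2, error SD 3 are assumptions) and counts the share of residuals within ±2 RMSE.

```python
# With roughly normal errors, about 95% of residuals fall within
# +/- 2 RMSE of zero. Simulated data stand in for the MPG example.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
y = 5.0 + 2.0 * x + rng.normal(0, 3.0, 1000)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# RMSE is the SD of the residuals (df = n - 2 for intercept and slope)
rmse = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
share = np.mean(np.abs(resid) < 2 * rmse)
print(round(rmse, 2), round(share, 3))   # rmse near the true sigma, share near 0.95
```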

## Key Applications

Providing a margin for error:

- For a feature of the population: How does the spending on advertising affect the level of sales? Form confidence intervals as (estimate) ± 2 SE(estimate).
- For the prediction of another observation: Will sales exceed \$10,000,000 next month? Form prediction intervals as (prediction) ± 2 RMSE.

## Concepts and Terminology

Standard error: the same concept as in Stat 603/604, but now in regression models. We want a standard error for both the estimated slope and the intercept. Emphasize the slope, since it measures the expected impact of changes in the predictor upon values of the response. The expression is similar to that for SE(sample average), but with one important adjustment (page 91):

SE(estimated slope) = SE(β̂₁) = SD(error) / (√(sample size) · SD(predictor)) = σ / (√n · SD(X))

Confidence intervals and t-ratios for parameters: as in Stat 603/604, the empirical rule plus the CLT give intervals of the form

- Fitted intercept ± 2 SE(Fitted intercept)
- Fitted slope ± 2 SE(Fitted slope)

The t-ratio answers the question "How many SEs is the estimate away from zero?"
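
The slope standard-error formula can be checked by simulation: hold the design fixed, redraw the errors many times, and compare the spread of the slope estimates with σ / (√n · SD(X)). All numbers below (n = 50, σ = 2, true line 1 + 0.5x) are illustrative assumptions.

```python
# Check SE(slope) = sigma / (sqrt(n) * SD(X)) against the empirical
# standard deviation of slope estimates over repeated samples.
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 2.0
x = rng.uniform(0, 10, n)                 # fixed design across replications

slopes = []
for _ in range(4000):
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, n)
    slopes.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

se_formula = sigma / (np.sqrt(n) * x.std(ddof=0))   # the lecture's formula
se_empirical = np.std(slopes)                       # spread of actual estimates
print(round(se_formula, 4), round(se_empirical, 4))  # the two agree closely
```

Note how the formula rewards spread in the predictor: doubling SD(X) halves the standard error of the slope.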

## In-Sample Prediction versus Extrapolation

- In-sample prediction: able to check model properties.
- Out-of-sample prediction: hope the form of the model continues.

The statistical extrapolation penalty assumes the model form continues. See the example of the display space transformation (p ). Another approach to the extrapolation penalty is to compare models; e.g., hurricane path uncertainty is formed by comparing the predictions of different models.

## Confidence Intervals for the Regression Line

[Figure: fitted line with confidence bands, Y versus X.]

Answers the question "Where do I think the population regression line lies?"

- Fitted line ± 2 SE(Fitted line).
- The regression line is the average value of the response for chosen values of X.
- Statistical extrapolation penalty: the CI for the regression line grows wider as you get farther away from the mean of the predictor. Is this penalty reasonable or optimistic (i.e., too narrow)?
- JMP-IN: "Confid Curves Fit" option from the fit pop-up.
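
The widening of the confidence bands can be sketched with the standard textbook formula for the SE of the fitted line at a point x0, namely RMSE · sqrt(1/n + (x0 − x̄)² / Σ(x − x̄)²). The data and point x0 = 20 below are illustrative assumptions.

```python
# Sketch of the extrapolation penalty: the SE of the fitted line grows
# with the distance of x0 from the mean of the predictor.
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0, 10, n)
y = 2.0 + 1.0 * x + rng.normal(0, 1.5, n)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
rmse = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)

def se_fit(x0):
    """SE of the fitted line (the CI half-width is about 2 * se_fit)."""
    return rmse * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / sxx)

# CI half-width at the mean of X vs. well outside the data (x0 = 20)
print(2 * se_fit(x.mean()), 2 * se_fit(20.0))  # much wider far from the mean
```

At x0 = x̄ the second term vanishes and the SE reduces to RMSE/√n, the familiar SE of an average; away from x̄ the penalty term takes over.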

## Prediction Intervals for Individual Observations

[Figure: fitted line with prediction bands, Y versus X.]

Answers the question "Where do I think a single new observation will fall?"

- The interval captures a single new random observation rather than the average, so it must accommodate random variation about the fitted model.
- Holds about 95% of the data surrounding the fitted regression line.
- Approximate in-sample form: Fitted line ± 2 RMSE.
- Typically more useful than CIs for the regression line: more often we are trying to predict a new observation than wondering where the average of a collection of future values lies.
- JMP-IN: "Confid Curves Indiv" option from the fit pop-up.

## Goodness-of-Fit and R²

- R² is the proportion of variation in the response captured by the fitted model and is thus a relative measure of goodness of fit; the RMSE is an absolute measure of accuracy in the units of Y.
- Computed from the variability in the response and residuals (p. 92) as a ratio of sums of squared deviations.
- Related to the RMSE: RMSE² approximately equals (1 − R²) · Var(Y).
- R² is the square of the correlation between the predictor X and response Y: corr(X, Y)² = R². This is another way to answer "What's a big correlation?"
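
Both R² identities above can be verified numerically. This sketch uses simulated data (the line 1 + 0.8x and error SD 2 are assumptions).

```python
# Verify on simulated data: corr(X, Y)^2 equals R^2 exactly, and
# RMSE^2 is approximately (1 - R^2) * Var(Y).
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, n)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
corr2 = np.corrcoef(x, y)[0, 1] ** 2
rmse2 = np.sum(resid ** 2) / (n - 2)

print(round(r2, 4), round(corr2, 4))                             # identical
print(round(rmse2, 3), round((1 - r2) * np.var(y, ddof=1), 3))   # nearly equal
```

The RMSE² relation is only approximate because RMSE divides by n − 2 while the sample variance divides by n − 1; for large n the difference is negligible.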

## Examples for Today

The ideal regression model (Utopia.jmp, page 47). Which features of a regression change when the sample size grows? Simulate data from a normal population, then add more observations (use the formula to generate more values).

- Some features stay about the same: intercept, slope, RMSE, R². (Note: each of these estimates a population feature.)
- Standard errors shrink, and confidence intervals become narrower.

Philadelphia housing prices (Phila.jmp, page 62). How do crime rates impact the average selling price of houses?

- The initial plot shows that Center City is leveraged (unusual in X).
- The initial analysis with all the data finds a \$577 impact per crime (p. 64); the residuals show a lack of normality (p. 65).
- Without Center City, the regression has a much steeper decay, \$2289 per crime (p. 66); the residuals remain non-normal (p. 67).
- Why is Center City an outlier? What do we learn from this point? An alternative analysis with a transformation suggests it may not be so unusual (see pages 68-70).
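
The Utopia demonstration can be mimicked in a few lines (simulated here, not from the course's Utopia.jmp file; the true line 4 + 2x and error SD 1 are assumptions): refit at growing sample sizes and watch the slope, RMSE, and SE behave differently.

```python
# As n grows, the slope estimate and RMSE settle near their population
# values while the slope's standard error shrinks toward zero.
import numpy as np

rng = np.random.default_rng(6)

def fit(n):
    """Simulate n points from the same population and refit by least squares."""
    x = rng.uniform(0, 10, n)
    y = 4.0 + 2.0 * x + rng.normal(0, 1.0, n)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    rmse = np.sqrt(np.sum(resid ** 2) / (n - 2))
    se_slope = rmse / (np.sqrt(n) * x.std(ddof=0))
    return b1, rmse, se_slope

for n in (25, 100, 1600):
    b1, rmse, se = fit(n)
    print(n, round(b1, 2), round(rmse, 2), round(se, 4))
```

The slope and RMSE columns hover near the population values 2 and 1 at every n; only the SE column shrinks, which is why the confidence intervals narrow.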

Housing construction (Cottages.jmp, page 89). How much can a builder expect to profit from building larger homes?

- A highly leveraged observation, the "special cottage" (p. 89).
- Contrast confidence intervals with prediction intervals; note the role of the assumptions of constant variance and normality.
- Model with the special cottage: R² ≈ 0.8, RMSE ≈ 3500 (p. 90). Predictions suggest building is profitable.
- Model without the special cottage: R² [value missing], RMSE ≈ 3500 (pp. 94-95). Predictions are useless.
- Recall the demonstration with the JMP-IN rubber-band regression line. Do we keep the outlier, or do we exclude the outlier?

Liquor sales and display space (Display.jmp, page 99). How precise is our estimate of the number of display feet? Can this model be used to predict sales for a promotion with 20 feet?

- Lack of sensitivity of the optimal display footage to the transformation: the log model gives 2.77 feet (pp. 19-20), whereas the reciprocal gives 2.57 feet (p. 22).
- 95% confidence interval for the optimal footage (p. 100): under the log model it is [2.38, 3.17].
- Predictions out to 20 feet are very sensitive to the transformation. The prediction interval at 20 feet is far from the range of the data, and the log interval does not include the reciprocal prediction (p. 111).
- Management has to decide which model to use. Have we captured the true uncertainty?

## Key Take-Away Points

Standard error, again used in inference:

- Confidence interval for the slope is (fitted slope) ± 2 SE(slope).
- As the sample size grows, the confidence interval shrinks.
- If 0 is outside the interval, the predictor is significant.
- The formula for the standard error is different from that for an average: it depends on the variation in the predictor. More variation in the predictor, more precise slope.

R² as a measure of goodness of fit:

- Square of the usual correlation between predictor and response.
- Popular as the percentage of explained variation.

Prediction intervals, used to measure the accuracy of predictions:

- Formed as the predicted value (from the equation) with a margin for error set as plus or minus twice the SD of the residuals: (predicted value) ± 2 RMSE (in-sample only!).
- RMSE provides a baseline measure of predictive accuracy.

In-sample prediction vs. out-of-sample extrapolation:

- Models are more reliable in-sample.
- The statistical extrapolation penalty is often too small because it assumes the model equation extends beyond the range of observation.

Role of outliers:

- A single leveraged observation can skew the fit to the rest of the data.
- Outliers may be very informative; it is not a simple choice to delete them.

## Next Time

Multiple regression: combining the information from several predictors to model more variation, the only way to more accurate predictions.


### Linear Model Selection and Regularization

Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

### AP Statistics Section :12.2 Transforming to Achieve Linearity

AP Statistics Section :12.2 Transforming to Achieve Linearity In Chapter 3, we learned how to analyze relationships between two quantitative variables that showed a linear pattern. When two-variable data

### Weighted least squares

Weighted least squares Patrick Breheny February 7 Patrick Breheny BST 760: Advanced Regression 1/17 Introduction Known weights As a precursor to fitting generalized linear models, let s first deal with

### Ch. 12 The Analysis of Variance. can be modelled as normal random variables. come from populations having the same variance.

Ch. 12 The Analysis of Variance Residual Analysis and Model Assessment Model Assumptions: The measurements X i,j can be modelled as normal random variables. are statistically independent. come from populations

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### t-tests and F-tests in regression

t-tests and F-tests in regression Johan A. Elkink University College Dublin 5 April 2012 Johan A. Elkink (UCD) t and F-tests 5 April 2012 1 / 25 Outline 1 Simple linear regression Model Variance and R

### Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### RNR / ENTO Assumptions for Simple Linear Regression

74 RNR / ENTO 63 --Assumptions for Simple Linear Regression Statistical statements (hypothesis tests and CI estimation) with least squares estimates depends on 4 assumptions:. Linearity of the mean responses

Lecture 11: Outliers and Influential data Regression III: Advanced Methods William G. Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Outlying Observations: Why pay attention?

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson

fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson Contents What Are These Demos About? How to Use These Demos If This Is Your First Time Using Fathom Tutorial: An Extended Example