# Yiming Peng, Department of Statistics. February 12, 2013

## Transcription

1 Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013

3 Presentation and Data: Short Courses, Regression Analysis Using JMP. Download Data to Desktop.

4 Presentation Outline

- Simple Linear Regression
- Multiple Linear Regression
- Questions/Comments

5 Presentation Outline (if time permits)

- Regression with Binary Response Variables
- Individual Goals/Interests

6 Simple Linear Regression

1. Definition
2. Scatterplot and Correlation
3. Model and Estimation
4. Coefficient of Determination ($R^2$)
5. Assumptions
6. Caution
7. Example

7 Simple Linear Regression (SLR) is used to study the relationship between a variable of interest and another variable.

- Both variables must be quantitative (continuous).
- The variable of interest is known as the Response or Dependent Variable (Y).
- The other variable is known as the Explanatory or Independent Variable (X).

8 Objectives

- How is the response variable affected by changes in the explanatory variable? We would like a numerical description of how both variables vary together.
- Determine the significance of the explanatory variable in explaining the variability in the response (not necessarily causation).
- Predict values of the response variable for given values of the explanatory variable.

9 Scatterplots are used to graphically examine the relationship between two quantitative variables.

[Table: example data with columns Student, Beers, BAC]

10 After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern and for deviations from that pattern.

- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the form
- Outliers: deviations from the pattern, which can cause significant changes in the direction of the overall pattern

11 [Figure: four scatterplot panels titled No Relationship, Non-Linear Relationship, Positive Linear Relationship, and Negative Linear Relationship]

12 An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

13 Correlation

Measures the direction and strength of a linear relationship between two quantitative variables.

- Pearson Correlation Coefficient: assumes normality
- Calculation: $r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
- Spearman's Rho and Kendall's Tau are used for non-normal quantitative variables.

14 Properties of Pearson Correlation Coefficient

- $-1 \le r \le 1$
- Positive values of r: as one variable increases, the other increases
- Negative values of r: as one variable increases, the other decreases
- Values close to 0 indicate no linear relationship between the two variables
- Values close to +1 or -1 indicate strong linear relationships
- r doesn't distinguish explanatory and response variables
- r has no unit

Important note: Correlation does not imply causation.

15 Pearson Correlation Coefficient: General Guidelines

- $0 \le |r| < 0.2$: Very Weak linear relationship
- $0.2 \le |r| < 0.4$: Weak linear relationship
- $0.4 \le |r| < 0.6$: Moderate linear relationship
- $0.6 \le |r| < 0.8$: Strong linear relationship
- $0.8 \le |r| < 1.0$: Very Strong linear relationship
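As a rough illustration of the correlation calculation and the guideline bands above, here is a minimal Python sketch. The data are made up for illustration, and `pearson_r` and `strength_label` are hypothetical helper names, not JMP functions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength_label(r):
    """Map |r| to the guideline labels listed on the slide."""
    a = abs(r)
    if a < 0.2:
        return "very weak"
    if a < 0.4:
        return "weak"
    if a < 0.6:
        return "moderate"
    if a < 0.8:
        return "strong"
    return "very strong"


# Made-up illustrative data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), strength_label(r))
```

For these toy values r is about 0.77, which falls in the "strong" band. Swapping `x` and `y` gives the same r, which matches the property that r does not distinguish explanatory from response variables.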

16 The Simple Linear Regression Model

Basic Model: response = deterministic + stochastic

- Deterministic: model of the linear relationship between X and Y
- Stochastic: variation, uncertainty, and miscellaneous factors

Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

- $y_i$ = value of the response variable for the $i$th observation
- $x_i$ = value of the explanatory variable for the $i$th observation
- $\beta_0$ = y-intercept
- $\beta_1$ = slope
- $\varepsilon_i$ = random error, iid Normal$(0, \sigma^2)$

17 But which line best describes our data?

18 The least-squares regression line is the unique line for which the sum of the squared vertical (y) distances between the data points and the line is the smallest possible. For this line, the signed vertical distances also sum to zero. The distances between the points and the line are squared so that all are positive values and can be properly added.

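19 Least Squares Estimation

- Estimates: $\hat{\beta}_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- Predicted Values: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
- Residuals: $e_i = y_i - \hat{y}_i$

The distinction between explanatory and response variables is essential in regression.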
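A minimal Python sketch of least-squares estimation for one explanatory variable, using the standard closed-form estimates. The data are made up, and `fit_slr` is a hypothetical helper name.

```python
def fit_slr(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up illustrative data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_slr(x, y)
predicted = [b0 + b1 * xi for xi in x]               # predicted values
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

print(b0, b1)
print(sum(residuals))  # essentially zero, up to floating-point rounding
```

Note that swapping the roles of `x` and `y` generally gives a different line, which is why the distinction between explanatory and response variables matters here.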

20 Interpretation of Parameters

- $\beta_0$: Value of Y when X = 0
- $\beta_1$: Change in the value of Y with an increase of 1 unit of X (also known as the slope of the line)

Hypothesis Testing

- $\beta_0$: Test whether the true y-intercept is different from 0. Null Hypothesis: $\beta_0 = 0$. Alternative Hypothesis: $\beta_0 \neq 0$.
- $\beta_1$: Test whether the slope is different from 0. Null Hypothesis: $\beta_1 = 0$. Alternative Hypothesis: $\beta_1 \neq 0$.

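21 Analysis of Variance (ANOVA) for Simple Linear Regression

| Source | DF | Sum of Squares | Mean Square | F Ratio | P-value |
|--------|-----|----------------|-----------------|------------------|---------|
| Model | 1 | SSR | MSR = SSR/1 | $F_1$ = MSR/MSE | $P(F > F_1)$, where $F \sim F_{1,\,n-2}$ |
| Error | n-2 | SSE | MSE = SSE/(n-2) | | |
| Total | n-1 | SST | | | |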
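The entries of the ANOVA table can be computed directly from the fitted line. The sketch below uses made-up data and a hypothetical helper `anova_slr`. It stops at the F ratio because the Python standard library has no F distribution, so the p-value would need a statistics package.

```python
def anova_slr(x, y):
    """Sums of squares, mean squares, and F ratio for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((f - mean_y) ** 2 for f in fitted)         # model sum of squares
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # error sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)            # total sum of squares

    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Made-up illustrative data (not from the presentation).
table = anova_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in table.items():
    print(name, round(value, 4))
```

The output also shows the partition SST = SSR + SSE, which is what the Total row of the table summarizes.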

23 Coefficient of Determination ($R^2$)

- Percent variation in the response variable (Y) that is explained by the least squares regression line
- $0 \le R^2 \le 1$
- Calculation: $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

Prediction
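A short Python sketch of the $R^2$ calculation, assuming the usual definition $R^2 = 1 - SSE/SST$. It also checks the standard fact that, with a single explanatory variable, $R^2$ equals the square of the Pearson correlation. The data and the helper name `r_squared` are illustrative assumptions.

```python
import math


def r_squared(y, fitted):
    """Proportion of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up illustrative data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares fit of y on x.
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * xi for xi in x]

r2 = r_squared(y, fitted)
r = sxy / math.sqrt(sxx * syy)
print(r2, r ** 2)  # the two agree up to rounding
```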

24 Assumptions of Simple Linear Regression

1. Independence
   - Residuals are independent of each other
   - Related to the method in which the data were collected, or to time-related data
   - Tested by plotting time collected vs. residuals
   - Parametric test: Durbin-Watson Test
2. Constant Variance
   - Variance of the residuals is constant
   - Tested by plotting predicted values vs. residuals
   - Parametric test: Brown-Forsythe Test

25 Assumptions of Simple Linear Regression

3. Normality
   - Residuals are normally distributed
   - Tested by evaluating histograms and normal-quantile plots of residuals
   - Parametric test: Shapiro-Wilk test
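As one concrete example of these checks, the Durbin-Watson statistic for the independence assumption is simple to compute by hand: it is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below assumes the residuals are listed in the order the data were collected; the residual values are made up and `durbin_watson` is a hypothetical helper name.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered by collection time."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals, in collection order (not from the presentation).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
dw = durbin_watson(residuals)
print(round(dw, 3))
```

The statistic lies between 0 and 4. Values near 2 are consistent with no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A formal conclusion needs critical values or a p-value from statistical software.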

26 Constant Variance: Plot of Fitted Values vs. Residuals

[Figure: two residual plots against Predicted Values. Good Residual Plot: No Pattern. Bad Residual Plot: Variability Increasing.]

27 Normality: Histogram and Q-Q Plot of Residuals

[Figure: two sets of plots. Normal Assumption Appropriate. Normal Assumption Not Appropriate.]

28 Some Remedies

- Non-Constant Variance: Weighted Least Squares
- Non-normality: Box-Cox Transformation
- Dependence: Auto-Regressive Models

29 Caution

- Do not use a regression on inappropriate data. Use residual plots for help.
  - Pattern in the residuals
  - Presence of large outliers
  - Clumped data falsely appearing linear
- Recognize when the correlation/regression is performed on averages.
- A relationship, however strong, does not itself imply causation.
- Beware of lurking variables.

30 A lurking variable is a variable not included in the study design that does or may have an effect on the variables studied. Lurking variables can falsely suggest a relationship.

What is the lurking variable here? Some are more obvious than others.

- Strong positive association between the number of firefighters at a fire site and the amount of damage a fire does.
- Negative association between moderate amounts of wine-drinking and death rates from heart disease in developed nations.

31 Example

Dataset: Fitness

A researcher was interested in the relationship between oxygen uptake and a number of potential explanatory variables separately, including age, weight, running time, running pulse rate, rest pulse rate, and max pulse rate.

Filename: Fitness0.jmp (JMP sample data)

33 Multiple Linear Regression

1. Definition
2. Categorical Explanatory Variables
3. Model and Estimation
4. Adjusted Coefficient of Determination
5. Assumptions
6. Model Selection
7. Example

34 Multiple Linear Regression: Explanatory Variables

Two Types: Continuous and Categorical

- Continuous Predictor Variables
  - Examples: Time, Grade Point Average, Test Score, etc.
  - Coded with one parameter: $\beta_\# x_\#$
- Categorical Predictor Variables
  - Examples: Sex, Political Affiliation, Marital Status, etc.
  - Actual value assigned to a category is not important. Ex) Sex: Male/Female, M/F, 1/2, 0/1, etc.
  - Coded differently than continuous variables
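One common way to code a categorical predictor is with 0/1 indicator (dummy) columns, one per category except a reference category. The sketch below illustrates that scheme only; other coding schemes exist, software defaults vary, and `dummy_code` is a hypothetical helper name.

```python
def dummy_code(values, reference):
    """Build 0/1 indicator columns for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Made-up illustrative data (not from the presentation).
marital_status = ["single", "married", "divorced", "married"]
columns = dummy_code(marital_status, reference="single")

for level, column in columns.items():
    print(level, column)
```

A variable with k categories becomes k - 1 columns, so it uses k - 1 parameters rather than one. The labels themselves ("M"/"F", 1/2, and so on) do not affect the indicator columns, only which observations share a category.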

35 Multiple Linear Regression

Similar to simple linear regression, except now there is more than one explanatory variable, which may be continuous and/or categorical.

Model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$

- $y_i$ = value of the response variable for the $i$th observation
- $x_{\#i}$ = value of the explanatory variable # for the $i$th observation
- $\beta_0$ = y-intercept
- $\beta_\#$ = parameter corresponding to explanatory variable #
- $\varepsilon_i$ = random error, iid Normal$(0, \sigma^2)$

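36 Multiple Linear Regression: Least Squares Estimation

- Predicted Values: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi}$
- Residuals: $e_i = y_i - \hat{y}_i$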

37 Multiple Linear Regression: Interpretation of Parameters

- $\beta_0$: Value of Y when every explanatory variable equals 0
- $\beta_\#$: Change in the value of Y with an increase of 1 unit of $X_\#$ in the presence of the other explanatory variables

Hypothesis Testing

- $\beta_0$: Test whether the true y-intercept is different from 0. Null Hypothesis: $\beta_0 = 0$. Alternative Hypothesis: $\beta_0 \neq 0$.
- $\beta_\#$: Test of whether the value change in Y with an increase of 1 unit in $X_\#$ is different from 0 in the presence of the other explanatory variables. Null Hypothesis: $\beta_\# = 0$. Alternative Hypothesis: $\beta_\# \neq 0$.

38 Multiple Linear Regression: Adjusted Coefficient of Determination (ADJ $R^2$)

- $R^2$ is the percent variation in the response variable (Y) that is explained by the least squares regression line with explanatory variables $x_1, x_2, \ldots, x_p$.
- Calculation of $R^2$: $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
- The $R^2$ value never decreases as explanatory variables are added to the model.
- The adjusted $R^2$ introduces a penalty for the number of explanatory variables.
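The penalty can be made concrete with the usual textbook formula, adjusted $R^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$, where n is the number of observations and p the number of explanatory variables. The sketch below assumes that formula; the input values are made up and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p explanatory variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up values: a second predictor raises R^2 only slightly.
one_predictor = adjusted_r2(0.60, n=5, p=1)
two_predictors = adjusted_r2(0.61, n=5, p=2)

print(round(one_predictor, 4))
print(round(two_predictors, 4))
```

In this toy case the raw $R^2$ goes up from 0.60 to 0.61, but the adjusted value goes down, which is the behavior the penalty is meant to produce when an added variable contributes little.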

39 Multiple Linear Regression: Other Model Evaluation Statistics

- Akaike Information Criterion (AIC or AICc)
- Schwarz Information Criterion (SIC)
- Bayesian Information Criterion (BIC)
- Mallows' $C_p$
- Prediction Sum of Squares (PRESS)

40 Multiple Linear Regression: Model Selection

2 Goals:

- Complex enough to fit the data well
- Simple to interpret, does not overfit the data

Study the effect of each explanatory variable on the response Y:

- Continuous Variable: Graph Y versus X
- Categorical Variable: Boxplot of Y for categories of X

41 Multiple Linear Regression: Model Selection cont.

Multicollinearity

- Correlations among explanatory variables, resulting in an increase in the variance of the coefficient estimates
- Reduces the apparent significance of the affected variables
- Occurs when several explanatory variables are used in the model

Diagnostics

- Variance Inflation Factor (VIF)
- Correlation Matrix
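The Variance Inflation Factor for explanatory variable j is usually defined as $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing variable j on the other explanatory variables. With exactly two explanatory variables, $R_j^2$ is just their squared correlation. The sketch below assumes that definition; `vif` is a hypothetical helper name and the correlation value is made up.

```python
def vif(r2_j):
    """Variance inflation factor given R^2 of predictor j regressed on the others."""
    return 1.0 / (1.0 - r2_j)


# Made-up example: two predictors with correlation 0.9 between them.
correlation = 0.9
print(round(vif(correlation ** 2), 3))  # variance inflated roughly fivefold
print(vif(0.0))                         # uncorrelated predictors: no inflation
```

A VIF of 1 means no inflation. Larger values mean the coefficient's variance is inflated by that factor relative to the uncorrelated case. Cutoffs such as 5 or 10 are often quoted as rules of thumb rather than strict thresholds.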

42 Multiple Linear Regression: Algorithmic Model Selection

- Backward Selection: Start with all explanatory variables in the model and remove those that are insignificant.
- Forward Selection: Start with no explanatory variables in the model and add the best explanatory variables one at a time.
- Stepwise Selection: Start with two forward selection steps, then alternate backward and forward selection steps until there are no variables to add or remove.
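To make the forward selection idea concrete, here is a small self-contained Python sketch. It is not JMP's procedure: the slides do not state the entry criterion, so this version assumes a simple rule (add the variable that most reduces the error sum of squares, and stop when the gain in $R^2$ falls below a hypothetical `min_gain` threshold). Selection based on p-values or information criteria would follow the same loop structure. The data and all function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def sse(columns, y):
    """Error sum of squares for a least-squares fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection; min_gain is an assumed R^2-gain stopping rule."""
    sst = sse([], y)  # intercept-only model
    selected, current = [], sst
    remaining = set(predictors)
    while remaining:
        trials = {name: sse([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = min(trials, key=trials.get)
        if (current - trials[best]) / sst < min_gain:
            break  # no candidate improves the fit enough
        selected.append(best)
        current = trials[best]
        remaining.remove(best)
    return selected


# Made-up data: y depends strongly on x1 and hardly at all on x2.
predictors = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 0, 1, 0, 1, 0]}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(forward_select(predictors, y))
```

Backward selection would run the same kind of loop in reverse, starting from all variables and dropping the one whose removal costs the least.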

43 Multiple Linear Regression: Example

Dataset: Fitness

A researcher was interested in the relationship between oxygen uptake and a number of potential explanatory variables together, including age, weight, running time, running pulse rate, rest pulse rate, and max pulse rate.

Filename: Fitness0.jmp (JMP sample data)

44 Multiple Linear Regression: Other Multiple Linear Regression Issues

- Outliers
- Interaction Terms
- Higher Order Terms

46 Regression with Non-Normal Response Logistic Regression with Binary Response

47 Logistic Regression

Consider a binary response variable.

- Variable with two outcomes
- One outcome represented by a 1 and the other represented by a 0

Examples:

- Does the person have a disease? Yes or No
- Who is the person voting for? Romney or Obama
- Outcome of a baseball game? Win or loss

48 Logistic Regression

Consider the linear probability model $y_i = \beta_0 + \beta_1 x_i$, where

- $y_i$ = response for observation $i$
- $x_i$ = quantitative explanatory variable

Predicted values represent the probability of Y = 1 given X.

Issue: Predicted probabilities for some subjects can fall outside of the [0, 1] range.

49 Logistic Regression

Consider the logistic regression model

$$E[Y_i \mid x_i] = P(Y_i = 1 \mid x_i) = \pi(x_i) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$

$$\operatorname{logit}[\pi(x_i)] = \log\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) = \beta_0 + \beta_1 x_i$$

Predicted values from the regression equation fall between 0 and 1.
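The contrast between the linear probability model and the logistic model can be sketched in a few lines of Python. All coefficients below are hypothetical values chosen for illustration, not estimates from any dataset in the presentation.

```python
import math


def logistic(eta):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    return math.exp(eta) / (1 + math.exp(eta))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


# Hypothetical coefficients for illustration only.
lin_b0, lin_b1 = 0.1, 0.05   # linear probability model
log_b0, log_b1 = -3.0, 0.2   # logistic regression model

x = 30
linear_prediction = lin_b0 + lin_b1 * x              # exceeds 1: not a valid probability
logistic_prediction = logistic(log_b0 + log_b1 * x)  # stays strictly inside (0, 1)

print(linear_prediction)
print(round(logistic_prediction, 4))
print(round(logit(logistic_prediction), 4))  # recovers log_b0 + log_b1 * x
```

The last line illustrates that `logit` and `logistic` are inverses of each other, so the model is linear on the log-odds scale even though it is curved on the probability scale.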

50 Logistic Regression: Interpretation of Coefficient $\beta$

Odds Ratio

The odds ratio is a statistic that measures the odds of an event compared to the odds of another event. Say the probability of Event 1 is $\pi_1$ and the probability of Event 2 is $\pi_2$. Then the odds ratio of Event 1 to Event 2 is:

$$\text{Odds Ratio} = \frac{\text{Odds}(\pi_1)}{\text{Odds}(\pi_2)} = \frac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}$$

- The value of the odds ratio ranges from 0 to infinity.
- A value between 0 and 1 indicates the odds of Event 2 are greater.
- A value between 1 and infinity indicates the odds of Event 1 are greater.
- A value equal to 1 indicates the events are equally likely.
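A short Python sketch of the odds ratio, together with the standard link to the logistic coefficient: under the logistic model, increasing x by one unit multiplies the odds by $\exp(\beta_1)$. The probabilities and coefficients below are hypothetical, and the function names are illustrative.

```python
import math


def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds of Event 1 relative to the odds of Event 2."""
    return odds(p1) / odds(p2)


def logistic(eta):
    """Inverse logit, as in the logistic regression model."""
    return math.exp(eta) / (1 + math.exp(eta))


# Made-up probabilities: odds of 3 versus odds of 1.
print(odds_ratio(0.75, 0.5))

# Hypothetical logistic coefficients for illustration only.
b0, b1 = -1.0, 0.5
p_at_2 = logistic(b0 + b1 * 2)
p_at_3 = logistic(b0 + b1 * 3)
print(round(odds_ratio(p_at_3, p_at_2), 6), round(math.exp(b1), 6))  # these agree
```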

51 Logistic Regression: Example

Dataset: APACHE II Score and Mortality in Sepsis

The following figure shows 30-day mortality in a sample of septic patients as a function of their baseline APACHE II Score. Patients are coded as 1 or 0 depending on whether they are dead or alive in 30 days, respectively. We wish to predict death from baseline APACHE II score in these patients.

Filename: APACHE.jmp

Important Note: JMP models the probability of the 0 category.
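One practical consequence of modeling the 0 category: since $P(Y = 0 \mid x) = 1 - \pi(x)$, a logistic model for the 0 category has the same coefficients as the model for the 1 category with the signs reversed. The sketch below checks this identity numerically with hypothetical coefficients; they are not estimates from APACHE.jmp.

```python
import math


def prob_one(x, b0, b1):
    """P(Y = 1 | x) under a logistic model with intercept b0 and slope b1."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))


def prob_zero(x, b0, b1):
    """P(Y = 0 | x), written as a logistic model with negated coefficients."""
    return prob_one(x, -b0, -b1)


# Hypothetical coefficients for the "1 = dead" parameterization.
b0, b1 = -4.0, 0.2
score = 25

p1 = prob_one(score, b0, b1)
p0 = prob_zero(score, b0, b1)
print(round(p1, 4), round(p0, 4), round(p0 + p1, 4))  # the two probabilities sum to 1
```

So when reading output that models the 0 category, a negative slope on the score corresponds to a positive association between the score and the 1 category.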

52 Thank you!


### Lesson 4 Part 1. Relationships between. two numerical variables. Correlation Coefficient. Relationship between two

Lesson 4 Part 1 Relationships between two numerical variables Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables Relationship

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl Pearson's coefficient of correlation c) Spearman's rank correlation coefficient d)

### Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S. Linoff, 2004 Statistics vs. Data Mining... lie, damn lie, and statistics mining data to support preconceived

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### Algebra I: Lesson 5-4 (5074) SAS Curriculum Pathways

Two-Variable Quantitative Data: Lesson Summary with Examples Bivariate data involves two quantitative variables and deals with relationships between those variables. By plotting bivariate data as ordered

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there's no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Simple Linear Regression

Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating

### Chapter 11: Two Variable Regression Analysis

Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

### The Basics of Regression Analysis. for TIPPS. Lehana Thabane. What does correlation measure? Correlation is a measure of strength, not causation!

The Purpose of Regression Modeling The Basics of Regression Analysis for TIPPS Lehana Thabane To verify the association or relationship between a single variable and one or more explanatory One explanatory

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

### Residuals. Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. Residuals: $e_i = y_i - \hat{y}_i$

A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M23-1 M23-2 Example 1: continued Case

### Lesson 15 Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating the slope Calculating the Intercept Residuals and

### Homework 8 Solutions

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Insurance Analytics - data analysis and predictive modelling in insurance. Pavel Kříž. Seminar on Actuarial Sciences, MFF, 4.

Insurance Analytics - data analysis and predictive modelling in insurance Pavel Kříž Seminar on Actuarial Sciences, MFF, April 4, 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### Homework 11. Part 1. Name: Score: /

Name: Score: / Homework 11 Part 1 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

### Study Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables

Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written

### The Statistics Tutor's Quick Guide to

statstutor community project encouraging academics to share statistics support resources All stcp resources are released under a Creative Commons licence The Statistics Tutor's Quick Guide to Stcp-marshallowen-7