TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics


UNIVERSITY OF DUBLIN
TRINITY COLLEGE

Faculty of Engineering, Mathematics and Science
School of Computer Science & Statistics

BA (Mod) Enter Course Title
Trinity Term 2013
Junior/Senior Sophister

ST7002 Multiple Linear Regression
Prof Haslett

Date Venue Time

Instructions to Candidates: Answer all questions. All carry equal marks. In all questions, extra marks will be awarded for imaginative answers, including those that go beyond the question as posed, when explaining and illustrating the ideas discussed.

Materials permitted for this examination: Calculator, Log tables

Materials omitted from the front page of an examination paper will not be permitted during an examination.

Q1 Write short notes on EIGHT of the following topics. You should illustrate your notes by referring to examples. You may draw on other questions in this exam paper or on examples discussed in class. However, additional credit will be given for the use of other examples. All topics carry equal marks.

a) Lessons from my project
b) Objectives in regression
c) Linear regression does not necessarily mean straight lines
d) The role of the Normal distribution in regression
e) Transformations
f) Variance Inflation Factors
g) Critical analysis of rows and columns/cases and variables
h) Extra and Sequential Sums of Squares
i) Sampling Distributions and Standard Errors in Regression
j) Interactions

Q2 In a clinical experiment, 64 patients (31/33, Male/Female) were administered a drug. The response (mg/l, based on a blood analysis 12 hours later) was noted. Three drug levels were used (200, 400, 600); the gender and weights (kg) of the patients were noted. (Naturally, the average Male/Female weights differed.) Several analyses are reported as below.

a) In a pair of preliminary analyses, the response by gender was analysed (Q2A). Explain how these two analyses should be interpreted. (6 marks)

b) Two further simple analyses are reported (Q2B). Interpret the analyses. Explain the different interpretations of the Confidence and Prediction intervals. (6 marks)

Q2A

Two-Sample T-Test: resp, Gender (M=1; F=0)
[Table of N, Mean, StDev and SE Mean by Gender: values lost in transcription]
Difference = mu (0) - mu (1)
CI for difference: (0.074, 0.660)
T-Test of difference = 0 (vs not =): T-Value = 2.51 (P-Value and DF lost in transcription)

Regression Analysis: resp versus Gender
[Coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and Gender: values lost in transcription]
R-Sq = 9.2%

[Fitted line plot: Resp vs Gender (M,F/1,0); R-Sq 9.2%, R-Sq(adj) 7.7%]

Q2B

Regression Analysis: resp versus dose
[Coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and dose: values lost in transcription]
R-Sq = 73.7%

Regression Analysis: resp versus wt
[Coefficient table (Predictor, Coef, SE Coef, T, P) for Constant and wt: values lost in transcription]
R-Sq = 18.3%

[Fitted line plot: Resp vs Dose, with regression line, 95% CI and 95% PI; R-Sq 73.7%, R-Sq(adj) 73.2%]
[Fitted line plot: Resp vs Wt, with regression line, 95% CI and 95% PI; R-Sq 18.3%, R-Sq(adj) 16.9%]

c) Two multiple regression analyses are presented below (Q2C). The researcher is puzzled by their different interpretations as regards the apparent importance of dosage. The dose/wt variable is a derived variable, being the ratio of dose to weight. Provide her with an explanation. Use this to discuss ideas of correlated predictor variables and of direct and indirect relationships. (8 marks)

d) A further derived variable, (Gender*dose/wt), is created; this is the simple product of the binary Gender variable by dose/wt. The resulting regression is in Q2D. What is the purpose of including such a variable in such a regression analysis? What is the interpretation in this case? Contrast with the analysis above. Illustrate your discussion with rough sketches of the two simple regression lines (resp vs dose/wt) that are implicit in this model. (8 marks)

e) A final analysis leads to Q2E. Discuss the interpretation. Would you propose further analyses? (5 marks)

Q2C

Regression Analysis: resp versus wt, Gender, dose
[Coefficient table (Predictor, Coef, SE Coef, T, P) for Constant, wt, Gender and dose: values lost in transcription]
R-Sq = 89.6%

Regression Analysis: resp versus wt, Gender, dose, dose/wt
[Coefficient table (Predictor, Coef, SE Coef, T, P, VIF) for Constant, wt, Gender, dose and dose/wt: values lost in transcription]
R-Sq = 90.6%

Q2D

Regression Analysis: resp versus Gender, dose/wt, Gender*dose/wt
[Coefficient table (Predictor, Coef, SE Coef, T, P) for Constant, Gender, dose/wt and Gender*dose/wt: values lost in transcription]
R-Sq = 89.2%  R-Sq(adj) = 88.7%

Q2E

Regression Analysis: resp versus dose/wt, Gender, wt, dose, Gender*dose/
[Coefficient table (Predictor, Coef, SE Coef, T, P, VIF) for Constant, dose/wt, Gender, wt, dose and Gender*dose/wt: values lost in transcription]
R-Sq = 90.6%  R-Sq(adj) = 89.8%

Analysis of Variance
[Table (Source, DF, SS, MS, F, P) for Regression, Residual Error and Total: values lost in transcription]
[Sequential sums of squares (Source, DF, Seq SS) for dose/wt, Gender, wt, dose and Gender*dose/wt: values lost in transcription]

Q3 Data are available on the Weight (gms) and physical dimensions Length, Width and Height (cms) of 56 perch. All were caught from the same lake (Laengelmavesi) near Tampere in Finland. A matrix plot and various analyses are presented below. The interest lies in relating the dimensions to the weight.

[Matrix plot of Weight, Length, Ht and width]

a) It is immediately apparent that separate linear regressions of Weight on Length, Width and Height will encounter difficulties. Discuss. (6 marks)

b) All variables were log transformed; the resulting multiple regression analysis is in Q3A. Discuss the various aspects of this transformation and subsequent analysis. What are the implications of the VIF values? (6 marks)

c) In an attempt at a simpler model, the derived variable Vol = Length × Height × Width was formed. The Fitted Line plot in Q3C summarises the analysis; the SE for the slope is returned as 0.011. Explain the features of this analysis and plot. (8 marks)

d) Use the models in (b) and (c) above to compute approximate 95% Prediction Intervals for the Weights of two fish with dimensions (Length, Height and Width) being respectively (14.7, 3.5 and 2.0) and (45.2, 11.9 and 7.3). Explain carefully the basis, in the fitted models, for your calculations. (8 marks)

e) The slope SE is 0.011. What are the implications for possible further simplification? (3 marks)

f) It is remarked that although the last model (c) provides an excellent and simple fit, its interpretation differs, to an extent that seems to be statistically significant, from the details of the model fitted in (b). Discuss. (2 marks)

Q3A

Regression Analysis: logwt versus loglen, loght, logwidth
[Coefficient table (Predictor, Coef, SE Coef, T, P, VIF) for Constant, loglen, loght and logwidth: values lost in transcription]
R-Sq = 99.4%  R-Sq(adj) = 99.4%

Unusual Observations
[Table (Obs, loglen, logwt, Fit, SE Fit, Residual, St Resid): values lost in transcription; four observations are flagged R and two are flagged X]
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Q3B

[Residual Plots for logwt, using deleted residuals: Normal Probability Plot; Versus Fits; Histogram; Versus Order]

Q3C

[Fitted line plot: Weight vs Vol, fitted as log10(weight) on log10(vol), with regression line and 95% PI; R-Sq 99.3%, R-Sq(adj) 99.3%; coefficients and S lost in transcription]

Outline Solution

Q2

a) The two analyses are equivalent, though differently packaged. The T-test reports the observed means and the mean difference in response (1.611 for F; the other values are lost in transcription). The regression reports the same information (to within rounding error): when Gender = 0 (F) the regression reports the expected value to be 1.61; when Gender = 1 (M) it reports the sum of the constant and the Gender coefficient. The regression model regards Gender as an indicator variable; the plot, which interpolates to other values of Gender, is not interpretable. If the remaining variation (other than that due to Gender) can be regarded as random, both find that the T-ratio for the difference is 2.51 in magnitude. Both report that this is statistically significant.

b) Resp vs dose reports the average increase in response per unit of dose (coefficient lost in transcription); this amounts to 0.74 per 200 units. If the remaining error can be treated as random, this is hugely statistically significant. Resp vs wt shows an apparent reduction in response associated with weight, also statistically significant. Anticipating later analyses, response is often more naturally sensitive to dose per unit weight; the latter analysis is consistent with this. The prediction intervals as shown are effectively descriptive: most of the data lie within them. The Confidence Intervals qualify statements about the mean response (over very many patients with specified dose or weight). One interpretation is that regression lines that are statistically consistent with the data must lie within the CI band.
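The equivalence claimed in a) can be checked numerically. The sketch below is illustrative only: it assumes the pooled (equal-variance) form of the two-sample t-test, and the response values are invented, not the exam data. The function names are hypothetical.

```python
import math

def pooled_t(group0, group1):
    """Pooled two-sample t statistic for mean(group0) - mean(group1)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    s2 = (ss0 + ss1) / (n0 + n1 - 2)          # pooled variance estimate
    return (m0 - m1) / math.sqrt(s2 * (1 / n0 + 1 / n1))

def indicator_regression(group0, group1):
    """Least-squares fit of y = a + b*x, with x = 0 for group0 and 1 for group1.

    Returns (a, b, t), where t is the t-ratio of the slope b."""
    x = [0.0] * len(group0) + [1.0] * len(group1)
    y = list(group0) + list(group1)
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, b / se_b

# Invented responses (assumption): F coded 0, M coded 1, as in the exam.
females = [1.9, 1.4, 1.7, 1.5, 1.6]
males = [1.2, 1.0, 1.5, 1.3]

t_test = pooled_t(females, males)
intercept, slope, t_reg = indicator_regression(females, males)
```

The intercept is the F mean, the intercept plus slope is the M mean, and the two t-ratios agree up to sign, because the test is set up as mu(0) - mu(1) while the slope measures the change from 0 to 1.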

c) The first analysis suggests that weight, dose and gender are all important predictors of response. The second suggests that, when dose/wt is included as a predictor, neither dose nor weight contributes much extra information. We already know that weight and gender are interrelated. And, by construction, dose/wt is correlated with dose, and Gender with weight. The apparent confusion arises frequently when the x-variables (predictor variables) are themselves inter-related. The slope coefficients are, in such cases, not simply interpretable in terms of the corresponding bivariate correlations.

[Diagram: possible direct and indirect relationships among Dose, Weight, Dose/wt and Resp]

The diagram (not a requirement, but worthy of marks if offered) provides one way to envision a possible set of direct and indirect relationships with response. The concept of direct and indirect relationships has been discussed in class.

d) The new derived variable simplifies the direct consideration and comparison of two simple models: resp vs dose/wt, separately for M/F. Separately these may be written as resp = intercept + slope × (dose/wt), with potentially different values of each for M/F. This can also be a way to investigate interaction. Here the fitted lines are

F: resp = $b_0 + b_2\,(\text{dose/wt})$
M: resp = $(b_0 + b_1) + (b_2 + b_3)\,(\text{dose/wt})$

where $b_0$, $b_1$, $b_2$ and $b_3$ are the fitted coefficients of Constant, Gender, dose/wt and Gender*dose/wt in Q2D (numeric values lost in transcription). The lines, when sketched, are effectively parallel (they have almost the same slope). Since the Gender and Gender*dose/wt coefficients are small compared to their SEs (T = 0.58 and T = 0.00), we can conclude that a single regression relationship, for both M and F, is likely to be adequate. This in turn suggests that the Weight and Gender terms in the second model in c) may not be simply interpreted, as suggested there, as individually necessary, for the Gender term is correlated with Weight. Perhaps the inclusion of Weight requires the inclusion of Gender to counter-balance it.

e) The analysis confirms that the important variable is dose/wt. No other variables are significant. However, the very high VIF values for dose and dose/wt point to the fact that these variables are (naturally) highly interdependent. One of these is likely to be the most important. The choice should be guided by consideration of the way the drug interacts, biochemically, with the patient.
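To see why dose and dose/wt attract high VIF values, here is a minimal sketch. It is simplified to two predictors, where the VIF reduces to 1/(1 - r²) with r the correlation between them; the exam's model has more predictors, so its VIFs would come from regressing each predictor on all the others. The weights are invented and the function names are hypothetical.

```python
import math

def pearson_r(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(u, v):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(u, v)
    return 1.0 / (1.0 - r * r)

# Three dose levels as in Q2; the weights (kg) are invented for illustration.
dose = [200, 200, 200, 400, 400, 400, 600, 600, 600]
wt = [62, 75, 88, 60, 77, 90, 65, 72, 85]
dose_per_wt = [d / w for d, w in zip(dose, wt)]

r = pearson_r(dose, dose_per_wt)
vif = vif_two_predictors(dose, dose_per_wt)
```

Because dose/wt is built from dose, r is close to 1 and the VIF is far above 1, which is the inflation of the coefficient variances that the solution refers to.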

Q3

a) It is clear that the bivariate relationships (top row) are not linear. Additionally, there is clear evidence of the variance of weight increasing with weight. It is also the case that there is a great deal of correlation between the three x variables, likely to cause problems if they are ever used together.

b) The model in the log scale shows that all coefficients are very significantly different from 0, though that was never in doubt. $R^2$ is high. The model can also be written as $\mathrm{Wt} = C \cdot \mathrm{Len}^{1.65}\,\mathrm{Ht}^{0.81}\,\mathrm{Width}^{0.55} \cdot 10^{\text{error}}$, where $C$ is the antilog of the fitted constant. The nominal interpretation is that an increase of 1 in, eg, loglen (ie a 10-fold increase in Len) will induce, on average, an increase of 1.65 in logwt (ie a $10^{1.65} \approx 45$-fold increase in Wt) if all other variables are held constant. But the VIF values suggest that the covariates are correlated, as anticipated, and that the SEs are therefore inflated. This was apparent also in the scatterplots. Effectively this means that some fish are large in respect of all three dimensions, and some are small. In these circumstances one option is to choose a single composite that reflects all of the variables. It is likely to be futile to choose one of them. There are, however, a number of unusual observations. Four of these exhibit large residuals, which merit attention. Three are very large and positive, and one is large and negative. Two are influential, being far from the others in respect of (log) Len, Ht, Width. There is nothing wrong with this, necessarily.

c) The option followed was to choose a product named Vol. The Fitted Line plot has fitted Weight to Vol in the log scale, presenting the analysis in the back-transformed (anti-log) scale. Alternatively, since Log(Vol) = Log(Length) + Log(Height) + Log(Width), the composite Log(Vol) variable is the sum of the three Log(covariates). The fitted model can be written as $\mathrm{Wt} = 0.3\,\mathrm{Vol}^{0.98} \cdot 10^{\text{error}} = 0.3\,\mathrm{Len}^{0.98}\,\mathrm{Ht}^{0.98}\,\mathrm{Width}^{0.98} \cdot 10^{\text{error}}$. An increase of 1 in LogVol (ie a 10-fold increase in Vol) will generate, on average, a $10^{0.98}$-fold (ie 9.55-fold) increase in Wt.

The constant 0.3 could be thought of as fish density, were fish to be cuboid. As it is, it is a combination of fish density and the ratio of actual fish volume to the volume of the corresponding cuboid. When $\mathrm{Vol}^{0.98}$ is large (ie when Wt is large) the absolute errors implicit in a $10^{\text{error}}$-fold variation are large. This produces the fan-like shape of the prediction intervals. Predicting the weight of a large fish is harder than predicting that of a small fish, in absolute terms. The issue is equivalent to describing the prediction error in percentage terms.
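The fold-change arithmetic and the fan shape can be reproduced in a few lines. This is a sketch under stated assumptions: the residual standard deviation of 0.04 (log10 scale) and the two predicted weights are illustrative values chosen here, and the function names are hypothetical.

```python
def fold_change(exponent, log10_increase=1.0):
    """Multiplicative change in Wt for a given increase in the log10 predictor."""
    return 10 ** (exponent * log10_increase)

def band_factors(s, k=2.0):
    """Lower and upper multipliers implied by +/- k*s in the log10 scale."""
    return 10 ** (-k * s), 10 ** (k * s)

def absolute_band(predicted_weight, s, k=2.0):
    """Approximate prediction band in grams around a back-transformed prediction."""
    lo, hi = band_factors(s, k)
    return predicted_weight * lo, predicted_weight * hi

S_ILLUSTRATIVE = 0.04          # assumption: residual SD in the log10 scale

per_tenfold_vol = fold_change(0.98)    # exponent of Vol in model (c)
per_tenfold_len = fold_change(1.65)    # exponent of Len in model (b)

small = absolute_band(30.0, S_ILLUSTRATIVE)      # hypothetical small fish
large = absolute_band(1000.0, S_ILLUSTRATIVE)    # hypothetical large fish
```

The band is the same in ratio terms for both fish, so its width in grams grows in proportion to the predicted weight. That is the fan shape, and it is why the error is naturally described as a percentage.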

The $R^2$ value is almost as high as in b). The value of s is 0.040. This is in the log scale and compares with the value of 0.037 above. The model is not quite as tight-fitting, but it is simpler. No information is available on unusual observations, or SEs.

d) The two prediction equations are:

(as in b) Pred logwt = const + 1.65 Log(len) + 0.81 Log(ht) + 0.55 Log(width) ± 2(0.037)
(as in c) Pred logwt = const + 0.98 Log(len × ht × width) ± 2(0.040)

each back-transformed as antilog(Pred log(wt)), as below.

[Worked table for the two fish: dimensions (len, ht, width), Vol, log10 of each dimension and of Vol, coefficients and s for models b and c, then Pred LogWt with lo and hi limits and their back-transforms for each model: values lost in transcription]

Conclusions: very similar. See f) below.

Model c has a slope of 0.98 ± 2(0.011). This interval includes 1. That is, the data are statistically consistent with a slope of 1; a Null Hyp: slope = 1 would not be rejected. A simpler version of this model would then be LogWt = const + LogVol. Equivalently, this is Wt = 0.3 Vol. This model is not unlike the Tree model discussed in class. Note that the SE (0.011) is very much smaller than the SEs for each of the dimensions in model b). That is because their SEs have been inflated by, effectively, the lack of determinacy of the separate coefficients. Note, however, that the correlation between these dimensions has no implications at all for the usefulness of the prediction equation generated by model b). It is simply the case that many different combinations of these coefficients are effectively equivalent to each other.

e) However, model (c) corresponds to giving coefficients of 1 to each (log) dimension. This is just about consistent with the fit for LogHt (0.81 ± 2(0.21)); it is not as consistent with LogLen (1.65 ± 2(0.22)) and LogWidth (0.55 ± 2(0.18)). The implications are that fish that are very long will be given low values of LogWt in model c, and fish that are very wide will be given high values of LogWt in model c. Fish are not cuboids. However, it is moot whether the data would require a rejection (in model b) of the Null Hyp that all coefficients were equal to 1.
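The calculation in d) for model (c) can be sketched as follows. The slope 0.98, its SE 0.011 and s = 0.040 are taken from the solution text; the intercept is assumed here to be log10(0.3), which is only the rounded constant quoted in the back-transformed form, so the resulting grams are approximate. The function names are hypothetical.

```python
import math

LOG10_CONST = math.log10(0.3)   # assumption: rounded constant from Wt = 0.3 Vol^0.98
SLOPE = 0.98                    # coefficient of log10(Vol) in model (c)
SLOPE_SE = 0.011                # SE of that slope
S = 0.040                       # residual SD in the log10 scale

def predict_weight(length, height, width, k=2.0):
    """Return (lower, point, upper) for weight in grams under model (c).

    The interval is formed in the log10 scale as centre +/- k*S and then
    back-transformed, so it is approximate and ignores the small extra
    uncertainty in the fitted coefficients."""
    log_vol = math.log10(length * height * width)
    centre = LOG10_CONST + SLOPE * log_vol
    return tuple(10 ** (centre + d) for d in (-k * S, 0.0, k * S))

def slope_interval(k=2.0):
    """Approximate 95% interval for the slope: estimate +/- k standard errors."""
    return SLOPE - k * SLOPE_SE, SLOPE + k * SLOPE_SE

small_fish = predict_weight(14.7, 3.5, 2.0)
large_fish = predict_weight(45.2, 11.9, 7.3)
slope_lo, slope_hi = slope_interval()
```

Both intervals have the same upper-to-lower ratio, and the slope interval contains 1, which is the basis for the simplification to LogWt = const + LogVol. The same pattern applies to model (b) once its fitted constant is known.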

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Residuals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i - y i

Residuals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i - y i A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M23-1 M23-2 Example 1: continued Case

More information

31. SIMPLE LINEAR REGRESSION VI: LEVERAGE AND INFLUENCE

31. SIMPLE LINEAR REGRESSION VI: LEVERAGE AND INFLUENCE 31. SIMPLE LINEAR REGRESSION VI: LEVERAGE AND INFLUENCE These topics are not covered in the text, but they are important. Leverage If the data set contains outliers, these can affect the leastsquares fit.

More information

Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

More information

Multiple Regression Analysis in Minitab 1

Multiple Regression Analysis in Minitab 1 Multiple Regression Analysis in Minitab 1 Suppose we are interested in how the exercise and body mass index affect the blood pressure. A random sample of 10 males 50 years of age is selected and their

More information

Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance 1. Introduction The Analysis of Covariance (generally known as ANCOVA) is a technique that sits between analysis of variance and regression analysis. It has a number of purposes

More information

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0. Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

More information

Paired Differences and Regression

Paired Differences and Regression Paired Differences and Regression Students sometimes have difficulty distinguishing between paired data and independent samples when comparing two means. One can return to this topic after covering simple

More information

In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

Simple Linear Regression

Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating

More information

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients ( Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

More information

Name: Student ID#: Serial #:

Name: Student ID#: Serial #: STAT 22 Business Statistics II- Term3 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS Department Of Mathematics & Statistics DHAHRAN, SAUDI ARABIA STAT 22: BUSINESS STATISTICS II Third Exam July, 202 9:20

More information

For example, enter the following data in three COLUMNS in a new View window.

For example, enter the following data in three COLUMNS in a new View window. Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

More information

Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2

Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2 Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS t-test X 2 X 2 AOVA (F-test) t-test AOVA

More information

4. Multiple Regression in Practice

4. Multiple Regression in Practice 30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Do the following using Mintab (1) Make a normal probability plot for each of the two curing times.

Do the following using Mintab (1) Make a normal probability plot for each of the two curing times. SMAM 314 Computer Assignment 4 1. An experiment was performed to determine the effect of curing time on the comprehensive strength of concrete blocks. Two independent random samples of 14 blocks were prepared

More information

UNDERSTANDING MULTIPLE REGRESSION

UNDERSTANDING MULTIPLE REGRESSION UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.

where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis. Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes

More information

Multiple Regression in SPSS STAT 314

Multiple Regression in SPSS STAT 314 Multiple Regression in SPSS STAT 314 I. The accompanying data is on y = profit margin of savings and loan companies in a given year, x 1 = net revenues in that year, and x 2 = number of savings and loan

More information

Chapter 11: Two Variable Regression Analysis

Chapter 11: Two Variable Regression Analysis Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

More information

A. Karpinski

A. Karpinski Chapter 3 Multiple Linear Regression Page 1. Overview of multiple regression 3-2 2. Considering relationships among variables 3-3 3. Extending the simple regression model to multiple predictors 3-4 4.

More information

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Assumptions in the Normal Linear Regression Model. A2: The error terms (and thus the Y s at each X) have constant variance.

Assumptions in the Normal Linear Regression Model. A2: The error terms (and thus the Y s at each X) have constant variance. Assumptions in the Normal Linear Regression Model A1: There is a linear relationship between X and Y. A2: The error terms (and thus the Y s at each X) have constant variance. A3: The error terms are independent.

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

More information

Chapter 15 Multiple Regression

Chapter 15 Multiple Regression Multiple Regression Learning Objectives 1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables. 2. Be able

More information

psyc3010 lecture 8 standard and hierarchical multiple regression last week: correlation and regression Next week: moderated regression

psyc3010 lecture 8 standard and hierarchical multiple regression last week: correlation and regression Next week: moderated regression psyc3010 lecture 8 standard and hierarchical multiple regression last week: correlation and regression Next week: moderated regression 1 last week this week last week we revised correlation & regression

More information

c 2015, Jeffrey S. Simonoff 1

c 2015, Jeffrey S. Simonoff 1 Modeling Lowe s sales Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

BIOSTATISTICS QUIZ ANSWERS

BIOSTATISTICS QUIZ ANSWERS BIOSTATISTICS QUIZ ANSWERS 1. When you read scientific literature, do you know whether the statistical tests that were used were appropriate and why they were used? a. Always b. Mostly c. Rarely d. Never

More information

Solution Let us regress percentage of games versus total payroll.

Solution Let us regress percentage of games versus total payroll. Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

More information

Perform hypothesis testing

Perform hypothesis testing Multivariate hypothesis tests for fixed effects Testing homogeneity of level-1 variances In the following sections, we use the model displayed in the figure below to illustrate the hypothesis tests. Partial

More information

Questions and Answers on Hypothesis Testing and Confidence Intervals

Questions and Answers on Hypothesis Testing and Confidence Intervals Questions and Answers on Hypothesis Testing and Confidence Intervals L. Magee Fall, 2008 1. Using 25 observations and 5 regressors, including the constant term, a researcher estimates a linear regression

More information

AP * Statistics Review. Linear Regression

AP * Statistics Review. Linear Regression AP * Statistics Review Linear Regression Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

More information

Statistical Modelling in Stata 5: Linear Models

Mark Lunt, Arthritis Research UK Centre for Excellence in Epidemiology, University of Manchester, 08/11/2016. Structure: This Week. What is a linear model? How

An example ANOVA situation. 1-Way ANOVA. Some notation for ANOVA. Are these differences significant? Example (Treating Blisters)

1-Way ANOVA, MATH 143, Department of Mathematics and Statistics, Calvin College. Subjects: 25 patients with blisters. Treatments: Treatment A, Treatment

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Analysis of covariance and multiple regression. So far in this course,

Relationship of two variables

A correlation exists between two variables when the values of one are associated with the values of the other in some way. Scatter Plot (or Scatter Diagram): A plot

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Student Lecture Notes. Fall 2006, Fundamentals of Business Statistics. Chapter Goals: To understand the methods for displaying

Interpreting Multiple Regression

Fall Semester, 2001, Statistics 621, Lecture 5, Robert Stine. Preliminaries: Project and assignments. Hope to have some further information on project soon. Due date for Assignment

Inferential Statistics

Sampling and the normal distribution. Z-scores. Confidence levels and intervals. Hypothesis testing. Commonly used statistical methods. Descriptive statistics are

AP Statistics Solutions to Packet 14

Inference for Regression. Inference about the Model. Predictions and Conditions. HW #,, 6, 7 4. AN EXTINCT BEAST, I: Archaeopteryx is an extinct beast having feathers like

Simple Linear Regression in SPSS STAT 314

1. Ten Corvettes between 1 and 6 years old were randomly selected from last year's sales records in Virginia Beach, Virginia. The following data were obtained,

RNR / ENTO Assumptions for Simple Linear Regression

Statistical statements (hypothesis tests and CI estimation) with least squares estimates depend on 4 assumptions: 1. Linearity of the mean responses

Section 3 Part 1. Relationships between two numerical variables

Relationship between two variables: The summary statistics covered in the previous lessons are appropriate for describing a single variable.

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression. 1.1 Simple Linear Regression Model. The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

A correlation exists between two variables when one of them is related to the other in some way.

Lecture #10, Chapter 10: Correlation and Regression. The main focus of this chapter is to form inferences based on sample data that come in pairs. Given such paired sample data, we want to determine whether

Analyzing Linear Relationships, Two or More Variables

PART V ANALYZING RELATIONSHIPS. CHAPTER 14. INTRODUCTION: In the previous chapter, we introduced Kate Cameron, the owner of Woodbon, a company that produces

Statistics II Final Exam - January Use the University stationery to give your answers to the following questions.

Statistics II Final Exam - January 2012. Do not forget to write down your name and class group on each page. Indicate clearly

Notation and Equations for Regression Lecture 11/4

Notation: m: The number of predictor variables in a regression. Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

MULTIPLE REGRESSION EXAMPLE

For a sample of n = 166 college students, the following variables were measured: Y = height, X1 = mother's height (momheight), X2 = father's height (dadheight), X3 = 1 if

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (9/5) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS. 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

John P. Bentley, Department of Pharmacy Administration, University of

Simple Linear Regression Chapter 11

Rationale: Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

AP Statistics 2001 Solutions and Scoring Guidelines

The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time

Table of Contents: 1. Introduction (p. 2); 2. Statistical Methods Used (p. 5); 3. 10 and under Males (p. 8); 4. 11 and up Males (p. 10); 5. 10 and under

CORRELATION AND SIMPLE REGRESSION ANALYSIS USING SAS IN DAIRY SCIENCE

A. K. Gupta, Vipul Sharma and M. Manoj, NDRI, Karnal-132001. When analyzing farm records, simple descriptive statistics can reveal a

Section I: Multiple Choice Select the best answer for each question.

Chapter 15 (Regression Inference) AP Statistics Practice Test (TPS-4 p796). 1. Which of the following is not one of the conditions that

Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP. Presentation and Data: http://www.lisa.stat.vt.edu, Short Courses, Regression Analysis Using JMP, Download Data to Desktop

Lesson 4 Part 1. Relationships between two numerical variables. Correlation Coefficient. Relationship between two

Correlation Coefficient: The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables. Relationship

The scatterplot indicates a positive linear relationship between waist size and body fat percentage:

STAT E-150 Statistical Methods: Multiple Regression. Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the

Elementary Statistics Sample Exam #3

Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

2011 AP Exam Solutions. 5/16/2011

1. A professional sports team evaluates potential players for a certain position based on two main characteristics, speed and strength. (a) Speed is measured by the time required

11. Analysis of Case-control Studies Logistic Regression

Research methods II. This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

Data and Regression Analysis. Lecturer: Prof. Duane S. Boning. Rev 10

Agenda: 1. Comparison of Treatments (One Variable), Analysis of Variance (ANOVA) 2. Multivariate Analysis of Variance, Model forms 3.

a) perfect linear correlation (r = 1) b) no correlation (r = 0) c) positive correlation (0 < r < 1)

CHAPTER EIGHT: CORRELATION AND REGRESSION. Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

Module 3: Multiple Regression Concepts

Fiona Steele, Centre for Multilevel Modelling. Contents: What is Multiple Regression? Motivation. Conditioning. Data for multiple regression analysis

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

For 2015 Examinations. Aim: The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

2013 MBA Jump Start Program. Statistics Module Part 3

Module 1: Statistics, Thomas Gilbert. Part 3: Hypothesis Testing (Inference), Regressions. Making an Investment Decision: A researcher in your firm just

Correlation Coefficient. The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

Lesson 4 Part 1: Relationships between two numerical variables.

Chapter 10 Correlation and Regression. Overview. Section 10-2 Correlation Key Concept. Definition. Definition. Exploring the Data

10-1 Overview, 10-2 Correlation, 10- Regression. Overview: This chapter introduces important methods for making inferences about a correlation (or relationship) between

SELF-TEST: SIMPLE REGRESSION

ECO 22000, McRAE. Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures

Premaster Statistics Tutorial 4 Full solutions

Regression analysis. Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

0.1 Multiple Regression Models

We will introduce the multiple regression model as a means of relating one numerical response variable y to two or more independent (or predictor) variables. We will see different

, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06). CD Lessons: 69, 70, 16-20. Textbook: pp. 520-528, 111-124, 133-141. An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

ANOVA MULTIPLE CHOICE QUESTIONS. In the following multiple-choice questions, select the best answer.

1. Analysis of variance is a statistical method of comparing the ____ of several populations. a. standard

The importance of graphing the data: Anscombe's regression examples

Bruce Weaver, Northern Health Research Conference, Nipissing University, North Bay, May 30-31, 2008. The Objective

Simple linear regression

Introduction: Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Section 10-1: Overview. We have already explored the basics of describing single variable data sets. However, when two quantitative variables

MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal

Dec. 2002. ABSTRACT: The present study aims

Factors affecting online sales

Table of contents: Summary. Research questions. The dataset. Descriptive statistics: The exploratory stage. Confidence intervals. Hypothesis tests.

II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D: Basic Measurement And Statistics. The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Polynomial Regression. It is a form of linear regression

Inference for Regression

Simple Linear Regression. The simple linear regression model. Estimating regression parameters; confidence intervals and significance tests for regression parameters. Inference about

Simple Linear Regression

STAT 101, Dr. Kari Lock Morgan. SECTIONS 9.3. Confidence and prediction intervals (9.3). Conditions for inference (9.1). Want More Stats??? If you have enjoyed learning how to analyze

Prediction and Confidence Intervals in Regression

Fall Semester, 2001, Statistics 621, Lecture 3, Robert Stine. Preliminaries: Teaching assistants. See them in Room 3009 SH-DH. Hours are detailed in the syllabus.

Simple Linear Regression Inference

Inference requirements: The Normality assumption of the stochastic term e is needed for inference even if it is not an OLS requirement. Therefore we have: Interpretation

Homework 11. Part 1

1. For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

Versions 1a Page 1 of 17

Note to Students: This practice exam is intended to give you an idea of the type of questions the instructor asks and the approximate length of the exam. It does NOT indicate the exact questions or the