TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics


UNIVERSITY OF DUBLIN
TRINITY COLLEGE
Faculty of Engineering, Mathematics and Science
School of Computer Science & Statistics

BA (Mod) Enter Course Title
Trinity Term 2013
Junior/Senior Sophister

ST7002 Multiple Linear Regression
Prof Haslett

Date Venue Time

Instructions to Candidates: Answer all questions. All carry equal marks. In all questions, extra marks will be awarded for imaginative answers, including those that go beyond the question as posed when explaining and illustrating the ideas discussed.

Materials permitted for this examination: Calculator, Log tables.

Materials omitted from the front page of an examination paper will not be permitted during an examination.

Q1 Write short notes on EIGHT of the following topics. You should illustrate your notes by referring to examples. You may draw on other questions in this exam paper or on examples discussed in class. However, additional credit will be given for the use of other examples. All topics carry equal marks.

a) Lessons from my project
b) Objectives in regression
c) Linear regression does not necessarily mean straight lines
d) The role of the Normal distribution in regression
e) Transformations
f) Variance Inflation Factors
g) Critical analysis of rows and columns/cases and variables
h) Extra and Sequential Sums of Squares
i) Sampling Distributions and Standard Errors in Regression
j) Interactions

Q2 In a clinical experiment, 64 patients (31/33, Male/Female) were administered a drug. The response (mg/l, based on a blood analysis 12 hours later) was noted. Three drug levels were used (200, 400, 600); the gender and weights (kg) of the patients were noted. (Naturally, the average Male/Female weights differed.) Several analyses are reported below.

a) In a pair of preliminary analyses, the response by gender was analysed, as overleaf (Q2A). Explain how these two analyses should be interpreted. (6 marks)

b) Two further simple analyses are reported (Q2B). Interpret the analyses. Explain the different interpretations of the Confidence and Prediction intervals. (6 marks)

Q2A

Two-Sample T-Test: resp, Gender
Gender  N  Mean  StDev  SE Mean
Difference = mu (0) - mu (1)
Estimate for difference:
95% CI for difference: (0.074, 0.660)
T-Test of difference = 0 (vs not =): T-Value = 2.51  P-Value =  DF =

Regression of Response on Gender. M=1; F=0
[Figure: Fitted Line Plot, Resp vs Gender (M,F/1,0); S, R-Sq 9.2%, R-Sq(adj) 7.7%]

Regression Analysis: resp versus Gender
resp =  Gender
Predictor  Coef  SE Coef  T  P
Constant
Gender
S =  R-Sq = 9.2%

Q2B

Regression Analysis: resp versus dose
resp =  dose
Predictor  Coef  SE Coef  T  P
Constant
dose
S =  R-Sq = 73.7%

Regression Analysis: resp versus wt
resp =  wt
Predictor  Coef  SE Coef  T  P
Constant
wt
S =  R-Sq = 18.3%

[Figure: Fitted Line Plot, Resp vs Dose; regression line with 95% CI and 95% PI; R-Sq 73.7%, R-Sq(adj) 73.2%]
[Figure: Fitted Line Plot, Resp vs Wt; regression line with 95% CI and 95% PI; R-Sq 18.3%, R-Sq(adj) 16.9%]

Q2 Continues

c) Two multiple regression analyses are presented below (Q2C). The researcher is puzzled by their different interpretations as regards the apparent importance of dosage. The dose/wt variable is a derived variable, being the ratio of dose to weight. Provide her with an explanation. Use this to discuss ideas of correlated predictor variables and of direct and indirect relationships. (8 marks)

d) A further derived variable, (Gender*dose/wt), is created; this is the simple product of the binary Gender variable by dose/wt. The resulting regression is in Q2D. What is the purpose of including such a variable in such a regression analysis? What is the interpretation in this case? Contrast with the analysis above. Illustrate your discussion with rough sketches of the two simple regression lines (resp vs dose/wt) that are implicit in this model. (8 marks)

e) A final analysis leads to Q2E. Discuss the interpretation. Would you propose further analyses? (5 marks)

Q2C

Regression Analysis: resp versus wt, Gender, dose
resp =  wt  Gender  dose
Predictor  Coef  SE Coef  T  P
Constant
wt
Gender
dose
S =  R-Sq = 89.6%

Regression Analysis: resp versus wt, Gender, dose, dose/wt
resp =  wt  Gender  dose  dose/wt
Predictor  Coef  SE Coef  T  P  VIF
Constant
wt
Gender
dose
dose/wt
S =  R-Sq = 90.6%

Q2 Continues

Q2D

Regression Analysis: resp versus Gender, dose/wt, Gender*dose/wt
resp =  Gender  dose/wt  Gender*dose/wt
Predictor  Coef  SE Coef  T  P
Constant
Gender
dose/wt
Gender*dose/wt
S =  R-Sq = 89.2%  R-Sq(adj) = 88.7%

Q2E

Regression Analysis: resp versus dose/wt, Gender, wt, dose, Gender*dose/
resp =  dose/wt  Gender  wt  dose  Gender*dose/wt
Predictor  Coef  SE Coef  T  P  VIF
Constant
dose/wt
Gender
wt
dose
Gender*dose/wt
S =  R-Sq = 90.6%  R-Sq(adj) = 89.8%

Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total

Source  DF  Seq SS
dose/wt
Gender
wt
dose
Gender*dose/wt

Q3 Data are available on the Weight (gms) and physical dimensions Length, Width and Height (cms) of 56 perch. All were caught from the same lake (Laengelmavesi) near Tampere in Finland. A matrix plot and various analyses are presented below. The interest lies in relating the dimensions to the weight.

[Figure: Matrix Plot of Weight, Length, Ht, width]

a) It is immediately apparent that separate linear regressions of Weight on Length, Width and Height will encounter difficulties. Discuss. (6 marks)

b) All variables were log transformed; the resulting multiple regression analysis is in Q3A overleaf. Discuss the various aspects of this transformation and subsequent analysis. What are the implications of the VIF values? (6 marks)

c) In an attempt at a simpler model, the derived variable Vol = Length × Height × Width was formed. The Fitted Line plot in Q3C overleaf summarises the analysis; the SE for the slope is returned as 0.011. Explain the features of this analysis and plot. (8 marks)

d) Use the models in (b) and (c) above to compute approximate 95% Prediction Intervals for the Weights of two fish with dimensions (Length, Height and Width) being respectively (14.7, 3.5 and 2.0) and (45.2, 11.9 and 7.3). Explain carefully the basis, in the fitted models, for your calculations. (8 marks)

e) The slope SE is 0.011. What are the implications for possible further simplification? (3 marks)

f) It is remarked that although the last model (c) provides an excellent and simple fit, its interpretation differs, to an extent that seems to be statistically significant, from the details of the model fitted in (b). Discuss. (2 marks)

Q3A

Regression Analysis: logwt versus loglen, loght, logwidth
logwt =  loglen  loght  logwidth
Predictor  Coef  SE Coef  T  P  VIF
Constant
loglen
loght
logwidth
S =  R-Sq = 99.4%  R-Sq(adj) = 99.4%

Unusual Observations
Obs  loglen  logwt  Fit  SE Fit  Residual  St Resid
(flags on the listed observations: X R R R R X)
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Q3B

[Figure: Residual Plots for logwt, using deleted residuals; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order]

Q3C

[Figure: Fitted Line Plot, Weight vs Vol; fit of log10(weight) on log10(vol); regression line with 95% PI; R-Sq 99.3%, R-Sq(adj) 99.3%]

Outline Solution

Q2

a) The two analyses are equivalent, though differently packaged. The T-test reports the observed group means and the mean difference in response; the mean for females is 1.611, and the estimated difference (F minus M) is the midpoint of the reported interval (0.074, 0.660), about 0.37. The regression reports the same information (to within rounding error): when Gender = 0 (F) the regression reports the expected value to be 1.61; when Gender = 1 (M) it reports the constant plus the Gender coefficient, which is the male mean (about 1.61 - 0.37 = 1.24). The regression model treats Gender as an indicator variable; the plot, which interpolates to other values of Gender, is not interpretable at those values. If the remaining variation (other than that due to Gender) can be regarded as random, both find that the T-ratio for the difference is 2.51 in magnitude. Both report that this is statistically significant.

b) Resp vs dose reports an increase in average response with dose that amounts to 0.74 per 200 units, i.e. per step between dose levels (about 0.0037 per unit of dose). If the remaining error can be treated as random, this is hugely statistically significant. Resp vs wt shows an apparent reduction in response associated with weight, also statistically significant. Anticipating later analyses, response is often more naturally sensitive to dose per unit weight; the latter analysis is consistent with this.

The prediction intervals as shown are effectively descriptive. Most of the data lie within these. The Confidence Intervals qualify statements about the mean response (over very many patients with specified dose or weight). One interpretation is that regression lines that are statistically consistent with the data must lie within the CI band.
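The equivalence claimed in a) can be checked numerically. The following is a minimal sketch on synthetic data, not the exam data: the group sizes match the question (33 F, 31 M), but the response values, the seed and all variable names are hypothetical. It also assumes the pooled-variance form of the two-sample t-test; with the unpooled (Welch) form the two T-ratios agree only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 33 females (Gender = 0) and 31 males (Gender = 1).
gender = np.r_[np.zeros(33), np.ones(31)]
resp = 1.6 - 0.35 * gender + rng.normal(0.0, 0.6, size=gender.size)

# Pooled two-sample t statistic for mean(F) - mean(M).
f, m = resp[gender == 0], resp[gender == 1]
df = f.size + m.size - 2
s2_pooled = ((f.size - 1) * f.var(ddof=1) + (m.size - 1) * m.var(ddof=1)) / df
t_test = (f.mean() - m.mean()) / np.sqrt(s2_pooled * (1 / f.size + 1 / m.size))

# Least-squares regression of resp on the 0/1 indicator.
X = np.column_stack([np.ones_like(gender), gender])
beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
resid = resp - X @ beta
s2_reg = resid @ resid / df
se_gender = np.sqrt(s2_reg * np.linalg.inv(X.T @ X)[1, 1])
t_reg = beta[1] / se_gender

print("female mean, constant:", f.mean(), beta[0])
print("male mean, constant + Gender coef:", m.mean(), beta[0] + beta[1])
print("t (test), t (regression):", t_test, t_reg)
```

The constant equals the female mean, the constant plus the Gender coefficient equals the male mean, and the two T-ratios differ only in sign, because the test is set up as F minus M while the regression coefficient is M minus F.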

c) The first analysis suggests that weight, dose and gender are all important predictors of response. The second suggests that, when dose/wt is included as a predictor, neither dose nor weight contributes much extra information. We already know that weight and gender are interrelated. And, by construction, dose/wt is correlated with dose, and Gender with weight. The apparent confusion arises frequently when the x-variables (predictor variables) are themselves inter-related. The slope coefficients are, in such cases, not simply interpretable in terms of the corresponding bivariate correlations.

[Diagram: nodes Dose, Dose/wt, Weight and Resp]

The diagram (not a requirement, but worthy of marks if offered) provides one way to envision a possible set of direct and indirect relationships with response. The concept of direct and indirect relationships has been discussed in class.

d) The new derived variable simplifies the direct consideration and comparison of two simple models: Resp vs dose/wt separately for M/F. Separately these may be written as resp = intercept + slope × (dose/wt), with potentially different values of each for M/F. This can also be a way to investigate interaction. Writing the fitted model as resp = b0 + b1 Gender + b2 (dose/wt) + b3 Gender × (dose/wt), the two implicit lines are

F: resp = b0 + b2 (dose/wt)
M: resp = (b0 + b1) + (b2 + b3) (dose/wt)

The lines, when sketched, are effectively parallel (they have almost the same slope). Since the Gender and Gender*dose/wt coefficients are small compared with their SEs (T = 0.58, T = 0.00), we can conclude that a single regression relationship, for both M and F, is likely to be adequate. This in turn suggests that the Weight and Gender terms in the second model in c) may not be simply interpreted, as suggested there, as individually necessary. For the Gender term is correlated with Weight. Perhaps the inclusion of Weight requires the inclusion of Gender to counter-balance it.

e) The analysis confirms that the important variable is dose/wt. No other variables are significant. However, the very high VIF values for dose and dose/wt point to the fact that these variables are (naturally) highly interdependent. One of these is likely to be the most important. The choice should be guided by consideration of the way the drug interacts, biochemically, with the patient.
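The role of the Gender*dose/wt term in d) can be illustrated with a small sketch. It uses synthetic data (the weights, responses, seed and the helper `simple_fit` are all hypothetical, not the trial data) and shows that the single model with an indicator and a product term reproduces the two separate simple regressions of resp on dose/wt, one per gender.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

# Hypothetical patients: 31 males (Gender = 1), 33 females (Gender = 0).
gender = (np.arange(n) < 31).astype(float)
dose = rng.choice([200.0, 400.0, 600.0], size=n)
wt = np.where(gender == 1, 80.0, 65.0) + rng.normal(0.0, 8.0, size=n)
x = dose / wt
# By construction, one common line for both genders.
resp = 0.1 + 0.25 * x + rng.normal(0.0, 0.15, size=n)

# Model with indicator and product (interaction) term.
X = np.column_stack([np.ones(n), gender, x, gender * x])
b0, b1, b2, b3 = np.linalg.lstsq(X, resp, rcond=None)[0]
line_f = (b0, b2)            # intercept, slope when Gender = 0
line_m = (b0 + b1, b2 + b3)  # intercept, slope when Gender = 1


def simple_fit(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return intercept, slope


print("F line:", line_f, simple_fit(x[gender == 0], resp[gender == 0]))
print("M line:", line_m, simple_fit(x[gender == 1], resp[gender == 1]))
```

Here b1 is the difference in intercepts and b3 the difference in slopes between the M and F lines, so small T-ratios for both correspond to two lines that are statistically indistinguishable.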

Q3

a) It is clear that the bivariate relationships (top row) are not linear. Additionally, there is clear evidence of the variance of weight increasing with weight. It is also the case that there is a great deal of correlation between the 3 x variables, likely to cause problems if they are ever used together.

b) The model in the log scale shows that all coefficients are very significantly different from 0, though that was never in doubt. R² is high. The model can also be written, up to a constant factor, as Wt ∝ Len^1.65 × Ht^0.81 × Width^0.55 × 10^error. The nominal interpretation is that an increase of 1 in, e.g., loglen (i.e. a 10-fold increase in Len) will induce, on average, an increase of 1.65 in logwt (i.e. a 10^1.65 = 45-fold increase in Wt) if all other variables are held constant. But the VIF values suggest that the covariates are correlated, as anticipated, and that the SEs are therefore inflated. This was apparent also in the scatterplots. Effectively this means that some fish are large in respect of all three dimensions, and some are small. In these circumstances one option is to choose a single composite that reflects all of the variables. It is likely to be futile to choose one of them.

There are however a number of unusual observations. Four of these exhibit large residuals, which merit attention. Three are very large and positive, and one is large and negative. Two are influential, being far from the others in respect of (log) Len, Ht, Width. There is nothing wrong with this, necessarily.

c) The option followed was to choose a product named Vol. The Fitted Line plot has fitted Weight to Vol, in the log scale, presenting the analysis in the back-transformed anti-log scale. Alternatively, since Log(Vol) = Log(Length) + Log(Height) + Log(Width), the composite Log(Vol) variable is the sum of the three Log(covariates). The fitted model can be written as Wt = 0.3 × Vol^0.98 × 10^error = 0.3 × Len^0.98 × Ht^0.98 × Width^0.98 × 10^error. An increase of 1 in LogVol (i.e. a 10-fold increase in Vol) will generate, on average, a 10^0.98-fold (i.e. 9.55-fold) increase in Wt.

The constant 0.3 could be thought of as fish density, were fish to be cuboid. As it is, it is a combination of fish density and the ratio of actual fish volume to the volume of the corresponding cuboid. When Vol^0.98 is large (i.e. when Wt is large) the absolute errors implicit in a 10^error-fold variation are large. This produces the fan-like shape of the prediction intervals. Predicting the weight of a large fish is harder than that of a small fish in absolute terms. The issue is equivalent to describing the prediction error in percentage terms.
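The fold-change reading of the coefficients, and the fan shape of the back-transformed prediction band, can be sketched as follows. The coefficients 1.65 and 0.98 are those quoted above; the residual standard deviation s = 0.040 is taken from the model (c) prediction equation in part d); the two predicted weights (30 g and 1000 g) and the function names are hypothetical, chosen only for illustration.

```python
def fold_change(coef: float) -> float:
    """Multiplicative change in Wt per 10-fold increase in the predictor
    in a log10-log10 regression with slope `coef`."""
    return 10.0 ** coef


def interval(pred_wt: float, s: float, k: float = 2.0) -> tuple[float, float]:
    """Back-transformed approximate prediction interval: fit +/- k*s in the
    log10 scale becomes a multiplicative factor 10**(k*s) in the raw scale."""
    factor = 10.0 ** (k * s)
    return pred_wt / factor, pred_wt * factor


print(round(fold_change(1.65), 1))  # Length in model (b): roughly 45-fold
print(round(fold_change(0.98), 2))  # Vol in model (c): 9.55-fold

for pred in (30.0, 1000.0):  # hypothetical predicted weights, in grams
    lo, hi = interval(pred, s=0.040)
    print(f"pred={pred:7.1f}  lo={lo:7.1f}  hi={hi:7.1f}  width={hi - lo:6.1f}")
```

The ratio hi/lo is the same for both fish, which is the constant percentage error, while the absolute width grows in proportion to the predicted weight, which is the fan.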

The R² value is almost as high as in b). The value of s is 0.040. This is in the log scale and compares with the value of 0.037 above. The model is not quite as tight-fitting, but it is simpler. No information is available on unusual observations, or SEs.

d) The two prediction equations are:

(as in b) Pred logwt = const + 1.65 Log(len) + 0.81 Log(ht) + 0.55 Log(width) ± 2(0.037)
(as in c) Pred logwt = const + 0.98 Log(len × ht × width) ± 2(0.040)

each back-transformed as antilog(Pred logwt), as below.

[Table: for each fish, the dimensions (len, ht, width, Vol) and their log10 values; the model (b) coefficients (const, len, ht, width, s) and model (c) coefficients (const, vol, s); and, for each model, Pred LogWt with lo and hi limits and their back-transforms]

Conclusions: very similar. See f) below.

e) Model (c) has a slope of 0.98 (SE 0.011). An interval of two SEs either side includes a coefficient of 1. That is, the data are statistically consistent with 1; a Null Hyp: slope = 1 would not be rejected. A simpler version of this model would then be LogWt = const + Log Vol. Equivalently this is Wt = 0.3 Vol. This model is not unlike the Tree model discussed in class.

Note that the SE (0.011) is very much smaller than the SEs for each of the dimensions in model (b). That is because their SEs have been inflated by, effectively, the lack of determinacy of the separate coefficients. Note however that the correlation between these dimensions has no implications at all for the usefulness of the prediction equation generated by model (b). It is simply the case that many different combinations of these coefficients are effectively equivalent to each other.

f) However, model (c) corresponds to giving coefficients of 1 to each (log) dimension. This is just about consistent with the fit for LogHt (0.81 ± 2(0.21)); it is not as consistent with LogLen (1.65 ± 2(0.22)) and LogWidth (0.55 ± 2(0.18)). The implications are that fish that are very long will be given low values of LogWt in model (c) and fish that are very wide will be given high values of LogWt in model (c). Fish are not cuboids. However it is moot whether the data would require a rejection (in model b) of the Null Hyp that all coefficients were equal to 1.
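The model (c) calculation in d) can be sketched for the two fish in the question. This is an approximation under stated assumptions: the intercept is reconstructed as log10(0.3) from the rounded constant quoted in the solution, the slope is 0.98 and s is 0.040, and the interval is the rough "fit ± 2s" form used above, which ignores the (small) uncertainty in the fitted line. The function name is hypothetical.

```python
import math

# Assumed model (c) values, taken from the rounded figures in the solution.
INTERCEPT = math.log10(0.3)  # reconstructed from Wt = 0.3 * Vol**0.98
SLOPE = 0.98
S = 0.040


def predict_weight(length: float, height: float, width: float):
    """Return (lo, fit, hi) for Weight in grams, back-transformed from
    the log10 scale using fit +/- 2*S."""
    log_vol = math.log10(length * height * width)
    fit = INTERCEPT + SLOPE * log_vol
    return tuple(10.0 ** v for v in (fit - 2 * S, fit, fit + 2 * S))


for dims in [(14.7, 3.5, 2.0), (45.2, 11.9, 7.3)]:
    lo, fit, hi = predict_weight(*dims)
    print(f"dims={dims}: lo={lo:.0f} g, fit={fit:.0f} g, hi={hi:.0f} g")
```

Because the interval is symmetric in the log scale, the back-transformed limits are a fixed multiple above and below the fitted weight rather than a fixed number of grams.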


More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

13. Poisson Regression Analysis

13. Poisson Regression Analysis 136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Chapter 23 Inferences About Means

Chapter 23 Inferences About Means Chapter 23 Inferences About Means Chapter 23 - Inferences About Means 391 Chapter 23 Solutions to Class Examples 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

More information

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas Regression and Time Series Analysis of Petroleum Product Sales in Masters Energy oil and Gas 1 Ezeliora Chukwuemeka Daniel 1 Department of Industrial and Production Engineering, Nnamdi Azikiwe University

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

Chapter 4 and 5 solutions

Chapter 4 and 5 solutions Chapter 4 and 5 solutions 4.4. Three different washing solutions are being compared to study their effectiveness in retarding bacteria growth in five gallon milk containers. The analysis is done in a laboratory,

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Regression Analysis (Spring, 2000)

Regression Analysis (Spring, 2000) Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity

More information

Example G Cost of construction of nuclear power plants

Example G Cost of construction of nuclear power plants 1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

More information

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

HOW TO USE MINITAB: DESIGN OF EXPERIMENTS. Noelle M. Richard 08/27/14

HOW TO USE MINITAB: DESIGN OF EXPERIMENTS. Noelle M. Richard 08/27/14 HOW TO USE MINITAB: DESIGN OF EXPERIMENTS 1 Noelle M. Richard 08/27/14 CONTENTS 1. Terminology 2. Factorial Designs When to Use? (preliminary experiments) Full Factorial Design General Full Factorial Design

More information

PTC Thermistor: Time Interval to Trip Study

PTC Thermistor: Time Interval to Trip Study PTC Thermistor: Time Interval to Trip Study by by David C. C. Wilson Owner Owner // Principal Principal Consultant Consultant Wilson Consulting Services, LLC April 5, 5, 5 Page 1-19 Table of Contents Description

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

with functions, expressions and equations which follow in units 3 and 4.

with functions, expressions and equations which follow in units 3 and 4. Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

More information

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

More information

A full analysis example Multiple correlations Partial correlations

A full analysis example Multiple correlations Partial correlations A full analysis example Multiple correlations Partial correlations New Dataset: Confidence This is a dataset taken of the confidence scales of 41 employees some years ago using 4 facets of confidence (Physical,

More information

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

More information

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015 Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Testing for Lack of Fit

Testing for Lack of Fit Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

More information

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information