
Exercise 10.1

(a) Plot wages versus LOS. Describe the relationship.

There is one woman with relatively high wages for her length of service. Circle this point and do not use it in the rest of this exercise.

(b) Find the least-squares line. Summarize the significance test for the slope. What do you conclude?

The least-squares line is ( ). The standard error for the estimate of the slope is . The corresponding t- and p-values are 2.85 and , respectively. With a p-value so small, we conclude that there is significant evidence against the null hypothesis that the slope is zero.

(c) State carefully what the slope tells you about the relationship between wages and length of service.

The slope tells us that for every unit increase in LOS, the average wage will increase by  units.

(d) Give a 95% confidence interval for the slope.

A 95% confidence interval for the slope is [ , ].

> confint( m1)
            2.5 % 97.5 %
(Intercept)
LOS

Exercise 10.2

Refer to the previous exercise. Analyze the data with the outlier included. How does this change the estimates of the parameters , , and ? What effect does the outlier have on the results of the significance test for the slope?

The estimates of the parameters from the previous exercise (without the outlier) were: . With the outlier included, the estimates become: . The outlier has the effect of lowering the significance of the test for the slope, increasing the p-value from  to .

Exercise 10.5

In Example 10.8 we examined the yield in bushels per acre of corn for the years 1966, 1976, 1986, and . Data for all years between 1957 and 1996 appear in Table 10.2.

(a) Plot the yield versus year. Describe the relationship. Are there any outliers or unusual years?

[Plot: Yield versus Year]

Comment: The relationship looks roughly linear. There don't appear to be any outliers. As for unusual years, there are a few years in the 70s and 80s that appear to have a lower yield than what might be expected.

(b) Perform the regression analysis and summarize the results. How rapidly has yield increased over time?

The least-squares line is ( ).

The significance tests for intercept and slope are both highly significant: the p-value for the test of the intercept being equal to zero was p = 2.77e-15, and for the test of the slope being equal to zero was p = 1.26e-15. The average yield has increased by approximately 1.84 bushels per acre per year.

> m5= lm( Yield ~ Year)
> summary( m5)

Call:
lm(formula = Yield ~ Year)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-15 ***
Year                                       e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 38 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 38 DF, p-value: 1.257e-15

Exercise 10.6

(a) Find the equation of the least-squares line.

The equation is: ( ).

> m6= lm( Y ~ X)
> summary( m6)

Call:
lm(formula = Y ~ X)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                **
X                                          e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 7 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 312 on 1 and 7 DF, p-value: 4.596e-07

(b) Test the null hypothesis that the slope is zero and describe your conclusion.

The p-value for this test is p = 4.6e-07, so we reject at the 0.05 level. The conclusion is that it is extremely unlikely that the slope is zero.

(c) Give a 95% confidence interval for the slope.

A 95% confidence interval for the slope is [ , ].

> confint( m6)
            2.5 % 97.5 %
(Intercept)
X

(d) The parameter  corresponds to natural gas consumption for cooking, hot water, and other uses when there is no demand for heating. Give a 95% confidence interval for this parameter.

A 95% confidence interval for the intercept is [ , ].

Exercise 10.8

(a) Plot the data. Does the trend in lean over time appear to be linear?

Comment: Yes, the trend appears to be linear.

[Plot: Lean versus Year]

(b) What is the equation of the least-squares line? What percentage of the variation in lean is explained by this line?

The equation of the least-squares line is ( ). From the R-squared value, approximately 98.7% of the variation is explained by this line.

> m8= lm( Lean ~ Year)
> summary( m8)

Call:
lm(formula = Lean ~ Year)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                *
Year                                       e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 11 degrees of freedom
Multiple R-squared: 0.988, Adjusted R-squared:
F-statistic:  on 1 and 11 DF, p-value: 6.503e-12

(c) Give a 95% confidence interval for the average rate of change (tenths of a millimeter per year) of the lean.

A 95% confidence interval for the average rate of change (tenths of a millimeter per year) of the lean is [ , ].

Exercise 10.9

(a) In 1918 the lean was  (the coded value is 71). Using the least-squares equation for the years 1975 to 1987, calculate a predicted value for the lean in 1918.

The predicted value for the lean in 1918 is  tenths of a millimeter.

> predict( m8, newdata=data.frame( Year=c( 18)), interval="prediction", level=0.95)
  fit lwr upr

(b) Although the least-squares line gives an excellent fit to the data for 1975 to 1987, this pattern did not extend back to 1918. Write a short statement explaining why this conclusion follows from the information available. Use numerical and graphical summaries to support your explanation.

The reason why the conclusion (that the linear pattern does not extend back to 1918) follows from the information available is that the predicted value for the lean in 1918 (

meters/coded value 106.6) does not match the value of the actual lean in 1918 ( meters/coded value 71). As for the part of the question asking for numerical and graphical summaries, I'm not sure which numerical/graphical summaries the author has in mind; also, I don't see how any numerical/graphical summary would explain why the pattern does not extend back to 1918.

Exercise

(a) The engineers working on the Leaning Tower of Pisa are most interested in how much the tower will lean if no corrective action is taken. Use the least-squares equation to predict the tower's lean in the year 1997.

The predicted value for the lean in 1997 is  tenths of a millimeter.

> predict( m8, newdata=data.frame( Year=c( 97)), interval="prediction", level=0.95)
  fit lwr upr

(b) To give a margin of error for the lean in 1997, would you use a confidence interval for a mean response or a prediction interval? Explain your choice.

We would use a prediction interval, because we are predicting a single future observation rather than estimating a mean response; the prediction interval accounts for the extra variability of an individual observation.

Exercise

Exercise 10.6 gives information about the regression of natural gas consumption on degree-days for a particular household.

(a) What is the t statistic for testing ?

The t statistic was .

(b) For the alternative , what critical value would you use for a test at the  significance level? Do you reject at this level?

We would use the critical value . Yes, we would reject at this level.

(c) How would you report the P-value for this test?

We would report p = e-07.

> pt( q=17.663, df=7, lower.tail=F)
[1] e-07

Exercise 10.15

(a) Find  and  from the data.

We have ( ) from the data, and from the equation:

( ) we have: ( ) ( ). Thus ( ).

> mean( Spheres)
[1]
> m15= lm( Vein ~ Spheres)
> summary( m15)

Call:
lm(formula = Vein ~ Spheres)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
Spheres                                    e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 8 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 8 DF, p-value: 4.733e-06

(b) We expect x and y to be positively associated. State hypotheses in terms of the slope of the population regression line that expresses this expectation, and carry out a significance test. What conclusions do you draw?

The hypotheses are

The p-value of the significance test, carried out on a t-value of  from a t distribution with 8 degrees of freedom, is 2.365e-06 (obtained either by dividing the p-value from the two-sided test in the output from part (a) above (i.e. p = 4.73e-06) by 2, or by using the pt() function in R). The conclusion is that the slope is positive.

Either

> 4.73e-06/2
[1] 2.365e-06

or

> pt( q=10.810, df=8, lower.tail=F)
[1] e-06

(c) Find a 99% confidence interval for the slope.

A 99% confidence interval is [ , ].

> confint( m15, level=.99)
            0.5 % 99.5 %
(Intercept)
Spheres

(d) Suppose that we observe a value of Spheres equal to 15.0 for one dog. Give a 90% interval for predicting the variable Vein for that dog.

A 90% prediction interval for Vein corresponding to a value of 15.0 for Spheres is [ , ].

> predict( m15, newdata=data.frame( Spheres=c( 15.0)), interval="prediction", level=0.90)
  fit lwr upr

Exercise 10.17

(a) Plot the data. Are there any outliers or unusual points?

There are no outliers or unusual points.
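As an aside on the interval questions above, the difference between a confidence interval for a mean response and a prediction interval for a single new observation can be checked numerically: at the same x, the prediction interval is always wider, because it adds the variance of an individual observation. A minimal sketch with toy data (not the data from these exercises):

```r
# Toy data: compare interval="confidence" with interval="prediction" at x = 10.
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
m <- lm(y ~ x)
new <- data.frame(x = 10)
conf.int <- predict(m, newdata = new, interval = "confidence", level = 0.95)
pred.int <- predict(m, newdata = new, interval = "prediction", level = 0.95)
# The prediction interval is strictly wider than the confidence interval:
(pred.int[, "upr"] - pred.int[, "lwr"]) > (conf.int[, "upr"] - conf.int[, "lwr"])  # TRUE
```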

[Plot: I versus V]

(b) Find the least-squares fit to the data, and estimate 1/R for this wire. Then give a 95% confidence interval for 1/R.

The least-squares fit is ( ), the estimate for 1/R is , and a 95% confidence interval for 1/R is [ , ].

> m17= lm( I ~ V)
> summary( m17)

Call:
lm(formula = I ~ V)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                ***
V                                          *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 3 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 3 DF, p-value:

> confint( m17)
            2.5 % 97.5 %
(Intercept)
V

(c) If  estimates 1/R, then 1/ estimates R. Estimate the resistance R. Similarly, if L and U represent the lower and upper confidence limits for 1/R, then the corresponding limits for R are given by 1/U and 1/L, as long as L and U are positive. Use this fact and your answer to (b) to find a 95% confidence interval for R.

A 95% confidence interval for R is [ , ].

(d) Ohm's law states that  in the model is 0. Calculate the test statistic for this hypothesis and give an approximate P-value.

The test statistic for this hypothesis (see Code and Output for part (b)) is , and the corresponding p-value from a two-sided test is approximately .

Exercise 10.19

(a) Plot the data. Are there any outliers or unusual points?

There don't appear to be any outliers. There are a few different VO2 values corresponding to the same HR value, which might be unusual.
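The interval transformation in part (c) is just the reciprocal applied to both endpoints, with the order reversed because x -> 1/x is decreasing for positive x. A sketch with placeholder endpoints (hypothetical values, not the exercise's actual limits):

```r
# Hypothetical 95% CI for 1/R (placeholder values, not the exercise's):
ci.invR <- c(0.45, 0.55)
# Since 1/x is decreasing for x > 0, [L, U] for 1/R maps to [1/U, 1/L] for R:
ci.R <- rev(1 / ci.invR)
ci.R  # lower limit is 1/0.55, upper limit is 1/0.45
```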

[Plot: VO2 versus HR]

(b) Compute the least-squares regression line for predicting oxygen uptake from heart rate for this individual.

The least-squares regression line is ( ).

> m19= lm( V02 ~ HR)
> m19

Call:
lm(formula = V02 ~ HR)

Coefficients:
(Intercept)           HR

(c) Test the null hypothesis that the slope of the regression line is 0. Explain in words the meaning of your conclusions from this test.

The p-value for a two-sided test of the slope being 0 is 1.00e-11. The meaning is that there is a statistically significant linear relationship between the two variables.

> summary( m19)

Call:
lm(formula = V02 ~ HR)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                4.59e-09 ***
HR                                         1.00e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 17 DF, p-value: 1.000e-11

(d) Calculate a 95% confidence interval for the oxygen uptake of this individual on a future occasion when his heart rate is 95. Repeat the calculation for heart rate 110.

95% confidence intervals for the oxygen uptake of this individual when his heart rate is 95 and 110 are [ , ] and [ , ], respectively.

> predict( m19, newdata=data.frame( HR=c( 95, 110)), interval="confidence", level=0.95)
  fit lwr upr

(e) From what you have learned in (a), (b), (c), and (d) of this exercise, do you think that the researchers should use predicted VO2 in place of measured VO2 for this individual under similar experimental conditions? Explain your answer.

Yes, researchers may use predicted VO2 in place of measured VO2 for this individual. I don't know how to explain it using parts (a), (b), (c), and (d), but I would say, using the Adjusted R-squared value of , that a straight-line model does a good job of explaining the relationship between VO2 and heart rate.

Exercise

Calculate the t statistic for testing . Specify an appropriate alternative hypothesis for this

problem and give an approximate p-value for the test. Then explain your conclusion in words a physician can understand.

The t statistic is , an appropriate alternative hypothesis is , and an approximate p-value is . The conclusion is that there is a statistically significant straight-line relationship between the traditional procedure and the new procedure; every unit increase in the new method corresponds to a 0.83-unit increase in the old method.

> 2* pt( 0.83/0.065, df=81-2, lower.tail=F)
[1] e-21

Exercise

(a) It is reasonable to suppose that greater airflow will cause more evaporation. State hypotheses to test this belief and calculate the test statistic. Find an approximate P-value for the significance test and report your conclusion.

The hypotheses are . The test statistic is , and an approximate P-value is . The conclusion is that greater airflow will cause more evaporation.

(b) Construct a 95% confidence interval for the additional evaporation experienced when airflow increases by 1 unit.

A 95% confidence interval for the slope is ( )( ) [ , ].

Exercise

Return to the data on current versus voltage given in the Ohm's law experiment in Exercise 10.17.

(a) Compute all values for the ANOVA table.

> anova( m17)
Analysis of Variance Table

Response: I
          Df Sum Sq Mean Sq F value Pr(>F)

V                                          ***
Residuals

(b) State the null hypothesis tested by the ANOVA F statistic, and explain in plain language what this hypothesis says.

The null hypothesis tested by the ANOVA F statistic is (Moore and McCabe 2006, p 655): . In plain language, this hypothesis says that y is not linearly related to x.

(c) What is the distribution of this F statistic when  is true? Find an approximate P-value for the test of .

The distribution of this F statistic when  is true is an ( ) distribution. An approximate P-value is .

Exercise

(a) The correlation between monthly income and birth weight was r = 0.39. Calculate the t statistic for testing the null hypothesis that the correlation is 0 in the entire population of infants.

The t statistic for testing the null hypothesis that the correlation is 0 is given by (Moore and McCabe, p 664) .

(b) The researchers expected that higher birth weights would be associated with higher incomes. Express this expectation as an alternative hypothesis for the population correlation.

The alternative hypothesis expressing this expectation is: .

(c) Determine a P-value for  versus the alternative that you specified in (b). What conclusion does your test suggest?

The P-value is . This suggests that monthly income and birth weight are related; specifically, it suggests that there is a positive correlation between the two.

Code and output:

> pt( , df=38, lower.tail=F)
[1]

Exercise

(a) The correlation between parental control and self-esteem was r = -0.19. Calculate the t statistic for testing the null hypothesis that the population correlation is 0.

The t statistic is:

( )

(b) Find an approximate P-value for testing  versus the two-sided alternative and report your conclusion.

An approximate p-value for testing the null hypothesis against the two-sided alternative is p = e-07. The conclusion is that the correlation coefficient is statistically significant, but since the value of the correlation coefficient is only -0.19, there is no strong linear relationship.

Exercise

(a) Plot the data and describe the pattern. Is it reasonable to summarize this kind of relationship with a correlation?

The pattern looks somewhat linear. It seems reasonable to summarize this kind of relationship with a correlation because the correlation between the variables is .

[Plot: Humerus versus Femur]

(b) Find the correlation and perform the significance test. Summarize the results and report your conclusion.

The correlation is , the t statistic is , and the corresponding p-value for testing the null hypothesis against the two-sided alternative is .
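The t statistic used in these correlation tests is t = r * sqrt(n - 2) / sqrt(1 - r^2). A sketch using r = 0.39 with df = 38 (so n = 40), the values from the income and birth-weight exercise above:

```r
# t statistic for testing H0: rho = 0, from r = 0.39 with n = 40.
r <- 0.39
n <- 40
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
# one-sided P-value for the "positive correlation" alternative:
p.one.sided <- pt(t.stat, df = n - 2, lower.tail = FALSE)
t.stat  # about 2.61
```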

> cor.test( Humerus, Femur)

	Pearson's product-moment correlation

data:  Humerus and Femur
t = , df = 3, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:

sample estimates:
cor

Exercise

(a) Plot the data and describe the relationship between the two scores.

There appears to be a somewhat linear relationship between the two scores.

[Plot: Round2 versus Round1]

(b) Find the correlation between the two scores and test the null hypothesis that the population correlation is 0. Summarize your results.

The correlation is , the t statistic is , and the corresponding p-value for testing the null hypothesis against the two-sided alternative is . In summary, there is evidence of a linear relationship.

> cor.test( Round1, Round2)

	Pearson's product-moment correlation

data:  Round1 and Round2
t = , df = 10, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:

sample estimates:
cor

(c) The plot shows one outlier. Recompute the correlation and redo the significance test without this observation. Write a short summary explaining the effect of the outlier on the correlation and significance test in (b).

The correlation becomes , the t statistic , and the corresponding p-value for testing the null hypothesis against the two-sided alternative . In summary, the outlier in part (b) had the effect of reducing the correlation and increasing the p-value.

> detach( data)
> data=data[-8,]
> attach( data)
> cor.test( Round1, Round2)

	Pearson's product-moment correlation

data:  Round1 and Round2
t = , df = 9, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:

sample estimates:
cor

Exercise

(a) Find the equation of the least-squares line for predicting GHP from FVC.

The equation is ( ), where the slope and intercept were found using the equations (Moore and McCabe, p 157)

and .

(b) Give the results of the significance test for the null hypothesis that the slope is 0. (Hint: What is the relation between this test and the test for a zero correlation?)

Testing the null hypothesis that the slope is 0 is equivalent to testing the null hypothesis that the correlation is zero. Recall that the t statistic for testing zero correlation is , and hence the p-value for a test of zero correlation against the two-sided alternative is .

Exercise

(a) Plot the data with SAT on the x axis and ACT on the y axis. Describe the overall pattern and any unusual observations.

The overall relationship looks linear. There's a potential outlier (observation 42).

[Plot: ACT versus SAT]
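The Moore and McCabe (p 157) formulas cited above are b1 = r * (s_y / s_x) for the slope and b0 = ybar - b1 * xbar for the intercept. A toy-data sketch (not the exercise data) confirming that these summary-statistic formulas reproduce lm()'s coefficients:

```r
# Toy data: least-squares slope and intercept from summary statistics.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b1 <- cor(x, y) * sd(y) / sd(x)   # slope: b1 = r * s_y / s_x
b0 <- mean(y) - b1 * mean(x)      # intercept: b0 = ybar - b1 * xbar
coef(lm(y ~ x))                   # agrees with c(b0, b1)
```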

(b) Find the least-squares regression line and draw it on your plot. Give the results of the significance test for the slope.

The least-squares line is ( ). The significance test for the slope yields a p-value of . Thus we strongly reject the null hypothesis that the slope is zero.

Code and Outputs:

> m39=lm( ACT ~ SAT)#-> a= , b=
> summary(m39)

Call:
lm(formula = ACT ~ SAT)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
SAT                                        e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 58 degrees of freedom
Multiple R-squared: 0.667, Adjusted R-squared:
F-statistic:  on 1 and 58 DF, p-value: 1.796e-15

(c) What is the correlation between the two tests?

The correlation is .

Exercise

(a) What is the mean of these predicted values? Compare it with the mean of the ACT scores.

The mean of predicted values is . The mean of the ACT scores is also .

Code and Outputs:

> ACT.predicted=predict( m39, newdata=data.frame( SAT=SAT))
> mean( ACT.predicted)
[1]

(b) Compare the standard deviation of the predicted values with the standard deviation of the actual ACT scores. If least-squares regression is used to predict ACT scores for a large number of students such as these, the average predicted value will be accurate but the variability of the predicted scores will be too small.

The standard deviation of predicted values is , while the standard deviation of actual ACT scores is .

> s.ACT.predicted=sd( ACT.predicted)#==
> s.ACT=sd( ACT)#==

(c) Find the SAT score for a student who is one standard deviation above the mean ( ( ) ). Find the predicted ACT score and standardize this score. (Use the means and standard deviations from this set of data for these calculations.)

Student #6 scored a 1440 on the SAT, which is  standard deviations above the mean. The predicted ACT score for this student is , which when standardized also becomes ; in other words, it is  standard deviations above the mean predicted ACT score.

> predict( m39, newdata=data.frame( SAT=c(1440)))#==
> ( - mean( ACT.predicted))/s.ACT.predicted#==

(d) Repeat part (c) for a student whose SAT score is one standard deviation below the mean (z = -1).

Student #7 scored a 490 on the SAT, which is  standard deviations below the mean. The predicted ACT score for this student is , which is also  standard deviations below the mean predicted ACT score.

(e) What do you conclude from parts (c) and (d)? Perform additional calculations for different z's if needed.

We conclude that when using this least-squares line to predict values, the prediction will be the same number of standard deviations above/below the mean of predicted values as the explanatory variable is above/below the mean of explanatory variables.

Exercise 10.41

(a) Using the data in Table 10.4, find the values of  and .

Using the formula  we get the values  and , which seems wrong. However, if we use the formula  we get the values  and .

(b) Plot the data with the least-squares line and the new prediction line.

[Plots: data with least-squares line and new prediction line, using a.1= s.x/s.y and using a.1= s.y/s.x]

(c) Use the new line to find predicted ACT scores. Find the mean and the standard deviation of these scores. How do they compare with the mean and standard deviation of the ACT scores?

Using the formula  instead of the formula  to determine the slope of the least-squares equation yields new predicted ACT scores with a mean of  and a standard deviation of . Compared with the mean and standard deviation of the ACT scores, they are the same.

Code and Output:

> ACT.predicted.new <- (function( SATscore){ *SATscore})( SAT)
> mean( ACT.predicted.new)#==
[1]
> mean( ACT)#==
[1]
> sd( ACT.predicted.new)#==
[1]
> sd( ACT)#==
[1]
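The conclusion of part (e) above is an exact algebraic property of least squares: predictions are a linear function of x, so a prediction standardized by the mean and SD of the predicted values recovers the z-score of the predictor (for a positive slope). A toy-data sketch (hypothetical SAT-like values, not the exercise data):

```r
# Toy data: the standardized prediction equals the z-score of the predictor.
set.seed(2)
x <- rnorm(50, 1000, 200)         # SAT-like scores
y <- 0.02 * x + rnorm(50, 0, 2)   # ACT-like scores
m <- lm(y ~ x)
pred <- predict(m)
z.x <- (1200 - mean(x)) / sd(x)   # z-score of an x value of 1200
z.pred <- (predict(m, newdata = data.frame(x = 1200)) - mean(pred)) / sd(pred)
all.equal(unname(z.pred), z.x)    # TRUE (exact, up to rounding)
```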

Exercise 3.4.1

(a) Based on the output for model (3.7) a business analyst concluded: [ ]. Provide a detailed critique of this conclusion.

The discernible pattern indicates that an improper model has been fit. Also, the outlier (observation 13) in the plot of studentized residuals versus Distance warrants some concern.

(b) Does the ordinary straight line regression model (3.7) seem to fit the data well? If not, carefully describe how the model can be improved.

The ordinary straight line regression does fit the data reasonably well, but the model can be improved by removing the outlier. Even after the model is refit with this outlier removed, there is still a discernible pattern in the plot of studentized residuals versus Distance:

[Plot: standardized residuals versus Distance]

The pattern indicates that a quadratic term should be added. Addition of a quadratic term yields:

[Plot: standardized residuals versus Distance, after adding the quadratic term]

The problem of non-random residuals appears to have been fixed; however, the improvement in adjusted R-squared when going from the Fare ~ Distance model to the Fare ~ Distance + DistanceSquared model is small: the change is only from 99.63% to 99.87%. It might be possible to improve the model further by looking for more outliers and then refitting the model with them removed.

Exercise

Is the following statement true or false? If you believe that the statement is false, provide a brief explanation.

Statement: Suppose that a straight line regression model has been fit to a bivariate data set of the form ( ) ( ) ( ). Furthermore, suppose that the distribution of X appears to be normal while the Y variable is highly skewed. A plot of standardized residuals from the least squares regression line produces a quadratic pattern with increasing variance when plotted against ( ). In this case, one should consider adding a quadratic term in X to the regression model and thus consider a model of the form .

Response: I agree. Regarding the plot of standardized residuals, Sheather writes that if a plot of residuals against X produces a discernible pattern, then the shape of the pattern provides information on the function of x that is missing from the model (p 49). He goes on to write that if the residuals from the straight-line fit of Y and X have a quadratic pattern, then we can conclude that there is a need for a quadratic term to be added to the original straight-line regression model (p 50). Regarding the issue of increasing variance, Sheather writes that there are two methods to deal with it: transformations and weighted least squares.

Exercise 3.4.3

Part A

(a) Develop a simple linear regression model based on least squares that predicts advertising revenue per page from circulation (i.e. feel free to transform either the predictor or the response variable or both variables). Ensure that you provide justification for your choice of model.

A simple linear regression model is ( ) ( ) ( ).

Justification: The plot of AdRevenue ~ Circulation has x values that are spread too far apart; a log transformation on the Xs will help bring them closer together. The plot of AdRevenue ~ log( Circulation) has y values that are too far apart; a log transformation on the Ys will, again, help bring them closer together. The final model ( ) ( ) looks good visually.
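The Part A strategy can be sketched on toy data (hypothetical coefficients, not the magazine data): fit the regression on the log-log scale, then back-transform predictions with exp() to obtain intervals on the original revenue scale.

```r
# Toy data mimicking the AdRevenue ~ Circulation setup (hypothetical values).
set.seed(3)
Circulation <- runif(30, 0.3, 25)
AdRevenue <- exp(4 + 0.5 * log(Circulation) + rnorm(30, 0, 0.1))
m <- lm(log(AdRevenue) ~ log(Circulation))
log.pi <- predict(m, newdata = data.frame(Circulation = 0.5),
                  interval = "prediction", level = 0.95)
exp(log.pi)  # prediction interval back on the original revenue scale
```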
A plot of all three models is below:

[Plots: AdRevenue ~ Circulation, AdRevenue ~ log( Circulation), and log( AdRevenue) ~ log( Circulation)]

(b) Find a 95% prediction interval for the advertising revenue per page for magazines with the following circulations:

1) 0.5 million: A 95% prediction interval for advertising revenue is [ , ].

2) 20 million: A 95% prediction interval for advertising revenue is [ , ].

> logcirculation= log( Circulation)
> logadrevenue= log( AdRevenue)
> m343= lm( logadrevenue ~ logcirculation)

> logadrevenue.predicted= predict( m343, newdata=data.frame( logcirculation= log( c( 0.5, 20))), interval="prediction", level=.95)
> AdRevenue.predicted= exp( logadrevenue.predicted)
> AdRevenue.predicted
  fit lwr upr

(c) Describe any weaknesses in your model.

Interpretation of the least squares coefficients becomes difficult with the log transformation applied to both explanatory and response variables.

Part B

(a) Develop a polynomial regression model based on least-squares that directly predicts the effect on advertising revenue per page of an increase in circulation of 1 million people (i.e. do not transform either the predictor or the response variable). Ensure that you provide detailed justification for your choice of model. [Hint: Consider polynomial models of order up to 3.]

A polynomial regression model based on least-squares that directly predicts the effect on advertising revenue per page of an increase in circulation of 1 million people is: ( ( ) ( ) ).

Detailed Justification: This is how we arrived at the above model:

1) We first fit three models: i. , ii. , iii. .
2) We identified the leverage points.
3) For each model, we identified which of the leverage points were bad (using the rule that identifies points whose standardized residuals fall outside the interval -2 to 2 as bad). The bad leverage points were 2, 20, and 49 for the first model; 2, 4, 20, 49 for the second; and 2, 8, 20, 49 for the third.
4) We removed the bad leverage points for each model and then refit each model to the new data set (the data set with the bad leverage points removed).
5) Finally, we compared the adjusted R-squared values for all three models. Model 3 resulted in the highest adjusted R-squared value, so that is why it was chosen.

Code and Outputs:

#IDENTIFY LEVERAGE POINTS:
leverage.vals= lm.influence(m343b.2)$hat

leverage.vals[ leverage.vals > 4/ length( Circulation)]# these points are leverage points: 2, 4, 6, 8, 20, 46, 49

#CALCULATE STANDARDIZED RESIDUALS FOR ABOVE LEVERAGE PTS AND DETERMINE WHICH ONES ARE OUTLIERS/BAD:
rstandard.vals1= rstandard( m343b.1)[ c( 2, 4, 6, 8, 20, 46, 49)]
rstandard.vals1[ rstandard.vals1 > 2 | rstandard.vals1 < -2]# these are bad leverage points: 2, 20, 49
rstandard.vals2= rstandard( m343b.2)[ c( 2, 4, 6, 8, 20, 46, 49)]
rstandard.vals2[ rstandard.vals2 > 2 | rstandard.vals2 < -2]# these are bad leverage points: 2, 4, 20, 49
rstandard.vals3= rstandard( m343b.3)[ c( 2, 4, 6, 8, 20, 46, 49)]
rstandard.vals3[ rstandard.vals3 > 2 | rstandard.vals3 < -2]# these are bad leverage points: 2, 8, 20, 49

#REMOVE BAD LEVERAGE POINTS (i.e., OUTLIERS) AND REFIT EACH MODEL:
ad.new1= ad[ -c(2, 20, 49),]
detach( ad)
attach( ad.new1)
m343b.1= lm( AdRevenue ~ Circulation)
ad.new2= ad[ -c(2, 4, 20, 49),]
detach( ad.new1)
attach( ad.new2)
m343b.2= lm( AdRevenue ~ Circulation + CirculationSquared)
ad.new3= ad[ -c(2, 8, 20, 49),]
detach( ad.new2)
attach( ad.new3)
m343b.3= lm( AdRevenue ~ Circulation + CirculationSquared + CirculationCubed)

#COMPARE THE THREE MODELS AGAIN WHEN OUTLIERS FOR EACH MODEL ARE REMOVED:
> summary( m343b.1)

Call:
lm(formula = AdRevenue ~ Circulation)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                <2e-16 ***
Circulation                                <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 65 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 65 DF, p-value: < 2.2e-16
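The screening rules used above can be sketched on toy data (not the magazine data): hat values above 4/n flag leverage points, standardized residuals outside [-2, 2] flag outliers, and points that trip both rules are the "bad" leverage points that get removed before refitting.

```r
# Toy data: flag leverage points (h > 4/n) and outliers (|r.std| > 2).
set.seed(4)
x <- c(rnorm(20), 8)   # point 21 has an extreme x value (high leverage)
y <- c(rnorm(20), 4)
m <- lm(y ~ x)
h <- lm.influence(m)$hat
r.std <- rstandard(m)
high.leverage <- which(h > 4 / length(x))
bad <- intersect(high.leverage, which(abs(r.std) > 2))  # "bad" leverage points
high.leverage  # includes point 21
```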

> summary( m343b.2)

Call:
lm(formula = AdRevenue ~ Circulation + CirculationSquared)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                                       e-13 ***
Circulation                                     < 2e-16 ***
CirculationSquared                                e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 63 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 2 and 63 DF, p-value: < 2.2e-16

> summary( m343b.3)

Call:
lm(formula = AdRevenue ~ Circulation + CirculationSquared + CirculationCubed)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                                       e-06 ***
Circulation                                       e-10 ***
CirculationSquared                                **
CirculationCubed                                  **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 62 degrees of freedom
Multiple R-squared: 0.933, Adjusted R-squared:
F-statistic:  on 3 and 62 DF, p-value: < 2.2e-16

(b) Find a 95% prediction interval for the advertising revenue per page for magazines with the following circulations:

(i) 0.5 million

A 95% prediction interval for the first model is [ , ]. A 95% prediction interval for the second model is [ , ]. A 95% prediction interval for the third model is [ , ].

(ii) 20 million

A 95% prediction interval for the first model is [ , ]. A 95% prediction interval for the second model is [ , ]. A 95% prediction interval for the third model is [ , ].

(c) Describe any weaknesses in your model.

The weakness in our model (i.e. model 3) is how much greater the lengths of its prediction intervals are compared to the other two models.

Part C

(a) Compare the model in Part A with that in Part B. Decide which provides a better model. Give reasons to justify your choice.

Not sure. Part A? Because the models in Part B give such different prediction intervals?

(b) Compare the prediction intervals in Part A with those in Part B. In each case, decide which interval you would recommend. Give reasons to justify each choice.

Not sure; the intervals are all so different, and it isn't clear why any particular one would be better than the others.


More information