Chapter 8 Linear Regression


1. Cereals.
Substituting Fiber = 9 into the regression equation gives a predicted potassium content of 28 mg. According to the model, we expect a cereal with 9 grams of fiber to have about 28 milligrams of potassium.

2. Horsepower.
Substituting HP = 2 into the regression equation gives a predicted fuel economy of 3.7 mpg. According to the model, we expect a car with 2 horsepower to get about 3.7 miles per gallon.

3. More cereal.
A negative residual means that the potassium content is actually lower than the model predicts for a cereal with that much fiber.

4. Horsepower, again.
A positive residual means that the car gets better gas mileage than the model predicts for a car with that much horsepower.

5. Another bowl.
The model predicts that cereals will have approximately 27 more milligrams of potassium for each additional gram of fiber.

6. More horsepower.
The model predicts that cars lose an average of .84 miles per gallon for each additional horsepower.

7. Cereal again.
R² = (.93)². About 8.5% of the variability in potassium content is accounted for by the model.

8. Another car.
R² = (0.869)² ≈ 0.755. About 75.5% of the variability in fuel economy is accounted for by the model.

9. Last bowl!
True potassium contents of cereals vary from the predicted values with a standard deviation of 3.77 milligrams.

10. Last tank!
True fuel economy varies from the predicted amount with a standard deviation given by the residual standard deviation, measured in miles per gallon.

11. Residuals.
a) The scattered residuals plot indicates an appropriate linear model.
b) The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.
c) The fanned pattern indicates that the linear model is not appropriate. The model's predicting power decreases as the values of the explanatory variable increase.
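Many of the exercises above reduce to plugging a value into a fitted line, y-hat = b0 + b1(x), and comparing the prediction with an observed value. For readers who want to check that arithmetic by computer, here is a minimal Python sketch; the intercept and slope below are made-up placeholders, not the coefficients from any of these exercises.

# Prediction and residual from a fitted least-squares line (placeholder coefficients).
def predict(x, intercept, slope):
    # Fitted value y-hat = intercept + slope * x
    return intercept + slope * x

def residual(observed_y, x, intercept, slope):
    # Residual = observed - predicted; positive means the model under-predicted.
    return observed_y - predict(x, intercept, slope)

b0, b1 = 38.0, 27.0                      # hypothetical intercept and slope
print(predict(9, b0, b1))                # prediction at x = 9
print(residual(300, 9, b0, b1))          # positive residual: actual above prediction

The sign convention is the whole point of exercises 3 and 4: a residual is always actual minus predicted.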

12. Residuals.
a) The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.
b) The fanned pattern indicates heteroscedastic data. The model's predicting power increases as the value of the explanatory variable increases.
c) The scattered residuals plot indicates an appropriate linear model.

13. What slope?
The only slope that makes sense is 3 pounds per foot. 3 pounds per foot is too small. For example, a Honda Civic is about 4 feet long, and a Cadillac DeVille is about 7 feet long. If the slope of the regression line were 3 pounds per foot, the Cadillac would be predicted to outweigh the Civic by only 9 pounds! (The real difference is about 5 pounds.) Similarly, 3 pounds per foot is too small; a slope of 3 pounds per foot would predict a weight difference of 9 pounds (4.5 tons) between Civic and DeVille. The only answer that is even reasonable is 3 pounds per foot, which predicts a difference of 9 pounds. This isn't very close to the actual difference of 5 pounds, but at least it is in the right ballpark.

14. What slope?
The only slope that makes sense is 1 foot in height per inch in circumference. A much smaller slope is too small: a trunk would have to increase in circumference by many inches for every foot in height, and if that were true, pine trees would be all trunk! A much larger slope is too large: if pine trees reach a maximum height of 6 feet, for instance, then the variation in circumference of the trunk would only be 6 inches, and pine tree trunks certainly come in more sizes than that. The only slope that is reasonable is 1 foot in height per inch in circumference.

15. Real estate.
a) The explanatory variable (x) is size, measured in square feet, and the response variable (y) is price, measured in thousands of dollars.
b) The units of the slope are thousands of dollars per square foot.
c) The slope of the regression line predicting price from size should be positive. Bigger homes are expected to cost more.

16. Roller coaster.
a) The explanatory variable (x) is initial drop, measured in feet, and the response variable (y) is duration, measured in seconds.
b) The units of the slope are seconds per foot.
c) The slope of the regression line predicting duration from initial drop should be positive. Coasters with higher initial drops probably provide longer rides.
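Exercises 11 and 12 judge a model entirely from the shape of its residual plot. As a rough illustration of how such a plot is produced, here is a minimal Python sketch on synthetic data, assuming numpy and matplotlib are available; it is not the textbook's data or output.

# Fit a line and inspect the residual plot (synthetic data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 + 2 * x + rng.normal(0, 1, size=x.size)    # roughly linear data

slope, intercept = np.polyfit(x, y, 1)           # least-squares coefficients
fitted = intercept + slope * x
resid = y - fitted

plt.scatter(fitted, resid)                       # residuals vs. predicted values
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()

A scattered, patternless cloud supports the linear model; a curve or a fan of changing spread, as described in 11(b), 11(c), and 12(b), argues against it.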

17. Real estate again.
7.4% of the variability in price can be accounted for by variability in size. (In other words, 7.4% of the variability in price can be accounted for by the linear model.)

18. Coasters again.
2.4% of the variability in duration can be accounted for by variability in initial drop. (In other words, 2.4% of the variability in duration can be accounted for by the linear model.)

19. Real estate redux.
a) The correlation between size and price is r = √R² ≈ 0.845. The positive value of the square root is used, since the relationship is believed to be positive.
b) The price of a home that is one standard deviation above the mean size would be predicted to be 0.845 standard deviations (in other words, r standard deviations) above the mean price.
c) The price of a home that is two standard deviations below the mean size would be predicted to be 1.69 (or 2(0.845)) standard deviations below the mean price.

20. Another ride.
a) The correlation between drop and duration is r = √R² ≈ 0.352. The positive value of the square root is used, since the relationship is believed to be positive.
b) The duration of a coaster whose initial drop is one standard deviation below the mean drop would be predicted to be about 0.352 standard deviations (in other words, r standard deviations) below the mean duration.
c) The duration of a coaster whose initial drop is three standard deviations above the mean drop would be predicted to be about 1.056 (or 3(0.352)) standard deviations above the mean duration.

21. More real estate.
a) According to the linear model, the price of a home is expected to increase by $6 (.6 thousand dollars) for each additional square foot in size.
b) Substituting the size into the regression equation Price-hat = b0 + b1(Sqrft): according to the linear model, a 3 square-foot home is expected to have a price of $23,82.
c) According to the linear model, a 2 square-foot home is expected to have a price of $2,2. The asking price is $5,2, which is $6 below the prediction; -$6 is the (negative) residual.

22. Last ride.
a) According to the linear model, the duration of a coaster ride is expected to increase by about .242 seconds for each additional foot of initial drop.
b) Substituting Drop = 2 into the regression equation Duration-hat = b0 + b1(Drop) gives the duration the model predicts for a coaster with a 2-foot initial drop.
c) Substituting Drop = 5 gives the predicted duration for a coaster with a 5-foot initial drop. The advertised duration is shorter, at 2 seconds, so the residual (advertised minus predicted) is negative.

23. Misinterpretations.
a) R² is an indication of the strength of the model, not the appropriateness of the model. A scattered residuals plot is the indicator of an appropriate model.
b) Regression models give predictions, not actual values. The student should have said, "The model predicts that a bird of the given height is expected to have a wingspan of 7 inches."

24. More misinterpretations.
a) R² measures the amount of variation accounted for by the model. Literacy rate does not "determine" life expectancy; rather, 64% of the variability in life expectancy is accounted for by variability in literacy rate.
b) Regression models give predictions, not actual values. The student should have said, "The slope of the line shows that an increase of 5% in literacy rate is associated with an expected 2-year improvement in life expectancy."

25. ESP.
a) First, since no one has ESP, you must have scored 2 standard deviations above the mean by chance. On your next attempt, you are unlikely to duplicate the extraordinary event of scoring 2 standard deviations above the mean. You will likely regress toward the mean on your second try, getting a lower score. If you want to impress your friend, don't take the test again. Let your friend think you can read his mind!
b) Your friend doesn't have ESP, either. No one does. Your friend will likely regress toward the mean score on his second attempt as well, meaning his score will probably go up. If the goal is to get a higher score, your friend should try again.

26. SI jinx.
Athletes, especially rookies, usually end up on the cover of Sports Illustrated for extraordinary performances. If these performances represent the upper end of the distribution of performance for an athlete, future performance is likely to regress toward that athlete's average performance. An athlete's average performance usually isn't notable enough to land the cover of SI. Of course, there are always exceptions, like Michael Jordan, Tiger Woods, Serena Williams, and others.

27. Cigarettes.
a) A linear model is probably appropriate. The residuals plot shows some initially low points, but there is no clear curvature.
b) 92.4% of the variability in nicotine level is accounted for by variability in tar content. (In other words, 92.4% of the variability in nicotine level is accounted for by the linear model.)

28. Attendance 2006.
a) The linear model is appropriate. Although the relationship is not strong, it is reasonably straight, and the residuals plot shows no pattern.
b) 48.5% of the variability in attendance is accounted for by variability in the number of wins. (In other words, 48.5% of the variability is accounted for by the model.)
c) The residuals spread out. There is more variation in attendance as the number of wins increases.
d) The Yankees' attendance was about 3, fans more than we might expect given the number of wins. This is a positive residual.

29. Another cigarette.
a) The correlation between tar and nicotine is r = √R² ≈ 0.96. The positive value of the square root is used, since the relationship is believed to be positive. Evidence of the positive relationship is the positive coefficient of tar in the regression output.
b) The average nicotine content of cigarettes that are two standard deviations below the mean in tar content would be expected to be about 1.922 (that is, 2r) standard deviations below the mean nicotine content.
c) Cigarettes that are one standard deviation above average in nicotine content are expected to be about 0.96 standard deviations (in other words, r standard deviations) above the mean tar content.

30. Second inning.
a) The correlation between attendance and number of wins is r = √R² ≈ 0.697. The positive value of the square root is used, since the relationship is positive.
b) A team that is two standard deviations above the mean in number of wins would be expected to have attendance that is 1.394 (or 2(0.697)) standard deviations above the mean attendance.
c) A team that is one standard deviation below the mean in attendance would be expected to have a number of wins that is 0.697 standard deviations (in other words, r standard deviations) below the mean number of wins. The correlation between two variables is the same regardless of the direction in which predictions are made. Be careful, though, since the same is NOT true for predictions made using the slope of the regression equation. Slopes are valid only for predictions in the direction for which they were intended.
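Exercises 19, 20, 29, and 30 all use the same two facts: r is the square root of R² (with the sign of the association), and a regression expressed in z-scores predicts z-hat for y as r times the z-score of x. A small Python sketch of that arithmetic, with a made-up R² value rather than one from these exercises:

# r from R-squared, and prediction in standard-deviation (z-score) units.
import math

def r_from_r_squared(r_squared, positive_association=True):
    # The correlation is the square root of R^2; the direction supplies the sign.
    r = math.sqrt(r_squared)
    return r if positive_association else -r

def predicted_z(z_x, r):
    # Regression on z-scores: the predicted z for y is r times the z for x.
    return r * z_x

r = r_from_r_squared(0.485)      # hypothetical R^2 value
print(predicted_z(2.0, r))       # x two SDs above its mean -> y predicted 2r SDs above
print(predicted_z(-1.0, r))      # x one SD below its mean -> y predicted r SDs below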

31. Last cigarette.
a) Nicotine-hat = b0 + b1(Tar) is the equation of the regression line that predicts nicotine content from tar content of cigarettes, with the intercept b0 and slope b1 taken from the regression output.
b) Substituting Tar = 4 into the regression equation: the model predicts that a cigarette with 4 mg of tar will have about .44 mg of nicotine.
c) For each additional mg of tar, the model predicts an increase of .65 mg of nicotine.
d) The model predicts that a cigarette with no tar would have .54 mg of nicotine.
e) Substituting Tar = 7: the model predicts that a cigarette with 7 mg of tar will have .694 mg of nicotine. If the residual is .5, the cigarette actually had the predicted value plus .5 mg of nicotine.

32. Last inning 2006.
a) Attendance-hat = b0 + b1(Wins) is the equation of the regression line that predicts attendance from the number of games won by American League baseball teams.
b) Substituting Wins = 5 into the regression equation: the model predicts that a team with 5 wins will have attendance of 2,58 people.
c) For each additional win, the model predicts an increase in attendance equal to the slope, measured in people per win.
d) A negative residual means that the team's actual attendance is lower than the attendance the model predicts for a team with that many wins.
e) Substituting Wins = 83 gives the predicted attendance for the Cardinals. The actual attendance of 42,588 exceeds this prediction, a positive residual: the Cardinals drew far more people on average than the model predicted.

33. Income and housing revisited.
a) Yes. Both housing cost index and median family income are quantitative. The scatterplot is Straight Enough, although there may be a few outliers. The spread increases a bit for states with large median incomes, but we can still fit a regression line.
b) Using the summary statistics given in the problem, calculate the slope and intercept: b1 = r(s_HCI/s_MFI) with r = 0.65, and b0 = ȳ - b1(x̄). The regression equation that predicts HCI from MFI is HCI-hat = b0 + b1(MFI). (If you went back to Chapter 7 and found the regression equation from the original data instead, the result is not a huge difference!)
c) Substituting MFI = 44,993: the model predicts that a state with a median family income of $44,993 will have an average housing cost index given by HCI-hat = b0 + b1(44,993). (Using the regression equation calculated from the actual data would give approximately the same average housing cost index.)
d) The prediction is too low. Washington has a positive residual. (223.5 from the equation generated from the original data.)
e) The correlation is the slope of the regression line that relates z-scores, so the regression equation would be z-hat_HCI = 0.65 z_MFI.
f) The correlation is the slope of the regression line that relates z-scores, so the regression equation would be z-hat_MFI = 0.65 z_HCI.

34. Interest rates and mortgages.
a) Yes. Both interest rate and total mortgages are quantitative, and the scatterplot is Straight Enough. The spread is fairly constant, and there are no outliers.
b) Using the summary statistics given in the problem, calculate the slope and intercept: b1 = r(s_MortAmt/s_IntRate) and b0 = ȳ - b1(x̄). The regression equation that predicts total mortgage amount from interest rate is MortAmt-hat = b0 + b1(IntRate). (You could also go back to Chapter 7 and find the regression equation from the original data.)

c) Substituting IntRate = 2 into the regression equation: if interest rates were 2%, we would expect there to be $65.52 million in total mortgages. ($65.39 million if you worked with the actual data.)
d) We should be very cautious in making a prediction about an interest rate of 2%. It is well outside the range of our original x-variable, and care should always be taken when extrapolating. This prediction may not be appropriate.
e) The correlation is the slope of the regression line that relates z-scores, so the regression equation would be z-hat_MortAmt = 0.84 z_IntRate.
f) The correlation is the slope of the regression line that relates z-scores, so the regression equation would be z-hat_IntRate = 0.84 z_MortAmt.

35. Online clothes.
a) Using the summary statistics given in the problem, calculate the slope and intercept: b1 = r(s_Total/s_Age) and b0 = ȳ - b1(x̄). The regression equation that predicts total online clothing purchase amount from age is Total-hat = b0 + b1(Age).
b) Yes. Both total purchases and age are quantitative variables, and the scatterplot is Straight Enough, even though it is quite flat. There are no outliers, and the spread does not change throughout the plot.
c) Substituting Age = 8 into the regression equation gives the model's prediction for an 8-year-old's total yearly online clothing purchases; substituting Age = 5 gives the prediction for a 5-year-old.
d) R² = (.37)², expressed as a percentage.
e) This model would not be useful to the company. The scatterplot is nearly flat. The model accounts for almost none of the variability in total yearly purchases.
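Exercises 33 through 36 build the regression line from summary statistics alone, using b1 = r(s_y/s_x) and b0 = ȳ - b1(x̄). A minimal Python sketch of that computation follows; the numbers passed in are placeholders, not the values from these exercises.

# Least-squares slope and intercept from summary statistics alone.
def line_from_summaries(r, s_x, s_y, xbar, ybar):
    slope = r * s_y / s_x             # b1 = r * (s_y / s_x)
    intercept = ybar - slope * xbar   # b0 = ybar - b1 * xbar
    return intercept, slope

# Hypothetical summary statistics:
b0, b1 = line_from_summaries(r=0.65, s_x=7.0, s_y=117.0, xbar=46.0, ybar=338.0)
print(f"y-hat = {b0:.1f} + {b1:.2f} x")

Because the intercept is ȳ minus the slope times a large x̄, small rounding changes in the slope can move the intercept noticeably, which is the point made in 36(a).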

36. Online clothes II.
a) Using the summary statistics given in the problem, calculate the slope and intercept: b1 = r(s_Total/s_Income) with r = 0.722, and b0 = ȳ - b1(x̄). (Since the mean income is a relatively large number, the value of the intercept will vary based on the rounding of the slope. Notice that it is very close to zero in the context of yearly income.) The regression equation that predicts total online clothing purchase amount from income is Total-hat = b0 + b1(Income).
b) The assumptions for regression are met. Both variables are quantitative and the plot is Straight Enough. There are several possible outliers, but none of these points are extreme, and there are 5 data points to establish a pattern. The spread of the plot does not change throughout the range of income.
c) Substituting the two incomes into the regression equation: the model predicts that a person with $2, yearly income will make $28.4 in online purchases, and that a person with $8, yearly income will make $928.4 in online purchases. (Predictions may vary, based on rounding of the model.)
d) R² = (0.722)² ≈ 52.1%.
e) The model accounts for 52.1% of the variation in total yearly purchases, so the model would probably be useful to the company. Additionally, the difference between the predicted purchases of a person with $2, yearly income and $8, yearly income is of practical significance.

37. SAT scores.
a) The association between SAT Math scores and SAT Verbal scores was linear, moderate in strength, and positive. Students with high SAT Math scores typically had high SAT Verbal scores.
b) One student got a 5 Verbal and 8 Math. That set of scores doesn't seem to fit the pattern.

c) r = 0.685 indicates a moderate, positive association between SAT Math and SAT Verbal, but only because the scatterplot shows a linear relationship. Students who scored one standard deviation above the mean in SAT Math were expected to score 0.685 standard deviations above the mean in SAT Verbal. Additionally, R² = (0.685)² ≈ 0.469, so 46.9% of the variability in math score was accounted for by variability in verbal score.
d) The scatterplot of verbal and math scores shows a relationship that is straight enough, so a linear model is appropriate. Using the summary statistics, b1 = r(s_Math/s_Verbal) with r = 0.685, and b0 = ȳ - b1(x̄). The equation of the least squares regression line for predicting SAT Math score from SAT Verbal score is Math-hat = b0 + b1(Verbal).
e) For each additional point in verbal score, the model predicts an increase of .662 points in math score. A more meaningful interpretation might be scaled up: for each additional 10 points in verbal score, the model predicts an increase of 6.62 points in math score.
f) Substituting Verbal = 5 into the regression equation gives the math score the model expects for a student with a verbal score of 5.
g) Substituting Verbal = 8 gives the math score the model expects for a student with a verbal score of 8. She actually scored 8 on math, so her residual is her actual score minus this predicted score.

38. Success in college.
a) A scatterplot showed the relationship between combined SAT score and GPA to be reasonably linear, so a linear model is appropriate. Using the summary statistics, b1 = r(s_GPA/s_SAT) with r = 0.47, and b0 = ȳ - b1(x̄). The regression equation predicting GPA from SAT score is GPA-hat = b0 + b1(SAT).
b) The model predicts that a student with an SAT score of 0 would have a GPA equal to the y-intercept, .262. The y-intercept is not meaningful in this context, since both scores are impossible.

c) The model predicts that students with higher SAT scores tended to have higher GPAs; the stated increase in SAT score corresponds to a GPA that is .24 higher.
d) Substituting SAT = 2 into the regression equation: according to the model, a student with an SAT score of 2 is expected to have a GPA of about 3.23.
e) According to the model, SAT score is not a very good predictor of college GPA. R² = (.47)², which means that only 22.9% of the variability in GPA can be accounted for by the model. The rest of the variability is determined by other factors.
f) A student would prefer to have a positive residual. A positive residual means that the student's actual GPA is higher than the model predicts for someone with the same SAT score.

39. SAT, take 2.
a) r = 0.685. The correlation between SAT Math and SAT Verbal is a unitless measure of the degree of linear association between the two variables. It doesn't depend on the order in which you are making predictions.
b) The scatterplot of verbal and math scores shows a relationship that is straight enough, so a linear model is appropriate. Using the summary statistics, b1 = r(s_Verbal/s_Math) with r = 0.685, and b0 = ȳ - b1(x̄). The equation of the least squares regression line for predicting SAT Verbal score from SAT Math score is Verbal-hat = b0 + b1(Math).
c) A positive residual means that the student's actual verbal score was higher than the score the model predicted for someone with the same math score.
d) Substituting Math = 5 into the regression equation gives the verbal score the model expects for a person with a math score of 5.
e) Substituting that predicted verbal score into the equation from the previous exercise gives the math score the model expects for a person with that verbal score.

f) The prediction in part e) does not cycle back to the original 5 points because the regression equation used to predict math from verbal is a different equation than the regression equation used to predict verbal from math. One was generated by minimizing squared residuals in the verbal direction; the other was generated by minimizing squared residuals in the math direction. If a math score is one standard deviation above the mean, its predicted verbal score regresses toward the mean. The same is true for a verbal score used to predict a math score.

40. Success, part 2.
Using the summary statistics, b1 = r(s_SAT/s_GPA) with r = 0.47, and b0 = ȳ - b1(x̄). The regression equation to predict SAT score from GPA is SAT-hat = b0 + b1(GPA). Substituting GPA = 3.: the model predicts that a student with a GPA of 3. is expected to have an SAT score of about 868.

41. Used cars 2007.
a) We are attempting to predict the price in dollars of used Toyota Corollas from their age in years. (Scatterplot: Age and Price of Used Toyota Corollas; Price ($) vs. Age (years).)
b) There is a strong, negative, linear association between price and age of used Toyota Corollas.
c) The scatterplot provides evidence that the relationship is Straight Enough. A linear model will likely be an appropriate model.
d) Since R² = .944, simply take the square root to find r: √.944 ≈ .972. Since the association between age and price is negative, r = -0.972.
e) 94.4% of the variability in price of a used Toyota Corolla can be accounted for by variability in the age of the car.
f) The relationship is not perfect. Other factors, such as options, condition, and mileage, explain the rest of the variability in price.

42. Drug abuse.
a) The scatterplot shows a positive, strong, linear relationship. It is straight enough to make the linear model the appropriate model.
b) 87.3% of the variability in percentage of other drug usage can be accounted for by percentage of marijuana use.
c) R² = .873, so r = √.873 ≈ .934 (since the relationship is positive). Using b1 = r(s_Other/s_Marijuana) and b0 = ȳ - b1(x̄), the regression equation used to predict the percentage of teens who use other drugs from the percentage who have used marijuana is Other-hat = b0 + b1(Marijuana). (You could also find the regression equation from the actual data set from Chapter 7.)
d) According to the model, each additional percent of teens using marijuana is expected to add .6 percent to the percentage of teens using other drugs.
e) The results do not confirm marijuana as a gateway drug. They do indicate an association between marijuana and other drug usage, but association does not imply causation.
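The point made in 39(f) above, that predicting y from x and predicting x from y use two different least-squares lines, is easy to see numerically. Here is a rough Python sketch on synthetic data (not the SAT data from the exercise):

# The two regression directions give different lines (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
verbal = rng.normal(600, 100, 200)
math_scores = 0.7 * verbal + rng.normal(0, 60, 200)   # correlated, not perfectly

slope_mv = np.polyfit(verbal, math_scores, 1)[0]      # predict math from verbal
slope_vm = np.polyfit(math_scores, verbal, 1)[0]      # predict verbal from math
r = np.corrcoef(verbal, math_scores)[0, 1]

print(slope_mv, 1 / slope_vm)      # not equal: the second line is not the first inverted
print(slope_mv * slope_vm, r**2)   # the product of the two slopes equals r squared

Each slope is r times a ratio of standard deviations, so each prediction regresses toward the mean of its own response variable.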

43. More used cars 2007.
a) The scatterplot from the previous exercise shows that the relationship is straight, so the linear model is appropriate. The regression equation to predict the price of a used Toyota Corolla from its age is Price-hat = b0 + b1(Years), with the coefficients taken from the computer regression output (dependent variable Price ($); R-squared = 94.4%, s = 86.2).
b) According to the model, for each additional year in age, the car is expected to drop $959 in price.
c) The model predicts that a new Toyota Corolla (0 years old) will cost $4,285.
d) Substituting Years = 7: according to the model, an appropriate price for a 7-year-old Toyota Corolla is $7573.
e) Buy the car with the negative residual. Its actual price is lower than predicted.
f) According to the model, a Corolla of the given age is expected to cost $4696. The car has an actual price of $35, so its residual is negative: the car costs $96 less than predicted.
g) The model would not be useful for predicting the price of a 2-year-old Corolla. The oldest car in the list is 3 years old. Predicting a price after 2 years would be an extrapolation.
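Exercise 43 reads the slope, intercept, and R² off computer regression output and then predicts and finds residuals. For readers who want to reproduce that kind of output themselves, here is a minimal sketch using scipy.stats.linregress on made-up car data, not the exercise's data set:

# Regression summaries and a prediction, via scipy (synthetic data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
age = rng.uniform(1, 13, 17)                           # ages in years (made up)
price = 14000 - 950 * age + rng.normal(0, 800, 17)     # made-up prices

fit = stats.linregress(age, price)
print(fit.slope, fit.intercept, fit.rvalue**2)         # slope, intercept, R-squared

predicted = fit.intercept + fit.slope * 7              # prediction for a 7-year-old car
print(predicted)
# A prediction at age 20 would extrapolate far beyond the observed ages, as in part (g).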

44. Birth rates 2005.
a) A scatterplot of the live birth rates in the US over time (births per 1000 women vs. year) shows an association that is negative, strong, and appears to be curved, with one low outlier, the rate of 4.8 live births per 1000 women age 15-44 in 1975. Generally, as time passes, the birth rate is getting lower.
b) Although the association is slightly curved, it is straight enough to try a linear model. From the computer regression output (dependent variable Birth Rate; R-squared = 67.4%, R-squared (adjusted) = 62.8%, s = .22), the linear regression model for predicting birth rate from year is Birthrate-hat = b0 + b1(Year).
c) The residuals plot (residuals vs. predicted births per 1000 women) shows a slight curve. Additionally, the scatterplot shows a low outlier for the year 1975. We may want to investigate further. At the very least, be cautious when using this model.
d) The model predicts that each passing year is associated with a decline in the birth rate, by an amount equal to the slope, in births per 1000 women.
e) Substituting Year = 1978: the model predicts about 6.74 births per 1000 women in 1978.
f) If the actual birth rate in 1978 was 5. births per 1000 women, the model predicted .74 births per 1000 women higher than the actual rate.
g) Substituting Year = 2010: according to the model, the birth rate in 2010 is predicted to be 3.6 births per 1000 women. This prediction seems a bit low. It is an extrapolation outside the range of the data, and furthermore, the model only explains 67% of the variability in birth rate. Don't place too much faith in this prediction.

h) Substituting Year = 2025: according to the model, the birth rate in 2025 is predicted to be .55 births per 1000 women. This prediction is an extreme extrapolation outside the range of the data, which is dangerous. No faith should be placed in this prediction.

45. Burgers.
a) The scatterplot of calories vs. fat content in fast food hamburgers (Fat and Calories of Fast Food Burgers; # of Calories vs. Fat (grams)) appears linear, so a linear model is appropriate.
b) From the computer regression output, R² = 92.3%. 92.3% of the variability in the number of calories can be explained by the variability in the number of grams of fat in a fast food burger.
c) From the computer regression output, the regression equation that predicts the number of calories in a fast food burger from its fat content is Calories-hat = b0 + b1(Fat).
d) The residuals plot shows no pattern. The linear model appears to be appropriate.
e) The model predicts that a fat-free burger would have a number of calories equal to the y-intercept. Since there are no data values close to 0 grams of fat, this is an extrapolation outside the data and isn't of much use.
f) For each additional gram of fat in a burger, the model predicts an increase of .56 calories.
g) Substituting Fat = 28 into the regression equation gives the predicted number of calories for a burger with 28 grams of fat. If the residual is +33, the actual number of calories is the predicted value plus 33.

46. Chicken.
a) The scatterplot is fairly straight, so the linear model is appropriate.
b) The correlation of .947 indicates a strong, linear, positive relationship between fat and calories for chicken sandwiches.

c) Using the summary statistics, b1 = r(s_Cal/s_Fat) with r = 0.947, and b0 = ȳ - b1(x̄). The linear model for predicting calories from fat in chicken sandwiches is Calories-hat = b0 + b1(Fat).
d) For each additional gram of fat, the model predicts an increase in calories equal to the slope.
e) According to the model, a fat-free chicken sandwich would have a number of calories equal to the y-intercept. This is probably an extrapolation, although without the actual data, we can't be sure.
f) In this context, a negative residual means that a chicken sandwich has fewer calories than the model predicts.

47. A second helping of burgers.
a) The model from the previous exercise was for predicting number of calories from number of grams of fat. In order to predict grams of fat from the number of calories, a new linear model needs to be generated.
b) The scatterplot (Calories and Fat in Fast Food Burgers; Fat (grams) vs. # of Calories) shows the relationship between number of fat grams and number of calories in a set of fast food burgers. The association is strong, positive, and linear. Burgers with higher numbers of calories typically have higher fat contents. The relationship is straight enough to apply a linear model. From the computer regression output (dependent variable Fat; R-squared = 92.3%), the linear model for predicting fat from calories is Fat-hat = b0 + b1(Calories). According to the slope, the fat content is expected to increase by about 8.3 grams for the corresponding increase in calories.

The residuals plot shows no pattern, so the model is appropriate. R² = 92.3%, so 92.3% of the variability in fat content can be accounted for by the model. Substituting Calories = 6 into the regression equation: according to the model, a burger with 6 calories is expected to have about 35 grams of fat.

48. A second helping of chicken.
a) The model from the previous exercise was for predicting number of calories from number of grams of fat. In order to predict grams of fat from the number of calories, a new linear model needs to be generated.
b) The scatterplot is fairly straight, so the linear model is appropriate. The correlation of .947 indicates a strong, linear, positive relationship between fat and calories for chicken sandwiches. Using the summary statistics, b1 = r(s_Fat/s_Cal) and b0 = ȳ - b1(x̄). The linear model for predicting fat from calories in chicken sandwiches is Fat-hat = b0 + b1(Calories). According to the linear model, a chicken sandwich with 4 calories is expected to have approximately 5.9 grams of fat.

49. Body fat.
a) The scatterplot of % body fat and weight of 2 male subjects (Weight and Body Fat; Body fat (%) vs. Weight (lb)) shows a strong, positive, linear association. Generally, as a subject's weight increases, so does % body fat. The association is straight enough to justify the use of the linear model. The linear model that predicts % body fat from weight is %Fat-hat = b0 + b1(Weight).
b) The residuals plot shows no apparent pattern. The linear model is appropriate.
c) According to the model, for each additional pound of weight, body fat is expected to increase by about .25%.
d) Only 48.5% of the variability in % body fat can be accounted for by the model. The model is not expected to make accurate predictions.

e) Substituting Weight = 9 into the regression equation gives the predicted body fat percentage for a 9-pound man; the residual is his actual percent body fat minus this prediction.

50. Body fat, again.
The scatterplot of % body fat and waist size shows an association that is strong, linear, and positive. As waist size increases, % body fat has a tendency to increase as well. The scatterplot is straight enough to justify the use of the linear model. The linear model for predicting % body fat from waist size is %Fat-hat = b0 + b1(Waist). For each additional inch in waist size, the model predicts an increase of 2.222% body fat. 78.7% of the variability in % body fat can be accounted for by waist size. The residuals plot (residuals vs. predicted body fat (%)) shows no apparent pattern. The residuals plot and the relatively high value of R² indicate an appropriate model with more predicting power than the model based on weight.

51. Heptathlon 2004.
a) Both high jump height and 800-meter time are quantitative variables, and the association is straight enough to use linear regression. (Computer regression output: dependent variable High Jump; R-squared = 16.4%, s = .67, 24 degrees of freedom; predictors Constant and 800m. Scatterplot: Heptathlon Results; high jump heights vs. 800-meter times (sec).)

The regression equation to predict high jump from 800m results is Highjump-hat = b0 + b1(800m). According to the model, the predicted high jump decreases by an average of .67 meters for each additional second in 800-meter time.
b) R² = 16.4%. This means that 16.4% of the variability in high jump height is accounted for by the variability in 800-meter time.
c) Yes, good high jumpers tend to be fast runners. The slope of the association is negative. Faster runners tend to jump higher, as well.
d) The residuals plot is fairly patternless. The scatterplot shows a slight tendency for less variation in high jump height among the slower runners than the faster runners. Overall, the linear model is appropriate.
e) The linear model is not particularly useful for predicting high jump performance. First of all, only 16.4% of the variability in high jump height is accounted for by the variability in 800-meter time, leaving 83.6% of the variability accounted for by other variables. Secondly, the residual standard deviation is .62 meters, which is not much smaller than the standard deviation of all high jumps, .66 meters. Predictions are not likely to be accurate.

52. Heptathlon 2004 again.
a) Both high jump height and long jump distance are quantitative variables, the association is straight enough, and there are no outliers. It is appropriate to use linear regression. (Computer regression output: dependent variable Long Jump; R-squared = 12.6%, s = .96, 24 degrees of freedom; predictors Constant and High Jump. Scatterplot: Heptathlon Results; long jump distance vs. high jump heights (meters).) The regression equation to predict long jump from high jump results is Longjump-hat = b0 + b1(Highjump). According to the model, the predicted long jump increases by an average of .54 meters for each additional meter in high jump height.
b) R² = 12.6%. This means that only 12.6% of the variability in long jump distance is accounted for by the variability in high jump height.
c) Yes, good high jumpers tend to be good long jumpers. The slope of the association is positive. Better high jumpers tend to be better long jumpers, as well.
d) The residuals plot is fairly patternless. The linear model is appropriate.
e) The linear model is not particularly useful for predicting long jump performance. First of all, only 12.6% of the variability in long jump distance is accounted for by the variability in high jump height, leaving 87.4% of the variability accounted for by other variables. Secondly, the residual standard deviation is .96 meters, which is about the same as the standard deviation of all long jumps, .26 meters. Predictions are not likely to be accurate.
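Parts (e) of exercises 51 and 52 judge a model by comparing the residual standard deviation to the standard deviation of the response: if the residuals are nearly as spread out as the response itself, the model adds little. A rough Python sketch of that comparison on synthetic data, not the heptathlon results:

# Compare the residual SD to the SD of y to see how much a weak model helps.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 100)
y = 0.4 * x + rng.normal(0, 1, 100)          # deliberately weak relationship

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

s_y = np.std(y, ddof=1)                      # SD of the response alone
s_e = np.std(resid, ddof=2)                  # residual SD (two estimated parameters)
print(s_y, s_e)                              # s_e only slightly smaller than s_y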

53. Least squares.
If the 4 x-values are plugged into the given equation for ŷ, the 4 predicted values are 8, 29, 5, and 62, respectively. The 4 residuals are 8, 2, 3, and 8. The squared residuals are 64, 44, 96, and 324, respectively. The sum of the squared residuals is 79. Least squares means that no other line has a sum lower than 79. In other words, it's the best fit.

54. Least squares.
If the 4 x-values are plugged into the given equation for ŷ, the 4 predicted values are 885, 795, 75, and 65, respectively. The 4 residuals are 65, -45, 95, and -5. The squared residuals are 4225, 225, 925, and 225, respectively. The sum of the squared residuals is 34,5. Least squares means that no other line has a sum lower than 34,5. In other words, it's the best fit.
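Exercises 53 and 54 compute each residual, square it, and add them up; "least squares" means the fitted line makes that sum as small as possible. A short Python sketch comparing a guessed line with the least-squares line on made-up data, not the values from the exercises:

# Sum of squared residuals for a candidate line vs. the least-squares line.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])           # made-up data
y = np.array([10.0, 18.0, 31.0, 40.0])

def sse(intercept, slope):
    # Sum of squared residuals for the line y-hat = intercept + slope * x
    resid = y - (intercept + slope * x)
    return float(np.sum(resid ** 2))

guessed = sse(3.0, 7.0)                      # any line you like
ls_slope, ls_intercept = np.polyfit(x, y, 1) # least-squares coefficients
best = sse(ls_intercept, ls_slope)
print(guessed, best)                         # best <= guessed, for any guessed line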

Homework 8 Solutions

Homework 8 Solutions Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Chapter 7 Scatterplots, Association, and Correlation

Chapter 7 Scatterplots, Association, and Correlation 78 Part II Exploring Relationships Between Variables Chapter 7 Scatterplots, Association, and Correlation 1. Association. a) Either weight in grams or weight in ounces could be the explanatory or response

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) All but one of these statements contain a mistake. Which could be true? A) There is a correlation

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5 Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Shape of Data Distributions

Shape of Data Distributions Lesson 13 Main Idea Describe a data distribution by its center, spread, and overall shape. Relate the choice of center and spread to the shape of the distribution. New Vocabulary distribution symmetric

More information

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

More information

table to see that the probability is 0.8413. (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: 60 38 = 1.

table to see that the probability is 0.8413. (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: 60 38 = 1. Review Problems for Exam 3 Math 1040 1 1. Find the probability that a standard normal random variable is less than 2.37. Looking up 2.37 on the normal table, we see that the probability is 0.9911. 2. Find

More information

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Non-Linear Regression 2006-2008 Samuel L. Baker

Non-Linear Regression 2006-2008 Samuel L. Baker NON-LINEAR REGRESSION 1 Non-Linear Regression 2006-2008 Samuel L. Baker The linear least squares method that you have een using fits a straight line or a flat plane to a unch of data points. Sometimes

More information

Chapter 9 Descriptive Statistics for Bivariate Data

Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

WHAT IS A BETTER PREDICTOR OF ACADEMIC SUCCESS IN AN MBA PROGRAM: WORK EXPERIENCE OR THE GMAT?

WHAT IS A BETTER PREDICTOR OF ACADEMIC SUCCESS IN AN MBA PROGRAM: WORK EXPERIENCE OR THE GMAT? WHAT IS A BETTER PREDICTOR OF ACADEMIC SUCCESS IN AN MBA PROGRAM: WORK EXPERIENCE OR THE GMAT? Michael H. Deis, School of Business, Clayton State University, Morrow, Georgia 3060, (678)466-4541, MichaelDeis@clayton.edu

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Worksheet A5: Slope Intercept Form

Worksheet A5: Slope Intercept Form Name Date Worksheet A5: Slope Intercept Form Find the Slope of each line below 1 3 Y - - - - - - - - - - Graph the lines containing the point below, then find their slopes from counting on the graph!.

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

More information

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Linear Regression. use http://www.stat.columbia.edu/~martin/w1111/data/body_fat. 30 35 40 45 waist

Linear Regression. use http://www.stat.columbia.edu/~martin/w1111/data/body_fat. 30 35 40 45 waist Linear Regression In this tutorial we will explore fitting linear regression models using STATA. We will also cover ways of re-expressing variables in a data set if the conditions for linear regression

More information

WIN AT ANY COST? How should sports teams spend their m oney to win more games?

WIN AT ANY COST? How should sports teams spend their m oney to win more games? Mathalicious 2014 lesson guide WIN AT ANY COST? How should sports teams spend their m oney to win more games? Professional sports teams drop serious cash to try and secure the very best talent, and the

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables Scatterplot; Roles of Variables 3 Features of Relationship Correlation Regression Definition Scatterplot displays relationship

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

Solution Let us regress percentage of games versus total payroll.

Solution Let us regress percentage of games versus total payroll. Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

Logs Transformation in a Regression Equation

Logs Transformation in a Regression Equation Fall, 2001 1 Logs as the Predictor Logs Transformation in a Regression Equation The interpretation of the slope and intercept in a regression change when the predictor (X) is put on a log scale. In this

More information

Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

More information

5 Double Integrals over Rectangular Regions

5 Double Integrals over Rectangular Regions Chapter 7 Section 5 Doule Integrals over Rectangular Regions 569 5 Doule Integrals over Rectangular Regions In Prolems 5 through 53, use the method of Lagrange multipliers to find the indicated maximum

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0. Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

More information

THE LEAST SQUARES LINE (other names Best-Fit Line or Regression Line )

THE LEAST SQUARES LINE (other names Best-Fit Line or Regression Line ) Sales THE LEAST SQUARES LINE (other names Best-Fit Line or Regression Line ) 1 Problem: A sales manager noticed that the annual sales of his employees increase with years of experience. To estimate the

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Lesson 3.2.1 Using Lines to Make Predictions

Lesson 3.2.1 Using Lines to Make Predictions STATWAY INSTRUCTOR NOTES i INSTRUCTOR SPECIFIC MATERIAL IS INDENTED AND APPEARS IN GREY ESTIMATED TIME 50 minutes MATERIALS REQUIRED Overhead or electronic display of scatterplots in lesson BRIEF DESCRIPTION

More information

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability RIT Score Range: Below 171 Below 171 Data Analysis and Statistics Solves simple problems based on data from tables* Compares

More information

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

Introduction to Linear Regression

Introduction to Linear Regression 14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Transforming Bivariate Data

Transforming Bivariate Data Math Objectives Students will recognize that bivariate data can be transformed to reduce the curvature in the graph of a relationship between two variables. Students will use scatterplots, residual plots,

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

4. Multiple Regression in Practice

4. Multiple Regression in Practice 30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

More information

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

More information

Means, standard deviations and. and standard errors

Means, standard deviations and. and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Chapter 7: Modeling Relationships of Multiple Variables with Linear Regression

Chapter 7: Modeling Relationships of Multiple Variables with Linear Regression Chapter 7: Modeling Relationships of Multiple Variables with Linear Regression Overview Chapters 5 and 6 examined methods to test relationships between two variables. Many research projects, however, require

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

More information

ALGEBRA I (Common Core) Thursday, January 28, 2016 1:15 to 4:15 p.m., only

ALGEBRA I (Common Core) Thursday, January 28, 2016 1:15 to 4:15 p.m., only ALGEBRA I (COMMON CORE) The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION ALGEBRA I (Common Core) Thursday, January 28, 2016 1:15 to 4:15 p.m., only Student Name: School Name: The

More information

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

More information

The Importance of Statistics Education

The Importance of Statistics Education The Importance of Statistics Education Professor Jessica Utts Department of Statistics University of California, Irvine http://www.ics.uci.edu/~jutts jutts@uci.edu Outline of Talk What is Statistics? Four

More information

Interaction between quantitative predictors

Interaction between quantitative predictors Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information