DISPLAYING THE RELATIONSHIP

DEFINITIONS: Studies are often conducted to attempt to show that some explanatory variable causes the values of some response variable to occur.

The response or dependent variable is the response of interest, the variable we want to predict, and is usually denoted by y.

The explanatory or independent variable attempts to explain the response and is usually denoted by x.

A scatterplot shows the relationship between two quantitative variables x and y. The values of the x variable are marked on the horizontal axis, and the values of the y variable are marked on the vertical axis. Each pair of observations $(x_i, y_i)$ is represented as a point in the plot.

Two variables are said to be positively associated if, as x increases, the values of y tend to increase.

Two variables are said to be negatively associated if, as x increases, the values of y tend to decrease.

When a scatterplot does not show a particular direction, neither positive nor negative, we say that there is no linear association.
[Figure: Scatterplot of Final vs Midterm Scores. Horizontal axis: Midterm, 0 to 50. Vertical axis: Final, 0 to 100. The 10th student, (1, 38), is labeled on the plot.]

Student   Midterm Score (X)   Final Score (Y)
1         39                  6
2         44                  69
3         3                   68
4         40                  86
5         45                  88.5
6         46                  88.5
7         33                  76
8         39                  66.5
9         3.5                 75
10        1                   38
11        30                  71
12        39                  88
13        44                  96.5
14        8.5                 71.5
15        38                  96
16        43                  8.5
17        4                   85
18        5.5                 8
19        47                  95
20        36                  39
21        31.5                58
22        3                   49
23        4                   6
24        1                   59
25        41                  90
Let's Do It! 1

The data below were obtained in a study of age and systolic blood pressure of six randomly selected subjects. Make a scatterplot to examine the relationship between (x) = age and (y) = pressure. Comment on the relationship with respect to form, direction, strength, and any departures or unusual values.

Subject   Age x   Pressure y
A         43      128
B         48      120
C         56      135
D         61      143
E         67      141
F         70      152
Notes of Caution

1. An observed relationship between two variables does not imply that there is some causal link between the two variables. For example, consider the following scatterplot of IQ score versus shoe size:

[Figure: scatterplot with IQ on the vertical axis and Shoe Size on the horizontal axis.]

As a person ages, their shoe size increases as well as their IQ. Although there is a positive association, there is no causal link between the two variables shoe size and IQ. Most studies attempt to show that some explanatory variable "causes" the values of the response to occur. While we can never positively determine whether or not there is a distinct cause-and-effect relationship, we can assess whether there appears to be such a relationship.

2. A relationship between two variables can be influenced by confounding variables. Consider the following scatterplot of the number of sports magazines read in a month versus the height of the person:

[Figure: scatterplot with Number of magazines read on the vertical axis and Height on the horizontal axis; women and men are plotted with different symbols.]

Overall there appears to be a positive association between height and number of magazines. However, within each gender, there does not appear to be an association. Gender is a confounding variable, and aggregating the data across gender can result in misleading conclusions. Any study, especially an observational study, has the potential to be wrongly interpreted because of confounding variables.
3. Unusual data points (outliers) can distort the apparent association, especially if the data set is small. Consider the following scatterplot of the percentage of people who speak English versus population size.

[Figure: scatterplot with Percent who speak English on the vertical axis and Population Size on the horizontal axis; one point is labeled Outlier.]

The eight points in the scatterplot represent eight countries from Central and South America selected at random. The outlier is Mexico City.

4. Sometimes a scatterplot, such as the one in the figure below, shows a curvilinear relationship between the data.

[Figure: scatterplot showing a curvilinear relationship.]

Methods for curvilinear relationships are beyond the scope of this course.
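Caution 2 can be made concrete with a small numerical sketch in Python. The heights (in centimeters) and magazine counts below are hypothetical values invented purely for illustration; they are not the data behind the scatterplot. The helper `correlation` is also an illustrative name. It computes the sample correlation coefficient r from the usual sums.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# HYPOTHETICAL data, made up for this illustration only.
women_height = [150, 155, 160, 165, 170]
women_mags = [1, 2, 1, 2, 1]
men_height = [170, 175, 180, 185, 190]
men_mags = [4, 5, 4, 5, 4]

r_women = correlation(women_height, women_mags)
r_men = correlation(men_height, men_mags)
r_all = correlation(women_height + men_height, women_mags + men_mags)

print("women only:", r_women)
print("men only:  ", r_men)
print("combined:  ", round(r_all, 2))
```

With these made-up numbers, neither group shows any linear association on its own, yet the combined data show a clearly positive correlation. The positive value comes entirely from the difference between the two groups, which is how a confounding variable such as gender can mislead.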
Simple Linear Regression

[Figure: Scatterplot of Final vs Midterm Scores with two candidate lines, Line #1 and Line #2, drawn through the points. Horizontal axis: Midterm, 0 to 50. Vertical axis: Final, 0 to 100.]

So the question remains: how do we find a best-fitting line?

Equation of a Line

$y = a + bx$, where
b = slope: the amount y changes when x is increased by 1 unit.
a = y-intercept: the value of y when x is set equal to zero.
DEFINITION: The least squares regression line, given by $\hat{y} = a + bx$, is the line that makes the sum of the squared vertical deviations of the data points from the line as small as possible.

Performing the regression is often stated as "regress y on x."

The least squares regression line for regressing final exam scores, y, on midterm exam scores, x, is given by $\hat{y} = 7.5 + 1.75x$.

The estimated slope of b = 1.75 tells us that for a 1-point increase on the midterm we would expect, on average, an increase of 1.75 points on the final exam.

The estimated y-intercept of a = 7.5 tells us that if someone were to score 0 points on the midterm, we would predict they would get 7.5 points on the final exam.

Suppose a new student scores 40 points on the midterm. Based on our model, what would be their predicted final exam score? Plug the value x = 40 into our estimated equation. The predicted final exam score is $\hat{y} = 7.5 + 1.75(40) = 77.5$ points.
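The prediction step can also be sketched in Python. This is a minimal illustration that assumes the fitted line $\hat{y} = 7.5 + 1.75x$ stated above; the names `A`, `B`, and `predict_final` are hypothetical and chosen only for this example.

```python
# Fitted least squares line from the notes: y-hat = 7.5 + 1.75 * x
A = 7.5    # estimated y-intercept
B = 1.75   # estimated slope


def predict_final(midterm):
    """Return the predicted final exam score for a given midterm score."""
    return A + B * midterm


print(predict_final(40))  # 77.5
```

Raising the midterm score by one point raises the prediction by exactly the slope, 1.75 points, which is the interpretation of b given above.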
Let's Do It! 13.2 Childhood Growth

The growth of children from early childhood through adolescence generally follows a linear pattern. Data on the heights of female Americans during childhood, from four to nine years old, were compiled and the least squares regression line was obtained as $\hat{y} = 80 + 6x$, where y is height in centimeters and x is age in years. Note that 1 inch is equal to 2.54 centimeters.

(a) Interpret the value of the estimated slope b = 6.

(b) Would interpretation of the value of the estimated y-intercept, a = 80, make sense here? If yes, interpret it. If no, explain why not.

(c) What would you predict the height to be for a female American at 8 years old? Give your answer first in centimeters, then in inches.

(d) What would you predict the height to be for a female American at 25 years old? Give your answer first in centimeters, then in inches.

(e) Why do you think your answer to part (d) was so inaccurate?
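Calculating the Least Squares Regression Line

The Least Squares Regression Line

The least squares regression line is given by $\hat{y} = a + bx$, where

slope: $b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

y-intercept: $a = \bar{y} - b\bar{x} = \dfrac{\sum y}{n} - b\,\dfrac{\sum x}{n}$

Example: Test 1 versus Test 2
Obtaining the Regression Line By Hand

a) Look at the relationship graphically with a scatterplot to confirm initially that a linear model seems appropriate.

b) Calculate the estimated regression line by completing the calculation table shown below.

$b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \dfrac{5(884) - (60)(70)}{5(760) - (60)^2} = \dfrac{220}{200} = 1.1$

$a = \bar{y} - b\bar{x} = \dfrac{70}{5} - 1.1\left(\dfrac{60}{5}\right) = 0.8$

Least squares equation: $\hat{y} = 0.8 + 1.1x$

c) The slope of the line is b = 1.1. This means that Test 2 scores are expected to go up by 1.1 points on average for each additional point scored on Test 1.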
d) A student who scored 15 points on Test 1 is predicted to score $\hat{y} = 0.8 + 1.1(15) = 17.3$ points on Test 2.

Test 1 versus Test 2
Obtaining the Regression Line Using the TI Calculator

To obtain the least squares regression line using the TI graphing calculator we would first need to enter the data.

L1    L2
8     9
10    13
12    14
14    15
16    19

Enter the values of the quantitative variable x = Test 1 into L1 and enter the corresponding values of the quantitative variable y = Test 2 into L2. To get the least squares regression equation we use the following sequence of buttons:

[Figure: TI calculator keystroke sequence.]

Your output screen should provide the least squares regression equation as y = a + bx with the y-intercept of a = 0.8 and the slope of b = 1.1.

Caution: There are two linear regression options, namely LinReg(ax+b) and LinReg(a+bx). We request the latter option, which uses b to represent the slope.
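The by-hand calculation for Test 1 versus Test 2 can be checked with a short Python sketch. It uses the five data pairs from the example and the slope and intercept formulas written in terms of the sums; the function name `least_squares_line` is hypothetical and chosen only for this illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * sum_x / n
    return a, b


test1 = [8, 10, 12, 14, 16]   # x = Test 1 scores
test2 = [9, 13, 14, 15, 19]   # y = Test 2 scores

a, b = least_squares_line(test1, test2)
predicted = a + b * 15        # prediction for a Test 1 score of 15

print(round(a, 2), round(b, 2), round(predicted, 2))  # 0.8 1.1 17.3
```

The rounding in the last line only tidies floating-point output; the values agree with a = 0.8, b = 1.1, and the predicted score of 17.3 obtained by hand and on the calculator.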
Let's Do It! 13.3 Oil-Change Data

The table below presents data on x = the number of oil changes per year and y = the cost of repairs for a random sample of 10 cars of a certain make and model, from a given region.

(a) Make a scatterplot of the points as a check for linearity and outliers. Comment on your plot.

(b) Find the least squares regression line for regressing cost on number of oil changes. Describe what the estimated y-intercept and estimated slope represent.

(c) Use your least squares regression line to predict the cost of car repairs for a car that had four oil changes.
CORRELATION: HOW STRONG IS THE LINEAR RELATIONSHIP?

DEFINITION: The sample correlation coefficient r measures the strength of the linear relationship between two quantitative variables. It describes the direction of the linear association and indicates how close the points in a scatterplot are to the least squares regression line.

Features of the correlation coefficient

1. Range: $-1 \le r \le 1$.

2. Sign: The sign of the correlation coefficient indicates the direction of association: negative [-1, 0) or positive (0, +1].

3. Magnitude: The magnitude of the correlation coefficient indicates the strength of the linear association. If the data follow a straight line, then r = 1 (if the slope is positive) or r = -1 (if the slope is negative), indicating a perfect linear association. If r = 0, then there is no linear association.

4. Measures Strength: The correlation only measures the strength of the linear association.

5. Unit-less: The correlation is computed using standard scores of the two variables. It has no unit of measure, and the absolute value of r will not change if the units of measurement for x or y are changed. The correlation between x and y is the same as the correlation between y and x.

Some Pictures...
[Figure: three scatterplots of y versus x.]

Positive, moderate to strong linear association, r ≈ 0.8.

Negative, weak linear association, r negative and close to 0.

A strong association, just not a linear one, r ≈ 0.

Let's Do It! 13.8 Matching Graphs

The scatterplot #1 to the right yields a regression line of y = -.6 + 1.1x and a correlation of r = 0.84. Using this information as a base, match each of the four scatterplots below to the correct description of its regression line and correlation coefficient. The scales on the axes of the scatterplots are the same.

[Figure: scatterplot #1 and the four scatterplots to be matched.]
How to Calculate the Correlation Coefficient r

The formula:

$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

Example: Test 1 versus Test 2
Obtaining the Correlation Coefficient By Hand

We have already computed the summation quantities needed for finding r, shown in the calculation table.

Completed Calculation Table

x      y      x^2    xy     y^2
8      9      64     72     81
10     13     100    130    169
12     14     144    168    196
14     15     196    210    225
16     19     256    304    361
Total: $\sum x = 60$, $\sum y = 70$, $\sum x^2 = 760$, $\sum xy = 884$, $\sum y^2 = 1032$

$r = \dfrac{5(884) - (60)(70)}{\sqrt{5(760) - (60)^2}\,\sqrt{5(1032) - (70)^2}} = \dfrac{220}{\sqrt{200}\,\sqrt{260}} = 0.965$

The large positive correlation coefficient and the scatterplot indicate a strong, positive, linear association between Test 1 and Test 2 scores.
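The same calculation can be sketched in Python as a check on the arithmetic. The function name `correlation` is hypothetical, and the rescaling of Test 1 scores by a factor of 5 is an assumption made only to illustrate a change of units; it is not part of the example. The sketch also checks two of the features of r listed earlier: swapping x and y leaves r unchanged, and so does changing the units of x.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


test1 = [8, 10, 12, 14, 16]   # x = Test 1 scores
test2 = [9, 13, 14, 15, 19]   # y = Test 2 scores

r = correlation(test1, test2)

# Swapping the roles of x and y should give the same r.
r_swapped = correlation(test2, test1)

# Assumed change of units: multiply every Test 1 score by 5.
r_rescaled = correlation([5 * x for x in test1], test2)

print(round(r, 3))  # 0.965
print(round(r_swapped, 3), round(r_rescaled, 3))
```

All three values round to 0.965, which matches the by-hand result.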
Obtaining the Correlation Using the TI

To get the regression line and the correlation coefficient using the TI we first need to turn on the diagnostic option. If the x data is in L1 and the y data is in L2, then the steps are as follows:

[Figure: TI calculator keystroke sequence.]

Let's Do It! Birth Rates

We gathered data from 1970 for twelve nations on the percentage of women aged 14 or older who were economically active and the crude birth rate. (We define the crude birth rate as the number of births in a year per 1000 population size.) We are interested in the relationship between the crude birth rate (y) and the percentage of women who were economically active (x).

a. Create the scatterplot. Determine if there is a positive, negative, or no association between x and y.

Nation x y
Algeria 48
Argentina 19 1
Denmark 34 14
E. Germany 40 11
Guatemala 8 41
India 1 37
Ireland 0
Jamaica 0 31
Japan 37 19
Philippines 19 4
USA 30 15
Soviet Union 46 18

b. Find the equation of the regression line. Interpret the slope.

c. Find the correlation coefficient r.

Homework will be posted on MyMathLab.