Chapter 3. The Normal Distribution

Topics covered in this chapter:
Z-scores
Normal Probabilities
Normal Percentiles

Z-scores

Example 3.6: The standard normal table

The Problem: What proportion of observations on a standard Normal variable z take values less than 1.47?

Find the proportion below a z-score in SPSS.

1. Open SPSS.
2. Type the number 1.47 in the first cell of the Data Editor, in a variable named ZValue.
3. Go to the Transform menu.
4. Scroll to the Compute Variable option. The Compute Variable window should open.

5. Under Function Group, scroll down and select the CDF & Noncentral CDF option.
6. Under Functions and Special Variables, scroll down and double-click the Cdfnorm option. The function should now appear in the Numeric Expression box with a question mark as its argument.

7. Replace the question mark under Numeric Expression with the variable ZValue: highlight the question mark, click the variable ZValue on the left, and then click the arrow to the left of the Numeric Expression box.
8. Under Target Variable, type any variable name you like, for example Probability.
9. Click OK.

The answer should now appear next to the value 1.47 in your Data Editor, in a column named whatever you typed as the Target Variable.

Normal Probabilities

Example 3.8: Who qualifies for an athletic scholarship?

The Problem: The NCAA considers a student a partial qualifier if the combined SAT score is at least 720. Partial qualifiers can receive athletic scholarships and practice with the team, but they can't compete during their first college year. What proportion of all students who take the SAT would be partial qualifiers, receiving a combined SAT score between 720 and 820? SAT scores are Normally distributed with a mean of 1026 and a standard deviation of 209.

1. Open a new window in SPSS.
2. Click on the Variable View tab and create a variable named SAT.
3. Click on the Data View tab and enter two data values: 720 and 820.

4. Go to the Transform menu.
5. Scroll to the Compute Variable option. The Compute Variable window should open.
6. Under Function Group, scroll down to the CDF & Noncentral CDF option.

7. Under Functions and Special Variables, scroll down to the Cdf.Normal option and double-click. The function should now appear in the Numeric Expression box with three question marks as placeholders.
8. Replace the first question mark under Numeric Expression with the variable SAT: highlight the first question mark, click the variable SAT on the left, and then click the arrow to the left of the Numeric Expression box.
9. Replace the second question mark under Numeric Expression with the mean of 1026 as given in the problem.
10. Replace the third question mark under Numeric Expression with the standard deviation of 209.
11. Under Target Variable, type the variable name Probability.

12. Click OK.

Two probabilities now appear in the Data Editor: the probability that a student's combined SAT score is less than 720 and the probability that it is less than 820. Since the question asks for the probability that a student scores between 720 and 820, subtract the first from the second, leaving a final probability of 0.16 - 0.07 = 0.09, or 9 percent.

Normal Percentiles

Example 3.9: Find the top 10% using software

The Problem: Scores on the SAT Verbal test in recent years follow approximately the N(504, 111) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?

1. Click on the Variable View tab.
2. Create three variables named Prob, Mean, and SD. Change the number of decimals for Mean and SD to 0.

3. Go to the Data View tab.
4. Type .90 under the Prob column in the first row. We want the cutoff for the top 10%, which is the same point as the cutoff for the lower 90%, and the Normal distribution functions work only with lower (cumulative) probabilities.
5. Type 504 under the Mean column. Type 111 under the SD column.
6. Go to the Transform menu.
7. Scroll to the Compute Variable option. The Compute Variable window should open.
8. Under Function Group, scroll down to the Inverse DF option.
9. Under Functions and Special Variables, scroll down to the Idf.Normal option and double-click.
10. Replace the first question mark under Numeric Expression with the variable Prob: highlight the first question mark, click the variable Prob on the left, and then click the arrow to the left of the Numeric Expression box.
11. Replace the second question mark under Numeric Expression with the variable Mean in the same way.
12. Replace the third question mark under Numeric Expression with the variable SD in the same way.
13. Under Target Variable, type a variable name you like, for example ANS.
14. Click OK.

The answer should now appear next to the three variables in your SPSS Data Editor, in a column named whatever you typed as the Target Variable.
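
As a cross-check outside SPSS, the three worked examples in this chapter can be reproduced with Python's standard-library statistics.NormalDist class. This is only a sketch and not part of the SPSS procedure: the variable names are illustrative, and the cdf and inv_cdf methods are assumed here to play the roles of SPSS's Cdf.Normal and Idf.Normal functions.

```python
from statistics import NormalDist

# Example 3.6: proportion of a standard Normal variable below z = 1.47
p_below = NormalDist().cdf(1.47)

# Example 3.8: proportion of combined SAT scores between 720 and 820,
# using the mean (1026) and standard deviation (209) from the problem
sat = NormalDist(mu=1026, sigma=209)
p_partial = sat.cdf(820) - sat.cdf(720)

# Example 3.9: the score with 90% of the N(504, 111) distribution below it,
# which is the cutoff for the top 10%
cutoff = NormalDist(mu=504, sigma=111).inv_cdf(0.90)

print(f"Example 3.6: {p_below:.4f}")
print(f"Example 3.8: {p_partial:.4f}")
print(f"Example 3.9: {cutoff:.1f}")
```

After rounding, the results should agree with the SPSS output: a proportion of about 0.93 for Example 3.6, about 0.09 for Example 3.8, and a cutoff score near 646 for Example 3.9.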

Chapter 3 Exercises

3.9 Men's and women's heights.
3.11 Monsoon rains.
3.13 Table A.
3.29 Standard normal drill.
3.31 Acid rain?
3.33 A milling machine.
3.35 In my Chevrolet.
3.37 The middle half.
3.39 What's your percentile?
3.41 Heights of men and women.
3.43 A surprising calculation.
3.47 Normal is only approximate: ACT scores.
3.49 Are the data normal? Fruit fly thorax lengths.
3.51 Are the data normal? Soil penetrability.
3.53 Where are the quartiles?

Chapter 3 SPSS Solutions

3.9 It's inconvenient to use SPSS for a computation such as this. Using a standard calculator, we can easily compute the z-scores with the formula z = (value - μ)/σ. Either do the subtraction first, or be sure to use parentheses. A woman six feet (72 inches) tall is 2.96 standard deviations above the mean; a six-foot-tall man is 0.964 standard deviations above the mean. The woman is much taller, relative to other women, than the man is, compared to other men.

3.11 To find the percent of years with less than 697 mm of rain, we use Transform, Compute Variable. Locate the CDF & Noncentral CDF Function group, then the CDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters, then click OK to compute the probability into the worksheet (here, as the variable Drought). For more decimal places in your result (remember, the default is two), click on the Variable View tab and increase them. About 2.9% of all years will have less than 697 mm of rain.

To find the percent of normal rainfall years (between 683 mm and 1022 mm), we'll find the cumulative probability for 1022 mm and subtract the cumulative probability for 683 mm. We do this in a single expression that combines two CDF.Normal calculations. About 96.1% of all years will have normal rainfall.

3.13 Here, we are given a relative frequency under the standard Normal curve and need to find the value of z. We'll again use Transform, Compute Variable. Locate the Inverse DF Function group, then the IDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters, then click OK to compute the value into the worksheet (here, as the variable Z). The point z with 20% of the area below it is z = -0.842. We repeat for part (b) using 0.6 as the area to the left of the point (since 40% of the observations are above it). This point is z = 0.253.

3.29 As with Exercise 3.13 above, use Transform, Compute Variable; we want the Inverse DF group and IDF.Normal. As before, enter the area to the left of the desired point on the curve (0.8), the value of the mean (0), and the standard deviation (1). This point is z = 0.842.

Part (b) asks for the point with 35% of all observations above it; this means that 65% = 0.65 are below it. This point is z = 0.39.

3.31 To find the proportion of rainy days that meet the acid rain criteria, we use Transform, Compute Variable. Locate the CDF & Noncentral CDF Function group, then the CDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters, then click OK to compute the probability into the worksheet (here, as the variable Acid). For more decimal places in your result (remember, the default is two), click on the Variable View tab and increase them. At this location 22.9% of days will qualify as acid rain days.

3.33 To find the proportion of slots that meet specifications, we'll use Transform, Compute Variable to find the cumulative probability for 0.878 inch and subtract the cumulative probability for 0.872 inch. We do this in a single expression that combines two CDF.Normal calculations. About 98.76% of slots will meet the specifications.

3.35 This problem refers to the information given about 2008 model vehicles. They had mean 18.7 mpg and standard deviation 4.3 mpg. We want to know the area to the left of the Chevy Malibu (with 25 mpg). Use Transform, Compute Variable and find the cumulative probability for the Malibu. We find that 92.86% of 2008 cars had worse mileage than the Chevy Malibu.
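
The cumulative probability in Exercise 3.35 can be cross-checked with a few lines of Python. This is a sketch only, assuming the standard-library statistics.NormalDist class as a stand-in for CDF.Normal; the variable names are illustrative. It also shows that standardizing first with z = (value - μ)/σ and then using the standard Normal curve gives the same area.

```python
from statistics import NormalDist

# 2008 model vehicles: mean 18.7 mpg, standard deviation 4.3 mpg
mpg = NormalDist(mu=18.7, sigma=4.3)

# Area to the left of the Chevy Malibu's 25 mpg
p_worse = mpg.cdf(25)

# The same area via the z-score and the standard Normal distribution
z_malibu = (25 - 18.7) / 4.3
p_worse_from_z = NormalDist().cdf(z_malibu)

print(f"z-score for 25 mpg: {z_malibu:.3f}")
print(f"Proportion with worse mileage: {p_worse:.4f}")
print(f"Same proportion from the z-score: {p_worse_from_z:.4f}")
```

Both routes should give a proportion that rounds to about 0.929, in line with the 92.86% reported in the solution.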

3.37 To find the quartiles, we want the points with (respectively) 25% and 75% of the area below them. We can find these values using Transform, Compute Variable. We want the Inverse DF group and IDF.Normal. As before, enter the area to the left of the desired point on the curve (0.25, then 0.75), the value of the mean (18.7), and the standard deviation (4.3). We find that Q1 (the 25th percentile) is 15.80 mpg and Q3 (the 75th percentile) is 21.60 mpg.

3.39 The percentile corresponds to the area to the left of the value of interest. We find this using Transform, Compute Variable and computing the cumulative probability for Jacob. We see that Jacob is not quite at the 15th percentile (his is 14.9).

3.41 We want to know what proportion of women are taller than the average man (69.3 inches). We'll use Transform, Compute Variable, but subtract the proportion of women shorter than 69.3 inches from 1 to find the proportion taller than 69.3 inches. Be sure to use the values for the women's distribution: the mean (64) and the standard deviation (2.7). We see that not quite 2.5% (2.48%) of women should be taller than the average man.
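
The quartile lookup in Exercise 3.37 and the upper-tail calculation in Exercise 3.41 can be sketched in Python as follows. This assumes the standard-library statistics.NormalDist class in place of IDF.Normal and CDF.Normal; the variable names are illustrative and not part of the SPSS steps.

```python
from statistics import NormalDist

# Exercise 3.37: quartiles of the mileage distribution (mean 18.7, sd 4.3)
mpg = NormalDist(mu=18.7, sigma=4.3)
q1 = mpg.inv_cdf(0.25)  # point with 25% of the area below it
q3 = mpg.inv_cdf(0.75)  # point with 75% of the area below it

# Exercise 3.41: proportion of women (mean 64, sd 2.7) taller than 69.3 inches.
# The cdf gives the area to the left, so subtract it from 1 for the upper tail.
women = NormalDist(mu=64, sigma=2.7)
p_taller = 1 - women.cdf(69.3)

print(f"Q1 = {q1:.2f} mpg, Q3 = {q3:.2f} mpg")
print(f"Proportion of women taller than 69.3 inches: {p_taller:.4f}")
```

The quartiles should round to 15.80 and 21.60 mpg, and the upper-tail proportion to about 0.0248.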

3.43 To find the proportion of students scoring at least 750, we'll use Transform, Compute Variable and subtract the proportion scoring less than 750 from 1, as we did in Exercise 3.41. We see that 3.1% of men scored at least 750 while only 1.1% of women did this well.

3.47 To find the proportion scoring higher than 27, divide the given numbers; to find the proportion scoring 27 or more, add the number that scored 27 to the first count before dividing. We find that 11.5% scored higher than 27, while 15.3% scored at least 27. To compare this with the Normal computation, use CDF.Normal to find the proportion scoring at least 27 by subtracting the proportion scoring less than 27 from 1. We would expect 12.3% to score at least 27 if the scores were exactly Normal.

3.49 Open worksheet file ex03-49. We'll create a histogram of the lengths and compute summary statistics using Analyze, Descriptive Statistics, Explore. Click to enter the variable Length in the Dependent List. Click Plots and be sure the Histogram box is checked. To find the quartiles of this distribution, click Statistics and ask for Percentiles.

Percentiles
                                          5      10     25     50     75     90     95
Weighted Average (Definition 1)  Length   .6400  .6800  .7600  .8000  .8600  .8800  .9200
Tukey's Hinges                   Length                 .7600  .8000  .8400

Descriptives
Length                  Statistic   Std. Error
Mean                    .8004       .01116
Median                  .8000
Variance                .006
Std. Deviation          .07815
Minimum                 .64
Maximum                 .94
Range                   .30
Interquartile Range     .10
Skewness                -.361       .340
Kurtosis                -.566       .668

This distribution actually looks a bit skewed left (other windows also show this same general shape); there are no outliers. The mean (x̄ = 0.800) is the same (within rounding) as the median (Med = 0.8); the standard deviation is s = 0.078; the quartiles are Q1 = 0.76 and Q3 = 0.86. The distances to the quartiles from the median (0.04 and 0.06) are roughly similar. These all suggest the distribution is rather symmetric.

In part (c), we want to find the percent of observations expected to be between the two quartiles (0.76 and 0.86) if the distribution is Normal. We'll use CDF.Normal to find the proportion by subtracting the proportion less than 0.76 from the proportion less than 0.86. About 47.5% of all observations should fall between 0.76 and 0.86. To find what actual proportion lies between these values, sort the list using Data, Sort Cases. Enter the variable name Length in the Sort by box. Click OK. Examining the worksheet after the sort, we find there are 11 values less than 0.76 and 12 values greater than 0.86; that means (49 - 23)/49 = 53.1% of the values are between the quartiles.

3.51 Open worksheet file ta02-05. We want stemplots of the data for both loose and intermediate compression. Use Analyze, Descriptive Statistics, Explore and enter Pent as the Dependent variable and Comp as the Factor.

Pent Stem-and-Leaf Plot for Comp= I

 Frequency    Stem &  Leaf
     2.00        2 .  99
    14.00        3 .  01111112333444
     3.00        3 .  568
     1.00 Extremes    (>=4.3)

 Stem width:  1.00
 Each leaf:   1 case(s)

Pent Stem-and-Leaf Plot for Comp= L

 Frequency    Stem &  Leaf
     4.00       39 .  4689
     2.00       40 .  03
     5.00       41 .  12369
     3.00       42 .  079
     2.00       43 .  04
     2.00       44 .  11
     2.00 Extremes    (>=4.89)

 Stem width:  .10
 Each leaf:   1 case(s)

We see that neither of these distributions is Normal; both are skewed right with high outliers (indicated as Extremes).

3.53 We'll find the z-scores corresponding to the quartiles using Transform, Compute Variable, and ask for IDF.Normal. We specify the area to the left of Q1 (0.25), the mean (0), and the standard deviation (1). Since the Normal distribution is symmetric, we'll find only Q1; Q3 has the same magnitude but a positive sign.
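
The symmetry argument in Exercise 3.53 can be checked with a short Python sketch. It assumes the standard-library statistics.NormalDist class as a stand-in for IDF.Normal, and the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

# z-scores with 25% and 75% of the area to their left
z_q1 = std_normal.inv_cdf(0.25)
z_q3 = std_normal.inv_cdf(0.75)

print(f"z for Q1: {z_q1:.4f}")
print(f"z for Q3: {z_q3:.4f}")
```

The two values should differ only in sign, at about -0.67 and 0.67, so finding Q1 alone is enough.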