Normal distributions in SPSS
Bro. David E. Brown, BYU-Idaho Department of Mathematics
February 2, 2012

1 Calculating probabilities and percents from measurements: The CDF.NORMAL command

1. Go to the Variable View and create a variable by typing a name for it under the Name heading.
2. Make any desired adjustments to your variable's properties. Put 2, 3, or 4 in the Decimals box to tell SPSS to round your probability or percentile to 2, 3, or 4 decimal places, respectively. [1]
   [1] It's traditional among nerds to use 4 decimal places. This may vary from discipline to discipline.
3. Return to the Data View.
4. Put any number at all in the first row of your variable and press the Enter key.
5. In the Transform menu, select Compute Variable... The Compute Variable dialog will appear.
6. Put the name of your variable in the Target Variable: box.
7. You have options. Please DRAW THE PICTURE, as we do in class, to help you sort out the options and understand them:
   (a) If you need a percentile or left-tailed probability:
       i. Click CDF & Noncentral CDF in the Function group box.
       ii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iii. Replace the first ? with the given measurement. Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
            Example: Say you need the probability of getting a measurement less than 97.5, when the mean is µ = 98.6 and the standard deviation is σ = 0.62. You'll have CDF.NORMAL(97.5, 98.6, 0.62) in the Numeric Expression box, and the probability SPSS gives you, which is left-tailed, is 0.0380.
       iv. Go to Step 8, below.
   (b) If you need a right-tailed probability or the top so many percent, you need to subtract the corresponding left-tailed probability from 1. Here's how:
       i. Put the number 1 in the Numeric Expression box, and type a hyphen to tell SPSS to subtract (or click the subtraction button in the keypad on the screen).
       ii. Click CDF & Noncentral CDF in the Function group box.
       iii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iv. Replace the first ? with the given measurement. Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
           Example: Say you need the probability of getting a measurement greater than 112, when the mean is µ = 100 and the standard deviation is σ = 15. You'll have 1-CDF.NORMAL(112, 100, 15) in the Numeric Expression box, and the probability SPSS gives you, which is right-tailed, is 0.2119.
       v. Go to Step 8, below.
   (c) If you need a two-tailed probability:
       i. Determine the measurement or z-score that delineates the left tail and use Step 7(a) to get the area of that tail (which is the left-tailed probability).
       ii. Determine the measurement or z-score that delineates the right tail and use Step 7(b) to get the area of the right tail (which is the right-tailed probability).
       iii. Add your left-tailed probability to the right-tailed probability you have just gotten. You're done.
       Example: Say you need the probability of getting a z-score either less than -2.00 or greater than 2.00. Then the mean is µ = 0 and the standard deviation is σ = 1, because z-scores obey the standard normal distribution. Your left tail is delineated by -2.00 and your right tail by 2.00. You can [2] use CDF.NORMAL(-2.00, 0, 1) to get a left-tailed area of 0.0228, then use 1-CDF.NORMAL(2.00, 0, 1) to get a right-tailed area of 0.0228, then add to get 0.0228 + 0.0228 = 0.0456 as your two-tailed area.
       [2] You can, but you don't have to. In this example, the two tails are symmetric, so you could use SPSS to find the left-tailed area and double it, instead.
   (d) If you need a probability or percentage for values trapped between two numbers, you'll have to (1) calculate the left-tailed probability corresponding to the higher number, (2) calculate the left-tailed probability corresponding to the lower number, and (3) subtract the second from the first.
       Example: Say you need the probability that a z-score will be between -0.5 and 1.37. You'll need to take the left-tailed area corresponding to z = 1.37 and subtract from it the left-tailed area corresponding to z = -0.5. Here's how:
       i. Click CDF & Noncentral CDF in the Function group box.
       ii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iii. Replace the first ? with the highest given measurement (for example, the 1.37). Replace the second ? with the mean of your distribution and the third ? with the standard deviation of your distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
       iv. Type a hyphen to the right of the CDF.NORMAL() expression you already have to tell SPSS to subtract. (You can click the subtraction button in the keypad on the screen, instead of typing a hyphen.)
       v. In the Functions and Special Variables box, double-click Cdf.Normal. A second copy of the expression CDF.NORMAL(?,?,?) will appear after the hyphen.
       vi. Replace the first ? in this second copy with the lowest given measurement (-0.5, in our example). Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation. At this point, you should have CDF.NORMAL(#, #, #) - CDF.NORMAL(#, #, #) in the Numeric Expression box, except instead of #'s, you'll have numbers.
           Example: If you need the area between z = -0.5 and z = 1.37, you'll have CDF.NORMAL(1.37, 0, 1)-CDF.NORMAL(-0.5, 0, 1) in the Numeric Expression box.
       vii. Go to Step 8, below.
8. Click OK. The Change existing variable? dialog will appear. [3]
   [3] More than one student has told me that the Change existing variable? dialog does not appear. So we take a look at it together, and every single time, the Change existing variable? dialog appears. Maybe this means we all need to pay closer attention to what we're doing and to how the computer responds.
9. Click OK. The Output window may or may not appear. Either way, go to the Data View. The probability or percentile you seek is in the first row of your variable's column. It is expressed as a decimal number, so if you want a percentage, be sure to convert correctly. In the example from Step 7(d), SPSS gives us 0.6061, which is 60.61%.

2 Getting measurements or percentiles from probabilities: The IDF.NORMAL command

1. Go to the Variable View and create a variable by typing a name for it under the Name heading.
2. Make any desired adjustments to your variable's properties. For Decimals, use what makes sense. Examples: (1) If your measurements are counts, they have to be whole numbers. So put 0 in the Decimals box. (2) If your measurements are dollar amounts, you could put 2 in the Decimals box, to round to the nearest penny. Or, if you prefer, you could put 0 in the Decimals box, to round to the nearest dollar.
3. Return to the Data View.
4. Put any number at all in the first row of your variable.
5. In the Transform menu, select Compute Variable... The Compute Variable dialog will appear.
6. Put the name of your variable in the Target Variable: box.
7. You have options. Please DRAW THE PICTURE, as we do in class, to help you sort out the options and understand them:
   (a) If you need a percentile or have a left-tailed probability:
       i. Click Inverse DF in the Function group box.
       ii. In the Functions and Special Variables box, double-click Idf.Normal. The expression IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iii. Replace the first ? with the given percentile or left-tail probability, expressed as a decimal. Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
            Example: Say you need the 35th percentile, and the mean is µ = 100 and the standard deviation is σ = 15. You'll have IDF.NORMAL(0.35, 100, 15) in the Numeric Expression box.
       iv. Go to Step 8, below. In our example, SPSS will tell you that x = 94.2 is the 35th percentile.
   (b) If you have a right-tailed probability or a top so many percent, you'll have to subtract it from 1 to convert it to a left-tailed probability or percentile and then calculate the measurement. Here's how:
       i. Click Inverse DF in the Function group box.
       ii. In the Functions and Special Variables box, double-click Idf.Normal. The expression IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iii. Replace the first ? with 1- followed by the given right-tailed area. [4] Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
            Example: Say you need the measurement that delineates the top 5% of measurements when µ = 98.2 and σ = 0.62. You'll have IDF.NORMAL(1-0.05, 98.2, 0.62) in the Numeric Expression box.
            [4] Why does the 1- go inside the parentheses for IDF.NORMAL but outside the parentheses for CDF.NORMAL? Because, on the one hand, CDF.NORMAL is an area; computing 1-CDF.NORMAL is subtracting areas (the overall area of 1 minus the area that CDF.NORMAL gives you). On the other hand, IDF.NORMAL is not an area. In fact, the first ? is supposed to be a left-tailed area. You can't just replace it with a right-tailed area. You subtract your right-tailed area from 1 inside IDF.NORMAL to make sure IDF.NORMAL has a left-tailed area to work with.
       iv. Go to Step 8, below. In our example, SPSS will tell you that x = 99.2 is the measurement that delineates the top 5% of measurements.
   (c) If you have a two-tailed probability:
       i. Determine how much area is in the left tail and use Step 7(a) to get the measurement corresponding to your left-tailed area.
       ii. Determine how much area is in the right tail and use Step 7(b) to get the measurement corresponding to your right-tailed area. You're done.
       Example: Say you need the z-scores that delineate the lowest 1% and the highest 1%. The area in the left tail is 0.01, and so is the area in the right tail. For the left tail, you can use IDF.NORMAL(0.01, 0, 1), and get z = -2.33. For the right tail, you can [5] use IDF.NORMAL(1-0.01, 0, 1), and get z = 2.33.
       [5] You can, but you don't have to. You could use the fact that the distribution is symmetric to conclude that the right 1% tail is delineated by z = 2.33.
   (d) If you have a probability or percentage that's trapped between two unknown values, you'll have to first calculate the left-tailed area (or probability or percentile) corresponding to its lower boundary, second find the z-score or measurement corresponding to this left-tailed area, and then repeat for the upper boundary of your percentage or probability. Here's how:
       i. Calculate the left-tailed area corresponding to the lower boundary of your probability or percentage.
          Example: Suppose you need the range of measurements that correspond to the middle 90% of some normal distribution. That means 10% will be outside the desired range. Since it's the middle 90% you need, the left-tail area must be the same as the right-tail area. So divide the remaining 10% in half, to get 5% in each tail. So the left-tailed area for the lower boundary of your range is 5% (or 0.05).
       ii. Click Inverse DF in the Function group box.
       iii. In the Functions and Special Variables box, double-click Idf.Normal. The expression IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
       iv. Replace the first ? with the left-tail area of interest (0.05, in our example). Replace the second ? with the mean of your normal distribution and the third ? with the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation. For our example, suppose the mean is µ = 100 and the standard deviation is σ = 15. Then you'll have IDF.NORMAL(0.05, 100, 15) in the Numeric Expression box.
       v. Do Steps 8 and 9, below, and come back to this point to get the measurement corresponding to the upper boundary. In the current example, SPSS will give us x = 75.33 as the measurement that delineates the left 5% tail.
       vi. Calculate the left-tailed area corresponding to the upper boundary of the given probability or percentage.
           Example, continued: We saw above that taking out the middle 90% of a normal distribution leaves a tail of area 5% on the left. So the left-tail area corresponding to the upper boundary is left-tail area + middle 90% = 5% + 90% = 95%, or 0.95.
       vii. Go back to the Compute Variable... dialog in the Transform menu. The expression IDF.NORMAL(?,?,?) should already be in the Numeric Expression box, from when you went through this procedure for the lower boundary. (If not, do Steps 7(d)ii and 7(d)iii and continue with Step 7(d)viii.)
       viii. Put in the IDF.NORMAL command the left-tailed area for the upper boundary (0.95, in our example). Make sure the next number inside IDF.NORMAL is the mean and the third number the standard deviation of your normal distribution. IF YOU'RE WORKING WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for the standard deviation. So we'll have IDF.NORMAL(0.95, 100, 15) in the Numeric Expression box.
       ix. Do Steps 8 and 9, below. In our example, SPSS will give us x = 124.67 as the measurement that delineates the top 5% of the measurements.
8. Click OK. The Change existing variable? dialog appears.
9. Click OK. The Output window may or may not appear. Either way, the measurement value you seek is in the first row of your variable's column. It is expressed in the same units of measure as other values of your variable.
   Example: If your data are measurements of time, in years, then the value SPSS has just given you is also time, in years.
   Note: If you are finding the values between which a given probability is trapped, and you have only found the measurement corresponding to one of the boundaries, go back to Step 7(d)vi and continue from there.

As always, if you have questions, please ask them!
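
If you would like to double-check your SPSS answers outside of SPSS, the worked examples above can be reproduced with a short Python sketch. This is only a cross-check, not part of the SPSS procedure. It assumes Python 3.8 or later, which includes statistics.NormalDist in its standard library. The helper names cdf_normal and idf_normal are hypothetical, made up here to mirror the argument order of CDF.NORMAL and IDF.NORMAL.

```python
from statistics import NormalDist


def cdf_normal(x, mean, sd):
    """Left-tailed area below x (hypothetical stand-in for CDF.NORMAL)."""
    return NormalDist(mean, sd).cdf(x)


def idf_normal(p, mean, sd):
    """Measurement whose left-tailed area is p (hypothetical stand-in for IDF.NORMAL)."""
    return NormalDist(mean, sd).inv_cdf(p)


# Section 1: probabilities from measurements.
left = cdf_normal(97.5, 98.6, 0.62)        # left-tailed, Step 7(a)
right = 1 - cdf_normal(112, 100, 15)       # right-tailed, Step 7(b)
# Two-tailed, Step 7(c). Adding the unrounded tails gives about 0.0455;
# adding the two tails after rounding each to 0.0228 gives 0.0456.
two_tailed = cdf_normal(-2.00, 0, 1) + (1 - cdf_normal(2.00, 0, 1))
between = cdf_normal(1.37, 0, 1) - cdf_normal(-0.5, 0, 1)  # Step 7(d)

# Section 2: measurements from probabilities.
pct35 = idf_normal(0.35, 100, 15)          # 35th percentile, Step 7(a)
top5 = idf_normal(1 - 0.05, 98.2, 0.62)    # top 5%, Step 7(b)
lower = idf_normal(0.05, 100, 15)          # middle 90%, lower boundary
upper = idf_normal(0.95, 100, 15)          # middle 90%, upper boundary

results = {
    "left-tailed": left,
    "right-tailed": right,
    "two-tailed": two_tailed,
    "between": between,
    "35th percentile": pct35,
    "top 5% cutoff": top5,
    "middle 90% lower": lower,
    "middle 90% upper": upper,
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

Notice that the 1 minus goes outside cdf_normal but inside idf_normal, exactly as it does with CDF.NORMAL and IDF.NORMAL in the steps above.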