Chapter 3. The Normal Distribution


Topics covered in this chapter: Z-scores, Normal Probabilities, Normal Percentiles

Z-scores

Example 3.6: The standard normal table

The Problem: What proportion of observations on a standard Normal variable z take values less than 1.47?

Find the cumulative proportion for a z-score in SPSS.

1. Open SPSS.
2. Type the number 1.47 in the first cell of the Data Editor.
3. Go to the Transform menu.
4. Scroll to the Compute Variable option. The Compute Variable window should open.

5. Under Function Group, scroll down and select the CDF & Noncentral CDF option.
6. Under Functions and Special Variables, scroll down and double-click the Cdfnorm option. The Numeric Expression box should now show the Cdfnorm function with a question mark as its argument.

7. Replace the question mark under Numeric Expression with the variable ZValue (the variable that holds the value 1.47): highlight the question mark, click the variable ZValue in the list on the left, and then click the arrow to the left of the Numeric Expression box.
8. Under Target Variable, type any variable name you like, for example Probability.
9. Click OK. The answer should now appear next to the value 1.47 in the Data Editor, in a column with the name you gave the Target Variable.

Normal Probabilities

Example 3.8: Who qualifies for an athletic scholarship?

The Problem: The NCAA considers a student a partial qualifier if the combined SAT score is at least 720. Partial qualifiers can receive athletic scholarships and practice with the team, but they can't compete during their first college year. What proportion of all students who take the SAT would be partial qualifiers, receiving a combined SAT score between 720 and 820? SAT scores are approximately Normally distributed with a mean of 1026 and a standard deviation of 209.

1. Open a new window in SPSS.
2. Click on the Variable View tab and create a variable named SAT.
3. Click on the Data View tab and enter two data values: 720 and 820.

4. Go to the Transform menu.
5. Scroll to the Compute Variable option. The Compute Variable window should open.
6. Under Function Group, scroll down to the CDF & Noncentral CDF option.

7. Under Functions and Special Variables, scroll down to the Cdf.Normal option and double-click. The Numeric Expression box should now show the Cdf.Normal function with three question marks as its arguments.
8. Replace the first question mark under Numeric Expression with the variable SAT: highlight the first question mark, click the variable SAT in the list on the left, and then click the arrow to the left of the Numeric Expression box.
9. Replace the second question mark under Numeric Expression with the mean of 1026 as given in the problem.
10. Replace the third question mark under Numeric Expression with the standard deviation of 209.
11. Under Target Variable, type the variable name Probability.

12. Click OK. Two probabilities now appear in the Data Editor: the probability that a student's combined SAT score is less than 720 and the probability that it is less than 820. Since the question asks for the probability that a student scores between 720 and 820, subtract the smaller probability from the larger, leaving a final probability of 0.16 - 0.07 = 0.09, or 9 percent.

Normal Percentiles

Example 3.9: Find the top 10% using software

The Problem: Scores on the SAT Verbal test in recent years follow approximately the N(504, 111) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?

1. Click on the Variable View tab.
2. Create three variables named Prob, Mean, and SD. Change the number of decimals for Mean and SD to 0.

3. Go to the Data View tab.
4. Type .90 under the Prob column in the first row. We want the cutoff for the top 10%, which is the same point that bounds the lower 90%, because the inverse Normal function in SPSS takes lower-tail (cumulative) probabilities.
5. Type 504 under the Mean column. Type 111 under the SD column.
6. Go to the Transform menu.
7. Scroll to the Compute Variable option. The Compute Variable window should open.
8. Under Function Group, scroll down to the Inverse DF option.
9. Under Functions and Special Variables, scroll down to the Idf.Normal option and double-click.
10. Replace the first question mark under Numeric Expression with the variable Prob: highlight the first question mark, click the variable Prob in the list on the left, and then click the arrow to the left of the Numeric Expression box.
11. Replace the second question mark with the variable Mean in the same way.
12. Replace the third question mark with the variable SD in the same way.
13. Under Target Variable, type a variable name you like, for example ANS.
14. Click OK. The answer should now appear next to the three variables in the SPSS Data Editor, in a column with the name you typed in Target Variable.
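The three worked examples above can also be cross-checked outside SPSS. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class; the variable names are illustrative and are not part of the SPSS procedure. Its cdf method plays the role of Cdf.Normal and its inv_cdf method the role of Idf.Normal.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# Example 3.6: proportion of a standard Normal variable below z = 1.47
standard = NormalDist(mu=0, sigma=1)
p_below_147 = standard.cdf(1.47)

# Example 3.8: proportion of combined SAT scores between 720 and 820,
# using the mean (1026) and standard deviation (209) given in the problem
sat = NormalDist(mu=1026, sigma=209)
p_partial = sat.cdf(820) - sat.cdf(720)

# Example 3.9: score that cuts off the top 10%, i.e. the 90th percentile
# of the N(504, 111) distribution
sat_verbal = NormalDist(mu=504, sigma=111)
top10_cutoff = sat_verbal.inv_cdf(0.90)

print(f"P(Z < 1.47)        = {p_below_147:.4f}")
print(f"P(720 < SAT < 820) = {p_partial:.4f}")
print(f"90th percentile    = {top10_cutoff:.1f}")
```

The second value should agree with the 0.09 obtained in Example 3.8 once it is rounded to two decimals, which is the default display precision in SPSS.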

Chapter 3 Exercises

3.9 Men's and women's heights.
3.11 Monsoon rains.
3.13 Table A.
3.29 Standard normal drill.
3.31 Acid rain?
3.33 A milling machine.
3.35 In my Chevrolet.
3.37 The middle half.
3.39 What's your percentile?
3.41 Heights of men and women.
3.43 A surprising calculation.
3.47 Normal is only approximate: ACT scores.
3.49 Are the data normal? Fruit fly thorax lengths.
3.51 Are the data normal? Soil penetrability.
3.53 Where are the quartiles?

Chapter 3 SPSS Solutions

3.9 It's inconvenient to use SPSS for a computation such as this. Using a standard calculator, we can easily compute the z-scores with the formula z = (value - μ)/σ. Either do the subtraction first, or be sure to use parentheses. A woman six feet (72") tall is 2.96 standard deviations above the mean; the six-foot-tall man is 0.964 standard deviations above the mean. The woman is much taller, relative to other women, than the man is, compared to other men.

3.11 To find the percent of years with less than 697 mm of rain, we use Transform, Compute Variable. Locate the CDF & Noncentral CDF Function group, then the CDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters (the value, the mean, and the standard deviation), then click OK to compute the probability into the worksheet (as variable Drought, here). For more decimal places in your result (remember, the default is two), click on the Variable View tab and increase them. About 2.9% of all years will have less than 697 mm of rain.

To find the percent of normal rainfall years (between 683 mm and 1022 mm), we'll find the cumulative probability for 1022 mm and subtract the cumulative probability of 683 mm. We do this in a single expression that subtracts one CDF.Normal result from the other. About 96.1% of all years will have normal rainfall.

3.13 Here, we are given a relative frequency under the standard Normal curve. We need to find the value of z. We'll again use Transform, Compute Variable. Locate the Inverse DF Function group, then the IDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters (the area to the left, the mean, and the standard deviation), then click OK to compute the value into the worksheet (as variable Z, here). The point z with 20% of the area below it is z = -0.842. We repeat for part (b) using 0.6 as the area to the left of the point (since 40% of the observations are above it). This point is z = 0.253.

3.29 As in Exercise 3.13 above, use Transform, Compute Variable; we want the Inverse DF group and IDF.Normal. As before, enter the area to the left of the desired point on the curve (0.8), the value of the mean (0) and the standard deviation (1). This point is z = 0.842.

Part (b) asks for the point with 35% of all observations above it; this means that 65% = 0.65 are below it. This point is z = 0.39.

3.31 To find the proportion of rainy days that meet the acid rain criteria, we use Transform, Compute Variable. Locate the CDF & Noncentral CDF Function group, then the CDF.Normal function in the lower box. Clicking on that will transfer the command shell into the Numeric Expression box. Notice that in the lower center of the box there is a description of the command and its parameters. Enter the parameters (the value, the mean, and the standard deviation), then click OK to compute the probability into the worksheet (as variable Acid, here). For more decimal places in your result (remember, the default is two), click on the Variable View tab and increase them. At this location 22.9% of days will qualify as acid rain days.

3.33 To find the proportion of slots that meet specifications, we'll use Transform, Compute Variable to find the cumulative probability for 0.878 inch and subtract the cumulative probability for 0.872 inch. We do this in a single expression that subtracts one CDF.Normal result from the other. About 98.76% of slots will meet the specifications.

3.35 This problem refers to the information given about 2008 model vehicles. They had mean 18.7 mpg and standard deviation 4.3 mpg. We want to know the area to the left of the Chevy Malibu (with 25 mpg). Use Transform, Compute Variable and find the cumulative probability for the Malibu with CDF.Normal. 92.86% of 2008 cars had worse mileage than the Chevy Malibu.
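Several of the figures quoted in solutions 3.9, 3.29 and 3.35 can be reproduced with a few lines of code. This is a minimal sketch using Python's standard-library statistics.NormalDist, not the SPSS procedure itself. The women's mean and standard deviation (64 and 2.7 inches) and the men's mean (69.3 inches) are the values quoted in solution 3.41; the men's standard deviation of 2.8 inches is an assumption, since it is not stated in this text, and was chosen because it reproduces the quoted z-score of 0.964. The helper z_score is a hypothetical name.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def z_score(value, mean, sd):
    """Standardize a value: z = (value - mean) / sd."""
    return (value - mean) / sd


# 3.9: a six-foot (72 inch) woman and man, each relative to their own group
z_woman = z_score(72, 64, 2.7)
MEN_SD = 2.8  # ASSUMPTION: not given in the text; reproduces z = 0.964
z_man = z_score(72, 69.3, MEN_SD)

# 3.29: points on the standard Normal curve with 80% and 65% of the area below
standard = NormalDist(mu=0, sigma=1)
z_80 = standard.inv_cdf(0.80)
z_65 = standard.inv_cdf(0.65)

# 3.35: proportion of 2008 vehicles with mileage below the Malibu's 25 mpg
mpg = NormalDist(mu=18.7, sigma=4.3)
p_malibu = mpg.cdf(25)

print(f"z (woman) = {z_woman:.3f}, z (man) = {z_man:.3f}")
print(f"z with 80% below = {z_80:.3f}, z with 65% below = {z_65:.3f}")
print(f"P(mpg < 25) = {p_malibu:.4f}")
```

Writing the standardization as a function with explicit parentheses avoids the order-of-operations slip that solution 3.9 warns about.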

3.37 To find the quartiles, we want the points with (respectively) 25% and 75% of the area below them. We can find these values using Transform, Compute Variable. We want the Inverse DF group and IDF.Normal. As before, enter the area to the left of the desired point on the curve (0.25, then 0.75), the value of the mean (18.7) and the standard deviation (4.3). We find that Q1 (the 25th percentile) is 15.80 mpg and Q3 (the 75th percentile) is 21.60 mpg.

3.39 The percentile corresponds to the area to the left of the value of interest. We find this using Transform, Compute Variable and CDF.Normal to get the cumulative probability for Jacob's score. We see that Jacob is not quite at the 15th percentile (his percentile is 14.9).

3.41 We want to know what proportion of women are taller than the average man (69.3"). We'll use Transform, Compute Variable, but subtract the proportion of women shorter than 69.3" from 1 to find the proportion taller than 69.3". Be sure to use the values for the women's distribution: the mean (64) and the standard deviation (2.7). We see that not quite 2.5% (2.48%) of women should be taller than the average man.
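The quartiles in solution 3.37 and the upper-tail proportion in solution 3.41 can be checked the same way. The sketch below assumes Python's standard-library statistics.NormalDist and uses only the means and standard deviations quoted in those two solutions; the variable names are illustrative.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# 3.37: quartiles of 2008 vehicle mileage, N(18.7, 4.3)
mpg = NormalDist(mu=18.7, sigma=4.3)
q1 = mpg.inv_cdf(0.25)  # 25% of the area lies below Q1
q3 = mpg.inv_cdf(0.75)  # 75% of the area lies below Q3

# 3.41: proportion of women (mean 64, SD 2.7) taller than 69.3 inches.
# The CDF gives the area to the left, so subtract it from 1.
women = NormalDist(mu=64, sigma=2.7)
p_taller = 1 - women.cdf(69.3)

print(f"Q1 = {q1:.2f} mpg, Q3 = {q3:.2f} mpg")
print(f"P(woman taller than 69.3 in) = {p_taller:.4f}")
```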

3.43 To find the proportion of students scoring at least 750, we'll use Transform, Compute Variable and subtract the proportion scoring less than 750 from 1, as we did in Exercise 3.41. We see that 3.1% of men scored at least 750 while only 1.1% of women did this well.

3.47 To find the proportion scoring higher than 27, divide the given numbers; to find the proportion scoring 27 or more, add the number that scored exactly 27 to the count before dividing. We find that 11.5% scored higher than 27, while 15.3% scored at least 27. To compare this with the Normal computation, use CDF.Normal to find the proportion scoring at least 27 by subtracting the proportion scoring less than 27 from 1. We would expect 12.3% to score at least 27 if the scores were exactly Normal.

3.49 Open worksheet file ex03-49. We'll create a histogram of the lengths and compute summary statistics using Analyze, Descriptive Statistics, Explore. Click to enter variable Length in the Dependent List. Click Plots and be sure the Histogram box is checked. To find the quartiles of this distribution, click Statistics and ask for Percentiles.

Percentiles
                                           5      10     25     50     75     90     95
Weighted Average (Definition 1)  Length  .6400  .6800  .7600  .8000  .8600  .8800  .9200
Tukey's Hinges                   Length                .7600  .8000  .8400

Descriptives: Length
                       Statistic   Std. Error
Mean                   .8004       .01116
Median                 .8000
Variance               .006
Std. Deviation         .07815
Minimum                .64
Maximum                .94
Range                  .30
Interquartile Range    .10
Skewness               -.361       .340
Kurtosis               -.566       .668

This distribution actually looks a bit skewed left (other windows also show this same general shape); there are no outliers. The mean (x̄ = 0.800) is the same (within rounding) as the median (Med = 0.8); the standard deviation is s = 0.078; the quartiles are Q1 = 0.76 and Q3 = 0.86. The distances to the quartiles from the median (0.04 and 0.06) are roughly similar. These all suggest the distribution is rather symmetric.

In part (c), we want to find the percent of observations expected to be between the two quartiles (0.76 and 0.86) if the distribution is Normal. We'll use CDF.Normal to find the proportion by subtracting the proportion less than 0.76 from the proportion less than 0.86. About 47.5% of all observations should fall between 0.76 and 0.86. To find what actual proportion lies between these values, sort the list using Data, Sort Cases. Enter the variable name Length in the Sort by box. Click OK. Examining the worksheet after the sort, we find there are 11 values less than 0.76 and 12 values greater than 0.86; that means (49 - 23)/49 = 53.1% of the values are between the quartiles.

3.51 Open worksheet file ta02-05. We want stemplots of the data for both loose and intermediate compression. Use Analyze, Descriptive Statistics, Explore and enter Pent as the Dependent variable and Comp as the Factor.

Pent Stem-and-Leaf Plot for Comp= I

 Frequency    Stem &  Leaf
     2.00        2 .  99
    14.00        3 .  01111112333444
     3.00        3 .  568
     1.00 Extremes    (>=4.3)

 Stem width:  1.00
 Each leaf:   1 case(s)

Pent Stem-and-Leaf Plot for Comp= L

 Frequency    Stem &  Leaf
     4.00       39 .  4689
     2.00       40 .  03
     5.00       41 .  12369
     3.00       42 .  079
     2.00       43 .  04
     2.00       44 .  11
     2.00 Extremes    (>=4.89)

 Stem width:  .10
 Each leaf:   1 case(s)

The stemplots show that neither of these distributions is Normal; both are skewed right with high outliers (indicated as Extremes).

3.53 We'll find the z-scores corresponding to the quartiles using Transform, Compute Variable, and ask for IDF.Normal. We specify the area to the left of Q1 (0.25), the mean (0) and the standard deviation (1). Since the Normal distribution is symmetric, we'll find only Q1. (Q3 has the same magnitude, but is positive.)
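The symmetry argument in solution 3.53, and the Normal-model proportion from part (c) of solution 3.49, can be illustrated with a short sketch. It assumes Python's standard-library statistics.NormalDist rather than SPSS, and it takes the mean (.8004) and standard deviation (.07815) for the thorax lengths from the Descriptives output above; the variable names are illustrative.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# 3.53: z-scores of the quartiles of the standard Normal distribution
standard = NormalDist(mu=0, sigma=1)
z_q1 = standard.inv_cdf(0.25)
z_q3 = standard.inv_cdf(0.75)  # by symmetry this should equal -z_q1

# 3.49(c): proportion expected between the sample quartiles 0.76 and 0.86
# if thorax lengths were exactly Normal with the sample mean and SD
thorax = NormalDist(mu=0.8004, sigma=0.07815)
p_between = thorax.cdf(0.86) - thorax.cdf(0.76)

print(f"z for Q1 = {z_q1:.4f}, z for Q3 = {z_q3:.4f}")
print(f"Normal-model proportion between 0.76 and 0.86 = {p_between:.4f}")
```

Computing both quartile z-scores, rather than only Q1, makes the symmetry visible directly in the output.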