Measures of Spread and Boxplots Discrete Math, Section 9.4




We start with an example:

Example 1: Comparing Mean and Median
Compute the mean and median of each data set:
S1 = {4, 6, 8, 10, 12, 14, 16}
S2 = {4, 7, 9, 10, 11, 13, 16}
Now plot both data sets on these number lines:
[Number lines for S1 and S2]
What do you observe?

We'll look at a few different measures of spread (also called measures of dispersion):
- Range
- Interquartile range
- Five-number summary
- Variance
- Standard deviation

I. The Range

As we defined the other day: The range of a set of numbers is the difference between the largest and smallest numbers in the set, i.e.

range = (largest value) - (smallest value)

Question: What important things does the range tell us? What problems can occur if the range is used to measure the spread of a set of data?
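The definitions so far can be tried out with a few lines of Python. This is only a minimal sketch: the helper name `data_range` is made up for illustration, and the two lists are Example 1's data sets read as S1 = {4, 6, 8, 10, 12, 14, 16} and S2 = {4, 7, 9, 10, 11, 13, 16}.

```python
from statistics import mean, median

def data_range(values):
    """Range = (largest value) - (smallest value)."""
    return max(values) - min(values)

# Example 1 data sets (the reading of the two lists is an assumption).
S1 = [4, 6, 8, 10, 12, 14, 16]
S2 = [4, 7, 9, 10, 11, 13, 16]

for name, data in (("S1", S1), ("S2", S2)):
    print(name, "mean:", mean(data), "median:", median(data),
          "range:", data_range(data))
```

Comparing the printed summaries with the two number-line plots is a good way to approach the "What do you observe?" question.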

II. The Interquartile Range (IQR)

Before we define the interquartile range, we need to discuss the topics of percentiles and quartiles.

Question: Suppose you score a 760 on the SAT Math and you are told that this placed you in the 95th percentile. What does this mean?

Quartile definitions:
- The lower quartile (or first quartile) of a set is the 25th percentile. Notation: Q1.
- The upper quartile (or third quartile) of a set is the 75th percentile. Notation: Q3.
- The second quartile is the median. Notation: Q2 or M.

To compute quartiles:
1. Sort the data from smallest to largest.
2. Find the median. This is the second quartile, Q2.
3. Look at the first half of the data (not including Q2). Find the median of the first half of the data. This is the first quartile, Q1.
4. Look at the second half of the data (not including Q2). Find the median of the second half of the data. This is the third quartile, Q3.

Example 2: Quartiles
Calculate Q1, the median, and Q3 for the following data set:
5 7 10 14 18 19 25 29 31 33

Definition: The interquartile range (IQR) is defined to be the difference between the third quartile and the first quartile. Thus, we can express the IQR using this formula:

IQR = Q3 - Q1

Example 3: IQR
Find the IQR for Example 2.

Question: Why would IQR be a "good" measure of spread?

We can also use the IQR to determine whether a number is an outlier of a data set:

A Test for Outliers: A data point is considered to be an outlier if it lies more than 1.5 interquartile ranges below Q1 (i.e., the number is less than Q1 - 1.5 × IQR) or more than 1.5 interquartile ranges above Q3 (i.e., the number is greater than Q3 + 1.5 × IQR).
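The four quartile steps and the outlier test can be sketched in Python as follows. The function names `quartiles` and `outlier_fences` are hypothetical, and the data list is Example 2 read as an already sorted set of ten values; note that other textbooks and software use slightly different quartile conventions, so results may differ from a calculator's.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half rule."""
    data = sorted(values)            # step 1: sort
    n = len(data)
    q2 = median(data)                # step 2: the median is Q2
    lower = data[: n // 2]           # step 3: first half (median excluded if n is odd)
    upper = data[(n + 1) // 2 :]     # step 4: second half (same exclusion)
    return median(lower), q2, median(upper)

def outlier_fences(values):
    """Return the cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Example 2 data (reading of the list is an assumption).
example_2 = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]
q1, q2, q3 = quartiles(example_2)
print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", q3 - q1)
print("outlier cutoffs:", outlier_fences(example_2))
```

Any value below the first cutoff or above the second would be flagged by the test for outliers.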

III. Five-Number Summary and Boxplots

Another way to describe both the center and spread of a set of numbers is to use its five-number summary. The five-number summary consists of:
- minimum value
- Q1
- median
- Q3
- maximum value

Example 4 (Yates et al.): Bonds' Home Runs
The following data are the numbers of home runs Barry Bonds hit in his first 16 seasons, sorted:
16 19 24 24 25 33 33 34 34 37 37 40 42 46 49 73
a. Create a five-number summary of this data.
b. We suspect that Bonds' 73-home-run season is an outlier. Is it?
c. For good measure, is the 16-home-run season an outlier? Give specific calculations.

We can represent the five-number summary graphically. A boxplot or box-and-whisker plot is a graphical representation of the five-number summary:
- The box extends from Q1 to Q3.
- The box is divided at the median.
- The whiskers extend from Q1 to the min and from Q3 to the max.

Example 5
Make a box-and-whisker plot for the last example.

Notes of importance:
1. The five-number summary is an excellent way to measure the spread of a skewed data set.
2. Two side-by-side boxplots can be a good way of comparing two related data sets.
3. It is important to label the numbers when making boxplots to compare data.
4. Boxplots can be drawn either horizontally or vertically.
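A five-number summary and the 1.5 × IQR outlier test can be checked with a short script. This is a sketch under stated assumptions: the helper names are invented for illustration, quartiles follow the median-of-each-half rule from Section II, and the home-run list is Example 4's data read as 16, 19, 24, 24, 25, ..., 73.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) with median-of-each-half quartiles."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    return data[0], q1, median(data), q3, data[-1]

def outliers(values):
    """Return the values more than 1.5 IQRs below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [x for x in values if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Example 4 data (the reading of the garbled entries is an assumption).
bonds = [16, 19, 24, 24, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print("five-number summary:", five_number_summary(bonds))
print("outliers:", outliers(bonds))
```

The same two functions can be reused to label a hand-drawn boxplot for Example 5.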

That was fun, but...

While they're nice, boxplots conceal outliers. As a result, we adopt a modified boxplot. The modified boxplot is similar, except that outliers are plotted as individual points.

Modified Boxplot: A modified boxplot is drawn as follows:
- A central box extends from Q1 to Q3.
- The box is divided at the median.
- Observations more than 1.5 × IQR outside the central box (the outliers) are plotted individually.
- The whiskers extend from Q1 to the smallest value that is not an outlier and from Q3 to the largest value that is not an outlier.

Example 6
Now draw a modified boxplot for the previous example.

Using the Calculator to Your Advantage
The TI graphing calculators can do five-number summaries, boxplots, and modified boxplots:
- The five-number summary is found under 1-Var Stats. Scroll down in the list for minX, Q1, Med, Q3, and maxX.
- Boxplots and modified boxplots are found under the Stat Plot menu. Look for them in the second row of stat plot options.
- Be sure to use ZoomStat! (You otherwise may miss the outliers!)

Another Test for Outliers: Enter the data in a list. Use the calculator to make a modified boxplot. The modified boxplot will show whether or not there are outliers.

Example 49 (Understanding Statistics)
In a hurry? On the run? Hungry as well? How about an ice cream bar as a snack? Ice cream bars are popular among all age groups. Consumer Reports did a study of ice cream bars. Twenty-seven bars with taste ratings of at least "fair" were listed, and the cost per bar was included in the report. Just how much does an ice cream bar cost? The data, expressed in dollars, appear below. As you can see, the cost varies quite a bit, partly because the bars are not uniform in size.

0.99 1.07 1.00 0.50 0.37 1.03 1.07 1.07 0.97
0.63 0.33 0.50 0.97 1.08 0.47 0.84 1.23 0.25
0.50 0.40 0.33 0.35 0.17 0.38 0.20 0.18 0.16

a. Compute the five-number summary and the interquartile range for this data set. (Who said example numbers had to be boring?)

b. Use the TI graphing calculator to create a boxplot and a modified boxplot. Draw these graphs below, labeling the five-number summary on each graph.
c. Did this data set contain any outliers?

Example 8: Comparing Song Lengths Using Side-by-Side Boxplots
Below are the lengths of the tracks on three different CDs, listed in minutes and seconds:

U2's How to Dismantle an Atomic Bomb:
3:14 3:59 5:08 4:50 5:47 3:39 4:30 5:03 3:51 4:41 4:21

Dave Matthews Band's Crash:
4:07 6:27 5:16 4:21 6:39 6:11 5:54 4:07 5:42 5:53 5:00 9:11

Something Corporate's North:
3:08 2:57 3:27 3:27 4:07 3:03 3:24 3:38 3:16 3:49 3:18 3:51

Use the TI graphing calculator to make 3 side-by-side modified boxplots. [Hint: Begin by entering this data into lists. Before we can make boxplots, we need to convert minutes and seconds to all seconds. You can use lists on the calculator to do this too.]

a. Sketch the boxplots here, labeling the lengths in seconds:
b. Do any of these albums contain songs whose track lengths are outliers? Which one?
c. What does the modified boxplot say about how track lengths vary on North?
d. Use the modified boxplots to compare the lengths of tracks on these three CDs.
e. Suppose you are a DJ at a radio station and you have these three albums at your disposal. You have approximately 4 minutes of time to fill before a program begins and you need to find a song. Looking at the boxplots, which CD would be the best place to look? (Keep in mind that not all songs on all CDs are radio singles.)
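The hint about converting minutes and seconds to seconds, together with the modified-boxplot rules, can be sketched in Python instead of on the calculator. Everything named here (`to_seconds`, `modified_boxplot_stats`) is hypothetical, the quartiles use the median-of-each-half rule, and the track list is the Crash album read as 4:07, 6:27, 5:16, 4:21, ...; the entries 6:27, 4:21, and 5:42 are assumed readings of the listing.

```python
from statistics import median

def to_seconds(track_length):
    """Convert a 'minutes:seconds' string such as '3:14' to total seconds."""
    minutes, seconds = track_length.split(":")
    return int(minutes) * 60 + int(seconds)

def modified_boxplot_stats(values):
    """Return the box, whisker ends, and outliers for a modified boxplot."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "box": (q1, median(data), q3),
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Track lengths for Crash (three entries are assumed readings, see above).
crash = ["4:07", "6:27", "5:16", "4:21", "6:39", "6:11",
         "5:54", "4:07", "5:42", "5:53", "5:00", "9:11"]
stats = modified_boxplot_stats([to_seconds(t) for t in crash])
print(stats)
```

Running the same function on the other two albums gives the numbers needed to label three side-by-side modified boxplots in seconds.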

IV. Sample Variance and Sample Standard Deviation

Before we begin, we must make an important distinction:
- A number that describes a Population is called a Parameter.
- A number that describes a Sample is called a Statistic.

Recall: What is the difference between a population and a sample?

Note that, in practice, when a parameter is unknown, we use the corresponding statistic.

When we talked about the mean, we talked about the sample mean, a statistic. We used the notation $\bar{x}$ for the sample mean. If we are talking about the population mean, a parameter, we use the notation $\mu$.

Just as the mean is the most commonly used measure of center, the standard deviation is the most commonly used measure of spread. In order to define standard deviation, we first define variance.

Definition: Sample Variance
The sample variance, denoted $s^2$, of a set of $n$ observations $\{x_1, x_2, \ldots, x_n\}$, is given by

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Definition: Sample Standard Deviation
The sample standard deviation, denoted $s$, of a set of $n$ observations $\{x_1, x_2, \ldots, x_n\}$, is the square root of the sample variance, and is given by

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Note that variance isn't used all that often, but standard deviation is defined in terms of variance, so we include it. Your calculator can easily compute both the sample standard deviation and the sample variance, but we will first work with them a bit to get an understanding of how the formulas work.

Example 9
Compute the variance and standard deviation of the following sample:
3 5 8 7 8
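To see how the two formulas work step by step, here is a minimal Python sketch. The function name `sample_variance` is made up for illustration, and the data are Example 9's five values exactly as printed; the standard library's `statistics.variance` and `statistics.stdev` are used only as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

sample = [3, 5, 8, 7, 8]      # Example 9 data as printed
s_squared = sample_variance(sample)
s = sqrt(s_squared)           # standard deviation = square root of variance

print("sample variance:", s_squared)
print("sample standard deviation:", s)
print("library cross-check:", variance(sample), stdev(sample))
```

Writing out the five deviations and their squares by hand first, then comparing with the script, is a reasonable way to confirm the arithmetic.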

Notes of Importance:
1. The parameters population variance and population standard deviation are denoted by $\sigma^2$ and $\sigma$, respectively.
2. If you know all of the data in a population, you can also compute the population variance and standard deviation by very similar formulas:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} \quad \text{and} \quad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

   Note that the distinction between these formulas and the sample formulas is that we use the population mean here and we divide by $n$ instead of by $n - 1$. Your book presents these formulas in a somewhat misleading way. Be careful to use sample formulas when you are working with a sample.
3. The TI calculators can compute standard deviation. It's under 1-Var Stats. The fourth entry, Sx, is the sample standard deviation, and the fifth entry, σx, is the population standard deviation. You usually want the sample standard deviation.
4. The standard deviation of a set of numbers measures how far the numbers are spread out from the mean.
5. As we saw on a recent worksheet, the quantity $x_i - \bar{x}$ is called the deviation from the mean, and the sum of all the deviations for any data set always equals 0.
6. The standard deviation is nonresistant to outliers.
7. The quantity $n - 1$, which appears in the denominator of the formulas for sample variance and sample standard deviation, is called the number of degrees of freedom.
8. (Yates et al.) The mean and standard deviation are excellent measures of center and spread for data sets which are symmetric. For skewed data sets, the median and five-number summary may be more helpful.

Example 10 (Freedman): Spread
Each of the following lists has an average of 50. For which one is the spread of the numbers around the average biggest? Smallest?
a. 0 20 40 50 60 80 100
b. 0 48 49 50 51 52 100
c. 0 1 2 50 98 99 100

Example 11 (Freedman): Spread
Repeat the directions shown in Example 10 for the following lists:
a. 47 49 50 51 53
b. 46 48 50 52 54
c. 46 49 50 51 54
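Notes 2 and 5 can be illustrated with a short Python sketch. The helper names are hypothetical, and the data are list a. of Example 10 read as 0, 20, 40, 50, 60, 80, 100 (a reading consistent with the stated average of 50). The list is treated first as a whole population and then as a sample, purely to show the effect of dividing by n versus n - 1.

```python
from math import sqrt

def deviations(values):
    """Deviation of each value from the mean of the list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def population_sd(values):
    """Divide the sum of squared deviations by n."""
    return sqrt(sum(d ** 2 for d in deviations(values)) / len(values))

def sample_sd(values):
    """Divide the sum of squared deviations by n - 1."""
    return sqrt(sum(d ** 2 for d in deviations(values)) / (len(values) - 1))

data = [0, 20, 40, 50, 60, 80, 100]   # Example 10, list a. (assumed reading)
print("sum of deviations:", sum(deviations(data)))
print("population SD:", population_sd(data))
print("sample SD:", sample_sd(data))
```

Because the sample formula divides by the smaller number n - 1, it always gives a slightly larger result than the population formula on the same list.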

Example 12 (Freedman): Standard Deviations
Each of the following lists has an average of 50. For each one, guess whether the standard deviation is around 1, 2, or 10. (This example doesn't require any arithmetic.)
a. 49 51 49 51 49 51 49 51 49 51
b. 48 52 48 52 48 52 48 52 48 52
c. 48 51 48 52 47 52 46 51 53 51
d. 54 49 46 49 51 53 59 59 49 49
e. 60 36 31 50 48 50 54 56 62 53

Example 13 (Iman): Final Exams
A graduate student in biology has been asked to grade 40 final exams, selected at random from several large sections of an introductory course. These are the grades:

77 68 86 84 95 98 87 71 84 92
96 83 62 83 81 85 91 74 61 52
83 73 85 78 50 81 37 60 85 100
79 81 75 92 80 75 78 71 64 65

a. Calculate the sample mean and sample standard deviation:
   sample mean:
   sample standard deviation:
b. What percent of the raw data lies within one standard deviation from the mean?
c. What percent lies within two standard deviations?
d. What percent lies within three standard deviations?

Homework: 9.4: #3, 5, 7, 1

Example 13 illustrates the following rule of thumb:

Rule of Thumb on Spread (68-95-99.7 Rule): For many samples of data,
1. About 2/3 of the sample observations fall within one sample standard deviation of the mean.
2. About 95% of the observations fall within two sample standard deviations of the mean.
3. About 99.7% of the observations fall within three sample standard deviations of the mean.

Note: This rule is particularly accurate for data which have a bell-shaped graph.

Example 14: Finding Standard Deviation
a. Find the sample standard deviation for the following sample:
100 90 90 85 80 75 75 75 70 70 65 60 40 40 40
b. Suppose the above data were all of the data for a population. Compute the population standard deviation.

Chebyshev's Theorem

Form 1: The probability that any random variable X will assume a value within k standard deviations of the mean is at least $1 - 1/k^2$. In terms of a formula,

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

Form 2: The probability that any randomly chosen outcome lies between $\mu - k$ and $\mu + k$ is at least

$$1 - \frac{\sigma^2}{k^2}$$

Note: Chebyshev's Theorem is named for Russian mathematician Pafnuty L. Chebyshev (1821-1894). As this is an Anglicization of his last name, it's one of several ways you'll see it spelled in books. Please do not confuse his first name with any rappers.
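The rule of thumb and Chebyshev's guarantee can be compared side by side in a small Python sketch. Both function names are hypothetical and the eight-value sample is made up purely for illustration; it is not taken from any of the examples.

```python
from statistics import mean, stdev

def percent_within(values, k):
    """Percent of observations within k sample standard deviations of the mean."""
    x_bar, s = mean(values), stdev(values)
    inside = sum(1 for x in values if abs(x - x_bar) <= k * s)
    return 100 * inside / len(values)

def chebyshev_lower_bound(k):
    """Chebyshev's guarantee 1 - 1/k^2 (only informative when k > 1)."""
    return max(0.0, 1 - 1 / k ** 2)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, not from the handout
for k in (1, 2, 3):
    print(f"k={k}: observed {percent_within(sample, k)}%, "
          f"Chebyshev guarantees at least {100 * chebyshev_lower_bound(k):.1f}%")
```

Chebyshev's bound holds for any distribution, so it is much more conservative than the 68-95-99.7 rule, which is tuned to bell-shaped data.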

Example 15 (Mizrahi/Sullivan): Using Chebyshev
Suppose that an experiment with numerical outcomes has mean 4 and standard deviation 1. Use Chebyshev's Theorem to estimate the probability that an outcome lies between 2 and 6.

Example 16 (Walpole/Myers/Myers): More from Pafnuty
A random variable X has a mean $\mu = 8$, a variance $\sigma^2 = 9$, and an unknown probability distribution. Find:
a. $P(-4 < X < 20)$
b. $P(|X - 8| > 6)$

Homework: 9.4: #19, 20
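As a hedged sketch of how Form 1 of Chebyshev's Theorem might be applied to Example 16, take the variance to be $\sigma^2 = 9$ (so $\sigma = 3$) and part a. to ask for $P(-4 < X < 20)$. The key step is to express each interval as $\mu \pm k\sigma$ and read off $k$:

$$
\begin{aligned}
\text{a. } & -4 = \mu - 4\sigma, \quad 20 = \mu + 4\sigma \;\Rightarrow\; k = 4, \\
& P(-4 < X < 20) \ge 1 - \frac{1}{4^2} = \frac{15}{16}. \\[4pt]
\text{b. } & 6 = 2\sigma \;\Rightarrow\; k = 2, \\
& P(|X - 8| > 6) \le 1 - P(2 < X < 14) \le 1 - \left(1 - \frac{1}{2^2}\right) = \frac{1}{4}.
\end{aligned}
$$

Both results are bounds rather than exact probabilities, since the distribution of X is unknown.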