Topic 9 ~ Measures of Spread




AP Statistics Topic 9 ~ Measures of Spread

Activity 9-1: Baseball Lineups

The table to the right contains data on the ages of the starting players for the two teams involved in game 1 of the 2010 National League Division Series. Is there a relationship between the ages of the players on the teams and the outcome of the NLDS? The Reds lost in three games.

a. Identify the observational units. Also identify the explanatory and the response variable. Classify each variable as categorical or quantitative.
Observational units: starting players in game 1 of the 2010 NLDS
Explanatory variable: age. Type: quantitative
Response variable: NLDS winner. Type: categorical

b. Create comparative dotplots, using the axes shown here for comparing the ages between the two teams. (Be sure to label which dotplot represents which team.) Comment on how the age distributions compare.
[Comparative dotplots: Reds, Phillies]
Comment: The starting lineup for the Phillies tended to be older than the starting lineup for the Reds. The Phillies are more consistent in age than the Reds.

c. Calculate the mean and median age of each team's lineup.
Reds: mean 29.67, median 29
Phillies: mean 32.22, median 32

d. Which team's lineup appears to have more variability in its ages?
The Reds' lineup appears to have more variability in its ages.

In the previous topic, you learned that the mean and median are two ways to measure the center of a distribution; you will now learn several ways to measure the spread, or variability, of a distribution.

e. What is the age of the oldest player in the Phillies lineup? The youngest? What is the difference in age between the oldest and youngest player?
Oldest: 38. Youngest: 30. Difference: 38 - 30 = 8

f. Repeat part e for the Reds' lineup.
Oldest: 36. Youngest: 23. Difference: 36 - 23 = 13

A very simple, but not particularly useful, measure of variability is the range, calculated as the difference between the maximum and minimum values in a data set. Another measure of variability is the interquartile range (IQR), which is the difference between the upper quartile and the lower quartile of a distribution. The lower quartile (or the 25th percentile, abbreviated Q1) is the value such that 25% of the data values are less than that value and 75% are greater than it, while the upper quartile (or the 75th percentile, abbreviated Q3) is the value such that 75% of the values in the data set are less than that value and 25% are greater than it. Thus, the IQR is the range of the middle 50% of the data.

g. Determine the lower and upper quartiles of the ages for the Phillies. Then find the IQR of the Phillies' ages.
Ordered ages: 30 31 31 31 32 32 32 33 38
Q1 = 31, Q3 = 32.5
IQR = Q3 - Q1 = 32.5 - 31 = 1.5

h. Determine the lower and upper quartiles of the ages for the Reds. Then find the IQR of the Reds' ages.
Ordered ages: 23 26 27 27 29 30 34 35 36
Q1 = 26.5, Q3 = 34.5
IQR = Q3 - Q1 = 34.5 - 26.5 = 8

i. Which team has the greater age range? Which has the greater IQR? Are these values consistent with your answer to question d?
The Reds have a greater range of ages, 13 versus 8 years, and a greater interquartile range of ages, 8 versus 1.5 years. Both results are consistent with the answer to part d.

j. Based on this analysis, summarize how the age distributions differ between the 2010 Reds and Phillies (shape, center, spread).
The distribution of ages for the Reds has no distinct shape, but there is a roughly symmetric cluster of ages between 23 and 30 and an evenly dispersed set of ages between 34 and 36. The median age is 29 and the IQR is 8 years. The distribution of ages for the Phillies is mound-shaped and symmetric, with a potential outlier in the 38-year-old left fielder Raul Ibanez. The median age is 32 and the IQR is 1.5 years. The Cincinnati team tended to be younger than the Philadelphia team, though the ages of the Reds' players vary quite a bit more.
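The range and IQR calculations in parts e through i can be checked with a short Python sketch. It assumes the quartile convention used in the worked answers (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when the count is odd); other textbooks and software use slightly different conventions. The function names are illustrative only.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the number of values is odd, the overall
    median belongs to neither half.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def data_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range = Q3 minus Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

# Ages of the starting lineups from the activity.
phillies = [30, 31, 31, 31, 32, 32, 32, 33, 38]
reds = [23, 26, 27, 27, 29, 30, 34, 35, 36]

for name, ages in (("Phillies", phillies), ("Reds", reds)):
    q1, q3 = quartiles(ages)
    print(name, "range:", data_range(ages), "Q1:", q1, "Q3:", q3, "IQR:", iqr(ages))
```

Both measures come out larger for the Reds, which matches the visual impression from the dotplots.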

Activity 9-2: Baseball Lineups

Other measures of variability examine how far the data values fall or deviate from the mean of the distribution.

a. The mean age for Cincinnati's starting lineup in game one of the 2010 NLDS was approximately 29.67. Complete the missing entries for Votto and Rolen in the "Deviation from the Mean" column of the following table by calculating the differences between their ages and the mean age.
Votto: 27 - 29.67 = -2.67
Rolen: 35 - 29.67 = 5.33

b. Add the values in the "Deviation from the Mean" column. Then calculate the average deviation from the mean.
Sum: -0.03. Average: -0.03/9 = -0.0033
The sum is not exactly zero only because the mean was rounded to 29.67. The unrounded values from the table appear in the table at right. Fathom calculates the sum of the deviations to be zero, as in the table below.

c. The sum of the deviations from the mean is always equal to zero. Verify this fact for the data set {1, 5, 12}.
Mean: (1 + 5 + 12)/3 = 6
Deviations: 1 - 6 = -5, 5 - 6 = -1, 12 - 6 = 6
Sum: -5 + (-1) + 6 = 0

d. Given the fact that the sum of the deviations from the mean is always zero, what does that imply about using the average deviation from the mean as a measure of spread (variation) for a data set?
The average deviation is a useless measure of spread since it is always going to be zero.

Because a measure of spread is concerned with distances from the mean rather than direction from the mean, you could work with the absolute values of these deviations.

e. Complete the missing entries in the "Absolute Deviation" column of the table below. Then calculate the average absolute deviation. Report the units of measurement for this calculation.
Votto: 2.67. Rolen: 5.33. Column sum: 32.67
32.67/9 = 3.63 years

The measure of spread you have just calculated is the mean absolute deviation (MAD). It is certainly a reasonable measure of the amount of variation relative to the mean in a data set, but there is yet another measure of spread that has properties desirable to statisticians, as you soon shall see.

f. Complete the missing entries in the "Squared Deviation" column of the table above. Then calculate the average squared deviation. Report the units of measurement for this calculation. This value is called the variance (V).
Votto: 7.13. Rolen: 28.41. Column sum: 160.0
160.0/9 = 17.78 years^2

g. To convert back to the original units of the data set (years of age), take the square root of the average squared deviation.
4.2 years

The measure of spread you have just calculated is the standard deviation (SD). The standard deviation is the most widely used measure of variation in statistical calculations.
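The chain of calculations in this activity (deviations, mean absolute deviation, variance, standard deviation) can be sketched in a few lines of Python. This is only an illustrative check using the Reds' nine ages; it divides by n, as the activity does, and uses the unrounded mean, so the deviations sum to zero up to floating-point error rather than to -0.03.

```python
from math import sqrt

# Ages of the Reds' starting lineup from the activity.
reds = [23, 26, 27, 27, 29, 30, 34, 35, 36]
n = len(reds)
mean = sum(reds) / n  # unrounded mean, about 29.67

# Deviation of each age from the mean; these always sum to zero.
deviations = [x - mean for x in reds]

# Mean absolute deviation: average distance from the mean (years).
mad = sum(abs(d) for d in deviations) / n

# Variance: average squared deviation (years squared), dividing by n.
variance = sum(d * d for d in deviations) / n

# Standard deviation: square root of the variance (back to years).
sd = sqrt(variance)

print("sum of deviations:", round(sum(deviations), 10))
print("MAD:", round(mad, 2))
print("variance:", round(variance, 2))
print("SD:", round(sd, 2))
```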

The standard deviation ("baby" sigma, σ) is a widely used measure of variability. To compute the standard deviation, you calculate the difference between the mean and each data value and then square the difference: (data value - mean)^2. Add these squared terms, and divide by the number of observational units n. The standard deviation is the square root of the result:

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}}$$

or, more simply,

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

The standard deviation can loosely be interpreted as the typical distance that a data value in the distribution deviates from the mean. The variance σ^2 is calculated by the formula

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

The variance is literally the average squared deviation from the mean.

σ is the Greek lowercase "sigma" and is used to represent the standard deviation (of a population).
Σ is the Greek uppercase "sigma" and is the symbol used to imply summation.
μ is the Greek lowercase "myoo" and is the symbol used to represent the mean.
n is the number of observational units.
x is used to represent the value of a variable for a particular observational unit.

Here's how to do it on the TI 83/84.
[TI-83/84 screenshot: standard deviation]

h. Calculate, with technology, the standard deviation of the ages for the Phillies' starting lineup in game 1 of the 2010 NLDS.
2.2 years

i. Now, remove 38-year-old Raul Ibanez from the Phillies lineup and calculate the standard deviation of the ages for the Phillies' starting lineup.
0.87 years

j. Calculate the range, interquartile range (IQR), and standard deviation of the ages for the Phillies' starting lineup in game 1 of the 2010 NLDS with and without Raul Ibanez. Complete the table below.

Measure               With Ibanez    Without Ibanez
Range                 8 years        3 years
IQR                   1.5 years      1 year
Standard deviation    2.2 years      0.87 years

k. Which measures of spread are resistant to outliers and which are not? Explain.
The IQR is least affected by the presence of Raul Ibanez since it changed the least when he is included in the Phillies lineup, so the IQR is resistant to outliers. The range (8 versus 3 years) and the standard deviation (2.2 versus 0.87 years) both change substantially, so they are not resistant.
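As a rough check on the with-and-without comparison, the sketch below recomputes all three measures for the Phillies' ages in Python. It assumes the population form of the standard deviation (dividing by n, which `statistics.pstdev` does) and the median-of-halves quartile convention from Activity 9-1; the helper name `spread_summary` is made up for this example.

```python
from statistics import median, pstdev

def spread_summary(values):
    """Return (range, IQR, population SD) for a list of values.

    Quartiles are taken as medians of the lower and upper halves,
    leaving out the overall median when the count is odd (an assumed
    convention; software packages differ).
    """
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    return max(data) - min(data), q3 - q1, pstdev(data)

with_ibanez = [30, 31, 31, 31, 32, 32, 32, 33, 38]
without_ibanez = [age for age in with_ibanez if age != 38]

for label, ages in (("with Ibanez", with_ibanez), ("without Ibanez", without_ibanez)):
    rng, iqr, sd = spread_summary(ages)
    print(f"{label}: range={rng}, IQR={iqr}, SD={sd:.2f}")
```

Removing the one 38-year-old cuts the range and the standard deviation by more than half, while the IQR moves only from 1.5 to 1.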

Activity 9-4: Placement Exam Scores

[Histogram: count versus placement score, scores from 1 to 19]

a. The distribution of placement scores appears to be roughly symmetric and mound shaped.

b. μ ≈ 10.221 and σ ≈ 3.859
μ - σ = 10.221 - 3.859 = 6.362
μ + σ = 10.221 + 3.859 = 14.080

c. 146 of the 213 scores fall within one standard deviation of the mean, i.e., between 7 and 14 inclusive. This accounts for 146/213 ≈ 0.685, or about 69% of the scores. This is quite consistent with the 68% advertised by the Empirical Rule.

[Histogram: count versus placement score, with bar counts labeled and the boundaries 2.503, 6.362, 14.080, and 17.939 marked]

d. μ - 2σ = 10.221 - 2(3.859) = 2.503
μ + 2σ = 10.221 + 2(3.859) = 17.939
202 of the 213 scores fall within two standard deviations of the mean, i.e., between 3 and 17, inclusive. This accounts for 202/213 ≈ 0.948, or about 95% of the scores. This is exactly what the Empirical Rule advertises.

e. 213 of the 213 scores fall within three standard deviations of the mean, i.e., between 1 and 19. This accounts for 100% of the placement scores. This is quite in line with the Empirical Rule, which advertises about 99.7%.
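The Empirical Rule comparison can be laid out as a small Python sketch. It uses only the summary numbers reported in the answers (mean about 10.221, standard deviation about 3.859, and counts of 146, 202, and 213 out of 213 scores); the raw placement scores are not reproduced here, so the counts are taken as given rather than recomputed.

```python
# Summary values reported in the activity (not recomputed from raw data).
mu, sigma = 10.221, 3.859
total_scores = 213

# Reported number of scores within k standard deviations of the mean.
counts_within = {1: 146, 2: 202, 3: 213}

# Proportions the Empirical Rule predicts for a roughly Normal distribution.
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(k):
    """Return (mu - k*sigma, mu + k*sigma)."""
    return mu - k * sigma, mu + k * sigma

for k in (1, 2, 3):
    low, high = interval(k)
    observed = counts_within[k] / total_scores
    print(f"within {k} SD: ({low:.3f}, {high:.3f}) "
          f"observed {observed:.3f} vs rule {empirical_rule[k]}")
```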

Activity 9-5: SATs and ACTs

a. 1740 is 240 points above the mean SAT score.

b. 30 is 9 points above the mean ACT score.

c. No. You cannot compare these point differences because the SAT and ACT scores are not measured on the same numeric scale.

d. Bobby's SAT score is 240/240 = 1 standard deviation above the mean SAT score.

e. Kathy's ACT score is 9/6 = 1.5 standard deviations above the mean ACT score.

f. Kathy's ACT z-score is (30 - 21)/6 = 1.5, which is greater than Bobby's SAT z-score of (1740 - 1500)/240 = 1.

g. Since Kathy's score of 30 on the ACT is 1.5 standard deviations above the mean score in the approximately Normal distribution of ACT scores, while Bobby's score of 1740 on the SAT is only one standard deviation above the mean score in the approximately Normal distribution of SAT scores, Kathy performed better relative to the peers whose scores appear in the distribution of all ACT scores.

h. z_Peter = (1380 - 1500)/240 = -0.5
z_Kelly = (15 - 21)/6 = -1

i. Peter has the higher z-score since -1 < -0.5.

j. A z-score turns out to be negative when calculated for any score less than the mean score for the associated distribution.
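The z-score comparisons above follow one formula, z = (score - mean) / standard deviation, which the following Python sketch applies to all four students. The SAT mean of 1500 and standard deviation of 240, and the ACT mean of 21 and standard deviation of 6, are the values implied by the worked answers; the function name is illustrative.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Distribution parameters implied by the worked answers.
SAT_MEAN, SAT_SD = 1500, 240
ACT_MEAN, ACT_SD = 21, 6

z = {
    "Bobby (SAT 1740)": z_score(1740, SAT_MEAN, SAT_SD),
    "Kathy (ACT 30)": z_score(30, ACT_MEAN, ACT_SD),
    "Peter (SAT 1380)": z_score(1380, SAT_MEAN, SAT_SD),
    "Kelly (ACT 15)": z_score(15, ACT_MEAN, ACT_SD),
}

for student, value in z.items():
    print(f"{student}: z = {value}")
```

Because z-scores are unitless, Kathy's 1.5 can be compared directly with Bobby's 1, and Peter's -0.5 with Kelly's -1, even though the raw scores are on different scales.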

Activity 9-6: Marriage Ages

a. Husbands tend to be older than their wives: the mean husband age exceeds the mean wife age by 1.875 years, and the median husband age exceeds the median wife age by 1.5 years.

b. The IQR for the distribution of husbands' ages is 44.5 - 25 = 19.5 years. The IQR for the distribution of wives' ages is 41.5 - 24 = 17.5 years. The standard deviation of husbands' ages is 14.26 years, while the standard deviation of wives' ages is 13.27 years. There is more variability in the distribution of husbands' ages than in the distribution of wives' ages.

c. The distributions of husbands' ages and wives' ages are both skewed right. The median age of the husbands is 30.5, while the median age of the wives is only 29, indicating that the husbands tended to be older than the wives. With an interquartile range of 19.5 years, two more than that of the wives, there is slightly more variation in the ages of the husbands than in the ages of the wives.

[Dotplots: husbands' ages and wives' ages, each on an axis from 15 to 75]

d. The mean difference (husband age minus wife age) is equal to the difference of the means, mean husband age minus mean wife age. The median difference is NOT equal to the difference of the medians.

e. Neither the difference in the IQRs, nor the difference in the standard deviations, is equal to the IQR of the differences or the standard deviation of the differences.
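The claims in parts d and e can be illustrated with a tiny Python sketch. The three couples below are hypothetical numbers chosen for this example, not the marriage-age data from the activity; they only show that the mean of paired differences must equal the difference of the means, while the median and standard deviation carry no such guarantee.

```python
from statistics import mean, median, pstdev

# Hypothetical paired ages (NOT the activity's data): one entry per couple.
husbands = [25, 30, 40]
wives = [24, 28, 41]

# Paired differences: husband age minus wife age for each couple.
differences = [h - w for h, w in zip(husbands, wives)]

mean_of_diffs = mean(differences)
diff_of_means = mean(husbands) - mean(wives)

median_of_diffs = median(differences)
diff_of_medians = median(husbands) - median(wives)

sd_of_diffs = pstdev(differences)
diff_of_sds = pstdev(husbands) - pstdev(wives)

print("mean of differences:", mean_of_diffs, "| difference of means:", diff_of_means)
print("median of differences:", median_of_diffs, "| difference of medians:", diff_of_medians)
print("SD of differences:", round(sd_of_diffs, 3), "| difference of SDs:", round(diff_of_sds, 3))
```

The mean behaves this way because it is a sum divided by n, and sums can be split term by term; the median and the standard deviation are not built from simple sums, so they need not split the same way.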