Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)




Center: Finding the Median

When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it's easy to find the center: it's just the center of symmetry. We could average the minimum and maximum data values (called the midrange) as a measure of center, but the midrange is very sensitive to skewed distributions and outliers.

Center: Finding the Median (cont.)

A more reasonable choice for center than the midrange is the value with exactly half the data values below it and half above it. This value is called the median. The median is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. The median has the same units as the data.

Median

The sample median is the $\frac{n+1}{2}$-th largest observation. If $\frac{n+1}{2}$ is not a whole number, the median is the average of the two observations on either side.

Spread: Home on the Range

When describing a distribution numerically, we always report a measure of its spread along with its center. The range of the data is the difference between the maximum and minimum values: Range = max − min. A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
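The position rule for the median, together with the range and the midrange, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up for this example and do not come from the slides.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)      # sort first, or the rule is meaningless
    n = len(ordered)
    position = (n + 1) / 2        # 1-based position in the ordered data
    if position.is_integer():
        return ordered[int(position) - 1]
    below = ordered[int(position) - 1]   # observation just before the position
    above = ordered[int(position)]       # observation just after the position
    return (below + above) / 2


def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)


def midrange(values):
    """Average of the minimum and maximum values."""
    return (min(values) + max(values)) / 2


# Hypothetical data, chosen only for illustration.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
with_outlier = sample + [100]

print(median(sample), data_range(sample), midrange(sample))
print(median(with_outlier), data_range(with_outlier), midrange(with_outlier))
```

Adding the single extreme value 100 barely moves the median, while the range and the midrange change drastically, which is the sensitivity described above.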

The Interquartile Range

The interquartile range (IQR) allows us to ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are.

Quartiles

Quartiles split the data into quarters. The lower quartile ($Q_1$) divides the bottom half of the data into two: it is the median of the observations below the median. The upper quartile ($Q_3$) divides the upper half of the data into two: it is the median of the observations above the median. The difference between the quartiles is the IQR, so IQR = upper quartile − lower quartile.

The Interquartile Range (cont.)

The lower and upper quartiles are the 25th and 75th percentiles of the data, so the IQR contains the middle 50% of the values of the distribution, as shown in Figure 5.3 from the text.

The Five-Number Summary

The five-number summary is { Min, $Q_1$, Median, $Q_3$, Max }.
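A minimal sketch of the five-number summary, using the definition given above (each quartile is the median of the observations below or above the overall median). The function name and the data are assumptions made for illustration, and the sketch assumes that when n is odd the middle observation is left out of both halves.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]        # observations below the median
    upper_half = ordered[n - half:]    # observations above the median
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


# Hypothetical data, chosen only for illustration.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
minimum, q1, med, q3, maximum = five_number_summary(sample)
iqr = q3 - q1   # IQR = upper quartile - lower quartile

print(five_number_summary(sample))
print("IQR =", iqr)
```

Other quartile conventions exist, so a calculator or statistics package may report slightly different values for $Q_1$ and $Q_3$ on the same data.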

Boxplots

A boxplot is a graphical display of the five-number summary. The steps involved in constructing a boxplot can also be found on pages 60-61 of the text. Boxplots are particularly useful when comparing groups.

Boxplot

[Figure 2.4.4: Construction of a box plot. The box is drawn over the data scale with $Q_1$, Med, and $Q_3$ marked; each whisker reaches 1.5 × IQR from the box and is then pulled back until it hits an observation. From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.]

Construction of Boxplot

Data: breaking strength of wire in kilograms

0 14 18 3 10 3 10 7 5 1

[Stem-and-leaf display of the data; leaf unit = 1.0 kg]

Find the median.
Find the quartiles: $Q_1$ = ____, $Q_3$ = ____
Calculate the interquartile range: $Q_3 - Q_1$ = ____
Calculate the whisker length: 1.5 × ($Q_3 - Q_1$) = ____

Comparing Groups With Boxplots

The following set of boxplots compares the effectiveness of various coffee containers: [boxplot figure]

What does this graphical display tell you?
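The construction steps (median, quartiles, IQR, whisker length, then pulling each whisker back to an observation) can be sketched as follows. The function name, the dictionary keys, and the data are hypothetical, and the quartiles use the median-of-halves definition from the earlier slides, so other software may draw slightly different boxes.

```python
from statistics import median


def boxplot_stats(values):
    """Numbers needed to draw a boxplot with 1.5 x IQR whiskers."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])        # median of the lower half
    q3 = median(ordered[n - half:])    # median of the upper half
    reach = 1.5 * (q3 - q1)            # maximum whisker length
    low_limit, high_limit = q1 - reach, q3 + reach
    # Pull each whisker back until it hits an actual observation.
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    outside = [x for x in ordered if x < low_limit or x > high_limit]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outside": outside,            # points plotted individually
    }


# Hypothetical data, chosen only for illustration.
sample = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]
stats = boxplot_stats(sample)
print(stats)
```

Here the upper whisker stops at the largest observation within 1.5 × IQR of the box, and the value beyond that limit is listed separately as a possible outlier.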

Summarizing Symmetric Distributions

Medians do a good job of identifying the center of skewed distributions. When we have symmetric data, the mean is a good measure of center. We find the mean by adding up all of the data values and dividing by n, the number of data values we have.

Sample Mean

The sample mean (the average) is denoted by $\bar{x}$:

$$\bar{x} = \frac{\text{sum of the observations}}{\text{number of observations}}$$

Mean

[Figure 2.4.1: Mechanical construction representing a dot plot: (a) shows a balanced rod while (b) and (c) show unbalanced rods.]

Mean or Median?

Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance. In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used. For skewed data, though, it's better to report the median than the mean as a measure of center.

[Figure 2.4.2: Relationship between the mean and the median. (a) Data symmetric about P, where Med = $\bar{x}$. (b) Two largest points moved to the right. Grey disks in (b) are the "ghosts" of the points that were moved. From Chance Encounters by C.J. Wild and G.A.F. Seber, John Wiley & Sons, 2000.]
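The balance-point idea and the effect of moving the two largest points to the right can be checked numerically. The data below are invented for this sketch and are not the values behind the figures.

```python
import math
from statistics import mean, median

# Hypothetical data, symmetric about 4.
symmetric = [1, 2, 3, 4, 5, 6, 7]
# Same data with the two largest points moved to the right.
shifted = [1, 2, 3, 4, 5, 16, 17]

for name, data in (("symmetric", symmetric), ("shifted", shifted)):
    m = mean(data)
    # Deviations from the mean cancel out: this is the balance point.
    balance = sum(x - m for x in data)
    print(name, "mean =", round(m, 3), "median =", median(data),
          "sum of deviations ~ 0:", math.isclose(balance, 0, abs_tol=1e-9))
```

In the symmetric case the mean and median coincide. After the shift the median stays where it was while the mean is dragged toward the moved points, which is why the median is the better report for skewed data.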

What About Spread?

A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the signed distance of a data value from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.

Variance

The sample variance, denoted by $s^2$, is found using the formula

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sample Standard Deviation

$$s_x = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The standard deviation is in the same units as the data, so it is preferable to the sample variance. It equals zero only if all observations are identical. It is sensitive to outliers (extreme observations). There is a button for it on the calculator: learn to use it! That is much simpler than applying the formula.

Shape, Center, and Spread

When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread. If the shape is skewed, report the median and IQR. If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.
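As a worked check of the variance and standard deviation formulas, the sketch below applies them step by step and compares the result with Python's `statistics.variance` and `statistics.stdev`, which also divide by n − 1. The helper name and the data are made up for illustration.

```python
import math
from statistics import mean, stdev, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Hypothetical data, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = sample_variance(sample)
s = math.sqrt(s_squared)          # standard deviation: same units as the data

print("variance by formula:", s_squared)
print("std dev by formula: ", s)
print("library agrees:", math.isclose(s_squared, variance(sample)),
      math.isclose(s, stdev(sample)))
```

For these eight values the mean is 5 and the squared deviations sum to 32, so the variance is 32 / 7. The library call plays the role of the calculator button mentioned on the slide.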

What About Outliers?

If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing. Note: The median and IQR are not likely to be affected by the outliers.

What Can Go Wrong?

Do a reality check: don't let technology do your thinking for you. Don't forget to sort the values before finding the median or percentiles. Don't compute numerical summaries of a categorical variable. Watch out for multiple modes: multiple modes might indicate multiple groups in your data.

What Can Go Wrong? (cont.)

Be aware of slightly different methods: different statistics packages and calculators may give you different answers for the same data. Beware of outliers. Make a picture (make a picture, make a picture). Be careful when comparing groups that have very different spreads.

So What Do We Know?

We describe distributions in terms of shape, center, and spread. For symmetric distributions, it's safe to use the mean and standard deviation; for skewed distributions, it's better to use the median and interquartile range. Always make a picture: don't make judgments about which measures of center and spread to use by just looking at the data.
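Two of these cautions are easy to see in code. The sketch below, with invented data, first reports the summaries with and without a clear outlier, then computes quartiles with the two methods offered by `statistics.quantiles` (Python 3.8 or later) to show that the method matters. Neither method is claimed to be the one used in the text.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical data; 95 plays the role of a clear outlier.
clean = [3, 5, 7, 8, 12, 13, 14, 18, 21]
with_outlier = clean + [95]

# Report mean and standard deviation with and without the outlier.
mean_with, mean_without = mean(with_outlier), mean(clean)
sd_with, sd_without = stdev(with_outlier), stdev(clean)
median_with, median_without = median(with_outlier), median(clean)

print("mean:  ", mean_without, "->", mean_with)
print("sd:    ", round(sd_without, 2), "->", round(sd_with, 2))
print("median:", median_without, "->", median_with)

# Same data, two quartile conventions from the standard library.
exclusive = quantiles(clean, n=4, method="exclusive")
inclusive = quantiles(clean, n=4, method="inclusive")
print("exclusive quartiles:", exclusive)
print("inclusive quartiles:", inclusive)
```

The mean and standard deviation shift far more than the median when the outlier is included, and the two quartile methods agree on the median but not on $Q_1$ and $Q_3$, so it is worth knowing which convention a given package uses.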