Histograms and density curves

What's in our toolkit so far?
- Plot the data: histogram (or stemplot).
- Look for the overall pattern and identify deviations and outliers.
- Give a numerical summary to briefly describe center and spread.

A new idea: if the pattern is sufficiently regular, approximate it with a smooth curve.

Any curve that is always on or above the horizontal axis and has total area underneath equal to one is a density curve.
- The area under the curve over a range of values indicates the proportion of values in that range.
- Density curves come in a variety of shapes, but the normal family of familiar bell-shaped densities is commonly used.
- Remember that the density is only an approximation, but it simplifies analysis and is generally accurate enough for practical use.

The Normal Distribution, Jan 9, 2004

Examples

[Figure: density-scaled histogram of sulfur oxide (in tons) with a fitted density curve. Shaded area of histogram: 0.29. Shaded area under the curve: 0.30.]

[Figure: density-scaled histogram of waiting time between eruptions (min).]

Median and mean of a density curve

Median: the equal-areas point, with 50% of the mass on either side.
Mean: the balancing point of the curve, if it were a solid mass.

Note:
- The mean and median of a symmetric density curve are equal.
- The mean of a skewed curve is pulled away from the median in the direction of the long tail.

The mean and standard deviation of a density are denoted µ and σ, rather than x̄ and s, to indicate that they refer to an idealized model and not to actual data.

Normal distributions: N(µ, σ)

The normal distribution is symmetric, single-peaked, and bell-shaped. Its density curve is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right).$$

It is determined by two parameters, µ and σ:
- µ is the mean (also the median)
- σ is the standard deviation

Note: The points where the curve changes from concave to convex lie σ units from µ in either direction.
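As a rough check of the density formula, the following Python sketch evaluates it directly and approximates the total area under the curve with a midpoint rule. The function names and the values µ = 110, σ = 25 are illustrative assumptions, not part of the lecture; `statistics.NormalDist` is used only to compare against a library value.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Density of N(mu, sigma) at x, where sigma is the standard deviation."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_density(mu, sigma, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the area under the density on [lo, hi]."""
    width = (hi - lo) / steps
    heights = (normal_density(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width


mu, sigma = 110.0, 25.0  # illustrative parameter values (assumption)

# A density curve must have total area one; integrate over mu +/- 8 sigma.
total_area = area_under_density(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)

# Compare the hand-written formula with the standard library at one point.
library_value = NormalDist(mu, sigma).pdf(100.0)
formula_value = normal_density(100.0, mu, sigma)
```

Both checks should agree to within floating-point and integration error, which is one way to confirm that the reconstructed formula describes a valid density.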

The 68-95-99.7 rule

- About 68% of the data fall inside (µ − σ, µ + σ).
- About 95% of the data fall inside (µ − 2σ, µ + 2σ).
- About 99.7% of the data fall inside (µ − 3σ, µ + 3σ).
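The percentages in the rule are rounded. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`, computes the proportion within k standard deviations of the mean; the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(0.0, 1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


within = {k: proportion_within(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Because standardization removes µ and σ, the same three numbers apply to every normal distribution.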

Example

Scores on the Wechsler Adult Intelligence Scale (WAIS) for the 20 to 34 age group are approximately N(110, 25).
- About what percent of people in this age group have scores above 110?
- About what percent have scores above 160?
- In what range do the middle 95% of all scores lie?
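One way to check answers to the WAIS questions is sketched below. It uses the 68-95-99.7 rule for the rough range and `statistics.NormalDist` for the more precise values; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

wais = NormalDist(mu=110, sigma=25)

# 110 is the mean (and median), so half of the scores lie above it.
above_110 = 1 - wais.cdf(110)

# 160 = 110 + 2 * 25 is two standard deviations above the mean.
# The rule gives about (100% - 95%) / 2 = 2.5%; the exact tail is a bit smaller.
above_160 = 1 - wais.cdf(160)

# Middle 95% by the rule: mean plus or minus two standard deviations.
rule_range = (110 - 2 * 25, 110 + 2 * 25)

# Middle 95% from the exact 2.5% and 97.5% quantiles.
exact_range = (wais.inv_cdf(0.025), wais.inv_cdf(0.975))

print(f"above 110: {above_110:.3f}")
print(f"above 160: {above_160:.4f}")
print(f"rule range: {rule_range}")
print(f"exact range: ({exact_range[0]:.1f}, {exact_range[1]:.1f})")
```

The rule-based range (60, 160) and the quantile-based range differ by about one point at each end because 2 is a rounded stand-in for 1.96.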

Standardization and z-scores

Linear transformation of normal distributions (for a ≠ 0):

$$X \sim N(\mu, \sigma) \;\Rightarrow\; aX + b \sim N(a\mu + b, |a|\sigma)$$

In particular it follows that

$$\frac{X - \mu}{\sigma} \sim N(0, 1).$$

N(0, 1) is called the standard normal distribution.

For a real number x, the standardized value or z-score

$$z = \frac{x - \mu}{\sigma}$$

tells how many standard deviations x is from µ, and in what direction.

Standardization enables us to use a standard normal table to find probabilities for any normal variable. For example:
- What is the proportion of N(0, 1) observations less than 1.2?
- What is the proportion of N(3, 1.5) observations greater than 5?
- What is the proportion of N(10, 5) observations between 3 and 9?
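The three example questions can be worked by standardizing first and then using only the standard normal distribution, which mirrors a table lookup. This is a sketch with `statistics.NormalDist` standing in for the table; `z_score` and the variable names are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # plays the role of the standard normal table


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from mu (signed)."""
    return (x - mu) / sigma


# Proportion of N(0, 1) observations less than 1.2.
p_less = STD_NORMAL.cdf(z_score(1.2, 0, 1))

# Proportion of N(3, 1.5) observations greater than 5.
p_greater = 1 - STD_NORMAL.cdf(z_score(5, 3, 1.5))

# Proportion of N(10, 5) observations between 3 and 9.
p_between = STD_NORMAL.cdf(z_score(9, 10, 5)) - STD_NORMAL.cdf(z_score(3, 10, 5))

print(f"{p_less:.4f} {p_greater:.4f} {p_between:.4f}")
```

Working directly with `NormalDist(3, 1.5)` or `NormalDist(10, 5)` gives the same probabilities, which is the point of standardization: one table serves every normal distribution.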

Normal calculations

Standard normal calculations
1. State the problem in terms of x.
2. Standardize: z = (x − µ)/σ.
3. Look up the required value(s) in the standard normal table.
4. Reality check: does the answer make sense?

Backward normal calculations

We can also calculate the values, given the probabilities: if MPG ~ N(25.7, 5.88), what is the minimum MPG required to be in the top 10%?

1. State the problem in terms of the probability of being less than some number.
2. Look up the required value(s) in the standard normal table.
3. Unstandardize, i.e. solve z = (x − µ)/σ for x.
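The backward calculation for the MPG question can be sketched in Python as follows. The function name `backward_normal` is hypothetical, and `NormalDist().inv_cdf` replaces the reverse table lookup.

```python
from statistics import NormalDist


def backward_normal(p_below, mu, sigma):
    """Value x such that a proportion p_below of N(mu, sigma) lies below x."""
    z = NormalDist().inv_cdf(p_below)  # step 2: reverse lookup on N(0, 1)
    return mu + z * sigma              # step 3: unstandardize


# Step 1: "top 10%" means 90% of vehicles have a lower MPG.
min_mpg = backward_normal(0.90, mu=25.7, sigma=5.88)
print(f"minimum MPG for the top 10%: {min_mpg:.2f}")
```

As a reality check, the result is a little more than one standard deviation above the mean, which is plausible because about 16% of a normal distribution lies above µ + σ.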

Example

Suppose X ~ N(0, 1).
- P(X ≤ 2) = ?
- P(X > 2) = ?
- P(−1 ≤ X ≤ 2) = ?

Find the value z such that
- P(X ≤ z) = 0.95
- P(X > z) = 0.99
- P(−z ≤ X < z) = 0.68
- P(−z ≤ X < z) = 0.95
- P(−z ≤ X < z) = 0.997

Suppose X ~ N(10, 5).
- P(X < 5) = ?
- P(−3 < X < 5) = ?
- Find x such that P(−x < X < x) = 0.95.
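For the symmetric-interval questions on N(0, 1), a central probability p leaves (1 − p)/2 in each tail, so the required z is the (1 + p)/2 quantile. The sketch below applies that idea; `symmetric_z` is an illustrative name, not something from the lecture.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def symmetric_z(p):
    """z such that P(-z <= X < z) = p for X ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf((1 + p) / 2)


# One-sided questions use the quantile function directly.
z_below_95 = STD_NORMAL.inv_cdf(0.95)      # P(X <= z) = 0.95
z_above_99 = STD_NORMAL.inv_cdf(1 - 0.99)  # P(X > z) = 0.99, so z is negative

# Central probabilities from the example.
central = {p: symmetric_z(p) for p in (0.68, 0.95, 0.997)}
for p, z in central.items():
    print(f"P(-z <= X < z) = {p}: z = {z:.3f}")
```

The three central values come out close to 1, 2, and 3, which is the 68-95-99.7 rule read backwards.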

Assessing Normality

How to make a normal quantile plot
1. Arrange the data in increasing order.
2. Record the percentiles (1/n, 2/n, ..., n/n).
3. Find the z-scores for these percentiles.
4. Plot x on the vertical axis against z on the horizontal axis.

Use of normal quantile plots
- If the data are (approximately) normal, the plot will be close to a straight line.
- Systematic deviations from a straight line indicate a nonnormal distribution.
- Outliers appear as points that are far away from the overall pattern of the plot.

[Figure: normal quantile plots (sample quantiles against theoretical quantiles) for samples from N(0, 1), Exp(1), and U(0, 1).]
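The four plotting steps can be sketched as a function that returns the (z, x) pairs to plot. One assumption departs from the percentiles as listed: the sketch uses (i − 0.5)/n instead of i/n, because the percentile n/n = 1 has no finite z-score. The sample values are made up for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, x) pairs for a normal quantile plot of the data."""
    xs = sorted(data)                      # step 1: increasing order
    n = len(xs)
    std_normal = NormalDist(0.0, 1.0)
    points = []
    for i, x in enumerate(xs, start=1):
        percentile = (i - 0.5) / n         # step 2, shifted to stay inside (0, 1)
        z = std_normal.inv_cdf(percentile) # step 3: z-score of the percentile
        points.append((z, x))              # step 4: z horizontal, x vertical
    return points


sample = [4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.1]  # made-up data (assumption)
points = normal_quantile_points(sample)
for z, x in points:
    print(f"{z:6.3f}  {x}")
```

Other plotting positions are in common use; the half-step shift here is only one simple choice that keeps every percentile strictly between 0 and 1.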

Density estimation

The normal density is just one possible density curve. There are many others, some with compact mathematical formulas and many without.

Density estimation software fits an arbitrary density to data to give a smooth summary of the overall pattern.

[Figure: estimated density of the velocity of galaxies (1000 km/s).]
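The lecture does not say which estimation method its software uses. One common approach is a Gaussian kernel density estimate, sketched below under that assumption: each observation contributes a small normal bump, and the bumps are averaged. The data values and the bandwidth are made up and are not the galaxy velocities from the figure.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function that averages normal bumps centered at the data."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        return total / norm

    return density


# Made-up illustrative values, NOT the data behind the lecture's figure.
velocities = [9.2, 9.6, 10.1, 19.3, 19.9, 20.4, 21.0, 22.1, 23.5, 33.0]
f_hat = gaussian_kde(velocities, bandwidth=2.0)

# Midpoint-rule check that the estimate is itself a density (area close to one).
lo, hi, steps = -10.0, 50.0, 1200
step = (hi - lo) / steps
area = sum(f_hat(lo + (i + 0.5) * step) for i in range(steps)) * step
```

The bandwidth controls the smoothness: a larger value blurs clusters together, while a smaller value produces a bumpier curve that follows individual observations.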

Histogram

How to scale a histogram?

Easiest way to draw a histogram:
- equally spaced bins
- counts on the vertical axis

[Figure: frequency histogram of Sosa home runs, bins of width 10 from 0 to 70.]

Disadvantage: the scaling depends on the number of observations and the bin width.

Scale the histogram such that the area of each bar corresponds to the proportion of data:

$$\text{height} = \frac{\text{counts}}{\text{width} \times \text{total number}}$$

[Figure: density-scaled histogram of Sosa home runs, bins of width 10 from 0 to 70.]

Proportion of data in the interval (0, 10]:

height × width = 0.02 × 10 = 0.2 = 20%

Since n = 15, this corresponds to 3 observations.
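The rescaling can be sketched as a short function. Only n = 15, the bin width of 10, and the count of 3 in the first bin come from the slide; the remaining bin counts are hypothetical placeholders chosen to sum to 15.

```python
def density_heights(counts, bin_width):
    """Bar heights such that each bar's area equals its proportion of the data."""
    total = sum(counts)
    return [count / (bin_width * total) for count in counts]


# First count (3) and the total (15) match the slide; the other counts are
# hypothetical and do NOT reproduce the actual home run data.
counts = [3, 1, 1, 2, 3, 1, 4]
bin_width = 10

heights = density_heights(counts, bin_width)
first_bin_proportion = heights[0] * bin_width    # area of the first bar
total_area = sum(h * bin_width for h in heights) # all bar areas together

print(f"height of first bar: {heights[0]:.2f}")
print(f"proportion in (0, 10]: {first_bin_proportion:.1f}")
```

With this scaling the total area of the bars is one, so the histogram can be compared directly with a density curve regardless of sample size or bin width.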

Density curves

[Figure: density-scaled histograms of samples of size n = 250, n = 2500, and n = 250000 on the range −4 to 4, together with the limiting density curve.]

Proportion of data in (1, 2]:

$$\frac{\#\{x_i : 1 < x_i \le 2\}}{n} \;\xrightarrow{\;n \to \infty\;}\; \int_1^2 f(x)\,dx$$

Probability that a new observation X falls into [a, b]:

$$P(a \le X \le b) = \int_a^b f(x)\,dx = \lim_{n \to \infty} \frac{\#\{x_i : a \le x_i \le b\}}{n}$$
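The convergence of sample proportions to an area under the density can be illustrated with a small simulation. This sketch assumes standard normal data and the same three sample sizes as the figure; the seed and names are arbitrary choices.

```python
import random
from statistics import NormalDist


def proportion_in_interval(sample, lo, hi):
    """Fraction of observations x with lo < x <= hi."""
    return sum(1 for x in sample if lo < x <= hi) / len(sample)


# Area under the N(0, 1) density between 1 and 2 (the limiting value).
limit = NormalDist().cdf(2) - NormalDist().cdf(1)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (250, 2500, 250_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    results[n] = proportion_in_interval(sample, 1, 2)

for n, proportion in results.items():
    print(f"n = {n:>6}: proportion = {proportion:.4f}, limit = {limit:.4f}")
```

The proportions for small n can be noticeably off, while the largest sample should land very close to the integral, which is what justifies treating the area as a probability.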