Probability histograms

Percentiles

For a given histogram, we fix a percentage and find the value below which that percentage of the observations falls. This value is called the percentile. For example, take the histogram of family incomes in the US according to the last census. The first percentile is the income such that 1% of the families make less than that amount.

Q: The Math SAT scores among the applicants to a certain university have an average of 535 and an SD of 100. Assuming that the scores follow the normal curve, find the 95th percentile of the score distribution.

A: We need to find the score such that 95% of the scores are below it, according to the normal curve. So, if $z$ is the 95th percentile in standard units, the area under the curve over $(-\infty, z)$ must equal 95%. This is equivalent to finding $z$ such that the interval $(-z, z)$ has an area under the curve of 90%. According to the normal table, $z \approx 1.65$. So students who are 1.65 standard units above the average are at the 95th percentile. This corresponds to $1.65 \times 100 = 165$ points above average, or $535 + 165 = 700$ points.
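The table lookup above can also be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the variable names are illustrative only, and the result differs slightly from 700 because the table value 1.65 is rounded.

```python
from statistics import NormalDist

# Scores are assumed to follow a normal curve with the average and SD from the question.
scores = NormalDist(mu=535, sigma=100)

# z-value with 95% of the area under the standard normal curve to its left.
z_95 = NormalDist().inv_cdf(0.95)

# Convert from standard units back to points.
percentile_95 = scores.mean + z_95 * scores.stdev

print(z_95, percentile_95)
```

The unrounded z-value is a little below 1.65, so the computed percentile lands just under the 700 points obtained from the table.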

Probability histograms

Consider the box

1 1 1 1 3 4 4

Then the chances of obtaining a ticket with a 1 are 4/7, the chances of a 3 are 1/7 and the chances of a 4 are 2/7. We can display that information graphically in a probability histogram.

[Figure: probability histogram of the box. Horizontal axis: ticket value (1 to 4); vertical axis: density (0.0 to 0.5).]

Each rectangle of the histogram is centered at a number and its area corresponds to the probability of that number. The sum of the areas of the rectangles is equal to one, because the areas represent probabilities or chances. Probability histograms are used to represent chance.
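As a small illustration, the chances for the box 1 1 1 1 3 4 4 can be tabulated directly from its contents. This sketch uses exact fractions so that the total area can be checked to be exactly one; the names `box` and `chances` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Contents of the box from the example.
box = [1, 1, 1, 1, 3, 4, 4]

# Chance of each ticket value = (number of such tickets) / (number of tickets).
counts = Counter(box)
chances = {value: Fraction(count, len(box)) for value, count in sorted(counts.items())}

for value, chance in chances.items():
    print(value, chance)

# The areas of all rectangles must add up to one.
total_area = sum(chances.values())
print(total_area)
```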

Histograms based on sampled data represent how the data are distributed over their range. Probability histograms represent the chances that a random variable takes specific values.

Empirical histograms, based on the frequencies of observed outcomes of an experiment, converge to the corresponding probability histogram as the number of repetitions grows, as can be seen in the example of rolling two dice.

[Figure: three panels titled "100 repetitions", "1000 repetitions" and "Probability histogram". Horizontal axis: sum of two dice (2 to 12); vertical axis: density.]
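The convergence can be explored with a short simulation. The sketch below is one possible setup, not the procedure used to produce the slides: it compares the observed frequencies of the sum of two dice with the exact chances and reports the largest gap between the two. The function names and the seed are arbitrary choices.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

def exact_sum_chances():
    """Exact chance of each sum, from the 36 equally likely pairs of faces."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}

def empirical_sum_chances(repetitions, rng):
    """Observed frequency of each sum over the given number of simulated rolls."""
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(repetitions))
    return {s: counts[s] / repetitions for s in range(2, 13)}

def largest_gap(repetitions, rng):
    """Largest difference between an observed frequency and the exact chance."""
    exact = exact_sum_chances()
    observed = empirical_sum_chances(repetitions, rng)
    return max(abs(observed[s] - float(exact[s])) for s in exact)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 1000, 100000):
    print(n, round(largest_gap(n, rng), 4))
```

The gap is random, so it need not shrink at every step, but it tends to become smaller as the number of repetitions grows.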

In the previous example, consider taking the product of the two dice instead of the sum. The convergence also holds for the product. In this case we notice that the probability histogram is much more irregular than the one obtained for the sum. The regularity is a general feature related to the sum.

[Figure: three panels titled "100 rolls", "10000 rolls" and "Probability histogram". Horizontal axis: product of two dice (0 to 35); vertical axis: density.]
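One way to see the irregularity is to list the exact chances for the product. In this sketch (names are illustrative), many values between 1 and 36 turn out to be impossible, which leaves gaps in the histogram.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Exact chance of each product, from the 36 equally likely pairs of faces.
counts = Counter(a * b for a, b in product(range(1, 7), repeat=2))
product_chances = {v: Fraction(c, 36) for v, c in sorted(counts.items())}

# Values between 1 and 36 that can never occur as a product of two faces.
impossible = [v for v in range(1, 37) if v not in product_chances]

for value, chance in product_chances.items():
    print(value, chance)
print("impossible values:", impossible)
```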

Consider the problem of tossing a fair coin a certain number of times $n$. We can obtain the probability histogram of the number of tails for each $n$.

[Figure: three panels titled "10 tosses", "100 tosses" and "1000 tosses". Horizontal axis: number of tails; vertical axis: density.]

We observe that the probability histogram of the number of tails approaches a very regular curve as the number of tosses increases. This curve is a common probability density known as the Gaussian (normal) curve.
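The heights of these histograms can be computed exactly with binomial coefficients. The sketch below assumes a fair coin, so the counts of heads and of tails have the same distribution; `heads_chances` is a hypothetical helper name.

```python
from math import comb

def heads_chances(n):
    """Chance of each possible number of heads (0 to n) in n tosses of a fair coin."""
    return [comb(n, k) / 2**n for k in range(n + 1)]

for n in (10, 100, 1000):
    chances = heads_chances(n)
    # Most likely count and its chance: the peak of the probability histogram.
    peak = max(range(n + 1), key=lambda k: chances[k])
    print(n, peak, round(chances[peak], 4))
```

The peak sits at half the number of tosses, and its height falls as n grows because the probability is spread over more values.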

Using the normal approximation

We can approximate the probability histogram of the number of heads in a large number of coin tosses using the normal curve.

Q: A coin is tossed 100 times. What is the probability of getting exactly 50 heads?

A: We can look at the probability histogram for this case. The chance of exactly 50 heads is equal to the area of the rectangle whose base runs from 49.5 to 50.5. The area of this rectangle is 7.96%.

Q: What about an approximation using the normal curve?

A: The first step is to calculate the mean and standard deviation. Consider a box model with a ticket marked 0 for tails and a ticket marked 1 for heads:

0 1

Average of the box: $\frac{1}{2}$. SD of the box: $\frac{1}{2}$.

When drawing a ticket from this box 100 times with replacement, the expected value of the sum of the draws is

$100 \times \frac{1}{2} = 50$

In general, the expected value of the sum of the draws is given by

$\text{(number of draws)} \times \text{(average of box)}$

The standard error of the sum of the draws is given by the square root law

$\sqrt{\text{number of draws}} \times \text{(SD of box)}$

where SD of box stands for the standard deviation of the list of numbers in the box.
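The two formulas translate directly into code. This is a minimal sketch with a hypothetical helper `sum_of_draws`; it uses `statistics.pstdev`, the population standard deviation, because the SD of the box is the SD of the full list of tickets.

```python
from math import sqrt
from statistics import mean, pstdev

def sum_of_draws(box, draws):
    """Expected value and standard error of the sum of draws made with replacement."""
    expected = draws * mean(box)        # (number of draws) x (average of box)
    se = sqrt(draws) * pstdev(box)      # square root law
    return expected, se

# Box with one ticket marked 0 (tails) and one marked 1 (heads), 100 draws.
expected, se = sum_of_draws([0, 1], 100)
print(expected, se)
```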

The standard error for the sum of the draws is given by

$\sqrt{100} \times \frac{1}{2} = 5$

Now we have to convert the endpoints of the base of the rectangle to standard units:

$\frac{49.5 - 50}{5} = -0.1 \qquad \frac{50.5 - 50}{5} = 0.1$

So the normal approximation is the area under the normal curve over the interval $(-0.1, 0.1)$. According to the table, this is about 7.97%.

Q: What are the approximate chances of getting between 45 and 55 heads inclusive?

A: The probability of getting between 45 and 55 heads is equal to the sum of the areas of the rectangles from 45 to 55 in the probability histogram. This is approximated by the area under the normal curve over the interval (44.5, 55.5). In standard units this corresponds to the interval $(-1.1, 1.1)$, which has a probability of 72.87% according to the table.
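For the question about exactly 50 heads, the exact area of the rectangle and its normal approximation can be compared side by side. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

# Exact chance of exactly 50 heads in 100 tosses of a fair coin.
exact = comb(100, 50) / 2**100

# Normal curve with the expected value (50) and standard error (5) of the number of heads.
curve = NormalDist(mu=50, sigma=5)

# Area under the curve over the base of the rectangle, from 49.5 to 50.5.
approx = curve.cdf(50.5) - curve.cdf(49.5)

print(exact, approx)
```

The two values agree to about three decimal places, which is why the table lookup is a good substitute for the exact calculation here.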

Q: What are the approximate chances of getting between 45 and 55 heads exclusive?

A: This time the probability is given by the sum of the areas of the rectangles from 46 to 54, which is approximately the area under the curve over the interval (45.5, 54.5). This is the interval $(-0.9, 0.9)$ in standard units, which has a probability of 63.19%.

Very often it is not specified whether the endpoints are included or not. In that case we use the given interval directly. So, for the previous example, we would take (45, 55), which converts to $(-1, 1)$ in standard units and yields a probability of 68.27%.
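The three choices of interval can be compared numerically. The sketch below assumes the normal curve with mean 50 and standard error 5 from the coin example, and adds the exact binomial chance for the inclusive case as a reference point; `area` is a hypothetical helper.

```python
from math import comb
from statistics import NormalDist

curve = NormalDist(mu=50, sigma=5)

def area(low, high):
    """Area under the normal curve between low and high."""
    return curve.cdf(high) - curve.cdf(low)

inclusive = area(44.5, 55.5)    # 45 to 55 heads, endpoints included
exclusive = area(45.5, 54.5)    # 46 to 54 heads, endpoints excluded
unspecified = area(45, 55)      # endpoints not specified: use the interval as given

# Exact chance of 45 to 55 heads inclusive, for comparison.
exact_inclusive = sum(comb(100, k) for k in range(45, 56)) / 2**100

print(inclusive, exclusive, unspecified, exact_inclusive)
```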

When can we use the normal approximation?

Consider the box

1 2 9

The probability histogram for the tickets in the box is far from normal. Nevertheless, if we draw tickets from the box many times and sum the results, then the probability histogram of the sum will be approximated by the normal curve.

What if we consider the product of the tickets? In that case the probability histogram will not be approximated by a normal curve, no matter how many draws from the box we take.

[Figure: probability histogram. Horizontal axis: values 1 to 9; vertical axis: density (0.00 to 0.30).]

The Central Limit Theorem

In general, the probability histogram of the sum of a large number of draws from a box of tickets will be approximated by the normal curve. This is a mathematical fact that can be stated and proved as a theorem.

The Central Limit Theorem. When drawing at random with replacement from a box, the probability histogram for the sum (in standard units) will follow a normal curve, in the limit. This holds even if the probability histogram of the contents of the box is not approximately normal.

The reason why the CLT is used as an approximation for distributions of lists of numbers is that the uncertainty in the data can often be thought of as the sum of several sources of randomness.
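A simulation gives a rough check of the theorem for a box that is far from normal. The sketch below uses the box 1 2 9, 400 draws per sum and 5000 repetitions; these numbers, the seed and the function name are arbitrary choices for illustration. If the normal curve is a good approximation, roughly 68% of the sums should fall within one standard error of the expected value.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def share_within_one_se(box, draws, repetitions, rng):
    """Fraction of simulated sums that land within one SE of the expected value."""
    expected = draws * mean(box)
    se = sqrt(draws) * pstdev(box)
    hits = 0
    for _ in range(repetitions):
        total = sum(rng.choices(box, k=draws))  # draws made with replacement
        if abs(total - expected) <= se:
            hits += 1
    return hits / repetitions

rng = random.Random(0)  # fixed seed so the run is repeatable
share = share_within_one_se([1, 2, 9], draws=400, repetitions=5000, rng=rng)
print(share)
```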

Q: Four hundred draws will be made at random with replacement from the box

1 3 5 7

Estimate the chance that the sum of the draws will be more than 1,500.

A: The average of the box is 4 and the SD is about 2.24. The expected value for the sum is $400 \times 4 = 1{,}600$ and the SE is $\sqrt{400} \times 2.24 \approx 45$. Converting 1,500 to standard units we have

$\frac{1{,}500 - 1{,}600}{45} \approx -2.22$

According to the normal curve, the chance of being above $-2.22$ is about 99%.
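The same steps can be carried out without rounding the SE. This sketch uses the standard library only; because the SE is not rounded up to 45, the z-value comes out near $-2.24$ rather than $-2.22$, but the chance is still about 99%.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 3, 5, 7]
draws = 400

expected = draws * mean(box)        # expected value of the sum
se = sqrt(draws) * pstdev(box)      # standard error of the sum
z = (1500 - expected) / se          # 1,500 in standard units

# Area under the standard normal curve to the right of z.
chance_above = 1 - NormalDist().cdf(z)

print(expected, se, z, chance_above)
```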

Q: Estimate the chance that there will be fewer than 90 3's.

A: The number of 3's is like the sum of 400 draws from the box

1 0 0 0

where the ticket marked 1 corresponds to the 3 and the three tickets marked 0 correspond to the other numbers. The average of this box is 1/4 and the SD is about 0.433. Thus the expected number of 3's is $400 \times \frac{1}{4} = 100$ and the SE is $\sqrt{400} \times 0.433 \approx 8.66$. Converting 90 to standard units we have

$\frac{90 - 100}{8.66} \approx -1.15$

According to the normal curve, the chance of being below $-1.15$ is about 12%.
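Counting 3's amounts to recoding the tickets as 0's and 1's and then summing. A minimal sketch of that recoding, with illustrative names, follows; like the worked answer, it uses the cutoff 90 directly, without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 3, 5, 7]
draws = 400

# Recode the box: 1 for the ticket being counted (the 3), 0 for every other ticket.
counting_box = [1 if ticket == 3 else 0 for ticket in box]

expected = draws * mean(counting_box)       # expected number of 3's
se = sqrt(draws) * pstdev(counting_box)     # standard error of the count
z = (90 - expected) / se                    # 90 in standard units

# Area under the standard normal curve to the left of z.
chance_below = NormalDist().cdf(z)

print(counting_box, expected, se, z, chance_below)
```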