The basics of probability theory. Distribution of variables, some important distributions

Similar documents
Section 5 Part 2. Probability Distributions for Discrete Random Variables

An Introduction to Basic Statistics and Probability

E3: PROBABILITY AND STATISTICS lecture notes

4. Continuous Random Variables, the Pareto and Normal Distributions

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

AMS 5 CHANCE VARIABILITY

Chapter 4. Probability and Probability Distributions

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Characteristics of Binomial Distributions

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Normal distribution. ) 2 /2σ. 2π σ

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

7. Normal Distributions

You flip a fair coin four times, what is the probability that you obtain three heads.

REPEATED TRIALS. The probability of winning those k chosen times and losing the other times is then p k q n k.

6.4 Normal Distribution

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Probability and statistics; Rehearsal for pattern recognition

Exploratory data analysis (Chapter 2) Fall 2011

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1

The Binomial Probability Distribution

The Normal Distribution

ACMS Section 02 Elements of Statistics October 28, Midterm Examination II

WEEK #22: PDFs and CDFs, Measures of Center and Spread

MATH 140 Lab 4: Probability and the Standard Normal Distribution

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Probability Distributions

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability. Distribution. Outline

Stat 20: Intro to Probability and Statistics

16. THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

THE BINOMIAL DISTRIBUTION & PROBABILITY

A review of the portions of probability useful for understanding experimental design and analysis.

Descriptive Statistics and Measurement Scales

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

ACMS Section 02 Elements of Statistics October 28, 2010 Midterm Examination II Answers

ST 371 (IV): Discrete Random Variables

Introduction to Probability

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Variables. Exploratory Data Analysis

Sample Term Test 2A. 1. A variable X has a distribution which is described by the density curve shown below:

Unit 19: Probability Models

Lecture Notes Module 1

9. Sampling Distributions

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

How To Write A Data Analysis

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Review for Test 2. Chapters 4, 5 and 6

Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Chapter 4 Lecture Notes

The Normal Distribution

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Statistics 151 Practice Midterm 1 Mike Kowalski

Means, standard deviations and. and standard errors

Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture Note 1 Set and Probability Theory. MIT Spring 2006 Herman Bennett

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

MEASURES OF VARIATION

CONTINGENCY (CROSS- TABULATION) TABLES

Statistics. Measurement. Scales of Measurement 7/18/2012

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, cm

DETERMINE whether the conditions for a binomial setting are met. COMPUTE and INTERPRET probabilities involving binomial random variables

Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. Part 3: Discrete Uniform Distribution Binomial Distribution

Pr(X = x) = f(x) = λe λx

PROBABILITY AND SAMPLING DISTRIBUTIONS

Exploratory Data Analysis

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

Chapter 4. Probability Distributions

TEACHER NOTES MATH NSPIRED

Basic Probability Concepts

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Random variables, probability distributions, binomial random variable

EMPIRICAL FREQUENCY DISTRIBUTION

Section 6.2 Definition of Probability

Week 4: Standard Error and Confidence Intervals

Basics of Statistics

Unit 4 The Bernoulli and Binomial Distributions

The Normal Approximation to Probability Histograms. Dice: Throw a single die twice. The Probability Histogram: Area = Probability. Where are we going?

Sums of Independent Random Variables

6 3 The Standard Normal Distribution

II. DISTRIBUTIONS distribution normal distribution. standard scores

Lecture 8. Confidence intervals and the central limit theorem

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Mind on Statistics. Chapter 8

WEEK #23: Statistics for Spread; Binomial Distribution

Chapter 3 RANDOM VARIATE GENERATION

Descriptive Statistics

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Interpreting Data in Normal Distributions

Transcription:

The basics of probability theory. Distribution of variables, some important distributions 1

Random experiment The outcome is not determined uniquely by the considered conditions. For example, tossing a coin, rolling a dice, measuring the concentration of a solution, measuring the body weight of an animal, etc. are experiments. Every experiment has more, sometimes infinitely large outcomes 2

Event: the result (or outcome) of an experiment elementary events: the possible outcomes of an experiment. composite event: it can be divided into subevents. Example. The experiment is rolling a dice. Elementary events are 1,2,3,4,5,6. Composite events: E1={1,3,5} (the result is an odd number). E2={2,4,6} (the result is an even number). E3={5,6} (the result is greater than 4). Ω={1,2,3,4,5,6} (the result is the certain event). 3

Operations with events The complementary event of an event A is the event which occurs whenever A fails. Example: = ={2,4,6} The sum of two events A and B is the event A+B which occurs if A or B occurs. E1+E2={1,3,5}+{2,4,6}={1,2,3,4,5,6} E1+E3={1,3,5}+{5,6}={1,3,5,6} The product of two events A and B is the event AB which occurs if A and B occur. E1 E2={1,3,5} {2,4,6}= E1 E3={1,3,5} {5,6}={5} If A B =, then we say that A and B are mutually exclusive events. 4

The concept of probability 5

The concept of probability Lets repeat an experiment n times under the same conditions. In a large number of n experiments the event A is observed to occur k times (0 k n). k : frequency of the occurrence of the event A. k/n : relative frequency of the occurrence of the event A. 0 k/n 1 If n is large, k/n will approximate a given number. This number is called the probability of the occurrence of the event A and it is denoted by P(A). 0 P(A) 1 6

Example: the tossing of a (fair) coin k: number of head s n= 10 100 1000 10000 100000 k= 7 42 510 5005 49998 k/n= 0.7 0.42 0.51 0.5005 0.49998 P( head )=0.5 1 0.5 0 10 100 1000 10000 k/n 7

Probability facts Any probability is a number between 0 and 1. All possible outcomes together must have probability 1. The probability of the complementary event of A is 1-P(A). 8

Formula for the calculation of simple exact probabilities 9

Rules of probability calculus Assumption: all elementary events are equally probable F P(A) = = T number of favorite outcomes total number of outcomes Examples: Rolling a dice. What is the probability that the dice shows 5? If we let X represent the value of the outcome, then P(X=5)=1/6. What is the probability that the dice shows an odd number? P(odd)=1/2. Here F=3, T=6, so F/T=3/6=1/2. 10

Population, sample 11

Population, sample Population: the entire group of individuals that we want information about. Sample: a part of the population that we actually examine in order to get information A simple random sample of size n consists of n individuals chosen from the population in such a way that every set of n individuals has an equal chance to be in the sample actually selected. 12

Examples Sample data set Questionnaire filled in by a group of pharmacy students Blood pressure of 20 healthy women Population Pharmacy students Students Blood pressure of women (whoever) 13

Sample Bar chart of relative frequencies of a categorical variable Population (approximates) Distribution of that variable in the population 100 Gender 80 77 Percent 60 40 20 23 0 male female 14

Sample Population (approximates) Histogram of relative frequencies of a continuous variable Distribution of that variable in the population 30 Body height 30 Body height 20 20 10 Frequency Std. Dev = 8.52 Mean = 170.4 0 N = 87.00 150.0 160.0 170.0 180.0 190.0 155.0 165.0 175.0 185.0 195.0 Body height Frequency 10 Std. Dev = 8.52 Mean = 170.4 0 N = 87.00 150.0 160.0 170.0 180.0 190.0 155.0 165.0 175.0 185.0 195.0 Body height 15

Sample Mean (x) Standard deviation (SD) Median Population (approximates) Mean μ (unknown) Standard deviation σ (unknown) Median (unknown) 30 Body height 20 10 Frequency Std. Dev = 8.52 Mean = 170.4 0 N = 87.00 150.0 160.0 170.0 180.0 190.0 155.0 165.0 175.0 185.0 195.0 Body height 16

Random variables, probability distributions (distribution of the population) 17

Random variables, probability distributions A random variable is a variable whose value is a numerical outcome of a random phenomenon. Notation: X, Y,.. Examples The experiment is tossing a coin. X(H)=1 and X(T)=2 Y(H)=-10 and Y(T)=10 The experiment is rolling a dice. Ω={1,2,3,4,5,6}. Let define the X to be the number shown on the dice. 18

Distribution of a discrete (categorical) random variable A discrete random variable X has finite number of possible values The probability distribution of X lists the values and their probabilities: Value of X: x 1 x 2 x 3 x n Probability: p 1 p 2 p 3 p n p i 0, p 1 + p 2 +p 3 +p n =1 19

Examples The experiment is tossing a coin. p 1 =0.5, p 2 =0.5 The experiment is rolling a dice. p 1 =1/6, p 2 =1/6,, p 6 =1/6 1 0.5 1 0.5 0 x1 x2 0 x1 x2 x3 x4 x5 x6 20

Example: rolling two dices Let random variable X be the sum of the two numbers shown on the two dices. P(X=1)=0, (X=1 is impossible) P(X=2)=1/36 (the only favourable event is (1,1), and the number of all possible event is 36. ) j=1 j=2 j=3 j=4 j=5 j=6 i=1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6) X 2 3 4 5 6 7 i=2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6) X 3 4 5 6 7 8 i=3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6) X 4 5 6 7 8 9 i=4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6) X 5 6 7 8 9 10 i=5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6) X 6 7 8 9 10 11 i=6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) X 7 8 9 10 11 12 6 5 4 /36 3 2 1 0 P(X=1) P(X=2) P(X=3) P(X=4) P(X=5) P(X=6) P(X=7) P(X=8) P(X=9) P(X=10) P(X=11) P(X=12) 21

Uniform discrete distributions: all p i -s are equal 1 1 0.5 0.5 0 x1 x2 0 x1 x2 x3 x4 x5 x6 Not uniform 6 5 4 /36 3 2 1 0 P(X=1) P(X=2) P(X=3) P(X=4) P(X=5) P(X=6) P(X=7) P(X=8) P(X=9) P(X=10) P(X=11) P(X=12) 22

The binomial distribution Let's consider an experiment A that may have only two possible mutually exclusive outcomes (success, failure) Let P(A)=p We now repeat the experiment n times, let X denote the absolute frequency of the event A. The probability that X will assume any given possible value k is expressed by the binomial formula 23

Example Number of successful cases Probability distribution Distribution function Probability of "success" 0 0.028247525 0.028247525 0.3 1 0.121060821 0.149308346 2 0.233474441 0.382782786 3 0.266827932 0.649610718 4 0.200120949 0.849731667 5 0.102919345 0.952651013 6 0.036756909 0.989407922 7 0.009001692 0.998409614 8 0.001446701 0.999856314 9 0.000137781 0.999994095 10 5.9049E-06 1 Összesen 1 Probability distribution Distribution function 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 1 2 3 4 5 6 7 8 9 10 11 0 0 1 2 3 4 5 6 7 8 9 10 11 Binomial distribution n=10, input p, k=0,1,,10 In a certain population the occurrence of some disease is p=0.3. What is the probability that examining n=10 patients, there will be exactly k=4 diseased? According to the formula, P(X=4)=10!/(4!6!)(0.3)4(0.7)6=210 0.0081 0.117649= 0.200120949. 24

Continuous random variable A continuous random variable X has takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The density curve is on the above the horizontal axis, and has area exactly 1 underneath it. The probability of any event is the area under the density curve and above the values of X that make up the event. b f ( x) dx = P( a < x < b) a f ( x) dx = 1 P(A) A 25

The density curve The density curve is an idealized description of the overall pattern of a distribution that smoothes out the irregularities in the actual data. The density curve is on the above the horizontal axis, and has area exactly 1 underneath it. The area under the curve and above any range of values is the proportion of all observations that fall in that range. P(A) A 26

The density curve 1. is on the above the horizontal axis: f(x) 0 The density curve 2. has area exactly 1 underneath it. f ( x) dx =1 a b 3. The area under the curve and above any range of values is the proportion of all observations that fall in that range. b f ( x) dx = P( a x < b) a 27

Special density curve: Normal distribution 30 Body height 20 10 Frequency 0 Std. Dev = 8.52 Mean = 170.4 N = 87.00 150.0 160.0 170.0 180.0 190.0 155.0 165.0 175.0 185.0 195.0 Body height 28

Special distribution of a continuous variable: The normal distribution The normal curve was developed mathematically in 1733 by DeMoivre as an approximation to the binomial distribution. His paper was not discovered until 1924 by Karl Pearson. Laplace used the normal curve in 1783 to describe the distribution of errors. Subsequently, Gauss used the normal curve to analyze astronomical data in 1809. The normal curve is often called the Gaussian distribution. The term bell-shaped curve is often used in everyday usage. 29

Special distribution of a continuous variable: The normal distribution N(μ,σ 2 ) The density curves are symmetric, single-peaked and bell-shaped The 68-95-99.7 rule. In the normal distribution with mean μ and standard deviation σ: 68% of the observations fall within σ of the mean μ 95% of the observations fall within 2σ of the mean μ 99.7% of the observations fall within 3σ of the mean μ 30

Normal distributions N(μ, σ 2 ) N(0,1) N(1,1) 0.6 Probability Density Function y=normal(x;0;1) 0.6 Probability Density Function y=normal(x;1;1) 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0.6 0.0-3 -2-1 0 1 2 3 Probability Density Function y=normal(x;0;2) 0.0-3 -2-1 0 1 2 3 0.5 0.4 0.3 0.2 0.1 0.0-3 -2-1 0 1 2 3 μ, σ : parameters (a parameter is a number that describes the distribution) N(0,2) 31

Imagination of the distribution given the sample mean and sample SD supposing normal distribution In the papers generally sample means and SD-s are published. We use these value to know the original distribution For example, age in years: 55.2 ± 15.7 39.5 70.9 68.26% of the data are supposed to be in this interval 95.44% of the data are supposed to be in the interval (23.8, 86.6) 32

Stent length per lesion (mm): 18.8 ± 10.5 Using these values as parameters, the normal distribution would be like this: Probability Density Function y=normal(x;18.8;10.5) 0.040 0.035 0.030 0.025 0.020 0.015 0.010 0.005 0.000-5 0 5 10 15 20 25 30 35 40 The original distribution of stent length was probably skewed, or if the distribution was normal, there were negative values in the data set. 33

Standard normal probabilities x (x): proportion of area to the left of x -4 0.00003-3 0.0013-2.58 0.0049 0.6 Probability Density Function y=normal(x;0;1) 1. -2.33 0.0099-2 0.0228-1.96 0.0250 0.5 0.4 0. 0. -1.65 0.0495 0.3-1 0.1587 0.2 0. 0 0.5 0.1 0. 1 0.8413 1.65 0.9505 1.96 0.975 0.0-3 -2-1 0 1 2 3-1.96 1.96 0. 2 0.9772 0.025 0.95 0.025 2.33 0.9901 2.58 0.9951 3 0.9987 4 0.99997 34

The central limit theorem 35

Distribution of sample means http://onlinestatbook.com/stat_sim/sampling_dist/index.html 36

The population is not normally distributed 37

The central limit theorem If the sample size n is large (say, at least 30), then the population of all possible sample means approximately has a normal distribution with mean µ and standard deviation σ no matter n what probability describes the population sampled 38

Tha standard error of mean (SE or SEM) σ is called the standard error of mean n Meaning: the dispersion of the sample means around the (unknown) population mean. 39

Calculation of the standard error from the standard deviation when σ is unknown Given x 1, x 2, x 3,, x n statistical sample, the stadard error can be calculated by SE = SD n = n i= 1 ( x i n( n x) 1) 2 It expresses the dispersion of the sample means around the (unknown) population mean. 40

Review questions What is the concept of probability? What is the formula for computing simple exact probabilities? When can it be used? What is the distribution of a discrete variable? What are the properties of a discrete distribution? What is the distribution of a continuous variable? What are the properties of the density function? What is the uniform distribution? What is the binomial distribution? What is the normal distribution? Parameters of the normal distribution. Properties of the normal distribution. 41

Problems 1. If we roll a dice, there are 6 possible outcomes. If X represents the value of the outcome, find the following probabilities: a) P(X=1);b) P(X>1); c) P(1<X<4) 2. A fair coin is tossed twice. List the possible outcomes? What is the probability of getting two tails? 3. For a standard normal distribution, find the following probabilities using the standard normal table: P(X<0)=. P(X>0)=.. P(X<1)=. P(X>1)=.. P(X<-1)=.. P(-1<X<1)= 42

Useful WEB pages http://onlinestatbook.com/rvls/index.html http://my.execpc.com/~helberg/statistics.html http://www.regentsprep.org/regents/math/algtri g/ats2/normallesson.htm 43