Probability Concepts. Marie Diener-West, PhD Johns Hopkins University

Similar documents
Summary Measures (Ratio, Proportion, Rate) Marie Diener-West, PhD Johns Hopkins University

Measures of Prognosis. Sukon Kanchanaraksa, PhD Johns Hopkins University

Probability. a number between 0 and 1 that indicates how likely it is that a specific event or set of events will occur.

PROBABILITY. Chapter. 0009T_c04_ qxd 06/03/03 19:53 Page 133

Use of the Chi-Square Statistic. Marie Diener-West, PhD Johns Hopkins University

How to Approach a Study: Concepts, Hypotheses, and Theoretical Frameworks. Lynda Burton, ScD Johns Hopkins University

Biostatistics and Epidemiology within the Paradigm of Public Health. Sukon Kanchanaraksa, PhD Marie Diener-West, PhD Johns Hopkins University

CONTINGENCY (CROSS- TABULATION) TABLES

Life Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD

Chapter 7 Probability. Example of a random circumstance. Random Circumstance. What does probability mean?? Goals in this chapter

Decision Making Under Uncertainty. Professor Peter Cramton Economics 300

Blood Type Probability O 0.42 A 0.43 B 0.11 AB 0.04

What do we mean by cause in public health?

Case-Control Studies. Sukon Kanchanaraksa, PhD Johns Hopkins University

Chapter 13 & 14 - Probability PART

Evaluation of Diagnostic and Screening Tests: Validity and Reliability. Sukon Kanchanaraksa, PhD Johns Hopkins University

Probability. Sample space: all the possible outcomes of a probability experiment, i.e., the population of outcomes

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

STA 371G: Statistics and Modeling

Lesson 3: Calculating Conditional Probabilities and Evaluating Independence Using Two-Way Tables

Economics and Economic Evaluation

Cohort Studies. Sukon Kanchanaraksa, PhD Johns Hopkins University

Section 6.2 Definition of Probability

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

2. Three dice are tossed. Find the probability of a) a sum of 4; or b) a sum greater than 4 (may use complement)

MATH 140 Lab 4: Probability and the Standard Normal Distribution

What is a P-value? Ronald A. Thisted, PhD Departments of Statistics and Health Studies The University of Chicago

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

3. Data Analysis, Statistics, and Probability

TOPIC P2: SAMPLE SPACE AND ASSIGNING PROBABILITIES SPOTLIGHT: THE CASINO GAME OF ROULETTE. Topic P2: Sample Space and Assigning Probabilities

Lecture 13. Understanding Probability and Long-Term Expectations

Homework 8 Solutions

36 Odds, Expected Value, and Conditional Probability

Chapter 5 Section 2 day f.notebook. November 17, Honors Statistics

Economic and Financial Considerations in Health Policy. Gerard F. Anderson, PhD Johns Hopkins University

Chapter What is the probability that a card chosen from an ordinary deck of 52 cards is an ace? Ans: 4/52.

Bayesian Tutorial (Sheet Updated 20 March)

Section 6-5 Sample Spaces and Probability

The study of probability has increased in popularity over the years because of its wide range of practical applications.

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

In the situations that we will encounter, we may generally calculate the probability of an event

V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPECTED VALUE

Basic Probability Theory II

RATIOS, PROPORTIONS, PERCENTAGES, AND RATES

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Multiple Year Cost Calculations

COMMON CORE STATE STANDARDS FOR

OPRE504 Chapter Study Guide Chapter 7 Randomness and Probability. Terminology of Probability. Probability Rules:

AP Stats - Probability Review

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Unit 2 Introduction to Probability

Noise. Patrick N. Breysse, PhD, CIH Peter S.J. Lees, PhD, CIH. Johns Hopkins University

Mind on Statistics. Chapter 4

Elements of probability theory

What is the purpose of this document? What is in the document? How do I send Feedback?

E3: PROBABILITY AND STATISTICS lecture notes

Basic Probability Concepts

Study Designs. Simon Day, PhD Johns Hopkins University

Section B. Some Basic Economic Concepts

1) The table lists the smoking habits of a group of college students. Answer: 0.218

Section 5 Part 2. Probability Distributions for Discrete Random Variables

What Is Probability?

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION


Probability and statistical hypothesis testing. Holger Diessel

Scientific Method. 2. Design Study. 1. Ask Question. Questionnaire. Descriptive Research Study. 6: Share Findings. 1: Ask Question.

Statistics in Geophysics: Introduction and Probability Theory

3.2 Conditional Probability and Independent Events

Unit 4 The Bernoulli and Binomial Distributions

Measure #112 (NQF 2372): Breast Cancer Screening National Quality Strategy Domain: Effective Clinical Care

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Probability definitions

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Introduction to Game Theory IIIii. Payoffs: Probability and Expected Utility

STAT 319 Probability and Statistics For Engineers PROBABILITY. Engineering College, Hail University, Saudi Arabia

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

Coin Flip Questions. Suppose you flip a coin five times and write down the sequence of results, like HHHHH or HTTHT.

Chapter 4: Probability and Counting Rules

DECISION MAKING UNDER UNCERTAINTY:

STA 256: Statistics and Probability I

Chapter 4. Probability and Probability Distributions

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Chapter 3. Probability

Cost-Benefit and Cost-Effectiveness Analysis. Kevin Frick, PhD Johns Hopkins University

RetirementWorks. proposes an allocation between an existing IRA and a rollover (or conversion) to a Roth IRA based on objective analysis;

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section:

Chapter 7: Effect Modification

Lesson 1. Basics of Probability. Principles of Mathematics 12: Explained! 314

Chapter 4 - Practice Problems 1

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

WOMEN AND RISK MADE SIMPLE

Probability. Section 9. Probability. Probability of A = Number of outcomes for which A happens Total number of outcomes (sample space)

Towards Business Process Standards. Anna O. Orlova, PhD Johns Hopkins Bloomberg School of Public Health

Probabilistic Strategies: Solutions

Unit 19: Probability Models

Lecture 25. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Survey Research: Choice of Instrument, Sample. Lynda Burton, ScD Johns Hopkins University

Latent Class Regression Part II

Transcription:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2008, The Johns Hopkins University and Marie Diener-West. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Probability Concepts Marie Diener-West, PhD Johns Hopkins University

Section A Definitions and Examples

What Is Probability? Probability provides a measure of the uncertainty (or certainty) associated with the occurrence of events or outcomes Probability is useful in exploring and quantifying relationships 4

What Do We Mean by Probability of 1 in 14 Million? Winning the Maryland Lottery Heads every time in 24 tosses of a coin 350 people in the world with a characteristic 17 people in the United States with a characteristic 5

Useful Set Notation and Definitions A set is a collection of objects The objects in a set are called the elements of the set The joint occurrence of two sets (the intersection) contains the common elements Two sets are mutually exclusive if the sets have no common elements 6

Intersection of Sets The intersection of two sets, A and B, is another set which consists of the common elements of both A and B (all elements belonging to both A and B) The intersection is the joint occurrence of A and B Notation: A and B, A B 7

Example of an Intersection of Events 40 women (characteristic A) 30 individuals aged 30 or older (characteristic B) Suppose we are told that there are 20 women aged 30 years or older The intersection of sets A and B is the set of 20 women aged 30 or older (A and B) The intersection is the joint occurrence of individuals who are women and of age 30 years or older 8

Example of Mutually Exclusive Events 60 women aged 60 or older (characteristic A) 20 women aged less than 60 (characteristic B) The intersection of the two sets, A and B, is empty because a woman cannot belong to both age groups Sets A and B are mutually exclusive (e.g., the two age groups are mutually exclusive) because individuals cannot belong to both age groups at the same time 9

Union of Sets The union of two sets, A and B, is another set which consists of all elements belonging to set A or to set B (or some may belong to both sets) Notation: A or B, A B 10

Example of a Union of Events 40 women (characteristic A) 30 individuals aged 30 or older (characteristic B) The intersection of sets A and B is the set of 20 women aged 30 or over (A and B) Sets A and B total to 70 But 20 individuals are common to both sets The union of sets A and B is the set of all individuals who are women or aged 30 The union of the these two sets (A or B) consists of the 50 unique individuals who have one or both characteristics (A alone or B alone or both A and B) 11

Intersections and Unions If you have a group of 40 women (set A) And you have a group of 30 individuals aged 30 or over (set B) 12

Intersections and Unions The intersection of sets A and B are the 20 women who are aged 30 or over 13

Intersections and Unions If you have a group of 40 women (set A) And you have a group of 30 individuals aged 30 or over (set B) 14

Intersections and Unions The union of sets A and B is determined by adding up the total numbers of each set... Set A = 40 + Set B = 30 Total = 70 15

Intersections and Unions... And then subtracting out the individuals common to both sets Set A = 40 + Set B = 30 Total = 70 - Common = 20 Union: 50 unique individuals 16

Classical Interpretation of Probability An event occurs in N mutually exclusive and equally likely ways If m events possess a characteristic, E, then the probability of the occurrence of E is equal to m/n P(E) = m/n 17

Example of the Classical Interpretation of Probability Flipping two fair coins (equal chances of obtaining a head or a tail) results in N=4 equally likely and mutually exclusive outcomes E 1 = HH, E 2 = HT, E 3 = TH, E 4 = TT The probability of each outcome is 1/4 or 0.25 P(E 1 )=P(E 2 )=P(E 3 )=P(E 4 )= 0.25 18

Relative Frequency Interpretation of Probability Some process or experiment is repeated for a large number of times, n If m events possess the characteristic, E, then the relative frequency of occurrence of E, m/n will be approximately equal to the probability of E 19

Example 1: Flip Two Coins 100 Times Possible Outcomes Frequency E 1 = HH 22 E 2 = HT 28 E 3 = TH 23 E 4 = TT 27 Total 100 P(E 1 ) = P(2 heads) is 0.25 using the classical interpretation of probability P(E 1 ) is approximately 22/100 = 0.22 using the relative frequency interpretation of probability 20

Example 2: Flip Two Coins 10,000 Times Possible Outcomes Frequency E 1 = HH 2,410 E 2 = HT 2,582 E 3 = TH 2,404 E 4 = TT 2,604 Total 10,000 P(E 1 ) = P(2 heads) is 0.25 using the classical interpretation of probability P(E 1 ) is approximately 2,410/10,000 = 0.24 using the relative frequency interpretation of probability 21

Rules of Probability with Mutually Exclusive Events When there are n mutually exclusive events (cannot occur together), the probability of any event is nonnegative P(Event i) 0 The sum of the probabilities of all mutually exclusive events equals 1 P(Event 1) + P(Event 2) + +P(Event n) = 1 If Event i and Event j are mutually exclusive events, then the probability of either Event i or Event j is the sum of the two probabilities P(Event i or Event j ) = P(Event i) + P(Event j) 22

Addition Rule of Probability General rule: if two events, A and B, are not mutually exclusive, then the probability that event A or event B occurs is: P(A or B) = P(A) + P(B) P(A and B) P(A B) = P(A) + P(B) P(A B) Special case: when two events, A and B, are mutually exclusive, then the probability that event A or event B occurs is: P(A or B) = P(A) + P(B) P(A B) = P(A) + P(B) since P(A and B) = 0 for mutually exclusive events 23

Joint Probability The joint probability of an event A and an event B is P(A B) = P(A and B) When events A and B are mutually exclusive, then P(A and B) = 0 24

Conditional Probability The conditional probability of an event A given an event B is present is: P(A I B) P(A B) = where P (B) 0 P(B) 25

Multiplication Rule of Probability j AjB) General rule: the multiplication rule specifies the joint probability as: P(A IB) = P(B)P(A B) Special case: When events A and B are independent, then: P(A B) = P(A) P(AIB) = P(A) P(B) 26

Example: Relationship between Gender and Age Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 27

Marginal Probabilities Associated with the Example Marginal probabilities can be calculated: P(Male) = 50/200 = 0.25 P(Female) = 150/200 = 0.75 P(Young) = 70/200 = 0.35 P(Older) = 130/200 = 0.65 28

Probability of Selecting an Older Female? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 Joint probability = P(Female and Older )= P( Female Older) = 110/200 = 0.55 29

Probability of Selecting a Female or Older Individual? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 By inspecting the table, we can see that the probability of the union: P( Female or Older) = P(Female Older) = (40+110+20)/200 = 0.85 30

Probability of Selecting a Female or Older Individual? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 Using the addition rule of probability: P( Female or Older) = P( Female) + P(Older) P(Female and Older) = 150/200 + 130/200 110/200 = 170/200=0.85 31

Probability of Selecting a Younger Male? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 Joint Probability = P( Male and Younger ) = 30/200 = 0.15 32

Probability of Selecting a Male or a Younger Individual? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 By inspecting the table, we can see that the probability of the union: P( Male or Younger) = (30+20+40)/200 = 0.45 33

Probability of Selecting a Male or a Younger Individual? Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 Using the addition rule of probability: P( Male or Younger) = P( Male) + P(Younger) P(Male and Younger) = 50/200 + 70/200 30/200 = 90/200=0.45 34

Are Two Characteristics, Sex and Age, Independent? Is sex independent of age in this group of patients? If sex and age are independent: Then the probability of being in a particular age group should be the same for both sexes In other words, the conditional probabilities should be equal Assess whether: P(Older given Male) = P(Older given Female) = P(Older) P(Older Male) = P(Older Female) = P(Older) 35

Conditional Probability for Males Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 P(Older given Male) = P(Older Male) =20/50 = 0.40 36

Conditional Probability for Females Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 P(Older given Female) = P(Older Female) =110/150= 0.73 37

Overall (Marginal Probability) Patients with Disease X Age Gender Young Older Total Male 30 20 50 Female 40 110 150 Total 70 130 200 P(Older) = 130/200= 0.65 38

Comparing Conditional Probabilities For males: P(Older Male) = 0.40 For females: P(Older Female) = 0.73 For all patients: P(Older) = 0.65 In this group of patients, age and sex are not independent because the probability of being in the older age group depends on gender! P(Older Male) P(Older Female) P(Older) 39

Summary Probabilities can describe certainty associated with an event or characteristic Types of events: Mutually exclusive events Independent events Addition and multiple rules of probability Types of probabilities: Marginal, joint, conditional Future applications of probability (e.g. screening tests) 40

Section B Using Probability in an Epidemiology Word Problem

Probability Word Problem In a certain population of women: 4% have had breast cancer 20% are smokers 3% are smokers who have had breast cancer From this problem, three probability statements may be stated A 2x2 table of cell counts and marginal totals can be constructed 42

Probability Statement 1 4% have had breast cancer P (breast cancer) = 0.04 Women with Breast Cancer Smoker Yes No Total Yes No Total 4 96 100 43

Probability Statement 2 20% are smokers P (smoker) = 0.20 Women with Breast Cancer Smoker Yes No Total Yes 20 No 80 Total 4 96 100 44

Probability Statement 3 Three percent are smokers who have had breast cancer P (breast cancer and smoker) = 3/100 = 0.03 Women with Breast Cancer Smoker Yes No Total Yes 3 20 No 80 Total 4 96 100 45

Completing the 2 x 2 Table for Word Problem Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 46

Marginal Probabilities P (breast cancer) = 0.04 P (no breast cancer) = 0.96 P (smoker) = 0.20 P (non-smoker) = 0.80 Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 47

Question: Breast Cancer or Smoking? Method 1 What is the probability that a woman selected at random from this population has had breast cancer or smokes? Method 1: one can see from the table that: P(Breast Cancer or Smoker) = P(Breast Cancer Smoker) = (3+1+17)/100 = 0.21 Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 48

Question: Breast Cancer or Smoking? Method 2 What is the probability that a woman selected at random from this population has had breast cancer or smokes? Method 2: using the addition rule: P(Breast Cancer or Smoker) = P(Breast Cancer) + P(Smoker) P(Breast Cancer and Smoker) = 0.04 + 0.20 0.03 = 0.21 = 21% Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 49

Answer: Breast Cancer or Smoking? The probability that a woman has had breast cancer or smokes is 0.21 or 21% This probability may be derived from either: Inspection of the 2 x 2 table The probability statement 50

Question: Is Breast Cancer Independent of Smoking? Is breast cancer independent of smoking in this population? In other words, is the probability of breast cancer the same for both smokers and non-smokers? Compare the separate (conditional) probabilities of breast cancer for the two smoking groups Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 51

Comparing Conditional Probabilities of Breast Cancer For smokers: P(Breast Cancer Smoker) = 3/20=0.15 For non-smokers: P(Breast Cancer Non-Smoker) = 1/80=0.01 For all women: P(Breast Cancer) = 4/100 = 0.04 Women with Breast Cancer Smoker Yes No Total Yes 3 17 20 No 1 79 80 Total 4 96 100 52

Answer: Breast Cancer Is Not Independent of Smoking In this group of patients, breast cancer and smoking are not independent because the probability of having breast cancer differs by smoking group Semantics it s the same to say: Breast cancer and smoking are not independent Breast cancer and smoking are dependent (for example, there appears to be an association The probability of breast cancer is increased in smokers P(Breast Cancer Smoker) P(Breast Cancer Non-Smoker) P(Breast Cancer) 53

Other Questions? What is the probability of having breast cancer? (marginal probability) What is the probability of being a smoker for a woman without breast cancer? (conditional probability) What is the joint probability of not smoking and not having breast cancer? (intersection) What is the probability of being a smoker or not being a smoker? (union) What is the joint probability of being both a smoker and a non-smoker? (intersection) 54