Chapter 11 R Probability Examples

Similar documents
Binomial Distribution n = 20, p = 0.3

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Solutions to Homework 6 Statistics 302 Professor Larget

Mean = (sum of the values / the number of the value) if probabilities are equal

Solutions to Homework 3 Statistics 302 Professor Larget

Important Probability Distributions OPRE 6301

E3: PROBABILITY AND STATISTICS lecture notes

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Multinomial and Ordinal Logistic Regression

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

Stat 5102 Notes: Nonparametric Tests and. confidence interval

STAT 360 Probability and Statistics. Fall 2012

Statistical Functions in Excel

Pr(X = x) = f(x) = λe λx

Math 58. Rumbos Fall Solutions to Review Problems for Exam 2

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

MULTIPLE REGRESSION EXAMPLE

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Binomial Sampling and the Binomial Distribution

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = Adverse impact as defined by the 4/5ths rule was not found in the above data.

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Stats on the TI 83 and TI 84 Calculator

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Normal distribution. ) 2 /2σ. 2π σ

Unit 26: Small Sample Inference for One Mean

Statistical Analysis of Proportions

Bootstrap Example and Sample Code

Normal Approximation. Contents. 1 Normal Approximation. 1.1 Introduction. Anthony Tanbakuchi Department of Mathematics Pima Community College

Solutions to Homework 10 Statistics 302 Professor Larget

Point and Interval Estimates

Variables. Exploratory Data Analysis

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Week 4: Standard Error and Confidence Intervals

Review of Random Variables

The Variability of P-Values. Summary

Chapter 7. One-way ANOVA

Exploratory Data Analysis

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Review #2. Statistics

Table of Useful R commands

Generalized Linear Models

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

AMS 5 CHANCE VARIABILITY

Confidence Intervals for One Standard Deviation Using Standard Deviation

Inference for two Population Means

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

Binary Diagnostic Tests Two Independent Samples

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Two-sample inference: Continuous data

Online 12 - Sections 9.1 and 9.2-Doug Ensley

11. Analysis of Case-control Studies Logistic Regression

3.4 Statistical inference for 2 populations based on two samples

WHERE DOES THE 10% CONDITION COME FROM?

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

Lecture 7: Continuous Random Variables

Random Variables. Chapter 2. Random Variables 1

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section:

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

Introduction to Quantitative Methods

4.3 Lagrange Approximation

Regression Analysis: A Complete Example

STRUTS: Statistical Rules of Thumb. Seattle, WA

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

2.6. Probability. In general the probability density of a random variable satisfies two conditions:

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

4. Continuous Random Variables, the Pareto and Normal Distributions

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Multivariate Logistic Regression

MULTIPLE REGRESSION WITH CATEGORICAL DATA

Introduction to General and Generalized Linear Models

STAT 35A HW2 Solutions

ECON1003: Analysis of Economic Data Fall 2003 Answers to Quiz #2 11:40a.m. 12:25p.m. (45 minutes) Tuesday, October 28, 2003

Chapter 23 Inferences About Means


II. DISTRIBUTIONS distribution normal distribution. standard scores

The Normal Approximation to Probability Histograms. Dice: Throw a single die twice. The Probability Histogram: Area = Probability. Where are we going?

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Association Between Variables

Name: Date: Use the following to answer questions 3-4:

Chapter 7 Section 7.1: Inference for the Mean of a Population

Using Stata for Categorical Data Analysis

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Psychology 205: Research Methods in Psychology

Simple Random Sampling

Dongfeng Li. Autumn 2010

Probability and statistical hypothesis testing. Holger Diessel

Chapter 4 Lecture Notes

Mind on Statistics. Chapter 12

LOGIT AND PROBIT ANALYSIS

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

SAS Software to Fit the Generalized Linear Model

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Statistics 100A Homework 4 Solutions

Transcription:

Chapter 11 R Probability Examples Bret Larget March 26, 2014 Abstract This document shows some probability examples and R code that goes beyond the scope of the Lock 5 textbook. 1 2 2 Tables To illustrate the ideas, we begin with an artificial example where each of a sample of 20 individuals is characterized by sex and whether or not they have one or more pierced ears. Here is the data in a table. Ears Sex Pierced not Pierced Total Female 10 2 12 Male 1 7 8 Total 11 9 20 Let p F and p M be the population proportions of individuals with at least one pierced ear for females and males, respectively, and assume that the 20 indivuals are a random sample from a population of interest. Here is a hypothesis we can test to assess the evidence that a larger proportion of females have at least one pierced ear than do males. H 0 : p F = p M H A : p F > p M Earlier, we computed a p-value for this test by taking an array of 11 ones and 9 zeros, sampling 12 of these without replacement, taking the difference in sample proportions of ones for the samples of size 12 and 8, and seeing what proportion of these were at least as large as the observed difference 10 12 1 8. = 0.7083 From this process, we can define X to be the number of individuals in the sample of size 12 with pierced ears if we were to sample 12 individuals at random from 20 of which 11 have a pierced ear. In the sample, X = 10. For the data to at least as extreme, X would need to be 10 or larger. The only outcome with a more extreme difference in proportions would be if X = 11 and of the data looked like this. 1

Ears Sex Pierced not Pierced Total Female 11 1 12 Male 0 8 8 Total 11 9 20 There are only 11 individuals with pierced ears, so that is the maximum that can be in the first sample. The minimum is 3 which would happen if all 8 individuals from the sample of 8 had pierced ears. The p-value, then, is P (X = 10 X = 11 which is P (X = 10+P (X = 11 as these two events are disjoint. One way to think about computing these probabilities is to think about all ways to choose 12 balls from 20 and to count how many of these ways have 10 (or 11 of the color associated with pierced ears. Using the binomial coefficient ( n k = ( 20 12 n! k!(n k! for the number of ways to choose k items from n, we find the following. ( 11 9 P (X = 10 = 10( 2 11 36. = = 0.0031 125970 The p-value is P (X = 10 + P (X = 11, so we also need to calculate ( 9 P (X = 11 = 11( 1 = 1 9 125970 and their sum P-value = ( 20 12 11 x=10 ( 11 ( 9 x ( 20 12 12 x. = 7.1446 10 5. = 0.0032 Summary in context. There is very strong evidence that the population proportion of people with at least one pierced ear is higher for females than males (p = 0.0032, 1-sided Fisher s Exact Text. Calculations in R. Here are severl ways to do this calculation in R. First, using choose(. choose(11, 10 * choose(9, 2/choose(20, 12 + choose(11, 11 * choose(9, 1/choose(20, 12 ## [1] 0.003215 A second way recognizes that X is a hypergeometric random variable with parameters m = 11 ones, n = 9 zeros, and a sample size of k = 12, using dhyper(. sum(dhyper(10:11, m = 11, n = 9, k = 12 ## [1] 0.003215 A third way uses the fact that P (X 10 = 1 P (X 9. 2

1 - phyper(9, m = 11, n = 9, k = 12 ## [1] 0.003215 Finally, there is a builtin function named fisher.test( that takes the data as a matrix and the direction of the alternative hypothesis refers to the upper left element. mat = matrix(c(10, 1, 2, 7, nrow = 2, ncol = 2 mat ## [,1] [,2] ## [1,] 10 2 ## [2,] 1 7 fisher.test(mat, alternative = "greater" ## ## Fisher's Exact Test for Count Data ## ## data: mat ## p-value = 0.003215 ## alternative hypothesis: true odds ratio is greater than 1 ## 95 percent confidence interval: ## 2.634 Inf ## sample estimates: ## odds ratio ## 26.6 1.1 Another Problem Exercise 11.34 on page 652 categorizes members of the Rock and Roll Hall of Fame on whether or not the member is a performer or not, and on whether or not the member (individual or group has at least one female member or not. Here is the data. Female No female members members Total Performer 32 149 181 Not a performer 9 83 92 Total 41 232 273 The observed proportion of performer members with at least one female member is 32/181. = 0.1768 while the observed proportion of nonperformer members with at least one female member is 9/92. = 0.0978. This data is the entire population, so inference from a sample to a population does not make sense. But we can ask how unusual it would be to see a difference as large as 32 181 9 92. = 0.079 when taking random samples without replacement of sizes 181 and 92 from the total. In this example, it would be much more tedious to compute the probabilities of each outcome at least as extreme as the real data individually, so we do not. 3

1 - phyper(31, 41, 232, 181 ## [1] 0.058 sum(dhyper(32:41, 41, 232, 181 ## [1] 0.058 2 Normal Distribution Problems The density of a normal distribution with mean µ and standard deviation σ is f(x µ, σ = 1 σ 2π e 1 2( x µ σ 2 The probability that a normal random variable X is between a and b where a < b + is Using the change of variable it follows that where b a b a f(x µ, σ dx z = x µ, dz = dx/σ σ f(x µ, σ dx = (b µ/σ (a µ/σ φ(z = 1 2π e 1 2 (z2 φ(z dz is a normal density with µ = 0 and σ = 1 called the standard normal density. Hence, every probability calculation for an arbitrary normal distribution can be rewritten as an equivalent calculation for a standard normal distribution. ( a µ P (a < X < b = P < Z < b µ σ σ 2.1 Finding Probabilities Probabilities from normal distributions are computed by finding areas under a normal curve. The R function pnorm( will tell the area to the left. Here are several examples from a normal curve with µ = 500 and σ = 100, drawn below. require(ggplot2 ## Loading required package: ggplot2 ggplot(data.frame(x = c(200, 800, aes(x = x + stat_function(fun = dnorm, args = list(mean = 500, sd = 100 + geom_segment(aes(x = 200, xend = 800, y = 0, yend = 0 4

0.004 0.003 y 0.002 0.001 0.000 200 400 600 800 x P (X < 400 pnorm(400, 500, 100 ## [1] 0.1587 P (X > 650 1 - pnorm(650, 500, 100 ## [1] 0.06681 P ( X 500 > 150 pnorm(350, 500, 100 + (1 - pnorm(650, 500, 100 ## [1] 0.1336 2.2 Finding Quantiles The 0.05 and 0.95 quantiles. qnorm(c(0.05, 0.95, 500, 100 ## [1] 335.5 664.5 2.3 Finding Coverage Probabilities Suppose that a sample mean from an independent sample from a large normal population has a standard error of 20. Then, a 95% confidence interval using the 2 SE rule covers the true mean if X µ < 40, or 40 < X µ < 40. As X µ N(0, 20, we calculate the coverage probability as 5

pnorm(40, 0, 20 - pnorm(-40, 0, 20 ## [1] 0.9545 Note this is slightly larger than 0.95, but it also depended on knowing the SE perfectly. 2.4 Power Calculation Suppose that the standard error of the sample mean is 20 and the distribution is normal; in symbols, X N(µ, 20. For the hypotheses H 0 : µ = 50 versus H A : µ < 50, do the following. 1. Assume H 0 is true. For what value c is it true that if X = c, the p-value calculated from a normal distribution will be 0.01? qnorm(0.01, 50, 20 ## [1] 3.473 2. If H 0 is true, what is the probability that the p-value is less than 0.01? pnorm(qnorm(0.01, 50, 20, 50, 20 ## [1] 0.01 Or, think it through! 3. If X = 28.7, what is the p-value? pnorm(28.7, 50, 20 ## [1] 0.1434 4. If µ = 44, what is the probability that the p-value will be less than 0.01? pnorm(qnorm(0.01, 50, 20, 44, 20 ## [1] 0.02136 3 Binomial Calculations The binomial distribution with parameters n and p has probability mass function ( n p(x = p x (1 p n x, for x = 0, 1, 2,..., n x Here are some sample calculations for a distribution with n = 50 and p = 0.3. Note that dbinom( computes binomial probabilities at individual outcomes, pbinom( computes the cumulative distribution function, the sum of probabilities less than or equal to a value, and qbinom( finds quantiles. 6

1. P (X = 14. dbinom(14, 50, 0.3 ## [1] 0.1189 2. P (X = x for x = 5,..., 10. dbinom(5:10, 50, 0.3 ## [1] 0.0005509 0.0017709 0.0047705 0.0109891 0.0219783 0.0386190 3. P (X 18 two ways. pbinom(18, 50, 0.3 ## [1] 0.8594 sum(dbinom(0:18, 50, 0.3 ## [1] 0.8594 4. P (14 X 18. pbinom(18, 50, 0.3 - pbinom(13, 50, 0.3 ## [1] 0.5316 sum(dbinom(14:18, 50, 0.3 ## [1] 0.5316 5. P (X 20. 1 - pbinom(19, 50, 0.3 ## [1] 0.0848 sum(dbinom(20:50, 50, 0.3 ## [1] 0.0848 6. The 0.1 and 0.9 quantiles. 7

qbinom(c(0.1, 0.9, 50, 0.3 ## [1] 11 19 7. A number c so that P (X c 0.4 and P (X c 0.6. (This is the 0.4 quantile. qbinom(0.4, 50, 0.3 ## [1] 14 pbinom(qbinom(0.4, 50, 0.3, 50, 0.3 ## [1] 0.4468 1 - pbinom(qbinom(0.4, 50, 0.3-1, 50, 0.3 ## [1] 0.6721 8