STAT 200. Guided Exercise 4

1. Let's Revisit this Problem. Fill in the table again.

Diagnostic tests are not infallible. We can experience a false positive or a false negative with any test. There are further terms which we will discuss in this exercise. Imagine that the probability is 0.95 that a certain test will diagnose a diabetic correctly as being diabetic, and it is 0.05 that it will diagnose a person who is not diabetic as being diabetic. It is known that roughly 10% of the population is diabetic. What is the probability that a person diagnosed as being diabetic actually is diabetic?

Hint: This is a "use Bayes' theorem" problem, which we did not cover in the lectures. There is another way to handle this problem: make a mock 2 by 2 table of the data based on the information you already know. Once the table is complete, you can solve for the conditional probability. Since some of the probabilities are small, I would suggest you make a table that is based on 100,000 people. I have started the table for you.

                        Test Results
Diabetes Status    Diabetic    Not Diabetic      Total
Diabetic              9,500             500     10,000
Not Diabetic          4,500          85,500     90,000
Total                14,000          86,000    100,000

a. What is the probability that a person diagnosed as being diabetic actually is diabetic?
P(D | Test says D) = 9,500/14,000 = .6786

b. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who truly are diabetic?
Odds = 9,500/500 = 19

c. What are the odds of the test results saying you are a diabetic (versus not a diabetic) for those who are not diabetic?
Odds = 4,500/85,500 = .052632

d. What is the odds ratio for the test results saying you are a diabetic (versus not a diabetic) comparing diabetics to non-diabetics? Interpret this odds ratio in words.
Odds Ratio = 19/.052632 = 361. The odds of getting a test result that says "diabetic" are 361 times higher for those who are diabetic than for those who are not diabetic.
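The mock-table approach in Problem 1 can also be checked with a few lines of code. The following is a minimal sketch, not part of the original exercise; the variable names are illustrative assumptions, and the three input probabilities are the ones stated in the problem.

```python
# Rebuild the mock 2-by-2 table from the three probabilities stated in the problem.
# All variable names here are illustrative, not from the handout.
population = 100_000
prevalence = 0.10          # P(diabetic)
p_pos_if_diabetic = 0.95   # P(test says diabetic | diabetic)
p_pos_if_healthy = 0.05    # P(test says diabetic | not diabetic)

diabetic = round(population * prevalence)
not_diabetic = population - diabetic
true_pos = round(diabetic * p_pos_if_diabetic)
false_neg = diabetic - true_pos
false_pos = round(not_diabetic * p_pos_if_healthy)
true_neg = not_diabetic - false_pos

# Part a: P(diabetic | test says diabetic)
p_diabetic_given_pos = true_pos / (true_pos + false_pos)

# Parts b-d: odds within each row of the table, then their ratio
odds_diabetic = true_pos / false_neg
odds_not_diabetic = false_pos / true_neg
odds_ratio = odds_diabetic / odds_not_diabetic

print(f"P(D | test says D) = {p_diabetic_given_pos:.4f}")
print(f"odds (diabetic) = {odds_diabetic:.0f}")
print(f"odds (not diabetic) = {odds_not_diabetic:.6f}")
print(f"odds ratio = {odds_ratio:.0f}")
```

Changing `prevalence` shows how strongly the answer to part a depends on how common the disease is, even when the test itself stays the same.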

e. We can think of our table in the following way:

                        Test Results
Diabetes Status    Diabetic          Not Diabetic
Diabetic           True Positive     False Negative
Not Diabetic       False Positive    True Negative

The sensitivity of a test is expressed as the probability of a positive test among patients with the disease. The formula is given as:

Sensitivity = True Positives / (True Positives + False Negatives)

What is the sensitivity of this test?
This is P(Pos Test | Diabetic) = 9,500/10,000 = .95. A conditional probability!

f. The specificity of a test is expressed as the probability of a negative test among patients without the disease. The formula is given as:

Specificity = True Negatives / (True Negatives + False Positives)

What is the specificity of this test?
This is P(Neg Test | Not Diabetic) = 85,500/90,000 = .95. A conditional probability!
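As a small illustration of parts e and f, sensitivity and specificity can be written as two helper functions over the four cells of the table. This is a sketch with hypothetical function names, using the counts from the mock table of 100,000 people.

```python
def sensitivity(true_pos, false_neg):
    """P(positive test | disease): true positives over all who have the disease."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """P(negative test | no disease): true negatives over all without the disease."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the mock table of 100,000 people.
sens = sensitivity(true_pos=9_500, false_neg=500)
spec = specificity(true_neg=85_500, false_pos=4_500)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```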

2. Discrete Random Variable: The Number of Games in a Baseball World Series.

Based on past results found in the Information Please Almanac, there is a 0.1809 probability that a baseball World Series contest will last four games, a 0.2234 probability that it will last five games, a 0.2234 probability that it will last six games, and a 0.3723 probability that it will last seven games. The probability table is given below:

X       4        5        6        7
P(X)    .1809    .2234    .2234    .3723

a. What is the mean (expected value) number of games in a World Series?
E(x) = 4*.1809 + 5*.2234 + 6*.2234 + 7*.3723 = 5.7871

b. What is the variance of the number of games in a World Series?
Var = (4 - 5.7871)^2 *.1809 + (5 - 5.7871)^2 *.2234 + (6 - 5.7871)^2 *.2234 + (7 - 5.7871)^2 *.3723
Var = 1.2740

c. Is it unusual for a team to sweep the World Series (win all four games in a row)?
It is not unusual. We expect that 18.09% of the time. However, it is the lowest probability of the possible outcomes, and there is an 81.91% chance of more than 4 games. I would expect that networks look at the probabilities associated with a sweep when bidding on the coverage of the World Series.
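The mean and variance in Problem 2 follow directly from the definitions E(x) = sum of x*P(x) and Var = sum of (x - E(x))^2 * P(x). A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Probability distribution for the number of games in a World Series.
games = [4, 5, 6, 7]
probs = [0.1809, 0.2234, 0.2234, 0.3723]

# Expected value: weight each outcome by its probability.
mean = sum(x * p for x, p in zip(games, probs))

# Variance: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(games, probs))

print(f"E(x) = {mean:.4f}")
print(f"Var  = {variance:.4f}")
```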

3. Consider an experiment in which 10 identical small boxes are placed side-by-side on a table. A crystal is placed, at random, inside one of the boxes. A self-professed psychic is asked to pick the box that contains the crystal. This experiment is repeated seven times, and x is the number of correct decisions in seven tries. Thus, it is a Binomial random variable.

a. If the psychic is guessing, what is the value of p, the probability of a correct decision on each trial?
P(success) = 1/10 = .1
This means a random person just guessing which of the 10 boxes holds the crystal has a 1 in 10, or 10%, chance of being right.

b. Fill in the remaining portions of this table reflecting the probability distribution for this variable using the binomial table or the binomial formula.
The Binomial Table for n = 7 and p = .10 is much easier!

X       0        1        2        3        4        5        6        7
p(x)    .4783    .3720    .1240    .0230    .0026    .0002    .0000    .0000

c. If the psychic is guessing, what is the expected number of correct decisions in seven trials, and what is the variance?
E(x) = n*p = 7 * .1 = .7
V(x) = n*p*q = 7 * .1 * .9 = .63; Std dev. = .7937

d. If the psychic is guessing, what is the probability of no correct decisions in seven trials?
Just read the answer from the table: P(x = 0) = .4783. It is pretty high - there is a high probability you won't get any right.

X       0        1        2        3        4        5        6        7
P(x)    .4783    .3720    .1240    .0230    .0026    .0002    .0000    .0000

e. One of the psychics who took the test got all seven wrong. Suppose the criterion for having ESP is that you could guess right with p = .5. In other words, if you are a psychic you might not get it right all the time, but you should be doing much better than chance. If p = .5 instead of .10, what is the probability of guessing incorrectly on all seven trials?
P(x = 0) = .0078. If a person really was a psychic, it would be rare that such a person would guess none right in 7 tries.

X       0        1        2        3        4        5        6        7
P(x)    .0078    .0547    .1641    .2734    .2734    .1641    .0547    .0078
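The binomial tables used in Problem 3 can be reproduced from the binomial formula P(x) = C(n, x) * p^x * (1 - p)^(n - x). The sketch below uses only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something from the handout.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 7
# One row per scenario: guessing (p = .1) and the "psychic" standard (p = .5).
table_guessing = [round(binomial_pmf(k, n, 0.1), 4) for k in range(n + 1)]
table_psychic = [round(binomial_pmf(k, n, 0.5), 4) for k in range(n + 1)]

# Mean, variance, and standard deviation when guessing.
mean = n * 0.1
variance = n * 0.1 * 0.9
std_dev = sqrt(variance)

print("p = .1:", table_guessing)
print("p = .5:", table_psychic)
print(f"E(x) = {mean:.1f}, V(x) = {variance:.2f}, sd = {std_dev:.4f}")
```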

4. If a single bit of data (0 or 1) is transmitted over a noisy communication channel, it has a probability p of being incorrectly transmitted. To improve the reliability of the transmission, the bit is transmitted n times, where n is odd. A decoder at the receiving end, called a majority decoder, decides that the correct message is the one carried by the majority of the received bits. This means that if there are five transmissions of a (0,1) bit, the bit used by at least three of the transmissions would be considered correct. Assume that each bit is independently subject to being corrupted with the same probability p, and that p = .1. Note, p is the probability of an error, and in terms of a binomial problem we will think of X as the number of errors in n transmissions.

a. If a company sent only one transmission, what is the probability of it being received without an error?
p = .1, which is the probability of an incorrect transmission. So q = 1 - p = .90. The probability of it being received without an error is .9. If the information is important, this probability might seem too low.

b. A company decides to use 5 transmissions as a strategy to reduce errors (n = 5). Set up the outcomes for 5 transmissions and the probabilities associated with each outcome using the binomial distribution.

X       0        1        2        3        4        5
p(x)    .5905    .3281    .0729    .0081    .0005    .0000

c. Calculate the mean, variance, and standard deviation for this problem.
E(x) = n*p = 5 * .1 = .5
V(x) = n*p*q = 5 * .1 * .9 = .45; Std dev. = .6708

d. If five messages are sent for each bit, the probability that the message is correctly received is the probability of two or fewer errors. This is not easy to see, but think it through with me. If the system sends 3, 4, or 5 wrong messages, the majority decoder strategy will accept the wrong message and make a wrong decision. But if the wrong message is sent 2, 1, or 0 times, the right message will be accepted. Look at the probability of 0, 1, or 2 errors from our binomial table above.

What is the probability that the message is correctly received in five transmissions (i.e., 2 or fewer errors)? Compare that with the answer you derived in Part a. Did sending five transmissions improve the chances of sending the message correctly?
P(x=0) + P(x=1) + P(x=2) = .59049 + .32805 + .07290 = .9914 (adding the rounded table entries gives .9915)
This is much better than .9. The majority decoder strategy with n = 5 transmissions greatly improved the chance of a right transmission.
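The majority-decoder argument generalizes to any odd n: the bit is decoded correctly when at most (n - 1)/2 of the n transmissions are corrupted. The following sketch assumes independent errors with the same probability p, as the problem states; the function name is hypothetical.

```python
from math import comb


def majority_decode_success(n, p):
    """P(correct decoding) for odd n: at most (n - 1) // 2 of n bits are in error."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    max_errors = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max_errors + 1))


p_error = 0.1
single = majority_decode_success(1, p_error)   # Part a: one transmission
five = majority_decode_success(5, p_error)     # Part d: five transmissions

print(f"n = 1: {single:.4f}")
print(f"n = 5: {five:.4f}")
```

Calling the function with larger odd values of n shows how quickly the reliability approaches 1 when p is small, at the cost of sending each bit more times.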

5. Discrete Random Variable Problem. A concert producer has scheduled an outdoor concert on a Saturday. If it does not rain, the producer expects to make $20,000 profit from the concert. If it does rain, the producer will be forced to cancel the concert and lose $12,000 (from fees, advertising, stadium rental, and so forth). The probability of rain on Saturday is .4.

a. What is the expected profit from the concert? Hint: write out the probability distribution and solve for the expectation. The values that your random variable can take are the dollar values.

x       $20,000    -$12,000
P(x)    .6         .4

E(x) = 20,000*.6 - 12,000*.4 = $12,000 - $4,800 = $7,200

b. For a fee of $1,000 an insurance company will insure against all losses from a rained out concert. If the producer buys the insurance, what is the expected profit from the concert? Note: an insurance fee is a fixed cost incurred regardless of whether it rains or not.

One way: compute the expected profit before the fee, then subtract the fee.

x       $20,000    $0
P(x)    .6         .4

E(x) = 20,000*.6 + 0*.4 = $12,000; less the $1,000 fee = $11,000

Another way: build the fee into each outcome.

x       $19,000    -$1,000
P(x)    .6         .4

E(x) = 19,000*.6 - 1,000*.4 = $11,400 - $400 = $11,000

c. Assuming the forecast is accurate, do you believe the insurance company has charged too much or too little? Hint: reformulate the problem to express outcomes in terms of the insurance company and what they expect to pay out.

x       $0    -$12,000
P(x)    .6    .4

E(x) = 0*.6 - 12,000*.4 = -$4,800 (expected payout)
Yet they only charged $1,000 - they charged too little.
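The three expectations in Problem 5 use the same calculation on different outcome tables. The sketch below makes that explicit; the helper `expected_value` and the insurer's net-result table (fee collected minus payout) are illustrative additions, built only from the dollar amounts and probabilities given in the problem.

```python
def expected_value(distribution):
    """Expected value of a list of (value, probability) pairs."""
    return sum(value * prob for value, prob in distribution)


p_rain = 0.4
fee = 1_000

# Part a: producer without insurance.
no_insurance = [(20_000, 1 - p_rain), (-12_000, p_rain)]

# Part b: producer with insurance; the fee is paid in both outcomes.
with_insurance = [(20_000 - fee, 1 - p_rain), (-fee, p_rain)]

# Part c, restated as the insurer's net result: fee collected minus any payout.
insurer_net = [(fee, 1 - p_rain), (fee - 12_000, p_rain)]

print(f"producer, no insurance:   ${expected_value(no_insurance):,.0f}")
print(f"producer, with insurance: ${expected_value(with_insurance):,.0f}")
print(f"insurer net result:       ${expected_value(insurer_net):,.0f}")
```

The insurer's expected net result is negative, which matches the conclusion that a $1,000 fee is too low against a $4,800 expected payout.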

6. Normal Distribution Problem. Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch. What proportion of the bags produced have a breaking strength of:

a. Less than 3.17 pounds per square inch?
Z = (3.17 - 5)/1.5 = -1.22; P(Z <= -1.22) = .5 - .3888 = .1112

b. At least 3.6 pounds per square inch?
Z = (3.6 - 5)/1.5 = -.9333; P(Z >= -.9333) = .3238 + .5 = .8238

c. Between 5 and 5.5 pounds per square inch?
Z = (5.5 - 5)/1.5 = .3333; P(5 < X < 5.5) = .1293

d. Between 3.2 and 4.2 pounds per square inch?
Z = (3.2 - 5)/1.5 = -1.20; area between the mean and Z = .3849
Z = (4.2 - 5)/1.5 = -.5333; area between the mean and Z = .2019
Answer = .3849 - .2019 = .1830

e. Between what two values symmetrically distributed around the mean will 95% of the breaking strengths fall?
Be careful here! With the normal distribution we need to be more precise than 2 standard deviations.
5 ± 1.96(1.5) = 2.06 to 7.94
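The table lookups in Problem 6 can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch rather than the method used in the handout; because the printed table rounds z to two decimals, the computed values for parts b through d differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Breaking strength: mean 5 psi, standard deviation 1.5 psi.
strength = NormalDist(mu=5, sigma=1.5)

part_a = strength.cdf(3.17)                      # P(X < 3.17)
part_b = 1 - strength.cdf(3.6)                   # P(X >= 3.6)
part_c = strength.cdf(5.5) - strength.cdf(5)     # P(5 < X < 5.5)
part_d = strength.cdf(4.2) - strength.cdf(3.2)   # P(3.2 < X < 4.2)

# Part e: central 95% interval, cutting 2.5% from each tail.
lower = strength.inv_cdf(0.025)
upper = strength.inv_cdf(0.975)

for label, value in [("a", part_a), ("b", part_b), ("c", part_c), ("d", part_d)]:
    print(f"{label}. {value:.4f}")
print(f"e. {lower:.2f} to {upper:.2f}")
```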

7. Normal Distribution Problem. You have been hired as a consultant to provide analysis for the Personnel Department at ZTel company, a large communications company. Every applicant of ZTel must take a standardized exam, and the hire or no-hire decision depends in part on this exam. The exam was purchased from a company which says the exam scores are approximately normally distributed with:

µ = 525
σ = 55

The current interview policy has two phases. The first phase separates all applicants into one of three categories:

Automatic Interview: score of 600 or above
Maybe Interview: score of 500 to 600
Automatic Reject: score less than 500

The Maybe group is passed on to a second phase where their previous experience, education, special skills, and other factors are taken into consideration in whether to grant an interview or not. No one at the company can remember why the values of 600 and 500 were used as the standards for automatic interview or rejection, and most likely they were decided arbitrarily by a former Personnel Manager. The current Personnel Manager of ZTel needs to know the following:

a. The probability associated with the current standard of being automatically rejected - what proportion of the applicants are automatically rejected?
Z = (500 - 525)/55 = -.4545
P(Z <= -.4545) = .5 - .1753 = .3247
Automatic Reject (score < 500) = .3247

b. The probability associated with the current standard of being automatically interviewed - what proportion of the applicants are automatically interviewed?
Z = (600 - 525)/55 = 1.364
P(Z >= 1.364) = .5 - .4137 = .0863
Automatic Interview (score >= 600) = .0863

c. The manager notices that applicants that score between 535 and 580 tend to be good hires, having both good skills and a higher probability of accepting an offer to the company. She would like to give this group a higher priority in the second phase of evaluation. What percentage of the applicants should she expect to fall within this range?
Z = (580 - 525)/55 = 1.000; area between the mean and Z = .3413
Z = (535 - 525)/55 = .182; area between the mean and Z = .0721
P(535 <= X <= 580) = .3413 - .0721 = .2692
26.9%, or about 27 percent, are in the sweet spot.

d. The manager would prefer that the exam score for automatic interview be set at the top 15% (the 85th percentile) and the score for automatic rejection be set at the bottom 20% (the 20th percentile). What are the exam values in this distribution associated with these probabilities (in this case, round to whole numbers)?

For the top 15% automatically interviewed, the cutoff is at the 85th percentile, z = 1.04:
1.04 = (x - 525)/55, so x = (1.04*55) + 525 = 582.2, or about 582

For the bottom 20% automatically rejected, the cutoff is at the 20th percentile, z = -.8416:
-.8416 = (x - 525)/55, so x = (-.8416*55) + 525 = 478.71, or about 479

Summarize your results as a recommendation to your client.
The old approach used thresholds that were arbitrary. With the new approach we can identify the percentage of applicants in the good high range, and we can defend the automatic interview and automatic reject cutoffs in terms of percentiles in the distribution.
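The comparison between the old thresholds and the percentile-based thresholds in Problem 7 can be sketched with `statistics.NormalDist` (Python 3.8 or later). The variable names are illustrative assumptions. Note that the exact z for the 85th percentile is slightly below the table value of 1.04, so the computed cutoff is a little under 582.2 but still rounds to 582.

```python
from statistics import NormalDist

# Exam scores as described by the test vendor: mean 525, standard deviation 55.
scores = NormalDist(mu=525, sigma=55)

# Parts a-c: proportions under the current policy.
reject_now = scores.cdf(500)                     # P(score < 500)
interview_now = 1 - scores.cdf(600)              # P(score >= 600)
sweet_spot = scores.cdf(580) - scores.cdf(535)   # P(535 <= score <= 580)

# Part d: scores at the requested percentiles.
interview_cutoff = scores.inv_cdf(0.85)          # top 15%
reject_cutoff = scores.inv_cdf(0.20)             # bottom 20%

print(f"automatic reject now:    {reject_now:.4f}")
print(f"automatic interview now: {interview_now:.4f}")
print(f"between 535 and 580:     {sweet_spot:.4f}")
print(f"new interview cutoff:    {round(interview_cutoff)}")
print(f"new reject cutoff:       {round(reject_cutoff)}")
```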