Review for Final
Chapter 2:
  Data quantifiers: sample mean, sample variance, sample standard deviation
  Quartiles, percentiles, median, interquartile range
  Dot diagrams
  Histograms
  Boxplots

Chapter 3:
  Set theory, set operations, union, intersection, complement, Venn diagram
  Counting principle, addition principle and multiplication principle
  Permutation and combination
  Conditional probability, independent events, Bayes' theorem
  False positives and the total probability theorem

Chapter 4:
  Random variables, probability on random variables, cumulative probability
  Bernoulli distribution
  Binomial distribution
  Hypergeometric distribution
  Negative binomial distribution
  Poisson distribution
  Mean, variance (standard deviation), moments
  Chebyshev's theorem

Chapter 5:
  Continuous random variables
  Uniform distribution
  Exponential distribution
  Normal distribution, standard normal distribution
  $Z$ and $\alpha$, the use of Table 3

Chapter 6:
  Sample, sample mean, sample variance
  Law of large numbers
  Central limit theorem
  Computing the probability of a sample mean
  When the population variance is not known: t-distribution and sample variance

Chapter 7:
  Inferential statistics
  Point estimation
  Interval estimation

Chapter 8:
  Test of hypothesis
  Null hypothesis and alternative hypothesis
  Type-I and Type-II errors
Example: A company owns 400 laptops. Each laptop has an 8% probability of not working. You randomly select 20 laptops for your salespeople.
(a) What is the likelihood that 5 will be broken?
(b) What is the likelihood that they will all work?
(c) What is the likelihood that they will all be broken?

Analysis: Whether a single laptop works or not is a Bernoulli random variable.
Not working: $p = 0.08$
Working: $q = 1 - p = 0.92$
With 20 laptops, the number of broken laptops $X$ follows a binomial distribution with $n = 20$.

(a) $P(X = 5) = b(5; 20, 0.08) = \binom{20}{5}(0.08)^{5}(0.92)^{15} \approx 0.0145$
(b) $P(X = 0) = b(0; 20, 0.08) = (0.92)^{20} \approx 0.189$
(c) $P(X = 20) = b(20; 20, 0.08) = (0.08)^{20} \approx 1.15 \times 10^{-22}$
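The three binomial probabilities in the laptop example can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.08  # "success" here means a laptop is NOT working

p_five_broken = binom_pmf(5, n, p)   # about 0.0145
p_all_work = binom_pmf(0, n, p)      # (0.92)^20, about 0.189
p_all_broken = binom_pmf(20, n, p)   # (0.08)^20, about 1.15e-22

print(p_five_broken, p_all_work, p_all_broken)
```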
Example: An audio amplifier contains six transistors. It has been ascertained that three of the transistors are faulty, but it is not known which three. Amy removes three transistors at random and inspects them. What is the probability that two of them are faulty?

Analysis: This is a hypergeometric distribution problem. The pool has two different subsets: the total in the pool is $N = 6$, with 3 faulty and 3 non-faulty. Pick $n = 3$ and find the probability of having two ($X = 2$) faulty ones.

The probability function for the hypergeometric distribution is:
$$P(X = x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}$$

Determine the variable and parameters: $N = 6$, $a = 3$, $n = 3$, $x = 2$
$$P(X = 2) = \frac{\binom{3}{2}\binom{3}{1}}{\binom{6}{3}} = \frac{\dfrac{3!}{2!\,1!}\cdot\dfrac{3!}{2!\,1!}}{\dfrac{6!}{3!\,3!}} = \frac{9}{20} = 0.45$$
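As a quick check of the transistor example, the hypergeometric probability function can be evaluated directly. This is a sketch under the slide's notation ($N$, $a$, $n$, $x$); the function name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(x: int, n: int, a: int, N: int) -> float:
    """P(X = x) when drawing n items without replacement from N items, a of which are 'marked'."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

# Six transistors, three faulty, three drawn: probability that exactly two are faulty.
p_two_faulty = hypergeom_pmf(2, n=3, a=3, N=6)
print(p_two_faulty)  # 0.45
```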
Example: An oil company conducts a geological study that indicates that an exploratory oil well should have a 20% chance of striking oil. What is the probability that the first strike comes on the third well drilled?

Analysis: Each drill is a Bernoulli trial (success or failure) with $p = 0.2$. The probability that the first success occurs on trial $x$ is a geometric distribution problem. The probability that the $r$-th success occurs on trial $x$ is a negative binomial distribution problem.

For the negative binomial distribution:
$$P(X = x) = \binom{x-1}{r-1} p^{r} q^{x-r}$$
When $r = 1$, it becomes the geometric distribution.

In this problem, $x = 3$, $r = 1$, $p = 0.2$ ($q = 0.8$).
$$P(X = 3) = \binom{2}{0}(0.2)^{1}(0.8)^{2} = 0.128$$
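The negative binomial formula, with $r = 1$ giving the geometric case, can be sketched in a few lines. The function name `neg_binom_pmf` is an assumption made for illustration.

```python
from math import comb

def neg_binom_pmf(x: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on trial x (each trial succeeds with probability p)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

# First strike (r = 1) on the third well (x = 3) with p = 0.2: 0.2 * 0.8^2
p_first_strike_on_third = neg_binom_pmf(3, r=1, p=0.2)
print(p_first_strike_on_third)
```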
Example: A new Superman MasterCard has been issued to 2000 customers. Of these customers, 1500 hold a Visa card, 500 hold an American Express card, and 40 hold both a Visa card and an American Express card. Find the probability that a customer chosen at random holds a Visa card, given that the customer holds an American Express card.

Analysis: This is a conditional probability problem (find the probability of holding a Visa card given that the person holds an American Express card). Let $A$ be the event of holding a Visa card and $B$ the event of holding an American Express card.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
$P(A) = 1500/2000$, $P(B) = 500/2000$, $P(A \cap B) = 40/2000$
$$P(A \mid B) = \frac{40/2000}{500/2000} = \frac{40}{500} = \frac{2}{25}$$
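The card example reduces to a ratio of counts, which exact fractions make easy to verify. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction

total_customers = 2000
amex_holders = 500   # event B: holds an American Express card
both_holders = 40    # event A and B: holds Visa and American Express

p_b = Fraction(amex_holders, total_customers)
p_a_and_b = Fraction(both_holders, total_customers)

# P(Visa | American Express) = P(A and B) / P(B)
p_visa_given_amex = p_a_and_b / p_b
print(p_visa_given_amex, float(p_visa_given_amex))  # 2/25 0.08
```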
Example: Hazel thinks she may be allergic to eating peanuts, and takes a test that gives the following results:
For people who really do have the allergy, the test says "Yes" 90% of the time.
For people who do not have the allergy, the test says "Yes" 5% of the time ("false positive").
If 1.3% of the population have the allergy, and Hazel's test says "Yes", what are the chances that Hazel really does have the allergy?

Analysis: This is a false positive problem. The question is to find the conditional probability $P(A \mid B)$, where $B$ is testing positive and $A$ is really having the allergy.

From total probability:
$$P(B) = P(B \mid A)P(A) + P(B \mid A')P(A') = 0.9 \times 0.013 + 0.05 \times 0.987 = 0.06105$$
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.9 \times 0.013}{0.06105} = \frac{0.0117}{0.06105} \approx 19.2\%$$
Hazel has only a 19.2 percent chance of really having the allergy.
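The total-probability and Bayes steps for the allergy test can be reproduced as follows. This is a minimal sketch; the variable names are chosen here for readability and do not come from the slides.

```python
p_allergy = 0.013               # P(A): prior probability of having the allergy
p_pos_given_allergy = 0.90      # P(B | A): test says "Yes" when allergic
p_pos_given_no_allergy = 0.05   # P(B | A'): false positive rate

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_allergy * p_allergy
         + p_pos_given_no_allergy * (1 - p_allergy))

# Bayes' theorem: P(A | B)
p_allergy_given_pos = p_pos_given_allergy * p_allergy / p_pos

print(round(p_pos, 5), round(p_allergy_given_pos, 4))
```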
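Example: Suppose that a random variable $X$ has the probability density function
$$f(x) = \begin{cases} 0, & x < 1 \\ c/x^{4}, & x \ge 1 \end{cases}$$
(a) Find the value of $c$.
(b) Find the probability $P(X < 1)$.
(c) Find the probability $P(2 \le X \le 4)$.
(d) Find the mean and variance of the random variable $X$.

Solution:
(a) $\displaystyle \int_{-\infty}^{\infty} f(x)\,dx = \int_{1}^{\infty} \frac{c}{x^{4}}\,dx = \left[-\frac{c}{3x^{3}}\right]_{1}^{\infty} = \frac{c}{3} = 1$, so $c = 3$
(b) $P(X < 1) = 0$
(c) $\displaystyle P(2 \le X \le 4) = \int_{2}^{4} \frac{3}{x^{4}}\,dx = \left[-\frac{1}{x^{3}}\right]_{2}^{4} = \frac{1}{8} - \frac{1}{64} = \frac{7}{64} \approx 0.109$
(d) $\displaystyle \mu = \int_{1}^{\infty} x\,\frac{3}{x^{4}}\,dx = \left[-\frac{3}{2x^{2}}\right]_{1}^{\infty} = 1.5$
$\displaystyle E(X^{2}) = \int_{1}^{\infty} x^{2}\,\frac{3}{x^{4}}\,dx = \left[-\frac{3}{x}\right]_{1}^{\infty} = 3.0$
$\mathrm{Var}(X) = E(X^{2}) - \mu^{2} = 3.0 - 2.25 = 0.75$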
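The integrals for this density can be sanity-checked numerically. The sketch below uses a plain midpoint rule and the substitution $x = 1/u$ to handle the infinite upper limit; both helper functions are illustrative assumptions, not a method from the slides.

```python
def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def integrate_to_inf(g, a, steps=20_000):
    """Integral of g over [a, infinity), using x = 1/u so that u runs over (0, 1/a]."""
    return integrate(lambda u: g(1.0 / u) / u**2, 0.0, 1.0 / a, steps)

def f(x):
    """Density with c = 3: zero below 1, 3/x^4 from 1 upward."""
    return 3.0 / x**4 if x >= 1 else 0.0

total = integrate_to_inf(f, 1.0)                          # should be close to 1
p_2_to_4 = integrate(f, 2.0, 4.0)                         # should be close to 7/64
mean = integrate_to_inf(lambda x: x * f(x), 1.0)          # should be close to 1.5
second_moment = integrate_to_inf(lambda x: x * x * f(x), 1.0)  # close to 3
variance = second_moment - mean**2                        # close to 0.75

print(total, p_2_to_4, mean, variance)
```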
Example: The new Endeavor SUV has been recalled because 5% of the cars experience brake failure. The Tahoe dealership has sold 200 of these cars. What is the probability that fewer than 4% of the cars from Tahoe experience brake failure?

Analysis: This is actually a binomial distribution problem, but it can be solved as a normal distribution problem. In the binomial distribution, $p = 0.05$, $q = 0.95$, $n = 200$.
$$x = 0.04 \times 200 = 8, \qquad P(X \le 8) = F(Z), \quad Z = \frac{X - \mu}{\sigma}$$
From the approximation of the binomial distribution by the normal distribution (treating $X$ as continuous, with no continuity correction), we have
$$\mu = np = 200 \times 0.05 = 10, \qquad \sigma^{2} = npq = 200 \times 0.05 \times 0.95 = 9.5, \qquad \sigma = 3.08$$
$$Z = \frac{8 - 10}{3.08} = -0.65$$
From Table 3, find the value of $F(Z)$: $F(-0.65) \approx 0.258$.
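To see how the normal approximation in the brake-failure example compares with the underlying binomial model, the sketch below computes both $F(Z)$ and the exact binomial sum $P(X \le 8)$. The two values need not agree closely, since the calculation on the slide applies no continuity correction. `NormalDist` comes from the Python standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.05
mu = n * p                       # 10
sigma = sqrt(n * p * (1 - p))    # sqrt(9.5), about 3.08

# Normal approximation as on the slide: P(X <= 8) ~ F((8 - mu) / sigma)
z = (8 - mu) / sigma
p_normal = NormalDist().cdf(z)

# Exact binomial probability of at most 8 failures, for comparison
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9))

print(round(z, 2), round(p_normal, 4), round(p_exact, 4))
```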
Example: To estimate the spending of people during Christmas, a department store takes a random sample of 30 people. It finds that the mean spending of the sample is 800 dollars and the standard deviation is 200 dollars. Assuming that people's spending is normally distributed, over what interval does the mean of the population spending lie, with 98 percent confidence?

Analysis: The sample size is 30, the mean and standard deviation both come from the sample, and the population is assumed to be normally distributed. Therefore it is a t-distribution problem (Table 4 will be used).

The equation relevant to this problem is:
$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$\bar{x} = 800$, $s = 200$, $n = 30$, $1 - \alpha = 98\%$, $\alpha/2 = 0.01$, $\nu = n - 1 = 29$

Find from Table 4: $t_{0.01} = 2.462$
$$800 - 2.462 \times \frac{200}{\sqrt{30}} < \mu < 800 + 2.462 \times \frac{200}{\sqrt{30}}, \qquad 710.1 < \mu < 889.9$$

Question: what if $\nu > 30$ and Table 4 cannot give $t_{\alpha/2}$?
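The interval arithmetic for the spending example is sketched below. The critical value 2.462 is taken from Table 4 as stated on the slide rather than computed, because the Python standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt

x_bar, s, n = 800.0, 200.0, 30
t_crit = 2.462  # t_{0.01} with 29 degrees of freedom, read from Table 4 (not computed here)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 1), round(upper, 1))  # 710.1 889.9
```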
Example: A sample of size 10 is used to estimate the mean height of a plant whose height has a standard deviation of 10 inches. What is the probability (or confidence) that the error in this estimate is less than 5 inches?

Analysis: The sample size is 10 and the standard deviation is that of the population, that is, $\sigma$. Therefore this is a central limit theorem problem (Table 3 will be used).

The equation relevant to this problem is:
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Find $z_{\alpha/2}$, then find $\alpha$.

$E = 5$, $\sigma = 10$, $n = 10$.
$$z_{\alpha/2} = \frac{E\sqrt{n}}{\sigma} = \frac{5\sqrt{10}}{10} = 1.58$$
From Table 3, $F(1.58) = 0.9429$, so $\alpha/2 = 0.0571$.

Confidence $= 1 - \alpha = 1 - 0.1142 \approx 0.886$
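The step from the error bound $E$ to a confidence level can be sketched with the standard normal CDF in place of Table 3. `NormalDist` is part of the Python standard library; the variable names are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

E, sigma, n = 5.0, 10.0, 10

# Solve E = z * sigma / sqrt(n) for z
z_half_alpha = E * sqrt(n) / sigma            # about 1.58

# Confidence = P(-z < Z < z) = 2 F(z) - 1 = 1 - alpha
confidence = 2 * NormalDist().cdf(z_half_alpha) - 1

print(round(z_half_alpha, 2), round(confidence, 3))
```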
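Example: The number of calls for service at the DMV counter follows the Poisson distribution. The average service rate is 2 people per minute. What is the probability that the time between two calls is
(a) less than 1 minute?
(b) greater than 5 minutes?

Analysis: Between two calls there is no call, so this corresponds to the $x = 0$ case of the Poisson distribution, whose probability is proportional to $e^{-\alpha t}$ ($\lambda = \alpha t$). The waiting time therefore has the density
$$f(t) = \alpha e^{-\alpha t}, \quad t > 0$$
Here $\alpha = 2$.

Solution:
(a) $\displaystyle P(t < 1) = \int_{0}^{1} f(t)\,dt = \int_{0}^{1} 2e^{-2t}\,dt = 1 - e^{-2} \approx 0.865$
(b) $\displaystyle P(t > 5) = \int_{5}^{\infty} 2e^{-2t}\,dt = e^{-10} \approx 4.5 \times 10^{-5}$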
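Both waiting-time probabilities in the DMV example follow from the closed-form integral of the exponential density. A minimal sketch, with hypothetical helper names:

```python
from math import exp

def expon_cdf(t: float, rate: float) -> float:
    """P(T < t) for an exponential waiting time with the given rate (events per unit time)."""
    return 1.0 - exp(-rate * t) if t > 0 else 0.0

def expon_sf(t: float, rate: float) -> float:
    """P(T > t), the survival function."""
    return exp(-rate * t) if t > 0 else 1.0

rate = 2.0  # calls per minute

p_less_than_1 = expon_cdf(1.0, rate)   # 1 - e^-2, about 0.865
p_more_than_5 = expon_sf(5.0, rate)    # e^-10, about 4.5e-5

print(p_less_than_1, p_more_than_5)
```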
Example: The number of customers arriving at a bank can be described by a Poisson distribution. An average of 4 customers arrive per minute. What is the probability that the time between arrivals of two customers will be
a) less than 15 seconds?
b) at least 30 seconds?

Analysis: Arrival is a Poisson process with $\lambda = at$. The probability that no customer arrives within a given time interval $t$ is
$$P(X = 0) = \frac{\lambda^{0}}{0!} e^{-\lambda} = e^{-at}$$
Therefore $f(t) \propto e^{-at}$, and after normalization
$$f(t) = a e^{-at} = 4e^{-4t}$$

a) $a = 4$; $\displaystyle P(t < 15\ \text{s}) = P\!\left(t < \tfrac{1}{4}\ \text{min}\right) = \int_{0}^{1/4} 4e^{-4t}\,dt = 1 - e^{-1} \approx 0.632$
b) $\displaystyle P(t > 30\ \text{s}) = P\!\left(t > \tfrac{1}{2}\ \text{min}\right) = \int_{1/2}^{\infty} 4e^{-4t}\,dt = e^{-2} \approx 0.135$
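The link between the Poisson count and the waiting time in the bank example can be made explicit in code: the gap between arrivals exceeds $t$ exactly when the Poisson count over $t$ is zero. The sketch below uses that fact directly; the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

a = 4.0  # average arrivals per minute

def p_gap_longer_than(t_minutes: float) -> float:
    """P(no arrival during t minutes) = P(X = 0) with lambda = a * t."""
    return poisson_pmf(0, a * t_minutes)

p_under_15_seconds = 1.0 - p_gap_longer_than(0.25)  # 1 - e^-1, about 0.632
p_over_30_seconds = p_gap_longer_than(0.5)          # e^-2, about 0.135

print(p_under_15_seconds, p_over_30_seconds)
```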
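Example: A random sample of size 20 is taken from a population with the uniform distribution
$$f(x) = \begin{cases} 0.2, & 0 < x < 5 \\ 0, & \text{otherwise} \end{cases}$$
What will be the variance of the sample mean?

Analysis: For a random sample from a continuous (infinite) population, the variance of the sample mean, the same one that appears in the central limit theorem, is:
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n}$$
The question is then to find the variance $\sigma^{2}$ of the uniform distribution.

Solution: For the uniform distribution, $\mu = 2.5$,
$$\sigma^{2} = E(x^{2}) - \mu^{2} = \int_{0}^{5} 0.2x^{2}\,dx - 2.5^{2} = 8.33 - 6.25 = 2.08$$
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n} = \frac{2.08}{20} \approx 0.104$$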
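The variance of the uniform population and of the sample mean can be computed exactly with fractions, which avoids rounding in the intermediate steps. This is a small sketch with illustrative variable names.

```python
from fractions import Fraction

a, b, n = 0, 5, 20                      # uniform on (a, b), sample size n
density = Fraction(1, b - a)            # 0.2

mu = Fraction(a + b, 2)                 # 5/2
e_x2 = density * Fraction(b**3 - a**3, 3)   # integral of 0.2 x^2 over (0, 5) = 25/3
var_population = e_x2 - mu**2           # 25/12
var_sample_mean = var_population / n    # 5/48

print(var_population, var_sample_mean, float(var_sample_mean))
```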
Example: The number of computer breakdowns per year is an integer: 0, 1, 2, 3, ... Assume the mean number of computer breakdowns per year is 11.6, with a standard deviation of 3.3. Using a normal distribution, approximate the probability that there will be at least 8 (8 or more) breakdowns in a given year, and the probability that the number of breakdowns is between 9 and 15.

Analysis: This is a normal distribution problem. The key to solving it is to convert the random variable to the standard normal distribution.
$$P(x \ge 8) = 1 - F(z) = 1 - F\!\left(\frac{8 - 11.6}{3.3}\right) = 1 - F(-1.09) = 0.8621$$
$$P(9 \le x \le 15) = F\!\left(\frac{15 - 11.6}{3.3}\right) - F\!\left(\frac{9 - 11.6}{3.3}\right) = F(1.03) - F(-0.79) = 0.8485 - 0.2148 = 0.6337$$
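The standardization in the breakdown example can be checked without a table by using the normal CDF from the Python standard library. Like the slide, this sketch applies no continuity correction; small differences from table values come from rounding $z$ to two decimals.

```python
from statistics import NormalDist

breakdowns = NormalDist(mu=11.6, sigma=3.3)

# P(x >= 8) = 1 - F((8 - 11.6) / 3.3)
p_at_least_8 = 1.0 - breakdowns.cdf(8)

# P(9 <= x <= 15) = F((15 - 11.6) / 3.3) - F((9 - 11.6) / 3.3)
p_9_to_15 = breakdowns.cdf(15) - breakdowns.cdf(9)

print(round(p_at_least_8, 3), round(p_9_to_15, 3))
```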
Example: A library loses, on average, 6 books per year. What are the probabilities that it loses
(a) 4 books in a given year?
(b) 10 books over a 2-year period?

Analysis: This is a Poisson process problem with $\lambda = \alpha t$:
$$P(X = x) = \frac{\lambda^{x}}{x!} e^{-\lambda}$$
(a) $\alpha = 6$, $\lambda = 6 \times 1 = 6$. Therefore
$$f(4; 6) = \frac{6^{4} e^{-6}}{4!} = 0.134$$
(b) $\alpha = 6$, $\lambda = 6 \times 2 = 12$. Therefore
$$f(10; 12) = \frac{12^{10} e^{-12}}{10!} = F(10; 12) - F(9; 12) = 0.105$$
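Both Poisson probabilities in the library example can be evaluated directly from the probability function, without the cumulative table. A minimal sketch; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """f(x; lam): probability of exactly x events when the expected count is lam."""
    return lam**x * exp(-lam) / factorial(x)

rate_per_year = 6

p_4_in_one_year = poisson_pmf(4, rate_per_year * 1)     # about 0.134
p_10_in_two_years = poisson_pmf(10, rate_per_year * 2)  # about 0.105

print(round(p_4_in_one_year, 3), round(p_10_in_two_years, 3))
```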
Example: An insurance company is reviewing its current policy rates. When originally setting the rates, they believed that the average claim amount was \$1,800. They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly select 40 claims and calculate a sample mean of \$1,950. Assuming that the standard deviation of claims is \$500, and setting $\alpha = 0.05$, test to see if the insurance company should be concerned.

$H_0$: Average claim amount is less than or equal to \$1,800.
$H_1$: Average claim amount is greater than \$1,800.

Known conditions: $\mu = 1800$, $\bar{x} = 1950$, $\sigma = 500$, $n = 40$, $\alpha = 0.05$ ($z_{\alpha} = 1.645$).

One-sided test: $H_0$ is retained if
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha}$$
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1950 - 1800}{500/\sqrt{40}} = 1.90 > 1.645$$
Therefore $H_0$ is rejected in favor of $H_1$: the company should be concerned.
Example: Trying to encourage people to stop driving to campus, the university claims that on average it takes people 30 minutes to find a parking space on campus. I don't think it takes so long to find a spot. In fact, I have a sample of the last five times I drove to campus, and I calculated $\bar{x} = 20$. Assuming that the time it takes to find a parking spot is normally distributed and that $\sigma = 6$ minutes, perform a hypothesis test with level $\alpha = 0.10$ to see if my claim is correct.

$H_0$: On average it takes 30 minutes to find a parking spot.
$H_1$: It takes less than 30 minutes to find a parking spot.

Known conditions: $\mu = 30$, $\bar{x} = 20$, $\sigma = 6$, $n = 5$, $\alpha = 0.1$ ($z_{\alpha} = 1.28$).

One-sided test: $H_0$ is retained if
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \ge -z_{\alpha}$$
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{20 - 30}{6/\sqrt{5}} = -3.73 < -1.28$$
Therefore $H_0$ is rejected and $H_1$ is accepted.
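The one-sided (lower-tail) test for the parking example can be sketched as follows. The critical value is computed with `NormalDist().inv_cdf` from the Python standard library instead of being read from Table 3; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 30.0, 20.0, 6.0, 5, 0.10

# Test statistic: standardized distance of the sample mean from the claimed mean
z = (x_bar - mu_0) / (sigma / sqrt(n))       # about -3.73

# Critical value z_alpha with upper-tail area alpha (about 1.28)
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Lower-tail test: reject H0 when z falls below -z_alpha
reject_h0 = z < -z_alpha

print(round(z, 2), round(z_alpha, 2), reject_h0)
```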
Example: A sample of 40 sales receipts from a grocery store has $\bar{x} = \$137$ and $\sigma = \$30.2$. Use these values and a level of significance of 0.01 to test whether or not the mean of sales at the grocery store is different from \$150.

$H_0$: The average of sales is \$150.
$H_1$: The average of sales is not \$150.

Known conditions: $\mu = 150$, $\bar{x} = 137$, $\sigma = 30.2$, $n = 40$, $\alpha = 0.01$ ($z_{\alpha/2} = 2.58$).

Two-sided test: $H_0$ is retained if
$$-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$
$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{137 - 150}{30.2/\sqrt{40}} = -2.72 < -2.58$$
Therefore $H_0$ is rejected and $H_1$ is accepted.
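The two-sided test for the grocery example differs from a one-sided test only in splitting $\alpha$ across both tails. A minimal sketch, again computing the critical value with the standard library rather than a table; the variable names are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 150.0, 137.0, 30.2, 40, 0.01

# Test statistic
z = (x_bar - mu_0) / (sigma / sqrt(n))            # about -2.72

# Two-sided critical value z_{alpha/2} (about 2.58)
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 when z lies outside [-z_{alpha/2}, z_{alpha/2}]
reject_h0 = abs(z) > z_half_alpha

print(round(z, 2), round(z_half_alpha, 2), reject_h0)
```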