** STA 1020 - Part 3 (08/Dec/13) ** Exam 3 of 3: Chance and Inference STA 1020 Fall 2013 Section 09 MWF 10:40-11:35 0035 State

MATERIAL FOR EXAM #3

Exam 3 of 3: Chance and Inference
STA 1020, Fall 2013, Section 09, MWF 10:40-11:35, 0035 State
Instructor: Dr. J.L. Menaldi
Textbook: Statistics: Concepts and Controversies, by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: menaldi/teach/13f1020.htm

Statistics is the science of collecting, describing and interpreting data... It is said that probability is the vehicle of statistics, i.e., if it were not for the laws of probability, the theory of statistics would not be possible.

Contents
Quizzes every chapter and then the Third Partial Exam.
* Chapter 17 - Thinking about Chance
* Chapter 18 - Probability Models
* Chapter 19 - Simulation (mostly skipped!)
* Chapter 20 - The House Edge: Expected Values
* Chapter 21 - What is a Confidence Interval?
* Chapter 22 - What is a Test of Significance?
* Chapter 23 - Use and Abuse of Statistical Inference (skipped!)
* Chapter 24 - Two-way Tables and the Chi-Square Test

JLM (WSU), STA 1020

Part 3: Probability. The Theory of Statistics
Chapter 17

Thought Questions...
Here are two very different probability questions:
* If you roll a 6-sided die and do it fairly, what is the probability that it will land with 3 showing?
* What is the probability that in your lifetime you will travel to a foreign country other than one you have already visited?
For which question was it easier to provide a precise answer? Why? For which one could we all agree?
What is wrong with the following partial answer: "The probability that I will eventually travel to another foreign country (or of any other particular event happening) is 1/2, because either it will happen or it won't"?

Two Concepts of Probability
Personal-Probability Interpretation
* The degree to which a given individual believes the event in question will happen
* Personal belief (or personal ignorance about something?)
Relative-Frequency Interpretation
* The proportion of time the event in question occurs over the long run
* Long-run relative frequency
Two ways to determine relative-frequency probabilities:
* Physical assumptions (theoretical mathematical model)
* Repeated observations (empirical results), i.e., by experience with many samples or by simulation

Ex1 Coin tossing
Figure 17.1 Toss a coin many times. The proportion of heads changes as we make more tosses but eventually gets very close to 0.5. This is what we mean when we say, "The probability of a head is one-half."

Ex2 Some coin tossers
* The French naturalist Count Buffon (1707-1788) tossed a coin 4040 times. Result: 2048 heads, or a proportion 2048/4040 = 0.5069 for heads.
* Around 1900, the English statistician Karl Pearson heroically tossed a coin 24,000 times. Result: 12,012 heads, a proportion of 0.5005.
* While imprisoned (WW2), the South African mathematician John Kerrich tossed a coin 10,000 times. Result: 5067 heads, a proportion of 0.5067.

What is called a random phenomenon?
The probability of any outcome of a random phenomenon is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions.
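The three coin-tossing records above can be checked with a few lines of Python. This is only a small sketch: the function name `summarize` and the list `TOSSERS` are illustrative choices, not part of the course material. It reports the proportion of heads and how far the count of heads is from exactly half the tosses.

```python
# Historical coin-tossing results quoted above: (name, tosses, heads).
TOSSERS = [
    ("Buffon", 4040, 2048),
    ("Pearson", 24000, 12012),
    ("Kerrich", 10000, 5067),
]

def summarize(tosses, heads):
    """Return (proportion of heads, heads minus half the number of tosses)."""
    return heads / tosses, heads - tosses / 2

for name, tosses, heads in TOSSERS:
    proportion, excess = summarize(tosses, heads)
    print(f"{name}: proportion = {proportion:.4f}, heads - n/2 = {excess:+.0f}")
```

All three proportions are close to 0.5, even though the raw counts are not exactly half the number of tosses.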

Ex3 Cannot predict
The National Center for Health Statistics says that the proportion of men aged 20 to 24 years who die in any one year is 0.0014. This is taken as the probability that a young man will die next year. For women that age, the probability of death is about 0.0005. If an insurance company sells many policies to people aged 20 to 24, it knows (or believes?) that it will have to pay off next year on about 0.14% (0.05%) of the policies sold on men's (women's) lives. Logically, it will charge more to insure a man because the probability of having to pay is higher. However, we cannot predict whether a particular person will die next year...
Probability answers the question "What would happen if we did this many times?" The idea of probability is that randomness is regular in the long run.

Ex5, 6, 7 and 8
* If we toss a coin 6 times, which of these outcomes is more probable (or looks random): HTHTTH, HTHTHT or TTTHHH (pattern?)
* If a basketball player makes several consecutive shots, both the fans and his teammates believe that he has a "hot hand" and is more likely to make the next shot...
* If a person wins the lotto today, that same person has less chance of winning again next week... Winning the lottery twice?
* Cancer is a common disease, accounting for more than 23% of all deaths in the US. That cancer cases sometimes occur in clusters in the same neighborhood is not surprising: there are bound to be clusters somewhere simply by chance (or not?)
* When a shooter in the dice game craps rolls several winners in a row, some gamblers think she/he has a hot hand and bet that she/he will keep on winning. Others say that the "law of averages" means that she/he must now lose so that wins and losses will balance out...
* Ex8: We want a boy, the law of averages affirms that...

Law of averages / Again...
Law of large numbers: in a large number of independent repetitions of a random phenomenon (such as coin tossing), averages or proportions are likely to become stable as the number of trials increases, contrary to sums or counts...
Figure 17.3 Toss a coin many times. The difference between the observed number of heads and exactly one-half the number of tosses becomes more variable as the number of tosses increases.

Relative-Frequency Probabilities
* Can be applied when the situation can be repeated numerous times (conceptually) and the outcome can be observed each time
* The relative frequency (proportion of occurrences) of an outcome settles down to one value over the long run. That one value (between 0 and 1) is then defined to be the probability of that outcome
* The probability cannot be used to determine whether or not the outcome will occur on a single occasion, or in a single sample (it is a long-run phenomenon)
A personal probability of an outcome is always a number between 0 and 1 that expresses an individual's judgment of how likely the outcome is (the outcome may not be repeatable!).
Two ways: personal judgment of how likely, and what happens in many repetitions.

Ex9 Risk
High exposure to asbestos is dangerous. Low exposure, such as that experienced by teachers and students in schools where asbestos is present in the insulation around pipes, is not very risky. The probability that a teacher who works for 30 years in a school with typical asbestos levels will get cancer from the asbestos is around 15/1,000,000. The risk of dying in a car accident during a lifetime is about 15,000/1,000,000, i.e., 1000 times more risky, but...
Risk and Relative Risk (Case Study)
The following table gives results for whether or not subjects were still smoking when given a nicotine patch or a placebo:

Still smoking?   Yes          No           Total
Nicotine         64 (53.3%)   56 (46.7%)   120 (100%)
Placebo          96 (80%)     24 (20%)     120 (100%)

Risk of continuing to smoke (just the proportion from the table):
* Nicotine: 0.533
* Placebo: 0.800
The relative risk of continuing to smoke when using the placebo patch compared with when using the nicotine patch is 0.800/0.533 = 1.5, i.e., the risk of continuing to smoke when using the placebo patch is 1.5 times the risk when using the nicotine patch.

Cautions about Risk
* What if the baseline risk is missing? The relative risk means relative to what?
* The reported risk is not necessarily your risk. Are the subjects and the setting of the study representative of you and your situation?
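The risk and relative-risk arithmetic from the case study can be written out as a short Python sketch. The helper names `risk` and `relative_risk` are illustrative, not taken from the textbook; the counts come from the table above.

```python
def risk(events, total):
    """Risk = proportion of subjects for whom the event occurred."""
    return events / total

def relative_risk(risk_a, risk_b):
    """How many times larger risk_a is than the baseline risk_b."""
    return risk_a / risk_b

# Counts of subjects still smoking, out of 120 in each group.
nicotine = risk(64, 120)   # about 0.533
placebo = risk(96, 120)    # 0.800
rr = relative_risk(placebo, nicotine)
print(f"nicotine: {nicotine:.3f}, placebo: {placebo:.3f}, relative risk: {rr:.2f}")
```

Note that the relative risk alone (1.5) says nothing about the baseline risk (0.533), which is why both numbers should be reported.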

Exercise Ch17
Marital status. The probability that a randomly chosen 50-year-old woman is divorced is about 0.18. This probability is a long-run proportion based on all the millions of women aged 50. Let's suppose that the proportion stays at 0.18 for the next 30 years. Bridget is now 20 years old and is not married.
(a) Bridget thinks her own chances of being divorced at age 50 are about 5%. Explain why this is a personal probability.
(b) Give some good reasons why Bridget's personal probability might differ from the proportion of all women aged 50 who are divorced.
(c) You are a government official charged with looking into the impact of the Social Security system on middle-aged divorced women. You care only about the probability 0.18, not about anyone's personal probability. Why?

Exercise (answer) Ch17
**Answers
(a) This is based on a personal judgment of her likelihood to get divorced; it is not based on data on repeated trials of an experiment.
(b) For example, Bridget might have strong religious or moral beliefs that make her less inclined to consider divorce.
(c) For the overall impact of divorce, we are concerned with the percentage of all 50-year-old women who are divorced. The probability 0.18 is supported by data, and is known to apply to the whole group.

Multiple choice Ch17
If I toss a fair coin 5,000 times
(a) the number of heads will be close to 2,500.
(b) the proportion of heads will be close to 0.5.
(c) the proportion of heads in these tosses is a parameter.
(d) the proportion of heads will be exactly 50%.
Answer: (b)
There are 2,598,960 possible 5-card hands that can be dealt from an ordinary 52-card deck. Of these, 5,148 have all five cards of the same suit (in poker such hands are called flushes). The probability of being dealt such a hand (assuming randomness) is closest to
(a) 1/4. (b) 1/100. (c) 1/500. (d) 1/1000.
Answer: (c)

Chapter 18

Ex1 Marital Status
Choose a woman aged 25 to 29 years old at random and record her marital status, i.e., an SRS of size n = 1. The probability of any marital status is just the proportion of all women aged 25 to 29 who have that status; if we choose many women, we get
Marital status: Never married, Married, Widowed, Divorced
Probability: […]
To find P(not married), we add P(never married), P(widowed) and P(divorced). Adding P(not married) and P(married) should give 1, so P(not married) is also equal to 1 - P(married).

Avoid Being Inconsistent
Sketching... For instance, the probability of "married with children" must not be greater than the probability that the couple is married.
A probability model for a random phenomenon describes all the possible outcomes and says how to assign probabilities to any collection of outcomes. We sometimes call a collection of outcomes an event.

Probability Rules A-B
These rules tell us only what probability models make sense!
*A* Any probability is a number between 0 and 1
* A probability can be interpreted as the proportion of times that a certain event can be expected to occur
* If the probability of an event were more than 1, then it would occur more than 100% of the time (impossible!)
*B* All possible outcomes together must have probability 1
* Because some outcome must occur on every trial, the sum of the probabilities for all possible outcomes must be exactly one
* If the sum of all of the probabilities is less than one or greater than one, then the resulting probability model will be incoherent

Probability Rules C-D
*C* The probability that an event does not occur is 1 minus the probability that the event does occur
* As a jury member, you assess the probability that the defendant is guilty to be 0.80. Thus you must also believe the probability the defendant is not guilty is 0.20 in order to be coherent (consistent with yourself).
* If the probability that a flight will be on time is 0.70, then the probability it will be late is 0.30
*D* If two events have no outcomes in common, they are said to be mutually exclusive. The probability that one or the other of two mutually exclusive events occurs is the sum of their individual probabilities
Example: Age of woman at first child birth. Given (a) under 20: 25% and (b) 20-24: 33%, find
(1) 24 or younger: Rule D says 25% + 33% = 58%, and
(2) 25+: Rule C says 100% - 58% = 42%

Ex2 Rolling two dice
Figure 18.1 There are 6 possible outcomes for each die, so 36 for two dice.

Ex3 A sampling distribution
Figure 18.2 The sampling distribution of a sample proportion p̂ from SRSs of size 2527 drawn from a population in which 50% of the members would give positive answers. The histogram shows the distribution from 1000 samples.

Ex2 (cont.) Assume carefully made dice, resulting in fair dice (i.e., each outcome is equally likely).
The event "roll a 5" contains four outcomes, 1+4, 2+3, 3+2, 4+1, so that P(roll a 5) = 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 0.111.
Now it's your turn: how about the events "roll a 7" and "roll an 11"?

Ex3 (cont.) The Normal curve is the ideal pattern that describes the results of a very large number of samples, in this case with x̄ = 0.5 and s = 0.01. So the 95 part of the 68-95-99.7 rule says that 95% of all samples will give a p̂ between 0.48 = 0.5 - 0.02 and 0.52 = 0.5 + 0.02.

Ex4 & Ex5 Gambling
An opinion poll asks an SRS of 501 teens, "Generally speaking, do you approve or disapprove of legal gambling or betting?" Suppose exactly 50% of all teens would say yes (i.e., the parameter p = 0.5), and that the sampling distribution approximately follows a normal curve with x̄ = 0.5 and s = 0.022.
Figure 18.3 The Normal sampling distribution. Because 0.478 is one standard deviation below the mean, the area under the curve to the left of 0.478 is 0.16.
Figure 18.4 The Normal sampling distribution. The outcome 0.52 has standard score 0.9, so Table B tells us that the area under the curve to the left of 0.52 is 0.8159.

Sampling distribution
The sampling distribution of a statistic tells us what values the statistic takes in repeated samples from the same population and how often it takes those values. We think of a sampling distribution as assigning probabilities to the values the statistic can take. Because there are usually many possible values, sampling distributions are often described by a density curve such as a normal curve.
A sampling distribution
* Tells what values a statistic (calculated sample value) takes and how often it takes those values in repeated sampling
* Assigns probabilities to the values a statistic can take.
These probabilities must satisfy Rules A-D.
* Probabilities are often assigned to intervals of outcomes by using areas under density curves
* Often this density curve is a normal curve
* Can use the 68-95-99.7 rule or get probabilities from Table B
* Sample proportions (i.e., p̂) follow a normal curve
Check: Case Study Evaluated
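The counting argument of Ex2 (rolling two dice) can be checked by listing all 36 equally likely outcomes. The sketch below assumes fair dice, as the example does; the names `OUTCOMES` and `prob_sum` are illustrative only. It also answers the "your turn" question for the sums 7 and 11.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes for two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob_sum(target):
    """Probability that the two dice add up to `target` (Rule D: add 1/36 per outcome)."""
    favorable = sum(1 for a, b in OUTCOMES if a + b == target)
    return Fraction(favorable, len(OUTCOMES))

for target in (5, 7, 11):
    p = prob_sum(target)
    print(f"P(roll a {target}) = {p} = {float(p):.3f}")

# Rule B: the probabilities of all possible sums (2 through 12) must add to 1.
assert sum(prob_sum(s) for s in range(2, 13)) == 1
```

Exact fractions are used so that Rule B can be verified without rounding error.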

Who Voted?
[World Almanac and Book of Facts (1995), Famighetti, R. (editor), Mahwah, N.J.: Funk and Wagnalls]
56% of registered voters actually voted in the 1992 presidential election. In a random sample of 1600 voters, the proportion who claimed to have voted was 0.58. Such sample proportions (p̂) from repeated sampling would have a normal distribution with a mean of 0.56 and a standard deviation of 0.012. What is the probability of observing a sample proportion (p̂) as large as or larger than 58%?
Solution: If we convert the observed value of 0.58 to a standardized score, we get z = (x - x̄)/s, i.e., (0.58 - 0.56)/0.012 = 1.67. From Table B, this is about the 95th percentile, so the probability of observing a value of 0.58 or smaller is about 0.95. By Rule C (or B), the probability of observing a value as large as or larger than 0.58 is about 1 - 0.95 = 0.05.

Independent Events (Ch19!)
If two events do not influence each other, and if knowledge about one does not help with the knowledge of the probability of the other, the events are said to be independent of each other. If two events are independent, the probability that they both happen is found by multiplying their individual probabilities.
Example: Suppose that about 20% of incoming male freshmen smoke. Suppose that these freshmen are randomly assigned in pairs to dorm rooms. Then the probability of a match (both smokers or both non-smokers) is found as follows:
* both are smokers: (0.20)(0.20) = 0.04
* neither is a smoker: (0.80)(0.80) = 0.64
* both are or neither is a smoker: 0.04 + 0.64 = 0.68
* only one is a smoker: Rule C, 1 - 0.68 = 0.32, i.e., 32%

Exercise Ch18
High school academic rank. Select a first-year college student at random and ask what his or her academic rank was in high school. Here are the probabilities, based on proportions from a large sample survey of first-year students:
Rank: Top 20%, Second 20%, Third 20%, Fourth 20%, Lowest 20%
Probability: […]
(a) What is the sum of these probabilities?
Why do you expect the sum to have this value?
(b) What is the probability that a randomly chosen first-year college student was not in the top 20% of his or her high school class?
(c) What is the probability that a first-year student was in the top 40% in high school?

Exercise (answer) Ch18
**Answers
(a) The sum is 1, as we expect, because all possible outcomes are listed.
(b) 1 - P(top 20%) = […]
(c) P(top 20%) + P(second 20%) = […]

Multiple choice Ch18
1 Choose an American household at random and ask how many computers that household owns. Here are the probabilities as of 2003:
Number of computers: […]
Probability: […]
This is a legitimate assignment of probabilities because it satisfies these rules:
(a) all the probabilities are between 0 and 1.
(b) all the probabilities are between 0% and 100%.
(c) the sum of all the probabilities is exactly 1.
(d) both (a) and (c).
Answer: (d)
2 What is the probability that a randomly chosen household owns more than one computer?
(a) […] (b) […] (c) […] (d) […]
Answer: (b)
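The "Who Voted?" calculation and the roommate example from this chapter can be reproduced numerically. The sketch below is not the textbook's Table B lookup: it computes the Normal area directly with `math.erf`, so the tail probability comes out slightly different from a value read off a coarse table. The function names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def upper_tail(observed, mean, sd):
    """Return (standard score, probability of a value as large or larger)."""
    z = (observed - mean) / sd
    return z, 1 - normal_cdf(z)   # Rule C: 1 minus the area to the left

def match_probability(p_smoker):
    """P(both smokers or both non-smokers) for two independent roommates."""
    return p_smoker ** 2 + (1 - p_smoker) ** 2   # independence, then Rule D

z, tail = upper_tail(0.58, 0.56, 0.012)
print(f"z = {z:.2f}, P(p-hat >= 0.58) is about {tail:.3f}")
print(f"P(match) = {match_probability(0.20):.2f}")
```

The tail area is a little under 0.05, in line with the "about the 95th percentile" reading of the standard score 1.67.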

Chapter 20

Thought Questions...
Expected value is what you logically expect in the long run...
\( \text{expected value} = a_1 p_1 + a_2 p_2 + \cdots + a_n p_n \),
where \( a_i \) is the value (e.g., amount of money) that you expect if outcome i happens, and \( p_i \) is the probability (chance) that outcome i occurs, for i = 1, 2, ..., n.
** Suppose that a sorority pledge class is selling raffle tickets to raise money. The grand prize is a $200 gift certificate to the campus bookstore, and the pledges must sell all 1000 raffle tickets that were printed. How much would you be willing to pay for a single ticket? Explain your answer.

The Main Point...
While we cannot predict individual outcomes, we can estimate what happens (on average, i.e., repeating this over and over) in the long run.

Raffle tickets
Long-Term Gains, Losses and Expectations
Tickets to a sorority fund-raiser sell for $1. One ticket will be randomly chosen, and the ticket owner receives a $200 gift card. They expect to sell 1000 tickets. Your ticket has a 1/1000 = 0.001 probability of winning (and a 0.999 probability of losing).
Two outcomes: (a) You win $200, net gain is $199 (chance: 0.001), or (b) You do not win, net gain is -$1 (chance: 0.999).
Your expected gain (expected value) is ($199)(0.001) + (-$1)(0.999) = -$0.80. Long term, you lose an average of $0.80 each time (conceptually) you enter such a contest (Hey, the sorority needs to make a profit!).

Daily Numbers
A simple lottery wager is the Straight from the Pick 3 game of the Tri-State Daily Numbers. You pay $0.50 and choose a three-digit number, and the state chooses a three-digit winning number at random and pays you $250 if your number is chosen.

Wager           Outcome      Value (a_i)   Probability (p_i)
Straight        lose         $0            0.999
(n = 2)         win          $250          0.001
Straight-Box    lose         $0            0.994
(n = 3)         any order    $42           0.005
                exact        $292          0.001

For the Straight, the average or expected value is ($0)(0.999) + ($250)(0.001) = $0.25.
You may choose to make a $1 Straight-Box (6-way) wager. You again choose a three-digit number, but now you have two ways to win.
You win $292 if you exactly match the winning number, and you win $42 if your number has the same digits as the winning number, but in any order. The expected value is ($0)(0.994) + ($42)(0.005) + ($292)(0.001) = $0.502.
Which one is better in the long run? Ans: If you keep playing
* Straight, you will lose $0.50 - $0.25 = $0.25 per play, i.e., 50% of the wager, and
* Straight-Box, you will lose $1.000 - $0.502 = $0.498 per play, i.e., 49.8% of the wager.

Vehicles
What is the average number of motor vehicles in American households? The Census Bureau tells us that the distribution of vehicles per household (2000 census) is as follows:
Number of vehicles: 0, 1, ..., 5
Proportion: 0.10, 0.34, ..., 0.01
The expected value is (0)(0.10) + (1)(0.34) + ... + (5)(0.01) = […]

Deal or No Deal?
(1) You choose one of four sealed cases; one contains $1,000, and the others are empty. If you open your case, you have a 25% chance to win $1,000 and a 75% chance of getting nothing (winning $0). Or, (2) you can sell your unopened case for $240, giving you a 100% chance of winning $240.
* First option (open your case): EV = ($1000)(0.25) + ($0)(0.75) = $250
* Second option (sell your case): EV = $240, no variation.
** Make a Decision: Will you open or sell your case?

Deal or No Deal? (cont)
Summary:
* Option 1 - a 25% chance to win $1,000 and a 75% chance of getting nothing, EV = $250
* Option 2 - a gift of $240, guaranteed, EV = $240
Analysis
If choosing for ONE trial
* Option (1) will maximize potential gain ($1000) and also minimize potential loss ($0)
* Option (2) guarantees a gain ($240)
If choosing for MANY trials
* Option (1) will maximize expected gain (will make more money in the long run)
How many trials are necessary for the "long run"? 500?

Deal or No Deal? (variation)
Now, a variation: (1) You have a case containing $740 of your money. If you give away your case, you have a 100% chance of losing $740.
Or, (2) you can keep your case and play a game in which you have a 75% chance to lose $1,000 and a 25% chance to lose nothing ($0).
* (1) Give away your case: EV = -$740, no variation, a sure loss of $740
* (2) Play the game: EV = (-$1000)(0.75) + ($0)(0.25) = -$750, a 75% chance to lose $1,000 and a 25% chance to lose nothing
** Make a Decision: Will you play the game or not?
If choosing for ONE trial
* Option (2) offers the best possible outcome (lose $0) but also the worst possible outcome (lose $1000)
* Option (1) guarantees a loss of $740
If choosing for MANY trials
* Option (1) will minimize expected loss (will lose less money in the long run)
How many trials are necessary for the "long run"? 500?
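The expected values in this chapter all use the same formula, so they can be checked with one small helper. This is a minimal sketch: `expected_value` is an illustrative name, and the (value, probability) pairs are the ones quoted in the raffle, Daily Numbers and Deal or No Deal examples.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:   # Rule B: probabilities must sum to 1
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)

raffle = expected_value([(199, 0.001), (-1, 0.999)])                 # net gain per $1 ticket
straight = expected_value([(0, 0.999), (250, 0.001)])                # payoff per $0.50 wager
straight_box = expected_value([(0, 0.994), (42, 0.005), (292, 0.001)])  # payoff per $1 wager
open_case = expected_value([(1000, 0.25), (0, 0.75)])
play_game = expected_value([(-1000, 0.75), (0, 0.25)])

print(f"raffle: {raffle:+.2f}")
print(f"Straight: {straight:.3f}  Straight-Box: {straight_box:.3f}")
print(f"open case: {open_case:.0f}  play game: {play_game:.0f}")
```

Note the two conventions in the examples: the raffle values are net gains (the ticket price is already subtracted), while the lottery values are payoffs, so the cost of the wager must be subtracted afterwards to get the long-run loss.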

The Law of Large Numbers
The actual average (mean) outcome of many independent trials gets closer to the expected value as more trials are made.
* The higher the variability of the trials, the larger the sample needed
* Expected values can be calculated by simulating many repetitions and finding the average of all of the outcomes
The house in a gambling operation is not gambling at all:
* the games are defined so that the gambler has a negative expected gain per play
* each play is independent of previous plays, so the law of large numbers guarantees that the average winnings of a large number of players will be close to the (negative) expected value
State lottos have extremely variable outcomes and also use a pari-mutuel system rather than fixed payoffs; too many trials are necessary...

Ex4 We want a girl (Ch19!)
A couple plan to have children until they have a girl or until they have three children. What is the probability that they will have a girl among their children?
1 The probability model is like that for coin tossing: (a) Each child has probability 0.49 of being a girl and 0.51 of being a boy (yes, more boys than girls are born; boys have higher infant mortality, so the sexes even out soon). (b) The sexes of successive children are independent.
2 Assigning digits is also easy. Two digits simulate the sex of one child. We assign 49 of the 100 pairs to "girl" and the remaining 51 to "boy", i.e., 00, 01, 02, ..., 48 means girl, and 49, 50, 51, ..., 99 means boy.
3 To simulate one repetition of this childbearing strategy, read pairs of digits from Table A until the couple have either a girl or three children. The number of pairs needed to simulate one repetition depends on how quickly the couple get a girl. Here are 10 repetitions, simulated using line 130 of Table A. To interpret the pairs of digits, we have written G for girl and B for boy under them, and have added space to separate repetitions.
4 In these 10 repetitions, a girl was born 9 times.
Our estimate of the probability that this strategy will produce a girl is therefore
estimated probability = 9/10 = 0.9.
Some mathematics shows that, if our probability model is correct, the true probability of having a girl is 0.867. Our simulated answer came quite close. Unless the couple are unlucky, they will succeed in having a girl.

Ex3 We want a girl
Sometimes, expected values may be too difficult to compute and simulation is used. A couple plan to have children until they have a girl or until they have three children, whichever comes first. We find the expected number of children by simulation, using the table of random digits. The probability model says that the sexes of successive children are independent and that each child has probability 0.49 of being a girl. Thus, a pair of digits simulates one child, with 00 to 48 standing for a girl (e.g., begin at line 130):
BG G G G BG G BG BBG BBB G
Mean number of children: x̄ = (2 + 1 + 1 + 1 + 2 + 1 + 2 + 3 + 3 + 1)/10 = 1.7
This simulation is too short to be trustworthy (only 10 repetitions or trials). A deeper mathematical analysis shows that the actual expected value is 1.77.

Controversies
Gambling? Voluntary tax? Arguments for & against.
CHECK: the "Exploring the Web" box at the end of this chapter.

Exercise Ch20
Keno. Keno is a popular game in casinos. Balls numbered 1 to 80 are tumbled in a machine as the bets are placed, then 20 of the balls are chosen at random. Players select numbers by marking a card. Here are two of the simpler Keno bets. Give the expected winnings for each.
(a) A $1 bet on "Mark 1 number" pays $3 if the single number you mark is one of the 20 chosen; otherwise, you lose your dollar.
(b) A $1 bet on "Mark 2 numbers" pays $12 if both your numbers are among the 20 chosen. The probability of this is about 0.06. Is Mark 2 a more or a less favorable bet than Mark 1?

Exercise (answer) Ch20
**Answers
(a) The expected payoff for a Mark 1 bet is ($3)(20/80) + ($0)(60/80) = $0.75.
(b) The expected payoff for a Mark 2 bet is approximately ($12)(0.06) = $0.72, slightly less favorable than a Mark 1 bet. Note: The exact probability of winning a Mark 2 bet is (20/80)(19/79) = 0.0601; with this value, the expected payoff is about $0.7215.
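The "We want a girl" examples (Ex3 and Ex4) use Table A by hand; the same strategy can be simulated with a pseudo-random number generator and compared with the exact values. The sketch below is only an illustration under the stated model (independent births, probability 0.49 of a girl, at most three children); the function names, the seed and the number of repetitions are arbitrary choices, not from the textbook.

```python
import random

P_GIRL = 0.49
MAX_CHILDREN = 3

def exact_results(p=P_GIRL, max_children=MAX_CHILDREN):
    """Exact P(at least one girl) and expected number of children."""
    p_boy = 1 - p
    prob_girl = 1 - p_boy ** max_children          # Rule C: 1 - P(all boys)
    # Stop at child k < max with a girl after k-1 boys; otherwise stop at max.
    expected_children = sum(k * p_boy ** (k - 1) * p for k in range(1, max_children))
    expected_children += max_children * p_boy ** (max_children - 1)
    return prob_girl, expected_children

def simulate(repetitions, seed=0, p=P_GIRL, max_children=MAX_CHILDREN):
    """Estimate the same two quantities by repeating the strategy many times."""
    rng = random.Random(seed)
    girls = 0
    total_children = 0
    for _ in range(repetitions):
        for child in range(1, max_children + 1):
            if rng.random() < p:      # this child is a girl: the couple stop
                girls += 1
                break
        total_children += child       # number of children in this repetition
    return girls / repetitions, total_children / repetitions

print("exact:    ", exact_results())
print("simulated:", simulate(100_000))
```

With many repetitions the simulated values settle near the exact ones (about 0.867 and 1.77), which is the law of large numbers at work; ten repetitions, as in the hand simulation, are far too few.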

Multiple choice Ch20
A basketball player makes 65% of her shots from the field during the season. You want to estimate the expected number of shots made in 10 shots. You simulate 10 shots 25 times and get the following numbers of hits:
[…]
Your estimate is:
(a) 6 out of 10 shots. (b) 6.5 out of 10 shots. (c) 5.6 out of 10 shots. (d) 5.2 out of 10 shots.
Answer: (d)
In government data, a family consists of two or more persons who live together and are related by blood or marriage. Choose an American family at random and count the number of people it contains. Here is the assignment of probabilities for your outcome:
Number of persons: […]
Probability: […]
Using the probabilities above, what is the expected size of the family you draw?
(a) 2 people. (b) 3 people. (c) 3.14 people. (d) 3.50 people.
Answer: (c)

Part 4: Inference - To draw a conclusion from evidence
Chapter 21

Estimating
Statistical inference draws conclusions about a population on the basis of data from a sample. Typical questions are: what is the opinion of people about a particular issue, what is the mean survival time for patients with this type of cancer, or how are people going to vote in the coming election? These questions are about a number (a mean or, in particular, a percentage) that describes the population, to be answered on the basis of a sample. That is, we estimate a parameter on the basis of a statistic, as defined in earlier chapters.
A level C confidence interval (e.g., C = 95%) for a parameter has two parts:
* An interval calculated from the data
* A confidence level (or coefficient) C, which gives the probability that the method produces an interval containing the true parameter value

Thought Questions...
Suppose that 40% of a certain population favor the use of nuclear power for energy.
(a) If you randomly sample 10 people from this population, will exactly four (40%) of them be in favor of the use of nuclear power? Would you be surprised if only two (20%) of them are in favor?
(b) Now suppose you randomly sample 1000 people from this population. Will exactly 400 (40%) of them be in favor of the use of nuclear power? Would you be surprised if only 200 (20%) of them are in favor?
(c) In both cases (a) and (b), how about if none of the sample is in favor?

Thought Questions... (cont)
* What does it mean to say that the interval from 0.07 to 0.11 represents a 95% confidence interval for the proportion of adults in the US who have diabetes?
* Would a 99% confidence interval for the above proportion be wider or narrower than the 95% interval given? What does common sense tell you? Explain.
* In a May 2006 Zogby America poll of 1000 adults, 70% said that past efforts to enforce immigration laws have been inadequate. Based on this poll, a 95% confidence interval for the proportion in the population who feel this way is about 67% to 73%. If this poll had been based on 5000 adults instead, would the 95% confidence interval be wider or narrower than the interval given? Explain.
A 95% confidence interval is an interval calculated from sample data by a process that is guaranteed to capture the true population parameter in 95% of all samples.
Recall
From previous chapters:
* Parameter: fixed, unknown number that describes the population
* Statistic: known value calculated from a sample; a statistic is used to estimate a parameter
* Sampling variability: different samples from the same population may yield different values of the sample statistic; estimates from samples will be closer to the true values in the population if the samples are larger
* Margin of error: in Chapter 3, a quick estimate was given by \( 1/\sqrt{n} \), where n is the sample size
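The quick margin-of-error method recalled above is easy to try on the Zogby poll from the thought questions. The sketch below only applies the 1/sqrt(n) rule; `quick_margin_of_error` is an illustrative name.

```python
from math import sqrt

def quick_margin_of_error(n):
    """Quick method of Chapter 3: margin of error for 95% confidence is about 1/sqrt(n)."""
    return 1 / sqrt(n)

for n in (1000, 5000):
    margin = quick_margin_of_error(n)
    print(f"n = {n}: margin of error is about {100 * margin:.1f} percentage points")
```

For n = 1000 the margin is about 3 percentage points, which matches the quoted interval of 67% to 73% around 70%; for n = 5000 the margin shrinks, so the interval would be narrower.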

More key words
* The amount by which the proportion obtained from the sample (p̂) will differ from the true population proportion (p) rarely exceeds the margin of error
* Sampling distribution: tells what values a statistic takes and how often it takes those values in repeated sampling
* Sample proportions (p̂) from repeated sampling would have a normal distribution with a certain mean and standard deviation

Rule Conditions and Illustration
Take an SRS of size n from a large population that contains proportion p of successes. Let p̂ be the sample proportion of successes, i.e., p̂ = (count of successes in the sample)/n. If the sample size n is large enough, then
* the sampling distribution of p̂ is approximately normal
* the mean of the sampling distribution is p
* the standard deviation of the sampling distribution is \( \sqrt{p(1-p)/n} \)
Figure 21.1 Repeat many times the process of selecting an SRS of size n from a population in which the proportion p are successes. The values of the sample proportion of successes p̂ have this Normal sampling distribution.

Ex1 & Ex2 Binge drinking
We calculate that the sample proportion is 279/2166 = 0.129. If we assume that p = 0.13, then sd = \( \sqrt{(0.13)(0.87)/2166} \) = 0.0072. The 68-95-99.7 rule says that 95% of all samples of that size will yield a p̂ within the interval from p - 2 sd = 0.13 - 0.0144 = 0.1156 to p + 2 sd = 0.13 + 0.0144 = 0.1444.
* Problem: We do not actually know the true proportion p...

Binge drinking
Figure 21.2 Repeat many times the process of selecting an SRS of size 2166 from a population in which the proportion p = 0.13 are successes. The middle 95% of the values of the sample proportion p̂ will lie between 0.1156 and 0.1444.
Figure 21.3 Repeated samples from the same population give different 95% confidence intervals, but 95% of these intervals capture the true population proportion p.
For n sufficiently large, $\hat{p}$ is close to $p$.

Empirical Rule: Formula for a 95% Confidence Interval for the Population Proportion

* Sample proportion plus or minus two standard deviations of the sample proportion: $\hat{p} \pm 2\sqrt{p(1-p)/n}$.
* Since we do not know the population proportion $p$ (needed to calculate the standard deviation), we use the sample proportion $\hat{p}$ in its place: $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$.
* The margin of error is $2\sqrt{\hat{p}(1-\hat{p})/n} \le 1/\sqrt{n}$, the quick method of Chapter 3.
* The formula for a C-level (%) Confidence Interval for the population proportion is $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$, where $z^*$ is the critical value of the standard normal distribution for confidence level C.

Margin of Error

Figure 21.4 Twenty-five samples from the same population give these 95% confidence intervals. In the long run, 95% of all such intervals cover the true population proportion, marked by the vertical line.

Confidence Interval

Figure 21.5 Critical values z* of the Normal distributions. In any Normal distribution, there is area (probability) C under the curve between -z* and z* standard deviations away from the mean.

$\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$

Confidence Level C    Critical Value z*
50%                   0.67
60%                   0.84
68%*                  1*
70%                   1.04
80%                   1.28
90%                   1.64
95%                   1.96
95%*                  2*
99%                   2.58
99.7%*                3*
99.9%                 3.29
(* rounded values used in the Empirical Rule)

Check table z-score.

Ex5 A 99% confidence interval: The SRS of size n = 2166 yields $\hat{p} = 279/2166 = 0.129$, $z^* = 2.58$ and $\sqrt{\hat{p}(1-\hat{p})/n} = 0.0072$, so $0.129 \pm (2.58)(0.0072) = 0.129 \pm 0.0186$, i.e., from 11.04% to 14.76%.
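The confidence-interval formula above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `proportion_ci` is an illustrative assumption, and the critical value is taken from Python's standard `statistics.NormalDist` rather than from the printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a population proportion.

    Hypothetical helper for illustration: p_hat +/- z* sqrt(p_hat(1-p_hat)/n).
    """
    p_hat = successes / n
    # z* leaves central area `level` between -z* and z* under the normal curve.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Ex5 (binge drinking): 279 successes in an SRS of size 2166, 99% confidence.
low, high = proportion_ci(279, 2166, level=0.99)
print(f"99% CI: {low:.4f} to {high:.4f}")
```

Because the code keeps all decimals of $\hat{p}$, the standard deviation and $z^*$, the endpoints come out near 0.110 and 0.147, slightly different from the slide's 11.04% to 14.76%, which were obtained from rounded intermediate values.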

The Rule for Sample Means

The proportion is a particular mean, i.e., if a positive answer is valued 1 and a negative answer is valued 0 then the average value (or mean) is indeed the proportion: $p$ for the whole population and $\hat{p}$ for the SRS. For instance, we may phrase the question as "How strongly do you feel about this particular issue?" and take the answer as a percent, say from 0% to 100%. Analogously, we may ask about something that is measured in some natural unit, and then the answers are numerical values, which are called numeric random variables.

Distribution of the mean

As n becomes large, the law of large numbers says that the average $\bar{x}$ of an SRS approximates the mean $\mu$ of the whole population. The central limit theorem says that the sampling distribution of $\bar{x}$ is approximately normal (if n is large), with mean $\mu$ and standard deviation $sd = \sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the whole population.

Figure 21.6 The sampling distribution of the sample mean $\bar{x}$ of 10 observations compared with the distribution of individual observations.

Some Simulations

Figure 21.7 The distribution of a sample mean $\bar{x}$ becomes more Normal as the size of the sample increases. The distribution of individual observations (n = 1) is far from Normal. The distributions of means of 2, 10, and finally 25 observations move closer to the Normal shape.

Margin of error for the mean

The C-level (%) confidence interval for the population mean $\mu$ is given by either $\bar{x} \pm z^*\,\sigma/\sqrt{n}$ or $\bar{x} \pm z^*\,s/\sqrt{n}$, where $z^*$ is the critical value of the standard normal distribution for confidence level C. If the population standard deviation $\sigma$ is unknown then the sample standard deviation $s$ is used.

* We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm. (We feel that plausible values for the mean resting pulse rate of the population of exercisers are between 62.8 and 69.2.)
This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm.
* Statistically: 95% of all samples of size n = 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the confidence intervals should contain the true population mean.

What is the meaning of Confidence?

* First, calculate the C-level (%) confidence interval from (sample) data with the formula either $p \pm z^*\sqrt{p(1-p)/n}$ or $\mu \pm z^*\,\sigma/\sqrt{n}$, where the unknown population proportion $p$ is replaced by the sample proportion $\hat{p}$, or the population mean $\mu$ and standard deviation $\sigma$ are replaced by the sample mean $\bar{x}$ and standard deviation $s$ (if necessary), and $z^*$ is the critical value of the standard normal distribution for confidence level C.
* Next, a C-level (%) confidence means that the interval (calculated as above) captures the true (population) parameter (either the proportion $p$ or the mean $\mu$) in C% of all samples.
* In other words, e.g., take C = 68%: if you take 100 samples and with each of them you use the above formula to get a confidence interval, then approximately 68 of those samples will give confidence intervals containing the (true) parameter (either proportion or mean, of the whole population), i.e., 68% of the confidence intervals contain the (true) population proportion or mean.
* In short, C is the chance (probability) that the one sample (we took!) yields a confidence interval (calculated as above) containing the parameter.

Inference (Ch23!)

The design of the data production matters. "Where do the data come from?" remains the first question to ask in any statistical study. Any inference method is intended for use in a specific setting. For our confidence interval and test for a proportion p:
1. The data must be a simple random sample (SRS) from the population of interest. When you use these methods, you are acting as if the data are an SRS. In practice, it is often not possible to actually choose an SRS from the population. Your conclusions may then be open to challenge.
2. These methods are not correct for sample designs more complex than an SRS, such as stratified samples. There are other methods that fit these settings.
3. There is no correct method for inference from data haphazardly collected with bias of unknown size. Fancy formulas cannot rescue badly produced data.
4. Other sources of error, such as dropouts and nonresponse, are important. Remember that confidence intervals and tests use the data you give them and ignore these practical difficulties.
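The long-run meaning of confidence described above (about 68 of 100 intervals capture the parameter when C = 68%) can be illustrated by simulation. This is a rough sketch under stated assumptions: the population proportion p = 0.13 and sample size n = 2166 are borrowed from the binge-drinking example, the function name `coverage` and the seed are illustrative, and every sample is an ideal SRS from an effectively infinite population.

```python
import random
from math import sqrt


def coverage(p, n, z_star, trials, seed=0):
    """Fraction of simulated SRSs whose interval p_hat +/- z* sd contains p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one SRS of size n: each individual is a success with chance p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# z* = 1 corresponds to C = 68% in the Empirical Rule.
rate = coverage(p=0.13, n=2166, z_star=1.0, trials=1000)
print(f"share of intervals capturing p: {rate:.3f}")
```

The printed share should be close to 0.68, but it varies from run to run when the seed changes; no single interval reveals whether it is one of the hits.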

Inference (Ch23!) (cont)

Know how confidence intervals behave. A confidence interval estimates the unknown value of a parameter and also tells us how uncertain the estimate is. All confidence intervals share these behaviors:
1. The confidence level says how often the method catches the true parameter in very many uses. We never know whether this specific data set gives us an interval that contains the true parameter. All we can say is that we got this result from a method that works 95% of the time. This data set might be one of the 5% that produce an interval that misses the parameter. If that risk bothers you, use a 99% confidence interval.
2. High confidence is not free. A 99% confidence interval will be wider than a 95% confidence interval based on the same data. There is a trade-off between how closely we can pin down the parameter and how confident we are that we have caught the parameter.
3. Larger samples give narrower intervals. If we want high confidence and a narrow interval, we must take a larger sample. The length of our confidence interval for p shrinks in proportion to $1/\sqrt{n}$, where n is the sample size. To cut the interval in half, we must take four times as many observations. This is typical of many types of confidence interval.

Extra...

INFO: If the population standard deviation $\sigma$ is unknown and the sample size n is small (e.g., $n \le 30$) then the critical values should be obtained from the Student t distribution instead of the normal distribution. Such a value is called a Critical t Value, and the number $n - 1 = df$ is the Degrees of Freedom. This is generally ignored when estimating population proportions (as in this course). The following Table may be needed... ( menaldi/teach/others/sta1020/table 21 1.pdf)

Extra... (cont)

Comment: As mentioned earlier, it is better to take an SRS with size n as large as possible.
Now, which seems better to do with the data of an SRS of size n = 10,000: (a) consider it as what it is, a simple random sample of size n = 10,000, and calculate a 95%-level confidence interval, or (b) re-evaluate and consider your data as 10 SRS of size n = 1000, calculate 95%-level confidence intervals for each of your 10 SRS and then average those confidence intervals to get a final answer?
Discussion: The difference between (a) and (b) is not in collecting a different kind of data; the data are the same. The data are simply arranged in two alternative ways, and comparable calculations are performed.
Questions: What basic argument (theory) is behind each procedure (a) and (b)? When could (b) be better than (a) or (a) be better than (b)? How about 100 SRS of size n = 1,000 or 1000 SRS of size n = 10?

Exercise Ch21

The quick method. The quick method of Chapter 3 (pages 42-43) uses $\hat{p} \pm 1/\sqrt{n}$ as a rough recipe for a 95% confidence interval for a population proportion. The margin of error from the quick method is a bit larger than needed. It differs most from the more accurate method of this chapter when $\hat{p}$ is close to 0 or 1. An SRS of 500 motorcycle registrations finds that 68 of the motorcycles are Harley-Davidsons. Give a 95% confidence interval for the proportion of all motorcycles that are Harleys by the quick method and then by the method of this chapter. How much larger is the quick-method margin of error?

Exercise (answer) Ch21

**Answers The quick method. By the quick method, the margin of error is $1/\sqrt{n}$, i.e., $1/\sqrt{500} = 0.0447$. Because $\hat{p} = 68/500 = 0.136$, the margin of error from the method of this chapter is $z^*\sqrt{\hat{p}(1-\hat{p})/n}$, i.e., $2\sqrt{(0.136)(0.864)/500} = 0.0307$ or $1.96\sqrt{(0.136)(0.864)/500} = 0.0300$. The quick-method margin of error is nearly 1.5 times larger than necessary.

Multiple choice Ch21

A recent Gallup Poll asked, "Do you consider the amount of federal income tax you have to pay as too high, about right, or too low?" 52% of the sample answered "Too high."
Gallup says that: "For results based on the sample of national adults (n=1,021) surveyed April 6-9, 2008, the margin of sampling error is 3 percentage points."

1. The poll was carried out by telephone, so people without phones are always excluded from the sample. Any errors in the final result due to excluding people without phones
(a) are included in the announced margin of error.
(b) are in addition to the announced margin of error.
(c) can be ignored, because these people are not part of the population.
(d) can be ignored, because this is a nonsampling error.
Answer: (b)

2. If Gallup had used an SRS of size n = 1021 and obtained the sample proportion $\hat{p} = 0.52$, you can calculate that the margin of error for 95% confidence would be
(a) ±1.6 percentage points.
(b) ±0.05 percentage points.
(c) ±3.0 percentage points.
(d) ±3.1 percentage points.
Answer: (d)
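The two margin-of-error recipes compared in the exercise and in the Gallup question can be put side by side in a few lines. This is only a sketch; the helper names `quick_margin` and `accurate_margin` are illustrative assumptions, and z* = 1.96 is used for 95% confidence.

```python
from math import sqrt


def quick_margin(n):
    """Quick method of Chapter 3: 1/sqrt(n)."""
    return 1 / sqrt(n)


def accurate_margin(p_hat, n, z_star=1.96):
    """Method of this chapter: z* sqrt(p_hat(1-p_hat)/n)."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


# Harley-Davidson exercise: 68 of 500 registrations.
harley_quick = quick_margin(500)
harley_accurate = accurate_margin(68 / 500, 500)
print(f"quick {harley_quick:.4f}, accurate {harley_accurate:.4f}, "
      f"ratio {harley_quick / harley_accurate:.2f}")

# Gallup question 2: p_hat = 0.52 from an SRS of size 1021.
gallup = accurate_margin(0.52, 1021)
print(f"Gallup margin: {100 * gallup:.1f} percentage points")
```

The ratio of the two Harley margins comes out close to 1.5, and the Gallup margin rounds to 3.1 percentage points, in agreement with the answers given above.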

Chapter 22

Previously...

Matched pairs Experiment:... Each of the 50 subjects tastes two unmarked cups of coffee and says which he or she prefers. One cup in each pair contains instant coffee and the other, fresh-brewed coffee. We find that 36 of our 50 subjects choose the fresh coffee, i.e., $\hat{p} = 36/50 = 0.72$.
The formula for a C-level (%) Confidence Interval for the population proportion is $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$, where $z^*$ is the critical value of the standard normal distribution for confidence level C, see Table 21.1.
At the 99%-level we find $z^* = 2.58$ and the Margin of Error is $\pm z^*\sqrt{\hat{p}(1-\hat{p})/n} = \pm(2.58)\sqrt{(0.72)(1-0.72)/50} = \pm 0.164$ and the Confidence Interval is from $0.72 - 0.164 = 0.56$ to $0.72 + 0.164 = 0.88$ (at 95% we find $z^* = 1.96$, so MoE = ±0.124 and CI = [0.60, 0.84]).
What is the rational argument for accepting or rejecting the claim that the population proportion is p = 0.5? What is the probability that the confidence interval [0.56, 0.88] captures the true population proportion p?

Ex1 Is the coffee fresh?

The claim. The skeptic claims that coffee drinkers cannot tell fresh from instant, so that only half will choose fresh-brewed coffee, i.e., the population proportion p is only 0.5. If this claim is true, the sampling distribution of $\hat{p}$ is approx. normal with mean p = 0.5 and $sd = \sqrt{p(1-p)/n} = \sqrt{(0.5)(0.5)/50} = 0.0707$.
The data. In our SRS we got $\hat{p} = 0.72$, i.e., 72%, but in another SRS we could find $\hat{p} = 0.56$, i.e., 56%, or any other value! Do we have evidence against the claim?
The Probability. We can measure the strength of the evidence against the claim by a probability, i.e., what is the probability that a sample gives $\hat{p}$ this large or larger if the truth about the population is that p = 0.5?

Ex1 Sampling distribution

Figure 22.2 The sampling distribution of the proportion of 50 coffee drinkers who prefer fresh-brewed coffee if the truth about all coffee drinkers is that 50% prefer fresh coffee. The shaded area is the probability that the sample proportion is 56% or greater.

Ex1 Is the coffee fresh? (cont)

Our sample actually gave $\hat{p} = 0.72$, and the probability of getting a sample outcome this large (or larger) is only 0.001, i.e., this may happen just by chance 1 out of 1000 times. We may declare this good evidence that the claim is false. If our $\hat{p}$ were equal to 0.56 then our probability would be 0.2, i.e., 2 out of 10 times, which is not really evidence to reject the claim.
Be sure to understand why this is convincing evidence.
There are two possible explanations of the fact that 72% of our subjects prefer fresh to instant coffee:
* The skeptic is correct (p = 0.5), and by bad luck a very unlikely outcome occurred.
* In fact, the population proportion is greater than 0.5 (p > 0.5), so that the outcome is about what would be expected.

Thought Questions...

The defendant in a court case is either guilty or innocent. Which of these is assumed to be true when the case begins? The jury looks at the evidence presented and makes a decision about which of these two options appears more plausible. Depending on this decision, what are the two types of errors that could be made by the jury? Which is more serious?

Suppose 60% (0.60) of the population are in favor of new tax legislation. A random sample of 265 people results in 175, or 66%, who are in favor. From the Rule for Sample Proportions, we know the potential sample proportions in this situation follow an approximately normal distribution, with a mean of 0.60 and a standard deviation of 0.03. Find the standard score for the observed value of 0.66; then find the probability of observing a standard score at least that large or larger.

Thought Questions... (cont)

Sampling distribution of $\hat{p}$: mean = 0.60, standard deviation = 0.03; observed $\hat{p} = 0.66$; $z = (0.66 - 0.60)/0.03 = 2$; P-value ≈ 0.023.

Suppose that in the previous question we do not know for sure that the proportion of the population who favor the new tax legislation is 60%. Instead, this is just the claim of a politician. From the data collected, we have discovered that if the claim is true, then the sample proportion observed falls at about the 98th percentile of possible sample proportions for that sample size. Should we believe the claim and conclude that we just observed strange data, or should we reject the claim? What if the result fell at the 85th percentile? At the 99.99th percentile?

Hypotheses and P-values

* A test of significance begins by supposing that the effect we seek is not present. Then we look for statistical evidence against this supposition and in favor of the effect we hope to find.
* The claim being tested in a statistical test is called the null hypothesis $H_0$. The test is designed to assess the strength of the evidence against the null hypothesis. Usually, the null hypothesis is a statement of no effect or no difference, which is translated into a statement about the proportion $p$ (or the mean $\mu$, or standard deviation $\sigma$) of an entire population.
* What we hope or suspect is true instead of $H_0$ is called the alternative hypothesis $H_a$.
* The probability, computed assuming that $H_0$ is true, that the SRS outcome would be as extreme or more extreme than the actual observed outcome is called the P-value of the test. The smaller the P-value is, the stronger is the evidence against $H_0$.
* Typical examples are ($H_0: p = p_0$) and either ($H_a: p \ne p_0$) or ($H_a: p > p_0$) or ($H_a: p < p_0$) for the alternative hypothesis.
Ex2 Count Buffon's coin

For instance, in Ex1, we used ($H_0: p = 0.5$) with ($H_a: p > 0.5$), because we have discarded the possibility ($H_a: p < 0.5$) a priori.
In Ex2, the French naturalist Count Buffon tossed a coin 4040 times and got 2048 heads, i.e., the sample proportion is $\hat{p} = 2048/4040 = 0.507$. We ask: Is this evidence that Buffon's coin was not balanced? We translate this into a null hypothesis ($H_0: p = 0.5$) and the alternative hypothesis ($H_a: p \ne 0.5$). If the null hypothesis is true then p = 0.5 and the standard deviation of the sample proportion is $sd = \sqrt{p(1-p)/n} = \sqrt{(0.5)(0.5)/4040} = 0.00787$.
** Now, for $\hat{p} = 0.507$ we get a P-value of 0.37, i.e., a truly balanced coin would give a result this far or farther from 0.5 in 37% of all repetitions of Buffon's trial. This test gives no reason to think that his coin was not balanced.

Count Buffon's coin (cont)

Figure 22.3 The sampling distribution of the proportion of heads in 4040 tosses of a balanced coin. Count Buffon's result, 0.507 proportion heads, is marked.

Figure 22.4 The P-value for testing whether Count Buffon's coin was balanced. This is the probability, calculated assuming a balanced coin, of a sample proportion as far or farther from 0.5 as Buffon's result of 0.507.

The P-value or observed significance level of a test of hypotheses is the smallest value of the significance level α for which $H_0$ (the null hypothesis) can be rejected. The P-value measures the strength of evidence against $H_0$.

Ex3 Testing Coffee

If the P-value is as small or smaller than α, we say that the data are statistically significant at the level α. "Significant" in the statistical sense does not mean "important"; it means "not likely to happen just by chance". Use a table (and your logic) to find the P-value.
For Ex3 (Testing Coffee) the null hypothesis is ($H_0: p = 0.5$) and the alternative hypothesis is ($H_a: p > 0.5$). If the null hypothesis is true then $\hat{p}$ follows (approx.) a Normal distribution with mean 0.5 and standard deviation 0.0707. The data yield $\hat{p} = 0.72$, which gives a standard score $z = (0.72 - 0.5)/0.0707 = 3.1$, and the table (check table here!) gives a P-value of 0.001. Since the P-value is small, these data provide very strong evidence that a majority of the population prefers fresh coffee.

P-values

* When the alternative hypothesis includes a greater than symbol ($H_a: p > p_0$), the P-value is the probability of getting a value as large or larger than the observed test statistic (z) value: look up the percentile for the value of z in the standard normal table (Table B); the P-value is 1 minus this probability.
* When the alternative hypothesis includes a less than symbol ($H_a: p < p_0$), the P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value: look up the percentile for the value of z in the standard normal table (Table B); the P-value is this probability.
* When the alternative hypothesis includes a not equal to symbol ($H_a: p \ne p_0$), the P-value is found as follows: make the value of the observed test statistic (z) positive (absolute value), look up the percentile for this positive value of z in the standard normal table (Table B), find 1 minus this probability, and double the answer to get the P-value.
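The three table look-up rules above translate directly into code. The following is a hedged sketch, with illustrative function names (`p_value`, `z_for_proportion`) that do not come from the slides; it uses the exact normal curve from `statistics.NormalDist` instead of Table B, so the last digits can differ from the table-based values.

```python
from math import sqrt
from statistics import NormalDist


def p_value(z, alternative):
    """P-value of a z statistic; alternative is 'greater', 'less' or 'two-sided'."""
    cdf = NormalDist().cdf  # percentile (as a probability) of a standard score
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "less":
        return cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def z_for_proportion(p_hat, p0, n):
    """Standard score of p_hat when the null hypothesis p = p0 is true."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Testing coffee: 36 of 50 prefer fresh, H0: p = 0.5, Ha: p > 0.5.
z_coffee = z_for_proportion(36 / 50, 0.5, 50)
p_coffee = p_value(z_coffee, "greater")

# Buffon's coin: 2048 heads in 4040 tosses, H0: p = 0.5, two-sided alternative.
z_buffon = z_for_proportion(2048 / 4040, 0.5, 4040)
p_buffon = p_value(z_buffon, "two-sided")

print(f"coffee: z = {z_coffee:.2f}, P = {p_coffee:.4f}")
print(f"Buffon: z = {z_buffon:.2f}, P = {p_buffon:.2f}")
```

For the coffee data the P-value is about 0.001, and for Buffon's coin it is a little under 0.38, in line with the 0.37 obtained from the rounded table.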

P-values (alt)

Alternative Method for P-value
1. Make the value of the observed test statistic (z) negative.
2. Look up the percentile for this negative value of z in the standard normal table (Table B).
Now:
* If the alternative hypothesis includes a greater than ($H_a: p > p_0$) or less than ($H_a: p < p_0$) symbol, the P-value is the probability found as the percentile in step 2.
* If the alternative hypothesis includes a not equal to ($H_a: p \ne p_0$) symbol, double the probability found as the percentile in step 2 to get the P-value.
There are other ways; use your logic and the fact that the total area under any distribution must be equal to 1.

Ex4 Checkbook

The National Assessment of Adult Literacy (NAAL) survey indicates that a score of 289 or higher on its quantitative test reflects skills that include those needed to balance a checkbook. An SRS of size n = 2001 of young men (aged 19 to 24) had mean score $\bar{x} = 279$, with a standard deviation s = 103.
The pessimist's claim is that the mean NAAL score is less than 289. That is our alternative hypothesis (why not $H_0$?), the statement we seek evidence for. Thus ($H_0: \mu = 289$) and ($H_a: \mu < 289$).
If the null hypothesis is true, $\mu = 289$, then the sample mean $\bar{x}$ follows (approx.) a Normal distribution with mean $\mu = 289$ and standard deviation $\sigma/\sqrt{n}$. Approximating the unknown $\sigma$ with s, we find $s/\sqrt{n} = 103/\sqrt{2001} = 2.3$.
The data gave $\bar{x} = 279$, which yields a standard score $z = (279 - 289)/2.3 = -4.35$, and so the P-value is approximately 0.000007, very small.
Hence, our conclusion is to reject the null hypothesis, i.e., these data give strong evidence that the mean score for all young men (aged 19 to 24) is below the level that includes the skills necessary to balance a checkbook.
Checkbook (cont)

Figure 22.5 The P-value is approximately 0.000007, for a one-sided test when the standard score for the sample mean is -4.35.

Ex5: Executives' blood pressures: n = 72, $\bar{x} = 126.1$, s = 15.2, ($H_0: \mu = 128$) with ($H_a: \mu \ne 128$), $s/\sqrt{n} = 15.2/\sqrt{72} = 1.79$. The P-value is 0.289, for a two-sided test when the standard score for the sample mean is $(126.1 - 128)/1.79 = -1.06$.

Procedure

The Five Steps of Hypothesis Testing
1. Determining the Two Hypotheses
2. Computing the Sampling Distribution
3. Collecting and Summarizing the Data (calculating the observed test statistic)
4. Determining How Unlikely the Test Statistic is if the Null Hypothesis is True (calculating the P-value)
5. Making a Decision/Conclusion (based on the P-value, is the result statistically significant?)

Possible Null Hypothesis $H_0$: population parameter equals some value, status quo, no relationship, no change, no difference in two groups, etc. The logical Alternative Hypothesis $H_a$ is NOT $H_0$.
Now it's your turn... Read Case Study evaluated.

Procedure (cont)

* Null: ($H_0: p = p_0$)
* Alternative, one-sided: ($H_a: p > p_0$) or ($H_a: p < p_0$), where the other possibility is ruled out beforehand
* Alternative, two-sided: ($H_a: p \ne p_0$)

Sampling Distribution for Proportions: If numerous simple random samples of size n are taken, the sample proportions $\hat{p}$ from the various samples will have an approximately normal distribution with mean equal to $p$ (the population proportion) and standard deviation equal to $sd = \sqrt{p(1-p)/n}$. Since we assume the null hypothesis is true, we replace $p$ with $p_0$ to complete the test.
To determine if the observed proportion is unlikely to have occurred under the assumption that $H_0$ is true, we must first convert the observed value to a standard score $z = (\hat{p} - p_0)/sd$.
We find the P-value associated with the standard score obtained from the data.
Decision

* If we think the P-value is too low to believe the observed test statistic is obtained by chance only, then we reject chance (reject the null hypothesis) and conclude that a statistically significant relationship exists (accept the alternative hypothesis).
* Otherwise, we fail to reject chance and do not reject the null hypothesis of no relationship (result not statistically significant).
* Commonly, P-values less than 0.05 are considered to be small enough to reject chance (reject the null hypothesis). However, some researchers use 0.10 or 0.01 as the cut-off instead of 0.05. This cut-off value is typically referred to as the significance level α of the test.
* The P-value is not the probability that the null hypothesis is true: it is the probability, computed assuming the null hypothesis is true, of an outcome at least as extreme as the one observed. Because our objective is to find evidence against the null hypothesis, small P-values are desired.
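The same procedure, with a mean in place of a proportion, reproduces the numbers of Ex4 (checkbook) and Ex5 (executives' blood pressures). This is a minimal sketch with illustrative function names; as in the slides, the unknown σ is approximated by s and the normal curve is used (not the Student t distribution).

```python
from math import sqrt
from statistics import NormalDist


def z_for_mean(x_bar, mu0, s, n):
    """Standard score of the sample mean when the null hypothesis mu = mu0 is true."""
    return (x_bar - mu0) / (s / sqrt(n))


def p_value(z, alternative):
    """P-value for 'greater', 'less' or 'two-sided' alternatives."""
    cdf = NormalDist().cdf
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "less":
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))  # two-sided


# Ex4 Checkbook: n = 2001, x_bar = 279, s = 103, H0: mu = 289, Ha: mu < 289.
z_check = z_for_mean(279, 289, 103, 2001)
p_check = p_value(z_check, "less")

# Ex5 Executives: n = 72, x_bar = 126.1, s = 15.2, H0: mu = 128, two-sided.
z_exec = z_for_mean(126.1, 128, 15.2, 72)
p_exec = p_value(z_exec, "two-sided")

alpha = 0.05  # significance level chosen before looking at the data
for name, z, p in [("checkbook", z_check, p_check), ("executives", z_exec, p_exec)]:
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{name}: z = {z:.2f}, P = {p:.6f}, {decision}")
```

With unrounded intermediate values the checkbook score is about -4.34 rather than -4.35; the conclusions match the slides: reject the null hypothesis in Ex4 and fail to reject it in Ex5.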

A Survey

Parental Discipline: Nationwide random telephone survey of 1,250 adults, of whom 474 respondents had children under 18 living at home. The results on behavior are based on the smaller sample; the reported margin of error is 3% for the full sample and 5% for the smaller sample.
"The 1994 survey marks the first time a majority of parents reported not having physically disciplined their children in the previous year. Figures over the past six years show a steady decline in physical punishment, from a peak of 64 percent."
The 1994 sample proportion who did not spank or hit was 51%. Question: Is this evidence that a majority of the population did not spank or hit?

* Null: The proportion of parents who physically disciplined their children in the previous year is the same as the proportion p of parents who did not physically discipline their children, i.e., ($H_0: p = 0.5$)
* Alt: A majority of parents did not physically discipline their children in the previous year, i.e., ($H_a: p > 0.5$)

A Survey (cont)

Based on the sample:
* Sample size n = 474 (large, so proportions follow a normal distribution)
* No physical discipline: 51%, so $\hat{p} = 0.51$
* The s.d. of $\hat{p}$ is $\sqrt{(0.50)(1-0.50)/474} = 0.023$ (recall we assume $H_0: p = 0.5$ is true)
* Standard score: $z = (0.51 - 0.50)/0.023 = 0.43$
* Table B: the nearest entry (0.4) is the 65.54th percentile, so the P-value is $1 - 0.6554 = 0.3446$

Since the P-value (0.3446) is not small, we cannot reject chance as the reason for the difference between the observed proportion (0.51) and the (null) hypothesized proportion (0.50). We do not find the result to be statistically significant at α = 0.01 (or even 0.05 or 0.10).
We fail to reject the null hypothesis. It is plausible that there was not a majority (over 50%) of parents who refrained from using physical discipline.
Errors

Hypothesis Testing: Significance level (α) and Power (1 - β)

Decision      H0 is correct        H0 is incorrect
Reject H0     Type I Error (α)     Correct (1 - β)
Accept H0     Correct (1 - α)      Type II Error (β)

Type I: we decide there is a relationship in the population (reject the null hypothesis).
* This is an incorrect decision only if the null hypothesis is true.
* The probability of this incorrect decision is equal to the cut-off α for the P-value.
Type II: we decide not to reject chance and thus allow for the plausibility of the null hypothesis (complicated to estimate!).
* This is an incorrect decision only if the alternative hypothesis is true.
* The probability of this incorrect decision depends on (a) the magnitude of the true relationship, (b) the sample size, (c) the cut-off for the P-value.

Mean

The population proportion $p$ could be replaced by a population mean $\mu$ when setting up the two hypotheses:
* Null: ($H_0: \mu = \mu_0$)
* Alternative, one-sided: ($H_a: \mu > \mu_0$) or ($H_a: \mu < \mu_0$), where the other possibility is ruled out beforehand
* Alternative, two-sided: ($H_a: \mu \ne \mu_0$)
As before, if numerous simple random samples of size n are taken, the sample means from the various samples will have an approximately normal distribution with mean equal to $\mu$ (the population mean) and standard deviation equal to $sd = \sigma/\sqrt{n}$. Here we approximate the population standard deviation $\sigma$ with the sample standard deviation $s$ (note the factor $1/\sqrt{n}$ between $s$ and the standard deviation $sd$ of the sampling distribution of the sample means).

Tomato plants

A study showed that the difference in sample means for the heights of tomato plants when using a nutrient-rich potting soil versus using ordinary top soil was 6.82 inches. The corresponding standard deviation (of the sampling distribution of the mean difference) was 3.10 inches. Suppose the means are actually equal, so that the mean difference in heights for the populations is actually zero.
* What is the standard score (z) corresponding to the observed difference of 6.82 inches?
* How often would you expect to see a standardized score that large or larger?
[standard score]: [(sample mean diff.) - (population mean diff.)] divided by [standard deviation of the mean difference], i.e., $z = (6.82 - 0)/3.10 = 2.2$. This is the 98.61st percentile for a standard normal curve, so the probability of seeing a z-value this large or larger is 1.39% (i.e., $1 - 0.9861$).

Bacteria

One of the conclusions made by researchers from a study comparing the amount of bacteria in carpeted and uncarpeted rooms was, "The average difference [in mean bacteria colonies per cubic foot] was 3.48 colonies [95% Confidence Interval: between (-2.72) and (9.68), and P-value: (0.29)]."
* What are the null and alternative hypotheses being tested here?
* Is there a statistically significant difference between the means of the two groups?
$H_0$: The mean number of bacteria for carpeted rooms is equal to the mean number of bacteria for uncarpeted rooms.
$H_a$: The mean number of bacteria for carpeted rooms is different from the mean number of bacteria for uncarpeted rooms.
The P-value is large (> 0.05), so there is not a significant difference (fail to reject the null hypothesis). (Note that the confidence interval for the difference contains 0.)
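The statement that the probability of a Type I error equals the cut-off α can be illustrated by simulating many studies in which the null hypothesis is true. This is only a sketch under assumptions that are not in the slides: a normal population with known σ = 1, samples of size n = 25, a two-sided test, and an arbitrary seed; the function name `type_one_error_rate` is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=25, trials=4000, seed=1):
    """Share of simulated samples that reject H0 although H0 is true."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # the population really has mean mu0, so H0 holds
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cut-off for |z|
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= z_crit:  # equivalent to P-value <= alpha
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"rejected a true H0 in {100 * rate:.1f}% of the simulated samples")
```

The observed share should be near 5%. Estimating the Type II error rate would require simulating from a specific alternative, which is why the slides call it complicated to estimate.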

Inference (Ch23!)

Know what statistical significance says. Many statistical studies hope to show that some claim is true. A clinical trial compares a new drug with a standard drug because the doctors hope that patients given the new drug will do better. A psychologist studying gender differences suspects that women will do better than men (on the average) on a test that measures social-networking skills. The purpose of significance tests is to weigh the evidence that the data give in favor of such claims. That is, a test helps us know if we found what we were looking for.
To do this, we ask what would happen if the claim were not true. That's the null hypothesis (no difference between the two drugs, no difference between women and men). A significance test answers only one question: How strong is the evidence that the null hypothesis is not true? A test answers this question by giving a P-value. The P-value tells us how unlikely data as or more extreme than ours (in the sense of providing evidence against the null hypothesis) would be if the null hypothesis were true. Data that are very unlikely are good evidence that the null hypothesis is not true. We usually don't know whether the hypothesis is true for this specific population. All we can say is that data as or more extreme than these would occur only 5% of the time if the hypothesis were true.
This kind of indirect evidence against the null hypothesis (and for the effect we hope to find) is less straightforward than a confidence interval.

Know what your methods require. The significance test and confidence interval for a proportion p require that the population be much larger than the sample. They also require that the sample itself be reasonably large so that the sampling distribution of the sample proportion $\hat{p}$ is close to Normal. We have said little about the specifics of these requirements because the reasoning of inference is more important.
Just as there are inference methods that fit stratified samples, there are methods that fit small samples and small populations.

INFO: Sometimes the alternative hypothesis $H_a$ is denoted by $H_1$.

Example: Finding the Sample Size Required to Achieve 80% Power. Here is a statement similar to one in an article from the Journal of the American Medical Association: "The trial design assumed that with a 0.05 significance level, 153 randomly selected subjects would be needed to achieve 80% power to detect a reduction in the coronary heart disease rate from 0.5 to 0.4." Before conducting the experiment, the researchers selected a significance level of 0.05 and a power of at least 80%. They also decided that a reduction in the proportion of coronary heart disease from 0.5 to 0.4 is an important difference that they want to detect (by correctly rejecting the false null hypothesis). Using a significance level of 0.05, power 0.80, and the alternative proportion of 0.4, we deduce that the required minimum sample size is 153. Related to the power of a test: check Wikipedia.

Exercise (Ch22)

Do chemists have more girls? Some people think that chemists are more likely than other parents to have female children. (Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.) The Washington State Department of Health lists the parents' occupations on birth certificates. Between 1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls. During this period, 48.8% of all births in Washington State were girls. Is there evidence that the proportion of girls born to chemists is higher than the state proportion?

Exercise (answer) Ch22

**Answers. Do chemists have more girls? Our hypotheses are $H_0: p = 0.488$ and $H_a: p > 0.488$, where p is the proportion of girls among children born to chemists.
If the null hypothesis is true, then the proportion of girls in an SRS of n = 555 chemists' children would have (approximately) a normal distribution with mean $p = p_0 = 0.488$ and standard deviation $\sqrt{p(1-p)/n} = \sqrt{(0.488)(0.512)/555} = 0.02122$. Our sample had $\hat{p} = 273/555 = 0.4919$, for which the standard score is $z = (\hat{p} - p_0)/0.02122 = (0.4919 - 0.488)/0.02122 = 0.18$. From Table B [find z = 0.2 (57.93%) and z = 0.1 (53.98%), so take the percentile 57.93% and 1 − 0.5793 = 0.4207], we estimate the P-value to be about 0.42 (a calculator or a finer table gives a more precise value). Thus, we cannot reject the null hypothesis (not enough evidence!).

Multiple choice Ch22

If the value of the standard test statistic z is 2.5, then
(a) we should use a different null hypothesis.
(b) we reject the null hypothesis at the 5% significance level.
(c) we fail to reject the null hypothesis at the 5% significance level.
(d) we reject the alternative hypothesis at the 5% significance level.
Answer: (b)

If a significance test gives a P-value of 0.50, then
(a) the margin of error is ...
(b) the null hypothesis is very likely to be true.
(c) we do not have good evidence against the null hypothesis.
(d) we do have good evidence against the null hypothesis.
Answer: (c)
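The two Ch22 calculations above (the chemists test and the sample size for 80% power) can be sketched in Python. The function names are hypothetical. The sample-size function uses a common normal-approximation formula for a one-sided, one-proportion test; the slides do not say which formula the researchers used, so that choice is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve (mean 0, sd 1)


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)        # sd of p-hat if H0 is true
    z = (p_hat - p0) / sd
    p_value = 1 - STD_NORMAL.cdf(z)     # area to the right of z
    return p_hat, sd, z, p_value


def sample_size_for_power(p0, p1, alpha, power):
    """Smallest n for a one-sided test (assumed normal-approximation formula)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Chemists exercise: 273 girls among 555 births, state proportion 0.488.
p_hat, sd, z, p_value = one_proportion_z_test(273, 555, 0.488)
print(f"p-hat = {p_hat:.4f}, sd = {sd:.5f}, z = {z:.2f}, P-value = {p_value:.3f}")

# Power example: detect a drop from 0.5 to 0.4 at alpha = 0.05 with 80% power.
n_needed = sample_size_for_power(0.5, 0.4, alpha=0.05, power=0.80)
print(f"required sample size = {n_needed}")
```

Under these assumptions the sample-size formula comes out at 153, the figure quoted in the example, and the P-value is slightly larger than the table estimate of 0.42 because z is not rounded up to 0.2.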

Chapter 24: Two-way Tables (Ex1 & Ex2)

A university offers only two degree programs, one in electrical engineering and one in English.

            Male   Female   Total
Admit         35       20      55
Deny          45       40      85
Total         80       60     140

* Admission status is the row variable.
* Gender is the column variable.
* (% male) = 80/140 = 0.57, i.e., 57%
* (% female) = 60/140 = 0.43, i.e., 43%

To describe relationships among categorical variables, calculate appropriate percentages from the counts given.

Discrimination in admission? Because there are only two categories of admission status, we can see the relation between gender and admission status by comparing
(percentage of male applicants admitted) = 35/80 = 0.44, i.e., 44%
(percentage of female applicants admitted) = 20/60 = 0.33, i.e., 33%

Thought Questions...

A random sample of registered voters were asked whether they preferred balancing the budget or cutting taxes. Each was then categorized as being either a Democrat or a Republican. Of the 30 Democrats, 12 preferred cutting taxes, while of the 40 Republicans, 24 preferred cutting taxes. How would you display the data in a table?

                         Democrats   Republicans   Total
Prefer tax cutting              12            24      36
Do not prefer tax cutting       18            16      34
Total                           30            40      70

When there are two categorical variables, the data are summarized in a two-way table:
* each row represents a value of the row variable
* each column represents a value of the column variable
* The number of observations falling into each combination of categories is entered into each cell of the table.
* Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table (this prevents misleading comparisons due to unequal sample sizes for different groups).

Ex3: Treating cocaine addiction

A three-year study compared an antidepressant (desipramine) with lithium and a placebo.
72 subjects were randomly divided into 3 groups (each having 24 subjects), and each group was assigned to one treatment.

Group   Treatment     Subjects   Successes   Percent
1       Desipramine         24          14     58.3%
2       Lithium             24           6     25.0%
3       Placebo             24           4     16.7%

Are these data good evidence that there is a relationship between treatment and outcome in the population of all cocaine addicts? To answer this question we begin with a two-way table.

              Success   Failure   Total
Desipramine        14        10      24
Lithium             6        18      24
Placebo             4        20      24
Total              24        48      72

[Figure 24.1: Bar graph comparing the success rates of three treatments for cocaine addiction.]

Chi-square test

Our null hypothesis takes the form
$H_0$: There is no association between treatment and success in the population of all cocaine addicts.
In a two-way table, when $H_0$ is true we compute
(expected count) = (row total)(column total) / (table total)
e.g., the expected count of successes in the desipramine group is (24)(24)/72 = 8. That is, if the null hypothesis of no treatment differences is true, then we expect 8 of the 24 desipramine subjects to succeed.

The chi-square statistic, denoted by $\chi^2$, is a measure of how far the observed counts in a two-way table are from the expected counts:
$\chi^2 = \sum \frac{[(\text{observed count}) - (\text{expected count})]^2}{(\text{expected count})}$
where $\sum$ means "sum over all cells in the table".

Ex4: Cocaine addiction (cont)

Here are the observed and expected counts.

                  Observed              Expected
              Success   Failure     Success   Failure
Desipramine        14        10           8        16
Lithium             6        18           8        16
Placebo             4        20           8        16

To find the chi-square statistic, add 6 terms for the 6 cells in the two-way table (note that all failure values are obtainable from the success values):
$\chi^2 = \frac{(14-8)^2}{8} + \frac{(10-16)^2}{16} + \frac{(6-8)^2}{8} + \frac{(18-16)^2}{16} + \frac{(4-8)^2}{8} + \frac{(20-16)^2}{16} = 4.50 + 2.25 + 0.50 + 0.25 + 2.00 + 1.00 = 10.50$

Now it's your turn: Smoking and survival...
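The expected counts and the chi-square statistic for the cocaine study can be reproduced with a short Python sketch. The dictionary layout and variable names are illustrative only. The lithium counts (6 successes, 18 failures) are inferred from the group size of 24 and the expected success count of 8 per group, so treat them as a reconstruction.

```python
# Observed (success, failure) counts for each treatment group of 24 subjects.
observed = {
    "Desipramine": (14, 10),
    "Lithium": (6, 18),    # inferred: 24 total successes minus 14 and 4
    "Placebo": (4, 20),
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
table_total = sum(row_totals.values())

expected = {}
chi_square = 0.0
for name, counts in observed.items():
    # expected count = (row total)(column total) / (table total)
    expected[name] = tuple(
        row_totals[name] * col_totals[j] / table_total for j in range(2)
    )
    # add (observed - expected)^2 / expected for each cell in this row
    for obs, exp in zip(counts, expected[name]):
        chi_square += (obs - exp) ** 2 / exp

print(expected)     # every row has expected counts (8.0, 16.0)
print(chi_square)   # 10.5
```

Because every group has the same size, every row has the same expected counts; with unequal group sizes the same formula would give different expected counts per row.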

Chi-square distribution

The chi-square statistic is a measure of the distance of the observed counts from the expected counts.
* It is always zero or positive, and its distribution is skewed to the right.
* It is zero only when the observed counts are exactly equal to the expected counts.
* Large values of $\chi^2$ are evidence against $H_0$, because these would show that the observed counts are far from what would be expected if $H_0$ were true.
* The chi-square test is one-sided (any violation of $H_0$ produces a large value of $\chi^2$).

To pick a specific $\chi^2$ distribution we need to know the degrees of freedom (df for short), computed as $(r-1)(c-1)$ for a two-way table with r rows and c columns.

[Figure 24.2: The density curves for three members of the chi-square family of distributions. The sampling distributions of chi-square statistics belong to this family.]

Chi-square table

** In a two-way table, df = $(r-1)(c-1)$.
** There are $r \cdot c$ terms in the sum $\sum$, where r = (number of rows) and c = (number of columns).

Ex5: Using the chi-square test

Back to Ex3: the two-way table has 3 treatments and 2 outcomes, i.e., it has r = 3 rows and c = 2 columns. Thus, the chi-square statistic has (3 − 1)(2 − 1) = 2 degrees of freedom. In Ex4 we found $\chi^2 = 10.5$, so we look in Table 24.1 for df = 2 to find the critical value 9.21 required for significance at the α = 0.01 level, and 13.82 for α = 0.001. Hence, the cocaine study shows a significant relationship (P < 0.01) between treatment and success.

Conclusion: We found strong evidence of some association between treatment and success, and by looking at the two-way table, we see that desipramine works better than the other treatments.
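The table lookup in Ex5 can be cross-checked without a chi-square table, but only because df = 2 here. For 2 degrees of freedom (and only then) the upper-tail area of the chi-square distribution has the closed form exp(−x/2). The sketch below relies on that special case; for other df, statistical software or the table would be needed. Variable names are illustrative.

```python
from math import exp, log

rows, cols = 3, 2                 # 3 treatments, 2 outcomes
df = (rows - 1) * (cols - 1)      # degrees of freedom for a two-way table

chi_square = 10.5                 # statistic found for the cocaine study

# Special case df = 2: P(chi-square >= x) = exp(-x / 2).
p_value = exp(-chi_square / 2)

# Inverting the same formula gives the critical value for a given alpha.
critical_01 = -2 * log(0.01)      # about 9.21
critical_001 = -2 * log(0.001)    # about 13.82

print(f"df = {df}")
print(f"P-value = {p_value:.4f}")
print(f"critical values: {critical_01:.2f} (alpha = 0.01), {critical_001:.2f} (alpha = 0.001)")
```

The statistic 10.5 falls between the two critical values, which matches the conclusion that P is below 0.01 but not below 0.001.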
NOTE: You can safely use the chi-square test when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater.

Ex6: ... heart disease?

People who get angry easily tend to have more heart disease (CHD = coronary heart disease).

                     Anger score
               Low   Moderate    High
Sample size    ...        ...     633
CHD count      ...        ...     ...
CHD percent   1.7%       2.3%    4.3%

The first step is to write the data as a two-way table, by adding the count of subjects who did not suffer from heart disease.

            Low   Moderate   High   Total
CHD         ...        ...    ...     190
No CHD      ...        ...    ...    8284
Total       ...        ...    633    8474

The chi-square method tests these hypotheses:
$H_0$: no relationship between anger and CHD
$H_a$: some relationship between anger and CHD
There are r = 2 rows and c = 3 columns, so df = (2 − 1)(3 − 1) = 2.

Ex6: ... heart disease? (cont)

Find the expected cell counts; e.g., the expected count of high-anger people with CHD is
(expected count) = (row 1 total)(column 3 total) / (table total) = (190)(633)/8474 = 14.19
The observed and expected counts (rows: CHD, No CHD; columns: Low, Moderate, High) are then compared cell by cell.

** It is safe to apply the chi-square test since all expected cell counts are greater than 5, so $\chi^2$ is the sum of $(\text{observed} - \text{expected})^2 / \text{expected}$ over the six cells.
** For df = 2 in Table 24.1, the $\chi^2$ statistic is larger than the critical value 13.82 for α = 0.001. We have highly significant evidence (P < 0.001) that anger and heart disease are related. Statistical software can give the actual P-value.

Ex7: Discrimination in admissions?

** The effects of lurking variables can change and even reverse the relationship between two variables.

Go back to Ex1 and suppose we suspect discrimination against women. From the two-way table we found
(percentage of male applicants admitted) = 35/80 = 0.44, i.e., 44%
(percentage of female applicants admitted) = 20/60 = 0.33, i.e., 33%
In its defense, the University produces a three-way table.

             Engineering        English          Combined
            Male   Female    Male   Female    Male   Female
Admit        ...      ...     ...      ...      35       20
Deny         ...      ...     ...      ...      45       40
Total        ...      ...     ...      ...      80       60
% Admit      50%     100%     25%      25%     44%      33%

Simpson's paradox

** Simpson's paradox: An association or comparison that holds for all of several groups can disappear or even reverse direction when the data are combined to form a single group. This is just an extreme form of the fact that observed associations can be misleading when there are lurking variables...

** Summary:
* Make a two-way table to display the relationship between two categorical variables.
* Conclude by using the P-value (critical value) of the chi-square statistic.
* Read Ex9: Discrimination in mortgage lending?
* Case Study Evaluated
* Chi-square Table and Tables 21.1 & ...

Exercise Ch24

Smoking by students and their families. How are the smoking habits of students related to the smoking habits of their close family members? Here is a two-way table from a survey of male students in six secondary schools in Malaysia:

                                            Student smokes   Student does not smoke
At least one close family member smokes                115                      207
No close family member smokes                           25                       75

Write a brief answer to the question posed, including a comparison of selected percentages.

Exercise (answer) Ch24

**Answers. The table below shows the percent of male students who smoke within each category of family-member smoking status.

At least one close family member smokes   115/322 = 35.7%
No close family member smokes              25/100 = 25%

In our sample, male students with at least one close family member who smokes are more likely to smoke than are male students with no close family member who smokes.

Multiple choice Ch24

Which of these is an example of Simpson's paradox?
(a) Teachers' salaries and sales of alcoholic beverages have risen together over time, but paying teachers more does not cause higher alcohol sales.
(b) Alaska Air has a lower percent of late flights than America West at every airport, but America West has a lower percent when we combine all airports.
(c) The percent of surgery patients given Anesthetic A who die is higher than the percent for Anesthetic B, but this is because A is used in more serious surgeries.
(d) States in which a smaller percent of students take the SAT exam have higher median scores on the SAT.
Answer: (b)

If surgical procedure A has a higher success rate than surgical procedure B in every hospital where they are used, and yet procedure B has a higher overall success rate, then we suspect that:
(a) this is an example of Simpson's paradox.
(b) it must be easier to achieve success at some hospitals than at others, whatever procedure is used.
(c) procedure B must be used predominantly in hospitals where it is easier to achieve success, while procedure A must be used predominantly where it is harder to achieve success.
(d) All of (a), (b), and (c) are true.
Answer: (d)
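The surgical-procedure question can be made concrete with a small Python sketch. The counts below are hypothetical, invented purely to illustrate the reversal; they do not come from the textbook or from any study. Procedure A is mostly used at the hospital where success is harder, which is exactly the situation described in option (c).

```python
# HYPOTHETICAL (successes, patients) counts, chosen only to show the reversal.
counts = {
    "Hospital 1 (easier cases)": {"A": (9, 10), "B": (80, 100)},
    "Hospital 2 (harder cases)": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, patients):
    """Success rate as a fraction between 0 and 1."""
    return successes / patients


# Within each hospital, compare the two procedures.
a_wins_everywhere = all(
    rate(*hospital["A"]) > rate(*hospital["B"]) for hospital in counts.values()
)

# Combine the hospitals into a single group and compare again.
overall = {}
for procedure in ("A", "B"):
    successes = sum(h[procedure][0] for h in counts.values())
    patients = sum(h[procedure][1] for h in counts.values())
    overall[procedure] = rate(successes, patients)

b_wins_overall = overall["B"] > overall["A"]

print("A better in every hospital:", a_wins_everywhere)   # True
print("B better overall:", b_wins_overall)                # True
print({name: round(value, 3) for name, value in overall.items()})
```

With these made-up numbers, A succeeds more often in each hospital (90% vs 80%, 30% vs 20%), yet B has the higher combined rate because most of its patients are at the easier hospital.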

Lecture 13. Understanding Probability and Long-Term Expectations

Lecture 13. Understanding Probability and Long-Term Expectations Lecture 13 Understanding Probability and Long-Term Expectations Thinking Challenge What s the probability of getting a head on the toss of a single fair coin? Use a scale from 0 (no way) to 1 (sure thing).

More information

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

More information

Unit 19: Probability Models

Unit 19: Probability Models Unit 19: Probability Models Summary of Video Probability is the language of uncertainty. Using statistics, we can better predict the outcomes of random phenomena over the long term from the very complex,

More information

The Math. P (x) = 5! = 1 2 3 4 5 = 120.

The Math. P (x) = 5! = 1 2 3 4 5 = 120. The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

More information

Lotto Master Formula (v1.3) The Formula Used By Lottery Winners

Lotto Master Formula (v1.3) The Formula Used By Lottery Winners Lotto Master Formula (v.) The Formula Used By Lottery Winners I. Introduction This book is designed to provide you with all of the knowledge that you will need to be a consistent winner in your local lottery

More information

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS 1. If two events (both with probability greater than 0) are mutually exclusive, then: A. They also must be independent. B. They also could

More information

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010 MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

More information

1) The table lists the smoking habits of a group of college students. Answer: 0.218

1) The table lists the smoking habits of a group of college students. Answer: 0.218 FINAL EXAM REVIEW Name ) The table lists the smoking habits of a group of college students. Sex Non-smoker Regular Smoker Heavy Smoker Total Man 5 52 5 92 Woman 8 2 2 220 Total 22 2 If a student is chosen

More information

Chapter 5 Section 2 day 1 2014f.notebook. November 17, 2014. Honors Statistics

Chapter 5 Section 2 day 1 2014f.notebook. November 17, 2014. Honors Statistics Chapter 5 Section 2 day 1 2014f.notebook November 17, 2014 Honors Statistics Monday November 17, 2014 1 1. Welcome to class Daily Agenda 2. Please find folder and take your seat. 3. Review Homework C5#3

More information

Lab 11. Simulations. The Concept

Lab 11. Simulations. The Concept Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Ch. 4 Discrete Probability Distributions 4.1 Probability Distributions 1 Decide if a Random Variable is Discrete or Continuous 1) State whether the variable is discrete or continuous. The number of cups

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

X X AP Statistics Solutions to Packet 7 X Random Variables Discrete and Continuous Random Variables Means and Variances of Random Variables

X X AP Statistics Solutions to Packet 7 X Random Variables Discrete and Continuous Random Variables Means and Variances of Random Variables AP Statistics Solutions to Packet 7 Random Variables Discrete and Continuous Random Variables Means and Variances of Random Variables HW #44, 3, 6 8, 3 7 7. THREE CHILDREN A couple plans to have three

More information

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10 4:00 (GB 220) and Wednesdays 2:10 4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto

More information

Mind on Statistics. Chapter 8

Mind on Statistics. Chapter 8 Mind on Statistics Chapter 8 Sections 8.1-8.2 Questions 1 to 4: For each situation, decide if the random variable described is a discrete random variable or a continuous random variable. 1. Random variable

More information

AP Stats - Probability Review

AP Stats - Probability Review AP Stats - Probability Review Multiple Choice Identify the choice that best completes the statement or answers the question. 1. I toss a penny and observe whether it lands heads up or tails up. Suppose

More information

6.042/18.062J Mathematics for Computer Science. Expected Value I

6.042/18.062J Mathematics for Computer Science. Expected Value I 6.42/8.62J Mathematics for Computer Science Srini Devadas and Eric Lehman May 3, 25 Lecture otes Expected Value I The expectation or expected value of a random variable is a single number that tells you

More information

Elementary Statistics and Inference. Elementary Statistics and Inference. 17 Expected Value and Standard Error. 22S:025 or 7P:025.

Elementary Statistics and Inference. Elementary Statistics and Inference. 17 Expected Value and Standard Error. 22S:025 or 7P:025. Elementary Statistics and Inference S:05 or 7P:05 Lecture Elementary Statistics and Inference S:05 or 7P:05 Chapter 7 A. The Expected Value In a chance process (probability experiment) the outcomes of

More information

Week 5: Expected value and Betting systems

Week 5: Expected value and Betting systems Week 5: Expected value and Betting systems Random variable A random variable represents a measurement in a random experiment. We usually denote random variable with capital letter X, Y,. If S is the sample

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Chapter 4 & 5 practice set. The actual exam is not multiple choice nor does it contain like questions.

Chapter 4 & 5 practice set. The actual exam is not multiple choice nor does it contain like questions. Chapter 4 & 5 practice set. The actual exam is not multiple choice nor does it contain like questions. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

More information

Introduction to Discrete Probability. Terminology. Probability definition. 22c:19, section 6.x Hantao Zhang

Introduction to Discrete Probability. Terminology. Probability definition. 22c:19, section 6.x Hantao Zhang Introduction to Discrete Probability 22c:19, section 6.x Hantao Zhang 1 Terminology Experiment A repeatable procedure that yields one of a given set of outcomes Rolling a die, for example Sample space

More information

$2 4 40 + ( $1) = 40

$2 4 40 + ( $1) = 40 THE EXPECTED VALUE FOR THE SUM OF THE DRAWS In the game of Keno there are 80 balls, numbered 1 through 80. On each play, the casino chooses 20 balls at random without replacement. Suppose you bet on the

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015.

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015. Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups

More information

Chapter 16: law of averages

Chapter 16: law of averages Chapter 16: law of averages Context................................................................... 2 Law of averages 3 Coin tossing experiment......................................................

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000 Lecture 4 Nancy Pfenning Stats 000 Chapter 7: Probability Last time we established some basic definitions and rules of probability: Rule : P (A C ) = P (A). Rule 2: In general, the probability of one event

More information

Chapter 20: chance error in sampling

Chapter 20: chance error in sampling Chapter 20: chance error in sampling Context 2 Overview................................................................ 3 Population and parameter..................................................... 4

More information

Chapter 16. Law of averages. Chance. Example 1: rolling two dice Sum of draws. Setting up a. Example 2: American roulette. Summary.

Chapter 16. Law of averages. Chance. Example 1: rolling two dice Sum of draws. Setting up a. Example 2: American roulette. Summary. Overview Box Part V Variability The Averages Box We will look at various chance : Tossing coins, rolling, playing Sampling voters We will use something called s to analyze these. Box s help to translate

More information

Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2 Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

More information

Expected Value and the Game of Craps

Expected Value and the Game of Craps Expected Value and the Game of Craps Blake Thornton Craps is a gambling game found in most casinos based on rolling two six sided dice. Most players who walk into a casino and try to play craps for the

More information

Statistics 100A Homework 3 Solutions

Statistics 100A Homework 3 Solutions Chapter Statistics 00A Homework Solutions Ryan Rosario. Two balls are chosen randomly from an urn containing 8 white, black, and orange balls. Suppose that we win $ for each black ball selected and we

More information

John Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text

John Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text Law of Averages - pg. 294 Moore s Text When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So, if the coin is tossed a large number of times, the number of heads and the

More information

Statistics and Random Variables. Math 425 Introduction to Probability Lecture 14. Finite valued Random Variables. Expectation defined

Statistics and Random Variables. Math 425 Introduction to Probability Lecture 14. Finite valued Random Variables. Expectation defined Expectation Statistics and Random Variables Math 425 Introduction to Probability Lecture 4 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan February 9, 2009 When a large

More information

Testing Hypotheses About Proportions

Testing Hypotheses About Proportions Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

More information

Chapter 8 Section 1. Homework A

Chapter 8 Section 1. Homework A Chapter 8 Section 1 Homework A 8.7 Can we use the large-sample confidence interval? In each of the following circumstances state whether you would use the large-sample confidence interval. The variable

More information

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty AMS 5 PROBABILITY Basic Probability Probability: The part of Mathematics devoted to quantify uncertainty Frequency Theory Bayesian Theory Game: Playing Backgammon. The chance of getting (6,6) is 1/36.

More information

Chapter 7 Probability. Example of a random circumstance. Random Circumstance. What does probability mean?? Goals in this chapter

Chapter 7 Probability. Example of a random circumstance. Random Circumstance. What does probability mean?? Goals in this chapter Homework (due Wed, Oct 27) Chapter 7: #17, 27, 28 Announcements: Midterm exams keys on web. (For a few hours the answer to MC#1 was incorrect on Version A.) No grade disputes now. Will have a chance to

More information

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values. MA 5 Lecture 4 - Expected Values Friday, February 2, 24. Objectives: Introduce expected values.. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the

More information

V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPECTED VALUE

V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPECTED VALUE V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPETED VALUE A game of chance featured at an amusement park is played as follows: You pay $ to play. A penny and a nickel are flipped. You win $ if either

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

36 Odds, Expected Value, and Conditional Probability

36 Odds, Expected Value, and Conditional Probability 36 Odds, Expected Value, and Conditional Probability What s the difference between probabilities and odds? To answer this question, let s consider a game that involves rolling a die. If one gets the face

More information

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

More information

Probability and Expected Value

Probability and Expected Value Probability and Expected Value This handout provides an introduction to probability and expected value. Some of you may already be familiar with some of these topics. Probability and expected value are

More information

Solution. Solution. (a) Sum of probabilities = 1 (Verify) (b) (see graph) Chapter 4 (Sections 4.3-4.4) Homework Solutions. Section 4.
