R Simulations: Monty Hall problem

Ying Sun
SAMSI Undergraduate Workshop
February 24, 2011
1 Monte Carlo Simulations
2 Monty Hall Problem
3 Statistical Analysis
4 Simulation in R
5 Exercise 1: A Gift Giving Puzzle
6 Exercise 2: Gambling Problem
Monte Carlo Simulations

What is Monte Carlo simulation: A problem-solving technique used to approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables.

Why use simulations:
- Some situations do not lend themselves to precise mathematical treatment.
- Others may be difficult or time-consuming to analyze.
- Simulations may approximate real-world results, yet require less time and effort.
Conduct A Simulation

1 Describe the possible outcomes.
2 Link each outcome to one or more random numbers.
3 Choose a source of random numbers.
4 Choose a random number.
5 Based on the random number, note the simulated outcome. Repeat steps 4 and 5 multiple times; preferably, until the outcomes show a stable pattern.
6 Analyze the simulated outcomes and report results.
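The steps above can be sketched in R on a small example that is not part of the original slides: estimating the probability that a fair six-sided die shows a six. The die, the seed, and the variable names (`n.sim`, `rolls`, `p.hat`, `running`) are assumptions chosen only for illustration.

```r
# Illustrative sketch (assumed example): estimate P(a fair die shows 6).
set.seed(1)                        # fix the seed so the run is reproducible

n.sim <- 10000                     # how many times steps 4 and 5 are repeated

# Steps 1-3: the outcomes are the faces 1..6, each linked to one integer
# drawn from R's pseudo-random number generator through sample().
# Steps 4-5: draw a random number and record the outcome, n.sim times.
rolls <- sample(1:6, n.sim, replace = TRUE)

# Step 6: analyze the simulated outcomes.
p.hat   <- mean(rolls == 6)                      # overall proportion of sixes
running <- cumsum(rolls == 6) / seq_len(n.sim)   # estimate after each roll
p.hat
```

Plotting `running` against the roll index is one way to judge whether the outcomes have settled into a stable pattern.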
Switch or Not Switch

In September of 1990 a reader of Marilyn vos Savant's Sunday Parade column wrote in and asked the following question:

"Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the other doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to take the switch?"
Monty Hall Problem

This problem was given the name "The Monty Hall Paradox" in honor of the long-time host of the television game show "Let's Make a Deal". Articles about the controversy appeared in the New York Times and other papers around the country.

Marilyn's answer was that the contestant should switch doors, and she received nearly 10,000 responses from readers, most of them disagreeing with her. Several were from mathematicians and scientists whose responses ranged from hostility to disappointment at the nation's lack of mathematical skills. They assumed that each remaining door has an equal probability of hiding the car and concluded that switching does not matter.
Conditional Probability

Suppose the player chooses door No. 1. Let
$A_1$: Door No. 1 has the car,
$A_2$: Door No. 2 has the car,
$A_3$: Door No. 3 has the car,
$O$: Host opens door No. 3.

$$P(A_1) = P(A_2) = P(A_3) = \frac{1}{3}$$

If door No. 1 has the car, the host could open door No. 2 or 3: $P(O \mid A_1) = \frac{1}{2}$
If door No. 2 has the car, the host must open door No. 3: $P(O \mid A_2) = 1$
If door No. 3 has the car, the host cannot open door No. 3: $P(O \mid A_3) = 0$
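Bayes Theorem

Bayes' theorem relates the conditional and marginal probabilities of events.

$$P(A_1 \mid O) = \frac{P(O \mid A_1)\,P(A_1)}{\sum_{i=1}^{3} P(O \mid A_i)\,P(A_i)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{6} + \frac{1}{3} + 0} = \frac{1}{3}$$

$$P(A_2 \mid O) = \frac{P(O \mid A_2)\,P(A_2)}{\sum_{i=1}^{3} P(O \mid A_i)\,P(A_i)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{6} + \frac{1}{3} + 0} = \frac{2}{3}$$

The player chooses door No. 1, so $p_1 = P(\text{switch and win}) = \frac{2}{3}$ and $p_2 = P(\text{not switch and win}) = \frac{1}{3}$.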
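The same calculation can be checked numerically. The sketch below is an illustration added here, not code from the slides; the vector names `prior`, `lik`, and `posterior` are assumptions. It applies Bayes' theorem to the three hypotheses, given that the player chose door No. 1 and the host opened door No. 3.

```r
# Prior: the car is equally likely to be behind each door.
prior <- c(A1 = 1/3, A2 = 1/3, A3 = 1/3)

# Likelihood of O (host opens door No. 3) under each hypothesis,
# given that the player has chosen door No. 1.
lik <- c(A1 = 1/2, A2 = 1, A3 = 0)

# Bayes' theorem: posterior = likelihood * prior / total probability of O.
posterior <- lik * prior / sum(lik * prior)
posterior
```

The entry for `A2` is the probability of winning by switching, and the entry for `A1` is the probability of winning by staying.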
Simulation in R

Suppose the player plays this game n = 10 times.

Which door has the car:

    > car=sample(3,10,replace=TRUE)
    > car
    [1] 1 3 2 3 2 2 3 3 2 1

Which door is chosen:

    > door=sample(3,10,replace=TRUE)
    > door
    [1] 3 3 2 2 1 3 3 1 1 2

Switch and win: the car is not behind the chosen door. In that case the host must open the only other goat door, so switching leads to the car.

    > switchwin=(door!=car)
    > switchwin
    [1] TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
    > sum(switchwin)/10
    [1] 0.7

Not switch and win: the car is behind the chosen door.

    > noswitchwin=(door==car)
    > noswitchwin
    [1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
    > sum(noswitchwin)/10
    [1] 0.3
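The simulation above never models the host explicitly. As a check, the sketch below (an addition, not from the slides) simulates the host's move and confirms that switching wins exactly when the first choice misses the car. The names `open` and `switched`, the seed, and the rule that the host picks at random when two goat doors are available are assumptions.

```r
set.seed(2011)
n    <- 10000
car  <- sample(3, n, replace = TRUE)   # door hiding the car
door <- sample(3, n, replace = TRUE)   # player's first choice

# The host opens a door that is neither the player's choice nor the car.
# Assumption: when two doors qualify, he picks one of them at random.
open <- integer(n)
for (i in seq_len(n)) {
  goats <- setdiff(1:3, c(door[i], car[i]))
  open[i] <- if (length(goats) == 1) goats else sample(goats, 1)
}

# Door numbers sum to 6, so the door that is neither chosen nor opened is:
switched <- 6 - door - open

mean(switched == car)                        # estimated P(switch and win)
all((switched == car) == (door != car))      # agrees with the shortcut
```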
The Law of Large Numbers

The law of large numbers describes the result of performing the same experiment a large number of times.
- The average of the results obtained from a large number of trials should be close to the expected value.
- It will tend to become closer as more trials are performed.

Increase the number of trials n:

$$\hat{p}_1 = \frac{\#\text{ of switch and win}}{n} \to p_1 = \frac{2}{3}, \qquad \hat{p}_2 = \frac{\#\text{ of not switch and win}}{n} \to p_2 = \frac{1}{3}.$$
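To see the law of large numbers at work, one can track the estimate of p1 as the number of games grows. This is a minimal sketch added for illustration; the seed, the value of `n`, and the name `p1.running` are assumptions.

```r
set.seed(24)
n    <- 100000
car  <- sample(3, n, replace = TRUE)   # door hiding the car
door <- sample(3, n, replace = TRUE)   # player's first choice

switchwin <- (door != car)             # switching wins in these games

# Running estimate of p1 after 1, 2, ..., n games.
p1.running <- cumsum(switchwin) / seq_len(n)

# Estimates at a few sample sizes; they should settle near 2/3.
p1.running[c(10, 100, 1000, 10000, 100000)]
```

A call such as `plot(p1.running, type = "l", log = "x")` followed by `abline(h = 2/3)` shows the estimate settling toward 2/3.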
Uncertainties

Bootstrap Method:
- It allows one to estimate the sampling distribution of the estimators, $\hat{p}_1$ and $\hat{p}_2$ (statistics).
- We can construct confidence intervals for $p_1$ and $p_2$ (parameters).

R function gameshow(n) returns $\hat{p}_1$ and $\hat{p}_2$ when the player plays the game n times.

95% bootstrap confidence intervals:

    B=1000
    prob=NULL
    n=1000
    for (i in 1:B){
      prob=rbind(prob,gameshow(n))
    }
    p1hat=prob[,1]
    p2hat=prob[,2]
    quantile(p1hat,c(0.025,0.975))
    quantile(p2hat,c(0.025,0.975))
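The slides use gameshow(n) without showing its body. The sketch below is a hypothetical implementation, assuming it simply wraps the earlier car/door simulation and returns the two proportions; the real workshop function may differ. It also uses `replicate()` instead of growing `prob` with `rbind()`, which is a substitution made here for tidiness.

```r
# Hypothetical gameshow(): the slides name this function but not its body.
gameshow <- function(n) {
  car  <- sample(3, n, replace = TRUE)   # door hiding the car
  door <- sample(3, n, replace = TRUE)   # player's first choice
  c(p1hat = mean(door != car),           # proportion of wins when switching
    p2hat = mean(door == car))           # proportion of wins when staying
}

set.seed(1)
B <- 1000    # number of repeated experiments
n <- 1000    # games per experiment

# replicate() returns a 2 x B matrix; transpose to B rows, 2 columns.
prob <- t(replicate(B, gameshow(n)))

ci1 <- quantile(prob[, 1], c(0.025, 0.975))   # interval for p1
ci2 <- quantile(prob[, 2], c(0.025, 0.975))   # interval for p2
ci1
ci2
```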
Normal Approximation

Sampling distribution: if $\min\{np, n(1-p)\} \ge 10$ and n is large,

$$\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right),$$

where the second argument is the variance.

Check the histograms and boxplots of $\hat{p}_1$ for different n.

    boxplot(p1hat,ylim=c(0.4,1))
    hist(p1hat,freq=FALSE,xlim=c(0.4,1))

If we know the sampling distribution in theory, we need to estimate p only once and can use the normal approximation to construct a confidence interval.

95% confidence interval for $p_1$:

$$\left(\hat{p}_1 - 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n}},\ \hat{p}_1 + 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n}}\right).$$
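The normal-approximation interval can be computed from a single run of n games. This is a minimal sketch added for illustration; the seed and the names `se` and `ci` are assumptions, and `p1hat` here is a single estimate rather than the vector of B estimates used for the plots.

```r
set.seed(3)
n    <- 1000
car  <- sample(3, n, replace = TRUE)   # door hiding the car
door <- sample(3, n, replace = TRUE)   # player's first choice

p1hat <- mean(door != car)             # one estimate of p1

# Standard error from the normal approximation, with p1hat plugged in for p.
se <- sqrt(p1hat * (1 - p1hat) / n)

# 95% confidence interval for p1.
ci <- c(lower = p1hat - 1.96 * se, upper = p1hat + 1.96 * se)
ci
```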
A Gift Giving Puzzle

A probability problem: n people put their names into a hat, then they all draw a name. The draw is successful if no one draws their own name. How likely is that?

Theoretical solution: The idea is to count the total number of permutations and then subtract out any permutation that fixes one or more points. The trick is to make sure there are no double counts. The formula specifies how to add and subtract various subsets (fixing one point, two points, three points, etc.).

$$n! - \binom{n}{1}(n-1)! + \binom{n}{2}(n-2)! - \dots + (-1)^n \binom{n}{n}(n-n)! = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$

$$p = \sum_{k=0}^{n} \frac{(-1)^k}{k!}.$$
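The formula for p is easy to evaluate directly. The helper below is an illustrative addition; the name `derange.prob` is an assumption. The sum is the truncated series for exp(-1), so p approaches 1/e quickly as n grows.

```r
# Exact probability that no one draws their own name, from the formula above.
# derange.prob is a hypothetical helper name, not from the slides.
derange.prob <- function(n) {
  k <- 0:n
  sum((-1)^k / factorial(k))
}

sapply(c(2, 3, 5, 10), derange.prob)   # p for a few group sizes
exp(-1)                                # limiting value as n grows
```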
Simulation in R

Suppose n.p = 10 people put their names into a hat:

    > name=1:10
    > name
    [1] 1 2 3 4 5 6 7 8 9 10

They all draw a name:

    > draw=sample(name)
    > draw
    [1] 8 10 7 9 6 1 4 2 5 3

Check if no one draws their own name:

    > check=sum(draw==name)
    > check
    [1] 0

Success?

    > check==0
    [1] TRUE

The gift(n.p,n.sim) function does this simulation n.sim times.
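The body of gift(n.p,n.sim) is not shown in the slides. One possible sketch, assuming it repeats the draw above n.sim times and returns the proportion of successful draws, is given here; the actual workshop function may return something different.

```r
# Hypothetical gift(): the slides name this function but not its body.
gift <- function(n.p, n.sim) {
  name <- 1:n.p
  # One simulated draw: TRUE when nobody gets their own name.
  success <- replicate(n.sim, sum(sample(name) == name) == 0)
  mean(success)                        # estimated probability of success
}

set.seed(10)
p.hat <- gift(10, 10000)
p.hat
```

The estimate can be compared with the theoretical value of p from the inclusion-exclusion formula.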
Gambling Problem

A gambler starts with $100. She plays a game in which she is allowed to bet any amount of money (up to the amount that she has). If she wins, she receives twice her original stake back, while if she loses, she loses the amount she bet. The probability she wins the game is p = 0.48.

She will stop playing if she reaches $0 or if she reaches $200. She wishes to maximize the probability that she stops at $200. Is it better to bet in small increments (i.e., bet $5 at a time until she reaches $0 or $200) or bet it all at once (i.e., $100 on the first bet)?

The gambling(start,bet,n.sim) function does this simulation n.sim times with the starting money and betting money as input arguments.
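The slides do not show the body of gambling(start,bet,n.sim). The sketch below is one hypothetical way to write it, assuming the function returns the proportion of runs that end at $200. The extra arguments `p` and `target`, and the rule that the stake is capped at the money she has left, are assumptions added for illustration.

```r
# Hypothetical gambling(): the slides name this function but not its body.
# p and target are assumed extra arguments with the values from the problem.
gambling <- function(start, bet, n.sim, p = 0.48, target = 200) {
  reach <- logical(n.sim)
  for (s in seq_len(n.sim)) {
    money <- start
    while (money > 0 && money < target) {
      stake <- min(bet, money)         # she cannot bet more than she has
      if (runif(1) < p) {
        money <- money + stake         # win: stake returned twice, net +stake
      } else {
        money <- money - stake         # loss: the stake is gone
      }
    }
    reach[s] <- (money >= target)
  }
  mean(reach)                          # estimated P(stop at the target)
}

set.seed(48)
p.small <- gambling(100, 5, 2000)      # bet $5 at a time
p.bold  <- gambling(100, 100, 2000)    # bet all $100 at once
c(small = p.small, bold = p.bold)
```

Comparing the two estimates answers the exercise empirically; a larger n.sim gives a more stable comparison.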