Stat405. Simulation. Hadley Wickham. Thursday, September 20, 12



Similar documents
Betting systems: how not to lose your money gambling

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Lab 11. Simulations. The Concept

Section 7C: The Law of Large Numbers

Australian Reels Slot Machine By Michael Shackleford, A.S.A. January 13, 2009

Solutions: Problems for Chapter 3. Solutions: Problems for Chapter 3

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

Probability: The Study of Randomness Randomness and Probability Models. IPS Chapters 4 Sections

Stat 20: Intro to Probability and Statistics

ECE 316 Probability Theory and Random Processes

Standard 12: The student will explain and evaluate the financial impact and consequences of gambling.

How To Check For Differences In The One Way Anova

Contemporary Mathematics Online Math 1030 Sample Exam I Chapters No Time Limit No Scratch Paper Calculator Allowed: Scientific

WISE Power Tutorial All Exercises

p-values and significance levels (false positive or false alarm rates)

Gaming the Law of Large Numbers

REWARD System For Even Money Bet in Roulette By Izak Matatya

Vamos a Las Vegas Par Sheet

Contents. Introduction...1. The Lingo...3. Know the Different Types of Machines Know the Machine You re Playing...35

36 Odds, Expected Value, and Conditional Probability

Economics 1011a: Intermediate Microeconomics

13:69D 1.39 Progressive slot machine jackpots (a) This section shall apply to any slot machine jackpot that may increase in value as the machine is

Stochastic Processes and Advanced Mathematical Finance. Duration of the Gambler s Ruin

Expected Value. 24 February Expected Value 24 February /19

Tutorial 5: Hypothesis Testing

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

Math 58. Rumbos Fall Solutions to Review Problems for Exam 2

Expected Value and the Game of Craps

Statistics and Random Variables. Math 425 Introduction to Probability Lecture 14. Finite valued Random Variables. Expectation defined

Statistical Fallacies: Lying to Ourselves and Others

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Chapter 5. Discrete Probability Distributions

Wald s Identity. by Jeffery Hein. Dartmouth College, Math 100

AMS 5 CHANCE VARIABILITY

Chapter 4 Lecture Notes

Example for CoinTosses CoinTosses[100, True] HTHTHHHHTHTHHTTTHHTTTTHTTHTHTTTTTTHTTTHTHHHTHHHTTHHTHTTHHT TTTHTTHTHTHTHTHHTHHTHTTTHHHHTTTHHTHTHTTTHT

Standard 12: The student will explain and evaluate the financial impact and consequences of gambling.

1 Representation of Games. Kerschbamer: Commitment and Information in Games

6 Scalar, Stochastic, Discrete Dynamic Systems

Practical Probability:

Evaluating Trading Systems By John Ehlers and Ric Way

Take the stress out of organising a night out by booking a Gala Bingo Party Night.

November 08, S8.6_3 Testing a Claim About a Standard Deviation or Variance

STAT 35A HW2 Solutions

Choice Under Uncertainty

MONEY MANAGEMENT. Guy Bower delves into a topic every trader should endeavour to master - money management.

Math Games For Skills and Concepts

CASINO TAXATION CHAPTER 371 CASINO TAXATION ARRANGEMENT OF SECTIONS

Betting on Excel to enliven the teaching of probability

National Sun Yat-Sen University CSE Course: Information Theory. Gambling And Entropy

We { can see that if U = 2, 3, 7, 11, or 12 then the round is decided on the first cast, U = V, and W if U = 7, 11 X = L if U = 2, 3, 12.

IEEEXTREME PROGRAMMING COMPETITION PROBLEM & INSTRUCTION BOOKLET #3

A Simple Inventory System

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

Law of Large Numbers. Alexandra Barbato and Craig O Connell. Honors 391A Mathematical Gems Jenia Tevelev

Chapter 4. Hypothesis Tests

Building Java Programs

the GentInG GUIde to slots

Example: Find the expected value of the random variable X. X P(X)

Exploratory Data Analysis

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation

MrMajik s Money Management Strategy Copyright MrMajik.com 2003 All rights reserved.

Random Fibonacci-type Sequences in Online Gambling

Introduction to Discrete Probability. Terminology. Probability definition. 22c:19, section 6.x Hantao Zhang

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Statistics and Probability

Data Storage STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Elementary Statistics and Inference. Elementary Statistics and Inference. 17 Expected Value and Standard Error. 22S:025 or 7P:025.

UM Online Practice Programming Contest September 20, 2011

How To Run Statistical Tests in Excel

Pattern matching probabilities and paradoxes A new variation on Penney s coin game

Week 5: Expected value and Betting systems

Chapter What is the probability that a card chosen from an ordinary deck of 52 cards is an ace? Ans: 4/52.

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

John Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text

(SEE IF YOU KNOW THE TRUTH ABOUT GAMBLING)

Magic Bomb (VER.AMERICAN ALPHA)

THE WINNING ROULETTE SYSTEM by

Pascal is here expressing a kind of skepticism about the ability of human reason to deliver an answer to this question.

HOW THE GAME IS PLAYED

Getting started with qplot

How To Find The Sample Space Of A Random Experiment In R (Programming)

Microcontroller Systems. ELET 3232 Topic 8: Slot Machine Example

Probability and Expected Value

Chapter 7. One-way ANOVA

Thursday, October 18, 2001 Page: 1 STAT 305. Solutions

To begin, I want to introduce to you the primary rule upon which our success is based. It s a rule that you can never forget.

There are a number of superb online resources as well that provide excellent blackjack information as well. We recommend the following web sites:

Additional information >>> HERE <<< Getting Start biggest win on online roulette User Review

Paper No 19. FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80

Math 408, Actuarial Statistics I, Spring Solutions to combinatorial problems

Mathematical Expectation

Linear Risk Management and Optimal Selection of a Limited Number

COMMON CORE STATE STANDARDS FOR

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Video Poker in South Carolina: A Mathematical Study

Transcription:

Stat405 Simulation Hadley Wickham

1. For loops 2. Hypothesis testing 3. Simulation

For loops

# Common pattern: create object for output, # then fill with results cuts <- levels(diamonds$cut) means <- rep(na, length(cuts)) for(i in seq_along(cuts)) { sub <- diamonds[diamonds$cut == cuts[i], ] means[i] <- mean(sub$price) } # We will learn more sophisticated ways to do this # later on, but this is the most explicit

1:5 seq_len(5) 1:10 seq_len(10) 1:0 seq_len(0) seq_along(1:10) 1:10 * 2 seq_along(1:10 * 2)

Your turn For each diamond colour, calculate the median price and carat size

colours <- levels(diamonds$color) n <- length(colours) mprice <- rep(na, n) mcarat <- rep(na, n) for(i in seq_len(n)) { set <- diamonds[diamonds$color == colours[i], ] mprice[i] <- median(set$price) mcarat[i] <- median(set$carat) } results <- data.frame(colours, mprice, mcarat)

Back to slots... For each row, calculate the prize and save it, then compare calculated prize to actual prize Question: given a row, how can we extract the slots in the right form for the function?

slots <- read.csv("slots.csv") i <- 334 slots[i, ] slots[i, 1:3] str(slots[i, 1:3]) slots <- read.csv("slots.csv", stringsasfactors = F) str(slots[i, 1:3]) as.character(slots[i, 1:3]) calculate_prize(as.character(slots[i, 1:3]))

# Create space to put the results slots$check <- NA # For each row, calculate the prize for(i in seq_len(nrow(slots))) { w <- as.character(slots[i, 1:3]) slots$check[i] <- calculate_prize(w) } # Check with known answers subset(slots, prize!= check) # Uh oh!

# Create space to put the results slots$check <- NA # For each row, calculate the prize for(i in seq_len(nrow(slots))) { w <- as.character(slots[i, 1:3]) slots$check[i] <- calculate_prize(w) } # Check with known answers subset(slots, prize!= check) # Uh oh! What is the problem? Think about the most general case

DD DD DD 800 7 7 7 80 BBB BBB BBB 40 BB BB BB 25 B B B 10 C C C 10 Any bar Any bar Any bar 5 C C * 5 C * C 5 C * * 2 * C * 2 * * C 2 DD doubles any winning combination. Two DD quadruples. DD is wild

Hypothesis testing

Goal Casino claims that slot machines have prize payout of 92%, but payoff for the 345 we observed is 67%. Is the casino lying? (House advantage of 8% vs. 33%) (Big caveat: today we re using a prize calculation function we know to be incorrect)

Q: What does it mean to have prize payout of 92%? A: If we play the slot machine an infinite number of times, our average prize would be $0.92

Strategy 1 Play the slot machine an infinite number of times. If the average prize is not $0.92, reject the casino s claim. But...

# Let s make a virtual coin flip # 1 = heads, 0 = tails coin <- c(0, 1) # we can flip the coin once flips <- sample(coin, 1, replace = T) mean(flips) # we can flip the coin many times flips <- sample(coin, 10, replace = T) mean(flips)

# what happens to the proportion of heads as n # increases? flips <- sample(coin, 10000, replace = T) n <- seq_along(flips) mean <- cumsum(flips) / n coin_toss <- data.frame(n, flips, mean) library(ggplot2) qplot(n, mean, data = coin_toss, geom = "line") + geom_hline(yintercept = 0.5)

# what happens to the proportion of heads as n # increases? flips <- sample(coin, 10000, replace = T) n <- seq_along(flips) mean <- cumsum(flips) / n coin_toss <- data.frame(n, flips, mean) cumulative sum library(ggplot2) qplot(n, mean, data = coin_toss, geom = "line") + geom_hline(yintercept = 0.5)

0.6 0.5 0.4 mean 0.3 0.2 0.1 0.0 200 400 600 800 1000 n

0.6 0.5 0.4 mean 0.3 0.2 0.1 How can we use this to test whether μ =.92? 0.0 200 400 600 800 1000 n

Strategy 2 Play the slot machine a large number of times. If the average prize is far from $0.92, reject the casino s claim.

Simulation

slots <- read.csv("slots.csv", stringsasfactors = FALSE) calculate_prize <- function(windows) { payoffs <- c("dd" = 800, "7" = 80, "BBB" = 40, "BB" = 25, "B" = 10, "C" = 10, "0" = 0) same <- length(unique(windows)) == 1 allbars <- all(windows %in% c("b", "BB", "BBB")) if (same) { prize <- payoffs[windows[1]] } else if (allbars) { prize <- 5 } else { cherries <- sum(windows == "C") diamonds <- sum(windows == "DD") } prize <- c(0, 2, 5)[cherries + 1] * c(1, 2, 4)[diamonds + 1] } prize

Your turn Write a function that simulates one pull on the slot machine (i.e, it should randomly choose a value from slots$w1, a value from slots$w2, and a value from slots$w3 then calculate the prize) Remember: solve the problem THEN write a function

# Simulate the first window sample(slots$w1, 1) # Simulate the second window sample(slots$w2, 1) # Simulate the third window sample(slots$w3, 1) # What is the implicit assumption here? # How could we test that assumption?

play_once <- function() { w1 <- sample(slots$w1, 1) w2 <- sample(slots$w2, 1) w3 <- sample(slots$w3, 1) } calculate_prize(c(w1, w2, w3))

Your turn But we need to play the slot machine many times. Create a new function that plays n times and return n prizes. Call it play_n

play_n <- function(n) { prizes <- rep(na, n) for(i in seq_len(n)) { prizes[i] <- play_once() } prizes }

# Now we can see what happens to the mean prize as # n increases games <- data.frame(prizes = play_n(500)) games <- mutate(games, n = seq_along(prizes), avg = cumsum(prizes) / n) qplot(n, avg, data = games, geom = "line") + geom_hline(yintercept = 0.92, color = "red")

1.0 0.8 mean 0.6 0.4 0.2 0.0 100 200 300 400 500 n

1.0 0.8 mean 0.6 0.4 0.2 Is this convincing evidence that μ 0.92? Why? Why not? 0.0 100 200 300 400 500 n

Questions Is 500 pulls enough? What do other realisations look like? How can we do this more quickly?

# Current function is pretty slow system.time(play_n(5000)) # I wrote a vectorised version - instead of # using explicit for loops, use R functions that # work with vectors. This is usually much much # faster source("payoff-v.r") system.time(play_many(5000))

# What happens if we play more games? games <- data.frame(prizes = play_many(10^6)) games <- mutate(games, n = seq_along(prizes), avg = cumsum(prizes) / n) every1000 <- subset(games, n %% 1000 == 0) qplot(n, avg, data = every1000, geom = "line") qplot(n, avg, data = subset(every1000, n > 10000), geom = "line") # Still seems to be quite a lot of variation even # after 1,000,000 pulls

%% remainder %/% integer division seq_len(100) %% 5 seq_len(100) %/% 5 seq_len(100) %% 10 seq_len(100) %/% 10 seq_len(100) %% 11 seq_len(100) %/% 11

# How can we characterise the amount of variation? # We could do multiple runs and look at the # distribution at multiple points # Turn our million pulls into 1,000 sessions of # 1,000 pulls many <- mutate(games, group = (n - 1) %/% 1000 + 1, group_n = (n - 1) %% 1000 + 1) # How do we calculate the average? Just looking # at the cumulative sum will no longer work

# New function: ave # ave takes the first argument, divides it into # pieces according to the second argument, applies # FUN to each piece, and joins them back together many$avg <- ave(many$prize, many$group, FUN = cumsum) / many$group_n every10 <- subset(many, group_n %% 10 == 0) qplot(group_n, avg, data = every10, geom = "line", group = group, alpha = I(1/5))

# Could just look at the distribution at pull # 1000 final <- subset(many, group_n == 1000) qplot(avg, data = final, binwidth = 0.01) # What do you think the average payoff is? # This basic technique is called bootstrapping.