Lucky vs. Unlucky Teams in Sports



Similar documents
Multiple Linear Regression

Statistical Models in R

International Statistical Institute, 56th Session, 2007: Phil Everson

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Comparing Nested Models

Exchange Rate Regime Analysis for the Chinese Yuan

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

n + n log(2π) + n log(rss/n)

5. Linear Regression

Psychology 205: Research Methods in Psychology

Correlation and Simple Linear Regression

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

National Sun Yat-Sen University CSE Course: Information Theory. Gambling And Entropy

Using R for Linear Regression

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

MIXED MODEL ANALYSIS USING R

ANOVA. February 12, 2015

Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is

Week 5: Multiple Linear Regression

A Statistical Analysis of Popular Lottery Winning Strategies

Elementary Statistics and Inference. Elementary Statistics and Inference. 16 The Law of Averages (cont.) 22S:025 or 7P:025.

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Testing for Lack of Fit

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance

Generalized Linear Models

Final Exam Practice Problem Answers

Chapter 7: Simple linear regression Learning Objectives

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

Chapter 7 Section 1 Homework Set A

Betting systems: how not to lose your money gambling

THE DETERMINANTS OF SCORING IN NFL GAMES AND BEATING THE SPREAD

N-Way Analysis of Variance

Outline. Dispersion Bush lupine survival Quasi-Binomial family

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Statistical Models in R

How to bet and win: and why not to trust a winner. Niall MacKay. Department of Mathematics

The Determinants of Scoring in NFL Games and Beating the Over/Under Line. C. Barry Pfitzner*, Steven D. Lang*, and Tracy D.

Logistic Regression (a type of Generalized Linear Model)

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Financial Risk Models in R: Factor Models for Asset Returns. Workshop Overview

Prices, Point Spreads and Profits: Evidence from the National Football League

Fourth Problem Assignment

Regression step-by-step using Microsoft Excel

Stock Price Forecasting Using Information from Yahoo Finance and Google Trend

Multiple Regression. Page 24

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

Implementing Panel-Corrected Standard Errors in R: The pcse Package

Introduction to Hierarchical Linear Modeling with R

Chapter 3 Quantitative Demand Analysis

Lab 13: Logistic Regression

Multi Factors Model. Daniel Herlemont. March 31, Estimating using Ordinary Least Square regression 3

Premaster Statistics Tutorial 4 Full solutions

EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS?

Multivariate Logistic Regression

Predicting Box Office Success: Do Critical Reviews Really Matter? By: Alec Kennedy Introduction: Information economics looks at the importance of

College Tuition: Data mining and analysis

The NCAA Basketball Betting Market: Tests of the Balanced Book and Levitt Hypotheses

1.1. Simple Regression in Excel (Excel 2010).

An Introduction to Spatial Regression Analysis in R. Luc Anselin University of Illinois, Urbana-Champaign

Part 2: Analysis of Relationship Between Two Variables

Régression logistique : introduction

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Getting Correct Results from PROC REG

Sam Schaefer April 2011

A New Interpretation of Information Rate

DEVELOPING A MODEL THAT REFLECTS OUTCOMES OF TENNIS MATCHES

We have put together this beginners guide to sports betting to help you through your first foray into the betting world.

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

Lesson #436. The Animals` Guide to Sports Betting

Part II. Multiple Linear Regression

MA 1125 Lecture 14 - Expected Values. Friday, February 28, Objectives: Introduce expected values.

Betting with the Kelly Criterion

2. Linear regression with multiple regressors

Does pay inequality within a team affect performance? Tomas Dvorak*

BIOL 933 Lab 6 Fall Data Transformation

Point Spreads. Sports Data Mining. Changneng Xu. Topic : Point Spreads

One-Way Analysis of Variance

Elementary Statistics Sample Exam #3

Chapter 4 and 5 solutions

THE ULTIMATE FOOTBALL BETTING GUIDE

MATH 564 Project Report. Analysis of Desktop Virtualization Capacity with. Linear Regression Model

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb

SPSS Guide: Regression Analysis

Univariate Regression

Decision Theory Rational prospecting

Q = ak L + bk L. 2. The properties of a short-run cubic production function ( Q = AL + BL )

Prediction Markets, Fair Games and Martingales

Chapter 4. Probability Distributions

Beating the MLB Moneyline

How To Increase Your Odds Of Winning Scratch-Off Lottery Tickets!

Transcription:

Lucky vs. Unlucky Teams in Sports Introduction Assuming gambling odds give true probabilities, one can classify a team as having been lucky or unlucky so far. Do results of matches between lucky and unlucky teams fit the gambling odds? Determining Luck In this project, we will define the luck of, for example, team A to be Luck (A) = (actual number of wins) - (expected number of wins). For our purposes, the expected number of wins will be defined as the summation of all the team s probabilities of winning. As an example, suppose team A has played 5 games so far and had the following probabilities of winning and actual results: Game Probability of Winning Result 1.70 Win 2.60 Loss 3.65 Win 4.45 Loss 5.40 Loss We would calculate the luck of team A as follows: luck(a) = (actual number of wins) - (expected number of wins) = 2 - (70% + 60% + 65% + 45% +40%) = -0.8 Therefore, at this point in the season, Team A s luck is -0.8. Therefore, Team A has been unlucky up to this point in the season. *Note: Luck refers to games that have already been played, so when analyzing the first game of a season, every team has luck=0. Calculating Probabilites Many websites are available to convert moneylines to probabilities. For example, I used the website: http://www.covers.com/sportsbetting/money_lines.aspx. However, these probabilities take into account the bookie's take in the bet, so the sum of the probabilities ends up being greater than 1. For example, in Week 9 of the last NFL season, there was Atlanta (-300) vs. Indianapolis (+250) converts to Atlanta (75.0%) vs. Indianapolis (28.6%) so we have.750 +.286 = 1.036. For our purposes, these probabilities need to sum to one as there is 100% that either one team or the other will win. [Ties are possible in the NFL. However they are not very likely. There have only been two in the past 15 years. Thus, the possibility of tie will be ignored for the purposes of this research.] Therefore, the new probability of team A beating team B will computed as follows:

P_New(team A wins) = P(A wins)/[p(a wins) + P(B wins)] Therefore, in the example above, we would now have the following probabilities: The probabilities now sum to 1, as desired. P(Atlanta wins) =.75/(.75 +.286) =.724 P(Indianapolis wins) =.286/(.75 +.286) =.276 Using Luck for Analysis Going back to the previous example, suppose team A is about to play team B. Take whichever of the two teams is relatively luckier say luck(b) = -0.2, so it's B and calculate the difference luck-difference = luck(b) - luck(a) = 0.6. So now when A plays B, we take the relatively luckier team, B, look at the gambling odds of B winning, say 0.45, and then we record the three pieces of data Gambling Odds Luck Difference Result (win or loss for Team B) 0.45 0.60 Loss We want this data for every game in the season. Once you've got the data we can do a linear regression. Results > summary(research.lm) Call: lm(formula = result ~ probability + luck, data = research) Residuals: Min 1Q Median 3Q Max -0.8633-0.4444 0.1699 0.3367 0.8059 Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 0.04891 0.10150 0.482 0.6303 probability 1.06660 0.16957 6.290 1.52e-09 *** luck -0.04447 0.02634-1.689 0.0926. --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 Residual standard error: 0.4519 on 237 degrees of freedom Multiple R-squared: 0.1432, Adjusted R-squared: 0.1359

F-statistic: 19.8 on 2 and 237 DF, p-value: 1.119e-08 Betting Strategy Let s try the following betting strategy: Bet one dollar on the more lucky team. If the probability (from betting odds) of that team winning is p then we will lose with probability 1-p or gain (1/p) - 1 with probability p, and the variance from this bet is (1-p)/p. If we bet c (dollars) then the variance is c^2 (1-p)/p. Here, we will the try the strategy where we choose c to make the variance = 1, that is choose c = sqrt{p/(1- p)} We will use the data to see how much one would have gained or lost by making this bet on each of the N matches in a season. The point is that the theoretical variance (assuming betting odds correspond to true probabilities) = N so we can compare the observed overall gain-or-loss to the "standard deviation" = sqrt{n}. Results of the Betting Strategy The betting strategy above yields a loss of approximately $7.61. Therefore, the use of luck has not been able effective in helping us implement a successful betting strategy. Appendix # Python Code to Create CSV file with the gambling odds, luck # # difference, and result for the relatively luckier team # import csv, sys filename = 'all_games_new' try: csv_in = csv.reader(open(filename + '.csv', 'r')) except: print 'Invalid log file name:', filename sys.exit(1) research = {} for row in csv_in: if row == []: if row[0] == 'Number': Number = int(row[0]) Week = int(row[1]) Home = row[2] Prob_H = float(row[3]) Result_H = int(row[4])

Visitor = row[5] Prob_V = float(row[6]) Result_V = int(row[7]) Luck_H = float(row[8]) Luck_V = float(row[9]) Luck_Diff = float(row[10]) if Number not in research: research[number] = {} research[number] = Prob_H, Result_H, Luck_H, Prob_V, Result_V, Luck_V, Luck_Diff # Create a csv file: file = open('final_research_data.csv', 'wb') csv_out= csv.writer(file) for number in sorted(research): if research[number][2] > research[number][5]: csv_out.writerow([research[number][0], research[number][6], research[number][1]]) csv_out.writerow([research[number][3], research[number][6], research[number][4]]) file.close() # Python Code to Create CSV file with results from betting strategy # import csv, sys, math filename = 'betting_strategy_53' # Function to compute return on bet def bet_return(moneyline, result, bet): if moneyline < 0: if result == 1: x = 100/((-1*moneyline)/bet) x = -1*bet if result == 1: x = moneyline/(100/bet) x = -1*bet try: csv_in = csv.reader(open(filename + '.csv', 'r')) except:

print 'Invalid log file name:', filename sys.exit(1) new_research = {} for row in csv_in: if row[0] == []: if row[0] == 'Number': number = int(row[0]) team = row[1] probability = float(row[2]) luck = float(row[3]) moneyline = int(row[4]) result = int(row[5]) bet = float(row[6]) if number not in new_research: new_research[number] = {} new_research[number] = team, probability, luck, moneyline, result, bet, bet_return(moneyline, result, bet) file = open('betting_with_results.csv', 'wb') csv_out = csv.writer(file) csv_out.writerow(['team', 'Probability', 'Luck', 'Moneyline', 'Result', 'Bet', 'Winnings']) for number in sorted(new_research): csv_out.writerow([new_research[number][0], new_research[number][1], new_research[number][2], new_research[number][3], new_research[number][4], new_research[number][5], new_research[number][6]]) file.close()