Predicting the NFL Using Twitter. Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith





Disclaimer: This talk will not teach you how to become a successful sports bettor.

Questions: What is the NFL, and what about it are we predicting? Why are we using Twitter?

NFL = National Football League

What about the NFL are we predicting? Two game outcomes are commonly bet upon: the winner with the spread (winner WTS) and the over-under. Both outcomes are determined at the end of the game using numbers set by bookmakers before the game. Because of transaction costs, a bettor needs an accuracy of 53% to be profitable.
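The 53% threshold comes from the bookmaker's commission. The sketch below assumes the common convention of American odds of -110 on both sides, which the slides do not state: a bettor risks 110 to win 100, so breaking even requires p · 100 = (1 − p) · 110.

```python
def break_even_accuracy(american_odds: int = -110) -> float:
    """Accuracy giving zero expected profit at the given negative American odds.

    Assumes odds < 0, i.e. the bettor risks |odds| units to win 100 units.
    """
    if american_odds >= 0:
        raise ValueError("this sketch only handles negative American odds")
    risk = -american_odds
    win = 100
    # Solve p * win - (1 - p) * risk = 0 for p.
    return risk / (risk + win)


print(f"{break_even_accuracy():.4f}")  # prints 0.5238 (about 52.4%)
```

Under this assumption the exact break-even point is about 52.4%. A 53% accuracy therefore clears it with a small margin.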

Point Spread and Over-Under
Point spread: a signed number used to determine the winner with the spread (WTS), given the game score.
If home_score + spread > away_score: the home team wins WTS.
If home_score + spread < away_score: the away team wins WTS.
If home_score + spread = away_score: push.
Over-under: a betting line for the total points scored by the two teams. The bettor chooses over or under.
A push occurs if the spread exactly offsets the point differential, or if the total points equal the over-under.

Point Spread and Over-Under
The Atlanta Falcons win a home game against the Philadelphia Eagles 33-26. But did they win with the spread?
Point spread | Outcome
< -7 | Falcons lose WTS
> -7 | Falcons win WTS
= -7 | Push
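The settlement rules above can be written as two small functions. This is a sketch for illustration only. The function names and the over-under line of 47.5 in the usage lines are hypothetical, and the spreads tried are example values chosen around the -7 threshold from the Falcons game.

```python
def winner_wts(home_score: int, away_score: int, spread: float) -> str:
    """Settle a winner-with-the-spread bet: home_score + spread vs. away_score."""
    adjusted = home_score + spread
    if adjusted > away_score:
        return "home"
    if adjusted < away_score:
        return "away"
    return "push"


def over_under(home_score: int, away_score: int, line: float) -> str:
    """Settle an over-under bet against the total points scored by both teams."""
    total = home_score + away_score
    if total > line:
        return "over"
    if total < line:
        return "under"
    return "push"


# Falcons (home) 33, Eagles (away) 26, evaluated at a few example spreads.
for spread in (-10, -7, -3):
    print(spread, winner_wts(33, 26, spread))
# -10 away
# -7 push
# -3 home

print(over_under(33, 26, 47.5))  # over (59 total points; hypothetical line)
```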

Why use Twitter? Twitter has been used to predict and/or explain opinion polls, elections, the spread of contagious diseases, the stock market, movie revenue, food poisoning, civil unrest in Latin America and the Middle East, and citations of scientific papers, and the list goes on. Why not sports?

Why use Twitter to predict NFL games?
Structure of the NFL: 32 teams spread geographically across the United States; a regular season partitioned into 17 weeks; roughly a week between games for a given team.
Participation of fans and spectators: discussion on social media, including Twitter; sports betting; the wisdom of the crowd.

Fan Participation

Data Gathering

Data Gathering
Current and historical NFL game data from NFLdata.com.
Tweets from the 2010-2012 seasons via the Twitter Garden Hose stream (10% of all tweets).
How do we isolate the relevant tweets?

Data Alignment
Use a preset list of team hashtags to identify relevant tweets.
Discard tweets containing hashtags that correspond to multiple teams.
Label each remaining tweet with the team its hashtags refer to.
Pittsburgh Steelers hashtags: #steelers #pittsburghsteelers #stillers #gosteelers #gosteelersgo #letsgosteelers #gostillers #gostillersgo #letsgostillers #stillernation #stillersnation #steelernation #steelersnation
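The filtering and labeling steps can be sketched as follows. The Steelers hashtags are copied from the slide. The single Falcons hashtag is a hypothetical stand-in so that the example has a second team. Matching hashtags case-insensitively is our assumption, because the slides do not specify it.

```python
import re

# Steelers tags come from the slide; the "ATL" entry is hypothetical.
TEAM_HASHTAGS = {
    "PIT": {
        "#steelers", "#pittsburghsteelers", "#stillers", "#gosteelers",
        "#gosteelersgo", "#letsgosteelers", "#gostillers", "#gostillersgo",
        "#letsgostillers", "#stillernation", "#stillersnation",
        "#steelernation", "#steelersnation",
    },
    "ATL": {"#falcons"},  # hypothetical example team
}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text: str):
    """Return the one team a tweet's hashtags refer to, or None to discard it."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    teams = {team for team, team_tags in TEAM_HASHTAGS.items() if tags & team_tags}
    if len(teams) != 1:
        # No team hashtag (irrelevant) or several teams (ambiguous): discard.
        return None
    return teams.pop()


print(label_tweet("Big win today #Steelers #SteelerNation"))  # PIT
print(label_tweet("#steelers vs #falcons tonight"))           # None
```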

Fan Participation (Annotated)

How many tweets?
Season | Weekly tweets | Pregame tweets | Postgame tweets
2010 | 40,385 | 53,294 | 185,709
2011 | 130,977 | 147,834 | 524,453
2012 | 266,382 | 290,879 | 1,014,473
Weekly tweets: from 12 hours after the previous game to 1 hour before the upcoming game.
Pregame tweets: from 24 hours to 1 hour before the upcoming game.
Postgame tweets: from 4 hours to 28 hours after the previous game.

Data Alignment
Given the set a tweet belongs to, label the tweet with the appropriate week of the season. For pregame and weekly tweets, this is the week in which the upcoming game will be played. For postgame tweets, it is the week in which the previous game was played. Together, a tweet's team and week labels determine the unique game that the tweet corresponds to.
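The sketch below combines the time windows from the tweet-count table with the week-labeling rule for a single team. It rests on several assumptions. Each game is represented by one timestamp, taken here to be kickoff. The window boundaries are inclusive. The `(datetime, week)` tuple layout is hypothetical. Because the windows can overlap, one tweet can fall into more than one set.

```python
from datetime import datetime, timedelta

H = timedelta(hours=1)


def tweet_sets(ts, prev_game, next_game):
    """Return {set_name: week_label} for every tweet set containing timestamp ts.

    prev_game and next_game are hypothetical (game_datetime, week) tuples for one
    team's previous and upcoming games.
    """
    prev_time, prev_week = prev_game
    next_time, next_week = next_game
    labels = {}
    # Weekly and pregame tweets are labeled with the upcoming game's week.
    if prev_time + 12 * H <= ts <= next_time - H:
        labels["weekly"] = next_week
    if next_time - 24 * H <= ts <= next_time - H:
        labels["pregame"] = next_week
    # Postgame tweets are labeled with the previous game's week.
    if prev_time + 4 * H <= ts <= prev_time + 28 * H:
        labels["postgame"] = prev_week
    return labels


prev = (datetime(2012, 9, 9, 13, 0), 1)   # hypothetical week-1 game
nxt = (datetime(2012, 9, 16, 13, 0), 2)   # hypothetical week-2 game
print(tweet_sets(datetime(2012, 9, 10, 9, 0), prev, nxt))   # {'weekly': 2, 'postgame': 1}
print(tweet_sets(datetime(2012, 9, 15, 20, 0), prev, nxt))  # {'weekly': 2, 'pregame': 2}
```

Pairing each week label with the team label from hashtag matching then identifies the game.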

A Simple Task
Sanity check: can we look at a postgame tweet and determine whether the team it corresponds to won or lost?

Postgame Tweet Analysis
Use a bag-of-words model to extract features for each tweet:
In-tweet word frequency features.
TF-IDF features, combining in-tweet word frequency with word frequency over all postgame tweets from the week.
Classify each tweet as a win for the home or the away team.
For k in [1, 16]: train the classifier on all postgame tweets from 2010 through week k of the 2012 season.
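The highly weighted features on the next slide ("home: win", "away: loss") suggest that each word is paired with the side (home or away) of the team the tweet is about. This sketch shows TF-IDF featurization under that reading. It uses the plain unsmoothed IDF variant, log(n / df), and keeps tokens case-sensitive, since "home: win" and "home: WIN" appear as separate features. The function names are hypothetical, and the classifier itself is omitted.

```python
import math
from collections import Counter


def idf_table(week_tweets):
    """IDF of each word over one week's postgame tweets (each a list of tokens)."""
    n = len(week_tweets)
    doc_freq = Counter(word for tokens in week_tweets for word in set(tokens))
    return {word: math.log(n / count) for word, count in doc_freq.items()}


def tweet_features(tokens, side, idf):
    """TF-IDF features for one tweet, keyed by the tweet team's side and the word."""
    tf = Counter(tokens)
    return {f"{side}: {word}": count * idf.get(word, 0.0) for word, count in tf.items()}


week = [["Great", "win", "today"], ["tough", "loss"], ["win", "win"]]
idf = idf_table(week)
print(tweet_features(["win", "win"], "home", idf))  # {'home: win': 2 * log(3/2)}
```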

Postgame Tweet Analysis
Average accuracy of 67% over the 2012 season, using very simple features and parameter settings, so there is room for improvement.
Highly weighted features (top or bottom 30) for a home-team win: home: win, home: victory, away: loss, home: won, home: WIN, away: lost, home: Great, away: lose, away: refs

Forecasting

2012 Season Live Predictions
During the 2012 season, we tweeted predictions before games using @NFLOracle. We predicted 34/52 (65.4%) of winner WTS results correctly! We did not encounter our own prediction tweets in the Twitter Garden Hose stream.

Training and Testing
Predict outcomes of upcoming games using logistic regression:
Seasons 2010 and 2011, and weeks [1, k-3] of 2012: train on all games except those in the last week of a season.
Weeks [k-2, k-1] of 2012: tune the L1/L2 regularization parameters.
Week k of 2012: predict and test.
Apply this procedure over weeks 3-16 of 2012 (or of the current season).
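For a given target week k, the rolling split above can be sketched as lists of (season, week) pairs. The function name and its defaults are hypothetical. The 17-week season comes from the earlier slide. We assume that excluding "the last week of a season" only affects completed seasons, because for k ≤ 16 the current-season training weeks never reach week 17.

```python
def split_weeks(k, past_seasons=(2010, 2011), current=2012, last_week=17):
    """Return (train, tune, test) lists of (season, week) pairs for target week k."""
    if not 3 <= k <= last_week - 1:
        raise ValueError("k must lie in [3, last_week - 1], e.g. weeks 3-16")
    # Completed seasons, without their final week.
    train = [(s, w) for s in past_seasons for w in range(1, last_week)]
    # Current season up to week k-3.
    train += [(current, w) for w in range(1, k - 2)]
    tune = [(current, k - 2), (current, k - 1)]  # choose L1/L2 strength here
    test = [(current, k)]                        # predict and score here
    return train, tune, test


for k in range(3, 17):
    train, tune, test = split_weeks(k)
    # Fit logistic regression on `train`, tune on `tune`, evaluate on `test`.
```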

Predicting the NFL without Twitter
Use historical game data to create simple feature sets: point spreads, over-unders, scoring, etc.
Combine pairs of simple feature sets to obtain a preliminary list of game-based feature sets.
The highest average accuracy is 56%.
These feature sets serve as a baseline for the Twitter-derived feature sets.
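Combining pairs of simple feature sets amounts to enumerating unordered pairs and merging their features. The slides do not list the exact sets, so the names and values below are hypothetical and only illustrate the mechanics.

```python
from itertools import combinations

# Hypothetical per-game feature sets derived from historical data.
simple_sets = {
    "spread": {"home_spread": -3.0},
    "over_under": {"over_under": 44.5},
    "scoring": {"home_avg_points": 24.1, "away_avg_points": 20.3},
}


def pairwise_feature_sets(sets):
    """Merge every unordered pair of simple feature sets into one combined set."""
    return {
        f"{a}+{b}": {**sets[a], **sets[b]}
        for a, b in combinations(sorted(sets), 2)
    }


for name in pairwise_feature_sets(simple_sets):
    print(name)
# over_under+scoring
# over_under+spread
# scoring+spread
```

A separate model would then be trained on each combined set, and the best one kept as the baseline.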

Simple features derived from tweets
Word-level features (Twitter unigrams):
Use only words that appear in at least 0.1% of all tweets.
Use log(word frequency + 1) over all weekly tweets corresponding to the game.
This creates a high-dimensional feature space (~20k words), which is large relative to the small number of games.
How should these features be combined with features generated from historical game data?
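The two unigram steps can be sketched as a vocabulary filter followed by log-count features. We read "appear in at least 0.1% of all tweets" as the fraction of tweets that contain the word. The slides also do not say whether home-team and away-team tweets are kept separate, so this sketch pools all of a game's weekly tweets. Function names are hypothetical.

```python
import math
from collections import Counter


def build_vocab(all_tweets, min_fraction=0.001):
    """Words contained in at least min_fraction of all tweets (0.1% on the slide)."""
    n = len(all_tweets)
    doc_freq = Counter(word for tokens in all_tweets for word in set(tokens))
    return sorted(word for word, count in doc_freq.items() if count / n >= min_fraction)


def game_unigram_features(game_tweets, vocab):
    """log(count + 1) of each vocabulary word over one game's weekly tweets."""
    counts = Counter(word for tokens in game_tweets for word in tokens)
    return [math.log1p(counts[word]) for word in vocab]


vocab = build_vocab([["win", "game"], ["win"], ["refs"]], min_fraction=0.5)
print(vocab)                                                # ['win']
print(game_unigram_features([["win", "win"], ["game"]], vocab))  # [log(3)]
```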

Dimensionality Reduction
Canonical Correlation Analysis (CCA): dimensionality reduction and combining multiple data streams.
Results of applying CCA to Twitter unigrams and game statistics features:
Feature set | Winner WTS accuracy (%)
Twitter unigrams | 47.6
1-component CCA | 50.4
2-component CCA | 51.0
4-component CCA | 51.9
8-component CCA | 48.1
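A minimal NumPy sketch of regularized linear CCA follows. It assumes each row of `X` holds one game's Twitter unigram features and the matching row of `Y` holds that game's statistics features. The ridge term `reg` is our addition: with ~20k unigram dimensions and few games, the unregularized covariance matrices would be singular. The slides do not say how the CCA projections were passed to the classifier, so concatenating them is only one plausible choice.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, n_components, reg=1e-3):
    """Project X and Y onto their top canonical directions.

    Returns (X_proj, Y_proj, canonical_correlations).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance yields the canonical directions;
    # its singular values are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :n_components]
    B = Wy @ Vt.T[:, :n_components]
    return Xc @ A, Yc @ B, s[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([X[:, 0] + 0.01 * rng.normal(size=200), rng.normal(size=(200, 2))])
X_proj, Y_proj, corrs = cca(X, Y, n_components=2)
game_features = np.hstack([X_proj, Y_proj])  # one possible combined feature set
```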

The Rate Feature
Measures the difference in the volume of a team's tweets across consecutive weeks.
It is easy to compute, considers tweets collectively rather than individually, and does not use the point spread or any game statistics.
The highest average accuracy using a rate feature set is 56%.
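The slide defines the rate feature only as the change in a team's tweet volume between consecutive weeks. This minimal sketch assumes per-team weekly tweet counts are already available (the `weekly_counts` layout is hypothetical) and, following the slide's wording, uses a plain difference rather than a ratio.

```python
def rate_features(weekly_counts, week):
    """Change in each team's tweet volume from week - 1 to week.

    weekly_counts is a hypothetical mapping {team: {week: tweet_count}};
    missing weeks count as zero tweets.
    """
    return {
        team: counts.get(week, 0) - counts.get(week - 1, 0)
        for team, counts in weekly_counts.items()
    }


counts = {"PIT": {1: 1200, 2: 1500}, "ATL": {1: 900, 2: 700}}  # made-up counts
print(rate_features(counts, 2))  # {'PIT': 300, 'ATL': -200}
```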

Results

Conclusion
Our results suggest that a social-media-driven approach can be effective in predicting sporting events. Download our dataset of NFL game outcomes and tweet IDs at www.ark.cs.cmu.edu/football/