Ranking National Football League teams using Google s PageRank.

Similar documents
DATA ANALYSIS II. Matrix Algorithms

Bracketology: How can math help?

Search engines: ranking algorithms

Proximal mapping via network optimization

Part 1: Link Analysis & Page Rank

Google s PageRank: The Math Behind the Search Engine

Perron vector Optimization applied to search engines

LECTURE 4. Last time: Lecture outline

Predicting Margin of Victory in NFL Games: Machine Learning vs. the Las Vegas Line

Web Graph Analyzer Tool

A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form

arxiv: v3 [math.na] 1 Oct 2012

Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time

Point Spreads. Sports Data Mining. Changneng Xu. Topic : Point Spreads

Lecture 4: Partitioned Matrices and Determinants

UNCOUPLING THE PERRON EIGENVECTOR PROBLEM

Lecture 2 Matrix Operations

The world s largest matrix computation. (This chapter is out of date and needs a major overhaul.)

Estimation of Unknown Comparisons in Incomplete AHP and It s Compensation

Time-dependent Network Algorithm for Ranking in Sports

Structure Preserving Model Reduction for Logistic Networks

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

Notes on Symmetric Matrices

Baseball and Statistics: Rethinking Slugging Percentage. Tanner Mortensen. December 5, 2013

3.2 Roulette and Markov Chains

Ranking on Data Manifolds

Improving paired comparison models for NFL point spreads by data transformation. Gregory J. Matthews

Practical Graph Mining with R. 5. Link Analysis

From the probabilities that the company uses to move drivers from state to state the next year, we get the following transition matrix:

Linear Programming: Chapter 11 Game Theory

Robust Rankings for College Football

Analysis of an Artificial Hormone System (Extended abstract)

The PageRank Citation Ranking: Bring Order to the Web

A Google-like Model of Road Network Dynamics and its Application to Regulation and Control

Security Risk Management via Dynamic Games with Learning

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Math Quizzes Winter 2009

International Statistical Institute, 56th Session, 2007: Phil Everson

The Mathematics Behind Google s PageRank

Monte Carlo methods in PageRank computation: When one iteration is sufficient

Chapter 7. Matrices. Definition. An m n matrix is an array of numbers set out in m rows and n columns. Examples. (

THE $25,000,000,000 EIGENVECTOR THE LINEAR ALGEBRA BEHIND GOOGLE

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Colley s Bias Free College Football Ranking Method: The Colley Matrix Explained

Social Network Generation and Friend Ranking Based on Mobile Phone Data

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks

1 Solving LPs: The Simplex Algorithm of George Dantzig

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

1 o Semestre 2007/2008

Similarity and Diagonalization. Similar Matrices

1/3 1/3 1/

5.1 Bipartite Matching

LINEAR-ALGEBRAIC GRAPH MINING

Hacking-proofness and Stability in a Model of Information Security Networks

Power Rankings: Math for March Madness

2. THE WEB FRONTIER. n + 1, and defining an augmented system: x. x x n+1 (3)

Least-Squares Intersection of Lines

Sport Timetabling. Outline DM87 SCHEDULING, TIMETABLING AND ROUTING. Outline. Lecture Problem Definitions

A characterization of trace zero symmetric nonnegative 5x5 matrices

Walk-Based Centrality and Communicability Measures for Network Analysis

Optimization in R n Introduction

T ( a i x i ) = a i T (x i ).

A linear algebraic method for pricing temporary life annuities

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

1 Introduction to Matrices

Abstract Data Type. EECS 281: Data Structures and Algorithms. The Foundation: Data Structures and Abstract Data Types

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

Social Media Mining. Network Measures

IEOR 6711: Stochastic Models, I Fall 2012, Professor Whitt, Final Exam SOLUTIONS

Link Analysis. Chapter PageRank

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Online edition (c)2009 Cambridge UP

Predicting Influentials in Online Social Networks

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

The Assignment Problem and the Hungarian Method

MAT 242 Test 2 SOLUTIONS, FORM T

Number of hours in the semester L Ex. Lab. Projects SEMESTER I 1. Economy Philosophy Mathematical Analysis Exam

Graph Algorithms and Graph Databases. Dr. Daisy Zhe Wang CISE Department University of Florida August 27th 2014

SGL: Stata graph library for network analysis

Cooperative Virtual Machine Management for Multi-Organization Cloud Computing Environment

arxiv: v1 [math.pr] 15 Aug 2014

PageRank Optimization in Polynomial Time by Stochastic Shortest Path Reformulation

6.231 Dynamic Programming and Stochastic Control Fall 2008

Online Adwords Allocation

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Introduction to Matrix Algebra

DERIVATIVES AS MATRICES; CHAIN RULE

2.3 Convex Constrained Optimization Problems

Matrix Multiplication

A Markovian model of the RED mechanism solved with a cluster of computers

Graph Processing and Social Networks

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances

Reinforcement Learning

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

Combining BibTeX and PageRank to Classify WWW Pages

Zeros of a Polynomial Function

Minimally Infeasible Set Partitioning Problems with Balanced Constraints

Linear Programming I

Largest Fixed-Aspect, Axis-Aligned Rectangle

ON THE COMPLEXITY OF THE GAME OF SET.

Transcription:

Ranking National Football League teams using Google s PageRank. A. Govan C. Meyer Department of Mathematics North Carolina State University A.A. Markov Anniversary Meeting, 2006

Outline Google s ranking algorithm. Summary

Google s ranking algorithm. Google s ranking. Think of the internet as a graph. Webpages are nodes of the graph, n nodes. Hyperlinks are directed edges. Basic Idea: r(p ) = r(q) Q B P Q where r(p ) is the rank of a webpage P, B P is the set of webpages pointing to P, and Q is the outdegree of a webpage Q.

Google s ranking algorithm. Google Matrix. Hyperlink Matrix H { 1/ i there is a link from i to j H(i, j) = 0 otherwise Stochastic matrix S Obtained by modifying matrix H. Replace the zero rows of H with (1/n)e T, where e is a column vector of ones. Google Matrix G. Convex combination: G = αs + (1 α)ev T, α (0, 1) and v T > 0 Personalization vector v.

Google s ranking algorithm. PageRank vector π. G is the transition probability matrix. G is irreducible (and aperiodic). π is the stationary probability distribution vector. π is unique (up to a scalar multiple).

NFL web. Each NFL team is a node in a graph.

NFL web. Each NFL team is a node in a graph. A regular season game results in a directed edge from the loser to the winner.

NFL web. Each NFL team is a node in a graph. A regular season game results in a directed edge from the loser to the winner. The edges are weighted, the weight is the score difference of the corresponding game.

NFL web. Each NFL team is a node in a graph. A regular season game results in a directed edge from the loser to the winner. The edges are weighted, the weight is the score difference of the corresponding game.

Variations on PageRank. H(i, j) = t w t ij /( j ( t w t ij )) where w ij is the weight on the edge from team i to team j during week t. Dealing with the ith zero row (undefeated team i)

Variations on PageRank. H(i, j) = t w t ij /( j ( t w t ij )) where w ij is the weight on the edge from team i to team j during week t. Dealing with the ith zero row (undefeated team i) (1/32)e T, equally likely to lose to any other team.

Variations on PageRank. H(i, j) = t w t ij /( j ( t w t ij )) where w ij is the weight on the edge from team i to team j during week t. Dealing with the ith zero row (undefeated team i) (1/32)e T, equally likely to lose to any other team. e T i, where e i is the standard basis vector.

Variations on PageRank. H(i, j) = t w t ij /( j ( t w t ij )) where w ij is the weight on the edge from team i to team j during week t. Dealing with the ith zero row (undefeated team i) (1/32)e T, equally likely to lose to any other team. e T i, where e i is the standard basis vector. π T t 1, using the ranking vector from previous week.

Ranking NFL with Keener (SIAM Review,1993) Nonnegative matrix A ( ) Sij + 1 A(i, j) = h, Laplace s rule of succession S ij + S ji + 2 where S ij is the amount of points scored by team i against team j. h(x) = 1 2 + 1 2 sgn(x 1 2 ) 2x 1 Rank vector r is the Perron vector of A.

Ranking NFL with Colley (Colley s Bias Free Matrix Rankings) Colley matrix C { 2 + ntot,i i = j C(i, j) = n j,i i j Laplace s rule of succession where n tot,i is the total number of games played by team i, and n j,i is the number of times team i played team j. Ranking vector r is the solution to the linear system Cr = b where b(i, 1) = 1 + (n w,i n l,i )/2, given that n w,i is the number of games lost by team i and n l,i is the number of games won by team i.

alpha=0.65, v^t=(1/32)e^t Correct predictions 16 14 12 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 weeks Colley Google1 Google2 Google3 Keener

alpha=0.65, v^t=(1/32)e^t Correct predictions 16 14 12 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 weeks Google1 Google2 Google3

Regular season 2005, α=0.65, v T = (1/32)e T Colley Google1 Google2 Google3 Keener Correct Spread Correct Spread Correct Spread Correct Spread Correct Spread games week 1 0 0 0 0 0 0 0 0 0 0 week 2 0 0 0 0 0 0 0 0 0 0 week 3 7 202 7 215 7 182 7 215 8 560 14 week 4 7 178 8 180 9 150 7 175 9 178 14 week 5 4 198 9 270 6 176 9 274 7 231 14 week 6 10 125 10 137 10 132 10 138 10 140 14 week 7 8 144 5 201 5 172 5 202 4 197 14 week 8 9 196 11 244 11 150 10 249 10 152 14 week 9 12 107 10 165 10 176 10 164 9 145 14 week 10 9 134 10 120 10 140 10 121 10 139 14 week 11 10 196 11 198 11 198 11 197 9 190 16 week 12 12 108 10 155 10 103 10 157 11 153 16 week 13 14 133 13 181 13 167 13 183 12 147 16 week 14 10 171 14 179 14 175 14 176 11 194 16 week 15 11 227 11 217 11 223 11 219 13 206 16 week 16 8 174 10 178 10 178 10 178 9 176 16 week 17 9 221 11 214 11 214 11 214 8 213 16 Total 140 2514 150 2854 148 2536 148 2862 140 3021 224

Results. PageRank based ranking method: depends on α and the personalization vector v T. does not appear to depend on the method of adjusting the zero rows.

Summary Summary Based on preliminary simulations PageRank variation method outperforms Keener s and Colley method. Future Personalization vector(s) using season statistics data. Automate the selection of the best α for a specified v T. Extend the convex combination to include more then one personalization vector.