Research on information propagation analyzing odds in horse racing



Similar documents
Simple Linear Regression Inference

17. SIMPLE LINEAR REGRESSION II

Univariate Regression

2. Simple Linear Regression

Direct test of Harville's multi-entry competitions model on race-track betting data

Lock! Risk-Free Arbitrage in the Japanese Racetrack Betting. Market

Correlation key concepts:

AP Physics 1 and 2 Lab Investigations

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Concepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance)

Market efficiency in greyhound racing: empirical evidence of absence of favorite-longshot bias

Time series Forecasting using Holt-Winters Exponential Smoothing

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Exercise 1.12 (Pg )

11. Analysis of Case-control Studies Logistic Regression

EFFICIENCY IN BETTING MARKETS: EVIDENCE FROM ENGLISH FOOTBALL

Econometrics Simple Linear Regression

The Big Picture. Correlation. Scatter Plots. Data

International Statistical Institute, 56th Session, 2007: Phil Everson

1.1 Win means the type of betting where the investor has to select the 1st horse in the race. The winning selection is the selection of the 1st horse.

Late Money and Betting Market Efficiency: Evidence from Australia

Review of Fundamental Mathematics

The correlation coefficient

How to Win the Stock Market Game

McMaster University. Advanced Optimization Laboratory. Advanced Optimization Laboratory. Title: Title:

5. Linear Regression

Confidence Intervals for Cp

Part 1 : 07/27/10 21:30:31

Gamma Distribution Fitting

Simple Regression Theory II 2010 Samuel L. Baker

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Simple Linear Regression

Practice Set #4 and Solutions.

Introduction to Regression and Data Analysis

This content downloaded on Tue, 19 Feb :28:43 PM All use subject to JSTOR Terms and Conditions

Normality Testing in Excel

$ ( $1) = 40

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

Decision Theory Rational prospecting

Example G Cost of construction of nuclear power plants

Chapter 7: Proportional Play and the Kelly Betting System

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

36 Odds, Expected Value, and Conditional Probability

JUST THE MATHS UNIT NUMBER 1.8. ALGEBRA 8 (Polynomials) A.J.Hobson

Up/Down Analysis of Stock Index by Using Bayesian Network

Lecture notes for Choice Under Uncertainty

How To Check For Differences In The One Way Anova

Common Core Unit Summary Grades 6 to 8

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

January 2012, Number 64 ALL-WAYS TM NEWSLETTER

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

C(t) (1 + y) 4. t=1. For the 4 year bond considered above, assume that the price today is 900$. The yield to maturity will then be the y that solves

Common Core State Standards for Mathematics Accelerated 7th Grade

Quantitative Analysis of Foreign Exchange Rates

The degree of a polynomial function is equal to the highest exponent found on the independent variables.

Second Midterm Exam (MATH1070 Spring 2012)

REGULATING INSIDER TRADING IN BETTING MARKETS

Least Squares Estimation

5 Cumulative Frequency Distributions, Area under the Curve & Probability Basics 5.1 Relative Frequencies, Cumulative Frequencies, and Ogives

Jim Gatheral Scholarship Report. Training in Cointegrated VAR Modeling at the. University of Copenhagen, Denmark

Chapter 23. Inferences for Regression

Demand Forecasting When a product is produced for a market, the demand occurs in the future. The production planning cannot be accomplished unless

Name: Date: Use the following to answer questions 2-3:

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

Module 3: Correlation and Covariance

Unit 19: Probability Models

Dealing with Data in Excel 2010

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

TIME SERIES ANALYSIS & FORECASTING

Chapter 21: The Discounted Utility Model

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Frequently Asked Questions

CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER

4. Simple regression. QBUS6840 Predictive Analytics.

Chapter 2 Portfolio Management and the Capital Asset Pricing Model

Chi Square Tests. Chapter Introduction

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Target Strategy: a practical application to ETFs and ETCs

Session 7 Bivariate Data and Analysis

Some Quantitative Issues in Pairs Trading

CHAPTER 11: ARBITRAGE PRICING THEORY

Journal of Engineering Science and Technology Review 2 (1) (2009) Lecture Note

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Enriching Tradition Through Technology

Tennessee Department of Education. Task: Sally s Car Loan

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

II. DISTRIBUTIONS distribution normal distribution. standard scores

Chapter 3 RANDOM VARIATE GENERATION

Industry Environment and Concepts for Forecasting 1

INTRODUCTION TO ERRORS AND ERROR ANALYSIS

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

Microsoft Excel Tutorial

CALCULATIONS & STATISTICS

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Transcription:

Challenges for Analysis of the Economy, the Businesses, and Social Progress Péter Kovács, Katalin Szép, Tamás Katona (editors) - Reviewed Articles Research on information propagation analyzing odds in horse racing Shingo Ohta 1 Masato Yamanaka 2 Joe Sato 3 Takaaki Nomura 4 We focus on the odds in horse racing to study the information propagation to get an idea for fluctuation on information propagation. Analyzing past data of horse racing, and constructing a mathematical model of winning probability, we find a correlation between odds and results of races. As a first step, we focus on two ways of betting called WIN and EXACTA. We confirm that the winning probability derived from odds of each horse is mostly in accord with actual data. Then, comparing the result with a stochastic model of winning probability constructed with EXACTA odds, we find out fluctuations between them. Finally, we consider where is the origin of the fluctuations. Keywords: information propagation, stochastic model, horse racing 1. Motivation and Introduction Our motivation is to find a better way to share information. Especially, we focus on the case that people share the information. The better way means the way to convey information more correctly, more rapidly. To find it, we must take into account the fluctuations caused by human errors, since the fluctuations cause information errors. Therefore the information considered in this research should be formed by numerous people. In this sense, the horse racing is desirable to analyze. 1 Shingo Ohta, Bachelor of Science, Ph.D student, Saitama University, Faculty of Science, Japan. 2 Masato Yamanaka, Ph.D, post doctoral fellow, University of Tokyo, Institute for Cosmic Ray Research, Japan. 3 Joe Sato, Ph.D, Associate Professor, Saitama University, Faculty of Science, Japan. 4 Takaaki Nomura, Ph.D, post doctoral fellow, Saitama University, Faculty of Science, Japan. 1367

Shingo Ohta Masato Yamanaka Joe Sato Takaaki Nomura 2. The analysis of WIN odds 2.1. Odds and WIN There are several ways to bet in horse racing. For the first step, we focus on WIN, which is the simplest way to bet. Before explaining our analysis, we explain briefly odds and WIN. The odds are the values which change according to betting, and dividends are defined by these values. WIN is a way to bet that people guess which horse will win. The odds therefore reflect popularity of each horse. WIN odds of a horse is defined as the ratio between total money bet on a race and the money bet on a horse: 2.2. χ 2 test of goodness-of-fit In the analysis of WIN odds, what we want to know is whether the WIN odds reflect strength of each horse.then we consider that how well the results reflect the expectations. We define N(O) as the number of horses which have odds O and P(O) as the probability that the horses which have odds O win. Then we consider that with what probability P(O), N horses with odds O win? For this analysis, we construct a stochastic model and test the goodness-of-fit between the theoretical value of the number of winner, N(O)P(O), and the observed value of number of winner, n(o). Note that we can get data of the odds O, N(O) and n(o). To test the goodness-of-fit, we should define the criterion of the fit. We adopt the χ 2 test of goodness-of-fit for this analysis.we split whole region of odds, give a number to the each odds bin like O i, and sum up the random variable corresponding to each odds bin. Then we derive the following χ 2 by assuming that N is large enough to satisfy the Stirling s formula. Using this criterion, we test the goodness-of-fit with a significance level: 2.3. Stochastic model As the last preparation for the analysis, we set a stochastic model of winning probability. We assume that people know a winning probability of horses correctly. Then, it is equal to a share of a bet, namely, it is given by 1368

Research on information propagation analyzing odds in horse racing The right hand side is originally a share of a bet derived according to the definition of dividends by Japan Racing Association(JRA). Note that the width of each odds bin should be wide enough. To analyze more precisely, there should be enough data in each odds bin. In Figure 1, we divided odds region from odds equal 1 to odds equal 31 into 150 bins. Then, we draw the theoretical value P(O i ) and plots the actual results n(o i )/N(O i ), where the horizontal axis means the odds of horses, and the vertical axis means the probability of winning. At a glance, they coincide with each other very clearly. 2.4. Quantitative check of the stochastic model To test the goodness-of-fit, we show the χ 2 values for each year from 2003 to 2008 with corresponding significance level (Table 1). We thus confirm the good fitting quantitatively. Figure.1 The result of WIN odds analysis(degrees of freedom are 150). Most of data plots are in accord with the theoretical line.(jra official data 1986-2009, free database soft PC KEIBA Database for JRA-VAN Data lab) 1369

Shingo Ohta Masato Yamanaka Joe Sato Takaaki Nomura Table.1 χ 2 values from data in each year from 2003 through 2008. We apply the usual notations, * for significant at 10%, ** significant at 5%, *** significant at 1%. 3. Comparison with EXACTA odds 3.1. What is EXACTA Next, we compare the winning probability constructed from WIN odds with that of EXACTA. This is a way to bet that people guess the winner and the next in order. The reason we choose EXACTA is that the process to guess is almost same as that of WIN. Though the two processes are similar, there may be a little difference between them. For instance, we may expect that even if two horses WIN odds are small, it is not always true that its EXACTA correspondense. 3.2. Stochastic model and indicator of the fluctuation As we did in the analysis of WIN odds, we construct a stochastic model for the analysis of EXACTA odds. We have to construct a probability that horse α win and β finish next. One of such a probability is given by WIN odds, O α and O β : The right equation is the stochastic model for WIN odds. The p α is the probability for the horse α win, and the fact that the horse β is the second winner is same as the 1370

Research on information propagation analyzing odds in horse racing fact that the horse β win without horse α. Therefore the probability is given by a conditional probability p β /(1- p α ), and we get the above equation. The other probability is given by EXACTA odds, O αβ : We assume that people know the probability that horse α win and the horse β is the next, as WIN odds. Then, the stochastic model for EXACTA odds is similar with that for WIN odds. The difference is the numerator factor, which comes from the difference of definition between two types of dividends. Next, to compare these values, we define the indicator of the fluctuations. We denote two types of probabilities as follows: We give a serial number for all conditions of horses α and β from 1 to m. In each race the number of combination is given by m i and hence m=σ i=1 M m i with total M races. Thus the super script runs from 1 to m. Then, we define the correlation coefficients as the indicator of the fluctuations : where and also check the slope of regression line for these data, 3.3. A fluctuation between two kinds of probabilities In Figure 2, we plot for all races in year 2007. The horizontal axis is the probability given by EXACTA odds and the vertical axis is that of given by WIN odds. Note that if there is a perfect coincidence between them all data points 1371

Shingo Ohta Masato Yamanaka Joe Sato Takaaki Nomura must lie on the green line. We find that the data points are dense along the line and most of them are along the line. Indeed, they have large correlation coefficient, that is, they have quite strong correlation. However, in whole region, they have a little larger slope than that of the green line, in other words, we find that they have different distributions though they should have same ones because of their propaties. Especially, in the region with high probability, most of the data points are over the green line. In lower probability region, the situation is quite opposite. Figure 3 is a magnified figure in the region with low probability. Contrast to whole region, the regression line has a little smaller slope than that of green line. This means that the pairs of horses with high EXACTA odds look more attractive than expected by WIN odds, which represent the strength of each horse. This is the fluctuation caused by people to bet who are eager to get much money at one bet. Hence we conclude that there is definitely the fluctuation between WIN odds and EXACTA odds. Figure 2 The comparison between two kinds of probabilities we saw in section 3.2(for all races in 2007). 1372

Research on information propagation analyzing odds in horse racing Figure 3 The magnified figure of Fig.2 in low probability region. The slope of regression line is lower than that of perfect correlation line. 4. The origin of fluctuations 4.1. Stochastic model and Indicator of the fluctuations In the last step, we focus on the origin of the fluctuations. From the previous two analyses, we saw that the expectations in EXACTA deviate from the WIN expectations though they should be same. Thus we investigate where is the origin of the deviations. For the investigation, we compare two types of probabilities.one is the probability we saw in the WIN odds analysis, which contains the information purely about the expectation of winner. The other is defined as the sum of probabilities for EXACTA, 1373

Shingo Ohta Masato Yamanaka Joe Sato Takaaki Nomura which contains the information of the expectation for the second horse. Then, comparing these two values, we can find out the deviations of the EXACTA odds from the WIN odds. In this analysis, we adopt the correlation coefficient and the slope of regression line as the indicators of deviations as in the previous analysis. 4.2. The Deviation from the probability for WIN odds In Figure 4, we plot for all races in year 2007. The horizontal axis is the probability given by the EXACTA odds, and the vertical axis is that of given by WIN odds. There is quite strong correlation between these two values. However, as we can see, they are not exactly same each other. Therefore the main origin of the deviation in the previous analysis is seen in this analysis. Thus we conclude that the deviation is in the expectation of the second horse. Figure 4 The comparison between two kinds of winning probability(for all races in 2007). 1374

Research on information propagation analyzing odds in horse racing 5. Summary In the first analysis, we confirmed that the result of each race reflects the WIN odds. In the next, we confirmed the existence of the fluctuations in EXACTA odds. Then, from these results, we investigated the origine of the deviations of the EXACTA odds from the WIN odds. Finally, we concluded that there is fluctuation in the expectation of the second horse. References [1] K. Park and E. Domany, Power law distribution of dividends in horse races, Europhy Lett, 53(4)pp.419-425. [2] T. Ichinomiya, Physica A, Volume 368, Power-law distribution in Japanese racetrack betting, Issue 1, 1 August 2006, Pages 207-213 [3] D. A. Harville. Assigning Probabilities to the Outcomes of Multi-Entry Competitions, pp.213-217 in Efficiency of racetrack betting markets 2008 edition D. B. Hausch, V. SY Lo and T. Ziemba [4] Plackett, R. L. (1975). The analysis of permutations. Apl. Statist., 24, 193-202 1375