Journal of Applied Statistics, Vol. 13, No. 2, 1986

Direct test of Harville's multi-entry competitions model on race-track betting data

BRIAN McCULLOCH, Consultant, Touche Ross & Co., Auckland
TONY VAN ZIJL, Director of Research, New Zealand Society of Accountants and Reader, Department of Accountancy, Victoria University of Wellington

SUMMARY This paper reports on a direct test of Harville's multi-entry competitions probability model. The data set for the test relates to racetrack win and show betting in New Zealand during the 1981/82 horse racing season. The model was found to produce an estimator with a small negative bias but with considerable variability.

1 Introduction

Because of the many points of correspondence between race track betting and investing on the stock exchange, financial economists have for some time shown a close interest in race track betting as a source of data for investigating attitudes to risk and informational efficiency of markets. (See, e.g., Figlewski, 1979; Asch et al., 1982; Bird & McCrae, 1983 and the references quoted therein.)

In an early paper within this body of research, Harville (1973) developed a general model for assigning probabilities to the various possible outcomes of any multi-entry competition. Harville then applied the model to estimating the probabilities of the possible finishing orders of a horse race. The model has been used in two recent studies of race track betting. Hausch et al. (1981) applied the model to North American data to test the feasibility of developing a profitable technical system for place and show betting, and Tuckwell (1981) employed the model to investigate apparent anomalies between win and show odds in Australian betting markets.

Harville conducted an indirect test of the model by comparing estimated probabilities with observed frequencies and found that there was a reasonable degree of correspondence between the two sets. This was confirmed in the study by Hausch, Ziemba and Rubinstein.
In contrast to Harville's approach, the present paper presents a direct test of the model. The test is based on a difference between the systems for determining show payoffs in North America and in New Zealand.
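As a concrete illustration of the model under test, the following is a minimal sketch of how show probabilities can be estimated from win probabilities alone. It assumes the standard form of Harville's formula, in which the probability that horses i, j and k finish first, second and third is p_i p_j p_k / ((1 - p_i)(1 - p_i - p_j)). The function name and the example win probabilities are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def harville_show_probabilities(win_probs):
    """Estimate P(horse finishes in the first three) from win probabilities.

    Assumes at least three starters and every win probability strictly
    between 0 and 1 (illustrative sketch, not the authors' code).
    """
    n = len(win_probs)
    show = [0.0] * n
    for i, j, k in permutations(range(n), 3):
        pi, pj, pk = win_probs[i], win_probs[j], win_probs[k]
        # Harville: P(i first, j second, k third)
        p_order = pi * (pj / (1.0 - pi)) * (pk / (1.0 - pi - pj))
        # Each of the three horses shows under this finishing order.
        for horse in (i, j, k):
            show[horse] += p_order
    return show


if __name__ == "__main__":
    example = [0.4, 0.3, 0.2, 0.1]  # hypothetical win probabilities
    for p, s in zip(example, harville_show_probabilities(example)):
        print(f"win {p:.2f} -> estimated show {s:.3f}")
```

Because exactly three horses show in every finishing order, the estimated show probabilities for a race sum to 3, which is a convenient sanity check on any implementation.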
2 Methodology

On North American race tracks, the payoffs per dollar bet, $r_i$, to win, place and show are determined as follows:

$$ r_i^w = \begin{cases} \dfrac{q^w \sum_{l=1}^{n} B_l^w}{B_i^w} & \text{if horse } i \text{ wins;} \\ 0 & \text{otherwise} \end{cases} \tag{1} $$

$$ r_i^p = \begin{cases} 1 + \dfrac{q^p \sum_{l=1}^{n} B_l^p - B_i^p - B_j^p}{2 B_i^p} & \text{if horse } i \text{ places with horse } j \text{;} \\ 0 & \text{otherwise} \end{cases} \tag{2} $$

$$ r_i^s = \begin{cases} 1 + \dfrac{q^s \sum_{l=1}^{n} B_l^s - B_i^s - B_j^s - B_k^s}{3 B_i^s} & \text{if horse } i \text{ shows with horses } j \text{ and } k \text{;} \\ 0 & \text{otherwise} \end{cases} \tag{3} $$

where $B_i$ is the total amount bet on horse $i$; $1 - q$ is the combined take and breakage; and the superscripts $w$, $p$, and $s$ indicate win, place, and show bets respectively.

The amounts bet to win on each horse relative to the total amount bet on the race may be regarded as implying a consensus opinion of the chance of a win performance by each horse. Therefore, if bettors are assumed to be risk neutral in the sense of all win bets having equal expected returns, then the probability, $p_i^w$, of horse $i$ winning may be determined as follows:

$$ p_i^w = \frac{B_i^w}{\sum_{l=1}^{n} B_l^w} \tag{4} $$

Thus the probability of a win performance is easily determined from aggregate betting. However, the consensus view of the chance of a place or show performance cannot be determined in the same way. In these cases the payoffs depend on which of the other horses in the race also place/show. Harville therefore suggested a model for estimating place and show probabilities by use of the tote determined win probabilities. The estimating formulae are as follows: [equation missing in source] and [equation missing in source]

If North American betting data is used, the only possible test of the model is to compare the estimated probabilities with observed frequencies. This was Harville's approach. A danger in this approach is that a possible lack of correspondence between the bettors' consensus view and observed frequencies may be confounded in the
comparison [1]. Using New Zealand betting data, however, it is possible to make a direct comparison between the probabilities estimated by the model and the bettors' consensus view.

In New Zealand, race track bettors do not have the opportunity for place betting but there are opportunities for win and show betting [2]. The per dollar payoff on a win bet is determined in the same way as in North America but the per dollar payoff on a show bet is determined as follows:

$$ r_i^s = \begin{cases} \dfrac{q^s \left( \sum_{l=1}^{n} B_l^s / 3 \right)}{B_i^s} & \text{if horse } i \text{ shows;} \\ 0 & \text{otherwise.} \end{cases} $$

That is, the per dollar payoff to a show bet is independent of which other horses in the race also show. The consensus opinion of bettors, as revealed in aggregate betting, therefore offers a direct estimate of the probability of showing, namely,

$$ p_i^s = \frac{3 B_i^s}{\sum_{l=1}^{n} B_l^s} \tag{7} $$

[text missing in source]

Hence New Zealand racetrack betting data provides the opportunity for a direct test of the Harville model, simply via a comparison of the model estimate $\hat{p}_i^s$ and $p_i^s$.

The tote determined win and show probabilities were obtained from the observed tote payoffs [3] as follows:

$$ E = p_i^w (r_i^w - 1) + (1 - p_i^w)(-1) = q^w - 1 \tag{9} $$

Similarly, [equations missing in source]

To assess the estimating ability of the model, an error index, $I$, was then calculated for each starter as follows (the procedure followed is similar to that of Walther (1982)): [equation missing in source]
This implies that [equation (16) missing in source] and hence the null hypothesis $\mu_I = 0$ is equivalent to the null hypothesis $\alpha = 0$ and $\beta = 1$ in a regression on the stochastic generalisation of (16), namely,

$$ p_i^s = \alpha + \beta \hat{p}_i^s + e_i, \tag{17} $$

where $e_i$ is the usual error term in a least squares regression.

3 Data

Data was collected on win and show betting for the racing season August 1, 1981 to July 31, 1982. The data was obtained from New Zealand's two Sunday newspapers, the Sunday News and the New Zealand Times. For all the race meetings held on the previous day, the Sunday News reports the payoffs to successful starters. For a selected Northern meeting it also reports, for the non-successful starters, the payoffs shown on the tote board immediately prior to the running of each race. (Whenever there was more than one Northern meeting the newspaper reported the meeting offering the highest stakes.) The New Zealand Times reports similarly but the selected meeting is usually one that was held in the Central districts of the North Island or in the South Island. In combination the two newspapers provided complete payoff sets for 100 meetings.

The data set was restricted to races for which there was one win payoff and three show payoffs. The reported payoffs were checked for errors by calculating, for each race, the combined take and breakage implied by (12), and comparing this with the overall mean value. Races for which the implied value of $q$ fell outside the $\pm 3\sigma$ limits were excluded from the data set. The tote board at many meetings shows $99 if the projected payoff is greater than or equal to $99. Hence all races for which a $99 payoff was reported were also excluded. In combination, the restrictions reduced the data set to 625 races involving 8245 starters.

In addition to the payoffs, information on race distance, race class and stakes, track condition, and meeting date was also included in the data file.

4 Results

The distribution of the error index, $I$, is presented in Table 1.

TABLE 1. [Column headings: I; No. of Observations; %. Table body missing in source.]
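The screening rules described in Section 3 can be sketched as follows. Equation (12) is not legible in this copy, so the sketch assumes that the payout fraction q implied by a race's payoffs follows from p_i = q / r_i together with the requirement that the probabilities sum to the number of paying places. The function names, the data layout (one list of per-dollar payoffs per race) and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def implied_q(payoffs, paying_places=1):
    """Payout fraction q implied by a full set of per-dollar payoffs.

    Assumption: p_i = q / r_i and the p_i sum to the number of paying
    places (1 for the win pool, 3 for the show pool).
    """
    return paying_places / sum(1.0 / r for r in payoffs)


def screen_races(races, cap=99.0, n_sigma=3.0):
    """Drop races showing a capped payoff, then races with an outlying q."""
    # A payoff at the tote-board cap is censored, so the race is unusable.
    uncapped = [race for race in races if max(race) < cap]
    qs = [implied_q(race) for race in uncapped]
    centre, spread = mean(qs), pstdev(qs)
    # Keep races whose implied q lies within n_sigma deviations of the mean.
    return [race for race, q in zip(uncapped, qs)
            if abs(q - centre) <= n_sigma * spread]
```

Applying the cap filter before computing the mean and standard deviation is a design choice of this sketch; the paper does not state the order in which the two restrictions were applied.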
On average the Harville model underestimated the tote determined show probabilities by 4.6% and the variability of the distribution, as measured by the standard deviation, was 16.7%. Underestimation occurred in 63% of cases. The amount of the underestimation was quite small and its practical significance is reduced by the take [4]. For a take of 18.5%, underestimation of the probabilities by 4.6% results in the expected return from a $1 bet being underestimated by $0.038. The variability of $I$ was however quite large in that 22% of the values exceeded 20. In terms of expected return this implies an error larger than $0.163 for approximately one bet in five [5].

The regression results based on equation (17) are shown in Table 2. For the null hypothesis $\alpha = 0$ and $\beta = 1$, the F statistic was 2213 and hence the null hypothesis is overwhelmingly rejected.

TABLE 2. [Column headings: Coefficient; Standard error. Table body missing in source.]

To determine the uniformity of the regression result throughout the data set, the set was randomly divided into two subsets of approximately equal size and the regression was repeated on each set. The estimates of $\alpha$ and $\beta$ were found to be very similar and an analysis of covariance test on homogeneity of the regressions showed that the differences were not significant at the 1% level (see Johnston, 1972, pp. 192-207).

The tests therefore indicated that the Harville model provides a biased estimator of the show probabilities. This result may be due to the weaknesses in the model, as noted by Harville, namely:
(i) The model does not recognise that some horses generally either win or fail to show.
(ii) The model does not take account of the running times of horses.
In addition, of course, the result may indicate that win bettors constitute a different group from show bettors, in that the two groups have different estimates of the relative abilities of horses and that there is no arbitrage between the two markets.
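The expected-return figures quoted in this section follow from the relationship set out in note [4]: with E = p r - 1 and p r = 1 - take, a relative error Δp/p in the probability changes the expected return per $1 bet by (1 - take) Δp/p. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def expected_return_shift(relative_error, take=0.185):
    """Change in expected return per $1 bet for a relative error in p.

    From E = p*r - 1 with p*r = 1 - take, a change dp in p gives
    dE = r*dp = (1 - take) * dp/p.
    """
    return (1.0 - take) * relative_error


print(round(expected_return_shift(0.046), 4))  # 0.0375
print(round(expected_return_shift(0.20), 4))   # 0.163
```

The second figure matches the $0.163 quoted for an error index of 20. The first comes to about $0.0375 against the quoted $0.038, a gap that presumably reflects rounding of the 4.6% average before publication.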
Despite the above comments on the estimating ability of the model, the regression results show that there is a consistent relationship between $\hat{p}_i^s$ and $p_i^s$. Hence $\hat{p}_i^s$ may be regarded as a good predictor of $p_i^s$. The estimated values of $\alpha$ and $\beta$ suggest that, on average, $\hat{p}_i^s$ consistently underestimates $p_i^s$ for low values and overestimates $p_i^s$ for high values. This is confirmed by Table 3.

The data set used to test the null hypothesis, $\mu_I = 0$, of course relates to a wide variety of different betting situations. It was therefore appropriate to test the sensitivity of $I$ to variation in some of the variables that determine betting behaviour. The variables considered were race distance, field size, track conditions, race class, race stakes, and date of the meeting.

Race distance, field size, and track condition are variables that are often quoted as being considered by bettors in their choice of bets (see, e.g., Vergin, 1977). Race class and stakes tend to determine the amount of publicly available information regarding prospects in a race. The main double races and other high stake races tend to attract the well known horses and they also receive more extensive coverage in the race form literature. Bettors can therefore be expected
to be better informed regarding such races. Meeting date may also be relevant. Horses are typically in work during only part of each racing season and therefore meeting date, as a proxy for changes in the population of starters, may also affect $I$.

TABLE 3. [Table body missing in source; the legible column headings include No. of Observations, I, and a t statistic.]

Using a dummy variable approach, the error index was regressed on each of the variables discussed above. Details are shown in Table 4. In such regressions, a test on the slope coefficient is of course equivalent to a two sample t test of equality of the means of the two populations formed by the dummy variable. The results of the regressions are reported, in this latter form, in Table 5.

TABLE 4.
Variable          D
Race distance     D = 1 if distance > 1600 m, else D = 0
Field size        D = 1 if field size > 14, else D = 0
Track condition   D = 1 if track heavy, soft or easy, else D = 0
Race class        D = 1 if race double leg, else D = 0
Stakes            D = 1 if stakes > $5,000, else D = 0
Date              D = 1 if month = August, September, April, May, June, July, else D = 0

In each case the underestimation is least for the D = 1 condition but the reduction in underestimation is statistically significant at the 0.05 level only for distance, field size, and date.
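The equivalence invoked above, between the slope test in a regression on a 0/1 dummy and a two sample t test, can be checked numerically. The sketch below is illustrative only: the function names and the sample values are hypothetical, and the equivalence holds for the pooled-variance form of the t test, which is assumed here.

```python
from math import sqrt


def dummy_regression_slope_t(y, d):
    """OLS of y on a 0/1 dummy d: return (slope, t statistic of the slope)."""
    n = len(y)
    y_bar, d_bar = sum(y) / n, sum(d) / n
    sxx = sum((di - d_bar) ** 2 for di in d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    slope = sxy / sxx
    intercept = y_bar - slope * d_bar
    rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


def pooled_two_sample_t(y, d):
    """Difference in group means and the pooled-variance t statistic."""
    g1 = [yi for yi, di in zip(y, d) if di == 1]
    g0 = [yi for yi, di in zip(y, d) if di == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    pooled_var = ss / (len(g1) + len(g0) - 2)
    t = (m1 - m0) / sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))
    return m1 - m0, t


if __name__ == "__main__":
    index = [-6.0, -2.0, -9.0, 1.0, -4.0, -7.0, 3.0, -5.0]  # hypothetical I values
    dummy = [0, 1, 0, 1, 1, 0, 1, 0]                         # hypothetical D values
    print(dummy_regression_slope_t(index, dummy))
    print(pooled_two_sample_t(index, dummy))
```

The slope equals the difference between the two group means and the two t statistics coincide, so reporting the results as differences in means loses no information relative to the regression output.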
TABLE 5.
                                        Sampling error
Variable          $\bar{I}_1 - \bar{I}_0$    5%      1%
Race distance     0.76*                 0.75    0.98
Field size        0.82*                 0.73    0.96
Track condition   0.42                  0.82    1.08
Race class        0.51                  0.88    1.16
Stakes            0.06                  0.82    1.08
Date              0.87*                 0.72    0.95
*Significant at 5% level

5 Conclusion

The results of this study indicate that the Harville model, on average, underestimates the show probabilities for the starters in a horse race. The amount of the underestimation was found to vary considerably but on average to be small. It was also found that the underestimation tends to be smaller for middle to long distance races, for large field sizes, and for the winter months.

There are no obvious important differences between pari mutuel betting markets in New Zealand and in other countries. However, the possible existence of such differences must leave open to question the extent to which the results of this study generalise to race track betting markets of other countries.

Correspondence: Tony van Zijl, Department of Accountancy, Victoria University of Wellington, Wellington, New Zealand.

REFERENCES

ASCH, P., MALKIEL, B.G. & QUANDT, R.E. (1982) Racetrack betting and informed behaviour, Journal of Financial Economics, 10 (July 1982), pp. 187-194.
BIRD, R. & MCCRAE, M. (1983) Battling the Books: The Australian Experience, Preliminary Results, Accounting Association of Australia and New Zealand, 1983 Conference, Griffith University, Brisbane.
FIGLEWSKI, S. (1979) Subjective information and market efficiency in a betting market, Journal of Political Economy, 87 (December), pp. 75-78.
HARVILLE, D.A. (1973) Assigning probabilities to the outcomes of multi-entry competitions, Journal of the American Statistical Association, 68 (June 1973), pp. 312-316.
HAUSCH, D.B., ZIEMBA, W.T. & RUBINSTEIN, M. (1981) Efficiency of the market for racetrack betting, Management Science, 27 (December), pp. 1435-1452.
JOHNSTON, J. (1972) Econometric Methods, 2nd edn (Kogakusha, Tokyo, McGraw-Hill).
SNYDER, W.W. (1978) Horse racing: testing the efficient market hypothesis, Journal of Finance, 33 (September), pp. 1109-1118.
TUCKWELL, R.H. (1981) Anomalies in the Gambling Market, Australian Journal of Statistics, 23 (December), pp. 287-295.
VERGIN, R.C. (1977) An investigation of decision rules for thoroughbred race horse wagering, Interfaces, 8 (November), pp. 34-45.
WALTHER, L.M. (1982) A comparison of estimated and reported historical cost/constant dollar data, Accounting Review, 57 (April), pp. 376-383.

NOTES

[1] There is substantial empirical evidence from North American studies to indicate that such a lack of correspondence does exist for the tails of the win probability distribution. Strong win favourites tend to
be underbet and longshots are overbet; see, e.g., Snyder (1978). Questions of how well the consensus views of bettors in New Zealand approximate observed frequencies, and the extent of exploitable systematic biases, will be investigated in later studies.

[2] All legal betting in New Zealand takes place via the Government controlled Totalisator Agency Board. The Board offers opportunities for on and off course pari mutuel betting and combines the two betting pools to determine the payoffs. Off course bettors have to rely on their own estimates of the payoffs but on course bettors can observe from the tote board the payoffs implied by aggregate betting. During the 1981/82 season, off course win and show betting ceased 20 minutes prior to the running of a race.

[3] As in North America, the actual payouts are subject to breakage. The New Zealand system is that all payouts are initially subject to a take of 18½%, and are then rounded down to the nearest multiple of 5c.

[4] $E = p(r - 1) + (1 - p)(-1) = -0.185 \Rightarrow pr = 0.815$;
$E + \Delta E = (p + \Delta p)(r - 1) + (1 - p - \Delta p)(-1) \Rightarrow \Delta E = 0.815\, \Delta p / p$.

Appendix

The betting terms used in this paper are as used in North America:

Win bet: a bet on which there is a payout only if the chosen horse wins the race.
Place bet: a bet on which there is a payout if the chosen horse finishes first or second.
Show bet: a bet on which there is a payout if the chosen horse finishes first, second or third.
Take: the amount deducted from the betting pool to pay for totalisator duty plus operating expenses and profits and levies which go to the racing clubs. In New Zealand this is 18½%.
Breakage: the amount not paid out to bettors because payouts are rounded down to the nearest multiple of 5 cents.

The North American show bet is actually referred to in New Zealand as a place bet but to avoid confusion with North American usage it has been referred to in this paper as a show bet. The New Zealand market does not offer bettors North American place betting.
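To make the definitions of take and breakage concrete, the sketch below computes New Zealand style win and show dividends per $1 bet from pool totals: the 18½% take is deducted first and the result is then rounded down to a multiple of 5 cents, as described in note [3]. The function names and pool figures are hypothetical, and working in exact fractions of a cent is an implementation choice made here to avoid floating-point rounding at the 5 cent boundary.

```python
from fractions import Fraction

TAKE = Fraction(37, 200)  # 18.5%, the New Zealand take quoted in the paper


def win_dividend_cents(pool_total, bet_on_winner, take=TAKE):
    """Win dividend per $1 bet, in cents, after take and breakage."""
    gross_cents = (1 - take) * Fraction(pool_total) / Fraction(bet_on_winner) * 100
    # Breakage: round down to the nearest multiple of 5 cents.
    return int(gross_cents // 5) * 5


def show_dividend_cents(pool_total, bet_on_horse, take=TAKE):
    """Show dividend per $1 bet, in cents: one third of the net pool per shower."""
    gross_cents = (1 - take) * Fraction(pool_total) / 3 / Fraction(bet_on_horse) * 100
    return int(gross_cents // 5) * 5


if __name__ == "__main__":
    # Hypothetical pools: $1000 win pool with $300 on the winner,
    # $3000 show pool with $600 on one of the three horses that show.
    print(win_dividend_cents(1000, 300))   # 270
    print(show_dividend_cents(3000, 600))  # 135
```

In the first example the dividend before breakage is about 271.67 cents, so breakage retains roughly 1.67 cents per dollar bet on the winner in addition to the take.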