Do Firms Maximize? Evidence from Professional Football

David Romer
University of California, Berkeley and National Bureau of Economic Research

This paper examines a single, narrow decision (the choice on fourth down in the National Football League between kicking and trying for a first down) as a case study of the standard view that competition in the goods, capital, and labor markets leads firms to make maximizing choices. Play-by-play data and dynamic programming are used to estimate the average payoffs to kicking and trying for a first down under different circumstances. Examination of actual decisions shows systematic, clear-cut, and overwhelmingly statistically significant departures from the decisions that would maximize teams' chances of winning. Possible reasons for the departures are considered.

I am indebted to Ben Allen, Laurel Beck, Sungmun Choi, Ryan Edwards, Mario Lopez, Peter Mandel, Travis Reynolds, Evan Rose, and Raymond Son for outstanding research assistance; to Christina Romer for invaluable discussions; to Steven Levitt and Richard Thaler for important encouragement; and to numerous colleagues and correspondents for helpful comments and suggestions. An earlier version of the paper was titled "It's Fourth Down and What Does the Bellman Equation Say? A Dynamic-Programming Analysis of Football Strategy."

[Journal of Political Economy, 2006, vol. 114, no. 2] © 2006 by The University of Chicago. All rights reserved.

I. Introduction

A central assumption of most economic models is that agents maximize simple objective functions: consumers maximize expected utility, and firms maximize expected profits. The argument for this assumption is not that it leads to perfect descriptions of behavior, but that it leads to reasonably good approximations in most cases. The assumption that consumers successfully maximize simple objective functions frequently makes predictions about how a single individual will behave when confronted with a specific, easily describable decision.
Thus it can often be tested in both the laboratory and the field. The assumption that firms maximize profits is much more difficult to test, however. Particularly for large firms, the decisions are usually complicated and the data difficult to obtain. But the a priori case for firm maximization is much stronger than that for consumer maximization. As Alchian (1950), Friedman (1953), Becker (1957), Fama (1980), and others explain, competition in the goods, capital, and labor markets creates strong forces driving firms toward profit maximization. A firm that fails to maximize profits is likely to be outcompeted by more efficient rivals or purchased by individuals who can obtain greater value from it by pursuing different strategies. And managers who fail to maximize profits for the owners of their firms are likely to be fired and replaced by ones who do. Thus the case for firm maximization rests much more on logical argument than empirical evidence. As Friedman puts it, "unless the behavior of businessmen in some way or other approximated behavior consistent with the maximization of returns, it seems unlikely that they would remain in business for long.... The process of natural selection thus helps to validate the hypothesis [of return maximization]" (1953, 22).

This paper takes a first step toward testing the assumption that firms maximize profits by examining a specific strategic decision in professional sports: the choice in football between kicking and trying for a first down on fourth down. Examining strategic decisions in sports has two enormous advantages. First, in most cases, it is difficult to think of any significant channel through which strategic decisions are likely to affect a team's profits other than through their impact on the team's probability of winning. Thus the problem of maximizing profits plausibly reduces to the much simpler problem of maximizing the probability of winning. Second, there are copious, detailed data describing the circumstances teams face when they make these decisions.¹

The predictions of simple models of optimization appear especially likely to hold in the case of fourth-down decisions in professional football. There are three reasons. First, the market for the coaches who make these decisions is intensively competitive. Salaries average roughly $3 million per year, and annual turnover exceeds 20 percent.² Second, winning is valued enormously (as shown by the very high salaries commanded by high-quality players). And third, the decisions are unusually amenable to learning and imitation: the decisions arise repeatedly, and information about others' decisions is readily available. Thus a failure of maximization in this setting would be particularly striking.

¹ Thaler (2000) stresses the potential value of sports decision making in testing the hypothesis of firm optimization.

² The salary figure is based on the 23 coaches (out of 32) for whom 2004 salary information could be obtained from publicly available sources. The turnover data pertain to [...]

This paper shows, however, that teams' choices on fourth downs depart in a way that is systematic and overwhelmingly statistically significant from the choices that would maximize their chances of winning. One case in which the departure is particularly striking and relatively easy to see arises when a team faces fourth down and goal on its opponent's 2-yard line early in the game.³ In this situation, attempting a field goal is virtually certain to produce 3 points, while trying for a touchdown has about a three-sevenths chance of producing 7 points. The two choices thus have essentially the same expected immediate payoff. But if the team tries for a touchdown and fails, its opponent typically gains possession of the ball on the 2-yard line; if the team scores a touchdown or a field goal, on the other hand, the opponent returns a kickoff, which is considerably better for it. Thus trying for a touchdown on average leaves the opponent in considerably worse field position. I show later that rational risk aversion about points scored, concern about momentum, and other complications do not noticeably affect the case for trying for a touchdown. As a result, my estimates imply that the team should be indifferent between the two choices if the probability of scoring a touchdown is about 18 percent. They also imply that trying for a touchdown rather than a field goal would increase the team's chances of winning the game by about three percentage points, which is very large for a single play. In fact, however, teams attempted a field goal all nine times in my sample that they were in this position.

Analyzing the choice between kicking and trying for a first down or touchdown in other cases is more complicated: the immediate expected payoffs may be different under the two choices, and the attractiveness of the distributions of ball possession and field position may be difficult to compare.
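The comparison of immediate payoffs in the fourth-and-goal example is simple arithmetic, and a short Python sketch makes it explicit. It treats the field goal as certain (the text says only "virtually certain") and takes the three-sevenths touchdown chance from the text; the function and variable names are illustrative, not the paper's.

```python
from fractions import Fraction

FIELD_GOAL_POINTS = 3
TOUCHDOWN_POINTS = 7  # touchdowns are counted as 7 points, as in the text


def expected_points(success_probability, points):
    """Expected immediate points from a play that scores `points` with the given probability."""
    return success_probability * points


# Assumption: a field goal attempt from the 2-yard line always succeeds.
kick_payoff = expected_points(Fraction(1), FIELD_GOAL_POINTS)
# Touchdown probability taken from the text: about three-sevenths.
go_payoff = expected_points(Fraction(3, 7), TOUCHDOWN_POINTS)

print(kick_payoff, go_payoff)
```

Under these inputs both choices yield 3 expected points, so any difference between them has to come from the field position the opponent is left with.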
Fortunately, however, the problem can be analyzed using dynamic programming. The choice between kicking and going for it leads to an immediate payoff in terms of points (which may be zero) and to one team having a first down somewhere on the field. That first down leads to additional scoring (which again may be zero) and to another possession and first down. And so on. Section II of the paper therefore uses data from over 700 National Football League (NFL) games to estimate the values of first downs at each point on the field (as well as the value of kicking off). To avoid the complications introduced when one team is well ahead or when the end of a half is approaching, I focus on the first quarter.

Section III uses the results of this analysis to examine fourth-down decisions over the entire field. To estimate the value of kicking, I use the outcomes of actual field goal attempts and punts. Decisions to go for it on fourth down (i.e., not to kick) are sufficiently rare, however, that they cannot be used to estimate the value of trying for a first down or touchdown. I therefore use the outcomes of third-down plays instead. I then compare the values of kicking and going for it to determine which decision is better on average as a function of where the team is on the field and the number of yards it needs for a first down or touchdown. Finally, I compare the results of this analysis with teams' actual choices. I find that teams' choices are far more conservative than the ones that would maximize their chances of winning.

Section IV considers various possible complications and biases and finds that none change the basic conclusions. Section V considers the results' quantitative implications. Because the analysis concerns only a small fraction of plays, it implies that different choices on those plays could have only a modest impact on a team's chances of winning. But it also implies that there are circumstances in which teams essentially always kick even though the case for going for it is clear-cut and the benefits of going for it are substantial.

Finally, Section VI discusses the results' broader implications. The hypothesis that firms maximize simple objective functions could fail as a result of either the pursuit of a different, more complex objective function or a failure of maximization. I discuss how either of these possibilities might arise and how one might be able to distinguish between them.⁴

³ The Appendix summarizes the rules of football that are relevant to the paper.

⁴ Two recent papers that apply economic tools to sports strategy and in doing so use sports data to test hypotheses about maximization are the study of serves in tennis by Walker and Wooders (2001) and the study of penalty kicks in soccer by Chiappori, Levitt, and Groseclose (2002). In contrast to this paper, these papers find no evidence of large departures from optimal strategies. Carter and Machol (1971, 1978) and Carroll, Palmer, and Thorn (1998, chap. 10) are more closely related to this paper. I discuss how my analysis is related to these studies below.

II. The Values of Different Situations

A. Framework

The dynamic-programming analysis focuses on 101 situations: a first down and 10 on each yard line from a team's 1 to its opponent's 10, a first and goal on each yard line from the opponent's 9 to its 1, a kickoff from the team's 30 (following a field goal or touchdown, or at the beginning of the game), and a kickoff from its 20 (following a safety). Let $V_i$ denote the value of situation $i$. Specifically, $V_i$ is the expected long-run value, beginning in situation $i$, of the difference between the points scored by the team with the ball and its opponent when the two teams are evenly matched, average NFL teams.
By describing the values of situations in terms of expected point differences, I am implicitly assuming that a team that wants to maximize its chances of winning should be risk-neutral over points scored. Although this is clearly not a good assumption late in a game, I show in Section IV that it is an excellent approximation for the early part. For that reason, I focus on the first quarter.

Focusing on the first quarter has a second advantage: it makes it reasonable to neglect effects involving the end of a half. Because play in the second quarter begins at the point where the first quarter ended, the value of a given situation in the first quarter almost certainly does not vary greatly with the time remaining.

Let $g$ index games and $t$ index situations within a game. Let $D^i_{gt}$ be a dummy that equals one if the $t$th situation in game $g$ is a situation of type $i$. For example, suppose that $i = 100$ denotes a kickoff from one's 30; then, since all games begin with a kickoff, $D^{100}_{g1} = 1$ for all $g$ and $D^i_{g1} = 0$ for all $g$ and for all $i \neq 100$. Let $P_{gt}$ denote the net points scored by the team with the ball in situation $g,t$ before the next situation. That is, $P_{gt}$ is the number of points scored by the team with the ball minus the number scored by its opponent. Finally, let $B_{gt}$ be a dummy that equals one if the team with the ball in situation $g,t$ also has the ball in situation $g,t+1$ and that equals minus one if the other team has the ball in situation $g,t+1$.

The realized value of situation $g,t$ as of one situation later has two components. The first is the net points the team with the ball scores before the next situation, $P_{gt}$. The second is the value of the new situation. If the same team has the ball in that situation, this value is simply the $V_i$ corresponding to the new situation. If the other team has the ball, this value is minus the $V_i$ corresponding to the new situation (since the value of a situation to the team without the ball is equal and opposite to the value of the situation to its opponent).
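One way to see how these definitions pin down the values is the iterative averaging procedure that the paper's footnote on estimation describes: start from guessed values, replace each value with the mean realized value of all situations of that type, and repeat until nothing changes. The Python sketch below applies that idea to invented toy data. The situation labels, the function name `estimate_values`, and the treatment of a game's last recorded situation (no continuation value) are assumptions made for illustration, and the smoothing of the estimates is ignored.

```python
def estimate_values(games, tol=1e-10, max_iter=10_000):
    """Iterate V_i <- mean of (P + B * V_next) over all situations of type i.

    Each game is a list of (situation_type, net_points, keeps_ball) tuples,
    where keeps_ball is +1 if the same team has the ball in the next
    situation and -1 otherwise. The last tuple in a game has no successor,
    so its realized value is just its net points (a toy-data assumption).
    """
    types = sorted({situation for game in games for situation, _, _ in game})
    values = {s: 0.0 for s in types}  # initial guess: every value is zero
    for _ in range(max_iter):
        totals = {s: 0.0 for s in types}
        counts = {s: 0 for s in types}
        for game in games:
            for t, (situation, net_points, keeps_ball) in enumerate(game):
                realized = net_points
                if t + 1 < len(game):
                    next_situation = game[t + 1][0]
                    realized += keeps_ball * values[next_situation]
                totals[situation] += realized
                counts[situation] += 1
        new_values = {s: totals[s] / counts[s] for s in types}
        if max(abs(new_values[s] - values[s]) for s in types) < tol:
            return new_values
        values = new_values
    raise RuntimeError("iteration did not converge")


# Invented toy data: two short "games" with three situation types.
toy_games = [
    [("kickoff", 0, -1), ("own20", 0, 1), ("opp30", 3, 1),
     ("kickoff", 0, -1), ("own20", 0, 1)],
    [("kickoff", 0, -1), ("own20", 0, 1), ("opp30", 7, 1),
     ("kickoff", 0, -1), ("own20", 0, 1)],
]
toy_values = estimate_values(toy_games)
```

In this toy example the fixed point can be solved by hand: the value at the opponent's 30 is 10/3, the value at one's own 20 is 5/3, and the value of kicking off is -5/3.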
In terms of the notation just introduced, the value of situation $g,t+1$ to the team with the ball in situation $g,t$ is $B_{gt}\sum_i D^i_{g,t+1}V_i$. The value of situation $g,t$ as of that situation must equal the expectation of the situation's realized value one situation later. We can write the value of situation $g,t$ as $\sum_i D^i_{gt}V_i$. Thus we have

$$\sum_i D^i_{gt}V_i = E\Big[P_{gt} + B_{gt}\sum_i D^i_{g,t+1}V_i\Big], \tag{1}$$

where the expectation is conditional on situation $g,t$.

Now define $e_{gt}$ as the difference between the realized value of situation $g,t$ one situation later and the expectation of the realized value conditional on being in situation $g,t$:

$$e_{gt} = \Big[P_{gt} + B_{gt}\sum_i D^i_{g,t+1}V_i\Big] - E\Big[P_{gt} + B_{gt}\sum_i D^i_{g,t+1}V_i\Big].$$

By construction, $e_{gt}$ is uncorrelated with each of the $D^i_{gt}$'s. If $e$ were correlated with a $D^i$, this would mean that when teams were in situation $i$, the realized value one situation later would differ systematically from $V_i$; but this would contradict the definition of $V_i$.

Using this definition of $e_{gt}$, we can rewrite (1) as

$$\sum_i D^i_{gt}V_i = P_{gt} + B_{gt}\sum_i D^i_{g,t+1}V_i - e_{gt}, \tag{2}$$

or

$$P_{gt} = \sum_i V_i\big(D^i_{gt} - B_{gt}D^i_{g,t+1}\big) + e_{gt}. \tag{3}$$

To think about estimating the $V_i$'s, define $X^i_{gt} = D^i_{gt} - B_{gt}D^i_{g,t+1}$. Then (3) becomes

$$P_{gt} = \sum_i V_i X^i_{gt} + e_{gt}. \tag{4}$$

This formulation suggests regressing $P$ on the $X$'s. But $e$ may be correlated with the $X$'s. Specifically, $e_{gt}$ is likely to be correlated with the $B_{gt}D^i_{g,t+1}$ terms of the $X^i_{gt}$'s. Recall, however, that $e_{gt}$ is uncorrelated with the $D^i_{gt}$'s. Thus the $D^i_{gt}$'s are legitimate instruments for the $X^i_{gt}$'s. Further, since they enter into the $X^i_{gt}$'s, they are almost surely correlated with them. We can therefore estimate (4) by instrumental variables, using the $D^i_{gt}$'s as the instruments.⁵

There is one final issue. There are 101 $V_i$'s to estimate. Even with a large amount of data, the estimates of the $V_i$'s will be noisy. But the value of a first down is almost certainly a smooth function of a team's position on the field. Thus forcing the estimates of the $V_i$'s to be smooth will improve the precision of the estimates while introducing minimal bias. I therefore require the estimated $V_i$'s to be a quadratic spline as a function of the team's position on the field, with knot points at both 9-, 17-, and 33-yard lines and at the 50. I do not impose any restrictions on the two estimated $V_i$'s for kickoffs. This reduces the effective number of parameters to be estimated from 101 to 12.⁶

⁵ There is another way of describing the estimation of the $V_i$'s. Begin with an initial set of $V_i$'s (such as $V_i = 0$ for all $i$). Now for each $i$, compute the mean of the realized values of all situations of type $i$ one situation later using the assumed $V_i$'s and the actual $P_{gt}$'s. Repeat the process using the new $V_i$'s as an input, and iterate until the process converges. One can show that this procedure produces results that are numerically identical to those of the instrumental variables approach.

⁶ Carter and Machol (1971) also use a recursive approach to estimate point values of first downs at different positions on the field, using a considerably smaller sample from [...]. There are two main differences from my approach. First, they arbitrarily assign a value of zero to kickoffs and free kicks. Second, they divide the field into 10-yard intervals and estimate the average value for each interval.

B. Data and Results

Play-by-play accounts of virtually all regular-season NFL games for the 1998, 1999, and 2000 seasons were downloaded from the NFL Web site, nfl.com.⁷ Since I focus on strategy in the first quarter, I use data only from first quarters to estimate the $V_i$'s. These data yield 11,112 first-quarter situations. By far the most common are a kickoff from one's 30-yard line (1,851 cases) and a first and 10 on one's 20 (557 cases). Because 98.4 percent of extra-point attempts were successful in this period, all touchdowns are counted as 7 points.

Figure 1 reports the results of the instrumental variables estimation. It plots the estimated $V$ for a first and 10 (or first and goal) as a function of the team's position on the field, together with the two-standard-error bands. The estimated value of a first and 10 on one's 1-yard line is -1.6 points. The $V$'s rise fairly steeply from the 1, reaching zero at about the 15. That is, the estimates imply that a team should be indifferent between a first and 10 on its 15 and having its opponent in the same situation. The $V$'s increase approximately linearly after the 15, rising a point roughly every 18 yards. The value of a first and 10 equals the value of receiving a kickoff from the 30 around the 27-yard line. That is, receiving a kickoff is on average as valuable as a first and 10 on one's 27. Finally, the $V$'s begin to increase more rapidly around the opponent's 10. The estimated value of a first and goal on the 1 is 5.55 points; this is about the same as the value of an 80 percent chance of a touchdown and a 20 percent chance of a field goal. The $V$'s are estimated relatively precisely: except in the vicinity of the goal lines, their standard errors are less than 0.1.

Fig. 1. The estimated value of situations (solid line) and two-standard-error bands (dotted lines). The estimated value of a kickoff is -0.62 (standard error 0.04); the estimated value of a free kick is -1.21 (standard error 0.51).

⁷ Data for two games in 1999 and two games in 2000 were missing from the Web site.

III. Kicking versus Going for It

This section uses the results of Section II to analyze the choice between kicking and going for it on fourth down. The analysis proceeds in four steps. The first two estimate the values of kicking and going for it in different circumstances. The third compares the two choices to determine which is on average better as a function of the team's position on the field and its distance from a first down. The final step examines teams' actual decisions.

A. Kicking

If one neglects the issue of smoothing the estimates, analyzing the value of kicks is straightforward. To estimate the value of a kick from a particular yard line, one simply averages the realized values of the kicks from that yard line as of the subsequent situation (where "situation" is defined as before). This realized value has two components, the net points scored before the next situation and the next situation's value. In contrast to the previous section, there is no need for instrumental variables estimation.

I constrain the estimated values of kicks to be smooth in the same way as before, with one modification. Teams' choices between punting and attempting a field goal change rapidly around their opponents' 35-yard line.
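Two of the interpretive statements about the estimated values in Section II.B can be checked with rough arithmetic. The Python sketch below assumes a linear approximation to the value of a first down between the 15 and the opponent's 10 (zero at the 15 and one point per 18 yards, both taken from the text), counts a touchdown as 7 points and a field goal as 3, and assumes that a score is followed by a kickoff worth -0.62 points to the scoring team. That last figure is the Figure 1 kickoff value with the sign read as negative for the kicking team, which is an assumption here; the helper name is hypothetical.

```python
KICKOFF_VALUE = -0.62  # assumed value of kicking off, to the kicking team


def linear_first_down_value(yards_from_own_goal):
    """Rough linear approximation: zero at the 15, one point per 18 yards."""
    return (yards_from_own_goal - 15) / 18


# A first and 10 on one's own 27 versus receiving a kickoff.
value_at_27 = linear_first_down_value(27)
value_of_receiving = -KICKOFF_VALUE

# An 80 percent chance of a touchdown and a 20 percent chance of a field goal,
# each followed by kicking off to the opponent.
score_mix = 0.8 * 7 + 0.2 * 3 + KICKOFF_VALUE

print(round(value_at_27, 2), round(value_of_receiving, 2), round(score_mix, 2))
```

Under these assumptions a first and 10 on the 27 is worth about 0.67 points against 0.62 for receiving a kickoff, and the 80/20 scoring mix is worth about 5.58 points, close to the 5.55 reported for a first and goal on the 1.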
Since one would expect the level but not the slope of the value of kicking as a function of the yard line to be continuous where teams switch from punts to field goal attempts, I do not impose the slope restriction at the opponent's 33. And indeed, the estimates reveal a substantial kink at this knot point.

The data consist of all kicks in the first quarters of games. Since what we need to know is the value of deciding to kick, I include not just actual punts and field goal attempts, but blocked and muffed kicks and kicks nullified by penalties. There are 2,560 observations.⁸

The results are reported in figure 2. Figure 2a shows the estimated value of kicking as a function of the team's position on the field. Figure 2b plots the difference between the estimated value of a kick and of the other team having a first down on the spot. From the team's 10-yard line to midfield, this difference is fairly steady at around 2.1 points, which corresponds to a punt of about 38 yards. It dips down in the dead zone around the opponent's 35-yard line, where a field goal is unlikely to succeed and a punt is likely to produce little yardage. It reaches a low of 1.5 (a punt of only 25 yards) at the 33 and then rises to 2.2 at the 21. As the team gets closer to the goal line, the probability of a successful field goal rises little, but the value of leaving the opponent with the ball rises considerably. The difference between the values of kicking and of the opponent receiving the ball therefore falls, reaching 0.7 at the 1. The estimates are relatively precise: the standard error of the difference in values is typically about [...].⁹

Fig. 2. a, The estimated value of kicks. b, The estimated value of the difference between the values of kicks and of turning the ball over. The dotted lines show the two-standard-error bands.

⁸ There are several minor issues involving the data. First, fourth-down plays that are blown dead before the snap and for which the play-by-play account does not say whether the kicking squad was sent in are excluded. Since such plays are also excluded from the analysis of the decision to go for it, this exclusion should generate little bias. Second, it is not clear whether fake kicks should be included; it depends on whether one wants to estimate the value of deciding to kick or the value of lining up to kick. There are only five fake kicks in the sample, however, and the results are virtually unaffected by whether they are included. The results in the text include fakes. Finally, since teams occasionally obtain first downs on kicking plays (primarily through penalties), the value of a kick is affected by the number of yards the team has to go for a first down. But there are only six kicking plays in the sample on which the team had 5 yards to go or less and moved the ball 5 yards or less and obtained a first down. Thus to improve the precision of the estimates, I do not let the estimated value of kicks vary with the number of yards needed for a first down.

⁹ The standard errors account for the fact that the $V$'s used to estimate the values of kicks are themselves estimated. This calculation is performed under the assumption that the differences between the realized and expected values of kicks are uncorrelated with the errors in estimating the $V$'s. Although this assumption will not be strictly correct, it is almost certainly an excellent approximation.

B. Going for It

The analysis of the value of trying for a first down or touchdown parallels the analysis of kicking. There are two differences. First, because teams rarely go for it on fourth down, I use third-down plays instead. That is, I find what third-down plays' realized values as of the next situation would have been if the plays had taken place on fourth down. Second, the value of going for it depends not only on the team's position on the field, but also on the number of yards to go for a first down or touchdown. If there were no need to smooth the estimates, one could use averages to estimate the value of going for it for a specific position and number of yards to go. That is, one could consider all cases in which the corresponding circumstance occurred on third down, find what the plays' realized values would have been if they had been fourth-down plays, and average the values. In fact, however, there are over a thousand different cases in the sample. Smoothing the estimates is therefore essential.

To smooth the estimates, I focus on the difference between the values of going for it and of turning the ball over on the spot rather than estimating the value of going for it directly. In general, this difference depends on three factors. The first is the difference between the values of having a first down on the spot and of the other team having a first down there. Since the $V$'s are essentially symmetric around the 50-yard line, this factor is essentially independent of the team's position on the field. The second factor is the probability that the team succeeds when it goes for it. As long as the team is not close to its opponent's goal line, there is no reason for this probability to vary greatly with the team's position. The third (and least important) factor is the average additional benefit from the yards the team gains when it goes for it. Again, as long as the team is not close to the opponent's goal line, there is no reason for this factor to vary substantially with its position.

Close to the opponent's goal line, however, the team has less room to work with, and so its chances of success and average number of yards gained are likely to be lower. On the other hand, because the value of a touchdown is much larger than the value of a first down on the 1, the additional benefit from gaining yards may be higher. Thus near the goal line, we cannot be confident that the difference between the values of going for it and of turning the ball over does not vary substantially with the team's position.
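The conversion of third-down outcomes into hypothetical fourth-down values, followed by simple averaging, can be sketched in Python as follows. This is a simplified illustration rather than the paper's procedure: it ignores touchdowns, turnovers during the play, penalties, and the smoothing step, and it uses a made-up linear value function. The names `as_fourth_down_value`, `average_value`, and `toy_value` are hypothetical.

```python
def toy_value(yards_from_own_goal):
    """Placeholder first-down value: zero at the 15, one point per 18 yards."""
    return (yards_from_own_goal - 15) / 18


def as_fourth_down_value(spot, yards_to_go, yards_gained, value=toy_value):
    """Realized value of a play if it had been run on fourth down.

    `spot` is the distance in yards from the offense's own goal line.
    """
    new_spot = spot + yards_gained
    if yards_gained >= yards_to_go:
        # Conversion: the offense keeps the ball with a first down.
        return value(new_spot)
    # Failure: the opponent takes over, measured from its own goal line.
    return -value(100 - new_spot)


def average_value(plays):
    """Average the fourth-down-equivalent values of (spot, to_go, gained) plays."""
    values = [as_fourth_down_value(*play) for play in plays]
    return sum(values) / len(values)


# Invented third-and-2 plays from a team's own 30, for illustration only.
sample_plays = [(30, 2, 5), (30, 2, 0), (30, 2, 3), (30, 2, 1)]
sample_average = average_value(sample_plays)
```

With over a thousand position and yards-to-go cells in the actual sample, cell averages like this one would rest on very few plays each, which is why the text turns to a smoothed specification instead.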
The difference between the values of going for it and of turning the ball over on the spot is $G_{iy} - (-V_{i'})$, or $G_{iy} + V_{i'}$, where $G_{iy}$ denotes the value of going for it on yard line $i$ with $y$ yards to go and $i'$ denotes the yard line opposite yard line $i$. From the team's goal line to the opponent's 17, I assume that this difference is independent of $i$ and quadratic in $y$:

$$G_{iy} + V_{i'} = a_0 + a_1 y + a_2 y^2. \tag{5}$$

From the opponent's 17 to its goal line, I let the difference depend quadratically on both $i$ and $y$:

$$G_{iy} + V_{i'} = b_0 + b_1 y + b_2 y^2 + b_3 i + b_4 i y + b_5 i y^2 + b_6 i^2 + b_7 i^2 y + b_8 i^2 y^2. \tag{6}$$

At the 17, where the two functions meet, I constrain both their level and their derivative with respect to $i$ to be equal for all $y$. This creates six restrictions.

The data consist of all third-down plays in the first quarter; there are 4,733 observations.¹⁰ Figure 3 summarizes the results. The solid line shows the estimates of $G_{iy} + V_{i'}$ as a function of $y$ for a generic position on the field not inside the opponent's 17, and the dashed line shows the estimates at the opponent's 5.

Fig. 3. The estimated difference between the values of going for it and of the other team having the ball on the spot at a generic yard line outside the opponent's 17 (solid line) and at the opponent's 5 (dashed line). The dotted lines show the two-standard-error bands.

Outside the opponent's 17, the estimate of $G_{iy} + V_{i'}$ for a team facing fourth and 1 is 2.64. On third-and-1 plays from the goal line to the opponent's 17, teams are successful 64 percent of the time, and they gain an average of 3.8 yards; this corresponds to an expected value of 2.66 points.¹¹ Thus the estimate of 2.64 is reasonable. The estimated difference falls roughly linearly with the number of yards to go. It is 2.05 with 5 yards to go (equivalent to a 45 percent chance of success and an average gain of 6.3 yards), 1.49 with 10 yards to go (a 30 percent chance of success and an average gain of 6.6 yards), and 1.08 with 15 yards to go (an 18 percent chance of success and an average gain of 7.7 yards). These estimates are similar to what one would obtain simply by looking at the average results of the corresponding types of plays.

At the opponent's 5, the estimate of $G_{iy} + V_{i'}$ with 1 yard to go is 2.94 (equivalent to a 38 percent chance of a first down with an average gain of 2 yards plus a 25 percent chance of a touchdown), which is slightly higher than the estimate elsewhere on the field. The estimate falls more rapidly with the number of yards to go than elsewhere on the field, however. With 5 yards to go, it is 1.42 (equivalent to a 26 percent chance of a touchdown). The estimate for 5 yards to go is quite similar to what one would obtain by looking at averages; the estimate for 1 yard to go is somewhat higher, however.

The dotted lines show the two-standard-error bands. For the range in which $G_{iy} + V_{i'}$ is constrained to be independent of $i$, the standard errors are small: for 15 yards to go or less, they are less than 0.1. Inside the 17, where fewer observations are being used, they are larger, but still typically less than 0.2.

¹⁰ To parallel the analysis of kicking, plays that are blown dead before the snap for which it would not have been possible to determine whether the kicking team had been sent in are excluded (see n. 8). And to prevent outliers that are not relevant to decisions about going for it from affecting the results, plays on which the team had more than 20 yards to go are excluded.

¹¹ The translations of average outcomes into point values in this paragraph are done for a team at midfield. Since the $V$'s are not exactly symmetric around the 50 or exactly linear, choosing a different position would change the calculations slightly.

C. Recommended Choices

Figure 4 combines the analyses of kicking and going for it by showing the number of yards to go where the estimated average payoffs to the two choices are equal as a function of the team's position. On the team's own half of the field, going for it is better on average if there is less than about 4 yards to go. After midfield, the gain from kicking falls, and so the critical value rises. It is 6.5 yards at the opponent's 45 and peaks at 9.8 on the opponent's 33. As the team gets into field goal range, the critical value falls rapidly; its lowest point is 4.0 yards on the 21. Thereafter, the value of kicking changes little while the value of going for it rises. As a result, the critical value rises again. The analysis implies that once a team reaches its opponent's 5, it is always better off on average going for it. The two dotted lines in the figure show the two-standard-error bands for the critical values.¹² The critical values are estimated fairly precisely.
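A rough version of the critical-value calculation can be reproduced from numbers reported in the text. The Python sketch below linearly interpolates the four reported going-for-it differences outside the opponent's 17 (2.64, 2.05, 1.49, and 1.08 at 1, 5, 10, and 15 yards to go) and finds the yards to go at which that difference equals a given kicking difference. Linear interpolation between the reported points is an assumption made here in place of the estimated quadratic, and the function name is hypothetical.

```python
# (yards to go, estimated going-for-it difference) as reported in the text.
GO_POINTS = [(1, 2.64), (5, 2.05), (10, 1.49), (15, 1.08)]


def critical_yards_to_go(kick_difference, points=GO_POINTS):
    """Yards to go at which going for it and kicking have equal value.

    Returns None if the kicking difference lies outside the tabulated range.
    """
    for (y0, g0), (y1, g1) in zip(points, points[1:]):
        if g1 <= kick_difference <= g0:
            share = (g0 - kick_difference) / (g0 - g1)
            return y0 + share * (y1 - y0)
    return None


# Kicking differences reported in the text: 1.5 at the opponent's 33
# and 2.2 at the opponent's 21.
at_opponent_33 = critical_yards_to_go(1.5)
at_opponent_21 = critical_yards_to_go(2.2)
```

The interpolation gives about 9.9 yards at the opponent's 33 and about 4.0 yards at the 21, close to the 9.8 and 4.0 reported for those positions. On the team's own half, where the kicking difference is around 2.1, it gives roughly 4.7 yards, in the neighborhood of the "about 4 yards" stated in the text.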
Although these findings contradict the conventional wisdom, they are quite intuitive. As described in Section I, one case for which the intuition is clear is fourth and goal on the 2. The expected payoffs in terms of immediate points to the two choices are very similar, but trying for a touchdown on average leaves the other team in considerably worse field position. Another fairly intuitive case is fourth and 3 or 4 on the 50. If the team goes for a first down, it has about a 50 percent chance of success; thus both the team and its opponent have about a 50 percent chance of a first and 10. But the team will gain an average of about 6 yards on the fourth-down play; thus on average it is better off than its opponent if it goes for it. If the team punts, its opponent on average will end up with a first and 10 around its 14. Both standard views about football and the analysis in Section II suggest that the team and its opponent are about equally well off in this situation. Thus, on average the team is better off than its opponent if it goes for a first down, but not if it punts. Going for the first down is therefore preferable on average.

The very high critical values in the dead zone also have an intuitive explanation. The chances of making a first down decline only moderately as the number of yards to go increases. For example, away from the opponent's end zone, the chance of making a first down or touchdown on third down is 64 percent with 1 yard to go, 44 percent with 5 yards to go, and 34 percent with 10 yards to go. As a result, the large decrease in the gain from kicking in the dead zone causes a large increase in the critical value.

¹² For example, the lower dotted line shows the point where the difference between the estimated values of going for it and kicking is twice its standard error.

Fig. 4. The number of yards to go where the estimated values of kicking and going for it are equal (solid line) and two-standard-error bands (dotted lines), and the greatest number of yards to go such that when teams have that many yards to go or less, they go for it at least as often as they kick (dashed line).

D. Actual Choices

Teams' actual choices are dramatically more conservative than those recommended by the dynamic-programming analysis. On the 1,604 fourth downs in the sample for which the analysis implies that teams are on average better off kicking, they went for it only nine times. But on the 1,068 fourth downs for which the analysis implies that teams are on average better off going for it, they kicked 959 times.¹³

The dashed line in figure 4 summarizes teams' choices. It shows, for each point on the field, the largest number of yards to go with the property that when teams have that many yards to go or less, they go for it at least as often as they kick. Over most of the field, teams usually kick even with only 1 yard to go. Teams are slightly more aggressive in the dead zone, but are still far less aggressive than the dynamic-programming analysis suggests.
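The midfield intuition in Section III.C can be put into numbers under strong simplifying assumptions: a linear value function (zero at the 15, one point per 18 yards), a 50 percent chance of converting, an average spot 6 yards downfield whether or not the attempt succeeds, and a punt that leaves the opponent at its 14. The inputs come from the text's rough descriptions; the modeling shortcuts and the names in this Python sketch are assumptions for illustration only.

```python
def linear_value(yards_from_own_goal):
    """Rough first-down value: zero at the 15, one point per 18 yards."""
    return (yards_from_own_goal - 15) / 18


SPOT = 50                      # fourth down at midfield
SUCCESS_PROB = 0.5             # about a 50 percent chance of converting
AVERAGE_GAIN = 6               # assumed to apply whether or not the attempt succeeds
OPPONENT_SPOT_AFTER_PUNT = 14  # opponent's first and 10 after an average punt

new_spot = SPOT + AVERAGE_GAIN
# Success: own first down at the new spot. Failure: the opponent has the ball
# there, which is 100 - new_spot yards from its own goal line.
go_value = (SUCCESS_PROB * linear_value(new_spot)
            - (1 - SUCCESS_PROB) * linear_value(100 - new_spot))
punt_value = -linear_value(OPPONENT_SPOT_AFTER_PUNT)

print(f"go for it: {go_value:.2f}, punt: {punt_value:.2f}")
```

Under these assumptions going for it is worth about a third of a point, while punting leaves the two teams roughly even, which matches the direction of the argument in the text. The linear approximation is least reliable near the opponent's 14, where the text says the values change more steeply.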
On the line summarizing teams' choices, the null hypothesis that the average values of kicking and going for it are equal is typically rejected with a t-statistic between three and seven.14

13 These figures exclude the 28 cases for which we cannot observe the team's intent because of a penalty before the snap.

14 Carter and Machol (1978) and Carroll et al. (1998, chap. 10) also examine fourth-down decisions. Carter and Machol consider only decisions inside the opponent's 35-yard line. They use estimates from their earlier work (described in n. 6 above) to assign values to different situations. To estimate the payoff from going for it, they pool third-down and fourth-down plays. They assume that all successful plays produce exactly the yards needed for a first down, that all unsuccessful plays produce no yards, and that the probability of success does not depend on the team's position on the field. They then compare the estimated payoffs to going for it with the payoffs to field goal attempts and punts. They conclude that teams should be considerably more aggressive than they are. Carroll et al. consider decisions over the entire field. They do not spell out their method for estimating the values of different situations (though it appears related to Carter and Machol's), and it yields implausible results. Similarly to Carter and Machol, they pool third-down and fourth-down plays and assume that successful plays produce one more yard than needed for a first down, that unsuccessful plays yield no gain, and that the chances of success do not vary with field position. They again conclude that teams should be considerably more aggressive. Their specific findings about when going for it is preferable on average are quite different from mine, however. Finally, neither Carter and Machol nor Carroll et al. investigate the statistical significance of their results.

IV. Complications

A. Rational Risk Aversion

I have assumed that a win-maximizing team should be risk-neutral concerning points scored. This is clearly not exactly correct. The analysis may therefore overstate the value of a touchdown relative to a field goal, and thus overstate the benefits of going for it on fourth down.

Three considerations suggest that this effect is not important. First, as I show in Section V, teams are conservative even in situations in which win-maximizing behavior would be risk-loving over points scored. Second, it is essentially irrelevant to decisions in the middle of the field. Near midfield, a team should maximize the probability that it is the first to get close to the opponent's goal line, since that is necessary for either a field goal or a touchdown. But teams are conservative over the entire field.

Third, direct evidence about the impact of points on the probability of winning suggests that risk neutrality is an excellent approximation for the early part of the game. Because teams adjust their play late in the game on the basis of the score, one cannot just look at the distribution of actual winning margins. Instead, I try to approximate what the distribution of winning margins would be in the absence of late-game adjustments and use this to estimate the value of a field goal or touchdown early in the game. I begin by dividing the games into deciles according to the point spread. I then find the score for the favorite and the underdog at the end of the first half; the idea here is that these scores are relatively unaffected by adjustments in response to the score. I then construct synthetic final scores by combining the first-half scores of each pair of games within a decile. This yields a total of 74(73)/2 or 73(72)/2 synthetic games for each decile, for a total of 26,718 observations. I use the results to estimate the impact of an additional field goal or touchdown in the first quarter. For example, the estimated effect of a field goal on the probability of winning is the sum of the probability that a team would trail by 1 or 2 points at the end of the game plus half the probability that the score would be tied or the team would trail by 3 points.
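The synthetic-game construction and the field goal calculation just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, "combining" two first-half scores is read as adding them, margins are taken from the perspective of the team receiving the extra points, and the extension of the field goal rule to other point values is by analogy. The split into six deciles of 74 games and four of 73 is inferred from the stated total of 26,718 rather than given in the text.

```python
from itertools import combinations
from math import comb


def synthetic_margins(first_half_scores):
    """Synthetic final margins (favorite minus underdog) for one decile.

    first_half_scores: list of (favorite_points, underdog_points) tuples,
    one per game. Each pair of games is combined by adding the two
    first-half scores (an assumed reading of "combining").
    """
    margins = []
    for (fav_a, dog_a), (fav_b, dog_b) in combinations(first_half_scores, 2):
        margins.append((fav_a + fav_b) - (dog_a + dog_b))
    return margins


def value_of_extra_points(margins, points=3):
    """Estimated gain in win probability from `points` additional points.

    A deficit smaller than `points` becomes a win (weight 1). A tie becomes
    a win and a deficit of exactly `points` becomes a tie (weight 1/2 each).
    """
    full = sum(1 for m in margins if -points < m < 0)
    half = sum(1 for m in margins if m == 0 or m == -points)
    return (full + 0.5 * half) / len(margins)


# Count check: six deciles of 74 games and four of 73 give the stated total.
synthetic_game_count = 6 * comb(74, 2) + 4 * comb(73, 2)
```

With `points=3` the function reproduces the rule in the text: the probability of trailing by 1 or 2, plus half the probability of being tied or trailing by 3.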
This exercise suggests that 7 points are in fact slightly more than seven-thirds as valuable as 3. An additional 3 points are estimated to raise the probability of winning by 6.8 percentage points; an additional 7 points are estimated to raise the probability by 16.2 percentage points, or 2.40 times as much. The source of this result is that the distribution of synthetic margins is considerably higher at 4 and 7 points than at 1 or 2. To put it differently, to some extent what is important about a touchdown is not that its usual value is 7 points, but that its usual value is between two and three times the value of a field goal.

B. Third Down versus Fourth Down

There are two ways to investigate the appropriateness of using third-down plays to gauge what would happen if teams went for it on fourth down. The first is to consider how teams' incentives are likely to affect outcomes on fourth downs relative to third downs. Relative payoffs to different outcomes are different on the two downs. In particular, the benefit from a long gain relative to just making a first down is smaller on fourth down. As a result, both the offense and defense will behave differently: the offense will be willing to lower its chances of making a long gain in order to increase its chances of just making a first down, and the defense will be willing to do the reverse. This suggests that the direction of the bias from using third-down plays should depend on which team has more influence on the distribution of outcomes. Since it seems unlikely that the defense has substantially more influence than the offense on the distribution of outcomes, it follows that the use of third downs is unlikely to lead to substantial overestimates of the value of going for it.

More important, the relative payoffs to different outcomes do not differ greatly between third and fourth downs. For example, consider a team that is on its 30 and needs 2 yards for a first down.
On third down (under the realistic assumption that the team will punt if it fails to make a first down), the benefit of gaining 15 yards rather than none is 1.4 times as large as the benefit of gaining 2 yards rather than none. On fourth down, the benefit of gaining 15 yards rather than none is 1.2 times as large as the benefit of gaining 2 yards rather than none. Thus one would not expect either side to behave very differently on the two downs. And when a team has goal to go, the payoff on either third down or fourth down depends almost entirely on whether the team scores a touchdown. Thus one would expect both sides' behavior to be essentially the same on the two downs. These considerations suggest that any bias from the use of third-down plays is likely to be small.

The second approach is to directly compare the realized values of plays where teams went for it on fourth downs (i.e., the immediate points
scored plus the value of the resulting field position) with what one would expect on the basis of the analysis of third downs. This comparison is potentially problematic, however, for two reasons. First, as described above, teams went for it only 118 times in the sample. Second, times when teams choose to go for it may be unusual: the teams may know that they are particularly likely to succeed, or they may be desperate. To increase the sample without bringing in fourth-down attempts that are likely to be especially unusual, I include the entire game except for the last two minutes of each half (and overtimes). This increases the sample to 1,338 plays. And as a partial remedy for the second problem, I experiment with controlling for the amount the team with the ball is trailing by and the amount it is favored by.

The results suggest that fourth downs are virtually indistinguishable from third downs. The mean of the difference between the realized value of the fourth-down attempts and what is predicted by the analysis of third downs is [...] (with a standard error of 0.7), which is essentially zero. When controls for the prior point spread and the current point differential are included, the coefficient falls to [...] and remains highly insignificant. The point estimate corresponds to the probability of success being one percentage point lower on fourth downs than on third downs, which would have almost no impact on the analysis.

C. Additional Information

In making fourth-down decisions, a team has more information than the averages used in the dynamic-programming analysis. Thus it would not be optimal for it to follow the recommendations of the dynamic-programming analysis mechanically.

Additional information cannot, however, account for the large systematic departures from the recommendations of the dynamic-programming analysis. Over wide ranges, teams almost always kick in circumstances in which the analysis implies that they would be better off on average going for it.
For example, on the 512 fourth downs in the sample in the offense's half of the field for which the dynamic-programming analysis suggests going for it, teams went for it only seven times. Similarly, on the 175 fourth downs with 5 or more yards to go for which the analysis suggests going for it, teams went for it only 13 times. Additional information can account for this behavior only if teams know on a large majority of fourth downs that the expected payoff to going for it relative to kicking is considerably less than average, and know on the remainder that the expected payoff is dramatically larger than average. This possibility is not at all plausible. Further, it predicts that when teams choose to go for it, the results will be far better than
one would expect on the basis of averages. As described above, this prediction is contradicted by the data.

D. Momentum

Failing on fourth down could be costly to a team's chances of winning not just through its effect on possession and field position, but also through its effect on energy and emotions. Thus it might be more costly for the other team to have the ball as a result of a failed fourth-down attempt than for it to have the ball at the same place in the course of a normal drive or because of a punt. The analysis might therefore overstate the average payoff to going for it.

There are two reasons to be skeptical of this possibility. First, the same reasoning suggests that there could be a motivational benefit to succeeding on fourth down, and thus that the analysis could understate the benefits of a successful fourth-down attempt. Second, studies of momentum in other sports have found at most small momentum effects (e.g., Gilovich, Vallone, and Tversky 1985; Albright 1993; Klaassen and Magnus 2001).

More important, it is possible to obtain direct evidence about whether outcomes differ systematically from normal after plays whose outcomes are either very bad or very good. To obtain a reasonable sample size, for very bad plays I consider all cases in which from one situation to the next (where a situation is defined as before), possession changed and the ball advanced less than 10 yards. For very good plays, I consider all cases in which the offense scored a touchdown. These criteria yield 636 very bad plays and 628 very good plays. I then examine what happens from the situation immediately following the extreme play to the next situation, from that situation to the next, and from that situation to the subsequent one. In each case, I ask whether the realized values of these situations one situation later differ systematically from the V's for those situations. That is, I look at the means of the relevant e_gt's (always computed from the perspective of the team that had the ball before the very bad or very good play).
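The momentum check described above amounts to testing whether a set of residuals has mean zero. The sketch below shows that calculation in its simplest form. It is an assumption-laden illustration: the function name and the sample residuals are hypothetical, and it uses the textbook standard error for independent observations, which may be simpler than the procedure actually used in the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_t_statistic(residuals):
    """Return the mean of the residuals and the t-statistic for the null
    hypothesis that the true mean is zero.

    residuals: realized value one situation later minus the estimated V,
    one entry per play (hypothetical layout).
    """
    n = len(residuals)
    avg = mean(residuals)
    # Standard error of the mean, treating the observations as independent.
    std_err = stdev(residuals) / sqrt(n)
    return avg, avg / std_err


# Hypothetical residuals in points (not data from the paper).
example_residuals = [1.0, -1.0, 2.0, 0.0]
example_mean, example_t = mean_and_t_statistic(example_residuals)
```

Under the momentum hypothesis the mean after very bad plays would be reliably negative and the mean after very good plays reliably positive; a small t-statistic in absolute value indicates no such pattern.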
The results provide no evidence of momentum effects. All the point estimates are small and highly insignificant; the largest t-statistic (in absolute value) is less than 1.3. Moreover, the largest point estimate (again in absolute value) goes the wrong direction from the point of view of the momentum hypothesis: from the situation immediately following a very bad play to the next, the team that lost possession does somewhat better than average.15

15 The working paper version of the paper (Romer 2005) considers two additional complications. The first is the possibility of sample selection bias in the estimation of the V's stemming from the fact that teams are not assigned to situations randomly. The second is general equilibrium effects: different decisions on fourth downs could affect other choices. I conclude that the effects of sample selection bias are small and of ambiguous sign, and that general equilibrium effects are small and most likely strengthen the case for being more aggressive on fourth downs.

V. Quantitative Implications

An obvious question is whether the potential gains from different choices are important. There are in fact two distinct questions. The first is whether there are cases of clear-cut departures from win maximization. If there were not, then small changes in the analysis might reverse the conclusions.

The answer is that there are clear-cut departures. One example is the case of fourth and goal on the 2 discussed above. The estimates imply that trying for a touchdown and failing is only slightly worse than kicking a field goal. As a result, they imply that going for a touchdown is preferable on average as long as the probability of success is at least 18 percent. The actual probability of success, in contrast, is about 45 percent. Thus there are no plausible changes in the analysis that could reverse the conclusion that trying for a touchdown is preferable on average. Moreover, the average benefit of trying for a touchdown is substantial. The estimated value of going for it is about 3.7 points, whereas the estimated value of kicking is about 2.4 points. Since each additional point raises the probability of winning by about 2.3 percentage points, trying for a touchdown on average increases the chances of winning by about three percentage points. Yet teams attempted a field goal every time in the sample they were in this position.

Two other examples are fourth and goal on the 1 and fourth and 1 between the opponent's 35 and 40. For the first, the estimates imply that the critical and actual probabilities of success are 16 percent and 62 percent, and that trying for a touchdown on average increases the chances of winning by about five percentage points. For the second, the critical and actual probabilities are 39 percent and 64 percent, and going for a first down raises the probability of winning by about 2.5 percentage points. In these cases, teams do not always kick, but they do about half the time. These decisions are consistent with win maximization only if teams have substantial additional information that allows them to identify times when their fourth-down attempts are especially likely to succeed. As described in the previous section, there is no evidence of such large additional information.

The second question is whether the analysis implies that teams could increase their overall chances of winning substantially. Since the analysis considers only a small fraction of plays and only a single decision on those plays, one would not expect it to show large potential increases