Predicting total movie grosses after one week

The movie industry is a business with a high profile and a highly variable revenue stream. In 2014, moviegoers spent $10 billion at the U.S. box office alone. A single movie can be the difference between tens of millions of dollars of profits or losses for a studio in a given year. It's not surprising, therefore, that movie studios are intensely interested in predicting revenues from movies; the popular nature of the product also generates great interest in gross revenues among the general public.

The opening weekend of a movie's release typically accounts for 25% of the total domestic box office gross, so we would expect the opening weekend's grosses to be highly predictive of total gross. This, however, ignores the different release patterns of movies (some movies open on thousands of screens in the first weekend, others build slowly into wide release, and others never show on more than a few screens).

It's well known in the business that Friday nights during the opening weekend are nervous times for the marketing people at movie studios. It is on the strength of the opening weekend of general release that all major decisions pertaining to a film's ultimate financial destiny are made. Since competition for movie screens is fierce, movie theater owners do not want to spend more than the contractually obligatory two weeks on a film that doesn't have legs. Should a film lose its theatrical berth so quickly, chances are slim that it will have significant play internationally (if at all), and it is unlikely to make it to pay-per-view, cable, or network television. This all but guarantees that ancillary revenue streams will dry up, making a positive return on investment virtually impossible to achieve, since ancillary deals are predicated on domestic box office gross. Movie theater owners often decide whether to keep a film running based on the strength of its opening weekend.
But is it really true that first weekend grosses are predictive of the ultimate total domestic gross? The analyses presented here are based on data for widely released (on at least 1000 screens) new movies released in the United States during calendar year 2013, as reported by the-numbers.com. This yields a total of 147 films. I am indebted to Haochuan Wang for sharing these data with me.

The response of interest here is the total U.S. domestic gross revenue for each film, while the predictor is the domestic gross for the first weekend of release. Histograms for these variables show that they are long right-tailed (total grosses range from a low of roughly $1 million for Phantom to roughly $425 million for The Hunger Games: Catching Fire), as does a scatter plot of the two.

© 2015, Jeffrey S. Simonoff


For this reason, logarithms are used in the modeling. Here is a scatter plot of the logged variables:

[Figure: scatter plot of logged total gross versus logged opening gross]

There is obviously a relationship between the two variables, although it is not exactly what we would like to see: there appears to be evidence of nonconstant variance, with lower-grossing movies having more variability in their errors. Let's ignore that for the moment and perform a regression.
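The regression is an ordinary least-squares fit on the logged scale. The following is a minimal Python sketch of that fit, not the Minitab computation itself. It assumes two equal-length lists of positive dollar grosses and base-10 logs. `fit_loglog` is a hypothetical helper name. It returns the intercept, the slope, the residual standard error $\hat\sigma$ (with $n-2$ degrees of freedom), and $R^2$.

```python
import math
from statistics import fmean


def fit_loglog(opening, total):
    """Least-squares fit of log10(total) on log10(opening).

    Returns (intercept, slope, sigma_hat, r_squared); sigma_hat is the residual
    standard error with n - 2 degrees of freedom, on the log10 scale.
    """
    if len(opening) != len(total) or len(opening) < 3:
        raise ValueError("need at least 3 paired observations")
    x = [math.log10(v) for v in opening]   # logged opening weekend gross
    y = [math.log10(v) for v in total]     # logged total domestic gross
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma_hat = math.sqrt(sse / (n - 2))
    return intercept, slope, sigma_hat, 1 - sse / sst
```

If the 147 films from 2013 were passed in, a fit like this should reproduce the intercept and slope reported by Minitab. The data themselves are not included here.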

Analysis of Variance

Source                   DF  Adj SS   Adj MS  F-Value  P-Value
Regression                1  30.499  30.4989   969.68    0.000
  Logged opening gross    1  30.499  30.4989   969.68    0.000
Error                   145   4.561   0.0315
Total                   146  35.060

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.177349  86.99%     86.90%      86.54%

Coefficients

Term                    Coef  SE Coef  T-Value  P-Value   VIF
Constant               0.930    0.215     4.33    0.000
Logged opening gross  0.9402   0.0302    31.14    0.000  1.00

Regression Equation

Logged total gross = 0.930 + 0.9402 Logged opening gross

At first glance, things look pretty good. The slope coefficient says that a 1% increase in first weekend gross is associated with an estimated expected 0.94% increase in total gross. The regression is highly significant, with almost 87% of the variability in logged total domestic gross accounted for by knowing logged first weekend gross.

The standard error of the estimate, $\hat\sigma \approx 0.18$, is worth further comment. A rough 95% prediction interval for logged total domestic gross is the fitted value $\pm 2\hat\sigma \approx \pm 0.36$. That is, roughly 95% of the time the logged total domestic gross is known to within $\pm 0.36$. Since I've used logs base 10 here, that is the same as saying that we know total domestic gross to within a multiplicative factor of $10^{0.36}$; that is, we wouldn't be surprised if the true total domestic gross is as little as 44% of our best guess ($10^{-0.36} \approx 0.44$), or as much as 2.29 times our best guess ($10^{0.36} \approx 2.29$). This just reflects that even a strong relationship in the logged scale can translate into large variability in the original scale.

Unfortunately, and as our original scatter plot suggested, our regression assumptions are violated here. There is apparent nonnormality of the residuals, and also nonconstant variance (with movies that make less money being more variable).
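The two back-of-the-envelope calculations above can be checked with a short Python sketch: the percentage reading of the slope, and the multiplicative band obtained by antilogging $\pm 2\hat\sigma$. The function names are hypothetical. The constants are the values quoted in the text ($\hat\beta_1 = 0.9402$, $\hat\sigma$ rounded to 0.18, logs base 10).

```python
SLOPE = 0.9402     # estimated slope of logged total on logged opening gross
SIGMA_HAT = 0.18   # residual standard error, rounded as in the text (log10 scale)


def pct_change_total(pct_change_opening, slope=SLOPE):
    """Percent change in fitted total gross implied by a percent change in
    opening gross under the log-log model (fitted total scales as opening**slope)."""
    ratio = (1 + pct_change_opening / 100) ** slope
    return 100 * (ratio - 1)


def multiplicative_band(sigma_hat=SIGMA_HAT, k=2):
    """Antilog a rough +/- k*sigma_hat band on the log10 scale into
    (lower, upper) multiplicative factors on the dollar scale."""
    half_width = k * sigma_hat
    return 10 ** (-half_width), 10 ** half_width


if __name__ == "__main__":
    print(f"{pct_change_total(1):.2f}% change in total for a 1% change in opening")
    low, high = multiplicative_band()
    print(f"between {low:.2f} and {high:.2f} times the best guess")
```

Running it prints 0.94% and factors of 0.44 and 2.29, matching the text. Using the unrounded $\hat\sigma = 0.177349$ instead gives an upper factor closer to 2.26, a reminder that small changes on the log scale matter after antilogging.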

There isn't really much we can do about this using the tools and model at hand, but we can get a clue about what is going on by looking a little more carefully at the residuals. We can see that there is a group of movies that seem to have logged total gross noticeably higher than would have been expected based on the first weekend gross. If we look more closely, we see that 8 of these 10 movies are noteworthy as being among the most critically acclaimed movies of 2013: 12 Years a Slave (Rotten Tomatoes critics score of 96 out of 100), American Hustle (93), Blue Jasmine (91), Dallas Buyers Club (93), Enough Said (96), Mud (98), Philomena (92), and The Spectacular Now (93). This suggests the very reasonable hypothesis that higher-quality movies are more likely to outperform what initial revenues suggest. If this were true, we would expect a model that includes measures of quality or audience satisfaction to be an improvement over this model, and that is the case. Consider the following model fit, which adds the Rotten Tomatoes audience rating as a predictor:

Analysis of Variance

Source                             DF  Adj SS   Adj MS  F-Value  P-Value
Regression                          2  32.307  16.1533   844.92    0.000
  Logged opening gross              1  24.226  24.2264  1267.20    0.000
  Rotten Tomatoes Audience Score    1   1.808   1.8076    94.55    0.000
Error                             144   2.753   0.0191
Total                             146  35.060

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.138268  92.15%     92.04%      91.77%

Coefficients

Term                                Coef   SE Coef  T-Value  P-Value   VIF
Constant                           1.004     0.168     5.98    0.000
Logged opening gross              0.8732    0.0245    35.60    0.000  1.09
Rotten Tomatoes Audience Score  0.006639  0.000683     9.72    0.000  1.09

Regression Equation

Logged total gross = 1.004 + 0.8732 Logged opening gross + 0.006639 Rotten Tomatoes Audience Score

As expected, a higher audience score is associated with higher total gross (we'll talk about exactly what this regression coefficient means in a little bit), and it is interesting to note that the coefficient for logged first weekend gross is smaller (implying less of an advantage for movies that open bigger) in the presence of the audience score variable. Strikingly, the residual plots now look almost perfect. Of course, this model wouldn't really be very helpful to a movie distributor, since he or she wouldn't have the audience ratings until several weeks into the movie's run (and by then it would be obvious whether the movie was a hit or not). For now we will stick with the simple regression model using logged opening gross.

As an aside, remember that this analysis only applies to wide-release movies. Most movies released in a given year are not, in fact, wide-release movies, but rather small-release movies, which are very different. There is enormous variability in the way small-release movies are screened and marketed. For the most part, the kinds of movies that receive an initial and/or full release of fewer than ten screens are niche-market pictures, such as foreign films, small independent productions, and documentaries, and they are often released only in specific art-house theaters in major markets. Additionally, the length of time a given film may play can vary widely. That time frame depends on factors as diverse as competition for screens, film festival awards, word of mouth, and reviews. Overall, these movies generally rely most heavily on word of mouth and reviews to keep them in theaters for any appreciable length of time.

One way we might use these models is to try to predict future total domestic grosses. I applied the simple regression model to the 22 movies released (widely) during the first two months of 2014 and constructed 95% prediction intervals for each movie in the logged total gross scale. I then antilogged the prediction limits to convert back to the original dollar scale. Here are the lower and upper prediction limits, along with the actual total domestic gross for each film (all figures in dollars).
Row  Name                                  Lower pred lim  Total gross  Upper pred lim
  1  Paranormal Activity: The Marked Ones      25,571,807   32,447,650     129,249,377
  2  Lone Survivor                             50,887,663  125,069,696     258,406,744
  3  The Legend of Hercules                    12,911,697   18,848,538      65,263,665
  4  Her                                        8,010,758   25,568,252      40,603,829
  5  Jack Ryan: Shadow Recruit                 21,767,294   50,577,412     109,975,164
  6  Devil's Due                               12,142,110   15,821,462      61,387,908
  7  The Nut Job                               26,958,028   64,251,538     136,281,722
  8  Ride Along                                54,965,557  134,938,200     279,342,431
  9  I, Frankenstein                           12,557,759    9,075,290      63,480,997
 10  That Awkward Moment                       12,738,556   26,068,956      64,391,573
 11  Labor Day                                  7,769,739   13,371,528      39,392,178
 12  The Lego Movie                            88,398,958  257,784,720     452,116,698
 13  The Monuments Men                         30,331,035   78,031,620     153,413,832
 14  Vampire Academy                            5,978,625    7,791,979      30,388,689
 15  About Last Night                          35,019,192   48,637,684     177,273,071
 16  RoboCop                                   36,269,023   58,607,008     183,642,757
 17  Endless Love                              18,915,460   23,438,250      95,553,051
 18  Winter's Tale                             10,745,237   12,600,232      54,356,650
 19  3 Days to Kill                            17,488,717   30,698,000      88,346,184
 20  Pompeii                                   14,920,860   23,169,033      75,389,250
 21  Non-Stop                                  39,130,921   92,168,600     198,242,037
 22  Son of God                                34,958,863   59,700,064     176,965,704

For 21 of the 22 intervals, the actual grosses are within the prediction limits, supporting the predictive power of the model (and of course we expect about 1 out of 20 intervals not to contain the true value, since these are 95% prediction intervals). The one miss is I, Frankenstein, which was almost universally panned by critics (it has a critics rating of 3 out of 100 on Rotten Tomatoes). Thus, while many factors are involved in the ultimate success of a movie, it is clear that first weekend grosses (possibly augmented by information available after the first weekend, such as reviews and awards) go a long way toward predicting total grosses. The people in the business are right to look at those numbers very carefully!

Minitab commands

To take the log of a variable, click on Calc > Calculator. Enter a new variable name in the box next to Store result in variable: (Logged first weekend, for example). In the box below Expression:, enter logt('First weekend gross') (for example) for logs base 10, or ln('First weekend gross') for natural logs. You can also use the drop-down menu below Functions: and double-click on choices. To take the antilog of a variable, use the antilog function (for logs base 10) or the exp function (for natural logs).
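The prediction procedure used for the 2014 films can be sketched in Python: fit on the log10 scale, form a prediction interval for a new film, and antilog the limits. The same sketch can also rerun the coverage count. This is an illustration under stated assumptions, not the author's Minitab session. `loglog_prediction_interval` and `coverage_misses` are hypothetical helpers. The t multiplier is passed in; for 145 error degrees of freedom it is roughly 1.98, and `scipy.stats.t.ppf(0.975, 145)` would supply it if SciPy is available. The interval uses the standard simple-regression formula $\hat y_0 \pm t\,\hat\sigma\sqrt{1 + 1/n + (x_0-\bar x)^2/S_{xx}}$. The coverage check uses the limits and actual grosses from the 2014 table.

```python
import math
from statistics import fmean


def loglog_prediction_interval(openings, totals, new_opening, t_mult):
    """Prediction interval (in dollars) for the total gross of a new film.

    Fits log10(total) on log10(opening) by least squares, builds the usual
    simple-regression prediction interval on the log10 scale, then antilogs.
    """
    x = [math.log10(v) for v in openings]
    y = [math.log10(v) for v in totals]
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    x0 = math.log10(new_opening)
    fit = intercept + slope * x0
    se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return 10 ** (fit - t_mult * se_pred), 10 ** (fit + t_mult * se_pred)


# (name, lower limit, actual total gross, upper limit) in dollars, from the table.
PREDICTIONS_2014 = [
    ("Paranormal Activity: The Marked Ones", 25571807, 32447650, 129249377),
    ("Lone Survivor", 50887663, 125069696, 258406744),
    ("The Legend of Hercules", 12911697, 18848538, 65263665),
    ("Her", 8010758, 25568252, 40603829),
    ("Jack Ryan: Shadow Recruit", 21767294, 50577412, 109975164),
    ("Devil's Due", 12142110, 15821462, 61387908),
    ("The Nut Job", 26958028, 64251538, 136281722),
    ("Ride Along", 54965557, 134938200, 279342431),
    ("I, Frankenstein", 12557759, 9075290, 63480997),
    ("That Awkward Moment", 12738556, 26068956, 64391573),
    ("Labor Day", 7769739, 13371528, 39392178),
    ("The Lego Movie", 88398958, 257784720, 452116698),
    ("The Monuments Men", 30331035, 78031620, 153413832),
    ("Vampire Academy", 5978625, 7791979, 30388689),
    ("About Last Night", 35019192, 48637684, 177273071),
    ("RoboCop", 36269023, 58607008, 183642757),
    ("Endless Love", 18915460, 23438250, 95553051),
    ("Winter's Tale", 10745237, 12600232, 54356650),
    ("3 Days to Kill", 17488717, 30698000, 88346184),
    ("Pompeii", 14920860, 23169033, 75389250),
    ("Non-Stop", 39130921, 92168600, 198242037),
    ("Son of God", 34958863, 59700064, 176965704),
]


def coverage_misses(rows):
    """Names of films whose actual gross falls outside their prediction limits."""
    return [name for name, lower, actual, upper in rows
            if not lower <= actual <= upper]


if __name__ == "__main__":
    misses = coverage_misses(PREDICTIONS_2014)
    covered = len(PREDICTIONS_2014) - len(misses)
    print(f"{covered} of {len(PREDICTIONS_2014)} intervals cover the actual gross; misses: {misses}")
```

Run as a script, it reports 21 of 22 intervals covering the actual gross, with I, Frankenstein as the only miss, as stated in the text. Note that the limits are not symmetric in dollars. Antilogging a symmetric log-scale interval yields limits whose geometric mean, not arithmetic mean, is the point prediction.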