Decision 411: Class 12

Automatic forecasting software
Political & ethical issues in forecasting

Automatic forecasting software
Most major statistical & database packages include wizards for automatic forecasting:
- SAS (Time Series Forecasting System)
- SPSS (Decision Time*)
- Oracle (Sales Analyzer)
They generally conduct tournaments among the most basic time series models (RW, LT, SMA, SES, LES, Winters) to pick a winner.
*The Advanced Forecasting Wizard in Decision Time also considers ARIMA models.
Automatic forecasting in Statgraphics
Statgraphics also includes an Automatic Forecasting procedure.* This is a tournament-based procedure that includes all the basic models, plus ARIMA models.
*Actually, there are two automatic forecasting procedures: one on the Time Series menu and one on the Snapstats menu. Use the Time Series version to get more detailed output, including a model comparison report.

Automatic forecasting options
- Choose your own contestants
- Information criteria that may be used to pick the winner (I recommend BIC)
Information criteria
The Akaike Information Criterion (AIC), Hannan-Quinn Criterion (HQC), and Schwarz Bayesian Information Criterion (BIC) are often used to rank forecasting models in automated selection procedures. These criteria impose a heavier penalty for model complexity than MSE (i.e., squared error adjusted for # of coefficients, which is what is used to compute adjusted R-squared).

MSE vs. AIC vs. HQC vs. BIC
Let E denote the simple average of the squared errors, n = # of data points, and k = # of coefficients. Then the different information criteria adjust E as follows:

MSE = (n/(n-k)) E
AIC = exp(2k/n) E
HQC = (log n)^(2k/n) E
BIC = n^(k/n) E

[Figure: penalty factors for model complexity (n=100), plotted against the number of coefficients (k) for k = 0 to 10. The curves, from heaviest penalty to lightest: BIC, HQC, AIC, MSE.]
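The penalty factors plotted above follow directly from the formulas. Here is a short sketch in Python (the function name is my own) that reproduces the ordering shown in the figure:

```python
import math

def penalty_factors(n, k):
    """Multiplicative penalty each criterion applies to E, the simple
    average of the squared errors, per the formulas above."""
    return {
        "MSE": n / (n - k),            # adjustment used in adjusted R-squared
        "AIC": math.exp(2 * k / n),
        "HQC": math.log(n) ** (2 * k / n),
        "BIC": n ** (k / n),
    }

p = penalty_factors(100, 5)
# For n=100 the ordering matches the plot: BIC > HQC > AIC > MSE
```

Note that HQC exceeds AIC only when log n > e, i.e., n > 15 or so, which is why the slide below qualifies HQC's ranking with "for n>15".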
Note that the penalties for model complexity are significant for small data sets!

[Figure: penalty factors for model complexity (n=50), plotted against the number of coefficients (k) for k = 0 to 10; again BIC imposes the heaviest penalty, followed by HQC, AIC, and MSE.]

If your software does not display AIC, HQC, or BIC, and you must compare models on RMSE, you should make a significant mental adjustment for the number of parameters if the data set is small: based on AIC, each additional coefficient should reduce the (adjusted) RMSE by 50/n % (i.e., by 0.5% if n=100, or by 1% if n=50), and the reduction should be about twice as large based on BIC.

MSE vs. AIC/HQC/BIC, continued
BIC imposes the heaviest penalty for model complexity, HQC is 2nd heaviest (for n>15), AIC is 3rd, and MSE is 4th. BIC or HQC is theoretically best when the true model is somewhere in the set of potential models; AIC is theoretically best when the true model is not in the set of potential models. BIC is probably the best to use in practice, to hedge against overfitting, since it favors simpler models.

Note: the AIC, HQC, and BIC statistics reported in Statgraphics are the natural logs of the formulas shown earlier. Hence exp(AIC), exp(HQC), and exp(BIC) are comparable to MSE, so a 0.1 difference in reported AIC/HQC/BIC means roughly a 10% difference in MSE.
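To make the complexity trade-off concrete, here is a small sketch (the error values and coefficient counts are hypothetical) of how a BIC-based tournament trades raw error against parameter count:

```python
def bic(E, n, k):
    # BIC-adjusted error: E * n**(k/n), per the formula above
    return E * n ** (k / n)

# Hypothetical tournament on n=50 observations:
# model A has 2 coefficients; model B has 5 and a slightly smaller raw error.
bic_a = bic(1.00, 50, 2)
bic_b = bic(0.95, 50, 5)
# The simpler model wins (lower adjusted error): on a small data set,
# a 5% reduction in raw error does not pay for three extra coefficients.
```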
Automatic forecasting of housing starts (same data as in classes 4 & 10)

Tournament results
Note: models A-J all use multiplicative seasonal adjustment.
When AIC is the criterion, the winner is ARIMA(1,0,1)x(0,1,1), but the other ARIMA models are not far behind. (Note: ARIMA models are listed in order of their ranking.)
When BIC is the criterion, the winner is ARIMA(0,1,1)x(0,1,1), but the choice among the ARIMA models is still close by this criterion. (Actually, models M, O, P, and Q are structurally very similar, differing only in fine tuning. Model N does not use a non-seasonal difference.)

Here's the plot of forecasts & 90% limits for later comparison.
Here are the results of the automatic forecasting procedure on the Gap sales data, using BIC as the optimization criterion. The random walk model with multiplicative seasonal adjustment wins here. The ARIMA(1,0,0)x(0,1,0) model that performed well in out-of-sample validation shows up as model Q, although some of the other ARIMA choices are puzzling (e.g., no seasonal difference in models M and P).

Conclusions
The Automatic Forecasting procedure in Statgraphics runs a fairly elaborate tournament with sophisticated selection criteria. In addition to the various smoothing models, it also includes some sensible seasonal ARIMA models. Its rules are transparent, and it's easy to explore the models further or test them against others.
Specialized auto-forecasting software
- SmartForecasts (www.smartforecasts.com): conducts a tournament among basic models (no ARIMA) using a novel multiple-horizon error criterion
- AutoBox (www.autobox.com): emphasizes ARIMA, including transfer function and intervention models (but very primitive, geeky output)
- ForecastX (www.forecastx.com): Excel add-in with a full assortment of models & tools, but a black box to some extent (doesn't show full details of models and tournament results)
- Forecast Pro (www.forecastpro.com): expert system chooses among time series models (including ARIMA) & dynamic regression models

Features of stand-alone AF packages
- Multi-level forecasting (i.e., forecasts that are organized hierarchically and "add up")
- Special features for promotion- or event-driven data
- Models for intermittent data (lots of zeroes)
- Detection of spikes and shifts that should be treated with dummy variables
- Ability to forecast many series at once in batch mode
- "Eyeball" adjustments of forecasts based on subjective judgment (!)
ForecastX demonstration
Automatic model selection ("ProCast") can be based on various error measures. Data capture options include various cleansing operations and hold-out, but holding data out doesn't change the results!

Accuracy Measures:
  AIC: 1,462.72
  BIC: 1,473.04
  Mean Absolute Percentage Error (MAPE): 5.08%
  Sum Squared Error (SSE): 7,583.87
  R-Square: 91.14%
  Adjusted R-Square: 91.06%

Method Statistics:
  Method Selected: Holt-Winters
  Alpha: 0.44
  Beta: 0.14
  Gamma: 0.00
  Decomposition Type: Additive

Seasonal Indices:
  Index 1: -11.29    Index 2: -6.02    Index 3: 16.84
  Index 4: 27.85     Index 5: 31.42    Index 6: 32.08
  Index 7: 26.14     Index 8: 24.34    Index 9: 17.65
  Index 10: 20.03    Index 11: 2.69    Index 12: -8.48

Automatic forecast model selection based on either sum of squared errors or BIC yields the additive Winters model (a.k.a. Holt-Winters) in this case. The final (additive) seasonal indices are shown, which is nice. However, the complete tournament results are NOT reported.

Note: the labeling of smoothing parameters is non-standard here: Beta is the seasonal parameter and Gamma is the trend parameter. These values would correspond to Beta=0.0001, Gamma=0.14 in Statgraphics.
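For reference, the additive Winters recursions can be sketched in a few lines of Python. This is a minimal sketch with crude initialization, not ForecastX's or Statgraphics' actual implementation; the parameter naming follows the Statgraphics convention (beta = trend, gamma = seasonal), which ForecastX swaps:

```python
def additive_winters(y, m, alpha, beta, gamma):
    """One-step-ahead fitted values from additive Holt-Winters.
    alpha = level, beta = trend, gamma = seasonal smoothing constants
    (Statgraphics naming; ForecastX swaps beta and gamma).
    Crude initialization from the first seasonal cycle."""
    level = sum(y[:m]) / m
    trend = 0.0
    seas = [y[i] - level for i in range(m)]
    fitted = []
    for t in range(m, len(y)):
        s = seas[t % m]               # seasonal index from one cycle back
        fitted.append(level + trend + s)
        new_level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seas[t % m] = gamma * (y[t] - new_level) + (1 - gamma) * s
        level = new_level
    return fitted
```

With all three smoothing constants set to zero the model simply repeats the initial level and seasonal pattern, which is a handy sanity check.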
ForecastX forecasts and 90% confidence limits for housing starts

ForecastX, continued
Now let's force the model type to be ARIMA (Box-Jenkins). There is no choice of error measure for this model type, and it's not clear which one is used.
Accuracy Measures:
  AIC: 1,492.33
  BIC: 1,506.08
  Mean Absolute Percentage Error (MAPE): 5.37%
  Sum Squared Error (SSE): 8,550.96
  R-Square: 90.01%
  Adjusted R-Square: 89.88%

Method Statistics:
  Method Selected: Box Jenkins
  Model Selected: ARIMA(2,1,0) * (1,1,1)
  T-Test For Non Seasonal AR: -9.15
  T-Test For Non Seasonal AR: -5.45
  T-Test For Seasonal AR: -2.01
  T-Test For Seasonal MA: 6.82

Automatic ARIMA selection: a mixed SAR/SMA model is somewhat unusual. The SAR(1) coefficient is reported to have a slightly significant t-stat.

What are the estimated coefficients and standard errors that go with these t-stats? This is proprietary information not shown to the user!

You also have the option to look at ACF and PACF plots with differences and/or seasonal differences, to make your own ARIMA identification.
When we fit the same model in Statgraphics (model C here), it gives similar results to our other best ARIMA models. We can't do a head-to-head comparison with additive Winters here because Statgraphics only offers the multiplicative version of Winters.

Details of the model chosen by ForecastX: the SAR(1) coefficient is actually insignificant as estimated by Statgraphics; the t-stat obtained by SG differs from the one obtained by ForecastX! Statgraphics is correct here: this time series has a pure MA(1)xSMA(1) signature after seasonal and nonseasonal differencing, so there is no need for an SAR(1) term, and in general it is rare to need to estimate more than one seasonal coefficient in total. So, this model is effectively (2,1,0)x(0,1,1), which is logically almost the same as (0,1,1)x(0,1,1). The two negative AR coefficients are roughly equivalent to one positive MA coefficient.
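The claim that two negative AR coefficients are roughly equivalent to one positive MA coefficient can be checked by computing the MA-infinity (psi) weights of each model. A sketch with hypothetical coefficient values (not the actual fitted values, which ForecastX does not report):

```python
def psi_weights(phi, n_weights=5):
    """MA(infinity) weights of an AR model with coefficients phi:
    psi_0 = 1 and psi_j = sum over i of phi[i] * psi[j-1-i]."""
    psi = [1.0]
    for j in range(1, n_weights + 1):
        psi.append(sum(phi[i] * psi[j - 1 - i]
                       for i in range(min(j, len(phi)))))
    return psi

# Hypothetical AR(2) with two negative coefficients:
ar2 = psi_weights([-0.4, -0.2])
# An MA(1) model with positive theta = 0.4 has weights [1, -0.4, 0, 0, ...].
# The AR(2) weights beyond lag 1 are all below 0.1 in magnitude, so the
# two models imply very similar forecasts.
```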
Another example: GAP revisited
Forecast of GAP net sales produced by ForecastX, with 90% limits.
This time the "ProCast" feature automatically selected a multiplicative Winters model. These parameters correspond to Beta=0.12, Gamma=0.9999 in Statgraphics.
If the model type is forced to Box-Jenkins (ARIMA), the model selected is (1,0,0)x(0,1,0), the same as one of our own previous models from class 10 except without a constant term. With one order of differencing and no constant, there is zero trend in the forecasts of this model. The trend term was not technically significant in our earlier model, so on purely mechanical grounds it should have been removed.

Note that the MAPE of the ARIMA model is better than that of the Winters model that ForecastX selected in its default ProCast mode.

ARIMA(1,0,0)x(0,1,0) forecasts
ForecastX, continued
Forecast of new product sales based on only the first three data points! (??)

[Figure: plot of NewProductSales from Jan-04 through Mar-05, showing the three observed data points, the fitted values, and the forecast of NewProductSales; the y-axis runs from 0 to 3000.]

Voila! An S-curve has been fitted, similar but not quite identical to the Bass model curve. (What 3-parameter equation has been fitted? This information is not provided to the user.)
SmartForecasts is also a tournament-based program that uses basic (non-ARIMA) time series models, but with an interesting twist: the tournament winner is determined by a "sliding simulation" in which forecast errors are computed at a whole range of horizons, and the cumulative average absolute error is then minimized over both short- and long-horizon forecasts.

Here the Winters model is the winner for the housing starts data. (Additive and multiplicative do about equally well.) However, the non-Winters models are severely handicapped in this tournament by not being used in conjunction with seasonal adjustment! (These results were provided by Dr. Nelson Hartunian at SmartCorp.)

Forecasts and 90% confidence limits for housing starts produced by SmartForecasts.
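The sliding-simulation idea can be sketched as follows. This is my own minimal illustration of the concept, not SmartForecasts' proprietary algorithm, and the series and forecaster are toy examples:

```python
def sliding_sim_error(series, forecaster, max_horizon=4, min_train=8):
    """Cumulative average absolute error over all time origins and all
    horizons 1..max_horizon: a sketch of the 'sliding simulation' idea."""
    errors = []
    for t in range(min_train, len(series) - max_horizon):
        history = series[:t]                     # data known at origin t
        for h in range(1, max_horizon + 1):
            forecast = forecaster(history, h)
            errors.append(abs(series[t + h - 1] - forecast))
    return sum(errors) / len(errors)

# Toy usage: score a random-walk forecaster on a short monthly series
random_walk = lambda history, h: history[-1]
data = [10, 12, 11, 13, 14, 13, 15, 16, 15, 17, 18, 17, 19, 20]
err = sliding_sim_error(data, random_walk)       # averages over origins and horizons
```

Because long-horizon errors enter the average alongside one-step errors, a model that tracks short-term noise but drifts at long horizons is penalized, which is the point of the multiple-horizon criterion.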
SmartForecasts tournament results for Gap sales: the multiplicative Winters model wins here. The estimated coefficients are different from those obtained in Statgraphics due to the different optimization criterion.

Forecasts and 90% confidence limits for Gap sales produced by SmartForecasts.
Limitations of AF software
- The expert system or "tournament committee" isn't always right, and doesn't understand YOUR data; it sometimes needs to be overridden.
- It works only on what you give it; it doesn't consider all possibilities for transformations or additional variables.
- At the end of the day, YOU (not the computer) are responsible for the results.

Conclusions
- Automatic forecasting software provides a potentially useful "power tool" for forecasting.
- To use it wisely, you need to thoroughly understand how the models work and how they ought to be selected via data analysis.
- By virtue of having completed this course, you are now qualified to use automatic forecasting software.
Scenario 1. In preparation for an upcoming meeting to discuss your corporation's strategic plan, you have been asked to prepare a forecast for the sales growth in certain product lines. You undertake elaborate statistical data analysis, fitting time series models and regression models to examine the effects of industry trends, product life cycles, market share, demographics, promotional activities, etc. Finally you come up with a model you feel you can trust, which shows that sales should increase by 10 percent next year. The 50% confidence interval ranges from 6 percent to 14 percent. Two days before the meeting, you learn that the Vice President for Sales has conducted her own field study, in which individual sales representatives were asked to give their own best estimates of sales in the coming year. Aggregation of these results has led to a prediction that sales will be up by only about 5 percent next year. She has just gotten wind of your study and calls you in to complain that your forecasts may be used by management to set unrealistic quotas for the sales force...

Scenario 2. You are a risk manager for a large casualty insurance company which handles workers' compensation and liability insurance for numerous Fortune 100 corporations. Such corporations, by virtue of their size, do not need to pool their insurance risks with other organizations. Hence, they are covered by "retrospectively rated" insurance plans which are tantamount to self-insurance but offer significant tax advantages. Under such a plan, the insured corporation pays the insurance company for its actual losses, with a markup for the insurance company's claims-handling costs and profit. These plans are complicated by the fact that casualty losses "develop" over time.
For example, the corporation's actual losses for the 1998 calendar year will not be precisely known for many years: months may elapse between the time an accident occurs and the time a claim is filed, and years may pass before the final amount of loss is determined.
At the end of a given year, the insurance company prepares a forecast of the total amount of losses which will eventually have to be paid for accidents which occurred during that year. The insured corporation must at this point pay the forecasted losses, plus markup. The forecast will be readjusted every year thereafter as more data accumulates, and the insured corporation will then pay more money in or get some money back depending on whether the forecast is revised up or down. Suppose that you prepare a revised forecast in mid-2006 for the losses which one of your major clients incurred during the years 1999-2004. Your revised forecast (which takes into account some recent and unfavorable legal precedents pertaining to outstanding liability claims) shows that those losses were underestimated by 20 million dollars. In other words, based on your forecast, the client must immediately hand over an additional 20 million dollars (plus markup). The account executive is furious: "We can't tell them that! They'll cancel the account. Why don't we just go ahead and stick with the industry-average loss development factors we used last year?"

Scenario 3A. You recently supervised a project to build a sales forecasting model for one of your consulting firm's major clients. The model is hierarchical in structure, and produces forecasts at the corporate level, the division level, the regional level, and the store level: it uses seasonal adjustment and exponential smoothing, with adjustment and smoothing factors estimated separately for different regions. It was developed using a statistical modeling language and it is linked to a large database of sales data supplied by the client. Last week you flew to the client's headquarters and presented the model to an assembly of regional vice presidents and managers. Many of them were upset with your results, which disagree with their own private estimates and are likely to affect their budgets adversely.
The client's top management insisted that they would back you up. However
this morning when you arrive at work you find a crowd of people outside your office. They are a team of auditors hired by the client to audit your forecasting model. They tell you they'd like a conference room with a connection to your computer network that they can use for a few days, and they'd like to see your notes documenting your model-selection process, printouts of your statistical reports, the computer files containing your modeling code, and the files containing the raw data.

Scenario 3B: You are an auditor working for an accounting firm that has just been asked to review a forecasting model used by one of your clients...
Scenario 4. The advertising agency for which you work is trying to renew its contract with an important client. The client has been balking, claiming that your ad campaigns have been less successful than promised, and that sales growth has been disappointing. Your boss says, "run me some numbers to show these guys that our ads are really working." You analyze the data and discover that the effect of your ads on sales has on the whole been insignificant. However, there is one market in which sales showed a huge upward spike shortly after your ad campaign began there. Privately, you believe this was due to the Super Bowl, which happened to be held there at the same time. However, if this market is simply aggregated with all the others...

Scenario 5. You are a professor who teaches forecasting at a leading business school. One day you receive a phone call from a former student saying "my boss has asked me to run some numbers that show our ad campaign is working, but I'm having trouble finding a statistically significant relationship. What do I need to do to get the P-value below 0.05...?"
What can you do to avoid trouble?
- Follow good modeling practices (sensible models, residual diagnostics, out-of-sample validation, etc.).
- Leave a paper or computer trail, i.e., keep well-annotated records of your model-fitting efforts. Someone else (who may be a sharp-penciled auditor or perhaps only yourself 12 months hence) may have to figure out what you did, and why.
- Often the greatest benefit of a forecasting effort is to identify needs and opportunities to improve data collection and data integration within the organization, i.e., to develop data as a corporate asset. Forecasting is easier, more accurate, and less controversial if you have plenty of clean data.
- Remember to K.I.S.S. (Keep It Simple and intuitively reasonable, if at all possible).
- Neither overstate nor understate the accuracy of your forecasts. Always report confidence intervals. If different forecasting approaches lead to different results, call attention to their underlying assumptions, their data sources, their possible sources of bias, and their respective margins for error.
- If YOU believe your model, stand by it! Integrity and commitment earn respect in the organization. (If you don't believe your model, go back to the drawing board.)

Thank you!