2 Forecasting: definition. Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been observed (Wikipedia). There are many categories of forecasting methods: categorical vs quantitative; the naïve approach (forecast based solely on the previous period's realisation); time series (Box-Jenkins methodology); judgemental methods (based on subjective probability).
3 Why Forecasting? Economic forecasting (inflation/GDP forecasting); sales forecasting; supply chain forecasting; earthquake forecasting; weather forecasting; hotel management (room bookings forecasting).
4 Forecasting medical time series: a sample of existing studies. The Application of Forecasting Techniques to Modelling Emergency Medical System Calls in Calgary, Alberta [see Channouf et al (2006)]; Box Jenkins Methodology in Medical Research [see Helfenstein (1996)]; Time series modelling for syndromic surveillance [see Reis and Mandle (2003)]; Conventional and advanced time series estimation: application to the Australian and New Zealand Intensive Care Society (ANZICS) adult patient database [see Solomon and Moran (2011)].
5 Today's application: UK Hospital Waiting Lists. "PM's election pledge in jeopardy as report reveals patients waiting 6% longer" (The Guardian); "NHS Chief warns of rising Hospital waiting times" (BBC News); "NHS waiting times may increase at one in three flagship hospitals: report" (The Telegraph); "We will match people's symptoms to certain groups of conditions and try to provide a general forecast" ("Weather used to forecast illness", Daily Mail); "Surgery waiting lists hit one million" (Mail online).
6 Solution: forecasting the number of patients placed on a waiting list! But why is it so important? Waiting lists grow if the demand for a specific treatment outstrips hospital capacity (supply). Hence, forecasting the number of patients placed on a waiting list at a given time might provide an estimate of the demand and supply imbalance. Hospital managers could, in principle, know by how much the list is growing/declining (at least within a confidence interval). This methodology could easily be extended to forecast individual hospital waiting lists, as well as treatment/surgery-specific waiting lists.
7 Data (available at: cs/performancedataandstatistics/hospitalwaitingtimesandliststatistics/index.htm). This dataset contains information on patients waiting to be admitted to NHS hospitals in England either as a day case or ordinary admission. Provider-based time series data from April 1998 to Feb 2010 (143 obs). It does not contain emergency cases and outpatients.
8 Methodology. ARIMA models have been found particularly useful in describing stationary (non-seasonal) time series. A stationary stochastic process is one whose joint distribution does not shift over time; in the weak (covariance) sense used here, its first and second moments are finite and do not depend on time. Wold's Theorem: any stationary series can be expressed as a combination of two components: a perfectly forecastable series and a moving average of possibly infinite order. Thus a stationary non-seasonal series can always be approximated by an MA(∞) model, which in turn can be approximated by an ARMA(p, q) with a small number of parameters p, q.
9 Methodology cont'd. However, most time series are not stationary and usually have a seasonal component! We have to transform these series into stationary, non-seasonal series before we can model them. Seasonality: apply seasonal differencing. Non-stationarity can be classified as: trend in mean (difference as many times as required); trend in variance (apply a power transformation, e.g. log). Note: the latter should be applied only if it stabilises the variance!
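The transformations named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names and the sample values are hypothetical, and a seasonal period of 12 is assumed because the data are monthly.

```python
import math


def log_difference(series):
    """Return log(x_t) - log(x_{t-1}), i.e. the approximate growth rate."""
    if any(x <= 0 for x in series):
        raise ValueError("the log transform needs strictly positive values")
    logs = [math.log(x) for x in series]
    return [curr - prev for prev, curr in zip(logs, logs[1:])]


def seasonal_difference(series, period=12):
    """Return x_t - x_{t-period}; period=12 is an assumption for monthly data."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical waiting-list counts (x 1000 patients), for illustration only.
waiting_list = [1000.0, 1010.0, 1005.0]
growth = log_difference(waiting_list)  # one value fewer than the input
```

Each differencing step shortens the series (by one observation for a first difference, by `period` observations for a seasonal difference), which matters when the sample is as short as the one used here.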
10 [Figure: Total waiting list (x 1000 patients) against time t, monthly, through 2010m1.] Source: author's calculations
11 [Figure: Growth rate of hospital waiting lists (difference of the log), series dlwsa against time t, 1998m1 to 2010m1.] Source: author's calculations
12 Box-Jenkins Methodology. (1) Plot the series and identify the trend (is the series trending in the mean/variance?). (2) Test for stationarity (Augmented Dickey-Fuller test, KPSS test, PP test). (3) Transform the series into a stationary series (power transformation, seasonal differencing, first differencing, etc.). (4) Plot the autocorrelation and partial autocorrelation functions (identify possible models). (5) Estimate the possible models (check for coefficient significance and white-noise residuals). (6) Select the best models based on the information criteria (the model with the lowest AIC, BIC, HQ). (7) Select the best two models given the above and test their forecasting accuracy (Diebold-Mariano test, Granger-Newbold).
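The "estimate candidates, then pick the lowest information criterion" step can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the data are simulated rather than the waiting-list series, only a mean-only model and an AR(1) are compared, and the AR(1) is fitted by ordinary least squares instead of the maximum-likelihood routines normally used for ARMA models. The AIC is written in its Gaussian least-squares form, n ln(RSS/n) + 2k, which is valid up to an additive constant when the candidates share the same sample.

```python
import math
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def rss_mean_only(y):
    """Residual sum of squares of the constant-only model."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def fit_ar1_ols(x):
    """Regress x_t on a constant and x_{t-1}; return (slope, RSS)."""
    lagged, current = x[:-1], x[1:]
    n = len(current)
    mean_l, mean_c = sum(lagged) / n, sum(current) / n
    sxx = sum((a - mean_l) ** 2 for a in lagged)
    sxy = sum((a - mean_l) * (b - mean_c) for a, b in zip(lagged, current))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(lagged, current))
    return slope, rss


def aic(rss, n, k):
    """Gaussian AIC up to a constant: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


x = simulate_ar1(phi=0.9, n=300)
n_eff = len(x) - 1                    # both models use observations 1..n-1
slope, rss1 = fit_ar1_ols(x)
scores = {
    "mean only": aic(rss_mean_only(x[1:]), n_eff, k=1),
    "AR(1)": aic(rss1, n_eff, k=2),
}
best = min(scores, key=scores.get)    # the candidate with the lowest AIC
```

Both candidates are scored on the same effective sample, since information criteria are only comparable across models fitted to identical observations.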
13 Model identification: ACF, PACF. [Figures: autocorrelation and partial autocorrelation plots of dltwl against lag; ACF with Bartlett's formula for MA(q) 95% confidence bands, PACF with 95% confidence bands (se = 1/sqrt(n)).] ACF: correlation of the series with a lag of itself across time, $r_k = \mathrm{Corr}(X_t, X_{t-k}) = \gamma_k / \gamma_0$. PACF: the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags.
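The sample version of r_k = γ_k / γ_0 is short enough to write out directly. The sketch below is illustrative only: the function names are hypothetical, the autocovariances use the common 1/n normalisation, and the band is the simple white-noise band z/sqrt(n) rather than Bartlett's formula.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with r_k = gamma_k / gamma_0."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    gamma0 = sum(d * d for d in dev) / n
    result = []
    for k in range(max_lag + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        result.append(gamma_k / gamma0)
    return result


def white_noise_band(n, z=1.96):
    """Approximate 95% band under white noise: z * se, with se = 1/sqrt(n)."""
    return z / math.sqrt(n)


r = acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

Autocorrelations that fall outside the band at a given lag are the ones that suggest candidate AR or MA orders in the identification step.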
14 List of potential models: AR(1); AR(2); MA(1); MA(2); ARMA(1, 1); ARMA(2, 1); ARMA(1, 2). Note: ARMA(1, 2) was the model with the smallest information criteria, individually and jointly significant coefficients and uncorrelated residuals, and it also yielded the most accurate forecasts!
15 Selecting the best model. The selected model is an ARMA(1, 2): $dltwl_t = 0.90\,dltwl_{t-1} + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t$ (estimated values of $\theta_1$, $\theta_2$ not shown). The series shows a high degree of persistence, insofar as it inherits a large proportion of the previous period's realisation. Invertibility, causality and stationarity conditions: the process satisfies the stationarity condition since the coefficient on the AR component is less than one in absolute value, which ensures that the process has finite first and second moments. The process satisfies the invertibility and causality conditions since the roots of the characteristic equations of the autoregressive and moving average components lie outside the unit circle. These last two requirements imply that the model is uniquely identified in its parameters!
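The root conditions on this slide can be checked numerically. The sketch below assumes the sign convention x_t = φ x_{t-1} + ε_t + θ1 ε_{t-1} + θ2 ε_{t-2}, so the MA polynomial is 1 + θ1 z + θ2 z². The AR coefficient 0.90 comes from the slide; the MA values used in the example are hypothetical placeholders, since the estimated MA coefficients are not available here.

```python
import cmath


def ar1_is_stationary(phi):
    """The AR polynomial 1 - phi*z has root 1/phi, outside the unit circle iff |phi| < 1."""
    return abs(phi) < 1


def ma2_roots(theta1, theta2):
    """Roots of the MA polynomial 1 + theta1*z + theta2*z**2."""
    if theta2 == 0:
        # Degenerates to MA(1) (one root) or to no MA part at all (no roots).
        return [] if theta1 == 0 else [complex(-1.0 / theta1)]
    disc = cmath.sqrt(theta1 ** 2 - 4 * theta2)
    return [(-theta1 + disc) / (2 * theta2), (-theta1 - disc) / (2 * theta2)]


def ma2_is_invertible(theta1, theta2):
    """Invertible when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ma2_roots(theta1, theta2))


stationary = ar1_is_stationary(0.90)        # AR coefficient from the slide
# theta1=0.5, theta2=0.2 are HYPOTHETICAL values, not the estimated ones.
invertible = ma2_is_invertible(0.5, 0.2)
```

With φ = 0.90 the AR root is 1/0.90 ≈ 1.11, which is outside the unit circle but close to it, consistent with the high persistence noted on the slide.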
16 FORECASTING
17 ARMA(1,2) model forecast. [Figure: dlwsa and the one-step prediction (xb prediction, onestep) against time t, 1998m1 to 2012m1, including the out-of-sample forecast.]
18 No-change forecast. [Figure: dltwl and its first lag L.dltwl against time t, 1998m1 to 2010m1.]
19 Forecasting diagnostics: Theil's U statistic. $U = \sqrt{ \dfrac{ \sum_{t=1}^{T-1} \left( \frac{f_{t+1} - y_{t+1}}{y_t} \right)^2 }{ \sum_{t=1}^{T-1} \left( \frac{y_{t+1} - y_t}{y_t} \right)^2 } } = 0.34$. If U < 1 the model is superior to a no-change forecast. If U > 1 the model is inferior to a no-change forecast.
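A small sketch of the statistic may help show why U = 1 is the benchmark. It assumes the standard Theil U2 form, in which forecast errors and actual changes are both scaled by the previous actual value; the function name and the sample numbers are hypothetical and unrelated to the 0.34 reported on the slide.

```python
import math


def theil_u(actual, forecast):
    """Theil's U: actual[t] is y_t and forecast[t] is the forecast of y_t."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    numerator = 0.0
    denominator = 0.0
    for t in range(len(actual) - 1):
        y_t, y_next, f_next = actual[t], actual[t + 1], forecast[t + 1]
        numerator += ((f_next - y_next) / y_t) ** 2    # model's relative error
        denominator += ((y_next - y_t) / y_t) ** 2     # no-change relative error
    return math.sqrt(numerator / denominator)


# Hypothetical values for illustration only.
y = [2.0, 2.2, 2.1, 2.4, 2.3]
no_change = [y[0]] + y[:-1]          # f_{t+1} = y_t
u_benchmark = theil_u(y, no_change)  # equals 1 by construction
```

Because the no-change forecast makes the numerator and denominator identical, any model scoring below 1 is beating that benchmark. The division by y_t becomes unstable when the series passes close to zero, which is worth keeping in mind when the statistic is applied to growth rates.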
20 Drawbacks of this approach. Lack of data availability. It assumes that the data generating process is time invariant (however, this might well not be the case!). The confidence interval around the point forecast becomes wider and wider as the forecasting horizon increases (unless we are only concerned with one-step-ahead forecasts and the dataset is sufficiently large for our purposes). It is very unlikely that we will find a model that fits the data particularly well; in practice, models can often explain very little!
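The widening of the forecast interval can be made concrete for the simplest case. The sketch below assumes a pure AR(1) with known coefficient and Gaussian shocks, for which the h-step forecast error variance is σ² (1 + φ² + ... + φ^(2(h-1))). It ignores the MA terms of the selected ARMA(1, 2) and parameter uncertainty, and σ = 1 is a hypothetical value; only φ = 0.90 is taken from the slides.

```python
import math


def ar1_forecast_halfwidths(phi, sigma, max_horizon, z=1.96):
    """Half-widths of approximate 95% forecast intervals for horizons 1..max_horizon."""
    if abs(phi) >= 1:
        raise ValueError("the formula assumes a stationary AR(1), |phi| < 1")
    halfwidths = []
    variance = 0.0
    for h in range(1, max_horizon + 1):
        variance += sigma ** 2 * phi ** (2 * (h - 1))  # add the h-th shock's share
        halfwidths.append(z * math.sqrt(variance))
    return halfwidths


# phi from the slides; sigma=1.0 is a HYPOTHETICAL shock standard deviation.
widths = ar1_forecast_halfwidths(phi=0.90, sigma=1.0, max_horizon=12)
```

The half-widths increase with the horizon and level off at z·σ/sqrt(1 − φ²), the unconditional spread of the series, so with high persistence the interval keeps growing over many steps before it flattens.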
21 Conclusions. The past values of a variable contain very important information about the future of that variable! Time series forecasting is a very useful tool and is easy to implement. It can be applied to a variety of fields, one of which is healthcare/medical research. In this example, we have seen that just by looking at the time series properties of the data we were able to infer the sign of the growth rate of hospital admissions!
22 To conclude, if your model fails