Sales forecasting # 2, Arthur Charpentier, arthur.charpentier@univ-rennes1.fr
Agenda
- Qualitative and quantitative methods, a very general introduction
- Series decomposition
- Short versus long term forecasting
- Regression techniques: regression and econometric methods
- Box & Jenkins ARIMA time series method
- Forecasting with ARIMA series
- Practical issues: forecasting with MSExcel
Time series decomposition: [Figure: A13 Highway monthly traffic series]
Time series decomposition: [Figure: A13 Highway, detrended series]
Time series decomposition: [Figure: A13 Highway, detrended series plotted by month]
Time series decomposition: [Figure: A13 Highway, trend and cycle]
Time series decomposition: [Figure: A13 Highway, random part]
Time series decomposition: [Figure: A13 Highway, prediction]
Time series decomposition: [Figure: A13 Highway, random part (v2)]
Time series decomposition, modeling the random part: [Figure: histogram and density of the residuals (v2)]
Time series decomposition, modeling the random part: [Figure: normal QQ plot of the residuals (v2), theoretical vs. sample quantiles]
Time series decomposition, forecasting: [Figure: A13 Highway, forecast scenario]
Time series decomposition, modeling the seasonal component: [Figure: A13 Highway, detrended series plotted by month]
Time series decomposition, modeling the seasonal component: [Figure: A13 Highway, trend and cycle]
Time series decomposition, modeling the seasonal component: [Figure: A13 Highway, random part]
Modeling the random component. The unpredictable random component is the key element when forecasting: most of the uncertainty comes from this random component $\varepsilon_t$, and the lower its variance, the smaller the uncertainty on the forecasts. The general theoretical framework for the randomness of time series is weak stationarity.
Defining stationarity. A time series $(X_t)$ is weakly stationary if
- for all $t$, $E(X_t^2) < +\infty$,
- for all $t$, $E(X_t) = \mu$, a constant independent of $t$,
- for all $t$ and all $h$, $\mathrm{cov}(X_t, X_{t+h}) = E([X_t - \mu][X_{t+h} - \mu]) = \gamma(h)$, independent of $t$.
The function $\gamma(\cdot)$ is called the autocovariance function. Given a stationary series $(X_t)$, define the autocovariance function as
$h \mapsto \gamma_X(h) = \mathrm{cov}(X_t, X_{t-h}) = E(X_t X_{t-h}) - E(X_t)\,E(X_{t-h})$,
and define the autocorrelation function as
$h \mapsto \rho_X(h) = \mathrm{corr}(X_t, X_{t-h}) = \dfrac{\mathrm{cov}(X_t, X_{t-h})}{\sqrt{V(X_t)}\,\sqrt{V(X_{t-h})}} = \dfrac{\gamma_X(h)}{\gamma_X(0)}$.
Defining stationarity. A process $(X_t)$ is said to be strongly stationary if for all $t_1, \ldots, t_n$ and all $h$ we have the equality in law
$\mathcal{L}(X_{t_1}, \ldots, X_{t_n}) = \mathcal{L}(X_{t_1+h}, \ldots, X_{t_n+h})$.
A time series $(\varepsilon_t)$ is a white noise if all autocovariances are null, i.e. $\gamma(h) = 0$ for all $h \neq 0$. Thus, a process $(\varepsilon_t)$ is a white noise if it is stationary, centred and uncorrelated, i.e. $E(\varepsilon_t) = 0$, $V(\varepsilon_t) = \sigma^2$ and $\rho_\varepsilon(h) = 0$ for any $h \neq 0$.
Statistical issues. Consider a set of observations $\{X_1, \ldots, X_T\}$. The empirical mean is defined as
$\overline{X}_T = \dfrac{1}{T}\sum_{t=1}^{T} X_t$.
The empirical autocovariance function is defined as
$\widehat{\gamma}_T(h) = \dfrac{1}{T-h}\sum_{t=1}^{T-h} \left(X_t - \overline{X}_T\right)\left(X_{t+h} - \overline{X}_T\right)$,
while the empirical autocorrelation function is defined as
$\widehat{\rho}_T(h) = \dfrac{\widehat{\gamma}_T(h)}{\widehat{\gamma}_T(0)}$.
Remark: those estimators can be biased, but they are asymptotically unbiased; more precisely, $\widehat{\gamma}_T(h) \to \gamma(h)$ and $\widehat{\rho}_T(h) \to \rho(h)$ as $T \to \infty$.
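As an illustration, the empirical autocovariance and autocorrelation above can be computed directly from these formulas. Below is a minimal sketch in Python with NumPy; the function and variable names are mine and the simulated data are purely illustrative.

```python
import numpy as np

def empirical_acf(x, max_lag):
    """Empirical autocovariance and autocorrelation, as defined above."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    x_bar = x.mean()                      # empirical mean
    gamma = np.empty(max_lag + 1)
    for h in range(max_lag + 1):
        # gamma_T(h) = 1/(T-h) * sum_{t=1}^{T-h} (X_t - mean)(X_{t+h} - mean)
        gamma[h] = np.sum((x[:T - h] - x_bar) * (x[h:] - x_bar)) / (T - h)
    rho = gamma / gamma[0]                # rho_T(h) = gamma_T(h) / gamma_T(0)
    return gamma, rho

# example on a short simulated series
rng = np.random.default_rng(1)
x = rng.normal(size=200)
gamma, rho = empirical_acf(x, max_lag=10)
print(rho.round(3))
```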
Backward and forward operators. Define the lag operator $L$ (or $B$, for backward) as the linear operator
$L : X_t \mapsto L(X_t) = L X_t = X_{t-1}$,
and the forward operator $F$,
$F : X_t \mapsto F(X_t) = F X_t = X_{t+1}$.
Note that $L \circ F = F \circ L = I$ (the identity operator), and further $F = L^{-1}$ and $L = F^{-1}$. It is possible to compose those operators: $L^2 = L \circ L$ and, more generally, $L^p = L \circ L \circ \cdots \circ L$ ($p$ times, $p \in \mathbb{N}$), with the convention $L^0 = I$. Note that $L^p(X_t) = X_{t-p}$.
Let $A$ denote a polynomial, $A(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_p z^p$. Then $A(L)$ is the operator
$A(L) = a_0 I + a_1 L + a_2 L^2 + \cdots + a_p L^p = \sum_{k=0}^{p} a_k L^k$.
Let $(X_t)$ denote a time series. The series $(Y_t)$ defined by $Y_t = A(L) X_t$ satisfies
$Y_t = A(L) X_t = \sum_{k=0}^{p} a_k X_{t-k}$,
or, more generally, assuming that we can formally take the limit,
$A(z) = \sum_{k=0}^{\infty} a_k z^k$ and $A(L) = \sum_{k=0}^{\infty} a_k L^k$.
Backward and forward operators. Note that for all moving average polynomials $A$ and $B$,
- $A(L) + B(L) = (A + B)(L)$,
- $\alpha A(L) = (\alpha A)(L)$ for $\alpha \in \mathbb{R}$,
- $A(L)\,B(L) = (AB)(L) = B(L)\,A(L)$.
The moving average $C = AB = BA$ satisfies
$\left(\sum_{k=0}^{\infty} a_k L^k\right)\left(\sum_{k=0}^{\infty} b_k L^k\right) = \sum_{i=0}^{\infty} c_i L^i$, where $c_i = \sum_{k=0}^{i} a_k b_{i-k}$.
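The coefficients $c_i = \sum_k a_k b_{i-k}$ are exactly a discrete convolution of the two coefficient sequences, which can be checked numerically. A minimal sketch, with arbitrarily chosen coefficient vectors:

```python
import numpy as np

# coefficients of A(z) and B(z), lowest degree first (a_0, a_1, ...)
a = np.array([1.0, 0.5, -0.3])
b = np.array([1.0, -0.7])

# c_i = sum_{k=0}^{i} a_k * b_{i-k}  (discrete convolution)
c = np.convolve(a, b)
print(c)   # coefficients of C(z) = A(z) B(z)

# applying A(L) to a series: Y_t = sum_k a_k X_{t-k}
x = np.arange(10, dtype=float)
y = np.convolve(x, a)[: len(x)]   # truncated; early values use X_t = 0 for t < 1
print(y)
```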
Geometry and probability. Recall that it is possible to define an inner product on $L^2$ (the space of square integrable variables, i.e. with finite variance),
$\langle X, Y\rangle = E\big([X - E(X)][Y - E(Y)]\big) = \mathrm{cov}(X, Y)$.
The associated norm is $\|X\|^2 = E\big([X - E(X)]^2\big) = V(X)$. Two random variables are then orthogonal if $\langle X, Y\rangle = 0$, i.e. $\mathrm{cov}(X, Y) = 0$. Hence the conditional expectation is simply a projection in $L^2$: $E(X \mid Y)$ is the projection of the random variable $X$ onto the space generated by $Y$, i.e. $E(X \mid Y) = \varphi(Y)$ such that $X - \varphi(Y) \perp Y$, i.e. $\langle X - \varphi(Y), Y\rangle = 0$, with
$\varphi(Y) = Z^\star = \arg\min\left\{\|X - Z\|^2 : Z = h(Y)\right\}$, $E(\varphi(Y)^2) < \infty$.
Linear projection. The conditional expectation $E(X \mid Y)$ is a projection onto the set of all functions $\{h(Y)\}$. In linear regression, the projection is made onto the subset of linear functions $h(\cdot)$. We call this linear function the conditional linear expectation, or linear projection, denoted $EL(X \mid Y)$. In purely endogenous models, the best forecast of $X_{T+1}$ given the past information $\{X_T, X_{T-1}, X_{T-2}, \ldots, X_{T-h}, \ldots\}$ is
$\widehat{X}_{T+1} = E(X_{T+1} \mid X_T, X_{T-1}, X_{T-2}, \ldots, X_{T-h}, \ldots) = \varphi(X_T, X_{T-1}, X_{T-2}, \ldots, X_{T-h}, \ldots)$.
Since estimating a nonlinear function is difficult (especially in high dimension), we focus on linear functions, i.e. autoregressive models,
$\widehat{X}_{T+1} = EL(X_{T+1} \mid X_T, X_{T-1}, X_{T-2}, \ldots, X_{T-h}, \ldots) = \alpha_0 X_T + \alpha_1 X_{T-1} + \alpha_2 X_{T-2} + \cdots$
Defining partial autocorrelations. Given a stationary series $(X_t)$, define the partial autocorrelation function $h \mapsto \psi_X(h)$ as
$\psi_X(h) = \mathrm{corr}\big(\widetilde{X}_t, \widetilde{X}_{t-h}\big)$, where
$\widetilde{X}_{t-h} = X_{t-h} - EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1})$ and
$\widetilde{X}_{t} = X_{t} - EL(X_{t} \mid X_{t-1}, \ldots, X_{t-h+1})$.
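The partial autocorrelation can be computed exactly as in this definition, by regressing $X_t$ and $X_{t-h}$ on the intermediate lags and correlating the residuals. A minimal sketch, assuming ordinary least squares for the linear projections; the function name and simulated example are mine:

```python
import numpy as np

def pacf_at_lag(x, h):
    """Partial autocorrelation at lag h: correlation between X_t and X_{t-h}
    once the linear effect of X_{t-1}, ..., X_{t-h+1} has been removed."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    rows = np.arange(h, T)
    # design matrix: a constant plus the intermediate lags X_{t-1}, ..., X_{t-h+1}
    Z = np.column_stack([np.ones(T - h)] + [x[rows - j] for j in range(1, h)])
    xt = x[h:]           # X_t
    xth = x[:T - h]      # X_{t-h}

    def resid(y):
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return y - Z @ beta

    return np.corrcoef(resid(xt), resid(xth))[0, 1]

# check on a simulated AR(1) with phi = 0.7: psi(1) ~ 0.7, psi(2) ~ 0
rng = np.random.default_rng(0)
eps = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + eps[t]
print(round(pacf_at_lag(x, 1), 3), round(pacf_at_lag(x, 2), 3))
```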
Time series decomposition, modeling the random part: [Figure: A13 Highway, random part (v2)]
Time series decomposition, modeling the random part: [Figure: autocorrelations (ACF) of the residuals (v2)]
Time series decomposition, modeling the random part: [Figure: partial autocorrelations (PACF) of the residuals (v2)]
Time series decomposition, modeling the detrended series: [Figure: A13 Highway, detrended series]
Time series decomposition, modeling the detrended series: [Figure: autocorrelations of the detrended series]
Time series decomposition, modeling the detrended series: [Figure: partial autocorrelations of the detrended series]
Time series decomposition, modeling $Y_t = X_t - X_{t-12}$: [Figure: A13 Highway, lagged (seasonally differenced) detrended series]
Time series decomposition, modeling $Y_t = X_t - X_{t-12}$: [Figure: autocorrelations of the lagged detrended series]
Time series decomposition, modeling $Y_t = X_t - X_{t-12}$: [Figure: partial autocorrelations of the lagged detrended series]
Time series decomposition, forecasting: [Figure: A13 Highway, forecasting the detrended series (ARMA)]
Estimating autocorrelations with MSExcel
A white noise. A white noise is defined as a centred process ($E(\varepsilon_t) = 0$), stationary ($V(\varepsilon_t) = \sigma^2$), such that $\mathrm{cov}(\varepsilon_t, \varepsilon_{t-h}) = 0$ for all $h \neq 0$. The so-called Box-Pierce test can be used to test
$H_0: \rho(1) = \rho(2) = \cdots = \rho(h) = 0$ against $H_a$: there exists $i$ such that $\rho(i) \neq 0$.
The idea is to use the statistic
$Q_h = T \sum_{k=1}^{h} \widehat{\rho}_k^2$,
where $h$ is the number of lags and $T$ the total number of observations. Under $H_0$, $Q_h$ has a $\chi^2$ distribution with $h$ degrees of freedom.
A white noise. Another statistic with better properties is a modified version of $Q$, the Ljung-Box statistic
$Q'_h = T(T+2) \sum_{k=1}^{h} \dfrac{\widehat{\rho}_k^2}{T-k}$.
Most software returns $Q'_h$ for $h = 1, 2, \ldots$, together with the associated p-value. If the p-value exceeds 5% (the standard significance level), we feel confident in accepting $H_0$, while if it is less than 5%, we should reject $H_0$.
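Both statistics are straightforward to compute from the empirical autocorrelations. A minimal sketch assuming SciPy is available for the $\chi^2$ p-value; the helper name and simulated data are mine:

```python
import numpy as np
from scipy.stats import chi2

def portmanteau(x, h):
    """Box-Pierce Q and Ljung-Box Q' statistics up to lag h, with p-values."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc ** 2) / T
    rho = np.array([np.sum(xc[k:] * xc[:T - k]) / T / gamma0
                    for k in range(1, h + 1)])
    Q = T * np.sum(rho ** 2)                                         # Box-Pierce
    Qp = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, h + 1)))  # Ljung-Box
    return Q, 1 - chi2.cdf(Q, h), Qp, 1 - chi2.cdf(Qp, h)

rng = np.random.default_rng(123)
eps = rng.normal(size=500)          # a simulated white noise
print(portmanteau(eps, h=10))       # p-values should be well above 5%
```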
A white noise: [Figure: simulated white noise path, with its sample autocorrelations and partial autocorrelations]
Time series decomposition, testing for white noise: [Figure: Box-Pierce statistics and p-values for the lagged detrended series]
Time series decomposition, testing for white noise: [Figure: Box-Pierce statistics and p-values for the residuals (v2)]
Autoregressive process AR(p). We call an autoregressive process of order $p$, denoted $AR(p)$, a stationary process $(X_t)$ satisfying
$X_t - \sum_{i=1}^{p} \phi_i X_{t-i} = \varepsilon_t$ for all $t \in \mathbb{Z}$, (1)
where the $\phi_i$'s are real-valued coefficients and $(\varepsilon_t)$ is a white noise process with variance $\sigma^2$. Equation (1) can be written equivalently $\Phi(L) X_t = \varepsilon_t$, where $\Phi(L) = I - \phi_1 L - \cdots - \phi_p L^p$.
Autoregressive process AR(1), order 1. The general expression of an $AR(1)$ process is
$X_t - \phi X_{t-1} = \varepsilon_t$ for all $t \in \mathbb{Z}$,
where $(\varepsilon_t)$ is a white noise with variance $\sigma^2$. If $\phi = \pm 1$, the process $(X_t)$ is not stationary. E.g. if $\phi = 1$, $X_t = X_{t-1} + \varepsilon_t$ (called a random walk) can be written
$X_t - X_{t-h} = \varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_{t-h+1}$,
and thus $E(X_t - X_{t-h})^2 = h\sigma^2$. But it is possible to prove that for any stationary process, $E(X_t - X_{t-h})^2 \le 4 V(X_t)$. Since it is impossible to have $h\sigma^2 \le 4V(X_t)$ for any $h$, the process cannot be stationary.
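The growth of $E(X_t - X_{t-h})^2$ with $h$ for a random walk can be checked by simulation; a small Monte Carlo sketch (the sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0
n_paths, T = 5000, 200

# simulate many random walks X_t = X_{t-1} + eps_t, with X_0 = 0
eps = rng.normal(scale=sigma, size=(n_paths, T))
X = np.cumsum(eps, axis=1)

for h in (1, 5, 20, 50):
    # Monte Carlo estimate of E(X_t - X_{t-h})^2 at the end of the sample
    mse = np.mean((X[:, -1] - X[:, -1 - h]) ** 2)
    print(h, round(mse, 2), "theory: h * sigma^2 =", h * sigma ** 2)
```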
Autoregressive process AR(1), order 1. If $|\phi| < 1$, it is possible to invert the lag polynomial,
$X_t = (1 - \phi L)^{-1} \varepsilon_t = \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}$ (a function of the past of $(\varepsilon_t)$). (2)
For this stationary process, the autocorrelation function is given by $\rho(h) = \phi^h$. Further, the partial autocorrelations satisfy $\psi(1) = \phi$ and $\psi(h) = 0$ for $h \ge 2$.
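A small simulation comparing the sample autocorrelations of an AR(1) path with the theoretical values $\rho(h) = \phi^h$; the sample size, coefficient and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
phi, T = 0.7, 5000

# simulate X_t = phi * X_{t-1} + eps_t
eps = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

xc = x - x.mean()
gamma0 = np.sum(xc ** 2) / T
for h in range(1, 6):
    rho_hat = np.sum(xc[h:] * xc[:T - h]) / T / gamma0
    print(h, round(rho_hat, 3), "theory:", round(phi ** h, 3))
```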
An AR(1) process, $X_t = 0.7\,X_{t-1} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
An AR(1) process, $X_t = 0.4\,X_{t-1} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
An AR(1) process, $X_t = -0.5\,X_{t-1} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
An AR(1) process, $X_t = 0.99\,X_{t-1} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
Autoregressive process AR(2), order 2. Those processes are also called Yule processes, and they satisfy
$\left(1 - \phi_1 L - \phi_2 L^2\right) X_t = \varepsilon_t$,
where the roots of $\Phi(z) = 1 - \phi_1 z - \phi_2 z^2$ are assumed to lie outside the unit circle, i.e.
$1 - \phi_1 - \phi_2 > 0$, $1 + \phi_1 - \phi_2 > 0$ and $|\phi_2| < 1$.
Autoregressive process AR(2), order 2. The autocorrelation function satisfies the recurrence
$\rho(h) = \phi_1 \rho(h-1) + \phi_2 \rho(h-2)$ for any $h \ge 2$,
and the partial autocorrelation function satisfies
$\psi(h) = \rho(1)$ for $h = 1$, $\psi(h) = \left[\rho(2) - \rho(1)^2\right] / \left[1 - \rho(1)^2\right]$ for $h = 2$, and $\psi(h) = 0$ for $h \ge 3$.
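The recursion above gives the whole autocorrelation function of an AR(2) once $\rho(0) = 1$ and $\rho(1)$ are known; from the Yule-Walker equation at $h = 1$, $\rho(1) = \phi_1 / (1 - \phi_2)$. A minimal sketch implementing this, with arbitrarily chosen coefficients:

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations of an AR(2), via the recursion
    rho(h) = phi1 * rho(h-1) + phi2 * rho(h-2) for h >= 2."""
    rho = [1.0, phi1 / (1.0 - phi2)]       # rho(0), rho(1) from Yule-Walker
    for h in range(2, max_lag + 1):
        rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])
    return rho

print([round(r, 3) for r in ar2_acf(0.6, -0.35, 10)])
```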
An AR(2) process, $X_t = 0.6\,X_{t-1} - 0.35\,X_{t-2} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
An AR(2) process, $X_t = 0.4\,X_{t-1} - 0.5\,X_{t-2} + \varepsilon_t$: [Figure: simulated path, sample ACF and PACF]
Moving average process MA(q). We call a moving average process of order $q$, denoted $MA(q)$, a stationary process $(X_t)$ satisfying
$X_t = \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$ for all $t \in \mathbb{Z}$, (3)
where the $\theta_i$'s are real-valued coefficients and $(\varepsilon_t)$ is a white noise process with variance $\sigma^2$. Processes (3) can be written equivalently $X_t = \Theta(L)\,\varepsilon_t$, where $\Theta(L) = I + \theta_1 L + \cdots + \theta_q L^q$. The autocovariance function satisfies
$\gamma(h) = E(X_t X_{t-h}) = E\big([\varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}][\varepsilon_{t-h} + \theta_1\varepsilon_{t-h-1} + \cdots + \theta_q \varepsilon_{t-h-q}]\big)$
$= \left[\theta_h + \theta_{h+1}\theta_1 + \cdots + \theta_q\theta_{q-h}\right]\sigma^2$ if $1 \le h \le q$, and $\gamma(h) = 0$ if $h > q$.
Moving average process MA(q). If $h = 0$, then $\gamma(0) = \left[1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2\right]\sigma^2$. This can be written
$\gamma(k) = \sigma^2 \sum_{j=0}^{q} \theta_j \theta_{j+k}$, with the convention $\theta_0 = 1$ (and $\theta_j = 0$ for $j > q$).
The autocorrelation function satisfies
$\rho(h) = \dfrac{\theta_h + \theta_{h+1}\theta_1 + \cdots + \theta_q\theta_{q-h}}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2}$ if $1 \le h \le q$, and $\rho(h) = 0$ if $h > q$.
Moving average process MA(1), order 1. The general expression of an $MA(1)$ process is
$X_t = \varepsilon_t + \theta\varepsilon_{t-1}$, for all $t \in \mathbb{Z}$,
where $(\varepsilon_t)$ is a white noise with variance $\sigma^2$. Autocorrelations are given by
$\rho(1) = \dfrac{\theta}{1 + \theta^2}$, and $\rho(h) = 0$ for $h \ge 2$.
Note that $-1/2 \le \rho(1) \le 1/2$: $MA(1)$ processes only have small autocorrelations. The partial autocorrelation of order $h$ is given by
$\psi(h) = \dfrac{(-1)^h\,\theta^h\,(\theta^2 - 1)}{1 - \theta^{2(h+1)}}$.
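The theoretical MA(q) autocorrelations from the formula above can be packaged in a small Python function; for an MA(1), the lag-one value is $\theta/(1+\theta^2)$ and everything beyond is zero. The example coefficients below are arbitrary:

```python
import numpy as np

def ma_acf(theta, max_lag):
    """Theoretical MA(q) autocorrelations:
    rho(h) = sum_j theta_j * theta_{j+h} / sum_j theta_j^2,
    with theta_0 = 1 and theta_j = 0 for j > q."""
    th = np.r_[1.0, np.asarray(theta, dtype=float)]   # (theta_0, ..., theta_q)
    q = len(th) - 1
    denom = np.sum(th ** 2)
    rho = []
    for h in range(max_lag + 1):
        num = np.sum(th[: q - h + 1] * th[h:]) if h <= q else 0.0
        rho.append(num / denom)
    return rho

print([round(r, 3) for r in ma_acf([0.7], 4)])        # MA(1): 1, 0.7/1.49, 0, 0, 0
print([round(r, 3) for r in ma_acf([-0.6, 0.3], 4)])  # an MA(2) example
```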
An MA(1) process, $X_t = \varepsilon_t + 0.7\,\varepsilon_{t-1}$: [Figure: simulated path, sample ACF and PACF]
An MA(1) process, $X_t = \varepsilon_t - 0.6\,\varepsilon_{t-1}$: [Figure: simulated path, sample ACF and PACF]
Autoregressive moving average process ARMA(p, q). We call an autoregressive moving average process of orders $p$ and $q$, denoted $ARMA(p, q)$, a stationary process $(X_t)$ satisfying
$X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$ for all $t \in \mathbb{Z}$, (4)
where the $\phi_j$'s and $\theta_i$'s are real-valued coefficients and $(\varepsilon_t)$ is a white noise process with variance $\sigma^2$. Processes (4) can be written equivalently $\Phi(L) X_t = \Theta(L)\,\varepsilon_t$, where $\Phi(L) = I - \phi_1 L - \cdots - \phi_p L^p$ and $\Theta(L) = I + \theta_1 L + \cdots + \theta_q L^q$.
Autoregressive moving average process ARMA(p, q). Note that under some technical assumptions, one can write $X_t = \Phi^{-1}(L)\,\Theta(L)\,\varepsilon_t$, i.e. the $ARMA(p, q)$ process is also an $MA(\infty)$ process, and $\Theta^{-1}(L)\,\Phi(L)\,X_t = \varepsilon_t$, i.e. the $ARMA(p, q)$ process is also an $AR(\infty)$ process. Wold's theorem states that any stationary process (satisfying further technical conditions) can be written as an MA process. More generally, in practice, a stationary series can be modeled either by an $AR(p)$ process, an $MA(q)$ process, or an $ARMA(p', q')$ process with $p' < p$ and $q' < q$.
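In practice, ARMA coefficients are estimated with statistical software (these slides use MSExcel); the sketch below shows the equivalent steps in Python, assuming statsmodels is available, on a purely illustrative simulated ARMA(1,1):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# simulate an ARMA(1,1): X_t = 0.7 X_{t-1} + eps_t + 0.4 eps_{t-1}
rng = np.random.default_rng(0)
T = 1000
eps = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]

# fit an ARMA(1,1), i.e. ARIMA with orders (p=1, d=0, q=1)
model = ARIMA(x, order=(1, 0, 1))
res = model.fit()
print(res.params)               # estimated constant, AR and MA coefficients, variance
print(res.forecast(steps=12))   # point forecasts for the next 12 periods
```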
An ARMA(1, 1) process, $X_t = 0.7\,X_{t-1} + \varepsilon_t - 0.6\,\varepsilon_{t-1}$: [Figure: simulated path, sample ACF and PACF]
An ARMA(2, 1) process, $X_t = 0.7\,X_{t-1} - 0.2\,X_{t-2} + \varepsilon_t - 0.6\,\varepsilon_{t-1}$: [Figure: simulated path, sample ACF and PACF]
Fitting ARMA processes with MSExcel
Forecasting with AR(1) processes. Consider an $AR(1)$ process, $X_t = \mu + \phi X_{t-1} + \varepsilon_t$. Then
${}_T\widehat{X}_{T+1} = \mu + \phi X_T$,
${}_T\widehat{X}_{T+2} = \mu + \phi\,{}_T\widehat{X}_{T+1} = \mu + \phi\left[\mu + \phi X_T\right] = \mu\left[1 + \phi\right] + \phi^2 X_T$,
${}_T\widehat{X}_{T+3} = \mu + \phi\,{}_T\widehat{X}_{T+2} = \mu + \phi\left[\mu + \phi\left[\mu + \phi X_T\right]\right] = \mu\left[1 + \phi + \phi^2\right] + \phi^3 X_T$,
and recursively ${}_T\widehat{X}_{T+h}$ can be written
${}_T\widehat{X}_{T+h} = \mu + \phi\,{}_T\widehat{X}_{T+h-1} = \mu\left[1 + \phi + \phi^2 + \cdots + \phi^{h-1}\right] + \phi^h X_T$,
or equivalently, since $1 + \phi + \phi^2 + \cdots + \phi^{h-1} = \dfrac{1 - \phi^h}{1 - \phi}$,
${}_T\widehat{X}_{T+h} = \mu\,\dfrac{1 - \phi^h}{1 - \phi} + \phi^h X_T = \dfrac{\mu}{1 - \phi} + \phi^h\left[X_T - \dfrac{\mu}{1 - \phi}\right]$.
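The recursion and its closed form can be checked numerically; a minimal sketch (the values of $\mu$, $\phi$ and $X_T$ are arbitrary):

```python
def ar1_forecasts(mu, phi, x_T, horizon):
    """Recursive AR(1) forecasts: X_hat(T+h) = mu + phi * X_hat(T+h-1)."""
    forecasts, prev = [], x_T
    for _ in range(horizon):
        prev = mu + phi * prev
        forecasts.append(prev)
    return forecasts

mu, phi, x_T = 2.0, 0.8, 15.0
rec = ar1_forecasts(mu, phi, x_T, 5)
# closed form: mu/(1-phi) + phi^h * (x_T - mu/(1-phi))
closed = [mu / (1 - phi) + phi ** h * (x_T - mu / (1 - phi)) for h in range(1, 6)]
print([round(v, 4) for v in rec])
print([round(v, 4) for v in closed])   # identical values
```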
Forecasting with AR(1) processes. The forecasting error made at time $T$ for horizon $h$ is
$\Delta_h = {}_T\widehat{X}_{T+h} - X_{T+h} = {}_T\widehat{X}_{T+h} - \left[\phi X_{T+h-1} + \mu + \varepsilon_{T+h}\right] = \cdots = -\left(\varepsilon_{T+h} + \phi\varepsilon_{T+h-1} + \cdots + \phi^{h-1}\varepsilon_{T+1}\right)$,
with variance
$V(\Delta_h) = \left[1 + \phi^2 + \phi^4 + \cdots + \phi^{2h-2}\right]\sigma^2$, where $V(\varepsilon_t) = \sigma^2$.
Thus the variance of the forecast error is increasing with the horizon.
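This variance is what drives the width of prediction intervals: assuming Gaussian innovations, an approximate 95% interval at horizon $h$ is the point forecast plus or minus 1.96 standard errors. A sketch reusing the closed-form forecast above (the numerical values are arbitrary):

```python
import math

def ar1_interval(mu, phi, x_T, sigma, h):
    """Point forecast and approximate 95% prediction interval for an AR(1)
    at horizon h, assuming Gaussian innovations with variance sigma^2."""
    m = mu / (1 - phi)
    point = m + phi ** h * (x_T - m)
    # V(Delta_h) = (1 + phi^2 + ... + phi^(2h-2)) * sigma^2
    var = sigma ** 2 * sum(phi ** (2 * k) for k in range(h))
    se = math.sqrt(var)
    return point, point - 1.96 * se, point + 1.96 * se

for h in (1, 3, 12):
    print(h, ar1_interval(mu=2.0, phi=0.8, x_T=15.0, sigma=1.0, h=h))
```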