Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing any further analysis of the TS. For example, the fluctuations in the UK unemployment data and the sales of an industrial heater data are considerably reduced (evened) by log transformation; compare Figures 1.2 and 2.1 and Figures 1.7 and 2.2. Our aim is to estimate and eliminate the deterministic components m(t) and s(t) in such a way that the random noise Y t is a stationary process, in the sense that its fluctuations are stable and it has no trend. A formal definition of stationarity will be given in Chapter 4. Such a process can be modelled and analyzed using the well developed theory of stationary TS. All these components are parts of the original TS and their overall analysis leads to the analysis of X t. 2.1 Elimination of Trend in the Absence of Seasonality When there is no seasonality, the model (1.1) simplifies to X t = m t + Y t, t = 0, 1,...,n, (2.1) where E(X t ) = m t, that is, we assume that E(Y t ) = 0. 2.1.1 Least Square Estimation of Trend We assume that m(t) = m(t,β) is a function of time t, such as a polynomial of degree k, depending on some unknown parameters β = (β 0,β 1,...,β k ) T. We 11
12 CHAPTER 2. TREND AND SEASONAL COMPONENTS 2.5 2.0 Log(Unemployment) 1.5 1.0 0.5 0.0 1960 1970 1980 1990 2000 Year Figure 2.1: Natural logarithm of the UK unemployment data, compare Fig. 1.2 7 Log(Sales) 6 5 4 3 65 66 67 68 69 70 71 Year Figure 2.2: Natural logarithm of the data: Sales of an industrial heater.
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 13 estimate the parameters by fitting the function to the data x t, that is by minimizing the sum of squares of differences x t m(t,β). To find a general form of the estimator we work with the rvs X t rather than their realizations x t. We find the estimator β of β by minimizing the function: S(β) = (X t m(t,β)) 2. (2.2) For example, consider a linear trend m(t,β 0,β 1 ) = β 0 + β 1 t. Then S(β) = (X t β 0 β 1 t) 2. To minimize S(β) with respect to the parameters β = (β 0,β 1 ) T we need to calculate the derivatives of S(β) with respect to β i, i = 0, 1, compare them to zero and solve the resulting equations, i.e., ds(β) dβ 0 ds(β) dβ 1 = 2 = 2 (X t β 0 β 1 t) = 0 (X t β 0 β 1 t)t = 0. This can be written as X t nβ 0 β 1 X t t β 0 t = 0 t β 1 t 2 = 0. (2.3) Solving the first equation for β 0 we obtain β 0 = 1 n X t β 1 1 n t = X t β 1 t. Hence, equations (2.3) can be written as { β0 = X t β 1 t nx t t β 0 n t β 1 nt 2 = 0.
14 CHAPTER 2. TREND AND SEASONAL COMPONENTS 30 Residuals 10-10 -30 1960 1970 1980 1990 2000 Year Figure 2.3: Residuals of a linear fit to the UK consumption data; compare Fig. 1.8 Then substituting β 0 to the second equation we can calculate the formula for β 1 and obtain the estimators of both parameters β 0 = X t β 1 t β 1 = X tt X t t t 2 ( t). 2 We need to distinguish an estimator as a function of rvs from an estimate (i.e., a value of the function for a given set of data). This should be obvious from the context. Fitting a linear trend to the UK consumption data (see Figure 1.8) we obtain the following estimates Hence the estimate of the trend is The random noise is then estimated as β 0 = 12349.7, β1 = 6.3754. m(t) = 12349.7 + 6.3754t. ŷ t = x t m t = x t ( 12349.7 + 6.3754t). The plot (see Figure 2.3) of the residuals does not show any clear trend. However it indicates correlations of the random noise. In this course we will be modelling
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 15-30 -10 10 30-30 -10 10 30 30 LS_RegressResids 10-10 -30 30 10-10 LS_Residuals_Lag1-30 30 LS_Residuals_Lag2 10-10 -30 30 10-10 LS_Residuals_Lag3-30 -30-10 10 30-30 -10 10 30 Figure 2.4: Correlation matrix for the residuals of the Least Squares Regression fit; UK consumption data. this kind of random variables to be able to predict future values of the TS, for example next year values of consumption in the UK. The matrix plot shows that indeed the residuals are dependent, at least up to two neighboring values.
16 CHAPTER 2. TREND AND SEASONAL COMPONENTS 2.1.2 Smoothing by a Moving Average Let q be a positive integer. We call W t = 1 2q + 1 q X t+j (2.4) a symmetric two-sided moving average of the process {X t }. Simple example Take the following realization of {X t }: {x t } = {5 3 2 4 5 6 7 8 3 9} For q = 1 we have the following realizations of W t : w 1 =? w 2 = 1 10 (5 + 3 + 2) = 3 3 w 3 = 1 3 (3 + 2 + 4) = 9 3 w 4 = 1 11 (2 + 4 + 5) = 3 3. w 9 = 1 20 (8 + 3 + 9) = 3 3 w 10 =? Equation (2.4) defines W t for q + 1 t n q. Problem: How to calculate values of W t for t q and for t > n q? One possibility is to define X t = X 1 for t < 1 and X t = X n for t > n. Then in the simple example we would get x 0 = x 1 = 5 and x 11 = x 10 = 9 and w 1 = 1 3 (5 + 5 + 3) = 13 3 w 10 = 1 3 (3 + 9 + 9) = 21 3. What can we achieve calculating moving average? If X t = m t + Y t, then W t = 1 2q + 1 q m t+j + 1 2q + 1 q Y t+j.
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 17 Changes in Temperature and Moving Average Series 0.4 0.0-0.4-0.8 1880 1900 1920 1940 1960 1980 Year Figure 2.5: Surface air temperature change for the globe. The TS data for years 1880-1985 [Degrees Celsius] and a symmetric five elements moving average. If we can assume that the sum of noise values is close to zero, then W t evaluates the trend m t. It is a good evaluation of m t if, over [t q,t + q], the trend is approximately linear. It is used as an estimator of trend and we write m t = W t = 1 2q + 1 Then, the estimator of the noise is q X t+j for q + 1 t n q. Ŷ t = X t W t. The moving average values w t for q = 2 for the surface air temperature change for the globe data are shown in Figure 2.5 and the residuals y t = x t w t in Figure 2.6. There is no apparent trend in the residuals. The process {W t } is a linear function (linear filter) of random variables {X t }. In general, it may be written as W t = a j X t+j, (2.5) where j= 1 for q j q, a j = 2q + 1 0 for j > q,
18 CHAPTER 2. TREND AND SEASONAL COMPONENTS Residuals 0.3 0.1-0.1-0.3 1880 1900 1920 1940 1960 1980 Year Figure 2.6: Residuals after removing trend calculated as a symmetric five elements moving average. Surface air temperature change for the globe data, 1880-1985. if W t is a symmetric two-sided ma. Graphically, it could be presented as in Figure 2.7. It smoothes the data by averaging, that is by removing rapid fluctuations (bursts). It attenuates the noise. The set of coefficients {a j },...,q is also called a linear filter. If q m t = a j X t+j then we say that the filter {a j } passes through without distortion. The weights {a j } neither have to be all equal nor symmetric. By applying appropriate weights it is possible to include a wide choice of trend functions. Suppose that the trend follows a polynomial of order three and we want to find a seven points filter {a j } j= 3,...,3, that is a moving average W t = a j X t+j. j= 3
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 19 {x t } {w Linear Filter Σa j x t } t+j If the trend is Figure 2.7: Graphical representation of a linear filter m t = β 0 + β 1 t + β 2 t 2 + β 3 t 3, then a filter which passes through without distortion is such that m t = W t and so a j X t+j = β 0 + β 1 t + β 2 t 2 + β 3 t 3. j= 3 We are interested in the middle point, i.e., for j = 0. It is convenient to take values of t such that the middle one is zero. For a seven point average they will be t = 3, 2, 1, 0, 1, 2, 3. Then the trend at the middle point, and so the value of the moving average, is m(t = 0) = β 0. To find β 0 we may apply the Least Squares method for the seven points. Differentiating the function ( S(β) = Xt (β 0 + β 1 t + β 2 t 2 + β 3 t 3 ) ) 2 with respect to vector β gives the following four equations S(β) ( = 2 Xt (β 0 + β 1 t + β 2 t 2 + β 3 t 3 ) ) = 0 β 0 S(β) β 1 = 2 S(β) β 2 = 2 S(β) β 3 = 2 ( Xt (β 0 + β 1 t + β 2 t 2 + β 3 t 3 ) ) t = 0 ( Xt (β 0 + β 1 t + β 2 t 2 + β 3 t 3 ) ) t 2 = 0 ( Xt (β 0 + β 1 t + β 2 t 2 + β 3 t 3 ) ) t 3 = 0.
20 CHAPTER 2. TREND AND SEASONAL COMPONENTS After simple manipulation we obtain ( X t = β0 + β 1 t + β 2 t 2 + β 3 t 3) X t t = X t t 2 = X t t 3 = ( β0 + β 1 t + β 2 t 2 + β 3 t 3) t ( β0 + β 1 t + β 2 t 2 + β 3 t 3) t 2 ( β0 + β 1 t + β 2 t 2 + β 3 t 3) t 3. However, odd powers of t sum to zero, that is t = t 3 = t 5 = 0, what simplifies the equations to X t = 7β 0 + 28β 2 X t t = 28β 1 + 196β 3 X t t 2 = 28β 0 + 196β 2 X t t 3 = 196β 1 + 1588β 3 of which only equations 1 and 3 involve β 0, so we can ignore the other two equations. Solving equations 1 and 3 for β 0 then gives β 0 = 1 21 ( 2X 3 + 3X 2 + 6X 1 + 7X 0 + 6X 1 + 3X 2 2X 3 ). The trend value for any point is then the weighted average of the seven points of which that point is in the centre, m t = W t = 1 21 ( 2X t 3 + 3X t 2 + 6X t 1 + 7X t + 6X t+1 + 3X t+2 2X t+3 ).
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 21 The weights are ( 2 21, 3 21, 6 21, 7 21, 7 21, 3 21, 2 ) = 1 ( 2, 3, 6, 7, 6, 3, 2), 21 21 which due to the symmetry can be written in a short way, such as 1 ( 2, 3, 6, 7), 21 the bold typeface indicating the middle weight. In general, if we want to find a linear filter {a j } appropriate for fitting a polynomial trend of order k, we have to minimize S(β) = q ( Xt (β 0 + β 1 t +... + β k t k ) ) 2 t= q and find the solution for β 0 in terms of X t, t = q,...,q. The following filters are often applied: {a j } being successive terms in the expansion ( 1 + 1 2 2 )2q. For q = 1 we get ( 1 {a j } j= 1,0,1 = 4, 1 2, 1 ). 4 For large q the weights approximate to a normal curve. So called Spencer s 15-point moving average (used for smoothing mortality statistics to obtain life tables). Here q = 7 and the weights are {a j } j= 7,...,7 = 1 ( 3, 6, 5, 3, 21, 46, 67,74). 320 The Henderson s moving average, which aims to follow a cubic polynomial trend. For example, for q = 4 the Henderson s m.a. is {a j } j= 4,...,4 = ( 0.041, 0.010, 0.119, 0.267,0.330). Having estimated the trend we can calculate the residuals, as in the previous case, i.e., q q ŷ t = x t w t = x t a j x t+j = b j x t+j,
22 CHAPTER 2. TREND AND SEASONAL COMPONENTS x t Σaj x t+j w t Σb k w t+k z t where Figure 2.8: Graphical representation of a double linear filter and if a j = 1 then b j = 0. { aj, for j = q,...,q,j 0 b j = 1 a 0, for j = 0 Sometimes it may be necessary to apply filters more than once to obtain a stationary process. Two filters might be represented as in Figure 2.8. The resulting time series {Z t } can be written as s s q Z t = b k W t+k = a j X t+k+j = k= s k= s b k For example, for s = q = 1 we obtain 1 1 Z t = a j X t+k+j = k= 1 b k j= 1 s+q i= (s+q) c i X t+i. b 1 a 1 X t 2 + (b 1 a 0 + b 0 a 1 )X t 1 + (b 1 a 1 + b 0 a 0 + b 1 a 1 )X t + 2 (b 0 a 1 + b 1 a 0 )X t+1 + b 1 a 1 X t+2 = c i X t+i. i= 2 The weights {c i } can be obtained by so called convolution of the weights {a j } and {b k }, the operation denoted by, {c i } = {a j } {b k }, where c i is a sum of all products a j b k for which j + k = i. Using this operation in the above example gives {c i } = (a 1,a 0,a 1 ) (b 1,b 0,b 1 ) = (a 1 b 1, a 0 b 1 + a 1 b 0, a 1 b 1 + a 0 b 0 + a 1 b 1, a 0 b 1 + a 1 b 0, a 1 b 1 ) = (c 2,c 1,c 0,c 1,c 2 ).
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 23 If one of the moving averages is not symmetric, we calculate the convoluted weights in the same way, see the example below. (a 1,a 0,a 1 ) (b 0,b 1 ) = (a 1 b 0, a 0 b 0 + a 1 b 1, a 0 b 1 + a 1 b 0, a 1 b 1 ) = (c 1,c 0,c 1,c 2 ). With {a j } = ( 1, 1, ) 1 3 3 3 and {bk } = ( 1 {c i } = ( 1 3, 1 3, 1 ) 3, 1 2 2) we get ( 1 2, 1 ) = 2 and the new series Z t is the following combination of X t ( 1 6, 1 3, 1 3, 1 ). 6 Z t = 1 6 X t 1 + 1 3 X t + 1 3 X t+1 + 1 6 X t+2. 2.1.3 Differencing Differencing is a trend removing operation by using special kind of a linear filter with weights ( 1, 1). This procedure is often repeated until a stationary series is obtained. We denote the first (lag 1) difference operator by, that is X t = X t X t 1. Here another operator comes into play, it is called the backward shift operator, usually denoted by B, such that BX t = X t 1. Hence, Note that In general Similarly, X t = X t BX t = (1 B)X t. X t 2 = BX t 1 = B(BX t ) = B 2 X t. B j X t = X t j. 2 X t = ( X t ) = (1 B)(1 B)X t = (1 2B + B 2 )X t = X t 2X t 1 + X t 2.
24 CHAPTER 2. TREND AND SEASONAL COMPONENTS x t - x t-1 30 20 10 6.65 0-10 -20 1960 1970 1980 1990 2000 Year Figure 2.9: The differenced data: UK Consumption, compare Figures 1.1 and 1.8 By the way, this is what we would get by convolution of two linear filters with weights ( 1, 1) (a 1,a 0 ) (b 1,b 0 ) = ( 1, 1) ( 1, 1) = (( 1) ( 1), ( 1) 1 + 1 ( 1), 1 1) = (1, 2, 1) = (c 2, c 1, c 0 ). In general, we may write j X t = ( j 1 X t ), for j 1, 0 X t = X t. Assume that a TS model is X t = m t + Y t, where the trend m t is a polynomial of degree k. Then for k = 1 we have X t = X t X t 1 = m t + Y t (m t 1 + Y t 1 ) = β 0 + β 1 t [β 0 + β 1 (t 1)] + Y t = β 1 + Y t.
2.1. ELIMINATION OF TREND IN THE ABSENCE OF SEASONALITY 25 Similarly, for k = 2 we obtain 2 X t = X t 2X t 1 + X t 2 = m t + Y t 2(m t 1 + Y t 1 ) + m t 2 + Y t 2 = β 0 + β 1 t + β 2 t 2 2[β 0 + β 1 (t 1) + β 2 (t 1) 2 ] + β 0 + β 1 (t 2) + β 2 (t 2) 2 + 2 Y t = 2β 2 + 2 Y t. For a polynomial trend of degree k we have k X t = k!β k + k Y t. It means that if the noise fluctuates about zero, then k-th differencing of a TS with a polynomial trend of degree k should give a stationary process with mean about k!β k. Example UK consumption data. The data show a linear trend. The least squares methods gives the following estimates ˆm(t) = 12349.7 + 6.3754t and we can see from the graph of the differenced data (Figure 2.9) that the mean is close to β 1 and that there is no trend.