Cointegration - general discussion


Coint - 1

Definitions: A time series that requires d differences to make it stationary is said to be "integrated of order d". If the d-th difference has p autoregressive and q moving average terms, the differenced series is said to be ARMA(p,q) and the original integrated series to be ARIMA(p,d,q). Two series $X_t$ and $Y_t$ that are integrated of order d may, through linear combination, produce a series $aX_t + bY_t$ which is stationary (or integrated of order smaller than d), in which case we say that $X_t$ and $Y_t$ are cointegrated and we refer to $(a, b)$ as the cointegrating vector. Granger and Weiss discuss this concept and terminology.

An example: if $X_t$ and $Y_t$ are wages in two similar industries, we may find that both are unit root processes. We may, however, reason that by virtue of the similar skills and easy transfer between the two industries, the difference $X_t - Y_t$ cannot vary too far from 0 and thus certainly should not be a unit root process. The cointegrating vector is specified by our theory to be $(1, -1)$, or $(-1, 1)$, or $(c, -c)$, all of which are equivalent. The test for cointegration here consists of simply testing the original series for unit roots, not rejecting the unit root null, then testing the $X_t - Y_t$ series and rejecting the unit root null. We just use the standard D-F tables for all these tests (see the simulation sketch at the end of this page). The reason we can use these D-F tables is that the cointegrating vector was specified by our theory, not estimated from the data.

Numerical examples:  $Y_t = A\,Y_{t-1} + \varepsilon_t$

1. Bivariate, stationary:
$$\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix} = \begin{pmatrix} 1.2 & -0.3 \\ 0.4 & 0.5 \end{pmatrix}\begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}, \qquad \varepsilon_t \sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}4 & 1\\ 1 & 3\end{pmatrix}\right)$$

$$|A - \lambda I| = \begin{vmatrix} 1.2-\lambda & -0.3 \\ 0.4 & 0.5-\lambda \end{vmatrix} = \lambda^2 - 1.7\lambda + 0.6 + 0.12 = (\lambda - 0.9)(\lambda - 0.8)$$

Both roots are less than 1 in magnitude, so this is stationary (note: the distribution of $\varepsilon_t$ has no impact on this).
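Here is a minimal simulation sketch of the prespecified-vector testing scheme described above, in the notes' SAS idiom. Everything in it (data set name, parameter values, lag choices) is hypothetical rather than from the notes: two wage-like series share one random-walk trend, and only the prespecified difference should pass the stationarity test.

DATA WAGES;
   CALL STREAMINIT(12345);
   W = 0; S = 0;
   DO T = 1 TO 200;
      W + RAND('NORMAL');             /* common random-walk trend              */
      S = 0.5*S + RAND('NORMAL');     /* stationary gap between industries     */
      X = W + 0.5*S;                  /* wage series 1: unit root              */
      Y = W - 0.5*S;                  /* wage series 2: unit root              */
      D = X - Y;                      /* prespecified combination (1, -1)      */
      OUTPUT;
   END;
RUN;
PROC ARIMA DATA=WAGES;
   I VAR=X STATIONARITY=(ADF=(1));    /* expect: cannot reject unit root       */
   I VAR=Y STATIONARITY=(ADF=(1));
   I VAR=D STATIONARITY=(ADF=(1));    /* expect: reject unit root (stationary) */
RUN;

Because the vector (1, -1) was fixed in advance, the tau statistics from all three tests are legitimately compared with the ordinary D-F tables.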

Coint - 2

2. Bivariate, nonstationary:
$$\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix} = \begin{pmatrix} 1.2 & -0.5 \\ 0.2 & 0.5 \end{pmatrix}\begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$$

$$|A - \lambda I| = \begin{vmatrix} 1.2-\lambda & -0.5 \\ 0.2 & 0.5-\lambda \end{vmatrix} = \lambda^2 - 1.7\lambda + 0.6 + 0.1 = (\lambda - 1)(\lambda - 0.7)$$

so this is a unit root process. Using the spectral decomposition (eigenvalues, eigenvectors) of A we have
$$A = \begin{pmatrix} 1.2 & -0.5 \\ 0.2 & 0.5 \end{pmatrix} = X\Lambda X^{-1} = \begin{pmatrix} 1/\sqrt{1.16} & 1/\sqrt{2} \\ 0.4/\sqrt{1.16} & 1/\sqrt{2} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.7 \end{pmatrix}\begin{pmatrix} 1/\sqrt{1.16} & 1/\sqrt{2} \\ 0.4/\sqrt{1.16} & 1/\sqrt{2} \end{pmatrix}^{-1}$$

Define $W_t = X^{-1}Y_t$. Then
$$X^{-1}Y_t = (X^{-1}AX)\,X^{-1}Y_{t-1} + X^{-1}\varepsilon_t \qquad\Longrightarrow\qquad W_t = \Lambda\,W_{t-1} + \eta_t$$
and the components of the $W_t$ vector satisfy
$$W_{1t} = W_{1,t-1} + \eta_{1t} \qquad\text{common trend (unit root)}$$
$$W_{2t} = 0.7\,W_{2,t-1} + \eta_{2t} \qquad\text{(stationary root)}$$
In terms of these components,
$$Y_{1t} = x_{11}W_{1t} + x_{12}W_{2t} = (1/\sqrt{1.16})\,W_{1t} + (1/\sqrt{2})\,W_{2t}$$
$$Y_{2t} = x_{21}W_{1t} + x_{22}W_{2t} = (0.4/\sqrt{1.16})\,W_{1t} + (1/\sqrt{2})\,W_{2t}$$
so $Y_{1t}$ and $Y_{2t}$ share the "common trend" $W_{1t}$. Since $W_t = X^{-1}Y_t$, the last row of $X^{-1}$, which is proportional to $(0.4, -1)$, annihilates the common trend: it is the cointegrating vector. Notice that A is not symmetric, so $X^{-1} \neq X'$.
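A quick PROC IML check of this decomposition (this block is illustrative and not part of the original notes; EIGVAL and EIGVEC are the SAS/IML eigenvalue and eigenvector functions):

PROC IML;
A = {1.2 -0.5,
     0.2  0.5};
L = EIGVAL(A);       /* eigenvalues: 1 and 0.7                        */
X = EIGVEC(A);       /* columns are right eigenvectors of A           */
XINV = INV(X);
PRINT L, X, XINV;    /* one row of XINV is proportional to (0.4, -1), */
QUIT;                /* the cointegrating vector                      */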

Coint - 3
Engle - Granger method

This is one of the earliest and easiest to understand treatments of cointegration. As above, write
$$Y_{1t} = w_1 W_{1t} + w_2 W_{2t}, \qquad Y_{2t} = w_3 W_{1t} + w_4 W_{2t}$$
where $W_{1t}$ is the unit root component and $W_{2t}$ is stationary, so that $n^{-2}\sum W_{1t}^2$ is $O_p(1)$ while $n^{-2}\sum W_{2t}^2$ is $O_p(n^{-1})$. If we regress $Y_{1t}$ on $Y_{2t}$, our regression coefficient is
$$\frac{n^{-2}\sum_{t=1}^{n} Y_{1t}Y_{2t}}{n^{-2}\sum_{t=1}^{n} Y_{2t}^2} = \frac{w_1 w_3\, n^{-2}\sum W_{1t}^2 + O_p(1/\sqrt{n})}{w_3^2\, n^{-2}\sum W_{1t}^2 + O_p(1/\sqrt{n})} = \frac{w_1}{w_3} + O_p(1/\sqrt{n})$$
and our residual series is thus approximately $w_3^{-1}\,[\,w_3 Y_{1t} - w_1 Y_{2t}\,] = (w_2 - w_1 w_4/w_3)\,W_{2t}$, a stationary series.

Thus a simple regression of $Y_{1t}$ on $Y_{2t}$ gives an estimate of the cointegrating vector, and a test for cointegration is just a test that the residuals are stationary. Let the residuals be $r_t$. Regress $r_t - r_{t-1}$ on $r_{t-1}$ (and possibly some lagged differences). Can we compare to our D-F tables? Engle and Granger argue that one cannot do so. The null hypothesis is that there is no cointegration, thus the bivariate series has 2 unit roots and no linear combination is stationary. We have, in a sense, looked through all possible linear combinations of $Y_{1t}$ and $Y_{2t}$, finding the one that varies least (least squares) and hence the one that looks most stationary. It is as though we had computed unit root tests for all possible linear combinations and then selected the one most likely to reject. We are thus in the area of order statistics. (If you report the minimum heights from samples of 10 men each, the distribution of these minima will not be the same as the distribution of heights of individual men; nor will the distribution of unit root tests from these "best" linear combinations be the same as the distribution you would get for a prespecified linear combination.) Engle and Granger provide adjusted critical values. Here is a table comparing their E-G tables to our D-F tables for n = 100. E-G used an augmented regression with 4 lagged differences and an intercept to calculate a t statistic $\tau_\mu$, so keep in mind that part of the discrepancy is due to finite sample effects of the (asymptotically negligible) lagged differences.

Prob of smaller $\tau_\mu$:    .01      .05      .10
                 E-G:        -3.77    -3.17    -2.84
                 D-F:        -3.51    -2.89    -2.58
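In SAS, the two steps just described might look as follows. This is a sketch with hypothetical data set and variable names (PAIR, Y1, Y2); the tau from the residual regression is referred to the E-G line of the table above, not the D-F line:

PROC REG DATA=PAIR;                     /* step 1: levels regression            */
   MODEL Y1 = Y2;
   OUTPUT OUT=RESID R=RHAT;             /* save the residual series             */
RUN;
PROC ARIMA DATA=RESID;                  /* step 2: ADF regression on residuals  */
   I VAR=RHAT STATIONARITY=(ADF=(4));   /* 4 augmenting lags, as E-G used       */
RUN;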

Coint - 4

Example: $P_t$ = cash price on delivery date, Texas steers; $F_t$ = futures price (source: Ken Mathes, NCSU Ag. Econ.). Data are bimonthly Feb. '76 through Dec. '86 (60 obs.).

1. Test the individual series for integration:
$$\nabla P_t = 7.6 - 0.117\,P_{t-1} + \textstyle\sum_{i=1}^{5} b_i\,\nabla P_{t-i}, \qquad \tau_\mu\ (\text{D-F}) = -2.203$$
$$\nabla F_t = 7.7 - 0.120\,F_{t-1} + \textstyle\sum_{i=1}^{5} c_i\,\nabla F_{t-i}, \qquad \tau_\mu\ (\text{D-F}) = -2.228$$
Each series is integrated: we cannot reject the unit root null, even at the 10% level.

2. Regress $F_t$ on $P_t$:  $F_t = 0.5861 + 0.9899\,P_t$, with residual $R_t$:
$$\nabla R_t = 0.110\,\nabla R_{t-1} - 0.9392\,R_{t-1}, \qquad \tau\ (\text{E-G}) = -0.9392/0.1264 = -7.428$$
Thus, with a bit of rounding, $F_t - 1.00\,P_t$ is stationary.

The Engle-Granger method requires the specification of one series as the dependent variable in the bivariate regression. Fountis and Dickey (Annals of Statistics) study distributions for the multivariate system. If $Y_t = A\,Y_{t-1} + \varepsilon_t$ then
$$Y_t - Y_{t-1} = -(I - A)\,Y_{t-1} + \varepsilon_t.$$
We show that if the true series has one unit root, then the eigenvalue of the least squares estimate of $A - I$ that is closest to 0 has, after multiplication by n, the same limit distribution as the normalized bias in the standard D-F tables, and we suggest the use of the eigenvectors of $\hat A - I$ to estimate the cointegrating vector. The only test we can do with this is the null of one unit root versus the alternative of stationarity. Johansen's test, discussed later, extends this in a very nice way. Our result also holds for higher dimensional models, but requires the extraction of the roots of the estimated characteristic polynomial.

For the Texas steer futures data, the regression gives
$$\begin{pmatrix}\nabla P_t \\ \nabla F_t\end{pmatrix} = \begin{pmatrix}5.3 \\ 6.9\end{pmatrix} + \begin{pmatrix}-1.77 & 1.69 \\ -1.03 & 0.93\end{pmatrix}\begin{pmatrix}P_{t-1} \\ F_{t-1}\end{pmatrix} + 3\ \text{lagged differences}$$

Coint - 5

where the eigen analysis of the estimated coefficient matrix
$$\begin{pmatrix}-1.77 & 1.69 \\ -1.03 & 0.93\end{pmatrix}$$
has left eigenvectors in the rows of $\begin{pmatrix}0.54 & -0.82 \\ -0.69 & 0.72\end{pmatrix}$ and, up to scale, right eigenvectors in the columns of $\begin{pmatrix}3.80 & -4.42 \\ 3.63 & -2.84\end{pmatrix}$. The left eigenvector $(-0.69,\ 0.72)$ belongs to the eigenvalue farthest from 0, indicating that $-0.69\,P_t + 0.72\,F_t$ is stationary. This is about 0.7 times the difference, so the two methods agree that $P_t - F_t$ is stationary, as is any multiple of it.

Johansen's Method

This method is similar to that just illustrated but has the advantage of being able to test for any number of unit roots. The method can be described as the application of standard multivariate calculations in the context of a vector autoregression, or VAR. The test statistics are those found in any multivariate text. Johansen's idea, as in univariate unit root tests, is to get the right distribution for these standard calculated statistics. The statistics are standard; their distributions are not.

We start with just a lag one model with mean 0 (no intercept):
$$\nabla Y_t = \Pi\,Y_{t-1} + \varepsilon_t$$
where $Y_t$ is a p-dimensional column vector, as is $\varepsilon_t$, and we assume $E\{\varepsilon_t\varepsilon_t'\} = \Omega$.
$$H_0:\ \Pi = \alpha\beta' \qquad \alpha:\ p \times r, \quad \beta':\ r \times p$$
r = 0  implies all linear combinations are nonstationary;
r = p  implies all linear combinations are stationary;
0 < r < p  implies cointegration.

Note: for any $\Pi$ there are infinitely many $\alpha, \beta$ such that $\Pi = \alpha\beta'$ [because $\alpha\beta' = (\alpha T)(T^{-1}\beta')$ for any invertible T], so we do not test hypotheses about $\alpha$ and $\beta$, only about the rank r.

Now define sums of squares and cross products:
$$\sum_{t=1}^{n}\begin{pmatrix}\nabla Y_t \\ Y_{t-1}\end{pmatrix}\begin{pmatrix}\nabla Y_t' & Y_{t-1}'\end{pmatrix} = \begin{pmatrix}S_{00} & S_{01} \\ S_{10} & S_{11}\end{pmatrix}, \qquad\text{for example } S_{11} = \sum_{t=1}^{n} Y_{t-1}Y_{t-1}'.$$

Now write down the likelihood (conditional on $Y_0 = 0$):
$$L = (2\pi)^{-np/2}\,|\Omega|^{-n/2}\exp\Big\{-\tfrac{1}{2}\sum_{t=1}^{n}(\nabla Y_t - \Pi Y_{t-1})'\,\Omega^{-1}\,(\nabla Y_t - \Pi Y_{t-1})\Big\}$$

Coint - 6

If $\Pi$ is assumed to be full rank (r = p), then the likelihood is maximized at the usual estimate, the least squares regression estimate:
$$\hat\Pi = \Big(\sum_{t=1}^{n}\nabla Y_t\,Y_{t-1}'\Big)\Big(\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\Big)^{-1} = S_{01}S_{11}^{-1}$$
$$\hat\Omega = \frac{1}{n}\sum_{t=1}^{n}(\nabla Y_t - \hat\Pi Y_{t-1})(\nabla Y_t - \hat\Pi Y_{t-1})' = \frac{1}{n}\big(S_{00} - \hat\Pi S_{11}\hat\Pi'\big)$$

$H_0$: r stationary linear combinations of $Y_t$ (linearly independent) and thus (p - r) unit root linear combinations;
$H_0$: r "cointegrating vectors" and (p - r) "common trends";
$H_0$: $\Pi = \alpha\beta'$ with $\alpha$ and $\beta$ both of dimension $p \times r$.

So far we have the unrestricted estimate $(\hat\Pi, \hat\Omega)$ and can evaluate the likelihood there. The principle of likelihood ratio requires that we maximize the likelihood under $\Pi = \alpha\beta'$ and compare to the unrestricted maximum. That is, we now want to maximize
$$L = (2\pi)^{-np/2}\,|\Omega|^{-n/2}\exp\Big\{-\tfrac{1}{2}\sum_{t=1}^{n}(\nabla Y_t - \alpha\beta' Y_{t-1})'\,\Omega^{-1}\,(\nabla Y_t - \alpha\beta' Y_{t-1})\Big\}$$

Step 1: For any given $\beta$ we can compute $\beta'Y_{t-1}$ and find the corresponding $\alpha$ by regression in the model $\nabla Y_t = \alpha\,(\beta'Y_{t-1}) + \varepsilon_t$, and this is simply
$$\hat\alpha(\beta) = \Big(\sum_{t=1}^{n}\nabla Y_t\,Y_{t-1}'\beta\Big)\Big(\beta'\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\,\beta\Big)^{-1} = S_{01}\beta\,(\beta'S_{11}\beta)^{-1}$$

Step 2: Search over $\beta$ for the maximum. To do this, plug $\hat\alpha(\beta)$ into the likelihood function, which now becomes a function of $\beta$ and $\Omega$. Now recall (from general regression) that

Coint - 7

$$\exp\Big\{-\tfrac{1}{2}\,\mathrm{trace}\big[\Omega^{-1}X'X\big]\Big\}\bigg|_{\Omega = n^{-1}X'X} = \exp\Big\{-\tfrac{1}{2}\,\mathrm{trace}\big[n\,(X'X)^{-1}X'X\big]\Big\} = \exp\{-np/2\}$$
where exp is the exponential function and p = rank(X). In our case X has t-th row $(\nabla Y_t - \alpha\beta'Y_{t-1})'$, and by our usual maximum likelihood arguments we will, for any given $\beta$, estimate $\Omega$ by $\hat\Omega(\beta) = n^{-1}X'X$, so that
$$L = (2\pi)^{-np/2}\,|\hat\Omega(\beta)|^{-n/2}\,\exp\{-np/2\}.$$
Our goal now is to maximize L, which we do by minimizing $|\hat\Omega(\beta)|$.

Step 2a: Minimize (dropping the factor 1/n, which does not affect the minimizer)
$$|\hat\Omega(\beta)| = \big|S_{00} - S_{01}\beta\,(\beta'S_{11}\beta)^{-1}\beta'S_{10}\big|$$
Recall that for a 2x2 matrix we have
$$\begin{vmatrix}a & b \\ c & d\end{vmatrix} = ad - bc = a\,(d - c\,a^{-1}b)$$
and similarly for the determinant of a partitioned matrix. Thus
$$\begin{vmatrix}S_{00} & S_{01}\beta \\ \beta'S_{10} & \beta'S_{11}\beta\end{vmatrix} = |S_{00}|\,\big|\beta'S_{11}\beta - \beta'S_{10}S_{00}^{-1}S_{01}\beta\big| = |\beta'S_{11}\beta|\,\big|S_{00} - S_{01}\beta(\beta'S_{11}\beta)^{-1}\beta'S_{10}\big|$$
so our problem now becomes
$$\min_\beta\ |\hat\Omega(\beta)| \quad\Longleftrightarrow\quad \min_\beta\ \frac{\big|\beta'\,(S_{11} - S_{10}S_{00}^{-1}S_{01})\,\beta\big|}{|\beta'S_{11}\beta|}$$

Recall the Cholesky root: $S_{11}$ positive definite and symmetric implies $S_{11} = U'U$ with U upper triangular. (SAS: PROC IML; U = ROOT(S11);) Then
$$\frac{\big|\beta'(S_{11} - S_{10}S_{00}^{-1}S_{01})\beta\big|}{|\beta'S_{11}\beta|} = \frac{\big|\beta'U'\big(I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}\big)U\beta\big|}{|\beta'U'U\beta|}$$

Note: We have seen that $\Pi = \alpha\beta'$ allows a lot of flexibility in choosing the columns of $\beta$ (corresponding adjustments in $\alpha$ will preserve $\Pi$). We choose $\beta'U'U\beta = I$.

Coint - 8

Let $\xi = U\beta$ [so $\beta = U^{-1}\xi$]. Fact: each column $\xi_i$ of the minimizing $\xi$ is an eigenvector of the symmetric matrix $I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}$, so we can get it in SAS:
(1) Cholesky decomposition $S_{11} = U'U$
(2) form $(U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}$
(3) extract its eigenvectors $\xi$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_p$
(4) $\hat\beta = U^{-1}\xi$
(5) get $\hat\alpha$ by regressing $\nabla Y_t$ on $\hat\beta'Y_{t-1}$:  $\hat\alpha = S_{01}\hat\beta\,(\hat\beta'S_{11}\hat\beta)^{-1}$

*Note: the eigenvalues $\lambda_i$ are the "squared canonical correlations" between $\nabla Y_t$ and $Y_{t-1}$. PROC CANCORR will compute these for you. (A sketch of these five steps in PROC IML appears below.)

Testing: we have the likelihood maximized unconditionally and maximized under $H_0$; now look at the likelihood ratio test.

Summary:
(1) Choose $\hat\beta$ to minimize $|\hat\Omega(\beta)|$.
(2) U is invertible, so any $\hat\beta$ is expressible as $\hat\beta = U^{-1}\xi$ for some choice of $\xi$.
(3) The lengths of the $\hat\beta$ columns are arbitrary, so we can specify $\xi'\xi = I$.
(4) Pick $\xi$ to minimize $\big|\xi'\big(I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}\big)\xi\big|$; thus $\xi$ is picked as the matrix whose columns are associated with the smallest values of $(1 - \lambda_i)$, that is, the largest "squared canonical correlations" $\lambda_i$.

For $H_0$: $r \le r_0$ versus $H_1$: $r > r_0$,
$$LRT = \frac{\max_{H_0}(L)}{\max_{H_1}(L)} = \left[\frac{\prod_{i=1}^{r_0}(1-\lambda_i)}{\prod_{i=1}^{p}(1-\lambda_i)}\right]^{-n/2} = \Big[\prod_{i=r_0+1}^{p}(1-\lambda_i)\Big]^{n/2}$$
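Here is a compact PROC IML sketch of steps (1)-(5) for the lag-one, no-intercept case. It is illustrative only: the input data set SERIES and the use of raw sums for the S matrices are assumptions, and EIGEN is the SAS/IML routine for symmetric matrices.

PROC IML;
USE SERIES;  READ ALL VAR _NUM_ INTO Y;        /* hypothetical (n+1) x p data  */
N  = NROW(Y) - 1;
DY = Y[2:N+1, ] - Y[1:N, ];                    /* differences, rows indexed by t */
YL = Y[1:N, ];                                 /* lagged levels Y(t-1)         */
S00 = DY`*DY;  S01 = DY`*YL;  S10 = YL`*DY;  S11 = YL`*YL;
U  = ROOT(S11);                                /* Cholesky: S11 = U`*U         */
UI = INV(U);
M  = UI` * S10 * INV(S00) * S01 * UI;          /* symmetric matrix of step (2) */
CALL EIGEN(LAMBDA, XI, M);                     /* squared canonical correlations */
BETA  = UI * XI;                               /* step (4)                     */
ALPHA = S01 * BETA * INV(BETA` * S11 * BETA);  /* step (5)                     */
TRACE_R0 = -N * SUM(LOG(1 - LAMBDA));          /* trace statistic for H0: r=0  */
PRINT LAMBDA TRACE_R0;
QUIT;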

Coint - 9

Now in standard likelihood ratio testing we often take the log of the likelihood ratio. The reason for this is that it often leads to a Chi-square limit distribution. There is no hope of that happening in this nonstandard case, but Johansen still follows that tradition, suggesting that we reject when $\prod_{i=r_0+1}^{p}(1-\lambda_i)^{n/2}$ is small, that is, we reject when
$$-n\sum_{i=r_0+1}^{p}\ln(1-\lambda_i)$$
is large, where $\lambda_{r_0+1} \ge \lambda_{r_0+2} \ge \cdots \ge \lambda_p$ are the $p - r_0$ smallest squared canonical correlations. This is Johansen's "trace" test. To keep things straight...

1. You use the smallest squared canonical correlations, thus making $1 - \lambda_i$ large (nearer 1) and hence making $-n\sum\ln(1-\lambda_i)$ small (i.e. you select the $\lambda_i$ that best protect $H_0$). In a later article, Johansen notes that you may have better power if you opt to test $H_0$: $r = r_0$ versus $H_1$: $r = r_0 + 1$ and thus use only the largest ($\lambda_{r_0+1}$) of the smallest $\lambda$'s. This is Johansen's "maximal eigenvalue test."

2. Under $H_0$ or $H_1$ you have at least $r_0$ cointegrating vectors and hence at most $p - r_0$ "common trends." Therefore rejection of the null hypothesis means you have found yet another cointegrating vector.

3. The interpretation of a cointegrating vector is that you have found a linear combination of your vector components that cannot vary too far from 0 (i.e. you have a "law" that cannot be too badly violated). A departure from this relationship would be called an "error," and if we start at any point and forecast into the future with this model, the forecasts will eventually satisfy the relationship. Therefore this kind of model is referred to as an "error correction" model.

Example:
$$Z_{1t} = Z_{1,t-1} + e_{1t} \qquad\qquad Z_{2t} = 0.8\,Z_{2,t-1} + e_{2t}$$
We observe
$$Y_{1t} = Z_{1t} + 0.9\,Z_{2t} \qquad\qquad Y_{2t} = Z_{1t} - 0.6\,Z_{2t}$$
Notice that $Y_{1t} - Y_{2t} = 1.5\,Z_{2t}$ is stationary, so we are saying that $Y_{1t}$ can't wander too far from $Y_{2t}$, and yet both Y's are nonstationary. They are wandering around, but wandering around together, you might say. Now in practice we would just observe the Y's. Notice that $Z_t = \begin{pmatrix}1 & 0 \\ 0 & 0.8\end{pmatrix}Z_{t-1} + \varepsilon_t$, which implies

Coint - 10

$$Y_t = \begin{pmatrix}1 & 0.9 \\ 1 & -0.6\end{pmatrix}\begin{pmatrix}1 & 0 \\ 0 & 0.8\end{pmatrix}\begin{pmatrix}1 & 0.9 \\ 1 & -0.6\end{pmatrix}^{-1}Y_{t-1} + \text{noise (exactly)} = \begin{pmatrix}0.88 & 0.12 \\ 0.08 & 0.92\end{pmatrix}Y_{t-1} + \text{noise}$$

Now suppose $Y_{1t} = 12$ and $Y_{2t} = 2$. These are not very close to each other and thus are in violation of the equilibrium condition $Y_{1t} = Y_{2t}$. In the absence of future shocks, does the model indicate that this "error" will "correct" itself?

Error correction: the next period we forecast
$$\begin{pmatrix}0.88 & 0.12 \\ 0.08 & 0.92\end{pmatrix}\begin{pmatrix}12 \\ 2\end{pmatrix} = \begin{pmatrix}10.8 \\ 2.8\end{pmatrix}$$
whose components are closer together. Our two step ahead forecast is
$$\begin{pmatrix}0.88 & 0.12 \\ 0.08 & 0.92\end{pmatrix}\begin{pmatrix}10.8 \\ 2.8\end{pmatrix} = \begin{pmatrix}9.84 \\ 3.44\end{pmatrix}$$
and continuing, one finds
$$\begin{pmatrix}0.88 & 0.12 \\ 0.08 & 0.92\end{pmatrix}^{10}\begin{pmatrix}12 \\ 2\end{pmatrix} = \begin{pmatrix}6.644 \\ 5.571\end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix}0.88 & 0.12 \\ 0.08 & 0.92\end{pmatrix}^{50}\begin{pmatrix}12 \\ 2\end{pmatrix} = \begin{pmatrix}6.0001 \\ 5.9999\end{pmatrix}$$

Let us take this example a step further. Modeling the changes in the series as a function of the lagged levels, we have
$$\nabla Y_t = \begin{pmatrix}0.88 - 1 & 0.12 \\ 0.08 & 0.92 - 1\end{pmatrix}Y_{t-1} + \text{noise} = \begin{pmatrix}-0.12 \\ 0.08\end{pmatrix}\begin{pmatrix}1 & -1\end{pmatrix}Y_{t-1} + \text{noise}$$
so we see that the discrepancy from equilibrium, $(1\ \ {-1})\,Y_{t-1}$, which in our case is 10, is computed; then 0.12 times this is subtracted from $Y_1$ and 0.08 times this is added to $Y_2$. The "speed of adjustment" is thus faster in $Y_1$, and we end up farther from the original $Y_1$ than from the original $Y_2$. Also, although the model implies (assuming a 0 initial condition) that $E\{Y_1\} = E\{Y_2\} = 0$, there is nothing drawing the series back toward their theoretical means (0). Shocks to this series have a permanent effect on the levels of the series, but only a temporary effect on the relationship (equality, in this model) between the series components.
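The iteration is easy to reproduce; a sketch in PROC IML (not part of the original notes):

PROC IML;
A = {0.88 0.12,
     0.08 0.92};
Y = {12, 2};                          /* starting point in disequilibrium */
DO H = 1 TO 50;
   Y = A * Y;                         /* forecast H steps ahead           */
   IF H=1 | H=2 | H=10 | H=50 THEN PRINT H (Y`);
END;
QUIT;

The printed forecasts approach equality (both components tend to 6), while the gap $Y_1 - Y_2$ dies out geometrically at rate 0.8 per period.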

Coint - 11

Remaining to do:
Q1: What are the critical values for the test?
Q2: What if we have more than 1 lag?
Q3: What if we include intercepts, trends, etc.?

Q1. Usually the likelihood ratio test has a limit Chi-square distribution. For example, a regression F test with 4 numerator degrees of freedom satisfies $4F_n \to \chi^2_4$ in distribution, and $F_n = t_n^2 \to \chi^2_1$ is a special case. Now we have seen that the t statistic, $\tau$, has a nonstandard limit distribution expressible as a functional of Brownian motion $B(t)$ on [0,1]. We found
$$\tau \;\Rightarrow\; \frac{\int_0^1 B(t)\,dB(t)}{\big[\int_0^1 B^2(t)\,dt\big]^{1/2}} = \frac{\tfrac{1}{2}\,[B^2(1) - 1]}{\big[\int_0^1 B^2(t)\,dt\big]^{1/2}}$$
and we might thus expect Johansen's test statistic to converge to a multivariate analogue of this expression. Indeed, Johansen proved that his likelihood ratio (trace) test converges to a variable that can be expressed as a functional of a vector valued Brownian motion $\mathbf{B}(t)$ with independent components (channels), i.e. the error term has variance matrix $I\sigma^2$:
$$LRT \;\to\; \mathrm{trace}\Big\{\Big[\int_0^1 \mathbf{B}(t)\,d\mathbf{B}(t)'\Big]'\,\Big[\int_0^1 \mathbf{B}(t)\,\mathbf{B}(t)'\,dt\Big]^{-1}\Big[\int_0^1 \mathbf{B}(t)\,d\mathbf{B}(t)'\Big]\Big\}$$
For a Brownian motion of dimension m = 1, 2, 3, 4, 5, Johansen (Table 1, page 239) computes the distribution of the LRT by Monte Carlo. Empirically he notes that these percentiles are quite close to the percentiles of $c\,\chi^2_f$, where $f = 2m^2$ and $c = 0.85 - 0.58/f$. We will not repeat Johansen's development of the limit distribution of the LRT but will simply note that it is what would be expected by one familiar with the usual limit results for F statistics and with the nonstandard limit distributions that arise with unit root processes. As one might expect, the m = 1 case can be derived from the $\tau$ distribution.

Q2. What happens in higher order processes?
$$Y_t = A_1Y_{t-1} + A_2Y_{t-2} + A_3Y_{t-3} + \cdots + A_kY_{t-k} + \varepsilon_t$$
$$\nabla Y_t = -(I - A_1 - A_2 - \cdots - A_k)\,Y_{t-1} - (A_2 + A_3 + \cdots + A_k)\nabla Y_{t-1} - (A_3 + \cdots + A_k)\nabla Y_{t-2} - \cdots - A_k\nabla Y_{t-k+1} + \varepsilon_t$$
which has the form
$$\nabla Y_t = -(I - A_1 - A_2 - \cdots - A_k)\,Y_{t-1} + B_1\nabla Y_{t-1} + B_2\nabla Y_{t-2} + \cdots + B_{k-1}\nabla Y_{t-k+1} + \varepsilon_t$$
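For instance, with k = 2 the rearrangement amounts to adding and subtracting the same terms:
$$Y_t = A_1Y_{t-1} + A_2Y_{t-2} + \varepsilon_t \quad\Longleftrightarrow\quad \nabla Y_t = -(I - A_1 - A_2)\,Y_{t-1} - A_2\,\nabla Y_{t-1} + \varepsilon_t$$
so $B_1 = -A_2$, and the coefficient on the lagged level is the negative of the characteristic polynomial matrix evaluated at m = 1.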

Coint - 12

The characteristic equation is
$$\big|\,I\,m^k - A_1m^{k-1} - A_2m^{k-2} - \cdots - A_k\,\big| = 0.$$
Now if m = 1 is a root of this, we have $|I - A_1 - A_2 - \cdots - A_k| = 0$. The number of unit roots in the system is the number of 0 roots of the matrix $(I - A_1 - A_2 - \cdots - A_k)$, and the rank of this matrix is the number of cointegrating vectors. (I am assuming a vector autoregression in which each component has at most one unit root, i.e. differencing makes each component stationary.) While this parameterization most closely resembles the usual unit root testing framework, Johansen chooses an equivalent way to parameterize the model, placing the lagged levels at lag k instead of lag 1. His model is written
$$\nabla Y_t = -(I - A_1)\nabla Y_{t-1} - \cdots - (I - A_1 - A_2 - \cdots - A_{k-1})\nabla Y_{t-k+1} - (I - A_1 - A_2 - \cdots - A_k)\,Y_{t-k} + \varepsilon_t$$
and he is checking the rank of $\Pi = -(I - A_1 - A_2 - \cdots - A_k)$.

Recall: regression can be done in 3 steps. Suppose we are regressing Y on $X_1, X_2, \ldots, X_{k-1}, X_k$ and we are interested in the coefficient (matrix) of $X_k$. We can get that coefficient matrix in 3 steps:
Step 1: Regress Y on $X_1, \ldots, X_{k-1}$; call the residuals $R_Y$.
Step 2: Regress $X_k$ on $X_1, \ldots, X_{k-1}$; call the residuals $R_k$.
Step 3: Regress $R_Y$ on $R_k$; this gives the same coefficient matrix as the full regression.

So Johansen does the following in higher order models (see the sketch below):
Step 1: Regress $\nabla Y_t$ on $\nabla Y_{t-1}, \nabla Y_{t-2}, \ldots, \nabla Y_{t-k+1}$; residuals $R_\nabla$.
Step 2: Regress $Y_{t-k}$ on $\nabla Y_{t-1}, \nabla Y_{t-2}, \ldots, \nabla Y_{t-k+1}$; residuals $R_k$.
Step 3: Compute squared canonical correlations between $R_\nabla$ and $R_k$.

The idea is very nice. Johansen maximizes the likelihood for any $\Pi$ with respect to the other parameters by performing steps 1 and 2. Having done this, he's back to a lag 1 type of problem. By analogy, if in ordinary unit root testing the null hypothesis were true, you could estimate the autoregressive coefficients consistently by regressing the first difference on the lagged first differences. Having these estimates $\hat\alpha_i$, you could compute a "filtered" version of Y, namely $\hat Z_t = Y_t - \hat\alpha_1Y_{t-1} - \cdots - \hat\alpha_pY_{t-p}$. Under the null hypothesis $Z_t = Y_t - \alpha_1Y_{t-1} - \cdots - \alpha_pY_{t-p}$ is a random walk, so you could regress $\hat Z_t$ on $\hat Z_{t-1}$ and, in large samples, compare the results to our D-F unit root tables, both for the coefficient and for the t statistic. Note that all of Johansen's statistics are multivariate analogues of $\tau$; there is nothing like the "normalized bias test" $n(\hat\rho - 1)$.
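A sketch of the partialling-out steps for a bivariate system with k = 3 (all data set and variable names here, like VECM, DY1, Y1_K, are hypothetical). PROC REG does steps 1 and 2 in one pass; PROC CANCORR then supplies step 3:

PROC REG DATA=VECM NOPRINT;
   MODEL DY1 DY2 Y1_K Y2_K = DY1_1 DY2_1 DY1_2 DY2_2;  /* steps 1 and 2   */
   OUTPUT OUT=RESID R=RD1 RD2 RL1 RL2;                 /* residual series */
RUN;
PROC CANCORR DATA=RESID;
   VAR  RD1 RD2;      /* residuals of the differences    */
   WITH RL1 RL2;      /* residuals of the lagged levels  */
RUN;                  /* step 3: squares of the printed canonical
                         correlations are Johansen's lambdas */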

Coint - 13

Example:
$$Y_t = A_1Y_{t-1} + A_2Y_{t-2} + A_3Y_{t-3} + \varepsilon_t, \qquad Y_t = (Y_{1t}, Y_{2t}, Y_{3t}, Y_{4t})', \qquad 100\ \text{observations}$$
Step 1: Regress $\nabla Y_t$ on $\nabla Y_{t-1}, \nabla Y_{t-2}$.
Step 2: Regress $Y_{t-3}$ on $\nabla Y_{t-1}, \nabla Y_{t-2}$.
Step 3: Squared canonical correlations: 0.010, 0.020, 0.08, 0.460.

Test $H_0$: r = 0 vs $H_1$: r > 0. Here $r_0$ = 0 and p = 4, so use the 4 - 0 = 4 smallest canonical correlations:
$$LRT = -100\,[\,\ln(0.99) + \ln(0.98) + \ln(0.92) + \ln(0.54)\,] = 72.98$$
Johansen's Table 1 gives the critical value for m = 4 ($H_0$ implies m = 4 common trends): 41.2. Reject $H_0$: there is at least 1 cointegrating vector.

Test $H_0$: r = 1 vs $H_1$: r > 1. Here $r_0$ = 1 and p = 4, so use the 4 - 1 = 3 smallest canonical correlations:
$$LRT = -100\,[\,\ln(0.99) + \ln(0.98) + \ln(0.92)\,] = 11.36$$
(look in Johansen's table under m = 3 common trends). The critical value is 23.8. Do not reject $H_0$: there are no more cointegrating vectors.

Q3. Johansen later wrote a paper addressing the intercept case. The trend issue is a bit tricky, as is true in the univariate case. Even the intercept case has some interesting features in terms of how intercepts in the model for the observed series translate into the underlying canonical series (common trends).

An additional note: We are used to taking $-2\ln(LR)$ in maximum likelihood estimation and likelihood ratio testing. We do this because, under certain regularity conditions, it produces test statistics with standard limiting distributions (usually Chi-square). In this case we do not have the required regularity conditions for the test of r (the number of cointegrating vectors); otherwise Johansen would not have needed to do any new tabulations. Because this is the case, Johansen could have opted to use other functions of the eigenvalues, such as $n\sum\lambda_i$ in place of his $-n\sum\ln(1-\lambda_i)$ statistic. We show (for the case of a univariate series, so just one $\lambda$) that both of these statistics converge to the same distribution, and it is the distribution of $\tau^2$.

First note that for the series $Y_t = Y_{t-1} + e_t$ we have $\nabla Y_t = Y_t - Y_{t-1} = e_t$, so (writing now $S_{00} = \sum(\nabla Y_t)^2/n$, $S_{01} = \sum \nabla Y_t\,Y_{t-1}/n$, and $S_{11} = \sum Y_{t-1}^2/n$) our regression mean square is $MSE = \sum e_t^2/n = S_{00}$, our standard error is $\sqrt{S_{00}/(n\,S_{11})}$, and $\hat\rho - 1 = S_{01}/S_{11}$, so

Coint - 14

$$\tau = \frac{\hat\rho - 1}{\sqrt{MSE/\sum Y_{t-1}^2}} = \frac{S_{01}/S_{11}}{\sqrt{S_{00}/(n\,S_{11})}} \qquad\Longrightarrow\qquad \frac{\tau}{\sqrt{n}} = \frac{S_{01}}{\sqrt{S_{00}\,S_{11}}}$$
Now the "Cholesky root" of the univariate quantity $S_{11}$ is, of course, just $\sqrt{S_{11}}$, and the univariate version of the matrix Johansen looks at is
$$1 - \frac{S_{01}\,S_{00}^{-1}\,S_{01}}{S_{11}} = 1 - \frac{S_{01}^2}{S_{00}\,S_{11}} = 1 - \frac{\tau^2}{n}$$
whose "eigenvalue" is $1 - \lambda$. We see immediately that $n\lambda$ is just $\tau^2$, so we already know its distribution (the multivariate cases are analogous). This also shows that $\lambda_i = O_p(1/n)$, so if we expand Johansen's statistic using the Taylor series of $\ln(1-x)$ around x = 0, we have
$$-n\ln(1 - \lambda_i) = n\big(\lambda_i + O_p(1/n^2)\big) = n\lambda_i + O_p(1/n)$$
proving, as claimed, that these two statistics have the same limit distribution (of course some extra details are needed for the multivariate case). Notice that for a single $\lambda$ the two statistics are monotone transforms of each other, so even in finite samples, provided we had the right distributions, they would give exactly equivalent tests. For the more interesting multivariate case, they are the same only in the limit.

Demo: Amazon.com High and Low stock prices

DATA AMAZON;
   INPUT DATE OPEN HIGH LOW CLOSE VOLUME;
   TITLE "DATA ON AMAZON.COM STOCK";
   /** DATA FROM INTERNET, YAHOO SITE ****/
   CLOSE_LEVEL = CLOSE;
   VOL_LEVEL = VOLUME;
   OPEN = LOG(OPEN);
   CLOSE = LOG(CLOSE);
   HIGH = LOG(HIGH);
   LOW = LOG(LOW);
   VOLUME = LOG(VOLUME);
   TITLE2 "DATA IN LOGARITHMS";
   HHAT = LOW + .076;
   SPREAD = HIGH - LOW;
   FORMAT DATE DATE7.;
CARDS;
14389 117.25 121.125 111.375 111.5625 7755100
14388 128.625 129.375 116 117.5 7126900
(more data)

Coint - 15

13653 3.5208 3.5417 3.25 3.4167 508900
13650 3.9375 3.9583 3.4167 3.4583 1225000
;
RUN;
PROC SORT; BY DATE; RUN;
PROC ARIMA DATA=AMAZON;
   I VAR=SPREAD STATIONARITY=(ADF=(2));
   E P=3;
RUN;

The spread = log(high) - log(low) seems stationary:

    Augmented Dickey-Fuller Unit Root Tests
    Type          Lags     Tau    Pr < Tau
    Zero Mean       2    -3.00      0.0028
    Single Mean     2    -7.65      <.0001
    Trend           2    -8.05      <.0001

    Conditional Least Squares Estimation
                            Standard             Approx
    Parameter   Estimate    Error      t Value   Pr > |t|   Lag
    MU          0.07652     0.0043870    17.44    <.0001     0
    AR1,1       0.38917     0.04370       8.91    <.0001     1
    AR1,2       0.04592     0.04702       0.98    0.3293     2
    AR1,3       0.18888     0.04378       4.31    <.0001     3

    Constant Estimate     0.028775
    Variance Estimate     0.00141
    Std Error Estimate    0.03755

Coint - 16

The fit seems fine, so we expect to find the spread log(high) - log(low), i.e. the vector (1, -1), to be a cointegrating vector:

            Autocorrelation Check of Residuals
 To   Chi-          Pr >
 Lag  Square   DF   ChiSq   ---------------Autocorrelations---------------
  6     4.63    3   0.2013  -0.001 -0.018 -0.009 -0.047  0.072 -0.033
 12     9.43    9   0.3988   0.037  0.041  0.035  0.025  0.029  0.058
 18    12.71   15   0.6248  -0.018 -0.046  0.025 -0.014  0.016  0.052
 24    21.14   21   0.4506   0.017  0.023 -0.067 -0.074  0.059  0.038
 30    25.13   27   0.5669   0.026  0.049  0.014 -0.012 -0.006  0.063
 36    28.86   33   0.6734   0.013  0.038  0.049 -0.023  0.016 -0.045
 42    33.05   39   0.7372   0.049  0.055  0.023  0.010  0.039  0.003
 48    36.51   45   0.8125   0.030 -0.035 -0.050 -0.038  0.006 -0.004

 Model for variable SPREAD
 Estimated Mean   0.076524
 Autoregressive Factors
 Factor 1:  1 - 0.38917 B**(1) - 0.04592 B**(2) - 0.18888 B**(3)

Try Johansen's method (PROC VARMAX). Lots of output is produced.

 The VARMAX Procedure
 Number of Observations        509
 Number of Pairwise Missing      0

 Simple Summary Statistics
                                       Standard
 Variable   Type         N    Mean     Deviation
 HIGH       Dependent  509   3.12665     1.23624
 LOW        Dependent  509   3.05067     1.22461
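The VARMAX call itself is not shown in the notes; here is a sketch that would request output of this kind (option names per SAS/ETS PROC VARMAX; the specific choices, such as NORMALIZE=HIGH, are my assumptions):

PROC VARMAX DATA=AMAZON;
   MODEL HIGH LOW / P=3 DFTEST                    /* univariate unit root tests */
                    COINTTEST=(JOHANSEN)          /* trace tests, alpha, beta   */
                    ECM=(RANK=1 NORMALIZE=HIGH);  /* fit the rank-1 ECM         */
RUN;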

Coint - 17

The individual series should be nonstationary:

          Dickey-Fuller Unit Root Tests
 Variable  Type            Rho   Pr < Rho    Tau   Pr < Tau
 HIGH      Zero Mean      0.84     0.8848   1.86     0.9854
           Single Mean   -1.02     0.8844  -0.83     0.8082
           Trend        -17.67     0.1083  -2.86     0.1780
 LOW       Zero Mean      0.84     0.8846   1.72     0.9795
           Single Mean   -1.13     0.8735  -0.87     0.7987
           Trend        -22.19     0.0426  -3.22     0.0823

There follow cross covariances, cross-correlations, etc. Next: the VAR representation (3 lags, 2x2 coefficient matrices) and the partial autoregressive matrices, i.e. fit 1, then 2, then 3 vector lags and report the last coefficient matrix each time.

      Yule-Walker Estimates
 Lag  Variable     HIGH        LOW
  1   HIGH       0.93267    0.14810
      LOW        0.56697    0.51960
  2   HIGH      -0.09905    0.02425
      LOW       -0.13872    0.05341
  3   HIGH       0.17072   -0.18205
      LOW        0.00254   -0.01333

 Schematic Representation of Partial Autoregression
 Variable/Lag   1    2    3    4    5
 HIGH           +.   ..   ..   ..   ..
 LOW            ++   ..   ..   ..   ..
 + is > 2*std error,  - is < -2*std error,  . is between

Coint - 18

Next, the partial canonical correlations and the Johansen trace tests:

            Partial Canonical Correlations
 Lag  Correlation1  Correlation2  DF  Chi-Square  Pr > ChiSq
  1      0.99521       0.42734     4     595.91       <.0001
  2      0.11409       0.07557     4       9.49       0.0499
  3      0.17292       0.01049     4      15.19       0.0043
  4      0.09158       0.03170     4       4.74       0.3148
  5      0.08729       0.01140     4       3.91       0.4189

        Cointegration Rank Test Using Trace
                                          5%
 H0:     H1:                          Critical  Drift      Drift in
 Rank=r  Rank>r  Eigenvalue   Trace   Value     in ECM     Process
   0       0       0.1203    65.4985    15.34   Constant   Linear
   1       1       0.0013     0.6559     3.84

   Cointegration Rank Test Using Trace Under Restriction
                                          5%
 H0:     H1:                          Critical  Drift      Drift in
 Rank=r  Rank>r  Eigenvalue   Trace   Value     in ECM     Process
   0       0       0.1204    71.1499    19.99   Constant   Constant
   1       1       0.0123     6.2589     9.13

      Hypothesis of the Restriction
             Drift      Drift in
 Hypothesis  in ECM     Process
 H0          Constant   Constant
 H1          Constant   Linear
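The Trace column is computed from the Eigenvalue column exactly as in the formula on page Coint - 9. A quick check (the effective sample size n = 506, the 509 observations less the 3 lags, is my inference, not stated in the output):

PROC IML;
N = 506;                               /* effective sample size (assumed)    */
LAMBDA = {0.1203, 0.0013};
TRACE0 = -N * SUM(LOG(1 - LAMBDA));    /* about 65.5 = trace test of H0: r=0 */
TRACE1 = -N * LOG(1 - LAMBDA[2]);      /* about 0.66 = trace test of H0: r=1 */
PRINT TRACE0 TRACE1;
QUIT;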

Coint - 19

The restriction is that the cointegrating vector annihilates the intercept term, hence the drift is constant in the "error correction mechanism" as well as in the process itself, in analogy to $Y_t = 0 + \rho Y_{t-1} + e_t$. Without the restriction, an intercept appears in the ECM but a linear trend appears in the process, in analogy to $Y_t = 2 + \rho Y_{t-1} + e_t$, which has either mean $2/(1-\rho)$ or drift 2 per time period, depending on whether $\rho$ is less than 1 or equal to 1. Either way, we decide there is more than 0 (Trace > Critical Value) but not more than 1 (Trace < Critical Value) cointegrating vector. Now we may be interested in testing whether this restriction holds. To do this we have to take a stand on the cointegrating rank. A rough plot suggests drift in the underlying processes:

[Line printer plot of HIGH*DATE (symbol 'H') and LOW*DATE (symbol 'L') against DATE, 27MAR97 through 05JUN99: both log price series wander upward together, from about 2 to about 6 on the log scale. NOTE: 927 obs hidden.]
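For reference, a present-day equivalent of that line printer plot (PROC SGPLOT is modern SAS, not part of the original notes):

PROC SGPLOT DATA=AMAZON;
   SERIES X=DATE Y=HIGH;    /* log high price */
   SERIES X=DATE Y=LOW;     /* log low price  */
RUN;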

Coint - 20

The tests for ranks 0 and 1 are given. We are interested in the rank 1 test, as we believe the cointegrating rank is indeed 1, and the two tests disagree on whether or not the restriction (that the cointegrating vector annihilates the intercept) holds. For the rank 1 model, we reject the restriction (as the graph above seems to suggest).

      Hypothesis Test of the Restriction
                   Restricted
 Rank  Eigenvalue  Eigenvalue  DF  Chi-Square  Pr > ChiSq
  0      0.1203      0.1204     2     5.65        0.0593
  1      0.0013      0.0123     1     5.60        0.0179

There follow estimates of the $\alpha$ and $\beta$ matrices without the restriction and under the restriction. From the above, we are interested in the results for the unrestricted model, and since we think the rank is 1, we are also interested only in the first columns of $\alpha$ and $\beta$, from which the 2x2 impact matrix $\Pi = \alpha\beta'$ is computed.

 Long-Run Parameter Beta Estimates
 Variable       1          2
 HIGH        1.00000    1.00000
 LOW        -1.01036   -0.24344

 Adjustment Coefficient Alpha Estimates
 Variable       1          2
 HIGH       -0.06411   -0.00209
 LOW         0.35013   -0.00174

In addition, under the restriction there are a 3x3 matrix $\beta$ and a 2x3 matrix $\alpha$ relating the 2-vector $(H_t - H_{t-1},\ L_t - L_{t-1})'$ to the 3-vector $(H_{t-1},\ L_{t-1},\ 1)'$. We are not interested in that one, as we have rejected the restriction. So without the restriction, our impact matrix $\Pi = \alpha\beta'$ becomes
$$\hat\Pi = \begin{pmatrix}-0.06411 \\ 0.35013\end{pmatrix}\begin{pmatrix}1.00 & -1.01\end{pmatrix} = \begin{pmatrix}-0.064 & 0.065 \\ 0.350 & -0.354\end{pmatrix}$$

Coint - 21

Estimates of the intercepts are given as well:

 Constant Estimates
 Variable   Constant
 HIGH        0.00857
 LOW        -0.01019

 Parameter Alpha * Beta' Estimates
 Variable      HIGH        LOW
 HIGH       -0.06411    0.06478
 LOW         0.35013   -0.35376

Combining these items we have the following:
$$\begin{pmatrix}H_t - H_{t-1} \\ L_t - L_{t-1}\end{pmatrix} = \begin{pmatrix}0.00857 \\ -0.01019\end{pmatrix} + \begin{pmatrix}-0.06 \\ 0.35\end{pmatrix}\begin{pmatrix}1.00 & -1.01\end{pmatrix}\begin{pmatrix}H_{t-1} \\ L_{t-1}\end{pmatrix} + \text{lagged difference terms} + e_t$$
where the 2x2 estimate of the covariance matrix $\Sigma$ of e also appears on the printout, as well as the 2x2 estimated coefficient matrices for the lagged differences. A schematic representation of the vector autoregressive model is shown as well.

 AR Coefficients of Differenced Lag
 DIF Lag  Variable     HIGH        LOW
    1     HIGH       0.04391    0.19078
          LOW        0.29375    0.01272
    2     HIGH      -0.09703    0.03941
          LOW        0.06793   -0.13209

Coint - 22

 Schematic Representation of Parameter Estimates
 Variable/Lag   C    AR1   AR2   AR3
 HIGH           +    **    .+    ..
 LOW            -    **    +.    .-
 + is > 2*std error, - is < -2*std error, . is between, * is N/A

The next result is a bit surprising. Even though we showed that $(1.00\ \ {-1.00})\,(H_t, L_t)'$ is stationary (having chosen 1 and -1 beforehand, which allowed the use of ordinary unit root tests), even though $(1.00, -1.00)$ and $(1.00, -1.01)$ are almost the same (perhaps the same from a practical if not statistical perspective), and even though we have shown that there is 1 cointegrating vector, we will REJECT the hypothesis that the cointegrating vector takes the form of a difference. Since we rejected here, the disagreement cannot be attributed to low power of the test. Here is the relevant piece of output:

 Restriction Matrix H with Respect to Beta
 Variable       1          2
 HIGH        1.00000    0.00000
 LOW        -1.00000    0.00000
 1           0.00000    1.00000

 Long-Run Coefficient Beta with Respect to Hypothesis on Beta
 Variable       1
 HIGH        1.00000
 LOW        -1.00000
 1          -0.01706
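Output like the last two tables comes from imposing the restriction matrix H on $\beta$. A sketch of the statements involved (the H= matrix entry syntax, and its pairing with the ECTREND suboption that puts the intercept inside the error correction term, should be checked against the PROC VARMAX documentation; treat this as an assumption-laden illustration):

PROC VARMAX DATA=AMAZON;
   MODEL HIGH LOW / P=3 ECM=(RANK=1 NORMALIZE=HIGH ECTREND);
   COINTEG RANK=1 H=(1 0, -1 0, 0 1);   /* beta restricted to the span of H */
RUN;

As a numerical check of the test on the next page, $n\,\ln[(1 - 0.1038)/(1 - 0.1204)] = 506 \times 0.0187 \approx 9.46$, matching the printed Chi-square of 9.44 up to rounding.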

Coint - 23

 Adjustment Coefficient Alpha with Respect to Hypothesis on Beta
 Variable       1
 HIGH        0.05334
 LOW         0.13957

 Test for Restricted Long-Run Coefficient Beta
                      Restricted
 Index  Eigenvalue    Eigenvalue  DF  Chi-Square  Pr > ChiSq
   1      0.1204        0.1038     1     9.44        0.0021

Much more output is produced, including a large array of residual diagnostics, information criteria, impulse response functions, etc.